POST /api/v1/ai-sessions/{call_id}/control is the single endpoint
your voice AI uses to drive a live call. The JSON body’s event
field is the discriminator; there are 25 verbs across three groups:
call-state primitives (13), conferencing + supervisor (9), and
lead / context (3).
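As a minimal sketch of the raw HTTP call, the helper below uses only the Python standard library. The base URL and the `Authorization: Bearer` header form are assumptions, since this page does not state them. The path, the `event` discriminator and the `Idempotency-Key` header come from this page. The helper only builds the request, and sending it is a single `urlopen` call.

```python
import json
import urllib.request

# Assumption: hypothetical base URL; substitute your real Yotel API host.
BASE_URL = "https://api.yotel.example"


def build_control_request(call_id, token, event, idempotency_key=None, **fields):
    """Build (but do not send) a POST to the control endpoint.

    `event` is the discriminator; extra keyword arguments become the
    verb-specific fields of the JSON body.
    """
    body = {"event": event, **fields}
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-style auth for the callback token or API key.
        "Authorization": f"Bearer {token}",
    }
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/ai-sessions/{call_id}/control",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `build_control_request("call_123", token, "hangup")` produces a hangup request with no extra fields.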
Authentication
| Credential | When to use | Scope |
|---|---|---|
| Per-call callback token (yt_cb_<26 ULID>) | Production: your voice AI receives one in the WS metadata frame; valid 30 min, bound to one call_id | voice_agent:control |
| Tenant API key | Admin / dev / testing; explicit opt-in only | ai_sessions:control_admin |
A fresh token is delivered in a token_refresh text frame on the WS.
Agents that ignore the refresh lose control-verb access at the
30-min mark; the call itself continues without interruption.
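One way to track the 30-minute lifetime on the agent side is sketched below. It assumes you extract the new token string from the token_refresh frame yourself, because the frame's JSON layout is not documented here. The shape check only confirms the `yt_cb_` prefix followed by a 26-character ULID (Crockford base32). Signature and tenant checks still happen server-side.

```python
import re
import time
from dataclasses import dataclass, field

# "yt_cb_" + 26-char ULID in Crockford base32 (digits, uppercase minus I, L, O, U).
CALLBACK_TOKEN_RE = re.compile(r"^yt_cb_[0-9A-HJKMNP-TV-Z]{26}$")
TOKEN_TTL_S = 30 * 60  # callback tokens are valid for 30 minutes


@dataclass
class CallbackToken:
    value: str
    # Approximate: measured from when this process received the token.
    issued_at: float = field(default_factory=time.monotonic)

    def __post_init__(self):
        # Local shape check only; the server verifies signature and tenant.
        if not CALLBACK_TOKEN_RE.match(self.value):
            raise ValueError("not a yt_cb_ callback token")

    def expired(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.issued_at >= TOKEN_TTL_S

    def refresh(self, new_value, now=None):
        """Swap in the token delivered by a token_refresh frame."""
        CallbackToken(new_value)  # validate the shape before replacing
        self.value = new_value
        self.issued_at = time.monotonic() if now is None else now
```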
Request envelope
- Tenant fence. call_id must belong to the token’s tenant; cross-tenant access returns 403.
- Rate limit. 10 req/sec per call_id.
- Idempotency. The Yotel SDKs auto-generate an Idempotency-Key for state-changing verbs. Replays of the same (call_id, key) pair return the cached response. See Idempotency.
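If your agent can burst verbs, a client-side guard keeps you under the 10 req/sec per call_id limit before the server has to return 429. The sketch below uses a sliding one-second window with an injectable clock. It is purely local and is not part of any Yotel SDK. The server remains the authority, and a 429 can still occur, for example when the agent's max_concurrent cap is hit.

```python
import time
from collections import defaultdict, deque


class PerCallRateLimiter:
    """Client-side guard for the 10 req/sec per call_id limit.

    A request is allowed only if fewer than `limit` requests for the
    same call_id were sent within the last `window_s` seconds.
    """

    def __init__(self, limit=10, window_s=1.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent = defaultdict(deque)  # call_id -> send timestamps

    def try_acquire(self, call_id):
        now = self.clock()
        sent = self._sent[call_id]
        # Drop timestamps that have slid out of the window.
        while sent and now - sent[0] >= self.window_s:
            sent.popleft()
        if len(sent) >= self.limit:
            return False
        sent.append(now)
        return True
```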
Response envelope
result carries verb-specific data (e.g. {"conference_id": "..."})
for verbs that have a return; empty {} otherwise.
Group 1 — Call-state primitives (13)
Operate on the AI’s own call leg.
transfer
Bridge the caller to an agent queue, an E.164 number, or a SIP URI.
The AI’s leg drops; outcome ← transferred_*. Fires
ai_session.transferred + ai_session.ended.
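The page lists three transfer targets but not the body keys, so the builder below uses hypothetical `destination_type` / `destination` fields. It classifies a destination string as E.164 (a leading `+`, then at most 15 digits with no leading zero), a SIP URI (`sip:` or `sips:`), or an agent queue. The `queue:` prefix is an invented local convention, used only to make the three cases distinguishable.

```python
import re

E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")  # "+" then 2..15 digits, no leading 0


def transfer_body(destination):
    """Build a transfer body from a destination string.

    Hypothetical field names: "destination_type" and "destination" are
    placeholders; check the request schema for the real keys.
    """
    if E164_RE.match(destination):
        kind = "e164"
    elif destination.startswith(("sip:", "sips:")):
        kind = "sip_uri"
    elif destination.startswith("queue:"):
        # Hypothetical local convention for naming an agent queue.
        kind = "queue"
        destination = destination[len("queue:"):]
    else:
        raise ValueError(f"unrecognised transfer destination: {destination!r}")
    return {"event": "transfer", "destination_type": kind, "destination": destination}
```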
hangup
End the call. outcome ← hangup; fires ai_session.ended.
log
Append a structured entry to ai_sessions.metadata.logs[]. No
state effect and no webhook; useful for leaving breadcrumbs in the audit log.
mute / unmute
Mute one party until a matching unmute (or the end of the call).
target is "caller" (default) or "agent" (the AI’s leg).
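The builder below validates target locally and falls back to the documented "caller" default. It assumes unmute accepts the same target field as mute, which this page does not state explicitly.

```python
VALID_TARGETS = {"caller", "agent"}  # "agent" is the AI's own leg


def mute_body(mute=True, target="caller"):
    """Build a mute or unmute body; target defaults to the caller."""
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {sorted(VALID_TARGETS)}")
    return {"event": "mute" if mute else "unmute", "target": target}
```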
hold / unhold
Place the caller on hold; optional moh_url for music. unhold
clears the state.
send_dtmf
Inject DTMF digits into the call (e.g. when navigating an external
IVR after a transfer attempt).
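Validating digits before sending avoids a 400 from a typo. The sketch accepts the standard DTMF set (0-9, *, # and the rarely used A-D tones). It assumes a hypothetical `digits` field name, and it also assumes the platform accepts A-D, which this page does not confirm.

```python
DTMF_CHARS = set("0123456789*#ABCD")


def send_dtmf_body(digits):
    """Validate a DTMF string and wrap it in a send_dtmf body.

    "digits" is a hypothetical field name; A-D are the fourth-column
    tones and may not be supported by every carrier or IVR.
    """
    digits = digits.upper()
    bad = sorted(set(digits) - DTMF_CHARS)
    if not digits or bad:
        raise ValueError(f"invalid DTMF digits: {bad or 'empty'}")
    return {"event": "send_dtmf", "digits": digits}
```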
play_audio
Play either a pre-recorded URL or TTS text — exactly one of
audio_url / tts_text must be set. See the TTS limitation below.
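The exactly-one-of rule and the v1 TTS gap can both be enforced before the request leaves your process. The sketch rejects `tts_text` by default, because the dispatcher currently answers it with 422. The `allow_tts` flag is a hypothetical local switch for when TTS support lands.

```python
def play_audio_body(audio_url=None, tts_text=None, allow_tts=False):
    """Build a play_audio body with exactly one of audio_url / tts_text.

    tts_text is refused locally unless allow_tts is set, since the v1
    dispatcher returns 422 for it.
    """
    if (audio_url is None) == (tts_text is None):
        raise ValueError("set exactly one of audio_url or tts_text")
    if tts_text is not None:
        if not allow_tts:
            raise NotImplementedError("TTS is not wired in v1; use a pre-rendered audio_url")
        return {"event": "play_audio", "tts_text": tts_text}
    return {"event": "play_audio", "audio_url": audio_url}
```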
set_disposition
Save a disposition code on the call. source is recorded as "ai".
recording_pause / recording_resume
Stop/start recording mid-call (e.g. PCI compliance during card
capture). Optional reason is stored on the audit log.
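For PCI-style card capture, pairing the pause with a guaranteed resume matters more than the individual calls. The context manager below takes any `send` callable that posts a control body (a hypothetical wrapper around your HTTP client). It resumes recording in a `finally` block, so an exception during capture does not leave the rest of the call unrecorded.

```python
from contextlib import contextmanager


@contextmanager
def recording_paused(send, reason=None):
    """Pause recording for the duration of a with-block, then resume.

    `send` is any callable that accepts a control body dict. The resume
    is sent even if the block raises.
    """
    pause = {"event": "recording_pause"}
    if reason is not None:
        pause["reason"] = reason  # stored on the audit log
    send(pause)
    try:
        yield
    finally:
        send({"event": "recording_resume"})
```

Typical use: `with recording_paused(send, reason="card_capture"): collect_card()`, where `collect_card` stands in for your own capture step.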
get_call_state
Read-only snapshot of current state. No idempotency key needed.
Group 2 — Conferencing + supervisor (9)
Add other participants to the call, monitor or barge into another call leg, or escalate to a human supervisor.
conference_start
Promote the AI’s two-leg call into a conference. The AI stays in.
Fires ai_session.conference_changed.
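conference_start is one of the verbs whose response carries data (the conference_id mentioned under Response envelope). The sketch assumes the decoded response is a dict with that data under a top-level `result` key, and it fails loudly if the id is missing.

```python
def conference_id_from(response):
    """Pull conference_id out of a decoded conference_start response.

    Assumes a shape like {"result": {"conference_id": "..."}}.
    """
    result = response.get("result") or {}
    conference_id = result.get("conference_id")
    if not conference_id:
        raise KeyError("conference_start response carried no conference_id")
    return conference_id
```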
conference_add
Dial a participant into the conference. participant_type is one of
e164 | sip_uri | agent | supervisor.
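Only the participant_type values are documented here. In the builder below, the key carrying the number, SIP URI, or agent/supervisor identifier is a hypothetical `address` field.

```python
PARTICIPANT_TYPES = {"e164", "sip_uri", "agent", "supervisor"}


def conference_add_body(participant_type, address):
    """Build a conference_add body.

    "address" is a hypothetical field name for whatever is being dialled in.
    """
    if participant_type not in PARTICIPANT_TYPES:
        raise ValueError(f"participant_type must be one of {sorted(PARTICIPANT_TYPES)}")
    return {"event": "conference_add", "participant_type": participant_type, "address": address}
```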
conference_remove
Kick a participant by member_id. member_id comes from the
participants[] array on ai_session.conference_changed.
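Removing someone means first finding their member_id in the latest ai_session.conference_changed payload. The lookup below takes a caller-supplied predicate. Only participants[] and member_id come from this page, so any other key you match on (participant_type in the usage example) is an assumption about the payload.

```python
def find_member_id(conference_changed, match):
    """Return the member_id of the first participant where match(p) is true.

    `conference_changed` is the decoded object holding participants[].
    Returns None when nobody matches.
    """
    for participant in conference_changed.get("participants", []):
        if match(participant):
            return participant["member_id"]
    return None


def conference_remove_body(member_id):
    return {"event": "conference_remove", "member_id": member_id}
```

Usage: `find_member_id(event, lambda p: p.get("participant_type") == "supervisor")`.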
conference_leave
The AI’s leg drops; the rest of the conference continues.
request_supervisor
Enqueue a supervisor escalation. UI alert + WebSocket push to
on-duty supervisors. Fires ai_session.escalated twice: once on
invocation (supervisor_id null), once on claim.
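Because request_supervisor fires ai_session.escalated twice, a webhook consumer has to tell "queued" from "claimed". The tracker below keys off supervisor_id alone. It also tolerates the claim arriving before the queued event, in case webhooks can be delivered out of order. That is an assumption, because this page gives no ordering guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EscalationTracker:
    """Tracks the two ai_session.escalated events from request_supervisor."""

    requested: bool = False
    supervisor_id: Optional[str] = None

    def on_escalated(self, supervisor_id):
        self.requested = True  # either event implies the request was made
        if supervisor_id is None:
            return "queued"
        self.supervisor_id = supervisor_id
        return "claimed"

    @property
    def waiting(self):
        """True while the escalation is queued but not yet claimed."""
        return self.requested and self.supervisor_id is None
```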
whisper
One-way audio injection into another call leg (typically an agent’s
ear during a coaching session). Requires target_call_id and an
existing monitor or conference relationship; bare invocation on an
unrelated call returns 403.
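Since whisper shares play_audio's TTS limitation, the sketch sends a pre-rendered audio_url. It assumes whisper accepts audio_url in the same way, as the TTS note on this page suggests. The local checks only catch a missing field. A 403 for a missing monitor or conference relationship can only come from the server.

```python
def whisper_body(target_call_id, audio_url):
    """Build a whisper body for one-way audio into another call leg.

    Uses audio_url because tts_text returns 422 in v1. The server still
    returns 403 unless a monitor or conference relationship exists.
    """
    if not target_call_id:
        raise ValueError("whisper requires target_call_id")
    if not audio_url:
        raise ValueError("whisper requires audio_url while TTS is unavailable")
    return {"event": "whisper", "target_call_id": target_call_id, "audio_url": audio_url}
```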
barge
Promote from monitor-only to a full participant in target_call_id.
Both sides hear the AI.
monitor_start / monitor_stop
Open or close a one-way audio fork from a target call into the AI’s
WS. mode is listen (default) or listen_and_whisper.
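The builders below cover opening and closing the audio fork. They assume the monitored leg is identified by target_call_id, the same field that whisper and barge use. This page does not name the field for monitor_start / monitor_stop.

```python
MONITOR_MODES = {"listen", "listen_and_whisper"}


def monitor_start_body(target_call_id, mode="listen"):
    """Open a one-way audio fork from target_call_id into the AI's WS."""
    if mode not in MONITOR_MODES:
        raise ValueError(f"mode must be one of {sorted(MONITOR_MODES)}")
    return {"event": "monitor_start", "target_call_id": target_call_id, "mode": mode}


def monitor_stop_body(target_call_id):
    """Close the fork opened by monitor_start."""
    return {"event": "monitor_stop", "target_call_id": target_call_id}
```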
Group 3 — Lead / context (3)
Persist data back to the lead row or schedule follow-up work.
set_lead_field
Patch one custom field on the lead’s custom_fields JSONB.
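Because the value lands in a JSONB column, checking JSON-serializability locally avoids a round trip for values like sets or datetimes. The `field` / `value` keys in this sketch are hypothetical names.

```python
import json


def set_lead_field_body(field, value):
    """Patch one custom field; "field" and "value" are hypothetical keys."""
    if not field:
        raise ValueError("field name is required")
    json.dumps(value)  # raises TypeError if the value can't be stored as JSON
    return {"event": "set_lead_field", "field": field, "value": value}
```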
set_lead_status
Override the lead status. Recorded as status_set_by='ai'.
schedule_callback
Insert a callbacks row. The campaign engine picks it up at
scheduled_at. Optional voice_agent_id to route the callback to a
different agent (e.g. escalate Tier-1 → Tier-2).
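The sketch below assumes scheduled_at is sent as an ISO 8601 timestamp. It normalises to UTC and rejects naive datetimes, so a local wall-clock time can't be misread by the campaign engine. voice_agent_id is only included when you want to reroute the callback.

```python
from datetime import datetime, timedelta, timezone


def hours_from_now(hours, now=None):
    """Convenience: a timezone-aware UTC time `hours` from now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + timedelta(hours=hours)


def schedule_callback_body(scheduled_at, voice_agent_id=None):
    """Build a schedule_callback body (ISO 8601 UTC format assumed)."""
    if scheduled_at.utcoffset() is None:
        raise ValueError("scheduled_at must be timezone-aware")
    body = {
        "event": "schedule_callback",
        "scheduled_at": scheduled_at.astimezone(timezone.utc).isoformat(),
    }
    if voice_agent_id is not None:
        # Route the callback to a different agent, e.g. Tier-1 -> Tier-2.
        body["voice_agent_id"] = voice_agent_id
    return body
```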
Idempotency
Idempotency-Key is honored on every state-changing verb (read-only
log and get_call_state ignore it). The Python and TypeScript SDKs
auto-generate a UUID v4 per call so retries are safe by default;
override with idempotency_key=... (Python) / idempotencyKey: ...
(TS) when you want cross-process dedup.
| Class of verb | Behaviour |
|---|---|
| transfer, hangup, conference_* | Strongly idempotent: a second invocation hits a state-machine 409, but a replay with the same key returns the cached 200. |
| log, set_lead_field, set_disposition, set_lead_status | Last-write-wins; the idempotency key only suppresses retry storms. |
| play_audio, whisper, send_dtmf | Without a key, a replay repeats the action. A key dedupes within a 60 s window. |

Cached responses are stored per (call_id, idempotency_key).
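
The sketch below shows the two key strategies side by side. A random UUID v4 matches the SDK default described above. A deterministic UUID v5, derived from call_id plus an operation label you choose, gives cross-process dedup: two workers that decide to perform the same logical operation derive the same key. Using uuid.NAMESPACE_URL and the `yotel-control:` prefix are arbitrary local choices.

```python
import uuid


def random_key():
    """One key per logical operation, reused across its retries (UUID v4)."""
    return str(uuid.uuid4())


def stable_key(call_id, operation_id):
    """Deterministic key for cross-process dedup (UUID v5).

    Same (call_id, operation_id) in any process -> same key, so the
    server returns the cached response to whichever request arrives second.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"yotel-control:{call_id}:{operation_id}"))
```

Only reuse a stable key for an identical request body; a different body sent with the same key would get the earlier cached response back.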
TTS not yet supported
play_audio and whisper accept a tts_text field in the v1
protocol, but the dispatcher currently returns 422 when that
shape is sent: TTS rendering isn’t wired in v1. Use audio_url
with a pre-rendered WAV/MP3 for now. TTS support is tracked for
v1.1.
Error reference
| HTTP | SDK class (Python / TS) | Typical cause |
|---|---|---|
| 400 | ValidationError / ValidationError | Body schema invalid (e.g. both audio_url and tts_text set, missing target_call_id) |
| 401 | AuthenticationError / AuthenticationError | Callback token expired or bad signature |
| 403 | PermissionDenied / PermissionDenied | Tenant fence (call_id ≠ token tenant), missing scope, or target_call_id lacks monitor/conference relationship |
| 404 | NotFoundError / NotFoundError | Call doesn’t exist |
| 409 | ConflictError / ConflictError | Call already terminated, conference state mismatch, or voice_agent in use during delete |
| 422 | ValidationError | TTS not wired (use audio_url) |
| 424 | NoVoiceAgentConfigured / NoVoiceAgentConfigured | Override hierarchy resolved no agent — set a tenant default or pin per campaign/flow |
| 429 | RateLimitedError / RateLimitedError | 10 req/sec/call_id exceeded, or max_concurrent cap on the agent hit. retry_after_s returned |
| 5xx | ServerError / ServerError | Yotel-internal — request_id returned for support |
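
The table can drive a small dispatch layer when you call the endpoint without an SDK. The sketch maps statuses to the Python class names listed above and retries only on 429, sleeping for retry_after_s. It assumes `send` is a hypothetical callable returning `(status, body_dict)` and that retry_after_s is a top-level key of the 429 body. Keep the same Idempotency-Key inside `send` across retries.

```python
import time

# HTTP status -> Python SDK exception class name, per the table above.
ERROR_CLASSES = {
    400: "ValidationError",
    401: "AuthenticationError",
    403: "PermissionDenied",
    404: "NotFoundError",
    409: "ConflictError",
    422: "ValidationError",
    424: "NoVoiceAgentConfigured",
    429: "RateLimitedError",
}


def error_class_for(status):
    """Return the SDK class name for an error status, or None for success."""
    if status >= 500:
        return "ServerError"
    return ERROR_CLASSES.get(status)


def call_with_backoff(send, max_attempts=3, sleep=time.sleep):
    """Retry only on 429, waiting retry_after_s between attempts."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(float(body.get("retry_after_s", 1.0)))
```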
See also
- Voice agents quickstart: set up the routing alias before invoking control verbs.
- AI session webhook events: async events that fire as a side-effect of these verbs.
- Authentication & scopes: full scope catalogue including voice_agents:* and ai_sessions:*.

