A voice_agent row is a tenant-scoped routing alias: name, WS URL,
auth, and audio config. It carries no behaviour. Behaviour lives entirely
in your AI service.
This page covers tenant-side wiring. For mid-call control verbs see
Control API; for lifecycle webhooks see
AI session events.
Prerequisites
- A Yotel tenant API key with the voice_agents:write scope (scopes reference).
- A voice AI service running at a public wss://... URL that implements v1 of the audio-fork protocol. Your AI sees raw G.711 from the caller and writes G.711 back; Yotel proxies between SIP and the WS for you.
- Optional: a static auth_token that Yotel will send on the WS upgrade as Authorization: Bearer <token>. Use it so your AI rejects connections that aren’t from Yotel.
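The Bearer check on the AI-service side can be sketched as follows. This is a minimal illustration, not Yotel code: the function name is_yotel_upgrade is hypothetical, and it assumes your WebSocket server exposes the upgrade request headers as a plain dict.

```python
import hmac


def is_yotel_upgrade(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the upgrade request carries the expected Bearer token."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalised.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Call it before accepting the WebSocket upgrade and close the connection when it returns False.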
Step 1 — Create a voice agent
max_concurrent is enforced by Redis-backed counters; the campaign
engine and flow executor reject new calls with 429 max_concurrent
once the cap is hit.
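To make the cap semantics concrete, here is an in-memory stand-in for the counter. Yotel enforces the real cap with Redis-backed counters; the ConcurrencyCap class below is a hypothetical illustration of the acquire/release behaviour only.

```python
import threading


class ConcurrencyCap:
    """In-memory stand-in for a per-agent concurrent-call counter."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; return False when the cap is already reached."""
        with self._lock:
            if self._active >= self.max_concurrent:
                return False  # the caller would be rejected with 429 max_concurrent
            self._active += 1
            return True

    def release(self) -> None:
        """Free a slot when a call ends."""
        with self._lock:
            self._active = max(0, self._active - 1)
```

A new call is admitted only while try_acquire() returns True; a slot becomes available again once an active call releases it.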
Optional: set as tenant default
Defaults are the bottom of the override hierarchy (flow node > campaign > tenant default > lazy-create from tenants.voice_agent_default_ws_url). If you’ll always use the
same agent, set it once and stop wiring per-campaign.
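The override hierarchy can be read as a first-match lookup from most to least specific. The sketch below assumes each level is either set or None; the function name and the source labels it returns are hypothetical, not part of the Yotel API.

```python
from typing import Optional


def resolve_voice_agent(
    flow_node_agent_id: Optional[str],
    campaign_agent_id: Optional[str],
    tenant_default_agent_id: Optional[str],
    tenant_default_ws_url: Optional[str],
) -> tuple[str, str]:
    """Return (source, value) for the most specific setting that is present."""
    candidates = [
        ("flow_node", flow_node_agent_id),
        ("campaign", campaign_agent_id),
        ("tenant_default", tenant_default_agent_id),
        # Last resort: a row is lazy-created from the tenant-level URL.
        ("lazy_create", tenant_default_ws_url),
    ]
    for source, value in candidates:
        if value:
            return source, value
    raise LookupError("no voice agent configured at any level")
```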
Quick start: tenant-level default URL
For single-agent tenants who don’t want to manage voice_agents rows
explicitly, set tenants.voice_agent_default_ws_url once. The first
campaign call lazy-creates a voice_agents row from that URL and
pins it as the tenant default. Subsequent updates to the tenant
column do not mutate the existing row — once created, the
voice_agents row is the source of truth.
The auto-created row uses sensible defaults (no Bearer auth, max
concurrent = 100, drachtio subprotocol, conservative 30-minute
token lifetime). Tenants who need different values can update the
row via the standard voice_agents.update() API after bootstrap.
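The lazy-create behaviour can be modelled roughly as below. The dataclasses are hypothetical stand-ins for the database rows, with the defaults taken from the list above; this is an assumption-laden sketch, not the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceAgentRow:
    ws_url: str
    auth_token: Optional[str] = None           # no Bearer auth
    max_concurrent: int = 100
    subprotocol: str = "audio.drachtio.org"    # drachtio subprotocol
    token_lifetime: str = "30min_with_refresh"


@dataclass
class Tenant:
    voice_agent_default_ws_url: Optional[str] = None
    default_agent: Optional[VoiceAgentRow] = None


def default_agent_for_call(tenant: Tenant) -> VoiceAgentRow:
    """Create the default row on first use; afterwards the row is authoritative."""
    if tenant.default_agent is None:
        if not tenant.voice_agent_default_ws_url:
            raise LookupError("tenant has no default voice agent or default URL")
        tenant.default_agent = VoiceAgentRow(ws_url=tenant.voice_agent_default_ws_url)
    return tenant.default_agent
```

Changing voice_agent_default_ws_url after the first call leaves the existing row untouched, which mirrors the source-of-truth rule described above.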
Optional: pin token lifetime or subprotocol
Two per-agent knobs from migration 032:
- token_lifetime: '30min_with_refresh' (default) or 'ws_session'. The first keeps the §11.3 refresh-at-25-minutes contract; the second mints a +24h-exp token that’s auto-revoked when the WS closes. Pick 'ws_session' if your AI service can’t handle the token_refresh text frame.
- subprotocol: 'audio.drachtio.org' (default) or 'audio.jambonz.org'. Pinned per-agent and validated at originate against the FreeSWITCH host’s MOD_AUDIO_FORK_SUBPROTOCOL_NAME; a mismatch raises 424 fail-fast rather than letting the WS upgrade fail later.
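A rough model of the validation described above, useful for checking your own configuration before calling the API. The function and exception names are hypothetical; only the allowed values and the 424 status come from this page.

```python
TOKEN_LIFETIMES = {"30min_with_refresh", "ws_session"}
SUBPROTOCOLS = {"audio.drachtio.org", "audio.jambonz.org"}


class SubprotocolMismatch(Exception):
    """Agent subprotocol differs from the host's configured one."""

    status_code = 424


def validate_agent_knobs(
    token_lifetime: str, subprotocol: str, host_subprotocol: str
) -> None:
    """Raise if either knob is invalid or the subprotocol does not match the host."""
    if token_lifetime not in TOKEN_LIFETIMES:
        raise ValueError(f"unknown token_lifetime: {token_lifetime!r}")
    if subprotocol not in SUBPROTOCOLS:
        raise ValueError(f"unknown subprotocol: {subprotocol!r}")
    # host_subprotocol stands in for MOD_AUDIO_FORK_SUBPROTOCOL_NAME on the host.
    if subprotocol != host_subprotocol:
        raise SubprotocolMismatch(
            f"agent uses {subprotocol!r} but host expects {host_subprotocol!r}"
        )
```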
Step 2 — Attach to an outbound campaign
Set post_answer_action="connect_voice_agent" and either pin a
voice_agent_id on the campaign or rely on the tenant default.
On answer, Yotel routes the call to the connect_voice_agent extension, opens the audio fork to the
resolved ws_url, and starts streaming.
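The two campaign fields involved can be assembled as shown here. The helper name is hypothetical, and how the fields reach Yotel (SDK call or HTTP request) is deliberately left out because this page does not show it.

```python
from typing import Any, Optional


def campaign_voice_agent_fields(voice_agent_id: Optional[str] = None) -> dict[str, Any]:
    """Build the campaign fields that hand answered calls to a voice agent."""
    fields: dict[str, Any] = {"post_answer_action": "connect_voice_agent"}
    # Omit voice_agent_id to fall back to the tenant default.
    if voice_agent_id is not None:
        fields["voice_agent_id"] = voice_agent_id
    return fields
```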
Step 3 — Or attach to a flow node (inbound DID calls)
For inbound calls handled by the visual flow builder, drop a connect_voice_agent node into your flow and pin a voice_agent_id
on its data block. Inbound calls arriving at that node hand off to
the AI exactly like an outbound campaign call.
Flow validation rejects connect_voice_agent nodes without a
voice_agent_id, so misconfiguration fails fast on flow save,
not at call time.
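If you generate flows programmatically, a similar check can run on your side before saving. The flow shape below (a nodes list whose items have id, type, and data keys) is an assumption for illustration; only the data.voice_agent_id requirement comes from this page.

```python
def find_unpinned_agent_nodes(flow: dict) -> list[str]:
    """Return ids of connect_voice_agent nodes that lack a voice_agent_id."""
    missing = []
    for node in flow.get("nodes", []):
        if node.get("type") != "connect_voice_agent":
            continue
        # The agent is pinned on the node's data block.
        if not (node.get("data") or {}).get("voice_agent_id"):
            missing.append(node.get("id", "<unknown>"))
    return missing
```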
Step 4 — Run a call
Outbound: start the campaign as usual (POST /campaigns/{id}/start).
Each lead is dialled, DND-scrubbed, answered, and forked to your
WS. You’ll see call.started → call.answered → ai_session.started
on your webhook endpoint within a few seconds.
Inbound: dial the DID associated with the flow. Yotel runs the
flow, hits the connect_voice_agent node, and connects the audio.
Stereo recording is enabled automatically (L=caller, R=AI); the
call.recording_ready webhook will arrive with stereo: true once
the WAV is uploaded.
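A webhook consumer might track these events per call along the following lines. The payload keys are assumptions: this sketch reads the event name from an "event" key and the recording flag from "stereo"; check the webhook reference for the real schema.

```python
EXPECTED_START = ["call.started", "call.answered", "ai_session.started"]


class CallTimeline:
    """Collects webhook events for one call."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.stereo_recording = None  # unknown until call.recording_ready arrives

    def handle(self, event: dict) -> None:
        name = event["event"]
        self.events.append(name)
        if name == "call.recording_ready":
            self.stereo_recording = bool(event.get("stereo", False))

    def ai_connected(self) -> bool:
        """True once the three start-up events have arrived in order."""
        remaining = iter(self.events)
        # 'in' on an iterator consumes it, so this checks for an ordered subsequence.
        return all(expected in remaining for expected in EXPECTED_START)
```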
What happens on the wire
When the WS opens, Yotel hands your AI service the call_id, tenant_id, voice_agent_id, lead context, and a
yotel_callback_token (a 30-min JWT bound to that call). Use the
callback token to invoke control verbs back into Yotel.
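Because the callback token is short-lived, your AI service may want to know how long it has left before invoking a control verb. The sketch below assumes the token is a standard three-part JWT with an exp claim; it reads the claim without verifying the signature, so use it for scheduling only, never for trust decisions.

```python
import base64
import json
import time
from typing import Optional


def seconds_until_expiry(jwt_token: str, now: Optional[float] = None) -> float:
    """Read the exp claim from a JWT payload WITHOUT verifying the signature."""
    payload_b64 = jwt_token.split(".")[1]
    # base64url payloads drop '=' padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return claims["exp"] - current
```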
Next steps
- Control API reference: all 25 verbs your AI can invoke mid-call (transfer, hangup, hold, conference, request_supervisor, schedule_callback, …).
- AI session webhook events: the 5 new ai_session.* events plus the stereo flag on call.recording_ready.
- v1 audio-fork protocol spec: the wire format your WebSocket implementation must conform to.

