Yotel is AI-Agent-ready telephony. The platform handles SIP, media, recording, supervision, dispositions, and webhooks; a voice AI service you control speaks the v1 audio-fork protocol on a WebSocket and drives the conversation. A voice_agent row is a tenant-scoped routing alias — name + WS URL + auth + audio config. It carries no behaviour. Behaviour lives entirely in your AI service.
This page covers tenant-side wiring. For mid-call control verbs see Control API; for lifecycle webhooks see AI session events.

Prerequisites

  • A Yotel tenant API key with voice_agents:write scope (scopes reference).
  • A voice AI service running at a public wss://... URL that implements v1 of the audio-fork protocol. Your AI sees raw G.711 from the caller and writes G.711 back; Yotel proxies between SIP and the WS for you.
  • Optional: a static auth_token that Yotel will send on the WS upgrade as Authorization: Bearer <token>. Use it so your AI rejects connections that aren’t from Yotel.
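The Bearer check described in the last prerequisite could be sketched as follows. This is a minimal illustration, not Yotel code: `is_yotel_upgrade` and `EXPECTED_TOKEN` are hypothetical names, and how you obtain the upgrade headers depends on your WebSocket server library.

```python
import hmac
from typing import Mapping

# Assumption: the same static value you configured as auth_token on the voice agent.
EXPECTED_TOKEN = "shared-secret-rotates-monthly"


def is_yotel_upgrade(headers: Mapping[str, str], expected_token: str = EXPECTED_TOKEN) -> bool:
    """Return True if the WS upgrade request carries the expected Bearer token."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalised.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

Call it before accepting the upgrade and close the connection when it returns False.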

Step 1 — Create a voice agent

curl -X POST https://api.yotel.in/api/v1/voice-agents \
  -H "Authorization: Bearer $YOTEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Bot",
    "ws_url": "wss://ai.example.com/yotel/audio-fork",
    "auth_token": "shared-secret-rotates-monthly",
    "audio_format": {"sample_rate": 8000, "encoding": "pcma"},
    "max_concurrent": 25
  }'
max_concurrent is enforced by Redis-backed counters; once the cap is hit, the campaign engine and the flow executor reject new calls with 429 max_concurrent.
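To make the cap semantics concrete, here is an in-memory model of a concurrent-call counter. It is only an illustration under stated assumptions: Yotel uses Redis-backed counters whose internals are not described here, and `ConcurrencyCap` is a hypothetical name.

```python
class ConcurrencyCap:
    """In-memory stand-in for a per-agent concurrent-call cap (not Yotel's implementation)."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self.active = 0

    def try_acquire(self) -> bool:
        """Claim a slot for a new call; False corresponds to the 429 max_concurrent case."""
        if self.active >= self.max_concurrent:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        """Free a slot when a call ends; the count never drops below zero."""
        self.active = max(0, self.active - 1)
```

With max_concurrent set to 25, the 26th simultaneous call is rejected until an earlier call ends and releases its slot.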

Optional: set as tenant default

A voice agent is resolved in this order: flow node > campaign > tenant default > lazy-create from tenants.voice_agent_default_ws_url. The tenant default therefore applies only when neither the flow node nor the campaign pins an agent. If you’ll always use the same agent, set it once and stop wiring per-campaign:
Python
client.voice_agents.set_as_default(agent.id)
If none of the four levels in the hierarchy resolves to an agent, AI calls fail with 424 NoVoiceAgentConfigured.
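The resolution order can be sketched as a first-match lookup. This is a simplified model, not the platform's code: `resolve_voice_agent`, `Resolution`, and the parameter names are hypothetical, and the exception stands in for the 424 response.

```python
from dataclasses import dataclass
from typing import Optional


class NoVoiceAgentConfigured(Exception):
    """Stands in for the 424 NoVoiceAgentConfigured failure."""


@dataclass
class Resolution:
    source: str  # which level of the hierarchy matched
    value: str   # an agent id, or the WS URL in the lazy-create case


def resolve_voice_agent(
    flow_node_agent_id: Optional[str] = None,
    campaign_agent_id: Optional[str] = None,
    tenant_default_agent_id: Optional[str] = None,
    tenant_default_ws_url: Optional[str] = None,
) -> Resolution:
    """Return the first configured level, highest priority first."""
    candidates = [
        ("flow_node", flow_node_agent_id),
        ("campaign", campaign_agent_id),
        ("tenant_default", tenant_default_agent_id),
        ("lazy_create", tenant_default_ws_url),
    ]
    for source, value in candidates:
        if value:
            return Resolution(source, value)
    raise NoVoiceAgentConfigured("no voice agent resolved for this call")
```

A campaign-level pin wins over the tenant default, and a flow-node pin wins over both.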

Quick start: tenant-level default URL

For single-agent tenants who don’t want to manage voice_agents rows explicitly, set tenants.voice_agent_default_ws_url once. The first campaign call lazy-creates a voice_agents row from that URL and pins it as the tenant default. Subsequent updates to the tenant column do not mutate the existing row — once created, the voice_agents row is the source of truth. The auto-created row uses sensible defaults (no Bearer auth, max concurrent = 100, drachtio subprotocol, conservative 30-minute token lifetime). Tenants who need different values can update the row via the standard voice_agents.update() API after bootstrap.

Optional: pin token lifetime or subprotocol

Two per-agent knobs from migration 032:
  • token_lifetime: '30min_with_refresh' (default) or 'ws_session'. The first keeps the §11.3 refresh-at-25-minutes contract; the second mints a token with a 24-hour expiry that’s auto-revoked when the WS closes. Pick 'ws_session' if your AI service can’t handle the token_refresh text frame.
  • subprotocol: 'audio.drachtio.org' (default) or 'audio.jambonz.org'. Pinned per-agent and validated at originate against the FreeSWITCH host’s MOD_AUDIO_FORK_SUBPROTOCOL_NAME; a mismatch raises a fail-fast 424 rather than letting the WS upgrade fail later.
Python
agent = client.voice_agents.create(
    name="Sales Bot",
    ws_url="wss://ai.example.com/yotel/audio-fork",
    token_lifetime="ws_session",        # no refresh frames
    subprotocol="audio.jambonz.org",    # binary L16 audio
)
TypeScript
const agent = await client.voiceAgents.create({
  name: "Sales Bot",
  ws_url: "wss://ai.example.com/yotel/audio-fork",
  token_lifetime: "ws_session",
  subprotocol: "audio.jambonz.org",
});
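The allowed values and the originate-time subprotocol check can be modelled as below. This is a sketch of the rule as described, not the server's code: the function and exception names are hypothetical, and `host_subprotocol` represents the FreeSWITCH host's MOD_AUDIO_FORK_SUBPROTOCOL_NAME value.

```python
ALLOWED_TOKEN_LIFETIMES = {"30min_with_refresh", "ws_session"}
ALLOWED_SUBPROTOCOLS = {"audio.drachtio.org", "audio.jambonz.org"}


class SubprotocolMismatch(Exception):
    """Stands in for the fail-fast 424 raised at originate."""


def validate_agent_options(token_lifetime: str, subprotocol: str) -> None:
    """Reject values outside the two documented options for each knob."""
    if token_lifetime not in ALLOWED_TOKEN_LIFETIMES:
        raise ValueError(f"unsupported token_lifetime: {token_lifetime!r}")
    if subprotocol not in ALLOWED_SUBPROTOCOLS:
        raise ValueError(f"unsupported subprotocol: {subprotocol!r}")


def check_originate(agent_subprotocol: str, host_subprotocol: str) -> None:
    """Fail before dialling if the agent and the FreeSWITCH host disagree."""
    if agent_subprotocol != host_subprotocol:
        raise SubprotocolMismatch(
            f"agent pins {agent_subprotocol!r} but host uses {host_subprotocol!r}"
        )
```

Checking at originate means a misconfigured agent surfaces as an immediate error instead of a failed WS upgrade mid-call.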

Step 2 — Attach to an outbound campaign

Set post_answer_action="connect_voice_agent" and either pin a voice_agent_id on the campaign or rely on the tenant default.
curl -X PATCH https://api.yotel.in/api/v1/campaigns/$CID \
  -H "Authorization: Bearer $YOTEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "post_answer_action": "connect_voice_agent",
    "voice_agent_id": "'$AGENT_ID'"
  }'
When the campaign dials a lead and the callee answers, FreeSWITCH runs the connect_voice_agent extension, opens the audio fork to the resolved ws_url, and starts streaming.

Step 3 — Or attach to a flow node (inbound DID calls)

For inbound calls handled by the visual flow builder, drop a connect_voice_agent node into your flow and pin a voice_agent_id on its data block. Inbound calls arriving at that node hand off to the AI exactly like an outbound campaign call.
{
  "id": "node-3",
  "type": "connect_voice_agent",
  "data": {
    "voice_agent_id": "<agent-uuid>"
  }
}
The flow validator rejects connect_voice_agent nodes without a voice_agent_id, so misconfiguration fails fast on flow save — not at call time.
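If you generate flows programmatically, a pre-save check along these lines catches the same mistake before the API does. It is a sketch that assumes only the node shape shown above (id, type, data.voice_agent_id); `validate_flow` is a hypothetical helper, not part of a Yotel SDK.

```python
from typing import Any, Dict, List


def validate_flow(nodes: List[Dict[str, Any]]) -> List[str]:
    """Return one error message per connect_voice_agent node missing a voice_agent_id."""
    errors = []
    for node in nodes:
        if node.get("type") != "connect_voice_agent":
            continue  # other node types are out of scope for this check
        agent_id = (node.get("data") or {}).get("voice_agent_id")
        if not agent_id:
            node_id = node.get("id", "<unknown>")
            errors.append(f"{node_id}: connect_voice_agent requires data.voice_agent_id")
    return errors
```

An empty list means every connect_voice_agent node in the flow pins an agent.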

Step 4 — Run a call

Outbound: start the campaign as usual (POST /campaigns/{id}/start). Each lead is dialled, DND-scrubbed, answered, and forked to your WS. You’ll see call.started → call.answered → ai_session.started on your webhook endpoint within a few seconds.
Inbound: dial the DID associated with the flow. Yotel runs the flow, hits the connect_voice_agent node, and connects the audio.
Stereo recording is enabled automatically (L=caller, R=AI); the call.recording_ready webhook will arrive with stereo: true once the WAV is uploaded.
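When testing a first call, a small helper can confirm that the three handoff events showed up. This sketch assumes you have already collected the event names from your webhook endpoint into a list, and that they arrive in the order named above; delivery ordering is an assumption here, and `handoff_complete` is a hypothetical helper.

```python
from typing import Iterable

EXPECTED_ORDER = ["call.started", "call.answered", "ai_session.started"]


def handoff_complete(events_seen: Iterable[str]) -> bool:
    """True once the three events appear in order; unrelated events may interleave."""
    remaining = iter(events_seen)
    # Each any() consumes the shared iterator up to the next match, so this is
    # an ordered-subsequence check rather than a simple membership test.
    return all(any(seen == expected for seen in remaining) for expected in EXPECTED_ORDER)
```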

What happens on the wire

Yotel originate ─▶ FreeSWITCH dials lead ─▶ caller answers
                          │
                          │ uuid_audio_fork wss://your-agent/...
                          ▼
                  ┌──────────────────┐
                  │  Your voice AI   │
                  │  (v1 protocol)   │
                  └──────────────────┘
                          │
                          │ POST /api/v1/ai-sessions/{call_id}/control
                          ▼
                Yotel control endpoint  ─▶  webhooks fire
Your AI receives the metadata frame on connect — it carries call_id, tenant_id, voice_agent_id, lead context, and a yotel_callback_token (a 30-min JWT bound to that call). Use the callback token to invoke control verbs back into Yotel.
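A sketch of turning the metadata frame into a control request follows. Several details are assumptions rather than documented facts: that the frame is JSON text with call_id and yotel_callback_token as top-level keys, that the token is sent as a Bearer header, and the shape of the verb payload (see the Control API page for the real verbs). `build_control_request` is a hypothetical helper; it only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.yotel.in"


def build_control_request(metadata_frame: str, verb_payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to the control endpoint for this call."""
    meta = json.loads(metadata_frame)          # assumption: the frame is JSON text
    call_id = meta["call_id"]
    token = meta["yotel_callback_token"]       # JWT bound to this call
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/ai-sessions/{call_id}/control",
        data=json.dumps(verb_payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumption: Bearer scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the returned object to urllib.request.urlopen, or port the same URL, headers, and body to your HTTP client of choice.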

Next steps