Voice agents quickstart

Yotel is AI-Agent-ready telephony. The platform handles SIP, media, recording, supervision, dispositions, and webhooks; a voice AI service you control speaks the v1 audio-fork protocol on a WebSocket and drives the conversation. A voice_agent row is a tenant-scoped routing alias — name + WS URL + auth + audio config. It carries no behaviour. Behaviour lives entirely in your AI service.

This page covers tenant-side wiring. For mid-call control verbs see Control API; for lifecycle webhooks see AI session events.

Prerequisites

A Yotel tenant API key with voice_agents:write scope (scopes reference).
A voice AI service running at a public wss://... URL that implements v1 of the audio-fork protocol. Your AI sees raw G.711 from the caller and writes G.711 back; Yotel proxies between SIP and the WS for you.
Optional: a static auth_token that Yotel will send on the WS upgrade as Authorization: Bearer <token>. Use it so your AI rejects connections that aren’t from Yotel.
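If you set an auth_token, your AI service has to check it on every WebSocket upgrade. The following is a minimal sketch of that check, assuming your WebSocket framework exposes the upgrade request headers as a plain dictionary. The function name is_yotel_upgrade is illustrative and not part of any Yotel SDK.

```python
import hmac


def is_yotel_upgrade(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the upgrade request carries the expected Bearer token."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    scheme, _, presented = normalised.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

Reject the upgrade (for example with a 401) whenever this returns False, before accepting any audio.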

Step 1 — Create a voice agent

cURL

curl -X POST https://api.yotel.in/api/v1/voice-agents \
  -H "Authorization: Bearer $YOTEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Bot",
    "ws_url": "wss://ai.example.com/yotel/audio-fork",
    "auth_token": "shared-secret-rotates-monthly",
    "audio_format": {"sample_rate": 8000, "encoding": "pcma"},
    "max_concurrent": 25
  }'

Python

import os
import yotel
client = yotel.Client(api_key=os.environ["YOTEL_KEY"])

agent = client.voice_agents.create(
    name="Sales Bot",
    ws_url="wss://ai.example.com/yotel/audio-fork",
    auth_token="shared-secret-rotates-monthly",
    audio_format={"sample_rate": 8000, "encoding": "pcma"},
    max_concurrent=25,
)
print(agent.id)

TypeScript

import { YotelClient } from "@yotel/client";
const client = new YotelClient({ apiKey: process.env.YOTEL_KEY! });

const agent = await client.voiceAgents.create({
  name: "Sales Bot",
  ws_url: "wss://ai.example.com/yotel/audio-fork",
  auth_token: "shared-secret-rotates-monthly",
  audio_format: { sample_rate: 8000, encoding: "pcma" },
  max_concurrent: 25,
});
console.log(agent.id);

max_concurrent is enforced by Redis-backed counters. Once the cap is hit, both the campaign engine and the flow executor reject new calls with 429 max_concurrent.
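To make the cap semantics concrete, here is a toy in-process counter. It is only an illustration of "reject once the cap is reached, accept again once a call ends"; it is not how Yotel implements its Redis-backed counters, and the class name ConcurrencyCap is hypothetical.

```python
import threading


class ConcurrencyCap:
    """Toy in-process stand-in for a per-agent concurrent-call counter."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot for a new call; False corresponds to the 429 case."""
        with self._lock:
            if self._active >= self.max_concurrent:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Free a slot when a call ends."""
        with self._lock:
            if self._active > 0:
                self._active -= 1


cap = ConcurrencyCap(max_concurrent=2)
results = [cap.try_acquire() for _ in range(3)]  # third call exceeds the cap
cap.release()                                    # one call hangs up
accepted_after_release = cap.try_acquire()       # a slot is free again
```

In practice this means a 429 max_concurrent is transient: size max_concurrent to what your AI service can actually handle in parallel.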

Optional: set as tenant default

The tenant default sits near the bottom of the override hierarchy, which resolves in this order: flow node > campaign > tenant default > lazy-create from tenants.voice_agent_default_ws_url. If you’ll always use the same agent, set it once and stop wiring it per campaign:

Python

client.voice_agents.set_as_default(agent.id)

If none of the four levels resolves to an agent, AI calls fail with 424 NoVoiceAgentConfigured.
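The resolution order can be sketched as a first-match lookup, most specific level first. This is a local illustration only: the function, parameter names, and exception class are hypothetical, and the real resolution happens inside Yotel.

```python
from typing import Optional


class NoVoiceAgentConfigured(LookupError):
    """Local stand-in for the 424 NoVoiceAgentConfigured failure."""


def resolve_voice_agent(
    flow_node_agent_id: Optional[str] = None,
    campaign_agent_id: Optional[str] = None,
    tenant_default_agent_id: Optional[str] = None,
    tenant_default_ws_url: Optional[str] = None,
) -> tuple[str, str]:
    """Return (source, value) for the first configured level."""
    candidates = [
        ("flow_node", flow_node_agent_id),
        ("campaign", campaign_agent_id),
        ("tenant_default", tenant_default_agent_id),
        ("lazy_create", tenant_default_ws_url),
    ]
    for source, value in candidates:
        if value:
            return source, value
    raise NoVoiceAgentConfigured("no voice agent resolved at any level")
```

For example, a campaign with its own voice_agent_id wins over the tenant default, while a tenant with only the default URL set falls through to the lazy-create level.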

Quick start: tenant-level default URL

For single-agent tenants who don’t want to manage voice_agents rows explicitly, set tenants.voice_agent_default_ws_url once. The first campaign call lazy-creates a voice_agents row from that URL and pins it as the tenant default. Subsequent updates to the tenant column do not mutate the existing row — once created, the voice_agents row is the source of truth. The auto-created row uses sensible defaults (no Bearer auth, max concurrent = 100, drachtio subprotocol, conservative 30-minute token lifetime). Tenants who need different values can update the row via the standard voice_agents.update() API after bootstrap.
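The "create once, then the row is the source of truth" behaviour can be modelled with a small sketch. The dictionary shapes and key names here are assumptions made for illustration; only the default values come from the description above.

```python
from typing import Optional

# Defaults listed for the auto-created row; the key names are assumed.
LAZY_DEFAULTS = {
    "auth_token": None,                      # no Bearer auth
    "max_concurrent": 100,
    "subprotocol": "audio.drachtio.org",
    "token_lifetime": "30min_with_refresh",
}


def ensure_default_agent(tenant: dict) -> Optional[dict]:
    """Create the default row on first use; afterwards return it unchanged."""
    if tenant.get("default_agent") is None:
        url = tenant.get("voice_agent_default_ws_url")
        if not url:
            return None  # nothing to lazy-create from
        tenant["default_agent"] = {"ws_url": url, **LAZY_DEFAULTS}
    return tenant["default_agent"]


tenant = {"voice_agent_default_ws_url": "wss://ai.example.com/v1"}
first = ensure_default_agent(tenant)
tenant["voice_agent_default_ws_url"] = "wss://ai.example.com/v2"  # later column edit
second = ensure_default_agent(tenant)  # still the row created from the v1 URL
```

The second lookup still returns the row built from the first URL, which is why later changes belong on the voice_agents row rather than the tenant column.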

Optional: pin token lifetime or subprotocol

Two per-agent knobs from migration 032:

token_lifetime: '30min_with_refresh' (default) or 'ws_session'. The first keeps the §11.3 refresh-at-25-minutes contract; the second mints a token that expires after 24 hours and is revoked automatically when the WS closes. Pick 'ws_session' if your AI service can’t handle the token_refresh text frame.
subprotocol: 'audio.drachtio.org' (default) or 'audio.jambonz.org'. It is pinned per agent and validated at originate against the FreeSWITCH host’s MOD_AUDIO_FORK_SUBPROTOCOL_NAME; a mismatch fails fast with a 424 rather than letting the WS upgrade fail later.

Python

agent = client.voice_agents.create(
    name="Sales Bot",
    ws_url="wss://ai.example.com/yotel/audio-fork",
    token_lifetime="ws_session",        # no refresh frames
    subprotocol="audio.jambonz.org",    # binary L16 audio
)

TypeScript

const agent = await client.voiceAgents.create({
  name: "Sales Bot",
  ws_url: "wss://ai.example.com/yotel/audio-fork",
  token_lifetime: "ws_session",
  subprotocol: "audio.jambonz.org",
});
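Because both knobs accept only two values each, a typo is easy to catch before the request leaves your code. A minimal sketch of such a pre-flight check follows; the helper name check_agent_options is hypothetical, and the API remains the authority on what it accepts.

```python
ALLOWED_TOKEN_LIFETIMES = {"30min_with_refresh", "ws_session"}
ALLOWED_SUBPROTOCOLS = {"audio.drachtio.org", "audio.jambonz.org"}


def check_agent_options(
    token_lifetime: str = "30min_with_refresh",
    subprotocol: str = "audio.drachtio.org",
) -> list[str]:
    """Return a list of problems; an empty list means both values are known."""
    problems = []
    if token_lifetime not in ALLOWED_TOKEN_LIFETIMES:
        problems.append(f"unknown token_lifetime: {token_lifetime!r}")
    if subprotocol not in ALLOWED_SUBPROTOCOLS:
        problems.append(f"unknown subprotocol: {subprotocol!r}")
    return problems
```

This only checks spelling. Whether the subprotocol matches the FreeSWITCH host is still decided at originate.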

Step 2 — Attach to an outbound campaign

Set post_answer_action="connect_voice_agent" and either pin a voice_agent_id on the campaign or rely on the tenant default.

cURL

curl -X PATCH "https://api.yotel.in/api/v1/campaigns/$CID" \
  -H "Authorization: Bearer $YOTEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "post_answer_action": "connect_voice_agent",
    "voice_agent_id": "'"$AGENT_ID"'"
  }'

Python

client.campaigns.update(
    campaign_id,
    post_answer_action="connect_voice_agent",
    voice_agent_id=agent.id,
)

TypeScript

await client.campaigns.update(campaignId, {
  post_answer_action: "connect_voice_agent",
  voice_agent_id: agent.id,
});

When the campaign dials a lead and the callee answers, FreeSWITCH runs the connect_voice_agent extension, opens the audio fork to the resolved ws_url, and starts streaming.

Step 3 — Or attach to a flow node (inbound DID calls)

For inbound calls handled by the visual flow builder, drop a connect_voice_agent node into your flow and pin a voice_agent_id on its data block. Inbound calls arriving at that node hand off to the AI exactly like an outbound campaign call.

{
  "id": "node-3",
  "type": "connect_voice_agent",
  "data": {
    "voice_agent_id": "<agent-uuid>"
  }
}

The flow validator rejects connect_voice_agent nodes without a voice_agent_id, so misconfiguration fails fast on flow save — not at call time.
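If you generate flows programmatically, you can mirror that rule locally before saving. The sketch below assumes a flow is a dictionary with a top-level "nodes" list whose entries look like the node shown above; that wrapper shape and the play_prompt node type are assumptions for illustration.

```python
def find_unpinned_agent_nodes(flow: dict) -> list[str]:
    """Return ids of connect_voice_agent nodes that lack a voice_agent_id."""
    missing = []
    for node in flow.get("nodes", []):
        if node.get("type") != "connect_voice_agent":
            continue
        if not (node.get("data") or {}).get("voice_agent_id"):
            missing.append(node.get("id", "<no id>"))
    return missing


flow = {
    "nodes": [
        {"id": "node-1", "type": "play_prompt", "data": {}},  # hypothetical type
        {"id": "node-2", "type": "connect_voice_agent", "data": {}},
        {"id": "node-3", "type": "connect_voice_agent",
         "data": {"voice_agent_id": "<agent-uuid>"}},
    ]
}
unpinned = find_unpinned_agent_nodes(flow)
```

Here only node-2 is reported, since node-3 has a voice_agent_id and node-1 is not a voice agent node.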

Step 4 — Run a call

Outbound: start the campaign as usual (POST /campaigns/{id}/start). Each lead is dialled, DND-scrubbed, answered, and forked to your WS. You’ll see call.started → call.answered → ai_session.started on your webhook endpoint within a few seconds.

Inbound: dial the DID associated with the flow. Yotel runs the flow, hits the connect_voice_agent node, and connects the audio.

Stereo recording is enabled automatically (L=caller, R=AI); the call.recording_ready webhook will arrive with stereo: true once the WAV is uploaded.
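On the receiving side, a small tracker is enough to confirm that a test call reached your AI and produced a stereo recording. This is a sketch under assumptions: the payload keys "event", "call_id", and "stereo" as top-level fields are guesses, so check AI session events for the real schema. The class name CallTimeline is hypothetical.

```python
from collections import defaultdict

EXPECTED_START = ["call.started", "call.answered", "ai_session.started"]


class CallTimeline:
    """Collects webhook event names per call, in arrival order."""

    def __init__(self) -> None:
        self.events = defaultdict(list)
        self.stereo_recordings = set()

    def handle(self, payload: dict) -> None:
        # Assumed payload keys: "event", "call_id", and optionally "stereo".
        call_id = payload["call_id"]
        self.events[call_id].append(payload["event"])
        if payload["event"] == "call.recording_ready" and payload.get("stereo"):
            self.stereo_recordings.add(call_id)

    def ai_connected(self, call_id: str) -> bool:
        """True once the three start-up events have arrived in order."""
        seen = [e for e in self.events[call_id] if e in EXPECTED_START]
        return seen[:3] == EXPECTED_START


timeline = CallTimeline()
for name in EXPECTED_START:
    timeline.handle({"event": name, "call_id": "c1"})
timeline.handle({"event": "call.recording_ready", "call_id": "c1", "stereo": True})
```

Call handle() from whatever HTTP framework receives your webhooks. If delivery order is not guaranteed in your setup, relax the ordering check to a membership check.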

What happens on the wire

Yotel originate ─▶ FreeSWITCH dials lead ─▶ callee answers
                          │
                          │ uuid_audio_fork wss://your-agent/...
                          ▼
                  ┌──────────────────┐
                  │  Your voice AI   │
                  │  (v1 protocol)   │
                  └──────────────────┘
                          │
                          │ POST /api/v1/ai-sessions/{call_id}/control
                          ▼
                Yotel control endpoint  ─▶  webhooks fire

On connect, your AI receives the metadata frame. It carries call_id, tenant_id, voice_agent_id, lead context, and a yotel_callback_token (by default a 30-minute JWT bound to that call). Use the callback token to invoke control verbs back into Yotel.
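The round trip from metadata frame to control call can be sketched with the standard library alone. Several details here are assumptions: that the metadata frame is a JSON text frame with these fields at the top level, that the callback token is sent as a Bearer token, and that the request body is {"verb": ...}. Check the v1 audio-fork protocol spec and the Control API for the actual shapes. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.yotel.in"


def parse_metadata_frame(text_frame: str) -> dict:
    """Decode the metadata frame and check the fields this page names."""
    metadata = json.loads(text_frame)
    for field in ("call_id", "tenant_id", "voice_agent_id", "yotel_callback_token"):
        if field not in metadata:
            raise ValueError(f"metadata frame is missing {field!r}")
    return metadata


def build_control_request(metadata: dict, verb: str) -> urllib.request.Request:
    """Build (but do not send) a control call for the current session."""
    # Assumption: the body is {"verb": ...}; the real schema may differ.
    body = json.dumps({"verb": verb}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/ai-sessions/{metadata['call_id']}/control",
        data=body,
        method="POST",
        headers={
            # Assumption: the callback token is presented as a Bearer token.
            "Authorization": f"Bearer {metadata['yotel_callback_token']}",
            "Content-Type": "application/json",
        },
    )
```

Pass the result to urllib.request.urlopen (or port the same URL, headers, and body to your HTTP client of choice) when your AI decides to act mid-call.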

Next steps

Control API reference — all 25 verbs your AI can invoke mid-call (transfer, hangup, hold, conference, request_supervisor, schedule_callback, …).
AI session webhook events — the 5 new ai_session.* events plus the stereo flag on call.recording_ready.
v1 audio-fork protocol spec — wire format your WebSocket implementation must conform to.
