Retries and dead-lettering - Yotel Developer Docs

Retry schedule

A delivery that fails with a retryable error (HTTP 408, 429, 5xx, timeout, or network error) is retried on the schedule below, with ±10% jitter applied to each delay:

After attempt #	Next retry scheduled
1	+5 seconds
2	+30 seconds
3	+3 minutes
4	+15 minutes
5	+1 hour
6	+6 hours
7	dead — no further attempts

Total window: ~7 hours 19 minutes (5 s + 30 s + 3 min + 15 min + 1 h + 6 h, before jitter).
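To make the schedule concrete, here is a minimal sketch of how a delay could be derived from the table. The names BASE_DELAYS and next_retry_delay are hypothetical, and the uniform jitter distribution is an assumption; the docs only state that jitter is ±10%.

```python
import random

# Base delay in seconds before the next retry, keyed by the attempt that just failed.
# Attempt 7 has no entry: the delivery is dead after it.
BASE_DELAYS = {1: 5, 2: 30, 3: 180, 4: 900, 5: 3600, 6: 21600}
JITTER = 0.10  # +/-10%; a uniform distribution is assumed here


def next_retry_delay(failed_attempt, rng=random):
    """Return the jittered delay in seconds, or None if the delivery is dead."""
    base = BASE_DELAYS.get(failed_attempt)
    if base is None:
        return None
    return base * rng.uniform(1 - JITTER, 1 + JITTER)


def total_window_seconds():
    """Sum of the base delays, ignoring jitter and the time each request takes."""
    return sum(BASE_DELAYS.values())
```

For example, next_retry_delay(3) returns a value close to 180 seconds, and next_retry_delay(7) returns None because attempt 7 is the last one.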

What counts as a retryable failure

HTTP status	Verdict
2xx	Delivered. No retry.
408 Request Timeout	Retry.
429 Too Many Requests	Retry.
5xx	Retry.
other 4xx (400, 401, 403, 404, …)	Dead immediately. No retry.
Connection timeout / refused / reset	Retry.

We don’t retry 4xx (other than 408/429) because those indicate customer config errors — wrong URL, auth rejected on your side, route missing. Retries won’t fix those; they just burn your logs and our queue.
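The table above can be read as a small decision function. The sketch below is illustrative only: classify_attempt is a hypothetical name, and the "unspecified" verdict marks status codes (1xx, 3xx) that the table does not cover.

```python
def classify_attempt(status=None, network_error=False):
    """Map one delivery attempt to 'delivered', 'retry', 'dead', or 'unspecified'.

    status is the HTTP status code, or None when no response arrived
    (connection timeout, refused, or reset).
    """
    if network_error or status is None:
        return "retry"
    if 200 <= status < 300:
        return "delivered"
    if status in (408, 429) or 500 <= status < 600:
        return "retry"
    if 400 <= status < 500:
        # Any other 4xx: treated as a configuration error, dead immediately.
        return "dead"
    # 1xx and 3xx are not listed in the table; this sketch does not guess.
    return "unspecified"
```

You can use the same function on your side to predict how a given response from your endpoint will be treated, for example classify_attempt(404) versus classify_attempt(503).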

Auto-disable

After 20 consecutive failures AND no successful delivery in 24h, we auto-disable the subscription. You’ll:

See disabled_at set in the dashboard
See disabled_reason = "auto_disabled_failure_threshold" on the row
Stop receiving any deliveries until you manually re-enable

This is a circuit breaker — once your endpoint is abandoned or comprehensively broken, we stop wasting both sides’ resources. To re-enable: dashboard → Webhooks → the subscription → Re-enable.

Both conditions must hold. Any successful delivery resets consecutive_failures to 0, so a subscription with 20 failures followed by one success won’t get auto-disabled. Likewise, a brief bad hour on an otherwise healthy endpoint doesn’t trigger a disable, because there is still a successful delivery within the last 24h.
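The auto-disable rule can be sketched as a single predicate. This is an illustration of the rule as described, not the actual implementation; should_auto_disable and its parameters are hypothetical names, and treating a subscription with no success on record as having "no successful delivery in 24h" is an assumption.

```python
from datetime import datetime, timedelta, timezone

FAILURE_THRESHOLD = 20
SUCCESS_WINDOW = timedelta(hours=24)


def should_auto_disable(consecutive_failures, last_success_at, now):
    """True only when BOTH conditions hold.

    consecutive_failures: failures since the last success (reset to 0 by a success).
    last_success_at: timezone-aware datetime of the last success, or None if never.
    """
    no_recent_success = (
        last_success_at is None or now - last_success_at >= SUCCESS_WINDOW
    )
    return consecutive_failures >= FAILURE_THRESHOLD and no_recent_success
```

With now = datetime.now(timezone.utc), a subscription that has 20 consecutive failures but succeeded an hour ago is left enabled, while the same counter with no success for two days is disabled.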

Inspecting delivery state

Dashboard → Webhooks → Delivery log shows every attempt with:

Status (delivered, failed, dead, pending)
Attempt number (1 to 7)
HTTP status code returned
Response snippet (first 500 chars)
Duration in milliseconds
Next-retry-at timestamp for pending/failed rows

Filter by subscription + event type to debug a specific integration.

Ordering guarantee

None. Retries + per-tenant queuing mean a delivery for event E1 scheduled at t=0 may arrive after a delivery for E2 scheduled at t=1, especially when E1 hit a 5xx and E2 didn’t. If your pipeline needs strict ordering, key on timestamp in the payload and let your database sort them before processing.
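A minimal sketch of the "store first, sort later" approach, using SQLite as a stand-in for your database. The payload field names event_id and timestamp are assumptions about the payload shape, and the sort relies on timestamps being strings in one consistent, lexically sortable format (for example ISO 8601 in UTC).

```python
import json
import sqlite3


def open_inbox(path=":memory:"):
    """Create (if needed) a table that buffers received events."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        " event_id TEXT PRIMARY KEY,"
        " ts TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return db


def store(db, payload):
    """Record an event in arrival order; an event_id already stored is ignored."""
    db.execute(
        "INSERT OR IGNORE INTO inbox (event_id, ts, body) VALUES (?, ?, ?)",
        (payload["event_id"], payload["timestamp"], json.dumps(payload)),
    )
    db.commit()


def drain_in_order(db):
    """Return buffered events sorted by payload timestamp, not arrival order."""
    rows = db.execute("SELECT body FROM inbox ORDER BY ts, event_id").fetchall()
    return [json.loads(body) for (body,) in rows]
```

If E2 arrives before E1 because E1 was retried, drain_in_order still returns E1 first as long as its payload timestamp is earlier.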

What if my endpoint takes longer than 10 seconds?

We enforce a 10s HTTP timeout per attempt. A response that takes longer is counted as a failure (timeout) and retried. If you have legitimately long-running webhook work (calling external services, batch DB writes), the pattern is:

Accept the POST and enqueue the event_id to your own queue (Kafka, SQS, Celery, whatever).
Return 202 Accepted immediately.
Process asynchronously.

This decouples our delivery SLA from your processing SLA.
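The three steps above can be sketched with the Python standard library. This is a rough illustration under stated assumptions, not a production receiver: the in-process queue.Queue stands in for Kafka, SQS, or Celery, the payload is assumed to be JSON with an event_id field, and accept, WebhookHandler, worker, and serve are hypothetical names.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

events = queue.Queue()  # stand-in for your own durable queue


def accept(raw_body, q=events):
    """Enqueue the event_id and return the HTTP status to send back."""
    try:
        event_id = json.loads(raw_body)["event_id"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed payload; a 4xx like this is not retried
    q.put(event_id)
    return 202  # accepted for asynchronous processing


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = accept(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def worker(q=events, handle=print):
    """Process events off the request path; handle is your slow work."""
    while True:
        event_id = q.get()
        handle(event_id)
        q.task_done()


def serve(port=8080):
    """Start the background worker, then serve webhook POSTs forever."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the receiver. The request handler does nothing but parse and enqueue, so the response time stays well under the 10s timeout regardless of how long handle takes.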
