Retry schedule
A delivery that fails retryably (HTTP 408, 429, 5xx, timeout, network
error) gets retried on this schedule, with ±10% jitter:
| After attempt # | Next retry scheduled |
|---|---|
| 1 | +5 seconds |
| 2 | +30 seconds |
| 3 | +3 minutes |
| 4 | +15 minutes |
| 5 | +1 hour |
| 6 | +6 hours |
| 7 | dead — no further attempts |
Total window: ~7 hours 19 minutes (5 s + 30 s + 3 min + 15 min + 1 h + 6 h), before jitter.
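The schedule above can be sketched as a lookup plus jitter. This is a minimal illustration, not the production scheduler: the names `BASE_DELAYS_S` and `next_retry_delay` are hypothetical, and drawing the jitter uniformly from the ±10% band is an assumption (the distribution is not specified here).

```python
import random
from typing import Optional

# Base delays in seconds, keyed by the attempt number that just failed.
# Values are taken from the schedule table; the names are illustrative.
BASE_DELAYS_S = {1: 5, 2: 30, 3: 180, 4: 900, 5: 3600, 6: 21600}
JITTER = 0.10  # +/-10%


def next_retry_delay(attempt: int, rng: Optional[random.Random] = None) -> Optional[float]:
    """Return seconds until the next retry, or None if the delivery is dead."""
    base = BASE_DELAYS_S.get(attempt)
    if base is None:
        # Attempt 7 (or anything outside the table): no further attempts.
        return None
    # Assumption: jitter is uniform within the +/-10% band.
    uniform = rng.uniform if rng is not None else random.uniform
    return base * uniform(1 - JITTER, 1 + JITTER)
```

For example, `next_retry_delay(3)` returns a value between 162 and 198 seconds, and `next_retry_delay(7)` returns `None`.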
What counts as a retryable failure
| Attempt outcome | Verdict |
|---|---|
| 2xx | Delivered. No retry. |
| 408 Request Timeout | Retry. |
| 429 Too Many Requests | Retry. |
| 5xx | Retry. |
| other 4xx (400, 401, 403, 404, …) | Dead immediately. No retry. |
| Connection timeout / refused / reset | Retry. |
We don’t retry 4xx (other than 408/429) because those indicate
configuration errors on your side: a wrong URL, rejected auth, or a
missing route. Retries won’t fix those; they only fill your logs
and our queue.
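If you want to mirror these verdicts in a test harness for your endpoint, the table can be expressed as a small function. This is a sketch under assumptions: `classify_attempt` is a hypothetical name, and `None` stands in for "no HTTP response arrived". Status classes the table does not mention (1xx, 3xx) are rejected rather than guessed.

```python
from typing import Optional


def classify_attempt(status: Optional[int]) -> str:
    """Map one attempt's outcome to 'delivered', 'retry', or 'dead'.

    status is the HTTP status code, or None when no response arrived
    (connection timeout, refused, or reset).
    """
    if status is None:
        return "retry"
    if 200 <= status < 300:
        return "delivered"
    if status in (408, 429) or 500 <= status < 600:
        return "retry"
    if 400 <= status < 500:
        return "dead"  # other 4xx: dead immediately, no retry
    # 1xx and 3xx are not covered by the table, so this sketch refuses to guess.
    raise ValueError(f"status {status} is not covered by the verdict table")
```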
Auto-disable
After 20 consecutive failures AND no successful delivery in 24h,
we auto-disable the subscription. You’ll:
- See disabled_at set in the dashboard
- See disabled_reason = "auto_disabled_failure_threshold" on the row
- Stop receiving any deliveries until you manually re-enable
This is a circuit breaker — once your endpoint is abandoned or
comprehensively broken, we stop wasting both sides’ resources.
To re-enable: dashboard → Webhooks → the subscription → Re-enable.
Both conditions must hold. Twenty consecutive failures during a brief
bad hour won’t disable a subscription that delivered successfully
earlier in the day. Any success also resets consecutive_failures to 0,
so an old failure streak can’t count toward a disable the next day.
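The two-part rule can be written as a single predicate. This is a sketch of the logic as described, not the actual implementation: `last_success_at` is a hypothetical field, treating a subscription that has never succeeded as having no recent success is an assumption, and so is the strict comparison at exactly 24 hours.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FAILURE_THRESHOLD = 20
SUCCESS_WINDOW = timedelta(hours=24)


def should_auto_disable(
    consecutive_failures: int,
    last_success_at: Optional[datetime],  # hypothetical field; None = never succeeded
    now: datetime,
) -> bool:
    """True only when BOTH conditions from the rule above hold."""
    enough_failures = consecutive_failures >= FAILURE_THRESHOLD
    # Assumption: "never succeeded" counts as "no success in the last 24h".
    no_recent_success = (
        last_success_at is None or now - last_success_at > SUCCESS_WINDOW
    )
    return enough_failures and no_recent_success
```

A subscription with 20 consecutive failures whose last success was one hour ago is left enabled; the same streak with the last success 25 hours ago is disabled.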
Inspecting delivery state
Dashboard → Webhooks → Delivery log shows every attempt with:
- Status (delivered, failed, dead, pending)
- Attempt number (1 to 7)
- HTTP status code returned
- Response snippet (first 500 chars)
- Duration in milliseconds
- Next-retry-at timestamp for pending/failed rows
Filter by subscription + event type to debug a specific integration.
Ordering guarantee
None. Retries + per-tenant queuing mean a delivery for event
E1 scheduled at t=0 may arrive after a delivery for E2
scheduled at t=1, especially when E1 hit a 5xx and E2 didn’t.
If your pipeline needs strict ordering, store each incoming event with
the timestamp from its payload and have your database sort by that
timestamp before processing.
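One way to do that sort is sketched below, with SQLite standing in for your database. The table name `inbox`, the column names, and the helper functions are all hypothetical. The sketch also assumes the payload timestamp is an ISO 8601 UTC string, which sorts correctly as text; adjust if your payloads use a different format.

```python
import sqlite3


def open_inbox(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table that buffers events before processing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        " event_id TEXT PRIMARY KEY,"
        " ts TEXT NOT NULL,"  # payload timestamp, assumed ISO 8601 UTC
        " body TEXT NOT NULL)"
    )
    return conn


def store_event(conn: sqlite3.Connection, event_id: str, timestamp: str, body: str) -> None:
    """Record an event in arrival order; a repeated event_id is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO inbox (event_id, ts, body) VALUES (?, ?, ?)",
        (event_id, timestamp, body),
    )
    conn.commit()


def events_in_order(conn: sqlite3.Connection) -> list:
    """Return buffered events sorted by payload timestamp, not arrival order."""
    return conn.execute(
        "SELECT event_id, ts, body FROM inbox ORDER BY ts, event_id"
    ).fetchall()
```

If E2 is stored before E1 but E1 carries the earlier timestamp, `events_in_order` returns E1 first.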
What if my endpoint takes longer than 10 seconds?
We enforce a 10s HTTP timeout per attempt. A response that takes
longer is counted as a failure (timeout) and retried.
If you have legitimately long-running webhook work (calling external
services, batch DB writes), the pattern is:
- Accept the POST and enqueue the event_id to your own queue (Kafka,
SQS, Celery, whatever).
- Return 202 Accepted immediately.
- Process asynchronously.
This decouples our delivery SLA from your processing SLA.
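A minimal sketch of that accept-then-process pattern, using only the Python standard library. An in-process `queue.Queue` stands in for Kafka, SQS, or Celery here; it loses events if the process dies, so treat it as an illustration only. The handler assumes the POST body is JSON with an `event_id` field, which is an assumption about the payload shape, and `process` is a hypothetical placeholder for your slow work.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a durable queue (Kafka, SQS, Celery, ...).
events: "queue.Queue[str]" = queue.Queue()


def process(event_id: str) -> None:
    """Hypothetical placeholder for slow work (external calls, batch writes)."""
    print(f"processing {event_id}")


def worker() -> None:
    """Drain the queue outside the request path."""
    while True:
        event_id = events.get()
        try:
            process(event_id)
        finally:
            events.task_done()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            # Assumption: the body is a JSON object carrying "event_id".
            event_id = json.loads(self.rfile.read(length))["event_id"]
        except (ValueError, KeyError, TypeError):
            self.send_response(400)  # malformed body
            self.end_headers()
            return
        events.put(event_id)     # enqueue only; no slow work here
        self.send_response(202)  # acknowledge well inside the 10s timeout
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the worker thread and serve until interrupted."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. The request handler does nothing but parse and enqueue, so its response time stays independent of how long `process` takes.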