Retry schedule

A delivery that fails with a retryable error (HTTP 408, 429, 5xx, timeout, network error) is retried on this schedule, with ±10% jitter:
| After attempt # | Next retry scheduled |
| --- | --- |
| 1 | +5 seconds |
| 2 | +30 seconds |
| 3 | +3 minutes |
| 4 | +15 minutes |
| 5 | +1 hour |
| 6 | +6 hours |
| 7 | dead (no further attempts) |
Total window: ~7 hours 19 minutes (the sum of the six delays above, before jitter).
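To make the schedule arithmetic concrete, here is a minimal sketch that encodes the delays listed above and applies ±10% jitter. The names `RETRY_DELAYS_S`, `next_retry_delay`, and `total_window_s` are hypothetical, and drawing the jitter uniformly is an assumption; the schedule only states the ±10% bound, not the distribution.

```python
import random

# Base delays in seconds scheduled after attempts 1..6, copied from the schedule.
RETRY_DELAYS_S = [5, 30, 3 * 60, 15 * 60, 60 * 60, 6 * 60 * 60]
JITTER = 0.10  # ±10%; a uniform draw is an assumption of this sketch


def next_retry_delay(attempt, rng=random):
    """Return the jittered delay in seconds after `attempt`, or None once dead."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > len(RETRY_DELAYS_S):
        return None  # attempt 7 failed: no further attempts
    base = RETRY_DELAYS_S[attempt - 1]
    return base * rng.uniform(1 - JITTER, 1 + JITTER)


def total_window_s():
    """Sum of the base delays, ignoring jitter."""
    return sum(RETRY_DELAYS_S)
```

Summing the base delays gives 26,315 seconds, which is 7 h 18 min 35 s before jitter.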

What counts as a retryable failure

| HTTP status | Verdict |
| --- | --- |
| 2xx | Delivered. No retry. |
| 408 Request Timeout | Retry. |
| 429 Too Many Requests | Retry. |
| 5xx | Retry. |
| other 4xx (400, 401, 403, 404, …) | Dead immediately. No retry. |
| Connection timeout / refused / reset | Retry. |
We don’t retry 4xx (other than 408/429) because those indicate customer config errors — wrong URL, auth rejected on your side, route missing. Retries won’t fix those; they just burn your logs and our queue.
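The rules above can be sketched as a small classifier, which may be handy when simulating deliveries in your own tests. The function name `verdict` and the convention of passing `None` when no HTTP response arrived are assumptions of this sketch. Statuses the rules don't mention (1xx, 3xx) raise an error rather than guess.

```python
from typing import Optional


def verdict(status: Optional[int]) -> str:
    """Classify one delivery attempt as 'delivered', 'retry', or 'dead'.

    `status` is the HTTP status code, or None when no response arrived
    (connection timeout, refused, or reset).
    """
    if status is None:
        return "retry"
    if 200 <= status <= 299:
        return "delivered"
    if status in (408, 429) or 500 <= status <= 599:
        return "retry"
    if 400 <= status <= 499:
        return "dead"  # other 4xx: dead immediately, no retry
    # 1xx and 3xx are not covered by the rules above, so this sketch refuses to guess.
    raise ValueError(f"no documented verdict for status {status}")
```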

Auto-disable

After 20 consecutive failures AND no successful delivery in 24h, we auto-disable the subscription. You’ll:
  • See disabled_at set in the dashboard
  • See disabled_reason = "auto_disabled_failure_threshold" on the row
  • Stop receiving any deliveries until you manually re-enable
This is a circuit breaker — once your endpoint is abandoned or comprehensively broken, we stop wasting both sides’ resources. To re-enable: dashboard → Webhooks → the subscription → Re-enable.
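The conjunction is load-bearing: both conditions must hold. A success resets consecutive_failures to 0, so a subscription that fails 20 times and then succeeds once won’t get auto-disabled. Likewise, 20 consecutive failures alone aren’t enough while a successful delivery still falls within the last 24h. This means a brief bad hour doesn’t trigger a disable the next day.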
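The two-part condition can be written as a single predicate. This is a sketch under stated assumptions, not the service's implementation: `last_success_at` and `should_auto_disable` are hypothetical names, and treating a subscription that has never succeeded as having "no successful delivery in 24h" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FAILURE_THRESHOLD = 20
SUCCESS_WINDOW = timedelta(hours=24)


def should_auto_disable(
    consecutive_failures: int,
    last_success_at: Optional[datetime],
    now: datetime,
) -> bool:
    """True only when BOTH conditions hold."""
    enough_failures = consecutive_failures >= FAILURE_THRESHOLD
    # Assumption: no success on record counts as "no success in 24h".
    no_recent_success = (
        last_success_at is None or now - last_success_at > SUCCESS_WINDOW
    )
    return enough_failures and no_recent_success


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
# 20 failures, but a success one hour ago: stays enabled.
bad_hour = should_auto_disable(20, now - timedelta(hours=1), now)
# 20 failures and the last success was 25 hours ago: disabled.
abandoned = should_auto_disable(20, now - timedelta(hours=25), now)
```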

Inspecting delivery state

Dashboard → Webhooks → Delivery log shows every attempt with:
  • Status (delivered, failed, dead, pending)
  • Attempt number (1 to 7)
  • HTTP status code returned
  • Response snippet (first 500 chars)
  • Duration in milliseconds
  • Next-retry-at timestamp for pending/failed rows
Filter by subscription + event type to debug a specific integration.

Ordering guarantee

None. Retries + per-tenant queuing mean a delivery for event E1 scheduled at t=0 may arrive after a delivery for E2 scheduled at t=1, especially when E1 hit a 5xx and E2 didn’t. If your pipeline needs strict ordering, key on the timestamp in the payload and let your database sort the events before processing.
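As an illustration of sorting on the payload timestamp, the sketch below reorders a batch of received events before processing. The field names `timestamp` and `event_id`, and the ISO 8601 format with an explicit UTC offset, are assumptions; check them against the actual payload.

```python
from datetime import datetime


def in_event_order(events):
    """Return events sorted oldest first by payload timestamp.

    Assumes each event is a dict with an ISO 8601 'timestamp' string and an
    'event_id' string (both hypothetical field names); event_id breaks ties.
    """
    return sorted(
        events,
        key=lambda e: (datetime.fromisoformat(e["timestamp"]), e["event_id"]),
    )


# E2 arrived first because E1 was retried after a 5xx.
arrived = [
    {"event_id": "E2", "timestamp": "2024-01-01T00:00:01+00:00"},
    {"event_id": "E1", "timestamp": "2024-01-01T00:00:00+00:00"},
]
ordered_ids = [e["event_id"] for e in in_event_order(arrived)]
```

In a real pipeline the same ordering would typically come from an `ORDER BY` on the stored timestamp column rather than an in-memory sort.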

What if my endpoint takes longer than 10 seconds?

We enforce a 10s HTTP timeout per attempt. A response that takes longer is counted as a failure (timeout) and retried. If you have legitimately long-running webhook work (calling external services, batch DB writes), the pattern is:
  1. Accept the POST and enqueue the event_id to your own queue (Kafka, SQS, Celery, whatever).
  2. Return 202 Accepted immediately.
  3. Process asynchronously.
This decouples our delivery SLA from your processing SLA.
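The accept-then-enqueue pattern above can be sketched with the Python standard library alone. Everything here is illustrative: the in-process `queue.Queue` stands in for a durable queue such as Kafka or SQS, `event_id` as a top-level JSON field is an assumption about the payload, and `handle_delivery`, `process`, and `serve` are hypothetical names.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-process stand-in for a durable queue (Kafka, SQS, Celery, ...).
work_queue = queue.Queue()


def handle_delivery(body: bytes) -> int:
    """Enqueue the event_id and return the HTTP status to send back."""
    try:
        event_id = json.loads(body)["event_id"]  # assumed payload field
    except (ValueError, KeyError, TypeError):
        return 400  # malformed body; note that a 400 is not retried
    work_queue.put(event_id)
    return 202  # accepted; the slow work happens later


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_delivery(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def process(event_id):
    """Placeholder for the long-running work (external calls, batch writes)."""


def worker():
    while True:
        event_id = work_queue.get()
        try:
            process(event_id)
        finally:
            work_queue.task_done()


def serve(port=8000):
    """Start one background worker and serve webhooks until interrupted."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Because the handler only parses the body and enqueues an ID, the response should return well inside the 10s timeout regardless of how long `process` takes.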