Stripe Webhooks: The Retry Quirks That Will Eventually Burn You
Stripe's webhook docs are pretty good — until they're not. Here's what I've learned after years of building payment systems that actually have to survive the real world.
Stripe's webhook documentation is genuinely among the best in the payments industry. And yet I've watched the retry behavior silently double-fulfill orders, double-credit accounts, and cause support nightmares that took hours to untangle. The docs tell you what happens. They're quieter about what it means for your code when things go sideways at 2am.
What Webhooks Are Actually Solving
Stripe's event model exists because payment outcomes are asynchronous and your server can die. A charge doesn't always succeed instantly. A subscription renewal happens on Stripe's schedule, not yours. A dispute gets filed weeks after the transaction. If you only rely on the synchronous API response from your checkout flow, you'll miss half the lifecycle of a payment.
Webhooks are Stripe's way of pushing those events to you. But "push" implies reliability, and reliability in distributed systems means retry logic — which means your endpoint will sometimes receive the same event more than once, and you need to be ready for it.
This is the part that bites people.
Stripe's Retry Schedule (And What It Means in Practice)
In live mode, Stripe retries failed webhook deliveries with exponential backoff for up to about 3 days, so the gaps between attempts grow from minutes to hours. (Test mode gives up much sooner, after a few retries over a few hours.) Don't build anything that depends on the exact intervals. If your endpoint returns anything other than a 2xx within 30 seconds, Stripe considers it a failure and queues a retry.
That 30-second timeout is the first footgun. If your webhook handler does anything slow — sends an email, calls a third-party API, generates a PDF — you can time out and trigger a retry even though your code actually ran successfully. You just didn't respond fast enough.
The fix everyone knows: respond 200 immediately, process async.
// routes/api.php
use App\Http\Controllers\StripeWebhookController;
use Illuminate\Support\Facades\Route;
Route::post('/webhooks/stripe', [StripeWebhookController::class, 'handle']);
<?php

namespace App\Http\Controllers;

use App\Jobs\ProcessStripeEvent;
use Illuminate\Http\Request;
use Stripe\Exception\SignatureVerificationException;
use Stripe\Webhook;
use UnexpectedValueException;

class StripeWebhookController extends Controller
{
    public function handle(Request $request)
    {
        // Use the raw body: the signature is computed over the exact bytes Stripe sent.
        $payload = $request->getContent();
        // A missing header becomes an empty string, which fails verification cleanly.
        $sigHeader = (string) $request->header('Stripe-Signature');

        try {
            $event = Webhook::constructEvent(
                $payload,
                $sigHeader,
                config('services.stripe.webhook_secret')
            );
        } catch (UnexpectedValueException $e) {
            // Body was not valid JSON
            return response('Invalid payload', 400);
        } catch (SignatureVerificationException $e) {
            return response('Invalid signature', 400);
        }

        // Respond immediately. Do the work in a job.
        ProcessStripeEvent::dispatch($event->id, $event->type, $event->data->object);

        return response('', 200);
    }
}
That's table stakes. But dispatching to a queue doesn't solve the idempotency problem — it just moves it.
Idempotency: More Than Just Checking a Flag
Every Stripe event has an id field like evt_1OqXkZ2eZvKYlo2C8hL3mNpQ. The naive approach is to store processed event IDs and skip duplicates:
<?php

namespace App\Jobs;

use App\Models\StripeEventLog;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class ProcessStripeEvent implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        public string $eventId,
        public string $eventType,
        public object $eventData
    ) {}

    public function handle(): void
    {
        // Claim the event and run the handler in one transaction, so a handler
        // that throws rolls the claim back and a queue retry can pick it up again.
        DB::transaction(function () {
            // Idempotency guard: an atomic insert that leans on the unique
            // constraint on stripe_event_id. Returns the number of rows inserted.
            $inserted = StripeEventLog::insertOrIgnore([
                'stripe_event_id' => $this->eventId,
                'event_type' => $this->eventType,
                'processed_at' => now(),
            ]);

            if (! $inserted) {
                // Already processed, bail out safely
                return;
            }

            match ($this->eventType) {
                'payment_intent.succeeded' => $this->handlePaymentSuccess(),
                'customer.subscription.deleted' => $this->handleSubscriptionCanceled(),
                'invoice.payment_failed' => $this->handleInvoiceFailure(),
                default => null,
            };
        });
    }

    // ... handler methods
}
The key is insertOrIgnore backed by a database-level unique constraint on stripe_event_id, rather than a SELECT-then-INSERT pattern. If two deliveries of the same event reach your queue workers at the same time, which absolutely happens under load, a SELECT-then-INSERT check has a race window: both workers can read "not processed yet" before either one writes. The unique constraint is the real guard; insertOrIgnore is just the clean way to use it from Laravel.
-- migration (PostgreSQL syntax)
CREATE TABLE stripe_event_logs (
    id              BIGSERIAL PRIMARY KEY,
    stripe_event_id VARCHAR(255) NOT NULL,
    event_type      VARCHAR(100) NOT NULL,
    processed_at    TIMESTAMP NOT NULL,
    CONSTRAINT uq_stripe_event_id UNIQUE (stripe_event_id)
);
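To see the guard work outside Laravel, here is a minimal sketch using plain PDO and an in-memory SQLite database (it needs the pdo_sqlite extension). The claimEvent() helper and the evt_test_123 ID are made up for illustration, and the SQL uses SQLite's INSERT OR IGNORE; on PostgreSQL the equivalent would be INSERT ... ON CONFLICT DO NOTHING.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: try to claim an event ID.
 * Returns true only for the first caller; relies on the UNIQUE constraint.
 */
function claimEvent(PDO $pdo, string $eventId, string $eventType): bool
{
    // SQLite syntax; the insert is silently skipped if the ID already exists.
    $stmt = $pdo->prepare(
        'INSERT OR IGNORE INTO stripe_event_logs (stripe_event_id, event_type, processed_at)
         VALUES (:id, :type, :at)'
    );
    $stmt->execute([
        ':id' => $eventId,
        ':type' => $eventType,
        ':at' => gmdate('c'),
    ]);

    // One affected row means we inserted it; zero means someone beat us to it.
    return $stmt->rowCount() === 1;
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE stripe_event_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        stripe_event_id TEXT NOT NULL UNIQUE,
        event_type TEXT NOT NULL,
        processed_at TEXT NOT NULL
    )'
);

// Simulate the original delivery followed by a retry of the same event.
$firstDelivery = claimEvent($pdo, 'evt_test_123', 'payment_intent.succeeded');
$retryDelivery = claimEvent($pdo, 'evt_test_123', 'payment_intent.succeeded');

var_dump($firstDelivery, $retryDelivery);
```

The decision and the write happen in a single statement, so there is no gap between "check" and "insert" for a second worker to slip into.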
The Replay Gotcha Nobody Warns You About
Here's the one that actually burned a client of mine — a healthcare SaaS doing subscription billing. Stripe's dashboard lets you replay events manually. Their ops team, troubleshooting a missed provisioning step, replayed a customer.subscription.created event from three months prior. My idempotency check caught it... on the production database. On their staging environment, which used the same Stripe account in test mode but a different database, the event ID was new. The job ran. It provisioned a test account with a real user ID collision that cascaded into a messy data fix.
Lesson: your idempotency log only protects the database it lives in. If one Stripe account's test-mode webhook endpoints point at multiple environments (local, staging, CI), every environment receives the same events, and each one processes them independently because each has its own event log. That's not a Stripe bug, just a thing to know.
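One cheap extra guard is to compare the event's livemode flag against the environment that received it. This is a sketch, not something from the incident above: shouldProcessEvent() is a hypothetical helper, and it assumes your production environment is named "production". It only keeps test events out of production and live events out of everything else; it does not separate staging from local, which needs a separate endpoint (and signing secret) per environment.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: only process an event whose mode matches this environment.
 * $eventLivemode is the `livemode` boolean carried on every Stripe event.
 * $appEnv is assumed to be 'production' for the live environment.
 */
function shouldProcessEvent(bool $eventLivemode, string $appEnv): bool
{
    $expectsLiveEvents = ($appEnv === 'production');

    return $eventLivemode === $expectsLiveEvents;
}

// A test-mode event arriving at staging is fine...
$stagingTestEvent = shouldProcessEvent(false, 'staging');
// ...but a live event reaching staging, or a test event reaching production, is not.
$stagingLiveEvent = shouldProcessEvent(true, 'staging');
$productionTestEvent = shouldProcessEvent(false, 'production');

var_dump($stagingTestEvent, $stagingLiveEvent, $productionTestEvent);
```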
More practically: Stripe's manual replay sends the original event with the original created timestamp. If your handler does anything time-sensitive — like expiring a free trial that should have been 14 days from created — a replayed event can do the wrong thing even with perfect idempotency, because you're re-running business logic against a stale timestamp. Guard against this explicitly:
// Inside ProcessStripeEvent. Needs `use Illuminate\Support\Facades\Log;` at the top of the file.
private function handleTrialStarted(): void
{
    $subscription = $this->eventData;
    $trialEnd = $subscription->trial_end; // Unix timestamp, or null if there is no trial

    // If trial_end is already in the past, don't re-provision
    if ($trialEnd && $trialEnd < now()->timestamp) {
        Log::warning('Skipping stale trial_start event', [
            'event_id' => $this->eventId,
            'trial_end' => $trialEnd,
        ]);

        return;
    }

    // proceed
}
The Signature Verification Timing Issue
Stripe signs webhooks with a timestamp embedded in the Stripe-Signature header. The Webhook::constructEvent() call validates that the timestamp is within a tolerance window (default: 300 seconds). This is good — it prevents replay attacks.
But if your server's clock drifts, or if a queued event sits in a backlogged queue for more than 5 minutes before you verify the signature (because you passed the raw payload to the job and verify inside the job), you'll start seeing SignatureVerificationException on perfectly legitimate events.
Verify the signature in the controller, before queuing. Pass the verified event data to the job, not the raw payload. I see people do it backwards — verify in the job for cleanliness — and it works great until their queue backs up during a traffic spike.
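To make the failure mode concrete, here is a simplified re-implementation of the check: an HMAC-SHA256 over "{timestamp}.{payload}", compared against the v1 value in the header, plus the tolerance window. The function names, the secret, and the timestamps are all made up for the demo, and this is not the stripe-php source. In real code, keep calling Webhook::constructEvent().

```php
<?php
declare(strict_types=1);

/** Demo only: build a header in the "t=...,v1=..." shape for a given send time. */
function signPayload(string $payload, string $secret, int $timestamp): string
{
    $signature = hash_hmac('sha256', $timestamp . '.' . $payload, $secret);

    return "t={$timestamp},v1={$signature}";
}

/** Simplified verification: signature must match AND the timestamp must be recent. */
function verifySignature(
    string $payload,
    string $header,
    string $secret,
    int $now,
    int $tolerance = 300
): bool {
    $timestamp = null;
    $candidates = [];

    foreach (explode(',', $header) as $part) {
        $pair = explode('=', $part, 2);
        if (count($pair) !== 2) {
            continue;
        }
        if ($pair[0] === 't') {
            $timestamp = (int) $pair[1];
        } elseif ($pair[0] === 'v1') {
            $candidates[] = $pair[1];
        }
    }

    if ($timestamp === null || $candidates === []) {
        return false;
    }

    // The part that bites: a valid signature is still rejected once it is too old.
    if ($now - $timestamp > $tolerance) {
        return false;
    }

    $expected = hash_hmac('sha256', $timestamp . '.' . $payload, $secret);
    foreach ($candidates as $candidate) {
        if (hash_equals($expected, $candidate)) {
            return true;
        }
    }

    return false;
}

$secret = 'whsec_example_only';
$payload = '{"id":"evt_test_123","type":"invoice.paid"}';
$sentAt = 1700000000;
$header = signPayload($payload, $secret, $sentAt);

// Verified in the controller, two seconds after Stripe sent it.
$verifiedOnArrival = verifySignature($payload, $header, $secret, $sentAt + 2);
// Verified inside a job that sat in a backlogged queue for ten minutes.
$verifiedAfterBacklog = verifySignature($payload, $header, $secret, $sentAt + 600);

var_dump($verifiedOnArrival, $verifiedAfterBacklog);
```

Same payload, same header, same secret. The only thing that changed is how long the event waited before anyone checked it.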
When I'd Reach for a Laravel Package
spatie/laravel-stripe-webhooks handles a lot of this boilerplate and adds a nice job-per-event-type routing pattern. I've used it on smaller projects. But I've moved away from it on anything complex because it obscures the retry and idempotency behavior behind abstraction, and when something goes wrong you're debugging two layers instead of one. For a SaaS billing system handling real money, I want to own that code.
laravel/cashier is different — that's for Stripe's subscription primitives specifically. It has its own webhook handling built in. If you're using Cashier, lean on it for subscription events and write your own handler for everything else. Mixing both can lead to double-handling if you're not careful about which event types each handler claims.
When I Would and Wouldn't Use Raw Webhook Handling
Reach for rolling your own when:
- You have complex post-payment workflows (provisioning, fulfillment, notifications across systems)
- You need fine-grained observability on event processing
- You're handling high volume and want control over queue priority and retry behavior
- You're integrating Stripe events with other systems (I've piped invoice.paid into a LIMS billing module for a biotech client; no package was going to handle that routing cleanly)
Reach for a package abstraction when:
- It's a simple SaaS with standard subscription flows
- You're moving fast and the business logic is thin
- You'll actually maintain the package upgrades
The Honest Summary
Stripe webhooks are well-designed. The retry system is doing the right thing. The idempotency problem isn't Stripe's fault — it's an inherent property of at-least-once delivery, and at-least-once is the only kind that's reliable. Your job is to write a handler that doesn't care how many times it's called.
Get a unique constraint in the database. Verify signatures before you queue. Dispatch fast, process async. Watch out for stale timestamps in replays. These aren't exotic edge cases — they're the normal failure modes of a production payment system under any real load.
I've shipped enough billing integrations that I keep a private starter template for this. Every new project that touches Stripe gets it on day one, not after the first incident.