
Laravel Queues: A Supervisor Config That Actually Survives Production

Most Laravel queue setups work fine in dev and fall apart quietly in prod. Here's the Supervisor config I've landed on after too many silent failures.

The default Laravel queue Supervisor config in the docs will get you started. It will also let jobs silently disappear after a Redis restart, leave zombie workers after a deploy, and OOM-kill your workers with no restart if you're not paying close attention. I've been burned by all three. Here's what I actually run.

What Laravel Queues Are Actually Doing

I'm not going to recap the docs. You know jobs get pushed onto a Redis list (or database, or SQS), and queue:work pops them off and runs them. What's worth internalizing is that queue:work is a long-running PHP process. It loads your app once and reuses that bootstrap for every job. That's the performance win. It's also why deploys are a footgun — the worker is still running old code until you restart it, and if you don't handle that deliberately, you get subtle bugs that are nearly impossible to reproduce.

Supervisor is the process manager that keeps those workers alive. It's a Linux daemon that watches your queue:work processes and restarts them when they die. It's not glamorous. It works. I've tried Horizon for smaller projects, and I'll touch on that at the end, but for most of the production systems I manage, plain Supervisor with a well-tuned config is what I trust.

The Config I Actually Deploy

Here's the full Supervisor program config I drop into /etc/supervisor/conf.d/laravel-worker.conf:

[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
; Keep command on one line: Supervisor's INI parser doesn't treat a trailing backslash as a line continuation.
command=php /var/www/html/artisan queue:work redis --sleep=3 --tries=3 --max-time=3600 --max-jobs=500 --timeout=90 --queue=critical,default,low
numprocs=4
autostart=true
autorestart=true
startretries=10
user=www-data
stopwaitsecs=120
stopasgroup=true
killasgroup=true
stdout_logfile=/var/log/supervisor/laravel-worker.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/supervisor/laravel-worker-error.log
stderr_logfile_maxbytes=10MB
stderr_logfile_backups=3
environment=HOME="/var/www",USER="www-data"

Let me walk through the parts that actually matter.

--max-time=3600 and --max-jobs=500

This is the OOM kill fix. Without these, a single worker process runs forever, processing job after job, and PHP's memory creep will eventually get it. Laravel has --max-jobs and --max-time specifically for this. After 500 jobs or one hour — whichever comes first — the worker exits cleanly, and Supervisor restarts it with a fresh memory footprint.

I've seen workers on a healthcare client's system balloon from 60MB to over 800MB over a weekend because nobody had set these. The jobs kept running (mostly), but eventually Redis connections started timing out because the process was swapping. On a process that handles maybe 2,000 jobs a day, --max-jobs=500 on its own would recycle each worker about 4 times a day. With --max-time=3600 also set, the one-hour limit fires first, so in practice each worker comes back with fresh memory roughly every hour. Clean.
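To see which limit actually does the recycling, here's a back-of-envelope sketch. It takes one worker's daily job count plus the two flag values and reports the recycle interval. It assumes a steady job rate, which real traffic won't have, and recycle_interval is just an illustrative name.

```bash
#!/usr/bin/env bash
# Back-of-envelope: how often does one queue:work process recycle?
# Assumes a steady job rate; real traffic is burstier.
#   $1 = jobs per day handled by ONE worker process
#   $2 = --max-jobs value
#   $3 = --max-time value in seconds
recycle_interval() {
  local jobs_per_day=$1 max_jobs=$2 max_time=$3
  # Seconds for this worker to get through max_jobs jobs at that rate
  local secs_to_max_jobs=$(( max_jobs * 86400 / jobs_per_day ))
  # Whichever limit is reached first ends the process
  local interval=$(( secs_to_max_jobs < max_time ? secs_to_max_jobs : max_time ))
  echo "interval=${interval}s recycles_per_day=$(( 86400 / interval ))"
}

recycle_interval 2000 500 3600      # both flags, as in the config above
recycle_interval 2000 500 999999    # max-time effectively off: --max-jobs alone
```

The first call prints interval=3600s recycles_per_day=24 and the second interval=21600s recycles_per_day=4. --max-jobs=500 only becomes the binding limit once a single worker handles more than 500 jobs an hour (12,000 a day).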

--timeout=90 and stopwaitsecs=120

These two need to be coordinated and almost nobody talks about it. --timeout=90 is how long a single job is allowed to run before the worker kills it. stopwaitsecs=120 is how long Supervisor waits for the worker to stop gracefully before it sends SIGKILL.

If stopwaitsecs is less than --timeout, you can get this scenario: Supervisor sends SIGTERM to the worker, the worker is mid-job and waiting for the job to finish or timeout, but Supervisor gets impatient and SIGKILLs it first. The job dies mid-execution, potentially leaving data half-written. Always set stopwaitsecs to at least --timeout + 30.
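If you template these values per environment, a tiny check catches a mismatch before it ships. This is a sketch, and check_timings is a made-up name. It enforces the --timeout + 30 rule above plus one rule from Laravel's queue docs: --timeout should be shorter than the retry_after value in config/queue.php (120 in the Redis config later in this post), otherwise a job can be handed to a second worker while the first is still running it.

```bash
#!/usr/bin/env bash
# Sanity check for timing values that must agree with each other.
#   $1 = --timeout      (queue:work flag)
#   $2 = stopwaitsecs   (Supervisor program setting)
#   $3 = retry_after    (config/queue.php, Redis connection)
# Prints each problem found and returns non-zero if any rule is broken.
check_timings() {
  local timeout=$1 stopwaitsecs=$2 retry_after=$3 status=0
  # Rule from this post: give Supervisor at least --timeout + 30 seconds
  if (( stopwaitsecs < timeout + 30 )); then
    echo "stopwaitsecs=$stopwaitsecs is below --timeout + 30 ($(( timeout + 30 )))"
    status=1
  fi
  # Rule from Laravel's queue docs: --timeout shorter than retry_after,
  # or a still-running job can be retried and processed twice
  if (( timeout >= retry_after )); then
    echo "--timeout=$timeout should be shorter than retry_after=$retry_after"
    status=1
  fi
  return "$status"
}

check_timings 90 120 120 && echo "timings OK"
```

With the values from this config (90, 120, 120), it prints timings OK.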

stopasgroup=true and killasgroup=true

Laravel workers sometimes spawn child processes — certain jobs shell out, or you're using something that forks. Without these two flags, Supervisor sends the stop signal to the parent process only. The children keep running as orphans. With stopasgroup and killasgroup, the signal goes to the entire process group. This bit me on a biotech client's system that was running shell-based file conversion jobs. The parent would die, Supervisor would report it clean, and three convert processes were still grinding away on the server.
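You can watch the difference on any Linux box. This sketch starts a stand-in "worker" (a bash process) in its own process group, has it fork a long-running sleep the way a job that shells out to convert would, and then compares signalling only the parent with signalling the whole group. It needs setsid (util-linux) and /proc. The helper names are illustrative, and none of it is Supervisor's own code. Supervisor does start each program in a process group of its own, which is what stopasgroup and killasgroup rely on.

```bash
#!/usr/bin/env bash
# Linux-only: parent-only signal vs. process-group signal.

# True if PID is running; a zombie (exited, not yet reaped) counts as gone.
alive() {
  local state
  state=$(awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null)
  [ -n "$state" ] && [ "$state" != Z ]
}

# Poll for up to ~2 seconds until PID is gone.
wait_gone() {
  local i
  for (( i = 0; i < 20; i++ )); do
    alive "$1" || return 0
    sleep 0.1
  done
  return 1
}

pidfile=$(mktemp)
# setsid puts the stand-in worker in a new process group; it forks a child.
setsid bash -c 'sleep 300 & echo $! > "$1"; wait' _ "$pidfile" &
for (( i = 0; i < 50; i++ )); do [ -s "$pidfile" ] && break; sleep 0.1; done
child=$(cat "$pidfile")
rm -f "$pidfile"

# Fields 4 and 5 of /proc/PID/stat are the parent PID and the process group ID.
read -r parent pgid < <(awk '{print $4, $5}' "/proc/$child/stat")

kill -TERM "$parent"            # parent only, like stopasgroup=false
wait_gone "$parent"
alive "$child" && orphaned=yes || orphaned=no
echo "child still running after parent-only stop: $orphaned"

kill -TERM -- "-$pgid"          # negative PID = whole group, like stopasgroup=true
wait_gone "$child" && cleaned=yes || cleaned=no
echo "child gone after group stop: $cleaned"
```

The first stop reports the child still running. That's the orphaned convert situation. The group signal is what actually cleans it up.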

--queue=critical,default,low

Queue priority via ordering. Workers check critical first, then default, then low. Simple and effective. I dispatch payment-related jobs to critical, standard business logic to default, and things like report generation or email digests to low.

In Laravel:

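// Dispatch to a specific queue
ProcessPayment::dispatch($order)->onQueue('critical');
GenerateMonthlyReport::dispatch($client)->onQueue('low');

// Or set it on the job class itself. The Queueable trait already declares
// $queue, so redeclaring the property is a fatal error; call onQueue() instead.
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class GenerateMonthlyReport implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(public $client)
    {
        $this->onQueue('low');
    }
    // ...
}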
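The priority rule itself is simple enough to model in a few lines. This is a toy, not Laravel's worker code: every time the worker is free, it takes the next job from the first non-empty queue in the --queue list. The job names and the pop_next helper are made up for illustration, and the script needs bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Toy model of --queue=critical,default,low.
declare -A queue=(
  [critical]=""
  [default]="sync-crm update-profile"
  [low]="monthly-report"
)
order=(critical default low)

# Pops one job into the global $next; returns 1 when every queue is empty.
pop_next() {
  local name
  local -a items
  for name in "${order[@]}"; do
    read -r -a items <<< "${queue[$name]}"
    if (( ${#items[@]} > 0 )); then
      next="$name:${items[0]}"
      queue[$name]="${items[*]:1}"   # drop the job we just took
      return 0
    fi
  done
  return 1
}

processed=()
pop_next && processed+=("$next")     # critical is empty, so default goes first
queue[critical]="process-payment"     # a payment job arrives mid-stream
while pop_next; do processed+=("$next"); done
printf '%s\n' "${processed[@]}"
```

This prints default:sync-crm, critical:process-payment, default:update-profile, low:monthly-report. The order is only consulted between jobs. A report already running off low isn't interrupted when a payment lands on critical; the payment just goes first the next time that worker is free.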

Handling Deploys Without Dropping Jobs

This is the one that causes the most subtle bugs. When you deploy new code, the running workers still have the old code in memory. They'll continue processing jobs — including jobs that were dispatched with the new code — using the old job class definitions. If you renamed a method, changed a constructor signature, or refactored a dependency, you're in trouble.

The right move is to signal workers to restart after every deploy. queue:restart posts a restart signal to the cache, and each worker finishes its current job and then exits. Supervisor brings them back up with fresh code.

I have this in my deploy script (we use Envoy, but this is the core of it):

# After composer install, migrations, config:cache, etc.
php artisan queue:restart

# Optionally give workers a moment to drain current jobs
sleep 5

# Reload Supervisor config if it changed
supervisorctl reread
supervisorctl update
# Quoted so the shell doesn't try to glob-expand the *
supervisorctl restart "laravel-worker:*"

The queue:restart signal is cache-based: it writes a timestamp to your cache store, and workers compare it on each iteration. That means if your cache store (Redis, in a setup like this) gets flushed before the workers check it, the restart signal can be lost and workers won't know to reload. That's fine for most cases, but be aware of it.
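To see exactly when the signal gets lost, here's a toy model of the handshake. In Laravel, queue:restart stores a timestamp under the cache key illuminate:queue:restart, and each worker compares the current value with the one it read at boot. In this sketch, a temp file stands in for the cache, and the helper names are mine. The lost-signal case only hits a worker that booted before any restart signal existed. A worker that booted after an earlier queue:restart sees the value vanish, and that counts as a change, so it quits.

```bash
#!/usr/bin/env bash
# Toy model of the queue:restart check; a temp file plays the cache store.
cache_dir=$(mktemp -d)
cache_get()     { cat "$cache_dir/restart" 2>/dev/null; }  # empty = key missing
queue_restart() { echo "$1" > "$cache_dir/restart"; }      # $1 = timestamp
cache_flush()   { rm -f "$cache_dir/restart"; }

# A worker quits when the cached value differs from what it saw at boot.
worker_should_quit() { [ "$(cache_get)" != "$1" ]; }

# Normal deploy: worker boots with no signal, then queue:restart runs.
booted=$(cache_get)
queue_restart 1700000000
worker_should_quit "$booted" && normal=quits || normal=keeps-running

# Same deploy, but the cache is flushed before the worker's next check.
cache_flush
booted=$(cache_get)
queue_restart 1700000100
cache_flush
worker_should_quit "$booted" && flushed=quits || flushed=keeps-running

echo "normal deploy: worker $normal"
echo "signal flushed: worker $flushed"
rm -rf "$cache_dir"
```

In the second case, the worker keeps running old code, which is exactly the gap a hard supervisorctl restart covers.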

If you want belt-and-suspenders, the supervisorctl restart at the end is a hard restart of all worker processes. I do both. queue:restart is graceful. supervisorctl restart is the backstop.

The Redis Restart Problem

When Redis goes down — planned maintenance, crash, whatever — your queue workers will throw a connection exception on the next job pop and die. Supervisor restarts them, they die again immediately because Redis is still down, and depending on your startretries and backoff, Supervisor may eventually give up and mark the program as FATAL.

When Redis comes back, your workers are dead and not coming back without a manual supervisorctl start laravel-worker:*. This has bitten me at 2am.

The fix has two parts. First, startretries=10 in the Supervisor config gives the worker 10 restart attempts before Supervisor gives up (the default is 3). Supervisor waits a bit longer before each retry, so that buys Redis roughly a minute to recover.
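Here's the arithmetic behind the retry window, as a rough sketch. It assumes every start attempt dies right away (inside startsecs, 1 second by default) and that Supervisor's wait grows by one second per failure (1s, 2s, 3s, ...). That matches how Supervisor's process state machine behaves as far as I know, but check it against your version. giveup_window is an illustrative name.

```bash
#!/usr/bin/env bash
# Rough seconds Supervisor spends waiting between retries before FATAL.
# Assumes immediate failures and a delay that grows by 1s per failure.
giveup_window() {
  local retries=$1 total=0 n
  for (( n = 1; n <= retries; n++ )); do
    total=$(( total + n ))
  done
  echo "$total"
}

echo "startretries=3 (Supervisor default): ~$(giveup_window 3)s"
echo "startretries=10: ~$(giveup_window 10)s"
```

That's about 6 seconds at the default and 55 seconds at 10 retries, plus however long each attempt takes to fail. If a worker survives past startsecs before dying, Supervisor counts it as started, and autorestart relaunches it without using up retries. A Redis outage longer than the window still leaves you with FATAL workers and a manual supervisorctl start.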

Second, set a sensible retry interval and retry limit on the Redis connection in config/database.php:

'redis' => [
    'client' => env('REDIS_CLIENT', 'phpredis'),
    'default' => [
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'port' => env('REDIS_PORT', 6379),
        'database' => env('REDIS_DB', 0),
        'read_timeout' => 60,
        'retry_interval' => 500,  // ms between retries
        'max_retries' => 10,      // phpredis reconnect attempts (read by recent Laravel versions)
    ],
],

And in your config/queue.php, make sure block_for is set if you're using Redis:

'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 120,
    'block_for' => 5,  // seconds to block waiting for jobs
],

block_for keeps the worker from hammering Redis with constant polls. It does a blocking list pop for up to 5 seconds, then loops. Much friendlier than tight polling.

Failed Jobs — Don't Skip This

Make sure you've run the migration for the failed_jobs table and that failed.driver in config/queue.php points at the database (database, or database-uuids, which recent Laravel versions use by default). When a job exhausts its --tries, it lands there instead of silently disappearing.

php artisan queue:failed-table
php artisan migrate

I check the failed jobs table as part of my morning monitoring sweep. For higher-stakes clients, I have a simple Artisan command that runs on a schedule and fires a Slack alert if the count exceeds a threshold:

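// app/Console/Commands/AlertOnFailedJobs.php
namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Log;

class AlertOnFailedJobs extends Command
{
    protected $signature = 'jobs:alert-failed'; // hypothetical name, pick your own

    public function handle(): int
    {
        $count = DB::table('failed_jobs')
            ->where('failed_at', '>=', now()->subHour())
            ->count();

        if ($count > 5) {
            // Assumes a 'slack' channel is defined in config/logging.php
            Log::channel('slack')->critical("{$count} failed jobs in the last hour");
        }
        return self::SUCCESS;
    }
}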

Not fancy. But I'd rather have something dumb that works than a complex monitoring setup that's one more thing to break.

When I'd Reach for Horizon Instead

Laravel Horizon is excellent. The dashboard is genuinely useful, the metrics are good, and the auto-scaling of workers based on queue depth is a real feature for high-throughput systems. I use it on a couple of e-commerce clients where job volume is spiky and visibility matters.

But Horizon adds a dependency (it runs its own master process that manages the workers, and Supervisor still has to keep that process alive), it requires Redis specifically, and the dashboard is one more surface to secure. For most of what I build (healthcare apps, biotech LIMS integrations, real estate portals), queue volume is steady and predictable. Supervisor with this config is simpler, has fewer moving parts, and I understand exactly what it's doing. Simple wins in production at 2am.

If you're pushing millions of jobs a day or need per-queue throughput graphs, use Horizon. If you're running a normal business application with a few thousand jobs a day, this config is all you need.

One More Thing

Log your jobs. Not every line, but at minimum log when a job starts, when it completes, and when it fails, with enough context to reproduce the failure. I've spent too many hours on failed jobs where the only info in failed_jobs.exception was "Connection refused", with no indication of what the job was actually trying to do. Add $this->job->uuid() and the relevant model IDs to your log context. Future you will be grateful.

This config has been running on production systems for a couple of years across a range of clients with zero silent worker deaths I couldn't trace back to something I'd configured wrong. That's about as good as it gets with long-running PHP processes.

Need help shipping something like this? Get in touch.