When PgBouncer Becomes the Bottleneck Instead of the Fix
PgBouncer saved my Postgres clusters for years — until it quietly became the thing killing them. Here's what I missed.
PgBouncer fixed my Postgres connection problems for years without me really understanding why it worked. Then I hit a situation where it stopped working, and I had to actually understand it. That's what this post is about.
The Problem PgBouncer Actually Solves
Postgres is not Node. It doesn't handle thousands of concurrent connections gracefully. Each connection spawns a backend process — real memory, real OS overhead. A busy Laravel app with 20 PHP-FPM workers per box across 4 web servers is already asking for 80 connections before you've written a single query. Add queues, cron jobs, admin tooling, and you're at 150 fast. Postgres starts sweating around there on most VPS configs. At 300+ you're watching pg_stat_activity like a trauma nurse.
PgBouncer sits in front of Postgres and multiplexes client connections down to a small, sane pool of actual server connections. In transaction pooling mode — the most aggressive mode, and the one most people end up on — a client connection only holds a real server connection for the duration of a transaction. The rest of the time it's parked. You can have 500 application connections feeding through 20 actual Postgres backends. That's the pitch, and it's real.
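Wiring it in is deliberately boring. Assuming a stock Laravel setup (your framework's knobs will differ), the app keeps its normal pgsql driver and only the port changes:

# .env sketch: point the app at PgBouncer instead of straight at Postgres
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=6432   # PgBouncer's listen_port; Postgres itself stays on 5432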
I've been running PgBouncer in front of pretty much every production Postgres instance I manage since around 2017. For most workloads it's invisible infrastructure — you set it up, it hums, you forget it's there.
Until you can't.
How It Became My Bottleneck
I had a client in the biotech space — LIMS integration, a lot of async processing, bulk inserts from instrument data. The database was fine. CPU headroom, memory headroom, I/O was fine. But the app was timing out under load. Specifically, queue workers were backing up and eventually dying with SQLSTATE[08006]: connection failed errors.
My first instinct was Postgres. Checked pg_stat_activity, looked at lock waits, ran EXPLAIN ANALYZE on the slow queries. Everything looked reasonable. Then I actually looked at PgBouncer.
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
The output told the story:
 database | user | cl_active | cl_waiting | sv_active | sv_idle | sv_used | maxwait
----------+------+-----------+------------+-----------+---------+---------+---------
 myapp_db | app  |        20 |         87 |        20 |       0 |       0 |      14
87 clients waiting. maxwait at 14 seconds. sv_active pinned at 20 — which was exactly my pool_size setting. I'd set pool_size = 20 years ago on a smaller workload and never revisited it. The pooler had a hard ceiling of 20 server connections and was queuing everything else.
But here's what made it subtle: Postgres had plenty of max_connections headroom. The database wasn't the constraint at all. PgBouncer was artificially throttling me down to 20 concurrent queries on a database that could have handled 80 without blinking.
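A quick way to confirm that kind of headroom, if you want the receipts: compare live backends against the configured ceiling, run directly against Postgres on 5432 rather than through the pooler.

-- rough headroom check; needs a role allowed to see pg_stat_activity
SELECT count(*)                                  AS current_connections,
       current_setting('max_connections')::int   AS max_connections
FROM pg_stat_activity;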
The Config That Bit Me
A default or copy-pasted pgbouncer.ini will have something like:
[databases]
myapp_db = host=127.0.0.1 port=5432 dbname=myapp_db
[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
log_pooler_errors = 1
max_client_conn = 1000 looks generous. And it is — PgBouncer will happily accept 1000 client connections. It'll just queue all of them behind 20 actual server connections. You can have a thousand people in line for 20 cash registers and wonder why checkout is slow.
The fix was obvious in retrospect: raise default_pool_size to something that reflects your actual Postgres capacity. For that cluster I bumped it to 80, kept a 10-connection reserve, and the queue drained immediately.
default_pool_size = 80
reserve_pool_size = 10
reserve_pool_timeout = 2
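One note on applying that change, since it saved me a maintenance window: pool size settings take effect on a reload, so there's no need to restart PgBouncer and drop every client connection.

psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"
# RELOAD needs a user listed in admin_users, unlike the read-only SHOW commands
# or, depending on packaging: systemctl reload pgbouncer (sends SIGHUP)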
But that just revealed the next problem.
Transaction Pooling Mode Will Wreck You If You're Not Careful
Transaction pooling is the mode everyone wants because it gives you the most aggressive multiplexing. It's also the mode most likely to break your application in ways that are hard to debug.
The rule is simple and brutal: anything that lives outside a transaction boundary doesn't work right. That means:
- Session-level SET variables — gone between transactions, and they can leak to whichever client gets that server connection next
- Prepared statements — not usable unless you're on a recent PgBouncer with max_prepared_statements (demonstrated below)
- Advisory locks — dangerous, can leak to another session
- LISTEN/NOTIFY — broken entirely in transaction mode
- Temporary tables — session-scoped, will behave unpredictably
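If you want to see the prepared-statement one for yourself, this is enough to trigger it through a transaction-pooled port on a pooler without prepared-statement tracking (table and statement names here are made up):

-- Outside an explicit transaction, each statement is its own transaction and
-- can be routed to a different server connection. When that happens, the
-- EXECUTE fails with: prepared statement "fetch_user" does not exist
PREPARE fetch_user (int) AS SELECT * FROM users WHERE id = $1;
EXECUTE fetch_user(42);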
Laravel's DB facade generally behaves fine because it wraps things properly. But the moment you start using pg_advisory_lock() for a distributed mutex pattern — which I've done for queue deduplication — you are in trouble. The server connection holding the lock can be handed to a different application session, and your own unlock can land on a connection that never held the lock in the first place.
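Here's the shape of the pattern that burns you. A sketch, not my production code, and $jobKey is a stand-in:

// Under transaction pooling, each statement below is its own transaction and
// can be routed to a different server connection.
DB::select('SELECT pg_advisory_lock(?)', [$jobKey]);   // backend A now holds the lock
// ... do the supposedly exclusive work ...
DB::select('SELECT pg_advisory_unlock(?)', [$jobKey]); // can run on backend B: returns false,
// and backend A keeps the lock until PgBouncer hands that connection to someone else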
I found this one the hard way with a job deduplication system. Advisory locks seemed to be releasing randomly. Took me an embarrassing amount of time to connect it to the pooling mode. Switched that specific connection to session pooling mode for the queue worker pool, problem gone.
PgBouncer lets you configure per-database or per-user pool modes, which is the right answer when you have mixed workloads:
[databases]
myapp_db = host=127.0.0.1 port=5432 dbname=myapp_db pool_mode=transaction
myapp_db_queue = host=127.0.0.1 port=5432 dbname=myapp_db pool_mode=session pool_size=10
Point your queue workers at port 6432 with myapp_db_queue as the database name. They get session semantics. Web workers hit myapp_db and get the aggressive transaction pooling. It's a bit ugly to maintain but it works.
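On the Laravel side that shakes out as two connections in config/database.php aimed at the same pooler. Names and keys here are illustrative, not a drop-in:

'pgsql' => [
    'driver'   => 'pgsql',
    'host'     => '127.0.0.1',
    'port'     => 6432,
    'database' => 'myapp_db',        // transaction-pooled entry in pgbouncer.ini
    // username, password, charset etc. as usual
],
'pgsql_queue' => [
    'driver'   => 'pgsql',
    'host'     => '127.0.0.1',
    'port'     => 6432,
    'database' => 'myapp_db_queue',  // session-pooled alias of the same database
    // ...
],

Anything that genuinely needs session semantics then goes through DB::connection('pgsql_queue'), and the rest of the app stays on the default connection.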
Monitoring This Thing Properly
PgBouncer's SHOW commands are your instrumentation. I have a simple health check that I run via a scheduled Laravel command that pages me if any of these thresholds trip:
public function handle(): void
{
    // the 'pgbouncer' user has to be listed in stats_users (or admin_users)
    // in pgbouncer.ini for the SHOW commands to work
    $conn = new PDO(
        'pgsql:host=127.0.0.1;port=6432;dbname=pgbouncer',
        'pgbouncer',
        config('database.pgbouncer_password')
    );
    $pools = $conn->query('SHOW POOLS')->fetchAll(PDO::FETCH_ASSOC);
    foreach ($pools as $pool) {
        $waiting = (int) $pool['cl_waiting'];
        $maxwait = (int) $pool['maxwait'];
        $database = $pool['database'];
        if ($waiting > 20) {
            Log::warning("PgBouncer: {$database} has {$waiting} waiting clients");
        }
        if ($maxwait > 5) {
            Log::warning("PgBouncer: {$database} maxwait is {$maxwait}s");
        }
    }
    $stats = $conn->query('SHOW STATS')->fetchAll(PDO::FETCH_ASSOC);
    foreach ($stats as $stat) {
        // avg_wait_time is in microseconds
        $avgWaitMs = ($stat['avg_wait_time'] ?? 0) / 1000;
        if ($avgWaitMs > 100) {
            Log::warning("PgBouncer: {$stat['database']} avg wait {$avgWaitMs}ms");
        }
    }
}
The key metrics are cl_waiting (clients queued up for a server connection), maxwait (how long the longest-waiting client has been sitting there), and avg_wait_time from SHOW STATS. If cl_waiting is consistently above zero during normal load, your pool_size is too small for the workload.
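For completeness, the scheduling side is one line in the console kernel. The pgbouncer:check signature is whatever you registered for your command; mine is nothing special:

// app/Console/Kernel.php
protected function schedule(Schedule $schedule): void
{
    // poll the pooler every few minutes; onOneServer() stops every web node doing it
    $schedule->command('pgbouncer:check')->everyFiveMinutes()->onOneServer();
}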
When to Actually Reach for PgBouncer
PgBouncer is the right answer when you have many short-lived client connections overwhelming Postgres — which is the default state of any PHP application that doesn't have persistent connections or a connection pool of its own. PHP-FPM spawning new connections on every request is the classic case. PgBouncer is essentially free to run alongside Postgres on the same host and adds maybe 1-2ms of latency in the worst case.
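Getting it running really is that cheap. On Debian-family boxes (the only ones I'll vouch for here) it's a package install, and the config lands in /etc/pgbouncer/:

sudo apt-get install pgbouncer
# ships a systemd unit and /etc/pgbouncer/pgbouncer.ini to edit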
I'd reach for it any time:
- You're running PHP-FPM and hitting max_connections errors in Postgres logs
- You have more than ~50 concurrent application processes across your fleet
- You're on a managed Postgres instance (RDS, Cloud SQL) where you can't tune max_connections yourself and the limits are painful
I'd be more careful about it (or at least very deliberate about pool mode) when:
- Your app uses advisory locks, LISTEN/NOTIFY, or explicit temp tables
- You're running long-lived persistent processes that expect session semantics — some queue workers, websocket servers
- You're already running something like pgpool-II for HA and you're stacking poolers (please don't stack poolers)
One alternative worth knowing: PgBouncer is a single-threaded process. For very high connection volumes — thousands of busy clients — that single thread becomes the bottleneck. At that scale, pgpool-II sidesteps the limit by spreading clients across a pool of forked worker processes, or you look at Odyssey from Yandex, which is multi-threaded and designed for massive scale. I haven't needed Odyssey in production yet, but I've read enough of the source to trust it.
The Real Lesson
PgBouncer is not a set-and-forget tool, and default_pool_size = 20 is not a reasonable default for a production app under real load. The right pool size is roughly: how many concurrent queries can your Postgres actually handle well, minus some headroom for admin access and replication. Check your CPU core count, your I/O throughput, your query latency under load — then set a number that reflects reality.
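If you want that number grounded in your own workload rather than a guess, sample Postgres a few times at the worst part of the day and see how many backends are genuinely busy. Something like this (Postgres 10+ for the backend_type column):

-- how many backends are actually working vs. parked
SELECT count(*) FILTER (WHERE state = 'active')              AS active,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_tx,
       count(*)                                              AS total
FROM pg_stat_activity
WHERE backend_type = 'client backend';

If active hovers around 40-60 at peak, a default_pool_size of 80 with a small reserve is defensible; if it never clears 15, it isn't.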
The monitoring commands are right there in the protocol. SHOW POOLS takes half a second to run. I should have been checking it all along instead of assuming the tool was working just because the connection errors had stopped.
Treat your pooler like the infrastructure it is. It has config, it has limits, it has failure modes. Don't install it and walk away.
Need help shipping something like this? Get in touch.