Postgres VACUUM: The Maintenance Reality Tutorials Skip
Every Postgres tutorial shows you how to write a query. Almost none of them explain what happens to your table after you UPDATE a million rows — and why, six months later, a simple SELECT COUNT(*) takes four seconds on a table that should fit in RAM.
That's bloat. And VACUUM is the thing standing between you and it.
What Postgres Actually Does When You Update a Row
This is the part that surprises people who come from MySQL or SQL Server. When you UPDATE a row in Postgres, it doesn't modify the row in place. It marks the old row version as dead and writes a new version elsewhere (in the same page if there's room, otherwise in another one). Same deal with DELETE — the old row just gets flagged; it doesn't go anywhere immediately.
This is MVCC (Multi-Version Concurrency Control) doing its job. It's why Postgres can give you a consistent read snapshot without table locks. It's genuinely elegant. But it means your tables accumulate dead tuples — ghost rows that take up real space and still get read by every scan that passes over them, whether you want that or not.
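You can watch this happen on a scratch table. Here's a minimal sketch, using a hypothetical mvcc_demo table; the stats views update asynchronously, so give the collector a moment before checking:

-- Create a scratch table and update every row once
CREATE TABLE mvcc_demo (id int PRIMARY KEY, val int DEFAULT 0);
INSERT INTO mvcc_demo (id) SELECT g FROM generate_series(1, 100000) g;
UPDATE mvcc_demo SET val = 1;

-- Every updated row left a dead version behind
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'mvcc_demo';
-- Expect n_dead_tup near 100000 until vacuum runs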
Autovacuum is supposed to clean this up automatically. And it does — most of the time, on tables you haven't misconfigured, under load profiles the defaults were designed for. Those defaults cover a lot of workloads, but not all of them.
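You can check what your instance is actually running with straight from pg_settings:

-- Current autovacuum configuration, defaults and overrides alike
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'autovacuum%';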
When Autovacuum Falls Behind
I had a client — a Seattle e-commerce shop, mid-size catalog, heavy promotional cycles — where their order_items table ballooned to about 40 GB over two years. The actual live data was maybe 8 GB. The rest was dead tuples from status updates, fulfillment writes, and a repricing job that touched every row twice a night.
Autovacuum was running. It just couldn't keep up. By default, autovacuum kicks in when dead tuples exceed autovacuum_vacuum_scale_factor (20% of the table) plus autovacuum_vacuum_threshold (50 rows). On a 10-million-row table, that's 2,000,050 dead tuples before vacuum even wakes up. And when it does run, it's intentionally throttled via autovacuum_vacuum_cost_delay so it doesn't hammer I/O. On a busy table with aggressive writes, it's playing catch-up constantly.
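If you want to see the trigger point for your own tables, here's a sketch of that formula (threshold + scale_factor * row count) under the default settings; it ignores any per-table overrides:

-- reltuples is the planner's row estimate, so treat these as approximations
SELECT
  relname,
  reltuples::bigint AS approx_rows,
  (50 + 0.2 * reltuples)::bigint AS dead_tuples_to_trigger
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
  AND reltuples > 0
ORDER BY dead_tuples_to_trigger DESC
LIMIT 10;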
You can see exactly how bad it is:
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
If you're seeing dead_pct above 10-15% on a busy table, pay attention. If you're seeing it above 30%, something is wrong and query performance is already suffering.
What Bloat Actually Costs You
Dead tuples waste disk space, obviously. But the more insidious cost is query performance. The planner's row estimates get stale (which is why ANALYZE matters separately from VACUUM). Sequential scans touch more pages than necessary. Index scans have to skip dead entries. Your working set no longer fits in shared_buffers the way it should.
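When stale estimates are the immediate problem, ANALYZE alone is cheap and fast; it refreshes statistics without doing any of VACUUM's cleanup work. A quick sketch, with a hypothetical filter value:

-- Refresh planner statistics only; dead tuples stay where they are
ANALYZE order_items;

-- The rows= estimate in this plan comes from those statistics
EXPLAIN SELECT * FROM order_items WHERE status = 'pending';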
And there's a third thing people don't talk about: transaction ID wraparound. Postgres uses a 32-bit transaction ID counter. When it gets close to wrapping around (about 2 billion transactions), Postgres will start warning you, and eventually it will refuse to accept new writes entirely — a forced emergency vacuum situation. I've never personally hit full wraparound but I've seen the warning emails come in at 3 AM and it's not a fun morning. You can check your exposure:
SELECT
datname,
age(datfrozenxid) AS xid_age,
2147483647 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY xid_age DESC;
If xid_age is creeping past 1.5 billion, start paying attention. Aggressive autovacuuming (or a manual VACUUM FREEZE) is your fix.
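The database-wide age is driven by whichever table is oldest, so it helps to find the stragglers. A sketch over pg_class, plus the manual freeze on the worst offender:

-- Tables holding back the database-wide frozen XID
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;

-- Then freeze the worst one directly
VACUUM (FREEZE, VERBOSE) order_items;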
Tuning Autovacuum Per-Table
The global autovacuum settings are a blunt instrument. What you actually want is per-table tuning for your hot tables. Postgres lets you do this with storage parameters:
ALTER TABLE order_items SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 100,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2
);
That tells Postgres: vacuum order_items when dead tuples exceed 1% of the table (instead of 20%) plus a base threshold of 100 rows, and reduce the cost delay so it runs faster. On the 10-million-row table from earlier, the trigger drops from 2,000,050 dead tuples to about 100,100. On a write-heavy table this makes an enormous difference. The repricing job I mentioned earlier stopped being a problem: after tuning these settings, bloat on that table stabilized under 5%.
You're not overriding global autovacuum. You're just telling it to be more aggressive on the tables that need it.
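To confirm what overrides a table already carries, pg_class stores them in reloptions:

SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;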
Running VACUUM Manually When You Need To
Sometimes autovacuum isn't enough and you need to intervene. A few scenarios where I'll run it manually:
- After a large bulk delete or update (autovacuum won't see it fast enough)
- Before a known high-traffic event
- After a migration that touched millions of rows
Plain VACUUM reclaims dead tuples and makes the space available for reuse within the same table. It does not return space to the OS. For that you need VACUUM FULL, which rewrites the table entirely and locks it for the duration. Don't run VACUUM FULL on a production table during business hours. I've seen people do this on a 20 GB table and take the site down for 45 minutes.
-- Safe to run live. Doesn't lock. Doesn't shrink the file.
VACUUM (VERBOSE, ANALYZE) order_items;
-- Reclaims disk space to OS. Locks the table. Do this in a maintenance window.
VACUUM FULL order_items;
If you need to reclaim disk without a full lock, pg_repack is the tool. It rebuilds the table online with minimal locking. I've used it on tables in the 50 GB range without downtime. Install it, read the docs, don't wing it.
# Install via apt or whatever your distro uses, then:
pg_repack -d mydb -t order_items
The Visibility Map and Index-Only Scans
One more thing that doesn't get enough airtime: VACUUM maintains the visibility map, which tracks which pages contain only live tuples. The query planner uses this to enable index-only scans — where it can satisfy a query entirely from an index without touching the heap at all. If your visibility map is stale because VACUUM hasn't run recently, you lose those index-only scans and fall back to heap fetches.
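You can check coverage directly: relallvisible in pg_class counts the pages currently marked all-visible in the map. It's maintained by VACUUM and ANALYZE, so treat it as an estimate:

SELECT
  relname,
  relpages,
  relallvisible,
  ROUND(relallvisible::numeric / NULLIF(relpages, 0) * 100, 2) AS pct_all_visible
FROM pg_class
WHERE relname = 'order_items';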
And pg_statio_user_tables shows the knock-on effect: how much of your read traffic is going to the heap versus the indexes:
SELECT
relname,
heap_blks_read,
heap_blks_hit,
idx_blks_read,
idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'order_items';
And to see whether a specific query is using an index-only scan or falling back to heap:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status FROM order_items WHERE customer_id = 12345;
Look for Index Only Scan vs Index Scan in the plan, and on an index-only scan check the Heap Fetches count: a stale visibility map shows up as a high Heap Fetches number even when the planner chose the index-only path. The difference in I/O on a large table is significant.
What I'd Actually Monitor in Production
In every Postgres app I run on managed hosting, I have a cron job that dumps this to a monitoring table daily:
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum::date,
last_autoanalyze::date
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
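The monitoring-table part is just an INSERT ... SELECT wrapper around that query. A minimal sketch, assuming a hypothetical bloat_snapshots table:

CREATE TABLE IF NOT EXISTS bloat_snapshots (
  captured_on date NOT NULL DEFAULT current_date,
  relname name NOT NULL,
  n_dead_tup bigint,
  dead_pct numeric
);

INSERT INTO bloat_snapshots (relname, n_dead_tup, dead_pct)
SELECT
  relname,
  n_dead_tup,
  ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2)
FROM pg_stat_user_tables
WHERE schemaname = 'public';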
If dead_pct crosses 15% on any significant table, I want to know. I've wired this into a simple Laravel artisan command that posts to Slack if anything looks off. Not glamorous, but it's caught problems before users noticed them.
When to Dig Into This vs. Just Paying for RDS
If you're on RDS or Aurora and you're happy letting Amazon manage this, fine — RDS does a reasonable job with autovacuum defaults and the Performance Insights tooling will at least surface bloat problems before they become catastrophic. But you still need to understand what you're looking at when it flags something. The knobs are the same knobs.
If you're self-hosting Postgres (which I do for several clients on bare metal or DigitalOcean), this is on you. Nobody is watching your pg_stat_user_tables but you.
When I'd Reach for a Different Tool
Honestly, for most OLTP workloads Postgres handles this fine once you've tuned the per-table autovacuum settings and added basic monitoring. I'm not reaching for a different database because of bloat.
Where I'd reconsider the data model before tuning VACUUM: high-churn append-only patterns where you're updating status columns millions of times a day. In those cases I've had luck partitioning by a time column and dropping old partitions instead of deleting rows. Dropping a partition is instantaneous and doesn't leave a single dead tuple behind.
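Here's a minimal sketch of that pattern, using a hypothetical events table partitioned by month:

CREATE TABLE events (
  created_at timestamptz NOT NULL,
  payload jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Retiring a month is a metadata operation: instant, no dead tuples
DROP TABLE events_2025_01;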
Also: if you have a queue-like table in Postgres — rows inserted and deleted constantly — that's a bloat machine. TimescaleDB, or just using Redis or SQS for the queue and Postgres for the durable stuff, is worth considering.
Bloat isn't a Postgres flaw; it's a tradeoff MVCC makes. Understand the tradeoff, tune accordingly, and monitor it. The teams that get burned are the ones who treat the database as a black box until something falls over.
Need help shipping something like this? Get in touch.