
JSONB Everywhere Is a Smell

JSONB is one of Postgres's best features. It's also one of the easiest to abuse. Here's where I draw the line.

JSONB saved me on a project once, then nearly wrecked a different one. The difference wasn't the feature — it was whether I was using it to solve a real modeling problem or just to avoid making a decision.

I see this pattern constantly in codebases I inherit: a metadata column of type jsonb on half the tables, a settings blob on the other half, and somewhere in the middle, a data column that contains what should have been three normalized tables. It starts as pragmatism and ends as archaeology.

What JSONB Is Actually Good For

Let me be clear: JSONB is genuinely excellent. It's not a toy. I've used it in production for years and I'd miss it badly if it disappeared.

The cases where I reach for it without hesitation:

  • External data you don't own. Webhook payloads from Stripe, HL7 FHIR resources from an EHR, MLS feed extensions — stuff where the upstream schema changes and you're not in control. I worked on a healthcare integration last year where the FHIR extension array on patient records could contain anything the source system felt like sending. Storing that as JSONB and querying the parts I actually care about was the right call.
  • Sparse, user-configurable attributes. E-commerce products are the classic example. A t-shirt has size and color. A generator has voltage, phase, and fuel type. A book has ISBN and edition. You can't normalize that without either a horrifying EAV table or 40 nullable columns. (There's a sketch of this pattern right after the list.)
  • Audit logs and event envelopes. When I'm storing an event record and the payload varies by event type, JSONB on a payload column is clean and appropriate.
  • Feature flags and per-tenant config. Structured enough that you can read it in code, sparse enough that a schema column per flag would be absurd.

Those are legitimate uses. They have one thing in common: the data is genuinely variable, either in shape or in origin.
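
For the sparse-attributes case, here's a minimal sketch of the pattern. The names are mine, not from any real project; shared fields stay as columns, and per-category attributes live in the blob:

CREATE TABLE products (
  id          bigserial PRIMARY KEY,
  name        text NOT NULL,
  price_cents integer NOT NULL,
  attributes  jsonb NOT NULL DEFAULT '{}'
);

-- GIN makes containment queries on the blob fast
CREATE INDEX products_attributes_idx ON products USING gin (attributes);

-- "All 240V three-phase generators" without a voltage column on every t-shirt
SELECT id, name
FROM products
WHERE attributes @> '{"voltage": "240V", "phase": 3}';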

Where It Goes Wrong

The smell I'm talking about is different. It looks like this:

CREATE TABLE orders (
  id          bigserial PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers(id),
  created_at  timestamptz NOT NULL DEFAULT now(),
  data        jsonb NOT NULL
);

And inside data:

{
  "status": "shipped",
  "shipping_address": {
    "line1": "123 Main St",
    "city": "Seattle",
    "state": "WA",
    "zip": "98101"
  },
  "line_items": [
    { "sku": "WIDGET-01", "qty": 2, "unit_price": 1499 }
  ],
  "total_cents": 2998
}

I've walked into this exact structure more than once. The developer who built it wasn't lazy — they were moving fast and JSONB felt flexible. But status is queried in every order list view. total_cents is summed in every revenue report. line_items gets joined against a product table in fulfillment logic. This isn't variable data. This is a schema that someone decided not to write.

Now you get to enjoy:

-- This works, technically
SELECT
  id,
  (data->>'total_cents')::integer AS total_cents,
  data->>'status' AS status
FROM orders
WHERE data->>'status' = 'shipped'
  AND created_at > now() - interval '30 days';

That data->>'status' predicate can't use a normal index without a generated column or an expression index. You lose NOT NULL constraints on fields that absolutely should be required. Foreign keys become impossible — line_items[*].sku can't reference your products table. And six months from now, a new developer typos a key (data->>'total_cent', say) and, instead of the error a misspelled column name would raise, every row silently comes back null.
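
One mitigation short of restructuring: Postgres can index an expression directly, so at least the status predicate stops forcing a full scan. A minimal sketch:

-- An expression index matching the exact expression the query uses
CREATE INDEX orders_data_status_idx ON orders ((data->>'status'));

The doubled parentheses are required; Postgres wants them around any index expression that isn't a bare column.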

A Concrete Gotcha That Cost Me Real Time

I took over maintenance on a print management platform a while back. Jobs were stored with a spec JSONB column containing things like paper stock, finish, page count, bleed dimensions, turnaround tier. All queryable business data. The previous dev had put a GIN index on spec for containment queries. Fine.

The problem showed up when the client needed a report: average turnaround by paper stock for the last 12 months, broken down by month. In a normalized schema, that's a straightforward GROUP BY. Instead I was writing:

SELECT
  date_trunc('month', created_at) AS month,
  spec->>'paper_stock' AS paper_stock,
  AVG((spec->>'turnaround_days')::integer) AS avg_turnaround
FROM jobs
WHERE created_at > now() - interval '12 months'
  AND spec->>'paper_stock' IS NOT NULL
GROUP BY 1, 2
ORDER BY 1, 2;

The query ran, but the GIN index didn't help this access pattern at all. The planner fell back to a seq scan over 800k rows, casting JSON text to integer for every one. EXPLAIN ANALYZE showed 4.2 seconds on a table that should have answered in under 100ms.
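
For contrast, here's the shape of query that GIN index was actually built for (the stock value is made up). The default jsonb_ops operator class serves containment and key-existence, not extract-then-compare:

-- Served by GIN (jsonb_ops): containment
SELECT count(*) FROM jobs WHERE spec @> '{"paper_stock": "matte-100lb"}';

-- Not served: extract the text, then compare it
SELECT count(*) FROM jobs WHERE spec->>'paper_stock' = 'matte-100lb';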

The fix was a generated column:

-- Generated columns need Postgres 12 or later
ALTER TABLE jobs
  ADD COLUMN paper_stock text GENERATED ALWAYS AS (spec->>'paper_stock') STORED,
  ADD COLUMN turnaround_days integer GENERATED ALWAYS AS ((spec->>'turnaround_days')::integer) STORED;

CREATE INDEX ON jobs (paper_stock);
CREATE INDEX ON jobs (turnaround_days);

Query time dropped to 40ms. But now I had a hybrid: some of spec lived in real columns, some didn't. It was fine as a fix, but it's a flag that the original design was wrong. Those fields should have been columns from day one.
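
The from-day-one version, if you can afford a real migration instead of the generated-column patch, is plain columns. A sketch, assuming only these two fields matter:

ALTER TABLE jobs
  ADD COLUMN paper_stock text,
  ADD COLUMN turnaround_days integer;

-- Backfill from the blob (batch this on a big table)
UPDATE jobs SET
  paper_stock     = spec->>'paper_stock',
  turnaround_days = (spec->>'turnaround_days')::integer;

-- Drop the promoted keys so there's one source of truth
UPDATE jobs SET spec = spec - 'paper_stock' - 'turnaround_days';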

The Test I Actually Use

Before I put something in JSONB, I ask three questions:

  1. Will I filter rows by this value in a WHERE clause? If yes, it wants to be a column.
  2. Will I aggregate this value (SUM, AVG, COUNT)? If yes, it wants to be a column.
  3. Does this value reference a row in another table, i.e. does it want a foreign key? If yes, it absolutely needs to be a column.

If the answer to all three is no — the field is informational, read-only, variable in shape, or sourced externally — JSONB is reasonable.

The other thing I ask: could this be its own table? The line_items example above isn't a JSONB problem or even a schema problem, really. It's a normalization problem. Line items are entities. They have their own lifecycle. They belong in an order_line_items table with a foreign key back to orders.

When I'd Actually Reach for JSONB in a New Project

Here's how I'd structure an order system if I were starting fresh today:

CREATE TABLE orders (
  id              bigserial PRIMARY KEY,
  customer_id     bigint NOT NULL REFERENCES customers(id),
  status          text NOT NULL DEFAULT 'pending',
  total_cents     integer NOT NULL DEFAULT 0,
  shipping_address jsonb,          -- external postal address, no FK needed
  created_at      timestamptz NOT NULL DEFAULT now(),
  updated_at      timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE order_line_items (
  id          bigserial PRIMARY KEY,
  order_id    bigint NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id  bigint NOT NULL REFERENCES products(id),
  quantity    integer NOT NULL CHECK (quantity > 0),
  unit_price_cents integer NOT NULL
);

status and total_cents are real columns — I query on them constantly. shipping_address is JSONB — it's a snapshot of a postal address at order time, it doesn't need to be normalized, and I'm never grouping or joining on zip code. Line items are their own table.
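
The payoff shows up in the reporting queries, which are now plain SQL over plain columns. A sketch, with an index I'd add for the list views and reports:

CREATE INDEX orders_status_created_idx ON orders (status, created_at);

-- Monthly shipped revenue: no operators, no casts, no silent nulls
SELECT date_trunc('month', created_at) AS month,
       SUM(total_cents) AS revenue_cents
FROM orders
WHERE status = 'shipped'
GROUP BY 1
ORDER BY 1;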

This isn't dogma. I'll use JSONB more aggressively if the domain is genuinely dynamic. But the default should be: columns first, JSONB when the data earns it.

When I'd Skip JSONB Entirely

If the application is primarily reporting-heavy — finance, operations dashboards, compliance — I get much more conservative. Analysts write SQL against these databases. Business intelligence tools connect to them. Every JSONB field is friction for anyone who isn't a developer. The flexibility cost is real.

Also: if you're on a team with mixed Postgres experience, JSONB columns become a trap. The difference between -> and ->> (one returns jsonb, the other text) isn't obvious. Forgetting the cast on a numeric field causes bugs that look like data problems. It's a leaky abstraction that requires everyone touching the codebase to understand it.
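
Here's the classic version of that bug, with made-up numbers. ->> returns text, so without a cast the comparison is lexicographic, and the query quietly returns wrong rows instead of erroring:

-- Looks numeric, compares as text: '999' > '1000' is true lexicographically
SELECT id FROM orders WHERE data->>'total_cents' > '1000';

-- What was meant
SELECT id FROM orders WHERE (data->>'total_cents')::integer > 1000;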

What I Tell Clients

When I'm scoping a new project and someone asks about a flexible schema, I explain it this way: JSONB is an escape hatch, not a foundation. If you're building a house and you put escape hatches in the load-bearing walls, you'll be fine until you're not.

Use it for the things it's genuinely good at — external data, sparse attributes, event payloads, config blobs. Don't use it to avoid thinking about your schema. That thinking is the work. Skipping it just means doing it later, in production, with data you can't easily migrate, while something is on fire.

Postgres gives you JSONB because sometimes the world is messy. It doesn't give it to you so you can make your own data messy.
