log in
consulting hosting industries the daily tools about contact

Why Your DBA Hates Your PostGraphile-Generated Queries

PostGraphile is genuinely impressive until you look at what it's actually sending to Postgres. Here's what surprised me.

PostGraphile is one of those tools that feels like a cheat code the first time you point it at a Postgres schema and get a fully typed GraphQL API in under a minute. Then you go to production and your DBA — or in my case, me wearing the DBA hat — starts looking at pg_stat_statements and has a very bad afternoon.

I want to be clear: I don't think PostGraphile is bad software. It's actually impressively engineered. But there's a specific category of problem it creates that I didn't fully appreciate until I was deep in it for a healthcare client, and I want to save you the same discovery.

What PostGraphile Actually Does

The pitch is simple: introspect your Postgres schema, generate a GraphQL API that mirrors it, handle auth via row-level security, and let your frontend developers query whatever they need without writing a resolver. It leans heavily on Postgres features — RLS, functions, smart comments — to the point where your database schema is your API contract.

For rapid prototyping and internal tooling, it's legitimately great. For a client-facing product with real load, you need to understand what's happening underneath.

The N+1 Problem, PostgreSQL Edition

Every GraphQL developer has heard of the N+1 problem. PostGraphile's answer to it is the @graphile/pg-aggregates plugin and, more importantly, its built-in look-ahead and join inflation. It uses a technique where it tries to collapse nested queries into a single SQL statement using lateral joins and JSON aggregation.

That sounds smart. And it is smart. The problem is what that single SQL statement looks like.

Here's a simple GraphQL query a frontend developer might write:

query {
  patients(first: 20, orderBy: LAST_NAME_ASC) {
    nodes {
      id
      lastName
      firstName
      appointments(last: 5, orderBy: SCHEDULED_AT_DESC) {
        nodes {
          scheduledAt
          provider {
            fullName
            npi
          }
        }
      }
    }
  }
}

Reasonable query. Three levels deep. PostGraphile collapses this into something roughly like:

select
  to_json(
    json_build_object(
      'data', json_build_object(
        'patients', json_build_object(
          'nodes', coalesce(
            (
              select json_agg(__local_0__)
              from (
                select
                  __local_1__."id" as "@id",
                  __local_1__."last_name" as "@lastName",
                  __local_1__."first_name" as "@firstName",
                  (
                    select coalesce(json_agg(__local_2__ order by __local_2__."@scheduledAt" desc), '[]'::json)
                    from (
                      select
                        __local_3__."scheduled_at" as "@scheduledAt",
                        (
                          select json_build_object(
                            'fullName', __local_4__."full_name",
                            'npi', __local_4__."npi"
                          )
                          from providers __local_4__
                          where __local_4__.id = __local_3__.provider_id
                          limit 1
                        ) as "@provider"
                      from appointments __local_3__
                      where __local_3__.patient_id = __local_1__.id
                      order by __local_3__.scheduled_at desc
                      limit 5
                    ) __local_2__
                  ) as "@appointments"
                from patients __local_1__
                order by __local_1__.last_name asc
                limit 20
              ) __local_0__
            ), '[]'::json
          )
        )
      )
    )
  )

I've simplified this. The actual output is gnarlier. The key issue: that correlated subquery on providers runs once per appointment row, inside a subquery that runs once per patient row. PostGraphile didn't actually solve N+1 here — it just moved it inside one round-trip to the database. Postgres is now doing the N+1 work internally.

What EXPLAIN ANALYZE Actually Shows

When I ran EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) on a query like this against a production-sized dataset — roughly 80k patients, 1.2M appointments — the plan was not pretty.

  • Postgres was doing a nested loop with a sequential scan on the providers table for each appointment row considered.
  • No index was being used on providers.id inside the correlated subquery context, because the planner decided a seq scan was cheaper given the statistics at the time.
  • Total planning + execution: 4.2 seconds.

The fix was a combination of:

  1. Adding a missing index on appointments(patient_id, scheduled_at DESC)
  2. Rewriting the problematic query as a Postgres function that PostGraphile could expose — more on that in a second.
  3. Setting default_statistics_target higher on the relevant columns.

The index alone dropped that specific query to 380ms. Still not great for a paginated list, but survivable while we worked on the function approach.

PostGraphile Functions Are the Real Escape Hatch

PostGraphile has a feature that more people should use from day one: you can define Postgres functions and it'll expose them as GraphQL queries or mutations automatically. This is how you take back control.

create or replace function patients_with_recent_appointments(
  page_size int default 20,
  page_offset int default 0
)
returns setof patients
language sql
stable
as $$
  select p.*
  from patients p
  where exists (
    select 1 from appointments a
    where a.patient_id = p.id
      and a.scheduled_at >= now() - interval '90 days'
  )
  order by p.last_name asc
  limit page_size
  offset page_offset;
$$;

comment on function patients_with_recent_appointments(int, int)
  is E'@name patientsWithRecentAppointments';

Now PostGraphile exposes this as a patientsWithRecentAppointments query, and I control the SQL entirely. The frontend still gets the nice GraphQL interface. I get to write a query that uses the indexes I've built.

For the appointment/provider join, I moved to a materialized view that PreGraphile can query against, refreshed every 15 minutes — perfectly acceptable latency for that particular use case.

The Pagination Gotcha

PostGraphile's default cursor-based pagination uses ROW() comparisons under the hood for keyset pagination, which is correct in theory. In practice, I've seen it generate multi-column ROW comparisons that Postgres can't satisfy with a standard B-tree index:

where (last_name, id) > ('Smith', 1042)

That can use a composite index on (last_name, id), but only if Postgres decides the ROW comparison is SARGable in the current planner version and with your table statistics. On Postgres 14 and earlier I hit cases where it wasn't, and it fell back to filtering after a full sort. Always check EXPLAIN on your paginated queries before going live. Every time. Not once during dev. In staging, against production-scale data.

The Connection Overhead Nobody Talks About

PostGraphile runs a query to introspect your schema on startup, which is fine. But every GraphQL request also goes through its permission/RLS logic, which can add round-trips depending on your setup. If you're using postgraphile in library mode in a Node.js app without connection pooling configured correctly, you can exhaust your Postgres connection limit faster than you'd expect under moderate load.

I run PgBouncer in transaction mode in front of Postgres for any PostGraphile deployment now. Non-negotiable. Without it, I've seen a burst of 50 concurrent GraphQL requests open 50 Postgres connections that don't close cleanly.

# pgbouncer.ini (relevant bits)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20

Note: transaction mode means you can't use session-level things like SET LOCAL in the naive way. PostGraphile has settings for this (pgSettings callback in library mode), but you need to be deliberate about it.

When I'd Reach for PostGraphile

I'd use it for:

  • Internal admin tools and dashboards where query complexity is bounded and load is low
  • Rapid prototyping where you need a real API fast and will refactor later
  • Projects where your team is Postgres-fluent and will audit the generated SQL before shipping

I wouldn't use it as a set-it-and-forget-it solution for a public-facing API with unpredictable query shapes. The moment you let frontend developers write arbitrary nested queries against your schema without guardrails, you're going to have a performance incident. Not might. Will.

Hasura has some of the same problems, by the way. Any tool that translates arbitrary GraphQL into SQL without a human reviewing the execution plans is going to bite you eventually. PostGraphile is just particularly good at hiding the complexity until it matters.

What I Actually Do Now

For clients where PostGraphile makes sense, I use it with a short checklist before launch:

  1. Enable EXPLAIN logging for queries over 100ms in development
  2. Seed staging with production-scale data and run the actual GraphQL queries the frontend team has written
  3. Add indexes reactively based on what pg_stat_statements surfaces
  4. Move any query that touches more than two joins into a Postgres function
  5. PgBouncer in front, always

Following that checklist, I've had PostGraphile deployments that hold up fine. Ignoring it is how you end up on a call at 11pm because someone added a new GraphQL fragment that does a five-level-deep join and took down the database.

PostGraphile is a genuinely clever piece of software. Treat it like a code generator that needs review, not a black box that's handling your database for you, and you'll get along fine. The DBA hat is still yours to wear.

Related

Need help shipping something like this? Get in touch.