EXPLAIN / EXPLAIN ANALYZE — The Complete Guide to PostgreSQL Query Plans

By Ghazi

What is a query plan?

PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. The EXPLAIN command lets you see exactly what that planner decided — before or after execution.

EXPLAIN vs. EXPLAIN ANALYZE

These two commands are closely related but do fundamentally different things.

EXPLAIN shows the planner's estimated execution strategy without running the query. The estimates are based on table statistics gathered by ANALYZE and autovacuum, so although EXPLAIN shows only a prediction, it is a good baseline for finding performance issues.

EXPLAIN ANALYZE actually executes the query and shows real runtime statistics alongside the estimates: the actual execution time and row count for each step (so be careful with UPDATE and DELETE!). This is indispensable for analyzing SQL performance.

Side effects warning

Because EXPLAIN ANALYZE truly runs the statement, writes to the database will happen. If you wish to use it on an INSERT, UPDATE, DELETE, MERGE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, wrap it in a transaction and roll it back.

BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'shipped' WHERE id = 42;
ROLLBACK;

Full syntax

EXPLAIN [ (
  ANALYZE    [ boolean ],
  VERBOSE    [ boolean ],
  COSTS      [ boolean ],
  BUFFERS    [ boolean ],
  SETTINGS   [ boolean ],
  TIMING     [ boolean ],
  SUMMARY    [ boolean ],
  FORMAT     { TEXT | XML | JSON | YAML }
) ] statement;

Option reference

Option    Default  Requires ANALYZE  What it shows
--------  -------  ----------------  ---------------------------------------------------
ANALYZE   off      —                 Actual execution time and row counts per node
VERBOSE   off      No                Output column lists, schema-qualified names
COSTS     on       No                Estimated startup and total cost, rows, width
BUFFERS   off      Yes               Shared/local/temp block reads, writes, and hits
SETTINGS  off      No                Non-default planner configuration parameters
TIMING    on       Yes               Per-node actual timing (disable to reduce overhead)
SUMMARY   on       Yes               Planning and execution time footer
FORMAT    TEXT     No                Output format (TEXT is most human-readable)
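Options combine freely inside the parentheses. For instance, on very fast queries the per-node clock calls added by TIMING can themselves distort the measurement, so it is sometimes disabled while keeping actual row counts. A sketch (the orders table reuses the examples below):

```sql
-- Actual row counts without per-node timing overhead:
EXPLAIN (ANALYZE, TIMING off) SELECT * FROM orders WHERE status = 'pending';

-- Estimates suppressed, machine-readable output:
EXPLAIN (ANALYZE, COSTS off, FORMAT JSON) SELECT count(*) FROM orders;
```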

The recommended incantation for real performance work:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

EXPLAIN (ANALYZE, BUFFERS) (with track_io_timing turned on) will show you everything you need to know to diagnose SQL statement performance problems.
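track_io_timing is an ordinary configuration parameter and can be enabled for just the current session before profiling; a minimal sketch:

```sql
SET track_io_timing = on;  -- adds real I/O timings to the BUFFERS output

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'pending';
-- Nodes that touched disk then show an "I/O Timings: read=..." line.
```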

Reading the output

Structure: it's a tree

A query plan is a tree of nodes. Each node represents one step the database takes to produce rows. Indentation shows parent–child relationships. Execution flows from the innermost/deepest nodes upward to the root.

Sort                                  ← root (executes last)
  -> Hash Join
       -> Seq Scan on orders          ← leaf (executes first)
       -> Hash
            -> Seq Scan on customers

A query plan is a tree. Deepest nodes execute first; the root returns the final result.

Anatomy of a node

Seq Scan on orders  (cost=0.00..28.50 rows=1850 width=16) (actual time=0.026..1.790 rows=1850 loops=1)

The first parenthesis (cost=...) carries the estimates: startup cost, total cost, estimated rows, and average row width. The second (actual time=...) carries the measured numbers: startup ms, total ms, actual rows, and loop count. Each node shows the planner's estimates and (with ANALYZE) the actual runtime numbers.

Estimated fields (cost=...):

  • 0.00 — startup cost: the planner's estimate of cost before this node returns its first row
  • 28.50 — total cost: the planner's estimate of cost to return all rows
  • rows=1850 — estimated number of rows this node will output
  • width=16 — estimated average byte width of each output row

Actual fields (actual time=...):

EXPLAIN ANALYZE gives you a second parenthesis with the actual execution time in milliseconds, the actual row count, and a loop count that shows how often that node was executed.

  • 0.026 — actual startup time in ms
  • 1.790 — actual total time in ms
  • rows=1850 — actual rows returned
  • loops=1 — how many times this node was executed

Important: When loops > 1, the time shown is the average per loop. Multiply actual time by loops to get the true total time spent in that node.

Cost units

The unit for the estimated query cost is artificial — 1 is the cost for reading an 8 kB page during a sequential scan. Costs only mean something relative to each other; they are not milliseconds. The planner compares plan costs to pick the winner.
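The constants behind these numbers are ordinary planner parameters, all expressed relative to seq_page_cost = 1.0, and can be inspected directly:

```sql
-- seq_page_cost is the unit; the others are multiples of it.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_operator_cost');
```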

The footer

At the bottom of an EXPLAIN ANALYZE output, you'll see:

Planning Time: 0.5 ms
Execution Time: 12.3 ms

Execution Time is the total wall-clock time the query took. This is the number that matters most for end users.

Scan node types

The EXPLAIN statement shows how tables involved in a statement will be scanned — by index scan or sequential scan — and if multiple tables are used, what kind of join algorithm will be used.

[Diagram: the selectivity spectrum. With low selectivity (many matching rows), a Seq Scan that reads every row and applies a filter wins; in the moderate middle, a Bitmap Scan collects pointers from the index, then fetches rows in physical order; with high selectivity (few matching rows), an Index Scan jumps to the matching rows via a B-tree.]
PostgreSQL chooses the scan strategy based on how selective the query is.

Sequential Scan (Seq Scan)

The Seq Scan operation scans the entire relation (table) as stored on disk. Every row is read, and a filter is applied if there's a WHERE clause. This is expected and efficient for small tables or when a large proportion of rows are needed. It becomes a problem when it appears on a large table where only a few rows match.

Seq Scan on users  (cost=0.00..14.00 rows=2 width=222)
  Filter: (email = 'alice@example.com')
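On a table of non-trivial size, an index on the filtered column turns the plan above into an Index Scan. A sketch reusing the names from the example (the index name is hypothetical):

```sql
CREATE INDEX users_email_idx ON users (email);
ANALYZE users;  -- refresh statistics so the planner considers the new index

EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Typical plan shape: Index Scan using users_email_idx on users
```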

Index Scan

The Index Scan performs a B-tree traversal, walks through the leaf nodes to find all matching entries, and fetches the corresponding table data. Good for selective queries that return a small number of rows.

Index Only Scan

An Index Only Scan node occurs when the index completely covers the predicate and any returned columns — meaning no heap (table) access is needed at all. This is the most efficient scan type when applicable.
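A covering index is one way to enable an Index Only Scan: the extra returned columns are stored in the index itself. A sketch, assuming a hypothetical total column (INCLUDE requires PostgreSQL 11 or later):

```sql
-- The index covers both the predicate column and the selected column:
CREATE INDEX orders_status_total_idx ON orders (status) INCLUDE (total);

EXPLAIN SELECT total FROM orders WHERE status = 'shipped';
-- Typical plan shape: Index Only Scan using orders_status_total_idx on orders
```

Note that an Index Only Scan still consults the visibility map; on a heavily updated table, keep VACUUM current so the Heap Fetches count stays low.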

Bitmap Index Scan / Bitmap Heap Scan

These two always appear together. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory bitmap data structure, and then visits the table tuples in physical tuple-location order. This is the middle ground — more selective than a full Seq Scan but less precise per-lookup than an Index Scan. It excels when multiple index conditions are combined with AND/OR.

The Recheck Cond line below a Bitmap Heap Scan is always printed, but the recheck only actually runs when the bitmap grows too large for work_mem and PostgreSQL degrades it to a lossy, page-level bitmap: every row on the fetched pages must then be re-checked to confirm it really matches (visible as lossy blocks in the Heap Blocks line).
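The AND/OR combining happens through BitmapAnd and BitmapOr nodes stacked over individual Bitmap Index Scans. A hypothetical shape, assuming single-column indexes on status and customer_id:

```sql
EXPLAIN SELECT * FROM orders
WHERE status = 'pending' OR customer_id = 42;

-- Typical plan shape:
--   Bitmap Heap Scan on orders
--     Recheck Cond: ((status = 'pending') OR (customer_id = 42))
--     ->  BitmapOr
--           ->  Bitmap Index Scan on orders_status_idx
--           ->  Bitmap Index Scan on orders_customer_id_idx
```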

Join node types

Generally, join operations process only two tables at a time. When a query has more joins, they are executed sequentially — first two tables, then the intermediate result with the next table.

[Diagram: Nested Loop scans the inner table once for each outer row; Hash Join builds a hash table from one side, then probes it with each row from the other; Merge Join scans two sorted inputs in parallel.]
PostgreSQL picks the join strategy based on table sizes, available indexes, and work_mem.

Nested Loop

For each row from the outer table, PostgreSQL scans the inner table looking for matches. Nested Loop is useful for smaller datasets or indexed tables. However, it can be slow with large datasets unless properly optimized. When the inner side has an index on the join column, this strategy is very efficient.

Hash Join

The hash join loads the candidate records from one side of the join into a hash table, which is then probed for each record from the other side of the join. Hash joins can only be used for equality conditions (=). They require enough work_mem to hold the hash table; if the table overflows memory, PostgreSQL spills to disk in multiple “batches,” which you'll see as Batches: N in the output.

Merge Join

A merge join puts both input relations in sorted order (via an explicit sort or an index scan) and matches rows with equal values by scanning the two inputs in parallel. It scales well to very large datasets because neither input has to fit in memory, though a merge join is usually slower than a hash join when an explicit sort is required. Like hash joins, merge joins only support equality conditions.

Key EXPLAIN ANALYZE patterns to recognize

Row count misestimates

Find the lowest node where the estimated row count is significantly different from the actual row count. Very often, this is the cause of bad performance — the long execution time somewhere else is only a consequence of a bad plan choice based on a bad estimate. “Significantly different” typically means a factor of 10 or so.

When the planner severely underestimates rows, it may choose a Nested Loop where a Hash Join would be faster. When it overestimates, it may choose a Seq Scan where an Index Scan would win. Fix this by running ANALYZE on the affected table, or by raising the statistics target on columns with skewed data distributions (per column with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS, or globally via default_statistics_target).
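Both fixes can be sketched as follows (table and column names hypothetical; the default statistics target is 100):

```sql
-- Refresh the sampled statistics for one table:
ANALYZE orders;

-- Sample a skewed column in more detail, then re-gather:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;
```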

Slow sequential scans with filters

Seq Scan on orders  (cost=0.00..2346.00 rows=35360 width=50)
  Filter: (status = 'pending')
  Rows Removed by Filter: 980000

“Rows Removed by Filter” is a red flag. If a filter eliminates most of the scanned rows, that table is a strong candidate for an index.
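A plain or partial index would fix the plan above; a sketch (index names hypothetical):

```sql
-- A plain index on the filter column:
CREATE INDEX orders_status_idx ON orders (status);

-- Or, since 'pending' is the interesting value, a much smaller partial index
-- (the indexed column can be whatever the query sorts on or returns):
CREATE INDEX orders_pending_idx ON orders (id) WHERE status = 'pending';
```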

High loops in a Nested Loop

Index Scan on order_items  (actual time=0.003..0.003 rows=2 loops=4500)

Multiply: 0.003 ms × 4500 = 13.5 ms total. A nested loop with a large loop count is often the source of unexpectedly slow queries, even if the per-loop time looks small.

Sort spilling to disk

Sort Method: external merge  Disk: 14MB

When you see external merge instead of quicksort or top-N heapsort, the sort ran out of work_mem and spilled to disk. Increase work_mem for the session to fix this.
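work_mem can be raised for just the current session, or for a single transaction with SET LOCAL, instead of globally (the ORDER BY column is hypothetical):

```sql
SET work_mem = '64MB';  -- applies to the rest of this session

-- Or scoped to a single transaction:
BEGIN;
SET LOCAL work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;
COMMIT;
```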

Hash Batches > 1

Hash  (actual time=45.2..45.2 rows=500000 loops=1)
  Buckets: 16384  Batches: 8  Memory Usage: 4096kB

Batches: 8 means the hash table was too large for work_mem and was processed in 8 passes. Increasing work_mem may bring it to Batches: 1 and yield a significant speedup.

The BUFFERS option

Adding BUFFERS shows block-level I/O per node:

Seq Scan on orders  (actual time=0.026..1.790 rows=1850 loops=1)
  Buffers: shared hit=345 read=92
  • shared hit — blocks found in PostgreSQL's shared buffer cache (fast, no disk I/O)
  • shared read — blocks read from disk or OS page cache (slower)
  • shared written — blocks written during the query
  • temp read/written — temporary files used for sorts or hash batches

A high read count relative to hit suggests the working set doesn't fit in shared_buffers, or the data was cold. Shared blocks contain data from regular tables and indexes; local blocks contain data from temporary tables and indexes; while temporary blocks contain short-term working data used in sorts, hashes, Materialize plan nodes, and similar cases.

EXPLAIN output formats

For tooling and automation, use FORMAT JSON:

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;

This produces a machine-readable tree that many GUI tools and visualizers can consume directly. The default TEXT format is the most human-readable for manual inspection.
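Abridged, the JSON output has this shape (key names as emitted by recent PostgreSQL versions; the exact keys vary with the options used):

```sql
EXPLAIN (FORMAT JSON) SELECT * FROM orders;
-- Returns a single row containing:
-- [
--   {
--     "Plan": {
--       "Node Type": "Seq Scan",
--       "Relation Name": "orders",
--       "Startup Cost": 0.00,
--       "Total Cost": 28.50,
--       "Plan Rows": 1850,
--       "Plan Width": 16
--     }
--   }
-- ]
```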

Useful companion tools

Reading long plans as raw text is tedious. Several tools can help:

  • explain.depesz.com — paste your plan text to get a color-coded, annotated view that highlights the most expensive nodes
  • explain.dalibo.com — similar visualizer with a graphical tree view
  • pg_stat_statements — a built-in extension that tracks cumulative execution statistics across all queries, making it easy to find the queries that consume the most total time (not just the ones you thought to profile)
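After adding pg_stat_statements to shared_preload_libraries and restarting, the heaviest queries can be listed with an ordinary query; a sketch (column names as of PostgreSQL 13, where total_time was split into planning and execution parts):

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```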

Practical checklist

When you open an EXPLAIN (ANALYZE, BUFFERS) output, work through this in order:

1. Check Execution Time: is it actually slow?
2. Find the most expensive node: where is actual time highest?
3. Compare estimated vs. actual rows: mismatch? Run ANALYZE.
4. Seq Scan + Rows Removed by Filter? Candidate for an index.
5. Sort Method: external merge? Increase work_mem.
6. Hash Batches > 1? Increase work_mem.
7. Multiply actual time × loops: high loop counts hide cost.
8. Check shared read vs. hit: high reads mean cache pressure.
Work through this sequence top to bottom when reading any EXPLAIN ANALYZE output.

A note on repeatability

There is always a certain variation in query execution time, as data may not be in cache during the first execution. That's why it is valuable to repeat EXPLAIN ANALYZE a couple of times and see if the result changes. The first run may be slow due to cold I/O; subsequent runs reflect the warm-cache steady state.