summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/using-explain.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
commit46651ce6fe013220ed397add242004d764fc0153 (patch)
tree6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/html/using-explain.html
parentInitial commit. (diff)
downloadpostgresql-14-46651ce6fe013220ed397add242004d764fc0153.tar.xz
postgresql-14-46651ce6fe013220ed397add242004d764fc0153.zip
Adding upstream version 14.5.upstream/14.5upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/using-explain.html')
-rw-r--r--doc/src/sgml/html/using-explain.html802
1 files changed, 802 insertions, 0 deletions
diff --git a/doc/src/sgml/html/using-explain.html b/doc/src/sgml/html/using-explain.html
new file mode 100644
index 0000000..0c2fb8f
--- /dev/null
+++ b/doc/src/sgml/html/using-explain.html
@@ -0,0 +1,802 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.1. Using EXPLAIN</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="performance-tips.html" title="Chapter 14. Performance Tips" /><link rel="next" href="planner-stats.html" title="14.2. Statistics Used by the Planner" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.1. Using <code xmlns="http://www.w3.org/1999/xhtml" class="command">EXPLAIN</code></th></tr><tr><td width="10%" align="left"><a accesskey="p" href="performance-tips.html" title="Chapter 14. Performance Tips">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="USING-EXPLAIN"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.1. Using <code class="command">EXPLAIN</code></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-BASICS">14.1.1. <code class="command">EXPLAIN</code> Basics</a></span></dt><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-ANALYZE">14.1.2. <code class="command">EXPLAIN ANALYZE</code></a></span></dt><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-CAVEATS">14.1.3. Caveats</a></span></dt></dl></div><a id="id-1.5.13.4.2" class="indexterm"></a><a id="id-1.5.13.4.3" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span> devises a <em class="firstterm">query
+ plan</em> for each query it receives. Choosing the right
+ plan to match the query structure and the properties of the data
+ is absolutely critical for good performance, so the system includes
+ a complex <em class="firstterm">planner</em> that tries to choose good plans.
+ You can use the <a class="link" href="sql-explain.html" title="EXPLAIN"><code class="command">EXPLAIN</code></a> command
+ to see what query plan the planner creates for any query.
+ Plan-reading is an art that requires some experience to master,
+ but this section attempts to cover the basics.
+ </p><p>
+ Examples in this section are drawn from the regression test database
+ after doing a <code class="command">VACUUM ANALYZE</code>, using 9.3 development sources.
+ You should be able to get similar results if you try the examples
+ yourself, but your estimated costs and row counts might vary slightly
+ because <code class="command">ANALYZE</code>'s statistics are random samples rather
+ than exact, and because costs are inherently somewhat platform-dependent.
+ </p><p>
+ The examples use <code class="command">EXPLAIN</code>'s default <span class="quote">“<span class="quote">text</span>”</span> output
+ format, which is compact and convenient for humans to read.
+ If you want to feed <code class="command">EXPLAIN</code>'s output to a program for further
+ analysis, you should use one of its machine-readable output formats
+ (XML, JSON, or YAML) instead.
+ </p><div class="sect2" id="USING-EXPLAIN-BASICS"><div class="titlepage"><div><div><h3 class="title">14.1.1. <code class="command">EXPLAIN</code> Basics</h3></div></div></div><p>
+ The structure of a query plan is a tree of <em class="firstterm">plan nodes</em>.
+ Nodes at the bottom level of the tree are scan nodes: they return raw rows
+ from a table. There are different types of scan nodes for different
+ table access methods: sequential scans, index scans, and bitmap index
+ scans. There are also non-table row sources, such as <code class="literal">VALUES</code>
+ clauses and set-returning functions in <code class="literal">FROM</code>, which have their
+ own scan node types.
+ If the query requires joining, aggregation, sorting, or other
+ operations on the raw rows, then there will be additional nodes
+ above the scan nodes to perform these operations. Again,
+ there is usually more than one possible way to do these operations,
+ so different node types can appear here too. The output
+ of <code class="command">EXPLAIN</code> has one line for each node in the plan
+ tree, showing the basic node type plus the cost estimates that the planner
+ made for the execution of that plan node. Additional lines might appear,
+ indented from the node's summary line,
+ to show additional properties of the node.
+ The very first line (the summary line for the topmost
+ node) has the estimated total execution cost for the plan; it is this
+ number that the planner seeks to minimize.
+ </p><p>
+ Here is a trivial example, just to show what the output looks like:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1;
+
+ QUERY PLAN
+-------------------------------------------------------------
+ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
+</pre><p>
+ </p><p>
+ Since this query has no <code class="literal">WHERE</code> clause, it must scan all the
+ rows of the table, so the planner has chosen to use a simple sequential
+ scan plan. The numbers that are quoted in parentheses are (left
+ to right):
+
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ Estimated start-up cost. This is the time expended before the output
+ phase can begin, e.g., time to do the sorting in a sort node.
+ </p></li><li class="listitem"><p>
+ Estimated total cost. This is stated on the assumption that the plan
+ node is run to completion, i.e., all available rows are retrieved.
+ In practice a node's parent node might stop short of reading all
+ available rows (see the <code class="literal">LIMIT</code> example below).
+ </p></li><li class="listitem"><p>
+ Estimated number of rows output by this plan node. Again, the node
+ is assumed to be run to completion.
+ </p></li><li class="listitem"><p>
+ Estimated average width of rows output by this plan node (in bytes).
+ </p></li></ul></div><p>
+ </p><p>
+ The costs are measured in arbitrary units determined by the planner's
+ cost parameters (see <a class="xref" href="runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS" title="20.7.2. Planner Cost Constants">Section 20.7.2</a>).
+ Traditional practice is to measure the costs in units of disk page
+ fetches; that is, <a class="xref" href="runtime-config-query.html#GUC-SEQ-PAGE-COST">seq_page_cost</a> is conventionally
+ set to <code class="literal">1.0</code> and the other cost parameters are set relative
+ to that. The examples in this section are run with the default cost
+ parameters.
+ </p><p>
+ It's important to understand that the cost of an upper-level node includes
+ the cost of all its child nodes. It's also important to realize that
+ the cost only reflects things that the planner cares about.
+ In particular, the cost does not consider the time spent transmitting
+ result rows to the client, which could be an important
+ factor in the real elapsed time; but the planner ignores it because
+ it cannot change it by altering the plan. (Every correct plan will
+ output the same row set, we trust.)
+ </p><p>
+ The <code class="literal">rows</code> value is a little tricky because it is
+ not the number of rows processed or scanned by the
+ plan node, but rather the number emitted by the node. This is often
+ less than the number scanned, as a result of filtering by any
+ <code class="literal">WHERE</code>-clause conditions that are being applied at the node.
+ Ideally the top-level rows estimate will approximate the number of rows
+ actually returned, updated, or deleted by the query.
+ </p><p>
+ Returning to our example:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1;
+
+ QUERY PLAN
+-------------------------------------------------------------
+ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
+</pre><p>
+ </p><p>
+ These numbers are derived very straightforwardly. If you do:
+
+</p><pre class="programlisting">
+SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
+</pre><p>
+
+ you will find that <code class="classname">tenk1</code> has 358 disk
+ pages and 10000 rows. The estimated cost is computed as (disk pages read *
+ <a class="xref" href="runtime-config-query.html#GUC-SEQ-PAGE-COST">seq_page_cost</a>) + (rows scanned *
+ <a class="xref" href="runtime-config-query.html#GUC-CPU-TUPLE-COST">cpu_tuple_cost</a>). By default,
+ <code class="varname">seq_page_cost</code> is 1.0 and <code class="varname">cpu_tuple_cost</code> is 0.01,
+ so the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458.
+ </p><p>
+ Now let's modify the query to add a <code class="literal">WHERE</code> condition:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 7000;
+
+ QUERY PLAN
+------------------------------------------------------------
+ Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244)
+ Filter: (unique1 &lt; 7000)
+</pre><p>
+
+ Notice that the <code class="command">EXPLAIN</code> output shows the <code class="literal">WHERE</code>
+ clause being applied as a <span class="quote">“<span class="quote">filter</span>”</span> condition attached to the Seq
+ Scan plan node. This means that
+ the plan node checks the condition for each row it scans, and outputs
+ only the ones that pass the condition.
+ The estimate of output rows has been reduced because of the
+ <code class="literal">WHERE</code> clause.
+ However, the scan will still have to visit all 10000 rows, so the cost
+ hasn't decreased; in fact it has gone up a bit (by 10000 * <a class="xref" href="runtime-config-query.html#GUC-CPU-OPERATOR-COST">cpu_operator_cost</a>, to be exact) to reflect the extra CPU
+ time spent checking the <code class="literal">WHERE</code> condition.
+ </p><p>
+ The actual number of rows this query would select is 7000, but the <code class="literal">rows</code>
+ estimate is only approximate. If you try to duplicate this experiment,
+ you will probably get a slightly different estimate; moreover, it can
+ change after each <code class="command">ANALYZE</code> command, because the
+ statistics produced by <code class="command">ANALYZE</code> are taken from a
+ randomized sample of the table.
+ </p><p>
+ Now, let's make the condition more restrictive:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 100;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------
+ Bitmap Heap Scan on tenk1 (cost=5.07..229.20 rows=101 width=244)
+ Recheck Cond: (unique1 &lt; 100)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0)
+ Index Cond: (unique1 &lt; 100)
+</pre><p>
+
+ Here the planner has decided to use a two-step plan: the child plan
+ node visits an index to find the locations of rows matching the index
+ condition, and then the upper plan node actually fetches those rows
+ from the table itself. Fetching rows separately is much more
+ expensive than reading them sequentially, but because not all the pages
+ of the table have to be visited, this is still cheaper than a sequential
+ scan. (The reason for using two plan levels is that the upper plan
+ node sorts the row locations identified by the index into physical order
+ before reading them, to minimize the cost of separate fetches.
+ The <span class="quote">“<span class="quote">bitmap</span>”</span> mentioned in the node names is the mechanism that
+ does the sorting.)
+ </p><p>
+ Now let's add another condition to the <code class="literal">WHERE</code> clause:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 100 AND stringu1 = 'xxx';
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------
+ Bitmap Heap Scan on tenk1 (cost=5.04..229.43 rows=1 width=244)
+ Recheck Cond: (unique1 &lt; 100)
+ Filter: (stringu1 = 'xxx'::name)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0)
+ Index Cond: (unique1 &lt; 100)
+</pre><p>
+
+ The added condition <code class="literal">stringu1 = 'xxx'</code> reduces the
+ output row count estimate, but not the cost because we still have to visit
+ the same set of rows. Notice that the <code class="literal">stringu1</code> clause
+ cannot be applied as an index condition, since this index is only on
+ the <code class="literal">unique1</code> column. Instead it is applied as a filter on
+ the rows retrieved by the index. Thus the cost has actually gone up
+ slightly to reflect this extra checking.
+ </p><p>
+ In some cases the planner will prefer a <span class="quote">“<span class="quote">simple</span>”</span> index scan plan:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42;
+
+ QUERY PLAN
+-------------------------------------------------------------------​----------
+ Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244)
+ Index Cond: (unique1 = 42)
+</pre><p>
+
+ In this type of plan the table rows are fetched in index order, which
+ makes them even more expensive to read, but there are so few that the
+ extra cost of sorting the row locations is not worth it. You'll most
+ often see this plan type for queries that fetch just a single row. It's
+ also often used for queries that have an <code class="literal">ORDER BY</code> condition
+ that matches the index order, because then no extra sorting step is needed
+ to satisfy the <code class="literal">ORDER BY</code>. In this example, adding
+ <code class="literal">ORDER BY unique1</code> would use the same plan because the
+ index already implicitly provides the requested ordering.
+ </p><p>
+ The planner may implement an <code class="literal">ORDER BY</code> clause in several
+ ways. The above example shows that such an ordering clause may be
+ implemented implicitly. The planner may also add an explicit
+ <code class="literal">sort</code> step:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 ORDER BY unique1;
+ QUERY PLAN
+-------------------------------------------------------------------
+ Sort (cost=1109.39..1134.39 rows=10000 width=244)
+ Sort Key: unique1
+ -&gt; Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)
+</pre><p>
+
+ If a part of the plan guarantees an ordering on a prefix of the
+ required sort keys, then the planner may instead decide to use an
+ <code class="literal">incremental sort</code> step:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 ORDER BY four, ten LIMIT 100;
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------------------
+ Limit (cost=521.06..538.05 rows=100 width=244)
+ -&gt; Incremental Sort (cost=521.06..2220.95 rows=10000 width=244)
+ Sort Key: four, ten
+ Presorted Key: four
+ -&gt; Index Scan using index_tenk1_on_four on tenk1 (cost=0.29..1510.08 rows=10000 width=244)
+</pre><p>
+
+ Compared to regular sorts, sorting incrementally allows returning tuples
+ before the entire result set has been sorted, which particularly enables
+ optimizations with <code class="literal">LIMIT</code> queries. It may also reduce
+ memory usage and the likelihood of spilling sorts to disk, but it comes at
+ the cost of the increased overhead of splitting the result set into multiple
+ sorting batches.
+ </p><p>
+ If there are separate indexes on several of the columns referenced
+ in <code class="literal">WHERE</code>, the planner might choose to use an AND or OR
+ combination of the indexes:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 100 AND unique2 &gt; 9000;
+
+ QUERY PLAN
+-------------------------------------------------------------------​------------------
+ Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244)
+ Recheck Cond: ((unique1 &lt; 100) AND (unique2 &gt; 9000))
+ -&gt; BitmapAnd (cost=25.08..25.08 rows=10 width=0)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0)
+ Index Cond: (unique1 &lt; 100)
+ -&gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0)
+ Index Cond: (unique2 &gt; 9000)
+</pre><p>
+
+ But this requires visiting both indexes, so it's not necessarily a win
+ compared to using just one index and treating the other condition as
+ a filter. If you vary the ranges involved you'll see the plan change
+ accordingly.
+ </p><p>
+ Here is an example showing the effects of <code class="literal">LIMIT</code>:
+
+</p><pre class="screen">
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 &lt; 100 AND unique2 &gt; 9000 LIMIT 2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​------------------
+ Limit (cost=0.29..14.48 rows=2 width=244)
+ -&gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244)
+ Index Cond: (unique2 &gt; 9000)
+ Filter: (unique1 &lt; 100)
+</pre><p>
+ </p><p>
+ This is the same query as above, but we added a <code class="literal">LIMIT</code> so that
+ not all the rows need be retrieved, and the planner changed its mind about
+ what to do. Notice that the total cost and row count of the Index Scan
+ node are shown as if it were run to completion. However, the Limit node
+ is expected to stop after retrieving only a fifth of those rows, so its
+ total cost is only a fifth as much, and that's the actual estimated cost
+ of the query. This plan is preferred over adding a Limit node to the
+ previous plan because the Limit could not avoid paying the startup cost
+ of the bitmap scan, so the total cost would be something over 25 units
+ with that approach.
+ </p><p>
+ Let's try joining two tables, using the columns we have been discussing:
+
+</p><pre class="screen">
+EXPLAIN SELECT *
+FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 &lt; 10 AND t1.unique2 = t2.unique2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-------------------
+ Nested Loop (cost=4.65..118.62 rows=10 width=488)
+ -&gt; Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244)
+ Recheck Cond: (unique1 &lt; 10)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0)
+ Index Cond: (unique1 &lt; 10)
+ -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244)
+ Index Cond: (unique2 = t1.unique2)
+</pre><p>
+ </p><p>
+ In this plan, we have a nested-loop join node with two table scans as
+ inputs, or children. The indentation of the node summary lines reflects
+ the plan tree structure. The join's first, or <span class="quote">“<span class="quote">outer</span>”</span>, child
+ is a bitmap scan similar to those we saw before. Its cost and row count
+ are the same as we'd get from <code class="literal">SELECT ... WHERE unique1 &lt; 10</code>
+ because we are
+ applying the <code class="literal">WHERE</code> clause <code class="literal">unique1 &lt; 10</code>
+ at that node.
+ The <code class="literal">t1.unique2 = t2.unique2</code> clause is not relevant yet,
+ so it doesn't affect the row count of the outer scan. The nested-loop
+ join node will run its second,
+ or <span class="quote">“<span class="quote">inner</span>”</span> child once for each row obtained from the outer child.
+ Column values from the current outer row can be plugged into the inner
+ scan; here, the <code class="literal">t1.unique2</code> value from the outer row is available,
+ so we get a plan and costs similar to what we saw above for a simple
+ <code class="literal">SELECT ... WHERE t2.unique2 = <em class="replaceable"><code>constant</code></em></code> case.
+ (The estimated cost is actually a bit lower than what was seen above,
+ as a result of caching that's expected to occur during the repeated
+ index scans on <code class="literal">t2</code>.) The
+ costs of the loop node are then set on the basis of the cost of the outer
+ scan, plus one repetition of the inner scan for each outer row (10 * 7.91,
+ here), plus a little CPU time for join processing.
+ </p><p>
+ In this example the join's output row count is the same as the product
+ of the two scans' row counts, but that's not true in all cases because
+ there can be additional <code class="literal">WHERE</code> clauses that mention both tables
+ and so can only be applied at the join point, not to either input scan.
+ Here's an example:
+
+</p><pre class="screen">
+EXPLAIN SELECT *
+FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 &lt; 10 AND t2.unique2 &lt; 10 AND t1.hundred &lt; t2.hundred;
+
+ QUERY PLAN
+-------------------------------------------------------------------​--------------------------
+ Nested Loop (cost=4.65..49.46 rows=33 width=488)
+ Join Filter: (t1.hundred &lt; t2.hundred)
+ -&gt; Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244)
+ Recheck Cond: (unique1 &lt; 10)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0)
+ Index Cond: (unique1 &lt; 10)
+ -&gt; Materialize (cost=0.29..8.51 rows=10 width=244)
+ -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244)
+ Index Cond: (unique2 &lt; 10)
+</pre><p>
+
+ The condition <code class="literal">t1.hundred &lt; t2.hundred</code> can't be
+ tested in the <code class="literal">tenk2_unique2</code> index, so it's applied at the
+ join node. This reduces the estimated output row count of the join node,
+ but does not change either input scan.
+ </p><p>
+ Notice that here the planner has chosen to <span class="quote">“<span class="quote">materialize</span>”</span> the inner
+ relation of the join, by putting a Materialize plan node atop it. This
+ means that the <code class="literal">t2</code> index scan will be done just once, even
+ though the nested-loop join node needs to read that data ten times, once
+ for each row from the outer relation. The Materialize node saves the data
+ in memory as it's read, and then returns the data from memory on each
+ subsequent pass.
+ </p><p>
+ When dealing with outer joins, you might see join plan nodes with both
+ <span class="quote">“<span class="quote">Join Filter</span>”</span> and plain <span class="quote">“<span class="quote">Filter</span>”</span> conditions attached.
+ Join Filter conditions come from the outer join's <code class="literal">ON</code> clause,
+ so a row that fails the Join Filter condition could still get emitted as
+ a null-extended row. But a plain Filter condition is applied after the
+ outer-join rules and so acts to remove rows unconditionally. In an inner
+ join there is no semantic difference between these types of filters.
+ </p><p>
+ If we change the query's selectivity a bit, we might get a very different
+ join plan:
+
+</p><pre class="screen">
+EXPLAIN SELECT *
+FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 &lt; 100 AND t1.unique2 = t2.unique2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------
+ Hash Join (cost=230.47..713.98 rows=101 width=488)
+ Hash Cond: (t2.unique2 = t1.unique2)
+ -&gt; Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244)
+ -&gt; Hash (cost=229.20..229.20 rows=101 width=244)
+ -&gt; Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244)
+ Recheck Cond: (unique1 &lt; 100)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0)
+ Index Cond: (unique1 &lt; 100)
+</pre><p>
+ </p><p>
+ Here, the planner has chosen to use a hash join, in which rows of one
+ table are entered into an in-memory hash table, after which the other
+ table is scanned and the hash table is probed for matches to each row.
+ Again note how the indentation reflects the plan structure: the bitmap
+ scan on <code class="literal">tenk1</code> is the input to the Hash node, which constructs
+ the hash table. That's then returned to the Hash Join node, which reads
+ rows from its outer child plan and searches the hash table for each one.
+ </p><p>
+ Another possible type of join is a merge join, illustrated here:
+
+</p><pre class="screen">
+EXPLAIN SELECT *
+FROM tenk1 t1, onek t2
+WHERE t1.unique1 &lt; 100 AND t1.unique2 = t2.unique2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------
+ Merge Join (cost=198.11..268.19 rows=10 width=488)
+ Merge Cond: (t1.unique2 = t2.unique2)
+ -&gt; Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244)
+ Filter: (unique1 &lt; 100)
+ -&gt; Sort (cost=197.83..200.33 rows=1000 width=244)
+ Sort Key: t2.unique2
+ -&gt; Seq Scan on onek t2 (cost=0.00..148.00 rows=1000 width=244)
+</pre><p>
+ </p><p>
+ Merge join requires its input data to be sorted on the join keys. In this
+ plan the <code class="literal">tenk1</code> data is sorted by using an index scan to visit
+ the rows in the correct order, but a sequential scan and sort is preferred
+ for <code class="literal">onek</code>, because there are many more rows to be visited in
+ that table.
+ (Sequential-scan-and-sort frequently beats an index scan for sorting many rows,
+ because of the nonsequential disk access required by the index scan.)
+ </p><p>
+ One way to look at variant plans is to force the planner to disregard
+ whatever strategy it thought was the cheapest, using the enable/disable
+ flags described in <a class="xref" href="runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE" title="20.7.1. Planner Method Configuration">Section 20.7.1</a>.
+ (This is a crude tool, but useful. See
+ also <a class="xref" href="explicit-joins.html" title="14.3. Controlling the Planner with Explicit JOIN Clauses">Section 14.3</a>.)
+ For example, if we're unconvinced that sequential-scan-and-sort is the best way to
+ deal with table <code class="literal">onek</code> in the previous example, we could try
+
+</p><pre class="screen">
+SET enable_sort = off;
+
+EXPLAIN SELECT *
+FROM tenk1 t1, onek t2
+WHERE t1.unique1 &lt; 100 AND t1.unique2 = t2.unique2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------
+ Merge Join (cost=0.56..292.65 rows=10 width=488)
+ Merge Cond: (t1.unique2 = t2.unique2)
+ -&gt; Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244)
+ Filter: (unique1 &lt; 100)
+ -&gt; Index Scan using onek_unique2 on onek t2 (cost=0.28..224.79 rows=1000 width=244)
+</pre><p>
+
+ which shows that the planner thinks that sorting <code class="literal">onek</code> by
+ index-scanning is about 12% more expensive than sequential-scan-and-sort.
+ Of course, the next question is whether it's right about that.
+ We can investigate that using <code class="command">EXPLAIN ANALYZE</code>, as discussed
+ below.
+ </p></div><div class="sect2" id="USING-EXPLAIN-ANALYZE"><div class="titlepage"><div><div><h3 class="title">14.1.2. <code class="command">EXPLAIN ANALYZE</code></h3></div></div></div><p>
+ It is possible to check the accuracy of the planner's estimates
+ by using <code class="command">EXPLAIN</code>'s <code class="literal">ANALYZE</code> option. With this
+ option, <code class="command">EXPLAIN</code> actually executes the query, and then displays
+ the true row counts and true run time accumulated within each plan node,
+ along with the same estimates that a plain <code class="command">EXPLAIN</code>
+ shows. For example, we might get a result like this:
+
+</p><pre class="screen">
+EXPLAIN ANALYZE SELECT *
+FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 &lt; 10 AND t1.unique2 = t2.unique2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​--------------------------------------------------------------
+ Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
+ -&gt; Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
+ Recheck Cond: (unique1 &lt; 10)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
+ Index Cond: (unique1 &lt; 10)
+ -&gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
+ Index Cond: (unique2 = t1.unique2)
+ Planning time: 0.181 ms
+ Execution time: 0.501 ms
+</pre><p>
+
+ Note that the <span class="quote">“<span class="quote">actual time</span>”</span> values are in milliseconds of
+ real time, whereas the <code class="literal">cost</code> estimates are expressed in
+ arbitrary units; so they are unlikely to match up.
+ The thing that's usually most important to look for is whether the
+ estimated row counts are reasonably close to reality. In this example
+ the estimates were all dead-on, but that's quite unusual in practice.
+ </p><p>
+ In some query plans, it is possible for a subplan node to be executed more
+ than once. For example, the inner index scan will be executed once per
+ outer row in the above nested-loop plan. In such cases, the
+ <code class="literal">loops</code> value reports the
+ total number of executions of the node, and the actual time and rows
+ values shown are averages per-execution. This is done to make the numbers
+ comparable with the way that the cost estimates are shown. Multiply by
+ the <code class="literal">loops</code> value to get the total time actually spent in
+ the node. In the above example, we spent a total of 0.220 milliseconds
+ executing the index scans on <code class="literal">tenk2</code>.
+ </p><p>
+ In some cases <code class="command">EXPLAIN ANALYZE</code> shows additional execution
+ statistics beyond the plan node execution times and row counts.
+ For example, Sort and Hash nodes provide extra information:
+
+</p><pre class="screen">
+EXPLAIN ANALYZE SELECT *
+FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 &lt; 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-------------------------------------------------------------------​------
+ Sort (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1)
+ Sort Key: t1.fivethous
+ Sort Method: quicksort Memory: 77kB
+ -&gt; Hash Join (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1)
+ Hash Cond: (t2.unique2 = t1.unique2)
+ -&gt; Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1)
+ -&gt; Hash (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1)
+ Buckets: 1024 Batches: 1 Memory Usage: 28kB
+ -&gt; Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1)
+ Recheck Cond: (unique1 &lt; 100)
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1)
+ Index Cond: (unique1 &lt; 100)
+ Planning time: 0.194 ms
+ Execution time: 8.008 ms
+</pre><p>
+
+ The Sort node shows the sort method used (in particular, whether the sort
+ was in-memory or on-disk) and the amount of memory or disk space needed.
+ The Hash node shows the number of hash buckets and batches as well as the
+ peak amount of memory used for the hash table. (If the number of batches
+ exceeds one, there will also be disk space usage involved, but that is not
+ shown.)
+ </p><p>
+ Another type of extra information is the number of rows removed by a
+ filter condition:
+
+</p><pre class="screen">
+EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE ten &lt; 7;
+
+ QUERY PLAN
+-------------------------------------------------------------------​--------------------------------------
+ Seq Scan on tenk1 (cost=0.00..483.00 rows=7000 width=244) (actual time=0.016..5.107 rows=7000 loops=1)
+ Filter: (ten &lt; 7)
+ Rows Removed by Filter: 3000
+ Planning time: 0.083 ms
+ Execution time: 5.905 ms
+</pre><p>
+
+ These counts can be particularly valuable for filter conditions applied at
+ join nodes. The <span class="quote">“<span class="quote">Rows Removed</span>”</span> line only appears when at least
+ one scanned row, or potential join pair in the case of a join node,
+ is rejected by the filter condition.
+ </p><p>
+ A case similar to filter conditions occurs with <span class="quote">“<span class="quote">lossy</span>”</span>
+ index scans. For example, consider this search for polygons containing a
+ specific point:
+
+</p><pre class="screen">
+EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @&gt; polygon '(0.5,2.0)';
+
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------------------
+ Seq Scan on polygon_tbl (cost=0.00..1.05 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1)
+ Filter: (f1 @&gt; '((0.5,2))'::polygon)
+ Rows Removed by Filter: 4
+ Planning time: 0.040 ms
+ Execution time: 0.083 ms
+</pre><p>
+
+ The planner thinks (quite correctly) that this sample table is too small
+ to bother with an index scan, so we have a plain sequential scan in which
+ all the rows got rejected by the filter condition. But if we force an
+ index scan to be used, we see:
+
+</p><pre class="screen">
+SET enable_seqscan TO off;
+
+EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @&gt; polygon '(0.5,2.0)';
+
+ QUERY PLAN
+-------------------------------------------------------------------​-------------------------------------------------------
+ Index Scan using gpolygonind on polygon_tbl (cost=0.13..8.15 rows=1 width=32) (actual time=0.062..0.062 rows=0 loops=1)
+ Index Cond: (f1 @&gt; '((0.5,2))'::polygon)
+ Rows Removed by Index Recheck: 1
+ Planning time: 0.034 ms
+ Execution time: 0.144 ms
+</pre><p>
+
+ Here we can see that the index returned one candidate row, which was
+ then rejected by a recheck of the index condition. This happens because a
+ GiST index is <span class="quote">“<span class="quote">lossy</span>”</span> for polygon containment tests: it actually
+ returns the rows with polygons that overlap the target, and then we have
+ to do the exact containment test on those rows.
+ </p><p>
+ <code class="command">EXPLAIN</code> has a <code class="literal">BUFFERS</code> option that can be used with
+ <code class="literal">ANALYZE</code> to get even more run time statistics:
+
+</p><pre class="screen">
+EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 &lt; 100 AND unique2 &gt; 9000;
+
+ QUERY PLAN
+-------------------------------------------------------------------​--------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1)
+ Recheck Cond: ((unique1 &lt; 100) AND (unique2 &gt; 9000))
+ Buffers: shared hit=15
+ -&gt; BitmapAnd (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1)
+ Buffers: shared hit=7
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1)
+ Index Cond: (unique1 &lt; 100)
+ Buffers: shared hit=2
+ -&gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1)
+ Index Cond: (unique2 &gt; 9000)
+ Buffers: shared hit=5
+ Planning time: 0.088 ms
+ Execution time: 0.423 ms
+</pre><p>
+
+ The numbers provided by <code class="literal">BUFFERS</code> help to identify which parts
+ of the query are the most I/O-intensive.
+ </p><p>
+ Keep in mind that because <code class="command">EXPLAIN ANALYZE</code> actually
+ runs the query, any side-effects will happen as usual, even though
+ whatever results the query might output are discarded in favor of
+ printing the <code class="command">EXPLAIN</code> data. If you want to analyze a
+ data-modifying query without changing your tables, you can
+ roll the command back afterwards, for example:
+
+</p><pre class="screen">
+BEGIN;
+
+EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 &lt; 100;
+
+ QUERY PLAN
+-------------------------------------------------------------------​-------------------------------------------------------------
+ Update on tenk1 (cost=5.08..230.08 rows=0 width=0) (actual time=3.791..3.792 rows=0 loops=1)
+ -&gt; Bitmap Heap Scan on tenk1 (cost=5.08..230.08 rows=102 width=10) (actual time=0.069..0.513 rows=100 loops=1)
+ Recheck Cond: (unique1 &lt; 100)
+ Heap Blocks: exact=90
+ -&gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.05 rows=102 width=0) (actual time=0.036..0.037 rows=300 loops=1)
+ Index Cond: (unique1 &lt; 100)
+ Planning Time: 0.113 ms
+ Execution Time: 3.850 ms
+
+ROLLBACK;
+</pre><p>
+ </p><p>
+ As seen in this example, when the query is an <code class="command">INSERT</code>,
+ <code class="command">UPDATE</code>, or <code class="command">DELETE</code> command, the actual work of
+ applying the table changes is done by a top-level Insert, Update,
+ or Delete plan node. The plan nodes underneath this node perform
+ the work of locating the old rows and/or computing the new data.
+ So above, we see the same sort of bitmap table scan we've seen already,
+ and its output is fed to an Update node that stores the updated rows.
+ It's worth noting that although the data-modifying node can take a
+ considerable amount of run time (here, it's consuming the lion's share
+ of the time), the planner does not currently add anything to the cost
+ estimates to account for that work. That's because the work to be done is
+ the same for every correct query plan, so it doesn't affect planning
+ decisions.
+ </p><p>
+ When an <code class="command">UPDATE</code> or <code class="command">DELETE</code> command affects an
+ inheritance hierarchy, the output might look like this:
+
+</p><pre class="screen">
+EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101;
+ QUERY PLAN
+-------------------------------------------------------------------​-----------------------------------
+ Update on parent (cost=0.00..24.59 rows=0 width=0)
+ Update on parent parent_1
+ Update on child1 parent_2
+ Update on child2 parent_3
+ Update on child3 parent_4
+ -&gt; Result (cost=0.00..24.59 rows=4 width=14)
+ -&gt; Append (cost=0.00..24.54 rows=4 width=14)
+ -&gt; Seq Scan on parent parent_1 (cost=0.00..0.00 rows=1 width=14)
+ Filter: (f1 = 101)
+ -&gt; Index Scan using child1_pkey on child1 parent_2 (cost=0.15..8.17 rows=1 width=14)
+ Index Cond: (f1 = 101)
+ -&gt; Index Scan using child2_pkey on child2 parent_3 (cost=0.15..8.17 rows=1 width=14)
+ Index Cond: (f1 = 101)
+ -&gt; Index Scan using child3_pkey on child3 parent_4 (cost=0.15..8.17 rows=1 width=14)
+ Index Cond: (f1 = 101)
+</pre><p>
+
+ In this example the Update node needs to consider three child tables as
+ well as the originally-mentioned parent table. So there are four input
+ scanning subplans, one per table. For clarity, the Update node is
+ annotated to show the specific target tables that will be updated, in the
+ same order as the corresponding subplans.
+ </p><p>
+ The <code class="literal">Planning time</code> shown by <code class="command">EXPLAIN
+ ANALYZE</code> is the time it took to generate the query plan from the
+ parsed query and optimize it. It does not include parsing or rewriting.
+ </p><p>
+ The <code class="literal">Execution time</code> shown by <code class="command">EXPLAIN
+ ANALYZE</code> includes executor start-up and shut-down time, as well
+ as the time to run any triggers that are fired, but it does not include
+ parsing, rewriting, or planning time.
+ Time spent executing <code class="literal">BEFORE</code> triggers, if any, is included in
+ the time for the related Insert, Update, or Delete node; but time
+ spent executing <code class="literal">AFTER</code> triggers is not counted there because
+ <code class="literal">AFTER</code> triggers are fired after completion of the whole plan.
+ The total time spent in each trigger
+ (either <code class="literal">BEFORE</code> or <code class="literal">AFTER</code>) is also shown separately.
+ Note that deferred constraint triggers will not be executed
+ until end of transaction and are thus not considered at all by
+ <code class="command">EXPLAIN ANALYZE</code>.
+ </p></div><div class="sect2" id="USING-EXPLAIN-CAVEATS"><div class="titlepage"><div><div><h3 class="title">14.1.3. Caveats</h3></div></div></div><p>
+ There are two significant ways in which run times measured by
+ <code class="command">EXPLAIN ANALYZE</code> can deviate from normal execution of
+ the same query. First, since no output rows are delivered to the client,
+ network transmission costs and I/O conversion costs are not included.
+ Second, the measurement overhead added by <code class="command">EXPLAIN
+ ANALYZE</code> can be significant, especially on machines with slow
+ <code class="function">gettimeofday()</code> operating-system calls. You can use the
+ <a class="xref" href="pgtesttiming.html" title="pg_test_timing"><span class="refentrytitle"><span class="application">pg_test_timing</span></span></a> tool to measure the overhead of timing
+ on your system.
+ </p><p>
+ <code class="command">EXPLAIN</code> results should not be extrapolated to situations
+ much different from the one you are actually testing; for example,
+ results on a toy-sized table cannot be assumed to apply to large tables.
+ The planner's cost estimates are not linear and so it might choose
+ a different plan for a larger or smaller table. An extreme example
+ is that on a table that only occupies one disk page, you'll nearly
+ always get a sequential scan plan whether indexes are available or not.
+ The planner realizes that it's going to take one disk page read to
+ process the table in any case, so there's no value in expending additional
+ page reads to look at an index. (We saw this happening in the
+ <code class="literal">polygon_tbl</code> example above.)
+ </p><p>
+ There are cases in which the actual and estimated values won't match up
+ well, but nothing is really wrong. One such case occurs when
+ plan node execution is stopped short by a <code class="literal">LIMIT</code> or similar
+ effect. For example, in the <code class="literal">LIMIT</code> query we used before,
+
+</p><pre class="screen">
+EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 &lt; 100 AND unique2 &gt; 9000 LIMIT 2;
+
+ QUERY PLAN
+-------------------------------------------------------------------​------------------------------------------------------------
+ Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1)
+ -&gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1)
+ Index Cond: (unique2 &gt; 9000)
+ Filter: (unique1 &lt; 100)
+ Rows Removed by Filter: 287
+ Planning time: 0.096 ms
+ Execution time: 0.336 ms
+</pre><p>
+
+ the estimated cost and row count for the Index Scan node are shown as
+ though it were run to completion. But in reality the Limit node stopped
+ requesting rows after it got two, so the actual row count is only 2 and
+ the run time is less than the cost estimate would suggest. This is not
+ an estimation error, only a discrepancy in the way the estimates and true
+ values are displayed.
+ </p><p>
+ Merge joins also have measurement artifacts that can confuse the unwary.
+ A merge join will stop reading one input if it's exhausted the other input
+ and the next key value in the one input is greater than the last key value
+ of the other input; in such a case there can be no more matches and so no
+ need to scan the rest of the first input. This results in not reading all
+ of one child, with results like those mentioned for <code class="literal">LIMIT</code>.
+ Also, if the outer (first) child contains rows with duplicate key values,
+ the inner (second) child is backed up and rescanned for the portion of its
+ rows matching that key value. <code class="command">EXPLAIN ANALYZE</code> counts these
+ repeated emissions of the same inner rows as if they were real additional
+ rows. When there are many outer duplicates, the reported actual row count
+ for the inner child plan node can be significantly larger than the number
+ of rows that are actually in the inner relation.
+ </p><p>
+ BitmapAnd and BitmapOr nodes always report their actual row counts as zero,
+ due to implementation limitations.
+ </p><p>
+ Normally, <code class="command">EXPLAIN</code> will display every plan node
+ created by the planner. However, there are cases where the executor
+ can determine that certain nodes need not be executed because they
+ cannot produce any rows, based on parameter values that were not
+ available at planning time. (Currently this can only happen for child
+ nodes of an Append or MergeAppend node that is scanning a partitioned
+ table.) When this happens, those plan nodes are omitted from
+ the <code class="command">EXPLAIN</code> output and a <code class="literal">Subplans
+ Removed: <em class="replaceable"><code>N</code></em></code> annotation appears
+ instead.
+ </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="performance-tips.html" title="Chapter 14. Performance Tips">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 14. Performance Tips </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 14.2. Statistics Used by the Planner</td></tr></table></div></body></html> \ No newline at end of file