diff options
Diffstat (limited to '')
-rw-r--r-- | doc/src/sgml/html/using-explain.html | 802 |
1 files changed, 802 insertions, 0 deletions
diff --git a/doc/src/sgml/html/using-explain.html b/doc/src/sgml/html/using-explain.html new file mode 100644 index 0000000..0c2fb8f --- /dev/null +++ b/doc/src/sgml/html/using-explain.html @@ -0,0 +1,802 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.1. Using EXPLAIN</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="performance-tips.html" title="Chapter 14. Performance Tips" /><link rel="next" href="planner-stats.html" title="14.2. Statistics Used by the Planner" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.1. Using <code xmlns="http://www.w3.org/1999/xhtml" class="command">EXPLAIN</code></th></tr><tr><td width="10%" align="left"><a accesskey="p" href="performance-tips.html" title="Chapter 14. Performance Tips">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="USING-EXPLAIN"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.1. Using <code class="command">EXPLAIN</code></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-BASICS">14.1.1. <code class="command">EXPLAIN</code> Basics</a></span></dt><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-ANALYZE">14.1.2. <code class="command">EXPLAIN ANALYZE</code></a></span></dt><dt><span class="sect2"><a href="using-explain.html#USING-EXPLAIN-CAVEATS">14.1.3. Caveats</a></span></dt></dl></div><a id="id-1.5.13.4.2" class="indexterm"></a><a id="id-1.5.13.4.3" class="indexterm"></a><p> + <span class="productname">PostgreSQL</span> devises a <em class="firstterm">query + plan</em> for each query it receives. Choosing the right + plan to match the query structure and the properties of the data + is absolutely critical for good performance, so the system includes + a complex <em class="firstterm">planner</em> that tries to choose good plans. + You can use the <a class="link" href="sql-explain.html" title="EXPLAIN"><code class="command">EXPLAIN</code></a> command + to see what query plan the planner creates for any query. + Plan-reading is an art that requires some experience to master, + but this section attempts to cover the basics. + </p><p> + Examples in this section are drawn from the regression test database + after doing a <code class="command">VACUUM ANALYZE</code>, using 9.3 development sources. + You should be able to get similar results if you try the examples + yourself, but your estimated costs and row counts might vary slightly + because <code class="command">ANALYZE</code>'s statistics are random samples rather + than exact, and because costs are inherently somewhat platform-dependent. + </p><p> + The examples use <code class="command">EXPLAIN</code>'s default <span class="quote">“<span class="quote">text</span>”</span> output + format, which is compact and convenient for humans to read. + If you want to feed <code class="command">EXPLAIN</code>'s output to a program for further + analysis, you should use one of its machine-readable output formats + (XML, JSON, or YAML) instead. + </p><div class="sect2" id="USING-EXPLAIN-BASICS"><div class="titlepage"><div><div><h3 class="title">14.1.1. <code class="command">EXPLAIN</code> Basics</h3></div></div></div><p> + The structure of a query plan is a tree of <em class="firstterm">plan nodes</em>. + Nodes at the bottom level of the tree are scan nodes: they return raw rows + from a table. There are different types of scan nodes for different + table access methods: sequential scans, index scans, and bitmap index + scans. There are also non-table row sources, such as <code class="literal">VALUES</code> + clauses and set-returning functions in <code class="literal">FROM</code>, which have their + own scan node types. + If the query requires joining, aggregation, sorting, or other + operations on the raw rows, then there will be additional nodes + above the scan nodes to perform these operations. Again, + there is usually more than one possible way to do these operations, + so different node types can appear here too. The output + of <code class="command">EXPLAIN</code> has one line for each node in the plan + tree, showing the basic node type plus the cost estimates that the planner + made for the execution of that plan node. Additional lines might appear, + indented from the node's summary line, + to show additional properties of the node. + The very first line (the summary line for the topmost + node) has the estimated total execution cost for the plan; it is this + number that the planner seeks to minimize. + </p><p> + Here is a trivial example, just to show what the output looks like: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1; + + QUERY PLAN +------------------------------------------------------------- + Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) +</pre><p> + </p><p> + Since this query has no <code class="literal">WHERE</code> clause, it must scan all the + rows of the table, so the planner has chosen to use a simple sequential + scan plan. The numbers that are quoted in parentheses are (left + to right): + + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + Estimated start-up cost. This is the time expended before the output + phase can begin, e.g., time to do the sorting in a sort node. + </p></li><li class="listitem"><p> + Estimated total cost. This is stated on the assumption that the plan + node is run to completion, i.e., all available rows are retrieved. + In practice a node's parent node might stop short of reading all + available rows (see the <code class="literal">LIMIT</code> example below). + </p></li><li class="listitem"><p> + Estimated number of rows output by this plan node. Again, the node + is assumed to be run to completion. + </p></li><li class="listitem"><p> + Estimated average width of rows output by this plan node (in bytes). + </p></li></ul></div><p> + </p><p> + The costs are measured in arbitrary units determined by the planner's + cost parameters (see <a class="xref" href="runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS" title="20.7.2. Planner Cost Constants">Section 20.7.2</a>). + Traditional practice is to measure the costs in units of disk page + fetches; that is, <a class="xref" href="runtime-config-query.html#GUC-SEQ-PAGE-COST">seq_page_cost</a> is conventionally + set to <code class="literal">1.0</code> and the other cost parameters are set relative + to that. The examples in this section are run with the default cost + parameters. + </p><p> + It's important to understand that the cost of an upper-level node includes + the cost of all its child nodes. It's also important to realize that + the cost only reflects things that the planner cares about. + In particular, the cost does not consider the time spent transmitting + result rows to the client, which could be an important + factor in the real elapsed time; but the planner ignores it because + it cannot change it by altering the plan. (Every correct plan will + output the same row set, we trust.) + </p><p> + The <code class="literal">rows</code> value is a little tricky because it is + not the number of rows processed or scanned by the + plan node, but rather the number emitted by the node. This is often + less than the number scanned, as a result of filtering by any + <code class="literal">WHERE</code>-clause conditions that are being applied at the node. + Ideally the top-level rows estimate will approximate the number of rows + actually returned, updated, or deleted by the query. + </p><p> + Returning to our example: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1; + + QUERY PLAN +------------------------------------------------------------- + Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) +</pre><p> + </p><p> + These numbers are derived very straightforwardly. If you do: + +</p><pre class="programlisting"> +SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; +</pre><p> + + you will find that <code class="classname">tenk1</code> has 358 disk + pages and 10000 rows. The estimated cost is computed as (disk pages read * + <a class="xref" href="runtime-config-query.html#GUC-SEQ-PAGE-COST">seq_page_cost</a>) + (rows scanned * + <a class="xref" href="runtime-config-query.html#GUC-CPU-TUPLE-COST">cpu_tuple_cost</a>). By default, + <code class="varname">seq_page_cost</code> is 1.0 and <code class="varname">cpu_tuple_cost</code> is 0.01, + so the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458. + </p><p> + Now let's modify the query to add a <code class="literal">WHERE</code> condition: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; + + QUERY PLAN +------------------------------------------------------------ + Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) + Filter: (unique1 < 7000) +</pre><p> + + Notice that the <code class="command">EXPLAIN</code> output shows the <code class="literal">WHERE</code> + clause being applied as a <span class="quote">“<span class="quote">filter</span>”</span> condition attached to the Seq + Scan plan node. This means that + the plan node checks the condition for each row it scans, and outputs + only the ones that pass the condition. + The estimate of output rows has been reduced because of the + <code class="literal">WHERE</code> clause. + However, the scan will still have to visit all 10000 rows, so the cost + hasn't decreased; in fact it has gone up a bit (by 10000 * <a class="xref" href="runtime-config-query.html#GUC-CPU-OPERATOR-COST">cpu_operator_cost</a>, to be exact) to reflect the extra CPU + time spent checking the <code class="literal">WHERE</code> condition. + </p><p> + The actual number of rows this query would select is 7000, but the <code class="literal">rows</code> + estimate is only approximate. If you try to duplicate this experiment, + you will probably get a slightly different estimate; moreover, it can + change after each <code class="command">ANALYZE</code> command, because the + statistics produced by <code class="command">ANALYZE</code> are taken from a + randomized sample of the table. + </p><p> + Now, let's make the condition more restrictive: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100; + + QUERY PLAN +------------------------------------------------------------------------------ + Bitmap Heap Scan on tenk1 (cost=5.07..229.20 rows=101 width=244) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</pre><p> + + Here the planner has decided to use a two-step plan: the child plan + node visits an index to find the locations of rows matching the index + condition, and then the upper plan node actually fetches those rows + from the table itself. Fetching rows separately is much more + expensive than reading them sequentially, but because not all the pages + of the table have to be visited, this is still cheaper than a sequential + scan. (The reason for using two plan levels is that the upper plan + node sorts the row locations identified by the index into physical order + before reading them, to minimize the cost of separate fetches. + The <span class="quote">“<span class="quote">bitmap</span>”</span> mentioned in the node names is the mechanism that + does the sorting.) + </p><p> + Now let's add another condition to the <code class="literal">WHERE</code> clause: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND stringu1 = 'xxx'; + + QUERY PLAN +------------------------------------------------------------------------------ + Bitmap Heap Scan on tenk1 (cost=5.04..229.43 rows=1 width=244) + Recheck Cond: (unique1 < 100) + Filter: (stringu1 = 'xxx'::name) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</pre><p> + + The added condition <code class="literal">stringu1 = 'xxx'</code> reduces the + output row count estimate, but not the cost because we still have to visit + the same set of rows. Notice that the <code class="literal">stringu1</code> clause + cannot be applied as an index condition, since this index is only on + the <code class="literal">unique1</code> column. Instead it is applied as a filter on + the rows retrieved by the index. Thus the cost has actually gone up + slightly to reflect this extra checking. + </p><p> + In some cases the planner will prefer a <span class="quote">“<span class="quote">simple</span>”</span> index scan plan: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42; + + QUERY PLAN +----------------------------------------------------------------------------- + Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) + Index Cond: (unique1 = 42) +</pre><p> + + In this type of plan the table rows are fetched in index order, which + makes them even more expensive to read, but there are so few that the + extra cost of sorting the row locations is not worth it. You'll most + often see this plan type for queries that fetch just a single row. It's + also often used for queries that have an <code class="literal">ORDER BY</code> condition + that matches the index order, because then no extra sorting step is needed + to satisfy the <code class="literal">ORDER BY</code>. In this example, adding + <code class="literal">ORDER BY unique1</code> would use the same plan because the + index already implicitly provides the requested ordering. + </p><p> + The planner may implement an <code class="literal">ORDER BY</code> clause in several + ways. The above example shows that such an ordering clause may be + implemented implicitly. The planner may also add an explicit + <code class="literal">sort</code> step: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 ORDER BY unique1; + QUERY PLAN +------------------------------------------------------------------- + Sort (cost=1109.39..1134.39 rows=10000 width=244) + Sort Key: unique1 + -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244) +</pre><p> + + If a part of the plan guarantees an ordering on a prefix of the + required sort keys, then the planner may instead decide to use an + <code class="literal">incremental sort</code> step: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 ORDER BY four, ten LIMIT 100; + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Limit (cost=521.06..538.05 rows=100 width=244) + -> Incremental Sort (cost=521.06..2220.95 rows=10000 width=244) + Sort Key: four, ten + Presorted Key: four + -> Index Scan using index_tenk1_on_four on tenk1 (cost=0.29..1510.08 rows=10000 width=244) +</pre><p> + + Compared to regular sorts, sorting incrementally allows returning tuples + before the entire result set has been sorted, which particularly enables + optimizations with <code class="literal">LIMIT</code> queries. It may also reduce + memory usage and the likelihood of spilling sorts to disk, but it comes at + the cost of the increased overhead of splitting the result set into multiple + sorting batches. + </p><p> + If there are separate indexes on several of the columns referenced + in <code class="literal">WHERE</code>, the planner might choose to use an AND or OR + combination of the indexes: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; + + QUERY PLAN +------------------------------------------------------------------------------------- + Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) + Recheck Cond: ((unique1 < 100) AND (unique2 > 9000)) + -> BitmapAnd (cost=25.08..25.08 rows=10 width=0) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) + Index Cond: (unique2 > 9000) +</pre><p> + + But this requires visiting both indexes, so it's not necessarily a win + compared to using just one index and treating the other condition as + a filter. If you vary the ranges involved you'll see the plan change + accordingly. + </p><p> + Here is an example showing the effects of <code class="literal">LIMIT</code>: + +</p><pre class="screen"> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; + + QUERY PLAN +------------------------------------------------------------------------------------- + Limit (cost=0.29..14.48 rows=2 width=244) + -> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) + Index Cond: (unique2 > 9000) + Filter: (unique1 < 100) +</pre><p> + </p><p> + This is the same query as above, but we added a <code class="literal">LIMIT</code> so that + not all the rows need be retrieved, and the planner changed its mind about + what to do. Notice that the total cost and row count of the Index Scan + node are shown as if it were run to completion. However, the Limit node + is expected to stop after retrieving only a fifth of those rows, so its + total cost is only a fifth as much, and that's the actual estimated cost + of the query. This plan is preferred over adding a Limit node to the + previous plan because the Limit could not avoid paying the startup cost + of the bitmap scan, so the total cost would be something over 25 units + with that approach. + </p><p> + Let's try joining two tables, using the columns we have been discussing: + +</p><pre class="screen"> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------------------------- + Nested Loop (cost=4.65..118.62 rows=10 width=488) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) + Index Cond: (unique1 < 10) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) + Index Cond: (unique2 = t1.unique2) +</pre><p> + </p><p> + In this plan, we have a nested-loop join node with two table scans as + inputs, or children. The indentation of the node summary lines reflects + the plan tree structure. The join's first, or <span class="quote">“<span class="quote">outer</span>”</span>, child + is a bitmap scan similar to those we saw before. Its cost and row count + are the same as we'd get from <code class="literal">SELECT ... WHERE unique1 < 10</code> + because we are + applying the <code class="literal">WHERE</code> clause <code class="literal">unique1 < 10</code> + at that node. + The <code class="literal">t1.unique2 = t2.unique2</code> clause is not relevant yet, + so it doesn't affect the row count of the outer scan. The nested-loop + join node will run its second, + or <span class="quote">“<span class="quote">inner</span>”</span> child once for each row obtained from the outer child. + Column values from the current outer row can be plugged into the inner + scan; here, the <code class="literal">t1.unique2</code> value from the outer row is available, + so we get a plan and costs similar to what we saw above for a simple + <code class="literal">SELECT ... WHERE t2.unique2 = <em class="replaceable"><code>constant</code></em></code> case. + (The estimated cost is actually a bit lower than what was seen above, + as a result of caching that's expected to occur during the repeated + index scans on <code class="literal">t2</code>.) The + costs of the loop node are then set on the basis of the cost of the outer + scan, plus one repetition of the inner scan for each outer row (10 * 7.91, + here), plus a little CPU time for join processing. + </p><p> + In this example the join's output row count is the same as the product + of the two scans' row counts, but that's not true in all cases because + there can be additional <code class="literal">WHERE</code> clauses that mention both tables + and so can only be applied at the join point, not to either input scan. + Here's an example: + +</p><pre class="screen"> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t2.unique2 < 10 AND t1.hundred < t2.hundred; + + QUERY PLAN +--------------------------------------------------------------------------------------------- + Nested Loop (cost=4.65..49.46 rows=33 width=488) + Join Filter: (t1.hundred < t2.hundred) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) + Index Cond: (unique1 < 10) + -> Materialize (cost=0.29..8.51 rows=10 width=244) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) + Index Cond: (unique2 < 10) +</pre><p> + + The condition <code class="literal">t1.hundred < t2.hundred</code> can't be + tested in the <code class="literal">tenk2_unique2</code> index, so it's applied at the + join node. This reduces the estimated output row count of the join node, + but does not change either input scan. + </p><p> + Notice that here the planner has chosen to <span class="quote">“<span class="quote">materialize</span>”</span> the inner + relation of the join, by putting a Materialize plan node atop it. This + means that the <code class="literal">t2</code> index scan will be done just once, even + though the nested-loop join node needs to read that data ten times, once + for each row from the outer relation. The Materialize node saves the data + in memory as it's read, and then returns the data from memory on each + subsequent pass. + </p><p> + When dealing with outer joins, you might see join plan nodes with both + <span class="quote">“<span class="quote">Join Filter</span>”</span> and plain <span class="quote">“<span class="quote">Filter</span>”</span> conditions attached. + Join Filter conditions come from the outer join's <code class="literal">ON</code> clause, + so a row that fails the Join Filter condition could still get emitted as + a null-extended row. But a plain Filter condition is applied after the + outer-join rules and so acts to remove rows unconditionally. In an inner + join there is no semantic difference between these types of filters. + </p><p> + If we change the query's selectivity a bit, we might get a very different + join plan: + +</p><pre class="screen"> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +------------------------------------------------------------------------------------------ + Hash Join (cost=230.47..713.98 rows=101 width=488) + Hash Cond: (t2.unique2 = t1.unique2) + -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) + -> Hash (cost=229.20..229.20 rows=101 width=244) + -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</pre><p> + </p><p> + Here, the planner has chosen to use a hash join, in which rows of one + table are entered into an in-memory hash table, after which the other + table is scanned and the hash table is probed for matches to each row. + Again note how the indentation reflects the plan structure: the bitmap + scan on <code class="literal">tenk1</code> is the input to the Hash node, which constructs + the hash table. That's then returned to the Hash Join node, which reads + rows from its outer child plan and searches the hash table for each one. + </p><p> + Another possible type of join is a merge join, illustrated here: + +</p><pre class="screen"> +EXPLAIN SELECT * +FROM tenk1 t1, onek t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +------------------------------------------------------------------------------------------ + Merge Join (cost=198.11..268.19 rows=10 width=488) + Merge Cond: (t1.unique2 = t2.unique2) + -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) + Filter: (unique1 < 100) + -> Sort (cost=197.83..200.33 rows=1000 width=244) + Sort Key: t2.unique2 + -> Seq Scan on onek t2 (cost=0.00..148.00 rows=1000 width=244) +</pre><p> + </p><p> + Merge join requires its input data to be sorted on the join keys. In this + plan the <code class="literal">tenk1</code> data is sorted by using an index scan to visit + the rows in the correct order, but a sequential scan and sort is preferred + for <code class="literal">onek</code>, because there are many more rows to be visited in + that table. + (Sequential-scan-and-sort frequently beats an index scan for sorting many rows, + because of the nonsequential disk access required by the index scan.) + </p><p> + One way to look at variant plans is to force the planner to disregard + whatever strategy it thought was the cheapest, using the enable/disable + flags described in <a class="xref" href="runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE" title="20.7.1. Planner Method Configuration">Section 20.7.1</a>. + (This is a crude tool, but useful. See + also <a class="xref" href="explicit-joins.html" title="14.3. Controlling the Planner with Explicit JOIN Clauses">Section 14.3</a>.) + For example, if we're unconvinced that sequential-scan-and-sort is the best way to + deal with table <code class="literal">onek</code> in the previous example, we could try + +</p><pre class="screen"> +SET enable_sort = off; + +EXPLAIN SELECT * +FROM tenk1 t1, onek t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +------------------------------------------------------------------------------------------ + Merge Join (cost=0.56..292.65 rows=10 width=488) + Merge Cond: (t1.unique2 = t2.unique2) + -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) + Filter: (unique1 < 100) + -> Index Scan using onek_unique2 on onek t2 (cost=0.28..224.79 rows=1000 width=244) +</pre><p> + + which shows that the planner thinks that sorting <code class="literal">onek</code> by + index-scanning is about 12% more expensive than sequential-scan-and-sort. + Of course, the next question is whether it's right about that. + We can investigate that using <code class="command">EXPLAIN ANALYZE</code>, as discussed + below. + </p></div><div class="sect2" id="USING-EXPLAIN-ANALYZE"><div class="titlepage"><div><div><h3 class="title">14.1.2. <code class="command">EXPLAIN ANALYZE</code></h3></div></div></div><p> + It is possible to check the accuracy of the planner's estimates + by using <code class="command">EXPLAIN</code>'s <code class="literal">ANALYZE</code> option. With this + option, <code class="command">EXPLAIN</code> actually executes the query, and then displays + the true row counts and true run time accumulated within each plan node, + along with the same estimates that a plain <code class="command">EXPLAIN</code> + shows. For example, we might get a result like this: + +</p><pre class="screen"> +EXPLAIN ANALYZE SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; + + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------- + Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1) + Index Cond: (unique1 < 10) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10) + Index Cond: (unique2 = t1.unique2) + Planning time: 0.181 ms + Execution time: 0.501 ms +</pre><p> + + Note that the <span class="quote">“<span class="quote">actual time</span>”</span> values are in milliseconds of + real time, whereas the <code class="literal">cost</code> estimates are expressed in + arbitrary units; so they are unlikely to match up. + The thing that's usually most important to look for is whether the + estimated row counts are reasonably close to reality. In this example + the estimates were all dead-on, but that's quite unusual in practice. + </p><p> + In some query plans, it is possible for a subplan node to be executed more + than once. For example, the inner index scan will be executed once per + outer row in the above nested-loop plan. In such cases, the + <code class="literal">loops</code> value reports the + total number of executions of the node, and the actual time and rows + values shown are averages per-execution. This is done to make the numbers + comparable with the way that the cost estimates are shown. Multiply by + the <code class="literal">loops</code> value to get the total time actually spent in + the node. In the above example, we spent a total of 0.220 milliseconds + executing the index scans on <code class="literal">tenk2</code>. + </p><p> + In some cases <code class="command">EXPLAIN ANALYZE</code> shows additional execution + statistics beyond the plan node execution times and row counts. + For example, Sort and Hash nodes provide extra information: + +</p><pre class="screen"> +EXPLAIN ANALYZE SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous; + + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------------------- + Sort (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1) + Sort Key: t1.fivethous + Sort Method: quicksort Memory: 77kB + -> Hash Join (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1) + Hash Cond: (t2.unique2 = t1.unique2) + -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1) + -> Hash (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1) + Buckets: 1024 Batches: 1 Memory Usage: 28kB + -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1) + Index Cond: (unique1 < 100) + Planning time: 0.194 ms + Execution time: 8.008 ms +</pre><p> + + The Sort node shows the sort method used (in particular, whether the sort + was in-memory or on-disk) and the amount of memory or disk space needed. + The Hash node shows the number of hash buckets and batches as well as the + peak amount of memory used for the hash table. (If the number of batches + exceeds one, there will also be disk space usage involved, but that is not + shown.) + </p><p> + Another type of extra information is the number of rows removed by a + filter condition: + +</p><pre class="screen"> +EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE ten < 7; + + QUERY PLAN +--------------------------------------------------------------------------------------------------------- + Seq Scan on tenk1 (cost=0.00..483.00 rows=7000 width=244) (actual time=0.016..5.107 rows=7000 loops=1) + Filter: (ten < 7) + Rows Removed by Filter: 3000 + Planning time: 0.083 ms + Execution time: 5.905 ms +</pre><p> + + These counts can be particularly valuable for filter conditions applied at + join nodes. The <span class="quote">“<span class="quote">Rows Removed</span>”</span> line only appears when at least + one scanned row, or potential join pair in the case of a join node, + is rejected by the filter condition. + </p><p> + A case similar to filter conditions occurs with <span class="quote">“<span class="quote">lossy</span>”</span> + index scans. For example, consider this search for polygons containing a + specific point: + +</p><pre class="screen"> +EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)'; + + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Seq Scan on polygon_tbl (cost=0.00..1.05 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1) + Filter: (f1 @> '((0.5,2))'::polygon) + Rows Removed by Filter: 4 + Planning time: 0.040 ms + Execution time: 0.083 ms +</pre><p> + + The planner thinks (quite correctly) that this sample table is too small + to bother with an index scan, so we have a plain sequential scan in which + all the rows got rejected by the filter condition. But if we force an + index scan to be used, we see: + +</p><pre class="screen"> +SET enable_seqscan TO off; + +EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)'; + + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------- + Index Scan using gpolygonind on polygon_tbl (cost=0.13..8.15 rows=1 width=32) (actual time=0.062..0.062 rows=0 loops=1) + Index Cond: (f1 @> '((0.5,2))'::polygon) + Rows Removed by Index Recheck: 1 + Planning time: 0.034 ms + Execution time: 0.144 ms +</pre><p> + + Here we can see that the index returned one candidate row, which was + then rejected by a recheck of the index condition. This happens because a + GiST index is <span class="quote">“<span class="quote">lossy</span>”</span> for polygon containment tests: it actually + returns the rows with polygons that overlap the target, and then we have + to do the exact containment test on those rows. + </p><p> + <code class="command">EXPLAIN</code> has a <code class="literal">BUFFERS</code> option that can be used with + <code class="literal">ANALYZE</code> to get even more run time statistics: + +</p><pre class="screen"> +EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; + + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------- + Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1) + Recheck Cond: ((unique1 < 100) AND (unique2 > 9000)) + Buffers: shared hit=15 + -> BitmapAnd (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1) + Buffers: shared hit=7 + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1) + Index Cond: (unique1 < 100) + Buffers: shared hit=2 + -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1) + Index Cond: (unique2 > 9000) + Buffers: shared hit=5 + Planning time: 0.088 ms + Execution time: 0.423 ms +</pre><p> + + The numbers provided by <code class="literal">BUFFERS</code> help to identify which parts + of the query are the most I/O-intensive. + </p><p> + Keep in mind that because <code class="command">EXPLAIN ANALYZE</code> actually + runs the query, any side-effects will happen as usual, even though + whatever results the query might output are discarded in favor of + printing the <code class="command">EXPLAIN</code> data. If you want to analyze a + data-modifying query without changing your tables, you can + roll the command back afterwards, for example: + +</p><pre class="screen"> +BEGIN; + +EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100; + + QUERY PLAN +-------------------------------------------------------------------------------------------------------------------------------- + Update on tenk1 (cost=5.08..230.08 rows=0 width=0) (actual time=3.791..3.792 rows=0 loops=1) + -> Bitmap Heap Scan on tenk1 (cost=5.08..230.08 rows=102 width=10) (actual time=0.069..0.513 rows=100 loops=1) + Recheck Cond: (unique1 < 100) + Heap Blocks: exact=90 + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.05 rows=102 width=0) (actual time=0.036..0.037 rows=300 loops=1) + Index Cond: (unique1 < 100) + Planning Time: 0.113 ms + Execution Time: 3.850 ms + +ROLLBACK; +</pre><p> + </p><p> + As seen in this example, when the query is an <code class="command">INSERT</code>, + <code class="command">UPDATE</code>, or <code class="command">DELETE</code> command, the actual work of + applying the table changes is done by a top-level Insert, Update, + or Delete plan node. The plan nodes underneath this node perform + the work of locating the old rows and/or computing the new data. + So above, we see the same sort of bitmap table scan we've seen already, + and its output is fed to an Update node that stores the updated rows. + It's worth noting that although the data-modifying node can take a + considerable amount of run time (here, it's consuming the lion's share + of the time), the planner does not currently add anything to the cost + estimates to account for that work. That's because the work to be done is + the same for every correct query plan, so it doesn't affect planning + decisions. + </p><p> + When an <code class="command">UPDATE</code> or <code class="command">DELETE</code> command affects an + inheritance hierarchy, the output might look like this: + +</p><pre class="screen"> +EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101; + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Update on parent (cost=0.00..24.59 rows=0 width=0) + Update on parent parent_1 + Update on child1 parent_2 + Update on child2 parent_3 + Update on child3 parent_4 + -> Result (cost=0.00..24.59 rows=4 width=14) + -> Append (cost=0.00..24.54 rows=4 width=14) + -> Seq Scan on parent parent_1 (cost=0.00..0.00 rows=1 width=14) + Filter: (f1 = 101) + -> Index Scan using child1_pkey on child1 parent_2 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) + -> Index Scan using child2_pkey on child2 parent_3 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) + -> Index Scan using child3_pkey on child3 parent_4 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) +</pre><p> + + In this example the Update node needs to consider three child tables as + well as the originally-mentioned parent table. So there are four input + scanning subplans, one per table. For clarity, the Update node is + annotated to show the specific target tables that will be updated, in the + same order as the corresponding subplans. + </p><p> + The <code class="literal">Planning time</code> shown by <code class="command">EXPLAIN + ANALYZE</code> is the time it took to generate the query plan from the + parsed query and optimize it. It does not include parsing or rewriting. + </p><p> + The <code class="literal">Execution time</code> shown by <code class="command">EXPLAIN + ANALYZE</code> includes executor start-up and shut-down time, as well + as the time to run any triggers that are fired, but it does not include + parsing, rewriting, or planning time. + Time spent executing <code class="literal">BEFORE</code> triggers, if any, is included in + the time for the related Insert, Update, or Delete node; but time + spent executing <code class="literal">AFTER</code> triggers is not counted there because + <code class="literal">AFTER</code> triggers are fired after completion of the whole plan. + The total time spent in each trigger + (either <code class="literal">BEFORE</code> or <code class="literal">AFTER</code>) is also shown separately. + Note that deferred constraint triggers will not be executed + until end of transaction and are thus not considered at all by + <code class="command">EXPLAIN ANALYZE</code>. + </p></div><div class="sect2" id="USING-EXPLAIN-CAVEATS"><div class="titlepage"><div><div><h3 class="title">14.1.3. Caveats</h3></div></div></div><p> + There are two significant ways in which run times measured by + <code class="command">EXPLAIN ANALYZE</code> can deviate from normal execution of + the same query. First, since no output rows are delivered to the client, + network transmission costs and I/O conversion costs are not included. + Second, the measurement overhead added by <code class="command">EXPLAIN + ANALYZE</code> can be significant, especially on machines with slow + <code class="function">gettimeofday()</code> operating-system calls. You can use the + <a class="xref" href="pgtesttiming.html" title="pg_test_timing"><span class="refentrytitle"><span class="application">pg_test_timing</span></span></a> tool to measure the overhead of timing + on your system. + </p><p> + <code class="command">EXPLAIN</code> results should not be extrapolated to situations + much different from the one you are actually testing; for example, + results on a toy-sized table cannot be assumed to apply to large tables. + The planner's cost estimates are not linear and so it might choose + a different plan for a larger or smaller table. An extreme example + is that on a table that only occupies one disk page, you'll nearly + always get a sequential scan plan whether indexes are available or not. + The planner realizes that it's going to take one disk page read to + process the table in any case, so there's no value in expending additional + page reads to look at an index. (We saw this happening in the + <code class="literal">polygon_tbl</code> example above.) + </p><p> + There are cases in which the actual and estimated values won't match up + well, but nothing is really wrong. One such case occurs when + plan node execution is stopped short by a <code class="literal">LIMIT</code> or similar + effect. For example, in the <code class="literal">LIMIT</code> query we used before, + +</p><pre class="screen"> +EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; + + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------- + Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1) + -> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1) + Index Cond: (unique2 > 9000) + Filter: (unique1 < 100) + Rows Removed by Filter: 287 + Planning time: 0.096 ms + Execution time: 0.336 ms +</pre><p> + + the estimated cost and row count for the Index Scan node are shown as + though it were run to completion. But in reality the Limit node stopped + requesting rows after it got two, so the actual row count is only 2 and + the run time is less than the cost estimate would suggest. This is not + an estimation error, only a discrepancy in the way the estimates and true + values are displayed. + </p><p> + Merge joins also have measurement artifacts that can confuse the unwary. + A merge join will stop reading one input if it's exhausted the other input + and the next key value in the one input is greater than the last key value + of the other input; in such a case there can be no more matches and so no + need to scan the rest of the first input. This results in not reading all + of one child, with results like those mentioned for <code class="literal">LIMIT</code>. + Also, if the outer (first) child contains rows with duplicate key values, + the inner (second) child is backed up and rescanned for the portion of its + rows matching that key value. <code class="command">EXPLAIN ANALYZE</code> counts these + repeated emissions of the same inner rows as if they were real additional + rows. When there are many outer duplicates, the reported actual row count + for the inner child plan node can be significantly larger than the number + of rows that are actually in the inner relation. + </p><p> + BitmapAnd and BitmapOr nodes always report their actual row counts as zero, + due to implementation limitations. + </p><p> + Normally, <code class="command">EXPLAIN</code> will display every plan node + created by the planner. However, there are cases where the executor + can determine that certain nodes need not be executed because they + cannot produce any rows, based on parameter values that were not + available at planning time. (Currently this can only happen for child + nodes of an Append or MergeAppend node that is scanning a partitioned + table.) When this happens, those plan nodes are omitted from + the <code class="command">EXPLAIN</code> output and a <code class="literal">Subplans + Removed: <em class="replaceable"><code>N</code></em></code> annotation appears + instead. + </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="performance-tips.html" title="Chapter 14. Performance Tips">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 14. Performance Tips </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 14.2. Statistics Used by the Planner</td></tr></table></div></body></html>
\ No newline at end of file |