author    Daniel Baumann <daniel.baumann@progress-linux.org>  2024-05-04 12:19:15 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org>  2024-05-04 12:19:15 +0000
commit    6eb9c5a5657d1fe77b55cc261450f3538d35a94d
tree      657d8194422a5daccecfd42d654b8a245ef7b4c8 /doc/src/sgml/perform.sgml
parent    Initial commit.
Adding upstream version 13.4.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/perform.sgml')
-rw-r--r--  doc/src/sgml/perform.sgml  1956
1 file changed, 1956 insertions, 0 deletions
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml new file mode 100644 index 0000000..1cd9f50 --- /dev/null +++ b/doc/src/sgml/perform.sgml @@ -0,0 +1,1956 @@ +<!-- doc/src/sgml/perform.sgml --> + + <chapter id="performance-tips"> + <title>Performance Tips</title> + + <indexterm zone="performance-tips"> + <primary>performance</primary> + </indexterm> + + <para> + Query performance can be affected by many things. Some of these can + be controlled by the user, while others are fundamental to the underlying + design of the system. This chapter provides some hints about understanding + and tuning <productname>PostgreSQL</productname> performance. + </para> + + <sect1 id="using-explain"> + <title>Using <command>EXPLAIN</command></title> + + <indexterm zone="using-explain"> + <primary>EXPLAIN</primary> + </indexterm> + + <indexterm zone="using-explain"> + <primary>query plan</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> devises a <firstterm>query + plan</firstterm> for each query it receives. Choosing the right + plan to match the query structure and the properties of the data + is absolutely critical for good performance, so the system includes + a complex <firstterm>planner</firstterm> that tries to choose good plans. + You can use the <xref linkend="sql-explain"/> command + to see what query plan the planner creates for any query. + Plan-reading is an art that requires some experience to master, + but this section attempts to cover the basics. + </para> + + <para> + Examples in this section are drawn from the regression test database + after doing a <command>VACUUM ANALYZE</command>, using 9.3 development sources. + You should be able to get similar results if you try the examples + yourself, but your estimated costs and row counts might vary slightly + because <command>ANALYZE</command>'s statistics are random samples rather + than exact, and because costs are inherently somewhat platform-dependent. 
+ </para> + + <para> + The examples use <command>EXPLAIN</command>'s default <quote>text</quote> output + format, which is compact and convenient for humans to read. + If you want to feed <command>EXPLAIN</command>'s output to a program for further + analysis, you should use one of its machine-readable output formats + (XML, JSON, or YAML) instead. + </para> + + <sect2 id="using-explain-basics"> + <title><command>EXPLAIN</command> Basics</title> + + <para> + The structure of a query plan is a tree of <firstterm>plan nodes</firstterm>. + Nodes at the bottom level of the tree are scan nodes: they return raw rows + from a table. There are different types of scan nodes for different + table access methods: sequential scans, index scans, and bitmap index + scans. There are also non-table row sources, such as <literal>VALUES</literal> + clauses and set-returning functions in <literal>FROM</literal>, which have their + own scan node types. + If the query requires joining, aggregation, sorting, or other + operations on the raw rows, then there will be additional nodes + above the scan nodes to perform these operations. Again, + there is usually more than one possible way to do these operations, + so different node types can appear here too. The output + of <command>EXPLAIN</command> has one line for each node in the plan + tree, showing the basic node type plus the cost estimates that the planner + made for the execution of that plan node. Additional lines might appear, + indented from the node's summary line, + to show additional properties of the node. + The very first line (the summary line for the topmost + node) has the estimated total execution cost for the plan; it is this + number that the planner seeks to minimize. 
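The one-line-per-node, indented layout described above can be pictured with a small sketch. This is not PostgreSQL code, and the node names are made up for illustration; it only shows how a plan tree maps onto EXPLAIN-style indented text, with each child prefixed by an arrow beneath its parent:

```python
# A toy model (not PostgreSQL source) of how EXPLAIN's text format
# mirrors the plan tree: one line per node, children indented and
# prefixed with "->" beneath their parent's summary line.

def render_plan(node, depth=0):
    """Render a (name, children) plan tree in EXPLAIN-like text form."""
    name, children = node
    prefix = "" if depth == 0 else "  " * depth + "->  "
    lines = [prefix + name]
    for child in children:
        lines.extend(render_plan(child, depth + 1))
    return lines

# Hypothetical plan: a join node sitting above two scan nodes.
plan = ("Nested Loop", [
    ("Seq Scan on a", []),
    ("Index Scan on b", []),
])
print("\n".join(render_plan(plan)))
# Nested Loop
#   ->  Seq Scan on a
#   ->  Index Scan on b
```

The exact indentation EXPLAIN uses differs slightly, but the structural idea — summary line first, children beneath — is the same.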
+ </para> + + <para> + Here is a trivial example, just to show what the output looks like: + +<screen> +EXPLAIN SELECT * FROM tenk1; + + QUERY PLAN +------------------------------------------------------------- + Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) +</screen> + </para> + + <para> + Since this query has no <literal>WHERE</literal> clause, it must scan all the + rows of the table, so the planner has chosen to use a simple sequential + scan plan. The numbers that are quoted in parentheses are (left + to right): + + <itemizedlist> + <listitem> + <para> + Estimated start-up cost. This is the time expended before the output + phase can begin, e.g., time to do the sorting in a sort node. + </para> + </listitem> + + <listitem> + <para> + Estimated total cost. This is stated on the assumption that the plan + node is run to completion, i.e., all available rows are retrieved. + In practice a node's parent node might stop short of reading all + available rows (see the <literal>LIMIT</literal> example below). + </para> + </listitem> + + <listitem> + <para> + Estimated number of rows output by this plan node. Again, the node + is assumed to be run to completion. + </para> + </listitem> + + <listitem> + <para> + Estimated average width of rows output by this plan node (in bytes). + </para> + </listitem> + </itemizedlist> + </para> + + <para> + The costs are measured in arbitrary units determined by the planner's + cost parameters (see <xref linkend="runtime-config-query-constants"/>). + Traditional practice is to measure the costs in units of disk page + fetches; that is, <xref linkend="guc-seq-page-cost"/> is conventionally + set to <literal>1.0</literal> and the other cost parameters are set relative + to that. The examples in this section are run with the default cost + parameters. + </para> + + <para> + It's important to understand that the cost of an upper-level node includes + the cost of all its child nodes. 
It's also important to realize that + the cost only reflects things that the planner cares about. + In particular, the cost does not consider the time spent transmitting + result rows to the client, which could be an important + factor in the real elapsed time; but the planner ignores it because + it cannot change it by altering the plan. (Every correct plan will + output the same row set, we trust.) + </para> + + <para> + The <literal>rows</literal> value is a little tricky because it is + not the number of rows processed or scanned by the + plan node, but rather the number emitted by the node. This is often + less than the number scanned, as a result of filtering by any + <literal>WHERE</literal>-clause conditions that are being applied at the node. + Ideally the top-level rows estimate will approximate the number of rows + actually returned, updated, or deleted by the query. + </para> + + <para> + Returning to our example: + +<screen> +EXPLAIN SELECT * FROM tenk1; + + QUERY PLAN +------------------------------------------------------------- + Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244) +</screen> + </para> + + <para> + These numbers are derived very straightforwardly. If you do: + +<programlisting> +SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; +</programlisting> + + you will find that <classname>tenk1</classname> has 358 disk + pages and 10000 rows. The estimated cost is computed as (disk pages read * + <xref linkend="guc-seq-page-cost"/>) + (rows scanned * + <xref linkend="guc-cpu-tuple-cost"/>). By default, + <varname>seq_page_cost</varname> is 1.0 and <varname>cpu_tuple_cost</varname> is 0.01, + so the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458. 
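The arithmetic above can be checked directly. This sketch just reproduces the stated formula with the default cost parameters; it is an illustration of the estimate, not the planner's actual code path:

```python
# Reproducing the sequential-scan cost estimate from the text:
# (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost),
# using the default cost parameters.
seq_page_cost = 1.0    # default seq_page_cost
cpu_tuple_cost = 0.01  # default cpu_tuple_cost

relpages, reltuples = 358, 10000  # from pg_class for tenk1

estimated_cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
print(estimated_cost)  # 458.0, matching cost=0.00..458.00 above
```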
+ </para> + + <para> + Now let's modify the query to add a <literal>WHERE</literal> condition: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; + + QUERY PLAN +------------------------------------------------------------ + Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) + Filter: (unique1 < 7000) +</screen> + + Notice that the <command>EXPLAIN</command> output shows the <literal>WHERE</literal> + clause being applied as a <quote>filter</quote> condition attached to the Seq + Scan plan node. This means that + the plan node checks the condition for each row it scans, and outputs + only the ones that pass the condition. + The estimate of output rows has been reduced because of the + <literal>WHERE</literal> clause. + However, the scan will still have to visit all 10000 rows, so the cost + hasn't decreased; in fact it has gone up a bit (by 10000 * <xref + linkend="guc-cpu-operator-cost"/>, to be exact) to reflect the extra CPU + time spent checking the <literal>WHERE</literal> condition. + </para> + + <para> + The actual number of rows this query would select is 7000, but the <literal>rows</literal> + estimate is only approximate. If you try to duplicate this experiment, + you will probably get a slightly different estimate; moreover, it can + change after each <command>ANALYZE</command> command, because the + statistics produced by <command>ANALYZE</command> are taken from a + randomized sample of the table. 
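The cost increase described above can likewise be verified with the default value of cpu_operator_cost (0.0025). Again this is only a check of the stated arithmetic, not planner code:

```python
# The WHERE clause adds one operator evaluation per scanned row:
# new cost = old cost + (rows scanned * cpu_operator_cost).
cpu_operator_cost = 0.0025  # default cpu_operator_cost
base_cost = 458.0           # sequential-scan cost computed earlier
rows_checked = 10000        # every row is tested against the condition

filtered_cost = base_cost + rows_checked * cpu_operator_cost
print(filtered_cost)  # 483.0, matching cost=0.00..483.00 above
```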
+ </para> + + <para> + Now, let's make the condition more restrictive: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------- + Bitmap Heap Scan on tenk1 (cost=5.07..229.20 rows=101 width=244) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</screen> + + Here the planner has decided to use a two-step plan: the child plan + node visits an index to find the locations of rows matching the index + condition, and then the upper plan node actually fetches those rows + from the table itself. Fetching rows separately is much more + expensive than reading them sequentially, but because not all the pages + of the table have to be visited, this is still cheaper than a sequential + scan. (The reason for using two plan levels is that the upper plan + node sorts the row locations identified by the index into physical order + before reading them, to minimize the cost of separate fetches. + The <quote>bitmap</quote> mentioned in the node names is the mechanism that + does the sorting.) + </para> + + <para> + Now let's add another condition to the <literal>WHERE</literal> clause: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND stringu1 = 'xxx'; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------- + Bitmap Heap Scan on tenk1 (cost=5.04..229.43 rows=1 width=244) + Recheck Cond: (unique1 < 100) + Filter: (stringu1 = 'xxx'::name) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</screen> + + The added condition <literal>stringu1 = 'xxx'</literal> reduces the + output row count estimate, but not the cost because we still have to visit + the same set of rows. 
Notice that the <literal>stringu1</literal> clause + cannot be applied as an index condition, since this index is only on + the <literal>unique1</literal> column. Instead it is applied as a filter on + the rows retrieved by the index. Thus the cost has actually gone up + slightly to reflect this extra checking. + </para> + + <para> + In some cases the planner will prefer a <quote>simple</quote> index scan plan: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 42; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;---------- + Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) + Index Cond: (unique1 = 42) +</screen> + + In this type of plan the table rows are fetched in index order, which + makes them even more expensive to read, but there are so few that the + extra cost of sorting the row locations is not worth it. You'll most + often see this plan type for queries that fetch just a single row. It's + also often used for queries that have an <literal>ORDER BY</literal> condition + that matches the index order, because then no extra sorting step is needed + to satisfy the <literal>ORDER BY</literal>. In this example, adding + <literal>ORDER BY unique1</literal> would use the same plan because the + index already implicitly provides the requested ordering. + </para> + + <para> + The planner may implement an <literal>ORDER BY</literal> clause in several + ways. The above example shows that such an ordering clause may be + implemented implicitly. 
The planner may also add an explicit + <literal>sort</literal> step: + +<screen> +EXPLAIN SELECT * FROM tenk1 ORDER BY unique1; + QUERY PLAN +------------------------------------------------------------------- + Sort (cost=1109.39..1134.39 rows=10000 width=244) + Sort Key: unique1 + -> Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244) +</screen> + + If a part of the plan guarantees an ordering on a prefix of the + required sort keys, then the planner may instead decide to use an + <literal>incremental sort</literal> step: + +<screen> +EXPLAIN SELECT * FROM tenk1 ORDER BY four, ten LIMIT 100; + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------------------- + Limit (cost=521.06..538.05 rows=100 width=244) + -> Incremental Sort (cost=521.06..2220.95 rows=10000 width=244) + Sort Key: four, ten + Presorted Key: four + -> Index Scan using index_tenk1_on_four on tenk1 (cost=0.29..1510.08 rows=10000 width=244) +</screen> + + Compared to regular sorts, sorting incrementally allows returning tuples + before the entire result set has been sorted, which particularly enables + optimizations with <literal>LIMIT</literal> queries. It may also reduce + memory usage and the likelihood of spilling sorts to disk, but it comes at + the cost of the increased overhead of splitting the result set into multiple + sorting batches. 
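The batching idea behind incremental sort can be sketched in a few lines. This is a toy model, not PostgreSQL's implementation: the input is assumed to be ordered on the presorted key, so rows can be grouped into runs of equal presorted-key values and each run sorted on the full key before the next run is even read — which is what lets a LIMIT stop early:

```python
# Toy incremental sort: the input is already ordered on presorted_key,
# so each run of equal presorted-key values is a self-contained batch
# that can be sorted on the full key and emitted immediately.
from itertools import groupby
from operator import itemgetter

def incremental_sort(rows, presorted_key, full_key):
    for _, batch in groupby(rows, key=presorted_key):
        yield from sorted(batch, key=full_key)

rows = [(0, 9), (0, 1), (1, 5), (1, 2), (2, 0)]  # presorted on column 0
result = list(incremental_sort(rows, itemgetter(0), itemgetter(0, 1)))
print(result)  # [(0, 1), (0, 9), (1, 2), (1, 5), (2, 0)]
```

Because the function is a generator, a consumer that needs only the first few rows never pays for sorting the later batches.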
+ </para> + + <para> + If there are separate indexes on several of the columns referenced + in <literal>WHERE</literal>, the planner might choose to use an AND or OR + combination of the indexes: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------ + Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) + Recheck Cond: ((unique1 < 100) AND (unique2 > 9000)) + -> BitmapAnd (cost=25.08..25.08 rows=10 width=0) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) + Index Cond: (unique2 > 9000) +</screen> + + But this requires visiting both indexes, so it's not necessarily a win + compared to using just one index and treating the other condition as + a filter. If you vary the ranges involved you'll see the plan change + accordingly. + </para> + + <para> + Here is an example showing the effects of <literal>LIMIT</literal>: + +<screen> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------ + Limit (cost=0.29..14.48 rows=2 width=244) + -> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) + Index Cond: (unique2 > 9000) + Filter: (unique1 < 100) +</screen> + </para> + + <para> + This is the same query as above, but we added a <literal>LIMIT</literal> so that + not all the rows need be retrieved, and the planner changed its mind about + what to do. Notice that the total cost and row count of the Index Scan + node are shown as if it were run to completion. However, the Limit node + is expected to stop after retrieving only a fifth of those rows, so its + total cost is only a fifth as much, and that's the actual estimated cost + of the query. 
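The proportional scaling described above can be approximated by linear interpolation between the child's startup and total costs. This is a simplification for illustration (displayed costs are rounded, and the planner works at higher precision), but it lands close to the figure shown:

```python
# Approximating the Limit node's cost: startup cost, plus the fraction
# of the child's run cost needed to produce limit_rows of its rows.
startup, total, rows = 0.29, 71.27, 10  # Index Scan estimates above
limit_rows = 2

fraction = limit_rows / rows
limit_total = startup + (total - startup) * fraction
print(round(limit_total, 2))  # 14.49, within rounding of the 14.48 shown
```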
This plan is preferred over adding a Limit node to the + previous plan because the Limit could not avoid paying the startup cost + of the bitmap scan, so the total cost would be something over 25 units + with that approach. + </para> + + <para> + Let's try joining two tables, using the columns we have been discussing: + +<screen> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------- + Nested Loop (cost=4.65..118.62 rows=10 width=488) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) + Index Cond: (unique1 < 10) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) + Index Cond: (unique2 = t1.unique2) +</screen> + </para> + + <para> + In this plan, we have a nested-loop join node with two table scans as + inputs, or children. The indentation of the node summary lines reflects + the plan tree structure. The join's first, or <quote>outer</quote>, child + is a bitmap scan similar to those we saw before. Its cost and row count + are the same as we'd get from <literal>SELECT ... WHERE unique1 < 10</literal> + because we are + applying the <literal>WHERE</literal> clause <literal>unique1 < 10</literal> + at that node. + The <literal>t1.unique2 = t2.unique2</literal> clause is not relevant yet, + so it doesn't affect the row count of the outer scan. The nested-loop + join node will run its second, + or <quote>inner</quote> child once for each row obtained from the outer child. + Column values from the current outer row can be plugged into the inner + scan; here, the <literal>t1.unique2</literal> value from the outer row is available, + so we get a plan and costs similar to what we saw above for a simple + <literal>SELECT ... 
WHERE t2.unique2 = <replaceable>constant</replaceable></literal> case. + (The estimated cost is actually a bit lower than what was seen above, + as a result of caching that's expected to occur during the repeated + index scans on <literal>t2</literal>.) The + costs of the loop node are then set on the basis of the cost of the outer + scan, plus one repetition of the inner scan for each outer row (10 * 7.91, + here), plus a little CPU time for join processing. + </para> + + <para> + In this example the join's output row count is the same as the product + of the two scans' row counts, but that's not true in all cases because + there can be additional <literal>WHERE</literal> clauses that mention both tables + and so can only be applied at the join point, not to either input scan. + Here's an example: + +<screen> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t2.unique2 < 10 AND t1.hundred < t2.hundred; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------- + Nested Loop (cost=4.65..49.46 rows=33 width=488) + Join Filter: (t1.hundred < t2.hundred) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) + Index Cond: (unique1 < 10) + -> Materialize (cost=0.29..8.51 rows=10 width=244) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) + Index Cond: (unique2 < 10) +</screen> + + The condition <literal>t1.hundred < t2.hundred</literal> can't be + tested in the <literal>tenk2_unique2</literal> index, so it's applied at the + join node. This reduces the estimated output row count of the join node, + but does not change either input scan. + </para> + + <para> + Notice that here the planner has chosen to <quote>materialize</quote> the inner + relation of the join, by putting a Materialize plan node atop it. 
This + means that the <literal>t2</literal> index scan will be done just once, even + though the nested-loop join node needs to read that data ten times, once + for each row from the outer relation. The Materialize node saves the data + in memory as it's read, and then returns the data from memory on each + subsequent pass. + </para> + + <para> + When dealing with outer joins, you might see join plan nodes with both + <quote>Join Filter</quote> and plain <quote>Filter</quote> conditions attached. + Join Filter conditions come from the outer join's <literal>ON</literal> clause, + so a row that fails the Join Filter condition could still get emitted as + a null-extended row. But a plain Filter condition is applied after the + outer-join rules and so acts to remove rows unconditionally. In an inner + join there is no semantic difference between these types of filters. + </para> + + <para> + If we change the query's selectivity a bit, we might get a very different + join plan: + +<screen> +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------- + Hash Join (cost=230.47..713.98 rows=101 width=488) + Hash Cond: (t2.unique2 = t1.unique2) + -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) + -> Hash (cost=229.20..229.20 rows=101 width=244) + -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) + Index Cond: (unique1 < 100) +</screen> + </para> + + <para> + Here, the planner has chosen to use a hash join, in which rows of one + table are entered into an in-memory hash table, after which the other + table is scanned and the hash table is probed for matches to each row. 
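The build-then-probe idea can be sketched with an ordinary dictionary. This toy version (not PostgreSQL code) handles only a single in-memory batch and ignores costs, spilling, and null semantics, but it shows the two phases: load one input into a hash table keyed on the join column, then scan the other input and probe for matches:

```python
# Toy single-batch hash join: hash one input, probe with the other.
def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = {}
    for row in build_rows:                    # build phase
        table.setdefault(build_key(row), []).append(row)
    for row in probe_rows:                    # probe phase
        for match in table.get(probe_key(row), []):
            yield match + row

t1 = [(1, "a"), (2, "b")]                     # smaller input: hashed
t2 = [(2, "x"), (3, "y"), (2, "z")]           # larger input: probes
result = list(hash_join(t1, t2, lambda r: r[0], lambda r: r[0]))
print(result)  # [(2, 'b', 2, 'x'), (2, 'b', 2, 'z')]
```

Hashing the smaller input mirrors the plan above, where the 101-row tenk1 scan feeds the Hash node while the full tenk2 scan does the probing.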
+ Again note how the indentation reflects the plan structure: the bitmap + scan on <literal>tenk1</literal> is the input to the Hash node, which constructs + the hash table. That's then returned to the Hash Join node, which reads + rows from its outer child plan and searches the hash table for each one. + </para> + + <para> + Another possible type of join is a merge join, illustrated here: + +<screen> +EXPLAIN SELECT * +FROM tenk1 t1, onek t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------- + Merge Join (cost=198.11..268.19 rows=10 width=488) + Merge Cond: (t1.unique2 = t2.unique2) + -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) + Filter: (unique1 < 100) + -> Sort (cost=197.83..200.33 rows=1000 width=244) + Sort Key: t2.unique2 + -> Seq Scan on onek t2 (cost=0.00..148.00 rows=1000 width=244) +</screen> + </para> + + <para> + Merge join requires its input data to be sorted on the join keys. In this + plan the <literal>tenk1</literal> data is sorted by using an index scan to visit + the rows in the correct order, but a sequential scan and sort is preferred + for <literal>onek</literal>, because there are many more rows to be visited in + that table. + (Sequential-scan-and-sort frequently beats an index scan for sorting many rows, + because of the nonsequential disk access required by the index scan.) + </para> + + <para> + One way to look at variant plans is to force the planner to disregard + whatever strategy it thought was the cheapest, using the enable/disable + flags described in <xref linkend="runtime-config-query-enable"/>. + (This is a crude tool, but useful. See + also <xref linkend="explicit-joins"/>.) 
+ For example, if we're unconvinced that sequential-scan-and-sort is the best way to + deal with table <literal>onek</literal> in the previous example, we could try + +<screen> +SET enable_sort = off; + +EXPLAIN SELECT * +FROM tenk1 t1, onek t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------- + Merge Join (cost=0.56..292.65 rows=10 width=488) + Merge Cond: (t1.unique2 = t2.unique2) + -> Index Scan using tenk1_unique2 on tenk1 t1 (cost=0.29..656.28 rows=101 width=244) + Filter: (unique1 < 100) + -> Index Scan using onek_unique2 on onek t2 (cost=0.28..224.79 rows=1000 width=244) +</screen> + + which shows that the planner thinks that sorting <literal>onek</literal> by + index-scanning is about 12% more expensive than sequential-scan-and-sort. + Of course, the next question is whether it's right about that. + We can investigate that using <command>EXPLAIN ANALYZE</command>, as discussed + below. + </para> + + </sect2> + + <sect2 id="using-explain-analyze"> + <title><command>EXPLAIN ANALYZE</command></title> + + <para> + It is possible to check the accuracy of the planner's estimates + by using <command>EXPLAIN</command>'s <literal>ANALYZE</literal> option. With this + option, <command>EXPLAIN</command> actually executes the query, and then displays + the true row counts and true run time accumulated within each plan node, + along with the same estimates that a plain <command>EXPLAIN</command> + shows. 
For example, we might get a result like this: + +<screen> +EXPLAIN ANALYZE SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------------------------------- + Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1) + -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1) + Recheck Cond: (unique1 < 10) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1) + Index Cond: (unique1 < 10) + -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10) + Index Cond: (unique2 = t1.unique2) + Planning time: 0.181 ms + Execution time: 0.501 ms +</screen> + + Note that the <quote>actual time</quote> values are in milliseconds of + real time, whereas the <literal>cost</literal> estimates are expressed in + arbitrary units; so they are unlikely to match up. + The thing that's usually most important to look for is whether the + estimated row counts are reasonably close to reality. In this example + the estimates were all dead-on, but that's quite unusual in practice. + </para> + + <para> + In some query plans, it is possible for a subplan node to be executed more + than once. For example, the inner index scan will be executed once per + outer row in the above nested-loop plan. In such cases, the + <literal>loops</literal> value reports the + total number of executions of the node, and the actual time and rows + values shown are averages per-execution. This is done to make the numbers + comparable with the way that the cost estimates are shown. Multiply by + the <literal>loops</literal> value to get the total time actually spent in + the node. 
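Applying that multiplication to the inner index scan above is straightforward:

```python
# Total time in a repeated node = per-loop "actual time" * loops.
actual_time_per_loop = 0.022  # upper "actual time" bound per execution, ms
loops = 10                    # executions of the inner index scan

total_ms = round(actual_time_per_loop * loops, 3)
print(total_ms)  # 0.22 ms spent in the tenk2 index scans overall
```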
In the above example, we spent a total of 0.220 milliseconds + executing the index scans on <literal>tenk2</literal>. + </para> + + <para> + In some cases <command>EXPLAIN ANALYZE</command> shows additional execution + statistics beyond the plan node execution times and row counts. + For example, Sort and Hash nodes provide extra information: + +<screen> +EXPLAIN ANALYZE SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 ORDER BY t1.fivethous; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------------------------------------&zwsp;------ + Sort (cost=717.34..717.59 rows=101 width=488) (actual time=7.761..7.774 rows=100 loops=1) + Sort Key: t1.fivethous + Sort Method: quicksort Memory: 77kB + -> Hash Join (cost=230.47..713.98 rows=101 width=488) (actual time=0.711..7.427 rows=100 loops=1) + Hash Cond: (t2.unique2 = t1.unique2) + -> Seq Scan on tenk2 t2 (cost=0.00..445.00 rows=10000 width=244) (actual time=0.007..2.583 rows=10000 loops=1) + -> Hash (cost=229.20..229.20 rows=101 width=244) (actual time=0.659..0.659 rows=100 loops=1) + Buckets: 1024 Batches: 1 Memory Usage: 28kB + -> Bitmap Heap Scan on tenk1 t1 (cost=5.07..229.20 rows=101 width=244) (actual time=0.080..0.526 rows=100 loops=1) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.049..0.049 rows=100 loops=1) + Index Cond: (unique1 < 100) + Planning time: 0.194 ms + Execution time: 8.008 ms +</screen> + + The Sort node shows the sort method used (in particular, whether the sort + was in-memory or on-disk) and the amount of memory or disk space needed. + The Hash node shows the number of hash buckets and batches as well as the + peak amount of memory used for the hash table. (If the number of batches + exceeds one, there will also be disk space usage involved, but that is not + shown.) 
+ </para> + + <para> + Another type of extra information is the number of rows removed by a + filter condition: + +<screen> +EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE ten < 7; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------- + Seq Scan on tenk1 (cost=0.00..483.00 rows=7000 width=244) (actual time=0.016..5.107 rows=7000 loops=1) + Filter: (ten < 7) + Rows Removed by Filter: 3000 + Planning time: 0.083 ms + Execution time: 5.905 ms +</screen> + + These counts can be particularly valuable for filter conditions applied at + join nodes. The <quote>Rows Removed</quote> line only appears when at least + one scanned row, or potential join pair in the case of a join node, + is rejected by the filter condition. + </para> + + <para> + A case similar to filter conditions occurs with <quote>lossy</quote> + index scans. For example, consider this search for polygons containing a + specific point: + +<screen> +EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)'; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;----------------------------------- + Seq Scan on polygon_tbl (cost=0.00..1.05 rows=1 width=32) (actual time=0.044..0.044 rows=0 loops=1) + Filter: (f1 @> '((0.5,2))'::polygon) + Rows Removed by Filter: 4 + Planning time: 0.040 ms + Execution time: 0.083 ms +</screen> + + The planner thinks (quite correctly) that this sample table is too small + to bother with an index scan, so we have a plain sequential scan in which + all the rows got rejected by the filter condition. 
But if we force an + index scan to be used, we see: + +<screen> +SET enable_seqscan TO off; + +EXPLAIN ANALYZE SELECT * FROM polygon_tbl WHERE f1 @> polygon '(0.5,2.0)'; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------------------------------------------- + Index Scan using gpolygonind on polygon_tbl (cost=0.13..8.15 rows=1 width=32) (actual time=0.062..0.062 rows=0 loops=1) + Index Cond: (f1 @> '((0.5,2))'::polygon) + Rows Removed by Index Recheck: 1 + Planning time: 0.034 ms + Execution time: 0.144 ms +</screen> + + Here we can see that the index returned one candidate row, which was + then rejected by a recheck of the index condition. This happens because a + GiST index is <quote>lossy</quote> for polygon containment tests: it actually + returns the rows with polygons that overlap the target, and then we have + to do the exact containment test on those rows. + </para> + + <para> + <command>EXPLAIN</command> has a <literal>BUFFERS</literal> option that can be used with + <literal>ANALYZE</literal> to get even more run time statistics: + +<screen> +EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;-------------------------------------------------------------- + Bitmap Heap Scan on tenk1 (cost=25.08..60.21 rows=10 width=244) (actual time=0.323..0.342 rows=10 loops=1) + Recheck Cond: ((unique1 < 100) AND (unique2 > 9000)) + Buffers: shared hit=15 + -> BitmapAnd (cost=25.08..25.08 rows=10 width=0) (actual time=0.309..0.309 rows=0 loops=1) + Buffers: shared hit=7 + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1) + Index Cond: (unique1 < 100) + Buffers: shared hit=2 + -> Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) (actual time=0.227..0.227 rows=999 loops=1) + Index Cond: (unique2 > 9000) + 
Buffers: shared hit=5 + Planning time: 0.088 ms + Execution time: 0.423 ms +</screen> + + The numbers provided by <literal>BUFFERS</literal> help to identify which parts + of the query are the most I/O-intensive. + </para> + + <para> + Keep in mind that because <command>EXPLAIN ANALYZE</command> actually + runs the query, any side-effects will happen as usual, even though + whatever results the query might output are discarded in favor of + printing the <command>EXPLAIN</command> data. If you want to analyze a + data-modifying query without changing your tables, you can + roll the command back afterwards, for example: + +<screen> +BEGIN; + +EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------------------------------------------------- + Update on tenk1 (cost=5.07..229.46 rows=101 width=250) (actual time=14.628..14.628 rows=0 loops=1) + -> Bitmap Heap Scan on tenk1 (cost=5.07..229.46 rows=101 width=250) (actual time=0.101..0.439 rows=100 loops=1) + Recheck Cond: (unique1 < 100) + -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) (actual time=0.043..0.043 rows=100 loops=1) + Index Cond: (unique1 < 100) + Planning time: 0.079 ms + Execution time: 14.727 ms + +ROLLBACK; +</screen> + </para> + + <para> + As seen in this example, when the query is an <command>INSERT</command>, + <command>UPDATE</command>, or <command>DELETE</command> command, the actual work of + applying the table changes is done by a top-level Insert, Update, + or Delete plan node. The plan nodes underneath this node perform + the work of locating the old rows and/or computing the new data. + So above, we see the same sort of bitmap table scan we've seen already, + and its output is fed to an Update node that stores the updated rows. 
+ It's worth noting that although the data-modifying node can take a + considerable amount of run time (here, it's consuming the lion's share + of the time), the planner does not currently add anything to the cost + estimates to account for that work. That's because the work to be done is + the same for every correct query plan, so it doesn't affect planning + decisions. + </para> + + <para> + When an <command>UPDATE</command> or <command>DELETE</command> command affects an + inheritance hierarchy, the output might look like this: + +<screen> +EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101; + QUERY PLAN +-------------------------------------------------------------------&zwsp;---------------- + Update on parent (cost=0.00..24.53 rows=4 width=14) + Update on parent + Update on child1 + Update on child2 + Update on child3 + -> Seq Scan on parent (cost=0.00..0.00 rows=1 width=14) + Filter: (f1 = 101) + -> Index Scan using child1_f1_key on child1 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) + -> Index Scan using child2_f1_key on child2 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) + -> Index Scan using child3_f1_key on child3 (cost=0.15..8.17 rows=1 width=14) + Index Cond: (f1 = 101) +</screen> + + In this example the Update node needs to consider three child tables as + well as the originally-mentioned parent table. So there are four input + scanning subplans, one per table. For clarity, the Update node is + annotated to show the specific target tables that will be updated, in the + same order as the corresponding subplans. (These annotations are new as + of <productname>PostgreSQL</productname> 9.5; in prior versions the reader had to + intuit the target tables by inspecting the subplans.) + </para> + + <para> + The <literal>Planning time</literal> shown by <command>EXPLAIN + ANALYZE</command> is the time it took to generate the query plan from the + parsed query and optimize it. It does not include parsing or rewriting. 
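+ In <productname>PostgreSQL</productname> 13, plain <command>EXPLAIN</command>
+ (without <literal>ANALYZE</literal>) can also report the planning time if
+ the <literal>SUMMARY</literal> option is enabled explicitly, for example:
+<programlisting>
+EXPLAIN (SUMMARY) SELECT * FROM tenk1 WHERE unique1 < 100;
+</programlisting>
+ This avoids executing the query when only the planning time is of
+ interest.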
+ </para> + + <para> + The <literal>Execution time</literal> shown by <command>EXPLAIN + ANALYZE</command> includes executor start-up and shut-down time, as well + as the time to run any triggers that are fired, but it does not include + parsing, rewriting, or planning time. + Time spent executing <literal>BEFORE</literal> triggers, if any, is included in + the time for the related Insert, Update, or Delete node; but time + spent executing <literal>AFTER</literal> triggers is not counted there because + <literal>AFTER</literal> triggers are fired after completion of the whole plan. + The total time spent in each trigger + (either <literal>BEFORE</literal> or <literal>AFTER</literal>) is also shown separately. + Note that deferred constraint triggers will not be executed + until end of transaction and are thus not considered at all by + <command>EXPLAIN ANALYZE</command>. + </para> + + </sect2> + + <sect2 id="using-explain-caveats"> + <title>Caveats</title> + + <para> + There are two significant ways in which run times measured by + <command>EXPLAIN ANALYZE</command> can deviate from normal execution of + the same query. First, since no output rows are delivered to the client, + network transmission costs and I/O conversion costs are not included. + Second, the measurement overhead added by <command>EXPLAIN + ANALYZE</command> can be significant, especially on machines with slow + <function>gettimeofday()</function> operating-system calls. You can use the + <xref linkend="pgtesttiming"/> tool to measure the overhead of timing + on your system. + </para> + + <para> + <command>EXPLAIN</command> results should not be extrapolated to situations + much different from the one you are actually testing; for example, + results on a toy-sized table cannot be assumed to apply to large tables. + The planner's cost estimates are not linear and so it might choose + a different plan for a larger or smaller table. 
An extreme example + is that on a table that only occupies one disk page, you'll nearly + always get a sequential scan plan whether indexes are available or not. + The planner realizes that it's going to take one disk page read to + process the table in any case, so there's no value in expending additional + page reads to look at an index. (We saw this happening in the + <literal>polygon_tbl</literal> example above.) + </para> + + <para> + There are cases in which the actual and estimated values won't match up + well, but nothing is really wrong. One such case occurs when + plan node execution is stopped short by a <literal>LIMIT</literal> or similar + effect. For example, in the <literal>LIMIT</literal> query we used before, + +<screen> +EXPLAIN ANALYZE SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; + + QUERY PLAN +-------------------------------------------------------------------&zwsp;------------------------------------------------------------ + Limit (cost=0.29..14.71 rows=2 width=244) (actual time=0.177..0.249 rows=2 loops=1) + -> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..72.42 rows=10 width=244) (actual time=0.174..0.244 rows=2 loops=1) + Index Cond: (unique2 > 9000) + Filter: (unique1 < 100) + Rows Removed by Filter: 287 + Planning time: 0.096 ms + Execution time: 0.336 ms +</screen> + + the estimated cost and row count for the Index Scan node are shown as + though it were run to completion. But in reality the Limit node stopped + requesting rows after it got two, so the actual row count is only 2 and + the run time is less than the cost estimate would suggest. This is not + an estimation error, only a discrepancy in the way the estimates and true + values are displayed. + </para> + + <para> + Merge joins also have measurement artifacts that can confuse the unwary. 
+ A merge join will stop reading one input if it's exhausted the other input + and the next key value in the one input is greater than the last key value + of the other input; in such a case there can be no more matches and so no + need to scan the rest of the first input. This results in not reading all + of one child, with results like those mentioned for <literal>LIMIT</literal>. + Also, if the outer (first) child contains rows with duplicate key values, + the inner (second) child is backed up and rescanned for the portion of its + rows matching that key value. <command>EXPLAIN ANALYZE</command> counts these + repeated emissions of the same inner rows as if they were real additional + rows. When there are many outer duplicates, the reported actual row count + for the inner child plan node can be significantly larger than the number + of rows that are actually in the inner relation. + </para> + + <para> + BitmapAnd and BitmapOr nodes always report their actual row counts as zero, + due to implementation limitations. + </para> + + <para> + Normally, <command>EXPLAIN</command> will display every plan node + created by the planner. However, there are cases where the executor + can determine that certain nodes need not be executed because they + cannot produce any rows, based on parameter values that were not + available at planning time. (Currently this can only happen for child + nodes of an Append or MergeAppend node that is scanning a partitioned + table.) When this happens, those plan nodes are omitted from + the <command>EXPLAIN</command> output and a <literal>Subplans + Removed: <replaceable>N</replaceable></literal> annotation appears + instead. 
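+ As an illustration (the table and partition names here are invented for
+ the example), run-time pruning of a generic plan for a prepared
+ statement can produce this annotation:
+<programlisting>
+CREATE TABLE measurement (logdate date, reading int)
+  PARTITION BY RANGE (logdate);
+CREATE TABLE measurement_y2019 PARTITION OF measurement
+  FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');
+CREATE TABLE measurement_y2020 PARTITION OF measurement
+  FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
+
+PREPARE q (date) AS SELECT * FROM measurement WHERE logdate = $1;
+
+-- Once the planner has switched to a generic plan (which by default it
+-- may do after several executions), the parameter value is unknown at
+-- planning time, and partitions that cannot match are pruned at
+-- execution; EXPLAIN may then show a "Subplans Removed" line.
+EXPLAIN EXECUTE q('2020-06-01');
+</programlisting>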
+ </para> + </sect2> + + </sect1> + + <sect1 id="planner-stats"> + <title>Statistics Used by the Planner</title> + + <indexterm zone="planner-stats"> + <primary>statistics</primary> + <secondary>of the planner</secondary> + </indexterm> + + <sect2> + <title>Single-Column Statistics</title> + <para> + As we saw in the previous section, the query planner needs to estimate + the number of rows retrieved by a query in order to make good choices + of query plans. This section provides a quick look at the statistics + that the system uses for these estimates. + </para> + + <para> + One component of the statistics is the total number of entries in + each table and index, as well as the number of disk blocks occupied + by each table and index. This information is kept in the table + <link linkend="catalog-pg-class"><structname>pg_class</structname></link>, + in the columns <structfield>reltuples</structfield> and + <structfield>relpages</structfield>. We can look at it with + queries similar to this one: + +<screen> +SELECT relname, relkind, reltuples, relpages +FROM pg_class +WHERE relname LIKE 'tenk1%'; + + relname | relkind | reltuples | relpages +----------------------+---------+-----------+---------- + tenk1 | r | 10000 | 358 + tenk1_hundred | i | 10000 | 30 + tenk1_thous_tenthous | i | 10000 | 30 + tenk1_unique1 | i | 10000 | 30 + tenk1_unique2 | i | 10000 | 30 +(5 rows) +</screen> + + Here we can see that <structname>tenk1</structname> contains 10000 + rows, as do its indexes, but the indexes are (unsurprisingly) much + smaller than the table. + </para> + + <para> + For efficiency reasons, <structfield>reltuples</structfield> + and <structfield>relpages</structfield> are not updated on-the-fly, + and so they usually contain somewhat out-of-date values. + They are updated by <command>VACUUM</command>, <command>ANALYZE</command>, and a + few DDL commands such as <command>CREATE INDEX</command>. 
A <command>VACUUM</command> + or <command>ANALYZE</command> operation that does not scan the entire table + (which is commonly the case) will incrementally update the + <structfield>reltuples</structfield> count on the basis of the part + of the table it did scan, resulting in an approximate value. + In any case, the planner + will scale the values it finds in <structname>pg_class</structname> + to match the current physical table size, thus obtaining a closer + approximation. + </para> + + <indexterm> + <primary>pg_statistic</primary> + </indexterm> + + <para> + Most queries retrieve only a fraction of the rows in a table, due + to <literal>WHERE</literal> clauses that restrict the rows to be + examined. The planner thus needs to make an estimate of the + <firstterm>selectivity</firstterm> of <literal>WHERE</literal> clauses, that is, + the fraction of rows that match each condition in the + <literal>WHERE</literal> clause. The information used for this task is + stored in the + <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link> + system catalog. Entries in <structname>pg_statistic</structname> + are updated by the <command>ANALYZE</command> and <command>VACUUM + ANALYZE</command> commands, and are always approximate even when freshly + updated. + </para> + + <indexterm> + <primary>pg_stats</primary> + </indexterm> + + <para> + Rather than look at <structname>pg_statistic</structname> directly, + it's better to look at its view + <link linkend="view-pg-stats"><structname>pg_stats</structname></link> + when examining the statistics manually. <structname>pg_stats</structname> + is designed to be more easily readable. Furthermore, + <structname>pg_stats</structname> is readable by all, whereas + <structname>pg_statistic</structname> is only readable by a superuser. + (This prevents unprivileged users from learning something about + the contents of other people's tables from the statistics. 
The + <structname>pg_stats</structname> view is restricted to show only + rows about tables that the current user can read.) + For example, we might do: + +<screen> +SELECT attname, inherited, n_distinct, + array_to_string(most_common_vals, E'\n') as most_common_vals +FROM pg_stats +WHERE tablename = 'road'; + + attname | inherited | n_distinct | most_common_vals +---------+-----------+------------+------------------------------------ + name | f | -0.363388 | I- 580 Ramp+ + | | | I- 880 Ramp+ + | | | Sp Railroad + + | | | I- 580 + + | | | I- 680 Ramp + name | t | -0.284859 | I- 880 Ramp+ + | | | I- 580 Ramp+ + | | | I- 680 Ramp+ + | | | I- 580 + + | | | State Hwy 13 Ramp +(2 rows) +</screen> + + Note that two rows are displayed for the same column, one corresponding + to the complete inheritance hierarchy starting at the + <literal>road</literal> table (<literal>inherited</literal>=<literal>t</literal>), + and another one including only the <literal>road</literal> table itself + (<literal>inherited</literal>=<literal>f</literal>). + </para> + + <para> + The amount of information stored in <structname>pg_statistic</structname> + by <command>ANALYZE</command>, in particular the maximum number of entries in the + <structfield>most_common_vals</structfield> and <structfield>histogram_bounds</structfield> + arrays for each column, can be set on a + column-by-column basis using the <command>ALTER TABLE SET STATISTICS</command> + command, or globally by setting the + <xref linkend="guc-default-statistics-target"/> configuration variable. + The default limit is presently 100 entries. Raising the limit + might allow more accurate planner estimates to be made, particularly for + columns with irregular data distributions, at the price of consuming + more space in <structname>pg_statistic</structname> and slightly more + time to compute the estimates. Conversely, a lower limit might be + sufficient for columns with simple data distributions. 
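+ For example, using the <structname>tenk1</structname> table from the
+ earlier examples, one might raise the limit for an irregularly
+ distributed column and lower it for a simple one:
+<programlisting>
+ALTER TABLE tenk1 ALTER COLUMN stringu1 SET STATISTICS 500;
+ALTER TABLE tenk1 ALTER COLUMN two SET STATISTICS 10;
+ANALYZE tenk1;
+</programlisting>
+ The new targets take effect at the next <command>ANALYZE</command> of
+ the table.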
+ </para> + + <para> + Further details about the planner's use of statistics can be found in + <xref linkend="planner-stats-details"/>. + </para> + </sect2> + + <sect2 id="planner-stats-extended"> + <title>Extended Statistics</title> + + <indexterm zone="planner-stats-extended"> + <primary>statistics</primary> + <secondary>of the planner</secondary> + </indexterm> + + <indexterm> + <primary>correlation</primary> + <secondary>in the query planner</secondary> + </indexterm> + + <indexterm> + <primary>pg_statistic_ext</primary> + </indexterm> + + <indexterm> + <primary>pg_statistic_ext_data</primary> + </indexterm> + + <para> + It is common to see slow queries running bad execution plans because + multiple columns used in the query clauses are correlated. + The planner normally assumes that multiple conditions + are independent of each other, + an assumption that does not hold when column values are correlated. + Regular statistics, because of their per-individual-column nature, + cannot capture any knowledge about cross-column correlation. + However, <productname>PostgreSQL</productname> has the ability to compute + <firstterm>multivariate statistics</firstterm>, which can capture + such information. + </para> + + <para> + Because the number of possible column combinations is very large, + it's impractical to compute multivariate statistics automatically. + Instead, <firstterm>extended statistics objects</firstterm>, more often + called just <firstterm>statistics objects</firstterm>, can be created to instruct + the server to obtain statistics across interesting sets of columns. + </para> + + <para> + Statistics objects are created using the + <xref linkend="sql-createstatistics"/> command. + Creation of such an object merely creates a catalog entry expressing + interest in the statistics. Actual data collection is performed + by <command>ANALYZE</command> (either a manual command, or background + auto-analyze). 
The collected values can be examined in the
+ <link linkend="catalog-pg-statistic-ext-data"><structname>pg_statistic_ext_data</structname></link>
+ catalog.
+ </para>
+
+ <para>
+ <command>ANALYZE</command> computes extended statistics based on the same
+ sample of table rows that it takes for computing regular single-column
+ statistics. Since the sample size is increased by increasing the
+ statistics target for the table or any of its columns (as described in
+ the previous section), a larger statistics target will normally result in
+ more accurate extended statistics, as well as more time spent calculating
+ them.
+ </para>
+
+ <para>
+ The following subsections describe the kinds of extended statistics
+ that are currently supported.
+ </para>
+
+ <sect3>
+ <title>Functional Dependencies</title>
+
+ <para>
+ The simplest kind of extended statistics tracks <firstterm>functional
+ dependencies</firstterm>, a concept used in definitions of database normal forms.
+ We say that column <structfield>b</structfield> is functionally dependent on
+ column <structfield>a</structfield> if knowledge of the value of
+ <structfield>a</structfield> is sufficient to determine the value
+ of <structfield>b</structfield>, that is, there are no two rows having the same value
+ of <structfield>a</structfield> but different values of <structfield>b</structfield>.
+ In a fully normalized database, functional dependencies should exist
+ only on primary keys and superkeys. However, in practice many data sets
+ are not fully normalized for various reasons; intentional
+ denormalization for performance reasons is a common example.
+ Even in a fully normalized database, there may be partial correlation
+ between some columns, which can be expressed as a partial functional
+ dependency.
+ </para>
+
+ <para>
+ The existence of functional dependencies directly affects the accuracy
+ of estimates in certain queries.
If a query contains conditions on + both the independent and the dependent column(s), the + conditions on the dependent columns do not further reduce the result + size; but without knowledge of the functional dependency, the query + planner will assume that the conditions are independent, resulting + in underestimating the result size. + </para> + + <para> + To inform the planner about functional dependencies, <command>ANALYZE</command> + can collect measurements of cross-column dependency. Assessing the + degree of dependency between all sets of columns would be prohibitively + expensive, so data collection is limited to those groups of columns + appearing together in a statistics object defined with + the <literal>dependencies</literal> option. It is advisable to create + <literal>dependencies</literal> statistics only for column groups that are + strongly correlated, to avoid unnecessary overhead in both + <command>ANALYZE</command> and later query planning. + </para> + + <para> + Here is an example of collecting functional-dependency statistics: +<programlisting> +CREATE STATISTICS stts (dependencies) ON city, zip FROM zipcodes; + +ANALYZE zipcodes; + +SELECT stxname, stxkeys, stxddependencies + FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid) + WHERE stxname = 'stts'; + stxname | stxkeys | stxddependencies +---------+---------+------------------------------------------ + stts | 1 5 | {"1 => 5": 1.000000, "5 => 1": 0.423130} +(1 row) +</programlisting> + Here it can be seen that column 1 (zip code) fully determines column + 5 (city) so the coefficient is 1.0, while city only determines zip code + about 42% of the time, meaning that there are many cities (58%) that are + represented by more than a single ZIP code. 
+ </para>
+
+ <para>
+ When computing the selectivity for a query involving functionally
+ dependent columns, the planner adjusts the per-condition selectivity
+ estimates using the dependency coefficients so as not to produce
+ an underestimate.
+ </para>
+
+ <sect4>
+ <title>Limitations of Functional Dependencies</title>
+
+ <para>
+ Functional dependencies are currently only applied when considering
+ simple equality conditions that compare columns to constant values,
+ and <literal>IN</literal> clauses with constant values.
+ They are not used to improve estimates for equality conditions
+ comparing two columns or comparing a column to an expression, nor for
+ range clauses, <literal>LIKE</literal>, or any other type of condition.
+ </para>
+
+ <para>
+ When estimating with functional dependencies, the planner assumes that
+ conditions on the involved columns are compatible and hence redundant.
+ If they are incompatible, the correct estimate would be zero rows, but
+ that possibility is not considered. For example, given a query like
+<programlisting>
+SELECT * FROM zipcodes WHERE city = 'San Francisco' AND zip = '94105';
+</programlisting>
+ the planner will disregard the <structfield>city</structfield> clause as not
+ changing the selectivity, which is correct. However, it will make
+ the same assumption about
+<programlisting>
+SELECT * FROM zipcodes WHERE city = 'San Francisco' AND zip = '90210';
+</programlisting>
+ even though there will really be zero rows satisfying this query.
+ Functional dependency statistics do not provide enough information
+ to conclude that, however.
+ </para>
+
+ <para>
+ In many practical situations this assumption is satisfied;
+ for example, there might be a GUI in the application that only allows
+ selecting compatible city and ZIP code values to use in a query.
+ But if that's not the case, functional dependencies may not be a viable
+ option.
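+ In that case, the statistics object can simply be removed again (here
+ using the name from the example above):
+<programlisting>
+DROP STATISTICS stts;
+</programlisting>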
+ </para> + </sect4> + </sect3> + + <sect3> + <title>Multivariate N-Distinct Counts</title> + + <para> + Single-column statistics store the number of distinct values in each + column. Estimates of the number of distinct values when combining more + than one column (for example, for <literal>GROUP BY a, b</literal>) are + frequently wrong when the planner only has single-column statistical + data, causing it to select bad plans. + </para> + + <para> + To improve such estimates, <command>ANALYZE</command> can collect n-distinct + statistics for groups of columns. As before, it's impractical to do + this for every possible column grouping, so data is collected only for + those groups of columns appearing together in a statistics object + defined with the <literal>ndistinct</literal> option. Data will be collected + for each possible combination of two or more columns from the set of + listed columns. + </para> + + <para> + Continuing the previous example, the n-distinct counts in a + table of ZIP codes might look like the following: +<programlisting> +CREATE STATISTICS stts2 (ndistinct) ON city, state, zip FROM zipcodes; + +ANALYZE zipcodes; + +SELECT stxkeys AS k, stxdndistinct AS nd + FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid) + WHERE stxname = 'stts2'; +-[ RECORD 1 ]------------------------------------------------------&zwsp;-- +k | 1 2 5 +nd | {"1, 2": 33178, "1, 5": 33178, "2, 5": 27435, "1, 2, 5": 33178} +(1 row) +</programlisting> + This indicates that there are three combinations of columns that + have 33178 distinct values: ZIP code and state; ZIP code and city; + and ZIP code, city and state (the fact that they are all equal is + expected given that ZIP code alone is unique in this table). On the + other hand, the combination of city and state has only 27435 distinct + values. 
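+ A query whose group-count estimate benefits from these statistics is,
+ for example:
+<programlisting>
+EXPLAIN SELECT city, state, count(*) FROM zipcodes GROUP BY city, state;
+</programlisting>
+ With the <literal>ndistinct</literal> entry available, the planner can
+ estimate about 27435 groups, rather than multiplying the per-column
+ distinct counts together.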
+ </para>
+
+ <para>
+ It's advisable to create <literal>ndistinct</literal> statistics objects only
+ on combinations of columns that are actually used for grouping, and
+ for which misestimation of the number of groups is resulting in bad
+ plans. Otherwise, the <command>ANALYZE</command> cycles are just wasted.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Multivariate MCV Lists</title>
+
+ <para>
+ Another type of statistics stored for each column is the most-common
+ value list. These lists allow very accurate estimates for individual
+ columns, but may result in significant misestimates for queries with
+ conditions on multiple columns.
+ </para>
+
+ <para>
+ To improve such estimates, <command>ANALYZE</command> can collect MCV
+ lists on combinations of columns. Similarly to functional dependencies
+ and n-distinct coefficients, it's impractical to do this for every
+ possible column grouping. This is even more true here, because the MCV
+ list (unlike functional dependencies and n-distinct coefficients) stores
+ the actual common column values. So data is collected only for those
+ groups of columns appearing together in a statistics object defined with
+ the <literal>mcv</literal> option.
+ </para>
+
+ <para>
+ Continuing the previous example, the MCV list for a table of ZIP codes
+ might look like the following (unlike for simpler types of statistics,
+ a function is required for inspection of MCV contents):
+
+<programlisting>
+CREATE STATISTICS stts3 (mcv) ON city, state FROM zipcodes;
+
+ANALYZE zipcodes;
+
+SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
+                pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';
+
+ index |         values         | nulls | frequency | base_frequency
+-------+------------------------+-------+-----------+----------------
+     0 | {Washington, DC}       | {f,f} |  0.003467 |        2.7e-05
+     1 | {Apo, AE}              | {f,f} |  0.003067 |        1.9e-05
+     2 | {Houston, TX}          | {f,f} |  0.002167 |       0.000133
+     3 | {El Paso, TX}          | {f,f} |     0.002 |       0.000113
+     4 | {New York, NY}         | {f,f} |  0.001967 |       0.000114
+     5 | {Atlanta, GA}          | {f,f} |  0.001633 |        3.3e-05
+     6 | {Sacramento, CA}       | {f,f} |  0.001433 |        7.8e-05
+     7 | {Miami, FL}            | {f,f} |    0.0014 |          6e-05
+     8 | {Dallas, TX}           | {f,f} |  0.001367 |        8.8e-05
+     9 | {Chicago, IL}          | {f,f} |  0.001333 |        5.1e-05
+   ...
+(99 rows)
+</programlisting>
+ This indicates that the most common combination of city and state is
+ Washington, DC, with an actual frequency (in the sample) of about 0.35%.
+ The base frequency of the combination (as computed from the simple
+ per-column frequencies) is only 0.0027%, resulting in an underestimate
+ of about two orders of magnitude.
+ </para>
+
+ <para>
+ It's advisable to create <acronym>MCV</acronym> statistics objects only
+ on combinations of columns that are actually used in conditions together,
+ and for which misestimation of the number of groups is resulting in bad
+ plans. Otherwise, the <command>ANALYZE</command> and planning cycles
+ are just wasted.
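+ For example, a condition pair whose estimate these statistics improve
+ is:
+<programlisting>
+EXPLAIN SELECT * FROM zipcodes WHERE city = 'Washington' AND state = 'DC';
+</programlisting>
+ With the MCV list available, the planner can use the actual combined
+ frequency from the sample (about 0.35%) rather than the much smaller
+ product of the per-column frequencies.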
+ </para> + </sect3> + + </sect2> + </sect1> + + <sect1 id="explicit-joins"> + <title>Controlling the Planner with Explicit <literal>JOIN</literal> Clauses</title> + + <indexterm zone="explicit-joins"> + <primary>join</primary> + <secondary>controlling the order</secondary> + </indexterm> + + <para> + It is possible + to control the query planner to some extent by using the explicit <literal>JOIN</literal> + syntax. To see why this matters, we first need some background. + </para> + + <para> + In a simple join query, such as: +<programlisting> +SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id; +</programlisting> + the planner is free to join the given tables in any order. For + example, it could generate a query plan that joins A to B, using + the <literal>WHERE</literal> condition <literal>a.id = b.id</literal>, and then + joins C to this joined table, using the other <literal>WHERE</literal> + condition. Or it could join B to C and then join A to that result. + Or it could join A to C and then join them with B — but that + would be inefficient, since the full Cartesian product of A and C + would have to be formed, there being no applicable condition in the + <literal>WHERE</literal> clause to allow optimization of the join. (All + joins in the <productname>PostgreSQL</productname> executor happen + between two input tables, so it's necessary to build up the result + in one or another of these fashions.) The important point is that + these different join possibilities give semantically equivalent + results but might have hugely different execution costs. Therefore, + the planner will explore all of them to try to find the most + efficient query plan. + </para> + + <para> + When a query only involves two or three tables, there aren't many join + orders to worry about. But the number of possible join orders grows + exponentially as the number of tables expands. 
Beyond ten or so input + tables it's no longer practical to do an exhaustive search of all the + possibilities, and even for six or seven tables planning might take an + annoyingly long time. When there are too many input tables, the + <productname>PostgreSQL</productname> planner will switch from exhaustive + search to a <firstterm>genetic</firstterm> probabilistic search + through a limited number of possibilities. (The switch-over threshold is + set by the <xref linkend="guc-geqo-threshold"/> run-time + parameter.) + The genetic search takes less time, but it won't + necessarily find the best possible plan. + </para> + + <para> + When the query involves outer joins, the planner has less freedom + than it does for plain (inner) joins. For example, consider: +<programlisting> +SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id); +</programlisting> + Although this query's restrictions are superficially similar to the + previous example, the semantics are different because a row must be + emitted for each row of A that has no matching row in the join of B and C. + Therefore the planner has no choice of join order here: it must join + B to C and then join A to that result. Accordingly, this query takes + less time to plan than the previous query. In other cases, the planner + might be able to determine that more than one join order is safe. + For example, given: +<programlisting> +SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id); +</programlisting> + it is valid to join A to either B or C first. Currently, only + <literal>FULL JOIN</literal> completely constrains the join order. Most + practical cases involving <literal>LEFT JOIN</literal> or <literal>RIGHT JOIN</literal> + can be rearranged to some extent. 
+ </para> + + <para> + Explicit inner join syntax (<literal>INNER JOIN</literal>, <literal>CROSS + JOIN</literal>, or unadorned <literal>JOIN</literal>) is semantically the same as + listing the input relations in <literal>FROM</literal>, so it does not + constrain the join order. + </para> + + <para> + Even though most kinds of <literal>JOIN</literal> don't completely constrain + the join order, it is possible to instruct the + <productname>PostgreSQL</productname> query planner to treat all + <literal>JOIN</literal> clauses as constraining the join order anyway. + For example, these three queries are logically equivalent: +<programlisting> +SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id; +SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id; +SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id); +</programlisting> + But if we tell the planner to honor the <literal>JOIN</literal> order, + the second and third take less time to plan than the first. This effect + is not worth worrying about for only three tables, but it can be a + lifesaver with many tables. + </para> + + <para> + To force the planner to follow the join order laid out by explicit + <literal>JOIN</literal>s, + set the <xref linkend="guc-join-collapse-limit"/> run-time parameter to 1. + (Other possible values are discussed below.) + </para> + + <para> + You do not need to constrain the join order completely in order to + cut search time, because it's OK to use <literal>JOIN</literal> operators + within items of a plain <literal>FROM</literal> list. For example, consider: +<programlisting> +SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...; +</programlisting> + With <varname>join_collapse_limit</varname> = 1, this + forces the planner to join A to B before joining them to other tables, + but doesn't constrain its choices otherwise. In this example, the + number of possible join orders is reduced by a factor of 5. 
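+ In a session, this could look like the following, reusing the
+ three-table example from above:
+<programlisting>
+SET join_collapse_limit = 1;
+EXPLAIN SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
+RESET join_collapse_limit;
+</programlisting>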
+ </para> + + <para> + Constraining the planner's search in this way is a useful technique + both for reducing planning time and for directing the planner to a + good query plan. If the planner chooses a bad join order by default, + you can force it to choose a better order via <literal>JOIN</literal> syntax + — assuming that you know of a better order, that is. Experimentation + is recommended. + </para> + + <para> + A closely related issue that affects planning time is collapsing of + subqueries into their parent query. For example, consider: +<programlisting> +SELECT * +FROM x, y, + (SELECT * FROM a, b, c WHERE something) AS ss +WHERE somethingelse; +</programlisting> + This situation might arise from use of a view that contains a join; + the view's <literal>SELECT</literal> rule will be inserted in place of the view + reference, yielding a query much like the above. Normally, the planner + will try to collapse the subquery into the parent, yielding: +<programlisting> +SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; +</programlisting> + This usually results in a better plan than planning the subquery + separately. (For example, the outer <literal>WHERE</literal> conditions might be such that + joining X to A first eliminates many rows of A, thus avoiding the need to + form the full logical output of the subquery.) But at the same time, + we have increased the planning time; here, we have a five-way join + problem replacing two separate three-way join problems. Because of the + exponential growth of the number of possibilities, this makes a big + difference. The planner tries to avoid getting stuck in huge join search + problems by not collapsing a subquery if more than <varname>from_collapse_limit</varname> + <literal>FROM</literal> items would result in the parent + query. You can trade off planning time against quality of plan by + adjusting this run-time parameter up or down. 
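+ </para>
+
+ <para>
+ For example, to let the planner collapse somewhat larger subqueries
+ than it would by default (a sketch only; the best value is
+ workload-dependent):
+<programlisting>
+SHOW from_collapse_limit;      -- the default is 8
+SET from_collapse_limit = 12;
+</programlisting>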
+ </para>
+
+ <para>
+ <xref linkend="guc-from-collapse-limit"/> and <xref
+ linkend="guc-join-collapse-limit"/>
+ are similarly named because they do almost the same thing: one controls
+ when the planner will <quote>flatten out</quote> subqueries, and the
+ other controls when it will flatten out explicit joins. Typically
+ you would either set <varname>join_collapse_limit</varname> equal to
+ <varname>from_collapse_limit</varname> (so that explicit joins and subqueries
+ act similarly) or set <varname>join_collapse_limit</varname> to 1 (if you want
+ to control join order with explicit joins). But you might set them
+ differently if you are trying to fine-tune the trade-off between planning
+ time and run time.
+ </para>
+ </sect1>
+
+ <sect1 id="populate">
+ <title>Populating a Database</title>
+
+ <para>
+ One might need to insert a large amount of data when first populating
+ a database. This section contains some suggestions on how to make
+ this process as efficient as possible.
+ </para>
+
+ <sect2 id="disable-autocommit">
+ <title>Disable Autocommit</title>
+
+ <indexterm>
+ <primary>autocommit</primary>
+ <secondary>bulk-loading data</secondary>
+ </indexterm>
+
+ <para>
+ When using multiple <command>INSERT</command>s, turn off autocommit and just do
+ one commit at the end. (In plain
+ SQL, this means issuing <command>BEGIN</command> at the start and
+ <command>COMMIT</command> at the end. Some client libraries might
+ do this behind your back, in which case you need to make sure the
+ library does it when you want it done.) If you allow each
+ insertion to be committed separately,
+ <productname>PostgreSQL</productname> must do a lot of work for
+ each row that is added. An additional benefit of doing all
+ insertions in one transaction is that if the insertion of one row
+ fails, all rows inserted up to that
+ point are rolled back, so you won't be stuck with partially
+ loaded data.
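+ </para>
+
+ <para>
+ In plain SQL, the pattern is simply (the table and column names here
+ are hypothetical):
+<programlisting>
+BEGIN;
+INSERT INTO mytable (id, name) VALUES (1, 'one');
+INSERT INTO mytable (id, name) VALUES (2, 'two');
+-- ... many more rows ...
+COMMIT;
+</programlisting>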
+ </para>
+ </sect2>
+
+ <sect2 id="populate-copy-from">
+ <title>Use <command>COPY</command></title>
+
+ <para>
+ Use <xref linkend="sql-copy"/> to load
+ all the rows in one command, instead of using a series of
+ <command>INSERT</command> commands. The <command>COPY</command>
+ command is optimized for loading large numbers of rows; it is less
+ flexible than <command>INSERT</command>, but incurs significantly
+ less overhead for large data loads. Since <command>COPY</command>
+ is a single command, there is no need to disable autocommit if you
+ use this method to populate a table.
+ </para>
+
+ <para>
+ If you cannot use <command>COPY</command>, it might help to use <xref
+ linkend="sql-prepare"/> to create a
+ prepared <command>INSERT</command> statement, and then use
+ <command>EXECUTE</command> as many times as required. This avoids
+ some of the overhead of repeatedly parsing and planning
+ <command>INSERT</command>. Different interfaces provide this facility
+ in different ways; look for <quote>prepared statements</quote> in the interface
+ documentation.
+ </para>
+
+ <para>
+ Note that loading a large number of rows using
+ <command>COPY</command> is almost always faster than using
+ <command>INSERT</command>, even if <command>PREPARE</command> is used and
+ multiple insertions are batched into a single transaction.
+ </para>
+
+ <para>
+ <command>COPY</command> is fastest when used within the same
+ transaction as an earlier <command>CREATE TABLE</command> or
+ <command>TRUNCATE</command> command. In such cases no WAL
+ needs to be written, because in case of an error, the files
+ containing the newly loaded data will be removed anyway.
+ However, this consideration only applies when
+ <xref linkend="guc-wal-level"/> is <literal>minimal</literal>,
+ as all commands must write WAL otherwise.
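+ </para>
+
+ <para>
+ For example, a minimal load that takes advantage of this optimization
+ might look like the following (the table name and file path are
+ hypothetical):
+<programlisting>
+BEGIN;
+TRUNCATE mytable;
+COPY mytable FROM '/tmp/mytable.csv' WITH (FORMAT csv);
+COMMIT;
+</programlisting>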
+ </para> + + </sect2> + + <sect2 id="populate-rm-indexes"> + <title>Remove Indexes</title> + + <para> + If you are loading a freshly created table, the fastest method is to + create the table, bulk load the table's data using + <command>COPY</command>, then create any indexes needed for the + table. Creating an index on pre-existing data is quicker than + updating it incrementally as each row is loaded. + </para> + + <para> + If you are adding large amounts of data to an existing table, + it might be a win to drop the indexes, + load the table, and then recreate the indexes. Of course, the + database performance for other users might suffer + during the time the indexes are missing. One should also think + twice before dropping a unique index, since the error checking + afforded by the unique constraint will be lost while the index is + missing. + </para> + </sect2> + + <sect2 id="populate-rm-fkeys"> + <title>Remove Foreign Key Constraints</title> + + <para> + Just as with indexes, a foreign key constraint can be checked + <quote>in bulk</quote> more efficiently than row-by-row. So it might be + useful to drop foreign key constraints, load data, and re-create + the constraints. Again, there is a trade-off between data load + speed and loss of error checking while the constraint is missing. + </para> + + <para> + What's more, when you load data into a table with existing foreign key + constraints, each new row requires an entry in the server's list of + pending trigger events (since it is the firing of a trigger that checks + the row's foreign key constraint). Loading many millions of rows can + cause the trigger event queue to overflow available memory, leading to + intolerable swapping or even outright failure of the command. Therefore + it may be <emphasis>necessary</emphasis>, not just desirable, to drop and re-apply + foreign keys when loading large amounts of data. 
If temporarily removing + the constraint isn't acceptable, the only other recourse may be to split + up the load operation into smaller transactions. + </para> + </sect2> + + <sect2 id="populate-work-mem"> + <title>Increase <varname>maintenance_work_mem</varname></title> + + <para> + Temporarily increasing the <xref linkend="guc-maintenance-work-mem"/> + configuration variable when loading large amounts of data can + lead to improved performance. This will help to speed up <command>CREATE + INDEX</command> commands and <command>ALTER TABLE ADD FOREIGN KEY</command> commands. + It won't do much for <command>COPY</command> itself, so this advice is + only useful when you are using one or both of the above techniques. + </para> + </sect2> + + <sect2 id="populate-max-wal-size"> + <title>Increase <varname>max_wal_size</varname></title> + + <para> + Temporarily increasing the <xref linkend="guc-max-wal-size"/> + configuration variable can also + make large data loads faster. This is because loading a large + amount of data into <productname>PostgreSQL</productname> will + cause checkpoints to occur more often than the normal checkpoint + frequency (specified by the <varname>checkpoint_timeout</varname> + configuration variable). Whenever a checkpoint occurs, all dirty + pages must be flushed to disk. By increasing + <varname>max_wal_size</varname> temporarily during bulk + data loads, the number of checkpoints that are required can be + reduced. + </para> + </sect2> + + <sect2 id="populate-pitr"> + <title>Disable WAL Archival and Streaming Replication</title> + + <para> + When loading large amounts of data into an installation that uses + WAL archiving or streaming replication, it might be faster to take a + new base backup after the load has completed than to process a large + amount of incremental WAL data. 
To prevent incremental WAL logging
+ while loading, disable archiving and streaming replication by setting
+ <xref linkend="guc-wal-level"/> to <literal>minimal</literal>,
+ <xref linkend="guc-archive-mode"/> to <literal>off</literal>, and
+ <xref linkend="guc-max-wal-senders"/> to zero.
+ But note that changing these settings requires a server restart.
+ </para>
+
+ <para>
+ Aside from avoiding the time for the archiver or WAL sender to process the
+ WAL data, doing this will actually make certain commands faster, because
+ they do not write WAL at all if <varname>wal_level</varname>
+ is <literal>minimal</literal> and the current subtransaction (or top-level
+ transaction) created or truncated the table or index they change. (They
+ can guarantee crash safety more cheaply by doing
+ an <function>fsync</function> at the end than by writing WAL.)
+ </para>
+ </sect2>
+
+ <sect2 id="populate-analyze">
+ <title>Run <command>ANALYZE</command> Afterwards</title>
+
+ <para>
+ Whenever you have significantly altered the distribution of data
+ within a table, running <xref linkend="sql-analyze"/> is strongly recommended. This
+ includes bulk loading large amounts of data into the table. Running
+ <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
+ ensures that the planner has up-to-date statistics about the
+ table. With no statistics or obsolete statistics, the planner might
+ make poor decisions during query planning, leading to poor
+ performance on any tables with inaccurate or nonexistent
+ statistics. Note that if the autovacuum daemon is enabled, it might
+ run <command>ANALYZE</command> automatically; see
+ <xref linkend="vacuum-for-statistics"/>
+ and <xref linkend="autovacuum"/> for more information.
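+ </para>
+
+ <para>
+ For instance, after a bulk load you might simply run (the table name
+ is hypothetical):
+<programlisting>
+ANALYZE mytable;
+</programlisting>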
+ </para> + </sect2> + + <sect2 id="populate-pg-dump"> + <title>Some Notes about <application>pg_dump</application></title> + + <para> + Dump scripts generated by <application>pg_dump</application> automatically apply + several, but not all, of the above guidelines. To reload a + <application>pg_dump</application> dump as quickly as possible, you need to + do a few extra things manually. (Note that these points apply while + <emphasis>restoring</emphasis> a dump, not while <emphasis>creating</emphasis> it. + The same points apply whether loading a text dump with + <application>psql</application> or using <application>pg_restore</application> to load + from a <application>pg_dump</application> archive file.) + </para> + + <para> + By default, <application>pg_dump</application> uses <command>COPY</command>, and when + it is generating a complete schema-and-data dump, it is careful to + load data before creating indexes and foreign keys. So in this case + several guidelines are handled automatically. What is left + for you to do is to: + <itemizedlist> + <listitem> + <para> + Set appropriate (i.e., larger than normal) values for + <varname>maintenance_work_mem</varname> and + <varname>max_wal_size</varname>. + </para> + </listitem> + <listitem> + <para> + If using WAL archiving or streaming replication, consider disabling + them during the restore. To do that, set <varname>archive_mode</varname> + to <literal>off</literal>, + <varname>wal_level</varname> to <literal>minimal</literal>, and + <varname>max_wal_senders</varname> to zero before loading the dump. + Afterwards, set them back to the right values and take a fresh + base backup. + </para> + </listitem> + <listitem> + <para> + Experiment with the parallel dump and restore modes of both + <application>pg_dump</application> and <application>pg_restore</application> and find the + optimal number of concurrent jobs to use. 
Dumping and restoring in
+ parallel by means of the <option>-j</option> option should give you
+ significantly higher performance than serial mode.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Consider whether the whole dump should be restored as a single
+ transaction. To do that, pass the <option>-1</option> or
+ <option>--single-transaction</option> command-line option to
+ <application>psql</application> or <application>pg_restore</application>. When using this
+ mode, even the smallest of errors will roll back the entire restore,
+ possibly discarding many hours of processing. Depending on how
+ interrelated the data is, that might seem preferable to manual cleanup,
+ or not. <command>COPY</command> commands will run fastest if you use a single
+ transaction and have WAL archiving turned off.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If multiple CPUs are available in the database server, consider using
+ <application>pg_restore</application>'s <option>--jobs</option> option. This
+ allows concurrent data loading and index creation.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Run <command>ANALYZE</command> afterwards.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ A data-only dump will still use <command>COPY</command>, but it does not
+ drop or recreate indexes, and it does not normally touch foreign
+ keys.
+
+ <footnote>
+ <para>
+ You can get the effect of disabling foreign keys by using
+ the <option>--disable-triggers</option> option — but realize that
+ that eliminates, rather than just postpones, foreign key
+ validation, and so it is possible to insert bad data if you use it.
+ </para>
+ </footnote>
+
+ So when loading a data-only dump, it is up to you to drop and recreate
+ indexes and foreign keys if you wish to use those techniques.
+ It's still useful to increase <varname>max_wal_size</varname> + while loading the data, but don't bother increasing + <varname>maintenance_work_mem</varname>; rather, you'd do that while + manually recreating indexes and foreign keys afterwards. + And don't forget to <command>ANALYZE</command> when you're done; see + <xref linkend="vacuum-for-statistics"/> + and <xref linkend="autovacuum"/> for more information. + </para> + </sect2> + </sect1> + + <sect1 id="non-durability"> + <title>Non-Durable Settings</title> + + <indexterm zone="non-durability"> + <primary>non-durable</primary> + </indexterm> + + <para> + Durability is a database feature that guarantees the recording of + committed transactions even if the server crashes or loses + power. However, durability adds significant database overhead, + so if your site does not require such a guarantee, + <productname>PostgreSQL</productname> can be configured to run + much faster. The following are configuration changes you can make + to improve performance in such cases. Except as noted below, durability + is still guaranteed in case of a crash of the database software; + only abrupt operating system stoppage creates a risk of data loss + or corruption when these settings are used. + + <itemizedlist> + <listitem> + <para> + Place the database cluster's data directory in a memory-backed + file system (i.e., <acronym>RAM</acronym> disk). This eliminates all + database disk I/O, but limits data storage to the amount of + available memory (and perhaps swap). + </para> + </listitem> + + <listitem> + <para> + Turn off <xref linkend="guc-fsync"/>; there is no need to flush + data to disk. + </para> + </listitem> + + <listitem> + <para> + Turn off <xref linkend="guc-synchronous-commit"/>; there might be no + need to force <acronym>WAL</acronym> writes to disk on every + commit. This setting does risk transaction loss (though not data + corruption) in case of a crash of the <emphasis>database</emphasis>. 
+ </para> + </listitem> + + <listitem> + <para> + Turn off <xref linkend="guc-full-page-writes"/>; there is no need + to guard against partial page writes. + </para> + </listitem> + + <listitem> + <para> + Increase <xref linkend="guc-max-wal-size"/> and <xref + linkend="guc-checkpoint-timeout"/>; this reduces the frequency + of checkpoints, but increases the storage requirements of + <filename>/pg_wal</filename>. + </para> + </listitem> + + <listitem> + <para> + Create <link linkend="sql-createtable-unlogged">unlogged + tables</link> to avoid <acronym>WAL</acronym> writes, though it + makes the tables non-crash-safe. + </para> + </listitem> + + </itemizedlist> + </para> + </sect1> + + </chapter> |