Diffstat (limited to 'doc/src/sgml/html/parallel-plans.html')
-rw-r--r-- | doc/src/sgml/html/parallel-plans.html | 154 |
1 files changed, 154 insertions, 0 deletions
diff --git a/doc/src/sgml/html/parallel-plans.html b/doc/src/sgml/html/parallel-plans.html new file mode 100644 index 0000000..11eb3ab --- /dev/null +++ b/doc/src/sgml/html/parallel-plans.html @@ -0,0 +1,154 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>15.3. Parallel Plans</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?" /><link rel="next" href="parallel-safety.html" title="15.4. Parallel Safety" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">15.3. Parallel Plans</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><th width="60%" align="center">Chapter 15. Parallel Query</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="PARALLEL-PLANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">15.3. 
Parallel Plans</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-SCANS">15.3.1. Parallel Scans</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-JOINS">15.3.2. Parallel Joins</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-AGGREGATION">15.3.3. Parallel Aggregation</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-APPEND">15.3.4. Parallel Append</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-PLAN-TIPS">15.3.5. Parallel Plan Tips</a></span></dt></dl></div><p> + Because each worker executes the parallel portion of the plan to + completion, it is not possible to simply take an ordinary query plan + and run it using multiple workers. Each worker would produce a full + copy of the output result set, so the query would not run any faster + than normal but would produce incorrect results. Instead, the parallel + portion of the plan must be what is known internally to the query + optimizer as a <em class="firstterm">partial plan</em>; that is, it must be constructed + so that each process that executes the plan will generate only a + subset of the output rows in such a way that each required output row + is guaranteed to be generated by exactly one of the cooperating processes. + Generally, this means that the scan on the driving table of the query + must be a parallel-aware scan. + </p><div class="sect2" id="PARALLEL-SCANS"><div class="titlepage"><div><div><h3 class="title">15.3.1. Parallel Scans</h3></div></div></div><p> + The following types of parallel-aware table scans are currently supported. + + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + In a <span class="emphasis"><em>parallel sequential scan</em></span>, the table's blocks will + be divided among the cooperating processes. 
Blocks are handed out one + at a time, so that access to the table remains sequential. + </p></li><li class="listitem"><p> + In a <span class="emphasis"><em>parallel bitmap heap scan</em></span>, one process is chosen + as the leader. That process performs a scan of one or more indexes + and builds a bitmap indicating which table blocks need to be visited. + These blocks are then divided among the cooperating processes as in + a parallel sequential scan. In other words, the heap scan is performed + in parallel, but the underlying index scan is not. + </p></li><li class="listitem"><p> + In a <span class="emphasis"><em>parallel index scan</em></span> or <span class="emphasis"><em>parallel index-only + scan</em></span>, the cooperating processes take turns reading data from the + index. Currently, parallel index scans are supported only for + btree indexes. Each process will claim a single index block and will + scan and return all tuples referenced by that block; other processes can + at the same time be returning tuples from a different index block. + The results of a parallel btree scan are returned in sorted order + within each worker process. + </p></li></ul></div><p> + + Other scan types, such as scans of non-btree indexes, may support + parallel scans in the future. + </p></div><div class="sect2" id="PARALLEL-JOINS"><div class="titlepage"><div><div><h3 class="title">15.3.2. Parallel Joins</h3></div></div></div><p> + Just as in a non-parallel plan, the driving table may be joined to one or + more other tables using a nested loop, hash join, or merge join. The + inner side of the join may be any kind of non-parallel plan that is + otherwise supported by the planner provided that it is safe to run within + a parallel worker. Depending on the join type, the inner side may also be + a parallel plan. 
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + In a <span class="emphasis"><em>nested loop join</em></span>, the inner side is always + non-parallel. Although it is executed in full, this is efficient if + the inner side is an index scan, because the outer tuples and thus + the loops that look up values in the index are divided over the + cooperating processes. + </p></li><li class="listitem"><p> + In a <span class="emphasis"><em>merge join</em></span>, the inner side is always + a non-parallel plan and therefore executed in full. This may be + inefficient, especially if a sort must be performed, because the work + and resulting data are duplicated in every cooperating process. + </p></li><li class="listitem"><p> + In a <span class="emphasis"><em>hash join</em></span> (without the "parallel" prefix), + the inner side is executed in full by every cooperating process + to build identical copies of the hash table. This may be inefficient + if the hash table is large or the plan is expensive. In a + <span class="emphasis"><em>parallel hash join</em></span>, the inner side is a + <span class="emphasis"><em>parallel hash</em></span> that divides the work of building + a shared hash table over the cooperating processes. + </p></li></ul></div></div><div class="sect2" id="PARALLEL-AGGREGATION"><div class="titlepage"><div><div><h3 class="title">15.3.3. Parallel Aggregation</h3></div></div></div><p> + <span class="productname">PostgreSQL</span> supports parallel aggregation by aggregating in + two stages. First, each process participating in the parallel portion of + the query performs an aggregation step, producing a partial result for + each group of which that process is aware. This is reflected in the plan + as a <code class="literal">Partial Aggregate</code> node. 
Second, the partial results are + transferred to the leader via <code class="literal">Gather</code> or <code class="literal">Gather + Merge</code>. Finally, the leader re-aggregates the results across all + workers in order to produce the final result. This is reflected in the + plan as a <code class="literal">Finalize Aggregate</code> node. + </p><p> + Because the <code class="literal">Finalize Aggregate</code> node runs on the leader + process, queries that produce a relatively large number of groups in + comparison to the number of input rows will appear less favorable to the + query planner. For example, in the worst-case scenario the number of + groups seen by the <code class="literal">Finalize Aggregate</code> node could be as many as + the number of input rows that were seen by all worker processes in the + <code class="literal">Partial Aggregate</code> stage. For such cases, there is clearly + going to be no performance benefit to using parallel aggregation. The + query planner takes this into account during the planning process and is + unlikely to choose parallel aggregate in this scenario. + </p><p> + Parallel aggregation is not supported in all situations. Each aggregate + must be <a class="link" href="parallel-safety.html" title="15.4. Parallel Safety">safe</a> for parallelism and must + have a combine function. If the aggregate has a transition state of type + <code class="literal">internal</code>, it must have serialization and deserialization + functions. See <a class="xref" href="sql-createaggregate.html" title="CREATE AGGREGATE"><span class="refentrytitle">CREATE AGGREGATE</span></a> for more details. + Parallel aggregation is not supported if any aggregate function call + contains <code class="literal">DISTINCT</code> or <code class="literal">ORDER BY</code> clause and is also + not supported for ordered set aggregates or when the query involves + <code class="literal">GROUPING SETS</code>. 
It can only be used when all joins involved in + the query are also part of the parallel portion of the plan. + </p></div><div class="sect2" id="PARALLEL-APPEND"><div class="titlepage"><div><div><h3 class="title">15.3.4. Parallel Append</h3></div></div></div><p> + Whenever <span class="productname">PostgreSQL</span> needs to combine rows + from multiple sources into a single result set, it uses an + <code class="literal">Append</code> or <code class="literal">MergeAppend</code> plan node. + This commonly happens when implementing <code class="literal">UNION ALL</code> or + when scanning a partitioned table. Such nodes can be used in parallel + plans just as they can in any other plan. However, in a parallel plan, + the planner may instead use a <code class="literal">Parallel Append</code> node. + </p><p> + When an <code class="literal">Append</code> node is used in a parallel plan, each + process will execute the child plans in the order in which they appear, + so that all participating processes cooperate to execute the first child + plan until it is complete and then move to the second plan at around the + same time. When a <code class="literal">Parallel Append</code> is used instead, the + executor will instead spread out the participating processes as evenly as + possible across its child plans, so that multiple child plans are executed + simultaneously. This avoids contention, and also avoids paying the startup + cost of a child plan in those processes that never execute it. + </p><p> + Also, unlike a regular <code class="literal">Append</code> node, which can only have + partial children when used within a parallel plan, a <code class="literal">Parallel + Append</code> node can have both partial and non-partial child plans. + Non-partial children will be scanned by only a single process, since + scanning them more than once would produce duplicate results. 
Plans that + involve appending multiple results sets can therefore achieve + coarse-grained parallelism even when efficient partial plans are not + available. For example, consider a query against a partitioned table + that can only be implemented efficiently by using an index that does + not support parallel scans. The planner might choose a <code class="literal">Parallel + Append</code> of regular <code class="literal">Index Scan</code> plans; each + individual index scan would have to be executed to completion by a single + process, but different scans could be performed at the same time by + different processes. + </p><p> + <a class="xref" href="runtime-config-query.html#GUC-ENABLE-PARALLEL-APPEND">enable_parallel_append</a> can be used to disable + this feature. + </p></div><div class="sect2" id="PARALLEL-PLAN-TIPS"><div class="titlepage"><div><div><h3 class="title">15.3.5. Parallel Plan Tips</h3></div></div></div><p> + If a query that is expected to do so does not produce a parallel plan, + you can try reducing <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-SETUP-COST">parallel_setup_cost</a> or + <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-TUPLE-COST">parallel_tuple_cost</a>. Of course, this plan may turn + out to be slower than the serial plan that the planner preferred, but + this will not always be the case. If you don't get a parallel + plan even with very small values of these settings (e.g., after setting + them both to zero), there may be some reason why the query planner is + unable to generate a parallel plan for your query. See + <a class="xref" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Section 15.2</a> and + <a class="xref" href="parallel-safety.html" title="15.4. Parallel Safety">Section 15.4</a> for information on why this may be + the case. 
+ </p><p> + When executing a parallel plan, you can use <code class="literal">EXPLAIN (ANALYZE, + VERBOSE)</code> to display per-worker statistics for each plan node. + This may be useful in determining whether the work is being evenly + distributed between all plan nodes and more generally in understanding the + performance characteristics of the plan. + </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr><tr><td width="40%" align="left" valign="top">15.2. When Can Parallel Query Be Used? </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 15.4. Parallel Safety</td></tr></table></div></body></html>
\ No newline at end of file