summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/parallel-plans.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:15:05 +0000
commit46651ce6fe013220ed397add242004d764fc0153 (patch)
tree6e5299f990f88e60174a1d3ae6e48eedd2688b2b /doc/src/sgml/html/parallel-plans.html
parentInitial commit. (diff)
downloadpostgresql-14-46651ce6fe013220ed397add242004d764fc0153.tar.xz
postgresql-14-46651ce6fe013220ed397add242004d764fc0153.zip
Adding upstream version 14.5.upstream/14.5upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/parallel-plans.html')
-rw-r--r--doc/src/sgml/html/parallel-plans.html154
1 files changed, 154 insertions, 0 deletions
diff --git a/doc/src/sgml/html/parallel-plans.html b/doc/src/sgml/html/parallel-plans.html
new file mode 100644
index 0000000..11eb3ab
--- /dev/null
+++ b/doc/src/sgml/html/parallel-plans.html
@@ -0,0 +1,154 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>15.3. Parallel Plans</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?" /><link rel="next" href="parallel-safety.html" title="15.4. Parallel Safety" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">15.3. Parallel Plans</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><th width="60%" align="center">Chapter 15. Parallel Query</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="PARALLEL-PLANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">15.3. Parallel Plans</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-SCANS">15.3.1. Parallel Scans</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-JOINS">15.3.2. Parallel Joins</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-AGGREGATION">15.3.3. Parallel Aggregation</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-APPEND">15.3.4. Parallel Append</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-PLAN-TIPS">15.3.5. Parallel Plan Tips</a></span></dt></dl></div><p>
+ Because each worker executes the parallel portion of the plan to
+ completion, it is not possible to simply take an ordinary query plan
+ and run it using multiple workers. Each worker would produce a full
+ copy of the output result set, so the query would not run any faster
+ than normal but would produce incorrect results. Instead, the parallel
+ portion of the plan must be what is known internally to the query
+ optimizer as a <em class="firstterm">partial plan</em>; that is, it must be constructed
+ so that each process that executes the plan will generate only a
+ subset of the output rows in such a way that each required output row
+ is guaranteed to be generated by exactly one of the cooperating processes.
+ Generally, this means that the scan on the driving table of the query
+ must be a parallel-aware scan.
+ </p><div class="sect2" id="PARALLEL-SCANS"><div class="titlepage"><div><div><h3 class="title">15.3.1. Parallel Scans</h3></div></div></div><p>
+ The following types of parallel-aware table scans are currently supported.
+
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ In a <span class="emphasis"><em>parallel sequential scan</em></span>, the table's blocks will
+ be divided among the cooperating processes. Blocks are handed out one
+ at a time, so that access to the table remains sequential.
+ </p></li><li class="listitem"><p>
+ In a <span class="emphasis"><em>parallel bitmap heap scan</em></span>, one process is chosen
+ as the leader. That process performs a scan of one or more indexes
+ and builds a bitmap indicating which table blocks need to be visited.
+ These blocks are then divided among the cooperating processes as in
+ a parallel sequential scan. In other words, the heap scan is performed
+ in parallel, but the underlying index scan is not.
+ </p></li><li class="listitem"><p>
+ In a <span class="emphasis"><em>parallel index scan</em></span> or <span class="emphasis"><em>parallel index-only
+ scan</em></span>, the cooperating processes take turns reading data from the
+ index. Currently, parallel index scans are supported only for
+ btree indexes. Each process will claim a single index block and will
+ scan and return all tuples referenced by that block; other processes can
+ at the same time be returning tuples from a different index block.
+ The results of a parallel btree scan are returned in sorted order
+ within each worker process.
+ </p></li></ul></div><p>
+
+ Other scan types, such as scans of non-btree indexes, may support
+ parallel scans in the future.
+ </p></div><div class="sect2" id="PARALLEL-JOINS"><div class="titlepage"><div><div><h3 class="title">15.3.2. Parallel Joins</h3></div></div></div><p>
+ Just as in a non-parallel plan, the driving table may be joined to one or
+ more other tables using a nested loop, hash join, or merge join. The
+ inner side of the join may be any kind of non-parallel plan that is
+ otherwise supported by the planner provided that it is safe to run within
+ a parallel worker. Depending on the join type, the inner side may also be
+ a parallel plan.
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ In a <span class="emphasis"><em>nested loop join</em></span>, the inner side is always
+ non-parallel. Although it is executed in full, this is efficient if
+ the inner side is an index scan, because the outer tuples and thus
+ the loops that look up values in the index are divided over the
+ cooperating processes.
+ </p></li><li class="listitem"><p>
+ In a <span class="emphasis"><em>merge join</em></span>, the inner side is always
+ a non-parallel plan and therefore executed in full. This may be
+ inefficient, especially if a sort must be performed, because the work
+ and resulting data are duplicated in every cooperating process.
+ </p></li><li class="listitem"><p>
+ In a <span class="emphasis"><em>hash join</em></span> (without the "parallel" prefix),
+ the inner side is executed in full by every cooperating process
+ to build identical copies of the hash table. This may be inefficient
+ if the hash table is large or the plan is expensive. In a
+ <span class="emphasis"><em>parallel hash join</em></span>, the inner side is a
+ <span class="emphasis"><em>parallel hash</em></span> that divides the work of building
+ a shared hash table over the cooperating processes.
+ </p></li></ul></div></div><div class="sect2" id="PARALLEL-AGGREGATION"><div class="titlepage"><div><div><h3 class="title">15.3.3. Parallel Aggregation</h3></div></div></div><p>
+ <span class="productname">PostgreSQL</span> supports parallel aggregation by aggregating in
+ two stages. First, each process participating in the parallel portion of
+ the query performs an aggregation step, producing a partial result for
+ each group of which that process is aware. This is reflected in the plan
+ as a <code class="literal">Partial Aggregate</code> node. Second, the partial results are
+ transferred to the leader via <code class="literal">Gather</code> or <code class="literal">Gather
+ Merge</code>. Finally, the leader re-aggregates the results across all
+ workers in order to produce the final result. This is reflected in the
+ plan as a <code class="literal">Finalize Aggregate</code> node.
+ </p><p>
+ Because the <code class="literal">Finalize Aggregate</code> node runs on the leader
+ process, queries that produce a relatively large number of groups in
+ comparison to the number of input rows will appear less favorable to the
+ query planner. For example, in the worst-case scenario the number of
+ groups seen by the <code class="literal">Finalize Aggregate</code> node could be as many as
+ the number of input rows that were seen by all worker processes in the
+ <code class="literal">Partial Aggregate</code> stage. For such cases, there is clearly
+ going to be no performance benefit to using parallel aggregation. The
+ query planner takes this into account during the planning process and is
+ unlikely to choose parallel aggregate in this scenario.
+ </p><p>
+ Parallel aggregation is not supported in all situations. Each aggregate
+ must be <a class="link" href="parallel-safety.html" title="15.4. Parallel Safety">safe</a> for parallelism and must
+ have a combine function. If the aggregate has a transition state of type
+ <code class="literal">internal</code>, it must have serialization and deserialization
+ functions. See <a class="xref" href="sql-createaggregate.html" title="CREATE AGGREGATE"><span class="refentrytitle">CREATE AGGREGATE</span></a> for more details.
+ Parallel aggregation is not supported if any aggregate function call
+ contains <code class="literal">DISTINCT</code> or <code class="literal">ORDER BY</code> clause and is also
+ not supported for ordered set aggregates or when the query involves
+ <code class="literal">GROUPING SETS</code>. It can only be used when all joins involved in
+ the query are also part of the parallel portion of the plan.
+ </p></div><div class="sect2" id="PARALLEL-APPEND"><div class="titlepage"><div><div><h3 class="title">15.3.4. Parallel Append</h3></div></div></div><p>
+ Whenever <span class="productname">PostgreSQL</span> needs to combine rows
+ from multiple sources into a single result set, it uses an
+ <code class="literal">Append</code> or <code class="literal">MergeAppend</code> plan node.
+ This commonly happens when implementing <code class="literal">UNION ALL</code> or
+ when scanning a partitioned table. Such nodes can be used in parallel
+ plans just as they can in any other plan. However, in a parallel plan,
+ the planner may instead use a <code class="literal">Parallel Append</code> node.
+ </p><p>
+ When an <code class="literal">Append</code> node is used in a parallel plan, each
+ process will execute the child plans in the order in which they appear,
+ so that all participating processes cooperate to execute the first child
+ plan until it is complete and then move to the second plan at around the
+ same time. When a <code class="literal">Parallel Append</code> is used instead, the
+ executor will instead spread out the participating processes as evenly as
+ possible across its child plans, so that multiple child plans are executed
+ simultaneously. This avoids contention, and also avoids paying the startup
+ cost of a child plan in those processes that never execute it.
+ </p><p>
+ Also, unlike a regular <code class="literal">Append</code> node, which can only have
+ partial children when used within a parallel plan, a <code class="literal">Parallel
+ Append</code> node can have both partial and non-partial child plans.
+ Non-partial children will be scanned by only a single process, since
+ scanning them more than once would produce duplicate results. Plans that
+ involve appending multiple results sets can therefore achieve
+ coarse-grained parallelism even when efficient partial plans are not
+ available. For example, consider a query against a partitioned table
+ that can only be implemented efficiently by using an index that does
+ not support parallel scans. The planner might choose a <code class="literal">Parallel
+ Append</code> of regular <code class="literal">Index Scan</code> plans; each
+ individual index scan would have to be executed to completion by a single
+ process, but different scans could be performed at the same time by
+ different processes.
+ </p><p>
+ <a class="xref" href="runtime-config-query.html#GUC-ENABLE-PARALLEL-APPEND">enable_parallel_append</a> can be used to disable
+ this feature.
+ </p></div><div class="sect2" id="PARALLEL-PLAN-TIPS"><div class="titlepage"><div><div><h3 class="title">15.3.5. Parallel Plan Tips</h3></div></div></div><p>
+ If a query that is expected to do so does not produce a parallel plan,
+ you can try reducing <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-SETUP-COST">parallel_setup_cost</a> or
+ <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-TUPLE-COST">parallel_tuple_cost</a>. Of course, this plan may turn
+ out to be slower than the serial plan that the planner preferred, but
+ this will not always be the case. If you don't get a parallel
+ plan even with very small values of these settings (e.g., after setting
+ them both to zero), there may be some reason why the query planner is
+ unable to generate a parallel plan for your query. See
+ <a class="xref" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Section 15.2</a> and
+ <a class="xref" href="parallel-safety.html" title="15.4. Parallel Safety">Section 15.4</a> for information on why this may be
+ the case.
+ </p><p>
+ When executing a parallel plan, you can use <code class="literal">EXPLAIN (ANALYZE,
+ VERBOSE)</code> to display per-worker statistics for each plan node.
+ This may be useful in determining whether the work is being evenly
+ distributed between all plan nodes and more generally in understanding the
+ performance characteristics of the plan.
+ </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr><tr><td width="40%" align="left" valign="top">15.2. When Can Parallel Query Be Used? </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 15.4. Parallel Safety</td></tr></table></div></body></html> \ No newline at end of file