Diffstat (limited to 'doc/src/sgml/parallel.sgml')
-rw-r--r--  doc/src/sgml/parallel.sgml  582
1 files changed, 582 insertions, 0 deletions
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
new file mode 100644
index 0000000..c81abff
--- /dev/null
+++ b/doc/src/sgml/parallel.sgml
@@ -0,0 +1,582 @@
+<!-- doc/src/sgml/parallel.sgml -->
+
+ <chapter id="parallel-query">
+ <title>Parallel Query</title>
+
+ <indexterm zone="parallel-query">
+ <primary>parallel query</primary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> can devise query plans which can leverage
+ multiple CPUs in order to answer queries faster. This feature is known
+ as parallel query. Many queries cannot benefit from parallel query, either
+ due to limitations of the current implementation or because there is no
+ imaginable query plan which is any faster than the serial query plan.
+ However, for queries that can benefit, the speedup from parallel query
+ is often very significant. Many queries can run more than twice as fast
+ when using parallel query, and some queries can run four times faster or
+ even more. Queries that touch a large amount of data but return only a
+ few rows to the user will typically benefit most. This chapter explains
+ some details of how parallel query works and in which situations it can be
+ used so that users who wish to make use of it can understand what to expect.
+ </para>
+
+ <sect1 id="how-parallel-query-works">
+ <title>How Parallel Query Works</title>
+
+ <para>
+ When the optimizer determines that parallel query is the fastest execution
+ strategy for a particular query, it will create a query plan which includes
+ a <firstterm>Gather</firstterm> or <firstterm>Gather Merge</firstterm>
+ node. Here is a simple example:
+
+<screen>
+EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
+ QUERY PLAN
+-------------------------------------------------------------------&zwsp;------------------
+ Gather (cost=1000.00..217018.43 rows=1 width=97)
+ Workers Planned: 2
+ -> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97)
+ Filter: (filler ~~ '%x%'::text)
+(4 rows)
+</screen>
+ </para>
+
+ <para>
+ In all cases, the <literal>Gather</literal> or
+ <literal>Gather Merge</literal> node will have exactly one
+ child plan, which is the portion of the plan that will be executed in
+ parallel. If the <literal>Gather</literal> or <literal>Gather Merge</literal> node is
+ at the very top of the plan tree, then the entire query will execute in
+ parallel. If it is somewhere else in the plan tree, then only the portion
+ of the plan below it will run in parallel. In the example above, the
+ query accesses only one table, so there is only one plan node other than
+ the <literal>Gather</literal> node itself; since that plan node is a child of the
+ <literal>Gather</literal> node, it will run in parallel.
+ </para>
+
+ <para>
+ <link linkend="using-explain">Using EXPLAIN</link>, you can see the number of
+ workers chosen by the planner. When the <literal>Gather</literal> node is reached
+ during query execution, the process which is implementing the user's
+ session will request a number of <link linkend="bgworker">background
+ worker processes</link> equal to the number
+ of workers chosen by the planner. The number of background workers that
+ the planner will consider using is limited to at most
+ <xref linkend="guc-max-parallel-workers-per-gather"/>. The total number
+ of background workers that can exist at any one time is limited by both
+ <xref linkend="guc-max-worker-processes"/> and
+ <xref linkend="guc-max-parallel-workers"/>. Therefore, it is possible for a
+ parallel query to run with fewer workers than planned, or even with
+ no workers at all. The optimal plan may depend on the number of workers
+ that are available, so this can result in poor query performance. If this
+ occurrence is frequent, consider increasing
+ <varname>max_worker_processes</varname> and <varname>max_parallel_workers</varname>
+    so that more workers can be run simultaneously, or alternatively reducing
+ <varname>max_parallel_workers_per_gather</varname> so that the planner
+ requests fewer workers.
+ </para>
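+
+   <para>
+    <literal>EXPLAIN ANALYZE</literal> reports both the number of workers that
+    were planned and the number that were actually launched, which makes it
+    easy to spot queries that had to fall back to running with fewer workers
+    than planned.  The following output is abridged and illustrative only
+    (costs and timing are suppressed, and the row counts will vary):
+
+<screen>
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
+    SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
+                        QUERY PLAN
+-----------------------------------------------------------------
+ Gather (actual rows=0 loops=1)
+   Workers Planned: 2
+   Workers Launched: 0
+   ->  Parallel Seq Scan on pgbench_accounts (actual rows=0 loops=1)
+         Filter: (filler ~~ '%x%'::text)
+</screen>
+   </para>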
+
+ <para>
+ Every background worker process which is successfully started for a given
+ parallel query will execute the parallel portion of the plan. The leader
+ will also execute that portion of the plan, but it has an additional
+ responsibility: it must also read all of the tuples generated by the
+ workers. When the parallel portion of the plan generates only a small
+ number of tuples, the leader will often behave very much like an additional
+ worker, speeding up query execution. Conversely, when the parallel portion
+ of the plan generates a large number of tuples, the leader may be almost
+ entirely occupied with reading the tuples generated by the workers and
+ performing any further processing steps which are required by plan nodes
+ above the level of the <literal>Gather</literal> node or
+ <literal>Gather Merge</literal> node. In such cases, the leader will
+ do very little of the work of executing the parallel portion of the plan.
+ </para>
+
+ <para>
+ When the node at the top of the parallel portion of the plan is
+ <literal>Gather Merge</literal> rather than <literal>Gather</literal>, it indicates that
+ each process executing the parallel portion of the plan is producing
+ tuples in sorted order, and that the leader is performing an
+ order-preserving merge. In contrast, <literal>Gather</literal> reads tuples
+ from the workers in whatever order is convenient, destroying any sort
+ order that may have existed.
+ </para>
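+
+   <para>
+    For example, a <literal>Gather Merge</literal> plan for a query that sorts
+    a large table on an unindexed column might have the following shape.  This
+    is only an illustrative sketch with costs suppressed; the plan actually
+    chosen depends on the data, the indexes, and the configuration.
+
+<screen>
+EXPLAIN (COSTS OFF) SELECT * FROM pgbench_accounts ORDER BY bid;
+                   QUERY PLAN
+------------------------------------------------
+ Gather Merge
+   Workers Planned: 2
+   ->  Sort
+         Sort Key: bid
+         ->  Parallel Seq Scan on pgbench_accounts
+</screen>
+   </para>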
+ </sect1>
+
+ <sect1 id="when-can-parallel-query-be-used">
+ <title>When Can Parallel Query Be Used?</title>
+
+ <para>
+ There are several settings which can cause the query planner not to
+ generate a parallel query plan under any circumstances. In order for
+ any parallel query plans whatsoever to be generated, the following
+ settings must be configured as indicated.
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <xref linkend="guc-max-parallel-workers-per-gather"/> must be set to a
+ value which is greater than zero. This is a special case of the more
+ general principle that no more workers should be used than the number
+ configured via <varname>max_parallel_workers_per_gather</varname>.
+ </para>
+ </listitem>
+ </itemizedlist>
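+
+   <para>
+    For example, to allow the planner to consider parallel plans with up to
+    four workers per <literal>Gather</literal> node in the current session
+    (a minimal sketch; an appropriate value depends on the available CPUs):
+
+<screen>
+SET max_parallel_workers_per_gather = 4;
+</screen>
+   </para>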
+
+ <para>
+    In addition, the system must not be running in single-user mode. Since
+    the entire database system is running as a single process in this situation,
+ no background workers will be available.
+ </para>
+
+ <para>
+ Even when it is in general possible for parallel query plans to be
+ generated, the planner will not generate them for a given query
+ if any of the following are true:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ The query writes any data or locks any database rows. If a query
+ contains a data-modifying operation either at the top level or within
+ a CTE, no parallel plans for that query will be generated. As an
+ exception, the commands <literal>CREATE TABLE ... AS</literal>, <literal>SELECT
+ INTO</literal>, and <literal>CREATE MATERIALIZED VIEW</literal> which create a new
+ table and populate it can use a parallel plan.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The query might be suspended during execution. In any situation in
+ which the system thinks that partial or incremental execution might
+ occur, no parallel plan is generated. For example, a cursor created
+ using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
+ a parallel plan. Similarly, a PL/pgSQL loop of the form
+ <literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
+ parallel plan, because the parallel query system is unable to verify
+ that the code in the loop is safe to execute while parallel query is
+ active.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
+ Most system-defined functions are <literal>PARALLEL SAFE</literal>,
+ but user-defined functions are marked <literal>PARALLEL
+ UNSAFE</literal> by default. See the discussion of
+ <xref linkend="parallel-safety"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The query is running inside of another query that is already parallel.
+ For example, if a function called by a parallel query issues an SQL
+ query itself, that query will never use a parallel plan. This is a
+ limitation of the current implementation, but it may not be desirable
+ to remove this limitation, since it could result in a single query
+ using a very large number of processes.
+ </para>
+ </listitem>
+ </itemizedlist>
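+
+   <para>
+    As an illustrative sketch of the first restriction above (the table and
+    column names here are hypothetical), a query containing a data-modifying
+    CTE will never be planned in parallel, whereas the <literal>SELECT</literal>
+    part of <literal>CREATE TABLE ... AS</literal> can be:
+
+<screen>
+-- never planned in parallel: contains a data-modifying CTE
+WITH moved AS (
+    DELETE FROM events WHERE archived RETURNING *
+)
+SELECT count(*) FROM moved;
+
+-- the SELECT feeding CREATE TABLE ... AS may use a parallel plan
+CREATE TABLE active_events AS
+    SELECT * FROM events WHERE NOT archived;
+</screen>
+   </para>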
+
+ <para>
+    Even when a parallel query plan is generated for a particular query, there
+ are several circumstances under which it will be impossible to execute
+ that plan in parallel at execution time. If this occurs, the leader
+ will execute the portion of the plan below the <literal>Gather</literal>
+ node entirely by itself, almost as if the <literal>Gather</literal> node were
+ not present. This will happen if any of the following conditions are met:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ No background workers can be obtained because of the limitation that
+ the total number of background workers cannot exceed
+ <xref linkend="guc-max-worker-processes"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ No background workers can be obtained because of the limitation that
+ the total number of background workers launched for purposes of
+ parallel query cannot exceed <xref linkend="guc-max-parallel-workers"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The client sends an Execute message with a non-zero fetch count.
+ See the discussion of the
+ <link linkend="protocol-flow-ext-query">extended query protocol</link>.
+ Since <link linkend="libpq">libpq</link> currently provides no way to
+ send such a message, this can only occur when using a client that
+ does not rely on libpq. If this is a frequent
+ occurrence, it may be a good idea to set
+ <xref linkend="guc-max-parallel-workers-per-gather"/> to zero in
+ sessions where it is likely, so as to avoid generating query plans
+ that may be suboptimal when run serially.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect1>
+
+ <sect1 id="parallel-plans">
+ <title>Parallel Plans</title>
+
+ <para>
+ Because each worker executes the parallel portion of the plan to
+ completion, it is not possible to simply take an ordinary query plan
+ and run it using multiple workers. Each worker would produce a full
+ copy of the output result set, so the query would not run any faster
+ than normal but would produce incorrect results. Instead, the parallel
+ portion of the plan must be what is known internally to the query
+ optimizer as a <firstterm>partial plan</firstterm>; that is, it must be constructed
+ so that each process which executes the plan will generate only a
+ subset of the output rows in such a way that each required output row
+ is guaranteed to be generated by exactly one of the cooperating processes.
+ Generally, this means that the scan on the driving table of the query
+ must be a parallel-aware scan.
+ </para>
+
+ <sect2 id="parallel-scans">
+ <title>Parallel Scans</title>
+
+ <para>
+ The following types of parallel-aware table scans are currently supported.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ In a <emphasis>parallel sequential scan</emphasis>, the table's blocks will
+ be divided among the cooperating processes. Blocks are handed out one
+ at a time, so that access to the table remains sequential.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>parallel bitmap heap scan</emphasis>, one process is chosen
+ as the leader. That process performs a scan of one or more indexes
+ and builds a bitmap indicating which table blocks need to be visited.
+ These blocks are then divided among the cooperating processes as in
+ a parallel sequential scan. In other words, the heap scan is performed
+ in parallel, but the underlying index scan is not.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>parallel index scan</emphasis> or <emphasis>parallel index-only
+ scan</emphasis>, the cooperating processes take turns reading data from the
+ index. Currently, parallel index scans are supported only for
+ btree indexes. Each process will claim a single index block and will
+ scan and return all tuples referenced by that block; other processes can
+ at the same time be returning tuples from a different index block.
+ The results of a parallel btree scan are returned in sorted order
+ within each worker process.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ Other scan types, such as scans of non-btree indexes, may support
+ parallel scans in the future.
+ </para>
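+
+    <para>
+     For example, a parallel bitmap heap scan might appear in
+     <literal>EXPLAIN</literal> output as follows.  This is an illustrative
+     sketch with costs suppressed; it assumes an index on
+     <literal>bid</literal> (the index name shown is hypothetical), and the
+     plan actually chosen depends on the data and configuration.
+
+<screen>
+EXPLAIN (COSTS OFF) SELECT * FROM pgbench_accounts WHERE bid = 42;
+                      QUERY PLAN
+------------------------------------------------------
+ Gather
+   Workers Planned: 2
+   ->  Parallel Bitmap Heap Scan on pgbench_accounts
+         Recheck Cond: (bid = 42)
+         ->  Bitmap Index Scan on pgbench_accounts_bid_idx
+               Index Cond: (bid = 42)
+</screen>
+    </para>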
+ </sect2>
+
+ <sect2 id="parallel-joins">
+ <title>Parallel Joins</title>
+
+ <para>
+ Just as in a non-parallel plan, the driving table may be joined to one or
+ more other tables using a nested loop, hash join, or merge join. The
+ inner side of the join may be any kind of non-parallel plan that is
+ otherwise supported by the planner provided that it is safe to run within
+ a parallel worker. Depending on the join type, the inner side may also be
+ a parallel plan.
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ In a <emphasis>nested loop join</emphasis>, the inner side is always
+ non-parallel. Although it is executed in full, this is efficient if
+ the inner side is an index scan, because the outer tuples and thus
+ the loops that look up values in the index are divided over the
+ cooperating processes.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>merge join</emphasis>, the inner side is always
+ a non-parallel plan and therefore executed in full. This may be
+ inefficient, especially if a sort must be performed, because the work
+ and resulting data are duplicated in every cooperating process.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ In a <emphasis>hash join</emphasis> (without the "parallel" prefix),
+ the inner side is executed in full by every cooperating process
+ to build identical copies of the hash table. This may be inefficient
+ if the hash table is large or the plan is expensive. In a
+ <emphasis>parallel hash join</emphasis>, the inner side is a
+ <emphasis>parallel hash</emphasis> that divides the work of building
+ a shared hash table over the cooperating processes.
+ </para>
+ </listitem>
+ </itemizedlist>
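+
+    <para>
+     For example, a parallel hash join might appear in
+     <literal>EXPLAIN</literal> output with the following shape.  This is an
+     illustrative sketch using hypothetical tables <literal>orders</literal>
+     and <literal>order_lines</literal>, with costs suppressed; the join
+     strategy actually chosen depends on table sizes, statistics, and settings.
+
+<screen>
+EXPLAIN (COSTS OFF)
+    SELECT * FROM orders o JOIN order_lines l ON o.order_id = l.order_id;
+                  QUERY PLAN
+----------------------------------------------
+ Gather
+   Workers Planned: 2
+   ->  Parallel Hash Join
+         Hash Cond: (l.order_id = o.order_id)
+         ->  Parallel Seq Scan on order_lines l
+         ->  Parallel Hash
+               ->  Parallel Seq Scan on orders o
+</screen>
+    </para>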
+ </sect2>
+
+ <sect2 id="parallel-aggregation">
+ <title>Parallel Aggregation</title>
+ <para>
+ <productname>PostgreSQL</productname> supports parallel aggregation by aggregating in
+ two stages. First, each process participating in the parallel portion of
+ the query performs an aggregation step, producing a partial result for
+ each group of which that process is aware. This is reflected in the plan
+ as a <literal>Partial Aggregate</literal> node. Second, the partial results are
+ transferred to the leader via <literal>Gather</literal> or <literal>Gather
+ Merge</literal>. Finally, the leader re-aggregates the results across all
+ workers in order to produce the final result. This is reflected in the
+ plan as a <literal>Finalize Aggregate</literal> node.
+ </para>
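+
+    <para>
+     For example, a simple parallel aggregation might appear in
+     <literal>EXPLAIN</literal> output as follows (an illustrative sketch with
+     costs suppressed; the plan actually chosen depends on the data and
+     configuration):
+
+<screen>
+EXPLAIN (COSTS OFF) SELECT count(*) FROM pgbench_accounts;
+                      QUERY PLAN
+------------------------------------------------------
+ Finalize Aggregate
+   ->  Gather
+         Workers Planned: 2
+         ->  Partial Aggregate
+               ->  Parallel Seq Scan on pgbench_accounts
+</screen>
+    </para>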
+
+ <para>
+ Because the <literal>Finalize Aggregate</literal> node runs on the leader
+ process, queries which produce a relatively large number of groups in
+ comparison to the number of input rows will appear less favorable to the
+ query planner. For example, in the worst-case scenario the number of
+ groups seen by the <literal>Finalize Aggregate</literal> node could be as many as
+ the number of input rows which were seen by all worker processes in the
+ <literal>Partial Aggregate</literal> stage. For such cases, there is clearly
+ going to be no performance benefit to using parallel aggregation. The
+ query planner takes this into account during the planning process and is
+ unlikely to choose parallel aggregate in this scenario.
+ </para>
+
+ <para>
+ Parallel aggregation is not supported in all situations. Each aggregate
+ must be <link linkend="parallel-safety">safe</link> for parallelism and must
+ have a combine function. If the aggregate has a transition state of type
+ <literal>internal</literal>, it must have serialization and deserialization
+ functions. See <xref linkend="sql-createaggregate"/> for more details.
+ Parallel aggregation is not supported if any aggregate function call
+     contains a <literal>DISTINCT</literal> or <literal>ORDER BY</literal> clause and is also
+ not supported for ordered set aggregates or when the query involves
+ <literal>GROUPING SETS</literal>. It can only be used when all joins involved in
+ the query are also part of the parallel portion of the plan.
+ </para>
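+
+    <para>
+     For example, a user-defined aggregate can be given a combine function and
+     marked parallel safe when it is created.  The following minimal sketch
+     reuses the built-in <literal>int4pl</literal> function as both transition
+     and combine function; the aggregate name is hypothetical.
+
+<screen>
+CREATE AGGREGATE my_sum (integer) (
+    SFUNC = int4pl,
+    STYPE = integer,
+    COMBINEFUNC = int4pl,
+    PARALLEL = SAFE
+);
+</screen>
+    </para>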
+
+ </sect2>
+
+ <sect2 id="parallel-append">
+ <title>Parallel Append</title>
+
+ <para>
+ Whenever <productname>PostgreSQL</productname> needs to combine rows
+ from multiple sources into a single result set, it uses an
+ <literal>Append</literal> or <literal>MergeAppend</literal> plan node.
+ This commonly happens when implementing <literal>UNION ALL</literal> or
+ when scanning a partitioned table. Such nodes can be used in parallel
+ plans just as they can in any other plan. However, in a parallel plan,
+ the planner may instead use a <literal>Parallel Append</literal> node.
+ </para>
+
+ <para>
+ When an <literal>Append</literal> node is used in a parallel plan, each
+ process will execute the child plans in the order in which they appear,
+ so that all participating processes cooperate to execute the first child
+ plan until it is complete and then move to the second plan at around the
+ same time. When a <literal>Parallel Append</literal> is used instead, the
+ executor will instead spread out the participating processes as evenly as
+ possible across its child plans, so that multiple child plans are executed
+ simultaneously. This avoids contention, and also avoids paying the startup
+ cost of a child plan in those processes that never execute it.
+ </para>
+
+ <para>
+ Also, unlike a regular <literal>Append</literal> node, which can only have
+ partial children when used within a parallel plan, a <literal>Parallel
+ Append</literal> node can have both partial and non-partial child plans.
+ Non-partial children will be scanned by only a single process, since
+ scanning them more than once would produce duplicate results. Plans that
+     involve appending multiple result sets can therefore achieve
+ coarse-grained parallelism even when efficient partial plans are not
+ available. For example, consider a query against a partitioned table
+ which can only be implemented efficiently by using an index that does
+ not support parallel scans. The planner might choose a <literal>Parallel
+ Append</literal> of regular <literal>Index Scan</literal> plans; each
+ individual index scan would have to be executed to completion by a single
+ process, but different scans could be performed at the same time by
+ different processes.
+ </para>
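+
+    <para>
+     For example, a <literal>Parallel Append</literal> over a hypothetical
+     partitioned table <literal>measurement</literal> might look like this
+     (an illustrative sketch with costs suppressed; the actual plan depends
+     on the partitions, the indexes, and the configuration):
+
+<screen>
+EXPLAIN (COSTS OFF) SELECT * FROM measurement WHERE peaktemp > 30;
+                     QUERY PLAN
+-----------------------------------------------------
+ Gather
+   Workers Planned: 2
+   ->  Parallel Append
+         ->  Parallel Seq Scan on measurement_y2006m02
+               Filter: (peaktemp > 30)
+         ->  Parallel Seq Scan on measurement_y2006m03
+               Filter: (peaktemp > 30)
+</screen>
+    </para>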
+
+ <para>
+ <xref linkend="guc-enable-parallel-append" /> can be used to disable
+ this feature.
+ </para>
+ </sect2>
+
+ <sect2 id="parallel-plan-tips">
+ <title>Parallel Plan Tips</title>
+
+ <para>
+ If a query that is expected to do so does not produce a parallel plan,
+ you can try reducing <xref linkend="guc-parallel-setup-cost"/> or
+ <xref linkend="guc-parallel-tuple-cost"/>. Of course, this plan may turn
+ out to be slower than the serial plan which the planner preferred, but
+ this will not always be the case. If you don't get a parallel
+ plan even with very small values of these settings (e.g., after setting
+ them both to zero), there may be some reason why the query planner is
+ unable to generate a parallel plan for your query. See
+ <xref linkend="when-can-parallel-query-be-used"/> and
+ <xref linkend="parallel-safety"/> for information on why this may be
+ the case.
+ </para>
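+
+    <para>
+     For example, the following settings make parallel plans as cheap as
+     possible for the current session, which can help determine whether the
+     planner is able to produce one at all.  This is a diagnostic sketch, not
+     a recommended production configuration.
+
+<screen>
+SET parallel_setup_cost = 0;
+SET parallel_tuple_cost = 0;
+EXPLAIN SELECT * FROM pgbench_accounts WHERE abalance > 0;
+</screen>
+    </para>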
+
+ <para>
+ When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
+ VERBOSE)</literal> to display per-worker statistics for each plan node.
+ This may be useful in determining whether the work is being evenly
+ distributed between all plan nodes and more generally in understanding the
+ performance characteristics of the plan.
+ </para>
+
+ </sect2>
+ </sect1>
+
+ <sect1 id="parallel-safety">
+ <title>Parallel Safety</title>
+
+ <para>
+ The planner classifies operations involved in a query as either
+ <firstterm>parallel safe</firstterm>, <firstterm>parallel restricted</firstterm>,
+ or <firstterm>parallel unsafe</firstterm>. A parallel safe operation is one which
+ does not conflict with the use of parallel query. A parallel restricted
+ operation is one which cannot be performed in a parallel worker, but which
+ can be performed in the leader while parallel query is in use. Therefore,
+ parallel restricted operations can never occur below a <literal>Gather</literal>
+ or <literal>Gather Merge</literal> node, but can occur elsewhere in a plan which
+ contains such a node. A parallel unsafe operation is one which cannot
+ be performed while parallel query is in use, not even in the leader.
+ When a query contains anything which is parallel unsafe, parallel query
+ is completely disabled for that query.
+ </para>
+
+ <para>
+ The following operations are always parallel restricted:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Scans of common table expressions (CTEs).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Scans of temporary tables.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Scans of foreign tables, unless the foreign data wrapper has
+ an <literal>IsForeignScanParallelSafe</literal> API which indicates otherwise.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Plan nodes to which an <literal>InitPlan</literal> is attached.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Plan nodes which reference a correlated <literal>SubPlan</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <sect2 id="parallel-labeling">
+ <title>Parallel Labeling for Functions and Aggregates</title>
+
+ <para>
+ The planner cannot automatically determine whether a user-defined
+ function or aggregate is parallel safe, parallel restricted, or parallel
+ unsafe, because this would require predicting every operation which the
+ function could possibly perform. In general, this is equivalent to the
+ Halting Problem and therefore impossible. Even for simple functions
+ where it could conceivably be done, we do not try, since this would be expensive
+ and error-prone. Instead, all user-defined functions are assumed to
+ be parallel unsafe unless otherwise marked. When using
+ <xref linkend="sql-createfunction"/> or
+ <xref linkend="sql-alterfunction"/>, markings can be set by specifying
+ <literal>PARALLEL SAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or
+ <literal>PARALLEL UNSAFE</literal> as appropriate. When using
+ <xref linkend="sql-createaggregate"/>, the
+ <literal>PARALLEL</literal> option can be specified with <literal>SAFE</literal>,
+ <literal>RESTRICTED</literal>, or <literal>UNSAFE</literal> as the corresponding value.
+ </para>
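+
+    <para>
+     For example, a function's marking can be declared when it is created or
+     changed later (a minimal sketch; the function shown here is hypothetical):
+
+<screen>
+CREATE FUNCTION add_one(integer) RETURNS integer
+    AS 'SELECT $1 + 1'
+    LANGUAGE SQL
+    PARALLEL SAFE;
+
+ALTER FUNCTION add_one(integer) PARALLEL RESTRICTED;
+</screen>
+    </para>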
+
+ <para>
+ Functions and aggregates must be marked <literal>PARALLEL UNSAFE</literal> if
+ they write to the database, access sequences, change the transaction state
+ even temporarily (e.g., a PL/pgSQL function which establishes an
+ <literal>EXCEPTION</literal> block to catch errors), or make persistent changes to
+ settings. Similarly, functions must be marked <literal>PARALLEL
+ RESTRICTED</literal> if they access temporary tables, client connection state,
+ cursors, prepared statements, or miscellaneous backend-local state which
+ the system cannot synchronize across workers. For example,
+ <literal>setseed</literal> and <literal>random</literal> are parallel restricted for
+ this last reason.
+ </para>
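+
+    <para>
+     The marking of an existing function can be inspected in the
+     <literal>pg_proc</literal> catalog, whose <literal>proparallel</literal>
+     column contains <literal>s</literal>, <literal>r</literal>, or
+     <literal>u</literal>.  For example, the functions mentioned above are
+     reported as restricted:
+
+<screen>
+SELECT proname, proparallel FROM pg_proc
+WHERE proname IN ('random', 'setseed');
+</screen>
+    </para>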
+
+ <para>
+ In general, if a function is labeled as being safe when it is restricted or
+ unsafe, or if it is labeled as being restricted when it is in fact unsafe,
+ it may throw errors or produce wrong answers when used in a parallel query.
+ C-language functions could in theory exhibit totally undefined behavior if
+ mislabeled, since there is no way for the system to protect itself against
+ arbitrary C code, but in most likely cases the result will be no worse than
+ for any other function. If in doubt, it is probably best to label functions
+ as <literal>UNSAFE</literal>.
+ </para>
+
+ <para>
+ If a function executed within a parallel worker acquires locks which are
+ not held by the leader, for example by querying a table not referenced in
+ the query, those locks will be released at worker exit, not end of
+ transaction. If you write a function which does this, and this behavior
+ difference is important to you, mark such functions as
+ <literal>PARALLEL RESTRICTED</literal>
+ to ensure that they execute only in the leader.
+ </para>
+
+ <para>
+ Note that the query planner does not consider deferring the evaluation of
+ parallel-restricted functions or aggregates involved in the query in
+ order to obtain a superior plan. So, for example, if a <literal>WHERE</literal>
+ clause applied to a particular table is parallel restricted, the query
+ planner will not consider performing a scan of that table in the parallel
+ portion of a plan. In some cases, it would be
+ possible (and perhaps even efficient) to include the scan of that table in
+ the parallel portion of the query and defer the evaluation of the
+ <literal>WHERE</literal> clause so that it happens above the <literal>Gather</literal>
+ node. However, the planner does not do this.
+ </para>
+
+ </sect2>
+
+ </sect1>
+
+ </chapter>