summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/explicit-joins.html
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-13 13:44:03 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-13 13:44:03 +0000
commit293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch)
treefc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/explicit-joins.html
parentInitial commit. (diff)
downloadpostgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz
postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip
Adding upstream version 16.2.upstream/16.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/explicit-joins.html')
-rw-r--r--doc/src/sgml/html/explicit-joins.html144
1 files changed, 144 insertions, 0 deletions
diff --git a/doc/src/sgml/html/explicit-joins.html b/doc/src/sgml/html/explicit-joins.html
new file mode 100644
index 0000000..bec5cdf
--- /dev/null
+++ b/doc/src/sgml/html/explicit-joins.html
@@ -0,0 +1,144 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.3. Controlling the Planner with Explicit JOIN Clauses</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="planner-stats.html" title="14.2. Statistics Used by the Planner" /><link rel="next" href="populate.html" title="14.4. Populating a Database" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="populate.html" title="14.4. Populating a Database">Next</a></td></tr></table><hr /></div><div class="sect1" id="EXPLICIT-JOINS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses <a href="#EXPLICIT-JOINS" class="id_link">#</a></h2></div></div></div><a id="id-1.5.13.6.2" class="indexterm"></a><p>
+ It is possible
+ to control the query planner to some extent by using the explicit <code class="literal">JOIN</code>
+ syntax. To see why this matters, we first need some background.
+ </p><p>
+ In a simple join query, such as:
+</p><pre class="programlisting">
+SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
+</pre><p>
+ the planner is free to join the given tables in any order. For
+ example, it could generate a query plan that joins A to B, using
+ the <code class="literal">WHERE</code> condition <code class="literal">a.id = b.id</code>, and then
+ joins C to this joined table, using the other <code class="literal">WHERE</code>
+ condition. Or it could join B to C and then join A to that result.
+ Or it could join A to C and then join them with B — but that
+ would be inefficient, since the full Cartesian product of A and C
+ would have to be formed, there being no applicable condition in the
+ <code class="literal">WHERE</code> clause to allow optimization of the join. (All
+ joins in the <span class="productname">PostgreSQL</span> executor happen
+ between two input tables, so it's necessary to build up the result
+ in one or another of these fashions.) The important point is that
+ these different join possibilities give semantically equivalent
+ results but might have hugely different execution costs. Therefore,
+ the planner will explore all of them to try to find the most
+ efficient query plan.
+ </p><p>
+ When a query only involves two or three tables, there aren't many join
+ orders to worry about. But the number of possible join orders grows
+ exponentially as the number of tables expands. Beyond ten or so input
+ tables it's no longer practical to do an exhaustive search of all the
+ possibilities, and even for six or seven tables planning might take an
+ annoyingly long time. When there are too many input tables, the
+ <span class="productname">PostgreSQL</span> planner will switch from exhaustive
+ search to a <em class="firstterm">genetic</em> probabilistic search
+ through a limited number of possibilities. (The switch-over threshold is
+ set by the <a class="xref" href="runtime-config-query.html#GUC-GEQO-THRESHOLD">geqo_threshold</a> run-time
+ parameter.)
+ The genetic search takes less time, but it won't
+ necessarily find the best possible plan.
+ </p><p>
+ When the query involves outer joins, the planner has less freedom
+ than it does for plain (inner) joins. For example, consider:
+</p><pre class="programlisting">
+SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
+</pre><p>
+ Although this query's restrictions are superficially similar to the
+ previous example, the semantics are different because a row must be
+ emitted for each row of A that has no matching row in the join of B and C.
+ Therefore the planner has no choice of join order here: it must join
+ B to C and then join A to that result. Accordingly, this query takes
+ less time to plan than the previous query. In other cases, the planner
+ might be able to determine that more than one join order is safe.
+ For example, given:
+</p><pre class="programlisting">
+SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id);
+</pre><p>
+ it is valid to join A to either B or C first. Currently, only
+ <code class="literal">FULL JOIN</code> completely constrains the join order. Most
+ practical cases involving <code class="literal">LEFT JOIN</code> or <code class="literal">RIGHT JOIN</code>
+ can be rearranged to some extent.
+ </p><p>
+ Explicit inner join syntax (<code class="literal">INNER JOIN</code>, <code class="literal">CROSS
+ JOIN</code>, or unadorned <code class="literal">JOIN</code>) is semantically the same as
+ listing the input relations in <code class="literal">FROM</code>, so it does not
+ constrain the join order.
+ </p><p>
+ Even though most kinds of <code class="literal">JOIN</code> don't completely constrain
+ the join order, it is possible to instruct the
+ <span class="productname">PostgreSQL</span> query planner to treat all
+ <code class="literal">JOIN</code> clauses as constraining the join order anyway.
+ For example, these three queries are logically equivalent:
+</p><pre class="programlisting">
+SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
+SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
+SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
+</pre><p>
+ But if we tell the planner to honor the <code class="literal">JOIN</code> order,
+ the second and third take less time to plan than the first. This effect
+ is not worth worrying about for only three tables, but it can be a
+ lifesaver with many tables.
+ </p><p>
+ To force the planner to follow the join order laid out by explicit
+ <code class="literal">JOIN</code>s,
+ set the <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a> run-time parameter to 1.
+ (Other possible values are discussed below.)
+ </p><p>
+ You do not need to constrain the join order completely in order to
+ cut search time, because it's OK to use <code class="literal">JOIN</code> operators
+ within items of a plain <code class="literal">FROM</code> list. For example, consider:
+</p><pre class="programlisting">
+SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
+</pre><p>
+ With <code class="varname">join_collapse_limit</code> = 1, this
+ forces the planner to join A to B before joining them to other tables,
+ but doesn't constrain its choices otherwise. In this example, the
+ number of possible join orders is reduced by a factor of 5.
+ </p><p>
+ Constraining the planner's search in this way is a useful technique
+ both for reducing planning time and for directing the planner to a
+ good query plan. If the planner chooses a bad join order by default,
+ you can force it to choose a better order via <code class="literal">JOIN</code> syntax
+ — assuming that you know of a better order, that is. Experimentation
+ is recommended.
+ </p><p>
+ A closely related issue that affects planning time is collapsing of
+ subqueries into their parent query. For example, consider:
+</p><pre class="programlisting">
+SELECT *
+FROM x, y,
+ (SELECT * FROM a, b, c WHERE something) AS ss
+WHERE somethingelse;
+</pre><p>
+ This situation might arise from use of a view that contains a join;
+ the view's <code class="literal">SELECT</code> rule will be inserted in place of the view
+ reference, yielding a query much like the above. Normally, the planner
+ will try to collapse the subquery into the parent, yielding:
+</p><pre class="programlisting">
+SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
+</pre><p>
+ This usually results in a better plan than planning the subquery
+ separately. (For example, the outer <code class="literal">WHERE</code> conditions might be such that
+ joining X to A first eliminates many rows of A, thus avoiding the need to
+ form the full logical output of the subquery.) But at the same time,
+ we have increased the planning time; here, we have a five-way join
+ problem replacing two separate three-way join problems. Because of the
+ exponential growth of the number of possibilities, this makes a big
+ difference. The planner tries to avoid getting stuck in huge join search
+ problems by not collapsing a subquery if more than <code class="varname">from_collapse_limit</code>
+ <code class="literal">FROM</code> items would result in the parent
+ query. You can trade off planning time against quality of plan by
+ adjusting this run-time parameter up or down.
+ </p><p>
+ <a class="xref" href="runtime-config-query.html#GUC-FROM-COLLAPSE-LIMIT">from_collapse_limit</a> and <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a>
+ are similarly named because they do almost the same thing: one controls
+ when the planner will <span class="quote">“<span class="quote">flatten out</span>”</span> subqueries, and the
+ other controls when it will flatten out explicit joins. Typically
+ you would either set <code class="varname">join_collapse_limit</code> equal to
+ <code class="varname">from_collapse_limit</code> (so that explicit joins and subqueries
+ act similarly) or set <code class="varname">join_collapse_limit</code> to 1 (if you want
+ to control join order with explicit joins). But you might set them
+ differently if you are trying to fine-tune the trade-off between planning
+ time and run time.
+ </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="populate.html" title="14.4. Populating a Database">Next</a></td></tr><tr><td width="40%" align="left" valign="top">14.2. Statistics Used by the Planner </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 14.4. Populating a Database</td></tr></table></div></body></html> \ No newline at end of file