diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
commit | 293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch) | |
tree | fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/explicit-joins.html | |
parent | Initial commit. (diff) | |
download | postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip |
Adding upstream version 16.2.upstream/16.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/explicit-joins.html')
-rw-r--r-- | doc/src/sgml/html/explicit-joins.html | 144 |
1 files changed, 144 insertions, 0 deletions
diff --git a/doc/src/sgml/html/explicit-joins.html b/doc/src/sgml/html/explicit-joins.html new file mode 100644 index 0000000..bec5cdf --- /dev/null +++ b/doc/src/sgml/html/explicit-joins.html @@ -0,0 +1,144 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.3. Controlling the Planner with Explicit JOIN Clauses</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="planner-stats.html" title="14.2. Statistics Used by the Planner" /><link rel="next" href="populate.html" title="14.4. Populating a Database" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="populate.html" title="14.4. Populating a Database">Next</a></td></tr></table><hr /></div><div class="sect1" id="EXPLICIT-JOINS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses <a href="#EXPLICIT-JOINS" class="id_link">#</a></h2></div></div></div><a id="id-1.5.13.6.2" class="indexterm"></a><p> + It is possible + to control the query planner to some extent by using the explicit <code class="literal">JOIN</code> + syntax. To see why this matters, we first need some background. + </p><p> + In a simple join query, such as: +</p><pre class="programlisting"> +SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id; +</pre><p> + the planner is free to join the given tables in any order. For + example, it could generate a query plan that joins A to B, using + the <code class="literal">WHERE</code> condition <code class="literal">a.id = b.id</code>, and then + joins C to this joined table, using the other <code class="literal">WHERE</code> + condition. Or it could join B to C and then join A to that result. + Or it could join A to C and then join them with B — but that + would be inefficient, since the full Cartesian product of A and C + would have to be formed, there being no applicable condition in the + <code class="literal">WHERE</code> clause to allow optimization of the join. (All + joins in the <span class="productname">PostgreSQL</span> executor happen + between two input tables, so it's necessary to build up the result + in one or another of these fashions.) The important point is that + these different join possibilities give semantically equivalent + results but might have hugely different execution costs. Therefore, + the planner will explore all of them to try to find the most + efficient query plan. + </p><p> + When a query only involves two or three tables, there aren't many join + orders to worry about. But the number of possible join orders grows + exponentially as the number of tables expands. Beyond ten or so input + tables it's no longer practical to do an exhaustive search of all the + possibilities, and even for six or seven tables planning might take an + annoyingly long time. When there are too many input tables, the + <span class="productname">PostgreSQL</span> planner will switch from exhaustive + search to a <em class="firstterm">genetic</em> probabilistic search + through a limited number of possibilities. (The switch-over threshold is + set by the <a class="xref" href="runtime-config-query.html#GUC-GEQO-THRESHOLD">geqo_threshold</a> run-time + parameter.) + The genetic search takes less time, but it won't + necessarily find the best possible plan. + </p><p> + When the query involves outer joins, the planner has less freedom + than it does for plain (inner) joins. For example, consider: +</p><pre class="programlisting"> +SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id); +</pre><p> + Although this query's restrictions are superficially similar to the + previous example, the semantics are different because a row must be + emitted for each row of A that has no matching row in the join of B and C. + Therefore the planner has no choice of join order here: it must join + B to C and then join A to that result. Accordingly, this query takes + less time to plan than the previous query. In other cases, the planner + might be able to determine that more than one join order is safe. + For example, given: +</p><pre class="programlisting"> +SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id); +</pre><p> + it is valid to join A to either B or C first. Currently, only + <code class="literal">FULL JOIN</code> completely constrains the join order. Most + practical cases involving <code class="literal">LEFT JOIN</code> or <code class="literal">RIGHT JOIN</code> + can be rearranged to some extent. + </p><p> + Explicit inner join syntax (<code class="literal">INNER JOIN</code>, <code class="literal">CROSS + JOIN</code>, or unadorned <code class="literal">JOIN</code>) is semantically the same as + listing the input relations in <code class="literal">FROM</code>, so it does not + constrain the join order. + </p><p> + Even though most kinds of <code class="literal">JOIN</code> don't completely constrain + the join order, it is possible to instruct the + <span class="productname">PostgreSQL</span> query planner to treat all + <code class="literal">JOIN</code> clauses as constraining the join order anyway. + For example, these three queries are logically equivalent: +</p><pre class="programlisting"> +SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id; +SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id; +SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id); +</pre><p> + But if we tell the planner to honor the <code class="literal">JOIN</code> order, + the second and third take less time to plan than the first. This effect + is not worth worrying about for only three tables, but it can be a + lifesaver with many tables. + </p><p> + To force the planner to follow the join order laid out by explicit + <code class="literal">JOIN</code>s, + set the <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a> run-time parameter to 1. + (Other possible values are discussed below.) + </p><p> + You do not need to constrain the join order completely in order to + cut search time, because it's OK to use <code class="literal">JOIN</code> operators + within items of a plain <code class="literal">FROM</code> list. For example, consider: +</p><pre class="programlisting"> +SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...; +</pre><p> + With <code class="varname">join_collapse_limit</code> = 1, this + forces the planner to join A to B before joining them to other tables, + but doesn't constrain its choices otherwise. In this example, the + number of possible join orders is reduced by a factor of 5. + </p><p> + Constraining the planner's search in this way is a useful technique + both for reducing planning time and for directing the planner to a + good query plan. If the planner chooses a bad join order by default, + you can force it to choose a better order via <code class="literal">JOIN</code> syntax + — assuming that you know of a better order, that is. Experimentation + is recommended. + </p><p> + A closely related issue that affects planning time is collapsing of + subqueries into their parent query. For example, consider: +</p><pre class="programlisting"> +SELECT * +FROM x, y, + (SELECT * FROM a, b, c WHERE something) AS ss +WHERE somethingelse; +</pre><p> + This situation might arise from use of a view that contains a join; + the view's <code class="literal">SELECT</code> rule will be inserted in place of the view + reference, yielding a query much like the above. Normally, the planner + will try to collapse the subquery into the parent, yielding: +</p><pre class="programlisting"> +SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; +</pre><p> + This usually results in a better plan than planning the subquery + separately. (For example, the outer <code class="literal">WHERE</code> conditions might be such that + joining X to A first eliminates many rows of A, thus avoiding the need to + form the full logical output of the subquery.) But at the same time, + we have increased the planning time; here, we have a five-way join + problem replacing two separate three-way join problems. Because of the + exponential growth of the number of possibilities, this makes a big + difference. The planner tries to avoid getting stuck in huge join search + problems by not collapsing a subquery if more than <code class="varname">from_collapse_limit</code> + <code class="literal">FROM</code> items would result in the parent + query. You can trade off planning time against quality of plan by + adjusting this run-time parameter up or down. + </p><p> + <a class="xref" href="runtime-config-query.html#GUC-FROM-COLLAPSE-LIMIT">from_collapse_limit</a> and <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a> + are similarly named because they do almost the same thing: one controls + when the planner will <span class="quote">“<span class="quote">flatten out</span>”</span> subqueries, and the + other controls when it will flatten out explicit joins. Typically + you would either set <code class="varname">join_collapse_limit</code> equal to + <code class="varname">from_collapse_limit</code> (so that explicit joins and subqueries + act similarly) or set <code class="varname">join_collapse_limit</code> to 1 (if you want + to control join order with explicit joins). But you might set them + differently if you are trying to fine-tune the trade-off between planning + time and run time. + </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="populate.html" title="14.4. Populating a Database">Next</a></td></tr><tr><td width="40%" align="left" valign="top">14.2. Statistics Used by the Planner </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 14.4. Populating a Database</td></tr></table></div></body></html>
\ No newline at end of file |