diff options
Diffstat (limited to 'doc/src/sgml/html/queries-with.html')
-rw-r--r-- | doc/src/sgml/html/queries-with.html | 426 |
1 files changed, 426 insertions, 0 deletions
diff --git a/doc/src/sgml/html/queries-with.html b/doc/src/sgml/html/queries-with.html new file mode 100644 index 0000000..c56b0fd --- /dev/null +++ b/doc/src/sgml/html/queries-with.html @@ -0,0 +1,426 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>7.8. WITH Queries (Common Table Expressions)</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="queries-values.html" title="7.7. VALUES Lists" /><link rel="next" href="datatype.html" title="Chapter 8. Data Types" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">7.8. <code xmlns="http://www.w3.org/1999/xhtml" class="literal">WITH</code> Queries (Common Table Expressions)</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="queries-values.html" title="7.7. VALUES Lists">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="queries.html" title="Chapter 7. Queries">Up</a></td><th width="60%" align="center">Chapter 7. Queries</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="datatype.html" title="Chapter 8. Data Types">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="QUERIES-WITH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">7.8. <code class="literal">WITH</code> Queries (Common Table Expressions)</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-SELECT">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></a></span></dt><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-MODIFYING">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></a></span></dt></dl></div><a id="id-1.5.6.12.2" class="indexterm"></a><a id="id-1.5.6.12.3" class="indexterm"></a><p> + <code class="literal">WITH</code> provides a way to write auxiliary statements for use in a + larger query. These statements, which are often referred to as Common + Table Expressions or <acronym class="acronym">CTE</acronym>s, can be thought of as defining + temporary tables that exist just for one query. Each auxiliary statement + in a <code class="literal">WITH</code> clause can be a <code class="command">SELECT</code>, + <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>; and the + <code class="literal">WITH</code> clause itself is attached to a primary statement that can + also be a <code class="command">SELECT</code>, <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or + <code class="command">DELETE</code>. + </p><div class="sect2" id="QUERIES-WITH-SELECT"><div class="titlepage"><div><div><h3 class="title">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></h3></div></div></div><p> + The basic value of <code class="command">SELECT</code> in <code class="literal">WITH</code> is to + break down complicated queries into simpler parts. An example is: + +</p><pre class="programlisting"> +WITH regional_sales AS ( + SELECT region, SUM(amount) AS total_sales + FROM orders + GROUP BY region +), top_regions AS ( + SELECT region + FROM regional_sales + WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) +) +SELECT region, + product, + SUM(quantity) AS product_units, + SUM(amount) AS product_sales +FROM orders +WHERE region IN (SELECT region FROM top_regions) +GROUP BY region, product; +</pre><p> + + which displays per-product sales totals in only the top sales regions. + The <code class="literal">WITH</code> clause defines two auxiliary statements named + <code class="structname">regional_sales</code> and <code class="structname">top_regions</code>, + where the output of <code class="structname">regional_sales</code> is used in + <code class="structname">top_regions</code> and the output of <code class="structname">top_regions</code> + is used in the primary <code class="command">SELECT</code> query. + This example could have been written without <code class="literal">WITH</code>, + but we'd have needed two levels of nested sub-<code class="command">SELECT</code>s. It's a bit + easier to follow this way. + </p><p> + <a id="id-1.5.6.12.5.3.1" class="indexterm"></a> + The optional <code class="literal">RECURSIVE</code> modifier changes <code class="literal">WITH</code> + from a mere syntactic convenience into a feature that accomplishes + things not otherwise possible in standard SQL. Using + <code class="literal">RECURSIVE</code>, a <code class="literal">WITH</code> query can refer to its own + output. A very simple example is this query to sum the integers from 1 + through 100: + +</p><pre class="programlisting"> +WITH RECURSIVE t(n) AS ( + VALUES (1) + UNION ALL + SELECT n+1 FROM t WHERE n < 100 +) +SELECT sum(n) FROM t; +</pre><p> + + The general form of a recursive <code class="literal">WITH</code> query is always a + <em class="firstterm">non-recursive term</em>, then <code class="literal">UNION</code> (or + <code class="literal">UNION ALL</code>), then a + <em class="firstterm">recursive term</em>, where only the recursive term can contain + a reference to the query's own output. Such a query is executed as + follows: + </p><div class="procedure" id="id-1.5.6.12.5.4"><p class="title"><strong>Recursive Query Evaluation</strong></p><ol class="procedure" type="1"><li class="step"><p> + Evaluate the non-recursive term. For <code class="literal">UNION</code> (but not + <code class="literal">UNION ALL</code>), discard duplicate rows. Include all remaining + rows in the result of the recursive query, and also place them in a + temporary <em class="firstterm">working table</em>. + </p></li><li class="step"><p> + So long as the working table is not empty, repeat these steps: + </p><ol type="a" class="substeps"><li class="step"><p> + Evaluate the recursive term, substituting the current contents of + the working table for the recursive self-reference. + For <code class="literal">UNION</code> (but not <code class="literal">UNION ALL</code>), discard + duplicate rows and rows that duplicate any previous result row. + Include all remaining rows in the result of the recursive query, and + also place them in a temporary <em class="firstterm">intermediate table</em>. + </p></li><li class="step"><p> + Replace the contents of the working table with the contents of the + intermediate table, then empty the intermediate table. + </p></li></ol></li></ol></div><div class="note"><h3 class="title">Note</h3><p> + Strictly speaking, this process is iteration not recursion, but + <code class="literal">RECURSIVE</code> is the terminology chosen by the SQL standards + committee. + </p></div><p> + In the example above, the working table has just a single row in each step, + and it takes on the values from 1 through 100 in successive steps. In + the 100th step, there is no output because of the <code class="literal">WHERE</code> + clause, and so the query terminates. + </p><p> + Recursive queries are typically used to deal with hierarchical or + tree-structured data. A useful example is this query to find all the + direct and indirect sub-parts of a product, given only a table that + shows immediate inclusions: + +</p><pre class="programlisting"> +WITH RECURSIVE included_parts(sub_part, part, quantity) AS ( + SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product' + UNION ALL + SELECT p.sub_part, p.part, p.quantity + FROM included_parts pr, parts p + WHERE p.part = pr.sub_part +) +SELECT sub_part, SUM(quantity) as total_quantity +FROM included_parts +GROUP BY sub_part +</pre><p> + </p><p> + When working with recursive queries it is important to be sure that + the recursive part of the query will eventually return no tuples, + or else the query will loop indefinitely. Sometimes, using + <code class="literal">UNION</code> instead of <code class="literal">UNION ALL</code> can accomplish this + by discarding rows that duplicate previous output rows. However, often a + cycle does not involve output rows that are completely duplicate: it may be + necessary to check just one or a few fields to see if the same point has + been reached before. The standard method for handling such situations is + to compute an array of the already-visited values. For example, consider + the following query that searches a table <code class="structname">graph</code> using a + <code class="structfield">link</code> field: + +</p><pre class="programlisting"> +WITH RECURSIVE search_graph(id, link, data, depth) AS ( + SELECT g.id, g.link, g.data, 1 + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1 + FROM graph g, search_graph sg + WHERE g.id = sg.link +) +SELECT * FROM search_graph; +</pre><p> + + This query will loop if the <code class="structfield">link</code> relationships contain + cycles. Because we require a <span class="quote">“<span class="quote">depth</span>”</span> output, just changing + <code class="literal">UNION ALL</code> to <code class="literal">UNION</code> would not eliminate the looping. + Instead we need to recognize whether we have reached the same row again + while following a particular path of links. We add two columns + <code class="structfield">path</code> and <code class="structfield">cycle</code> to the loop-prone query: + +</p><pre class="programlisting"> +WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( + SELECT g.id, g.link, g.data, 1, + ARRAY[g.id], + false + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1, + path || g.id, + g.id = ANY(path) + FROM graph g, search_graph sg + WHERE g.id = sg.link AND NOT cycle +) +SELECT * FROM search_graph; +</pre><p> + + Aside from preventing cycles, the array value is often useful in its own + right as representing the <span class="quote">“<span class="quote">path</span>”</span> taken to reach any particular row. + </p><p> + In the general case where more than one field needs to be checked to + recognize a cycle, use an array of rows. For example, if we needed to + compare fields <code class="structfield">f1</code> and <code class="structfield">f2</code>: + +</p><pre class="programlisting"> +WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( + SELECT g.id, g.link, g.data, 1, + ARRAY[ROW(g.f1, g.f2)], + false + FROM graph g + UNION ALL + SELECT g.id, g.link, g.data, sg.depth + 1, + path || ROW(g.f1, g.f2), + ROW(g.f1, g.f2) = ANY(path) + FROM graph g, search_graph sg + WHERE g.id = sg.link AND NOT cycle +) +SELECT * FROM search_graph; +</pre><p> + </p><div class="tip"><h3 class="title">Tip</h3><p> + Omit the <code class="literal">ROW()</code> syntax in the common case where only one field + needs to be checked to recognize a cycle. This allows a simple array + rather than a composite-type array to be used, gaining efficiency. + </p></div><div class="tip"><h3 class="title">Tip</h3><p> + The recursive query evaluation algorithm produces its output in + breadth-first search order. You can display the results in depth-first + search order by making the outer query <code class="literal">ORDER BY</code> a + <span class="quote">“<span class="quote">path</span>”</span> column constructed in this way. + </p></div><p> + A helpful trick for testing queries + when you are not certain if they might loop is to place a <code class="literal">LIMIT</code> + in the parent query. For example, this query would loop forever without + the <code class="literal">LIMIT</code>: + +</p><pre class="programlisting"> +WITH RECURSIVE t(n) AS ( + SELECT 1 + UNION ALL + SELECT n+1 FROM t +) +SELECT n FROM t LIMIT 100; +</pre><p> + + This works because <span class="productname">PostgreSQL</span>'s implementation + evaluates only as many rows of a <code class="literal">WITH</code> query as are actually + fetched by the parent query. Using this trick in production is not + recommended, because other systems might work differently. Also, it + usually won't work if you make the outer query sort the recursive query's + results or join them to some other table, because in such cases the + outer query will usually try to fetch all of the <code class="literal">WITH</code> query's + output anyway. + </p><p> + A useful property of <code class="literal">WITH</code> queries is that they are + normally evaluated only once per execution of the parent query, even if + they are referred to more than once by the parent query or + sibling <code class="literal">WITH</code> queries. + Thus, expensive calculations that are needed in multiple places can be + placed within a <code class="literal">WITH</code> query to avoid redundant work. Another + possible application is to prevent unwanted multiple evaluations of + functions with side-effects. + However, the other side of this coin is that the optimizer is not able to + push restrictions from the parent query down into a multiply-referenced + <code class="literal">WITH</code> query, since that might affect all uses of the + <code class="literal">WITH</code> query's output when it should affect only one. + The multiply-referenced <code class="literal">WITH</code> query will be + evaluated as written, without suppression of rows that the parent query + might discard afterwards. (But, as mentioned above, evaluation might stop + early if the reference(s) to the query demand only a limited number of + rows.) + </p><p> + However, if a <code class="literal">WITH</code> query is non-recursive and + side-effect-free (that is, it is a <code class="literal">SELECT</code> containing + no volatile functions) then it can be folded into the parent query, + allowing joint optimization of the two query levels. By default, this + happens if the parent query references the <code class="literal">WITH</code> query + just once, but not if it references the <code class="literal">WITH</code> query + more than once. You can override that decision by + specifying <code class="literal">MATERIALIZED</code> to force separate calculation + of the <code class="literal">WITH</code> query, or by specifying <code class="literal">NOT + MATERIALIZED</code> to force it to be merged into the parent query. + The latter choice risks duplicate computation of + the <code class="literal">WITH</code> query, but it can still give a net savings if + each usage of the <code class="literal">WITH</code> query needs only a small part + of the <code class="literal">WITH</code> query's full output. + </p><p> + A simple example of these rules is +</p><pre class="programlisting"> +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w WHERE key = 123; +</pre><p> + This <code class="literal">WITH</code> query will be folded, producing the same + execution plan as +</p><pre class="programlisting"> +SELECT * FROM big_table WHERE key = 123; +</pre><p> + In particular, if there's an index on <code class="structfield">key</code>, + it will probably be used to fetch just the rows having <code class="literal">key = + 123</code>. On the other hand, in +</p><pre class="programlisting"> +WITH w AS ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +</pre><p> + the <code class="literal">WITH</code> query will be materialized, producing a + temporary copy of <code class="structname">big_table</code> that is then + joined with itself — without benefit of any index. This query + will be executed much more efficiently if written as +</p><pre class="programlisting"> +WITH w AS NOT MATERIALIZED ( + SELECT * FROM big_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref +WHERE w2.key = 123; +</pre><p> + so that the parent query's restrictions can be applied directly + to scans of <code class="structname">big_table</code>. + </p><p> + An example where <code class="literal">NOT MATERIALIZED</code> could be + undesirable is +</p><pre class="programlisting"> +WITH w AS ( + SELECT key, very_expensive_function(val) as f FROM some_table +) +SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f; +</pre><p> + Here, materialization of the <code class="literal">WITH</code> query ensures + that <code class="function">very_expensive_function</code> is evaluated only + once per table row, not twice. + </p><p> + The examples above only show <code class="literal">WITH</code> being used with + <code class="command">SELECT</code>, but it can be attached in the same way to + <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>. + In each case it effectively provides temporary table(s) that can + be referred to in the main command. + </p></div><div class="sect2" id="QUERIES-WITH-MODIFYING"><div class="titlepage"><div><div><h3 class="title">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></h3></div></div></div><p> + You can use data-modifying statements (<code class="command">INSERT</code>, + <code class="command">UPDATE</code>, or <code class="command">DELETE</code>) in <code class="literal">WITH</code>. This + allows you to perform several different operations in the same query. + An example is: + +</p><pre class="programlisting"> +WITH moved_rows AS ( + DELETE FROM products + WHERE + "date" >= '2010-10-01' AND + "date" < '2010-11-01' + RETURNING * +) +INSERT INTO products_log +SELECT * FROM moved_rows; +</pre><p> + + This query effectively moves rows from <code class="structname">products</code> to + <code class="structname">products_log</code>. The <code class="command">DELETE</code> in <code class="literal">WITH</code> + deletes the specified rows from <code class="structname">products</code>, returning their + contents by means of its <code class="literal">RETURNING</code> clause; and then the + primary query reads that output and inserts it into + <code class="structname">products_log</code>. + </p><p> + A fine point of the above example is that the <code class="literal">WITH</code> clause is + attached to the <code class="command">INSERT</code>, not the sub-<code class="command">SELECT</code> within + the <code class="command">INSERT</code>. This is necessary because data-modifying + statements are only allowed in <code class="literal">WITH</code> clauses that are attached + to the top-level statement. However, normal <code class="literal">WITH</code> visibility + rules apply, so it is possible to refer to the <code class="literal">WITH</code> + statement's output from the sub-<code class="command">SELECT</code>. + </p><p> + Data-modifying statements in <code class="literal">WITH</code> usually have + <code class="literal">RETURNING</code> clauses (see <a class="xref" href="dml-returning.html" title="6.4. Returning Data from Modified Rows">Section 6.4</a>), + as shown in the example above. + It is the output of the <code class="literal">RETURNING</code> clause, <span class="emphasis"><em>not</em></span> the + target table of the data-modifying statement, that forms the temporary + table that can be referred to by the rest of the query. If a + data-modifying statement in <code class="literal">WITH</code> lacks a <code class="literal">RETURNING</code> + clause, then it forms no temporary table and cannot be referred to in + the rest of the query. Such a statement will be executed nonetheless. + A not-particularly-useful example is: + +</p><pre class="programlisting"> +WITH t AS ( + DELETE FROM foo +) +DELETE FROM bar; +</pre><p> + + This example would remove all rows from tables <code class="structname">foo</code> and + <code class="structname">bar</code>. The number of affected rows reported to the client + would only include rows removed from <code class="structname">bar</code>. + </p><p> + Recursive self-references in data-modifying statements are not + allowed. In some cases it is possible to work around this limitation by + referring to the output of a recursive <code class="literal">WITH</code>, for example: + +</p><pre class="programlisting"> +WITH RECURSIVE included_parts(sub_part, part) AS ( + SELECT sub_part, part FROM parts WHERE part = 'our_product' + UNION ALL + SELECT p.sub_part, p.part + FROM included_parts pr, parts p + WHERE p.part = pr.sub_part +) +DELETE FROM parts + WHERE part IN (SELECT part FROM included_parts); +</pre><p> + + This query would remove all direct and indirect subparts of a product. + </p><p> + Data-modifying statements in <code class="literal">WITH</code> are executed exactly once, + and always to completion, independently of whether the primary query + reads all (or indeed any) of their output. Notice that this is different + from the rule for <code class="command">SELECT</code> in <code class="literal">WITH</code>: as stated in the + previous section, execution of a <code class="command">SELECT</code> is carried only as far + as the primary query demands its output. + </p><p> + The sub-statements in <code class="literal">WITH</code> are executed concurrently with + each other and with the main query. Therefore, when using data-modifying + statements in <code class="literal">WITH</code>, the order in which the specified updates + actually happen is unpredictable. All the statements are executed with + the same <em class="firstterm">snapshot</em> (see <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>), so they + cannot <span class="quote">“<span class="quote">see</span>”</span> one another's effects on the target tables. This + alleviates the effects of the unpredictability of the actual order of row + updates, and means that <code class="literal">RETURNING</code> data is the only way to + communicate changes between different <code class="literal">WITH</code> sub-statements and + the main query. An example of this is that in + +</p><pre class="programlisting"> +WITH t AS ( + UPDATE products SET price = price * 1.05 + RETURNING * +) +SELECT * FROM products; +</pre><p> + + the outer <code class="command">SELECT</code> would return the original prices before the + action of the <code class="command">UPDATE</code>, while in + +</p><pre class="programlisting"> +WITH t AS ( + UPDATE products SET price = price * 1.05 + RETURNING * +) +SELECT * FROM t; +</pre><p> + + the outer <code class="command">SELECT</code> would return the updated data. + </p><p> + Trying to update the same row twice in a single statement is not + supported. Only one of the modifications takes place, but it is not easy + (and sometimes not possible) to reliably predict which one. This also + applies to deleting a row that was already updated in the same statement: + only the update is performed. Therefore you should generally avoid trying + to modify a single row twice in a single statement. In particular avoid + writing <code class="literal">WITH</code> sub-statements that could affect the same rows + changed by the main statement or a sibling sub-statement. The effects + of such a statement will not be predictable. + </p><p> + At present, any table used as the target of a data-modifying statement in + <code class="literal">WITH</code> must not have a conditional rule, nor an <code class="literal">ALSO</code> + rule, nor an <code class="literal">INSTEAD</code> rule that expands to multiple statements. + </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="queries-values.html" title="7.7. VALUES Lists">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="queries.html" title="Chapter 7. Queries">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="datatype.html" title="Chapter 8. Data Types">Next</a></td></tr><tr><td width="40%" align="left" valign="top">7.7. <code xmlns="http://www.w3.org/1999/xhtml" class="literal">VALUES</code> Lists </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> Chapter 8. Data Types</td></tr></table></div></body></html>
\ No newline at end of file |