summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/queries-with.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/html/queries-with.html')
-rw-r--r--doc/src/sgml/html/queries-with.html426
1 files changed, 426 insertions, 0 deletions
diff --git a/doc/src/sgml/html/queries-with.html b/doc/src/sgml/html/queries-with.html
new file mode 100644
index 0000000..c56b0fd
--- /dev/null
+++ b/doc/src/sgml/html/queries-with.html
@@ -0,0 +1,426 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>7.8. WITH Queries (Common Table Expressions)</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="queries-values.html" title="7.7. VALUES Lists" /><link rel="next" href="datatype.html" title="Chapter 8. Data Types" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">7.8. <code xmlns="http://www.w3.org/1999/xhtml" class="literal">WITH</code> Queries (Common Table Expressions)</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="queries-values.html" title="7.7. VALUES Lists">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="queries.html" title="Chapter 7. Queries">Up</a></td><th width="60%" align="center">Chapter 7. Queries</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="datatype.html" title="Chapter 8. Data Types">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="QUERIES-WITH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">7.8. <code class="literal">WITH</code> Queries (Common Table Expressions)</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-SELECT">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></a></span></dt><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-MODIFYING">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></a></span></dt></dl></div><a id="id-1.5.6.12.2" class="indexterm"></a><a id="id-1.5.6.12.3" class="indexterm"></a><p>
+ <code class="literal">WITH</code> provides a way to write auxiliary statements for use in a
+ larger query. These statements, which are often referred to as Common
+ Table Expressions or <acronym class="acronym">CTE</acronym>s, can be thought of as defining
+ temporary tables that exist just for one query. Each auxiliary statement
+ in a <code class="literal">WITH</code> clause can be a <code class="command">SELECT</code>,
+ <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>; and the
+ <code class="literal">WITH</code> clause itself is attached to a primary statement that can
+ also be a <code class="command">SELECT</code>, <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or
+ <code class="command">DELETE</code>.
+ </p><div class="sect2" id="QUERIES-WITH-SELECT"><div class="titlepage"><div><div><h3 class="title">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></h3></div></div></div><p>
+ The basic value of <code class="command">SELECT</code> in <code class="literal">WITH</code> is to
+ break down complicated queries into simpler parts. An example is:
+
+</p><pre class="programlisting">
+WITH regional_sales AS (
+ SELECT region, SUM(amount) AS total_sales
+ FROM orders
+ GROUP BY region
+), top_regions AS (
+ SELECT region
+ FROM regional_sales
+ WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
+)
+SELECT region,
+ product,
+ SUM(quantity) AS product_units,
+ SUM(amount) AS product_sales
+FROM orders
+WHERE region IN (SELECT region FROM top_regions)
+GROUP BY region, product;
+</pre><p>
+
+ which displays per-product sales totals in only the top sales regions.
+ The <code class="literal">WITH</code> clause defines two auxiliary statements named
+ <code class="structname">regional_sales</code> and <code class="structname">top_regions</code>,
+ where the output of <code class="structname">regional_sales</code> is used in
+ <code class="structname">top_regions</code> and the output of <code class="structname">top_regions</code>
+ is used in the primary <code class="command">SELECT</code> query.
+ This example could have been written without <code class="literal">WITH</code>,
+ but we'd have needed two levels of nested sub-<code class="command">SELECT</code>s. It's a bit
+ easier to follow this way.
+ </p><p>
+ <a id="id-1.5.6.12.5.3.1" class="indexterm"></a>
+ The optional <code class="literal">RECURSIVE</code> modifier changes <code class="literal">WITH</code>
+ from a mere syntactic convenience into a feature that accomplishes
+ things not otherwise possible in standard SQL. Using
+ <code class="literal">RECURSIVE</code>, a <code class="literal">WITH</code> query can refer to its own
+ output. A very simple example is this query to sum the integers from 1
+ through 100:
+
+</p><pre class="programlisting">
+WITH RECURSIVE t(n) AS (
+ VALUES (1)
+ UNION ALL
+ SELECT n+1 FROM t WHERE n &lt; 100
+)
+SELECT sum(n) FROM t;
+</pre><p>
+
+ The general form of a recursive <code class="literal">WITH</code> query is always a
+ <em class="firstterm">non-recursive term</em>, then <code class="literal">UNION</code> (or
+ <code class="literal">UNION ALL</code>), then a
+ <em class="firstterm">recursive term</em>, where only the recursive term can contain
+ a reference to the query's own output. Such a query is executed as
+ follows:
+ </p><div class="procedure" id="id-1.5.6.12.5.4"><p class="title"><strong>Recursive Query Evaluation</strong></p><ol class="procedure" type="1"><li class="step"><p>
+ Evaluate the non-recursive term. For <code class="literal">UNION</code> (but not
+ <code class="literal">UNION ALL</code>), discard duplicate rows. Include all remaining
+ rows in the result of the recursive query, and also place them in a
+ temporary <em class="firstterm">working table</em>.
+ </p></li><li class="step"><p>
+ So long as the working table is not empty, repeat these steps:
+ </p><ol type="a" class="substeps"><li class="step"><p>
+ Evaluate the recursive term, substituting the current contents of
+ the working table for the recursive self-reference.
+ For <code class="literal">UNION</code> (but not <code class="literal">UNION ALL</code>), discard
+ duplicate rows and rows that duplicate any previous result row.
+ Include all remaining rows in the result of the recursive query, and
+ also place them in a temporary <em class="firstterm">intermediate table</em>.
+ </p></li><li class="step"><p>
+ Replace the contents of the working table with the contents of the
+ intermediate table, then empty the intermediate table.
+ </p></li></ol></li></ol></div><div class="note"><h3 class="title">Note</h3><p>
+ Strictly speaking, this process is iteration not recursion, but
+ <code class="literal">RECURSIVE</code> is the terminology chosen by the SQL standards
+ committee.
+ </p></div><p>
+ In the example above, the working table has just a single row in each step,
+ and it takes on the values from 1 through 100 in successive steps. In
+ the 100th step, there is no output because of the <code class="literal">WHERE</code>
+ clause, and so the query terminates.
+ </p><p>
+ Recursive queries are typically used to deal with hierarchical or
+ tree-structured data. A useful example is this query to find all the
+ direct and indirect sub-parts of a product, given only a table that
+ shows immediate inclusions:
+
+</p><pre class="programlisting">
+WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
+ SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
+ UNION ALL
+ SELECT p.sub_part, p.part, p.quantity
+ FROM included_parts pr, parts p
+ WHERE p.part = pr.sub_part
+)
+SELECT sub_part, SUM(quantity) as total_quantity
+FROM included_parts
+GROUP BY sub_part
+</pre><p>
+ </p><p>
+ When working with recursive queries it is important to be sure that
+ the recursive part of the query will eventually return no tuples,
+ or else the query will loop indefinitely. Sometimes, using
+ <code class="literal">UNION</code> instead of <code class="literal">UNION ALL</code> can accomplish this
+ by discarding rows that duplicate previous output rows. However, often a
+ cycle does not involve output rows that are completely duplicate: it may be
+ necessary to check just one or a few fields to see if the same point has
+ been reached before. The standard method for handling such situations is
+ to compute an array of the already-visited values. For example, consider
+ the following query that searches a table <code class="structname">graph</code> using a
+ <code class="structfield">link</code> field:
+
+</p><pre class="programlisting">
+WITH RECURSIVE search_graph(id, link, data, depth) AS (
+ SELECT g.id, g.link, g.data, 1
+ FROM graph g
+ UNION ALL
+ SELECT g.id, g.link, g.data, sg.depth + 1
+ FROM graph g, search_graph sg
+ WHERE g.id = sg.link
+)
+SELECT * FROM search_graph;
+</pre><p>
+
+ This query will loop if the <code class="structfield">link</code> relationships contain
+ cycles. Because we require a <span class="quote">“<span class="quote">depth</span>”</span> output, just changing
+ <code class="literal">UNION ALL</code> to <code class="literal">UNION</code> would not eliminate the looping.
+ Instead we need to recognize whether we have reached the same row again
+ while following a particular path of links. We add two columns
+ <code class="structfield">path</code> and <code class="structfield">cycle</code> to the loop-prone query:
+
+</p><pre class="programlisting">
+WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
+ SELECT g.id, g.link, g.data, 1,
+ ARRAY[g.id],
+ false
+ FROM graph g
+ UNION ALL
+ SELECT g.id, g.link, g.data, sg.depth + 1,
+ path || g.id,
+ g.id = ANY(path)
+ FROM graph g, search_graph sg
+ WHERE g.id = sg.link AND NOT cycle
+)
+SELECT * FROM search_graph;
+</pre><p>
+
+ Aside from preventing cycles, the array value is often useful in its own
+ right as representing the <span class="quote">“<span class="quote">path</span>”</span> taken to reach any particular row.
+ </p><p>
+ In the general case where more than one field needs to be checked to
+ recognize a cycle, use an array of rows. For example, if we needed to
+ compare fields <code class="structfield">f1</code> and <code class="structfield">f2</code>:
+
+</p><pre class="programlisting">
+WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
+ SELECT g.id, g.link, g.data, 1,
+ ARRAY[ROW(g.f1, g.f2)],
+ false
+ FROM graph g
+ UNION ALL
+ SELECT g.id, g.link, g.data, sg.depth + 1,
+ path || ROW(g.f1, g.f2),
+ ROW(g.f1, g.f2) = ANY(path)
+ FROM graph g, search_graph sg
+ WHERE g.id = sg.link AND NOT cycle
+)
+SELECT * FROM search_graph;
+</pre><p>
+ </p><div class="tip"><h3 class="title">Tip</h3><p>
+ Omit the <code class="literal">ROW()</code> syntax in the common case where only one field
+ needs to be checked to recognize a cycle. This allows a simple array
+ rather than a composite-type array to be used, gaining efficiency.
+ </p></div><div class="tip"><h3 class="title">Tip</h3><p>
+ The recursive query evaluation algorithm produces its output in
+ breadth-first search order. You can display the results in depth-first
+ search order by making the outer query <code class="literal">ORDER BY</code> a
+ <span class="quote">“<span class="quote">path</span>”</span> column constructed in this way.
+ </p></div><p>
+ A helpful trick for testing queries
+ when you are not certain if they might loop is to place a <code class="literal">LIMIT</code>
+ in the parent query. For example, this query would loop forever without
+ the <code class="literal">LIMIT</code>:
+
+</p><pre class="programlisting">
+WITH RECURSIVE t(n) AS (
+ SELECT 1
+ UNION ALL
+ SELECT n+1 FROM t
+)
+SELECT n FROM t LIMIT 100;
+</pre><p>
+
+ This works because <span class="productname">PostgreSQL</span>'s implementation
+ evaluates only as many rows of a <code class="literal">WITH</code> query as are actually
+ fetched by the parent query. Using this trick in production is not
+ recommended, because other systems might work differently. Also, it
+ usually won't work if you make the outer query sort the recursive query's
+ results or join them to some other table, because in such cases the
+ outer query will usually try to fetch all of the <code class="literal">WITH</code> query's
+ output anyway.
+ </p><p>
+ A useful property of <code class="literal">WITH</code> queries is that they are
+ normally evaluated only once per execution of the parent query, even if
+ they are referred to more than once by the parent query or
+ sibling <code class="literal">WITH</code> queries.
+ Thus, expensive calculations that are needed in multiple places can be
+ placed within a <code class="literal">WITH</code> query to avoid redundant work. Another
+ possible application is to prevent unwanted multiple evaluations of
+ functions with side-effects.
+ However, the other side of this coin is that the optimizer is not able to
+ push restrictions from the parent query down into a multiply-referenced
+ <code class="literal">WITH</code> query, since that might affect all uses of the
+ <code class="literal">WITH</code> query's output when it should affect only one.
+ The multiply-referenced <code class="literal">WITH</code> query will be
+ evaluated as written, without suppression of rows that the parent query
+ might discard afterwards. (But, as mentioned above, evaluation might stop
+ early if the reference(s) to the query demand only a limited number of
+ rows.)
+ </p><p>
+ However, if a <code class="literal">WITH</code> query is non-recursive and
+ side-effect-free (that is, it is a <code class="literal">SELECT</code> containing
+ no volatile functions) then it can be folded into the parent query,
+ allowing joint optimization of the two query levels. By default, this
+ happens if the parent query references the <code class="literal">WITH</code> query
+ just once, but not if it references the <code class="literal">WITH</code> query
+ more than once. You can override that decision by
+ specifying <code class="literal">MATERIALIZED</code> to force separate calculation
+ of the <code class="literal">WITH</code> query, or by specifying <code class="literal">NOT
+ MATERIALIZED</code> to force it to be merged into the parent query.
+ The latter choice risks duplicate computation of
+ the <code class="literal">WITH</code> query, but it can still give a net savings if
+ each usage of the <code class="literal">WITH</code> query needs only a small part
+ of the <code class="literal">WITH</code> query's full output.
+ </p><p>
+ A simple example of these rules is
+</p><pre class="programlisting">
+WITH w AS (
+ SELECT * FROM big_table
+)
+SELECT * FROM w WHERE key = 123;
+</pre><p>
+ This <code class="literal">WITH</code> query will be folded, producing the same
+ execution plan as
+</p><pre class="programlisting">
+SELECT * FROM big_table WHERE key = 123;
+</pre><p>
+ In particular, if there's an index on <code class="structfield">key</code>,
+ it will probably be used to fetch just the rows having <code class="literal">key =
+ 123</code>. On the other hand, in
+</p><pre class="programlisting">
+WITH w AS (
+ SELECT * FROM big_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
+WHERE w2.key = 123;
+</pre><p>
+ the <code class="literal">WITH</code> query will be materialized, producing a
+ temporary copy of <code class="structname">big_table</code> that is then
+ joined with itself — without benefit of any index. This query
+ will be executed much more efficiently if written as
+</p><pre class="programlisting">
+WITH w AS NOT MATERIALIZED (
+ SELECT * FROM big_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
+WHERE w2.key = 123;
+</pre><p>
+ so that the parent query's restrictions can be applied directly
+ to scans of <code class="structname">big_table</code>.
+ </p><p>
+ An example where <code class="literal">NOT MATERIALIZED</code> could be
+ undesirable is
+</p><pre class="programlisting">
+WITH w AS (
+ SELECT key, very_expensive_function(val) as f FROM some_table
+)
+SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
+</pre><p>
+ Here, materialization of the <code class="literal">WITH</code> query ensures
+ that <code class="function">very_expensive_function</code> is evaluated only
+ once per table row, not twice.
+ </p><p>
+ The examples above only show <code class="literal">WITH</code> being used with
+ <code class="command">SELECT</code>, but it can be attached in the same way to
+ <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>.
+ In each case it effectively provides temporary table(s) that can
+ be referred to in the main command.
+ </p></div><div class="sect2" id="QUERIES-WITH-MODIFYING"><div class="titlepage"><div><div><h3 class="title">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></h3></div></div></div><p>
+ You can use data-modifying statements (<code class="command">INSERT</code>,
+ <code class="command">UPDATE</code>, or <code class="command">DELETE</code>) in <code class="literal">WITH</code>. This
+ allows you to perform several different operations in the same query.
+ An example is:
+
+</p><pre class="programlisting">
+WITH moved_rows AS (
+ DELETE FROM products
+ WHERE
+ "date" &gt;= '2010-10-01' AND
+ "date" &lt; '2010-11-01'
+ RETURNING *
+)
+INSERT INTO products_log
+SELECT * FROM moved_rows;
+</pre><p>
+
+ This query effectively moves rows from <code class="structname">products</code> to
+ <code class="structname">products_log</code>. The <code class="command">DELETE</code> in <code class="literal">WITH</code>
+ deletes the specified rows from <code class="structname">products</code>, returning their
+ contents by means of its <code class="literal">RETURNING</code> clause; and then the
+ primary query reads that output and inserts it into
+ <code class="structname">products_log</code>.
+ </p><p>
+ A fine point of the above example is that the <code class="literal">WITH</code> clause is
+ attached to the <code class="command">INSERT</code>, not the sub-<code class="command">SELECT</code> within
+ the <code class="command">INSERT</code>. This is necessary because data-modifying
+ statements are only allowed in <code class="literal">WITH</code> clauses that are attached
+ to the top-level statement. However, normal <code class="literal">WITH</code> visibility
+ rules apply, so it is possible to refer to the <code class="literal">WITH</code>
+ statement's output from the sub-<code class="command">SELECT</code>.
+ </p><p>
+ Data-modifying statements in <code class="literal">WITH</code> usually have
+ <code class="literal">RETURNING</code> clauses (see <a class="xref" href="dml-returning.html" title="6.4. Returning Data from Modified Rows">Section 6.4</a>),
+ as shown in the example above.
+ It is the output of the <code class="literal">RETURNING</code> clause, <span class="emphasis"><em>not</em></span> the
+ target table of the data-modifying statement, that forms the temporary
+ table that can be referred to by the rest of the query. If a
+ data-modifying statement in <code class="literal">WITH</code> lacks a <code class="literal">RETURNING</code>
+ clause, then it forms no temporary table and cannot be referred to in
+ the rest of the query. Such a statement will be executed nonetheless.
+ A not-particularly-useful example is:
+
+</p><pre class="programlisting">
+WITH t AS (
+ DELETE FROM foo
+)
+DELETE FROM bar;
+</pre><p>
+
+ This example would remove all rows from tables <code class="structname">foo</code> and
+ <code class="structname">bar</code>. The number of affected rows reported to the client
+ would only include rows removed from <code class="structname">bar</code>.
+ </p><p>
+ Recursive self-references in data-modifying statements are not
+ allowed. In some cases it is possible to work around this limitation by
+ referring to the output of a recursive <code class="literal">WITH</code>, for example:
+
+</p><pre class="programlisting">
+WITH RECURSIVE included_parts(sub_part, part) AS (
+ SELECT sub_part, part FROM parts WHERE part = 'our_product'
+ UNION ALL
+ SELECT p.sub_part, p.part
+ FROM included_parts pr, parts p
+ WHERE p.part = pr.sub_part
+)
+DELETE FROM parts
+ WHERE part IN (SELECT part FROM included_parts);
+</pre><p>
+
+ This query would remove all direct and indirect subparts of a product.
+ </p><p>
+ Data-modifying statements in <code class="literal">WITH</code> are executed exactly once,
+ and always to completion, independently of whether the primary query
+ reads all (or indeed any) of their output. Notice that this is different
+ from the rule for <code class="command">SELECT</code> in <code class="literal">WITH</code>: as stated in the
+ previous section, execution of a <code class="command">SELECT</code> is carried only as far
+ as the primary query demands its output.
+ </p><p>
+ The sub-statements in <code class="literal">WITH</code> are executed concurrently with
+ each other and with the main query. Therefore, when using data-modifying
+ statements in <code class="literal">WITH</code>, the order in which the specified updates
+ actually happen is unpredictable. All the statements are executed with
+ the same <em class="firstterm">snapshot</em> (see <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>), so they
+ cannot <span class="quote">“<span class="quote">see</span>”</span> one another's effects on the target tables. This
+ alleviates the effects of the unpredictability of the actual order of row
+ updates, and means that <code class="literal">RETURNING</code> data is the only way to
+ communicate changes between different <code class="literal">WITH</code> sub-statements and
+ the main query. An example of this is that in
+
+</p><pre class="programlisting">
+WITH t AS (
+ UPDATE products SET price = price * 1.05
+ RETURNING *
+)
+SELECT * FROM products;
+</pre><p>
+
+ the outer <code class="command">SELECT</code> would return the original prices before the
+ action of the <code class="command">UPDATE</code>, while in
+
+</p><pre class="programlisting">
+WITH t AS (
+ UPDATE products SET price = price * 1.05
+ RETURNING *
+)
+SELECT * FROM t;
+</pre><p>
+
+ the outer <code class="command">SELECT</code> would return the updated data.
+ </p><p>
+ Trying to update the same row twice in a single statement is not
+ supported. Only one of the modifications takes place, but it is not easy
+ (and sometimes not possible) to reliably predict which one. This also
+ applies to deleting a row that was already updated in the same statement:
+ only the update is performed. Therefore you should generally avoid trying
+ to modify a single row twice in a single statement. In particular avoid
+ writing <code class="literal">WITH</code> sub-statements that could affect the same rows
+ changed by the main statement or a sibling sub-statement. The effects
+ of such a statement will not be predictable.
+ </p><p>
+ At present, any table used as the target of a data-modifying statement in
+ <code class="literal">WITH</code> must not have a conditional rule, nor an <code class="literal">ALSO</code>
+ rule, nor an <code class="literal">INSTEAD</code> rule that expands to multiple statements.
+ </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="queries-values.html" title="7.7. VALUES Lists">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="queries.html" title="Chapter 7. Queries">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="datatype.html" title="Chapter 8. Data Types">Next</a></td></tr><tr><td width="40%" align="left" valign="top">7.7. <code xmlns="http://www.w3.org/1999/xhtml" class="literal">VALUES</code> Lists </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> Chapter 8. Data Types</td></tr></table></div></body></html> \ No newline at end of file