diff options
Diffstat (limited to 'doc/src/sgml/rules.sgml')
-rw-r--r-- | doc/src/sgml/rules.sgml | 2429 |
1 files changed, 2429 insertions, 0 deletions
diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml new file mode 100644 index 0000000..6b0dc41 --- /dev/null +++ b/doc/src/sgml/rules.sgml @@ -0,0 +1,2429 @@ +<!-- doc/src/sgml/rules.sgml --> + +<chapter id="rules"> +<title>The Rule System</title> + + <indexterm zone="rules"> + <primary>rule</primary> + </indexterm> + +<para> + This chapter discusses the rule system in + <productname>PostgreSQL</productname>. Production rule systems + are conceptually simple, but there are many subtle points + involved in actually using them. +</para> + +<para> + Some other database systems define active database rules, which + are usually stored procedures and triggers. In + <productname>PostgreSQL</productname>, these can be implemented + using functions and triggers as well. +</para> + +<para> + The rule system (more precisely speaking, the query rewrite rule + system) is totally different from stored procedures and triggers. + It modifies queries to take rules into consideration, and then + passes the modified query to the query planner for planning and + execution. It is very powerful, and can be used for many things + such as query language procedures, views, and versions. The + theoretical foundations and the power of this rule system are + also discussed in <xref linkend="ston90b"/> and <xref + linkend="ong90"/>. +</para> + +<sect1 id="querytree"> +<title>The Query Tree</title> + +<indexterm zone="querytree"> + <primary>query tree</primary> +</indexterm> + +<para> + To understand how the rule system works it is necessary to know + when it is invoked and what its input and results are. +</para> + +<para> + The rule system is located between the parser and the planner. + It takes the output of the parser, one query tree, and the user-defined + rewrite rules, which are also + query trees with some extra information, and creates zero or more + query trees as result. So its input and output are always things + the parser itself could have produced and thus, anything it sees + is basically representable as an <acronym>SQL</acronym> statement. +</para> + +<para> + Now what is a query tree? It is an internal representation of an + <acronym>SQL</acronym> statement where the single parts that it is + built from are stored separately. These query trees can be shown + in the server log if you set the configuration parameters + <varname>debug_print_parse</varname>, + <varname>debug_print_rewritten</varname>, or + <varname>debug_print_plan</varname>. The rule actions are also + stored as query trees, in the system catalog + <structname>pg_rewrite</structname>. They are not formatted like + the log output, but they contain exactly the same information. +</para> + +<para> + Reading a raw query tree requires some experience. But since + <acronym>SQL</acronym> representations of query trees are + sufficient to understand the rule system, this chapter will not + teach how to read them. +</para> + +<para> + When reading the <acronym>SQL</acronym> representations of the + query trees in this chapter it is necessary to be able to identify + the parts the statement is broken into when it is in the query tree + structure. The parts of a query tree are + +<variablelist> + <varlistentry> + <term> + the command type + </term> + <listitem> + <para> + This is a simple value telling which command + (<command>SELECT</command>, <command>INSERT</command>, + <command>UPDATE</command>, <command>DELETE</command>) produced + the query tree. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the range table + <indexterm><primary>range table</primary></indexterm> + </term> + <listitem> + <para> + The range table is a list of relations that are used in the query. + In a <command>SELECT</command> statement these are the relations given after + the <literal>FROM</literal> key word. + </para> + + <para> + Every range table entry identifies a table or view and tells + by which name it is called in the other parts of the query. + In the query tree, the range table entries are referenced by + number rather than by name, so here it doesn't matter if there + are duplicate names as it would in an <acronym>SQL</acronym> + statement. This can happen after the range tables of rules + have been merged in. The examples in this chapter will not have + this situation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the result relation + </term> + <listitem> + <para> + This is an index into the range table that identifies the + relation where the results of the query go. + </para> + + <para> + <command>SELECT</command> queries don't have a result + relation. (The special case of <command>SELECT INTO</command> is + mostly identical to <command>CREATE TABLE</command> followed by + <literal>INSERT ... SELECT</literal>, and is not discussed + separately here.) + </para> + + <para> + For <command>INSERT</command>, <command>UPDATE</command>, and + <command>DELETE</command> commands, the result relation is the table + (or view!) where the changes are to take effect. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the target list + <indexterm><primary>target list</primary></indexterm> + </term> + <listitem> + <para> + The target list is a list of expressions that define the + result of the query. In the case of a + <command>SELECT</command>, these expressions are the ones that + build the final output of the query. They correspond to the + expressions between the key words <command>SELECT</command> + and <command>FROM</command>. (<literal>*</literal> is just an + abbreviation for all the column names of a relation. It is + expanded by the parser into the individual columns, so the + rule system never sees it.) + </para> + + <para> + <command>DELETE</command> commands don't need a normal target list + because they don't produce any result. Instead, the planner + adds a special <acronym>CTID</acronym> entry to the empty target list, + to allow the executor to find the row to be deleted. + (<acronym>CTID</acronym> is added when the result relation is an ordinary + table. If it is a view, a whole-row variable is added instead, by + the rule system, as described in <xref linkend="rules-views-update"/>.) + </para> + + <para> + For <command>INSERT</command> commands, the target list describes + the new rows that should go into the result relation. It consists of the + expressions in the <literal>VALUES</literal> clause or the ones from the + <command>SELECT</command> clause in <literal>INSERT + ... SELECT</literal>. The first step of the rewrite process adds + target list entries for any columns that were not assigned to by + the original command but have defaults. Any remaining columns (with + neither a given value nor a default) will be filled in by the + planner with a constant null expression. + </para> + + <para> + For <command>UPDATE</command> commands, the target list + describes the new rows that should replace the old ones. In the + rule system, it contains just the expressions from the <literal>SET + column = expression</literal> part of the command. The planner will + handle missing columns by inserting expressions that copy the values + from the old row into the new one. Just as for <command>DELETE</command>, + a <acronym>CTID</acronym> or whole-row variable is added so that + the executor can identify the old row to be updated. + </para> + + <para> + Every entry in the target list contains an expression that can + be a constant value, a variable pointing to a column of one + of the relations in the range table, a parameter, or an expression + tree made of function calls, constants, variables, operators, etc. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the qualification + </term> + <listitem> + <para> + The query's qualification is an expression much like one of + those contained in the target list entries. The result value of + this expression is a Boolean that tells whether the operation + (<command>INSERT</command>, <command>UPDATE</command>, + <command>DELETE</command>, or <command>SELECT</command>) for the + final result row should be executed or not. It corresponds to the <literal>WHERE</literal> clause + of an <acronym>SQL</acronym> statement. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the join tree + </term> + <listitem> + <para> + The query's join tree shows the structure of the <literal>FROM</literal> clause. + For a simple query like <literal>SELECT ... FROM a, b, c</literal>, the join tree is just + a list of the <literal>FROM</literal> items, because we are allowed to join them in + any order. But when <literal>JOIN</literal> expressions, particularly outer joins, + are used, we have to join in the order shown by the joins. + In that case, the join tree shows the structure of the <literal>JOIN</literal> expressions. The + restrictions associated with particular <literal>JOIN</literal> clauses (from <literal>ON</literal> or + <literal>USING</literal> expressions) are stored as qualification expressions attached + to those join-tree nodes. It turns out to be convenient to store + the top-level <literal>WHERE</literal> expression as a qualification attached to the + top-level join-tree item, too. So really the join tree represents + both the <literal>FROM</literal> and <literal>WHERE</literal> clauses of a <command>SELECT</command>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + the others + </term> + <listitem> + <para> + The other parts of the query tree like the <literal>ORDER BY</literal> + clause aren't of interest here. The rule system + substitutes some entries there while applying rules, but that + doesn't have much to do with the fundamentals of the rule + system. + </para> + </listitem> + </varlistentry> + +</variablelist> +</para> +</sect1> + +<sect1 id="rules-views"> +<title>Views and the Rule System</title> + +<indexterm zone="rules-views"> + <primary>rule</primary> + <secondary>and views</secondary> +</indexterm> + +<indexterm zone="rules-views"> + <primary>view</primary> + <secondary>implementation through rules</secondary> +</indexterm> + +<para> + Views in <productname>PostgreSQL</productname> are implemented + using the rule system. In fact, there is essentially no difference + between: + +<programlisting> +CREATE VIEW myview AS SELECT * FROM mytab; +</programlisting> + + compared against the two commands: + +<programlisting> +CREATE TABLE myview (<replaceable>same column list as mytab</replaceable>); +CREATE RULE "_RETURN" AS ON SELECT TO myview DO INSTEAD + SELECT * FROM mytab; +</programlisting> + + because this is exactly what the <command>CREATE VIEW</command> + command does internally. This has some side effects. One of them + is that the information about a view in the + <productname>PostgreSQL</productname> system catalogs is exactly + the same as it is for a table. So for the parser, there is + absolutely no difference between a table and a view. They are the + same thing: relations. +</para> + +<sect2 id="rules-select"> +<title>How <command>SELECT</command> Rules Work</title> + +<indexterm zone="rules-select"> + <primary>rule</primary> + <secondary sortas="SELECT">for SELECT</secondary> +</indexterm> + +<para> + Rules <literal>ON SELECT</literal> are applied to all queries as the last step, even + if the command given is an <command>INSERT</command>, + <command>UPDATE</command> or <command>DELETE</command>. And they + have different semantics from rules on the other command types in that they modify the + query tree in place instead of creating a new one. So + <command>SELECT</command> rules are described first. +</para> + +<para> + Currently, there can be only one action in an <literal>ON SELECT</literal> rule, and it must + be an unconditional <command>SELECT</command> action that is <literal>INSTEAD</literal>. This restriction was + required to make rules safe enough to open them for ordinary users, and + it restricts <literal>ON SELECT</literal> rules to act like views. +</para> + +<para> + The examples for this chapter are two join views that do some + calculations and some more views using them in turn. One of the + two first views is customized later by adding rules for + <command>INSERT</command>, <command>UPDATE</command>, and + <command>DELETE</command> operations so that the final result will + be a view that behaves like a real table with some magic + functionality. This is not such a simple example to start from and + this makes things harder to get into. But it's better to have one + example that covers all the points discussed step by step rather + than having many different ones that might mix up in mind. +</para> + +<para> + The real tables we need in the first two rule system descriptions + are these: + +<programlisting> +CREATE TABLE shoe_data ( + shoename text, -- primary key + sh_avail integer, -- available number of pairs + slcolor text, -- preferred shoelace color + slminlen real, -- minimum shoelace length + slmaxlen real, -- maximum shoelace length + slunit text -- length unit +); + +CREATE TABLE shoelace_data ( + sl_name text, -- primary key + sl_avail integer, -- available number of pairs + sl_color text, -- shoelace color + sl_len real, -- shoelace length + sl_unit text -- length unit +); + +CREATE TABLE unit ( + un_name text, -- primary key + un_fact real -- factor to transform to cm +); +</programlisting> + + As you can see, they represent shoe-store data. +</para> + +<para> + The views are created as: + +<programlisting> +CREATE VIEW shoe AS + SELECT sh.shoename, + sh.sh_avail, + sh.slcolor, + sh.slminlen, + sh.slminlen * un.un_fact AS slminlen_cm, + sh.slmaxlen, + sh.slmaxlen * un.un_fact AS slmaxlen_cm, + sh.slunit + FROM shoe_data sh, unit un + WHERE sh.slunit = un.un_name; + +CREATE VIEW shoelace AS + SELECT s.sl_name, + s.sl_avail, + s.sl_color, + s.sl_len, + s.sl_unit, + s.sl_len * u.un_fact AS sl_len_cm + FROM shoelace_data s, unit u + WHERE s.sl_unit = u.un_name; + +CREATE VIEW shoe_ready AS + SELECT rsh.shoename, + rsh.sh_avail, + rsl.sl_name, + rsl.sl_avail, + least(rsh.sh_avail, rsl.sl_avail) AS total_avail + FROM shoe rsh, shoelace rsl + WHERE rsl.sl_color = rsh.slcolor + AND rsl.sl_len_cm >= rsh.slminlen_cm + AND rsl.sl_len_cm <= rsh.slmaxlen_cm; +</programlisting> + + The <command>CREATE VIEW</command> command for the + <literal>shoelace</literal> view (which is the simplest one we + have) will create a relation <literal>shoelace</literal> and an entry in + <structname>pg_rewrite</structname> that tells that there is a + rewrite rule that must be applied whenever the relation <literal>shoelace</literal> + is referenced in a query's range table. The rule has no rule + qualification (discussed later, with the non-<command>SELECT</command> rules, since + <command>SELECT</command> rules currently cannot have them) and it is <literal>INSTEAD</literal>. Note + that rule qualifications are not the same as query qualifications. + The action of our rule has a query qualification. + The action of the rule is one query tree that is a copy of the + <command>SELECT</command> statement in the view creation command. +</para> + + <note> + <para> + The two extra range + table entries for <literal>NEW</literal> and <literal>OLD</literal> that you can see in + the <structname>pg_rewrite</structname> entry aren't of interest + for <command>SELECT</command> rules. + </para> + </note> + +<para> + Now we populate <literal>unit</literal>, <literal>shoe_data</literal> + and <literal>shoelace_data</literal> and run a simple query on a view: + +<programlisting> +INSERT INTO unit VALUES ('cm', 1.0); +INSERT INTO unit VALUES ('m', 100.0); +INSERT INTO unit VALUES ('inch', 2.54); + +INSERT INTO shoe_data VALUES ('sh1', 2, 'black', 70.0, 90.0, 'cm'); +INSERT INTO shoe_data VALUES ('sh2', 0, 'black', 30.0, 40.0, 'inch'); +INSERT INTO shoe_data VALUES ('sh3', 4, 'brown', 50.0, 65.0, 'cm'); +INSERT INTO shoe_data VALUES ('sh4', 3, 'brown', 40.0, 50.0, 'inch'); + +INSERT INTO shoelace_data VALUES ('sl1', 5, 'black', 80.0, 'cm'); +INSERT INTO shoelace_data VALUES ('sl2', 6, 'black', 100.0, 'cm'); +INSERT INTO shoelace_data VALUES ('sl3', 0, 'black', 35.0 , 'inch'); +INSERT INTO shoelace_data VALUES ('sl4', 8, 'black', 40.0 , 'inch'); +INSERT INTO shoelace_data VALUES ('sl5', 4, 'brown', 1.0 , 'm'); +INSERT INTO shoelace_data VALUES ('sl6', 0, 'brown', 0.9 , 'm'); +INSERT INTO shoelace_data VALUES ('sl7', 7, 'brown', 60 , 'cm'); +INSERT INTO shoelace_data VALUES ('sl8', 1, 'brown', 40 , 'inch'); + +SELECT * FROM shoelace; + + sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm +-----------+----------+----------+--------+---------+----------- + sl1 | 5 | black | 80 | cm | 80 + sl2 | 6 | black | 100 | cm | 100 + sl7 | 7 | brown | 60 | cm | 60 + sl3 | 0 | black | 35 | inch | 88.9 + sl4 | 8 | black | 40 | inch | 101.6 + sl8 | 1 | brown | 40 | inch | 101.6 + sl5 | 4 | brown | 1 | m | 100 + sl6 | 0 | brown | 0.9 | m | 90 +(8 rows) +</programlisting> + </para> + + <para> + This is the simplest <command>SELECT</command> you can do on our + views, so we take this opportunity to explain the basics of view + rules. The <literal>SELECT * FROM shoelace</literal> was + interpreted by the parser and produced the query tree: + +<programlisting> +SELECT shoelace.sl_name, shoelace.sl_avail, + shoelace.sl_color, shoelace.sl_len, + shoelace.sl_unit, shoelace.sl_len_cm + FROM shoelace shoelace; +</programlisting> + + and this is given to the rule system. The rule system walks through the + range table and checks if there are rules + for any relation. When processing the range table entry for + <literal>shoelace</literal> (the only one up to now) it finds the + <literal>_RETURN</literal> rule with the query tree: + +<programlisting> +SELECT s.sl_name, s.sl_avail, + s.sl_color, s.sl_len, s.sl_unit, + s.sl_len * u.un_fact AS sl_len_cm + FROM shoelace old, shoelace new, + shoelace_data s, unit u + WHERE s.sl_unit = u.un_name; +</programlisting> +</para> + +<para> + To expand the view, the rewriter simply creates a subquery range-table + entry containing the rule's action query tree, and substitutes this + range table entry for the original one that referenced the view. The + resulting rewritten query tree is almost the same as if you had typed: + +<programlisting> +SELECT shoelace.sl_name, shoelace.sl_avail, + shoelace.sl_color, shoelace.sl_len, + shoelace.sl_unit, shoelace.sl_len_cm + FROM (SELECT s.sl_name, + s.sl_avail, + s.sl_color, + s.sl_len, + s.sl_unit, + s.sl_len * u.un_fact AS sl_len_cm + FROM shoelace_data s, unit u + WHERE s.sl_unit = u.un_name) shoelace; +</programlisting> + + There is one difference however: the subquery's range table has two + extra entries <literal>shoelace old</literal> and <literal>shoelace new</literal>. These entries don't + participate directly in the query, since they aren't referenced by + the subquery's join tree or target list. The rewriter uses them + to store the access privilege check information that was originally present + in the range-table entry that referenced the view. In this way, the + executor will still check that the user has proper privileges to access + the view, even though there's no direct use of the view in the rewritten + query. +</para> + +<para> + That was the first rule applied. The rule system will continue checking + the remaining range-table entries in the top query (in this example there + are no more), and it will recursively check the range-table entries in + the added subquery to see if any of them reference views. (But it + won't expand <literal>old</literal> or <literal>new</literal> — otherwise we'd have infinite recursion!) + In this example, there are no rewrite rules for <literal>shoelace_data</literal> or <literal>unit</literal>, + so rewriting is complete and the above is the final result given to + the planner. +</para> + +<para> + Now we want to write a query that finds out for which shoes currently in the store + we have the matching shoelaces (color and length) and where the + total number of exactly matching pairs is greater than or equal to two. + +<programlisting> +SELECT * FROM shoe_ready WHERE total_avail >= 2; + + shoename | sh_avail | sl_name | sl_avail | total_avail +----------+----------+---------+----------+------------- + sh1 | 2 | sl1 | 5 | 2 + sh3 | 4 | sl7 | 7 | 4 +(2 rows) +</programlisting> +</para> + +<para> + The output of the parser this time is the query tree: + +<programlisting> +SELECT shoe_ready.shoename, shoe_ready.sh_avail, + shoe_ready.sl_name, shoe_ready.sl_avail, + shoe_ready.total_avail + FROM shoe_ready shoe_ready + WHERE shoe_ready.total_avail >= 2; +</programlisting> + + The first rule applied will be the one for the + <literal>shoe_ready</literal> view and it results in the + query tree: + +<programlisting> +SELECT shoe_ready.shoename, shoe_ready.sh_avail, + shoe_ready.sl_name, shoe_ready.sl_avail, + shoe_ready.total_avail + FROM (SELECT rsh.shoename, + rsh.sh_avail, + rsl.sl_name, + rsl.sl_avail, + least(rsh.sh_avail, rsl.sl_avail) AS total_avail + FROM shoe rsh, shoelace rsl + WHERE rsl.sl_color = rsh.slcolor + AND rsl.sl_len_cm >= rsh.slminlen_cm + AND rsl.sl_len_cm <= rsh.slmaxlen_cm) shoe_ready + WHERE shoe_ready.total_avail >= 2; +</programlisting> + + Similarly, the rules for <literal>shoe</literal> and + <literal>shoelace</literal> are substituted into the range table of + the subquery, leading to a three-level final query tree: + +<programlisting> +SELECT shoe_ready.shoename, shoe_ready.sh_avail, + shoe_ready.sl_name, shoe_ready.sl_avail, + shoe_ready.total_avail + FROM (SELECT rsh.shoename, + rsh.sh_avail, + rsl.sl_name, + rsl.sl_avail, + least(rsh.sh_avail, rsl.sl_avail) AS total_avail + FROM (SELECT sh.shoename, + sh.sh_avail, + sh.slcolor, + sh.slminlen, + sh.slminlen * un.un_fact AS slminlen_cm, + sh.slmaxlen, + sh.slmaxlen * un.un_fact AS slmaxlen_cm, + sh.slunit + FROM shoe_data sh, unit un + WHERE sh.slunit = un.un_name) rsh, + (SELECT s.sl_name, + s.sl_avail, + s.sl_color, + s.sl_len, + s.sl_unit, + s.sl_len * u.un_fact AS sl_len_cm + FROM shoelace_data s, unit u + WHERE s.sl_unit = u.un_name) rsl + WHERE rsl.sl_color = rsh.slcolor + AND rsl.sl_len_cm >= rsh.slminlen_cm + AND rsl.sl_len_cm <= rsh.slmaxlen_cm) shoe_ready + WHERE shoe_ready.total_avail > 2; +</programlisting> + </para> + + <para> + This might look inefficient, but the planner will collapse this into a + single-level query tree by <quote>pulling up</quote> the subqueries, + and then it will plan the joins just as if we'd written them out + manually. So collapsing the query tree is an optimization that the + rewrite system doesn't have to concern itself with. + </para> +</sect2> + +<sect2> +<title>View Rules in Non-<command>SELECT</command> Statements</title> + +<para> + Two details of the query tree aren't touched in the description of + view rules above. These are the command type and the result relation. + In fact, the command type is not needed by view rules, but the result + relation may affect the way in which the query rewriter works, because + special care needs to be taken if the result relation is a view. +</para> + +<para> + There are only a few differences between a query tree for a + <command>SELECT</command> and one for any other + command. Obviously, they have a different command type and for a + command other than a <command>SELECT</command>, the result + relation points to the range-table entry where the result should + go. Everything else is absolutely the same. So having two tables + <literal>t1</literal> and <literal>t2</literal> with columns <literal>a</literal> and + <literal>b</literal>, the query trees for the two statements: + +<programlisting> +SELECT t2.b FROM t1, t2 WHERE t1.a = t2.a; + +UPDATE t1 SET b = t2.b FROM t2 WHERE t1.a = t2.a; +</programlisting> + + are nearly identical. In particular: + + <itemizedlist> + <listitem> + <para> + The range tables contain entries for the tables <literal>t1</literal> and <literal>t2</literal>. + </para> + </listitem> + + <listitem> + <para> + The target lists contain one variable that points to column + <literal>b</literal> of the range table entry for table <literal>t2</literal>. + </para> + </listitem> + + <listitem> + <para> + The qualification expressions compare the columns <literal>a</literal> of both + range-table entries for equality. + </para> + </listitem> + + <listitem> + <para> + The join trees show a simple join between <literal>t1</literal> and <literal>t2</literal>. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + The consequence is, that both query trees result in similar + execution plans: They are both joins over the two tables. For the + <command>UPDATE</command> the missing columns from <literal>t1</literal> are added to + the target list by the planner and the final query tree will read + as: + +<programlisting> +UPDATE t1 SET a = t1.a, b = t2.b FROM t2 WHERE t1.a = t2.a; +</programlisting> + + and thus the executor run over the join will produce exactly the + same result set as: + +<programlisting> +SELECT t1.a, t2.b FROM t1, t2 WHERE t1.a = t2.a; +</programlisting> + + But there is a little problem in + <command>UPDATE</command>: the part of the executor plan that does + the join does not care what the results from the join are + meant for. It just produces a result set of rows. The fact that + one is a <command>SELECT</command> command and the other is an + <command>UPDATE</command> is handled higher up in the executor, where + it knows that this is an <command>UPDATE</command>, and it knows that + this result should go into table <literal>t1</literal>. But which of the rows + that are there has to be replaced by the new row? +</para> + +<para> + To resolve this problem, another entry is added to the target list + in <command>UPDATE</command> (and also in + <command>DELETE</command>) statements: the current tuple ID + (<acronym>CTID</acronym>).<indexterm><primary>CTID</primary></indexterm> + This is a system column containing the + file block number and position in the block for the row. Knowing + the table, the <acronym>CTID</acronym> can be used to retrieve the + original row of <literal>t1</literal> to be updated. After adding the + <acronym>CTID</acronym> to the target list, the query actually looks like: + +<programlisting> +SELECT t1.a, t2.b, t1.ctid FROM t1, t2 WHERE t1.a = t2.a; +</programlisting> + + Now another detail of <productname>PostgreSQL</productname> enters + the stage. Old table rows aren't overwritten, and this + is why <command>ROLLBACK</command> is fast. In an <command>UPDATE</command>, + the new result row is inserted into the table (after stripping the + <acronym>CTID</acronym>) and in the row header of the old row, which the + <acronym>CTID</acronym> pointed to, the <literal>cmax</literal> and + <literal>xmax</literal> entries are set to the current command counter + and current transaction ID. Thus the old row is hidden, and after + the transaction commits the vacuum cleaner can eventually remove + the dead row. +</para> + +<para> + Knowing all that, we can simply apply view rules in absolutely + the same way to any command. There is no difference. +</para> +</sect2> + +<sect2> +<title>The Power of Views in <productname>PostgreSQL</productname></title> + +<para> + The above demonstrates how the rule system incorporates view + definitions into the original query tree. In the second example, a + simple <command>SELECT</command> from one view created a final + query tree that is a join of 4 tables (<literal>unit</literal> was used twice with + different names). +</para> + +<para> + The benefit of implementing views with the rule system is + that the planner has all + the information about which tables have to be scanned plus the + relationships between these tables plus the restrictive + qualifications from the views plus the qualifications from + the original query + in one single query tree. And this is still the situation + when the original query is already a join over views. + The planner has to decide which is + the best path to execute the query, and the more information + the planner has, the better this decision can be. And + the rule system as implemented in <productname>PostgreSQL</productname> + ensures that this is all information available about the query + up to that point. +</para> +</sect2> + +<sect2 id="rules-views-update"> +<title>Updating a View</title> + +<para> + What happens if a view is named as the target relation for an + <command>INSERT</command>, <command>UPDATE</command>, or + <command>DELETE</command>? Doing the substitutions + described above would give a query tree in which the result + relation points at a subquery range-table entry, which will not + work. There are several ways in which <productname>PostgreSQL</productname> + can support the appearance of updating a view, however. +</para> + +<para> + If the subquery selects from a single base relation and is simple + enough, the rewriter can automatically replace the subquery with the + underlying base relation so that the <command>INSERT</command>, + <command>UPDATE</command>, or <command>DELETE</command> is applied to + the base relation in the appropriate way. Views that are + <quote>simple enough</quote> for this are called <firstterm>automatically + updatable</firstterm>. For detailed information on the kinds of view that can + be automatically updated, see <xref linkend="sql-createview"/>. +</para> + +<para> + Alternatively, the operation may be handled by a user-provided + <literal>INSTEAD OF</literal> trigger on the view. + Rewriting works slightly differently + in this case. For <command>INSERT</command>, the rewriter does + nothing at all with the view, leaving it as the result relation + for the query. For <command>UPDATE</command> and + <command>DELETE</command>, it's still necessary to expand the + view query to produce the <quote>old</quote> rows that the command will + attempt to update or delete. So the view is expanded as normal, + but another unexpanded range-table entry is added to the query + to represent the view in its capacity as the result relation. +</para> + +<para> + The problem that now arises is how to identify the rows to be + updated in the view. Recall that when the result relation + is a table, a special <acronym>CTID</acronym> entry is added to the target + list to identify the physical locations of the rows to be updated. + This does not work if the result relation is a view, because a view + does not have any <acronym>CTID</acronym>, since its rows do not have + actual physical locations. Instead, for an <command>UPDATE</command> + or <command>DELETE</command> operation, a special <literal>wholerow</literal> + entry is added to the target list, which expands to include all + columns from the view. The executor uses this value to supply the + <quote>old</quote> row to the <literal>INSTEAD OF</literal> trigger. It is + up to the trigger to work out what to update based on the old and + new row values. +</para> + +<para> + Another possibility is for the user to define <literal>INSTEAD</literal> + rules that specify substitute actions for <command>INSERT</command>, + <command>UPDATE</command>, and <command>DELETE</command> commands on + a view. These rules will rewrite the command, typically into a command + that updates one or more tables, rather than views. That is the topic + of <xref linkend="rules-update"/>. +</para> + +<para> + Note that rules are evaluated first, rewriting the original query + before it is planned and executed. Therefore, if a view has + <literal>INSTEAD OF</literal> triggers as well as rules on <command>INSERT</command>, + <command>UPDATE</command>, or <command>DELETE</command>, then the rules will be + evaluated first, and depending on the result, the triggers may not be + used at all. +</para> + +<para> + Automatic rewriting of an <command>INSERT</command>, + <command>UPDATE</command>, or <command>DELETE</command> query on a + simple view is always tried last. Therefore, if a view has rules or + triggers, they will override the default behavior of automatically + updatable views. +</para> + +<para> + If there are no <literal>INSTEAD</literal> rules or <literal>INSTEAD OF</literal> + triggers for the view, and the rewriter cannot automatically rewrite + the query as an update on the underlying base relation, an error will + be thrown because the executor cannot update a view as such. +</para> + +</sect2> + +</sect1> + +<sect1 id="rules-materializedviews"> +<title>Materialized Views</title> + +<indexterm zone="rules-materializedviews"> + <primary>rule</primary> + <secondary>and materialized views</secondary> +</indexterm> + +<indexterm zone="rules-materializedviews"> + <primary>materialized view</primary> + <secondary>implementation through rules</secondary> +</indexterm> + +<indexterm zone="rules-materializedviews"> + <primary>view</primary> + <secondary>materialized</secondary> +</indexterm> + +<para> + Materialized views in <productname>PostgreSQL</productname> use the + rule system like views do, but persist the results in a table-like form. + The main differences between: + +<programlisting> +CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM mytab; +</programlisting> + + and: + +<programlisting> +CREATE TABLE mymatview AS SELECT * FROM mytab; +</programlisting> + + are that the materialized view cannot subsequently be directly updated + and that the query used to create the materialized view is stored in + exactly the same way that a view's query is stored, so that fresh data + can be generated for the materialized view with: + +<programlisting> +REFRESH MATERIALIZED VIEW mymatview; +</programlisting> + + The information about a materialized view in the + <productname>PostgreSQL</productname> system catalogs is exactly + the same as it is for a table or view. So for the parser, a + materialized view is a relation, just like a table or a view. When + a materialized view is referenced in a query, the data is returned + directly from the materialized view, like from a table; the rule is + only used for populating the materialized view. +</para> + +<para> + While access to the data stored in a materialized view is often much + faster than accessing the underlying tables directly or through a view, + the data is not always current; yet sometimes current data is not needed. + Consider a table which records sales: + +<programlisting> +CREATE TABLE invoice ( + invoice_no integer PRIMARY KEY, + seller_no integer, -- ID of salesperson + invoice_date date, -- date of sale + invoice_amt numeric(13,2) -- amount of sale +); +</programlisting> + + If people want to be able to quickly graph historical sales data, they + might want to summarize, and they may not care about the incomplete data + for the current date: + +<programlisting> +CREATE MATERIALIZED VIEW sales_summary AS + SELECT + seller_no, + invoice_date, + sum(invoice_amt)::numeric(13,2) as sales_amt + FROM invoice + WHERE invoice_date < CURRENT_DATE + GROUP BY + seller_no, + invoice_date + ORDER BY + seller_no, + invoice_date; + +CREATE UNIQUE INDEX sales_summary_seller + ON sales_summary (seller_no, invoice_date); +</programlisting> + + This materialized view might be useful for displaying a graph in the + dashboard created for salespeople. A job could be scheduled to update + the statistics each night using this SQL statement: + +<programlisting> +REFRESH MATERIALIZED VIEW sales_summary; +</programlisting> +</para> + +<para> + Another use for a materialized view is to allow faster access to data + brought across from a remote system through a foreign data wrapper. + A simple example using <literal>file_fdw</literal> is below, with timings, + but since this is using cache on the local system the performance + difference compared to access to a remote system would usually be greater + than shown here. Notice we are also exploiting the ability to put an + index on the materialized view, whereas <literal>file_fdw</literal> does + not support indexes; this advantage might not apply for other sorts of + foreign data access. +</para> + +<para> + Setup: + +<programlisting> +CREATE EXTENSION file_fdw; +CREATE SERVER local_file FOREIGN DATA WRAPPER file_fdw; +CREATE FOREIGN TABLE words (word text NOT NULL) + SERVER local_file + OPTIONS (filename '/usr/share/dict/words'); +CREATE MATERIALIZED VIEW wrd AS SELECT * FROM words; +CREATE UNIQUE INDEX wrd_word ON wrd (word); +CREATE EXTENSION pg_trgm; +CREATE INDEX wrd_trgm ON wrd USING gist (word gist_trgm_ops); +VACUUM ANALYZE wrd; +</programlisting> + + Now let's spell-check a word. Using <literal>file_fdw</literal> directly: + +<programlisting> +SELECT count(*) FROM words WHERE word = 'caterpiler'; + + count +------- + 0 +(1 row) +</programlisting> + + With <command>EXPLAIN ANALYZE</command>, we see: + +<programlisting> + Aggregate (cost=21763.99..21764.00 rows=1 width=0) (actual time=188.180..188.181 rows=1 loops=1) + -> Foreign Scan on words (cost=0.00..21761.41 rows=1032 width=0) (actual time=188.177..188.177 rows=0 loops=1) + Filter: (word = 'caterpiler'::text) + Rows Removed by Filter: 479829 + Foreign File: /usr/share/dict/words + Foreign File Size: 4953699 + Planning time: 0.118 ms + Execution time: 188.273 ms +</programlisting> + + If the materialized view is used instead, the query is much faster: + +<programlisting> + Aggregate (cost=4.44..4.45 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1) + -> Index Only Scan using wrd_word on wrd (cost=0.42..4.44 rows=1 width=0) (actual time=0.039..0.039 rows=0 loops=1) + Index Cond: (word = 'caterpiler'::text) + Heap Fetches: 0 + Planning time: 0.164 ms + Execution time: 0.117 ms +</programlisting> + + Either way, the word is spelled wrong, so let's look for what we might + have wanted. Again using <literal>file_fdw</literal> and + <literal>pg_trgm</literal>: + +<programlisting> +SELECT word FROM words ORDER BY word <-> 'caterpiler' LIMIT 10; + + word +--------------- + cater + caterpillar + Caterpillar + caterpillars + caterpillar's + Caterpillar's + caterer + caterer's + caters + catered +(10 rows) +</programlisting> + +<programlisting> + Limit (cost=11583.61..11583.64 rows=10 width=32) (actual time=1431.591..1431.594 rows=10 loops=1) + -> Sort (cost=11583.61..11804.76 rows=88459 width=32) (actual time=1431.589..1431.591 rows=10 loops=1) + Sort Key: ((word <-> 'caterpiler'::text)) + Sort Method: top-N heapsort Memory: 25kB + -> Foreign Scan on words (cost=0.00..9672.05 rows=88459 width=32) (actual time=0.057..1286.455 rows=479829 loops=1) + Foreign File: /usr/share/dict/words + Foreign File Size: 4953699 + Planning time: 0.128 ms + Execution time: 1431.679 ms +</programlisting> + + Using the materialized view: + +<programlisting> + Limit (cost=0.29..1.06 rows=10 width=10) (actual time=187.222..188.257 rows=10 loops=1) + -> Index Scan using wrd_trgm on wrd (cost=0.29..37020.87 rows=479829 width=10) (actual time=187.219..188.252 rows=10 loops=1) + Order By: (word <-> 'caterpiler'::text) + Planning time: 0.196 ms + Execution time: 198.640 ms +</programlisting> + + If you can tolerate periodic update of the remote data to the local + database, the performance benefit can be substantial. +</para> + +</sect1> + +<sect1 id="rules-update"> +<title>Rules on <command>INSERT</command>, <command>UPDATE</command>, and <command>DELETE</command></title> + +<indexterm zone="rules-update"> + <primary>rule</primary> + <secondary sortas="INSERT">for INSERT</secondary> +</indexterm> + +<indexterm zone="rules-update"> + <primary>rule</primary> + <secondary sortas="UPDATE">for UPDATE</secondary> +</indexterm> + +<indexterm zone="rules-update"> + <primary>rule</primary> + <secondary sortas="DELETE">for DELETE</secondary> +</indexterm> + +<para> + Rules that are defined on <command>INSERT</command>, <command>UPDATE</command>, + and <command>DELETE</command> are significantly different from the view rules + described in the previous section. First, their <command>CREATE + RULE</command> command allows more: + + <itemizedlist> + <listitem> + <para> + They are allowed to have no action. + </para> + </listitem> + + <listitem> + <para> + They can have multiple actions. + </para> + </listitem> + + <listitem> + <para> + They can be <literal>INSTEAD</literal> or <literal>ALSO</literal> (the default). + </para> + </listitem> + + <listitem> + <para> + The pseudorelations <literal>NEW</literal> and <literal>OLD</literal> become useful. + </para> + </listitem> + + <listitem> + <para> + They can have rule qualifications. + </para> + </listitem> + </itemizedlist> + + Second, they don't modify the query tree in place. Instead they + create zero or more new query trees and can throw away the + original one. +</para> + +<caution> + <para> + In many cases, tasks that could be performed by rules + on <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> are better done + with triggers. Triggers are notationally a bit more complicated, but their + semantics are much simpler to understand. Rules tend to have surprising + results when the original query contains volatile functions: volatile + functions may get executed more times than expected in the process of + carrying out the rules. + </para> + + <para> + Also, there are some cases that are not supported by these types of rules at + all, notably including <literal>WITH</literal> clauses in the original query and + multiple-assignment sub-<literal>SELECT</literal>s in the <literal>SET</literal> list + of <command>UPDATE</command> queries. This is because copying these constructs + into a rule query would result in multiple evaluations of the sub-query, + contrary to the express intent of the query's author. + </para> +</caution> + +<sect2> +<title>How Update Rules Work</title> + +<para> + Keep the syntax: + +<programlisting> +CREATE [ OR REPLACE ] RULE <replaceable class="parameter">name</replaceable> AS ON <replaceable class="parameter">event</replaceable> + TO <replaceable class="parameter">table</replaceable> [ WHERE <replaceable class="parameter">condition</replaceable> ] + DO [ ALSO | INSTEAD ] { NOTHING | <replaceable class="parameter">command</replaceable> | ( <replaceable class="parameter">command</replaceable> ; <replaceable class="parameter">command</replaceable> ... ) } +</programlisting> + + in mind. + In the following, <firstterm>update rules</firstterm> means rules that are defined + on <command>INSERT</command>, <command>UPDATE</command>, or <command>DELETE</command>. +</para> + +<para> + Update rules get applied by the rule system when the result + relation and the command type of a query tree are equal to the + object and event given in the <command>CREATE RULE</command> command. + For update rules, the rule system creates a list of query trees. + Initially the query-tree list is empty. + There can be zero (<literal>NOTHING</literal> key word), one, or multiple actions. + To simplify, we will look at a rule with one action. This rule + can have a qualification or not and it can be <literal>INSTEAD</literal> or + <literal>ALSO</literal> (the default). +</para> + +<para> + What is a rule qualification? It is a restriction that tells + when the actions of the rule should be done and when not. This + qualification can only reference the pseudorelations <literal>NEW</literal> and/or <literal>OLD</literal>, + which basically represent the relation that was given as object (but with a + special meaning). +</para> + + <para> + So we have three cases that produce the following query trees for + a one-action rule. + + <variablelist> + <varlistentry> + <term>No qualification, with either <literal>ALSO</literal> or + <literal>INSTEAD</literal></term> + <listitem> + <para> + the query tree from the rule action with the original query + tree's qualification added + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Qualification given and <literal>ALSO</literal></term> + <listitem> + <para> + the query tree from the rule action with the rule + qualification and the original query tree's qualification + added + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Qualification given and <literal>INSTEAD</literal></term> + <listitem> + <para> + the query tree from the rule action with the rule + qualification and the original query tree's qualification; and + the original query tree with the negated rule qualification + added + </para> + </listitem> + </varlistentry> + </variablelist> + + Finally, if the rule is <literal>ALSO</literal>, the unchanged original query tree is + added to the list. Since only qualified <literal>INSTEAD</literal> rules already add the + original query tree, we end up with either one or two output query trees + for a rule with one action. +</para> + +<para> + For <literal>ON INSERT</literal> rules, the original query (if not suppressed by <literal>INSTEAD</literal>) + is done before any actions added by rules. This allows the actions to + see the inserted row(s). But for <literal>ON UPDATE</literal> and <literal>ON + DELETE</literal> rules, the original query is done after the actions added by rules. + This ensures that the actions can see the to-be-updated or to-be-deleted + rows; otherwise, the actions might do nothing because they find no rows + matching their qualifications. +</para> + +<para> + The query trees generated from rule actions are thrown into the + rewrite system again, and maybe more rules get applied resulting + in additional or fewer query trees. + So a rule's actions must have either a different + command type or a different result relation than the rule itself is + on, otherwise this recursive process will end up in an infinite loop. + (Recursive expansion of a rule will be detected and reported as an + error.) +</para> + +<para> + The query trees found in the actions of the + <structname>pg_rewrite</structname> system catalog are only + templates. Since they can reference the range-table entries for + <literal>NEW</literal> and <literal>OLD</literal>, some substitutions have to be made before they can be + used. For any reference to <literal>NEW</literal>, the target list of the original + query is searched for a corresponding entry. If found, that + entry's expression replaces the reference. Otherwise, <literal>NEW</literal> means the + same as <literal>OLD</literal> (for an <command>UPDATE</command>) or is replaced by + a null value (for an <command>INSERT</command>). Any reference to <literal>OLD</literal> is + replaced by a reference to the range-table entry that is the + result relation. +</para> + +<para> + After the system is done applying update rules, it applies view rules to the + produced query tree(s). Views cannot insert new update actions so + there is no need to apply update rules to the output of view rewriting. +</para> + +<sect3> +<title>A First Rule Step by Step</title> + +<para> + Say we want to trace changes to the <literal>sl_avail</literal> column in the + <literal>shoelace_data</literal> relation. So we set up a log table + and a rule that conditionally writes a log entry when an + <command>UPDATE</command> is performed on + <literal>shoelace_data</literal>. + +<programlisting> +CREATE TABLE shoelace_log ( + sl_name text, -- shoelace changed + sl_avail integer, -- new available value + log_who text, -- who did it + log_when timestamp -- when +); + +CREATE RULE log_shoelace AS ON UPDATE TO shoelace_data + WHERE NEW.sl_avail <> OLD.sl_avail + DO INSERT INTO shoelace_log VALUES ( + NEW.sl_name, + NEW.sl_avail, + current_user, + current_timestamp + ); +</programlisting> +</para> + +<para> + Now someone does: + +<programlisting> +UPDATE shoelace_data SET sl_avail = 6 WHERE sl_name = 'sl7'; +</programlisting> + + and we look at the log table: + +<programlisting> +SELECT * FROM shoelace_log; + + sl_name | sl_avail | log_who | log_when +---------+----------+---------+---------------------------------- + sl7 | 6 | Al | Tue Oct 20 16:14:45 1998 MET DST +(1 row) +</programlisting> + </para> + + <para> + That's what we expected. What happened in the background is the following. + The parser created the query tree: + +<programlisting> +UPDATE shoelace_data SET sl_avail = 6 + FROM shoelace_data shoelace_data + WHERE shoelace_data.sl_name = 'sl7'; +</programlisting> + + There is a rule <literal>log_shoelace</literal> that is <literal>ON UPDATE</literal> with the rule + qualification expression: + +<programlisting> +NEW.sl_avail <> OLD.sl_avail +</programlisting> + + and the action: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + new.sl_name, new.sl_avail, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old; +</programlisting> + + (This looks a little strange since you cannot normally write + <literal>INSERT ... VALUES ... FROM</literal>. The <literal>FROM</literal> + clause here is just to indicate that there are range-table entries + in the query tree for <literal>new</literal> and <literal>old</literal>. + These are needed so that they can be referenced by variables in + the <command>INSERT</command> command's query tree.) +</para> + +<para> + The rule is a qualified <literal>ALSO</literal> rule, so the rule system + has to return two query trees: the modified rule action and the original + query tree. In step 1, the range table of the original query is + incorporated into the rule's action query tree. This results in: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + new.sl_name, new.sl_avail, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old, + <emphasis>shoelace_data shoelace_data</emphasis>; +</programlisting> + + In step 2, the rule qualification is added to it, so the result set + is restricted to rows where <literal>sl_avail</literal> changes: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + new.sl_name, new.sl_avail, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old, + shoelace_data shoelace_data + <emphasis>WHERE new.sl_avail <> old.sl_avail</emphasis>; +</programlisting> + + (This looks even stranger, since <literal>INSERT ... VALUES</literal> doesn't have + a <literal>WHERE</literal> clause either, but the planner and executor will have no + difficulty with it. They need to support this same functionality + anyway for <literal>INSERT ... SELECT</literal>.) + </para> + + <para> + In step 3, the original query tree's qualification is added, + restricting the result set further to only the rows that would have been touched + by the original query: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + new.sl_name, new.sl_avail, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old, + shoelace_data shoelace_data + WHERE new.sl_avail <> old.sl_avail + <emphasis>AND shoelace_data.sl_name = 'sl7'</emphasis>; +</programlisting> + </para> + + <para> + Step 4 replaces references to <literal>NEW</literal> by the target list entries from the + original query tree or by the matching variable references + from the result relation: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + <emphasis>shoelace_data.sl_name</emphasis>, <emphasis>6</emphasis>, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old, + shoelace_data shoelace_data + WHERE <emphasis>6</emphasis> <> old.sl_avail + AND shoelace_data.sl_name = 'sl7'; +</programlisting> + + </para> + + <para> + Step 5 changes <literal>OLD</literal> references into result relation references: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + shoelace_data.sl_name, 6, + current_user, current_timestamp ) + FROM shoelace_data new, shoelace_data old, + shoelace_data shoelace_data + WHERE 6 <> <emphasis>shoelace_data.sl_avail</emphasis> + AND shoelace_data.sl_name = 'sl7'; +</programlisting> + </para> + + <para> + That's it. Since the rule is <literal>ALSO</literal>, we also output the + original query tree. In short, the output from the rule system + is a list of two query trees that correspond to these statements: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + shoelace_data.sl_name, 6, + current_user, current_timestamp ) + FROM shoelace_data + WHERE 6 <> shoelace_data.sl_avail + AND shoelace_data.sl_name = 'sl7'; + +UPDATE shoelace_data SET sl_avail = 6 + WHERE sl_name = 'sl7'; +</programlisting> + + These are executed in this order, and that is exactly what + the rule was meant to do. + </para> + + <para> + The substitutions and the added qualifications + ensure that, if the original query would be, say: + +<programlisting> +UPDATE shoelace_data SET sl_color = 'green' + WHERE sl_name = 'sl7'; +</programlisting> + + no log entry would get written. In that case, the original query + tree does not contain a target list entry for + <literal>sl_avail</literal>, so <literal>NEW.sl_avail</literal> will get + replaced by <literal>shoelace_data.sl_avail</literal>. Thus, the extra + command generated by the rule is: + +<programlisting> +INSERT INTO shoelace_log VALUES ( + shoelace_data.sl_name, <emphasis>shoelace_data.sl_avail</emphasis>, + current_user, current_timestamp ) + FROM shoelace_data + WHERE <emphasis>shoelace_data.sl_avail</emphasis> <> shoelace_data.sl_avail + AND shoelace_data.sl_name = 'sl7'; +</programlisting> + + and that qualification will never be true. + </para> + + <para> + It will also work if the original query modifies multiple rows. So + if someone issued the command: + +<programlisting> +UPDATE shoelace_data SET sl_avail = 0 + WHERE sl_color = 'black'; +</programlisting> + + four rows in fact get updated (<literal>sl1</literal>, <literal>sl2</literal>, <literal>sl3</literal>, and <literal>sl4</literal>). + But <literal>sl3</literal> already has <literal>sl_avail = 0</literal>. In this case, the original + query trees qualification is different and that results + in the extra query tree: + +<programlisting> +INSERT INTO shoelace_log +SELECT shoelace_data.sl_name, 0, + current_user, current_timestamp + FROM shoelace_data + WHERE 0 <> shoelace_data.sl_avail + AND <emphasis>shoelace_data.sl_color = 'black'</emphasis>; +</programlisting> + + being generated by the rule. This query tree will surely insert + three new log entries. And that's absolutely correct. +</para> + +<para> + Here we can see why it is important that the original query tree + is executed last. If the <command>UPDATE</command> had been + executed first, all the rows would have already been set to zero, so the + logging <command>INSERT</command> would not find any row where + <literal>0 <> shoelace_data.sl_avail</literal>. +</para> +</sect3> + +</sect2> + +<sect2 id="rules-update-views"> +<title>Cooperation with Views</title> + +<indexterm zone="rules-update-views"><primary>view</primary><secondary>updating</secondary></indexterm> + +<para> + A simple way to protect view relations from the mentioned + possibility that someone can try to run <command>INSERT</command>, + <command>UPDATE</command>, or <command>DELETE</command> on them is + to let those query trees get thrown away. So we could create the rules: + +<programlisting> +CREATE RULE shoe_ins_protect AS ON INSERT TO shoe + DO INSTEAD NOTHING; +CREATE RULE shoe_upd_protect AS ON UPDATE TO shoe + DO INSTEAD NOTHING; +CREATE RULE shoe_del_protect AS ON DELETE TO shoe + DO INSTEAD NOTHING; +</programlisting> + + If someone now tries to do any of these operations on the view + relation <literal>shoe</literal>, the rule system will + apply these rules. Since the rules have + no actions and are <literal>INSTEAD</literal>, the resulting list of + query trees will be empty and the whole query will become + nothing because there is nothing left to be optimized or + executed after the rule system is done with it. +</para> + +<para> + A more sophisticated way to use the rule system is to + create rules that rewrite the query tree into one that + does the right operation on the real tables. To do that + on the <literal>shoelace</literal> view, we create + the following rules: + +<programlisting> +CREATE RULE shoelace_ins AS ON INSERT TO shoelace + DO INSTEAD + INSERT INTO shoelace_data VALUES ( + NEW.sl_name, + NEW.sl_avail, + NEW.sl_color, + NEW.sl_len, + NEW.sl_unit + ); + +CREATE RULE shoelace_upd AS ON UPDATE TO shoelace + DO INSTEAD + UPDATE shoelace_data + SET sl_name = NEW.sl_name, + sl_avail = NEW.sl_avail, + sl_color = NEW.sl_color, + sl_len = NEW.sl_len, + sl_unit = NEW.sl_unit + WHERE sl_name = OLD.sl_name; + +CREATE RULE shoelace_del AS ON DELETE TO shoelace + DO INSTEAD + DELETE FROM shoelace_data + WHERE sl_name = OLD.sl_name; +</programlisting> + </para> + + <para> + If you want to support <literal>RETURNING</literal> queries on the view, + you need to make the rules include <literal>RETURNING</literal> clauses that + compute the view rows. This is usually pretty trivial for views on a + single table, but it's a bit tedious for join views such as + <literal>shoelace</literal>. An example for the insert case is: + +<programlisting> +CREATE RULE shoelace_ins AS ON INSERT TO shoelace + DO INSTEAD + INSERT INTO shoelace_data VALUES ( + NEW.sl_name, + NEW.sl_avail, + NEW.sl_color, + NEW.sl_len, + NEW.sl_unit + ) + RETURNING + shoelace_data.*, + (SELECT shoelace_data.sl_len * u.un_fact + FROM unit u WHERE shoelace_data.sl_unit = u.un_name); +</programlisting> + + Note that this one rule supports both <command>INSERT</command> and + <command>INSERT RETURNING</command> queries on the view — the + <literal>RETURNING</literal> clause is simply ignored for <command>INSERT</command>. + </para> + + <para> + Now assume that once in a while, a pack of shoelaces arrives at + the shop and a big parts list along with it. But you don't want + to manually update the <literal>shoelace</literal> view every + time. Instead we set up two little tables: one where you can + insert the items from the part list, and one with a special + trick. The creation commands for these are: + +<programlisting> +CREATE TABLE shoelace_arrive ( + arr_name text, + arr_quant integer +); + +CREATE TABLE shoelace_ok ( + ok_name text, + ok_quant integer +); + +CREATE RULE shoelace_ok_ins AS ON INSERT TO shoelace_ok + DO INSTEAD + UPDATE shoelace + SET sl_avail = sl_avail + NEW.ok_quant + WHERE sl_name = NEW.ok_name; +</programlisting> + + Now you can fill the table <literal>shoelace_arrive</literal> with + the data from the parts list: + +<programlisting> +SELECT * FROM shoelace_arrive; + + arr_name | arr_quant +----------+----------- + sl3 | 10 + sl6 | 20 + sl8 | 20 +(3 rows) +</programlisting> + + Take a quick look at the current data: + +<programlisting> +SELECT * FROM shoelace; + + sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm +----------+----------+----------+--------+---------+----------- + sl1 | 5 | black | 80 | cm | 80 + sl2 | 6 | black | 100 | cm | 100 + sl7 | 6 | brown | 60 | cm | 60 + sl3 | 0 | black | 35 | inch | 88.9 + sl4 | 8 | black | 40 | inch | 101.6 + sl8 | 1 | brown | 40 | inch | 101.6 + sl5 | 4 | brown | 1 | m | 100 + sl6 | 0 | brown | 0.9 | m | 90 +(8 rows) +</programlisting> + + Now move the arrived shoelaces in: + +<programlisting> +INSERT INTO shoelace_ok SELECT * FROM shoelace_arrive; +</programlisting> + + and check the results: + +<programlisting> +SELECT * FROM shoelace ORDER BY sl_name; + + sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm +----------+----------+----------+--------+---------+----------- + sl1 | 5 | black | 80 | cm | 80 + sl2 | 6 | black | 100 | cm | 100 + sl7 | 6 | brown | 60 | cm | 60 + sl4 | 8 | black | 40 | inch | 101.6 + sl3 | 10 | black | 35 | inch | 88.9 + sl8 | 21 | brown | 40 | inch | 101.6 + sl5 | 4 | brown | 1 | m | 100 + sl6 | 20 | brown | 0.9 | m | 90 +(8 rows) + +SELECT * FROM shoelace_log; + + sl_name | sl_avail | log_who| log_when +---------+----------+--------+---------------------------------- + sl7 | 6 | Al | Tue Oct 20 19:14:45 1998 MET DST + sl3 | 10 | Al | Tue Oct 20 19:25:16 1998 MET DST + sl6 | 20 | Al | Tue Oct 20 19:25:16 1998 MET DST + sl8 | 21 | Al | Tue Oct 20 19:25:16 1998 MET DST +(4 rows) +</programlisting> + </para> + + <para> + It's a long way from the one <literal>INSERT ... SELECT</literal> + to these results. And the description of the query-tree + transformation will be the last in this chapter. First, there is + the parser's output: + +<programlisting> +INSERT INTO shoelace_ok +SELECT shoelace_arrive.arr_name, shoelace_arrive.arr_quant + FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok; +</programlisting> + + Now the first rule <literal>shoelace_ok_ins</literal> is applied and turns this + into: + +<programlisting> +UPDATE shoelace + SET sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant + FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, + shoelace_ok old, shoelace_ok new, + shoelace shoelace + WHERE shoelace.sl_name = shoelace_arrive.arr_name; +</programlisting> + + and throws away the original <command>INSERT</command> on + <literal>shoelace_ok</literal>. This rewritten query is passed to + the rule system again, and the second applied rule + <literal>shoelace_upd</literal> produces: + +<programlisting> +UPDATE shoelace_data + SET sl_name = shoelace.sl_name, + sl_avail = shoelace.sl_avail + shoelace_arrive.arr_quant, + sl_color = shoelace.sl_color, + sl_len = shoelace.sl_len, + sl_unit = shoelace.sl_unit + FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, + shoelace_ok old, shoelace_ok new, + shoelace shoelace, shoelace old, + shoelace new, shoelace_data shoelace_data + WHERE shoelace.sl_name = shoelace_arrive.arr_name + AND shoelace_data.sl_name = shoelace.sl_name; +</programlisting> + + Again it's an <literal>INSTEAD</literal> rule and the previous query tree is trashed. + Note that this query still uses the view <literal>shoelace</literal>. + But the rule system isn't finished with this step, so it continues + and applies the <literal>_RETURN</literal> rule on it, and we get: + +<programlisting> +UPDATE shoelace_data + SET sl_name = s.sl_name, + sl_avail = s.sl_avail + shoelace_arrive.arr_quant, + sl_color = s.sl_color, + sl_len = s.sl_len, + sl_unit = s.sl_unit + FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, + shoelace_ok old, shoelace_ok new, + shoelace shoelace, shoelace old, + shoelace new, shoelace_data shoelace_data, + shoelace old, shoelace new, + shoelace_data s, unit u + WHERE s.sl_name = shoelace_arrive.arr_name + AND shoelace_data.sl_name = s.sl_name; +</programlisting> + + Finally, the rule <literal>log_shoelace</literal> gets applied, + producing the extra query tree: + +<programlisting> +INSERT INTO shoelace_log +SELECT s.sl_name, + s.sl_avail + shoelace_arrive.arr_quant, + current_user, + current_timestamp + FROM shoelace_arrive shoelace_arrive, shoelace_ok shoelace_ok, + shoelace_ok old, shoelace_ok new, + shoelace shoelace, shoelace old, + shoelace new, shoelace_data shoelace_data, + shoelace old, shoelace new, + shoelace_data s, unit u, + shoelace_data old, shoelace_data new + shoelace_log shoelace_log + WHERE s.sl_name = shoelace_arrive.arr_name + AND shoelace_data.sl_name = s.sl_name + AND (s.sl_avail + shoelace_arrive.arr_quant) <> s.sl_avail; +</programlisting> + + After that the rule system runs out of rules and returns the + generated query trees. + </para> + + <para> + So we end up with two final query trees that are equivalent to the + <acronym>SQL</acronym> statements: + +<programlisting> +INSERT INTO shoelace_log +SELECT s.sl_name, + s.sl_avail + shoelace_arrive.arr_quant, + current_user, + current_timestamp + FROM shoelace_arrive shoelace_arrive, shoelace_data shoelace_data, + shoelace_data s + WHERE s.sl_name = shoelace_arrive.arr_name + AND shoelace_data.sl_name = s.sl_name + AND s.sl_avail + shoelace_arrive.arr_quant <> s.sl_avail; + +UPDATE shoelace_data + SET sl_avail = shoelace_data.sl_avail + shoelace_arrive.arr_quant + FROM shoelace_arrive shoelace_arrive, + shoelace_data shoelace_data, + shoelace_data s + WHERE s.sl_name = shoelace_arrive.sl_name + AND shoelace_data.sl_name = s.sl_name; +</programlisting> + + The result is that data coming from one relation inserted into another, + changed into updates on a third, changed into updating + a fourth plus logging that final update in a fifth + gets reduced into two queries. +</para> + +<para> + There is a little detail that's a bit ugly. Looking at the two + queries, it turns out that the <literal>shoelace_data</literal> + relation appears twice in the range table where it could + definitely be reduced to one. The planner does not handle it and + so the execution plan for the rule systems output of the + <command>INSERT</command> will be + +<literallayout class="monospaced"> +Nested Loop + -> Merge Join + -> Seq Scan + -> Sort + -> Seq Scan on s + -> Seq Scan + -> Sort + -> Seq Scan on shoelace_arrive + -> Seq Scan on shoelace_data +</literallayout> + + while omitting the extra range table entry would result in a + +<literallayout class="monospaced"> +Merge Join + -> Seq Scan + -> Sort + -> Seq Scan on s + -> Seq Scan + -> Sort + -> Seq Scan on shoelace_arrive +</literallayout> + + which produces exactly the same entries in the log table. Thus, + the rule system caused one extra scan on the table + <literal>shoelace_data</literal> that is absolutely not + necessary. And the same redundant scan is done once more in the + <command>UPDATE</command>. But it was a really hard job to make + that all possible at all. +</para> + +<para> + Now we make a final demonstration of the + <productname>PostgreSQL</productname> rule system and its power. + Say you add some shoelaces with extraordinary colors to your + database: + +<programlisting> +INSERT INTO shoelace VALUES ('sl9', 0, 'pink', 35.0, 'inch', 0.0); +INSERT INTO shoelace VALUES ('sl10', 1000, 'magenta', 40.0, 'inch', 0.0); +</programlisting> + + We would like to make a view to check which + <literal>shoelace</literal> entries do not fit any shoe in color. + The view for this is: + +<programlisting> +CREATE VIEW shoelace_mismatch AS + SELECT * FROM shoelace WHERE NOT EXISTS + (SELECT shoename FROM shoe WHERE slcolor = sl_color); +</programlisting> + + Its output is: + +<programlisting> +SELECT * FROM shoelace_mismatch; + + sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm +---------+----------+----------+--------+---------+----------- + sl9 | 0 | pink | 35 | inch | 88.9 + sl10 | 1000 | magenta | 40 | inch | 101.6 +</programlisting> + </para> + + <para> + Now we want to set it up so that mismatching shoelaces that are + not in stock are deleted from the database. + To make it a little harder for <productname>PostgreSQL</productname>, + we don't delete it directly. Instead we create one more view: + +<programlisting> +CREATE VIEW shoelace_can_delete AS + SELECT * FROM shoelace_mismatch WHERE sl_avail = 0; +</programlisting> + + and do it this way: + +<programlisting> +DELETE FROM shoelace WHERE EXISTS + (SELECT * FROM shoelace_can_delete + WHERE sl_name = shoelace.sl_name); +</programlisting> + + <foreignphrase>Voilà</foreignphrase>: + +<programlisting> +SELECT * FROM shoelace; + + sl_name | sl_avail | sl_color | sl_len | sl_unit | sl_len_cm +---------+----------+----------+--------+---------+----------- + sl1 | 5 | black | 80 | cm | 80 + sl2 | 6 | black | 100 | cm | 100 + sl7 | 6 | brown | 60 | cm | 60 + sl4 | 8 | black | 40 | inch | 101.6 + sl3 | 10 | black | 35 | inch | 88.9 + sl8 | 21 | brown | 40 | inch | 101.6 + sl10 | 1000 | magenta | 40 | inch | 101.6 + sl5 | 4 | brown | 1 | m | 100 + sl6 | 20 | brown | 0.9 | m | 90 +(9 rows) +</programlisting> + </para> + + <para> + A <command>DELETE</command> on a view, with a subquery qualification that + in total uses 4 nesting/joined views, where one of them + itself has a subquery qualification containing a view + and where calculated view columns are used, + gets rewritten into + one single query tree that deletes the requested data + from a real table. +</para> + +<para> + There are probably only a few situations out in the real world + where such a construct is necessary. But it makes you feel + comfortable that it works. +</para> +</sect2> + +</sect1> + +<sect1 id="rules-privileges"> +<title>Rules and Privileges</title> + +<indexterm zone="rules-privileges"> + <primary>privilege</primary> + <secondary sortas="Regeln">with rules</secondary> +</indexterm> + +<indexterm zone="rules-privileges"> + <primary>privilege</primary> + <secondary sortas="Sichten">with views</secondary> +</indexterm> + +<para> + Due to rewriting of queries by the <productname>PostgreSQL</productname> + rule system, other tables/views than those used in the original + query get accessed. When update rules are used, this can include write access + to tables. +</para> + +<para> + Rewrite rules don't have a separate owner. The owner of + a relation (table or view) is automatically the owner of the + rewrite rules that are defined for it. + The <productname>PostgreSQL</productname> rule system changes the + behavior of the default access control system. Relations that + are used due to rules get checked against the + privileges of the rule owner, not the user invoking the rule. + This means that users only need the required privileges + for the tables/views that are explicitly named in their queries. +</para> + +<para> + For example: A user has a list of phone numbers where some of + them are private, the others are of interest for the assistant of the office. + The user can construct the following: + +<programlisting> +CREATE TABLE phone_data (person text, phone text, private boolean); +CREATE VIEW phone_number AS + SELECT person, CASE WHEN NOT private THEN phone END AS phone + FROM phone_data; +GRANT SELECT ON phone_number TO assistant; +</programlisting> + + Nobody except that user (and the database superusers) can access the + <literal>phone_data</literal> table. But because of the <command>GRANT</command>, + the assistant can run a <command>SELECT</command> on the + <literal>phone_number</literal> view. The rule system will rewrite the + <command>SELECT</command> from <literal>phone_number</literal> into a + <command>SELECT</command> from <literal>phone_data</literal>. + Since the user is the owner of + <literal>phone_number</literal> and therefore the owner of the rule, the + read access to <literal>phone_data</literal> is now checked against the user's + privileges and the query is permitted. The check for accessing + <literal>phone_number</literal> is also performed, but this is done + against the invoking user, so nobody but the user and the + assistant can use it. +</para> + +<para> + The privileges are checked rule by rule. So the assistant is for now the + only one who can see the public phone numbers. But the assistant can set up + another view and grant access to that to the public. Then, anyone + can see the <literal>phone_number</literal> data through the assistant's view. + What the assistant cannot do is to create a view that directly + accesses <literal>phone_data</literal>. (Actually the assistant can, but it will not work since + every access will be denied during the permission checks.) + And as soon as the user notices that the assistant opened + their <literal>phone_number</literal> view, the user can revoke the assistant's access. Immediately, any + access to the assistant's view would fail. +</para> + +<para> + One might think that this rule-by-rule checking is a security + hole, but in fact it isn't. But if it did not work this way, the assistant + could set up a table with the same columns as <literal>phone_number</literal> and + copy the data to there once per day. Then it's the assistant's own data and + the assistant can grant access to everyone they want. A + <command>GRANT</command> command means, <quote>I trust you</quote>. + If someone you trust does the thing above, it's time to + think it over and then use <command>REVOKE</command>. +</para> + +<para> + Note that while views can be used to hide the contents of certain + columns using the technique shown above, they cannot be used to reliably + conceal the data in unseen rows unless the + <literal>security_barrier</literal> flag has been set. For example, + the following view is insecure: +<programlisting> +CREATE VIEW phone_number AS + SELECT person, phone FROM phone_data WHERE phone NOT LIKE '412%'; +</programlisting> + This view might seem secure, since the rule system will rewrite any + <command>SELECT</command> from <literal>phone_number</literal> into a + <command>SELECT</command> from <literal>phone_data</literal> and add the + qualification that only entries where <literal>phone</literal> does not begin + with 412 are wanted. But if the user can create their own functions, + it is not difficult to convince the planner to execute the user-defined + function prior to the <function>NOT LIKE</function> expression. + For example: +<programlisting> +CREATE FUNCTION tricky(text, text) RETURNS bool AS $$ +BEGIN + RAISE NOTICE '% => %', $1, $2; + RETURN true; +END; +$$ LANGUAGE plpgsql COST 0.0000000000000000000001; + +SELECT * FROM phone_number WHERE tricky(person, phone); +</programlisting> + Every person and phone number in the <literal>phone_data</literal> table will be + printed as a <literal>NOTICE</literal>, because the planner will choose to + execute the inexpensive <function>tricky</function> function before the + more expensive <function>NOT LIKE</function>. Even if the user is + prevented from defining new functions, built-in functions can be used in + similar attacks. (For example, most casting functions include their + input values in the error messages they produce.) +</para> + +<para> + Similar considerations apply to update rules. In the examples of + the previous section, the owner of the tables in the example + database could grant the privileges <literal>SELECT</literal>, + <literal>INSERT</literal>, <literal>UPDATE</literal>, and <literal>DELETE</literal> on + the <literal>shoelace</literal> view to someone else, but only + <literal>SELECT</literal> on <literal>shoelace_log</literal>. The rule action to + write log entries will still be executed successfully, and that + other user could see the log entries. But they could not create fake + entries, nor could they manipulate or remove existing ones. In this + case, there is no possibility of subverting the rules by convincing + the planner to alter the order of operations, because the only rule + which references <literal>shoelace_log</literal> is an unqualified + <literal>INSERT</literal>. This might not be true in more complex scenarios. +</para> + +<para> + When it is necessary for a view to provide row level security, the + <literal>security_barrier</literal> attribute should be applied to + the view. This prevents maliciously-chosen functions and operators from + being passed values from rows until after the view has done its work. For + example, if the view shown above had been created like this, it would + be secure: +<programlisting> +CREATE VIEW phone_number WITH (security_barrier) AS + SELECT person, phone FROM phone_data WHERE phone NOT LIKE '412%'; +</programlisting> + Views created with the <literal>security_barrier</literal> may perform + far worse than views created without this option. In general, there is + no way to avoid this: the fastest possible plan must be rejected + if it may compromise security. For this reason, this option is not + enabled by default. +</para> + +<para> + The query planner has more flexibility when dealing with functions that + have no side effects. Such functions are referred to as <literal>LEAKPROOF</literal>, and + include many simple, commonly used operators, such as many equality + operators. The query planner can safely allow such functions to be evaluated + at any point in the query execution process, since invoking them on rows + invisible to the user will not leak any information about the unseen rows. + Further, functions which do not take arguments or which are not passed any + arguments from the security barrier view do not have to be marked as + <literal>LEAKPROOF</literal> to be pushed down, as they never receive data + from the view. In contrast, a function that might throw an error depending + on the values received as arguments (such as one that throws an error in the + event of overflow or division by zero) is not leak-proof, and could provide + significant information about the unseen rows if applied before the security + view's row filters. +</para> + +<para> + It is important to understand that even a view created with the + <literal>security_barrier</literal> option is intended to be secure only + in the limited sense that the contents of the invisible tuples will not be + passed to possibly-insecure functions. The user may well have other means + of making inferences about the unseen data; for example, they can see the + query plan using <command>EXPLAIN</command>, or measure the run time of + queries against the view. A malicious attacker might be able to infer + something about the amount of unseen data, or even gain some information + about the data distribution or most common values (since these things may + affect the run time of the plan; or even, since they are also reflected in + the optimizer statistics, the choice of plan). If these types of "covert + channel" attacks are of concern, it is probably unwise to grant any access + to the data at all. +</para> +</sect1> + +<sect1 id="rules-status"> +<title>Rules and Command Status</title> + +<para> + The <productname>PostgreSQL</productname> server returns a command + status string, such as <literal>INSERT 149592 1</literal>, for each + command it receives. This is simple enough when there are no rules + involved, but what happens when the query is rewritten by rules? +</para> + +<para> + Rules affect the command status as follows: + + <itemizedlist> + <listitem> + <para> + If there is no unconditional <literal>INSTEAD</literal> rule for the query, then + the originally given query will be executed, and its command + status will be returned as usual. (But note that if there were + any conditional <literal>INSTEAD</literal> rules, the negation of their qualifications + will have been added to the original query. This might reduce the + number of rows it processes, and if so the reported status will + be affected.) + </para> + </listitem> + + <listitem> + <para> + If there is any unconditional <literal>INSTEAD</literal> rule for the query, then + the original query will not be executed at all. In this case, + the server will return the command status for the last query + that was inserted by an <literal>INSTEAD</literal> rule (conditional or + unconditional) and is of the same command type + (<command>INSERT</command>, <command>UPDATE</command>, or + <command>DELETE</command>) as the original query. If no query + meeting those requirements is added by any rule, then the + returned command status shows the original query type and + zeroes for the row-count and OID fields. + </para> + </listitem> + </itemizedlist> +</para> + +<para> + The programmer can ensure that any desired <literal>INSTEAD</literal> rule is the one + that sets the command status in the second case, by giving it the + alphabetically last rule name among the active rules, so that it + gets applied last. +</para> +</sect1> + +<sect1 id="rules-triggers"> +<title>Rules Versus Triggers</title> + +<indexterm zone="rules-triggers"> + <primary>rule</primary> + <secondary sortas="Trigger">compared with triggers</secondary> +</indexterm> + +<indexterm zone="rules-triggers"> + <primary>trigger</primary> + <secondary sortas="Regeln">compared with rules</secondary> +</indexterm> + +<para> + Many things that can be done using triggers can also be + implemented using the <productname>PostgreSQL</productname> + rule system. One of the things that cannot be implemented by + rules are some kinds of constraints, especially foreign keys. It is possible + to place a qualified rule that rewrites a command to <literal>NOTHING</literal> + if the value of a column does not appear in another table. + But then the data is silently thrown away and that's + not a good idea. If checks for valid values are required, + and in the case of an invalid value an error message should + be generated, it must be done by a trigger. +</para> + +<para> + In this chapter, we focused on using rules to update views. All of + the update rule examples in this chapter can also be implemented + using <literal>INSTEAD OF</literal> triggers on the views. Writing such + triggers is often easier than writing rules, particularly if complex + logic is required to perform the update. +</para> + +<para> + For the things that can be implemented by both, which is best + depends on the usage of the database. + A trigger is fired once for each affected row. A rule modifies + the query or generates an additional query. So if many + rows are affected in one statement, a rule issuing one extra + command is likely to be faster than a trigger that is + called for every single row and must re-determine what to do + many times. However, the trigger approach is conceptually far + simpler than the rule approach, and is easier for novices to get right. +</para> + +<para> + Here we show an example of how the choice of rules versus triggers + plays out in one situation. There are two tables: + +<programlisting> +CREATE TABLE computer ( + hostname text, -- indexed + manufacturer text -- indexed +); + +CREATE TABLE software ( + software text, -- indexed + hostname text -- indexed +); +</programlisting> + + Both tables have many thousands of rows and the indexes on + <structfield>hostname</structfield> are unique. The rule or trigger should + implement a constraint that deletes rows from <literal>software</literal> + that reference a deleted computer. The trigger would use this command: + +<programlisting> +DELETE FROM software WHERE hostname = $1; +</programlisting> + + Since the trigger is called for each individual row deleted from + <literal>computer</literal>, it can prepare and save the plan for this + command and pass the <structfield>hostname</structfield> value in the + parameter. The rule would be written as: + +<programlisting> +CREATE RULE computer_del AS ON DELETE TO computer + DO DELETE FROM software WHERE hostname = OLD.hostname; +</programlisting> + </para> + + <para> + Now we look at different types of deletes. In the case of a: + +<programlisting> +DELETE FROM computer WHERE hostname = 'mypc.local.net'; +</programlisting> + + the table <literal>computer</literal> is scanned by index (fast), and the + command issued by the trigger would also use an index scan (also fast). + The extra command from the rule would be: + +<programlisting> +DELETE FROM software WHERE computer.hostname = 'mypc.local.net' + AND software.hostname = computer.hostname; +</programlisting> + + Since there are appropriate indexes set up, the planner + will create a plan of + +<literallayout class="monospaced"> +Nestloop + -> Index Scan using comp_hostidx on computer + -> Index Scan using soft_hostidx on software +</literallayout> + + So there would be not that much difference in speed between + the trigger and the rule implementation. + </para> + + <para> + With the next delete we want to get rid of all the 2000 computers + where the <structfield>hostname</structfield> starts with + <literal>old</literal>. There are two possible commands to do that. One + is: + +<programlisting> +DELETE FROM computer WHERE hostname >= 'old' + AND hostname < 'ole' +</programlisting> + + The command added by the rule will be: + +<programlisting> +DELETE FROM software WHERE computer.hostname >= 'old' AND computer.hostname < 'ole' + AND software.hostname = computer.hostname; +</programlisting> + + with the plan + +<literallayout class="monospaced"> +Hash Join + -> Seq Scan on software + -> Hash + -> Index Scan using comp_hostidx on computer +</literallayout> + + The other possible command is: + +<programlisting> +DELETE FROM computer WHERE hostname ~ '^old'; +</programlisting> + + which results in the following executing plan for the command + added by the rule: + +<literallayout class="monospaced"> +Nestloop + -> Index Scan using comp_hostidx on computer + -> Index Scan using soft_hostidx on software +</literallayout> + + This shows, that the planner does not realize that the + qualification for <structfield>hostname</structfield> in + <literal>computer</literal> could also be used for an index scan on + <literal>software</literal> when there are multiple qualification + expressions combined with <literal>AND</literal>, which is what it does + in the regular-expression version of the command. The trigger will + get invoked once for each of the 2000 old computers that have to be + deleted, and that will result in one index scan over + <literal>computer</literal> and 2000 index scans over + <literal>software</literal>. The rule implementation will do it with two + commands that use indexes. And it depends on the overall size of + the table <literal>software</literal> whether the rule will still be faster in the + sequential scan situation. 2000 command executions from the trigger over the SPI + manager take some time, even if all the index blocks will soon be in the cache. +</para> + +<para> + The last command we look at is: + +<programlisting> +DELETE FROM computer WHERE manufacturer = 'bim'; +</programlisting> + + Again this could result in many rows to be deleted from + <literal>computer</literal>. So the trigger will again run many commands + through the executor. The command generated by the rule will be: + +<programlisting> +DELETE FROM software WHERE computer.manufacturer = 'bim' + AND software.hostname = computer.hostname; +</programlisting> + + The plan for that command will again be the nested loop over two + index scans, only using a different index on <literal>computer</literal>: + +<programlisting> +Nestloop + -> Index Scan using comp_manufidx on computer + -> Index Scan using soft_hostidx on software +</programlisting> + + In any of these cases, the extra commands from the rule system + will be more or less independent from the number of affected rows + in a command. +</para> + +<para> + The summary is, rules will only be significantly slower than + triggers if their actions result in large and badly qualified + joins, a situation where the planner fails. +</para> +</sect1> + +</chapter> |