diff options
Diffstat (limited to 'doc/src/sgml/advanced.sgml')
-rw-r--r-- | doc/src/sgml/advanced.sgml | 720 |
1 files changed, 720 insertions, 0 deletions
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml new file mode 100644 index 0000000..71ae423 --- /dev/null +++ b/doc/src/sgml/advanced.sgml @@ -0,0 +1,720 @@ +<!-- doc/src/sgml/advanced.sgml --> + + <chapter id="tutorial-advanced"> + <title>Advanced Features</title> + + <sect1 id="tutorial-advanced-intro"> + <title>Introduction</title> + + <para> + In the previous chapter we have covered the basics of using + <acronym>SQL</acronym> to store and access your data in + <productname>PostgreSQL</productname>. We will now discuss some + more advanced features of <acronym>SQL</acronym> that simplify + management and prevent loss or corruption of your data. Finally, + we will look at some <productname>PostgreSQL</productname> + extensions. + </para> + + <para> + This chapter will on occasion refer to examples found in <xref + linkend="tutorial-sql"/> to change or improve them, so it will be + useful to have read that chapter. Some examples from + this chapter can also be found in + <filename>advanced.sql</filename> in the tutorial directory. This + file also contains some sample data to load, which is not + repeated here. (Refer to <xref linkend="tutorial-sql-intro"/> for + how to use the file.) + </para> + </sect1> + + + <sect1 id="tutorial-views"> + <title>Views</title> + + <indexterm zone="tutorial-views"> + <primary>view</primary> + </indexterm> + + <para> + Refer back to the queries in <xref linkend="tutorial-join"/>. + Suppose the combined listing of weather records and city location + is of particular interest to your application, but you do not want + to type the query each time you need it. You can create a + <firstterm>view</firstterm> over the query, which gives a name to + the query that you can refer to like an ordinary table: + +<programlisting> +CREATE VIEW myview AS + SELECT name, temp_lo, temp_hi, prcp, date, location + FROM weather, cities + WHERE city = name; + +SELECT * FROM myview; +</programlisting> + </para> + + <para> + Making liberal use of views is a key aspect of good SQL database + design. Views allow you to encapsulate the details of the + structure of your tables, which might change as your application + evolves, behind consistent interfaces. + </para> + + <para> + Views can be used in almost any place a real table can be used. + Building views upon other views is not uncommon. + </para> + </sect1> + + + <sect1 id="tutorial-fk"> + <title>Foreign Keys</title> + + <indexterm zone="tutorial-fk"> + <primary>foreign key</primary> + </indexterm> + + <indexterm zone="tutorial-fk"> + <primary>referential integrity</primary> + </indexterm> + + <para> + Recall the <classname>weather</classname> and + <classname>cities</classname> tables from <xref + linkend="tutorial-sql"/>. Consider the following problem: You + want to make sure that no one can insert rows in the + <classname>weather</classname> table that do not have a matching + entry in the <classname>cities</classname> table. This is called + maintaining the <firstterm>referential integrity</firstterm> of + your data. In simplistic database systems this would be + implemented (if at all) by first looking at the + <classname>cities</classname> table to check if a matching record + exists, and then inserting or rejecting the new + <classname>weather</classname> records. This approach has a + number of problems and is very inconvenient, so + <productname>PostgreSQL</productname> can do this for you. + </para> + + <para> + The new declaration of the tables would look like this: + +<programlisting> +CREATE TABLE cities ( + name varchar(80) primary key, + location point +); + +CREATE TABLE weather ( + city varchar(80) references cities(name), + temp_lo int, + temp_hi int, + prcp real, + date date +); +</programlisting> + + Now try inserting an invalid record: + +<programlisting> +INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28'); +</programlisting> + +<screen> +ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey" +DETAIL: Key (city)=(Berkeley) is not present in table "cities". +</screen> + </para> + + <para> + The behavior of foreign keys can be finely tuned to your + application. We will not go beyond this simple example in this + tutorial, but just refer you to <xref linkend="ddl"/> + for more information. Making correct use of + foreign keys will definitely improve the quality of your database + applications, so you are strongly encouraged to learn about them. + </para> + </sect1> + + + <sect1 id="tutorial-transactions"> + <title>Transactions</title> + + <indexterm zone="tutorial-transactions"> + <primary>transaction</primary> + </indexterm> + + <para> + <firstterm>Transactions</firstterm> are a fundamental concept of all database + systems. The essential point of a transaction is that it bundles + multiple steps into a single, all-or-nothing operation. The intermediate + states between the steps are not visible to other concurrent transactions, + and if some failure occurs that prevents the transaction from completing, + then none of the steps affect the database at all. + </para> + + <para> + For example, consider a bank database that contains balances for various + customer accounts, as well as total deposit balances for branches. + Suppose that we want to record a payment of $100.00 from Alice's account + to Bob's account. Simplifying outrageously, the SQL commands for this + might look like: + +<programlisting> +UPDATE accounts SET balance = balance - 100.00 + WHERE name = 'Alice'; +UPDATE branches SET balance = balance - 100.00 + WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice'); +UPDATE accounts SET balance = balance + 100.00 + WHERE name = 'Bob'; +UPDATE branches SET balance = balance + 100.00 + WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob'); +</programlisting> + </para> + + <para> + The details of these commands are not important here; the important + point is that there are several separate updates involved to accomplish + this rather simple operation. Our bank's officers will want to be + assured that either all these updates happen, or none of them happen. + It would certainly not do for a system failure to result in Bob + receiving $100.00 that was not debited from Alice. Nor would Alice long + remain a happy customer if she was debited without Bob being credited. + We need a guarantee that if something goes wrong partway through the + operation, none of the steps executed so far will take effect. Grouping + the updates into a <firstterm>transaction</firstterm> gives us this guarantee. + A transaction is said to be <firstterm>atomic</firstterm>: from the point of + view of other transactions, it either happens completely or not at all. + </para> + + <para> + We also want a + guarantee that once a transaction is completed and acknowledged by + the database system, it has indeed been permanently recorded + and won't be lost even if a crash ensues shortly thereafter. + For example, if we are recording a cash withdrawal by Bob, + we do not want any chance that the debit to his account will + disappear in a crash just after he walks out the bank door. + A transactional database guarantees that all the updates made by + a transaction are logged in permanent storage (i.e., on disk) before + the transaction is reported complete. + </para> + + <para> + Another important property of transactional databases is closely + related to the notion of atomic updates: when multiple transactions + are running concurrently, each one should not be able to see the + incomplete changes made by others. For example, if one transaction + is busy totalling all the branch balances, it would not do for it + to include the debit from Alice's branch but not the credit to + Bob's branch, nor vice versa. So transactions must be all-or-nothing + not only in terms of their permanent effect on the database, but + also in terms of their visibility as they happen. The updates made + so far by an open transaction are invisible to other transactions + until the transaction completes, whereupon all the updates become + visible simultaneously. + </para> + + <para> + In <productname>PostgreSQL</productname>, a transaction is set up by surrounding + the SQL commands of the transaction with + <command>BEGIN</command> and <command>COMMIT</command> commands. So our banking + transaction would actually look like: + +<programlisting> +BEGIN; +UPDATE accounts SET balance = balance - 100.00 + WHERE name = 'Alice'; +-- etc etc +COMMIT; +</programlisting> + </para> + + <para> + If, partway through the transaction, we decide we do not want to + commit (perhaps we just noticed that Alice's balance went negative), + we can issue the command <command>ROLLBACK</command> instead of + <command>COMMIT</command>, and all our updates so far will be canceled. + </para> + + <para> + <productname>PostgreSQL</productname> actually treats every SQL statement as being + executed within a transaction. If you do not issue a <command>BEGIN</command> + command, + then each individual statement has an implicit <command>BEGIN</command> and + (if successful) <command>COMMIT</command> wrapped around it. A group of + statements surrounded by <command>BEGIN</command> and <command>COMMIT</command> + is sometimes called a <firstterm>transaction block</firstterm>. + </para> + + <note> + <para> + Some client libraries issue <command>BEGIN</command> and <command>COMMIT</command> + commands automatically, so that you might get the effect of transaction + blocks without asking. Check the documentation for the interface + you are using. + </para> + </note> + + <para> + It's possible to control the statements in a transaction in a more + granular fashion through the use of <firstterm>savepoints</firstterm>. Savepoints + allow you to selectively discard parts of the transaction, while + committing the rest. After defining a savepoint with + <command>SAVEPOINT</command>, you can if needed roll back to the savepoint + with <command>ROLLBACK TO</command>. All the transaction's database changes + between defining the savepoint and rolling back to it are discarded, but + changes earlier than the savepoint are kept. + </para> + + <para> + After rolling back to a savepoint, it continues to be defined, so you can + roll back to it several times. Conversely, if you are sure you won't need + to roll back to a particular savepoint again, it can be released, so the + system can free some resources. Keep in mind that either releasing or + rolling back to a savepoint + will automatically release all savepoints that were defined after it. + </para> + + <para> + All this is happening within the transaction block, so none of it + is visible to other database sessions. When and if you commit the + transaction block, the committed actions become visible as a unit + to other sessions, while the rolled-back actions never become visible + at all. + </para> + + <para> + Remembering the bank database, suppose we debit $100.00 from Alice's + account, and credit Bob's account, only to find later that we should + have credited Wally's account. We could do it using savepoints like + this: + +<programlisting> +BEGIN; +UPDATE accounts SET balance = balance - 100.00 + WHERE name = 'Alice'; +SAVEPOINT my_savepoint; +UPDATE accounts SET balance = balance + 100.00 + WHERE name = 'Bob'; +-- oops ... forget that and use Wally's account +ROLLBACK TO my_savepoint; +UPDATE accounts SET balance = balance + 100.00 + WHERE name = 'Wally'; +COMMIT; +</programlisting> + </para> + + <para> + This example is, of course, oversimplified, but there's a lot of control + possible in a transaction block through the use of savepoints. + Moreover, <command>ROLLBACK TO</command> is the only way to regain control of a + transaction block that was put in aborted state by the + system due to an error, short of rolling it back completely and starting + again. + </para> + + </sect1> + + + <sect1 id="tutorial-window"> + <title>Window Functions</title> + + <indexterm zone="tutorial-window"> + <primary>window function</primary> + </indexterm> + + <para> + A <firstterm>window function</firstterm> performs a calculation across a set of + table rows that are somehow related to the current row. This is comparable + to the type of calculation that can be done with an aggregate function. + However, window functions do not cause rows to become grouped into a single + output row like non-window aggregate calls would. Instead, the + rows retain their separate identities. Behind the scenes, the window + function is able to access more than just the current row of the query + result. + </para> + + <para> + Here is an example that shows how to compare each employee's salary + with the average salary in his or her department: + +<programlisting> +SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary; +</programlisting> + +<screen> + depname | empno | salary | avg +-----------+-------+--------+----------------------- + develop | 11 | 5200 | 5020.0000000000000000 + develop | 7 | 4200 | 5020.0000000000000000 + develop | 9 | 4500 | 5020.0000000000000000 + develop | 8 | 6000 | 5020.0000000000000000 + develop | 10 | 5200 | 5020.0000000000000000 + personnel | 5 | 3500 | 3700.0000000000000000 + personnel | 2 | 3900 | 3700.0000000000000000 + sales | 3 | 4800 | 4866.6666666666666667 + sales | 1 | 5000 | 4866.6666666666666667 + sales | 4 | 4800 | 4866.6666666666666667 +(10 rows) +</screen> + + The first three output columns come directly from the table + <structname>empsalary</structname>, and there is one output row for each row in the + table. The fourth column represents an average taken across all the table + rows that have the same <structfield>depname</structfield> value as the current row. + (This actually is the same function as the non-window <function>avg</function> + aggregate, but the <literal>OVER</literal> clause causes it to be + treated as a window function and computed across the window frame.) + </para> + + <para> + A window function call always contains an <literal>OVER</literal> clause + directly following the window function's name and argument(s). This is what + syntactically distinguishes it from a normal function or non-window + aggregate. The <literal>OVER</literal> clause determines exactly how the + rows of the query are split up for processing by the window function. + The <literal>PARTITION BY</literal> clause within <literal>OVER</literal> + divides the rows into groups, or partitions, that share the same + values of the <literal>PARTITION BY</literal> expression(s). For each row, + the window function is computed across the rows that fall into the + same partition as the current row. + </para> + + <para> + You can also control the order in which rows are processed by + window functions using <literal>ORDER BY</literal> within <literal>OVER</literal>. + (The window <literal>ORDER BY</literal> does not even have to match the + order in which the rows are output.) Here is an example: + +<programlisting> +SELECT depname, empno, salary, + rank() OVER (PARTITION BY depname ORDER BY salary DESC) +FROM empsalary; +</programlisting> + +<screen> + depname | empno | salary | rank +-----------+-------+--------+------ + develop | 8 | 6000 | 1 + develop | 10 | 5200 | 2 + develop | 11 | 5200 | 2 + develop | 9 | 4500 | 4 + develop | 7 | 4200 | 5 + personnel | 2 | 3900 | 1 + personnel | 5 | 3500 | 2 + sales | 1 | 5000 | 1 + sales | 4 | 4800 | 2 + sales | 3 | 4800 | 2 +(10 rows) +</screen> + + As shown here, the <function>rank</function> function produces a numerical rank + for each distinct <literal>ORDER BY</literal> value in the current row's + partition, using the order defined by the <literal>ORDER BY</literal> clause. + <function>rank</function> needs no explicit parameter, because its behavior + is entirely determined by the <literal>OVER</literal> clause. + </para> + + <para> + The rows considered by a window function are those of the <quote>virtual + table</quote> produced by the query's <literal>FROM</literal> clause as filtered by its + <literal>WHERE</literal>, <literal>GROUP BY</literal>, and <literal>HAVING</literal> clauses + if any. For example, a row removed because it does not meet the + <literal>WHERE</literal> condition is not seen by any window function. + A query can contain multiple window functions that slice up the data + in different ways using different <literal>OVER</literal> clauses, but + they all act on the same collection of rows defined by this virtual table. + </para> + + <para> + We already saw that <literal>ORDER BY</literal> can be omitted if the ordering + of rows is not important. It is also possible to omit <literal>PARTITION + BY</literal>, in which case there is a single partition containing all rows. + </para> + + <para> + There is another important concept associated with window functions: + for each row, there is a set of rows within its partition called its + <firstterm>window frame</firstterm>. Some window functions act only + on the rows of the window frame, rather than of the whole partition. + By default, if <literal>ORDER BY</literal> is supplied then the frame consists of + all rows from the start of the partition up through the current row, plus + any following rows that are equal to the current row according to the + <literal>ORDER BY</literal> clause. When <literal>ORDER BY</literal> is omitted the + default frame consists of all rows in the partition. + <footnote> + <para> + There are options to define the window frame in other ways, but + this tutorial does not cover them. See + <xref linkend="syntax-window-functions"/> for details. + </para> + </footnote> + Here is an example using <function>sum</function>: + </para> + +<programlisting> +SELECT salary, sum(salary) OVER () FROM empsalary; +</programlisting> + +<screen> + salary | sum +--------+------- + 5200 | 47100 + 5000 | 47100 + 3500 | 47100 + 4800 | 47100 + 3900 | 47100 + 4200 | 47100 + 4500 | 47100 + 4800 | 47100 + 6000 | 47100 + 5200 | 47100 +(10 rows) +</screen> + + <para> + Above, since there is no <literal>ORDER BY</literal> in the <literal>OVER</literal> + clause, the window frame is the same as the partition, which for lack of + <literal>PARTITION BY</literal> is the whole table; in other words each sum is + taken over the whole table and so we get the same result for each output + row. But if we add an <literal>ORDER BY</literal> clause, we get very different + results: + </para> + +<programlisting> +SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; +</programlisting> + +<screen> + salary | sum +--------+------- + 3500 | 3500 + 3900 | 7400 + 4200 | 11600 + 4500 | 16100 + 4800 | 25700 + 4800 | 25700 + 5000 | 30700 + 5200 | 41100 + 5200 | 41100 + 6000 | 47100 +(10 rows) +</screen> + + <para> + Here the sum is taken from the first (lowest) salary up through the + current one, including any duplicates of the current one (notice the + results for the duplicated salaries). + </para> + + <para> + Window functions are permitted only in the <literal>SELECT</literal> list + and the <literal>ORDER BY</literal> clause of the query. They are forbidden + elsewhere, such as in <literal>GROUP BY</literal>, <literal>HAVING</literal> + and <literal>WHERE</literal> clauses. This is because they logically + execute after the processing of those clauses. Also, window functions + execute after non-window aggregate functions. This means it is valid to + include an aggregate function call in the arguments of a window function, + but not vice versa. + </para> + + <para> + If there is a need to filter or group rows after the window calculations + are performed, you can use a sub-select. For example: + +<programlisting> +SELECT depname, empno, salary, enroll_date +FROM + (SELECT depname, empno, salary, enroll_date, + rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos + FROM empsalary + ) AS ss +WHERE pos < 3; +</programlisting> + + The above query only shows the rows from the inner query having + <literal>rank</literal> less than 3. + </para> + + <para> + When a query involves multiple window functions, it is possible to write + out each one with a separate <literal>OVER</literal> clause, but this is + duplicative and error-prone if the same windowing behavior is wanted + for several functions. Instead, each windowing behavior can be named + in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>. + For example: + +<programlisting> +SELECT sum(salary) OVER w, avg(salary) OVER w + FROM empsalary + WINDOW w AS (PARTITION BY depname ORDER BY salary DESC); +</programlisting> + </para> + + <para> + More details about window functions can be found in + <xref linkend="syntax-window-functions"/>, + <xref linkend="functions-window"/>, + <xref linkend="queries-window"/>, and the + <xref linkend="sql-select"/> reference page. + </para> + </sect1> + + + <sect1 id="tutorial-inheritance"> + <title>Inheritance</title> + + <indexterm zone="tutorial-inheritance"> + <primary>inheritance</primary> + </indexterm> + + <para> + Inheritance is a concept from object-oriented databases. It opens + up interesting new possibilities of database design. + </para> + + <para> + Let's create two tables: A table <classname>cities</classname> + and a table <classname>capitals</classname>. Naturally, capitals + are also cities, so you want some way to show the capitals + implicitly when you list all cities. If you're really clever you + might invent some scheme like this: + +<programlisting> +CREATE TABLE capitals ( + name text, + population real, + elevation int, -- (in ft) + state char(2) +); + +CREATE TABLE non_capitals ( + name text, + population real, + elevation int -- (in ft) +); + +CREATE VIEW cities AS + SELECT name, population, elevation FROM capitals + UNION + SELECT name, population, elevation FROM non_capitals; +</programlisting> + + This works OK as far as querying goes, but it gets ugly when you + need to update several rows, for one thing. + </para> + + <para> + A better solution is this: + +<programlisting> +CREATE TABLE cities ( + name text, + population real, + elevation int -- (in ft) +); + +CREATE TABLE capitals ( + state char(2) UNIQUE NOT NULL +) INHERITS (cities); +</programlisting> + </para> + + <para> + In this case, a row of <classname>capitals</classname> + <firstterm>inherits</firstterm> all columns (<structfield>name</structfield>, + <structfield>population</structfield>, and <structfield>elevation</structfield>) from its + <firstterm>parent</firstterm>, <classname>cities</classname>. The + type of the column <structfield>name</structfield> is + <type>text</type>, a native <productname>PostgreSQL</productname> + type for variable length character strings. The + <classname>capitals</classname> table has + an additional column, <structfield>state</structfield>, which shows its + state abbreviation. In + <productname>PostgreSQL</productname>, a table can inherit from + zero or more other tables. + </para> + + <para> + For example, the following query finds the names of all cities, + including state capitals, that are located at an elevation + over 500 feet: + +<programlisting> +SELECT name, elevation + FROM cities + WHERE elevation > 500; +</programlisting> + + which returns: + +<screen> + name | elevation +-----------+----------- + Las Vegas | 2174 + Mariposa | 1953 + Madison | 845 +(3 rows) +</screen> + </para> + + <para> + On the other hand, the following query finds + all the cities that are not state capitals and + are situated at an elevation over 500 feet: + +<programlisting> +SELECT name, elevation + FROM ONLY cities + WHERE elevation > 500; +</programlisting> + +<screen> + name | elevation +-----------+----------- + Las Vegas | 2174 + Mariposa | 1953 +(2 rows) +</screen> + </para> + + <para> + Here the <literal>ONLY</literal> before <literal>cities</literal> + indicates that the query should be run over only the + <classname>cities</classname> table, and not tables below + <classname>cities</classname> in the inheritance hierarchy. Many + of the commands that we have already discussed — + <command>SELECT</command>, <command>UPDATE</command>, and + <command>DELETE</command> — support this <literal>ONLY</literal> + notation. + </para> + + <note> + <para> + Although inheritance is frequently useful, it has not been integrated + with unique constraints or foreign keys, which limits its usefulness. + See <xref linkend="ddl-inherit"/> for more detail. + </para> + </note> + </sect1> + + + <sect1 id="tutorial-conclusion"> + <title>Conclusion</title> + + <para> + <productname>PostgreSQL</productname> has many features not + touched upon in this tutorial introduction, which has been + oriented toward newer users of <acronym>SQL</acronym>. These + features are discussed in more detail in the remainder of this + book. + </para> + + <para> + If you feel you need more introductory material, please visit the PostgreSQL + <ulink url="https://www.postgresql.org">web site</ulink> + for links to more resources. + </para> + </sect1> + </chapter> |