diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-13 13:44:03 +0000 |
commit | 293913568e6a7a86fd1479e1cff8e2ecb58d6568 (patch) | |
tree | fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/query.sgml | |
parent | Initial commit. (diff) | |
download | postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.tar.xz postgresql-16-293913568e6a7a86fd1479e1cff8e2ecb58d6568.zip |
Adding upstream version 16.2.upstream/16.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/query.sgml')
-rw-r--r-- | doc/src/sgml/query.sgml | 910 |
1 files changed, 910 insertions, 0 deletions
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml new file mode 100644 index 0000000..59962d6 --- /dev/null +++ b/doc/src/sgml/query.sgml @@ -0,0 +1,910 @@ +<!-- doc/src/sgml/query.sgml --> + + <chapter id="tutorial-sql"> + <title>The <acronym>SQL</acronym> Language</title> + + <sect1 id="tutorial-sql-intro"> + <title>Introduction</title> + + <para> + This chapter provides an overview of how to use + <acronym>SQL</acronym> to perform simple operations. This + tutorial is only intended to give you an introduction and is in no + way a complete tutorial on <acronym>SQL</acronym>. Numerous books + have been written on <acronym>SQL</acronym>, including <xref + linkend="melt93"/> and <xref linkend="date97"/>. + You should be aware that some <productname>PostgreSQL</productname> + language features are extensions to the standard. + </para> + + <para> + In the examples that follow, we assume that you have created a + database named <literal>mydb</literal>, as described in the previous + chapter, and have been able to start <application>psql</application>. + </para> + + <para> + Examples in this manual can also be found in the + <productname>PostgreSQL</productname> source distribution + in the directory <filename>src/tutorial/</filename>. (Binary + distributions of <productname>PostgreSQL</productname> might not + provide those files.) To use those + files, first change to that directory and run <application>make</application>: + +<screen> +<prompt>$</prompt> <userinput>cd <replaceable>...</replaceable>/src/tutorial</userinput> +<prompt>$</prompt> <userinput>make</userinput> +</screen> + + This creates the scripts and compiles the C files containing user-defined + functions and types. Then, to start the tutorial, do the following: + +<screen> +<prompt>$</prompt> <userinput>psql -s mydb</userinput> +<computeroutput> +... +</computeroutput> +<prompt>mydb=></prompt> <userinput>\i basics.sql</userinput> +</screen> + + The <literal>\i</literal> command reads in commands from the + specified file. <command>psql</command>'s <literal>-s</literal> option puts you in + single step mode which pauses before sending each statement to the + server. The commands used in this section are in the file + <filename>basics.sql</filename>. + </para> + </sect1> + + + <sect1 id="tutorial-concepts"> + <title>Concepts</title> + + <para> + <indexterm><primary>relational database</primary></indexterm> + <indexterm><primary>hierarchical database</primary></indexterm> + <indexterm><primary>object-oriented database</primary></indexterm> + <indexterm><primary>relation</primary></indexterm> + <indexterm><primary>table</primary></indexterm> + + <productname>PostgreSQL</productname> is a <firstterm>relational + database management system</firstterm> (<acronym>RDBMS</acronym>). + That means it is a system for managing data stored in + <firstterm>relations</firstterm>. Relation is essentially a + mathematical term for <firstterm>table</firstterm>. The notion of + storing data in tables is so commonplace today that it might + seem inherently obvious, but there are a number of other ways of + organizing databases. Files and directories on Unix-like + operating systems form an example of a hierarchical database. A + more modern development is the object-oriented database. + </para> + + <para> + <indexterm><primary>row</primary></indexterm> + <indexterm><primary>column</primary></indexterm> + + Each table is a named collection of <firstterm>rows</firstterm>. + Each row of a given table has the same set of named + <firstterm>columns</firstterm>, + and each column is of a specific data type. Whereas columns have + a fixed order in each row, it is important to remember that SQL + does not guarantee the order of the rows within the table in any + way (although they can be explicitly sorted for display). + </para> + + <para> + <indexterm><primary>database cluster</primary></indexterm> + <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm> + + Tables are grouped into databases, and a collection of databases + managed by a single <productname>PostgreSQL</productname> server + instance constitutes a database <firstterm>cluster</firstterm>. + </para> + </sect1> + + + <sect1 id="tutorial-table"> + <title>Creating a New Table</title> + + <indexterm zone="tutorial-table"> + <primary>CREATE TABLE</primary> + </indexterm> + + <para> + You can create a new table by specifying the table + name, along with all column names and their types: + +<programlisting> +CREATE TABLE weather ( + city varchar(80), + temp_lo int, -- low temperature + temp_hi int, -- high temperature + prcp real, -- precipitation + date date +); +</programlisting> + + You can enter this into <command>psql</command> with the line + breaks. <command>psql</command> will recognize that the command + is not terminated until the semicolon. + </para> + + <para> + White space (i.e., spaces, tabs, and newlines) can be used freely + in SQL commands. That means you can type the command aligned + differently than above, or even all on one line. Two dashes + (<quote><literal>--</literal></quote>) introduce comments. + Whatever follows them is ignored up to the end of the line. SQL + is case-insensitive about key words and identifiers, except + when identifiers are double-quoted to preserve the case (not done + above). + </para> + + <para> + <type>varchar(80)</type> specifies a data type that can store + arbitrary character strings up to 80 characters in length. + <type>int</type> is the normal integer type. <type>real</type> is + a type for storing single precision floating-point numbers. + <type>date</type> should be self-explanatory. (Yes, the column of + type <type>date</type> is also named <structfield>date</structfield>. + This might be convenient or confusing — you choose.) + </para> + + <para> + <productname>PostgreSQL</productname> supports the standard + <acronym>SQL</acronym> types <type>int</type>, + <type>smallint</type>, <type>real</type>, <type>double + precision</type>, <type>char(<replaceable>N</replaceable>)</type>, + <type>varchar(<replaceable>N</replaceable>)</type>, <type>date</type>, + <type>time</type>, <type>timestamp</type>, and + <type>interval</type>, as well as other types of general utility + and a rich set of geometric types. + <productname>PostgreSQL</productname> can be customized with an + arbitrary number of user-defined data types. Consequently, type + names are not key words in the syntax, except where required to + support special cases in the <acronym>SQL</acronym> standard. + </para> + + <para> + The second example will store cities and their associated + geographical location: +<programlisting> +CREATE TABLE cities ( + name varchar(80), + location point +); +</programlisting> + The <type>point</type> type is an example of a + <productname>PostgreSQL</productname>-specific data type. + </para> + + <para> + <indexterm> + <primary>DROP TABLE</primary> + </indexterm> + + Finally, it should be mentioned that if you don't need a table any + longer or want to recreate it differently you can remove it using + the following command: +<synopsis> +DROP TABLE <replaceable>tablename</replaceable>; +</synopsis> + </para> + </sect1> + + + <sect1 id="tutorial-populate"> + <title>Populating a Table With Rows</title> + + <indexterm zone="tutorial-populate"> + <primary>INSERT</primary> + </indexterm> + + <para> + The <command>INSERT</command> statement is used to populate a table with + rows: + +<programlisting> +INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'); +</programlisting> + + Note that all data types use rather obvious input formats. + Constants that are not simple numeric values usually must be + surrounded by single quotes (<literal>'</literal>), as in the example. + The + <type>date</type> type is actually quite flexible in what it + accepts, but for this tutorial we will stick to the unambiguous + format shown here. + </para> + + <para> + The <type>point</type> type requires a coordinate pair as input, + as shown here: +<programlisting> +INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)'); +</programlisting> + </para> + + <para> + The syntax used so far requires you to remember the order of the + columns. An alternative syntax allows you to list the columns + explicitly: +<programlisting> +INSERT INTO weather (city, temp_lo, temp_hi, prcp, date) + VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29'); +</programlisting> + You can list the columns in a different order if you wish or + even omit some columns, e.g., if the precipitation is unknown: +<programlisting> +INSERT INTO weather (date, city, temp_hi, temp_lo) + VALUES ('1994-11-29', 'Hayward', 54, 37); +</programlisting> + Many developers consider explicitly listing the columns better + style than relying on the order implicitly. + </para> + + <para> + Please enter all the commands shown above so you have some data to + work with in the following sections. + </para> + + <para> + <indexterm> + <primary>COPY</primary> + </indexterm> + + You could also have used <command>COPY</command> to load large + amounts of data from flat-text files. This is usually faster + because the <command>COPY</command> command is optimized for this + application while allowing less flexibility than + <command>INSERT</command>. An example would be: + +<programlisting> +COPY weather FROM '/home/user/weather.txt'; +</programlisting> + + where the file name for the source file must be available on the + machine running the backend process, not the client, since the backend process + reads the file directly. You can read more about the + <command>COPY</command> command in <xref linkend="sql-copy"/>. + </para> + </sect1> + + + <sect1 id="tutorial-select"> + <title>Querying a Table</title> + + <para> + <indexterm><primary>query</primary></indexterm> + <indexterm><primary>SELECT</primary></indexterm> + + To retrieve data from a table, the table is + <firstterm>queried</firstterm>. An <acronym>SQL</acronym> + <command>SELECT</command> statement is used to do this. The + statement is divided into a select list (the part that lists the + columns to be returned), a table list (the part that lists the + tables from which to retrieve the data), and an optional + qualification (the part that specifies any restrictions). For + example, to retrieve all the rows of table + <structname>weather</structname>, type: +<programlisting> +SELECT * FROM weather; +</programlisting> + Here <literal>*</literal> is a shorthand for <quote>all columns</quote>. + <footnote> + <para> + While <literal>SELECT *</literal> is useful for off-the-cuff + queries, it is widely considered bad style in production code, + since adding a column to the table would change the results. + </para> + </footnote> + So the same result would be had with: +<programlisting> +SELECT city, temp_lo, temp_hi, prcp, date FROM weather; +</programlisting> + + The output should be: + +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 43 | 57 | 0 | 1994-11-29 + Hayward | 37 | 54 | | 1994-11-29 +(3 rows) +</screen> + </para> + + <para> + You can write expressions, not just simple column references, in the + select list. For example, you can do: +<programlisting> +SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather; +</programlisting> + This should give: +<screen> + city | temp_avg | date +---------------+----------+------------ + San Francisco | 48 | 1994-11-27 + San Francisco | 50 | 1994-11-29 + Hayward | 45 | 1994-11-29 +(3 rows) +</screen> + Notice how the <literal>AS</literal> clause is used to relabel the + output column. (The <literal>AS</literal> clause is optional.) + </para> + + <para> + A query can be <quote>qualified</quote> by adding a <literal>WHERE</literal> + clause that specifies which rows are wanted. The <literal>WHERE</literal> + clause contains a Boolean (truth value) expression, and only rows for + which the Boolean expression is true are returned. The usual + Boolean operators (<literal>AND</literal>, + <literal>OR</literal>, and <literal>NOT</literal>) are allowed in + the qualification. For example, the following + retrieves the weather of San Francisco on rainy days: + +<programlisting> +SELECT * FROM weather + WHERE city = 'San Francisco' AND prcp > 0.0; +</programlisting> + Result: +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 +(1 row) +</screen> + </para> + + <para> + <indexterm><primary>ORDER BY</primary></indexterm> + + You can request that the results of a query + be returned in sorted order: + +<programlisting> +SELECT * FROM weather + ORDER BY city; +</programlisting> + +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + Hayward | 37 | 54 | | 1994-11-29 + San Francisco | 43 | 57 | 0 | 1994-11-29 + San Francisco | 46 | 50 | 0.25 | 1994-11-27 +</screen> + + In this example, the sort order isn't fully specified, and so you + might get the San Francisco rows in either order. But you'd always + get the results shown above if you do: + +<programlisting> +SELECT * FROM weather + ORDER BY city, temp_lo; +</programlisting> + </para> + + <para> + <indexterm><primary>DISTINCT</primary></indexterm> + <indexterm><primary>duplicate</primary></indexterm> + + You can request that duplicate rows be removed from the result of + a query: + +<programlisting> +SELECT DISTINCT city + FROM weather; +</programlisting> + +<screen> + city +--------------- + Hayward + San Francisco +(2 rows) +</screen> + + Here again, the result row ordering might vary. + You can ensure consistent results by using <literal>DISTINCT</literal> and + <literal>ORDER BY</literal> together: + <footnote> + <para> + In some database systems, including older versions of + <productname>PostgreSQL</productname>, the implementation of + <literal>DISTINCT</literal> automatically orders the rows and + so <literal>ORDER BY</literal> is unnecessary. But this is not + required by the SQL standard, and current + <productname>PostgreSQL</productname> does not guarantee that + <literal>DISTINCT</literal> causes the rows to be ordered. + </para> + </footnote> + +<programlisting> +SELECT DISTINCT city + FROM weather + ORDER BY city; +</programlisting> + </para> + </sect1> + + + <sect1 id="tutorial-join"> + <title>Joins Between Tables</title> + + <indexterm zone="tutorial-join"> + <primary>join</primary> + </indexterm> + + <para> + Thus far, our queries have only accessed one table at a time. + Queries can access multiple tables at once, or access the same + table in such a way that multiple rows of the table are being + processed at the same time. Queries that access multiple tables + (or multiple instances of the same table) at one time are called + <firstterm>join</firstterm> queries. They combine rows from one table + with rows from a second table, with an expression specifying which rows + are to be paired. For example, to return all the weather records together + with the location of the associated city, the database needs to compare + the <structfield>city</structfield> + column of each row of the <structname>weather</structname> table with the + <structfield>name</structfield> column of all rows in the <structname>cities</structname> + table, and select the pairs of rows where these values match.<footnote> + <para> + This is only a conceptual model. The join is usually performed + in a more efficient manner than actually comparing each possible + pair of rows, but this is invisible to the user. + </para> + </footnote> + This would be accomplished by the following query: + +<programlisting> +SELECT * FROM weather JOIN cities ON city = name; +</programlisting> + +<screen> + city | temp_lo | temp_hi | prcp | date | name | location +---------------+---------+---------+------+------------+---------------+----------- + San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53) + San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53) +(2 rows) +</screen> + + </para> + + <para> + Observe two things about the result set: + <itemizedlist> + <listitem> + <para> + There is no result row for the city of Hayward. This is + because there is no matching entry in the + <structname>cities</structname> table for Hayward, so the join + ignores the unmatched rows in the <structname>weather</structname> table. We will see + shortly how this can be fixed. + </para> + </listitem> + + <listitem> + <para> + There are two columns containing the city name. This is + correct because the lists of columns from the + <structname>weather</structname> and + <structname>cities</structname> tables are concatenated. In + practice this is undesirable, though, so you will probably want + to list the output columns explicitly rather than using + <literal>*</literal>: +<programlisting> +SELECT city, temp_lo, temp_hi, prcp, date, location + FROM weather JOIN cities ON city = name; +</programlisting> + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Since the columns all had different names, the parser + automatically found which table they belong to. If there + were duplicate column names in the two tables you'd need to + <firstterm>qualify</firstterm> the column names to show which one you + meant, as in: + +<programlisting> +SELECT weather.city, weather.temp_lo, weather.temp_hi, + weather.prcp, weather.date, cities.location + FROM weather JOIN cities ON weather.city = cities.name; +</programlisting> + + It is widely considered good style to qualify all column names + in a join query, so that the query won't fail if a duplicate + column name is later added to one of the tables. + </para> + + <para> + Join queries of the kind seen thus far can also be written in this + form: + +<programlisting> +SELECT * + FROM weather, cities + WHERE city = name; +</programlisting> + + This syntax pre-dates the <literal>JOIN</literal>/<literal>ON</literal> + syntax, which was introduced in SQL-92. The tables are simply listed in + the <literal>FROM</literal> clause, and the comparison expression is added + to the <literal>WHERE</literal> clause. The results from this older + implicit syntax and the newer explicit + <literal>JOIN</literal>/<literal>ON</literal> syntax are identical. But + for a reader of the query, the explicit syntax makes its meaning easier to + understand: The join condition is introduced by its own key word whereas + previously the condition was mixed into the <literal>WHERE</literal> + clause together with other conditions. + </para> + + <indexterm><primary>join</primary><secondary>outer</secondary></indexterm> + + <para> + Now we will figure out how we can get the Hayward records back in. + What we want the query to do is to scan the + <structname>weather</structname> table and for each row to find the + matching <structname>cities</structname> row(s). If no matching row is + found we want some <quote>empty values</quote> to be substituted + for the <structname>cities</structname> table's columns. This kind + of query is called an <firstterm>outer join</firstterm>. (The + joins we have seen so far are <firstterm>inner joins</firstterm>.) + The command looks like this: + +<programlisting> +SELECT * + FROM weather LEFT OUTER JOIN cities ON weather.city = cities.name; +</programlisting> + +<screen> + city | temp_lo | temp_hi | prcp | date | name | location +---------------+---------+---------+------+------------+---------------+----------- + Hayward | 37 | 54 | | 1994-11-29 | | + San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53) + San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53) +(3 rows) +</screen> + + This query is called a <firstterm>left outer + join</firstterm> because the table mentioned on the left of the + join operator will have each of its rows in the output at least + once, whereas the table on the right will only have those rows + output that match some row of the left table. When outputting a + left-table row for which there is no right-table match, empty (null) + values are substituted for the right-table columns. + </para> + + <formalpara> + <title>Exercise:</title> + + <para> + There are also right outer joins and full outer joins. Try to + find out what those do. + </para> + </formalpara> + + <indexterm><primary>join</primary><secondary>self</secondary></indexterm> + <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm> + <para> + We can also join a table against itself. This is called a + <firstterm>self join</firstterm>. As an example, suppose we wish + to find all the weather records that are in the temperature range + of other weather records. So we need to compare the + <structfield>temp_lo</structfield> and <structfield>temp_hi</structfield> columns of + each <structname>weather</structname> row to the + <structfield>temp_lo</structfield> and + <structfield>temp_hi</structfield> columns of all other + <structname>weather</structname> rows. We can do this with the + following query: + +<programlisting> +SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high, + w2.city, w2.temp_lo AS low, w2.temp_hi AS high + FROM weather w1 JOIN weather w2 + ON w1.temp_lo < w2.temp_lo AND w1.temp_hi > w2.temp_hi; +</programlisting> + +<screen> + city | low | high | city | low | high +---------------+-----+------+---------------+-----+------ + San Francisco | 43 | 57 | San Francisco | 46 | 50 + Hayward | 37 | 54 | San Francisco | 46 | 50 +(2 rows) +</screen> + + Here we have relabeled the weather table as <literal>w1</literal> and + <literal>w2</literal> to be able to distinguish the left and right side + of the join. You can also use these kinds of aliases in other + queries to save some typing, e.g.: +<programlisting> +SELECT * + FROM weather w JOIN cities c ON w.city = c.name; +</programlisting> + You will encounter this style of abbreviating quite frequently. + </para> + </sect1> + + + <sect1 id="tutorial-agg"> + <title>Aggregate Functions</title> + + <indexterm zone="tutorial-agg"> + <primary>aggregate function</primary> + </indexterm> + + <para> + Like most other relational database products, + <productname>PostgreSQL</productname> supports + <firstterm>aggregate functions</firstterm>. + An aggregate function computes a single result from multiple input rows. + For example, there are aggregates to compute the + <function>count</function>, <function>sum</function>, + <function>avg</function> (average), <function>max</function> (maximum) and + <function>min</function> (minimum) over a set of rows. + </para> + + <para> + As an example, we can find the highest low-temperature reading anywhere + with: + +<programlisting> +SELECT max(temp_lo) FROM weather; +</programlisting> + +<screen> + max +----- + 46 +(1 row) +</screen> + </para> + + <para> + <indexterm><primary>subquery</primary></indexterm> + + If we wanted to know what city (or cities) that reading occurred in, + we might try: + +<programlisting> +SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation> +</programlisting> + + but this will not work since the aggregate + <function>max</function> cannot be used in the + <literal>WHERE</literal> clause. (This restriction exists because + the <literal>WHERE</literal> clause determines which rows will be + included in the aggregate calculation; so obviously it has to be evaluated + before aggregate functions are computed.) + However, as is often the case + the query can be restated to accomplish the desired result, here + by using a <firstterm>subquery</firstterm>: + +<programlisting> +SELECT city FROM weather + WHERE temp_lo = (SELECT max(temp_lo) FROM weather); +</programlisting> + +<screen> + city +--------------- + San Francisco +(1 row) +</screen> + + This is OK because the subquery is an independent computation + that computes its own aggregate separately from what is happening + in the outer query. + </para> + + <para> + <indexterm><primary>GROUP BY</primary></indexterm> + <indexterm><primary>HAVING</primary></indexterm> + + Aggregates are also very useful in combination with <literal>GROUP + BY</literal> clauses. For example, we can get the number of readings + and the maximum low temperature observed in each city with: + +<programlisting> +SELECT city, count(*), max(temp_lo) + FROM weather + GROUP BY city; +</programlisting> + +<screen> + city | count | max +---------------+-------+----- + Hayward | 1 | 37 + San Francisco | 2 | 46 +(2 rows) +</screen> + + which gives us one output row per city. Each aggregate result is + computed over the table rows matching that city. + We can filter these grouped + rows using <literal>HAVING</literal>: + +<programlisting> +SELECT city, count(*), max(temp_lo) + FROM weather + GROUP BY city + HAVING max(temp_lo) < 40; +</programlisting> + +<screen> + city | count | max +---------+-------+----- + Hayward | 1 | 37 +(1 row) +</screen> + + which gives us the same results for only the cities that have all + <structfield>temp_lo</structfield> values below 40. Finally, if we only care about + cities whose + names begin with <quote><literal>S</literal></quote>, we might do: + +<programlisting> +SELECT city, count(*), max(temp_lo) + FROM weather + WHERE city LIKE 'S%' -- <co id="co.tutorial-agg-like"/> + GROUP BY city; +</programlisting> + +<screen> + city | count | max +---------------+-------+----- + San Francisco | 2 | 46 +(1 row) +</screen> + <calloutlist> + <callout arearefs="co.tutorial-agg-like"> + <para> + The <literal>LIKE</literal> operator does pattern matching and + is explained in <xref linkend="functions-matching"/>. + </para> + </callout> + </calloutlist> + </para> + + <para> + It is important to understand the interaction between aggregates and + <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses. + The fundamental difference between <literal>WHERE</literal> and + <literal>HAVING</literal> is this: <literal>WHERE</literal> selects + input rows before groups and aggregates are computed (thus, it controls + which rows go into the aggregate computation), whereas + <literal>HAVING</literal> selects group rows after groups and + aggregates are computed. Thus, the + <literal>WHERE</literal> clause must not contain aggregate functions; + it makes no sense to try to use an aggregate to determine which rows + will be inputs to the aggregates. On the other hand, the + <literal>HAVING</literal> clause always contains aggregate functions. + (Strictly speaking, you are allowed to write a <literal>HAVING</literal> + clause that doesn't use aggregates, but it's seldom useful. The same + condition could be used more efficiently at the <literal>WHERE</literal> + stage.) + </para> + + <para> + In the previous example, we can apply the city name restriction in + <literal>WHERE</literal>, since it needs no aggregate. This is + more efficient than adding the restriction to <literal>HAVING</literal>, + because we avoid doing the grouping and aggregate calculations + for all rows that fail the <literal>WHERE</literal> check. + </para> + + <para> + Another way to select the rows that go into an aggregate + computation is to use <literal>FILTER</literal>, which is a + per-aggregate option: + +<programlisting> +SELECT city, count(*) FILTER (WHERE temp_lo < 45), max(temp_lo) + FROM weather + GROUP BY city; +</programlisting> + +<screen> + city | count | max +---------------+-------+----- + Hayward | 1 | 37 + San Francisco | 1 | 46 +(2 rows) +</screen> + + <literal>FILTER</literal> is much like <literal>WHERE</literal>, + except that it removes rows only from the input of the particular + aggregate function that it is attached to. + Here, the <literal>count</literal> aggregate counts only + rows with <literal>temp_lo</literal> below 45; but the + <literal>max</literal> aggregate is still applied to all rows, + so it still finds the reading of 46. + </para> + </sect1> + + + <sect1 id="tutorial-update"> + <title>Updates</title> + + <indexterm zone="tutorial-update"> + <primary>UPDATE</primary> + </indexterm> + + <para> + You can update existing rows using the + <command>UPDATE</command> command. + Suppose you discover the temperature readings are + all off by 2 degrees after November 28. You can correct the + data as follows: + +<programlisting> +UPDATE weather + SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2 + WHERE date > '1994-11-28'; +</programlisting> + </para> + + <para> + Look at the new state of the data: +<programlisting> +SELECT * FROM weather; + + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 41 | 55 | 0 | 1994-11-29 + Hayward | 35 | 52 | | 1994-11-29 +(3 rows) +</programlisting> + </para> + </sect1> + + <sect1 id="tutorial-delete"> + <title>Deletions</title> + + <indexterm zone="tutorial-delete"> + <primary>DELETE</primary> + </indexterm> + + <para> + Rows can be removed from a table using the <command>DELETE</command> + command. + Suppose you are no longer interested in the weather of Hayward. + Then you can do the following to delete those rows from the table: +<programlisting> +DELETE FROM weather WHERE city = 'Hayward'; +</programlisting> + + All weather records belonging to Hayward are removed. + +<programlisting> +SELECT * FROM weather; +</programlisting> + +<screen> + city | temp_lo | temp_hi | prcp | date +---------------+---------+---------+------+------------ + San Francisco | 46 | 50 | 0.25 | 1994-11-27 + San Francisco | 41 | 55 | 0 | 1994-11-29 +(2 rows) +</screen> + </para> + + <para> + One should be wary of statements of the form +<synopsis> +DELETE FROM <replaceable>tablename</replaceable>; +</synopsis> + + Without a qualification, <command>DELETE</command> will + remove <emphasis>all</emphasis> rows from the given table, leaving it + empty. The system will not request confirmation before + doing this! + </para> + </sect1> + + </chapter> |