1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
|
<!-- doc/src/sgml/advanced.sgml -->
<chapter id="tutorial-advanced">
<title>Advanced Features</title>
<sect1 id="tutorial-advanced-intro">
<title>Introduction</title>
<para>
In the previous chapter we have covered the basics of using
<acronym>SQL</acronym> to store and access your data in
<productname>PostgreSQL</productname>. We will now discuss some
more advanced features of <acronym>SQL</acronym> that simplify
management and prevent loss or corruption of your data. Finally,
we will look at some <productname>PostgreSQL</productname>
extensions.
</para>
<para>
This chapter will on occasion refer to examples found in <xref
linkend="tutorial-sql"/> to change or improve them, so it will be
useful to have read that chapter. Some examples from
this chapter can also be found in
<filename>advanced.sql</filename> in the tutorial directory. This
file also contains some sample data to load, which is not
repeated here. (Refer to <xref linkend="tutorial-sql-intro"/> for
how to use the file.)
</para>
</sect1>
<sect1 id="tutorial-views">
<title>Views</title>
<indexterm zone="tutorial-views">
<primary>view</primary>
</indexterm>
<para>
Refer back to the queries in <xref linkend="tutorial-join"/>.
Suppose the combined listing of weather records and city location
is of particular interest to your application, but you do not want
to type the query each time you need it. You can create a
<firstterm>view</firstterm> over the query, which gives a name to
the query that you can refer to like an ordinary table:
<programlisting>
CREATE VIEW myview AS
SELECT name, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;
SELECT * FROM myview;
</programlisting>
</para>
<para>
Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the
structure of your tables, which might change as your application
evolves, behind consistent interfaces.
</para>
<para>
Views can be used in almost any place a real table can be used.
Building views upon other views is not uncommon.
</para>
</sect1>
<sect1 id="tutorial-fk">
<title>Foreign Keys</title>
<indexterm zone="tutorial-fk">
<primary>foreign key</primary>
</indexterm>
<indexterm zone="tutorial-fk">
<primary>referential integrity</primary>
</indexterm>
<para>
Recall the <classname>weather</classname> and
<classname>cities</classname> tables from <xref
linkend="tutorial-sql"/>. Consider the following problem: You
want to make sure that no one can insert rows in the
<classname>weather</classname> table that do not have a matching
entry in the <classname>cities</classname> table. This is called
maintaining the <firstterm>referential integrity</firstterm> of
your data. In simplistic database systems this would be
implemented (if at all) by first looking at the
<classname>cities</classname> table to check if a matching record
exists, and then inserting or rejecting the new
<classname>weather</classname> records. This approach has a
number of problems and is very inconvenient, so
<productname>PostgreSQL</productname> can do this for you.
</para>
<para>
The new declaration of the tables would look like this:
<programlisting>
CREATE TABLE cities (
name varchar(80) primary key,
location point
);
CREATE TABLE weather (
city varchar(80) references cities(name),
temp_lo int,
temp_hi int,
prcp real,
date date
);
</programlisting>
Now try inserting an invalid record:
<programlisting>
INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
</programlisting>
<screen>
ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
DETAIL: Key (city)=(Berkeley) is not present in table "cities".
</screen>
</para>
<para>
The behavior of foreign keys can be finely tuned to your
application. We will not go beyond this simple example in this
tutorial, but just refer you to <xref linkend="ddl"/>
for more information. Making correct use of
foreign keys will definitely improve the quality of your database
applications, so you are strongly encouraged to learn about them.
</para>
</sect1>
<sect1 id="tutorial-transactions">
<title>Transactions</title>
<indexterm zone="tutorial-transactions">
<primary>transaction</primary>
</indexterm>
<para>
<firstterm>Transactions</firstterm> are a fundamental concept of all database
systems. The essential point of a transaction is that it bundles
multiple steps into a single, all-or-nothing operation. The intermediate
states between the steps are not visible to other concurrent transactions,
and if some failure occurs that prevents the transaction from completing,
then none of the steps affect the database at all.
</para>
<para>
For example, consider a bank database that contains balances for various
customer accounts, as well as total deposit balances for branches.
Suppose that we want to record a payment of $100.00 from Alice's account
to Bob's account. Simplifying outrageously, the SQL commands for this
might look like:
<programlisting>
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
</programlisting>
</para>
<para>
The details of these commands are not important here; the important
point is that there are several separate updates involved to accomplish
this rather simple operation. Our bank's officers will want to be
assured that either all these updates happen, or none of them happen.
It would certainly not do for a system failure to result in Bob
receiving $100.00 that was not debited from Alice. Nor would Alice long
remain a happy customer if she was debited without Bob being credited.
We need a guarantee that if something goes wrong partway through the
operation, none of the steps executed so far will take effect. Grouping
the updates into a <firstterm>transaction</firstterm> gives us this guarantee.
A transaction is said to be <firstterm>atomic</firstterm>: from the point of
view of other transactions, it either happens completely or not at all.
</para>
<para>
We also want a
guarantee that once a transaction is completed and acknowledged by
the database system, it has indeed been permanently recorded
and won't be lost even if a crash ensues shortly thereafter.
For example, if we are recording a cash withdrawal by Bob,
we do not want any chance that the debit to his account will
disappear in a crash just after he walks out the bank door.
A transactional database guarantees that all the updates made by
a transaction are logged in permanent storage (i.e., on disk) before
the transaction is reported complete.
</para>
<para>
Another important property of transactional databases is closely
related to the notion of atomic updates: when multiple transactions
are running concurrently, each one should not be able to see the
incomplete changes made by others. For example, if one transaction
is busy totalling all the branch balances, it would not do for it
to include the debit from Alice's branch but not the credit to
Bob's branch, nor vice versa. So transactions must be all-or-nothing
not only in terms of their permanent effect on the database, but
also in terms of their visibility as they happen. The updates made
so far by an open transaction are invisible to other transactions
until the transaction completes, whereupon all the updates become
visible simultaneously.
</para>
<para>
In <productname>PostgreSQL</productname>, a transaction is set up by surrounding
the SQL commands of the transaction with
<command>BEGIN</command> and <command>COMMIT</command> commands. So our banking
transaction would actually look like:
<programlisting>
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;
</programlisting>
</para>
<para>
If, partway through the transaction, we decide we do not want to
commit (perhaps we just noticed that Alice's balance went negative),
we can issue the command <command>ROLLBACK</command> instead of
<command>COMMIT</command>, and all our updates so far will be canceled.
</para>
<para>
<productname>PostgreSQL</productname> actually treats every SQL statement as being
executed within a transaction. If you do not issue a <command>BEGIN</command>
command,
then each individual statement has an implicit <command>BEGIN</command> and
(if successful) <command>COMMIT</command> wrapped around it. A group of
statements surrounded by <command>BEGIN</command> and <command>COMMIT</command>
is sometimes called a <firstterm>transaction block</firstterm>.
</para>
<note>
<para>
Some client libraries issue <command>BEGIN</command> and <command>COMMIT</command>
commands automatically, so that you might get the effect of transaction
blocks without asking. Check the documentation for the interface
you are using.
</para>
</note>
<para>
It's possible to control the statements in a transaction in a more
granular fashion through the use of <firstterm>savepoints</firstterm>. Savepoints
allow you to selectively discard parts of the transaction, while
committing the rest. After defining a savepoint with
<command>SAVEPOINT</command>, you can if needed roll back to the savepoint
with <command>ROLLBACK TO</command>. All the transaction's database changes
between defining the savepoint and rolling back to it are discarded, but
changes earlier than the savepoint are kept.
</para>
<para>
After rolling back to a savepoint, it continues to be defined, so you can
roll back to it several times. Conversely, if you are sure you won't need
to roll back to a particular savepoint again, it can be released, so the
system can free some resources. Keep in mind that either releasing or
rolling back to a savepoint
will automatically release all savepoints that were defined after it.
</para>
<para>
All this is happening within the transaction block, so none of it
is visible to other database sessions. When and if you commit the
transaction block, the committed actions become visible as a unit
to other sessions, while the rolled-back actions never become visible
at all.
</para>
<para>
Remembering the bank database, suppose we debit $100.00 from Alice's
account, and credit Bob's account, only to find later that we should
have credited Wally's account. We could do it using savepoints like
this:
<programlisting>
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Bob';
-- oops ... forget that and use Wally's account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
WHERE name = 'Wally';
COMMIT;
</programlisting>
</para>
<para>
This example is, of course, oversimplified, but there's a lot of control
possible in a transaction block through the use of savepoints.
Moreover, <command>ROLLBACK TO</command> is the only way to regain control of a
transaction block that was put in aborted state by the
system due to an error, short of rolling it back completely and starting
again.
</para>
</sect1>
<sect1 id="tutorial-window">
<title>Window Functions</title>
<indexterm zone="tutorial-window">
<primary>window function</primary>
</indexterm>
<para>
A <firstterm>window function</firstterm> performs a calculation across a set of
table rows that are somehow related to the current row. This is comparable
to the type of calculation that can be done with an aggregate function.
However, window functions do not cause rows to become grouped into a single
output row like non-window aggregate calls would. Instead, the
rows retain their separate identities. Behind the scenes, the window
function is able to access more than just the current row of the query
result.
</para>
<para>
Here is an example that shows how to compare each employee's salary
with the average salary in his or her department:
<programlisting>
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
</programlisting>
<screen>
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 11 | 5200 | 5020.0000000000000000
develop | 7 | 4200 | 5020.0000000000000000
develop | 9 | 4500 | 5020.0000000000000000
develop | 8 | 6000 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
personnel | 2 | 3900 | 3700.0000000000000000
sales | 3 | 4800 | 4866.6666666666666667
sales | 1 | 5000 | 4866.6666666666666667
sales | 4 | 4800 | 4866.6666666666666667
(10 rows)
</screen>
The first three output columns come directly from the table
<structname>empsalary</structname>, and there is one output row for each row in the
table. The fourth column represents an average taken across all the table
rows that have the same <structfield>depname</structfield> value as the current row.
(This actually is the same function as the non-window <function>avg</function>
aggregate, but the <literal>OVER</literal> clause causes it to be
treated as a window function and computed across the window frame.)
</para>
<para>
A window function call always contains an <literal>OVER</literal> clause
directly following the window function's name and argument(s). This is what
syntactically distinguishes it from a normal function or non-window
aggregate. The <literal>OVER</literal> clause determines exactly how the
rows of the query are split up for processing by the window function.
The <literal>PARTITION BY</literal> clause within <literal>OVER</literal>
divides the rows into groups, or partitions, that share the same
values of the <literal>PARTITION BY</literal> expression(s). For each row,
the window function is computed across the rows that fall into the
same partition as the current row.
</para>
<para>
You can also control the order in which rows are processed by
window functions using <literal>ORDER BY</literal> within <literal>OVER</literal>.
(The window <literal>ORDER BY</literal> does not even have to match the
order in which the rows are output.) Here is an example:
<programlisting>
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
</programlisting>
<screen>
depname | empno | salary | rank
-----------+-------+--------+------
develop | 8 | 6000 | 1
develop | 10 | 5200 | 2
develop | 11 | 5200 | 2
develop | 9 | 4500 | 4
develop | 7 | 4200 | 5
personnel | 2 | 3900 | 1
personnel | 5 | 3500 | 2
sales | 1 | 5000 | 1
sales | 4 | 4800 | 2
sales | 3 | 4800 | 2
(10 rows)
</screen>
As shown here, the <function>rank</function> function produces a numerical rank
for each distinct <literal>ORDER BY</literal> value in the current row's
partition, using the order defined by the <literal>ORDER BY</literal> clause.
<function>rank</function> needs no explicit parameter, because its behavior
is entirely determined by the <literal>OVER</literal> clause.
</para>
<para>
The rows considered by a window function are those of the <quote>virtual
table</quote> produced by the query's <literal>FROM</literal> clause as filtered by its
<literal>WHERE</literal>, <literal>GROUP BY</literal>, and <literal>HAVING</literal> clauses
if any. For example, a row removed because it does not meet the
<literal>WHERE</literal> condition is not seen by any window function.
A query can contain multiple window functions that slice up the data
in different ways using different <literal>OVER</literal> clauses, but
they all act on the same collection of rows defined by this virtual table.
</para>
<para>
We already saw that <literal>ORDER BY</literal> can be omitted if the ordering
of rows is not important. It is also possible to omit <literal>PARTITION
BY</literal>, in which case there is a single partition containing all rows.
</para>
<para>
There is another important concept associated with window functions:
for each row, there is a set of rows within its partition called its
<firstterm>window frame</firstterm>. Some window functions act only
on the rows of the window frame, rather than of the whole partition.
By default, if <literal>ORDER BY</literal> is supplied then the frame consists of
all rows from the start of the partition up through the current row, plus
any following rows that are equal to the current row according to the
<literal>ORDER BY</literal> clause. When <literal>ORDER BY</literal> is omitted the
default frame consists of all rows in the partition.
<footnote>
<para>
There are options to define the window frame in other ways, but
this tutorial does not cover them. See
<xref linkend="syntax-window-functions"/> for details.
</para>
</footnote>
Here is an example using <function>sum</function>:
</para>
<programlisting>
SELECT salary, sum(salary) OVER () FROM empsalary;
</programlisting>
<screen>
salary | sum
--------+-------
5200 | 47100
5000 | 47100
3500 | 47100
4800 | 47100
3900 | 47100
4200 | 47100
4500 | 47100
4800 | 47100
6000 | 47100
5200 | 47100
(10 rows)
</screen>
<para>
Above, since there is no <literal>ORDER BY</literal> in the <literal>OVER</literal>
clause, the window frame is the same as the partition, which for lack of
<literal>PARTITION BY</literal> is the whole table; in other words each sum is
taken over the whole table and so we get the same result for each output
row. But if we add an <literal>ORDER BY</literal> clause, we get very different
results:
</para>
<programlisting>
SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
</programlisting>
<screen>
salary | sum
--------+-------
3500 | 3500
3900 | 7400
4200 | 11600
4500 | 16100
4800 | 25700
4800 | 25700
5000 | 30700
5200 | 41100
5200 | 41100
6000 | 47100
(10 rows)
</screen>
<para>
Here the sum is taken from the first (lowest) salary up through the
current one, including any duplicates of the current one (notice the
results for the duplicated salaries).
</para>
<para>
Window functions are permitted only in the <literal>SELECT</literal> list
and the <literal>ORDER BY</literal> clause of the query. They are forbidden
elsewhere, such as in <literal>GROUP BY</literal>, <literal>HAVING</literal>
and <literal>WHERE</literal> clauses. This is because they logically
execute after the processing of those clauses. Also, window functions
execute after non-window aggregate functions. This means it is valid to
include an aggregate function call in the arguments of a window function,
but not vice versa.
</para>
<para>
If there is a need to filter or group rows after the window calculations
are performed, you can use a sub-select. For example:
<programlisting>
SELECT depname, empno, salary, enroll_date
FROM
(SELECT depname, empno, salary, enroll_date,
rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
FROM empsalary
) AS ss
WHERE pos < 3;
</programlisting>
The above query only shows the rows from the inner query having
<literal>rank</literal> less than 3.
</para>
<para>
When a query involves multiple window functions, it is possible to write
out each one with a separate <literal>OVER</literal> clause, but this is
duplicative and error-prone if the same windowing behavior is wanted
for several functions. Instead, each windowing behavior can be named
in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
For example:
<programlisting>
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
</programlisting>
</para>
<para>
More details about window functions can be found in
<xref linkend="syntax-window-functions"/>,
<xref linkend="functions-window"/>,
<xref linkend="queries-window"/>, and the
<xref linkend="sql-select"/> reference page.
</para>
</sect1>
<sect1 id="tutorial-inheritance">
<title>Inheritance</title>
<indexterm zone="tutorial-inheritance">
<primary>inheritance</primary>
</indexterm>
<para>
Inheritance is a concept from object-oriented databases. It opens
up interesting new possibilities of database design.
</para>
<para>
Let's create two tables: A table <classname>cities</classname>
and a table <classname>capitals</classname>. Naturally, capitals
are also cities, so you want some way to show the capitals
implicitly when you list all cities. If you're really clever you
might invent some scheme like this:
<programlisting>
CREATE TABLE capitals (
name text,
population real,
elevation int, -- (in ft)
state char(2)
);
CREATE TABLE non_capitals (
name text,
population real,
elevation int -- (in ft)
);
CREATE VIEW cities AS
SELECT name, population, elevation FROM capitals
UNION
SELECT name, population, elevation FROM non_capitals;
</programlisting>
This works OK as far as querying goes, but it gets ugly when you
need to update several rows, for one thing.
</para>
<para>
A better solution is this:
<programlisting>
CREATE TABLE cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
</programlisting>
</para>
<para>
In this case, a row of <classname>capitals</classname>
<firstterm>inherits</firstterm> all columns (<structfield>name</structfield>,
<structfield>population</structfield>, and <structfield>elevation</structfield>) from its
<firstterm>parent</firstterm>, <classname>cities</classname>. The
type of the column <structfield>name</structfield> is
<type>text</type>, a native <productname>PostgreSQL</productname>
type for variable length character strings. The
<classname>capitals</classname> table has
an additional column, <structfield>state</structfield>, which shows its
state abbreviation. In
<productname>PostgreSQL</productname>, a table can inherit from
zero or more other tables.
</para>
<para>
For example, the following query finds the names of all cities,
including state capitals, that are located at an elevation
over 500 feet:
<programlisting>
SELECT name, elevation
FROM cities
WHERE elevation > 500;
</programlisting>
which returns:
<screen>
name | elevation
-----------+-----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
</screen>
</para>
<para>
On the other hand, the following query finds
all the cities that are not state capitals and
are situated at an elevation over 500 feet:
<programlisting>
SELECT name, elevation
FROM ONLY cities
WHERE elevation > 500;
</programlisting>
<screen>
name | elevation
-----------+-----------
Las Vegas | 2174
Mariposa | 1953
(2 rows)
</screen>
</para>
<para>
Here the <literal>ONLY</literal> before <literal>cities</literal>
indicates that the query should be run over only the
<classname>cities</classname> table, and not tables below
<classname>cities</classname> in the inheritance hierarchy. Many
of the commands that we have already discussed —
<command>SELECT</command>, <command>UPDATE</command>, and
<command>DELETE</command> — support this <literal>ONLY</literal>
notation.
</para>
<note>
<para>
Although inheritance is frequently useful, it has not been integrated
with unique constraints or foreign keys, which limits its usefulness.
See <xref linkend="ddl-inherit"/> for more detail.
</para>
</note>
</sect1>
<sect1 id="tutorial-conclusion">
<title>Conclusion</title>
<para>
<productname>PostgreSQL</productname> has many features not
touched upon in this tutorial introduction, which has been
oriented toward newer users of <acronym>SQL</acronym>. These
features are discussed in more detail in the remainder of this
book.
</para>
<para>
If you feel you need more introductory material, please visit the PostgreSQL
<ulink url="https://www.postgresql.org">web site</ulink>
for links to more resources.
</para>
</sect1>
</chapter>
|