diff options
Diffstat (limited to '')
-rw-r--r-- | doc/src/sgml/ddl.sgml | 5279 |
1 files changed, 5279 insertions, 0 deletions
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml new file mode 100644 index 0000000..11fa9c4 --- /dev/null +++ b/doc/src/sgml/ddl.sgml @@ -0,0 +1,5279 @@ +<!-- doc/src/sgml/ddl.sgml --> + +<chapter id="ddl"> + <title>Data Definition</title> + + <para> + This chapter covers how one creates the database structures that + will hold one's data. In a relational database, the raw data is + stored in tables, so the majority of this chapter is devoted to + explaining how tables are created and modified and what features are + available to control what data is stored in the tables. + Subsequently, we discuss how tables can be organized into + schemas, and how privileges can be assigned to tables. Finally, + we will briefly look at other features that affect the data storage, + such as inheritance, table partitioning, views, functions, and + triggers. + </para> + + <sect1 id="ddl-basics"> + <title>Table Basics</title> + + <indexterm zone="ddl-basics"> + <primary>table</primary> + </indexterm> + + <indexterm> + <primary>row</primary> + </indexterm> + + <indexterm> + <primary>column</primary> + </indexterm> + + <para> + A table in a relational database is much like a table on paper: It + consists of rows and columns. The number and order of the columns + is fixed, and each column has a name. The number of rows is + variable — it reflects how much data is stored at a given moment. + SQL does not make any guarantees about the order of the rows in a + table. When a table is read, the rows will appear in an unspecified order, + unless sorting is explicitly requested. This is covered in <xref + linkend="queries"/>. Furthermore, SQL does not assign unique + identifiers to rows, so it is possible to have several completely + identical rows in a table. This is a consequence of the + mathematical model that underlies SQL but is usually not desirable. + Later in this chapter we will see how to deal with this issue. + </para> + + <para> + Each column has a data type. The data type constrains the set of + possible values that can be assigned to a column and assigns + semantics to the data stored in the column so that it can be used + for computations. For instance, a column declared to be of a + numerical type will not accept arbitrary text strings, and the data + stored in such a column can be used for mathematical computations. + By contrast, a column declared to be of a character string type + will accept almost any kind of data but it does not lend itself to + mathematical calculations, although other operations such as string + concatenation are available. + </para> + + <para> + <productname>PostgreSQL</productname> includes a sizable set of + built-in data types that fit many applications. Users can also + define their own data types. Most built-in data types have obvious + names and semantics, so we defer a detailed explanation to <xref + linkend="datatype"/>. Some of the frequently used data types are + <type>integer</type> for whole numbers, <type>numeric</type> for + possibly fractional numbers, <type>text</type> for character + strings, <type>date</type> for dates, <type>time</type> for + time-of-day values, and <type>timestamp</type> for values + containing both date and time. + </para> + + <indexterm> + <primary>table</primary> + <secondary>creating</secondary> + </indexterm> + + <para> + To create a table, you use the aptly named <xref + linkend="sql-createtable"/> command. + In this command you specify at least a name for the new table, the + names of the columns and the data type of each column. For + example: +<programlisting> +CREATE TABLE my_first_table ( + first_column text, + second_column integer +); +</programlisting> + This creates a table named <literal>my_first_table</literal> with + two columns. The first column is named + <literal>first_column</literal> and has a data type of + <type>text</type>; the second column has the name + <literal>second_column</literal> and the type <type>integer</type>. + The table and column names follow the identifier syntax explained + in <xref linkend="sql-syntax-identifiers"/>. The type names are + usually also identifiers, but there are some exceptions. Note that the + column list is comma-separated and surrounded by parentheses. + </para> + + <para> + Of course, the previous example was heavily contrived. Normally, + you would give names to your tables and columns that convey what + kind of data they store. So let's look at a more realistic + example: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric +); +</programlisting> + (The <type>numeric</type> type can store fractional components, as + would be typical of monetary amounts.) + </para> + + <tip> + <para> + When you create many interrelated tables it is wise to choose a + consistent naming pattern for the tables and columns. For + instance, there is a choice of using singular or plural nouns for + table names, both of which are favored by some theorist or other. + </para> + </tip> + + <para> + There is a limit on how many columns a table can contain. + Depending on the column types, it is between 250 and 1600. + However, defining a table with anywhere near this many columns is + highly unusual and often a questionable design. + </para> + + <indexterm> + <primary>table</primary> + <secondary>removing</secondary> + </indexterm> + + <para> + If you no longer need a table, you can remove it using the <xref + linkend="sql-droptable"/> command. + For example: +<programlisting> +DROP TABLE my_first_table; +DROP TABLE products; +</programlisting> + Attempting to drop a table that does not exist is an error. + Nevertheless, it is common in SQL script files to unconditionally + try to drop each table before creating it, ignoring any error + messages, so that the script works whether or not the table exists. + (If you like, you can use the <literal>DROP TABLE IF EXISTS</literal> variant + to avoid the error messages, but this is not standard SQL.) + </para> + + <para> + If you need to modify a table that already exists, see <xref + linkend="ddl-alter"/> later in this chapter. + </para> + + <para> + With the tools discussed so far you can create fully functional + tables. The remainder of this chapter is concerned with adding + features to the table definition to ensure data integrity, + security, or convenience. If you are eager to fill your tables with + data now you can skip ahead to <xref linkend="dml"/> and read the + rest of this chapter later. + </para> + </sect1> + + <sect1 id="ddl-default"> + <title>Default Values</title> + + <indexterm zone="ddl-default"> + <primary>default value</primary> + </indexterm> + + <para> + A column can be assigned a default value. When a new row is + created and no values are specified for some of the columns, those + columns will be filled with their respective default values. A + data manipulation command can also request explicitly that a column + be set to its default value, without having to know what that value is. + (Details about data manipulation commands are in <xref linkend="dml"/>.) + </para> + + <para> + <indexterm><primary>null value</primary><secondary>default value</secondary></indexterm> + If no default value is declared explicitly, the default value is the + null value. This usually makes sense because a null value can + be considered to represent unknown data. + </para> + + <para> + In a table definition, default values are listed after the column + data type. For example: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric <emphasis>DEFAULT 9.99</emphasis> +); +</programlisting> + </para> + + <para> + The default value can be an expression, which will be + evaluated whenever the default value is inserted + (<emphasis>not</emphasis> when the table is created). A common example + is for a <type>timestamp</type> column to have a default of <literal>CURRENT_TIMESTAMP</literal>, + so that it gets set to the time of row insertion. Another common + example is generating a <quote>serial number</quote> for each row. + In <productname>PostgreSQL</productname> this is typically done by + something like: +<programlisting> +CREATE TABLE products ( + product_no integer <emphasis>DEFAULT nextval('products_product_no_seq')</emphasis>, + ... +); +</programlisting> + where the <literal>nextval()</literal> function supplies successive values + from a <firstterm>sequence object</firstterm> (see <xref + linkend="functions-sequence"/>). This arrangement is sufficiently common + that there's a special shorthand for it: +<programlisting> +CREATE TABLE products ( + product_no <emphasis>SERIAL</emphasis>, + ... +); +</programlisting> + The <literal>SERIAL</literal> shorthand is discussed further in <xref + linkend="datatype-serial"/>. + </para> + </sect1> + + <sect1 id="ddl-generated-columns"> + <title>Generated Columns</title> + + <indexterm zone="ddl-generated-columns"> + <primary>generated column</primary> + </indexterm> + + <para> + A generated column is a special column that is always computed from other + columns. Thus, it is for columns what a view is for tables. There are two + kinds of generated columns: stored and virtual. A stored generated column + is computed when it is written (inserted or updated) and occupies storage + as if it were a normal column. A virtual generated column occupies no + storage and is computed when it is read. Thus, a virtual generated column + is similar to a view and a stored generated column is similar to a + materialized view (except that it is always updated automatically). + PostgreSQL currently implements only stored generated columns. + </para> + + <para> + To create a generated column, use the <literal>GENERATED ALWAYS + AS</literal> clause in <command>CREATE TABLE</command>, for example: +<programlisting> +CREATE TABLE people ( + ..., + height_cm numeric, + height_in numeric <emphasis>GENERATED ALWAYS AS (height_cm / 2.54) STORED</emphasis> +); +</programlisting> + The keyword <literal>STORED</literal> must be specified to choose the + stored kind of generated column. See <xref linkend="sql-createtable"/> for + more details. + </para> + + <para> + A generated column cannot be written to directly. In + <command>INSERT</command> or <command>UPDATE</command> commands, a value + cannot be specified for a generated column, but the keyword + <literal>DEFAULT</literal> may be specified. + </para> + + <para> + Consider the differences between a column with a default and a generated + column. The column default is evaluated once when the row is first + inserted if no other value was provided; a generated column is updated + whenever the row changes and cannot be overridden. A column default may + not refer to other columns of the table; a generation expression would + normally do so. A column default can use volatile functions, for example + <literal>random()</literal> or functions referring to the current time; + this is not allowed for generated columns. + </para> + + <para> + Several restrictions apply to the definition of generated columns and + tables involving generated columns: + + <itemizedlist> + <listitem> + <para> + The generation expression can only use immutable functions and cannot + use subqueries or reference anything other than the current row in any + way. + </para> + </listitem> + <listitem> + <para> + A generation expression cannot reference another generated column. + </para> + </listitem> + <listitem> + <para> + A generation expression cannot reference a system column, except + <varname>tableoid</varname>. + </para> + </listitem> + <listitem> + <para> + A generated column cannot have a column default or an identity definition. + </para> + </listitem> + <listitem> + <para> + A generated column cannot be part of a partition key. + </para> + </listitem> + <listitem> + <para> + Foreign tables can have generated columns. See <xref + linkend="sql-createforeigntable"/> for details. + </para> + </listitem> + <listitem> + <para>For inheritance and partitioning:</para> + <itemizedlist> + <listitem> + <para> + If a parent column is a generated column, its child column must also + be a generated column; however, the child column can have a + different generation expression. The generation expression that is + actually applied during insert or update of a row is the one + associated with the table that the row is physically in. + (This is unlike the behavior for column defaults: for those, the + default value associated with the table named in the query applies.) + </para> + </listitem> + <listitem> + <para> + If a parent column is not a generated column, its child column must + not be generated either. + </para> + </listitem> + <listitem> + <para> + For inherited tables, if you write a child column definition without + any <literal>GENERATED</literal> clause in <command>CREATE TABLE + ... INHERITS</command>, then its <literal>GENERATED</literal> clause + will automatically be copied from the parent. <command>ALTER TABLE + ... INHERIT</command> will insist that parent and child columns + already match as to generation status, but it will not require their + generation expressions to match. + </para> + </listitem> + <listitem> + <para> + Similarly for partitioned tables, if you write a child column + definition without any <literal>GENERATED</literal> clause + in <command>CREATE TABLE ... PARTITION OF</command>, then + its <literal>GENERATED</literal> clause will automatically be copied + from the parent. <command>ALTER TABLE ... ATTACH PARTITION</command> + will insist that parent and child columns already match as to + generation status, but it will not require their generation + expressions to match. + </para> + </listitem> + <listitem> + <para> + In case of multiple inheritance, if one parent column is a generated + column, then all parent columns must be generated columns. If they + do not all have the same generation expression, then the desired + expression for the child must be specified explicitly. + </para> + </listitem> + </itemizedlist> + </listitem> + </itemizedlist> + </para> + + <para> + Additional considerations apply to the use of generated columns. + <itemizedlist> + <listitem> + <para> + Generated columns maintain access privileges separately from their + underlying base columns. So, it is possible to arrange it so that a + particular role can read from a generated column but not from the + underlying base columns. + </para> + </listitem> + <listitem> + <para> + Generated columns are, conceptually, updated after + <literal>BEFORE</literal> triggers have run. Therefore, changes made to + base columns in a <literal>BEFORE</literal> trigger will be reflected in + generated columns. But conversely, it is not allowed to access + generated columns in <literal>BEFORE</literal> triggers. + </para> + </listitem> + </itemizedlist> + </para> + </sect1> + + <sect1 id="ddl-constraints"> + <title>Constraints</title> + + <indexterm zone="ddl-constraints"> + <primary>constraint</primary> + </indexterm> + + <para> + Data types are a way to limit the kind of data that can be stored + in a table. For many applications, however, the constraint they + provide is too coarse. For example, a column containing a product + price should probably only accept positive values. But there is no + standard data type that accepts only positive numbers. Another issue is + that you might want to constrain column data with respect to other + columns or rows. For example, in a table containing product + information, there should be only one row for each product number. + </para> + + <para> + To that end, SQL allows you to define constraints on columns and + tables. Constraints give you as much control over the data in your + tables as you wish. If a user attempts to store data in a column + that would violate a constraint, an error is raised. This applies + even if the value came from the default value definition. + </para> + + <sect2 id="ddl-constraints-check-constraints"> + <title>Check Constraints</title> + + <indexterm> + <primary>check constraint</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>check</secondary> + </indexterm> + + <para> + A check constraint is the most generic constraint type. It allows + you to specify that the value in a certain column must satisfy a + Boolean (truth-value) expression. For instance, to require positive + product prices, you could use: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric <emphasis>CHECK (price > 0)</emphasis> +); +</programlisting> + </para> + + <para> + As you see, the constraint definition comes after the data type, + just like default value definitions. Default values and + constraints can be listed in any order. A check constraint + consists of the key word <literal>CHECK</literal> followed by an + expression in parentheses. The check constraint expression should + involve the column thus constrained, otherwise the constraint + would not make too much sense. + </para> + + <indexterm> + <primary>constraint</primary> + <secondary>name</secondary> + </indexterm> + + <para> + You can also give the constraint a separate name. This clarifies + error messages and allows you to refer to the constraint when you + need to change it. The syntax is: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric <emphasis>CONSTRAINT positive_price</emphasis> CHECK (price > 0) +); +</programlisting> + So, to specify a named constraint, use the key word + <literal>CONSTRAINT</literal> followed by an identifier followed + by the constraint definition. (If you don't specify a constraint + name in this way, the system chooses a name for you.) + </para> + + <para> + A check constraint can also refer to several columns. Say you + store a regular price and a discounted price, and you want to + ensure that the discounted price is lower than the regular price: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric CHECK (price > 0), + discounted_price numeric CHECK (discounted_price > 0), + <emphasis>CHECK (price > discounted_price)</emphasis> +); +</programlisting> + </para> + + <para> + The first two constraints should look familiar. The third one + uses a new syntax. It is not attached to a particular column, + instead it appears as a separate item in the comma-separated + column list. Column definitions and these constraint + definitions can be listed in mixed order. + </para> + + <para> + We say that the first two constraints are column constraints, whereas the + third one is a table constraint because it is written separately + from any one column definition. Column constraints can also be + written as table constraints, while the reverse is not necessarily + possible, since a column constraint is supposed to refer to only the + column it is attached to. (<productname>PostgreSQL</productname> doesn't + enforce that rule, but you should follow it if you want your table + definitions to work with other database systems.) The above example could + also be written as: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric, + CHECK (price > 0), + discounted_price numeric, + CHECK (discounted_price > 0), + CHECK (price > discounted_price) +); +</programlisting> + or even: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric CHECK (price > 0), + discounted_price numeric, + CHECK (discounted_price > 0 AND price > discounted_price) +); +</programlisting> + It's a matter of taste. + </para> + + <para> + Names can be assigned to table constraints in the same way as + column constraints: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric, + CHECK (price > 0), + discounted_price numeric, + CHECK (discounted_price > 0), + <emphasis>CONSTRAINT valid_discount</emphasis> CHECK (price > discounted_price) +); +</programlisting> + </para> + + <indexterm> + <primary>null value</primary> + <secondary sortas="check constraints">with check constraints</secondary> + </indexterm> + + <para> + It should be noted that a check constraint is satisfied if the + check expression evaluates to true or the null value. Since most + expressions will evaluate to the null value if any operand is null, + they will not prevent null values in the constrained columns. To + ensure that a column does not contain null values, the not-null + constraint described in the next section can be used. + </para> + + <note> + <para> + <productname>PostgreSQL</productname> does not support + <literal>CHECK</literal> constraints that reference table data other than + the new or updated row being checked. While a <literal>CHECK</literal> + constraint that violates this rule may appear to work in simple + tests, it cannot guarantee that the database will not reach a state + in which the constraint condition is false (due to subsequent changes + of the other row(s) involved). This would cause a database dump and + restore to fail. The restore could fail even when the complete + database state is consistent with the constraint, due to rows not + being loaded in an order that will satisfy the constraint. If + possible, use <literal>UNIQUE</literal>, <literal>EXCLUDE</literal>, + or <literal>FOREIGN KEY</literal> constraints to express + cross-row and cross-table restrictions. + </para> + + <para> + If what you desire is a one-time check against other rows at row + insertion, rather than a continuously-maintained consistency + guarantee, a custom <link linkend="triggers">trigger</link> can be used + to implement that. (This approach avoids the dump/restore problem because + <application>pg_dump</application> does not reinstall triggers until after + restoring data, so that the check will not be enforced during a + dump/restore.) + </para> + </note> + + <note> + <para> + <productname>PostgreSQL</productname> assumes that + <literal>CHECK</literal> constraints' conditions are immutable, that + is, they will always give the same result for the same input row. + This assumption is what justifies examining <literal>CHECK</literal> + constraints only when rows are inserted or updated, and not at other + times. (The warning above about not referencing other table data is + really a special case of this restriction.) + </para> + + <para> + An example of a common way to break this assumption is to reference a + user-defined function in a <literal>CHECK</literal> expression, and + then change the behavior of that + function. <productname>PostgreSQL</productname> does not disallow + that, but it will not notice if there are rows in the table that now + violate the <literal>CHECK</literal> constraint. That would cause a + subsequent database dump and restore to fail. + The recommended way to handle such a change is to drop the constraint + (using <command>ALTER TABLE</command>), adjust the function definition, + and re-add the constraint, thereby rechecking it against all table rows. + </para> + </note> + </sect2> + + <sect2 id="ddl-constraints-not-null"> + <title>Not-Null Constraints</title> + + <indexterm> + <primary>not-null constraint</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>NOT NULL</secondary> + </indexterm> + + <para> + A not-null constraint simply specifies that a column must not + assume the null value. A syntax example: +<programlisting> +CREATE TABLE products ( + product_no integer <emphasis>NOT NULL</emphasis>, + name text <emphasis>NOT NULL</emphasis>, + price numeric +); +</programlisting> + </para> + + <para> + A not-null constraint is always written as a column constraint. A + not-null constraint is functionally equivalent to creating a check + constraint <literal>CHECK (<replaceable>column_name</replaceable> + IS NOT NULL)</literal>, but in + <productname>PostgreSQL</productname> creating an explicit + not-null constraint is more efficient. The drawback is that you + cannot give explicit names to not-null constraints created this + way. + </para> + + <para> + Of course, a column can have more than one constraint. Just write + the constraints one after another: +<programlisting> +CREATE TABLE products ( + product_no integer NOT NULL, + name text NOT NULL, + price numeric NOT NULL CHECK (price > 0) +); +</programlisting> + The order doesn't matter. It does not necessarily determine in which + order the constraints are checked. + </para> + + <para> + The <literal>NOT NULL</literal> constraint has an inverse: the + <literal>NULL</literal> constraint. This does not mean that the + column must be null, which would surely be useless. Instead, this + simply selects the default behavior that the column might be null. + The <literal>NULL</literal> constraint is not present in the SQL + standard and should not be used in portable applications. (It was + only added to <productname>PostgreSQL</productname> to be + compatible with some other database systems.) Some users, however, + like it because it makes it easy to toggle the constraint in a + script file. For example, you could start with: +<programlisting> +CREATE TABLE products ( + product_no integer NULL, + name text NULL, + price numeric NULL +); +</programlisting> + and then insert the <literal>NOT</literal> key word where desired. + </para> + + <tip> + <para> + In most database designs the majority of columns should be marked + not null. + </para> + </tip> + </sect2> + + <sect2 id="ddl-constraints-unique-constraints"> + <title>Unique Constraints</title> + + <indexterm> + <primary>unique constraint</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>unique</secondary> + </indexterm> + + <para> + Unique constraints ensure that the data contained in a column, or a + group of columns, is unique among all the rows in the + table. The syntax is: +<programlisting> +CREATE TABLE products ( + product_no integer <emphasis>UNIQUE</emphasis>, + name text, + price numeric +); +</programlisting> + when written as a column constraint, and: +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric, + <emphasis>UNIQUE (product_no)</emphasis> +); +</programlisting> + when written as a table constraint. + </para> + + <para> + To define a unique constraint for a group of columns, write it as a + table constraint with the column names separated by commas: +<programlisting> +CREATE TABLE example ( + a integer, + b integer, + c integer, + <emphasis>UNIQUE (a, c)</emphasis> +); +</programlisting> + This specifies that the combination of values in the indicated columns + is unique across the whole table, though any one of the columns + need not be (and ordinarily isn't) unique. + </para> + + <para> + You can assign your own name for a unique constraint, in the usual way: +<programlisting> +CREATE TABLE products ( + product_no integer <emphasis>CONSTRAINT must_be_different</emphasis> UNIQUE, + name text, + price numeric +); +</programlisting> + </para> + + <para> + Adding a unique constraint will automatically create a unique B-tree + index on the column or group of columns listed in the constraint. + A uniqueness restriction covering only some rows cannot be written as + a unique constraint, but it is possible to enforce such a restriction by + creating a unique <link linkend="indexes-partial">partial index</link>. + </para> + + <indexterm> + <primary>null value</primary> + <secondary sortas="unique constraints">with unique constraints</secondary> + </indexterm> + + <para> + In general, a unique constraint is violated if there is more than + one row in the table where the values of all of the + columns included in the constraint are equal. + By default, two null values are not considered equal in this + comparison. That means even in the presence of a + unique constraint it is possible to store duplicate + rows that contain a null value in at least one of the constrained + columns. This behavior can be changed by adding the clause <literal>NULLS + NOT DISTINCT</literal>, like +<programlisting> +CREATE TABLE products ( + product_no integer UNIQUE <emphasis>NULLS NOT DISTINCT</emphasis>, + name text, + price numeric +); +</programlisting> + or +<programlisting> +CREATE TABLE products ( + product_no integer, + name text, + price numeric, + UNIQUE <emphasis>NULLS NOT DISTINCT</emphasis> (product_no) +); +</programlisting> + The default behavior can be specified explicitly using <literal>NULLS + DISTINCT</literal>. The default null treatment in unique constraints is + implementation-defined according to the SQL standard, and other + implementations have a different behavior. So be careful when developing + applications that are intended to be portable. + </para> + </sect2> + + <sect2 id="ddl-constraints-primary-keys"> + <title>Primary Keys</title> + + <indexterm> + <primary>primary key</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>primary key</secondary> + </indexterm> + + <para> + A primary key constraint indicates that a column, or group of columns, + can be used as a unique identifier for rows in the table. This + requires that the values be both unique and not null. So, the following + two table definitions accept the same data: +<programlisting> +CREATE TABLE products ( + product_no integer UNIQUE NOT NULL, + name text, + price numeric +); +</programlisting> + +<programlisting> +CREATE TABLE products ( + product_no integer <emphasis>PRIMARY KEY</emphasis>, + name text, + price numeric +); +</programlisting> + </para> + + <para> + Primary keys can span more than one column; the syntax + is similar to unique constraints: +<programlisting> +CREATE TABLE example ( + a integer, + b integer, + c integer, + <emphasis>PRIMARY KEY (a, c)</emphasis> +); +</programlisting> + </para> + + <para> + Adding a primary key will automatically create a unique B-tree index + on the column or group of columns listed in the primary key, and will + force the column(s) to be marked <literal>NOT NULL</literal>. + </para> + + <para> + A table can have at most one primary key. (There can be any number + of unique and not-null constraints, which are functionally almost the + same thing, but only one can be identified as the primary key.) + Relational database theory + dictates that every table must have a primary key. This rule is + not enforced by <productname>PostgreSQL</productname>, but it is + usually best to follow it. + </para> + + <para> + Primary keys are useful both for + documentation purposes and for client applications. For example, + a GUI application that allows modifying row values probably needs + to know the primary key of a table to be able to identify rows + uniquely. There are also various ways in which the database system + makes use of a primary key if one has been declared; for example, + the primary key defines the default target column(s) for foreign keys + referencing its table. + </para> + </sect2> + + <sect2 id="ddl-constraints-fk"> + <title>Foreign Keys</title> + + <indexterm> + <primary>foreign key</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>foreign key</secondary> + </indexterm> + + <indexterm> + <primary>referential integrity</primary> + </indexterm> + + <para> + A foreign key constraint specifies that the values in a column (or + a group of columns) must match the values appearing in some row + of another table. + We say this maintains the <firstterm>referential + integrity</firstterm> between two related tables. + </para> + + <para> + Say you have the product table that we have used several times already: +<programlisting> +CREATE TABLE products ( + product_no integer PRIMARY KEY, + name text, + price numeric +); +</programlisting> + Let's also assume you have a table storing orders of those + products. We want to ensure that the orders table only contains + orders of products that actually exist. So we define a foreign + key constraint in the orders table that references the products + table: +<programlisting> +CREATE TABLE orders ( + order_id integer PRIMARY KEY, + product_no integer <emphasis>REFERENCES products (product_no)</emphasis>, + quantity integer +); +</programlisting> + Now it is impossible to create orders with non-NULL + <structfield>product_no</structfield> entries that do not appear in the + products table. + </para> + + <para> + We say that in this situation the orders table is the + <firstterm>referencing</firstterm> table and the products table is + the <firstterm>referenced</firstterm> table. Similarly, there are + referencing and referenced columns. + </para> + + <para> + You can also shorten the above command to: +<programlisting> +CREATE TABLE orders ( + order_id integer PRIMARY KEY, + product_no integer <emphasis>REFERENCES products</emphasis>, + quantity integer +); +</programlisting> + because in absence of a column list the primary key of the + referenced table is used as the referenced column(s). + </para> + + <para> + You can assign your own name for a foreign key constraint, + in the usual way. + </para> + + <para> + A foreign key can also constrain and reference a group of columns. + As usual, it then needs to be written in table constraint form. + Here is a contrived syntax example: +<programlisting> +CREATE TABLE t1 ( + a integer PRIMARY KEY, + b integer, + c integer, + <emphasis>FOREIGN KEY (b, c) REFERENCES other_table (c1, c2)</emphasis> +); +</programlisting> + Of course, the number and type of the constrained columns need to + match the number and type of the referenced columns. + </para> + + <indexterm> + <primary>foreign key</primary> + <secondary>self-referential</secondary> + </indexterm> + + <para> + Sometimes it is useful for the <quote>other table</quote> of a + foreign key constraint to be the same table; this is called + a <firstterm>self-referential</firstterm> foreign key. For + example, if you want rows of a table to represent nodes of a tree + structure, you could write +<programlisting> +CREATE TABLE tree ( + node_id integer PRIMARY KEY, + parent_id integer REFERENCES tree, + name text, + ... +); +</programlisting> + A top-level node would have NULL <structfield>parent_id</structfield>, + while non-NULL <structfield>parent_id</structfield> entries would be + constrained to reference valid rows of the table. + </para> + + <para> + A table can have more than one foreign key constraint. This is + used to implement many-to-many relationships between tables. Say + you have tables about products and orders, but now you want to + allow one order to contain possibly many products (which the + structure above did not allow). You could use this table structure: +<programlisting> +CREATE TABLE products ( + product_no integer PRIMARY KEY, + name text, + price numeric +); + +CREATE TABLE orders ( + order_id integer PRIMARY KEY, + shipping_address text, + ... +); + +CREATE TABLE order_items ( + product_no integer REFERENCES products, + order_id integer REFERENCES orders, + quantity integer, + PRIMARY KEY (product_no, order_id) +); +</programlisting> + Notice that the primary key overlaps with the foreign keys in + the last table. + </para> + + <indexterm> + <primary>CASCADE</primary> + <secondary>foreign key action</secondary> + </indexterm> + + <indexterm> + <primary>RESTRICT</primary> + <secondary>foreign key action</secondary> + </indexterm> + + <para> + We know that the foreign keys disallow creation of orders that + do not relate to any products. But what if a product is removed + after an order is created that references it? SQL allows you to + handle that as well. Intuitively, we have a few options: + <itemizedlist spacing="compact"> + <listitem><para>Disallow deleting a referenced product</para></listitem> + <listitem><para>Delete the orders as well</para></listitem> + <listitem><para>Something else?</para></listitem> + </itemizedlist> + </para> + + <para> + To illustrate this, let's implement the following policy on the + many-to-many relationship example above: when someone wants to + remove a product that is still referenced by an order (via + <literal>order_items</literal>), we disallow it. If someone + removes an order, the order items are removed as well: +<programlisting> +CREATE TABLE products ( + product_no integer PRIMARY KEY, + name text, + price numeric +); + +CREATE TABLE orders ( + order_id integer PRIMARY KEY, + shipping_address text, + ... +); + +CREATE TABLE order_items ( + product_no integer REFERENCES products <emphasis>ON DELETE RESTRICT</emphasis>, + order_id integer REFERENCES orders <emphasis>ON DELETE CASCADE</emphasis>, + quantity integer, + PRIMARY KEY (product_no, order_id) +); +</programlisting> + </para> + + <para> + Restricting and cascading deletes are the two most common options. + <literal>RESTRICT</literal> prevents deletion of a + referenced row. <literal>NO ACTION</literal> means that if any + referencing rows still exist when the constraint is checked, an error + is raised; this is the default behavior if you do not specify anything. + (The essential difference between these two choices is that + <literal>NO ACTION</literal> allows the check to be deferred until + later in the transaction, whereas <literal>RESTRICT</literal> does not.) + <literal>CASCADE</literal> specifies that when a referenced row is deleted, + row(s) referencing it should be automatically deleted as well. + There are two other options: + <literal>SET NULL</literal> and <literal>SET DEFAULT</literal>. + These cause the referencing column(s) in the referencing row(s) + to be set to nulls or their default + values, respectively, when the referenced row is deleted. + Note that these do not excuse you from observing any constraints. + For example, if an action specifies <literal>SET DEFAULT</literal> + but the default value would not satisfy the foreign key constraint, the + operation will fail. + </para> + + <para> + The appropriate choice of <literal>ON DELETE</literal> action depends on + what kinds of objects the related tables represent. When the referencing + table represents something that is a component of what is represented by + the referenced table and cannot exist independently, then + <literal>CASCADE</literal> could be appropriate. If the two tables + represent independent objects, then <literal>RESTRICT</literal> or + <literal>NO ACTION</literal> is more appropriate; an application that + actually wants to delete both objects would then have to be explicit about + this and run two delete commands. In the above example, order items are + part of an order, and it is convenient if they are deleted automatically + if an order is deleted. But products and orders are different things, and + so making a deletion of a product automatically cause the deletion of some + order items could be considered problematic. The actions <literal>SET + NULL</literal> or <literal>SET DEFAULT</literal> can be appropriate if a + foreign-key relationship represents optional information. For example, if + the products table contained a reference to a product manager, and the + product manager entry gets deleted, then setting the product's product + manager to null or a default might be useful. + </para> + + <para> + The actions <literal>SET NULL</literal> and <literal>SET DEFAULT</literal> + can take a column list to specify which columns to set. Normally, all + columns of the foreign-key constraint are set; setting only a subset is + useful in some special cases. Consider the following example: +<programlisting> +CREATE TABLE tenants ( + tenant_id integer PRIMARY KEY +); + +CREATE TABLE users ( + tenant_id integer REFERENCES tenants ON DELETE CASCADE, + user_id integer NOT NULL, + PRIMARY KEY (tenant_id, user_id) +); + +CREATE TABLE posts ( + tenant_id integer REFERENCES tenants ON DELETE CASCADE, + post_id integer NOT NULL, + author_id integer, + PRIMARY KEY (tenant_id, post_id), + FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL <emphasis>(author_id)</emphasis> +); +</programlisting> + Without the specification of the column, the foreign key would also set + the column <literal>tenant_id</literal> to null, but that column is still + required as part of the primary key. + </para> + + <para> + Analogous to <literal>ON DELETE</literal> there is also + <literal>ON UPDATE</literal> which is invoked when a referenced + column is changed (updated). The possible actions are the same, + except that column lists cannot be specified for <literal>SET + NULL</literal> and <literal>SET DEFAULT</literal>. + In this case, <literal>CASCADE</literal> means that the updated values of the + referenced column(s) should be copied into the referencing row(s). + </para> + + <para> + Normally, a referencing row need not satisfy the foreign key constraint + if any of its referencing columns are null. If <literal>MATCH FULL</literal> + is added to the foreign key declaration, a referencing row escapes + satisfying the constraint only if all its referencing columns are null + (so a mix of null and non-null values is guaranteed to fail a + <literal>MATCH FULL</literal> constraint). If you don't want referencing rows + to be able to avoid satisfying the foreign key constraint, declare the + referencing column(s) as <literal>NOT NULL</literal>. + </para> + + <para> + A foreign key must reference columns that either are a primary key or + form a unique constraint, or are columns from a non-partial unique index. + This means that the referenced columns always have an index to allow + efficient lookups on whether a referencing row has a match. Since a + <command>DELETE</command> of a row from the referenced table or an + <command>UPDATE</command> of a referenced column will require a scan of + the referencing table for rows matching the old value, it is often a good + idea to index the referencing columns too. Because this is not always + needed, and there are many choices available on how to index, the + declaration of a foreign key constraint does not automatically create an + index on the referencing columns. + </para> + + <para> + More information about updating and deleting data is in <xref + linkend="dml"/>. Also see the description of foreign key constraint + syntax in the reference documentation for + <xref linkend="sql-createtable"/>. + </para> + </sect2> + + <sect2 id="ddl-constraints-exclusion"> + <title>Exclusion Constraints</title> + + <indexterm> + <primary>exclusion constraint</primary> + </indexterm> + + <indexterm> + <primary>constraint</primary> + <secondary>exclusion</secondary> + </indexterm> + + <para> + Exclusion constraints ensure that if any two rows are compared on + the specified columns or expressions using the specified operators, + at least one of these operator comparisons will return false or null. + The syntax is: +<programlisting> +CREATE TABLE circles ( + c circle, + EXCLUDE USING gist (c WITH &&) +); +</programlisting> + </para> + + <para> + See also <link linkend="sql-createtable-exclude"><command>CREATE + TABLE ... CONSTRAINT ... EXCLUDE</command></link> for details. + </para> + + <para> + Adding an exclusion constraint will automatically create an index + of the type specified in the constraint declaration. + </para> + </sect2> + </sect1> + + <sect1 id="ddl-system-columns"> + <title>System Columns</title> + + <para> + Every table has several <firstterm>system columns</firstterm> that are + implicitly defined by the system. Therefore, these names cannot be + used as names of user-defined columns. (Note that these + restrictions are separate from whether the name is a key word or + not; quoting a name will not allow you to escape these + restrictions.) You do not really need to be concerned about these + columns; just know they exist. + </para> + + <indexterm> + <primary>column</primary> + <secondary>system column</secondary> + </indexterm> + + <variablelist> + <varlistentry id="ddl-system-columns-tableoid"> + <term><structfield>tableoid</structfield></term> + <listitem> + <indexterm> + <primary>tableoid</primary> + </indexterm> + + <para> + The OID of the table containing this row. This column is + particularly handy for queries that select from partitioned + tables (see <xref linkend="ddl-partitioning"/>) or inheritance + hierarchies (see <xref linkend="ddl-inherit"/>), since without it, + it's difficult to tell which individual table a row came from. The + <structfield>tableoid</structfield> can be joined against the + <structfield>oid</structfield> column of + <structname>pg_class</structname> to obtain the table name. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-system-columns-xmin"> + <term><structfield>xmin</structfield></term> + <listitem> + <indexterm> + <primary>xmin</primary> + </indexterm> + + <para> + The identity (transaction ID) of the inserting transaction for + this row version. (A row version is an individual state of a + row; each update of a row creates a new row version for the same + logical row.) + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-system-columns-cmin"> + <term><structfield>cmin</structfield></term> + <listitem> + <indexterm> + <primary>cmin</primary> + </indexterm> + + <para> + The command identifier (starting at zero) within the inserting + transaction. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-system-columns-xmax"> + <term><structfield>xmax</structfield></term> + <listitem> + <indexterm> + <primary>xmax</primary> + </indexterm> + + <para> + The identity (transaction ID) of the deleting transaction, or + zero for an undeleted row version. It is possible for this column to + be nonzero in a visible row version. That usually indicates that the + deleting transaction hasn't committed yet, or that an attempted + deletion was rolled back. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-system-columns-cmax"> + <term><structfield>cmax</structfield></term> + <listitem> + <indexterm> + <primary>cmax</primary> + </indexterm> + + <para> + The command identifier within the deleting transaction, or zero. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-system-columns-ctid"> + <term><structfield>ctid</structfield></term> + <listitem> + <indexterm> + <primary>ctid</primary> + </indexterm> + + <para> + The physical location of the row version within its table. Note that + although the <structfield>ctid</structfield> can be used to + locate the row version very quickly, a row's + <structfield>ctid</structfield> will change if it is + updated or moved by <command>VACUUM FULL</command>. Therefore + <structfield>ctid</structfield> is useless as a long-term row + identifier. A primary key should be used to identify logical rows. + </para> + </listitem> + </varlistentry> + </variablelist> + + <para> + Transaction identifiers are also 32-bit quantities. In a + long-lived database it is possible for transaction IDs to wrap + around. This is not a fatal problem given appropriate maintenance + procedures; see <xref linkend="maintenance"/> for details. It is + unwise, however, to depend on the uniqueness of transaction IDs + over the long term (more than one billion transactions). + </para> + + <para> + Command identifiers are also 32-bit quantities. This creates a hard limit + of 2<superscript>32</superscript> (4 billion) <acronym>SQL</acronym> commands + within a single transaction. In practice this limit is not a + problem — note that the limit is on the number of + <acronym>SQL</acronym> commands, not the number of rows processed. + Also, only commands that actually modify the database contents will + consume a command identifier. + </para> + </sect1> + + <sect1 id="ddl-alter"> + <title>Modifying Tables</title> + + <indexterm zone="ddl-alter"> + <primary>table</primary> + <secondary>modifying</secondary> + </indexterm> + + <para> + When you create a table and you realize that you made a mistake, or + the requirements of the application change, you can drop the + table and create it again. But this is not a convenient option if + the table is already filled with data, or if the table is + referenced by other database objects (for instance a foreign key + constraint). Therefore <productname>PostgreSQL</productname> + provides a family of commands to make modifications to existing + tables. Note that this is conceptually distinct from altering + the data contained in the table: here we are interested in altering + the definition, or structure, of the table. + </para> + + <para> + You can: + <itemizedlist spacing="compact"> + <listitem> + <para>Add columns</para> + </listitem> + <listitem> + <para>Remove columns</para> + </listitem> + <listitem> + <para>Add constraints</para> + </listitem> + <listitem> + <para>Remove constraints</para> + </listitem> + <listitem> + <para>Change default values</para> + </listitem> + <listitem> + <para>Change column data types</para> + </listitem> + <listitem> + <para>Rename columns</para> + </listitem> + <listitem> + <para>Rename tables</para> + </listitem> + </itemizedlist> + + All these actions are performed using the + <xref linkend="sql-altertable"/> + command, whose reference page contains details beyond those given + here. + </para> + + <sect2 id="ddl-alter-adding-a-column"> + <title>Adding a Column</title> + + <indexterm> + <primary>column</primary> + <secondary>adding</secondary> + </indexterm> + + <para> + To add a column, use a command like: +<programlisting> +ALTER TABLE products ADD COLUMN description text; +</programlisting> + The new column is initially filled with whatever default + value is given (null if you don't specify a <literal>DEFAULT</literal> clause). + </para> + + <tip> + <para> + From <productname>PostgreSQL</productname> 11, adding a column with + a constant default value no longer means that each row of the table + needs to be updated when the <command>ALTER TABLE</command> statement + is executed. Instead, the default value will be returned the next time + the row is accessed, and applied when the table is rewritten, making + the <command>ALTER TABLE</command> very fast even on large tables. + </para> + + <para> + However, if the default value is volatile (e.g., + <function>clock_timestamp()</function>) + each row will need to be updated with the value calculated at the time + <command>ALTER TABLE</command> is executed. To avoid a potentially + lengthy update operation, particularly if you intend to fill the column + with mostly nondefault values anyway, it may be preferable to add the + column with no default, insert the correct values using + <command>UPDATE</command>, and then add any desired default as described + below. + </para> + </tip> + + <para> + You can also define constraints on the column at the same time, + using the usual syntax: +<programlisting> +ALTER TABLE products ADD COLUMN description text CHECK (description <> ''); +</programlisting> + In fact all the options that can be applied to a column description + in <command>CREATE TABLE</command> can be used here. Keep in mind however + that the default value must satisfy the given constraints, or the + <literal>ADD</literal> will fail. Alternatively, you can add + constraints later (see below) after you've filled in the new column + correctly. + </para> + + </sect2> + + <sect2 id="ddl-alter-removing-a-column"> + <title>Removing a Column</title> + + <indexterm> + <primary>column</primary> + <secondary>removing</secondary> + </indexterm> + + <para> + To remove a column, use a command like: +<programlisting> +ALTER TABLE products DROP COLUMN description; +</programlisting> + Whatever data was in the column disappears. Table constraints involving + the column are dropped, too. However, if the column is referenced by a + foreign key constraint of another table, + <productname>PostgreSQL</productname> will not silently drop that + constraint. You can authorize dropping everything that depends on + the column by adding <literal>CASCADE</literal>: +<programlisting> +ALTER TABLE products DROP COLUMN description CASCADE; +</programlisting> + See <xref linkend="ddl-depend"/> for a description of the general + mechanism behind this. + </para> + </sect2> + + <sect2 id="ddl-alter-adding-a-constraint"> + <title>Adding a Constraint</title> + + <indexterm> + <primary>constraint</primary> + <secondary>adding</secondary> + </indexterm> + + <para> + To add a constraint, the table constraint syntax is used. For example: +<programlisting> +ALTER TABLE products ADD CHECK (name <> ''); +ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no); +ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups; +</programlisting> + To add a not-null constraint, which cannot be written as a table + constraint, use this syntax: +<programlisting> +ALTER TABLE products ALTER COLUMN product_no SET NOT NULL; +</programlisting> + </para> + + <para> + The constraint will be checked immediately, so the table data must + satisfy the constraint before it can be added. + </para> + </sect2> + + <sect2 id="ddl-alter-removing-a-constraint"> + <title>Removing a Constraint</title> + + <indexterm> + <primary>constraint</primary> + <secondary>removing</secondary> + </indexterm> + + <para> + To remove a constraint you need to know its name. If you gave it + a name then that's easy. Otherwise the system assigned a + generated name, which you need to find out. The + <application>psql</application> command <literal>\d + <replaceable>tablename</replaceable></literal> can be helpful + here; other interfaces might also provide a way to inspect table + details. Then the command is: +<programlisting> +ALTER TABLE products DROP CONSTRAINT some_name; +</programlisting> + (If you are dealing with a generated constraint name like <literal>$2</literal>, + don't forget that you'll need to double-quote it to make it a valid + identifier.) + </para> + + <para> + As with dropping a column, you need to add <literal>CASCADE</literal> if you + want to drop a constraint that something else depends on. An example + is that a foreign key constraint depends on a unique or primary key + constraint on the referenced column(s). + </para> + + <para> + This works the same for all constraint types except not-null + constraints. To drop a not null constraint use: +<programlisting> +ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL; +</programlisting> + (Recall that not-null constraints do not have names.) + </para> + </sect2> + + <sect2 id="ddl-alter-column-default"> + <title>Changing a Column's Default Value</title> + + <indexterm> + <primary>default value</primary> + <secondary>changing</secondary> + </indexterm> + + <para> + To set a new default for a column, use a command like: +<programlisting> +ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77; +</programlisting> + Note that this doesn't affect any existing rows in the table, it + just changes the default for future <command>INSERT</command> commands. + </para> + + <para> + To remove any default value, use: +<programlisting> +ALTER TABLE products ALTER COLUMN price DROP DEFAULT; +</programlisting> + This is effectively the same as setting the default to null. + As a consequence, it is not an error + to drop a default where one hadn't been defined, because the + default is implicitly the null value. + </para> + </sect2> + + <sect2 id="ddl-alter-column-type"> + <title>Changing a Column's Data Type</title> + + <indexterm> + <primary>column data type</primary> + <secondary>changing</secondary> + </indexterm> + + <para> + To convert a column to a different data type, use a command like: +<programlisting> +ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2); +</programlisting> + This will succeed only if each existing entry in the column can be + converted to the new type by an implicit cast. If a more complex + conversion is needed, you can add a <literal>USING</literal> clause that + specifies how to compute the new values from the old. + </para> + + <para> + <productname>PostgreSQL</productname> will attempt to convert the column's + default value (if any) to the new type, as well as any constraints + that involve the column. But these conversions might fail, or might + produce surprising results. It's often best to drop any constraints + on the column before altering its type, and then add back suitably + modified constraints afterwards. + </para> + </sect2> + + <sect2 id="ddl-alter-renaming-column"> + <title>Renaming a Column</title> + + <indexterm> + <primary>column</primary> + <secondary>renaming</secondary> + </indexterm> + + <para> + To rename a column: +<programlisting> +ALTER TABLE products RENAME COLUMN product_no TO product_number; +</programlisting> + </para> + </sect2> + + <sect2 id="ddl-alter-renaming-table"> + <title>Renaming a Table</title> + + <indexterm> + <primary>table</primary> + <secondary>renaming</secondary> + </indexterm> + + <para> + To rename a table: +<programlisting> +ALTER TABLE products RENAME TO items; +</programlisting> + </para> + </sect2> + </sect1> + + <sect1 id="ddl-priv"> + <title>Privileges</title> + + <indexterm zone="ddl-priv"> + <primary>privilege</primary> + </indexterm> + + <indexterm> + <primary>permission</primary> + <see>privilege</see> + </indexterm> + + <indexterm zone="ddl-priv"> + <primary>owner</primary> + </indexterm> + + <indexterm zone="ddl-priv"> + <primary>GRANT</primary> + </indexterm> + + <indexterm zone="ddl-priv"> + <primary>REVOKE</primary> + </indexterm> + + <indexterm zone="ddl-priv"> + <primary>ACL</primary> + </indexterm> + + <para> + When an object is created, it is assigned an owner. The + owner is normally the role that executed the creation statement. + For most kinds of objects, the initial state is that only the owner + (or a superuser) can do anything with the object. To allow + other roles to use it, <firstterm>privileges</firstterm> must be + granted. + </para> + + <para> + There are different kinds of privileges: <literal>SELECT</literal>, + <literal>INSERT</literal>, <literal>UPDATE</literal>, <literal>DELETE</literal>, + <literal>TRUNCATE</literal>, <literal>REFERENCES</literal>, <literal>TRIGGER</literal>, + <literal>CREATE</literal>, <literal>CONNECT</literal>, <literal>TEMPORARY</literal>, + <literal>EXECUTE</literal>, <literal>USAGE</literal>, <literal>SET</literal> + and <literal>ALTER SYSTEM</literal>. + The privileges applicable to a particular + object vary depending on the object's type (table, function, etc.). + More detail about the meanings of these privileges appears below. + The following sections and chapters will also show you how + these privileges are used. + </para> + + <para> + The right to modify or destroy an object is inherent in being the + object's owner, and cannot be granted or revoked in itself. + (However, like all privileges, that right can be inherited by + members of the owning role; see <xref linkend="role-membership"/>.) + </para> + + <para> + An object can be assigned to a new owner with an <command>ALTER</command> + command of the appropriate kind for the object, for example +<programlisting> +ALTER TABLE <replaceable>table_name</replaceable> OWNER TO <replaceable>new_owner</replaceable>; +</programlisting> + Superusers can always do this; ordinary roles can only do it if they are + both the current owner of the object (or inherit the privileges of the + owning role) and able to <literal>SET ROLE</literal> to the new owning role. + </para> + + <para> + To assign privileges, the <xref linkend="sql-grant"/> command is + used. For example, if <literal>joe</literal> is an existing role, and + <literal>accounts</literal> is an existing table, the privilege to + update the table can be granted with: +<programlisting> +GRANT UPDATE ON accounts TO joe; +</programlisting> + Writing <literal>ALL</literal> in place of a specific privilege grants all + privileges that are relevant for the object type. + </para> + + <para> + The special <quote>role</quote> name <literal>PUBLIC</literal> can + be used to grant a privilege to every role on the system. Also, + <quote>group</quote> roles can be set up to help manage privileges when + there are many users of a database — for details see + <xref linkend="user-manag"/>. + </para> + + <para> + To revoke a previously-granted privilege, use the fittingly named + <xref linkend="sql-revoke"/> command: +<programlisting> +REVOKE ALL ON accounts FROM PUBLIC; +</programlisting> + </para> + + <para> + Ordinarily, only the object's owner (or a superuser) can grant or + revoke privileges on an object. However, it is possible to grant a + privilege <quote>with grant option</quote>, which gives the recipient + the right to grant it in turn to others. If the grant option is + subsequently revoked then all who received the privilege from that + recipient (directly or through a chain of grants) will lose the + privilege. For details see the <xref linkend="sql-grant"/> and + <xref linkend="sql-revoke"/> reference pages. + </para> + + <para> + An object's owner can choose to revoke their own ordinary privileges, + for example to make a table read-only for themselves as well as others. + But owners are always treated as holding all grant options, so they + can always re-grant their own privileges. + </para> + + <para> + The available privileges are: + + <variablelist> + <varlistentry id="ddl-priv-select"> + <term><literal>SELECT</literal></term> + <listitem> + <para> + Allows <command>SELECT</command> from + any column, or specific column(s), of a table, view, materialized + view, or other table-like object. + Also allows use of <command>COPY TO</command>. + This privilege is also needed to reference existing column values in + <command>UPDATE</command>, <command>DELETE</command>, + or <command>MERGE</command>. + For sequences, this privilege also allows use of the + <function>currval</function> function. + For large objects, this privilege allows the object to be read. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-insert"> + <term><literal>INSERT</literal></term> + <listitem> + <para> + Allows <command>INSERT</command> of a new row into a table, view, + etc. Can be granted on specific column(s), in which case + only those columns may be assigned to in the <command>INSERT</command> + command (other columns will therefore receive default values). + Also allows use of <command>COPY FROM</command>. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-update"> + <term><literal>UPDATE</literal></term> + <listitem> + <para> + Allows <command>UPDATE</command> of any + column, or specific column(s), of a table, view, etc. + (In practice, any nontrivial <command>UPDATE</command> command will + require <literal>SELECT</literal> privilege as well, since it must + reference table columns to determine which rows to update, and/or to + compute new values for columns.) + <literal>SELECT ... FOR UPDATE</literal> + and <literal>SELECT ... FOR SHARE</literal> + also require this privilege on at least one column, in addition to the + <literal>SELECT</literal> privilege. For sequences, this + privilege allows use of the <function>nextval</function> and + <function>setval</function> functions. + For large objects, this privilege allows writing or truncating the + object. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-delete"> + <term><literal>DELETE</literal></term> + <listitem> + <para> + Allows <command>DELETE</command> of a row from a table, view, etc. + (In practice, any nontrivial <command>DELETE</command> command will + require <literal>SELECT</literal> privilege as well, since it must + reference table columns to determine which rows to delete.) + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-truncate"> + <term><literal>TRUNCATE</literal></term> + <listitem> + <para> + Allows <command>TRUNCATE</command> on a table. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-references"> + <term><literal>REFERENCES</literal></term> + <listitem> + <para> + Allows creation of a foreign key constraint referencing a + table, or specific column(s) of a table. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-trigger"> + <term><literal>TRIGGER</literal></term> + <listitem> + <para> + Allows creation of a trigger on a table, view, etc. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-create"> + <term><literal>CREATE</literal></term> + <listitem> + <para> + For databases, allows new schemas and publications to be created within + the database, and allows trusted extensions to be installed within + the database. + </para> + <para> + For schemas, allows new objects to be created within the schema. + To rename an existing object, you must own the + object <emphasis>and</emphasis> have this privilege for the containing + schema. + </para> + <para> + For tablespaces, allows tables, indexes, and temporary files to be + created within the tablespace, and allows databases to be created that + have the tablespace as their default tablespace. + </para> + <para> + Note that revoking this privilege will not alter the existence or + location of existing objects. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-connect"> + <term><literal>CONNECT</literal></term> + <listitem> + <para> + Allows the grantee to connect to the database. This + privilege is checked at connection startup (in addition to checking + any restrictions imposed by <filename>pg_hba.conf</filename>). + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-temporary"> + <term><literal>TEMPORARY</literal></term> + <listitem> + <para> + Allows temporary tables to be created while using the database. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-execute"> + <term><literal>EXECUTE</literal></term> + <listitem> + <para> + Allows calling a function or procedure, including use of + any operators that are implemented on top of the function. This is the + only type of privilege that is applicable to functions and procedures. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-usage"> + <term><literal>USAGE</literal></term> + <listitem> + <para> + For procedural languages, allows use of the language for + the creation of functions in that language. This is the only type + of privilege that is applicable to procedural languages. + </para> + <para> + For schemas, allows access to objects contained in the + schema (assuming that the objects' own privilege requirements are + also met). Essentially this allows the grantee to <quote>look up</quote> + objects within the schema. Without this permission, it is still + possible to see the object names, e.g., by querying system catalogs. + Also, after revoking this permission, existing sessions might have + statements that have previously performed this lookup, so this is not + a completely secure way to prevent object access. + </para> + <para> + For sequences, allows use of the + <function>currval</function> and <function>nextval</function> functions. + </para> + <para> + For types and domains, allows use of the type or domain in the + creation of tables, functions, and other schema objects. (Note that + this privilege does not control all <quote>usage</quote> of the + type, such as values of the type appearing in queries. It only + prevents objects from being created that depend on the type. The + main purpose of this privilege is controlling which users can create + dependencies on a type, which could prevent the owner from changing + the type later.) + </para> + <para> + For foreign-data wrappers, allows creation of new servers using the + foreign-data wrapper. + </para> + <para> + For foreign servers, allows creation of foreign tables using the + server. Grantees may also create, alter, or drop their own user + mappings associated with that server. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-set"> + <term><literal>SET</literal></term> + <listitem> + <para> + Allows a server configuration parameter to be set to a new value + within the current session. (While this privilege can be granted + on any parameter, it is meaningless except for parameters that would + normally require superuser privilege to set.) + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-priv-alter-system"> + <term><literal>ALTER SYSTEM</literal></term> + <listitem> + <para> + Allows a server configuration parameter to be configured to a new + value using the <xref linkend="sql-altersystem"/> command. + </para> + </listitem> + </varlistentry> + </variablelist> + + The privileges required by other commands are listed on the + reference page of the respective command. + </para> + + <para> + PostgreSQL grants privileges on some types of objects to + <literal>PUBLIC</literal> by default when the objects are created. + No privileges are granted to <literal>PUBLIC</literal> by default on + tables, + table columns, + sequences, + foreign data wrappers, + foreign servers, + large objects, + schemas, + tablespaces, + or configuration parameters. + For other types of objects, the default privileges + granted to <literal>PUBLIC</literal> are as follows: + <literal>CONNECT</literal> and <literal>TEMPORARY</literal> (create + temporary tables) privileges for databases; + <literal>EXECUTE</literal> privilege for functions and procedures; and + <literal>USAGE</literal> privilege for languages and data types + (including domains). + The object owner can, of course, <command>REVOKE</command> + both default and expressly granted privileges. (For maximum + security, issue the <command>REVOKE</command> in the same transaction that + creates the object; then there is no window in which another user + can use the object.) + Also, these default privilege settings can be overridden using the + <xref linkend="sql-alterdefaultprivileges"/> command. + </para> + + <para> + <xref linkend="privilege-abbrevs-table"/> shows the one-letter + abbreviations that are used for these privilege types in + <firstterm>ACL</firstterm> (Access Control List) values. + You will see these letters in the output of the <xref linkend="app-psql"/> + commands listed below, or when looking at ACL columns of system catalogs. + </para> + + <table id="privilege-abbrevs-table"> + <title>ACL Privilege Abbreviations</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="1*"/> + <colspec colname="col3" colwidth="2*"/> + <thead> + <row> + <entry>Privilege</entry> + <entry>Abbreviation</entry> + <entry>Applicable Object Types</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>SELECT</literal></entry> + <entry><literal>r</literal> (<quote>read</quote>)</entry> + <entry> + <literal>LARGE OBJECT</literal>, + <literal>SEQUENCE</literal>, + <literal>TABLE</literal> (and table-like objects), + table column + </entry> + </row> + <row> + <entry><literal>INSERT</literal></entry> + <entry><literal>a</literal> (<quote>append</quote>)</entry> + <entry><literal>TABLE</literal>, table column</entry> + </row> + <row> + <entry><literal>UPDATE</literal></entry> + <entry><literal>w</literal> (<quote>write</quote>)</entry> + <entry> + <literal>LARGE OBJECT</literal>, + <literal>SEQUENCE</literal>, + <literal>TABLE</literal>, + table column + </entry> + </row> + <row> + <entry><literal>DELETE</literal></entry> + <entry><literal>d</literal></entry> + <entry><literal>TABLE</literal></entry> + </row> + <row> + <entry><literal>TRUNCATE</literal></entry> + <entry><literal>D</literal></entry> + <entry><literal>TABLE</literal></entry> + </row> + <row> + <entry><literal>REFERENCES</literal></entry> + <entry><literal>x</literal></entry> + <entry><literal>TABLE</literal>, table column</entry> + </row> + <row> + <entry><literal>TRIGGER</literal></entry> + <entry><literal>t</literal></entry> + <entry><literal>TABLE</literal></entry> + </row> + <row> + <entry><literal>CREATE</literal></entry> + <entry><literal>C</literal></entry> + <entry> + <literal>DATABASE</literal>, + <literal>SCHEMA</literal>, + <literal>TABLESPACE</literal> + </entry> + </row> + <row> + <entry><literal>CONNECT</literal></entry> + <entry><literal>c</literal></entry> + <entry><literal>DATABASE</literal></entry> + </row> + <row> + <entry><literal>TEMPORARY</literal></entry> + <entry><literal>T</literal></entry> + <entry><literal>DATABASE</literal></entry> + </row> + <row> + <entry><literal>EXECUTE</literal></entry> + <entry><literal>X</literal></entry> + <entry><literal>FUNCTION</literal>, <literal>PROCEDURE</literal></entry> + </row> + <row> + <entry><literal>USAGE</literal></entry> + <entry><literal>U</literal></entry> + <entry> + <literal>DOMAIN</literal>, + <literal>FOREIGN DATA WRAPPER</literal>, + <literal>FOREIGN SERVER</literal>, + <literal>LANGUAGE</literal>, + <literal>SCHEMA</literal>, + <literal>SEQUENCE</literal>, + <literal>TYPE</literal> + </entry> + </row> + <row> + <entry><literal>SET</literal></entry> + <entry><literal>s</literal></entry> + <entry><literal>PARAMETER</literal></entry> + </row> + <row> + <entry><literal>ALTER SYSTEM</literal></entry> + <entry><literal>A</literal></entry> + <entry><literal>PARAMETER</literal></entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + <xref linkend="privileges-summary-table"/> summarizes the privileges + available for each type of SQL object, using the abbreviations shown + above. + It also shows the <application>psql</application> command + that can be used to examine privilege settings for each object type. + </para> + + <table id="privileges-summary-table"> + <title>Summary of Access Privileges</title> + <tgroup cols="4"> + <colspec colname="col1" colwidth="2*"/> + <colspec colname="col2" colwidth="1*"/> + <colspec colname="col3" colwidth="1*"/> + <colspec colname="col4" colwidth="1*"/> + <thead> + <row> + <entry>Object Type</entry> + <entry>All Privileges</entry> + <entry>Default <literal>PUBLIC</literal> Privileges</entry> + <entry><application>psql</application> Command</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>DATABASE</literal></entry> + <entry><literal>CTc</literal></entry> + <entry><literal>Tc</literal></entry> + <entry><literal>\l</literal></entry> + </row> + <row> + <entry><literal>DOMAIN</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>\dD+</literal></entry> + </row> + <row> + <entry><literal>FUNCTION</literal> or <literal>PROCEDURE</literal></entry> + <entry><literal>X</literal></entry> + <entry><literal>X</literal></entry> + <entry><literal>\df+</literal></entry> + </row> + <row> + <entry><literal>FOREIGN DATA WRAPPER</literal></entry> + <entry><literal>U</literal></entry> + <entry>none</entry> + <entry><literal>\dew+</literal></entry> + </row> + <row> + <entry><literal>FOREIGN SERVER</literal></entry> + <entry><literal>U</literal></entry> + <entry>none</entry> + <entry><literal>\des+</literal></entry> + </row> + <row> + <entry><literal>LANGUAGE</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>\dL+</literal></entry> + </row> + <row> + <entry><literal>LARGE OBJECT</literal></entry> + <entry><literal>rw</literal></entry> + <entry>none</entry> + <entry><literal>\dl+</literal></entry> + </row> + <row> + <entry><literal>PARAMETER</literal></entry> + <entry><literal>sA</literal></entry> + <entry>none</entry> + <entry><literal>\dconfig+</literal></entry> + </row> + <row> + <entry><literal>SCHEMA</literal></entry> + <entry><literal>UC</literal></entry> + <entry>none</entry> + <entry><literal>\dn+</literal></entry> + </row> + <row> + <entry><literal>SEQUENCE</literal></entry> + <entry><literal>rwU</literal></entry> + <entry>none</entry> + <entry><literal>\dp</literal></entry> + </row> + <row> + <entry><literal>TABLE</literal> (and table-like objects)</entry> + <entry><literal>arwdDxt</literal></entry> + <entry>none</entry> + <entry><literal>\dp</literal></entry> + </row> + <row> + <entry>Table column</entry> + <entry><literal>arwx</literal></entry> + <entry>none</entry> + <entry><literal>\dp</literal></entry> + </row> + <row> + <entry><literal>TABLESPACE</literal></entry> + <entry><literal>C</literal></entry> + <entry>none</entry> + <entry><literal>\db+</literal></entry> + </row> + <row> + <entry><literal>TYPE</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>U</literal></entry> + <entry><literal>\dT+</literal></entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + <indexterm> + <primary><type>aclitem</type></primary> + </indexterm> + The privileges that have been granted for a particular object are + displayed as a list of <type>aclitem</type> entries, each having the + format: +<synopsis> +<replaceable>grantee</replaceable><literal>=</literal><replaceable>privilege-abbreviation</replaceable><optional><literal>*</literal></optional>...<literal>/</literal><replaceable>grantor</replaceable> +</synopsis> + Each <type>aclitem</type> lists all the permissions of one grantee that + have been granted by a particular grantor. Specific privileges are + represented by one-letter abbreviations from + <xref linkend="privilege-abbrevs-table"/>, with <literal>*</literal> + appended if the privilege was granted with grant option. For example, + <literal>calvin=r*w/hobbes</literal> specifies that the role + <literal>calvin</literal> has the privilege + <literal>SELECT</literal> (<literal>r</literal>) with grant option + (<literal>*</literal>) as well as the non-grantable + privilege <literal>UPDATE</literal> (<literal>w</literal>), both granted + by the role <literal>hobbes</literal>. If <literal>calvin</literal> + also has some privileges on the same object granted by a different + grantor, those would appear as a separate <type>aclitem</type> entry. + An empty grantee field in an <type>aclitem</type> stands + for <literal>PUBLIC</literal>. + </para> + + <para> + As an example, suppose that user <literal>miriam</literal> creates + table <literal>mytable</literal> and does: +<programlisting> +GRANT SELECT ON mytable TO PUBLIC; +GRANT SELECT, UPDATE, INSERT ON mytable TO admin; +GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw; +</programlisting> + Then <application>psql</application>'s <literal>\dp</literal> command + would show: +<programlisting> +=> \dp mytable + Access privileges + Schema | Name | Type | Access privileges | Column privileges | Policies +--------+---------+-------+-----------------------+-----------------------+---------- + public | mytable | table | miriam=arwdDxt/miriam+| col1: +| + | | | =r/miriam +| miriam_rw=rw/miriam | + | | | admin=arw/miriam | | +(1 row) +</programlisting> + </para> + + <para> + If the <quote>Access privileges</quote> column is empty for a given + object, it means the object has default privileges (that is, its + privileges entry in the relevant system catalog is null). Default + privileges always include all privileges for the owner, and can include + some privileges for <literal>PUBLIC</literal> depending on the object + type, as explained above. The first <command>GRANT</command> + or <command>REVOKE</command> on an object will instantiate the default + privileges (producing, for + example, <literal>miriam=arwdDxt/miriam</literal>) and then modify them + per the specified request. Similarly, entries are shown in <quote>Column + privileges</quote> only for columns with nondefault privileges. + (Note: for this purpose, <quote>default privileges</quote> always means + the built-in default privileges for the object's type. An object whose + privileges have been affected by an <command>ALTER DEFAULT + PRIVILEGES</command> command will always be shown with an explicit + privilege entry that includes the effects of + the <command>ALTER</command>.) + </para> + + <para> + Notice that the owner's implicit grant options are not marked in the + access privileges display. A <literal>*</literal> will appear only when + grant options have been explicitly granted to someone. + </para> + </sect1> + + <sect1 id="ddl-rowsecurity"> + <title>Row Security Policies</title> + + <indexterm zone="ddl-rowsecurity"> + <primary>row-level security</primary> + </indexterm> + + <indexterm zone="ddl-rowsecurity"> + <primary>policy</primary> + </indexterm> + + <para> + In addition to the SQL-standard <link linkend="ddl-priv">privilege + system</link> available through <xref linkend="sql-grant"/>, + tables can have <firstterm>row security policies</firstterm> that restrict, + on a per-user basis, which rows can be returned by normal queries + or inserted, updated, or deleted by data modification commands. + This feature is also known as <firstterm>Row-Level Security</firstterm>. + By default, tables do not have any policies, so that if a user has + access privileges to a table according to the SQL privilege system, + all rows within it are equally available for querying or updating. + </para> + + <para> + When row security is enabled on a table (with + <link linkend="sql-altertable">ALTER TABLE ... ENABLE ROW LEVEL + SECURITY</link>), all normal access to the table for selecting rows or + modifying rows must be allowed by a row security policy. (However, the + table's owner is typically not subject to row security policies.) If no + policy exists for the table, a default-deny policy is used, meaning that + no rows are visible or can be modified. Operations that apply to the + whole table, such as <command>TRUNCATE</command> and <literal>REFERENCES</literal>, + are not subject to row security. + </para> + + <para> + Row security policies can be specific to commands, or to roles, or to + both. A policy can be specified to apply to <literal>ALL</literal> + commands, or to <literal>SELECT</literal>, <literal>INSERT</literal>, <literal>UPDATE</literal>, + or <literal>DELETE</literal>. Multiple roles can be assigned to a given + policy, and normal role membership and inheritance rules apply. + </para> + + <para> + To specify which rows are visible or modifiable according to a policy, + an expression is required that returns a Boolean result. This + expression will be evaluated for each row prior to any conditions or + functions coming from the user's query. (The only exceptions to this + rule are <literal>leakproof</literal> functions, which are guaranteed to + not leak information; the optimizer may choose to apply such functions + ahead of the row-security check.) Rows for which the expression does + not return <literal>true</literal> will not be processed. Separate expressions + may be specified to provide independent control over the rows which are + visible and the rows which are allowed to be modified. Policy + expressions are run as part of the query and with the privileges of the + user running the query, although security-definer functions can be used + to access data not available to the calling user. + </para> + + <para> + Superusers and roles with the <literal>BYPASSRLS</literal> attribute always + bypass the row security system when accessing a table. Table owners + normally bypass row security as well, though a table owner can choose to + be subject to row security with <link linkend="sql-altertable">ALTER + TABLE ... FORCE ROW LEVEL SECURITY</link>. + </para> + + <para> + Enabling and disabling row security, as well as adding policies to a + table, is always the privilege of the table owner only. + </para> + + <para> + Policies are created using the <xref linkend="sql-createpolicy"/> + command, altered using the <xref linkend="sql-alterpolicy"/> command, + and dropped using the <xref linkend="sql-droppolicy"/> command. To + enable and disable row security for a given table, use the + <xref linkend="sql-altertable"/> command. + </para> + + <para> + Each policy has a name and multiple policies can be defined for a + table. As policies are table-specific, each policy for a table must + have a unique name. Different tables may have policies with the + same name. + </para> + + <para> + When multiple policies apply to a given query, they are combined using + either <literal>OR</literal> (for permissive policies, which are the + default) or using <literal>AND</literal> (for restrictive policies). + This is similar to the rule that a given role has the privileges + of all roles that they are a member of. Permissive vs. restrictive + policies are discussed further below. + </para> + + <para> + As a simple example, here is how to create a policy on + the <literal>account</literal> relation to allow only members of + the <literal>managers</literal> role to access rows, and only rows of their + accounts: + </para> + +<programlisting> +CREATE TABLE accounts (manager text, company text, contact_email text); + +ALTER TABLE accounts ENABLE ROW LEVEL SECURITY; + +CREATE POLICY account_managers ON accounts TO managers + USING (manager = current_user); +</programlisting> + + <para> + The policy above implicitly provides a <literal>WITH CHECK</literal> + clause identical to its <literal>USING</literal> clause, so that the + constraint applies both to rows selected by a command (so a manager + cannot <command>SELECT</command>, <command>UPDATE</command>, + or <command>DELETE</command> existing rows belonging to a different + manager) and to rows modified by a command (so rows belonging to a + different manager cannot be created via <command>INSERT</command> + or <command>UPDATE</command>). + </para> + + <para> + If no role is specified, or the special user name + <literal>PUBLIC</literal> is used, then the policy applies to all + users on the system. To allow all users to access only their own row in + a <literal>users</literal> table, a simple policy can be used: + </para> + +<programlisting> +CREATE POLICY user_policy ON users + USING (user_name = current_user); +</programlisting> + + <para> + This works similarly to the previous example. + </para> + + <para> + To use a different policy for rows that are being added to the table + compared to those rows that are visible, multiple policies can be + combined. This pair of policies would allow all users to view all rows + in the <literal>users</literal> table, but only modify their own: + </para> + +<programlisting> +CREATE POLICY user_sel_policy ON users + FOR SELECT + USING (true); +CREATE POLICY user_mod_policy ON users + USING (user_name = current_user); +</programlisting> + + <para> + In a <command>SELECT</command> command, these two policies are combined + using <literal>OR</literal>, with the net effect being that all rows + can be selected. In other command types, only the second policy applies, + so that the effects are the same as before. + </para> + + <para> + Row security can also be disabled with the <command>ALTER TABLE</command> + command. Disabling row security does not remove any policies that are + defined on the table; they are simply ignored. Then all rows in the + table are visible and modifiable, subject to the standard SQL privileges + system. + </para> + + <para> + Below is a larger example of how this feature can be used in production + environments. The table <literal>passwd</literal> emulates a Unix password + file: + </para> + +<programlisting> +-- Simple passwd-file based example +CREATE TABLE passwd ( + user_name text UNIQUE NOT NULL, + pwhash text, + uid int PRIMARY KEY, + gid int NOT NULL, + real_name text NOT NULL, + home_phone text, + extra_info text, + home_dir text NOT NULL, + shell text NOT NULL +); + +CREATE ROLE admin; -- Administrator +CREATE ROLE bob; -- Normal user +CREATE ROLE alice; -- Normal user + +-- Populate the table +INSERT INTO passwd VALUES + ('admin','xxx',0,0,'Admin','111-222-3333',null,'/root','/bin/dash'); +INSERT INTO passwd VALUES + ('bob','xxx',1,1,'Bob','123-456-7890',null,'/home/bob','/bin/zsh'); +INSERT INTO passwd VALUES + ('alice','xxx',2,1,'Alice','098-765-4321',null,'/home/alice','/bin/zsh'); + +-- Be sure to enable row-level security on the table +ALTER TABLE passwd ENABLE ROW LEVEL SECURITY; + +-- Create policies +-- Administrator can see all rows and add any rows +CREATE POLICY admin_all ON passwd TO admin USING (true) WITH CHECK (true); +-- Normal users can view all rows +CREATE POLICY all_view ON passwd FOR SELECT USING (true); +-- Normal users can update their own records, but +-- limit which shells a normal user is allowed to set +CREATE POLICY user_mod ON passwd FOR UPDATE + USING (current_user = user_name) + WITH CHECK ( + current_user = user_name AND + shell IN ('/bin/bash','/bin/sh','/bin/dash','/bin/zsh','/bin/tcsh') + ); + +-- Allow admin all normal rights +GRANT SELECT, INSERT, UPDATE, DELETE ON passwd TO admin; +-- Users only get select access on public columns +GRANT SELECT + (user_name, uid, gid, real_name, home_phone, extra_info, home_dir, shell) + ON passwd TO public; +-- Allow users to update certain columns +GRANT UPDATE + (pwhash, real_name, home_phone, extra_info, shell) + ON passwd TO public; +</programlisting> + + <para> + As with any security settings, it's important to test and ensure that + the system is behaving as expected. Using the example above, this + demonstrates that the permission system is working properly. + </para> + +<programlisting> +-- admin can view all rows and fields +postgres=> set role admin; +SET +postgres=> table passwd; + user_name | pwhash | uid | gid | real_name | home_phone | extra_info | home_dir | shell +-----------+--------+-----+-----+-----------+--------------+------------+-------------+----------- + admin | xxx | 0 | 0 | Admin | 111-222-3333 | | /root | /bin/dash + bob | xxx | 1 | 1 | Bob | 123-456-7890 | | /home/bob | /bin/zsh + alice | xxx | 2 | 1 | Alice | 098-765-4321 | | /home/alice | /bin/zsh +(3 rows) + +-- Test what Alice is able to do +postgres=> set role alice; +SET +postgres=> table passwd; +ERROR: permission denied for table passwd +postgres=> select user_name,real_name,home_phone,extra_info,home_dir,shell from passwd; + user_name | real_name | home_phone | extra_info | home_dir | shell +-----------+-----------+--------------+------------+-------------+----------- + admin | Admin | 111-222-3333 | | /root | /bin/dash + bob | Bob | 123-456-7890 | | /home/bob | /bin/zsh + alice | Alice | 098-765-4321 | | /home/alice | /bin/zsh +(3 rows) + +postgres=> update passwd set user_name = 'joe'; +ERROR: permission denied for table passwd +-- Alice is allowed to change her own real_name, but no others +postgres=> update passwd set real_name = 'Alice Doe'; +UPDATE 1 +postgres=> update passwd set real_name = 'John Doe' where user_name = 'admin'; +UPDATE 0 +postgres=> update passwd set shell = '/bin/xx'; +ERROR: new row violates WITH CHECK OPTION for "passwd" +postgres=> delete from passwd; +ERROR: permission denied for table passwd +postgres=> insert into passwd (user_name) values ('xxx'); +ERROR: permission denied for table passwd +-- Alice can change her own password; RLS silently prevents updating other rows +postgres=> update passwd set pwhash = 'abc'; +UPDATE 1 +</programlisting> + + <para> + All of the policies constructed thus far have been permissive policies, + meaning that when multiple policies are applied they are combined using + the <quote>OR</quote> Boolean operator. While permissive policies can be constructed + to only allow access to rows in the intended cases, it can be simpler to + combine permissive policies with restrictive policies (which the records + must pass and which are combined using the <quote>AND</quote> Boolean operator). + Building on the example above, we add a restrictive policy to require + the administrator to be connected over a local Unix socket to access the + records of the <literal>passwd</literal> table: + </para> + +<programlisting> +CREATE POLICY admin_local_only ON passwd AS RESTRICTIVE TO admin + USING (pg_catalog.inet_client_addr() IS NULL); +</programlisting> + + <para> + We can then see that an administrator connecting over a network will not + see any records, due to the restrictive policy: + </para> + +<programlisting> +=> SELECT current_user; + current_user +-------------- + admin +(1 row) + +=> select inet_client_addr(); + inet_client_addr +------------------ + 127.0.0.1 +(1 row) + +=> TABLE passwd; + user_name | pwhash | uid | gid | real_name | home_phone | extra_info | home_dir | shell +-----------+--------+-----+-----+-----------+------------+------------+----------+------- +(0 rows) + +=> UPDATE passwd set pwhash = NULL; +UPDATE 0 +</programlisting> + + <para> + Referential integrity checks, such as unique or primary key constraints + and foreign key references, always bypass row security to ensure that + data integrity is maintained. Care must be taken when developing + schemas and row level policies to avoid <quote>covert channel</quote> leaks of + information through such referential integrity checks. + </para> + + <para> + In some contexts it is important to be sure that row security is + not being applied. For example, when taking a backup, it could be + disastrous if row security silently caused some rows to be omitted + from the backup. In such a situation, you can set the + <xref linkend="guc-row-security"/> configuration parameter + to <literal>off</literal>. This does not in itself bypass row security; + what it does is throw an error if any query's results would get filtered + by a policy. The reason for the error can then be investigated and + fixed. + </para> + + <para> + In the examples above, the policy expressions consider only the current + values in the row to be accessed or updated. This is the simplest and + best-performing case; when possible, it's best to design row security + applications to work this way. If it is necessary to consult other rows + or other tables to make a policy decision, that can be accomplished using + sub-<command>SELECT</command>s, or functions that contain <command>SELECT</command>s, + in the policy expressions. Be aware however that such accesses can + create race conditions that could allow information leakage if care is + not taken. As an example, consider the following table design: + </para> + +<programlisting> +-- definition of privilege groups +CREATE TABLE groups (group_id int PRIMARY KEY, + group_name text NOT NULL); + +INSERT INTO groups VALUES + (1, 'low'), + (2, 'medium'), + (5, 'high'); + +GRANT ALL ON groups TO alice; -- alice is the administrator +GRANT SELECT ON groups TO public; + +-- definition of users' privilege levels +CREATE TABLE users (user_name text PRIMARY KEY, + group_id int NOT NULL REFERENCES groups); + +INSERT INTO users VALUES + ('alice', 5), + ('bob', 2), + ('mallory', 2); + +GRANT ALL ON users TO alice; +GRANT SELECT ON users TO public; + +-- table holding the information to be protected +CREATE TABLE information (info text, + group_id int NOT NULL REFERENCES groups); + +INSERT INTO information VALUES + ('barely secret', 1), + ('slightly secret', 2), + ('very secret', 5); + +ALTER TABLE information ENABLE ROW LEVEL SECURITY; + +-- a row should be visible to/updatable by users whose security group_id is +-- greater than or equal to the row's group_id +CREATE POLICY fp_s ON information FOR SELECT + USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user)); +CREATE POLICY fp_u ON information FOR UPDATE + USING (group_id <= (SELECT group_id FROM users WHERE user_name = current_user)); + +-- we rely only on RLS to protect the information table +GRANT ALL ON information TO public; +</programlisting> + + <para> + Now suppose that <literal>alice</literal> wishes to change the <quote>slightly + secret</quote> information, but decides that <literal>mallory</literal> should not + be trusted with the new content of that row, so she does: + </para> + +<programlisting> +BEGIN; +UPDATE users SET group_id = 1 WHERE user_name = 'mallory'; +UPDATE information SET info = 'secret from mallory' WHERE group_id = 2; +COMMIT; +</programlisting> + + <para> + That looks safe; there is no window wherein <literal>mallory</literal> should be + able to see the <quote>secret from mallory</quote> string. However, there is + a race condition here. If <literal>mallory</literal> is concurrently doing, + say, +<programlisting> +SELECT * FROM information WHERE group_id = 2 FOR UPDATE; +</programlisting> + and her transaction is in <literal>READ COMMITTED</literal> mode, it is possible + for her to see <quote>secret from mallory</quote>. That happens if her + transaction reaches the <structname>information</structname> row just + after <literal>alice</literal>'s does. It blocks waiting + for <literal>alice</literal>'s transaction to commit, then fetches the updated + row contents thanks to the <literal>FOR UPDATE</literal> clause. However, it + does <emphasis>not</emphasis> fetch an updated row for the + implicit <command>SELECT</command> from <structname>users</structname>, because that + sub-<command>SELECT</command> did not have <literal>FOR UPDATE</literal>; instead + the <structname>users</structname> row is read with the snapshot taken at the start + of the query. Therefore, the policy expression tests the old value + of <literal>mallory</literal>'s privilege level and allows her to see the + updated row. + </para> + + <para> + There are several ways around this problem. One simple answer is to use + <literal>SELECT ... FOR SHARE</literal> in sub-<command>SELECT</command>s in row + security policies. However, that requires granting <literal>UPDATE</literal> + privilege on the referenced table (here <structname>users</structname>) to the + affected users, which might be undesirable. (But another row security + policy could be applied to prevent them from actually exercising that + privilege; or the sub-<command>SELECT</command> could be embedded into a security + definer function.) Also, heavy concurrent use of row share locks on the + referenced table could pose a performance problem, especially if updates + of it are frequent. Another solution, practical if updates of the + referenced table are infrequent, is to take an + <literal>ACCESS EXCLUSIVE</literal> lock on the + referenced table when updating it, so that no concurrent transactions + could be examining old row values. Or one could just wait for all + concurrent transactions to end after committing an update of the + referenced table and before making changes that rely on the new security + situation. + </para> + + <para> + For additional details see <xref linkend="sql-createpolicy"/> + and <xref linkend="sql-altertable"/>. + </para> + + </sect1> + + <sect1 id="ddl-schemas"> + <title>Schemas</title> + + <indexterm zone="ddl-schemas"> + <primary>schema</primary> + </indexterm> + + <para> + A <productname>PostgreSQL</productname> database cluster contains + one or more named databases. Roles and a few other object types are + shared across the entire cluster. A client connection to the server + can only access data in a single database, the one specified in the + connection request. + </para> + + <note> + <para> + Users of a cluster do not necessarily have the privilege to access every + database in the cluster. Sharing of role names means that there + cannot be different roles named, say, <literal>joe</literal> in two databases + in the same cluster; but the system can be configured to allow + <literal>joe</literal> access to only some of the databases. + </para> + </note> + + <para> + A database contains one or more named <firstterm>schemas</firstterm>, which + in turn contain tables. Schemas also contain other kinds of named + objects, including data types, functions, and operators. The same + object name can be used in different schemas without conflict; for + example, both <literal>schema1</literal> and <literal>myschema</literal> can + contain tables named <literal>mytable</literal>. Unlike databases, + schemas are not rigidly separated: a user can access objects in any + of the schemas in the database they are connected to, if they have + privileges to do so. + </para> + + <para> + There are several reasons why one might want to use schemas: + + <itemizedlist> + <listitem> + <para> + To allow many users to use one database without interfering with + each other. + </para> + </listitem> + + <listitem> + <para> + To organize database objects into logical groups to make them + more manageable. + </para> + </listitem> + + <listitem> + <para> + Third-party applications can be put into separate schemas so + they do not collide with the names of other objects. + </para> + </listitem> + </itemizedlist> + + Schemas are analogous to directories at the operating system level, + except that schemas cannot be nested. + </para> + + <sect2 id="ddl-schemas-create"> + <title>Creating a Schema</title> + + <indexterm zone="ddl-schemas-create"> + <primary>schema</primary> + <secondary>creating</secondary> + </indexterm> + + <para> + To create a schema, use the <xref linkend="sql-createschema"/> + command. Give the schema a name + of your choice. For example: +<programlisting> +CREATE SCHEMA myschema; +</programlisting> + </para> + + <indexterm> + <primary>qualified name</primary> + </indexterm> + + <indexterm> + <primary>name</primary> + <secondary>qualified</secondary> + </indexterm> + + <para> + To create or access objects in a schema, write a + <firstterm>qualified name</firstterm> consisting of the schema name and + table name separated by a dot: +<synopsis> +<replaceable>schema</replaceable><literal>.</literal><replaceable>table</replaceable> +</synopsis> + This works anywhere a table name is expected, including the table + modification commands and the data access commands discussed in + the following chapters. + (For brevity we will speak of tables only, but the same ideas apply + to other kinds of named objects, such as types and functions.) + </para> + + <para> + Actually, the even more general syntax +<synopsis> +<replaceable>database</replaceable><literal>.</literal><replaceable>schema</replaceable><literal>.</literal><replaceable>table</replaceable> +</synopsis> + can be used too, but at present this is just for pro forma + compliance with the SQL standard. If you write a database name, + it must be the same as the database you are connected to. + </para> + + <para> + So to create a table in the new schema, use: +<programlisting> +CREATE TABLE myschema.mytable ( + ... +); +</programlisting> + </para> + + <indexterm> + <primary>schema</primary> + <secondary>removing</secondary> + </indexterm> + + <para> + To drop a schema if it's empty (all objects in it have been + dropped), use: +<programlisting> +DROP SCHEMA myschema; +</programlisting> + To drop a schema including all contained objects, use: +<programlisting> +DROP SCHEMA myschema CASCADE; +</programlisting> + See <xref linkend="ddl-depend"/> for a description of the general + mechanism behind this. + </para> + + <para> + Often you will want to create a schema owned by someone else + (since this is one of the ways to restrict the activities of your + users to well-defined namespaces). The syntax for that is: +<programlisting> +CREATE SCHEMA <replaceable>schema_name</replaceable> AUTHORIZATION <replaceable>user_name</replaceable>; +</programlisting> + You can even omit the schema name, in which case the schema name + will be the same as the user name. See <xref + linkend="ddl-schemas-patterns"/> for how this can be useful. + </para> + + <para> + Schema names beginning with <literal>pg_</literal> are reserved for + system purposes and cannot be created by users. + </para> + </sect2> + + <sect2 id="ddl-schemas-public"> + <title>The Public Schema</title> + + <indexterm zone="ddl-schemas-public"> + <primary>schema</primary> + <secondary>public</secondary> + </indexterm> + + <para> + In the previous sections we created tables without specifying any + schema names. By default such tables (and other objects) are + automatically put into a schema named <quote>public</quote>. Every new + database contains such a schema. Thus, the following are equivalent: +<programlisting> +CREATE TABLE products ( ... ); +</programlisting> + and: +<programlisting> +CREATE TABLE public.products ( ... ); +</programlisting> + </para> + </sect2> + + <sect2 id="ddl-schemas-path"> + <title>The Schema Search Path</title> + + <indexterm> + <primary>search path</primary> + </indexterm> + + <indexterm> + <primary>unqualified name</primary> + </indexterm> + + <indexterm> + <primary>name</primary> + <secondary>unqualified</secondary> + </indexterm> + + <para> + Qualified names are tedious to write, and it's often best not to + wire a particular schema name into applications anyway. Therefore + tables are often referred to by <firstterm>unqualified names</firstterm>, + which consist of just the table name. The system determines which table + is meant by following a <firstterm>search path</firstterm>, which is a list + of schemas to look in. The first matching table in the search path + is taken to be the one wanted. If there is no match in the search + path, an error is reported, even if matching table names exist + in other schemas in the database. + </para> + + <para> + The ability to create like-named objects in different schemas complicates + writing a query that references precisely the same objects every time. It + also opens up the potential for users to change the behavior of other + users' queries, maliciously or accidentally. Due to the prevalence of + unqualified names in queries and their use + in <productname>PostgreSQL</productname> internals, adding a schema + to <varname>search_path</varname> effectively trusts all users having + <literal>CREATE</literal> privilege on that schema. When you run an + ordinary query, a malicious user able to create objects in a schema of + your search path can take control and execute arbitrary SQL functions as + though you executed them. + </para> + + <indexterm> + <primary>schema</primary> + <secondary>current</secondary> + </indexterm> + + <para> + The first schema named in the search path is called the current schema. + Aside from being the first schema searched, it is also the schema in + which new tables will be created if the <command>CREATE TABLE</command> + command does not specify a schema name. + </para> + + <indexterm> + <primary><varname>search_path</varname> configuration parameter</primary> + </indexterm> + + <para> + To show the current search path, use the following command: +<programlisting> +SHOW search_path; +</programlisting> + In the default setup this returns: +<screen> + search_path +-------------- + "$user", public +</screen> + The first element specifies that a schema with the same name as + the current user is to be searched. If no such schema exists, + the entry is ignored. The second element refers to the + public schema that we have seen already. + </para> + + <para> + The first schema in the search path that exists is the default + location for creating new objects. That is the reason that by + default objects are created in the public schema. When objects + are referenced in any other context without schema qualification + (table modification, data modification, or query commands) the + search path is traversed until a matching object is found. + Therefore, in the default configuration, any unqualified access + again can only refer to the public schema. + </para> + + <para> + To put our new schema in the path, we use: +<programlisting> +SET search_path TO myschema,public; +</programlisting> + (We omit the <literal>$user</literal> here because we have no + immediate need for it.) And then we can access the table without + schema qualification: +<programlisting> +DROP TABLE mytable; +</programlisting> + Also, since <literal>myschema</literal> is the first element in + the path, new objects would by default be created in it. + </para> + + <para> + We could also have written: +<programlisting> +SET search_path TO myschema; +</programlisting> + Then we no longer have access to the public schema without + explicit qualification. There is nothing special about the public + schema except that it exists by default. It can be dropped, too. + </para> + + <para> + See also <xref linkend="functions-info"/> for other ways to manipulate + the schema search path. + </para> + + <para> + The search path works in the same way for data type names, function names, + and operator names as it does for table names. Data type and function + names can be qualified in exactly the same way as table names. If you + need to write a qualified operator name in an expression, there is a + special provision: you must write +<synopsis> +<literal>OPERATOR(</literal><replaceable>schema</replaceable><literal>.</literal><replaceable>operator</replaceable><literal>)</literal> +</synopsis> + This is needed to avoid syntactic ambiguity. An example is: +<programlisting> +SELECT 3 OPERATOR(pg_catalog.+) 4; +</programlisting> + In practice one usually relies on the search path for operators, + so as not to have to write anything so ugly as that. + </para> + </sect2> + + <sect2 id="ddl-schemas-priv"> + <title>Schemas and Privileges</title> + + <indexterm zone="ddl-schemas-priv"> + <primary>privilege</primary> + <secondary sortas="schemas">for schemas</secondary> + </indexterm> + + <para> + By default, users cannot access any objects in schemas they do not + own. To allow that, the owner of the schema must grant the + <literal>USAGE</literal> privilege on the schema. By default, everyone + has that privilege on the schema <literal>public</literal>. To allow + users to make use of the objects in a schema, additional privileges might + need to be granted, as appropriate for the object. + </para> + + <para> + A user can also be allowed to create objects in someone else's schema. To + allow that, the <literal>CREATE</literal> privilege on the schema needs to + be granted. In databases upgraded from + <productname>PostgreSQL</productname> 14 or earlier, everyone has that + privilege on the schema <literal>public</literal>. + Some <link linkend="ddl-schemas-patterns">usage patterns</link> call for + revoking that privilege: +<programlisting> +REVOKE CREATE ON SCHEMA public FROM PUBLIC; +</programlisting> + (The first <quote>public</quote> is the schema, the second + <quote>public</quote> means <quote>every user</quote>. In the + first sense it is an identifier, in the second sense it is a + key word, hence the different capitalization; recall the + guidelines from <xref linkend="sql-syntax-identifiers"/>.) + </para> + </sect2> + + <sect2 id="ddl-schemas-catalog"> + <title>The System Catalog Schema</title> + + <indexterm zone="ddl-schemas-catalog"> + <primary>system catalog</primary> + <secondary>schema</secondary> + </indexterm> + + <para> + In addition to <literal>public</literal> and user-created schemas, each + database contains a <literal>pg_catalog</literal> schema, which contains + the system tables and all the built-in data types, functions, and + operators. <literal>pg_catalog</literal> is always effectively part of + the search path. If it is not named explicitly in the path then + it is implicitly searched <emphasis>before</emphasis> searching the path's + schemas. This ensures that built-in names will always be + findable. However, you can explicitly place + <literal>pg_catalog</literal> at the end of your search path if you + prefer to have user-defined names override built-in names. + </para> + + <para> + Since system table names begin with <literal>pg_</literal>, it is best to + avoid such names to ensure that you won't suffer a conflict if some + future version defines a system table named the same as your + table. (With the default search path, an unqualified reference to + your table name would then be resolved as the system table instead.) + System tables will continue to follow the convention of having + names beginning with <literal>pg_</literal>, so that they will not + conflict with unqualified user-table names so long as users avoid + the <literal>pg_</literal> prefix. + </para> + </sect2> + + <sect2 id="ddl-schemas-patterns"> + <title>Usage Patterns</title> + + <para> + Schemas can be used to organize your data in many ways. + A <firstterm>secure schema usage pattern</firstterm> prevents untrusted + users from changing the behavior of other users' queries. When a database + does not use a secure schema usage pattern, users wishing to securely + query that database would take protective action at the beginning of each + session. Specifically, they would begin each session by + setting <varname>search_path</varname> to the empty string or otherwise + removing schemas that are writable by non-superusers + from <varname>search_path</varname>. There are a few usage patterns + easily supported by the default configuration: + <itemizedlist> + <listitem> + <para> + Constrain ordinary users to user-private schemas. + To implement this pattern, first ensure that no schemas have + public <literal>CREATE</literal> privileges. Then, for every user + needing to create non-temporary objects, create a schema with the + same name as that user, for example + <literal>CREATE SCHEMA alice AUTHORIZATION alice</literal>. + (Recall that the default search path starts + with <literal>$user</literal>, which resolves to the user + name. Therefore, if each user has a separate schema, they access + their own schemas by default.) This pattern is a secure schema + usage pattern unless an untrusted user is the database owner or + has been granted <literal>ADMIN OPTION</literal> on a relevant role, + in which case no secure schema usage pattern exists. + </para> + <!-- A database owner can attack the database's users via "CREATE SCHEMA + trojan; ALTER DATABASE $mydb SET search_path = trojan, public;". --> + + <para> + In <productname>PostgreSQL</productname> 15 and later, the default + configuration supports this usage pattern. In prior versions, or + when using a database that has been upgraded from a prior version, + you will need to remove the public <literal>CREATE</literal> + privilege from the <literal>public</literal> schema (issue + <literal>REVOKE CREATE ON SCHEMA public FROM PUBLIC</literal>). + Then consider auditing the <literal>public</literal> schema for + objects named like objects in schema <literal>pg_catalog</literal>. + </para> + <!-- "DROP SCHEMA public" is inferior to this REVOKE, because pg_dump + doesn't preserve that DROP. --> + </listitem> + + <listitem> + <para> + Remove the public schema from the default search path, by modifying + <link linkend="config-setting-configuration-file"><filename>postgresql.conf</filename></link> + or by issuing <literal>ALTER ROLE ALL SET search_path = + "$user"</literal>. Then, grant privileges to create in the public + schema. Only qualified names will choose public schema objects. While + qualified table references are fine, calls to functions in the public + schema <link linkend="typeconv-func">will be unsafe or + unreliable</link>. If you create functions or extensions in the public + schema, use the first pattern instead. Otherwise, like the first + pattern, this is secure unless an untrusted user is the database owner + or has been granted <literal>ADMIN OPTION</literal> on a relevant role. + </para> + </listitem> + + <listitem> + <para> + Keep the default search path, and grant privileges to create in the + public schema. All users access the public schema implicitly. This + simulates the situation where schemas are not available at all, giving + a smooth transition from the non-schema-aware world. However, this is + never a secure pattern. It is acceptable only when the database has a + single user or a few mutually-trusting users. In databases upgraded + from <productname>PostgreSQL</productname> 14 or earlier, this is the + default. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + For any pattern, to install shared applications (tables to be used by + everyone, additional functions provided by third parties, etc.), put them + into separate schemas. Remember to grant appropriate privileges to allow + the other users to access them. Users can then refer to these additional + objects by qualifying the names with a schema name, or they can put the + additional schemas into their search path, as they choose. + </para> + </sect2> + + <sect2 id="ddl-schemas-portability"> + <title>Portability</title> + + <para> + In the SQL standard, the notion of objects in the same schema + being owned by different users does not exist. Moreover, some + implementations do not allow you to create schemas that have a + different name than their owner. In fact, the concepts of schema + and user are nearly equivalent in a database system that + implements only the basic schema support specified in the + standard. Therefore, many users consider qualified names to + really consist of + <literal><replaceable>user_name</replaceable>.<replaceable>table_name</replaceable></literal>. + This is how <productname>PostgreSQL</productname> will effectively + behave if you create a per-user schema for every user. + </para> + + <para> + Also, there is no concept of a <literal>public</literal> schema in the + SQL standard. For maximum conformance to the standard, you should + not use the <literal>public</literal> schema. + </para> + + <para> + Of course, some SQL database systems might not implement schemas + at all, or provide namespace support by allowing (possibly + limited) cross-database access. If you need to work with those + systems, then maximum portability would be achieved by not using + schemas at all. + </para> + </sect2> + </sect1> + + <sect1 id="ddl-inherit"> + <title>Inheritance</title> + + <indexterm> + <primary>inheritance</primary> + </indexterm> + + <indexterm> + <primary>table</primary> + <secondary>inheritance</secondary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> implements table inheritance, + which can be a useful tool for database designers. (SQL:1999 and + later define a type inheritance feature, which differs in many + respects from the features described here.) + </para> + + <para> + Let's start with an example: suppose we are trying to build a data + model for cities. Each state has many cities, but only one + capital. We want to be able to quickly retrieve the capital city + for any particular state. This can be done by creating two tables, + one for state capitals and one for cities that are not + capitals. However, what happens when we want to ask for data about + a city, regardless of whether it is a capital or not? The + inheritance feature can help to resolve this problem. We define the + <structname>capitals</structname> table so that it inherits from + <structname>cities</structname>: + +<programlisting> +CREATE TABLE cities ( + name text, + population float, + elevation int -- in feet +); + +CREATE TABLE capitals ( + state char(2) +) INHERITS (cities); +</programlisting> + + In this case, the <structname>capitals</structname> table <firstterm>inherits</firstterm> + all the columns of its parent table, <structname>cities</structname>. State + capitals also have an extra column, <structfield>state</structfield>, that shows + their state. + </para> + + <para> + In <productname>PostgreSQL</productname>, a table can inherit from + zero or more other tables, and a query can reference either all + rows of a table or all rows of a table plus all of its descendant tables. + The latter behavior is the default. + For example, the following query finds the names of all cities, + including state capitals, that are located at an elevation over + 500 feet: + +<programlisting> +SELECT name, elevation + FROM cities + WHERE elevation > 500; +</programlisting> + + Given the sample data from the <productname>PostgreSQL</productname> + tutorial (see <xref linkend="tutorial-sql-intro"/>), this returns: + +<programlisting> + name | elevation +-----------+----------- + Las Vegas | 2174 + Mariposa | 1953 + Madison | 845 +</programlisting> + </para> + + <para> + On the other hand, the following query finds all the cities that + are not state capitals and are situated at an elevation over 500 feet: + +<programlisting> +SELECT name, elevation + FROM ONLY cities + WHERE elevation > 500; + + name | elevation +-----------+----------- + Las Vegas | 2174 + Mariposa | 1953 +</programlisting> + </para> + + <para> + Here the <literal>ONLY</literal> keyword indicates that the query + should apply only to <structname>cities</structname>, and not any tables + below <structname>cities</structname> in the inheritance hierarchy. Many + of the commands that we have already discussed — + <command>SELECT</command>, <command>UPDATE</command> and + <command>DELETE</command> — support the + <literal>ONLY</literal> keyword. + </para> + + <para> + You can also write the table name with a trailing <literal>*</literal> + to explicitly specify that descendant tables are included: + +<programlisting> +SELECT name, elevation + FROM cities* + WHERE elevation > 500; +</programlisting> + + Writing <literal>*</literal> is not necessary, since this behavior is always + the default. However, this syntax is still supported for + compatibility with older releases where the default could be changed. + </para> + + <para> + In some cases you might wish to know which table a particular row + originated from. There is a system column called + <structfield>tableoid</structfield> in each table which can tell you the + originating table: + +<programlisting> +SELECT c.tableoid, c.name, c.elevation +FROM cities c +WHERE c.elevation > 500; +</programlisting> + + which returns: + +<programlisting> + tableoid | name | elevation +----------+-----------+----------- + 139793 | Las Vegas | 2174 + 139793 | Mariposa | 1953 + 139798 | Madison | 845 +</programlisting> + + (If you try to reproduce this example, you will probably get + different numeric OIDs.) By doing a join with + <structname>pg_class</structname> you can see the actual table names: + +<programlisting> +SELECT p.relname, c.name, c.elevation +FROM cities c, pg_class p +WHERE c.elevation > 500 AND c.tableoid = p.oid; +</programlisting> + + which returns: + +<programlisting> + relname | name | elevation +----------+-----------+----------- + cities | Las Vegas | 2174 + cities | Mariposa | 1953 + capitals | Madison | 845 +</programlisting> + </para> + + <para> + Another way to get the same effect is to use the <type>regclass</type> + alias type, which will print the table OID symbolically: + +<programlisting> +SELECT c.tableoid::regclass, c.name, c.elevation +FROM cities c +WHERE c.elevation > 500; +</programlisting> + </para> + + <para> + Inheritance does not automatically propagate data from + <command>INSERT</command> or <command>COPY</command> commands to + other tables in the inheritance hierarchy. In our example, the + following <command>INSERT</command> statement will fail: +<programlisting> +INSERT INTO cities (name, population, elevation, state) +VALUES ('Albany', NULL, NULL, 'NY'); +</programlisting> + We might hope that the data would somehow be routed to the + <structname>capitals</structname> table, but this does not happen: + <command>INSERT</command> always inserts into exactly the table + specified. In some cases it is possible to redirect the insertion + using a rule (see <xref linkend="rules"/>). However that does not + help for the above case because the <structname>cities</structname> table + does not contain the column <structfield>state</structfield>, and so the + command will be rejected before the rule can be applied. + </para> + + <para> + All check constraints and not-null constraints on a parent table are + automatically inherited by its children, unless explicitly specified + otherwise with <literal>NO INHERIT</literal> clauses. Other types of constraints + (unique, primary key, and foreign key constraints) are not inherited. + </para> + + <para> + A table can inherit from more than one parent table, in which case it has + the union of the columns defined by the parent tables. Any columns + declared in the child table's definition are added to these. If the + same column name appears in multiple parent tables, or in both a parent + table and the child's definition, then these columns are <quote>merged</quote> + so that there is only one such column in the child table. To be merged, + columns must have the same data types, else an error is raised. + Inheritable check constraints and not-null constraints are merged in a + similar fashion. Thus, for example, a merged column will be marked + not-null if any one of the column definitions it came from is marked + not-null. Check constraints are merged if they have the same name, + and the merge will fail if their conditions are different. + </para> + + <para> + Table inheritance is typically established when the child table is + created, using the <literal>INHERITS</literal> clause of the + <link linkend="sql-createtable"><command>CREATE TABLE</command></link> + statement. + Alternatively, a table which is already defined in a compatible way can + have a new parent relationship added, using the <literal>INHERIT</literal> + variant of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>. + To do this the new child table must already include columns with + the same names and types as the columns of the parent. It must also include + check constraints with the same names and check expressions as those of the + parent. Similarly an inheritance link can be removed from a child using the + <literal>NO INHERIT</literal> variant of <command>ALTER TABLE</command>. + Dynamically adding and removing inheritance links like this can be useful + when the inheritance relationship is being used for table + partitioning (see <xref linkend="ddl-partitioning"/>). + </para> + + <para> + One convenient way to create a compatible table that will later be made + a new child is to use the <literal>LIKE</literal> clause in <command>CREATE + TABLE</command>. This creates a new table with the same columns as + the source table. If there are any <literal>CHECK</literal> + constraints defined on the source table, the <literal>INCLUDING + CONSTRAINTS</literal> option to <literal>LIKE</literal> should be + specified, as the new child must have constraints matching the parent + to be considered compatible. + </para> + + <para> + A parent table cannot be dropped while any of its children remain. Neither + can columns or check constraints of child tables be dropped or altered + if they are inherited + from any parent tables. If you wish to remove a table and all of its + descendants, one easy way is to drop the parent table with the + <literal>CASCADE</literal> option (see <xref linkend="ddl-depend"/>). + </para> + + <para> + <command>ALTER TABLE</command> will + propagate any changes in column data definitions and check + constraints down the inheritance hierarchy. Again, dropping + columns that are depended on by other tables is only possible when using + the <literal>CASCADE</literal> option. <command>ALTER + TABLE</command> follows the same rules for duplicate column merging + and rejection that apply during <command>CREATE TABLE</command>. + </para> + + <para> + Inherited queries perform access permission checks on the parent table + only. Thus, for example, granting <literal>UPDATE</literal> permission on + the <structname>cities</structname> table implies permission to update rows in + the <structname>capitals</structname> table as well, when they are + accessed through <structname>cities</structname>. This preserves the appearance + that the data is (also) in the parent table. But + the <structname>capitals</structname> table could not be updated directly + without an additional grant. In a similar way, the parent table's row + security policies (see <xref linkend="ddl-rowsecurity"/>) are applied to + rows coming from child tables during an inherited query. A child table's + policies, if any, are applied only when it is the table explicitly named + in the query; and in that case, any policies attached to its parent(s) are + ignored. + </para> + + <para> + Foreign tables (see <xref linkend="ddl-foreign-data"/>) can also + be part of inheritance hierarchies, either as parent or child + tables, just as regular tables can be. If a foreign table is part + of an inheritance hierarchy then any operations not supported by + the foreign table are not supported on the whole hierarchy either. + </para> + + <sect2 id="ddl-inherit-caveats"> + <title>Caveats</title> + + <para> + Note that not all SQL commands are able to work on + inheritance hierarchies. Commands that are used for data querying, + data modification, or schema modification + (e.g., <literal>SELECT</literal>, <literal>UPDATE</literal>, <literal>DELETE</literal>, + most variants of <literal>ALTER TABLE</literal>, but + not <literal>INSERT</literal> or <literal>ALTER TABLE ... + RENAME</literal>) typically default to including child tables and + support the <literal>ONLY</literal> notation to exclude them. + Commands that do database maintenance and tuning + (e.g., <literal>REINDEX</literal>, <literal>VACUUM</literal>) + typically only work on individual, physical tables and do not + support recursing over inheritance hierarchies. The respective + behavior of each individual command is documented in its reference + page (<xref linkend="sql-commands"/>). + </para> + + <para> + A serious limitation of the inheritance feature is that indexes (including + unique constraints) and foreign key constraints only apply to single + tables, not to their inheritance children. This is true on both the + referencing and referenced sides of a foreign key constraint. Thus, + in the terms of the above example: + + <itemizedlist> + <listitem> + <para> + If we declared <structname>cities</structname>.<structfield>name</structfield> to be + <literal>UNIQUE</literal> or a <literal>PRIMARY KEY</literal>, this would not stop the + <structname>capitals</structname> table from having rows with names duplicating + rows in <structname>cities</structname>. And those duplicate rows would by + default show up in queries from <structname>cities</structname>. In fact, by + default <structname>capitals</structname> would have no unique constraint at all, + and so could contain multiple rows with the same name. + You could add a unique constraint to <structname>capitals</structname>, but this + would not prevent duplication compared to <structname>cities</structname>. + </para> + </listitem> + + <listitem> + <para> + Similarly, if we were to specify that + <structname>cities</structname>.<structfield>name</structfield> <literal>REFERENCES</literal> some + other table, this constraint would not automatically propagate to + <structname>capitals</structname>. In this case you could work around it by + manually adding the same <literal>REFERENCES</literal> constraint to + <structname>capitals</structname>. + </para> + </listitem> + + <listitem> + <para> + Specifying that another table's column <literal>REFERENCES + cities(name)</literal> would allow the other table to contain city names, but + not capital names. There is no good workaround for this case. + </para> + </listitem> + </itemizedlist> + + Some functionality not implemented for inheritance hierarchies is + implemented for declarative partitioning. + Considerable care is needed in deciding whether partitioning with legacy + inheritance is useful for your application. + </para> + + </sect2> + </sect1> + + <sect1 id="ddl-partitioning"> + <title>Table Partitioning</title> + + <indexterm> + <primary>partitioning</primary> + </indexterm> + + <indexterm> + <primary>table</primary> + <secondary>partitioning</secondary> + </indexterm> + + <indexterm> + <primary>partitioned table</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> supports basic table + partitioning. This section describes why and how to implement + partitioning as part of your database design. + </para> + + <sect2 id="ddl-partitioning-overview"> + <title>Overview</title> + + <para> + Partitioning refers to splitting what is logically one large table into + smaller physical pieces. Partitioning can provide several benefits: + <itemizedlist> + <listitem> + <para> + Query performance can be improved dramatically in certain situations, + particularly when most of the heavily accessed rows of the table are in a + single partition or a small number of partitions. Partitioning + effectively substitutes for the upper tree levels of indexes, + making it more likely that the heavily-used parts of the indexes + fit in memory. + </para> + </listitem> + + <listitem> + <para> + When queries or updates access a large percentage of a single + partition, performance can be improved by using a + sequential scan of that partition instead of using an + index, which would require random-access reads scattered across the + whole table. + </para> + </listitem> + + <listitem> + <para> + Bulk loads and deletes can be accomplished by adding or removing + partitions, if the usage pattern is accounted for in the + partitioning design. Dropping an individual partition + using <command>DROP TABLE</command>, or doing <command>ALTER TABLE + DETACH PARTITION</command>, is far faster than a bulk + operation. These commands also entirely avoid the + <command>VACUUM</command> overhead caused by a bulk <command>DELETE</command>. + </para> + </listitem> + + <listitem> + <para> + Seldom-used data can be migrated to cheaper and slower storage media. + </para> + </listitem> + </itemizedlist> + + These benefits will normally be worthwhile only when a table would + otherwise be very large. The exact point at which a table will + benefit from partitioning depends on the application, although a + rule of thumb is that the size of the table should exceed the physical + memory of the database server. + </para> + + <para> + <productname>PostgreSQL</productname> offers built-in support for the + following forms of partitioning: + + <variablelist> + <varlistentry id="ddl-partitioning-overview-range"> + <term>Range Partitioning</term> + + <listitem> + <para> + The table is partitioned into <quote>ranges</quote> defined + by a key column or set of columns, with no overlap between + the ranges of values assigned to different partitions. For + example, one might partition by date ranges, or by ranges of + identifiers for particular business objects. + Each range's bounds are understood as being inclusive at the + lower end and exclusive at the upper end. For example, if one + partition's range is from <literal>1</literal> + to <literal>10</literal>, and the next one's range is + from <literal>10</literal> to <literal>20</literal>, then + value <literal>10</literal> belongs to the second partition not + the first. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-partitioning-overview-list"> + <term>List Partitioning</term> + + <listitem> + <para> + The table is partitioned by explicitly listing which key value(s) + appear in each partition. + </para> + </listitem> + </varlistentry> + + <varlistentry id="ddl-partitioning-overview-hash"> + <term>Hash Partitioning</term> + + <listitem> + <para> + The table is partitioned by specifying a modulus and a remainder for + each partition. Each partition will hold the rows for which the hash + value of the partition key divided by the specified modulus will + produce the specified remainder. + </para> + </listitem> + </varlistentry> + </variablelist> + + If your application needs to use other forms of partitioning not listed + above, alternative methods such as inheritance and + <literal>UNION ALL</literal> views can be used instead. Such methods + offer flexibility but do not have some of the performance benefits + of built-in declarative partitioning. + </para> + </sect2> + + <sect2 id="ddl-partitioning-declarative"> + <title>Declarative Partitioning</title> + + <para> + <productname>PostgreSQL</productname> allows you to declare + that a table is divided into partitions. The table that is divided + is referred to as a <firstterm>partitioned table</firstterm>. The + declaration includes the <firstterm>partitioning method</firstterm> + as described above, plus a list of columns or expressions to be used + as the <firstterm>partition key</firstterm>. + </para> + + <para> + The partitioned table itself is a <quote>virtual</quote> table having + no storage of its own. Instead, the storage belongs + to <firstterm>partitions</firstterm>, which are otherwise-ordinary + tables associated with the partitioned table. + Each partition stores a subset of the data as defined by its + <firstterm>partition bounds</firstterm>. + All rows inserted into a partitioned table will be routed to the + appropriate one of the partitions based on the values of the partition + key column(s). + Updating the partition key of a row will cause it to be moved into a + different partition if it no longer satisfies the partition bounds + of its original partition. + </para> + + <para> + Partitions may themselves be defined as partitioned tables, resulting + in <firstterm>sub-partitioning</firstterm>. Although all partitions + must have the same columns as their partitioned parent, partitions may + have their + own indexes, constraints and default values, distinct from those of other + partitions. See <xref linkend="sql-createtable"/> for more details on + creating partitioned tables and partitions. + </para> + + <para> + It is not possible to turn a regular table into a partitioned table or + vice versa. However, it is possible to add an existing regular or + partitioned table as a partition of a partitioned table, or remove a + partition from a partitioned table turning it into a standalone table; + this can simplify and speed up many maintenance processes. + See <xref linkend="sql-altertable"/> to learn more about the + <command>ATTACH PARTITION</command> and <command>DETACH PARTITION</command> + sub-commands. + </para> + + <para> + Partitions can also be <link linkend="ddl-foreign-data">foreign + tables</link>, although considerable care is needed because it is then + the user's responsibility that the contents of the foreign table + satisfy the partitioning rule. There are some other restrictions as + well. See <xref linkend="sql-createforeigntable"/> for more + information. + </para> + + <sect3 id="ddl-partitioning-declarative-example"> + <title>Example</title> + + <para> + Suppose we are constructing a database for a large ice cream company. + The company measures peak temperatures every day as well as ice cream + sales in each region. Conceptually, we want a table like: + +<programlisting> +CREATE TABLE measurement ( + city_id int not null, + logdate date not null, + peaktemp int, + unitsales int +); +</programlisting> + + We know that most queries will access just the last week's, month's or + quarter's data, since the main use of this table will be to prepare + online reports for management. To reduce the amount of old data that + needs to be stored, we decide to keep only the most recent 3 years + worth of data. At the beginning of each month we will remove the oldest + month's data. In this situation we can use partitioning to help us meet + all of our different requirements for the measurements table. + </para> + + <para> + To use declarative partitioning in this case, use the following steps: + + <orderedlist spacing="compact"> + <listitem> + <para> + Create the <structname>measurement</structname> table as a partitioned + table by specifying the <literal>PARTITION BY</literal> clause, which + includes the partitioning method (<literal>RANGE</literal> in this + case) and the list of column(s) to use as the partition key. + +<programlisting> +CREATE TABLE measurement ( + city_id int not null, + logdate date not null, + peaktemp int, + unitsales int +) PARTITION BY RANGE (logdate); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + Create partitions. Each partition's definition must specify bounds + that correspond to the partitioning method and partition key of the + parent. Note that specifying bounds such that the new partition's + values would overlap with those in one or more existing partitions will + cause an error. + </para> + + <para> + Partitions thus created are in every way normal + <productname>PostgreSQL</productname> + tables (or, possibly, foreign tables). It is possible to specify a + tablespace and storage parameters for each partition separately. + </para> + + <para> + For our example, each partition should hold one month's worth of + data, to match the requirement of deleting one month's data at a + time. So the commands might look like: + +<programlisting> +CREATE TABLE measurement_y2006m02 PARTITION OF measurement + FOR VALUES FROM ('2006-02-01') TO ('2006-03-01'); + +CREATE TABLE measurement_y2006m03 PARTITION OF measurement + FOR VALUES FROM ('2006-03-01') TO ('2006-04-01'); + +... +CREATE TABLE measurement_y2007m11 PARTITION OF measurement + FOR VALUES FROM ('2007-11-01') TO ('2007-12-01'); + +CREATE TABLE measurement_y2007m12 PARTITION OF measurement + FOR VALUES FROM ('2007-12-01') TO ('2008-01-01') + TABLESPACE fasttablespace; + +CREATE TABLE measurement_y2008m01 PARTITION OF measurement + FOR VALUES FROM ('2008-01-01') TO ('2008-02-01') + WITH (parallel_workers = 4) + TABLESPACE fasttablespace; +</programlisting> + + (Recall that adjacent partitions can share a bound value, since + range upper bounds are treated as exclusive bounds.) + </para> + + <para> + If you wish to implement sub-partitioning, again specify the + <literal>PARTITION BY</literal> clause in the commands used to create + individual partitions, for example: + +<programlisting> +CREATE TABLE measurement_y2006m02 PARTITION OF measurement + FOR VALUES FROM ('2006-02-01') TO ('2006-03-01') + PARTITION BY RANGE (peaktemp); +</programlisting> + + After creating partitions of <structname>measurement_y2006m02</structname>, + any data inserted into <structname>measurement</structname> that is mapped to + <structname>measurement_y2006m02</structname> (or data that is + directly inserted into <structname>measurement_y2006m02</structname>, + which is allowed provided its partition constraint is satisfied) + will be further redirected to one of its + partitions based on the <structfield>peaktemp</structfield> column. The partition + key specified may overlap with the parent's partition key, although + care should be taken when specifying the bounds of a sub-partition + such that the set of data it accepts constitutes a subset of what + the partition's own bounds allow; the system does not try to check + whether that's really the case. + </para> + + <para> + Inserting data into the parent table that does not map + to one of the existing partitions will cause an error; an appropriate + partition must be added manually. + </para> + + <para> + It is not necessary to manually create table constraints describing + the partition boundary conditions for partitions. Such constraints + will be created automatically. + </para> + </listitem> + + <listitem> + <para> + Create an index on the key column(s), as well as any other indexes you + might want, on the partitioned table. (The key index is not strictly + necessary, but in most scenarios it is helpful.) + This automatically creates a matching index on each partition, and + any partitions you create or attach later will also have such an + index. + An index or unique constraint declared on a partitioned table + is <quote>virtual</quote> in the same way that the partitioned table + is: the actual data is in child indexes on the individual partition + tables. + +<programlisting> +CREATE INDEX ON measurement (logdate); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + Ensure that the <xref linkend="guc-enable-partition-pruning"/> + configuration parameter is not disabled in <filename>postgresql.conf</filename>. + If it is, queries will not be optimized as desired. + </para> + </listitem> + </orderedlist> + </para> + + <para> + In the above example we would be creating a new partition each month, so + it might be wise to write a script that generates the required DDL + automatically. + </para> + </sect3> + + <sect3 id="ddl-partitioning-declarative-maintenance"> + <title>Partition Maintenance</title> + + <para> + Normally the set of partitions established when initially defining the + table is not intended to remain static. It is common to want to + remove partitions holding old data and periodically add new partitions for + new data. One of the most important advantages of partitioning is + precisely that it allows this otherwise painful task to be executed + nearly instantaneously by manipulating the partition structure, rather + than physically moving large amounts of data around. + </para> + + <para> + The simplest option for removing old data is to drop the partition that + is no longer necessary: +<programlisting> +DROP TABLE measurement_y2006m02; +</programlisting> + This can very quickly delete millions of records because it doesn't have + to individually delete every record. Note however that the above command + requires taking an <literal>ACCESS EXCLUSIVE</literal> lock on the parent + table. + </para> + + <para> + Another option that is often preferable is to remove the partition from + the partitioned table but retain access to it as a table in its own + right. This has two forms: + +<programlisting> +ALTER TABLE measurement DETACH PARTITION measurement_y2006m02; +ALTER TABLE measurement DETACH PARTITION measurement_y2006m02 CONCURRENTLY; +</programlisting> + + These allow further operations to be performed on the data before + it is dropped. For example, this is often a useful time to back up + the data using <command>COPY</command>, <application>pg_dump</application>, or + similar tools. It might also be a useful time to aggregate data + into smaller formats, perform other data manipulations, or run + reports. The first form of the command requires an + <literal>ACCESS EXCLUSIVE</literal> lock on the parent table. + Adding the <literal>CONCURRENTLY</literal> qualifier as in the second + form allows the detach operation to require only + <literal>SHARE UPDATE EXCLUSIVE</literal> lock on the parent table, but see + <link linkend="sql-altertable-detach-partition"><literal>ALTER TABLE ... DETACH PARTITION</literal></link> + for details on the restrictions. + </para> + + <para> + Similarly we can add a new partition to handle new data. We can create an + empty partition in the partitioned table just as the original partitions + were created above: + +<programlisting> +CREATE TABLE measurement_y2008m02 PARTITION OF measurement + FOR VALUES FROM ('2008-02-01') TO ('2008-03-01') + TABLESPACE fasttablespace; +</programlisting> + + As an alternative, it is sometimes more convenient to create the + new table outside the partition structure, and attach it as a + partition later. This allows new data to be loaded, checked, and + transformed prior to it appearing in the partitioned table. + Moreover, the <literal>ATTACH PARTITION</literal> operation requires + only <literal>SHARE UPDATE EXCLUSIVE</literal> lock on the + partitioned table, as opposed to the <literal>ACCESS + EXCLUSIVE</literal> lock that is required by <command>CREATE TABLE + ... PARTITION OF</command>, so it is more friendly to concurrent + operations on the partitioned table. + The <literal>CREATE TABLE ... LIKE</literal> option is helpful + to avoid tediously repeating the parent table's definition: + +<programlisting> +CREATE TABLE measurement_y2008m02 + (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS) + TABLESPACE fasttablespace; + +ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02 + CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' ); + +\copy measurement_y2008m02 from 'measurement_y2008m02' +-- possibly some other data preparation work + +ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 + FOR VALUES FROM ('2008-02-01') TO ('2008-03-01' ); +</programlisting> + </para> + + <para> + Before running the <command>ATTACH PARTITION</command> command, it is + recommended to create a <literal>CHECK</literal> constraint on the table to + be attached that matches the expected partition constraint, as + illustrated above. That way, the system will be able to skip the scan + which is otherwise needed to validate the implicit + partition constraint. Without the <literal>CHECK</literal> constraint, + the table will be scanned to validate the partition constraint while + holding an <literal>ACCESS EXCLUSIVE</literal> lock on that partition. + It is recommended to drop the now-redundant <literal>CHECK</literal> + constraint after the <command>ATTACH PARTITION</command> is complete. If + the table being attached is itself a partitioned table, then each of its + sub-partitions will be recursively locked and scanned until either a + suitable <literal>CHECK</literal> constraint is encountered or the leaf + partitions are reached. + </para> + + <para> + Similarly, if the partitioned table has a <literal>DEFAULT</literal> + partition, it is recommended to create a <literal>CHECK</literal> + constraint which excludes the to-be-attached partition's constraint. If + this is not done then the <literal>DEFAULT</literal> partition will be + scanned to verify that it contains no records which should be located in + the partition being attached. This operation will be performed whilst + holding an <literal>ACCESS EXCLUSIVE</literal> lock on the <literal> + DEFAULT</literal> partition. If the <literal>DEFAULT</literal> partition + is itself a partitioned table, then each of its partitions will be + recursively checked in the same way as the table being attached, as + mentioned above. + </para> + + <para> + As explained above, it is possible to create indexes on partitioned tables + so that they are applied automatically to the entire hierarchy. + This is very + convenient, as not only will the existing partitions become indexed, but + also any partitions that are created in the future will. One limitation is + that it's not possible to use the <literal>CONCURRENTLY</literal> + qualifier when creating such a partitioned index. To avoid long lock + times, it is possible to use <command>CREATE INDEX ON ONLY</command> + the partitioned table; such an index is marked invalid, and the partitions + do not get the index applied automatically. The indexes on partitions can + be created individually using <literal>CONCURRENTLY</literal>, and then + <firstterm>attached</firstterm> to the index on the parent using + <command>ALTER INDEX .. ATTACH PARTITION</command>. Once indexes for all + partitions are attached to the parent index, the parent index is marked + valid automatically. Example: +<programlisting> +CREATE INDEX measurement_usls_idx ON ONLY measurement (unitsales); + +CREATE INDEX CONCURRENTLY measurement_usls_200602_idx + ON measurement_y2006m02 (unitsales); +ALTER INDEX measurement_usls_idx + ATTACH PARTITION measurement_usls_200602_idx; +... +</programlisting> + + This technique can be used with <literal>UNIQUE</literal> and + <literal>PRIMARY KEY</literal> constraints too; the indexes are created + implicitly when the constraint is created. Example: +<programlisting> +ALTER TABLE ONLY measurement ADD UNIQUE (city_id, logdate); + +ALTER TABLE measurement_y2006m02 ADD UNIQUE (city_id, logdate); +ALTER INDEX measurement_city_id_logdate_key + ATTACH PARTITION measurement_y2006m02_city_id_logdate_key; +... +</programlisting> + </para> + </sect3> + + <sect3 id="ddl-partitioning-declarative-limitations"> + <title>Limitations</title> + + <para> + The following limitations apply to partitioned tables: + <itemizedlist> + <listitem> + <para> + To create a unique or primary key constraint on a partitioned table, + the partition keys must not include any expressions or function calls + and the constraint's columns must include all of the partition key + columns. This limitation exists because the individual indexes making + up the constraint can only directly enforce uniqueness within their own + partitions; therefore, the partition structure itself must guarantee + that there are not duplicates in different partitions. + </para> + </listitem> + + <listitem> + <para> + There is no way to create an exclusion constraint spanning the + whole partitioned table. It is only possible to put such a + constraint on each leaf partition individually. Again, this + limitation stems from not being able to enforce cross-partition + restrictions. + </para> + </listitem> + + <listitem> + <para> + <literal>BEFORE ROW</literal> triggers on <literal>INSERT</literal> + cannot change which partition is the final destination for a new row. + </para> + </listitem> + + <listitem> + <para> + Mixing temporary and permanent relations in the same partition tree is + not allowed. Hence, if the partitioned table is permanent, so must be + its partitions and likewise if the partitioned table is temporary. When + using temporary relations, all members of the partition tree have to be + from the same session. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Individual partitions are linked to their partitioned table using + inheritance behind-the-scenes. However, it is not possible to use + all of the generic features of inheritance with declaratively + partitioned tables or their partitions, as discussed below. Notably, + a partition cannot have any parents other than the partitioned table + it is a partition of, nor can a table inherit from both a partitioned + table and a regular table. That means partitioned tables and their + partitions never share an inheritance hierarchy with regular tables. + </para> + + <para> + Since a partition hierarchy consisting of the partitioned table and its + partitions is still an inheritance hierarchy, + <structfield>tableoid</structfield> and all the normal rules of + inheritance apply as described in <xref linkend="ddl-inherit"/>, with + a few exceptions: + + <itemizedlist> + <listitem> + <para> + Partitions cannot have columns that are not present in the parent. It + is not possible to specify columns when creating partitions with + <command>CREATE TABLE</command>, nor is it possible to add columns to + partitions after-the-fact using <command>ALTER TABLE</command>. + Tables may be added as a partition with <command>ALTER TABLE + ... ATTACH PARTITION</command> only if their columns exactly match + the parent. + </para> + </listitem> + + <listitem> + <para> + Both <literal>CHECK</literal> and <literal>NOT NULL</literal> + constraints of a partitioned table are always inherited by all its + partitions. <literal>CHECK</literal> constraints that are marked + <literal>NO INHERIT</literal> are not allowed to be created on + partitioned tables. + You cannot drop a <literal>NOT NULL</literal> constraint on a + partition's column if the same constraint is present in the parent + table. + </para> + </listitem> + + <listitem> + <para> + Using <literal>ONLY</literal> to add or drop a constraint on only + the partitioned table is supported as long as there are no + partitions. Once partitions exist, using <literal>ONLY</literal> + will result in an error for any constraints other than + <literal>UNIQUE</literal> and <literal>PRIMARY KEY</literal>. + Instead, constraints on the partitions + themselves can be added and (if they are not present in the parent + table) dropped. + </para> + </listitem> + + <listitem> + <para> + As a partitioned table does not have any data itself, attempts to use + <command>TRUNCATE</command> <literal>ONLY</literal> on a partitioned + table will always return an error. + </para> + </listitem> + </itemizedlist> + </para> + </sect3> + </sect2> + + <sect2 id="ddl-partitioning-using-inheritance"> + <title>Partitioning Using Inheritance</title> + + <para> + While the built-in declarative partitioning is suitable for most + common use cases, there are some circumstances where a more flexible + approach may be useful. Partitioning can be implemented using table + inheritance, which allows for several features not supported + by declarative partitioning, such as: + + <itemizedlist> + <listitem> + <para> + For declarative partitioning, partitions must have exactly the same set + of columns as the partitioned table, whereas with table inheritance, + child tables may have extra columns not present in the parent. + </para> + </listitem> + + <listitem> + <para> + Table inheritance allows for multiple inheritance. + </para> + </listitem> + + <listitem> + <para> + Declarative partitioning only supports range, list and hash + partitioning, whereas table inheritance allows data to be divided in a + manner of the user's choosing. (Note, however, that if constraint + exclusion is unable to prune child tables effectively, query performance + might be poor.) + </para> + </listitem> + </itemizedlist> + </para> + + <sect3 id="ddl-partitioning-inheritance-example"> + <title>Example</title> + + <para> + This example builds a partitioning structure equivalent to the + declarative partitioning example above. Use + the following steps: + + <orderedlist spacing="compact"> + <listitem> + <para> + Create the <quote>root</quote> table, from which all of the + <quote>child</quote> tables will inherit. This table will contain no data. Do not + define any check constraints on this table, unless you intend them + to be applied equally to all child tables. There is no point in + defining any indexes or unique constraints on it, either. For our + example, the root table is the <structname>measurement</structname> + table as originally defined: + +<programlisting> +CREATE TABLE measurement ( + city_id int not null, + logdate date not null, + peaktemp int, + unitsales int +); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + Create several <quote>child</quote> tables that each inherit from + the root table. Normally, these tables will not add any columns + to the set inherited from the root. Just as with declarative + partitioning, these tables are in every way normal + <productname>PostgreSQL</productname> tables (or foreign tables). + </para> + + <para> +<programlisting> +CREATE TABLE measurement_y2006m02 () INHERITS (measurement); +CREATE TABLE measurement_y2006m03 () INHERITS (measurement); +... +CREATE TABLE measurement_y2007m11 () INHERITS (measurement); +CREATE TABLE measurement_y2007m12 () INHERITS (measurement); +CREATE TABLE measurement_y2008m01 () INHERITS (measurement); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + Add non-overlapping table constraints to the child tables to + define the allowed key values in each. + </para> + + <para> + Typical examples would be: +<programlisting> +CHECK ( x = 1 ) +CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' )) +CHECK ( outletID >= 100 AND outletID < 200 ) +</programlisting> + Ensure that the constraints guarantee that there is no overlap + between the key values permitted in different child tables. A common + mistake is to set up range constraints like: +<programlisting> +CHECK ( outletID BETWEEN 100 AND 200 ) +CHECK ( outletID BETWEEN 200 AND 300 ) +</programlisting> + This is wrong since it is not clear which child table the key + value 200 belongs in. + Instead, ranges should be defined in this style: + +<programlisting> +CREATE TABLE measurement_y2006m02 ( + CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ) +) INHERITS (measurement); + +CREATE TABLE measurement_y2006m03 ( + CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' ) +) INHERITS (measurement); + +... +CREATE TABLE measurement_y2007m11 ( + CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' ) +) INHERITS (measurement); + +CREATE TABLE measurement_y2007m12 ( + CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' ) +) INHERITS (measurement); + +CREATE TABLE measurement_y2008m01 ( + CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' ) +) INHERITS (measurement); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + For each child table, create an index on the key column(s), + as well as any other indexes you might want. +<programlisting> +CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate); +CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate); +CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate); +CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate); +CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate); +</programlisting> + </para> + </listitem> + + <listitem> + <para> + We want our application to be able to say <literal>INSERT INTO + measurement ...</literal> and have the data be redirected into the + appropriate child table. We can arrange that by attaching + a suitable trigger function to the root table. + If data will be added only to the latest child, we can + use a very simple trigger function: + +<programlisting> +CREATE OR REPLACE FUNCTION measurement_insert_trigger() +RETURNS TRIGGER AS $$ +BEGIN + INSERT INTO measurement_y2008m01 VALUES (NEW.*); + RETURN NULL; +END; +$$ +LANGUAGE plpgsql; +</programlisting> + </para> + + <para> + After creating the function, we create a trigger which + calls the trigger function: + +<programlisting> +CREATE TRIGGER insert_measurement_trigger + BEFORE INSERT ON measurement + FOR EACH ROW EXECUTE FUNCTION measurement_insert_trigger(); +</programlisting> + + We must redefine the trigger function each month so that it always + inserts into the current child table. The trigger definition does + not need to be updated, however. + </para> + + <para> + We might want to insert data and have the server automatically + locate the child table into which the row should be added. We + could do this with a more complex trigger function, for example: + +<programlisting> +CREATE OR REPLACE FUNCTION measurement_insert_trigger() +RETURNS TRIGGER AS $$ +BEGIN + IF ( NEW.logdate >= DATE '2006-02-01' AND + NEW.logdate < DATE '2006-03-01' ) THEN + INSERT INTO measurement_y2006m02 VALUES (NEW.*); + ELSIF ( NEW.logdate >= DATE '2006-03-01' AND + NEW.logdate < DATE '2006-04-01' ) THEN + INSERT INTO measurement_y2006m03 VALUES (NEW.*); + ... + ELSIF ( NEW.logdate >= DATE '2008-01-01' AND + NEW.logdate < DATE '2008-02-01' ) THEN + INSERT INTO measurement_y2008m01 VALUES (NEW.*); + ELSE + RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!'; + END IF; + RETURN NULL; +END; +$$ +LANGUAGE plpgsql; +</programlisting> + + The trigger definition is the same as before. + Note that each <literal>IF</literal> test must exactly match the + <literal>CHECK</literal> constraint for its child table. + </para> + + <para> + While this function is more complex than the single-month case, + it doesn't need to be updated as often, since branches can be + added in advance of being needed. + </para> + + <note> + <para> + In practice, it might be best to check the newest child first, + if most inserts go into that child. For simplicity, we have + shown the trigger's tests in the same order as in other parts + of this example. + </para> + </note> + + <para> + A different approach to redirecting inserts into the appropriate + child table is to set up rules, instead of a trigger, on the + root table. For example: + +<programlisting> +CREATE RULE measurement_insert_y2006m02 AS +ON INSERT TO measurement WHERE + ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' ) +DO INSTEAD + INSERT INTO measurement_y2006m02 VALUES (NEW.*); +... +CREATE RULE measurement_insert_y2008m01 AS +ON INSERT TO measurement WHERE + ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' ) +DO INSTEAD + INSERT INTO measurement_y2008m01 VALUES (NEW.*); +</programlisting> + + A rule has significantly more overhead than a trigger, but the + overhead is paid once per query rather than once per row, so this + method might be advantageous for bulk-insert situations. In most + cases, however, the trigger method will offer better performance. + </para> + + <para> + Be aware that <command>COPY</command> ignores rules. If you want to + use <command>COPY</command> to insert data, you'll need to copy into the + correct child table rather than directly into the root. <command>COPY</command> + does fire triggers, so you can use it normally if you use the trigger + approach. + </para> + + <para> + Another disadvantage of the rule approach is that there is no simple + way to force an error if the set of rules doesn't cover the insertion + date; the data will silently go into the root table instead. + </para> + </listitem> + + <listitem> + <para> + Ensure that the <xref linkend="guc-constraint-exclusion"/> + configuration parameter is not disabled in + <filename>postgresql.conf</filename>; otherwise + child tables may be accessed unnecessarily. + </para> + </listitem> + </orderedlist> + </para> + + <para> + As we can see, a complex table hierarchy could require a + substantial amount of DDL. In the above example we would be creating + a new child table each month, so it might be wise to write a script that + generates the required DDL automatically. + </para> + </sect3> + + <sect3 id="ddl-partitioning-inheritance-maintenance"> + <title>Maintenance for Inheritance Partitioning</title> + <para> + To remove old data quickly, simply drop the child table that is no longer + necessary: +<programlisting> +DROP TABLE measurement_y2006m02; +</programlisting> + </para> + + <para> + To remove the child table from the inheritance hierarchy table but retain access to + it as a table in its own right: + +<programlisting> +ALTER TABLE measurement_y2006m02 NO INHERIT measurement; +</programlisting> + </para> + + <para> + To add a new child table to handle new data, create an empty child table + just as the original children were created above: + +<programlisting> +CREATE TABLE measurement_y2008m02 ( + CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' ) +) INHERITS (measurement); +</programlisting> + + Alternatively, one may want to create and populate the new child table + before adding it to the table hierarchy. This could allow data to be + loaded, checked, and transformed before being made visible to queries on + the parent table. + +<programlisting> +CREATE TABLE measurement_y2008m02 + (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS); +ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02 + CHECK ( logdate >= DATE '2008-02-01' AND logdate < DATE '2008-03-01' ); +\copy measurement_y2008m02 from 'measurement_y2008m02' +-- possibly some other data preparation work +ALTER TABLE measurement_y2008m02 INHERIT measurement; +</programlisting> + </para> + </sect3> + + <sect3 id="ddl-partitioning-inheritance-caveats"> + <title>Caveats</title> + + <para> + The following caveats apply to partitioning implemented using + inheritance: + <itemizedlist> + <listitem> + <para> + There is no automatic way to verify that all of the + <literal>CHECK</literal> constraints are mutually + exclusive. It is safer to create code that generates + child tables and creates and/or modifies associated objects than + to write each by hand. + </para> + </listitem> + + <listitem> + <para> + Indexes and foreign key constraints apply to single tables and not + to their inheritance children, hence they have some + <link linkend="ddl-inherit-caveats">caveats</link> to be aware of. + </para> + </listitem> + + <listitem> + <para> + The schemes shown here assume that the values of a row's key column(s) + never change, or at least do not change enough to require it to move to another partition. + An <command>UPDATE</command> that attempts + to do that will fail because of the <literal>CHECK</literal> constraints. + If you need to handle such cases, you can put suitable update triggers + on the child tables, but it makes management of the structure + much more complicated. + </para> + </listitem> + + <listitem> + <para> + If you are using manual <command>VACUUM</command> or + <command>ANALYZE</command> commands, don't forget that + you need to run them on each child table individually. A command like: +<programlisting> +ANALYZE measurement; +</programlisting> + will only process the root table. + </para> + </listitem> + + <listitem> + <para> + <command>INSERT</command> statements with <literal>ON CONFLICT</literal> + clauses are unlikely to work as expected, as the <literal>ON CONFLICT</literal> + action is only taken in case of unique violations on the specified + target relation, not its child relations. + </para> + </listitem> + + <listitem> + <para> + Triggers or rules will be needed to route rows to the desired + child table, unless the application is explicitly aware of the + partitioning scheme. Triggers may be complicated to write, and will + be much slower than the tuple routing performed internally by + declarative partitioning. + </para> + </listitem> + </itemizedlist> + </para> + </sect3> + </sect2> + + <sect2 id="ddl-partition-pruning"> + <title>Partition Pruning</title> + + <indexterm> + <primary>partition pruning</primary> + </indexterm> + + <para> + <firstterm>Partition pruning</firstterm> is a query optimization technique + that improves performance for declaratively partitioned tables. + As an example: + +<programlisting> +SET enable_partition_pruning = on; -- the default +SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; +</programlisting> + + Without partition pruning, the above query would scan each of the + partitions of the <structname>measurement</structname> table. With + partition pruning enabled, the planner will examine the definition + of each partition and prove that the partition need not + be scanned because it could not contain any rows meeting the query's + <literal>WHERE</literal> clause. When the planner can prove this, it + excludes (<firstterm>prunes</firstterm>) the partition from the query + plan. + </para> + + <para> + By using the EXPLAIN command and the <xref + linkend="guc-enable-partition-pruning"/> configuration parameter, it's + possible to show the difference between a plan for which partitions have + been pruned and one for which they have not. A typical unoptimized + plan for this type of table setup is: +<programlisting> +SET enable_partition_pruning = off; +EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; + QUERY PLAN +-------------------------------------------------------------------&zwsp;---------------- + Aggregate (cost=188.76..188.77 rows=1 width=8) + -> Append (cost=0.00..181.05 rows=3085 width=0) + -> Seq Scan on measurement_y2006m02 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) + -> Seq Scan on measurement_y2006m03 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) +... + -> Seq Scan on measurement_y2007m11 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) + -> Seq Scan on measurement_y2007m12 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) + -> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) +</programlisting> + + Some or all of the partitions might use index scans instead of + full-table sequential scans, but the point here is that there + is no need to scan the older partitions at all to answer this query. + When we enable partition pruning, we get a significantly + cheaper plan that will deliver the same answer: +<programlisting> +SET enable_partition_pruning = on; +EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; + QUERY PLAN +-------------------------------------------------------------------&zwsp;---------------- + Aggregate (cost=37.75..37.76 rows=1 width=8) + -> Seq Scan on measurement_y2008m01 (cost=0.00..33.12 rows=617 width=0) + Filter: (logdate >= '2008-01-01'::date) +</programlisting> + </para> + + <para> + Note that partition pruning is driven only by the constraints defined + implicitly by the partition keys, not by the presence of indexes. + Therefore it isn't necessary to define indexes on the key columns. + Whether an index needs to be created for a given partition depends on + whether you expect that queries that scan the partition will + generally scan a large part of the partition or just a small part. + An index will be helpful in the latter case but not the former. + </para> + + <para> + Partition pruning can be performed not only during the planning of a + given query, but also during its execution. This is useful as it can + allow more partitions to be pruned when clauses contain expressions + whose values are not known at query planning time, for example, + parameters defined in a <command>PREPARE</command> statement, using a + value obtained from a subquery, or using a parameterized value on the + inner side of a nested loop join. Partition pruning during execution + can be performed at any of the following times: + + <itemizedlist> + <listitem> + <para> + During initialization of the query plan. Partition pruning can be + performed here for parameter values which are known during the + initialization phase of execution. Partitions which are pruned + during this stage will not show up in the query's + <command>EXPLAIN</command> or <command>EXPLAIN ANALYZE</command>. + It is possible to determine the number of partitions which were + removed during this phase by observing the + <quote>Subplans Removed</quote> property in the + <command>EXPLAIN</command> output. + </para> + </listitem> + + <listitem> + <para> + During actual execution of the query plan. Partition pruning may + also be performed here to remove partitions using values which are + only known during actual query execution. This includes values + from subqueries and values from execution-time parameters such as + those from parameterized nested loop joins. Since the value of + these parameters may change many times during the execution of the + query, partition pruning is performed whenever one of the + execution parameters being used by partition pruning changes. + Determining if partitions were pruned during this phase requires + careful inspection of the <literal>loops</literal> property in + the <command>EXPLAIN ANALYZE</command> output. Subplans + corresponding to different partitions may have different values + for it depending on how many times each of them was pruned during + execution. Some may be shown as <literal>(never executed)</literal> + if they were pruned every time. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Partition pruning can be disabled using the + <xref linkend="guc-enable-partition-pruning"/> setting. + </para> + </sect2> + + <sect2 id="ddl-partitioning-constraint-exclusion"> + <title>Partitioning and Constraint Exclusion</title> + + <indexterm> + <primary>constraint exclusion</primary> + </indexterm> + + <para> + <firstterm>Constraint exclusion</firstterm> is a query optimization + technique similar to partition pruning. While it is primarily used + for partitioning implemented using the legacy inheritance method, it can be + used for other purposes, including with declarative partitioning. + </para> + + <para> + Constraint exclusion works in a very similar way to partition + pruning, except that it uses each table's <literal>CHECK</literal> + constraints — which gives it its name — whereas partition + pruning uses the table's partition bounds, which exist only in the + case of declarative partitioning. Another difference is that + constraint exclusion is only applied at plan time; there is no attempt + to remove partitions at execution time. + </para> + + <para> + The fact that constraint exclusion uses <literal>CHECK</literal> + constraints, which makes it slow compared to partition pruning, can + sometimes be used as an advantage: because constraints can be defined + even on declaratively-partitioned tables, in addition to their internal + partition bounds, constraint exclusion may be able + to elide additional partitions from the query plan. + </para> + + <para> + The default (and recommended) setting of + <xref linkend="guc-constraint-exclusion"/> is neither + <literal>on</literal> nor <literal>off</literal>, but an intermediate setting + called <literal>partition</literal>, which causes the technique to be + applied only to queries that are likely to be working on inheritance partitioned + tables. The <literal>on</literal> setting causes the planner to examine + <literal>CHECK</literal> constraints in all queries, even simple ones that + are unlikely to benefit. + </para> + + <para> + The following caveats apply to constraint exclusion: + + <itemizedlist> + <listitem> + <para> + Constraint exclusion is only applied during query planning, unlike + partition pruning, which can also be applied during query execution. + </para> + </listitem> + + <listitem> + <para> + Constraint exclusion only works when the query's <literal>WHERE</literal> + clause contains constants (or externally supplied parameters). + For example, a comparison against a non-immutable function such as + <function>CURRENT_TIMESTAMP</function> cannot be optimized, since the + planner cannot know which child table the function's value might fall + into at run time. + </para> + </listitem> + + <listitem> + <para> + Keep the partitioning constraints simple, else the planner may not be + able to prove that child tables might not need to be visited. Use simple + equality conditions for list partitioning, or simple + range tests for range partitioning, as illustrated in the preceding + examples. A good rule of thumb is that partitioning constraints should + contain only comparisons of the partitioning column(s) to constants + using B-tree-indexable operators, because only B-tree-indexable + column(s) are allowed in the partition key. + </para> + </listitem> + + <listitem> + <para> + All constraints on all children of the parent table are examined + during constraint exclusion, so large numbers of children are likely + to increase query planning time considerably. So the legacy + inheritance based partitioning will work well with up to perhaps a + hundred child tables; don't try to use many thousands of children. + </para> + </listitem> + + </itemizedlist> + </para> + </sect2> + + <sect2 id="ddl-partitioning-declarative-best-practices"> + <title>Best Practices for Declarative Partitioning</title> + + <para> + The choice of how to partition a table should be made carefully, as the + performance of query planning and execution can be negatively affected by + poor design. + </para> + + <para> + One of the most critical design decisions will be the column or columns + by which you partition your data. Often the best choice will be to + partition by the column or set of columns which most commonly appear in + <literal>WHERE</literal> clauses of queries being executed on the + partitioned table. <literal>WHERE</literal> clauses that are compatible + with the partition bound constraints can be used to prune unneeded + partitions. However, you may be forced into making other decisions by + requirements for the <literal>PRIMARY KEY</literal> or a + <literal>UNIQUE</literal> constraint. Removal of unwanted data is also a + factor to consider when planning your partitioning strategy. An entire + partition can be detached fairly quickly, so it may be beneficial to + design the partition strategy in such a way that all data to be removed + at once is located in a single partition. + </para> + + <para> + Choosing the target number of partitions that the table should be divided + into is also a critical decision to make. Not having enough partitions + may mean that indexes remain too large and that data locality remains poor + which could result in low cache hit ratios. However, dividing the table + into too many partitions can also cause issues. Too many partitions can + mean longer query planning times and higher memory consumption during both + query planning and execution, as further described below. + When choosing how to partition your table, + it's also important to consider what changes may occur in the future. For + example, if you choose to have one partition per customer and you + currently have a small number of large customers, consider the + implications if in several years you instead find yourself with a large + number of small customers. In this case, it may be better to choose to + partition by <literal>HASH</literal> and choose a reasonable number of + partitions rather than trying to partition by <literal>LIST</literal> and + hoping that the number of customers does not increase beyond what it is + practical to partition the data by. + </para> + + <para> + Sub-partitioning can be useful to further divide partitions that are + expected to become larger than other partitions. + Another option is to use range partitioning with multiple columns in + the partition key. + Either of these can easily lead to excessive numbers of partitions, + so restraint is advisable. + </para> + + <para> + It is important to consider the overhead of partitioning during + query planning and execution. The query planner is generally able to + handle partition hierarchies with up to a few thousand partitions fairly + well, provided that typical queries allow the query planner to prune all + but a small number of partitions. Planning times become longer and memory + consumption becomes higher when more partitions remain after the planner + performs partition pruning. Another + reason to be concerned about having a large number of partitions is that + the server's memory consumption may grow significantly over + time, especially if many sessions touch large numbers of partitions. + That's because each partition requires its metadata to be loaded into the + local memory of each session that touches it. + </para> + + <para> + With data warehouse type workloads, it can make sense to use a larger + number of partitions than with an <acronym>OLTP</acronym> type workload. + Generally, in data warehouses, query planning time is less of a concern as + the majority of processing time is spent during query execution. With + either of these two types of workload, it is important to make the right + decisions early, as re-partitioning large quantities of data can be + painfully slow. Simulations of the intended workload are often beneficial + for optimizing the partitioning strategy. Never just assume that more + partitions are better than fewer partitions, nor vice-versa. + </para> + </sect2> + + </sect1> + + <sect1 id="ddl-foreign-data"> + <title>Foreign Data</title> + + <indexterm> + <primary>foreign data</primary> + </indexterm> + <indexterm> + <primary>foreign table</primary> + </indexterm> + <indexterm> + <primary>user mapping</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> implements portions of the SQL/MED + specification, allowing you to access data that resides outside + PostgreSQL using regular SQL queries. Such data is referred to as + <firstterm>foreign data</firstterm>. (Note that this usage is not to be confused + with foreign keys, which are a type of constraint within the database.) + </para> + + <para> + Foreign data is accessed with help from a + <firstterm>foreign data wrapper</firstterm>. A foreign data wrapper is a + library that can communicate with an external data source, hiding the + details of connecting to the data source and obtaining data from it. + There are some foreign data wrappers available as <filename>contrib</filename> + modules; see <xref linkend="contrib"/>. Other kinds of foreign data + wrappers might be found as third party products. If none of the existing + foreign data wrappers suit your needs, you can write your own; see <xref + linkend="fdwhandler"/>. + </para> + + <para> + To access foreign data, you need to create a <firstterm>foreign server</firstterm> + object, which defines how to connect to a particular external data source + according to the set of options used by its supporting foreign data + wrapper. Then you need to create one or more <firstterm>foreign + tables</firstterm>, which define the structure of the remote data. A + foreign table can be used in queries just like a normal table, but a + foreign table has no storage in the PostgreSQL server. Whenever it is + used, <productname>PostgreSQL</productname> asks the foreign data wrapper + to fetch data from the external source, or transmit data to the external + source in the case of update commands. + </para> + + <para> + Accessing remote data may require authenticating to the external + data source. This information can be provided by a + <firstterm>user mapping</firstterm>, which can provide additional data + such as user names and passwords based + on the current <productname>PostgreSQL</productname> role. + </para> + + <para> + For additional information, see + <xref linkend="sql-createforeigndatawrapper"/>, + <xref linkend="sql-createserver"/>, + <xref linkend="sql-createusermapping"/>, + <xref linkend="sql-createforeigntable"/>, and + <xref linkend="sql-importforeignschema"/>. + </para> + </sect1> + + <sect1 id="ddl-others"> + <title>Other Database Objects</title> + + <para> + Tables are the central objects in a relational database structure, + because they hold your data. But they are not the only objects + that exist in a database. Many other kinds of objects can be + created to make the use and management of the data more efficient + or convenient. They are not discussed in this chapter, but we give + you a list here so that you are aware of what is possible: + </para> + + <itemizedlist> + <listitem> + <para> + Views + </para> + </listitem> + + <listitem> + <para> + Functions, procedures, and operators + </para> + </listitem> + + <listitem> + <para> + Data types and domains + </para> + </listitem> + + <listitem> + <para> + Triggers and rewrite rules + </para> + </listitem> + </itemizedlist> + + <para> + Detailed information on + these topics appears in <xref linkend="server-programming"/>. + </para> + </sect1> + + <sect1 id="ddl-depend"> + <title>Dependency Tracking</title> + + <indexterm zone="ddl-depend"> + <primary>CASCADE</primary> + <secondary sortas="DROP">with DROP</secondary> + </indexterm> + + <indexterm zone="ddl-depend"> + <primary>RESTRICT</primary> + <secondary sortas="DROP">with DROP</secondary> + </indexterm> + + <para> + When you create complex database structures involving many tables + with foreign key constraints, views, triggers, functions, etc. you + implicitly create a net of dependencies between the objects. + For instance, a table with a foreign key constraint depends on the + table it references. + </para> + + <para> + To ensure the integrity of the entire database structure, + <productname>PostgreSQL</productname> makes sure that you cannot + drop objects that other objects still depend on. For example, + attempting to drop the products table we considered in <xref + linkend="ddl-constraints-fk"/>, with the orders table depending on + it, would result in an error message like this: +<screen> +DROP TABLE products; + +ERROR: cannot drop table products because other objects depend on it +DETAIL: constraint orders_product_no_fkey on table orders depends on table products +HINT: Use DROP ... CASCADE to drop the dependent objects too. +</screen> + The error message contains a useful hint: if you do not want to + bother deleting all the dependent objects individually, you can run: +<screen> +DROP TABLE products CASCADE; +</screen> + and all the dependent objects will be removed, as will any objects + that depend on them, recursively. In this case, it doesn't remove + the orders table, it only removes the foreign key constraint. + It stops there because nothing depends on the foreign key constraint. + (If you want to check what <command>DROP ... CASCADE</command> will do, + run <command>DROP</command> without <literal>CASCADE</literal> and read the + <literal>DETAIL</literal> output.) + </para> + + <para> + Almost all <command>DROP</command> commands in <productname>PostgreSQL</productname> support + specifying <literal>CASCADE</literal>. Of course, the nature of + the possible dependencies varies with the type of the object. You + can also write <literal>RESTRICT</literal> instead of + <literal>CASCADE</literal> to get the default behavior, which is to + prevent dropping objects that any other objects depend on. + </para> + + <note> + <para> + According to the SQL standard, specifying either + <literal>RESTRICT</literal> or <literal>CASCADE</literal> is + required in a <command>DROP</command> command. No database system actually + enforces that rule, but whether the default behavior + is <literal>RESTRICT</literal> or <literal>CASCADE</literal> varies + across systems. + </para> + </note> + + <para> + If a <command>DROP</command> command lists multiple + objects, <literal>CASCADE</literal> is only required when there are + dependencies outside the specified group. For example, when saying + <literal>DROP TABLE tab1, tab2</literal> the existence of a foreign + key referencing <literal>tab1</literal> from <literal>tab2</literal> would not mean + that <literal>CASCADE</literal> is needed to succeed. + </para> + + <para> + For a user-defined function or procedure whose body is defined as a string + literal, <productname>PostgreSQL</productname> tracks + dependencies associated with the function's externally-visible properties, + such as its argument and result types, but <emphasis>not</emphasis> dependencies + that could only be known by examining the function body. As an example, + consider this situation: + +<programlisting> +CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', + 'green', 'blue', 'purple'); + +CREATE TABLE my_colors (color rainbow, note text); + +CREATE FUNCTION get_color_note (rainbow) RETURNS text AS + 'SELECT note FROM my_colors WHERE color = $1' + LANGUAGE SQL; +</programlisting> + + (See <xref linkend="xfunc-sql"/> for an explanation of SQL-language + functions.) <productname>PostgreSQL</productname> will be aware that + the <function>get_color_note</function> function depends on the <type>rainbow</type> + type: dropping the type would force dropping the function, because its + argument type would no longer be defined. But <productname>PostgreSQL</productname> + will not consider <function>get_color_note</function> to depend on + the <structname>my_colors</structname> table, and so will not drop the function if + the table is dropped. While there are disadvantages to this approach, + there are also benefits. The function is still valid in some sense if the + table is missing, though executing it would cause an error; creating a new + table of the same name would allow the function to work again. + </para> + + <para> + On the other hand, for a SQL-language function or procedure whose body + is written in SQL-standard style, the body is parsed at function + definition time and all dependencies recognized by the parser are + stored. Thus, if we write the function above as + +<programlisting> +CREATE FUNCTION get_color_note (rainbow) RETURNS text +BEGIN ATOMIC + SELECT note FROM my_colors WHERE color = $1; +END; +</programlisting> + + then the function's dependency on the <structname>my_colors</structname> + table will be known and enforced by <command>DROP</command>. + </para> + </sect1> + +</chapter> |