diff options
Diffstat (limited to 'doc/src/sgml/rowtypes.sgml')
-rw-r--r-- | doc/src/sgml/rowtypes.sgml | 540 |
1 files changed, 540 insertions, 0 deletions
diff --git a/doc/src/sgml/rowtypes.sgml b/doc/src/sgml/rowtypes.sgml new file mode 100644 index 0000000..417ccb0 --- /dev/null +++ b/doc/src/sgml/rowtypes.sgml @@ -0,0 +1,540 @@ +<!-- doc/src/sgml/rowtypes.sgml --> + +<sect1 id="rowtypes"> + <title>Composite Types</title> + + <indexterm> + <primary>composite type</primary> + </indexterm> + + <indexterm> + <primary>row type</primary> + </indexterm> + + <para> + A <firstterm>composite type</firstterm> represents the structure of a row or record; + it is essentially just a list of field names and their data types. + <productname>PostgreSQL</productname> allows composite types to be + used in many of the same ways that simple types can be used. For example, a + column of a table can be declared to be of a composite type. + </para> + + <sect2 id="rowtypes-declaring"> + <title>Declaration of Composite Types</title> + + <para> + Here are two simple examples of defining composite types: +<programlisting> +CREATE TYPE complex AS ( + r double precision, + i double precision +); + +CREATE TYPE inventory_item AS ( + name text, + supplier_id integer, + price numeric +); +</programlisting> + The syntax is comparable to <command>CREATE TABLE</command>, except that only + field names and types can be specified; no constraints (such as <literal>NOT + NULL</literal>) can presently be included. Note that the <literal>AS</literal> keyword + is essential; without it, the system will think a different kind + of <command>CREATE TYPE</command> command is meant, and you will get odd syntax + errors. + </para> + + <para> + Having defined the types, we can use them to create tables: + +<programlisting> +CREATE TABLE on_hand ( + item inventory_item, + count integer +); + +INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000); +</programlisting> + + or functions: + +<programlisting> +CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric +AS 'SELECT $1.price * $2' LANGUAGE SQL; + +SELECT price_extension(item, 10) FROM on_hand; +</programlisting> + + </para> + + <para> + Whenever you create a table, a composite type is also automatically + created, with the same name as the table, to represent the table's + row type. For example, had we said: +<programlisting> +CREATE TABLE inventory_item ( + name text, + supplier_id integer REFERENCES suppliers, + price numeric CHECK (price > 0) +); +</programlisting> + then the same <literal>inventory_item</literal> composite type shown above would + come into being as a + byproduct, and could be used just as above. Note however an important + restriction of the current implementation: since no constraints are + associated with a composite type, the constraints shown in the table + definition <emphasis>do not apply</emphasis> to values of the composite type + outside the table. (To work around this, create a + <glossterm linkend="glossary-domain">domain</glossterm> over the composite + type, and apply the desired constraints as <literal>CHECK</literal> + constraints of the domain.) + </para> + </sect2> + + <sect2> + <title>Constructing Composite Values</title> + + <indexterm> + <primary>composite type</primary> + <secondary>constant</secondary> + </indexterm> + + <para> + To write a composite value as a literal constant, enclose the field + values within parentheses and separate them by commas. You can put double + quotes around any field value, and must do so if it contains commas or + parentheses. (More details appear <link + linkend="rowtypes-io-syntax">below</link>.) Thus, the general format of + a composite constant is the following: +<synopsis> +'( <replaceable>val1</replaceable> , <replaceable>val2</replaceable> , ... )' +</synopsis> + An example is: +<programlisting> +'("fuzzy dice",42,1.99)' +</programlisting> + which would be a valid value of the <literal>inventory_item</literal> type + defined above. To make a field be NULL, write no characters at all + in its position in the list. For example, this constant specifies + a NULL third field: +<programlisting> +'("fuzzy dice",42,)' +</programlisting> + If you want an empty string rather than NULL, write double quotes: +<programlisting> +'("",42,)' +</programlisting> + Here the first field is a non-NULL empty string, the third is NULL. + </para> + + <para> + (These constants are actually only a special case of + the generic type constants discussed in <xref + linkend="sql-syntax-constants-generic"/>. The constant is initially + treated as a string and passed to the composite-type input conversion + routine. An explicit type specification might be necessary to tell + which type to convert the constant to.) + </para> + + <para> + The <literal>ROW</literal> expression syntax can also be used to + construct composite values. In most cases this is considerably + simpler to use than the string-literal syntax since you don't have + to worry about multiple layers of quoting. We already used this + method above: +<programlisting> +ROW('fuzzy dice', 42, 1.99) +ROW('', 42, NULL) +</programlisting> + The ROW keyword is actually optional as long as you have more than one + field in the expression, so these can be simplified to: +<programlisting> +('fuzzy dice', 42, 1.99) +('', 42, NULL) +</programlisting> + The <literal>ROW</literal> expression syntax is discussed in more detail in <xref + linkend="sql-syntax-row-constructors"/>. + </para> + </sect2> + + <sect2 id="rowtypes-accessing"> + <title>Accessing Composite Types</title> + + <para> + To access a field of a composite column, one writes a dot and the field + name, much like selecting a field from a table name. In fact, it's so + much like selecting from a table name that you often have to use parentheses + to keep from confusing the parser. For example, you might try to select + some subfields from our <literal>on_hand</literal> example table with something + like: + +<programlisting> +SELECT item.name FROM on_hand WHERE item.price > 9.99; +</programlisting> + + This will not work since the name <literal>item</literal> is taken to be a table + name, not a column name of <literal>on_hand</literal>, per SQL syntax rules. + You must write it like this: + +<programlisting> +SELECT (item).name FROM on_hand WHERE (item).price > 9.99; +</programlisting> + + or if you need to use the table name as well (for instance in a multitable + query), like this: + +<programlisting> +SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99; +</programlisting> + + Now the parenthesized object is correctly interpreted as a reference to + the <literal>item</literal> column, and then the subfield can be selected from it. + </para> + + <para> + Similar syntactic issues apply whenever you select a field from a composite + value. For instance, to select just one field from the result of a function + that returns a composite value, you'd need to write something like: + +<programlisting> +SELECT (my_func(...)).field FROM ... +</programlisting> + + Without the extra parentheses, this will generate a syntax error. + </para> + + <para> + The special field name <literal>*</literal> means <quote>all fields</quote>, as + further explained in <xref linkend="rowtypes-usage"/>. + </para> + </sect2> + + <sect2> + <title>Modifying Composite Types</title> + + <para> + Here are some examples of the proper syntax for inserting and updating + composite columns. + First, inserting or updating a whole column: + +<programlisting> +INSERT INTO mytab (complex_col) VALUES((1.1,2.2)); + +UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...; +</programlisting> + + The first example omits <literal>ROW</literal>, the second uses it; we + could have done it either way. + </para> + + <para> + We can update an individual subfield of a composite column: + +<programlisting> +UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...; +</programlisting> + + Notice here that we don't need to (and indeed cannot) + put parentheses around the column name appearing just after + <literal>SET</literal>, but we do need parentheses when referencing the same + column in the expression to the right of the equal sign. + </para> + + <para> + And we can specify subfields as targets for <command>INSERT</command>, too: + +<programlisting> +INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2); +</programlisting> + + Had we not supplied values for all the subfields of the column, the + remaining subfields would have been filled with null values. + </para> + </sect2> + + <sect2 id="rowtypes-usage"> + <title>Using Composite Types in Queries</title> + + <para> + There are various special syntax rules and behaviors associated with + composite types in queries. These rules provide useful shortcuts, + but can be confusing if you don't know the logic behind them. + </para> + + <para> + In <productname>PostgreSQL</productname>, a reference to a table name (or alias) + in a query is effectively a reference to the composite value of the + table's current row. For example, if we had a table + <structname>inventory_item</structname> as shown + <link linkend="rowtypes-declaring">above</link>, we could write: +<programlisting> +SELECT c FROM inventory_item c; +</programlisting> + This query produces a single composite-valued column, so we might get + output like: +<programlisting> + c +------------------------ + ("fuzzy dice",42,1.99) +(1 row) +</programlisting> + Note however that simple names are matched to column names before table + names, so this example works only because there is no column + named <structfield>c</structfield> in the query's tables. + </para> + + <para> + The ordinary qualified-column-name + syntax <replaceable>table_name</replaceable><literal>.</literal><replaceable>column_name</replaceable> + can be understood as applying <link linkend="field-selection">field + selection</link> to the composite value of the table's current row. + (For efficiency reasons, it's not actually implemented that way.) + </para> + + <para> + When we write +<programlisting> +SELECT c.* FROM inventory_item c; +</programlisting> + then, according to the SQL standard, we should get the contents of the + table expanded into separate columns: +<programlisting> + name | supplier_id | price +------------+-------------+------- + fuzzy dice | 42 | 1.99 +(1 row) +</programlisting> + as if the query were +<programlisting> +SELECT c.name, c.supplier_id, c.price FROM inventory_item c; +</programlisting> + <productname>PostgreSQL</productname> will apply this expansion behavior to + any composite-valued expression, although as shown <link + linkend="rowtypes-accessing">above</link>, you need to write parentheses + around the value that <literal>.*</literal> is applied to whenever it's not a + simple table name. For example, if <function>myfunc()</function> is a function + returning a composite type with columns <structfield>a</structfield>, + <structfield>b</structfield>, and <structfield>c</structfield>, then these two queries have the + same result: +<programlisting> +SELECT (myfunc(x)).* FROM some_table; +SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table; +</programlisting> + </para> + + <tip> + <para> + <productname>PostgreSQL</productname> handles column expansion by + actually transforming the first form into the second. So, in this + example, <function>myfunc()</function> would get invoked three times per row + with either syntax. If it's an expensive function you may wish to + avoid that, which you can do with a query like: +<programlisting> +SELECT m.* FROM some_table, LATERAL myfunc(x) AS m; +</programlisting> + Placing the function in + a <literal>LATERAL</literal> <literal>FROM</literal> item keeps it from + being invoked more than once per row. <literal>m.*</literal> is still + expanded into <literal>m.a, m.b, m.c</literal>, but now those variables + are just references to the output of the <literal>FROM</literal> item. + (The <literal>LATERAL</literal> keyword is optional here, but we show it + to clarify that the function is getting <structfield>x</structfield> + from <structname>some_table</structname>.) + </para> + </tip> + + <para> + The <replaceable>composite_value</replaceable><literal>.*</literal> syntax results in + column expansion of this kind when it appears at the top level of + a <link linkend="queries-select-lists"><command>SELECT</command> output + list</link>, a <link linkend="dml-returning"><literal>RETURNING</literal> + list</link> in <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>, + a <link linkend="queries-values"><literal>VALUES</literal> clause</link>, or + a <link linkend="sql-syntax-row-constructors">row constructor</link>. + In all other contexts (including when nested inside one of those + constructs), attaching <literal>.*</literal> to a composite value does not + change the value, since it means <quote>all columns</quote> and so the + same composite value is produced again. For example, + if <function>somefunc()</function> accepts a composite-valued argument, + these queries are the same: + +<programlisting> +SELECT somefunc(c.*) FROM inventory_item c; +SELECT somefunc(c) FROM inventory_item c; +</programlisting> + + In both cases, the current row of <structname>inventory_item</structname> is + passed to the function as a single composite-valued argument. + Even though <literal>.*</literal> does nothing in such cases, using it is good + style, since it makes clear that a composite value is intended. In + particular, the parser will consider <literal>c</literal> in <literal>c.*</literal> to + refer to a table name or alias, not to a column name, so that there is + no ambiguity; whereas without <literal>.*</literal>, it is not clear + whether <literal>c</literal> means a table name or a column name, and in fact + the column-name interpretation will be preferred if there is a column + named <literal>c</literal>. + </para> + + <para> + Another example demonstrating these concepts is that all these queries + mean the same thing: +<programlisting> +SELECT * FROM inventory_item c ORDER BY c; +SELECT * FROM inventory_item c ORDER BY c.*; +SELECT * FROM inventory_item c ORDER BY ROW(c.*); +</programlisting> + All of these <literal>ORDER BY</literal> clauses specify the row's composite + value, resulting in sorting the rows according to the rules described + in <xref linkend="composite-type-comparison"/>. However, + if <structname>inventory_item</structname> contained a column + named <structfield>c</structfield>, the first case would be different from the + others, as it would mean to sort by that column only. Given the column + names previously shown, these queries are also equivalent to those above: +<programlisting> +SELECT * FROM inventory_item c ORDER BY ROW(c.name, c.supplier_id, c.price); +SELECT * FROM inventory_item c ORDER BY (c.name, c.supplier_id, c.price); +</programlisting> + (The last case uses a row constructor with the key word <literal>ROW</literal> + omitted.) + </para> + + <para> + Another special syntactical behavior associated with composite values is + that we can use <firstterm>functional notation</firstterm> for extracting a field + of a composite value. The simple way to explain this is that + the notations <literal><replaceable>field</replaceable>(<replaceable>table</replaceable>)</literal> + and <literal><replaceable>table</replaceable>.<replaceable>field</replaceable></literal> + are interchangeable. For example, these queries are equivalent: + +<programlisting> +SELECT c.name FROM inventory_item c WHERE c.price > 1000; +SELECT name(c) FROM inventory_item c WHERE price(c) > 1000; +</programlisting> + + Moreover, if we have a function that accepts a single argument of a + composite type, we can call it with either notation. These queries are + all equivalent: + +<programlisting> +SELECT somefunc(c) FROM inventory_item c; +SELECT somefunc(c.*) FROM inventory_item c; +SELECT c.somefunc FROM inventory_item c; +</programlisting> + </para> + + <para> + This equivalence between functional notation and field notation + makes it possible to use functions on composite types to implement + <quote>computed fields</quote>. + <indexterm> + <primary>computed field</primary> + </indexterm> + <indexterm> + <primary>field</primary> + <secondary>computed</secondary> + </indexterm> + An application using the last query above wouldn't need to be directly + aware that <literal>somefunc</literal> isn't a real column of the table. + </para> + + <tip> + <para> + Because of this behavior, it's unwise to give a function that takes a + single composite-type argument the same name as any of the fields of + that composite type. If there is ambiguity, the field-name + interpretation will be chosen if field-name syntax is used, while the + function will be chosen if function-call syntax is used. However, + <productname>PostgreSQL</productname> versions before 11 always chose the + field-name interpretation, unless the syntax of the call required it to + be a function call. One way to force the function interpretation in + older versions is to schema-qualify the function name, that is, write + <literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>compositevalue</replaceable>)</literal>. + </para> + </tip> + </sect2> + + <sect2 id="rowtypes-io-syntax"> + <title>Composite Type Input and Output Syntax</title> + + <para> + The external text representation of a composite value consists of items that + are interpreted according to the I/O conversion rules for the individual + field types, plus decoration that indicates the composite structure. + The decoration consists of parentheses (<literal>(</literal> and <literal>)</literal>) + around the whole value, plus commas (<literal>,</literal>) between adjacent + items. Whitespace outside the parentheses is ignored, but within the + parentheses it is considered part of the field value, and might or might not be + significant depending on the input conversion rules for the field data type. + For example, in: +<programlisting> +'( 42)' +</programlisting> + the whitespace will be ignored if the field type is integer, but not if + it is text. + </para> + + <para> + As shown previously, when writing a composite value you can write double + quotes around any individual field value. + You <emphasis>must</emphasis> do so if the field value would otherwise + confuse the composite-value parser. In particular, fields containing + parentheses, commas, double quotes, or backslashes must be double-quoted. + To put a double quote or backslash in a quoted composite field value, + precede it with a backslash. (Also, a pair of double quotes within a + double-quoted field value is taken to represent a double quote character, + analogously to the rules for single quotes in SQL literal strings.) + Alternatively, you can avoid quoting and use backslash-escaping to + protect all data characters + that would otherwise be taken as composite syntax. + </para> + + <para> + A completely empty field value (no characters at all between the commas + or parentheses) represents a NULL. To write a value that is an empty + string rather than NULL, write <literal>""</literal>. + </para> + + <para> + The composite output routine will put double quotes around field values + if they are empty strings or contain parentheses, commas, + double quotes, backslashes, or white space. (Doing so for white space + is not essential, but aids legibility.) Double quotes and backslashes + embedded in field values will be doubled. + </para> + + <note> + <para> + Remember that what you write in an SQL command will first be interpreted + as a string literal, and then as a composite. This doubles the number of + backslashes you need (assuming escape string syntax is used). + For example, to insert a <type>text</type> field + containing a double quote and a backslash in a composite + value, you'd need to write: +<programlisting> +INSERT ... VALUES ('("\"\\")'); +</programlisting> + The string-literal processor removes one level of backslashes, so that + what arrives at the composite-value parser looks like + <literal>("\"\\")</literal>. In turn, the string + fed to the <type>text</type> data type's input routine + becomes <literal>"\</literal>. (If we were working + with a data type whose input routine also treated backslashes specially, + <type>bytea</type> for example, we might need as many as eight backslashes + in the command to get one backslash into the stored composite field.) + Dollar quoting (see <xref linkend="sql-syntax-dollar-quoting"/>) can be + used to avoid the need to double backslashes. + </para> + </note> + + <tip> + <para> + The <literal>ROW</literal> constructor syntax is usually easier to work with + than the composite-literal syntax when writing composite values in SQL + commands. + In <literal>ROW</literal>, individual field values are written the same way + they would be written when not members of a composite. + </para> + </tip> + </sect2> + +</sect1> |