diff options
Diffstat (limited to 'doc/src/sgml/typeconv.sgml')
-rw-r--r-- | doc/src/sgml/typeconv.sgml | 1262 |
1 files changed, 1262 insertions, 0 deletions
diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml new file mode 100644 index 0000000..2874874 --- /dev/null +++ b/doc/src/sgml/typeconv.sgml @@ -0,0 +1,1262 @@ +<!-- doc/src/sgml/typeconv.sgml --> + +<chapter id="typeconv"> +<title>Type Conversion</title> + +<indexterm zone="typeconv"> + <primary>data type</primary> + <secondary>conversion</secondary> +</indexterm> + +<para> +<acronym>SQL</acronym> statements can, intentionally or not, require +the mixing of different data types in the same expression. +<productname>PostgreSQL</productname> has extensive facilities for +evaluating mixed-type expressions. +</para> + +<para> +In many cases a user does not need +to understand the details of the type conversion mechanism. +However, implicit conversions done by <productname>PostgreSQL</productname> +can affect the results of a query. When necessary, these results +can be tailored by using <emphasis>explicit</emphasis> type conversion. +</para> + +<para> +This chapter introduces the <productname>PostgreSQL</productname> +type conversion mechanisms and conventions. +Refer to the relevant sections in <xref linkend="datatype"/> and <xref linkend="functions"/> +for more information on specific data types and allowed functions and +operators. +</para> + +<sect1 id="typeconv-overview"> +<title>Overview</title> + +<para> +<acronym>SQL</acronym> is a strongly typed language. That is, every data item +has an associated data type which determines its behavior and allowed usage. +<productname>PostgreSQL</productname> has an extensible type system that is +more general and flexible than other <acronym>SQL</acronym> implementations. +Hence, most type conversion behavior in <productname>PostgreSQL</productname> +is governed by general rules rather than by ad hoc +heuristics. This allows the use of mixed-type expressions even with +user-defined types. +</para> + +<para> +The <productname>PostgreSQL</productname> scanner/parser divides lexical +elements into five fundamental categories: integers, non-integer numbers, +strings, identifiers, and key words. Constants of most non-numeric types are +first classified as strings. The <acronym>SQL</acronym> language definition +allows specifying type names with strings, and this mechanism can be used in +<productname>PostgreSQL</productname> to start the parser down the correct +path. For example, the query: + +<screen> +SELECT text 'Origin' AS "label", point '(0,0)' AS "value"; + + label | value +--------+------- + Origin | (0,0) +(1 row) +</screen> + +has two literal constants, of type <type>text</type> and <type>point</type>. +If a type is not specified for a string literal, then the placeholder type +<type>unknown</type> is assigned initially, to be resolved in later +stages as described below. +</para> + +<para> +There are four fundamental <acronym>SQL</acronym> constructs requiring +distinct type conversion rules in the <productname>PostgreSQL</productname> +parser: + +<variablelist> +<varlistentry> +<term> +Function calls +</term> +<listitem> +<para> +Much of the <productname>PostgreSQL</productname> type system is built around a +rich set of functions. Functions can have one or more arguments. +Since <productname>PostgreSQL</productname> permits function +overloading, the function name alone does not uniquely identify the function +to be called; the parser must select the right function based on the data +types of the supplied arguments. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term> +Operators +</term> +<listitem> +<para> +<productname>PostgreSQL</productname> allows expressions with +prefix (one-argument) operators, +as well as infix (two-argument) operators. Like functions, operators can +be overloaded, so the same problem of selecting the right operator +exists. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term> +Value Storage +</term> +<listitem> +<para> +<acronym>SQL</acronym> <command>INSERT</command> and <command>UPDATE</command> statements place the results of +expressions into a table. The expressions in the statement must be matched up +with, and perhaps converted to, the types of the target columns. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term> +<literal>UNION</literal>, <literal>CASE</literal>, and related constructs +</term> +<listitem> +<para> +Since all query results from a unionized <command>SELECT</command> statement +must appear in a single set of columns, the types of the results of each +<command>SELECT</command> clause must be matched up and converted to a uniform set. +Similarly, the result expressions of a <literal>CASE</literal> construct must be +converted to a common type so that the <literal>CASE</literal> expression as a whole +has a known output type. Some other constructs, such +as <literal>ARRAY[]</literal> and the <function>GREATEST</function> +and <function>LEAST</function> functions, likewise require determination of a +common type for several subexpressions. +</para> +</listitem> +</varlistentry> +</variablelist> +</para> + +<para> +The system catalogs store information about which conversions, or +<firstterm>casts</firstterm>, exist between which data types, and how to +perform those conversions. Additional casts can be added by the user +with the <xref linkend="sql-createcast"/> +command. (This is usually +done in conjunction with defining new data types. The set of casts +between built-in types has been carefully crafted and is best not +altered.) +</para> + +<indexterm> + <primary>data type</primary> + <secondary>category</secondary> +</indexterm> + +<para> +An additional heuristic provided by the parser allows improved determination +of the proper casting behavior among groups of types that have implicit casts. +Data types are divided into several basic <firstterm>type +categories</firstterm>, including <type>boolean</type>, <type>numeric</type>, +<type>string</type>, <type>bitstring</type>, <type>datetime</type>, +<type>timespan</type>, <type>geometric</type>, <type>network</type>, and +user-defined. (For a list see <xref linkend="catalog-typcategory-table"/>; +but note it is also possible to create custom type categories.) Within each +category there can be one or more <firstterm>preferred types</firstterm>, which +are preferred when there is a choice of possible types. With careful selection +of preferred types and available implicit casts, it is possible to ensure that +ambiguous expressions (those with multiple candidate parsing solutions) can be +resolved in a useful way. +</para> + +<para> +All type conversion rules are designed with several principles in mind: + +<itemizedlist> +<listitem> +<para> +Implicit conversions should never have surprising or unpredictable outcomes. +</para> +</listitem> + +<listitem> +<para> +There should be no extra overhead in the parser or executor +if a query does not need implicit type conversion. +That is, if a query is well-formed and the types already match, then the query should execute +without spending extra time in the parser and without introducing unnecessary implicit conversion +calls in the query. +</para> +</listitem> + +<listitem> +<para> +Additionally, if a query usually requires an implicit conversion for a function, and +if then the user defines a new function with the correct argument types, the parser +should use this new function and no longer do implicit conversion to use the old function. +</para> +</listitem> +</itemizedlist> +</para> + +</sect1> + +<sect1 id="typeconv-oper"> +<title>Operators</title> + +<indexterm zone="typeconv-oper"> + <primary>operator</primary> + <secondary>type resolution in an invocation</secondary> +</indexterm> + + <para> + The specific operator that is referenced by an operator expression + is determined using the following procedure. + Note that this procedure is indirectly affected + by the precedence of the operators involved, since that will determine + which sub-expressions are taken to be the inputs of which operators. + See <xref linkend="sql-precedence"/> for more information. + </para> + +<procedure> +<title>Operator Type Resolution</title> + +<step id="op-resol-select" performance="required"> +<para> +Select the operators to be considered from the +<classname>pg_operator</classname> system catalog. If a non-schema-qualified +operator name was used (the usual case), the operators +considered are those with the matching name and argument count that are +visible in the current search path (see <xref linkend="ddl-schemas-path"/>). +If a qualified operator name was given, only operators in the specified +schema are considered. +</para> + +<substeps> +<step performance="optional"> +<para> +If the search path finds multiple operators with identical argument types, +only the one appearing earliest in the path is considered. Operators with +different argument types are considered on an equal footing regardless of +search path position. +</para> +</step> +</substeps> +</step> + +<step id="op-resol-exact-match" performance="required"> +<para> +Check for an operator accepting exactly the input argument types. +If one exists (there can be only one exact match in the set of +operators considered), use it. Lack of an exact match creates a security +hazard when calling, via qualified name + <footnote id="op-qualified-security"> + <!-- If you edit this, consider editing func-qualified-security. --> + <para> + The hazard does not arise with a non-schema-qualified name, because a + search path containing schemas that permit untrusted users to create + objects is not a <link linkend="ddl-schemas-patterns">secure schema usage + pattern</link>. + </para> + </footnote> +(not typical), any operator found in a schema that permits untrusted users to +create objects. In such situations, cast arguments to force an exact match. +</para> + +<substeps> +<step id="op-resol-exact-unknown" performance="optional"> +<para> +If one argument of a binary operator invocation is of the <type>unknown</type> type, +then assume it is the same type as the other argument for this check. +Invocations involving two <type>unknown</type> inputs, or a prefix operator +with an <type>unknown</type> input, will never find a match at this step. +</para> +</step> +<step id="op-resol-exact-domain" performance="optional"> +<para> +If one argument of a binary operator invocation is of the <type>unknown</type> +type and the other is of a domain type, next check to see if there is an +operator accepting exactly the domain's base type on both sides; if so, use it. +</para> +</step> +</substeps> +</step> + +<step id="op-resol-best-match" performance="required"> +<para> +Look for the best match. +</para> +<substeps> +<step performance="required"> +<para> +Discard candidate operators for which the input types do not match +and cannot be converted (using an implicit conversion) to match. +<type>unknown</type> literals are +assumed to be convertible to anything for this purpose. If only one +candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +If any input argument is of a domain type, treat it as being of the +domain's base type for all subsequent steps. This ensures that domains +act like their base types for purposes of ambiguous-operator resolution. +</para> +</step> +<step performance="required"> +<para> +Run through all candidates and keep those with the most exact matches +on input types. Keep all candidates if none have exact matches. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +Run through all candidates and keep those that accept preferred types (of the +input data type's type category) at the most positions where type conversion +will be required. +Keep all candidates if none accept preferred types. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +If any input arguments are <type>unknown</type>, check the type +categories accepted at those argument positions by the remaining +candidates. At each position, select the <type>string</type> category +if any +candidate accepts that category. (This bias towards string is appropriate +since an unknown-type literal looks like a string.) Otherwise, if +all the remaining candidates accept the same type category, select that +category; otherwise fail because the correct choice cannot be deduced +without more clues. Now discard +candidates that do not accept the selected type category. Furthermore, +if any candidate accepts a preferred type in that category, +discard candidates that accept non-preferred types for that argument. +Keep all candidates if none survive these tests. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step id="op-resol-last-unknown" performance="required"> +<para> +If there are both <type>unknown</type> and known-type arguments, and all +the known-type arguments have the same type, assume that the +<type>unknown</type> arguments are also of that type, and check which +candidates can accept that type at the <type>unknown</type>-argument +positions. If exactly one candidate passes this test, use it. +Otherwise, fail. +</para> +</step> +</substeps> +</step> +</procedure> + +<para> +Some examples follow. +</para> + +<example> +<title>Square Root Operator Type Resolution</title> + +<para> +There is only one square root operator (prefix <literal>|/</literal>) +defined in the standard catalog, and it takes an argument of type +<type>double precision</type>. +The scanner assigns an initial type of <type>integer</type> to the argument +in this query expression: +<screen> +SELECT |/ 40 AS "square root of 40"; + square root of 40 +------------------- + 6.324555320336759 +(1 row) +</screen> + +So the parser does a type conversion on the operand and the query +is equivalent to: + +<screen> +SELECT |/ CAST(40 AS double precision) AS "square root of 40"; +</screen> +</para> +</example> + +<example> +<title>String Concatenation Operator Type Resolution</title> + +<para> +A string-like syntax is used for working with string types and for +working with complex extension types. +Strings with unspecified type are matched with likely operator candidates. +</para> + +<para> +An example with one unspecified argument: +<screen> +SELECT text 'abc' || 'def' AS "text and unknown"; + + text and unknown +------------------ + abcdef +(1 row) +</screen> +</para> + +<para> +In this case the parser looks to see if there is an operator taking <type>text</type> +for both arguments. Since there is, it assumes that the second argument should +be interpreted as type <type>text</type>. +</para> + +<para> +Here is a concatenation of two values of unspecified types: +<screen> +SELECT 'abc' || 'def' AS "unspecified"; + + unspecified +------------- + abcdef +(1 row) +</screen> +</para> + +<para> +In this case there is no initial hint for which type to use, since no types +are specified in the query. So, the parser looks for all candidate operators +and finds that there are candidates accepting both string-category and +bit-string-category inputs. Since string category is preferred when available, +that category is selected, and then the +preferred type for strings, <type>text</type>, is used as the specific +type to resolve the unknown-type literals as. +</para> +</example> + +<example> +<title>Absolute-Value and Negation Operator Type Resolution</title> + +<para> +The <productname>PostgreSQL</productname> operator catalog has several +entries for the prefix operator <literal>@</literal>, all of which implement +absolute-value operations for various numeric data types. One of these +entries is for type <type>float8</type>, which is the preferred type in +the numeric category. Therefore, <productname>PostgreSQL</productname> +will use that entry when faced with an <type>unknown</type> input: +<screen> +SELECT @ '-4.5' AS "abs"; + abs +----- + 4.5 +(1 row) +</screen> +Here the system has implicitly resolved the unknown-type literal as type +<type>float8</type> before applying the chosen operator. We can verify that +<type>float8</type> and not some other type was used: +<screen> +SELECT @ '-4.5e500' AS "abs"; + +ERROR: "-4.5e500" is out of range for type double precision +</screen> +</para> + +<para> +On the other hand, the prefix operator <literal>~</literal> (bitwise negation) +is defined only for integer data types, not for <type>float8</type>. So, if we +try a similar case with <literal>~</literal>, we get: +<screen> +SELECT ~ '20' AS "negation"; + +ERROR: operator is not unique: ~ "unknown" +HINT: Could not choose a best candidate operator. You might need to add +explicit type casts. +</screen> +This happens because the system cannot decide which of the several +possible <literal>~</literal> operators should be preferred. We can help +it out with an explicit cast: +<screen> +SELECT ~ CAST('20' AS int8) AS "negation"; + + negation +---------- + -21 +(1 row) +</screen> +</para> +</example> + +<example> +<title>Array Inclusion Operator Type Resolution</title> + +<para> +Here is another example of resolving an operator with one known and one +unknown input: +<screen> +SELECT array[1,2] <@ '{1,2,3}' as "is subset"; + + is subset +----------- + t +(1 row) +</screen> +The <productname>PostgreSQL</productname> operator catalog has several +entries for the infix operator <literal><@</literal>, but the only two that +could possibly accept an integer array on the left-hand side are +array inclusion (<type>anyarray</type> <literal><@</literal> <type>anyarray</type>) +and range inclusion (<type>anyelement</type> <literal><@</literal> <type>anyrange</type>). +Since none of these polymorphic pseudo-types (see <xref +linkend="datatype-pseudo"/>) are considered preferred, the parser cannot +resolve the ambiguity on that basis. +However, <xref linkend="op-resol-last-unknown"/> tells +it to assume that the unknown-type literal is of the same type as the other +input, that is, integer array. Now only one of the two operators can match, +so array inclusion is selected. (Had range inclusion been selected, we would +have gotten an error, because the string does not have the right format to be +a range literal.) +</para> +</example> + +<example> +<title>Custom Operator on a Domain Type</title> + +<para> +Users sometimes try to declare operators applying just to a domain type. +This is possible but is not nearly as useful as it might seem, because the +operator resolution rules are designed to select operators applying to the +domain's base type. As an example consider +<screen> +CREATE DOMAIN mytext AS text CHECK(...); +CREATE FUNCTION mytext_eq_text (mytext, text) RETURNS boolean AS ...; +CREATE OPERATOR = (procedure=mytext_eq_text, leftarg=mytext, rightarg=text); +CREATE TABLE mytable (val mytext); + +SELECT * FROM mytable WHERE val = 'foo'; +</screen> +This query will not use the custom operator. The parser will first see if +there is a <type>mytext</type> <literal>=</literal> <type>mytext</type> operator +(<xref linkend="op-resol-exact-unknown"/>), which there is not; +then it will consider the domain's base type <type>text</type>, and see if +there is a <type>text</type> <literal>=</literal> <type>text</type> operator +(<xref linkend="op-resol-exact-domain"/>), which there is; +so it resolves the <type>unknown</type>-type literal as <type>text</type> and +uses the <type>text</type> <literal>=</literal> <type>text</type> operator. +The only way to get the custom operator to be used is to explicitly cast +the literal: +<screen> +SELECT * FROM mytable WHERE val = text 'foo'; +</screen> +so that the <type>mytext</type> <literal>=</literal> <type>text</type> operator is found +immediately according to the exact-match rule. If the best-match rules +are reached, they actively discriminate against operators on domain types. +If they did not, such an operator would create too many ambiguous-operator +failures, because the casting rules always consider a domain as castable +to or from its base type, and so the domain operator would be considered +usable in all the same cases as a similarly-named operator on the base type. +</para> +</example> + +</sect1> + +<sect1 id="typeconv-func"> +<title>Functions</title> + +<indexterm zone="typeconv-func"> + <primary>function</primary> + <secondary>type resolution in an invocation</secondary> +</indexterm> + + <para> + The specific function that is referenced by a function call + is determined using the following procedure. + </para> + +<procedure> +<title>Function Type Resolution</title> + +<step performance="required"> +<para> +Select the functions to be considered from the +<classname>pg_proc</classname> system catalog. If a non-schema-qualified +function name was used, the functions +considered are those with the matching name and argument count that are +visible in the current search path (see <xref linkend="ddl-schemas-path"/>). +If a qualified function name was given, only functions in the specified +schema are considered. +</para> + +<substeps> +<step performance="optional"> +<para> +If the search path finds multiple functions of identical argument types, +only the one appearing earliest in the path is considered. Functions of +different argument types are considered on an equal footing regardless of +search path position. +</para> +</step> +<step performance="optional"> +<para> +If a function is declared with a <literal>VARIADIC</literal> array parameter, and +the call does not use the <literal>VARIADIC</literal> keyword, then the function +is treated as if the array parameter were replaced by one or more occurrences +of its element type, as needed to match the call. After such expansion the +function might have effective argument types identical to some non-variadic +function. In that case the function appearing earlier in the search path is +used, or if the two functions are in the same schema, the non-variadic one is +preferred. +</para> +<para> +This creates a security hazard when calling, via qualified name + <footnote id="func-qualified-security"> + <!-- If you edit this, consider editing op-qualified-security. --> + <para> + The hazard does not arise with a non-schema-qualified name, because a + search path containing schemas that permit untrusted users to create + objects is not a <link linkend="ddl-schemas-patterns">secure schema usage + pattern</link>. + </para> + </footnote>, +a variadic function found in a schema that permits untrusted users to create +objects. A malicious user can take control and execute arbitrary SQL +functions as though you executed them. Substitute a call bearing +the <literal>VARIADIC</literal> keyword, which bypasses this hazard. Calls +populating <literal>VARIADIC "any"</literal> parameters often have no +equivalent formulation containing the <literal>VARIADIC</literal> keyword. To +issue those calls safely, the function's schema must permit only trusted users +to create objects. +</para> +</step> +<step performance="optional"> +<para> +Functions that have default values for parameters are considered to match any +call that omits zero or more of the defaultable parameter positions. If more +than one such function matches a call, the one appearing earliest in the +search path is used. If there are two or more such functions in the same +schema with identical parameter types in the non-defaulted positions (which is +possible if they have different sets of defaultable parameters), the system +will not be able to determine which to prefer, and so an <quote>ambiguous +function call</quote> error will result if no better match to the call can be +found. +</para> +<para> +This creates an availability hazard when calling, via qualified +name<footnoteref linkend="func-qualified-security"/>, any function found in a +schema that permits untrusted users to create objects. A malicious user can +create a function with the name of an existing function, replicating that +function's parameters and appending novel parameters having default values. +This precludes new calls to the original function. To forestall this hazard, +place functions in schemas that permit only trusted users to create objects. +</para> +</step> +</substeps> +</step> + +<step performance="required"> +<para> +Check for a function accepting exactly the input argument types. +If one exists (there can be only one exact match in the set of +functions considered), use it. Lack of an exact match creates a security +hazard when calling, via qualified +name<footnoteref linkend="func-qualified-security"/>, a function found in a +schema that permits untrusted users to create objects. In such situations, +cast arguments to force an exact match. (Cases involving <type>unknown</type> +will never find a match at this step.) +</para> +</step> + +<step performance="required"> +<para> +If no exact match is found, see if the function call appears +to be a special type conversion request. This happens if the function call +has just one argument and the function name is the same as the (internal) +name of some data type. Furthermore, the function argument must be either +an unknown-type literal, or a type that is binary-coercible to the named +data type, or a type that could be converted to the named data type by +applying that type's I/O functions (that is, the conversion is either to or +from one of the standard string types). When these conditions are met, +the function call is treated as a form of <literal>CAST</literal> specification. + <footnote> + <para> + The reason for this step is to support function-style cast specifications + in cases where there is not an actual cast function. If there is a cast + function, it is conventionally named after its output type, and so there + is no need to have a special case. See + <xref linkend="sql-createcast"/> + for additional commentary. + </para> + </footnote> +</para> +</step> +<step performance="required"> +<para> +Look for the best match. +</para> +<substeps> +<step performance="required"> +<para> +Discard candidate functions for which the input types do not match +and cannot be converted (using an implicit conversion) to match. +<type>unknown</type> literals are +assumed to be convertible to anything for this purpose. If only one +candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +If any input argument is of a domain type, treat it as being of the +domain's base type for all subsequent steps. This ensures that domains +act like their base types for purposes of ambiguous-function resolution. +</para> +</step> +<step performance="required"> +<para> +Run through all candidates and keep those with the most exact matches +on input types. Keep all candidates if none have exact matches. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +Run through all candidates and keep those that accept preferred types (of the +input data type's type category) at the most positions where type conversion +will be required. +Keep all candidates if none accept preferred types. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +If any input arguments are <type>unknown</type>, check the type categories +accepted +at those argument positions by the remaining candidates. At each position, +select the <type>string</type> category if any candidate accepts that category. +(This bias towards string +is appropriate since an unknown-type literal looks like a string.) +Otherwise, if all the remaining candidates accept the same type category, +select that category; otherwise fail because +the correct choice cannot be deduced without more clues. +Now discard candidates that do not accept the selected type category. +Furthermore, if any candidate accepts a preferred type in that category, +discard candidates that accept non-preferred types for that argument. +Keep all candidates if none survive these tests. +If only one candidate remains, use it; else continue to the next step. +</para> +</step> +<step performance="required"> +<para> +If there are both <type>unknown</type> and known-type arguments, and all +the known-type arguments have the same type, assume that the +<type>unknown</type> arguments are also of that type, and check which +candidates can accept that type at the <type>unknown</type>-argument +positions. If exactly one candidate passes this test, use it. +Otherwise, fail. +</para> +</step> +</substeps> +</step> +</procedure> + +<para> +Note that the <quote>best match</quote> rules are identical for operator and +function type resolution. +Some examples follow. +</para> + +<example> +<title>Rounding Function Argument Type Resolution</title> + +<para> +There is only one <function>round</function> function that takes two +arguments; it takes a first argument of type <type>numeric</type> and +a second argument of type <type>integer</type>. +So the following query automatically converts +the first argument of type <type>integer</type> to +<type>numeric</type>: + +<screen> +SELECT round(4, 4); + + round +-------- + 4.0000 +(1 row) +</screen> + +That query is actually transformed by the parser to: +<screen> +SELECT round(CAST (4 AS numeric), 4); +</screen> +</para> + +<para> +Since numeric constants with decimal points are initially assigned the +type <type>numeric</type>, the following query will require no type +conversion and therefore might be slightly more efficient: +<screen> +SELECT round(4.0, 4); +</screen> +</para> +</example> + +<example> +<title>Variadic Function Resolution</title> + +<para> +<screen> +CREATE FUNCTION public.variadic_example(VARIADIC numeric[]) RETURNS int + LANGUAGE sql AS 'SELECT 1'; +CREATE FUNCTION +</screen> + +This function accepts, but does not require, the VARIADIC keyword. It +tolerates both integer and numeric arguments: + +<screen> +SELECT public.variadic_example(0), + public.variadic_example(0.0), + public.variadic_example(VARIADIC array[0.0]); + variadic_example | variadic_example | variadic_example +------------------+------------------+------------------ + 1 | 1 | 1 +(1 row) +</screen> + +However, the first and second calls will prefer more-specific functions, if +available: + +<screen> +CREATE FUNCTION public.variadic_example(numeric) RETURNS int + LANGUAGE sql AS 'SELECT 2'; +CREATE FUNCTION + +CREATE FUNCTION public.variadic_example(int) RETURNS int + LANGUAGE sql AS 'SELECT 3'; +CREATE FUNCTION + +SELECT public.variadic_example(0), + public.variadic_example(0.0), + public.variadic_example(VARIADIC array[0.0]); + variadic_example | variadic_example | variadic_example +------------------+------------------+------------------ + 3 | 2 | 1 +(1 row) +</screen> + +Given the default configuration and only the first function existing, the +first and second calls are insecure. Any user could intercept them by +creating the second or third function. By matching the argument type exactly +and using the <literal>VARIADIC</literal> keyword, the third call is secure. +</para> +</example> + +<example> +<title>Substring Function Type Resolution</title> + +<para> +There are several <function>substr</function> functions, one of which +takes types <type>text</type> and <type>integer</type>. If called +with a string constant of unspecified type, the system chooses the +candidate function that accepts an argument of the preferred category +<literal>string</literal> (namely of type <type>text</type>). + +<screen> +SELECT substr('1234', 3); + + substr +-------- + 34 +(1 row) +</screen> +</para> + +<para> +If the string is declared to be of type <type>varchar</type>, as might be the case +if it comes from a table, then the parser will try to convert it to become <type>text</type>: +<screen> +SELECT substr(varchar '1234', 3); + + substr +-------- + 34 +(1 row) +</screen> + +This is transformed by the parser to effectively become: +<screen> +SELECT substr(CAST (varchar '1234' AS text), 3); +</screen> +</para> +<para> +<note> +<para> +The parser learns from the <structname>pg_cast</structname> catalog that +<type>text</type> and <type>varchar</type> +are binary-compatible, meaning that one can be passed to a function that +accepts the other without doing any physical conversion. Therefore, no +type conversion call is really inserted in this case. +</para> +</note> +</para> + +<para> +And, if the function is called with an argument of type <type>integer</type>, +the parser will try to convert that to <type>text</type>: +<screen> +SELECT substr(1234, 3); +ERROR: function substr(integer, integer) does not exist +HINT: No function matches the given name and argument types. You might need +to add explicit type casts. +</screen> + +This does not work because <type>integer</type> does not have an implicit cast +to <type>text</type>. An explicit cast will work, however: +<screen> +SELECT substr(CAST (1234 AS text), 3); + + substr +-------- + 34 +(1 row) +</screen> +</para> +</example> + +</sect1> + +<sect1 id="typeconv-query"> +<title>Value Storage</title> + + <para> + Values to be inserted into a table are converted to the destination + column's data type according to the + following steps. + </para> + +<procedure> +<title>Value Storage Type Conversion</title> + +<step performance="required"> +<para> +Check for an exact match with the target. +</para> +</step> + +<step performance="required"> +<para> +Otherwise, try to convert the expression to the target type. This is possible +if an <firstterm>assignment cast</firstterm> between the two types is registered in the +<structname>pg_cast</structname> catalog (see <xref linkend="sql-createcast"/>). +Alternatively, if the expression is an unknown-type literal, the contents of +the literal string will be fed to the input conversion routine for the target +type. +</para> +</step> + +<step performance="required"> +<para> +Check to see if there is a sizing cast for the target type. A sizing +cast is a cast from that type to itself. If one is found in the +<structname>pg_cast</structname> catalog, apply it to the expression before storing +into the destination column. The implementation function for such a cast +always takes an extra parameter of type <type>integer</type>, which receives +the destination column's <structfield>atttypmod</structfield> value (typically its +declared length, although the interpretation of <structfield>atttypmod</structfield> +varies for different data types), and it may take a third <type>boolean</type> +parameter that says whether the cast is explicit or implicit. The cast +function +is responsible for applying any length-dependent semantics such as size +checking or truncation. +</para> +</step> + +</procedure> + +<example> +<title><type>character</type> Storage Type Conversion</title> + +<para> +For a target column declared as <type>character(20)</type> the following +statement shows that the stored value is sized correctly: + +<screen> +CREATE TABLE vv (v character(20)); +INSERT INTO vv SELECT 'abc' || 'def'; +SELECT v, octet_length(v) FROM vv; + + v | octet_length +----------------------+-------------- + abcdef | 20 +(1 row) +</screen> +</para> + +<para> +What has really happened here is that the two unknown literals are resolved +to <type>text</type> by default, allowing the <literal>||</literal> operator +to be resolved as <type>text</type> concatenation. Then the <type>text</type> +result of the operator is converted to <type>bpchar</type> (<quote>blank-padded +char</quote>, the internal name of the <type>character</type> data type) to match the target +column type. (Since the conversion from <type>text</type> to +<type>bpchar</type> is binary-coercible, this conversion does +not insert any real function call.) Finally, the sizing function +<literal>bpchar(bpchar, integer, boolean)</literal> is found in the system catalog +and applied to the operator's result and the stored column length. This +type-specific function performs the required length check and addition of +padding spaces. +</para> +</example> +</sect1> + +<sect1 id="typeconv-union-case"> +<title><literal>UNION</literal>, <literal>CASE</literal>, and Related Constructs</title> + +<indexterm zone="typeconv-union-case"> + <primary>UNION</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<indexterm zone="typeconv-union-case"> + <primary>CASE</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<indexterm zone="typeconv-union-case"> + <primary>ARRAY</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<indexterm zone="typeconv-union-case"> + <primary>VALUES</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<indexterm zone="typeconv-union-case"> + <primary>GREATEST</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<indexterm zone="typeconv-union-case"> + <primary>LEAST</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<para> +SQL <literal>UNION</literal> constructs must match up possibly dissimilar +types to become a single result set. The resolution algorithm is +applied separately to each output column of a union query. The +<literal>INTERSECT</literal> and <literal>EXCEPT</literal> constructs resolve +dissimilar types in the same way as <literal>UNION</literal>. +Some other constructs, including +<literal>CASE</literal>, <literal>ARRAY</literal>, <literal>VALUES</literal>, +and the <function>GREATEST</function> and <function>LEAST</function> +functions, use the identical +algorithm to match up their component expressions and select a result +data type. +</para> + +<procedure> +<title>Type Resolution for <literal>UNION</literal>, <literal>CASE</literal>, +and Related Constructs</title> + +<step performance="required"> +<para> +If all inputs are of the same type, and it is not <type>unknown</type>, +resolve as that type. +</para> +</step> + +<step performance="required"> +<para> +If any input is of a domain type, treat it as being of the +domain's base type for all subsequent steps. + <footnote> + <para> + Somewhat like the treatment of domain inputs for operators and + functions, this behavior allows a domain type to be preserved through + a <literal>UNION</literal> or similar construct, so long as the user is + careful to ensure that all inputs are implicitly or explicitly of that + exact type. Otherwise the domain's base type will be used. + </para> + </footnote> +</para> +</step> + +<step performance="required"> +<para> +If all inputs are of type <type>unknown</type>, resolve as type +<type>text</type> (the preferred type of the string category). +Otherwise, <type>unknown</type> inputs are ignored for the purposes +of the remaining rules. +</para> +</step> + +<step performance="required"> +<para> +If the non-unknown inputs are not all of the same type category, fail. +</para> +</step> + +<step performance="required"> +<para> +Select the first non-unknown input type as the candidate type, +then consider each other non-unknown input type, left to right. + <footnote> + <para> + For historical reasons, <literal>CASE</literal> treats + its <literal>ELSE</literal> clause (if any) as the <quote>first</quote> + input, with the <literal>THEN</literal> clauses(s) considered after + that. In all other cases, <quote>left to right</quote> means the order + in which the expressions appear in the query text. + </para> + </footnote> +If the candidate type can be implicitly converted to the other type, +but not vice-versa, select the other type as the new candidate type. +Then continue considering the remaining inputs. If, at any stage of this +process, a preferred type is selected, stop considering additional +inputs. +</para> +</step> + +<step performance="required"> +<para> +Convert all inputs to the final candidate type. Fail if there is not an +implicit conversion from a given input type to the candidate type. +</para> +</step> +</procedure> + +<para> +Some examples follow. +</para> + +<example> +<title>Type Resolution with Underspecified Types in a Union</title> + +<para> +<screen> +SELECT text 'a' AS "text" UNION SELECT 'b'; + + text +------ + a + b +(2 rows) +</screen> +Here, the unknown-type literal <literal>'b'</literal> will be resolved to type <type>text</type>. +</para> +</example> + +<example> +<title>Type Resolution in a Simple Union</title> + +<para> +<screen> +SELECT 1.2 AS "numeric" UNION SELECT 1; + + numeric +--------- + 1 + 1.2 +(2 rows) +</screen> +The literal <literal>1.2</literal> is of type <type>numeric</type>, +and the <type>integer</type> value <literal>1</literal> can be cast implicitly to +<type>numeric</type>, so that type is used. +</para> +</example> + +<example> +<title>Type Resolution in a Transposed Union</title> + +<para> +<screen> +SELECT 1 AS "real" UNION SELECT CAST('2.2' AS REAL); + + real +------ + 1 + 2.2 +(2 rows) +</screen> +Here, since type <type>real</type> cannot be implicitly cast to <type>integer</type>, +but <type>integer</type> can be implicitly cast to <type>real</type>, the union +result type is resolved as <type>real</type>. +</para> +</example> + +<example> +<title>Type Resolution in a Nested Union</title> + +<para> +<screen> +SELECT NULL UNION SELECT NULL UNION SELECT 1; + +ERROR: UNION types text and integer cannot be matched +</screen> +This failure occurs because <productname>PostgreSQL</productname> treats +multiple <literal>UNION</literal>s as a nest of pairwise operations; +that is, this input is the same as +<screen> +(SELECT NULL UNION SELECT NULL) UNION SELECT 1; +</screen> +The inner <literal>UNION</literal> is resolved as emitting +type <type>text</type>, according to the rules given above. Then the +outer <literal>UNION</literal> has inputs of types <type>text</type> +and <type>integer</type>, leading to the observed error. The problem +can be fixed by ensuring that the leftmost <literal>UNION</literal> +has at least one input of the desired result type. +</para> + +<para> +<literal>INTERSECT</literal> and <literal>EXCEPT</literal> operations are +likewise resolved pairwise. However, the other constructs described in this +section consider all of their inputs in one resolution step. +</para> +</example> +</sect1> + +<sect1 id="typeconv-select"> +<title><literal>SELECT</literal> Output Columns</title> + +<indexterm zone="typeconv-select"> + <primary>SELECT</primary> + <secondary>determination of result type</secondary> +</indexterm> + +<para> +The rules given in the preceding sections will result in assignment +of non-<type>unknown</type> data types to all expressions in an SQL query, +except for unspecified-type literals that appear as simple output +columns of a <command>SELECT</command> command. For example, in + +<screen> +SELECT 'Hello World'; +</screen> + +there is nothing to identify what type the string literal should be +taken as. In this situation <productname>PostgreSQL</productname> will fall back +to resolving the literal's type as <type>text</type>. +</para> + +<para> +When the <command>SELECT</command> is one arm of a <literal>UNION</literal> +(or <literal>INTERSECT</literal> or <literal>EXCEPT</literal>) construct, or when it +appears within <command>INSERT ... SELECT</command>, this rule is not applied +since rules given in preceding sections take precedence. The type of an +unspecified-type literal can be taken from the other <literal>UNION</literal> arm +in the first case, or from the destination column in the second case. +</para> + +<para> +<literal>RETURNING</literal> lists are treated the same as <command>SELECT</command> +output lists for this purpose. +</para> + +<note> + <para> + Prior to <productname>PostgreSQL</productname> 10, this rule did not exist, and + unspecified-type literals in a <command>SELECT</command> output list were + left as type <type>unknown</type>. That had assorted bad consequences, + so it's been changed. + </para> +</note> + +</sect1> +</chapter> |