diff options
Diffstat (limited to 'doc/src/sgml/xfunc.sgml')
-rw-r--r-- | doc/src/sgml/xfunc.sgml | 3639 |
1 files changed, 3639 insertions, 0 deletions
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml new file mode 100644 index 0000000..701432f --- /dev/null +++ b/doc/src/sgml/xfunc.sgml @@ -0,0 +1,3639 @@ +<!-- doc/src/sgml/xfunc.sgml --> + + <sect1 id="xfunc"> + <title>User-Defined Functions</title> + + <indexterm zone="xfunc"> + <primary>function</primary> + <secondary>user-defined</secondary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> provides four kinds of + functions: + + <itemizedlist> + <listitem> + <para> + query language functions (functions written in + <acronym>SQL</acronym>) (<xref linkend="xfunc-sql"/>) + </para> + </listitem> + <listitem> + <para> + procedural language functions (functions written in, for + example, <application>PL/pgSQL</application> or <application>PL/Tcl</application>) + (<xref linkend="xfunc-pl"/>) + </para> + </listitem> + <listitem> + <para> + internal functions (<xref linkend="xfunc-internal"/>) + </para> + </listitem> + <listitem> + <para> + C-language functions (<xref linkend="xfunc-c"/>) + </para> + </listitem> + </itemizedlist> + </para> + + <para> + Every kind + of function can take base types, composite types, or + combinations of these as arguments (parameters). In addition, + every kind of function can return a base type or + a composite type. Functions can also be defined to return + sets of base or composite values. + </para> + + <para> + Many kinds of functions can take or return certain pseudo-types + (such as polymorphic types), but the available facilities vary. + Consult the description of each kind of function for more details. + </para> + + <para> + It's easiest to define <acronym>SQL</acronym> + functions, so we'll start by discussing those. + Most of the concepts presented for <acronym>SQL</acronym> functions + will carry over to the other types of functions. + </para> + + <para> + Throughout this chapter, it can be useful to look at the reference + page of the <link linkend="sql-createfunction"><command>CREATE + FUNCTION</command></link> command to + understand the examples better. Some examples from this chapter + can be found in <filename>funcs.sql</filename> and + <filename>funcs.c</filename> in the <filename>src/tutorial</filename> + directory in the <productname>PostgreSQL</productname> source + distribution. + </para> + </sect1> + + <sect1 id="xproc"> + <title>User-Defined Procedures</title> + + <indexterm zone="xproc"> + <primary>procedure</primary> + <secondary>user-defined</secondary> + </indexterm> + + <para> + A procedure is a database object similar to a function. + The key differences are: + + <itemizedlist> + <listitem> + <para> + Procedures are defined with + the <link linkend="sql-createprocedure"><command>CREATE + PROCEDURE</command></link> command, not <command>CREATE + FUNCTION</command>. + </para> + </listitem> + <listitem> + <para> + Procedures do not return a function value; hence <command>CREATE + PROCEDURE</command> lacks a <literal>RETURNS</literal> clause. + However, procedures can instead return data to their callers via + output parameters. + </para> + </listitem> + <listitem> + <para> + While a function is called as part of a query or DML command, a + procedure is called in isolation using + the <link linkend="sql-call"><command>CALL</command></link> command. + </para> + </listitem> + <listitem> + <para> + A procedure can commit or roll back transactions during its + execution (then automatically beginning a new transaction), so long + as the invoking <command>CALL</command> command is not part of an + explicit transaction block. A function cannot do that. + </para> + </listitem> + <listitem> + <para> + Certain function attributes, such as strictness, don't apply to + procedures. Those attributes control how the function is + used in a query, which isn't relevant to procedures. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + The explanations in the following sections about how to define + user-defined functions apply to procedures as well, except for the + points made above. + </para> + + <para> + Collectively, functions and procedures are also known + as <firstterm>routines</firstterm><indexterm><primary>routine</primary></indexterm>. + There are commands such as <link linkend="sql-alterroutine"><command>ALTER ROUTINE</command></link> + and <link linkend="sql-droproutine"><command>DROP ROUTINE</command></link> that can operate on functions and + procedures without having to know which kind it is. Note, however, that + there is no <literal>CREATE ROUTINE</literal> command. + </para> + </sect1> + + <sect1 id="xfunc-sql"> + <title>Query Language (<acronym>SQL</acronym>) Functions</title> + + <indexterm zone="xfunc-sql"> + <primary>function</primary> + <secondary>user-defined</secondary> + <tertiary>in SQL</tertiary> + </indexterm> + + <para> + SQL functions execute an arbitrary list of SQL statements, returning + the result of the last query in the list. + In the simple (non-set) + case, the first row of the last query's result will be returned. + (Bear in mind that <quote>the first row</quote> of a multirow + result is not well-defined unless you use <literal>ORDER BY</literal>.) + If the last query happens + to return no rows at all, the null value will be returned. + </para> + + <para> + Alternatively, an SQL function can be declared to return a set (that is, + multiple rows) by specifying the function's return type as <literal>SETOF + <replaceable>sometype</replaceable></literal>, or equivalently by declaring it as + <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>. In this case + all rows of the last query's result are returned. Further details appear + below. + </para> + + <para> + The body of an SQL function must be a list of SQL + statements separated by semicolons. A semicolon after the last + statement is optional. Unless the function is declared to return + <type>void</type>, the last statement must be a <command>SELECT</command>, + or an <command>INSERT</command>, <command>UPDATE</command>, or <command>DELETE</command> + that has a <literal>RETURNING</literal> clause. + </para> + + <para> + Any collection of commands in the <acronym>SQL</acronym> + language can be packaged together and defined as a function. + Besides <command>SELECT</command> queries, the commands can include data + modification queries (<command>INSERT</command>, + <command>UPDATE</command>, <command>DELETE</command>, and + <command>MERGE</command>), as well as + other SQL commands. (You cannot use transaction control commands, e.g., + <command>COMMIT</command>, <command>SAVEPOINT</command>, and some utility + commands, e.g., <literal>VACUUM</literal>, in <acronym>SQL</acronym> functions.) + However, the final command + must be a <command>SELECT</command> or have a <literal>RETURNING</literal> + clause that returns whatever is + specified as the function's return type. Alternatively, if you + want to define an SQL function that performs actions but has no + useful value to return, you can define it as returning <type>void</type>. + For example, this function removes rows with negative salaries from + the <literal>emp</literal> table: + +<screen> +CREATE FUNCTION clean_emp() RETURNS void AS ' + DELETE FROM emp + WHERE salary < 0; +' LANGUAGE SQL; + +SELECT clean_emp(); + + clean_emp +----------- + +(1 row) +</screen> + </para> + + <para> + You can also write this as a procedure, thus avoiding the issue of the + return type. For example: +<screen> +CREATE PROCEDURE clean_emp() AS ' + DELETE FROM emp + WHERE salary < 0; +' LANGUAGE SQL; + +CALL clean_emp(); +</screen> + In simple cases like this, the difference between a function returning + <type>void</type> and a procedure is mostly stylistic. However, + procedures offer additional functionality such as transaction control + that is not available in functions. Also, procedures are SQL standard + whereas returning <type>void</type> is a PostgreSQL extension. + </para> + + <note> + <para> + The entire body of an SQL function is parsed before any of it is + executed. While an SQL function can contain commands that alter + the system catalogs (e.g., <command>CREATE TABLE</command>), the effects + of such commands will not be visible during parse analysis of + later commands in the function. Thus, for example, + <literal>CREATE TABLE foo (...); INSERT INTO foo VALUES(...);</literal> + will not work as desired if packaged up into a single SQL function, + since <structname>foo</structname> won't exist yet when the <command>INSERT</command> + command is parsed. It's recommended to use <application>PL/pgSQL</application> + instead of an SQL function in this type of situation. + </para> + </note> + + <para> + The syntax of the <command>CREATE FUNCTION</command> command requires + the function body to be written as a string constant. It is usually + most convenient to use dollar quoting (see <xref + linkend="sql-syntax-dollar-quoting"/>) for the string constant. + If you choose to use regular single-quoted string constant syntax, + you must double single quote marks (<literal>'</literal>) and backslashes + (<literal>\</literal>) (assuming escape string syntax) in the body of + the function (see <xref linkend="sql-syntax-strings"/>). + </para> + + <sect2 id="xfunc-sql-function-arguments"> + <title>Arguments for <acronym>SQL</acronym> Functions</title> + + <indexterm> + <primary>function</primary> + <secondary>named argument</secondary> + </indexterm> + + <para> + Arguments of an SQL function can be referenced in the function + body using either names or numbers. Examples of both methods appear + below. + </para> + + <para> + To use a name, declare the function argument as having a name, and + then just write that name in the function body. If the argument name + is the same as any column name in the current SQL command within the + function, the column name will take precedence. To override this, + qualify the argument name with the name of the function itself, that is + <literal><replaceable>function_name</replaceable>.<replaceable>argument_name</replaceable></literal>. + (If this would conflict with a qualified column name, again the column + name wins. You can avoid the ambiguity by choosing a different alias for + the table within the SQL command.) + </para> + + <para> + In the older numeric approach, arguments are referenced using the syntax + <literal>$<replaceable>n</replaceable></literal>: <literal>$1</literal> refers to the first input + argument, <literal>$2</literal> to the second, and so on. This will work + whether or not the particular argument was declared with a name. + </para> + + <para> + If an argument is of a composite type, then the dot notation, + e.g., <literal><replaceable>argname</replaceable>.<replaceable>fieldname</replaceable></literal> or + <literal>$1.<replaceable>fieldname</replaceable></literal>, can be used to access attributes of the + argument. Again, you might need to qualify the argument's name with the + function name to make the form with an argument name unambiguous. + </para> + + <para> + SQL function arguments can only be used as data values, + not as identifiers. Thus for example this is reasonable: +<programlisting> +INSERT INTO mytable VALUES ($1); +</programlisting> +but this will not work: +<programlisting> +INSERT INTO $1 VALUES (42); +</programlisting> + </para> + + <note> + <para> + The ability to use names to reference SQL function arguments was added + in <productname>PostgreSQL</productname> 9.2. Functions to be used in + older servers must use the <literal>$<replaceable>n</replaceable></literal> notation. + </para> + </note> + </sect2> + + <sect2 id="xfunc-sql-base-functions"> + <title><acronym>SQL</acronym> Functions on Base Types</title> + + <para> + The simplest possible <acronym>SQL</acronym> function has no arguments and + simply returns a base type, such as <type>integer</type>: + +<screen> +CREATE FUNCTION one() RETURNS integer AS $$ + SELECT 1 AS result; +$$ LANGUAGE SQL; + +-- Alternative syntax for string literal: +CREATE FUNCTION one() RETURNS integer AS ' + SELECT 1 AS result; +' LANGUAGE SQL; + +SELECT one(); + + one +----- + 1 +</screen> + </para> + + <para> + Notice that we defined a column alias within the function body for the result of the function + (with the name <literal>result</literal>), but this column alias is not visible + outside the function. Hence, the result is labeled <literal>one</literal> + instead of <literal>result</literal>. + </para> + + <para> + It is almost as easy to define <acronym>SQL</acronym> functions + that take base types as arguments: + +<screen> +CREATE FUNCTION add_em(x integer, y integer) RETURNS integer AS $$ + SELECT x + y; +$$ LANGUAGE SQL; + +SELECT add_em(1, 2) AS answer; + + answer +-------- + 3 +</screen> + </para> + + <para> + Alternatively, we could dispense with names for the arguments and + use numbers: + +<screen> +CREATE FUNCTION add_em(integer, integer) RETURNS integer AS $$ + SELECT $1 + $2; +$$ LANGUAGE SQL; + +SELECT add_em(1, 2) AS answer; + + answer +-------- + 3 +</screen> + </para> + + <para> + Here is a more useful function, which might be used to debit a + bank account: + +<programlisting> +CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$ + UPDATE bank + SET balance = balance - debit + WHERE accountno = tf1.accountno; + SELECT 1; +$$ LANGUAGE SQL; +</programlisting> + + A user could execute this function to debit account 17 by $100.00 as + follows: + +<programlisting> +SELECT tf1(17, 100.0); +</programlisting> + </para> + + <para> + In this example, we chose the name <literal>accountno</literal> for the first + argument, but this is the same as the name of a column in the + <literal>bank</literal> table. Within the <command>UPDATE</command> command, + <literal>accountno</literal> refers to the column <literal>bank.accountno</literal>, + so <literal>tf1.accountno</literal> must be used to refer to the argument. + We could of course avoid this by using a different name for the argument. + </para> + + <para> + In practice one would probably like a more useful result from the + function than a constant 1, so a more likely definition + is: + +<programlisting> +CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$ + UPDATE bank + SET balance = balance - debit + WHERE accountno = tf1.accountno; + SELECT balance FROM bank WHERE accountno = tf1.accountno; +$$ LANGUAGE SQL; +</programlisting> + + which adjusts the balance and returns the new balance. + The same thing could be done in one command using <literal>RETURNING</literal>: + +<programlisting> +CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$ + UPDATE bank + SET balance = balance - debit + WHERE accountno = tf1.accountno + RETURNING balance; +$$ LANGUAGE SQL; +</programlisting> + </para> + + <para> + If the final <literal>SELECT</literal> or <literal>RETURNING</literal> + clause in an <acronym>SQL</acronym> function does not return exactly + the function's declared result + type, <productname>PostgreSQL</productname> will automatically cast + the value to the required type, if that is possible with an implicit + or assignment cast. Otherwise, you must write an explicit cast. + For example, suppose we wanted the + previous <function>add_em</function> function to return + type <type>float8</type> instead. It's sufficient to write + +<programlisting> +CREATE FUNCTION add_em(integer, integer) RETURNS float8 AS $$ + SELECT $1 + $2; +$$ LANGUAGE SQL; +</programlisting> + + since the <type>integer</type> sum can be implicitly cast + to <type>float8</type>. + (See <xref linkend="typeconv"/> or <xref linkend="sql-createcast"/> + for more about casts.) + </para> + </sect2> + + <sect2 id="xfunc-sql-composite-functions"> + <title><acronym>SQL</acronym> Functions on Composite Types</title> + + <para> + When writing functions with arguments of composite types, we must not + only specify which argument we want but also the desired attribute + (field) of that argument. For example, suppose that + <type>emp</type> is a table containing employee data, and therefore + also the name of the composite type of each row of the table. Here + is a function <function>double_salary</function> that computes what someone's + salary would be if it were doubled: + +<screen> +CREATE TABLE emp ( + name text, + salary numeric, + age integer, + cubicle point +); + +INSERT INTO emp VALUES ('Bill', 4200, 45, '(2,1)'); + +CREATE FUNCTION double_salary(emp) RETURNS numeric AS $$ + SELECT $1.salary * 2 AS salary; +$$ LANGUAGE SQL; + +SELECT name, double_salary(emp.*) AS dream + FROM emp + WHERE emp.cubicle ~= point '(2,1)'; + + name | dream +------+------- + Bill | 8400 +</screen> + </para> + + <para> + Notice the use of the syntax <literal>$1.salary</literal> + to select one field of the argument row value. Also notice + how the calling <command>SELECT</command> command + uses <replaceable>table_name</replaceable><literal>.*</literal> to select + the entire current row of a table as a composite value. The table + row can alternatively be referenced using just the table name, + like this: +<screen> +SELECT name, double_salary(emp) AS dream + FROM emp + WHERE emp.cubicle ~= point '(2,1)'; +</screen> + but this usage is deprecated since it's easy to get confused. + (See <xref linkend="rowtypes-usage"/> for details about these + two notations for the composite value of a table row.) + </para> + + <para> + Sometimes it is handy to construct a composite argument value + on-the-fly. This can be done with the <literal>ROW</literal> construct. + For example, we could adjust the data being passed to the function: +<screen> +SELECT name, double_salary(ROW(name, salary*1.1, age, cubicle)) AS dream + FROM emp; +</screen> + </para> + + <para> + It is also possible to build a function that returns a composite type. + This is an example of a function + that returns a single <type>emp</type> row: + +<programlisting> +CREATE FUNCTION new_emp() RETURNS emp AS $$ + SELECT text 'None' AS name, + 1000.0 AS salary, + 25 AS age, + point '(2,2)' AS cubicle; +$$ LANGUAGE SQL; +</programlisting> + + In this example we have specified each of the attributes + with a constant value, but any computation + could have been substituted for these constants. + </para> + + <para> + Note two important things about defining the function: + + <itemizedlist> + <listitem> + <para> + The select list order in the query must be exactly the same as + that in which the columns appear in the composite type. + (Naming the columns, as we did above, + is irrelevant to the system.) + </para> + </listitem> + <listitem> + <para> + We must ensure each expression's type can be cast to that of + the corresponding column of the composite type. + Otherwise we'll get errors like this: +<screen> +<computeroutput> +ERROR: return type mismatch in function declared to return emp +DETAIL: Final statement returns text instead of point at column 4. +</computeroutput> +</screen> + As with the base-type case, the system will not insert explicit + casts automatically, only implicit or assignment casts. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + A different way to define the same function is: + +<programlisting> +CREATE FUNCTION new_emp() RETURNS emp AS $$ + SELECT ROW('None', 1000.0, 25, '(2,2)')::emp; +$$ LANGUAGE SQL; +</programlisting> + + Here we wrote a <command>SELECT</command> that returns just a single + column of the correct composite type. This isn't really better + in this situation, but it is a handy alternative in some cases + — for example, if we need to compute the result by calling + another function that returns the desired composite value. + Another example is that if we are trying to write a function that + returns a domain over composite, rather than a plain composite type, + it is always necessary to write it as returning a single column, + since there is no way to cause a coercion of the whole row result. + </para> + + <para> + We could call this function directly either by using it in + a value expression: + +<screen> +SELECT new_emp(); + + new_emp +-------------------------- + (None,1000.0,25,"(2,2)") +</screen> + + or by calling it as a table function: + +<screen> +SELECT * FROM new_emp(); + + name | salary | age | cubicle +------+--------+-----+--------- + None | 1000.0 | 25 | (2,2) +</screen> + + The second way is described more fully in <xref + linkend="xfunc-sql-table-functions"/>. + </para> + + <para> + When you use a function that returns a composite type, + you might want only one field (attribute) from its result. + You can do that with syntax like this: + +<screen> +SELECT (new_emp()).name; + + name +------ + None +</screen> + + The extra parentheses are needed to keep the parser from getting + confused. If you try to do it without them, you get something like this: + +<screen> +SELECT new_emp().name; +ERROR: syntax error at or near "." +LINE 1: SELECT new_emp().name; + ^ +</screen> + </para> + + <para> + Another option is to use functional notation for extracting an attribute: + +<screen> +SELECT name(new_emp()); + + name +------ + None +</screen> + + As explained in <xref linkend="rowtypes-usage"/>, the field notation and + functional notation are equivalent. + </para> + + <para> + Another way to use a function returning a composite type is to pass the + result to another function that accepts the correct row type as input: + +<screen> +CREATE FUNCTION getname(emp) RETURNS text AS $$ + SELECT $1.name; +$$ LANGUAGE SQL; + +SELECT getname(new_emp()); + getname +--------- + None +(1 row) +</screen> + </para> + </sect2> + + <sect2 id="xfunc-output-parameters"> + <title><acronym>SQL</acronym> Functions with Output Parameters</title> + + <indexterm> + <primary>function</primary> + <secondary>output parameter</secondary> + </indexterm> + + <para> + An alternative way of describing a function's results is to define it + with <firstterm>output parameters</firstterm>, as in this example: + +<screen> +CREATE FUNCTION add_em (IN x int, IN y int, OUT sum int) +AS 'SELECT x + y' +LANGUAGE SQL; + +SELECT add_em(3,7); + add_em +-------- + 10 +(1 row) +</screen> + + This is not essentially different from the version of <literal>add_em</literal> + shown in <xref linkend="xfunc-sql-base-functions"/>. The real value of + output parameters is that they provide a convenient way of defining + functions that return several columns. For example, + +<screen> +CREATE FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int) +AS 'SELECT x + y, x * y' +LANGUAGE SQL; + + SELECT * FROM sum_n_product(11,42); + sum | product +-----+--------- + 53 | 462 +(1 row) +</screen> + + What has essentially happened here is that we have created an anonymous + composite type for the result of the function. The above example has + the same end result as + +<screen> +CREATE TYPE sum_prod AS (sum int, product int); + +CREATE FUNCTION sum_n_product (int, int) RETURNS sum_prod +AS 'SELECT $1 + $2, $1 * $2' +LANGUAGE SQL; +</screen> + + but not having to bother with the separate composite type definition + is often handy. Notice that the names attached to the output parameters + are not just decoration, but determine the column names of the anonymous + composite type. (If you omit a name for an output parameter, the + system will choose a name on its own.) + </para> + + <para> + Notice that output parameters are not included in the calling argument + list when invoking such a function from SQL. This is because + <productname>PostgreSQL</productname> considers only the input + parameters to define the function's calling signature. That means + also that only the input parameters matter when referencing the function + for purposes such as dropping it. We could drop the above function + with either of + +<screen> +DROP FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int); +DROP FUNCTION sum_n_product (int, int); +</screen> + </para> + + <para> + Parameters can be marked as <literal>IN</literal> (the default), + <literal>OUT</literal>, <literal>INOUT</literal>, or <literal>VARIADIC</literal>. + An <literal>INOUT</literal> + parameter serves as both an input parameter (part of the calling + argument list) and an output parameter (part of the result record type). + <literal>VARIADIC</literal> parameters are input parameters, but are treated + specially as described below. + </para> + </sect2> + + <sect2 id="xfunc-output-parameters-proc"> + <title><acronym>SQL</acronym> Procedures with Output Parameters</title> + + <indexterm> + <primary>procedures</primary> + <secondary>output parameter</secondary> + </indexterm> + + <para> + Output parameters are also supported in procedures, but they work a bit + differently from functions. In <command>CALL</command> commands, + output parameters must be included in the argument list. + For example, the bank account debiting routine from earlier could be + written like this: +<programlisting> +CREATE PROCEDURE tp1 (accountno integer, debit numeric, OUT new_balance numeric) AS $$ + UPDATE bank + SET balance = balance - debit + WHERE accountno = tp1.accountno + RETURNING balance; +$$ LANGUAGE SQL; +</programlisting> + To call this procedure, an argument matching the <literal>OUT</literal> + parameter must be included. It's customary to write + <literal>NULL</literal>: +<programlisting> +CALL tp1(17, 100.0, NULL); +</programlisting> + If you write something else, it must be an expression that is implicitly + coercible to the declared type of the parameter, just as for input + parameters. Note however that such an expression will not be evaluated. + </para> + + <para> + When calling a procedure from <application>PL/pgSQL</application>, + instead of writing <literal>NULL</literal> you must write a variable + that will receive the procedure's output. See <xref + linkend="plpgsql-statements-calling-procedure"/> for details. + </para> + </sect2> + + <sect2 id="xfunc-sql-variadic-functions"> + <title><acronym>SQL</acronym> Functions with Variable Numbers of Arguments</title> + + <indexterm> + <primary>function</primary> + <secondary>variadic</secondary> + </indexterm> + + <indexterm> + <primary>variadic function</primary> + </indexterm> + + <para> + <acronym>SQL</acronym> functions can be declared to accept + variable numbers of arguments, so long as all the <quote>optional</quote> + arguments are of the same data type. The optional arguments will be + passed to the function as an array. The function is declared by + marking the last parameter as <literal>VARIADIC</literal>; this parameter + must be declared as being of an array type. For example: + +<screen> +CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$ + SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i); +$$ LANGUAGE SQL; + +SELECT mleast(10, -1, 5, 4.4); + mleast +-------- + -1 +(1 row) +</screen> + + Effectively, all the actual arguments at or beyond the + <literal>VARIADIC</literal> position are gathered up into a one-dimensional + array, as if you had written + +<screen> +SELECT mleast(ARRAY[10, -1, 5, 4.4]); -- doesn't work +</screen> + + You can't actually write that, though — or at least, it will + not match this function definition. A parameter marked + <literal>VARIADIC</literal> matches one or more occurrences of its element + type, not of its own type. + </para> + + <para> + Sometimes it is useful to be able to pass an already-constructed array + to a variadic function; this is particularly handy when one variadic + function wants to pass on its array parameter to another one. Also, + this is the only secure way to call a variadic function found in a schema + that permits untrusted users to create objects; see + <xref linkend="typeconv-func"/>. You can do this by + specifying <literal>VARIADIC</literal> in the call: + +<screen> +SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]); +</screen> + + This prevents expansion of the function's variadic parameter into its + element type, thereby allowing the array argument value to match + normally. <literal>VARIADIC</literal> can only be attached to the last + actual argument of a function call. + </para> + + <para> + Specifying <literal>VARIADIC</literal> in the call is also the only way to + pass an empty array to a variadic function, for example: + +<screen> +SELECT mleast(VARIADIC ARRAY[]::numeric[]); +</screen> + + Simply writing <literal>SELECT mleast()</literal> does not work because a + variadic parameter must match at least one actual argument. + (You could define a second function also named <literal>mleast</literal>, + with no parameters, if you wanted to allow such calls.) + </para> + + <para> + The array element parameters generated from a variadic parameter are + treated as not having any names of their own. This means it is not + possible to call a variadic function using named arguments (<xref + linkend="sql-syntax-calling-funcs"/>), except when you specify + <literal>VARIADIC</literal>. For example, this will work: + +<screen> +SELECT mleast(VARIADIC arr => ARRAY[10, -1, 5, 4.4]); +</screen> + + but not these: + +<screen> +SELECT mleast(arr => 10); +SELECT mleast(arr => ARRAY[10, -1, 5, 4.4]); +</screen> + </para> + </sect2> + + <sect2 id="xfunc-sql-parameter-defaults"> + <title><acronym>SQL</acronym> Functions with Default Values for Arguments</title> + + <indexterm> + <primary>function</primary> + <secondary>default values for arguments</secondary> + </indexterm> + + <para> + Functions can be declared with default values for some or all input + arguments. The default values are inserted whenever the function is + called with insufficiently many actual arguments. Since arguments + can only be omitted from the end of the actual argument list, all + parameters after a parameter with a default value have to have + default values as well. (Although the use of named argument notation + could allow this restriction to be relaxed, it's still enforced so that + positional argument notation works sensibly.) Whether or not you use it, + this capability creates a need for precautions when calling functions in + databases where some users mistrust other users; see + <xref linkend="typeconv-func"/>. + </para> + + <para> + For example: +<screen> +CREATE FUNCTION foo(a int, b int DEFAULT 2, c int DEFAULT 3) +RETURNS int +LANGUAGE SQL +AS $$ + SELECT $1 + $2 + $3; +$$; + +SELECT foo(10, 20, 30); + foo +----- + 60 +(1 row) + +SELECT foo(10, 20); + foo +----- + 33 +(1 row) + +SELECT foo(10); + foo +----- + 15 +(1 row) + +SELECT foo(); -- fails since there is no default for the first argument +ERROR: function foo() does not exist +</screen> + The <literal>=</literal> sign can also be used in place of the + key word <literal>DEFAULT</literal>. + </para> + </sect2> + + <sect2 id="xfunc-sql-table-functions"> + <title><acronym>SQL</acronym> Functions as Table Sources</title> + + <para> + All SQL functions can be used in the <literal>FROM</literal> clause of a query, + but it is particularly useful for functions returning composite types. + If the function is defined to return a base type, the table function + produces a one-column table. If the function is defined to return + a composite type, the table function produces a column for each attribute + of the composite type. + </para> + + <para> + Here is an example: + +<screen> +CREATE TABLE foo (fooid int, foosubid int, fooname text); +INSERT INTO foo VALUES (1, 1, 'Joe'); +INSERT INTO foo VALUES (1, 2, 'Ed'); +INSERT INTO foo VALUES (2, 1, 'Mary'); + +CREATE FUNCTION getfoo(int) RETURNS foo AS $$ + SELECT * FROM foo WHERE fooid = $1; +$$ LANGUAGE SQL; + +SELECT *, upper(fooname) FROM getfoo(1) AS t1; + + fooid | foosubid | fooname | upper +-------+----------+---------+------- + 1 | 1 | Joe | JOE +(1 row) +</screen> + + As the example shows, we can work with the columns of the function's + result just the same as if they were columns of a regular table. + </para> + + <para> + Note that we only got one row out of the function. This is because + we did not use <literal>SETOF</literal>. That is described in the next section. + </para> + </sect2> + + <sect2 id="xfunc-sql-functions-returning-set"> + <title><acronym>SQL</acronym> Functions Returning Sets</title> + + <indexterm> + <primary>function</primary> + <secondary>with SETOF</secondary> + </indexterm> + + <para> + When an SQL function is declared as returning <literal>SETOF + <replaceable>sometype</replaceable></literal>, the function's final + query is executed to completion, and each row it + outputs is returned as an element of the result set. + </para> + + <para> + This feature is normally used when calling the function in the <literal>FROM</literal> + clause. In this case each row returned by the function becomes + a row of the table seen by the query. For example, assume that + table <literal>foo</literal> has the same contents as above, and we say: + +<programlisting> +CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$ + SELECT * FROM foo WHERE fooid = $1; +$$ LANGUAGE SQL; + +SELECT * FROM getfoo(1) AS t1; +</programlisting> + + Then we would get: +<screen> + fooid | foosubid | fooname +-------+----------+--------- + 1 | 1 | Joe + 1 | 2 | Ed +(2 rows) +</screen> + </para> + + <para> + It is also possible to return multiple rows with the columns defined by + output parameters, like this: + +<programlisting> +CREATE TABLE tab (y int, z int); +INSERT INTO tab VALUES (1, 2), (3, 4), (5, 6), (7, 8); + +CREATE FUNCTION sum_n_product_with_tab (x int, OUT sum int, OUT product int) +RETURNS SETOF record +AS $$ + SELECT $1 + tab.y, $1 * tab.y FROM tab; +$$ LANGUAGE SQL; + +SELECT * FROM sum_n_product_with_tab(10); + sum | product +-----+--------- + 11 | 10 + 13 | 30 + 15 | 50 + 17 | 70 +(4 rows) +</programlisting> + + The key point here is that you must write <literal>RETURNS SETOF record</literal> + to indicate that the function returns multiple rows instead of just one. + If there is only one output parameter, write that parameter's type + instead of <type>record</type>. + </para> + + <para> + It is frequently useful to construct a query's result by invoking a + set-returning function multiple times, with the parameters for each + invocation coming from successive rows of a table or subquery. The + preferred way to do this is to use the <literal>LATERAL</literal> key word, + which is described in <xref linkend="queries-lateral"/>. + Here is an example using a set-returning function to enumerate + elements of a tree structure: + +<screen> +SELECT * FROM nodes; + name | parent +-----------+-------- + Top | + Child1 | Top + Child2 | Top + Child3 | Top + SubChild1 | Child1 + SubChild2 | Child1 +(6 rows) + +CREATE FUNCTION listchildren(text) RETURNS SETOF text AS $$ + SELECT name FROM nodes WHERE parent = $1 +$$ LANGUAGE SQL STABLE; + +SELECT * FROM listchildren('Top'); + listchildren +-------------- + Child1 + Child2 + Child3 +(3 rows) + +SELECT name, child FROM nodes, LATERAL listchildren(name) AS child; + name | child +--------+----------- + Top | Child1 + Top | Child2 + Top | Child3 + Child1 | SubChild1 + Child1 | SubChild2 +(5 rows) +</screen> + + This example does not do anything that we couldn't have done with a + simple join, but in more complex calculations the option to put + some of the work into a function can be quite convenient. + </para> + + <para> + Functions returning sets can also be called in the select list + of a query. For each row that the query + generates by itself, the set-returning function is invoked, and an output + row is generated for each element of the function's result set. + The previous example could also be done with queries like + these: + +<screen> +SELECT listchildren('Top'); + listchildren +-------------- + Child1 + Child2 + Child3 +(3 rows) + +SELECT name, listchildren(name) FROM nodes; + name | listchildren +--------+-------------- + Top | Child1 + Top | Child2 + Top | Child3 + Child1 | SubChild1 + Child1 | SubChild2 +(5 rows) +</screen> + + In the last <command>SELECT</command>, + notice that no output row appears for <literal>Child2</literal>, <literal>Child3</literal>, etc. + This happens because <function>listchildren</function> returns an empty set + for those arguments, so no result rows are generated. This is the same + behavior as we got from an inner join to the function result when using + the <literal>LATERAL</literal> syntax. + </para> + + <para> + <productname>PostgreSQL</productname>'s behavior for a set-returning function in a + query's select list is almost exactly the same as if the set-returning + function had been written in a <literal>LATERAL FROM</literal>-clause item + instead. For example, +<programlisting> +SELECT x, generate_series(1,5) AS g FROM tab; +</programlisting> + is almost equivalent to +<programlisting> +SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g; +</programlisting> + It would be exactly the same, except that in this specific example, + the planner could choose to put <structname>g</structname> on the outside of the + nested-loop join, since <structname>g</structname> has no actual lateral dependency + on <structname>tab</structname>. That would result in a different output row + order. Set-returning functions in the select list are always evaluated + as though they are on the inside of a nested-loop join with the rest of + the <literal>FROM</literal> clause, so that the function(s) are run to + completion before the next row from the <literal>FROM</literal> clause is + considered. + </para> + + <para> + If there is more than one set-returning function in the query's select + list, the behavior is similar to what you get from putting the functions + into a single <literal>LATERAL ROWS FROM( ... )</literal> <literal>FROM</literal>-clause + item. For each row from the underlying query, there is an output row + using the first result from each function, then an output row using the + second result, and so on. If some of the set-returning functions + produce fewer outputs than others, null values are substituted for the + missing data, so that the total number of rows emitted for one + underlying row is the same as for the set-returning function that + produced the most outputs. Thus the set-returning functions + run <quote>in lockstep</quote> until they are all exhausted, and then + execution continues with the next underlying row. + </para> + + <para> + Set-returning functions can be nested in a select list, although that is + not allowed in <literal>FROM</literal>-clause items. In such cases, each level + of nesting is treated separately, as though it were + a separate <literal>LATERAL ROWS FROM( ... )</literal> item. For example, in +<programlisting> +SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab; +</programlisting> + the set-returning functions <function>srf2</function>, <function>srf3</function>, + and <function>srf5</function> would be run in lockstep for each row + of <structname>tab</structname>, and then <function>srf1</function> and <function>srf4</function> + would be applied in lockstep to each row produced by the lower + functions. + </para> + + <para> + Set-returning functions cannot be used within conditional-evaluation + constructs, such as <literal>CASE</literal> or <literal>COALESCE</literal>. For + example, consider +<programlisting> +SELECT x, CASE WHEN x > 0 THEN generate_series(1, 5) ELSE 0 END FROM tab; +</programlisting> + It might seem that this should produce five repetitions of input rows + that have <literal>x > 0</literal>, and a single repetition of those that do + not; but actually, because <function>generate_series(1, 5)</function> would be + run in an implicit <literal>LATERAL FROM</literal> item before + the <literal>CASE</literal> expression is ever evaluated, it would produce five + repetitions of every input row. To reduce confusion, such cases produce + a parse-time error instead. + </para> + + <note> + <para> + If a function's last command is <command>INSERT</command>, <command>UPDATE</command>, + or <command>DELETE</command> with <literal>RETURNING</literal>, that command will + always be executed to completion, even if the function is not declared + with <literal>SETOF</literal> or the calling query does not fetch all the + result rows. Any extra rows produced by the <literal>RETURNING</literal> + clause are silently dropped, but the commanded table modifications + still happen (and are all completed before returning from the function). + </para> + </note> + + <note> + <para> + Before <productname>PostgreSQL</productname> 10, putting more than one + set-returning function in the same select list did not behave very + sensibly unless they always produced equal numbers of rows. Otherwise, + what you got was a number of output rows equal to the least common + multiple of the numbers of rows produced by the set-returning + functions. Also, nested set-returning functions did not work as + described above; instead, a set-returning function could have at most + one set-returning argument, and each nest of set-returning functions + was run independently. Also, conditional execution (set-returning + functions inside <literal>CASE</literal> etc.) was previously allowed, + complicating things even more. + Use of the <literal>LATERAL</literal> syntax is recommended when writing + queries that need to work in older <productname>PostgreSQL</productname> versions, + because that will give consistent results across different versions. + If you have a query that is relying on conditional execution of a + set-returning function, you may be able to fix it by moving the + conditional test into a custom set-returning function. For example, +<programlisting> +SELECT x, CASE WHEN y > 0 THEN generate_series(1, z) ELSE 5 END FROM tab; +</programlisting> + could become +<programlisting> +CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int) + RETURNS SETOF int AS $$ +BEGIN + IF cond THEN + RETURN QUERY SELECT generate_series(start, fin); + ELSE + RETURN QUERY SELECT els; + END IF; +END$$ LANGUAGE plpgsql; + +SELECT x, case_generate_series(y > 0, 1, z, 5) FROM tab; +</programlisting> + This formulation will work the same in all versions + of <productname>PostgreSQL</productname>. + </para> + </note> + </sect2> + + <sect2 id="xfunc-sql-functions-returning-table"> + <title><acronym>SQL</acronym> Functions Returning <literal>TABLE</literal></title> + + <indexterm> + <primary>function</primary> + <secondary>RETURNS TABLE</secondary> + </indexterm> + + <para> + There is another way to declare a function as returning a set, + which is to use the syntax + <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>. + This is equivalent to using one or more <literal>OUT</literal> parameters plus + marking the function as returning <literal>SETOF record</literal> (or + <literal>SETOF</literal> a single output parameter's type, as appropriate). + This notation is specified in recent versions of the SQL standard, and + thus may be more portable than using <literal>SETOF</literal>. + </para> + + <para> + For example, the preceding sum-and-product example could also be + done this way: + +<programlisting> +CREATE FUNCTION sum_n_product_with_tab (x int) +RETURNS TABLE(sum int, product int) AS $$ + SELECT $1 + tab.y, $1 * tab.y FROM tab; +$$ LANGUAGE SQL; +</programlisting> + + It is not allowed to use explicit <literal>OUT</literal> or <literal>INOUT</literal> + parameters with the <literal>RETURNS TABLE</literal> notation — you must + put all the output columns in the <literal>TABLE</literal> list. + </para> + </sect2> + + <sect2 id="xfunc-sql-polymorphic-functions"> + <title>Polymorphic <acronym>SQL</acronym> Functions</title> + + <para> + <acronym>SQL</acronym> functions can be declared to accept and + return the polymorphic types described in <xref + linkend="extend-types-polymorphic"/>. Here is a polymorphic + function <function>make_array</function> that builds up an array + from two arbitrary data type elements: +<screen> +CREATE FUNCTION make_array(anyelement, anyelement) RETURNS anyarray AS $$ + SELECT ARRAY[$1, $2]; +$$ LANGUAGE SQL; + +SELECT make_array(1, 2) AS intarray, make_array('a'::text, 'b') AS textarray; + intarray | textarray +----------+----------- + {1,2} | {a,b} +(1 row) +</screen> + </para> + + <para> + Notice the use of the typecast <literal>'a'::text</literal> + to specify that the argument is of type <type>text</type>. This is + required if the argument is just a string literal, since otherwise + it would be treated as type + <type>unknown</type>, and array of <type>unknown</type> is not a valid + type. + Without the typecast, you will get errors like this: +<screen> +ERROR: could not determine polymorphic type because input has type unknown +</screen> + </para> + + <para> + With <function>make_array</function> declared as above, you must + provide two arguments that are of exactly the same data type; the + system will not attempt to resolve any type differences. Thus for + example this does not work: +<screen> +SELECT make_array(1, 2.5) AS numericarray; +ERROR: function make_array(integer, numeric) does not exist +</screen> + An alternative approach is to use the <quote>common</quote> family of + polymorphic types, which allows the system to try to identify a + suitable common type: +<screen> +CREATE FUNCTION make_array2(anycompatible, anycompatible) +RETURNS anycompatiblearray AS $$ + SELECT ARRAY[$1, $2]; +$$ LANGUAGE SQL; + +SELECT make_array2(1, 2.5) AS numericarray; + numericarray +-------------- + {1,2.5} +(1 row) +</screen> + Because the rules for common type resolution default to choosing + type <type>text</type> when all inputs are of unknown types, this + also works: +<screen> +SELECT make_array2('a', 'b') AS textarray; + textarray +----------- + {a,b} +(1 row) +</screen> + </para> + + <para> + It is permitted to have polymorphic arguments with a fixed + return type, but the converse is not. For example: +<screen> +CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$ + SELECT $1 > $2; +$$ LANGUAGE SQL; + +SELECT is_greater(1, 2); + is_greater +------------ + f +(1 row) + +CREATE FUNCTION invalid_func() RETURNS anyelement AS $$ + SELECT 1; +$$ LANGUAGE SQL; +ERROR: cannot determine result data type +DETAIL: A result of type anyelement requires at least one input of type anyelement, anyarray, anynonarray, anyenum, or anyrange. +</screen> + </para> + + <para> + Polymorphism can be used with functions that have output arguments. + For example: +<screen> +CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray) +AS 'select $1, array[$1,$1]' LANGUAGE SQL; + +SELECT * FROM dup(22); + f2 | f3 +----+--------- + 22 | {22,22} +(1 row) +</screen> + </para> + + <para> + Polymorphism can also be used with variadic functions. + For example: +<screen> +CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$ + SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i); +$$ LANGUAGE SQL; + +SELECT anyleast(10, -1, 5, 4); + anyleast +---------- + -1 +(1 row) + +SELECT anyleast('abc'::text, 'def'); + anyleast +---------- + abc +(1 row) + +CREATE FUNCTION concat_values(text, VARIADIC anyarray) RETURNS text AS $$ + SELECT array_to_string($2, $1); +$$ LANGUAGE SQL; + +SELECT concat_values('|', 1, 4, 2); + concat_values +--------------- + 1|4|2 +(1 row) +</screen> + </para> + </sect2> + + <sect2> + <title><acronym>SQL</acronym> Functions with Collations</title> + + <indexterm> + <primary>collation</primary> + <secondary>in SQL functions</secondary> + </indexterm> + + <para> + When an SQL function has one or more parameters of collatable data types, + a collation is identified for each function call depending on the + collations assigned to the actual arguments, as described in <xref + linkend="collation"/>. If a collation is successfully identified + (i.e., there are no conflicts of implicit collations among the arguments) + then all the collatable parameters are treated as having that collation + implicitly. This will affect the behavior of collation-sensitive + operations within the function. For example, using the + <function>anyleast</function> function described above, the result of +<programlisting> +SELECT anyleast('abc'::text, 'ABC'); +</programlisting> + will depend on the database's default collation. In <literal>C</literal> locale + the result will be <literal>ABC</literal>, but in many other locales it will + be <literal>abc</literal>. The collation to use can be forced by adding + a <literal>COLLATE</literal> clause to any of the arguments, for example +<programlisting> +SELECT anyleast('abc'::text, 'ABC' COLLATE "C"); +</programlisting> + Alternatively, if you wish a function to operate with a particular + collation regardless of what it is called with, insert + <literal>COLLATE</literal> clauses as needed in the function definition. + This version of <function>anyleast</function> would always use <literal>en_US</literal> + locale to compare strings: +<programlisting> +CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$ + SELECT min($1[i] COLLATE "en_US") FROM generate_subscripts($1, 1) g(i); +$$ LANGUAGE SQL; +</programlisting> + But note that this will throw an error if applied to a non-collatable + data type. + </para> + + <para> + If no common collation can be identified among the actual arguments, + then an SQL function treats its parameters as having their data types' + default collation (which is usually the database's default collation, + but could be different for parameters of domain types). + </para> + + <para> + The behavior of collatable parameters can be thought of as a limited + form of polymorphism, applicable only to textual data types. + </para> + </sect2> + </sect1> + + <sect1 id="xfunc-overload"> + <title>Function Overloading</title> + + <indexterm zone="xfunc-overload"> + <primary>overloading</primary> + <secondary>functions</secondary> + </indexterm> + + <para> + More than one function can be defined with the same SQL name, so long + as the arguments they take are different. In other words, + function names can be <firstterm>overloaded</firstterm>. Whether or not + you use it, this capability entails security precautions when calling + functions in databases where some users mistrust other users; see + <xref linkend="typeconv-func"/>. When a query is executed, the server + will determine which function to call from the data types and the number + of the provided arguments. Overloading can also be used to simulate + functions with a variable number of arguments, up to a finite maximum + number. + </para> + + <para> + When creating a family of overloaded functions, one should be + careful not to create ambiguities. For instance, given the + functions: +<programlisting> +CREATE FUNCTION test(int, real) RETURNS ... +CREATE FUNCTION test(smallint, double precision) RETURNS ... +</programlisting> + it is not immediately clear which function would be called with + some trivial input like <literal>test(1, 1.5)</literal>. The + currently implemented resolution rules are described in + <xref linkend="typeconv"/>, but it is unwise to design a system that subtly + relies on this behavior. + </para> + + <para> + A function that takes a single argument of a composite type should + generally not have the same name as any attribute (field) of that type. + Recall that <literal><replaceable>attribute</replaceable>(<replaceable>table</replaceable>)</literal> + is considered equivalent + to <literal><replaceable>table</replaceable>.<replaceable>attribute</replaceable></literal>. + In the case that there is an + ambiguity between a function on a composite type and an attribute of + the composite type, the attribute will always be used. It is possible + to override that choice by schema-qualifying the function name + (that is, <literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>table</replaceable>) + </literal>) but it's better to + avoid the problem by not choosing conflicting names. + </para> + + <para> + Another possible conflict is between variadic and non-variadic functions. + For instance, it is possible to create both <literal>foo(numeric)</literal> and + <literal>foo(VARIADIC numeric[])</literal>. In this case it is unclear which one + should be matched to a call providing a single numeric argument, such as + <literal>foo(10.1)</literal>. The rule is that the function appearing + earlier in the search path is used, or if the two functions are in the + same schema, the non-variadic one is preferred. + </para> + + <para> + When overloading C-language functions, there is an additional + constraint: The C name of each function in the family of + overloaded functions must be different from the C names of all + other functions, either internal or dynamically loaded. If this + rule is violated, the behavior is not portable. You might get a + run-time linker error, or one of the functions will get called + (usually the internal one). The alternative form of the + <literal>AS</literal> clause for the SQL <command>CREATE + FUNCTION</command> command decouples the SQL function name from + the function name in the C source code. For instance: +<programlisting> +CREATE FUNCTION test(int) RETURNS int + AS '<replaceable>filename</replaceable>', 'test_1arg' + LANGUAGE C; +CREATE FUNCTION test(int, int) RETURNS int + AS '<replaceable>filename</replaceable>', 'test_2arg' + LANGUAGE C; +</programlisting> + The names of the C functions here reflect one of many possible conventions. + </para> + </sect1> + + <sect1 id="xfunc-volatility"> + <title>Function Volatility Categories</title> + + <indexterm zone="xfunc-volatility"> + <primary>volatility</primary> + <secondary>functions</secondary> + </indexterm> + <indexterm zone="xfunc-volatility"> + <primary>VOLATILE</primary> + </indexterm> + <indexterm zone="xfunc-volatility"> + <primary>STABLE</primary> + </indexterm> + <indexterm zone="xfunc-volatility"> + <primary>IMMUTABLE</primary> + </indexterm> + + <para> + Every function has a <firstterm>volatility</firstterm> classification, with + the possibilities being <literal>VOLATILE</literal>, <literal>STABLE</literal>, or + <literal>IMMUTABLE</literal>. <literal>VOLATILE</literal> is the default if the + <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link> + command does not specify a category. The volatility category is a + promise to the optimizer about the behavior of the function: + + <itemizedlist> + <listitem> + <para> + A <literal>VOLATILE</literal> function can do anything, including modifying + the database. It can return different results on successive calls with + the same arguments. The optimizer makes no assumptions about the + behavior of such functions. A query using a volatile function will + re-evaluate the function at every row where its value is needed. + </para> + </listitem> + <listitem> + <para> + A <literal>STABLE</literal> function cannot modify the database and is + guaranteed to return the same results given the same arguments + for all rows within a single statement. This category allows the + optimizer to optimize multiple calls of the function to a single + call. In particular, it is safe to use an expression containing + such a function in an index scan condition. (Since an index scan + will evaluate the comparison value only once, not once at each + row, it is not valid to use a <literal>VOLATILE</literal> function in an + index scan condition.) + </para> + </listitem> + <listitem> + <para> + An <literal>IMMUTABLE</literal> function cannot modify the database and is + guaranteed to return the same results given the same arguments forever. + This category allows the optimizer to pre-evaluate the function when + a query calls it with constant arguments. For example, a query like + <literal>SELECT ... WHERE x = 2 + 2</literal> can be simplified on sight to + <literal>SELECT ... WHERE x = 4</literal>, because the function underlying + the integer addition operator is marked <literal>IMMUTABLE</literal>. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + For best optimization results, you should label your functions with the + strictest volatility category that is valid for them. + </para> + + <para> + Any function with side-effects <emphasis>must</emphasis> be labeled + <literal>VOLATILE</literal>, so that calls to it cannot be optimized away. + Even a function with no side-effects needs to be labeled + <literal>VOLATILE</literal> if its value can change within a single query; + some examples are <literal>random()</literal>, <literal>currval()</literal>, + <literal>timeofday()</literal>. + </para> + + <para> + Another important example is that the <function>current_timestamp</function> + family of functions qualify as <literal>STABLE</literal>, since their values do + not change within a transaction. + </para> + + <para> + There is relatively little difference between <literal>STABLE</literal> and + <literal>IMMUTABLE</literal> categories when considering simple interactive + queries that are planned and immediately executed: it doesn't matter + a lot whether a function is executed once during planning or once during + query execution startup. But there is a big difference if the plan is + saved and reused later. Labeling a function <literal>IMMUTABLE</literal> when + it really isn't might allow it to be prematurely folded to a constant during + planning, resulting in a stale value being re-used during subsequent uses + of the plan. This is a hazard when using prepared statements or when + using function languages that cache plans (such as + <application>PL/pgSQL</application>). + </para> + + <para> + For functions written in SQL or in any of the standard procedural + languages, there is a second important property determined by the + volatility category, namely the visibility of any data changes that have + been made by the SQL command that is calling the function. A + <literal>VOLATILE</literal> function will see such changes, a <literal>STABLE</literal> + or <literal>IMMUTABLE</literal> function will not. This behavior is implemented + using the snapshotting behavior of MVCC (see <xref linkend="mvcc"/>): + <literal>STABLE</literal> and <literal>IMMUTABLE</literal> functions use a snapshot + established as of the start of the calling query, whereas + <literal>VOLATILE</literal> functions obtain a fresh snapshot at the start of + each query they execute. + </para> + + <note> + <para> + Functions written in C can manage snapshots however they want, but it's + usually a good idea to make C functions work this way too. + </para> + </note> + + <para> + Because of this snapshotting behavior, + a function containing only <command>SELECT</command> commands can safely be + marked <literal>STABLE</literal>, even if it selects from tables that might be + undergoing modifications by concurrent queries. + <productname>PostgreSQL</productname> will execute all commands of a + <literal>STABLE</literal> function using the snapshot established for the + calling query, and so it will see a fixed view of the database throughout + that query. + </para> + + <para> + The same snapshotting behavior is used for <command>SELECT</command> commands + within <literal>IMMUTABLE</literal> functions. It is generally unwise to select + from database tables within an <literal>IMMUTABLE</literal> function at all, + since the immutability will be broken if the table contents ever change. + However, <productname>PostgreSQL</productname> does not enforce that you + do not do that. + </para> + + <para> + A common error is to label a function <literal>IMMUTABLE</literal> when its + results depend on a configuration parameter. For example, a function + that manipulates timestamps might well have results that depend on the + <xref linkend="guc-timezone"/> setting. For safety, such functions should + be labeled <literal>STABLE</literal> instead. + </para> + + <note> + <para> + <productname>PostgreSQL</productname> requires that <literal>STABLE</literal> + and <literal>IMMUTABLE</literal> functions contain no SQL commands other + than <command>SELECT</command> to prevent data modification. + (This is not a completely bulletproof test, since such functions could + still call <literal>VOLATILE</literal> functions that modify the database. + If you do that, you will find that the <literal>STABLE</literal> or + <literal>IMMUTABLE</literal> function does not notice the database changes + applied by the called function, since they are hidden from its snapshot.) + </para> + </note> + </sect1> + + <sect1 id="xfunc-pl"> + <title>Procedural Language Functions</title> + + <para> + <productname>PostgreSQL</productname> allows user-defined functions + to be written in other languages besides SQL and C. These other + languages are generically called <firstterm>procedural + languages</firstterm> (<acronym>PL</acronym>s). + Procedural languages aren't built into the + <productname>PostgreSQL</productname> server; they are offered + by loadable modules. + See <xref linkend="xplang"/> and following chapters for more + information. + </para> + </sect1> + + <sect1 id="xfunc-internal"> + <title>Internal Functions</title> + + <indexterm zone="xfunc-internal"><primary>function</primary><secondary>internal</secondary></indexterm> + + <para> + Internal functions are functions written in C that have been statically + linked into the <productname>PostgreSQL</productname> server. + The <quote>body</quote> of the function definition + specifies the C-language name of the function, which need not be the + same as the name being declared for SQL use. + (For reasons of backward compatibility, an empty body + is accepted as meaning that the C-language function name is the + same as the SQL name.) + </para> + + <para> + Normally, all internal functions present in the + server are declared during the initialization of the database cluster + (see <xref linkend="creating-cluster"/>), + but a user could use <command>CREATE FUNCTION</command> + to create additional alias names for an internal function. + Internal functions are declared in <command>CREATE FUNCTION</command> + with language name <literal>internal</literal>. For instance, to + create an alias for the <function>sqrt</function> function: +<programlisting> +CREATE FUNCTION square_root(double precision) RETURNS double precision + AS 'dsqrt' + LANGUAGE internal + STRICT; +</programlisting> + (Most internal functions expect to be declared <quote>strict</quote>.) + </para> + + <note> + <para> + Not all <quote>predefined</quote> functions are + <quote>internal</quote> in the above sense. Some predefined + functions are written in SQL. + </para> + </note> + </sect1> + + <sect1 id="xfunc-c"> + <title>C-Language Functions</title> + + <indexterm zone="xfunc-c"> + <primary>function</primary> + <secondary>user-defined</secondary> + <tertiary>in C</tertiary> + </indexterm> + + <para> + User-defined functions can be written in C (or a language that can + be made compatible with C, such as C++). Such functions are + compiled into dynamically loadable objects (also called shared + libraries) and are loaded by the server on demand. The dynamic + loading feature is what distinguishes <quote>C language</quote> functions + from <quote>internal</quote> functions — the actual coding conventions + are essentially the same for both. (Hence, the standard internal + function library is a rich source of coding examples for user-defined + C functions.) + </para> + + <para> + Currently only one calling convention is used for C functions + (<quote>version 1</quote>). Support for that calling convention is + indicated by writing a <literal>PG_FUNCTION_INFO_V1()</literal> macro + call for the function, as illustrated below. + </para> + + <sect2 id="xfunc-c-dynload"> + <title>Dynamic Loading</title> + + <indexterm zone="xfunc-c-dynload"> + <primary>dynamic loading</primary> + </indexterm> + + <para> + The first time a user-defined function in a particular + loadable object file is called in a session, + the dynamic loader loads that object file into memory so that the + function can be called. The <command>CREATE FUNCTION</command> + for a user-defined C function must therefore specify two pieces of + information for the function: the name of the loadable + object file, and the C name (link symbol) of the specific function to call + within that object file. If the C name is not explicitly specified then + it is assumed to be the same as the SQL function name. + </para> + + <para> + The following algorithm is used to locate the shared object file + based on the name given in the <command>CREATE FUNCTION</command> + command: + + <orderedlist> + <listitem> + <para> + If the name is an absolute path, the given file is loaded. + </para> + </listitem> + + <listitem> + <para> + If the name starts with the string <literal>$libdir</literal>, + that part is replaced by the <productname>PostgreSQL</productname> package + library directory + name, which is determined at build time.<indexterm><primary>$libdir</primary></indexterm> + </para> + </listitem> + + <listitem> + <para> + If the name does not contain a directory part, the file is + searched for in the path specified by the configuration variable + <xref linkend="guc-dynamic-library-path"/>.<indexterm><primary>dynamic_library_path</primary></indexterm> + </para> + </listitem> + + <listitem> + <para> + Otherwise (the file was not found in the path, or it contains a + non-absolute directory part), the dynamic loader will try to + take the name as given, which will most likely fail. (It is + unreliable to depend on the current working directory.) + </para> + </listitem> + </orderedlist> + + If this sequence does not work, the platform-specific shared + library file name extension (often <filename>.so</filename>) is + appended to the given name and this sequence is tried again. If + that fails as well, the load will fail. + </para> + + <para> + It is recommended to locate shared libraries either relative to + <literal>$libdir</literal> or through the dynamic library path. + This simplifies version upgrades if the new installation is at a + different location. The actual directory that + <literal>$libdir</literal> stands for can be found out with the + command <literal>pg_config --pkglibdir</literal>. + </para> + + <para> + The user ID the <productname>PostgreSQL</productname> server runs + as must be able to traverse the path to the file you intend to + load. Making the file or a higher-level directory not readable + and/or not executable by the <systemitem>postgres</systemitem> + user is a common mistake. + </para> + + <para> + In any case, the file name that is given in the + <command>CREATE FUNCTION</command> command is recorded literally + in the system catalogs, so if the file needs to be loaded again + the same procedure is applied. + </para> + + <note> + <para> + <productname>PostgreSQL</productname> will not compile a C function + automatically. The object file must be compiled before it is referenced + in a <command>CREATE + FUNCTION</command> command. See <xref linkend="dfunc"/> for additional + information. + </para> + </note> + + <indexterm zone="xfunc-c-dynload"> + <primary>magic block</primary> + </indexterm> + + <para> + To ensure that a dynamically loaded object file is not loaded into an + incompatible server, <productname>PostgreSQL</productname> checks that the + file contains a <quote>magic block</quote> with the appropriate contents. + This allows the server to detect obvious incompatibilities, such as code + compiled for a different major version of + <productname>PostgreSQL</productname>. To include a magic block, + write this in one (and only one) of the module source files, after having + included the header <filename>fmgr.h</filename>: + +<programlisting> +PG_MODULE_MAGIC; +</programlisting> + </para> + + <para> + After it is used for the first time, a dynamically loaded object + file is retained in memory. Future calls in the same session to + the function(s) in that file will only incur the small overhead of + a symbol table lookup. If you need to force a reload of an object + file, for example after recompiling it, begin a fresh session. + </para> + + <indexterm zone="xfunc-c-dynload"> + <primary>_PG_init</primary> + </indexterm> + <indexterm zone="xfunc-c-dynload"> + <primary>library initialization function</primary> + </indexterm> + + <para> + Optionally, a dynamically loaded file can contain an initialization + function. If the file includes a function named + <function>_PG_init</function>, that function will be called immediately after + loading the file. The function receives no parameters and should + return void. There is presently no way to unload a dynamically loaded file. + </para> + + </sect2> + + <sect2 id="xfunc-c-basetype"> + <title>Base Types in C-Language Functions</title> + + <indexterm zone="xfunc-c-basetype"> + <primary>data type</primary> + <secondary>internal organization</secondary> + </indexterm> + + <para> + To know how to write C-language functions, you need to know how + <productname>PostgreSQL</productname> internally represents base + data types and how they can be passed to and from functions. + Internally, <productname>PostgreSQL</productname> regards a base + type as a <quote>blob of memory</quote>. The user-defined + functions that you define over a type in turn define the way that + <productname>PostgreSQL</productname> can operate on it. That + is, <productname>PostgreSQL</productname> will only store and + retrieve the data from disk and use your user-defined functions + to input, process, and output the data. + </para> + + <para> + Base types can have one of three internal formats: + + <itemizedlist> + <listitem> + <para> + pass by value, fixed-length + </para> + </listitem> + <listitem> + <para> + pass by reference, fixed-length + </para> + </listitem> + <listitem> + <para> + pass by reference, variable-length + </para> + </listitem> + </itemizedlist> + </para> + + <para> + By-value types can only be 1, 2, or 4 bytes in length + (also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine). + You should be careful to define your types such that they will be the + same size (in bytes) on all architectures. For example, the + <literal>long</literal> type is dangerous because it is 4 bytes on some + machines and 8 bytes on others, whereas <type>int</type> type is 4 bytes + on most Unix machines. A reasonable implementation of the + <type>int4</type> type on Unix machines might be: + +<programlisting> +/* 4-byte integer, passed by value */ +typedef int int4; +</programlisting> + + (The actual PostgreSQL C code calls this type <type>int32</type>, because + it is a convention in C that <type>int<replaceable>XX</replaceable></type> + means <replaceable>XX</replaceable> <emphasis>bits</emphasis>. Note + therefore also that the C type <type>int8</type> is 1 byte in size. The + SQL type <type>int8</type> is called <type>int64</type> in C. See also + <xref linkend="xfunc-c-type-table"/>.) + </para> + + <para> + On the other hand, fixed-length types of any size can + be passed by-reference. For example, here is a sample + implementation of a <productname>PostgreSQL</productname> type: + +<programlisting> +/* 16-byte structure, passed by reference */ +typedef struct +{ + double x, y; +} Point; +</programlisting> + + Only pointers to such types can be used when passing + them in and out of <productname>PostgreSQL</productname> functions. + To return a value of such a type, allocate the right amount of + memory with <literal>palloc</literal>, fill in the allocated memory, + and return a pointer to it. (Also, if you just want to return the + same value as one of your input arguments that's of the same data type, + you can skip the extra <literal>palloc</literal> and just return the + pointer to the input value.) + </para> + + <para> + Finally, all variable-length types must also be passed + by reference. All variable-length types must begin + with an opaque length field of exactly 4 bytes, which will be set + by <symbol>SET_VARSIZE</symbol>; never set this field directly! All data to + be stored within that type must be located in the memory + immediately following that length field. The + length field contains the total length of the structure, + that is, it includes the size of the length field + itself. + </para> + + <para> + Another important point is to avoid leaving any uninitialized bits + within data type values; for example, take care to zero out any + alignment padding bytes that might be present in structs. Without + this, logically-equivalent constants of your data type might be + seen as unequal by the planner, leading to inefficient (though not + incorrect) plans. + </para> + + <warning> + <para> + <emphasis>Never</emphasis> modify the contents of a pass-by-reference input + value. If you do so you are likely to corrupt on-disk data, since + the pointer you are given might point directly into a disk buffer. + The sole exception to this rule is explained in + <xref linkend="xaggr"/>. + </para> + </warning> + + <para> + As an example, we can define the type <type>text</type> as + follows: + +<programlisting> +typedef struct { + int32 length; + char data[FLEXIBLE_ARRAY_MEMBER]; +} text; +</programlisting> + + The <literal>[FLEXIBLE_ARRAY_MEMBER]</literal> notation means that the actual + length of the data part is not specified by this declaration. + </para> + + <para> + When manipulating + variable-length types, we must be careful to allocate + the correct amount of memory and set the length field correctly. + For example, if we wanted to store 40 bytes in a <structname>text</structname> + structure, we might use a code fragment like this: + +<programlisting><![CDATA[ +#include "postgres.h" +... +char buffer[40]; /* our source data */ +... +text *destination = (text *) palloc(VARHDRSZ + 40); +SET_VARSIZE(destination, VARHDRSZ + 40); +memcpy(destination->data, buffer, 40); +... +]]> +</programlisting> + + <literal>VARHDRSZ</literal> is the same as <literal>sizeof(int32)</literal>, but + it's considered good style to use the macro <literal>VARHDRSZ</literal> + to refer to the size of the overhead for a variable-length type. + Also, the length field <emphasis>must</emphasis> be set using the + <literal>SET_VARSIZE</literal> macro, not by simple assignment. + </para> + + <para> + <xref linkend="xfunc-c-type-table"/> shows the C types + corresponding to many of the built-in SQL data types + of <productname>PostgreSQL</productname>. + The <quote>Defined In</quote> column gives the header file that + needs to be included to get the type definition. (The actual + definition might be in a different file that is included by the + listed file. It is recommended that users stick to the defined + interface.) Note that you should always include + <filename>postgres.h</filename> first in any source file of server + code, because it declares a number of things that you will need + anyway, and because including other headers first can cause + portability issues. + </para> + + <table tocentry="1" id="xfunc-c-type-table"> + <title>Equivalent C Types for Built-in SQL Types</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="1*"/> + <colspec colname="col2" colwidth="1*"/> + <colspec colname="col3" colwidth="2*"/> + <thead> + <row> + <entry> + SQL Type + </entry> + <entry> + C Type + </entry> + <entry> + Defined In + </entry> + </row> + </thead> + <tbody> + <row> + <entry><type>boolean</type></entry> + <entry><type>bool</type></entry> + <entry><filename>postgres.h</filename> (maybe compiler built-in)</entry> + </row> + <row> + <entry><type>box</type></entry> + <entry><type>BOX*</type></entry> + <entry><filename>utils/geo_decls.h</filename></entry> + </row> + <row> + <entry><type>bytea</type></entry> + <entry><type>bytea*</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>"char"</type></entry> + <entry><type>char</type></entry> + <entry>(compiler built-in)</entry> + </row> + <row> + <entry><type>character</type></entry> + <entry><type>BpChar*</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>cid</type></entry> + <entry><type>CommandId</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>date</type></entry> + <entry><type>DateADT</type></entry> + <entry><filename>utils/date.h</filename></entry> + </row> + <row> + <entry><type>float4</type> (<type>real</type>)</entry> + <entry><type>float4</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>float8</type> (<type>double precision</type>)</entry> + <entry><type>float8</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>int2</type> (<type>smallint</type>)</entry> + <entry><type>int16</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>int4</type> (<type>integer</type>)</entry> + <entry><type>int32</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>int8</type> (<type>bigint</type>)</entry> + <entry><type>int64</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>interval</type></entry> + <entry><type>Interval*</type></entry> + <entry><filename>datatype/timestamp.h</filename></entry> + </row> + <row> + <entry><type>lseg</type></entry> + <entry><type>LSEG*</type></entry> + <entry><filename>utils/geo_decls.h</filename></entry> + </row> + <row> + <entry><type>name</type></entry> + <entry><type>Name</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>numeric</type></entry> + <entry><type>Numeric</type></entry> + <entry><filename>utils/numeric.h</filename></entry> + </row> + <row> + <entry><type>oid</type></entry> + <entry><type>Oid</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>oidvector</type></entry> + <entry><type>oidvector*</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>path</type></entry> + <entry><type>PATH*</type></entry> + <entry><filename>utils/geo_decls.h</filename></entry> + </row> + <row> + <entry><type>point</type></entry> + <entry><type>POINT*</type></entry> + <entry><filename>utils/geo_decls.h</filename></entry> + </row> + <row> + <entry><type>regproc</type></entry> + <entry><type>RegProcedure</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>text</type></entry> + <entry><type>text*</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>tid</type></entry> + <entry><type>ItemPointer</type></entry> + <entry><filename>storage/itemptr.h</filename></entry> + </row> + <row> + <entry><type>time</type></entry> + <entry><type>TimeADT</type></entry> + <entry><filename>utils/date.h</filename></entry> + </row> + <row> + <entry><type>time with time zone</type></entry> + <entry><type>TimeTzADT</type></entry> + <entry><filename>utils/date.h</filename></entry> + </row> + <row> + <entry><type>timestamp</type></entry> + <entry><type>Timestamp</type></entry> + <entry><filename>datatype/timestamp.h</filename></entry> + </row> + <row> + <entry><type>timestamp with time zone</type></entry> + <entry><type>TimestampTz</type></entry> + <entry><filename>datatype/timestamp.h</filename></entry> + </row> + <row> + <entry><type>varchar</type></entry> + <entry><type>VarChar*</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + <row> + <entry><type>xid</type></entry> + <entry><type>TransactionId</type></entry> + <entry><filename>postgres.h</filename></entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Now that we've gone over all of the possible structures + for base types, we can show some examples of real functions. + </para> + </sect2> + + <sect2> + <title>Version 1 Calling Conventions</title> + + <para> + The version-1 calling convention relies on macros to suppress most + of the complexity of passing arguments and results. The C declaration + of a version-1 function is always: +<programlisting> +Datum funcname(PG_FUNCTION_ARGS) +</programlisting> + In addition, the macro call: +<programlisting> +PG_FUNCTION_INFO_V1(funcname); +</programlisting> + must appear in the same source file. (Conventionally, it's + written just before the function itself.) This macro call is not + needed for <literal>internal</literal>-language functions, since + <productname>PostgreSQL</productname> assumes that all internal functions + use the version-1 convention. It is, however, required for + dynamically-loaded functions. + </para> + + <para> + In a version-1 function, each actual argument is fetched using a + <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> + macro that corresponds to the argument's data type. (In non-strict + functions there needs to be a previous check about argument null-ness + using <function>PG_ARGISNULL()</function>; see below.) + The result is returned using a + <function>PG_RETURN_<replaceable>xxx</replaceable>()</function> + macro for the return type. + <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> + takes as its argument the number of the function argument to + fetch, where the count starts at 0. + <function>PG_RETURN_<replaceable>xxx</replaceable>()</function> + takes as its argument the actual value to return. + </para> + + <para> + Here are some examples using the version-1 calling convention: + </para> + +<programlisting><![CDATA[ +#include "postgres.h" +#include <string.h> +#include "fmgr.h" +#include "utils/geo_decls.h" + +PG_MODULE_MAGIC; + +/* by value */ + +PG_FUNCTION_INFO_V1(add_one); + +Datum +add_one(PG_FUNCTION_ARGS) +{ + int32 arg = PG_GETARG_INT32(0); + + PG_RETURN_INT32(arg + 1); +} + +/* by reference, fixed length */ + +PG_FUNCTION_INFO_V1(add_one_float8); + +Datum +add_one_float8(PG_FUNCTION_ARGS) +{ + /* The macros for FLOAT8 hide its pass-by-reference nature. */ + float8 arg = PG_GETARG_FLOAT8(0); + + PG_RETURN_FLOAT8(arg + 1.0); +} + +PG_FUNCTION_INFO_V1(makepoint); + +Datum +makepoint(PG_FUNCTION_ARGS) +{ + /* Here, the pass-by-reference nature of Point is not hidden. */ + Point *pointx = PG_GETARG_POINT_P(0); + Point *pointy = PG_GETARG_POINT_P(1); + Point *new_point = (Point *) palloc(sizeof(Point)); + + new_point->x = pointx->x; + new_point->y = pointy->y; + + PG_RETURN_POINT_P(new_point); +} + +/* by reference, variable length */ + +PG_FUNCTION_INFO_V1(copytext); + +Datum +copytext(PG_FUNCTION_ARGS) +{ + text *t = PG_GETARG_TEXT_PP(0); + + /* + * VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus the + * VARHDRSZ or VARHDRSZ_SHORT of its header. Construct the copy with a + * full-length header. + */ + text *new_t = (text *) palloc(VARSIZE_ANY_EXHDR(t) + VARHDRSZ); + SET_VARSIZE(new_t, VARSIZE_ANY_EXHDR(t) + VARHDRSZ); + + /* + * VARDATA is a pointer to the data region of the new struct. The source + * could be a short datum, so retrieve its data through VARDATA_ANY. + */ + memcpy((void *) VARDATA(new_t), /* destination */ + (void *) VARDATA_ANY(t), /* source */ + VARSIZE_ANY_EXHDR(t)); /* how many bytes */ + PG_RETURN_TEXT_P(new_t); +} + +PG_FUNCTION_INFO_V1(concat_text); + +Datum +concat_text(PG_FUNCTION_ARGS) +{ + text *arg1 = PG_GETARG_TEXT_PP(0); + text *arg2 = PG_GETARG_TEXT_PP(1); + int32 arg1_size = VARSIZE_ANY_EXHDR(arg1); + int32 arg2_size = VARSIZE_ANY_EXHDR(arg2); + int32 new_text_size = arg1_size + arg2_size + VARHDRSZ; + text *new_text = (text *) palloc(new_text_size); + + SET_VARSIZE(new_text, new_text_size); + memcpy(VARDATA(new_text), VARDATA_ANY(arg1), arg1_size); + memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size); + PG_RETURN_TEXT_P(new_text); +} +]]> +</programlisting> + + <para> + Supposing that the above code has been prepared in file + <filename>funcs.c</filename> and compiled into a shared object, + we could define the functions to <productname>PostgreSQL</productname> + with commands like this: + </para> + +<programlisting> +CREATE FUNCTION add_one(integer) RETURNS integer + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one' + LANGUAGE C STRICT; + +-- note overloading of SQL function name "add_one" +CREATE FUNCTION add_one(double precision) RETURNS double precision + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one_float8' + LANGUAGE C STRICT; + +CREATE FUNCTION makepoint(point, point) RETURNS point + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'makepoint' + LANGUAGE C STRICT; + +CREATE FUNCTION copytext(text) RETURNS text + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'copytext' + LANGUAGE C STRICT; + +CREATE FUNCTION concat_text(text, text) RETURNS text + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'concat_text' + LANGUAGE C STRICT; +</programlisting> + + <para> + Here, <replaceable>DIRECTORY</replaceable> stands for the + directory of the shared library file (for instance the + <productname>PostgreSQL</productname> tutorial directory, which + contains the code for the examples used in this section). + (Better style would be to use just <literal>'funcs'</literal> in the + <literal>AS</literal> clause, after having added + <replaceable>DIRECTORY</replaceable> to the search path. In any + case, we can omit the system-specific extension for a shared + library, commonly <literal>.so</literal>.) + </para> + + <para> + Notice that we have specified the functions as <quote>strict</quote>, + meaning that + the system should automatically assume a null result if any input + value is null. By doing this, we avoid having to check for null inputs + in the function code. Without this, we'd have to check for null values + explicitly, using <function>PG_ARGISNULL()</function>. + </para> + + <para> + The macro <function>PG_ARGISNULL(<replaceable>n</replaceable>)</function> + allows a function to test whether each input is null. (Of course, doing + this is only necessary in functions not declared <quote>strict</quote>.) + As with the + <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros, + the input arguments are counted beginning at zero. Note that one + should refrain from executing + <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until + one has verified that the argument isn't null. + To return a null result, execute <function>PG_RETURN_NULL()</function>; + this works in both strict and nonstrict functions. + </para> + + <para> + At first glance, the version-1 coding conventions might appear + to be just pointless obscurantism, compared to using + plain <literal>C</literal> calling conventions. They do however allow + us to deal with <literal>NULL</literal>able arguments/return values, + and <quote>toasted</quote> (compressed or out-of-line) values. + </para> + + <para> + Other options provided by the version-1 interface are two + variants of the + <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> + macros. The first of these, + <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>, + guarantees to return a copy of the specified argument that is + safe for writing into. (The normal macros will sometimes return a + pointer to a value that is physically stored in a table, which + must not be written to. Using the + <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function> + macros guarantees a writable result.) + The second variant consists of the + <function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function> + macros which take three arguments. The first is the number of the + function argument (as above). The second and third are the offset and + length of the segment to be returned. Offsets are counted from + zero, and a negative length requests that the remainder of the + value be returned. These macros provide more efficient access to + parts of large values in the case where they have storage type + <quote>external</quote>. (The storage type of a column can be specified using + <literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER + COLUMN <replaceable>colname</replaceable> SET STORAGE + <replaceable>storagetype</replaceable></literal>. <replaceable>storagetype</replaceable> is one of + <literal>plain</literal>, <literal>external</literal>, <literal>extended</literal>, + or <literal>main</literal>.) + </para> + + <para> + Finally, the version-1 function call conventions make it possible + to return set results (<xref linkend="xfunc-c-return-set"/>) and + implement trigger functions (<xref linkend="triggers"/>) and + procedural-language call handlers (<xref + linkend="plhandler"/>). For more details + see <filename>src/backend/utils/fmgr/README</filename> in the + source distribution. + </para> + </sect2> + + <sect2> + <title>Writing Code</title> + + <para> + Before we turn to the more advanced topics, we should discuss + some coding rules for <productname>PostgreSQL</productname> + C-language functions. While it might be possible to load functions + written in languages other than C into + <productname>PostgreSQL</productname>, this is usually difficult + (when it is possible at all) because other languages, such as + C++, FORTRAN, or Pascal often do not follow the same calling + convention as C. That is, other languages do not pass argument + and return values between functions in the same way. For this + reason, we will assume that your C-language functions are + actually written in C. + </para> + + <para> + The basic rules for writing and building C functions are as follows: + + <itemizedlist> + <listitem> + <para> + Use <literal>pg_config + --includedir-server</literal><indexterm><primary>pg_config</primary><secondary>with user-defined C functions</secondary></indexterm> + to find out where the <productname>PostgreSQL</productname> server header + files are installed on your system (or the system that your + users will be running on). + </para> + </listitem> + + <listitem> + <para> + Compiling and linking your code so that it can be dynamically + loaded into <productname>PostgreSQL</productname> always + requires special flags. See <xref linkend="dfunc"/> for a + detailed explanation of how to do it for your particular + operating system. + </para> + </listitem> + + <listitem> + <para> + Remember to define a <quote>magic block</quote> for your shared library, + as described in <xref linkend="xfunc-c-dynload"/>. + </para> + </listitem> + + <listitem> + <para> + When allocating memory, use the + <productname>PostgreSQL</productname> functions + <function>palloc</function><indexterm><primary>palloc</primary></indexterm> and <function>pfree</function><indexterm><primary>pfree</primary></indexterm> + instead of the corresponding C library functions + <function>malloc</function> and <function>free</function>. + The memory allocated by <function>palloc</function> will be + freed automatically at the end of each transaction, preventing + memory leaks. + </para> + </listitem> + + <listitem> + <para> + Always zero the bytes of your structures using <function>memset</function> + (or allocate them with <function>palloc0</function> in the first place). + Even if you assign to each field of your structure, there might be + alignment padding (holes in the structure) that contain + garbage values. Without this, it's difficult to + support hash indexes or hash joins, as you must pick out only + the significant bits of your data structure to compute a hash. + The planner also sometimes relies on comparing constants via + bitwise equality, so you can get undesirable planning results if + logically-equivalent values aren't bitwise equal. + </para> + </listitem> + + <listitem> + <para> + Most of the internal <productname>PostgreSQL</productname> + types are declared in <filename>postgres.h</filename>, while + the function manager interfaces + (<symbol>PG_FUNCTION_ARGS</symbol>, etc.) are in + <filename>fmgr.h</filename>, so you will need to include at + least these two files. For portability reasons it's best to + include <filename>postgres.h</filename> <emphasis>first</emphasis>, + before any other system or user header files. Including + <filename>postgres.h</filename> will also include + <filename>elog.h</filename> and <filename>palloc.h</filename> + for you. + </para> + </listitem> + + <listitem> + <para> + Symbol names defined within object files must not conflict + with each other or with symbols defined in the + <productname>PostgreSQL</productname> server executable. You + will have to rename your functions or variables if you get + error messages to this effect. + </para> + </listitem> + </itemizedlist> + </para> + </sect2> + +&dfunc; + + <sect2> + <title>Composite-Type Arguments</title> + + <para> + Composite types do not have a fixed layout like C structures. + Instances of a composite type can contain null fields. In + addition, composite types that are part of an inheritance + hierarchy can have different fields than other members of the + same inheritance hierarchy. Therefore, + <productname>PostgreSQL</productname> provides a function + interface for accessing fields of composite types from C. + </para> + + <para> + Suppose we want to write a function to answer the query: + +<programlisting> +SELECT name, c_overpaid(emp, 1500) AS overpaid + FROM emp + WHERE name = 'Bill' OR name = 'Sam'; +</programlisting> + + Using the version-1 calling conventions, we can define + <function>c_overpaid</function> as: + +<programlisting><![CDATA[ +#include "postgres.h" +#include "executor/executor.h" /* for GetAttributeByName() */ + +PG_MODULE_MAGIC; + +PG_FUNCTION_INFO_V1(c_overpaid); + +Datum +c_overpaid(PG_FUNCTION_ARGS) +{ + HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0); + int32 limit = PG_GETARG_INT32(1); + bool isnull; + Datum salary; + + salary = GetAttributeByName(t, "salary", &isnull); + if (isnull) + PG_RETURN_BOOL(false); + /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */ + + PG_RETURN_BOOL(DatumGetInt32(salary) > limit); +} +]]> +</programlisting> + </para> + + <para> + <function>GetAttributeByName</function> is the + <productname>PostgreSQL</productname> system function that + returns attributes out of the specified row. It has + three arguments: the argument of type <type>HeapTupleHeader</type> passed + into + the function, the name of the desired attribute, and a + return parameter that tells whether the attribute + is null. <function>GetAttributeByName</function> returns a <type>Datum</type> + value that you can convert to the proper data type by using the + appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function> + macro. Note that the return value is meaningless if the null flag is + set; always check the null flag before trying to do anything with the + result. + </para> + + <para> + There is also <function>GetAttributeByNum</function>, which selects + the target attribute by column number instead of name. + </para> + + <para> + The following command declares the function + <function>c_overpaid</function> in SQL: + +<programlisting> +CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'c_overpaid' + LANGUAGE C STRICT; +</programlisting> + + Notice we have used <literal>STRICT</literal> so that we did not have to + check whether the input arguments were NULL. + </para> + </sect2> + + <sect2> + <title>Returning Rows (Composite Types)</title> + + <para> + To return a row or composite-type value from a C-language + function, you can use a special API that provides macros and + functions to hide most of the complexity of building composite + data types. To use this API, the source file must include: +<programlisting> +#include "funcapi.h" +</programlisting> + </para> + + <para> + There are two ways you can build a composite data value (henceforth + a <quote>tuple</quote>): you can build it from an array of Datum values, + or from an array of C strings that can be passed to the input + conversion functions of the tuple's column data types. In either + case, you first need to obtain or construct a <structname>TupleDesc</structname> + descriptor for the tuple structure. When working with Datums, you + pass the <structname>TupleDesc</structname> to <function>BlessTupleDesc</function>, + and then call <function>heap_form_tuple</function> for each row. When working + with C strings, you pass the <structname>TupleDesc</structname> to + <function>TupleDescGetAttInMetadata</function>, and then call + <function>BuildTupleFromCStrings</function> for each row. In the case of a + function returning a set of tuples, the setup steps can all be done + once during the first call of the function. + </para> + + <para> + Several helper functions are available for setting up the needed + <structname>TupleDesc</structname>. The recommended way to do this in most + functions returning composite values is to call: +<programlisting> +TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo, + Oid *resultTypeId, + TupleDesc *resultTupleDesc) +</programlisting> + passing the same <literal>fcinfo</literal> struct passed to the calling function + itself. (This of course requires that you use the version-1 + calling conventions.) <varname>resultTypeId</varname> can be specified + as <literal>NULL</literal> or as the address of a local variable to receive the + function's result type OID. <varname>resultTupleDesc</varname> should be the + address of a local <structname>TupleDesc</structname> variable. Check that the + result is <literal>TYPEFUNC_COMPOSITE</literal>; if so, + <varname>resultTupleDesc</varname> has been filled with the needed + <structname>TupleDesc</structname>. (If it is not, you can report an error along + the lines of <quote>function returning record called in context that + cannot accept type record</quote>.) + </para> + + <tip> + <para> + <function>get_call_result_type</function> can resolve the actual type of a + polymorphic function result; so it is useful in functions that return + scalar polymorphic results, not only functions that return composites. + The <varname>resultTypeId</varname> output is primarily useful for functions + returning polymorphic scalars. + </para> + </tip> + + <note> + <para> + <function>get_call_result_type</function> has a sibling + <function>get_expr_result_type</function>, which can be used to resolve the + expected output type for a function call represented by an expression + tree. This can be used when trying to determine the result type from + outside the function itself. There is also + <function>get_func_result_type</function>, which can be used when only the + function's OID is available. However these functions are not able + to deal with functions declared to return <structname>record</structname>, and + <function>get_func_result_type</function> cannot resolve polymorphic types, + so you should preferentially use <function>get_call_result_type</function>. + </para> + </note> + + <para> + Older, now-deprecated functions for obtaining + <structname>TupleDesc</structname>s are: +<programlisting> +TupleDesc RelationNameGetTupleDesc(const char *relname) +</programlisting> + to get a <structname>TupleDesc</structname> for the row type of a named relation, + and: +<programlisting> +TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases) +</programlisting> + to get a <structname>TupleDesc</structname> based on a type OID. This can + be used to get a <structname>TupleDesc</structname> for a base or + composite type. It will not work for a function that returns + <structname>record</structname>, however, and it cannot resolve polymorphic + types. + </para> + + <para> + Once you have a <structname>TupleDesc</structname>, call: +<programlisting> +TupleDesc BlessTupleDesc(TupleDesc tupdesc) +</programlisting> + if you plan to work with Datums, or: +<programlisting> +AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc) +</programlisting> + if you plan to work with C strings. If you are writing a function + returning set, you can save the results of these functions in the + <structname>FuncCallContext</structname> structure — use the + <structfield>tuple_desc</structfield> or <structfield>attinmeta</structfield> field + respectively. + </para> + + <para> + When working with Datums, use: +<programlisting> +HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull) +</programlisting> + to build a <structname>HeapTuple</structname> given user data in Datum form. + </para> + + <para> + When working with C strings, use: +<programlisting> +HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values) +</programlisting> + to build a <structname>HeapTuple</structname> given user data + in C string form. <parameter>values</parameter> is an array of C strings, + one for each attribute of the return row. Each C string should be in + the form expected by the input function of the attribute data + type. In order to return a null value for one of the attributes, + the corresponding pointer in the <parameter>values</parameter> array + should be set to <symbol>NULL</symbol>. This function will need to + be called again for each row you return. + </para> + + <para> + Once you have built a tuple to return from your function, it + must be converted into a <type>Datum</type>. Use: +<programlisting> +HeapTupleGetDatum(HeapTuple tuple) +</programlisting> + to convert a <structname>HeapTuple</structname> into a valid Datum. This + <type>Datum</type> can be returned directly if you intend to return + just a single row, or it can be used as the current return value + in a set-returning function. + </para> + + <para> + An example appears in the next section. + </para> + + </sect2> + + <sect2 id="xfunc-c-return-set"> + <title>Returning Sets</title> + + <para> + C-language functions have two options for returning sets (multiple + rows). In one method, called <firstterm>ValuePerCall</firstterm> + mode, a set-returning function is called repeatedly (passing the same + arguments each time) and it returns one new row on each call, until + it has no more rows to return and signals that by returning NULL. + The set-returning function (<acronym>SRF</acronym>) must therefore + save enough state across calls to remember what it was doing and + return the correct next item on each call. + In the other method, called <firstterm>Materialize</firstterm> mode, + an SRF fills and returns a tuplestore object containing its + entire result; then only one call occurs for the whole result, and + no inter-call state is needed. + </para> + + <para> + When using ValuePerCall mode, it is important to remember that the + query is not guaranteed to be run to completion; that is, due to + options such as <literal>LIMIT</literal>, the executor might stop + making calls to the set-returning function before all rows have been + fetched. This means it is not safe to perform cleanup activities in + the last call, because that might not ever happen. It's recommended + to use Materialize mode for functions that need access to external + resources, such as file descriptors. + </para> + + <para> + The remainder of this section documents a set of helper macros that + are commonly used (though not required to be used) for SRFs using + ValuePerCall mode. Additional details about Materialize mode can be + found in <filename>src/backend/utils/fmgr/README</filename>. Also, + the <filename>contrib</filename> modules in + the <productname>PostgreSQL</productname> source distribution contain + many examples of SRFs using both ValuePerCall and Materialize mode. + </para> + + <para> + To use the ValuePerCall support macros described here, + include <filename>funcapi.h</filename>. These macros work with a + structure <structname>FuncCallContext</structname> that contains the + state that needs to be saved across calls. Within the calling + SRF, <literal>fcinfo->flinfo->fn_extra</literal> is used to + hold a pointer to <structname>FuncCallContext</structname> across + calls. The macros automatically fill that field on first use, + and expect to find the same pointer there on subsequent uses. +<programlisting> +typedef struct FuncCallContext +{ + /* + * Number of times we've been called before + * + * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and + * incremented for you every time SRF_RETURN_NEXT() is called. + */ + uint64 call_cntr; + + /* + * OPTIONAL maximum number of calls + * + * max_calls is here for convenience only and setting it is optional. + * If not set, you must provide alternative means to know when the + * function is done. + */ + uint64 max_calls; + + /* + * OPTIONAL pointer to miscellaneous user-provided context information + * + * user_fctx is for use as a pointer to your own data to retain + * arbitrary context information between calls of your function. + */ + void *user_fctx; + + /* + * OPTIONAL pointer to struct containing attribute type input metadata + * + * attinmeta is for use when returning tuples (i.e., composite data types) + * and is not used when returning base data types. It is only needed + * if you intend to use BuildTupleFromCStrings() to create the return + * tuple. + */ + AttInMetadata *attinmeta; + + /* + * memory context used for structures that must live for multiple calls + * + * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used + * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory + * context for any memory that is to be reused across multiple calls + * of the SRF. + */ + MemoryContext multi_call_memory_ctx; + + /* + * OPTIONAL pointer to struct containing tuple description + * + * tuple_desc is for use when returning tuples (i.e., composite data types) + * and is only needed if you are going to build the tuples with + * heap_form_tuple() rather than with BuildTupleFromCStrings(). Note that + * the TupleDesc pointer stored here should usually have been run through + * BlessTupleDesc() first. + */ + TupleDesc tuple_desc; + +} FuncCallContext; +</programlisting> + </para> + + <para> + The macros to be used by an <acronym>SRF</acronym> using this + infrastructure are: +<programlisting> +SRF_IS_FIRSTCALL() +</programlisting> + Use this to determine if your function is being called for the first or a + subsequent time. On the first call (only), call: +<programlisting> +SRF_FIRSTCALL_INIT() +</programlisting> + to initialize the <structname>FuncCallContext</structname>. On every function call, + including the first, call: +<programlisting> +SRF_PERCALL_SETUP() +</programlisting> + to set up for using the <structname>FuncCallContext</structname>. + </para> + + <para> + If your function has data to return in the current call, use: +<programlisting> +SRF_RETURN_NEXT(funcctx, result) +</programlisting> + to return it to the caller. (<literal>result</literal> must be of type + <type>Datum</type>, either a single value or a tuple prepared as + described above.) Finally, when your function is finished + returning data, use: +<programlisting> +SRF_RETURN_DONE(funcctx) +</programlisting> + to clean up and end the <acronym>SRF</acronym>. + </para> + + <para> + The memory context that is current when the <acronym>SRF</acronym> is called is + a transient context that will be cleared between calls. This means + that you do not need to call <function>pfree</function> on everything + you allocated using <function>palloc</function>; it will go away anyway. However, if you want to allocate + any data structures to live across calls, you need to put them somewhere + else. The memory context referenced by + <structfield>multi_call_memory_ctx</structfield> is a suitable location for any + data that needs to survive until the <acronym>SRF</acronym> is finished running. In most + cases, this means that you should switch into + <structfield>multi_call_memory_ctx</structfield> while doing the + first-call setup. + Use <literal>funcctx->user_fctx</literal> to hold a pointer to + any such cross-call data structures. + (Data you allocate + in <structfield>multi_call_memory_ctx</structfield> will go away + automatically when the query ends, so it is not necessary to free + that data manually, either.) + </para> + + <warning> + <para> + While the actual arguments to the function remain unchanged between + calls, if you detoast the argument values (which is normally done + transparently by the + <function>PG_GETARG_<replaceable>xxx</replaceable></function> macro) + in the transient context then the detoasted copies will be freed on + each cycle. Accordingly, if you keep references to such values in + your <structfield>user_fctx</structfield>, you must either copy them into the + <structfield>multi_call_memory_ctx</structfield> after detoasting, or ensure + that you detoast the values only in that context. + </para> + </warning> + + <para> + A complete pseudo-code example looks like the following: +<programlisting> +Datum +my_set_returning_function(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + Datum result; + <replaceable>further declarations as needed</replaceable> + + if (SRF_IS_FIRSTCALL()) + { + MemoryContext oldcontext; + + funcctx = SRF_FIRSTCALL_INIT(); + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + /* One-time setup code appears here: */ + <replaceable>user code</replaceable> + <replaceable>if returning composite</replaceable> + <replaceable>build TupleDesc, and perhaps AttInMetadata</replaceable> + <replaceable>endif returning composite</replaceable> + <replaceable>user code</replaceable> + MemoryContextSwitchTo(oldcontext); + } + + /* Each-time setup code appears here: */ + <replaceable>user code</replaceable> + funcctx = SRF_PERCALL_SETUP(); + <replaceable>user code</replaceable> + + /* this is just one way we might test whether we are done: */ + if (funcctx->call_cntr < funcctx->max_calls) + { + /* Here we want to return another item: */ + <replaceable>user code</replaceable> + <replaceable>obtain result Datum</replaceable> + SRF_RETURN_NEXT(funcctx, result); + } + else + { + /* Here we are done returning items, so just report that fact. */ + /* (Resist the temptation to put cleanup code here.) */ + SRF_RETURN_DONE(funcctx); + } +} +</programlisting> + </para> + + <para> + A complete example of a simple <acronym>SRF</acronym> returning a composite type + looks like: +<programlisting><![CDATA[ +PG_FUNCTION_INFO_V1(retcomposite); + +Datum +retcomposite(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + int call_cntr; + int max_calls; + TupleDesc tupdesc; + AttInMetadata *attinmeta; + + /* stuff done only on the first call of the function */ + if (SRF_IS_FIRSTCALL()) + { + MemoryContext oldcontext; + + /* create a function context for cross-call persistence */ + funcctx = SRF_FIRSTCALL_INIT(); + + /* switch to memory context appropriate for multiple function calls */ + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); + + /* total number of tuples to be returned */ + funcctx->max_calls = PG_GETARG_INT32(0); + + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("function returning record called in context " + "that cannot accept type record"))); + + /* + * generate attribute metadata needed later to produce tuples from raw + * C strings + */ + attinmeta = TupleDescGetAttInMetadata(tupdesc); + funcctx->attinmeta = attinmeta; + + MemoryContextSwitchTo(oldcontext); + } + + /* stuff done on every call of the function */ + funcctx = SRF_PERCALL_SETUP(); + + call_cntr = funcctx->call_cntr; + max_calls = funcctx->max_calls; + attinmeta = funcctx->attinmeta; + + if (call_cntr < max_calls) /* do when there is more left to send */ + { + char **values; + HeapTuple tuple; + Datum result; + + /* + * Prepare a values array for building the returned tuple. + * This should be an array of C strings which will + * be processed later by the type input functions. + */ + values = (char **) palloc(3 * sizeof(char *)); + values[0] = (char *) palloc(16 * sizeof(char)); + values[1] = (char *) palloc(16 * sizeof(char)); + values[2] = (char *) palloc(16 * sizeof(char)); + + snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1)); + snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1)); + snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1)); + + /* build a tuple */ + tuple = BuildTupleFromCStrings(attinmeta, values); + + /* make the tuple into a datum */ + result = HeapTupleGetDatum(tuple); + + /* clean up (this is not really necessary) */ + pfree(values[0]); + pfree(values[1]); + pfree(values[2]); + pfree(values); + + SRF_RETURN_NEXT(funcctx, result); + } + else /* do when there is no more left */ + { + SRF_RETURN_DONE(funcctx); + } +} +]]> +</programlisting> + + One way to declare this function in SQL is: +<programlisting> +CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer); + +CREATE OR REPLACE FUNCTION retcomposite(integer, integer) + RETURNS SETOF __retcomposite + AS '<replaceable>filename</replaceable>', 'retcomposite' + LANGUAGE C IMMUTABLE STRICT; +</programlisting> + A different way is to use OUT parameters: +<programlisting> +CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer, + OUT f1 integer, OUT f2 integer, OUT f3 integer) + RETURNS SETOF record + AS '<replaceable>filename</replaceable>', 'retcomposite' + LANGUAGE C IMMUTABLE STRICT; +</programlisting> + Notice that in this method the output type of the function is formally + an anonymous <structname>record</structname> type. + </para> + </sect2> + + <sect2> + <title>Polymorphic Arguments and Return Types</title> + + <para> + C-language functions can be declared to accept and + return the polymorphic types described in <xref + linkend="extend-types-polymorphic"/>. + When a function's arguments or return types + are defined as polymorphic types, the function author cannot know + in advance what data type it will be called with, or + need to return. There are two routines provided in <filename>fmgr.h</filename> + to allow a version-1 C function to discover the actual data types + of its arguments and the type it is expected to return. The routines are + called <literal>get_fn_expr_rettype(FmgrInfo *flinfo)</literal> and + <literal>get_fn_expr_argtype(FmgrInfo *flinfo, int argnum)</literal>. + They return the result or argument type OID, or <symbol>InvalidOid</symbol> if the + information is not available. + The structure <literal>flinfo</literal> is normally accessed as + <literal>fcinfo->flinfo</literal>. The parameter <literal>argnum</literal> + is zero based. <function>get_call_result_type</function> can also be used + as an alternative to <function>get_fn_expr_rettype</function>. + There is also <function>get_fn_expr_variadic</function>, which can be used to + find out whether variadic arguments have been merged into an array. + This is primarily useful for <literal>VARIADIC "any"</literal> functions, + since such merging will always have occurred for variadic functions + taking ordinary array types. + </para> + + <para> + For example, suppose we want to write a function to accept a single + element of any type, and return a one-dimensional array of that type: + +<programlisting> +PG_FUNCTION_INFO_V1(make_array); +Datum +make_array(PG_FUNCTION_ARGS) +{ + ArrayType *result; + Oid element_type = get_fn_expr_argtype(fcinfo->flinfo, 0); + Datum element; + bool isnull; + int16 typlen; + bool typbyval; + char typalign; + int ndims; + int dims[MAXDIM]; + int lbs[MAXDIM]; + + if (!OidIsValid(element_type)) + elog(ERROR, "could not determine data type of input"); + + /* get the provided element, being careful in case it's NULL */ + isnull = PG_ARGISNULL(0); + if (isnull) + element = (Datum) 0; + else + element = PG_GETARG_DATUM(0); + + /* we have one dimension */ + ndims = 1; + /* and one element */ + dims[0] = 1; + /* and lower bound is 1 */ + lbs[0] = 1; + + /* get required info about the element type */ + get_typlenbyvalalign(element_type, &typlen, &typbyval, &typalign); + + /* now build the array */ + result = construct_md_array(&element, &isnull, ndims, dims, lbs, + element_type, typlen, typbyval, typalign); + + PG_RETURN_ARRAYTYPE_P(result); +} +</programlisting> + </para> + + <para> + The following command declares the function + <function>make_array</function> in SQL: + +<programlisting> +CREATE FUNCTION make_array(anyelement) RETURNS anyarray + AS '<replaceable>DIRECTORY</replaceable>/funcs', 'make_array' + LANGUAGE C IMMUTABLE; +</programlisting> + </para> + + <para> + There is a variant of polymorphism that is only available to C-language + functions: they can be declared to take parameters of type + <literal>"any"</literal>. (Note that this type name must be double-quoted, + since it's also an SQL reserved word.) This works like + <type>anyelement</type> except that it does not constrain different + <literal>"any"</literal> arguments to be the same type, nor do they help + determine the function's result type. A C-language function can also + declare its final parameter to be <literal>VARIADIC "any"</literal>. This will + match one or more actual arguments of any type (not necessarily the same + type). These arguments will <emphasis>not</emphasis> be gathered into an array + as happens with normal variadic functions; they will just be passed to + the function separately. The <function>PG_NARGS()</function> macro and the + methods described above must be used to determine the number of actual + arguments and their types when using this feature. Also, users of such + a function might wish to use the <literal>VARIADIC</literal> keyword in their + function call, with the expectation that the function would treat the + array elements as separate arguments. The function itself must implement + that behavior if wanted, after using <function>get_fn_expr_variadic</function> to + detect that the actual argument was marked with <literal>VARIADIC</literal>. + </para> + </sect2> + + <sect2 id="xfunc-shared-addin"> + <title>Shared Memory and LWLocks</title> + + <para> + Add-ins can reserve LWLocks and an allocation of shared memory on server + startup. The add-in's shared library must be preloaded by specifying + it in + <xref linkend="guc-shared-preload-libraries"/><indexterm><primary>shared_preload_libraries</primary></indexterm>. + The shared library should register a <literal>shmem_request_hook</literal> + in its <function>_PG_init</function> function. This + <literal>shmem_request_hook</literal> can reserve LWLocks or shared memory. + Shared memory is reserved by calling: +<programlisting> +void RequestAddinShmemSpace(int size) +</programlisting> + from your <literal>shmem_request_hook</literal>. + </para> + <para> + LWLocks are reserved by calling: +<programlisting> +void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks) +</programlisting> + from your <literal>shmem_request_hook</literal>. This will ensure that an array of + <literal>num_lwlocks</literal> LWLocks is available under the name + <literal>tranche_name</literal>. Use <function>GetNamedLWLockTranche</function> + to get a pointer to this array. + </para> + <para> + An example of a <literal>shmem_request_hook</literal> can be found in + <filename>contrib/pg_stat_statements/pg_stat_statements.c</filename> in the + <productname>PostgreSQL</productname> source tree. + </para> + <para> + To avoid possible race-conditions, each backend should use the LWLock + <function>AddinShmemInitLock</function> when connecting to and initializing + its allocation of shared memory, as shown here: +<programlisting> +static mystruct *ptr = NULL; + +if (!ptr) +{ + bool found; + + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + ptr = ShmemInitStruct("my struct name", size, &found); + if (!found) + { + initialize contents of shmem area; + acquire any requested LWLocks using: + ptr->locks = GetNamedLWLockTranche("my tranche name"); + } + LWLockRelease(AddinShmemInitLock); +} +</programlisting> + </para> + </sect2> + + <sect2 id="extend-cpp"> + <title>Using C++ for Extensibility</title> + + <indexterm zone="extend-cpp"> + <primary>C++</primary> + </indexterm> + + <para> + Although the <productname>PostgreSQL</productname> backend is written in + C, it is possible to write extensions in C++ if these guidelines are + followed: + + <itemizedlist> + <listitem> + <para> + All functions accessed by the backend must present a C interface + to the backend; these C functions can then call C++ functions. + For example, <literal>extern C</literal> linkage is required for + backend-accessed functions. This is also necessary for any + functions that are passed as pointers between the backend and + C++ code. + </para> + </listitem> + <listitem> + <para> + Free memory using the appropriate deallocation method. For example, + most backend memory is allocated using <function>palloc()</function>, so use + <function>pfree()</function> to free it. Using C++ + <function>delete</function> in such cases will fail. + </para> + </listitem> + <listitem> + <para> + Prevent exceptions from propagating into the C code (use a catch-all + block at the top level of all <literal>extern C</literal> functions). This + is necessary even if the C++ code does not explicitly throw any + exceptions, because events like out-of-memory can still throw + exceptions. Any exceptions must be caught and appropriate errors + passed back to the C interface. If possible, compile C++ with + <option>-fno-exceptions</option> to eliminate exceptions entirely; in such + cases, you must check for failures in your C++ code, e.g., check for + NULL returned by <function>new()</function>. + </para> + </listitem> + <listitem> + <para> + If calling backend functions from C++ code, be sure that the + C++ call stack contains only plain old data structures + (<acronym>POD</acronym>). This is necessary because backend errors + generate a distant <function>longjmp()</function> that does not properly + unroll a C++ call stack with non-POD objects. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + In summary, it is best to place C++ code behind a wall of + <literal>extern C</literal> functions that interface to the backend, + and avoid exception, memory, and call stack leakage. + </para> + </sect2> + + </sect1> + + <sect1 id="xfunc-optimization"> + <title>Function Optimization Information</title> + + <indexterm zone="xfunc-optimization"> + <primary>optimization information</primary> + <secondary>for functions</secondary> + </indexterm> + + <para> + By default, a function is just a <quote>black box</quote> that the + database system knows very little about the behavior of. However, + that means that queries using the function may be executed much less + efficiently than they could be. It is possible to supply additional + knowledge that helps the planner optimize function calls. + </para> + + <para> + Some basic facts can be supplied by declarative annotations provided in + the <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link> command. Most important of + these is the function's <link linkend="xfunc-volatility">volatility + category</link> (<literal>IMMUTABLE</literal>, <literal>STABLE</literal>, + or <literal>VOLATILE</literal>); one should always be careful to + specify this correctly when defining a function. + The parallel safety property (<literal>PARALLEL + UNSAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or + <literal>PARALLEL SAFE</literal>) must also be specified if you hope + to use the function in parallelized queries. + It can also be useful to specify the function's estimated execution + cost, and/or the number of rows a set-returning function is estimated + to return. However, the declarative way of specifying those two + facts only allows specifying a constant value, which is often + inadequate. + </para> + + <para> + It is also possible to attach a <firstterm>planner support + function</firstterm> to an SQL-callable function (called + its <firstterm>target function</firstterm>), and thereby provide + knowledge about the target function that is too complex to be + represented declaratively. Planner support functions have to be + written in C (although their target functions might not be), so this is + an advanced feature that relatively few people will use. + </para> + + <para> + A planner support function must have the SQL signature +<programlisting> +supportfn(internal) returns internal +</programlisting> + It is attached to its target function by specifying + the <literal>SUPPORT</literal> clause when creating the target function. + </para> + + <para> + The details of the API for planner support functions can be found in + file <filename>src/include/nodes/supportnodes.h</filename> in the + <productname>PostgreSQL</productname> source code. Here we provide + just an overview of what planner support functions can do. + The set of possible requests to a support function is extensible, + so more things might be possible in future versions. + </para> + + <para> + Some function calls can be simplified during planning based on + properties specific to the function. For example, + <literal>int4mul(n, 1)</literal> could be simplified to + just <literal>n</literal>. This type of transformation can be + performed by a planner support function, by having it implement + the <literal>SupportRequestSimplify</literal> request type. + The support function will be called for each instance of its target + function found in a query parse tree. If it finds that the particular + call can be simplified into some other form, it can build and return a + parse tree representing that expression. This will automatically work + for operators based on the function, too — in the example just + given, <literal>n * 1</literal> would also be simplified to + <literal>n</literal>. + (But note that this is just an example; this particular + optimization is not actually performed by + standard <productname>PostgreSQL</productname>.) + We make no guarantee that <productname>PostgreSQL</productname> will + never call the target function in cases that the support function could + simplify. Ensure rigorous equivalence between the simplified + expression and an actual execution of the target function. + </para> + + <para> + For target functions that return <type>boolean</type>, it is often useful to estimate + the fraction of rows that will be selected by a <literal>WHERE</literal> clause using that + function. This can be done by a support function that implements + the <literal>SupportRequestSelectivity</literal> request type. + </para> + + <para> + If the target function's run time is highly dependent on its inputs, + it may be useful to provide a non-constant cost estimate for it. + This can be done by a support function that implements + the <literal>SupportRequestCost</literal> request type. + </para> + + <para> + For target functions that return sets, it is often useful to provide + a non-constant estimate for the number of rows that will be returned. + This can be done by a support function that implements + the <literal>SupportRequestRows</literal> request type. + </para> + + <para> + For target functions that return <type>boolean</type>, it may be possible to + convert a function call appearing in <literal>WHERE</literal> into an indexable operator + clause or clauses. The converted clauses might be exactly equivalent + to the function's condition, or they could be somewhat weaker (that is, + they might accept some values that the function condition does not). + In the latter case the index condition is said to + be <firstterm>lossy</firstterm>; it can still be used to scan an index, + but the function call will have to be executed for each row returned by + the index to see if it really passes the <literal>WHERE</literal> condition or not. + To create such conditions, the support function must implement + the <literal>SupportRequestIndexCondition</literal> request type. + </para> + </sect1> |