38.10. C-Language Functions

38.10. C-Language Functions
Prev	Up	Chapter 38. Extending SQL	Home	Next

38.10.1. Dynamic Loading
38.10.2. Base Types in C-Language Functions
38.10.3. Version 1 Calling Conventions
38.10.4. Writing Code
38.10.5. Compiling and Linking Dynamically-Loaded Functions
38.10.6. Composite-Type Arguments
38.10.7. Returning Rows (Composite Types)
38.10.8. Returning Sets
38.10.9. Polymorphic Arguments and Return Types
38.10.10. Shared Memory and LWLocks
38.10.11. Using C++ for Extensibility

+ User-defined functions can be written in C (or a language that can + be made compatible with C, such as C++). Such functions are + compiled into dynamically loadable objects (also called shared + libraries) and are loaded by the server on demand. The dynamic + loading feature is what distinguishes “C language” functions + from “internal” functions — the actual coding conventions + are essentially the same for both. (Hence, the standard internal + function library is a rich source of coding examples for user-defined + C functions.) +

+ Currently only one calling convention is used for C functions + (“version 1”). Support for that calling convention is + indicated by writing a PG_FUNCTION_INFO_V1() macro + call for the function, as illustrated below. +

38.10.1. Dynamic Loading

+ The first time a user-defined function in a particular + loadable object file is called in a session, + the dynamic loader loads that object file into memory so that the + function can be called. The CREATE FUNCTION + for a user-defined C function must therefore specify two pieces of + information for the function: the name of the loadable + object file, and the C name (link symbol) of the specific function to call + within that object file. If the C name is not explicitly specified then + it is assumed to be the same as the SQL function name. +

+ The following algorithm is used to locate the shared object file + based on the name given in the CREATE FUNCTION + command: + +

+ If the name is an absolute path, the given file is loaded. +
+ If the name starts with the string $libdir, + that part is replaced by the PostgreSQL package + library directory + name, which is determined at build time. +
+ If the name does not contain a directory part, the file is + searched for in the path specified by the configuration variable + dynamic_library_path. +
+ Otherwise (the file was not found in the path, or it contains a + non-absolute directory part), the dynamic loader will try to + take the name as given, which will most likely fail. (It is + unreliable to depend on the current working directory.) +

+ + If this sequence does not work, the platform-specific shared + library file name extension (often .so) is + appended to the given name and this sequence is tried again. If + that fails as well, the load will fail. +

+ It is recommended to locate shared libraries either relative to + $libdir or through the dynamic library path. + This simplifies version upgrades if the new installation is at a + different location. The actual directory that + $libdir stands for can be found out with the + command pg_config --pkglibdir. +

+ The user ID the PostgreSQL server runs + as must be able to traverse the path to the file you intend to + load. Making the file or a higher-level directory not readable + and/or not executable by the postgres + user is a common mistake. +

+ In any case, the file name that is given in the + CREATE FUNCTION command is recorded literally + in the system catalogs, so if the file needs to be loaded again + the same procedure is applied. +

Note

+ PostgreSQL will not compile a C function + automatically. The object file must be compiled before it is referenced + in a CREATE + FUNCTION command. See Section 38.10.5 for additional + information. +

+ To ensure that a dynamically loaded object file is not loaded into an + incompatible server, PostgreSQL checks that the + file contains a “magic block” with the appropriate contents. + This allows the server to detect obvious incompatibilities, such as code + compiled for a different major version of + PostgreSQL. To include a magic block, + write this in one (and only one) of the module source files, after having + included the header fmgr.h: + +

+PG_MODULE_MAGIC;
+

+ After it is used for the first time, a dynamically loaded object + file is retained in memory. Future calls in the same session to + the function(s) in that file will only incur the small overhead of + a symbol table lookup. If you need to force a reload of an object + file, for example after recompiling it, begin a fresh session. +

+ Optionally, a dynamically loaded file can contain an initialization + function. If the file includes a function named + _PG_init, that function will be called immediately after + loading the file. The function receives no parameters and should + return void. There is presently no way to unload a dynamically loaded file. +

38.10.2. Base Types in C-Language Functions

+ To know how to write C-language functions, you need to know how + PostgreSQL internally represents base + data types and how they can be passed to and from functions. + Internally, PostgreSQL regards a base + type as a “blob of memory”. The user-defined + functions that you define over a type in turn define the way that + PostgreSQL can operate on it. That + is, PostgreSQL will only store and + retrieve the data from disk and use your user-defined functions + to input, process, and output the data. +

+ Base types can have one of three internal formats: + +

+ pass by value, fixed-length +
+ pass by reference, fixed-length +
+ pass by reference, variable-length +

+ By-value types can only be 1, 2, or 4 bytes in length + (also 8 bytes, if sizeof(Datum) is 8 on your machine). + You should be careful to define your types such that they will be the + same size (in bytes) on all architectures. For example, the + long type is dangerous because it is 4 bytes on some + machines and 8 bytes on others, whereas int type is 4 bytes + on most Unix machines. A reasonable implementation of the + int4 type on Unix machines might be: + +

+/* 4-byte integer, passed by value */
+typedef int int4;
+

+ + (The actual PostgreSQL C code calls this type int32, because + it is a convention in C that intXX + means XX bits. Note + therefore also that the C type int8 is 1 byte in size. The + SQL type int8 is called int64 in C. See also + Table 38.2.) +

+ On the other hand, fixed-length types of any size can + be passed by-reference. For example, here is a sample + implementation of a PostgreSQL type: + +

+/* 16-byte structure, passed by reference */
+typedef struct
+{
+    double  x, y;
+} Point;
+

+ + Only pointers to such types can be used when passing + them in and out of PostgreSQL functions. + To return a value of such a type, allocate the right amount of + memory with palloc, fill in the allocated memory, + and return a pointer to it. (Also, if you just want to return the + same value as one of your input arguments that's of the same data type, + you can skip the extra palloc and just return the + pointer to the input value.) +

+ Finally, all variable-length types must also be passed + by reference. All variable-length types must begin + with an opaque length field of exactly 4 bytes, which will be set + by SET_VARSIZE; never set this field directly! All data to + be stored within that type must be located in the memory + immediately following that length field. The + length field contains the total length of the structure, + that is, it includes the size of the length field + itself. +

+ Another important point is to avoid leaving any uninitialized bits + within data type values; for example, take care to zero out any + alignment padding bytes that might be present in structs. Without + this, logically-equivalent constants of your data type might be + seen as unequal by the planner, leading to inefficient (though not + incorrect) plans. +

Warning

+ Never modify the contents of a pass-by-reference input + value. If you do so you are likely to corrupt on-disk data, since + the pointer you are given might point directly into a disk buffer. + The sole exception to this rule is explained in + Section 38.12. +

+ As an example, we can define the type text as + follows: + +

+typedef struct {
+    int32 length;
+    char data[FLEXIBLE_ARRAY_MEMBER];
+} text;
+

+ + The [FLEXIBLE_ARRAY_MEMBER] notation means that the actual + length of the data part is not specified by this declaration. +

+ When manipulating + variable-length types, we must be careful to allocate + the correct amount of memory and set the length field correctly. + For example, if we wanted to store 40 bytes in a text + structure, we might use a code fragment like this: + +

+#include "postgres.h"
+...
+char buffer[40]; /* our source data */
+...
+text *destination = (text *) palloc(VARHDRSZ + 40);
+SET_VARSIZE(destination, VARHDRSZ + 40);
+memcpy(destination->data, buffer, 40);
+...
+
+

+ + VARHDRSZ is the same as sizeof(int32), but + it's considered good style to use the macro VARHDRSZ + to refer to the size of the overhead for a variable-length type. + Also, the length field must be set using the + SET_VARSIZE macro, not by simple assignment. +

+ Table 38.2 shows the C types + corresponding to many of the built-in SQL data types + of PostgreSQL. + The “Defined In” column gives the header file that + needs to be included to get the type definition. (The actual + definition might be in a different file that is included by the + listed file. It is recommended that users stick to the defined + interface.) Note that you should always include + postgres.h first in any source file of server + code, because it declares a number of things that you will need + anyway, and because including other headers first can cause + portability issues. +

Table 38.2. Equivalent C Types for Built-in SQL Types

+ SQL Type +	+ C Type +	+ Defined In +
`boolean`	`bool`	`postgres.h` (maybe compiler built-in)
`box`	`BOX*`	`utils/geo_decls.h`
`bytea`	`bytea*`	`postgres.h`
`"char"`	`char`	(compiler built-in)
`character`	`BpChar*`	`postgres.h`
`cid`	`CommandId`	`postgres.h`
`date`	`DateADT`	`utils/date.h`
`float4` (`real`)	`float4`	`postgres.h`
`float8` (`double precision`)	`float8`	`postgres.h`
`int2` (`smallint`)	`int16`	`postgres.h`
`int4` (`integer`)	`int32`	`postgres.h`
`int8` (`bigint`)	`int64`	`postgres.h`
`interval`	`Interval*`	`datatype/timestamp.h`
`lseg`	`LSEG*`	`utils/geo_decls.h`
`name`	`Name`	`postgres.h`
`numeric`	`Numeric`	`utils/numeric.h`
`oid`	`Oid`	`postgres.h`
`oidvector`	`oidvector*`	`postgres.h`
`path`	`PATH*`	`utils/geo_decls.h`
`point`	`POINT*`	`utils/geo_decls.h`
`regproc`	`RegProcedure`	`postgres.h`
`text`	`text*`	`postgres.h`
`tid`	`ItemPointer`	`storage/itemptr.h`
`time`	`TimeADT`	`utils/date.h`
`time with time zone`	`TimeTzADT`	`utils/date.h`
`timestamp`	`Timestamp`	`datatype/timestamp.h`
`timestamp with time zone`	`TimestampTz`	`datatype/timestamp.h`
`varchar`	`VarChar*`	`postgres.h`
`xid`	`TransactionId`	`postgres.h`

+ Now that we've gone over all of the possible structures + for base types, we can show some examples of real functions. +

38.10.3. Version 1 Calling Conventions

+ The version-1 calling convention relies on macros to suppress most + of the complexity of passing arguments and results. The C declaration + of a version-1 function is always: +

+Datum funcname(PG_FUNCTION_ARGS)
+

+ In addition, the macro call: +

+PG_FUNCTION_INFO_V1(funcname);
+

+ must appear in the same source file. (Conventionally, it's + written just before the function itself.) This macro call is not + needed for internal-language functions, since + PostgreSQL assumes that all internal functions + use the version-1 convention. It is, however, required for + dynamically-loaded functions. +

+ In a version-1 function, each actual argument is fetched using a + PG_GETARG_xxx() + macro that corresponds to the argument's data type. (In non-strict + functions there needs to be a previous check about argument null-ness + using PG_ARGISNULL(); see below.) + The result is returned using a + PG_RETURN_xxx() + macro for the return type. + PG_GETARG_xxx() + takes as its argument the number of the function argument to + fetch, where the count starts at 0. + PG_RETURN_xxx() + takes as its argument the actual value to return. +

+ Here are some examples using the version-1 calling convention: +

+#include "postgres.h"
+#include <string.h>
+#include "fmgr.h"
+#include "utils/geo_decls.h"
+
+PG_MODULE_MAGIC;
+
+/* by value */
+
+PG_FUNCTION_INFO_V1(add_one);
+
+Datum
+add_one(PG_FUNCTION_ARGS)
+{
+    int32   arg = PG_GETARG_INT32(0);
+
+    PG_RETURN_INT32(arg + 1);
+}
+
+/* by reference, fixed length */
+
+PG_FUNCTION_INFO_V1(add_one_float8);
+
+Datum
+add_one_float8(PG_FUNCTION_ARGS)
+{
+    /* The macros for FLOAT8 hide its pass-by-reference nature. */
+    float8   arg = PG_GETARG_FLOAT8(0);
+
+    PG_RETURN_FLOAT8(arg + 1.0);
+}
+
+PG_FUNCTION_INFO_V1(makepoint);
+
+Datum
+makepoint(PG_FUNCTION_ARGS)
+{
+    /* Here, the pass-by-reference nature of Point is not hidden. */
+    Point     *pointx = PG_GETARG_POINT_P(0);
+    Point     *pointy = PG_GETARG_POINT_P(1);
+    Point     *new_point = (Point *) palloc(sizeof(Point));
+
+    new_point->x = pointx->x;
+    new_point->y = pointy->y;
+
+    PG_RETURN_POINT_P(new_point);
+}
+
+/* by reference, variable length */
+
+PG_FUNCTION_INFO_V1(copytext);
+
+Datum
+copytext(PG_FUNCTION_ARGS)
+{
+    text     *t = PG_GETARG_TEXT_PP(0);
+
+    /*
+     * VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus the
+     * VARHDRSZ or VARHDRSZ_SHORT of its header.  Construct the copy with a
+     * full-length header.
+     */
+    text     *new_t = (text *) palloc(VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
+    SET_VARSIZE(new_t, VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
+
+    /*
+     * VARDATA is a pointer to the data region of the new struct.  The source
+     * could be a short datum, so retrieve its data through VARDATA_ANY.
+     */
+    memcpy((void *) VARDATA(new_t), /* destination */
+           (void *) VARDATA_ANY(t), /* source */
+           VARSIZE_ANY_EXHDR(t));   /* how many bytes */
+    PG_RETURN_TEXT_P(new_t);
+}
+
+PG_FUNCTION_INFO_V1(concat_text);
+
+Datum
+concat_text(PG_FUNCTION_ARGS)
+{
+    text  *arg1 = PG_GETARG_TEXT_PP(0);
+    text  *arg2 = PG_GETARG_TEXT_PP(1);
+    int32 arg1_size = VARSIZE_ANY_EXHDR(arg1);
+    int32 arg2_size = VARSIZE_ANY_EXHDR(arg2);
+    int32 new_text_size = arg1_size + arg2_size + VARHDRSZ;
+    text *new_text = (text *) palloc(new_text_size);
+
+    SET_VARSIZE(new_text, new_text_size);
+    memcpy(VARDATA(new_text), VARDATA_ANY(arg1), arg1_size);
+    memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size);
+    PG_RETURN_TEXT_P(new_text);
+}
+
+

+ Supposing that the above code has been prepared in file + funcs.c and compiled into a shared object, + we could define the functions to PostgreSQL + with commands like this: +

+CREATE FUNCTION add_one(integer) RETURNS integer
+     AS 'DIRECTORY/funcs', 'add_one'
+     LANGUAGE C STRICT;
+
+-- note overloading of SQL function name "add_one"
+CREATE FUNCTION add_one(double precision) RETURNS double precision
+     AS 'DIRECTORY/funcs', 'add_one_float8'
+     LANGUAGE C STRICT;
+
+CREATE FUNCTION makepoint(point, point) RETURNS point
+     AS 'DIRECTORY/funcs', 'makepoint'
+     LANGUAGE C STRICT;
+
+CREATE FUNCTION copytext(text) RETURNS text
+     AS 'DIRECTORY/funcs', 'copytext'
+     LANGUAGE C STRICT;
+
+CREATE FUNCTION concat_text(text, text) RETURNS text
+     AS 'DIRECTORY/funcs', 'concat_text'
+     LANGUAGE C STRICT;
+

+ Here, DIRECTORY stands for the + directory of the shared library file (for instance the + PostgreSQL tutorial directory, which + contains the code for the examples used in this section). + (Better style would be to use just 'funcs' in the + AS clause, after having added + DIRECTORY to the search path. In any + case, we can omit the system-specific extension for a shared + library, commonly .so.) +

+ Notice that we have specified the functions as “strict”, + meaning that + the system should automatically assume a null result if any input + value is null. By doing this, we avoid having to check for null inputs + in the function code. Without this, we'd have to check for null values + explicitly, using PG_ARGISNULL(). +

+ The macro PG_ARGISNULL(n) + allows a function to test whether each input is null. (Of course, doing + this is only necessary in functions not declared “strict”.) + As with the + PG_GETARG_xxx() macros, + the input arguments are counted beginning at zero. Note that one + should refrain from executing + PG_GETARG_xxx() until + one has verified that the argument isn't null. + To return a null result, execute PG_RETURN_NULL(); + this works in both strict and nonstrict functions. +

+ At first glance, the version-1 coding conventions might appear + to be just pointless obscurantism, compared to using + plain C calling conventions. They do however allow + us to deal with NULLable arguments/return values, + and “toasted” (compressed or out-of-line) values. +

+ Other options provided by the version-1 interface are two + variants of the + PG_GETARG_xxx() + macros. The first of these, + PG_GETARG_xxx_COPY(), + guarantees to return a copy of the specified argument that is + safe for writing into. (The normal macros will sometimes return a + pointer to a value that is physically stored in a table, which + must not be written to. Using the + PG_GETARG_xxx_COPY() + macros guarantees a writable result.) + The second variant consists of the + PG_GETARG_xxx_SLICE() + macros which take three arguments. The first is the number of the + function argument (as above). The second and third are the offset and + length of the segment to be returned. Offsets are counted from + zero, and a negative length requests that the remainder of the + value be returned. These macros provide more efficient access to + parts of large values in the case where they have storage type + “external”. (The storage type of a column can be specified using + ALTER TABLE tablename ALTER + COLUMN colname SET STORAGE + storagetype. storagetype is one of + plain, external, extended, + or main.) +

+ Finally, the version-1 function call conventions make it possible + to return set results (Section 38.10.8) and + implement trigger functions (Chapter 39) and + procedural-language call handlers (Chapter 58). For more details + see src/backend/utils/fmgr/README in the + source distribution. +

38.10.4. Writing Code

+ Before we turn to the more advanced topics, we should discuss + some coding rules for PostgreSQL + C-language functions. While it might be possible to load functions + written in languages other than C into + PostgreSQL, this is usually difficult + (when it is possible at all) because other languages, such as + C++, FORTRAN, or Pascal often do not follow the same calling + convention as C. That is, other languages do not pass argument + and return values between functions in the same way. For this + reason, we will assume that your C-language functions are + actually written in C. +

+ The basic rules for writing and building C functions are as follows: + +

+ Use pg_config + --includedir-server + to find out where the PostgreSQL server header + files are installed on your system (or the system that your + users will be running on). +
+ Compiling and linking your code so that it can be dynamically + loaded into PostgreSQL always + requires special flags. See Section 38.10.5 for a + detailed explanation of how to do it for your particular + operating system. +
+ Remember to define a “magic block” for your shared library, + as described in Section 38.10.1. +
+ When allocating memory, use the + PostgreSQL functions + palloc and pfree + instead of the corresponding C library functions + malloc and free. + The memory allocated by palloc will be + freed automatically at the end of each transaction, preventing + memory leaks. +
+ Always zero the bytes of your structures using memset + (or allocate them with palloc0 in the first place). + Even if you assign to each field of your structure, there might be + alignment padding (holes in the structure) that contain + garbage values. Without this, it's difficult to + support hash indexes or hash joins, as you must pick out only + the significant bits of your data structure to compute a hash. + The planner also sometimes relies on comparing constants via + bitwise equality, so you can get undesirable planning results if + logically-equivalent values aren't bitwise equal. +
+ Most of the internal PostgreSQL + types are declared in postgres.h, while + the function manager interfaces + (PG_FUNCTION_ARGS, etc.) are in + fmgr.h, so you will need to include at + least these two files. For portability reasons it's best to + include postgres.h first, + before any other system or user header files. Including + postgres.h will also include + elog.h and palloc.h + for you. +
+ Symbol names defined within object files must not conflict + with each other or with symbols defined in the + PostgreSQL server executable. You + will have to rename your functions or variables if you get + error messages to this effect. +

38.10.5. Compiling and Linking Dynamically-Loaded Functions

+ Before you are able to use your + PostgreSQL extension functions written in + C, they must be compiled and linked in a special way to produce a + file that can be dynamically loaded by the server. To be precise, a + shared library needs to be + created. + +

+ For information beyond what is contained in this section + you should read the documentation of your + operating system, in particular the manual pages for the C compiler, + cc, and the link editor, ld. + In addition, the PostgreSQL source code + contains several working examples in the + contrib directory. If you rely on these + examples you will make your modules dependent on the availability + of the PostgreSQL source code, however. +

+ Creating shared libraries is generally analogous to linking + executables: first the source files are compiled into object files, + then the object files are linked together. The object files need to + be created as position-independent code + (PIC), which + conceptually means that they can be placed at an arbitrary location + in memory when they are loaded by the executable. (Object files + intended for executables are usually not compiled that way.) The + command to link a shared library contains special flags to + distinguish it from linking an executable (at least in theory + — on some systems the practice is much uglier). +

+ In the following examples we assume that your source code is in a + file foo.c and we will create a shared library + foo.so. The intermediate object file will be + called foo.o unless otherwise noted. A shared + library can contain more than one object file, but we only use one + here. +

+ FreeBSD + +

+ The compiler flag to create PIC is + -fPIC. To create shared libraries the compiler + flag is -shared. +

+gcc -fPIC -c foo.c
+gcc -shared -o foo.so foo.o
+

+ This is applicable as of version 3.0 of + FreeBSD. +

+ HP-UX + +

+ The compiler flag of the system compiler to create + PIC is +z. When using + GCC it's -fPIC. The + linker flag for shared libraries is -b. So: +

+cc +z -c foo.c
+

+ or: +

+gcc -fPIC -c foo.c
+

+ and then: +

+ld -b -o foo.sl foo.o
+

+ HP-UX uses the extension + .sl for shared libraries, unlike most other + systems. +

+ Linux + +

+ The compiler flag to create PIC is + -fPIC. + The compiler flag to create a shared library is + -shared. A complete example looks like this: +

+cc -fPIC -c foo.c
+cc -shared -o foo.so foo.o
+

+ macOS + +

+ Here is an example. It assumes the developer tools are installed. +

+cc -c foo.c
+cc -bundle -flat_namespace -undefined suppress -o foo.so foo.o
+

+ NetBSD + +

+ The compiler flag to create PIC is + -fPIC. For ELF systems, the + compiler with the flag -shared is used to link + shared libraries. On the older non-ELF systems, ld + -Bshareable is used. +

+gcc -fPIC -c foo.c
+gcc -shared -o foo.so foo.o
+

+ OpenBSD + +

+ The compiler flag to create PIC is + -fPIC. ld -Bshareable is + used to link shared libraries. +

+gcc -fPIC -c foo.c
+ld -Bshareable -o foo.so foo.o
+

+ Solaris + +

+ The compiler flag to create PIC is + -KPIC with the Sun compiler and + -fPIC with GCC. To + link shared libraries, the compiler option is + -G with either compiler or alternatively + -shared with GCC. +

+cc -KPIC -c foo.c
+cc -G -o foo.so foo.o
+

+ or +

+gcc -fPIC -c foo.c
+gcc -G -o foo.so foo.o
+

Tip

+ If this is too complicated for you, you should consider using + + GNU Libtool, + which hides the platform differences behind a uniform interface. +

+ The resulting shared library file can then be loaded into + PostgreSQL. When specifying the file name + to the CREATE FUNCTION command, one must give it + the name of the shared library file, not the intermediate object file. + Note that the system's standard shared-library extension (usually + .so or .sl) can be omitted from + the CREATE FUNCTION command, and normally should + be omitted for best portability. +

+ Refer back to Section 38.10.1 about where the + server expects to find the shared library files. +

38.10.6. Composite-Type Arguments

+ Composite types do not have a fixed layout like C structures. + Instances of a composite type can contain null fields. In + addition, composite types that are part of an inheritance + hierarchy can have different fields than other members of the + same inheritance hierarchy. Therefore, + PostgreSQL provides a function + interface for accessing fields of composite types from C. +

+ Suppose we want to write a function to answer the query: + +

+SELECT name, c_overpaid(emp, 1500) AS overpaid
+    FROM emp
+    WHERE name = 'Bill' OR name = 'Sam';
+

+ + Using the version-1 calling conventions, we can define + c_overpaid as: + +

+#include "postgres.h"
+#include "executor/executor.h"  /* for GetAttributeByName() */
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(c_overpaid);
+
+Datum
+c_overpaid(PG_FUNCTION_ARGS)
+{
+    HeapTupleHeader  t = PG_GETARG_HEAPTUPLEHEADER(0);
+    int32            limit = PG_GETARG_INT32(1);
+    bool isnull;
+    Datum salary;
+
+    salary = GetAttributeByName(t, "salary", &isnull);
+    if (isnull)
+        PG_RETURN_BOOL(false);
+    /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
+
+    PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
+}
+
+

+ GetAttributeByName is the + PostgreSQL system function that + returns attributes out of the specified row. It has + three arguments: the argument of type HeapTupleHeader passed + into + the function, the name of the desired attribute, and a + return parameter that tells whether the attribute + is null. GetAttributeByName returns a Datum + value that you can convert to the proper data type by using the + appropriate DatumGetXXX() + macro. Note that the return value is meaningless if the null flag is + set; always check the null flag before trying to do anything with the + result. +

+ There is also GetAttributeByNum, which selects + the target attribute by column number instead of name. +

+ The following command declares the function + c_overpaid in SQL: + +

+CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
+    AS 'DIRECTORY/funcs', 'c_overpaid'
+    LANGUAGE C STRICT;
+

+ + Notice we have used STRICT so that we did not have to + check whether the input arguments were NULL. +

38.10.7. Returning Rows (Composite Types)

+ To return a row or composite-type value from a C-language + function, you can use a special API that provides macros and + functions to hide most of the complexity of building composite + data types. To use this API, the source file must include: +

+#include "funcapi.h"
+

+ There are two ways you can build a composite data value (henceforth + a “tuple”): you can build it from an array of Datum values, + or from an array of C strings that can be passed to the input + conversion functions of the tuple's column data types. In either + case, you first need to obtain or construct a TupleDesc + descriptor for the tuple structure. When working with Datums, you + pass the TupleDesc to BlessTupleDesc, + and then call heap_form_tuple for each row. When working + with C strings, you pass the TupleDesc to + TupleDescGetAttInMetadata, and then call + BuildTupleFromCStrings for each row. In the case of a + function returning a set of tuples, the setup steps can all be done + once during the first call of the function. +

+ Several helper functions are available for setting up the needed + TupleDesc. The recommended way to do this in most + functions returning composite values is to call: +

+TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
+                                   Oid *resultTypeId,
+                                   TupleDesc *resultTupleDesc)
+

+ passing the same fcinfo struct passed to the calling function + itself. (This of course requires that you use the version-1 + calling conventions.) resultTypeId can be specified + as NULL or as the address of a local variable to receive the + function's result type OID. resultTupleDesc should be the + address of a local TupleDesc variable. Check that the + result is TYPEFUNC_COMPOSITE; if so, + resultTupleDesc has been filled with the needed + TupleDesc. (If it is not, you can report an error along + the lines of “function returning record called in context that + cannot accept type record”.) +

Tip

+ get_call_result_type can resolve the actual type of a + polymorphic function result; so it is useful in functions that return + scalar polymorphic results, not only functions that return composites. + The resultTypeId output is primarily useful for functions + returning polymorphic scalars. +

Note

+ get_call_result_type has a sibling + get_expr_result_type, which can be used to resolve the + expected output type for a function call represented by an expression + tree. This can be used when trying to determine the result type from + outside the function itself. There is also + get_func_result_type, which can be used when only the + function's OID is available. However these functions are not able + to deal with functions declared to return record, and + get_func_result_type cannot resolve polymorphic types, + so you should preferentially use get_call_result_type. +

+ Older, now-deprecated functions for obtaining + TupleDescs are: +

+TupleDesc RelationNameGetTupleDesc(const char *relname)
+

+ to get a TupleDesc for the row type of a named relation, + and: +

+TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
+

+ to get a TupleDesc based on a type OID. This can + be used to get a TupleDesc for a base or + composite type. It will not work for a function that returns + record, however, and it cannot resolve polymorphic + types. +

+ Once you have a TupleDesc, call: +

+TupleDesc BlessTupleDesc(TupleDesc tupdesc)
+

+ if you plan to work with Datums, or: +

+AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
+

+ if you plan to work with C strings. If you are writing a function + returning set, you can save the results of these functions in the + FuncCallContext structure — use the + tuple_desc or attinmeta field + respectively. +

+ When working with Datums, use: +

+HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)
+

+ to build a HeapTuple given user data in Datum form. +

+ When working with C strings, use: +

+HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
+

+ to build a HeapTuple given user data + in C string form. values is an array of C strings, + one for each attribute of the return row. Each C string should be in + the form expected by the input function of the attribute data + type. In order to return a null value for one of the attributes, + the corresponding pointer in the values array + should be set to NULL. This function will need to + be called again for each row you return. +

+ Once you have built a tuple to return from your function, it + must be converted into a Datum. Use: +

+HeapTupleGetDatum(HeapTuple tuple)
+

+ to convert a HeapTuple into a valid Datum. This + Datum can be returned directly if you intend to return + just a single row, or it can be used as the current return value + in a set-returning function. +

+ An example appears in the next section. +

38.10.8. Returning Sets

+ C-language functions have two options for returning sets (multiple + rows). In one method, called ValuePerCall + mode, a set-returning function is called repeatedly (passing the same + arguments each time) and it returns one new row on each call, until + it has no more rows to return and signals that by returning NULL. + The set-returning function (SRF) must therefore + save enough state across calls to remember what it was doing and + return the correct next item on each call. + In the other method, called Materialize mode, + an SRF fills and returns a tuplestore object containing its + entire result; then only one call occurs for the whole result, and + no inter-call state is needed. +

+ When using ValuePerCall mode, it is important to remember that the + query is not guaranteed to be run to completion; that is, due to + options such as LIMIT, the executor might stop + making calls to the set-returning function before all rows have been + fetched. This means it is not safe to perform cleanup activities in + the last call, because that might not ever happen. It's recommended + to use Materialize mode for functions that need access to external + resources, such as file descriptors. +

+ The remainder of this section documents a set of helper macros that + are commonly used (though not required to be used) for SRFs using + ValuePerCall mode. Additional details about Materialize mode can be + found in src/backend/utils/fmgr/README. Also, + the contrib modules in + the PostgreSQL source distribution contain + many examples of SRFs using both ValuePerCall and Materialize mode. +

+ To use the ValuePerCall support macros described here, + include funcapi.h. These macros work with a + structure FuncCallContext that contains the + state that needs to be saved across calls. Within the calling + SRF, fcinfo->flinfo->fn_extra is used to + hold a pointer to FuncCallContext across + calls. The macros automatically fill that field on first use, + and expect to find the same pointer there on subsequent uses. +

+typedef struct FuncCallContext
+{
+    /*
+     * Number of times we've been called before
+     *
+     * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
+     * incremented for you every time SRF_RETURN_NEXT() is called.
+     */
+    uint64 call_cntr;
+
+    /*
+     * OPTIONAL maximum number of calls
+     *
+     * max_calls is here for convenience only and setting it is optional.
+     * If not set, you must provide alternative means to know when the
+     * function is done.
+     */
+    uint64 max_calls;
+
+    /*
+     * OPTIONAL pointer to miscellaneous user-provided context information
+     *
+     * user_fctx is for use as a pointer to your own data to retain
+     * arbitrary context information between calls of your function.
+     */
+    void *user_fctx;
+
+    /*
+     * OPTIONAL pointer to struct containing attribute type input metadata
+     *
+     * attinmeta is for use when returning tuples (i.e., composite data types)
+     * and is not used when returning base data types. It is only needed
+     * if you intend to use BuildTupleFromCStrings() to create the return
+     * tuple.
+     */
+    AttInMetadata *attinmeta;
+
+    /*
+     * memory context used for structures that must live for multiple calls
+     *
+     * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
+     * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
+     * context for any memory that is to be reused across multiple calls
+     * of the SRF.
+     */
+    MemoryContext multi_call_memory_ctx;
+
+    /*
+     * OPTIONAL pointer to struct containing tuple description
+     *
+     * tuple_desc is for use when returning tuples (i.e., composite data types)
+     * and is only needed if you are going to build the tuples with
+     * heap_form_tuple() rather than with BuildTupleFromCStrings().  Note that
+     * the TupleDesc pointer stored here should usually have been run through
+     * BlessTupleDesc() first.
+     */
+    TupleDesc tuple_desc;
+
+} FuncCallContext;
+

+ The macros to be used by an SRF using this + infrastructure are: +

+SRF_IS_FIRSTCALL()
+

+ Use this to determine if your function is being called for the first or a + subsequent time. On the first call (only), call: +

+SRF_FIRSTCALL_INIT()
+

+ to initialize the FuncCallContext. On every function call, + including the first, call: +

+SRF_PERCALL_SETUP()
+

+ to set up for using the FuncCallContext. +

+ If your function has data to return in the current call, use: +

+SRF_RETURN_NEXT(funcctx, result)
+

+ to return it to the caller. (result must be of type + Datum, either a single value or a tuple prepared as + described above.) Finally, when your function is finished + returning data, use: +

+SRF_RETURN_DONE(funcctx)
+

+ to clean up and end the SRF. +

+ The memory context that is current when the SRF is called is + a transient context that will be cleared between calls. This means + that you do not need to call pfree on everything + you allocated using palloc; it will go away anyway. However, if you want to allocate + any data structures to live across calls, you need to put them somewhere + else. The memory context referenced by + multi_call_memory_ctx is a suitable location for any + data that needs to survive until the SRF is finished running. In most + cases, this means that you should switch into + multi_call_memory_ctx while doing the + first-call setup. + Use funcctx->user_fctx to hold a pointer to + any such cross-call data structures. + (Data you allocate + in multi_call_memory_ctx will go away + automatically when the query ends, so it is not necessary to free + that data manually, either.) +

Warning

+ While the actual arguments to the function remain unchanged between + calls, if you detoast the argument values (which is normally done + transparently by the + PG_GETARG_xxx macro) + in the transient context then the detoasted copies will be freed on + each cycle. Accordingly, if you keep references to such values in + your user_fctx, you must either copy them into the + multi_call_memory_ctx after detoasting, or ensure + that you detoast the values only in that context. +

+ A complete pseudo-code example looks like the following: +

+Datum
+my_set_returning_function(PG_FUNCTION_ARGS)
+{
+    FuncCallContext  *funcctx;
+    Datum             result;
+    further declarations as needed
+
+    if (SRF_IS_FIRSTCALL())
+    {
+        MemoryContext oldcontext;
+
+        funcctx = SRF_FIRSTCALL_INIT();
+        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+        /* One-time setup code appears here: */
+        user code
+        if returning composite
+            build TupleDesc, and perhaps AttInMetadata
+        endif returning composite
+        user code
+        MemoryContextSwitchTo(oldcontext);
+    }
+
+    /* Each-time setup code appears here: */
+    user code
+    funcctx = SRF_PERCALL_SETUP();
+    user code
+
+    /* this is just one way we might test whether we are done: */
+    if (funcctx->call_cntr < funcctx->max_calls)
+    {
+        /* Here we want to return another item: */
+        user code
+        obtain result Datum
+        SRF_RETURN_NEXT(funcctx, result);
+    }
+    else
+    {
+        /* Here we are done returning items, so just report that fact. */
+        /* (Resist the temptation to put cleanup code here.) */
+        SRF_RETURN_DONE(funcctx);
+    }
+}
+

+ A complete example of a simple SRF returning a composite type + looks like: +

+PG_FUNCTION_INFO_V1(retcomposite);
+
+Datum
+retcomposite(PG_FUNCTION_ARGS)
+{
+    FuncCallContext     *funcctx;
+    int                  call_cntr;
+    int                  max_calls;
+    TupleDesc            tupdesc;
+    AttInMetadata       *attinmeta;
+
+    /* stuff done only on the first call of the function */
+    if (SRF_IS_FIRSTCALL())
+    {
+        MemoryContext   oldcontext;
+
+        /* create a function context for cross-call persistence */
+        funcctx = SRF_FIRSTCALL_INIT();
+
+        /* switch to memory context appropriate for multiple function calls */
+        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+        /* total number of tuples to be returned */
+        funcctx->max_calls = PG_GETARG_INT32(0);
+
+        /* Build a tuple descriptor for our result type */
+        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("function returning record called in context "
+                            "that cannot accept type record")));
+
+        /*
+         * generate attribute metadata needed later to produce tuples from raw
+         * C strings
+         */
+        attinmeta = TupleDescGetAttInMetadata(tupdesc);
+        funcctx->attinmeta = attinmeta;
+
+        MemoryContextSwitchTo(oldcontext);
+    }
+
+    /* stuff done on every call of the function */
+    funcctx = SRF_PERCALL_SETUP();
+
+    call_cntr = funcctx->call_cntr;
+    max_calls = funcctx->max_calls;
+    attinmeta = funcctx->attinmeta;
+
+    if (call_cntr < max_calls)    /* do when there is more left to send */
+    {
+        char       **values;
+        HeapTuple    tuple;
+        Datum        result;
+
+        /*
+         * Prepare a values array for building the returned tuple.
+         * This should be an array of C strings which will
+         * be processed later by the type input functions.
+         */
+        values = (char **) palloc(3 * sizeof(char *));
+        values[0] = (char *) palloc(16 * sizeof(char));
+        values[1] = (char *) palloc(16 * sizeof(char));
+        values[2] = (char *) palloc(16 * sizeof(char));
+
+        snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
+        snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
+        snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));
+
+        /* build a tuple */
+        tuple = BuildTupleFromCStrings(attinmeta, values);
+
+        /* make the tuple into a datum */
+        result = HeapTupleGetDatum(tuple);
+
+        /* clean up (this is not really necessary) */
+        pfree(values[0]);
+        pfree(values[1]);
+        pfree(values[2]);
+        pfree(values);
+
+        SRF_RETURN_NEXT(funcctx, result);
+    }
+    else    /* do when there is no more left */
+    {
+        SRF_RETURN_DONE(funcctx);
+    }
+}
+
+

+ + One way to declare this function in SQL is: +

+CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);
+
+CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
+    RETURNS SETOF __retcomposite
+    AS 'filename', 'retcomposite'
+    LANGUAGE C IMMUTABLE STRICT;
+

+ A different way is to use OUT parameters: +

+CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
+    OUT f1 integer, OUT f2 integer, OUT f3 integer)
+    RETURNS SETOF record
+    AS 'filename', 'retcomposite'
+    LANGUAGE C IMMUTABLE STRICT;
+

+ Notice that in this method the output type of the function is formally + an anonymous record type. +

38.10.9. Polymorphic Arguments and Return Types

+ C-language functions can be declared to accept and + return the polymorphic types described in Section 38.2.5. + When a function's arguments or return types + are defined as polymorphic types, the function author cannot know + in advance what data type it will be called with, or + need to return. There are two routines provided in fmgr.h + to allow a version-1 C function to discover the actual data types + of its arguments and the type it is expected to return. The routines are + called get_fn_expr_rettype(FmgrInfo *flinfo) and + get_fn_expr_argtype(FmgrInfo *flinfo, int argnum). + They return the result or argument type OID, or InvalidOid if the + information is not available. + The structure flinfo is normally accessed as + fcinfo->flinfo. The parameter argnum + is zero based. get_call_result_type can also be used + as an alternative to get_fn_expr_rettype. + There is also get_fn_expr_variadic, which can be used to + find out whether variadic arguments have been merged into an array. + This is primarily useful for VARIADIC "any" functions, + since such merging will always have occurred for variadic functions + taking ordinary array types. +

+ For example, suppose we want to write a function to accept a single + element of any type, and return a one-dimensional array of that type: + +

+PG_FUNCTION_INFO_V1(make_array);
+Datum
+make_array(PG_FUNCTION_ARGS)
+{
+    ArrayType  *result;
+    Oid         element_type = get_fn_expr_argtype(fcinfo->flinfo, 0);
+    Datum       element;
+    bool        isnull;
+    int16       typlen;
+    bool        typbyval;
+    char        typalign;
+    int         ndims;
+    int         dims[MAXDIM];
+    int         lbs[MAXDIM];
+
+    if (!OidIsValid(element_type))
+        elog(ERROR, "could not determine data type of input");
+
+    /* get the provided element, being careful in case it's NULL */
+    isnull = PG_ARGISNULL(0);
+    if (isnull)
+        element = (Datum) 0;
+    else
+        element = PG_GETARG_DATUM(0);
+
+    /* we have one dimension */
+    ndims = 1;
+    /* and one element */
+    dims[0] = 1;
+    /* and lower bound is 1 */
+    lbs[0] = 1;
+
+    /* get required info about the element type */
+    get_typlenbyvalalign(element_type, &typlen, &typbyval, &typalign);
+
+    /* now build the array */
+    result = construct_md_array(&element, &isnull, ndims, dims, lbs,
+                                element_type, typlen, typbyval, typalign);
+
+    PG_RETURN_ARRAYTYPE_P(result);
+}
+

+ The following command declares the function + make_array in SQL: + +

+CREATE FUNCTION make_array(anyelement) RETURNS anyarray
+    AS 'DIRECTORY/funcs', 'make_array'
+    LANGUAGE C IMMUTABLE;
+

+ There is a variant of polymorphism that is only available to C-language + functions: they can be declared to take parameters of type + "any". (Note that this type name must be double-quoted, + since it's also an SQL reserved word.) This works like + anyelement except that it does not constrain different + "any" arguments to be the same type, nor do they help + determine the function's result type. A C-language function can also + declare its final parameter to be VARIADIC "any". This will + match one or more actual arguments of any type (not necessarily the same + type). These arguments will not be gathered into an array + as happens with normal variadic functions; they will just be passed to + the function separately. The PG_NARGS() macro and the + methods described above must be used to determine the number of actual + arguments and their types when using this feature. Also, users of such + a function might wish to use the VARIADIC keyword in their + function call, with the expectation that the function would treat the + array elements as separate arguments. The function itself must implement + that behavior if wanted, after using get_fn_expr_variadic to + detect that the actual argument was marked with VARIADIC. +

38.10.10. Shared Memory and LWLocks

+ Add-ins can reserve LWLocks and an allocation of shared memory on server + startup. The add-in's shared library must be preloaded by specifying + it in + shared_preload_libraries . + The shared library should register a shmem_request_hook + in its _PG_init function. This + shmem_request_hook can reserve LWLocks or shared memory. + Shared memory is reserved by calling: +

+void RequestAddinShmemSpace(int size)
+

+ from your shmem_request_hook. +

+ LWLocks are reserved by calling: +

+void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
+

+ from your shmem_request_hook. This will ensure that an array of + num_lwlocks LWLocks is available under the name + tranche_name. Use GetNamedLWLockTranche + to get a pointer to this array. +

+ An example of a shmem_request_hook can be found in + contrib/pg_stat_statements/pg_stat_statements.c in the + PostgreSQL source tree. +

+ To avoid possible race-conditions, each backend should use the LWLock + AddinShmemInitLock when connecting to and initializing + its allocation of shared memory, as shown here: +

+static mystruct *ptr = NULL;
+
+if (!ptr)
+{
+        bool    found;
+
+        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+        ptr = ShmemInitStruct("my struct name", size, &found);
+        if (!found)
+        {
+                initialize contents of shmem area;
+                acquire any requested LWLocks using:
+                ptr->locks = GetNamedLWLockTranche("my tranche name");
+        }
+        LWLockRelease(AddinShmemInitLock);
+}
+

38.10.11. Using C++ for Extensibility

+ Although the PostgreSQL backend is written in + C, it is possible to write extensions in C++ if these guidelines are + followed: + +

+ All functions accessed by the backend must present a C interface + to the backend; these C functions can then call C++ functions. + For example, extern C linkage is required for + backend-accessed functions. This is also necessary for any + functions that are passed as pointers between the backend and + C++ code. +
+ Free memory using the appropriate deallocation method. For example, + most backend memory is allocated using palloc(), so use + pfree() to free it. Using C++ + delete in such cases will fail. +
+ Prevent exceptions from propagating into the C code (use a catch-all + block at the top level of all extern C functions). This + is necessary even if the C++ code does not explicitly throw any + exceptions, because events like out-of-memory can still throw + exceptions. Any exceptions must be caught and appropriate errors + passed back to the C interface. If possible, compile C++ with + -fno-exceptions to eliminate exceptions entirely; in such + cases, you must check for failures in your C++ code, e.g., check for + NULL returned by new(). +
+ If calling backend functions from C++ code, be sure that the + C++ call stack contains only plain old data structures + (POD). This is necessary because backend errors + generate a distant longjmp() that does not properly + unroll a C++ call stack with non-POD objects. +

+ In summary, it is best to place C++ code behind a wall of + extern C functions that interface to the backend, + and avoid exception, memory, and call stack leakage. +

Prev	Up	Next
38.9. Internal Functions	Home	38.11. Function Optimization Information