author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:17:33 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:17:33 +0000 |
commit | 5e45211a64149b3c659b90ff2de6fa982a5a93ed (patch) | |
tree | 739caf8c461053357daa9f162bef34516c7bf452 /doc/src/sgml/html/xtypes.html | |
parent | Initial commit. (diff) | |
download | postgresql-15-5e45211a64149b3c659b90ff2de6fa982a5a93ed.tar.xz postgresql-15-5e45211a64149b3c659b90ff2de6fa982a5a93ed.zip |
Adding upstream version 15.5. upstream/15.5
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/xtypes.html')
-rw-r--r-- | doc/src/sgml/html/xtypes.html | 302 |
1 files changed, 302 insertions, 0 deletions
diff --git a/doc/src/sgml/html/xtypes.html b/doc/src/sgml/html/xtypes.html new file mode 100644 index 0000000..c370c6f --- /dev/null +++ b/doc/src/sgml/html/xtypes.html @@ -0,0 +1,302 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>38.13. User-Defined Types</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="xaggr.html" title="38.12. User-Defined Aggregates" /><link rel="next" href="xoper.html" title="38.14. User-Defined Operators" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">38.13. User-Defined Types</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="xaggr.html" title="38.12. User-Defined Aggregates">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><th width="60%" align="center">Chapter 38. Extending <acronym class="acronym">SQL</acronym></th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="xoper.html" title="38.14. User-Defined Operators">Next</a></td></tr></table><hr /></div><div class="sect1" id="XTYPES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">38.13. User-Defined Types</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="xtypes.html#XTYPES-TOAST">38.13.1. 
TOAST Considerations</a></span></dt></dl></div><a id="id-1.8.3.16.2" class="indexterm"></a><p> + As described in <a class="xref" href="extend-type-system.html" title="38.2. The PostgreSQL Type System">Section 38.2</a>, + <span class="productname">PostgreSQL</span> can be extended to support new + data types. This section describes how to define new base types, + which are data types defined below the level of the <acronym class="acronym">SQL</acronym> + language. Creating a new base type requires implementing functions + to operate on the type in a low-level language, usually C. + </p><p> + The examples in this section can be found in + <code class="filename">complex.sql</code> and <code class="filename">complex.c</code> + in the <code class="filename">src/tutorial</code> directory of the source distribution. + See the <code class="filename">README</code> file in that directory for instructions + about running the examples. + </p><p> + <a id="id-1.8.3.16.5.1" class="indexterm"></a> + <a id="id-1.8.3.16.5.2" class="indexterm"></a> + A user-defined type must always have input and output functions. + These functions determine how the type appears in strings (for input + by the user and output to the user) and how the type is organized in + memory. The input function takes a null-terminated character string + as its argument and returns the internal (in memory) representation + of the type. The output function takes the internal representation + of the type as argument and returns a null-terminated character + string. If we want to do anything more with the type than merely + store it, we must provide additional functions to implement whatever + operations we'd like to have for the type. + </p><p> + Suppose we want to define a type <code class="type">complex</code> that represents + complex numbers. 
A natural way to represent a complex number in + memory would be the following C structure: + +</p><pre class="programlisting"> +typedef struct Complex { + double x; + double y; +} Complex; +</pre><p> + + We will need to make this a pass-by-reference type, since it's too + large to fit into a single <code class="type">Datum</code> value. + </p><p> + As the external string representation of the type, we choose a + string of the form <code class="literal">(x,y)</code>. + </p><p> + The input and output functions are usually not hard to write, + especially the output function. But when defining the external + string representation of the type, remember that you must eventually + write a complete and robust parser for that representation as your + input function. For instance: + +</p><pre class="programlisting"> +PG_FUNCTION_INFO_V1(complex_in); + +Datum +complex_in(PG_FUNCTION_ARGS) +{ + char *str = PG_GETARG_CSTRING(0); + double x, + y; + Complex *result; + + if (sscanf(str, " ( %lf , %lf )", &x, &y) != 2) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + "complex", str))); + + result = (Complex *) palloc(sizeof(Complex)); + result->x = x; + result->y = y; + PG_RETURN_POINTER(result); +} + +</pre><p> + + The output function can simply be: + +</p><pre class="programlisting"> +PG_FUNCTION_INFO_V1(complex_out); + +Datum +complex_out(PG_FUNCTION_ARGS) +{ + Complex *complex = (Complex *) PG_GETARG_POINTER(0); + char *result; + + result = psprintf("(%g,%g)", complex->x, complex->y); + PG_RETURN_CSTRING(result); +} + +</pre><p> + </p><p> + You should be careful to make the input and output functions inverses of + each other. If you do not, you will have severe problems when you + need to dump your data into a file and then read it back in. This + is a particularly common problem when floating-point numbers are + involved. 
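+ </p><p> + The round-trip requirement can be checked outside the server. The following is a minimal sketch, not part of <code class="filename">complex.c</code>: it uses plain <code class="function">snprintf</code> and <code class="function">sscanf</code> in place of the PostgreSQL function-call interface, and the helper names <code class="function">complex_format</code>, <code class="function">complex_parse</code> and <code class="function">complex_round_trips</code> are hypothetical. It assumes IEEE 754 doubles, for which 17 significant digits are enough to reproduce a value exactly. + </p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the Complex struct shown earlier in this section. */
typedef struct Complex {
    double x;
    double y;
} Complex;

/* Format as "(x,y)" with a caller-chosen number of significant digits. */
static void
complex_format(const Complex *c, int precision, char *buf, size_t buflen)
{
    assert(buflen > 0);
    snprintf(buf, buflen, "(%.*g,%.*g)", precision, c->x, precision, c->y);
}

/* Parse with the same pattern complex_in uses; false means bad input. */
static bool
complex_parse(const char *str, Complex *out)
{
    return sscanf(str, " ( %lf , %lf )", &out->x, &out->y) == 2;
}

/* True if formatting and re-parsing yields components that compare equal. */
static bool
complex_round_trips(const Complex *c, int precision)
{
    char    buf[128];
    Complex back;

    complex_format(c, precision, buf, sizeof(buf));
    return complex_parse(buf, &back) && back.x == c->x && back.y == c->y;
}
```

<p> + With these helpers, a component such as <code class="literal">0.1 + 0.2</code> survives a round trip at precision 17 but not at precision 6, which is the default precision of <code class="literal">%g</code>.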
+ </p><p> + Optionally, a user-defined type can provide binary input and output + routines. Binary I/O is normally faster but less portable than textual + I/O. As with textual I/O, it is up to you to define exactly what the + external binary representation is. Most of the built-in data types + try to provide a machine-independent binary representation. For + <code class="type">complex</code>, we will piggy-back on the binary I/O converters + for type <code class="type">float8</code>: + +</p><pre class="programlisting"> +PG_FUNCTION_INFO_V1(complex_recv); + +Datum +complex_recv(PG_FUNCTION_ARGS) +{ + StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + Complex *result; + + result = (Complex *) palloc(sizeof(Complex)); + result->x = pq_getmsgfloat8(buf); + result->y = pq_getmsgfloat8(buf); + PG_RETURN_POINTER(result); +} + +PG_FUNCTION_INFO_V1(complex_send); + +Datum +complex_send(PG_FUNCTION_ARGS) +{ + Complex *complex = (Complex *) PG_GETARG_POINTER(0); + StringInfoData buf; + + pq_begintypsend(&buf); + pq_sendfloat8(&buf, complex->x); + pq_sendfloat8(&buf, complex->y); + PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); +} + +</pre><p> + </p><p> + Once we have written the I/O functions and compiled them into a shared + library, we can define the <code class="type">complex</code> type in SQL. + First we declare it as a shell type: + +</p><pre class="programlisting"> +CREATE TYPE complex; +</pre><p> + + This serves as a placeholder that allows us to reference the type while + defining its I/O functions. 
Now we can define the I/O functions: + +</p><pre class="programlisting"> +CREATE FUNCTION complex_in(cstring) + RETURNS complex + AS '<em class="replaceable"><code>filename</code></em>' + LANGUAGE C IMMUTABLE STRICT; + +CREATE FUNCTION complex_out(complex) + RETURNS cstring + AS '<em class="replaceable"><code>filename</code></em>' + LANGUAGE C IMMUTABLE STRICT; + +CREATE FUNCTION complex_recv(internal) + RETURNS complex + AS '<em class="replaceable"><code>filename</code></em>' + LANGUAGE C IMMUTABLE STRICT; + +CREATE FUNCTION complex_send(complex) + RETURNS bytea + AS '<em class="replaceable"><code>filename</code></em>' + LANGUAGE C IMMUTABLE STRICT; +</pre><p> + </p><p> + Finally, we can provide the full definition of the data type: +</p><pre class="programlisting"> +CREATE TYPE complex ( + internallength = 16, + input = complex_in, + output = complex_out, + receive = complex_recv, + send = complex_send, + alignment = double +); +</pre><p> + </p><p> + <a id="id-1.8.3.16.13.1" class="indexterm"></a> + When you define a new base type, + <span class="productname">PostgreSQL</span> automatically provides support + for arrays of that type. The array type typically + has the same name as the base type with the underscore character + (<code class="literal">_</code>) prepended. + </p><p> + Once the data type exists, we can declare additional functions to + provide useful operations on the data type. Operators can then be + defined atop the functions, and if needed, operator classes can be + created to support indexing of the data type. These additional + layers are discussed in following sections. + </p><p> + If the internal representation of the data type is variable-length, the + internal representation must follow the standard layout for variable-length + data: the first four bytes must be a <code class="type">char[4]</code> field which is + never accessed directly (customarily named <code class="structfield">vl_len_</code>). 
You + must use the <code class="function">SET_VARSIZE()</code> macro to store the total + size of the datum (including the length field itself) in this field + and <code class="function">VARSIZE()</code> to retrieve it. (These macros exist + because the length field may be encoded depending on platform.) + </p><p> + For further details see the description of the + <a class="xref" href="sql-createtype.html" title="CREATE TYPE"><span class="refentrytitle">CREATE TYPE</span></a> command. + </p><div class="sect2" id="XTYPES-TOAST"><div class="titlepage"><div><div><h3 class="title">38.13.1. TOAST Considerations</h3></div></div></div><a id="id-1.8.3.16.17.2" class="indexterm"></a><p> + If the values of your data type vary in size (in internal form), it's + usually desirable to make the data type <acronym class="acronym">TOAST</acronym>-able (see <a class="xref" href="storage-toast.html" title="73.2. TOAST">Section 73.2</a>). You should do this even if the values are always + too small to be compressed or stored externally, because + <acronym class="acronym">TOAST</acronym> can save space on small data too, by reducing header + overhead. + </p><p> + To support <acronym class="acronym">TOAST</acronym> storage, the C functions operating on the data + type must always be careful to unpack any toasted values they are handed + by using <code class="function">PG_DETOAST_DATUM</code>. (This detail is customarily hidden + by defining type-specific <code class="function">GETARG_DATATYPE_P</code> macros.) + Then, when running the <code class="command">CREATE TYPE</code> command, specify the + internal length as <code class="literal">variable</code> and select some appropriate storage + option other than <code class="literal">plain</code>. + </p><p> + If data alignment is unimportant (either just for a specific function or + because the data type specifies byte alignment anyway) then it's possible + to avoid some of the overhead of <code class="function">PG_DETOAST_DATUM</code>. 
You can use + <code class="function">PG_DETOAST_DATUM_PACKED</code> instead (customarily hidden by + defining a <code class="function">GETARG_DATATYPE_PP</code> macro) and use the macros + <code class="function">VARSIZE_ANY_EXHDR</code> and <code class="function">VARDATA_ANY</code> to access + a potentially-packed datum. + Again, the data returned by these macros is not aligned even if the data + type definition specifies an alignment. If the alignment is important, you + must go through the regular <code class="function">PG_DETOAST_DATUM</code> interface. + </p><div class="note"><h3 class="title">Note</h3><p> + Older code frequently declares <code class="structfield">vl_len_</code> as an + <code class="type">int32</code> field instead of <code class="type">char[4]</code>. This is OK as long as + the struct definition has other fields that have at least <code class="type">int32</code> + alignment. But it is dangerous to use such a struct definition when + working with a potentially unaligned datum; the compiler may take it as + license to assume the datum actually is aligned, leading to core dumps on + architectures that are strict about alignment. + </p></div><p> + Another feature that's enabled by <acronym class="acronym">TOAST</acronym> support is the + possibility of having an <em class="firstterm">expanded</em> in-memory data + representation that is more convenient to work with than the format that + is stored on disk. The regular or <span class="quote">“<span class="quote">flat</span>”</span> varlena storage format + is ultimately just a blob of bytes; it cannot, for example, contain + pointers, since it may get copied to other locations in memory.
+ For complex data types, the flat format may be quite expensive to work + with, so <span class="productname">PostgreSQL</span> provides a way to <span class="quote">“<span class="quote">expand</span>”</span> + the flat format into a representation that is more suited to computation, + and then pass that format in-memory between functions of the data type. + </p><p> + To use expanded storage, a data type must define an expanded format that + follows the rules given in <code class="filename">src/include/utils/expandeddatum.h</code>, + and provide functions to <span class="quote">“<span class="quote">expand</span>”</span> a flat varlena value into + expanded format and <span class="quote">“<span class="quote">flatten</span>”</span> the expanded format back to the + regular varlena representation. Then ensure that all C functions for + the data type can accept either representation, possibly by converting + one into the other immediately upon receipt. This does not require fixing + all existing functions for the data type at once, because the standard + <code class="function">PG_DETOAST_DATUM</code> macro is defined to convert expanded inputs + into regular flat format. Therefore, existing functions that work with + the flat varlena format will continue to work, though slightly + inefficiently, with expanded inputs; they need not be converted until and + unless better performance is important. + </p><p> + C functions that know how to work with an expanded representation + typically fall into two categories: those that can only handle expanded + format, and those that can handle either expanded or flat varlena inputs. + The former are easier to write but may be less efficient overall, because + converting a flat input to expanded form for use by a single function may + cost more than is saved by operating on the expanded format. 
+ When only expanded format need be handled, conversion of flat inputs to + expanded form can be hidden inside an argument-fetching macro, so that + the function appears no more complex than one working with traditional + varlena input. + To handle both types of input, write an argument-fetching function that + will detoast external, short-header, and compressed varlena inputs, but + not expanded inputs. Such a function can be defined as returning a + pointer to a union of the flat varlena format and the expanded format. + Callers can use the <code class="function">VARATT_IS_EXPANDED_HEADER()</code> macro to + determine which format they received. + </p><p> + The <acronym class="acronym">TOAST</acronym> infrastructure not only allows regular varlena + values to be distinguished from expanded values, but also + distinguishes <span class="quote">“<span class="quote">read-write</span>”</span> and <span class="quote">“<span class="quote">read-only</span>”</span> pointers to + expanded values. C functions that only need to examine an expanded + value, or will only change it in safe and non-semantically-visible ways, + need not care which type of pointer they receive. C functions that + produce a modified version of an input value are allowed to modify an + expanded input value in-place if they receive a read-write pointer, but + must not modify the input if they receive a read-only pointer; in that + case they have to copy the value first, producing a new value to modify. + A C function that has constructed a new expanded value should always + return a read-write pointer to it. Also, a C function that is modifying + a read-write expanded value in-place should take care to leave the value + in a sane state if it fails partway through. + </p><p> + For examples of working with expanded values, see the standard array + infrastructure, particularly + <code class="filename">src/backend/utils/adt/array_expanded.c</code>. 
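+ </p><p> + The read-write and read-only rules described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: <code class="type">ExpandedVec</code>, <code class="type">VecRef</code>, <code class="function">vec_copy</code> and <code class="function">vec_append</code> are hypothetical names, the <code class="structfield">read_write</code> flag stands in for the distinction that PostgreSQL encodes in the pointer itself, and <code class="function">malloc</code> is used where server code would allocate in a memory context. + </p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded value: a growable array of doubles. */
typedef struct ExpandedVec {
    size_t  len;
    double *items;
} ExpandedVec;

/* Hypothetical handle; read_write models the kind of pointer received. */
typedef struct VecRef {
    ExpandedVec *vec;
    bool         read_write;
} VecRef;

/* Deep-copy a value; returns NULL if allocation fails. */
static ExpandedVec *
vec_copy(const ExpandedVec *src)
{
    ExpandedVec *dst = malloc(sizeof(ExpandedVec));

    if (dst == NULL)
        return NULL;
    /* Allocate at least one slot so a zero-length value is still valid. */
    dst->items = malloc((src->len > 0 ? src->len : 1) * sizeof(double));
    if (dst->items == NULL) {
        free(dst);
        return NULL;
    }
    memcpy(dst->items, src->items, src->len * sizeof(double));
    dst->len = src->len;
    return dst;
}

/*
 * Append one element. A read-write input is modified in place; a read-only
 * input is copied first and left untouched. The result is always read-write.
 * On allocation failure the input keeps its old contents and the returned
 * handle has a NULL vec.
 */
static VecRef
vec_append(VecRef in, double value)
{
    ExpandedVec *target;
    double      *grown;

    assert(in.vec != NULL);
    target = in.read_write ? in.vec : vec_copy(in.vec);
    if (target == NULL)
        return (VecRef){ NULL, false };

    /* Grow before touching len, so a failure leaves the value sane. */
    grown = realloc(target->items, (target->len + 1) * sizeof(double));
    if (grown == NULL) {
        if (!in.read_write) {
            free(target->items);
            free(target);
        }
        return (VecRef){ NULL, false };
    }
    target->items = grown;
    target->items[target->len++] = value;
    return (VecRef){ target, true };
}
```

<p> + In this model, appending through a read-only handle yields a new value and leaves the original unchanged, while appending through the read-write handle it returns reuses the same storage.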
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="xaggr.html" title="38.12. User-Defined Aggregates">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="xoper.html" title="38.14. User-Defined Operators">Next</a></td></tr><tr><td width="40%" align="left" valign="top">38.12. User-Defined Aggregates </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 38.14. User-Defined Operators</td></tr></table></div></body></html>
\ No newline at end of file |