Diffstat (limited to 'doc/src/sgml/html/xtypes.html')
-rw-r--r--doc/src/sgml/html/xtypes.html302
1 files changed, 302 insertions, 0 deletions
diff --git a/doc/src/sgml/html/xtypes.html b/doc/src/sgml/html/xtypes.html
new file mode 100644
index 0000000..c370c6f
--- /dev/null
+++ b/doc/src/sgml/html/xtypes.html
@@ -0,0 +1,302 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>38.13. User-Defined Types</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="xaggr.html" title="38.12. User-Defined Aggregates" /><link rel="next" href="xoper.html" title="38.14. User-Defined Operators" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">38.13. User-Defined Types</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="xaggr.html" title="38.12. User-Defined Aggregates">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><th width="60%" align="center">Chapter 38. Extending <acronym class="acronym">SQL</acronym></th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="xoper.html" title="38.14. User-Defined Operators">Next</a></td></tr></table><hr /></div><div class="sect1" id="XTYPES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">38.13. User-Defined Types</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="xtypes.html#XTYPES-TOAST">38.13.1. TOAST Considerations</a></span></dt></dl></div><a id="id-1.8.3.16.2" class="indexterm"></a><p>
+ As described in <a class="xref" href="extend-type-system.html" title="38.2. The PostgreSQL Type System">Section 38.2</a>,
+ <span class="productname">PostgreSQL</span> can be extended to support new
+ data types. This section describes how to define new base types,
+ which are data types defined below the level of the <acronym class="acronym">SQL</acronym>
+ language. Creating a new base type requires implementing functions
+ to operate on the type in a low-level language, usually C.
+ </p><p>
+ The examples in this section can be found in
+ <code class="filename">complex.sql</code> and <code class="filename">complex.c</code>
+ in the <code class="filename">src/tutorial</code> directory of the source distribution.
+ See the <code class="filename">README</code> file in that directory for instructions
+ about running the examples.
+ </p><p>
+ <a id="id-1.8.3.16.5.1" class="indexterm"></a>
+ <a id="id-1.8.3.16.5.2" class="indexterm"></a>
+ A user-defined type must always have input and output functions.
+ These functions determine how the type appears in strings (for input
+ by the user and output to the user) and how the type is organized in
+ memory. The input function takes a null-terminated character string
+ as its argument and returns the internal (in memory) representation
+ of the type. The output function takes the internal representation
+ of the type as argument and returns a null-terminated character
+ string. If we want to do anything more with the type than merely
+ store it, we must provide additional functions to implement whatever
+ operations we'd like to have for the type.
+ </p><p>
+ Suppose we want to define a type <code class="type">complex</code> that represents
+ complex numbers. A natural way to represent a complex number in
+ memory would be the following C structure:
+
+</p><pre class="programlisting">
+typedef struct Complex {
+ double x;
+ double y;
+} Complex;
+</pre><p>
+
+ We will need to make this a pass-by-reference type, since it's too
+ large to fit into a single <code class="type">Datum</code> value.
+ </p><p>
+ As the external string representation of the type, we choose a
+ string of the form <code class="literal">(x,y)</code>.
+ </p><p>
+ The input and output functions are usually not hard to write,
+ especially the output function. But when defining the external
+ string representation of the type, remember that you must eventually
+ write a complete and robust parser for that representation as your
+ input function. For instance:
+
+</p><pre class="programlisting">
+PG_FUNCTION_INFO_V1(complex_in);
+
+Datum
+complex_in(PG_FUNCTION_ARGS)
+{
+ char *str = PG_GETARG_CSTRING(0);
+ double x,
+ y;
+ Complex *result;
+
+ if (sscanf(str, " ( %lf , %lf )", &amp;x, &amp;y) != 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+ "complex", str)));
+
+ result = (Complex *) palloc(sizeof(Complex));
+ result-&gt;x = x;
+ result-&gt;y = y;
+ PG_RETURN_POINTER(result);
+}
+
+</pre><p>
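+ As an illustration of what <span class="quote">“<span class="quote">robust</span>”</span> can mean here: the
+ <code class="function">sscanf</code> call above returns 2 as soon as both numbers have
+ been read, so it also accepts input with a missing closing parenthesis or
+ with trailing characters. The following is a minimal standalone sketch in
+ plain C, without the PostgreSQL headers, of one way to reject such input.
+ The names <code class="function">skip_ws</code> and
+ <code class="function">parse_complex_strict</code> are hypothetical and are not part
+ of <code class="filename">complex.c</code>.
+</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Skip any run of whitespace. */
static const char *
skip_ws(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/*
 * Hypothetical helper, not part of complex.c: parse "(x,y)" and reject
 * anything other than whitespace after the closing parenthesis.
 */
static bool
parse_complex_strict(const char *str, double *x, double *y)
{
    const char *p = skip_ws(str);
    char       *end;

    if (*p++ != '(')
        return false;
    *x = strtod(p, &end);       /* strtod skips leading whitespace itself */
    if (end == p)
        return false;           /* no number was consumed */
    p = skip_ws(end);
    if (*p++ != ',')
        return false;
    *y = strtod(p, &end);
    if (end == p)
        return false;
    p = skip_ws(end);
    if (*p++ != ')')
        return false;
    return *skip_ws(p) == '\0'; /* only trailing whitespace is allowed */
}
```

+<p>
+ In an actual input function, a <code class="literal">false</code> result would be
+ reported with the same <code class="function">ereport(ERROR, ...)</code> call shown above.
+</p><p>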
+
+ The output function can simply be:
+
+</p><pre class="programlisting">
+PG_FUNCTION_INFO_V1(complex_out);
+
+Datum
+complex_out(PG_FUNCTION_ARGS)
+{
+ Complex *complex = (Complex *) PG_GETARG_POINTER(0);
+ char *result;
+
+ result = psprintf("(%g,%g)", complex-&gt;x, complex-&gt;y);
+ PG_RETURN_CSTRING(result);
+}
+
+</pre><p>
+ </p><p>
+ You should be careful to make the input and output functions inverses of
+ each other. If you do not, you will have severe problems when you
+ need to dump your data into a file and then read it back in. This
+ is a particularly common problem when floating-point numbers are
+ involved.
+ </p><p>
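+ The floating-point pitfall can be demonstrated outside the server. The
+ <code class="literal">%g</code> conversion used in <code class="function">complex_out</code>
+ prints six significant digits by default, which is not always enough to
+ recover the same <code class="type">double</code> when the text is parsed again. The
+ sketch below is plain C; <code class="function">round_trips</code> is a hypothetical
+ helper introduced only for this illustration, and 17 significant digits is
+ the usual figure for an exact round trip of an IEEE 754 double.
+</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format a value with the given printf format, parse the text back, and
 * report whether the parsed value is identical to the original.
 */
static bool
round_trips(double value, const char *fmt)
{
    char   buf[64];
    double parsed;

    snprintf(buf, sizeof(buf), fmt, value);
    if (sscanf(buf, "%lf", &parsed) != 1)
        return false;
    return parsed == value;
}
```

+<p>
+ With this helper, <code class="literal">round_trips(0.5, "%g")</code> succeeds, while
+ <code class="literal">round_trips(0.1 + 0.2, "%g")</code> fails because the output is
+ shortened to <code class="literal">0.3</code>; the same value survives with
+ <code class="literal">"%.17g"</code>.
+</p><p>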
+ Optionally, a user-defined type can provide binary input and output
+ routines. Binary I/O is normally faster but less portable than textual
+ I/O. As with textual I/O, it is up to you to define exactly what the
+ external binary representation is. Most of the built-in data types
+ try to provide a machine-independent binary representation. For
+ <code class="type">complex</code>, we will piggy-back on the binary I/O converters
+ for type <code class="type">float8</code>:
+
+</p><pre class="programlisting">
+PG_FUNCTION_INFO_V1(complex_recv);
+
+Datum
+complex_recv(PG_FUNCTION_ARGS)
+{
+ StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);
+ Complex *result;
+
+ result = (Complex *) palloc(sizeof(Complex));
+ result-&gt;x = pq_getmsgfloat8(buf);
+ result-&gt;y = pq_getmsgfloat8(buf);
+ PG_RETURN_POINTER(result);
+}
+
+PG_FUNCTION_INFO_V1(complex_send);
+
+Datum
+complex_send(PG_FUNCTION_ARGS)
+{
+ Complex *complex = (Complex *) PG_GETARG_POINTER(0);
+ StringInfoData buf;
+
+ pq_begintypsend(&amp;buf);
+ pq_sendfloat8(&amp;buf, complex-&gt;x);
+ pq_sendfloat8(&amp;buf, complex-&gt;y);
+ PG_RETURN_BYTEA_P(pq_endtypsend(&amp;buf));
+}
+
+</pre><p>
+ </p><p>
+ Once we have written the I/O functions and compiled them into a shared
+ library, we can define the <code class="type">complex</code> type in SQL.
+ First we declare it as a shell type:
+
+</p><pre class="programlisting">
+CREATE TYPE complex;
+</pre><p>
+
+ This serves as a placeholder that allows us to reference the type while
+ defining its I/O functions. Now we can define the I/O functions:
+
+</p><pre class="programlisting">
+CREATE FUNCTION complex_in(cstring)
+ RETURNS complex
+ AS '<em class="replaceable"><code>filename</code></em>'
+ LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_out(complex)
+ RETURNS cstring
+ AS '<em class="replaceable"><code>filename</code></em>'
+ LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_recv(internal)
+ RETURNS complex
+ AS '<em class="replaceable"><code>filename</code></em>'
+ LANGUAGE C IMMUTABLE STRICT;
+
+CREATE FUNCTION complex_send(complex)
+ RETURNS bytea
+ AS '<em class="replaceable"><code>filename</code></em>'
+ LANGUAGE C IMMUTABLE STRICT;
+</pre><p>
+ </p><p>
+ Finally, we can provide the full definition of the data type:
+</p><pre class="programlisting">
+CREATE TYPE complex (
+ internallength = 16,
+ input = complex_in,
+ output = complex_out,
+ receive = complex_recv,
+ send = complex_send,
+ alignment = double
+);
+</pre><p>
+ </p><p>
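+ The values <code class="literal">internallength = 16</code> and
+ <code class="literal">alignment = double</code> have to agree with the C structure.
+ One way to keep them in step, sketched here as a suggestion rather than as
+ something <code class="filename">complex.c</code> does, is a pair of C11 compile-time
+ assertions next to the structure definition. The sketch assumes a platform
+ where <code class="type">double</code> is eight bytes wide.
+</p>

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

/* Fail the build if the struct no longer matches internallength = 16. */
static_assert(sizeof(Complex) == 16,
              "internallength = 16 must match sizeof(Complex)");

/* Fail the build if the struct no longer matches alignment = double. */
static_assert(alignof(Complex) == alignof(double),
              "alignment = double must match the struct alignment");
```

+<p>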
+ <a id="id-1.8.3.16.13.1" class="indexterm"></a>
+ When you define a new base type,
+ <span class="productname">PostgreSQL</span> automatically provides support
+ for arrays of that type. The array type typically
+ has the same name as the base type with the underscore character
+ (<code class="literal">_</code>) prepended.
+ </p><p>
+ Once the data type exists, we can declare additional functions to
+ provide useful operations on the data type. Operators can then be
+ defined atop the functions, and if needed, operator classes can be
+ created to support indexing of the data type. These additional
+ layers are discussed in the following sections.
+ </p><p>
+ If the internal representation of the data type is variable-length, the
+ internal representation must follow the standard layout for variable-length
+ data: the first four bytes must be a <code class="type">char[4]</code> field which is
+ never accessed directly (customarily named <code class="structfield">vl_len_</code>). You
+ must use the <code class="function">SET_VARSIZE()</code> macro to store the total
+ size of the datum (including the length field itself) in this field
+ and <code class="function">VARSIZE()</code> to retrieve it. (These macros exist
+ because the length field may be encoded differently depending on the platform.)
+ </p><p>
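+ The layout described above can be pictured with a standalone sketch. This
+ is plain C that does not include the PostgreSQL headers: the
+ <code class="type">Blob</code> type and the <code class="function">blob_*</code> functions are
+ hypothetical, and the two size accessors are simplified stand-ins that store
+ a native-endian 32-bit integer. They only illustrate that the stored size
+ counts the length field itself. In an extension, use
+ <code class="function">SET_VARSIZE()</code>, <code class="function">VARSIZE()</code> and
+ <code class="function">palloc</code> instead.
+</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A variable-length value: 4-byte length header followed by the payload. */
typedef struct Blob {
    char vl_len_[4];    /* length header, never accessed directly */
    char data[];        /* payload (flexible array member) */
} Blob;

/*
 * Simplified stand-ins for SET_VARSIZE() and VARSIZE().  They store a plain
 * native-endian uint32; the real macros may encode the field differently.
 */
static void
blob_set_size(Blob *b, uint32_t total)
{
    memcpy(b->vl_len_, &total, sizeof(total));
}

static uint32_t
blob_size(const Blob *b)
{
    uint32_t total;

    memcpy(&total, b->vl_len_, sizeof(total));
    return total;
}

/* Build a value; the stored size includes the length header itself. */
static Blob *
blob_make(const void *payload, size_t len)
{
    size_t total = offsetof(Blob, data) + len;
    Blob  *b = malloc(total);

    if (b == NULL)
        return NULL;
    blob_set_size(b, (uint32_t) total);
    memcpy(b->data, payload, len);
    return b;
}
```

+<p>
+ A three-byte payload therefore yields a total size of seven: four bytes of
+ header plus three bytes of data.
+</p><p>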
+ For further details see the description of the
+ <a class="xref" href="sql-createtype.html" title="CREATE TYPE"><span class="refentrytitle">CREATE TYPE</span></a> command.
+ </p><div class="sect2" id="XTYPES-TOAST"><div class="titlepage"><div><div><h3 class="title">38.13.1. TOAST Considerations</h3></div></div></div><a id="id-1.8.3.16.17.2" class="indexterm"></a><p>
+ If the values of your data type vary in size (in internal form), it's
+ usually desirable to make the data type <acronym class="acronym">TOAST</acronym>-able (see <a class="xref" href="storage-toast.html" title="73.2. TOAST">Section 73.2</a>). You should do this even if the values are always
+ too small to be compressed or stored externally, because
+ <acronym class="acronym">TOAST</acronym> can save space on small data too, by reducing header
+ overhead.
+ </p><p>
+ To support <acronym class="acronym">TOAST</acronym> storage, the C functions operating on the data
+ type must always be careful to unpack any toasted values they are handed
+ by using <code class="function">PG_DETOAST_DATUM</code>. (This detail is customarily hidden
+ by defining type-specific <code class="function">GETARG_DATATYPE_P</code> macros.)
+ Then, when running the <code class="command">CREATE TYPE</code> command, specify the
+ internal length as <code class="literal">variable</code> and select some appropriate storage
+ option other than <code class="literal">plain</code>.
+ </p><p>
+ If data alignment is unimportant (either just for a specific function or
+ because the data type specifies byte alignment anyway) then it's possible
+ to avoid some of the overhead of <code class="function">PG_DETOAST_DATUM</code>. You can use
+ <code class="function">PG_DETOAST_DATUM_PACKED</code> instead (customarily hidden by
+ defining a <code class="function">GETARG_DATATYPE_PP</code> macro) and use the macros
+ <code class="function">VARSIZE_ANY_EXHDR</code> and <code class="function">VARDATA_ANY</code> to access
+ a potentially-packed datum.
+ Again, the data returned by these macros is not aligned even if the data
+ type definition specifies an alignment. If the alignment is important you
+ must go through the regular <code class="function">PG_DETOAST_DATUM</code> interface.
+ </p><div class="note"><h3 class="title">Note</h3><p>
+ Older code frequently declares <code class="structfield">vl_len_</code> as an
+ <code class="type">int32</code> field instead of <code class="type">char[4]</code>. This is OK as long as
+ the struct definition has other fields that have at least <code class="type">int32</code>
+ alignment. But it is dangerous to use such a struct definition when
+ working with a potentially unaligned datum; the compiler may take it as
+ license to assume the datum actually is aligned, leading to core dumps on
+ architectures that are strict about alignment.
+ </p></div><p>
+ Another feature that's enabled by <acronym class="acronym">TOAST</acronym> support is the
+ possibility of having an <em class="firstterm">expanded</em> in-memory data
+ representation that is more convenient to work with than the format that
+ is stored on disk. The regular or <span class="quote">“<span class="quote">flat</span>”</span> varlena storage format
+ is ultimately just a blob of bytes; it cannot for example contain
+ pointers, since it may get copied to other locations in memory.
+ For complex data types, the flat format may be quite expensive to work
+ with, so <span class="productname">PostgreSQL</span> provides a way to <span class="quote">“<span class="quote">expand</span>”</span>
+ the flat format into a representation that is more suited to computation,
+ and then pass that format in-memory between functions of the data type.
+ </p><p>
+ To use expanded storage, a data type must define an expanded format that
+ follows the rules given in <code class="filename">src/include/utils/expandeddatum.h</code>,
+ and provide functions to <span class="quote">“<span class="quote">expand</span>”</span> a flat varlena value into
+ expanded format and <span class="quote">“<span class="quote">flatten</span>”</span> the expanded format back to the
+ regular varlena representation. Then ensure that all C functions for
+ the data type can accept either representation, possibly by converting
+ one into the other immediately upon receipt. This does not require fixing
+ all existing functions for the data type at once, because the standard
+ <code class="function">PG_DETOAST_DATUM</code> macro is defined to convert expanded inputs
+ into regular flat format. Therefore, existing functions that work with
+ the flat varlena format will continue to work, though slightly
+ inefficiently, with expanded inputs; they need not be converted until and
+ unless better performance is important.
+ </p><p>
+ C functions that know how to work with an expanded representation
+ typically fall into two categories: those that can only handle expanded
+ format, and those that can handle either expanded or flat varlena inputs.
+ The former are easier to write but may be less efficient overall, because
+ converting a flat input to expanded form for use by a single function may
+ cost more than is saved by operating on the expanded format.
+ When only expanded format need be handled, conversion of flat inputs to
+ expanded form can be hidden inside an argument-fetching macro, so that
+ the function appears no more complex than one working with traditional
+ varlena input.
+ To handle both types of input, write an argument-fetching function that
+ will detoast external, short-header, and compressed varlena inputs, but
+ not expanded inputs. Such a function can be defined as returning a
+ pointer to a union of the flat varlena format and the expanded format.
+ Callers can use the <code class="function">VARATT_IS_EXPANDED_HEADER()</code> macro to
+ determine which format they received.
+ </p><p>
+ The <acronym class="acronym">TOAST</acronym> infrastructure not only allows regular varlena
+ values to be distinguished from expanded values, but also
+ distinguishes <span class="quote">“<span class="quote">read-write</span>”</span> and <span class="quote">“<span class="quote">read-only</span>”</span> pointers to
+ expanded values. C functions that only need to examine an expanded
+ value, or will only change it in safe and non-semantically-visible ways,
+ need not care which type of pointer they receive. C functions that
+ produce a modified version of an input value are allowed to modify an
+ expanded input value in-place if they receive a read-write pointer, but
+ must not modify the input if they receive a read-only pointer; in that
+ case they have to copy the value first, producing a new value to modify.
+ A C function that has constructed a new expanded value should always
+ return a read-write pointer to it. Also, a C function that is modifying
+ a read-write expanded value in-place should take care to leave the value
+ in a sane state if it fails partway through.
+ </p><p>
+ For examples of working with expanded values, see the standard array
+ infrastructure, particularly
+ <code class="filename">src/backend/utils/adt/array_expanded.c</code>.
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="xaggr.html" title="38.12. User-Defined Aggregates">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="extend.html" title="Chapter 38. Extending SQL">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="xoper.html" title="38.14. User-Defined Operators">Next</a></td></tr><tr><td width="40%" align="left" valign="top">38.12. User-Defined Aggregates </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 38.14. User-Defined Operators</td></tr></table></div></body></html> \ No newline at end of file