1 files changed, 1076 insertions, 0 deletions
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
new file mode 100644
index 0000000..0043251
--- /dev/null
+++ b/doc/src/sgml/spgist.sgml
@@ -0,0 +1,1076 @@
+<!-- doc/src/sgml/spgist.sgml -->
+
+<chapter id="spgist">
+<title>SP-GiST Indexes</title>
+
+   <indexterm>
+    <primary>index</primary>
+    <secondary>SP-GiST</secondary>
+   </indexterm>
+
+<sect1 id="spgist-intro">
+ <title>Introduction</title>
+
+ <para>
+  <acronym>SP-GiST</acronym> is an abbreviation for space-partitioned
+  <acronym>GiST</acronym>.  <acronym>SP-GiST</acronym> supports partitioned
+  search trees, which facilitate development of a wide range of different
+  non-balanced data structures, such as quad-trees, k-d trees, and radix
+  trees (tries).  The common feature of these structures is that they
+  repeatedly divide the search space into partitions that need not be
+  of equal size.  Searches that are well matched to the partitioning rule
+  can be very fast.
+ </para>
+
+ <para>
+  These popular data structures were originally developed for in-memory
+  usage.  In main memory, they are usually designed as a set of dynamically
+  allocated nodes linked by pointers.  This is not suitable for direct
+  storing on disk, since these chains of pointers can be rather long which
+  would require too many disk accesses.  In contrast, disk-based data
+  structures should have a high fanout to minimize I/O.  The challenge
+  addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to
+  disk pages in such a way that a search need access only a few disk pages,
+  even if it traverses many nodes.
+ </para>
+
+ <para>
+  Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow
+  the development of custom data types with the appropriate access methods,
+  by an expert in the domain of the data type, rather than a database expert.
+ </para>
+
+ <para>
+  Some of the information here is derived from Purdue University's
+  SP-GiST Indexing Project
+  <ulink url="https://www.cs.purdue.edu/spgist/">web site</ulink>.
+  The <acronym>SP-GiST</acronym> implementation in
+  <productname>PostgreSQL</productname> is primarily maintained by Teodor
+  Sigaev and Oleg Bartunov, and there is more information on their
+  <!-- URL will be changed -->
+  <ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>.
+ </para>
+
+</sect1>
+
+<sect1 id="spgist-builtin-opclasses">
+ <title>Built-in Operator Classes</title>
+
+ <para>
+  The core <productname>PostgreSQL</productname> distribution
+  includes the <acronym>SP-GiST</acronym> operator classes shown in
+  <xref linkend="spgist-builtin-opclasses-table"/>.
+ </para>
+
+  <table id="spgist-builtin-opclasses-table">
+   <title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title>
+   <tgroup cols="3">
+    <thead>
+     <row>
+      <entry>Name</entry>
+      <entry>Indexable Operators</entry>
+      <entry>Ordering Operators</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry valign="middle" morerows="11"><literal>box_ops</literal></entry>
+      <entry><literal>&lt;&lt; (box,box)</literal></entry>
+      <entry valign="middle" morerows="11"><literal>&lt;-&gt; (box,point)</literal></entry>
+     </row>
+     <row><entry><literal>&amp;&lt; (box,box)</literal></entry></row>
+     <row><entry><literal>&amp;&gt; (box,box)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (box,box)</literal></entry></row>
+     <row><entry><literal>&lt;@ (box,box)</literal></entry></row>
+     <row><entry><literal>@&gt; (box,box)</literal></entry></row>
+     <row><entry><literal>~= (box,box)</literal></entry></row>
+     <row><entry><literal>&amp;&amp; (box,box)</literal></entry></row>
+     <row><entry><literal>&lt;&lt;| (box,box)</literal></entry></row>
+     <row><entry><literal>&amp;&lt;| (box,box)</literal></entry></row>
+     <row><entry><literal>|&amp;&gt; (box,box)</literal></entry></row>
+     <row><entry><literal>|&gt;&gt; (box,box)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="5"><literal>kd_point_ops</literal></entry>
+      <entry><literal>|&gt;&gt; (point,point)</literal></entry>
+      <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
+     </row>
+     <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
+     <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
+     <row><entry><literal>~= (point,point)</literal></entry></row>
+     <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="10"><literal>network_ops</literal></entry>
+      <entry><literal>&lt;&lt; (inet,inet)</literal></entry>
+      <entry valign="middle" morerows="10"></entry>
+     </row>
+     <row><entry><literal>&lt;&lt;= (inet,inet)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (inet,inet)</literal></entry></row>
+     <row><entry><literal>&gt;&gt;= (inet,inet)</literal></entry></row>
+     <row><entry><literal>= (inet,inet)</literal></entry></row>
+     <row><entry><literal>&lt;&gt; (inet,inet)</literal></entry></row>
+     <row><entry><literal>&lt; (inet,inet)</literal></entry></row>
+     <row><entry><literal>&lt;= (inet,inet)</literal></entry></row>
+     <row><entry><literal>&gt; (inet,inet)</literal></entry></row>
+     <row><entry><literal>&gt;= (inet,inet)</literal></entry></row>
+     <row><entry><literal>&amp;&amp; (inet,inet)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry>
+      <entry><literal>&lt;&lt; (polygon,polygon)</literal></entry>
+      <entry valign="middle" morerows="11"><literal>&lt;-&gt; (polygon,point)</literal></entry>
+     </row>
+     <row><entry><literal>&amp;&lt; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&amp;&gt; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&lt;@ (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>@&gt; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>~= (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&amp;&amp; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&lt;&lt;| (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>&amp;&lt;| (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>|&gt;&gt; (polygon,polygon)</literal></entry></row>
+     <row><entry><literal>|&amp;&gt; (polygon,polygon)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="5"><literal>quad_point_ops</literal></entry>
+      <entry><literal>|&gt;&gt; (point,point)</literal></entry>
+      <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
+     </row>
+     <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
+     <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
+     <row><entry><literal>~= (point,point)</literal></entry></row>
+     <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="9"><literal>range_ops</literal></entry>
+      <entry><literal>= (anyrange,anyrange)</literal></entry>
+      <entry valign="middle" morerows="9"></entry>
+     </row>
+     <row><entry><literal>&amp;&amp; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>@&gt; (anyrange,anyelement)</literal></entry></row>
+     <row><entry><literal>@&gt; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>&lt;@ (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>&lt;&lt; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>&gt;&gt; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>&amp;&lt; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>&amp;&gt; (anyrange,anyrange)</literal></entry></row>
+     <row><entry><literal>-|- (anyrange,anyrange)</literal></entry></row>
+
+     <row>
+      <entry valign="middle" morerows="9"><literal>text_ops</literal></entry>
+      <entry><literal>= (text,text)</literal></entry>
+      <entry valign="middle" morerows="9"></entry>
+     </row>
+     <row><entry><literal>&lt; (text,text)</literal></entry></row>
+     <row><entry><literal>&lt;= (text,text)</literal></entry></row>
+     <row><entry><literal>&gt; (text,text)</literal></entry></row>
+     <row><entry><literal>&gt;= (text,text)</literal></entry></row>
+     <row><entry><literal>~&lt;~ (text,text)</literal></entry></row>
+     <row><entry><literal>~&lt;=~ (text,text)</literal></entry></row>
+     <row><entry><literal>~&gt;=~ (text,text)</literal></entry></row>
+     <row><entry><literal>~&gt;~ (text,text)</literal></entry></row>
+     <row><entry><literal>^@ (text,text)</literal></entry></row>
+    </tbody>
+   </tgroup>
+  </table>
+
+ <para>
+  Of the two operator classes for type <type>point</type>,
+  <literal>quad_point_ops</literal> is the default.  <literal>kd_point_ops</literal>
+  supports the same operators but uses a different index data structure that
+  may offer better performance in some applications.
+ </para>
+ <para>
+  The <literal>quad_point_ops</literal>, <literal>kd_point_ops</literal> and
+  <literal>poly_ops</literal> operator classes support the <literal>&lt;-&gt;</literal>
+  ordering operator, which enables the k-nearest neighbor (<literal>k-NN</literal>)
+  search over indexed point or polygon data sets.
+ </para>
+
+</sect1>
+
+<sect1 id="spgist-extensibility">
+ <title>Extensibility</title>
+
+ <para>
+  <acronym>SP-GiST</acronym> offers an interface with a high level of
+  abstraction, requiring the access method developer to implement only
+  methods specific to a given data type. The <acronym>SP-GiST</acronym> core
+  is responsible for efficient disk mapping and searching the tree structure.
+  It also takes care of concurrency and logging considerations.
+ </para>
+
+ <para>
+  Leaf tuples of an <acronym>SP-GiST</acronym> tree usually contain values
+  of the same data type as the indexed column, although it is also possible
+  for them to contain lossy representations of the indexed column.
+  Leaf tuples stored at the root level will directly represent
+  the original indexed data value, but leaf tuples at lower
+  levels might contain only a partial value, such as a suffix.
+  In that case the operator class support functions must be able to
+  reconstruct the original value using information accumulated from the
+  inner tuples that are passed through to reach the leaf level.
+ </para>
+
+ <para>
+  When an <acronym>SP-GiST</acronym> index is created with
+  <literal>INCLUDE</literal> columns, the values of those columns are also
+  stored in leaf tuples.  The <literal>INCLUDE</literal> columns are of no
+  concern to the <acronym>SP-GiST</acronym> operator class, so they are
+  not discussed further here.
+ </para>
+
+ <para>
+  Inner tuples are more complex, since they are branching points in the
+  search tree.  Each inner tuple contains a set of one or more
+  <firstterm>nodes</firstterm>, which represent groups of similar leaf values.
+  A node contains a downlink that leads either to another, lower-level inner
+  tuple, or to a short list of leaf tuples that all lie on the same index page.
+  Each node normally has a <firstterm>label</firstterm> that describes it; for example,
+  in a radix tree the node label could be the next character of the string
+  value.  (Alternatively, an operator class can omit the node labels, if it
+  works with a fixed set of nodes for all inner tuples;
+  see <xref linkend="spgist-null-labels"/>.)
+  Optionally, an inner tuple can have a <firstterm>prefix</firstterm> value
+  that describes all its members.  In a radix tree this could be the common
+  prefix of the represented strings.  The prefix value is not necessarily
+  really a prefix, but can be any data needed by the operator class;
+  for example, in a quad-tree it can store the central point that the four
+  quadrants are measured with respect to.  A quad-tree inner tuple would
+  then also contain four nodes corresponding to the quadrants around this
+  central point.
+ </para>
+
+ <para>
+  Some tree algorithms require knowledge of level (or depth) of the current
+  tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for
+  operator classes to manage level counting while descending the tree.
+  There is also support for incrementally reconstructing the represented
+  value when that is needed, and for passing down additional data (called
+  <firstterm>traverse values</firstterm>) during a tree descent.
+ </para>
+
+ <note>
+  <para>
+   The <acronym>SP-GiST</acronym> core code takes care of null entries.
+   Although <acronym>SP-GiST</acronym> indexes do store entries for nulls
+   in indexed columns, this is hidden from the index operator class code:
+   no null index entries or search conditions will ever be passed to the
+   operator class methods.  (It is assumed that <acronym>SP-GiST</acronym>
+   operators are strict and so cannot succeed for null values.)  Null values
+   are therefore not discussed further here.
+  </para>
+ </note>
+
+ <para>
+  There are five user-defined methods that an index operator class for
+  <acronym>SP-GiST</acronym> must provide, and two are optional.  All five
+  mandatory methods follow the convention of accepting two <type>internal</type>
+  arguments, the first of which is a pointer to a C struct containing input
+  values for the support method, while the second argument is a pointer to a
+  C struct where output values must be placed.  Four of the mandatory methods just
+  return <type>void</type>, since all their results appear in the output struct; but
+  <function>leaf_consistent</function> returns a <type>boolean</type> result.
+  The methods must not modify any fields of their input structs.  In all
+  cases, the output struct is initialized to zeroes before calling the
+  user-defined method.  The optional sixth method <function>compress</function>
+  accepts a <type>datum</type> to be indexed as the only argument and returns a value suitable
+  for physical storage in a leaf tuple.  The optional seventh method
+  <function>options</function> accepts an <type>internal</type> pointer to a C struct, where
+  opclass-specific parameters should be placed, and returns <type>void</type>.
+ </para>
+
+ <para>
+  The five mandatory user-defined methods are:
+ </para>
+
+ <variablelist>
+    <varlistentry>
+     <term><function>config</function></term>
+     <listitem>
+      <para>
+       Returns static information about the index implementation, including
+       the data type OIDs of the prefix and node label data types.
+      </para>
+     <para>
+      The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_config(internal, internal) RETURNS void ...
+</programlisting>
+      The first argument is a pointer to a <structname>spgConfigIn</structname>
+      C struct, containing input data for the function.
+      The second argument is a pointer to a <structname>spgConfigOut</structname>
+      C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgConfigIn
+{
+    Oid         attType;        /* Data type to be indexed */
+} spgConfigIn;
+
+typedef struct spgConfigOut
+{
+    Oid         prefixType;     /* Data type of inner-tuple prefixes */
+    Oid         labelType;      /* Data type of inner-tuple node labels */
+    Oid         leafType;       /* Data type of leaf-tuple values */
+    bool        canReturnData;  /* Opclass can reconstruct original data */
+    bool        longValuesOK;   /* Opclass can cope with values &gt; 1 page */
+} spgConfigOut;
+</programlisting>
+
+      <structfield>attType</structfield> is passed in order to support polymorphic
+      index operator classes; for ordinary fixed-data-type operator classes, it
+      will always have the same value and so can be ignored.
+     </para>
+
+     <para>
+      For operator classes that do not use prefixes,
+      <structfield>prefixType</structfield> can be set to <literal>VOIDOID</literal>.
+      Likewise, for operator classes that do not use node labels,
+      <structfield>labelType</structfield> can be set to <literal>VOIDOID</literal>.
+      <structfield>canReturnData</structfield> should be set true if the operator class
+      is capable of reconstructing the originally-supplied index value.
+      <structfield>longValuesOK</structfield> should be set true only when the
+      <structfield>attType</structfield> is of variable length and the operator
+      class is capable of segmenting long values by repeated suffixing
+      (see <xref linkend="spgist-limits"/>).
+     </para>
+
+     <para>
+      <structfield>leafType</structfield> should match the index storage type
+      defined by the operator class's <structfield>opckeytype</structfield>
+      catalog entry.
+      (Note that <structfield>opckeytype</structfield> can be zero,
+      implying the storage type is the same as the operator class's input
+      type, which is the most common situation.)
+      For reasons of backward compatibility, the <function>config</function>
+      method can set <structfield>leafType</structfield> to some other value,
+      and that value will be used; but this is deprecated since the index
+      contents are then incorrectly identified in the catalogs.
+      Also, it's permissible to
+      leave <structfield>leafType</structfield> uninitialized (zero);
+      that is interpreted as meaning the index storage type derived from
+      <structfield>opckeytype</structfield>.
+     </para>
+
+     <para>
+      When <structfield>attType</structfield>
+      and <structfield>leafType</structfield> are different, the optional
+      method <function>compress</function> must be provided.
+      Method <function>compress</function> is responsible
+      for transformation of datums to be indexed from <structfield>attType</structfield>
+      to <structfield>leafType</structfield>.
+     </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><function>choose</function></term>
+     <listitem>
+      <para>
+        Chooses a method for inserting a new value into an inner tuple.
+      </para>
+
+     <para>
+      The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_choose(internal, internal) RETURNS void ...
+</programlisting>
+      The first argument is a pointer to a <structname>spgChooseIn</structname>
+      C struct, containing input data for the function.
+      The second argument is a pointer to a <structname>spgChooseOut</structname>
+      C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgChooseIn
+{
+    Datum       datum;          /* original datum to be indexed */
+    Datum       leafDatum;      /* current datum to be stored at leaf */
+    int         level;          /* current level (counting from zero) */
+
+    /* Data from current inner tuple */
+    bool        allTheSame;     /* tuple is marked all-the-same? */
+    bool        hasPrefix;      /* tuple has a prefix? */
+    Datum       prefixDatum;    /* if so, the prefix value */
+    int         nNodes;         /* number of nodes in the inner tuple */
+    Datum      *nodeLabels;     /* node label values (NULL if none) */
+} spgChooseIn;
+
+typedef enum spgChooseResultType
+{
+    spgMatchNode = 1,           /* descend into existing node */
+    spgAddNode,                 /* add a node to the inner tuple */
+    spgSplitTuple               /* split inner tuple (change its prefix) */
+} spgChooseResultType;
+
+typedef struct spgChooseOut
+{
+    spgChooseResultType resultType;     /* action code, see above */
+    union
+    {
+        struct                  /* results for spgMatchNode */
+        {
+            int         nodeN;      /* descend to this node (index from 0) */
+            int         levelAdd;   /* increment level by this much */
+            Datum       restDatum;  /* new leaf datum */
+        }           matchNode;
+        struct                  /* results for spgAddNode */
+        {
+            Datum       nodeLabel;  /* new node's label */
+            int         nodeN;      /* where to insert it (index from 0) */
+        }           addNode;
+        struct                  /* results for spgSplitTuple */
+        {
+            /* Info to form new upper-level inner tuple with one child tuple */
+            bool        prefixHasPrefix;    /* tuple should have a prefix? */
+            Datum       prefixPrefixDatum;  /* if so, its value */
+            int         prefixNNodes;       /* number of nodes */
+            Datum      *prefixNodeLabels;   /* their labels (or NULL for
+                                             * no labels) */
+            int         childNodeN;         /* which node gets child tuple */
+
+            /* Info to form new lower-level inner tuple with all old nodes */
+            bool        postfixHasPrefix;   /* tuple should have a prefix? */
+            Datum       postfixPrefixDatum; /* if so, its value */
+        }           splitTuple;
+    }           result;
+} spgChooseOut;
+</programlisting>
+
+       <structfield>datum</structfield> is the original datum of
+       <structname>spgConfigIn</structname>.<structfield>attType</structfield>
+       type that was to be inserted into the index.
+       <structfield>leafDatum</structfield> is a value of
+       <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+       type, which is initially a result of method
+       <function>compress</function> applied to <structfield>datum</structfield>
+       when method <function>compress</function> is provided, or the same value as
+       <structfield>datum</structfield> otherwise.
+       <structfield>leafDatum</structfield> can change at lower levels of the tree
+       if the <function>choose</function> or <function>picksplit</function>
+       methods change it.  When the insertion search reaches a leaf page,
+       the current value of <structfield>leafDatum</structfield> is what will be stored
+       in the newly created leaf tuple.
+       <structfield>level</structfield> is the current inner tuple's level, starting at
+       zero for the root level.
+       <structfield>allTheSame</structfield> is true if the current inner tuple is
+       marked as containing multiple equivalent nodes
+       (see <xref linkend="spgist-all-the-same"/>).
+       <structfield>hasPrefix</structfield> is true if the current inner tuple contains
+       a prefix; if so,
+       <structfield>prefixDatum</structfield> is its value.
+       <structfield>nNodes</structfield> is the number of child nodes contained in the
+       inner tuple, and
+       <structfield>nodeLabels</structfield> is an array of their label values, or
+       NULL if there are no labels.
+      </para>
+
+      <para>
+       The <function>choose</function> function can determine either that
+       the new value matches one of the existing child nodes, or that a new
+       child node must be added, or that the new value is inconsistent with
+       the tuple prefix and so the inner tuple must be split to create a
+       less restrictive prefix.
+      </para>
+
+      <para>
+       If the new value matches one of the existing child nodes,
+       set <structfield>resultType</structfield> to <literal>spgMatchNode</literal>.
+       Set <structfield>nodeN</structfield> to the index (from zero) of that node in
+       the node array.
+       Set <structfield>levelAdd</structfield> to the increment in
+       <structfield>level</structfield> caused by descending through that node,
+       or leave it as zero if the operator class does not use levels.
+       Set <structfield>restDatum</structfield> to equal <structfield>leafDatum</structfield>
+       if the operator class does not modify datums from one level to the
+       next, or otherwise set it to the modified value to be used as
+       <structfield>leafDatum</structfield> at the next level.
+      </para>
+
+      <para>
+       If a new child node must be added,
+       set <structfield>resultType</structfield> to <literal>spgAddNode</literal>.
+       Set <structfield>nodeLabel</structfield> to the label to be used for the new
+       node, and set <structfield>nodeN</structfield> to the index (from zero) at which
+       to insert the node in the node array.
+       After the node has been added, the <function>choose</function>
+       function will be called again with the modified inner tuple;
+       that call should result in an <literal>spgMatchNode</literal> result.
+      </para>
+
+      <para>
+       If the new value is inconsistent with the tuple prefix,
+       set <structfield>resultType</structfield> to <literal>spgSplitTuple</literal>.
+       This action moves all the existing nodes into a new lower-level
+       inner tuple, and replaces the existing inner tuple with a tuple
+       having a single downlink pointing to the new lower-level inner tuple.
+       Set <structfield>prefixHasPrefix</structfield> to indicate whether the new
+       upper tuple should have a prefix, and if so set
+       <structfield>prefixPrefixDatum</structfield> to the prefix value.  This new
+       prefix value must be sufficiently less restrictive than the original
+       to accept the new value to be indexed.
+       Set <structfield>prefixNNodes</structfield> to the number of nodes needed in the
+       new tuple, and set <structfield>prefixNodeLabels</structfield> to a palloc'd array
+       holding their labels, or to NULL if node labels are not required.
+       Note that the total size of the new upper tuple must be no more
+       than the total size of the tuple it is replacing; this constrains
+       the lengths of the new prefix and new labels.
+       Set <structfield>childNodeN</structfield> to the index (from zero) of the node
+       that will downlink to the new lower-level inner tuple.
+       Set <structfield>postfixHasPrefix</structfield> to indicate whether the new
+       lower-level inner tuple should have a prefix, and if so set
+       <structfield>postfixPrefixDatum</structfield> to the prefix value.  The
+       combination of these two prefixes and the downlink node's label
+       (if any) must have the same meaning as the original prefix, because
+       there is no opportunity to alter the node labels that are moved to
+       the new lower-level tuple, nor to change any child index entries.
+       After the node has been split, the <function>choose</function>
+       function will be called again with the replacement inner tuple.
+       That call may return an <literal>spgAddNode</literal> result, if no suitable
+       node was created by the <literal>spgSplitTuple</literal> action.  Eventually
+       <function>choose</function> must return <literal>spgMatchNode</literal> to
+       allow the insertion to descend to the next level.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><function>picksplit</function></term>
+     <listitem>
+      <para>
+       Decides how to create a new inner tuple over a set of leaf tuples.
+      </para>
+
+      <para>
+        The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...
+</programlisting>
+      The first argument is a pointer to a <structname>spgPickSplitIn</structname>
+      C struct, containing input data for the function.
+      The second argument is a pointer to a <structname>spgPickSplitOut</structname>
+      C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgPickSplitIn
+{
+    int         nTuples;        /* number of leaf tuples */
+    Datum      *datums;         /* their datums (array of length nTuples) */
+    int         level;          /* current level (counting from zero) */
+} spgPickSplitIn;
+
+typedef struct spgPickSplitOut
+{
+    bool        hasPrefix;      /* new inner tuple should have a prefix? */
+    Datum       prefixDatum;    /* if so, its value */
+
+    int         nNodes;         /* number of nodes for new inner tuple */
+    Datum      *nodeLabels;     /* their labels (or NULL for no labels) */
+
+    int        *mapTuplesToNodes;   /* node index for each leaf tuple */
+    Datum      *leafTupleDatums;    /* datum to store in each new leaf tuple */
+} spgPickSplitOut;
+</programlisting>
+
+       <structfield>nTuples</structfield> is the number of leaf tuples provided.
+       <structfield>datums</structfield> is an array of their datum values of
+       <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+       type.
+       <structfield>level</structfield> is the current level that all the leaf tuples
+       share, which will become the level of the new inner tuple.
+      </para>
+
+      <para>
+       Set <structfield>hasPrefix</structfield> to indicate whether the new inner
+       tuple should have a prefix, and if so set
+       <structfield>prefixDatum</structfield> to the prefix value.
+       Set <structfield>nNodes</structfield> to indicate the number of nodes that
+       the new inner tuple will contain, and
+       set <structfield>nodeLabels</structfield> to an array of their label values,
+       or to NULL if node labels are not required.
+       Set <structfield>mapTuplesToNodes</structfield> to an array that gives the index
+       (from zero) of the node that each leaf tuple should be assigned to.
+       Set <structfield>leafTupleDatums</structfield> to an array of the values to
+       be stored in the new leaf tuples (these will be the same as the
+       input <structfield>datums</structfield> if the operator class does not modify
+       datums from one level to the next).
+       Note that the <function>picksplit</function> function is
+       responsible for palloc'ing the
+       <structfield>nodeLabels</structfield>, <structfield>mapTuplesToNodes</structfield> and
+       <structfield>leafTupleDatums</structfield> arrays.
+      </para>
+
+      <para>
+       If more than one leaf tuple is supplied, it is expected that the
+       <function>picksplit</function> function will classify them into more than
+       one node; otherwise it is not possible to split the leaf tuples
+       across multiple pages, which is the ultimate purpose of this
+       operation.  Therefore, if the <function>picksplit</function> function
+       ends up placing all the leaf tuples in the same node, the core
+       SP-GiST code will override that decision and generate an inner
+       tuple in which the leaf tuples are assigned at random to several
+       identically-labeled nodes.  Such a tuple is marked
+       <literal>allTheSame</literal> to signify that this has happened.  The
+       <function>choose</function> and <function>inner_consistent</function> functions
+       must take suitable care with such inner tuples.
+       See <xref linkend="spgist-all-the-same"/> for more information.
+      </para>
+
+      <para>
+       <function>picksplit</function> can be applied to a single leaf tuple only
+       in the case that the <function>config</function> function set
+       <structfield>longValuesOK</structfield> to true and a larger-than-a-page input
+       value has been supplied.  In this case the point of the operation is
+       to strip off a prefix and produce a new, shorter leaf datum value.
+       The call will be repeated until a leaf datum short enough to fit on
+       a page has been produced.  See <xref linkend="spgist-limits"/> for
+       more information.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><function>inner_consistent</function></term>
+     <listitem>
+      <para>
+       Returns set of nodes (branches) to follow during tree search.
+      </para>
+
+      <para>
+       The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
+</programlisting>
+      The first argument is a pointer to a <structname>spgInnerConsistentIn</structname>
+      C struct, containing input data for the function.
+      The second argument is a pointer to a <structname>spgInnerConsistentOut</structname>
+      C struct, which the function must fill with result data.
+
+<programlisting>
+typedef struct spgInnerConsistentIn
+{
+    ScanKey     scankeys;       /* array of operators and comparison values */
+    ScanKey     orderbys;       /* array of ordering operators and comparison
+                                 * values */
+    int         nkeys;          /* length of scankeys array */
+    int         norderbys;      /* length of orderbys array */
+
+    Datum       reconstructedValue;     /* value reconstructed at parent */
+    void       *traversalValue; /* opclass-specific traverse value */
+    MemoryContext traversalMemoryContext;   /* put new traverse values here */
+    int         level;          /* current level (counting from zero) */
+    bool        returnData;     /* original data must be returned? */
+
+    /* Data from current inner tuple */
+    bool        allTheSame;     /* tuple is marked all-the-same? */
+    bool        hasPrefix;      /* tuple has a prefix? */
+    Datum       prefixDatum;    /* if so, the prefix value */
+    int         nNodes;         /* number of nodes in the inner tuple */
+    Datum      *nodeLabels;     /* node label values (NULL if none) */
+} spgInnerConsistentIn;
+
+typedef struct spgInnerConsistentOut
+{
+    int         nNodes;         /* number of child nodes to be visited */
+    int        *nodeNumbers;    /* their indexes in the node array */
+    int        *levelAdds;      /* increment level by this much for each */
+    Datum      *reconstructedValues;    /* associated reconstructed values */
+    void      **traversalValues;        /* opclass-specific traverse values */
+    double    **distances;              /* associated distances */
+} spgInnerConsistentOut;
+</programlisting>
+
+       The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
+       describes the index search condition(s).  These conditions are
+       combined with AND &mdash; only index entries that satisfy all of
+       them are interesting.  (Note that <structfield>nkeys</structfield> = 0 implies
+       that all index entries satisfy the query.)  Usually the consistent
+       function only cares about the <structfield>sk_strategy</structfield> and
+       <structfield>sk_argument</structfield> fields of each array entry, which
+       respectively give the indexable operator and comparison value.
+       In particular it is not necessary to check <structfield>sk_flags</structfield> to
+       see if the comparison value is NULL, because the SP-GiST core code
+       will filter out such conditions.
+       The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
+       describes ordering operators (if any) in the same manner.
+       <structfield>reconstructedValue</structfield> is the value reconstructed for the
+       parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
+       <function>inner_consistent</function> function did not provide a value at the
+       parent level.
+       <structfield>traversalValue</structfield> is a pointer to any traverse data
+       passed down from the previous call of <function>inner_consistent</function>
+       on the parent index tuple, or NULL at the root level.
+       <structfield>traversalMemoryContext</structfield> is the memory context in which
+       to store output traverse values (see below).
+       <structfield>level</structfield> is the current inner tuple's level, starting at
+       zero for the root level.
+       <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
+       required for this query; this will only be so if the
+       <function>config</function> function asserted <structfield>canReturnData</structfield>.
+       <structfield>allTheSame</structfield> is true if the current inner tuple is
+       marked <quote>all-the-same</quote>; in this case all the nodes have the
+       same label (if any) and so either all or none of them match the query
+       (see <xref linkend="spgist-all-the-same"/>).
+       <structfield>hasPrefix</structfield> is true if the current inner tuple contains
+       a prefix; if so,
+       <structfield>prefixDatum</structfield> is its value.
+       <structfield>nNodes</structfield> is the number of child nodes contained in the
+       inner tuple, and
+       <structfield>nodeLabels</structfield> is an array of their label values, or
+       NULL if the nodes do not have labels.
+      </para>
+
+      <para>
+       <structfield>nNodes</structfield> must be set to the number of child nodes that
+       need to be visited by the search, and
+       <structfield>nodeNumbers</structfield> must be set to an array of their indexes.
+       If the operator class keeps track of levels, set
+       <structfield>levelAdds</structfield> to an array of the level increments
+       required when descending to each node to be visited.  (Often these
+       increments will be the same for all the nodes, but that's not
+       necessarily so, so an array is used.)
+       If value reconstruction is needed, set
+       <structfield>reconstructedValues</structfield> to an array of the values
+       reconstructed for each child node to be visited; otherwise, leave
+       <structfield>reconstructedValues</structfield> as NULL.
+       The reconstructed values are assumed to be of type
+       <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
+       (However, since the core system will do nothing with them except
+       possibly copy them, it is sufficient for them to have the
+       same <literal>typlen</literal> and <literal>typbyval</literal>
+       properties as <structfield>leafType</structfield>.)
+       If ordered search is performed, set <structfield>distances</structfield>
+       to an array of distance values according to <structfield>orderbys</structfield>
+       array (nodes with lowest distances will be processed first).  Leave it
+       NULL otherwise.
+       If it is desired to pass down additional out-of-band information
+       (<quote>traverse values</quote>) to lower levels of the tree search,
+       set <structfield>traversalValues</structfield> to an array of the appropriate
+       traverse values, one for each child node to be visited; otherwise,
+       leave <structfield>traversalValues</structfield> as NULL.
+       Note that the <function>inner_consistent</function> function is
+       responsible for palloc'ing the
+       <structfield>nodeNumbers</structfield>, <structfield>levelAdds</structfield>,
+       <structfield>distances</structfield>,
+       <structfield>reconstructedValues</structfield>, and
+       <structfield>traversalValues</structfield> arrays in the current memory context.
+       However, any output traverse values pointed to by
+       the <structfield>traversalValues</structfield> array should be allocated
+       in <structfield>traversalMemoryContext</structfield>.
+       Each traverse value must be a single palloc'd chunk.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><function>leaf_consistent</function></term>
+     <listitem>
+      <para>
+       Returns true if a leaf tuple satisfies a query.
+      </para>
+
+      <para>
+       The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
+</programlisting>
+      The first argument is a pointer to a <structname>spgLeafConsistentIn</structname>
+      C struct, containing input data for the function.
+      The second argument is a pointer to a <structname>spgLeafConsistentOut</structname>
+      C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgLeafConsistentIn
+{
+    ScanKey     scankeys;       /* array of operators and comparison values */
+    ScanKey     orderbys;       /* array of ordering operators and comparison
+                                 * values */
+    int         nkeys;          /* length of scankeys array */
+    int         norderbys;      /* length of orderbys array */
+
+    Datum       reconstructedValue;     /* value reconstructed at parent */
+    void       *traversalValue; /* opclass-specific traverse value */
+    int         level;          /* current level (counting from zero) */
+    bool        returnData;     /* original data must be returned? */
+
+    Datum       leafDatum;      /* datum in leaf tuple */
+} spgLeafConsistentIn;
+
+typedef struct spgLeafConsistentOut
+{
+    Datum       leafValue;        /* reconstructed original data, if any */
+    bool        recheck;          /* set true if operator must be rechecked */
+    bool        recheckDistances; /* set true if distances must be rechecked */
+    double     *distances;        /* associated distances */
+} spgLeafConsistentOut;
+</programlisting>
+
+       The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
+       describes the index search condition(s).  These conditions are
+       combined with AND &mdash; only index entries that satisfy all of
+       them satisfy the query.  (Note that <structfield>nkeys</structfield> = 0 implies
+       that all index entries satisfy the query.)  Usually the consistent
+       function only cares about the <structfield>sk_strategy</structfield> and
+       <structfield>sk_argument</structfield> fields of each array entry, which
+       respectively give the indexable operator and comparison value.
+       In particular it is not necessary to check <structfield>sk_flags</structfield> to
+       see if the comparison value is NULL, because the SP-GiST core code
+       will filter out such conditions.
+       The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
+       describes the ordering operators in the same manner.
+       <structfield>reconstructedValue</structfield> is the value reconstructed for the
+       parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
+       <function>inner_consistent</function> function did not provide a value at the
+       parent level.
+       <structfield>traversalValue</structfield> is a pointer to any traverse data
+       passed down from the previous call of <function>inner_consistent</function>
+       on the parent index tuple, or NULL at the root level.
+       <structfield>level</structfield> is the current leaf tuple's level, starting at
+       zero for the root level.
+       <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
+       required for this query; this will only be so if the
+       <function>config</function> function asserted <structfield>canReturnData</structfield>.
+       <structfield>leafDatum</structfield> is the key value of
+       <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+       stored in the current leaf tuple.
+      </para>
+
+      <para>
+       The function must return <literal>true</literal> if the leaf tuple matches the
+       query, or <literal>false</literal> if not.  In the <literal>true</literal> case,
+       if <structfield>returnData</structfield> is <literal>true</literal> then
+       <structfield>leafValue</structfield> must be set to the value (of type
+       <structname>spgConfigIn</structname>.<structfield>attType</structfield>)
+       originally supplied to be indexed for this leaf tuple.  Also,
+       <structfield>recheck</structfield> may be set to <literal>true</literal> if the match
+       is uncertain and so the operator(s) must be re-applied to the actual
+       heap tuple to verify the match.
+       If ordered search is performed, set <structfield>distances</structfield>
+       to an array of distance values according to <structfield>orderbys</structfield>
+       array.  Leave it NULL otherwise.  If at least one of returned distances
+       is not exact, set <structfield>recheckDistances</structfield> to true.
+       In this case, the executor will calculate the exact distances after
+       fetching the tuple from the heap, and will reorder the tuples if needed.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+
+ <para>
+  The optional user-defined methods are:
+ </para>
+
+ <variablelist>
+    <varlistentry>
+     <term><function>Datum compress(Datum in)</function></term>
+     <listitem>
+      <para>
+       Converts a data item into a format suitable for physical storage in
+       a leaf tuple of the index.  It accepts a value of type
+       <structname>spgConfigIn</structname>.<structfield>attType</structfield>
+       and returns a value of type
+       <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
+       The output value must not contain an out-of-line TOAST pointer.
+      </para>
+
+      <para>
+       Note: the <function>compress</function> method is only applied to
+       values to be stored.  The consistent methods receive query
+       <structfield>scankeys</structfield> unchanged, without transformation
+       using <function>compress</function>.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><function>options</function></term>
+     <listitem>
+      <para>
+       Defines a set of user-visible parameters that control operator class
+       behavior.
+      </para>
+
+      <para>
+        The <acronym>SQL</acronym> declaration of the function must look like this:
+
+<programlisting>
+CREATE OR REPLACE FUNCTION my_options(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+</programlisting>
+      </para>
+
+      <para>
+       The function is passed a pointer to a <structname>local_relopts</structname>
+       struct, which needs to be filled with a set of operator class
+       specific options.  The options can be accessed from other support
+       functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and
+       <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros.
+      </para>
+
+      <para>
+       Since the representation of the key in <acronym>SP-GiST</acronym> is
+       flexible, it may depend on user-specified parameters.
+      </para>
+     </listitem>
+    </varlistentry>
+  </variablelist>
+
+  <para>
+   All the SP-GiST support methods are normally called in a short-lived
+   memory context; that is, <varname>CurrentMemoryContext</varname> will be reset
+   after processing of each tuple.  It is therefore not very important to
+   worry about pfree'ing everything you palloc.  (The <function>config</function>
+   method is an exception: it should try to avoid leaking memory.  But
+   usually the <function>config</function> method need do nothing but assign
+   constants into the passed parameter struct.)
+  </para>
+
+  <para>
+   If the indexed column is of a collatable data type, the index collation
+   will be passed to all the support methods, using the standard
+   <function>PG_GET_COLLATION()</function> mechanism.
+  </para>
+
+</sect1>
+
+<sect1 id="spgist-implementation">
+ <title>Implementation</title>
+
+  <para>
+   This section covers implementation details and other tricks that are
+   useful for implementers of <acronym>SP-GiST</acronym> operator classes to
+   know.
+  </para>
+
+ <sect2 id="spgist-limits">
+  <title>SP-GiST Limits</title>
+
+  <para>
+   Individual leaf tuples and inner tuples must fit on a single index page
+   (8kB by default).  Therefore, when indexing values of variable-length
+   data types, long values can only be supported by methods such as radix
+   trees, in which each level of the tree includes a prefix that is short
+   enough to fit on a page, and the final leaf level includes a suffix also
+   short enough to fit on a page.  The operator class should set
+   <structfield>longValuesOK</structfield> to true only if it is prepared to arrange for
+   this to happen.  Otherwise, the <acronym>SP-GiST</acronym> core will
+   reject any request to index a value that is too large to fit
+   on an index page.
+  </para>
+
+  <para>
+   Likewise, it is the operator class's responsibility that inner tuples
+   do not grow too large to fit on an index page; this limits the number
+   of child nodes that can be used in one inner tuple, as well as the
+   maximum size of a prefix value.
+  </para>
+
+  <para>
+   Another limitation is that when an inner tuple's node points to a set
+   of leaf tuples, those tuples must all be in the same index page.
+   (This is a design decision to reduce seeking and save space in the
+   links that chain such tuples together.)  If the set of leaf tuples
+   grows too large for a page, a split is performed and an intermediate
+   inner tuple is inserted.  For this to fix the problem, the new inner
+   tuple <emphasis>must</emphasis> divide the set of leaf values into more than one
+   node group.  If the operator class's <function>picksplit</function> function
+   fails to do that, the <acronym>SP-GiST</acronym> core resorts to
+   extraordinary measures described in <xref linkend="spgist-all-the-same"/>.
+  </para>
+
+  <para>
+   When <structfield>longValuesOK</structfield> is true, it is expected
+   that successive levels of the <acronym>SP-GiST</acronym> tree will
+   absorb more and more information into the prefixes and node labels of
+   the inner tuples, making the required leaf datum smaller and smaller,
+   so that eventually it will fit on a page.
+   To prevent bugs in operator classes from causing infinite insertion
+   loops, the <acronym>SP-GiST</acronym> core will raise an error if the
+   leaf datum does not become any smaller within ten cycles
+   of <function>choose</function> method calls.
+  </para>
+ </sect2>
+
+ <sect2 id="spgist-null-labels">
+  <title>SP-GiST Without Node Labels</title>
+
+  <para>
+   Some tree algorithms use a fixed set of nodes for each inner tuple;
+   for example, in a quad-tree there are always exactly four nodes
+   corresponding to the four quadrants around the inner tuple's centroid
+   point.  In such a case the code typically works with the nodes by
+   number, and there is no need for explicit node labels.  To suppress
+   node labels (and thereby save some space), the <function>picksplit</function>
+   function can return NULL for the <structfield>nodeLabels</structfield> array,
+   and likewise the <function>choose</function> function can return NULL for
+   the <structfield>prefixNodeLabels</structfield> array during
+   a <literal>spgSplitTuple</literal> action.
+   This will in turn result in <structfield>nodeLabels</structfield> being NULL during
+   subsequent calls to <function>choose</function> and <function>inner_consistent</function>.
+   In principle, node labels could be used for some inner tuples and omitted
+   for others in the same index.
+  </para>
+
+  <para>
+   When working with an inner tuple having unlabeled nodes, it is an error
+   for <function>choose</function> to return <literal>spgAddNode</literal>, since the set
+   of nodes is supposed to be fixed in such cases.
+  </para>
+ </sect2>
+
+ <sect2 id="spgist-all-the-same">
+  <title><quote>All-the-Same</quote> Inner Tuples</title>
+
+  <para>
+   The <acronym>SP-GiST</acronym> core can override the results of the
+   operator class's <function>picksplit</function> function when
+   <function>picksplit</function> fails to divide the supplied leaf values into
+   at least two node categories.  When this happens, the new inner tuple
+   is created with multiple nodes that each have the same label (if any)
+   that <function>picksplit</function> gave to the one node it did use, and the
+   leaf values are divided at random among these equivalent nodes.
+   The <literal>allTheSame</literal> flag is set on the inner tuple to warn the
+   <function>choose</function> and <function>inner_consistent</function> functions that the
+   tuple does not have the node set that they might otherwise expect.
+  </para>
+
+  <para>
+   When dealing with an <literal>allTheSame</literal> tuple, a <function>choose</function>
+   result of <literal>spgMatchNode</literal> is interpreted to mean that the new
+   value can be assigned to any of the equivalent nodes; the core code will
+   ignore the supplied  <structfield>nodeN</structfield> value and descend into one
+   of the nodes at random (so as to keep the tree balanced).  It is an
+   error for <function>choose</function> to return <literal>spgAddNode</literal>, since
+   that would make the nodes not all equivalent; the
+   <literal>spgSplitTuple</literal> action must be used if the value to be inserted
+   doesn't match the existing nodes.
+  </para>
+
+  <para>
+   When dealing with an <literal>allTheSame</literal> tuple, the
+   <function>inner_consistent</function> function should return either all or none
+   of the nodes as targets for continuing the index search, since they are
+   all equivalent.  This may or may not require any special-case code,
+   depending on how much the <function>inner_consistent</function> function normally
+   assumes about the meaning of the nodes.
+  </para>
+ </sect2>
+
+</sect1>
+
+<sect1 id="spgist-examples">
+ <title>Examples</title>
+
+ <para>
+  The <productname>PostgreSQL</productname> source distribution includes
+  several examples of index operator classes for <acronym>SP-GiST</acronym>,
+  as described in <xref linkend="spgist-builtin-opclasses-table"/>.  Look
+  into <filename>src/backend/access/spgist/</filename>
+  and <filename>src/backend/utils/adt/</filename> to see the code.
+ </para>
+
+</sect1>
+
+</chapter>