summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/spgist.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/spgist.sgml')
-rw-r--r--doc/src/sgml/spgist.sgml1076
1 files changed, 1076 insertions, 0 deletions
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
new file mode 100644
index 0000000..102f862
--- /dev/null
+++ b/doc/src/sgml/spgist.sgml
@@ -0,0 +1,1076 @@
+<!-- doc/src/sgml/spgist.sgml -->
+
+<chapter id="spgist">
+<title>SP-GiST Indexes</title>
+
+ <indexterm>
+ <primary>index</primary>
+ <secondary>SP-GiST</secondary>
+ </indexterm>
+
+<sect1 id="spgist-intro">
+ <title>Introduction</title>
+
+ <para>
+ <acronym>SP-GiST</acronym> is an abbreviation for space-partitioned
+ <acronym>GiST</acronym>. <acronym>SP-GiST</acronym> supports partitioned
+ search trees, which facilitate development of a wide range of different
+ non-balanced data structures, such as quad-trees, k-d trees, and radix
+ trees (tries). The common feature of these structures is that they
+ repeatedly divide the search space into partitions that need not be
+ of equal size. Searches that are well matched to the partitioning rule
+ can be very fast.
+ </para>
+
+ <para>
+ These popular data structures were originally developed for in-memory
+ usage. In main memory, they are usually designed as a set of dynamically
+ allocated nodes linked by pointers. This is not suitable for direct
+ storing on disk, since these chains of pointers can be rather long which
+ would require too many disk accesses. In contrast, disk-based data
+ structures should have a high fanout to minimize I/O. The challenge
+ addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to
+ disk pages in such a way that a search need access only a few disk pages,
+ even if it traverses many nodes.
+ </para>
+
+ <para>
+ Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow
+ the development of custom data types with the appropriate access methods,
+ by an expert in the domain of the data type, rather than a database expert.
+ </para>
+
+ <para>
+ Some of the information here is derived from Purdue University's
+ SP-GiST Indexing Project
+ <ulink url="https://www.cs.purdue.edu/spgist/">web site</ulink>.
+ The <acronym>SP-GiST</acronym> implementation in
+ <productname>PostgreSQL</productname> is primarily maintained by Teodor
+ Sigaev and Oleg Bartunov, and there is more information on their
+ <!-- URL will be changed -->
+ <ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>.
+ </para>
+
+</sect1>
+
+<sect1 id="spgist-builtin-opclasses">
+ <title>Built-in Operator Classes</title>
+
+ <para>
+ The core <productname>PostgreSQL</productname> distribution
+ includes the <acronym>SP-GiST</acronym> operator classes shown in
+ <xref linkend="spgist-builtin-opclasses-table"/>.
+ </para>
+
+ <table id="spgist-builtin-opclasses-table">
+ <title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Name</entry>
+ <entry>Indexable Operators</entry>
+ <entry>Ordering Operators</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry valign="middle" morerows="11"><literal>box_ops</literal></entry>
+ <entry><literal>&lt;&lt; (box,box)</literal></entry>
+ <entry valign="middle" morerows="11"><literal>&lt;-&gt; (box,point)</literal></entry>
+ </row>
+ <row><entry><literal>&amp;&lt; (box,box)</literal></entry></row>
+ <row><entry><literal>&amp;&gt; (box,box)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (box,box)</literal></entry></row>
+ <row><entry><literal>&lt;@ (box,box)</literal></entry></row>
+ <row><entry><literal>@&gt; (box,box)</literal></entry></row>
+ <row><entry><literal>~= (box,box)</literal></entry></row>
+ <row><entry><literal>&amp;&amp; (box,box)</literal></entry></row>
+ <row><entry><literal>&lt;&lt;| (box,box)</literal></entry></row>
+ <row><entry><literal>&amp;&lt;| (box,box)</literal></entry></row>
+ <row><entry><literal>|&amp;&gt; (box,box)</literal></entry></row>
+ <row><entry><literal>|&gt;&gt; (box,box)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="10"><literal>inet_ops</literal></entry>
+ <entry><literal>&lt;&lt; (inet,inet)</literal></entry>
+ <entry valign="middle" morerows="10"></entry>
+ </row>
+ <row><entry><literal>&lt;&lt;= (inet,inet)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (inet,inet)</literal></entry></row>
+ <row><entry><literal>&gt;&gt;= (inet,inet)</literal></entry></row>
+ <row><entry><literal>= (inet,inet)</literal></entry></row>
+ <row><entry><literal>&lt;&gt; (inet,inet)</literal></entry></row>
+ <row><entry><literal>&lt; (inet,inet)</literal></entry></row>
+ <row><entry><literal>&lt;= (inet,inet)</literal></entry></row>
+ <row><entry><literal>&gt; (inet,inet)</literal></entry></row>
+ <row><entry><literal>&gt;= (inet,inet)</literal></entry></row>
+ <row><entry><literal>&amp;&amp; (inet,inet)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="5"><literal>kd_point_ops</literal></entry>
+ <entry><literal>|&gt;&gt; (point,point)</literal></entry>
+ <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
+ </row>
+ <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
+ <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
+ <row><entry><literal>~= (point,point)</literal></entry></row>
+ <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry>
+ <entry><literal>&lt;&lt; (polygon,polygon)</literal></entry>
+ <entry valign="middle" morerows="11"><literal>&lt;-&gt; (polygon,point)</literal></entry>
+ </row>
+ <row><entry><literal>&amp;&lt; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&amp;&gt; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&lt;@ (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>@&gt; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>~= (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&amp;&amp; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&lt;&lt;| (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>&amp;&lt;| (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>|&gt;&gt; (polygon,polygon)</literal></entry></row>
+ <row><entry><literal>|&amp;&gt; (polygon,polygon)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="5"><literal>quad_point_ops</literal></entry>
+ <entry><literal>|&gt;&gt; (point,point)</literal></entry>
+ <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
+ </row>
+ <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
+ <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
+ <row><entry><literal>~= (point,point)</literal></entry></row>
+ <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="9"><literal>range_ops</literal></entry>
+ <entry><literal>= (anyrange,anyrange)</literal></entry>
+ <entry valign="middle" morerows="9"></entry>
+ </row>
+ <row><entry><literal>&amp;&amp; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>@&gt; (anyrange,anyelement)</literal></entry></row>
+ <row><entry><literal>@&gt; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>&lt;@ (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>&lt;&lt; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>&gt;&gt; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>&amp;&lt; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>&amp;&gt; (anyrange,anyrange)</literal></entry></row>
+ <row><entry><literal>-|- (anyrange,anyrange)</literal></entry></row>
+
+ <row>
+ <entry valign="middle" morerows="9"><literal>text_ops</literal></entry>
+ <entry><literal>= (text,text)</literal></entry>
+ <entry valign="middle" morerows="9"></entry>
+ </row>
+ <row><entry><literal>&lt; (text,text)</literal></entry></row>
+ <row><entry><literal>&lt;= (text,text)</literal></entry></row>
+ <row><entry><literal>&gt; (text,text)</literal></entry></row>
+ <row><entry><literal>&gt;= (text,text)</literal></entry></row>
+ <row><entry><literal>~&lt;~ (text,text)</literal></entry></row>
+ <row><entry><literal>~&lt;=~ (text,text)</literal></entry></row>
+ <row><entry><literal>~&gt;=~ (text,text)</literal></entry></row>
+ <row><entry><literal>~&gt;~ (text,text)</literal></entry></row>
+ <row><entry><literal>^@ (text,text)</literal></entry></row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Of the two operator classes for type <type>point</type>,
+ <literal>quad_point_ops</literal> is the default. <literal>kd_point_ops</literal>
+ supports the same operators but uses a different index data structure that
+ may offer better performance in some applications.
+ </para>
+ <para>
+ The <literal>quad_point_ops</literal>, <literal>kd_point_ops</literal> and
+ <literal>poly_ops</literal> operator classes support the <literal>&lt;-&gt;</literal>
+ ordering operator, which enables the k-nearest neighbor (<literal>k-NN</literal>)
+ search over indexed point or polygon data sets.
+ </para>
+
+</sect1>
+
+<sect1 id="spgist-extensibility">
+ <title>Extensibility</title>
+
+ <para>
+ <acronym>SP-GiST</acronym> offers an interface with a high level of
+ abstraction, requiring the access method developer to implement only
+ methods specific to a given data type. The <acronym>SP-GiST</acronym> core
+ is responsible for efficient disk mapping and searching the tree structure.
+ It also takes care of concurrency and logging considerations.
+ </para>
+
+ <para>
+ Leaf tuples of an <acronym>SP-GiST</acronym> tree usually contain values
+ of the same data type as the indexed column, although it is also possible
+ for them to contain lossy representations of the indexed column.
+ Leaf tuples stored at the root level will directly represent
+ the original indexed data value, but leaf tuples at lower
+ levels might contain only a partial value, such as a suffix.
+ In that case the operator class support functions must be able to
+ reconstruct the original value using information accumulated from the
+ inner tuples that are passed through to reach the leaf level.
+ </para>
+
+ <para>
+ When an <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> columns, the values of those columns are also
+ stored in leaf tuples. The <literal>INCLUDE</literal> columns are of no
+ concern to the <acronym>SP-GiST</acronym> operator class, so they are
+ not discussed further here.
+ </para>
+
+ <para>
+ Inner tuples are more complex, since they are branching points in the
+ search tree. Each inner tuple contains a set of one or more
+ <firstterm>nodes</firstterm>, which represent groups of similar leaf values.
+ A node contains a downlink that leads either to another, lower-level inner
+ tuple, or to a short list of leaf tuples that all lie on the same index page.
+ Each node normally has a <firstterm>label</firstterm> that describes it; for example,
+ in a radix tree the node label could be the next character of the string
+ value. (Alternatively, an operator class can omit the node labels, if it
+ works with a fixed set of nodes for all inner tuples;
+ see <xref linkend="spgist-null-labels"/>.)
+ Optionally, an inner tuple can have a <firstterm>prefix</firstterm> value
+ that describes all its members. In a radix tree this could be the common
+ prefix of the represented strings. The prefix value is not necessarily
+ really a prefix, but can be any data needed by the operator class;
+ for example, in a quad-tree it can store the central point that the four
+ quadrants are measured with respect to. A quad-tree inner tuple would
+ then also contain four nodes corresponding to the quadrants around this
+ central point.
+ </para>
+
+ <para>
+ Some tree algorithms require knowledge of level (or depth) of the current
+ tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for
+ operator classes to manage level counting while descending the tree.
+ There is also support for incrementally reconstructing the represented
+ value when that is needed, and for passing down additional data (called
+ <firstterm>traverse values</firstterm>) during a tree descent.
+ </para>
+
+ <note>
+ <para>
+ The <acronym>SP-GiST</acronym> core code takes care of null entries.
+ Although <acronym>SP-GiST</acronym> indexes do store entries for nulls
+ in indexed columns, this is hidden from the index operator class code:
+ no null index entries or search conditions will ever be passed to the
+ operator class methods. (It is assumed that <acronym>SP-GiST</acronym>
+ operators are strict and so cannot succeed for null values.) Null values
+ are therefore not discussed further here.
+ </para>
+ </note>
+
+ <para>
+ There are five user-defined methods that an index operator class for
+ <acronym>SP-GiST</acronym> must provide, and two are optional. All five
+ mandatory methods follow the convention of accepting two <type>internal</type>
+ arguments, the first of which is a pointer to a C struct containing input
+ values for the support method, while the second argument is a pointer to a
+ C struct where output values must be placed. Four of the mandatory methods just
+ return <type>void</type>, since all their results appear in the output struct; but
+ <function>leaf_consistent</function> returns a <type>boolean</type> result.
+ The methods must not modify any fields of their input structs. In all
+ cases, the output struct is initialized to zeroes before calling the
+ user-defined method. The optional sixth method <function>compress</function>
+ accepts a <type>datum</type> to be indexed as the only argument and returns a value suitable
+ for physical storage in a leaf tuple. The optional seventh method
+ <function>options</function> accepts an <type>internal</type> pointer to a C struct, where
+ opclass-specific parameters should be placed, and returns <type>void</type>.
+ </para>
+
+ <para>
+ The five mandatory user-defined methods are:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><function>config</function></term>
+ <listitem>
+ <para>
+ Returns static information about the index implementation, including
+ the data type OIDs of the prefix and node label data types.
+ </para>
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_config(internal, internal) RETURNS void ...
+</programlisting>
+ The first argument is a pointer to a <structname>spgConfigIn</structname>
+ C struct, containing input data for the function.
+ The second argument is a pointer to a <structname>spgConfigOut</structname>
+ C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgConfigIn
+{
+ Oid attType; /* Data type to be indexed */
+} spgConfigIn;
+
+typedef struct spgConfigOut
+{
+ Oid prefixType; /* Data type of inner-tuple prefixes */
+ Oid labelType; /* Data type of inner-tuple node labels */
+ Oid leafType; /* Data type of leaf-tuple values */
+ bool canReturnData; /* Opclass can reconstruct original data */
+ bool longValuesOK; /* Opclass can cope with values &gt; 1 page */
+} spgConfigOut;
+</programlisting>
+
+ <structfield>attType</structfield> is passed in order to support polymorphic
+ index operator classes; for ordinary fixed-data-type operator classes, it
+ will always have the same value and so can be ignored.
+ </para>
+
+ <para>
+ For operator classes that do not use prefixes,
+ <structfield>prefixType</structfield> can be set to <literal>VOIDOID</literal>.
+ Likewise, for operator classes that do not use node labels,
+ <structfield>labelType</structfield> can be set to <literal>VOIDOID</literal>.
+ <structfield>canReturnData</structfield> should be set true if the operator class
+ is capable of reconstructing the originally-supplied index value.
+ <structfield>longValuesOK</structfield> should be set true only when the
+ <structfield>attType</structfield> is of variable length and the operator
+ class is capable of segmenting long values by repeated suffixing
+ (see <xref linkend="spgist-limits"/>).
+ </para>
+
+ <para>
+ <structfield>leafType</structfield> should match the index storage type
+ defined by the operator class's <structfield>opckeytype</structfield>
+ catalog entry.
+ (Note that <structfield>opckeytype</structfield> can be zero,
+ implying the storage type is the same as the operator class's input
+ type, which is the most common situation.)
+ For reasons of backward compatibility, the <function>config</function>
+ method can set <structfield>leafType</structfield> to some other value,
+ and that value will be used; but this is deprecated since the index
+ contents are then incorrectly identified in the catalogs.
+ Also, it's permissible to
+ leave <structfield>leafType</structfield> uninitialized (zero);
+ that is interpreted as meaning the index storage type derived from
+ <structfield>opckeytype</structfield>.
+ </para>
+
+ <para>
+ When <structfield>attType</structfield>
+ and <structfield>leafType</structfield> are different, the optional
+ method <function>compress</function> must be provided.
+ Method <function>compress</function> is responsible
+ for transformation of datums to be indexed from <structfield>attType</structfield>
+ to <structfield>leafType</structfield>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><function>choose</function></term>
+ <listitem>
+ <para>
+ Chooses a method for inserting a new value into an inner tuple.
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_choose(internal, internal) RETURNS void ...
+</programlisting>
+ The first argument is a pointer to a <structname>spgChooseIn</structname>
+ C struct, containing input data for the function.
+ The second argument is a pointer to a <structname>spgChooseOut</structname>
+ C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgChooseIn
+{
+ Datum datum; /* original datum to be indexed */
+ Datum leafDatum; /* current datum to be stored at leaf */
+ int level; /* current level (counting from zero) */
+
+ /* Data from current inner tuple */
+ bool allTheSame; /* tuple is marked all-the-same? */
+ bool hasPrefix; /* tuple has a prefix? */
+ Datum prefixDatum; /* if so, the prefix value */
+ int nNodes; /* number of nodes in the inner tuple */
+ Datum *nodeLabels; /* node label values (NULL if none) */
+} spgChooseIn;
+
+typedef enum spgChooseResultType
+{
+ spgMatchNode = 1, /* descend into existing node */
+ spgAddNode, /* add a node to the inner tuple */
+ spgSplitTuple /* split inner tuple (change its prefix) */
+} spgChooseResultType;
+
+typedef struct spgChooseOut
+{
+ spgChooseResultType resultType; /* action code, see above */
+ union
+ {
+ struct /* results for spgMatchNode */
+ {
+ int nodeN; /* descend to this node (index from 0) */
+ int levelAdd; /* increment level by this much */
+ Datum restDatum; /* new leaf datum */
+ } matchNode;
+ struct /* results for spgAddNode */
+ {
+ Datum nodeLabel; /* new node's label */
+ int nodeN; /* where to insert it (index from 0) */
+ } addNode;
+ struct /* results for spgSplitTuple */
+ {
+ /* Info to form new upper-level inner tuple with one child tuple */
+ bool prefixHasPrefix; /* tuple should have a prefix? */
+ Datum prefixPrefixDatum; /* if so, its value */
+ int prefixNNodes; /* number of nodes */
+ Datum *prefixNodeLabels; /* their labels (or NULL for
+ * no labels) */
+ int childNodeN; /* which node gets child tuple */
+
+ /* Info to form new lower-level inner tuple with all old nodes */
+ bool postfixHasPrefix; /* tuple should have a prefix? */
+ Datum postfixPrefixDatum; /* if so, its value */
+ } splitTuple;
+ } result;
+} spgChooseOut;
+</programlisting>
+
+ <structfield>datum</structfield> is the original datum of
+ <structname>spgConfigIn</structname>.<structfield>attType</structfield>
+ type that was to be inserted into the index.
+ <structfield>leafDatum</structfield> is a value of
+ <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+ type, which is initially a result of method
+ <function>compress</function> applied to <structfield>datum</structfield>
+ when method <function>compress</function> is provided, or the same value as
+ <structfield>datum</structfield> otherwise.
+ <structfield>leafDatum</structfield> can change at lower levels of the tree
+ if the <function>choose</function> or <function>picksplit</function>
+ methods change it. When the insertion search reaches a leaf page,
+ the current value of <structfield>leafDatum</structfield> is what will be stored
+ in the newly created leaf tuple.
+ <structfield>level</structfield> is the current inner tuple's level, starting at
+ zero for the root level.
+ <structfield>allTheSame</structfield> is true if the current inner tuple is
+ marked as containing multiple equivalent nodes
+ (see <xref linkend="spgist-all-the-same"/>).
+ <structfield>hasPrefix</structfield> is true if the current inner tuple contains
+ a prefix; if so,
+ <structfield>prefixDatum</structfield> is its value.
+ <structfield>nNodes</structfield> is the number of child nodes contained in the
+ inner tuple, and
+ <structfield>nodeLabels</structfield> is an array of their label values, or
+ NULL if there are no labels.
+ </para>
+
+ <para>
+ The <function>choose</function> function can determine either that
+ the new value matches one of the existing child nodes, or that a new
+ child node must be added, or that the new value is inconsistent with
+ the tuple prefix and so the inner tuple must be split to create a
+ less restrictive prefix.
+ </para>
+
+ <para>
+ If the new value matches one of the existing child nodes,
+ set <structfield>resultType</structfield> to <literal>spgMatchNode</literal>.
+ Set <structfield>nodeN</structfield> to the index (from zero) of that node in
+ the node array.
+ Set <structfield>levelAdd</structfield> to the increment in
+ <structfield>level</structfield> caused by descending through that node,
+ or leave it as zero if the operator class does not use levels.
+ Set <structfield>restDatum</structfield> to equal <structfield>leafDatum</structfield>
+ if the operator class does not modify datums from one level to the
+ next, or otherwise set it to the modified value to be used as
+ <structfield>leafDatum</structfield> at the next level.
+ </para>
+
+ <para>
+ If a new child node must be added,
+ set <structfield>resultType</structfield> to <literal>spgAddNode</literal>.
+ Set <structfield>nodeLabel</structfield> to the label to be used for the new
+ node, and set <structfield>nodeN</structfield> to the index (from zero) at which
+ to insert the node in the node array.
+ After the node has been added, the <function>choose</function>
+ function will be called again with the modified inner tuple;
+ that call should result in an <literal>spgMatchNode</literal> result.
+ </para>
+
+ <para>
+ If the new value is inconsistent with the tuple prefix,
+ set <structfield>resultType</structfield> to <literal>spgSplitTuple</literal>.
+ This action moves all the existing nodes into a new lower-level
+ inner tuple, and replaces the existing inner tuple with a tuple
+ having a single downlink pointing to the new lower-level inner tuple.
+ Set <structfield>prefixHasPrefix</structfield> to indicate whether the new
+ upper tuple should have a prefix, and if so set
+ <structfield>prefixPrefixDatum</structfield> to the prefix value. This new
+ prefix value must be sufficiently less restrictive than the original
+ to accept the new value to be indexed.
+ Set <structfield>prefixNNodes</structfield> to the number of nodes needed in the
+ new tuple, and set <structfield>prefixNodeLabels</structfield> to a palloc'd array
+ holding their labels, or to NULL if node labels are not required.
+ Note that the total size of the new upper tuple must be no more
+ than the total size of the tuple it is replacing; this constrains
+ the lengths of the new prefix and new labels.
+ Set <structfield>childNodeN</structfield> to the index (from zero) of the node
+ that will downlink to the new lower-level inner tuple.
+ Set <structfield>postfixHasPrefix</structfield> to indicate whether the new
+ lower-level inner tuple should have a prefix, and if so set
+ <structfield>postfixPrefixDatum</structfield> to the prefix value. The
+ combination of these two prefixes and the downlink node's label
+ (if any) must have the same meaning as the original prefix, because
+ there is no opportunity to alter the node labels that are moved to
+ the new lower-level tuple, nor to change any child index entries.
+ After the node has been split, the <function>choose</function>
+ function will be called again with the replacement inner tuple.
+ That call may return an <literal>spgAddNode</literal> result, if no suitable
+ node was created by the <literal>spgSplitTuple</literal> action. Eventually
+ <function>choose</function> must return <literal>spgMatchNode</literal> to
+ allow the insertion to descend to the next level.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><function>picksplit</function></term>
+ <listitem>
+ <para>
+ Decides how to create a new inner tuple over a set of leaf tuples.
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...
+</programlisting>
+ The first argument is a pointer to a <structname>spgPickSplitIn</structname>
+ C struct, containing input data for the function.
+ The second argument is a pointer to a <structname>spgPickSplitOut</structname>
+ C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgPickSplitIn
+{
+ int nTuples; /* number of leaf tuples */
+ Datum *datums; /* their datums (array of length nTuples) */
+ int level; /* current level (counting from zero) */
+} spgPickSplitIn;
+
+typedef struct spgPickSplitOut
+{
+ bool hasPrefix; /* new inner tuple should have a prefix? */
+ Datum prefixDatum; /* if so, its value */
+
+ int nNodes; /* number of nodes for new inner tuple */
+ Datum *nodeLabels; /* their labels (or NULL for no labels) */
+
+ int *mapTuplesToNodes; /* node index for each leaf tuple */
+ Datum *leafTupleDatums; /* datum to store in each new leaf tuple */
+} spgPickSplitOut;
+</programlisting>
+
+ <structfield>nTuples</structfield> is the number of leaf tuples provided.
+ <structfield>datums</structfield> is an array of their datum values of
+ <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+ type.
+ <structfield>level</structfield> is the current level that all the leaf tuples
+ share, which will become the level of the new inner tuple.
+ </para>
+
+ <para>
+ Set <structfield>hasPrefix</structfield> to indicate whether the new inner
+ tuple should have a prefix, and if so set
+ <structfield>prefixDatum</structfield> to the prefix value.
+ Set <structfield>nNodes</structfield> to indicate the number of nodes that
+ the new inner tuple will contain, and
+ set <structfield>nodeLabels</structfield> to an array of their label values,
+ or to NULL if node labels are not required.
+ Set <structfield>mapTuplesToNodes</structfield> to an array that gives the index
+ (from zero) of the node that each leaf tuple should be assigned to.
+ Set <structfield>leafTupleDatums</structfield> to an array of the values to
+ be stored in the new leaf tuples (these will be the same as the
+ input <structfield>datums</structfield> if the operator class does not modify
+ datums from one level to the next).
+ Note that the <function>picksplit</function> function is
+ responsible for palloc'ing the
+ <structfield>nodeLabels</structfield>, <structfield>mapTuplesToNodes</structfield> and
+ <structfield>leafTupleDatums</structfield> arrays.
+ </para>
+
+ <para>
+ If more than one leaf tuple is supplied, it is expected that the
+ <function>picksplit</function> function will classify them into more than
+ one node; otherwise it is not possible to split the leaf tuples
+ across multiple pages, which is the ultimate purpose of this
+ operation. Therefore, if the <function>picksplit</function> function
+ ends up placing all the leaf tuples in the same node, the core
+ SP-GiST code will override that decision and generate an inner
+ tuple in which the leaf tuples are assigned at random to several
+ identically-labeled nodes. Such a tuple is marked
+ <literal>allTheSame</literal> to signify that this has happened. The
+ <function>choose</function> and <function>inner_consistent</function> functions
+ must take suitable care with such inner tuples.
+ See <xref linkend="spgist-all-the-same"/> for more information.
+ </para>
+
+ <para>
+ <function>picksplit</function> can be applied to a single leaf tuple only
+ in the case that the <function>config</function> function set
+ <structfield>longValuesOK</structfield> to true and a larger-than-a-page input
+ value has been supplied. In this case the point of the operation is
+ to strip off a prefix and produce a new, shorter leaf datum value.
+ The call will be repeated until a leaf datum short enough to fit on
+ a page has been produced. See <xref linkend="spgist-limits"/> for
+ more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><function>inner_consistent</function></term>
+ <listitem>
+ <para>
+ Returns set of nodes (branches) to follow during tree search.
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
+</programlisting>
+ The first argument is a pointer to a <structname>spgInnerConsistentIn</structname>
+ C struct, containing input data for the function.
+ The second argument is a pointer to a <structname>spgInnerConsistentOut</structname>
+ C struct, which the function must fill with result data.
+
+<programlisting>
+typedef struct spgInnerConsistentIn
+{
+ ScanKey scankeys; /* array of operators and comparison values */
+ ScanKey orderbys; /* array of ordering operators and comparison
+ * values */
+ int nkeys; /* length of scankeys array */
+ int norderbys; /* length of orderbys array */
+
+ Datum reconstructedValue; /* value reconstructed at parent */
+ void *traversalValue; /* opclass-specific traverse value */
+ MemoryContext traversalMemoryContext; /* put new traverse values here */
+ int level; /* current level (counting from zero) */
+ bool returnData; /* original data must be returned? */
+
+ /* Data from current inner tuple */
+ bool allTheSame; /* tuple is marked all-the-same? */
+ bool hasPrefix; /* tuple has a prefix? */
+ Datum prefixDatum; /* if so, the prefix value */
+ int nNodes; /* number of nodes in the inner tuple */
+ Datum *nodeLabels; /* node label values (NULL if none) */
+} spgInnerConsistentIn;
+
+typedef struct spgInnerConsistentOut
+{
+ int nNodes; /* number of child nodes to be visited */
+ int *nodeNumbers; /* their indexes in the node array */
+ int *levelAdds; /* increment level by this much for each */
+ Datum *reconstructedValues; /* associated reconstructed values */
+ void **traversalValues; /* opclass-specific traverse values */
+ double **distances; /* associated distances */
+} spgInnerConsistentOut;
+</programlisting>
+
+ The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
+ describes the index search condition(s). These conditions are
+ combined with AND &mdash; only index entries that satisfy all of
+ them are interesting. (Note that <structfield>nkeys</structfield> = 0 implies
+ that all index entries satisfy the query.) Usually the consistent
+ function only cares about the <structfield>sk_strategy</structfield> and
+ <structfield>sk_argument</structfield> fields of each array entry, which
+ respectively give the indexable operator and comparison value.
+ In particular it is not necessary to check <structfield>sk_flags</structfield> to
+ see if the comparison value is NULL, because the SP-GiST core code
+ will filter out such conditions.
+ The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
+ describes ordering operators (if any) in the same manner.
+ <structfield>reconstructedValue</structfield> is the value reconstructed for the
+ parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
+ <function>inner_consistent</function> function did not provide a value at the
+ parent level.
+ <structfield>traversalValue</structfield> is a pointer to any traverse data
+ passed down from the previous call of <function>inner_consistent</function>
+ on the parent index tuple, or NULL at the root level.
+ <structfield>traversalMemoryContext</structfield> is the memory context in which
+ to store output traverse values (see below).
+ <structfield>level</structfield> is the current inner tuple's level, starting at
+ zero for the root level.
+ <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
+ required for this query; this will only be so if the
+ <function>config</function> function asserted <structfield>canReturnData</structfield>.
+ <structfield>allTheSame</structfield> is true if the current inner tuple is
+ marked <quote>all-the-same</quote>; in this case all the nodes have the
+ same label (if any) and so either all or none of them match the query
+ (see <xref linkend="spgist-all-the-same"/>).
+ <structfield>hasPrefix</structfield> is true if the current inner tuple contains
+ a prefix; if so,
+ <structfield>prefixDatum</structfield> is its value.
+ <structfield>nNodes</structfield> is the number of child nodes contained in the
+ inner tuple, and
+ <structfield>nodeLabels</structfield> is an array of their label values, or
+ NULL if the nodes do not have labels.
+ </para>
+
+ <para>
+ <structfield>nNodes</structfield> must be set to the number of child nodes that
+ need to be visited by the search, and
+ <structfield>nodeNumbers</structfield> must be set to an array of their indexes.
+ If the operator class keeps track of levels, set
+ <structfield>levelAdds</structfield> to an array of the level increments
+ required when descending to each node to be visited. (Often these
+ increments will be the same for all the nodes, but that's not
+ necessarily so, so an array is used.)
+ If value reconstruction is needed, set
+ <structfield>reconstructedValues</structfield> to an array of the values
+ reconstructed for each child node to be visited; otherwise, leave
+ <structfield>reconstructedValues</structfield> as NULL.
+ The reconstructed values are assumed to be of type
+ <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
+ (However, since the core system will do nothing with them except
+ possibly copy them, it is sufficient for them to have the
+ same <literal>typlen</literal> and <literal>typbyval</literal>
+ properties as <structfield>leafType</structfield>.)
+ If ordered search is performed, set <structfield>distances</structfield>
+ to an array of distance values according to <structfield>orderbys</structfield>
+ array (nodes with lowest distances will be processed first). Leave it
+ NULL otherwise.
+ If it is desired to pass down additional out-of-band information
+ (<quote>traverse values</quote>) to lower levels of the tree search,
+ set <structfield>traversalValues</structfield> to an array of the appropriate
+ traverse values, one for each child node to be visited; otherwise,
+ leave <structfield>traversalValues</structfield> as NULL.
+ Note that the <function>inner_consistent</function> function is
+ responsible for palloc'ing the
+ <structfield>nodeNumbers</structfield>, <structfield>levelAdds</structfield>,
+ <structfield>distances</structfield>,
+ <structfield>reconstructedValues</structfield>, and
+ <structfield>traversalValues</structfield> arrays in the current memory context.
+ However, any output traverse values pointed to by
+ the <structfield>traversalValues</structfield> array should be allocated
+ in <structfield>traversalMemoryContext</structfield>.
+ Each traverse value must be a single palloc'd chunk.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><function>leaf_consistent</function></term>
+ <listitem>
+ <para>
+ Returns true if a leaf tuple satisfies a query.
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+<programlisting>
+CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
+</programlisting>
+ The first argument is a pointer to a <structname>spgLeafConsistentIn</structname>
+ C struct, containing input data for the function.
+ The second argument is a pointer to a <structname>spgLeafConsistentOut</structname>
+ C struct, which the function must fill with result data.
+<programlisting>
+typedef struct spgLeafConsistentIn
+{
+ ScanKey scankeys; /* array of operators and comparison values */
+ ScanKey orderbys; /* array of ordering operators and comparison
+ * values */
+ int nkeys; /* length of scankeys array */
+ int norderbys; /* length of orderbys array */
+
+ Datum reconstructedValue; /* value reconstructed at parent */
+ void *traversalValue; /* opclass-specific traverse value */
+ int level; /* current level (counting from zero) */
+ bool returnData; /* original data must be returned? */
+
+ Datum leafDatum; /* datum in leaf tuple */
+} spgLeafConsistentIn;
+
+typedef struct spgLeafConsistentOut
+{
+ Datum leafValue; /* reconstructed original data, if any */
+ bool recheck; /* set true if operator must be rechecked */
+ bool recheckDistances; /* set true if distances must be rechecked */
+ double *distances; /* associated distances */
+} spgLeafConsistentOut;
+</programlisting>
+
+ The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
+ describes the index search condition(s). These conditions are
+ combined with AND &mdash; only index entries that satisfy all of
+ them satisfy the query. (Note that <structfield>nkeys</structfield> = 0 implies
+ that all index entries satisfy the query.) Usually the consistent
+ function only cares about the <structfield>sk_strategy</structfield> and
+ <structfield>sk_argument</structfield> fields of each array entry, which
+ respectively give the indexable operator and comparison value.
+ In particular it is not necessary to check <structfield>sk_flags</structfield> to
+ see if the comparison value is NULL, because the SP-GiST core code
+ will filter out such conditions.
+ The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
+ describes the ordering operators in the same manner.
+ <structfield>reconstructedValue</structfield> is the value reconstructed for the
+ parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
+ <function>inner_consistent</function> function did not provide a value at the
+ parent level.
+ <structfield>traversalValue</structfield> is a pointer to any traverse data
+ passed down from the previous call of <function>inner_consistent</function>
+ on the parent index tuple, or NULL at the root level.
+ <structfield>level</structfield> is the current leaf tuple's level, starting at
+ zero for the root level.
+ <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
+ required for this query; this will only be so if the
+ <function>config</function> function asserted <structfield>canReturnData</structfield>.
+ <structfield>leafDatum</structfield> is the key value of
+ <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
+ stored in the current leaf tuple.
+ </para>
+
+ <para>
+ The function must return <literal>true</literal> if the leaf tuple matches the
+ query, or <literal>false</literal> if not. In the <literal>true</literal> case,
+ if <structfield>returnData</structfield> is <literal>true</literal> then
+ <structfield>leafValue</structfield> must be set to the value (of type
+ <structname>spgConfigIn</structname>.<structfield>attType</structfield>)
+ originally supplied to be indexed for this leaf tuple. Also,
+ <structfield>recheck</structfield> may be set to <literal>true</literal> if the match
+ is uncertain and so the operator(s) must be re-applied to the actual
+ heap tuple to verify the match.
+ If ordered search is performed, set <structfield>distances</structfield>
+ to an array of distance values according to <structfield>orderbys</structfield>
+ array. Leave it NULL otherwise. If at least one of returned distances
+ is not exact, set <structfield>recheckDistances</structfield> to true.
+ In this case, the executor will calculate the exact distances after
+ fetching the tuple from the heap, and will reorder the tuples if needed.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ The optional user-defined methods are:
+ </para>
+
+ <variablelist>
+ <varlistentry>
+ <term><function>Datum compress(Datum in)</function></term>
+ <listitem>
+ <para>
+ Converts a data item into a format suitable for physical storage in
+ a leaf tuple of the index. It accepts a value of type
+ <structname>spgConfigIn</structname>.<structfield>attType</structfield>
+ and returns a value of type
+ <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
+ The output value must not contain an out-of-line TOAST pointer.
+ </para>
+
+ <para>
+ Note: the <function>compress</function> method is only applied to
+ values to be stored. The consistent methods receive query
+ <structfield>scankeys</structfield> unchanged, without transformation
+ using <function>compress</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><function>options</function></term>
+ <listitem>
+ <para>
+ Defines a set of user-visible parameters that control operator class
+ behavior.
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> declaration of the function must look like this:
+
+<programlisting>
+CREATE OR REPLACE FUNCTION my_options(internal)
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+</programlisting>
+ </para>
+
+ <para>
+ The function is passed a pointer to a <structname>local_relopts</structname>
+ struct, which needs to be filled with a set of operator class
+ specific options. The options can be accessed from other support
+ functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and
+ <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros.
+ </para>
+
+ <para>
+ Since the representation of the key in <acronym>SP-GiST</acronym> is
+ flexible, it may depend on user-specified parameters.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ All the SP-GiST support methods are normally called in a short-lived
+ memory context; that is, <varname>CurrentMemoryContext</varname> will be reset
+ after processing of each tuple. It is therefore not very important to
+ worry about pfree'ing everything you palloc. (The <function>config</function>
+ method is an exception: it should try to avoid leaking memory. But
+ usually the <function>config</function> method need do nothing but assign
+ constants into the passed parameter struct.)
+ </para>
+
+ <para>
+ If the indexed column is of a collatable data type, the index collation
+ will be passed to all the support methods, using the standard
+ <function>PG_GET_COLLATION()</function> mechanism.
+ </para>
+
+</sect1>
+
+<sect1 id="spgist-implementation">
+ <title>Implementation</title>
+
+ <para>
+ This section covers implementation details and other tricks that are
+ useful for implementers of <acronym>SP-GiST</acronym> operator classes to
+ know.
+ </para>
+
+ <sect2 id="spgist-limits">
+ <title>SP-GiST Limits</title>
+
+ <para>
+ Individual leaf tuples and inner tuples must fit on a single index page
+ (8kB by default). Therefore, when indexing values of variable-length
+ data types, long values can only be supported by methods such as radix
+ trees, in which each level of the tree includes a prefix that is short
+ enough to fit on a page, and the final leaf level includes a suffix also
+ short enough to fit on a page. The operator class should set
+ <structfield>longValuesOK</structfield> to true only if it is prepared to arrange for
+ this to happen. Otherwise, the <acronym>SP-GiST</acronym> core will
+ reject any request to index a value that is too large to fit
+ on an index page.
+ </para>
+
+ <para>
+ Likewise, it is the operator class's responsibility that inner tuples
+ do not grow too large to fit on an index page; this limits the number
+ of child nodes that can be used in one inner tuple, as well as the
+ maximum size of a prefix value.
+ </para>
+
+ <para>
+ Another limitation is that when an inner tuple's node points to a set
+ of leaf tuples, those tuples must all be in the same index page.
+ (This is a design decision to reduce seeking and save space in the
+ links that chain such tuples together.) If the set of leaf tuples
+ grows too large for a page, a split is performed and an intermediate
+ inner tuple is inserted. For this to fix the problem, the new inner
+ tuple <emphasis>must</emphasis> divide the set of leaf values into more than one
+ node group. If the operator class's <function>picksplit</function> function
+ fails to do that, the <acronym>SP-GiST</acronym> core resorts to
+ extraordinary measures described in <xref linkend="spgist-all-the-same"/>.
+ </para>
+
+ <para>
+ When <structfield>longValuesOK</structfield> is true, it is expected
+ that successive levels of the <acronym>SP-GiST</acronym> tree will
+ absorb more and more information into the prefixes and node labels of
+ the inner tuples, making the required leaf datum smaller and smaller,
+ so that eventually it will fit on a page.
+ To prevent bugs in operator classes from causing infinite insertion
+ loops, the <acronym>SP-GiST</acronym> core will raise an error if the
+ leaf datum does not become any smaller within ten cycles
+ of <function>choose</function> method calls.
+ </para>
+ </sect2>
+
+ <sect2 id="spgist-null-labels">
+ <title>SP-GiST Without Node Labels</title>
+
+ <para>
+ Some tree algorithms use a fixed set of nodes for each inner tuple;
+ for example, in a quad-tree there are always exactly four nodes
+ corresponding to the four quadrants around the inner tuple's centroid
+ point. In such a case the code typically works with the nodes by
+ number, and there is no need for explicit node labels. To suppress
+ node labels (and thereby save some space), the <function>picksplit</function>
+ function can return NULL for the <structfield>nodeLabels</structfield> array,
+ and likewise the <function>choose</function> function can return NULL for
+ the <structfield>prefixNodeLabels</structfield> array during
+ a <literal>spgSplitTuple</literal> action.
+ This will in turn result in <structfield>nodeLabels</structfield> being NULL during
+ subsequent calls to <function>choose</function> and <function>inner_consistent</function>.
+ In principle, node labels could be used for some inner tuples and omitted
+ for others in the same index.
+ </para>
+
+ <para>
+ When working with an inner tuple having unlabeled nodes, it is an error
+ for <function>choose</function> to return <literal>spgAddNode</literal>, since the set
+ of nodes is supposed to be fixed in such cases.
+ </para>
+ </sect2>
+
+ <sect2 id="spgist-all-the-same">
+ <title><quote>All-the-Same</quote> Inner Tuples</title>
+
+ <para>
+ The <acronym>SP-GiST</acronym> core can override the results of the
+ operator class's <function>picksplit</function> function when
+ <function>picksplit</function> fails to divide the supplied leaf values into
+ at least two node categories. When this happens, the new inner tuple
+ is created with multiple nodes that each have the same label (if any)
+ that <function>picksplit</function> gave to the one node it did use, and the
+ leaf values are divided at random among these equivalent nodes.
+ The <literal>allTheSame</literal> flag is set on the inner tuple to warn the
+ <function>choose</function> and <function>inner_consistent</function> functions that the
+ tuple does not have the node set that they might otherwise expect.
+ </para>
+
+ <para>
+ When dealing with an <literal>allTheSame</literal> tuple, a <function>choose</function>
+ result of <literal>spgMatchNode</literal> is interpreted to mean that the new
+ value can be assigned to any of the equivalent nodes; the core code will
+ ignore the supplied <structfield>nodeN</structfield> value and descend into one
+ of the nodes at random (so as to keep the tree balanced). It is an
+ error for <function>choose</function> to return <literal>spgAddNode</literal>, since
+ that would make the nodes not all equivalent; the
+ <literal>spgSplitTuple</literal> action must be used if the value to be inserted
+ doesn't match the existing nodes.
+ </para>
+
+ <para>
+ When dealing with an <literal>allTheSame</literal> tuple, the
+ <function>inner_consistent</function> function should return either all or none
+ of the nodes as targets for continuing the index search, since they are
+ all equivalent. This may or may not require any special-case code,
+ depending on how much the <function>inner_consistent</function> function normally
+ assumes about the meaning of the nodes.
+ </para>
+ </sect2>
+
+</sect1>
+
+<sect1 id="spgist-examples">
+ <title>Examples</title>
+
+ <para>
+ The <productname>PostgreSQL</productname> source distribution includes
+ several examples of index operator classes for <acronym>SP-GiST</acronym>,
+ as described in <xref linkend="spgist-builtin-opclasses-table"/>. Look
+ into <filename>src/backend/access/spgist/</filename>
+ and <filename>src/backend/utils/adt/</filename> to see the code.
+ </para>
+
+</sect1>
+
+</chapter>