diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:19:15 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:19:15 +0000 |
commit | 6eb9c5a5657d1fe77b55cc261450f3538d35a94d (patch) | |
tree | 657d8194422a5daccecfd42d654b8a245ef7b4c8 /doc/src/sgml/xindex.sgml | |
parent | Initial commit. (diff) | |
download | postgresql-13-6eb9c5a5657d1fe77b55cc261450f3538d35a94d.tar.xz postgresql-13-6eb9c5a5657d1fe77b55cc261450f3538d35a94d.zip |
Adding upstream version 13.4.upstream/13.4upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/xindex.sgml')
-rw-r--r-- | doc/src/sgml/xindex.sgml | 1446 |
1 files changed, 1446 insertions, 0 deletions
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml new file mode 100644 index 0000000..609fa35 --- /dev/null +++ b/doc/src/sgml/xindex.sgml @@ -0,0 +1,1446 @@ +<!-- doc/src/sgml/xindex.sgml --> + +<sect1 id="xindex"> + <title>Interfacing Extensions to Indexes</title> + + <indexterm zone="xindex"> + <primary>index</primary> + <secondary>for user-defined data type</secondary> + </indexterm> + + <para> + The procedures described thus far let you define new types, new + functions, and new operators. However, we cannot yet define an + index on a column of a new data type. To do this, we must define an + <firstterm>operator class</firstterm> for the new data type. Later in this + section, we will illustrate this concept in an example: a new + operator class for the B-tree index method that stores and sorts + complex numbers in ascending absolute value order. + </para> + + <para> + Operator classes can be grouped into <firstterm>operator families</firstterm> + to show the relationships between semantically compatible classes. + When only a single data type is involved, an operator class is sufficient, + so we'll focus on that case first and then return to operator families. + </para> + + <sect2 id="xindex-opclass"> + <title>Index Methods and Operator Classes</title> + + <para> + The <classname>pg_am</classname> table contains one row for every + index method (internally known as access method). Support for + regular access to tables is built into + <productname>PostgreSQL</productname>, but all index methods are + described in <classname>pg_am</classname>. It is possible to add a + new index access method by writing the necessary code and + then creating an entry in <classname>pg_am</classname> — but that is + beyond the scope of this chapter (see <xref linkend="indexam"/>). + </para> + + <para> + The routines for an index method do not directly know anything + about the data types that the index method will operate on. + Instead, an <firstterm>operator + class</firstterm><indexterm><primary>operator class</primary></indexterm> + identifies the set of operations that the index method needs to use + to work with a particular data type. Operator classes are so + called because one thing they specify is the set of + <literal>WHERE</literal>-clause operators that can be used with an index + (i.e., can be converted into an index-scan qualification). An + operator class can also specify some <firstterm>support + function</firstterm> that are needed by the internal operations of the + index method, but do not directly correspond to any + <literal>WHERE</literal>-clause operator that can be used with the index. + </para> + + <para> + It is possible to define multiple operator classes for the same + data type and index method. By doing this, multiple + sets of indexing semantics can be defined for a single data type. + For example, a B-tree index requires a sort ordering to be defined + for each data type it works on. + It might be useful for a complex-number data type + to have one B-tree operator class that sorts the data by complex + absolute value, another that sorts by real part, and so on. + Typically, one of the operator classes will be deemed most commonly + useful and will be marked as the default operator class for that + data type and index method. + </para> + + <para> + The same operator class name + can be used for several different index methods (for example, both B-tree + and hash index methods have operator classes named + <literal>int4_ops</literal>), but each such class is an independent + entity and must be defined separately. + </para> + </sect2> + + <sect2 id="xindex-strategies"> + <title>Index Method Strategies</title> + + <para> + The operators associated with an operator class are identified by + <quote>strategy numbers</quote>, which serve to identify the semantics of + each operator within the context of its operator class. + For example, B-trees impose a strict ordering on keys, lesser to greater, + and so operators like <quote>less than</quote> and <quote>greater than or equal + to</quote> are interesting with respect to a B-tree. + Because + <productname>PostgreSQL</productname> allows the user to define operators, + <productname>PostgreSQL</productname> cannot look at the name of an operator + (e.g., <literal><</literal> or <literal>>=</literal>) and tell what kind of + comparison it is. Instead, the index method defines a set of + <quote>strategies</quote>, which can be thought of as generalized operators. + Each operator class specifies which actual operator corresponds to each + strategy for a particular data type and interpretation of the index + semantics. + </para> + + <para> + The B-tree index method defines five strategies, shown in <xref + linkend="xindex-btree-strat-table"/>. + </para> + + <table tocentry="1" id="xindex-btree-strat-table"> + <title>B-Tree Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>less than</entry> + <entry>1</entry> + </row> + <row> + <entry>less than or equal</entry> + <entry>2</entry> + </row> + <row> + <entry>equal</entry> + <entry>3</entry> + </row> + <row> + <entry>greater than or equal</entry> + <entry>4</entry> + </row> + <row> + <entry>greater than</entry> + <entry>5</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Hash indexes support only equality comparisons, and so they use only one + strategy, shown in <xref linkend="xindex-hash-strat-table"/>. + </para> + + <table tocentry="1" id="xindex-hash-strat-table"> + <title>Hash Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>equal</entry> + <entry>1</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + GiST indexes are more flexible: they do not have a fixed set of + strategies at all. Instead, the <quote>consistency</quote> support routine + of each particular GiST operator class interprets the strategy numbers + however it likes. As an example, several of the built-in GiST index + operator classes index two-dimensional geometric objects, providing + the <quote>R-tree</quote> strategies shown in + <xref linkend="xindex-rtree-strat-table"/>. Four of these are true + two-dimensional tests (overlaps, same, contains, contained by); + four of them consider only the X direction; and the other four + provide the same tests in the Y direction. + </para> + + <table tocentry="1" id="xindex-rtree-strat-table"> + <title>GiST Two-Dimensional <quote>R-tree</quote> Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>strictly left of</entry> + <entry>1</entry> + </row> + <row> + <entry>does not extend to right of</entry> + <entry>2</entry> + </row> + <row> + <entry>overlaps</entry> + <entry>3</entry> + </row> + <row> + <entry>does not extend to left of</entry> + <entry>4</entry> + </row> + <row> + <entry>strictly right of</entry> + <entry>5</entry> + </row> + <row> + <entry>same</entry> + <entry>6</entry> + </row> + <row> + <entry>contains</entry> + <entry>7</entry> + </row> + <row> + <entry>contained by</entry> + <entry>8</entry> + </row> + <row> + <entry>does not extend above</entry> + <entry>9</entry> + </row> + <row> + <entry>strictly below</entry> + <entry>10</entry> + </row> + <row> + <entry>strictly above</entry> + <entry>11</entry> + </row> + <row> + <entry>does not extend below</entry> + <entry>12</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + SP-GiST indexes are similar to GiST indexes in flexibility: they don't have + a fixed set of strategies. Instead the support routines of each operator + class interpret the strategy numbers according to the operator class's + definition. As an example, the strategy numbers used by the built-in + operator classes for points are shown in <xref + linkend="xindex-spgist-point-strat-table"/>. + </para> + + <table tocentry="1" id="xindex-spgist-point-strat-table"> + <title>SP-GiST Point Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>strictly left of</entry> + <entry>1</entry> + </row> + <row> + <entry>strictly right of</entry> + <entry>5</entry> + </row> + <row> + <entry>same</entry> + <entry>6</entry> + </row> + <row> + <entry>contained by</entry> + <entry>8</entry> + </row> + <row> + <entry>strictly below</entry> + <entry>10</entry> + </row> + <row> + <entry>strictly above</entry> + <entry>11</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + GIN indexes are similar to GiST and SP-GiST indexes, in that they don't + have a fixed set of strategies either. Instead the support routines of + each operator class interpret the strategy numbers according to the + operator class's definition. As an example, the strategy numbers used by + the built-in operator class for arrays are shown in + <xref linkend="xindex-gin-array-strat-table"/>. + </para> + + <table tocentry="1" id="xindex-gin-array-strat-table"> + <title>GIN Array Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>overlap</entry> + <entry>1</entry> + </row> + <row> + <entry>contains</entry> + <entry>2</entry> + </row> + <row> + <entry>is contained by</entry> + <entry>3</entry> + </row> + <row> + <entry>equal</entry> + <entry>4</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + BRIN indexes are similar to GiST, SP-GiST and GIN indexes in that they + don't have a fixed set of strategies either. Instead the support routines + of each operator class interpret the strategy numbers according to the + operator class's definition. As an example, the strategy numbers used by + the built-in <literal>Minmax</literal> operator classes are shown in + <xref linkend="xindex-brin-minmax-strat-table"/>. + </para> + + <table tocentry="1" id="xindex-brin-minmax-strat-table"> + <title>BRIN Minmax Strategies</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operation</entry> + <entry>Strategy Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>less than</entry> + <entry>1</entry> + </row> + <row> + <entry>less than or equal</entry> + <entry>2</entry> + </row> + <row> + <entry>equal</entry> + <entry>3</entry> + </row> + <row> + <entry>greater than or equal</entry> + <entry>4</entry> + </row> + <row> + <entry>greater than</entry> + <entry>5</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Notice that all the operators listed above return Boolean values. In + practice, all operators defined as index method search operators must + return type <type>boolean</type>, since they must appear at the top + level of a <literal>WHERE</literal> clause to be used with an index. + (Some index access methods also support <firstterm>ordering operators</firstterm>, + which typically don't return Boolean values; that feature is discussed + in <xref linkend="xindex-ordering-ops"/>.) + </para> + </sect2> + + <sect2 id="xindex-support"> + <title>Index Method Support Routines</title> + + <para> + Strategies aren't usually enough information for the system to figure + out how to use an index. In practice, the index methods require + additional support routines in order to work. For example, the B-tree + index method must be able to compare two keys and determine whether one + is greater than, equal to, or less than the other. Similarly, the + hash index method must be able to compute hash codes for key values. + These operations do not correspond to operators used in qualifications in + SQL commands; they are administrative routines used by + the index methods, internally. + </para> + + <para> + Just as with strategies, the operator class identifies which specific + functions should play each of these roles for a given data type and + semantic interpretation. The index method defines the set + of functions it needs, and the operator class identifies the correct + functions to use by assigning them to the <quote>support function numbers</quote> + specified by the index method. + </para> + + <para> + Additionally, some opclasses allow users to specify parameters which + control their behavior. Each builtin index access method has an optional + <function>options</function> support function, which defines a set of + opclass-specific parameters. + </para> + + <para> + B-trees require a comparison support function, + and allow four additional support functions to be + supplied at the operator class author's option, as shown in <xref + linkend="xindex-btree-support-table"/>. + The requirements for these support functions are explained further in + <xref linkend="btree-support-funcs"/>. + </para> + + <table tocentry="1" id="xindex-btree-support-table"> + <title>B-Tree Support Functions</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="3*"/> + <colspec colname="col2" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry> + Compare two keys and return an integer less than zero, zero, or + greater than zero, indicating whether the first key is less than, + equal to, or greater than the second + </entry> + <entry>1</entry> + </row> + <row> + <entry> + Return the addresses of C-callable sort support function(s) + (optional) + </entry> + <entry>2</entry> + </row> + <row> + <entry> + Compare a test value to a base value plus/minus an offset, and return + true or false according to the comparison result (optional) + </entry> + <entry>3</entry> + </row> + <row> + <entry> + Determine if it is safe for indexes that use the operator + class to apply the btree deduplication optimization (optional) + </entry> + <entry>4</entry> + </row> + <row> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>5</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Hash indexes require one support function, and allow two additional ones to + be supplied at the operator class author's option, as shown in <xref + linkend="xindex-hash-support-table"/>. + </para> + + <table tocentry="1" id="xindex-hash-support-table"> + <title>Hash Support Functions</title> + <tgroup cols="2"> + <colspec colname="col1" colwidth="3*"/> + <colspec colname="col2" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry>Compute the 32-bit hash value for a key</entry> + <entry>1</entry> + </row> + <row> + <entry> + Compute the 64-bit hash value for a key given a 64-bit salt; if + the salt is 0, the low 32 bits of the result must match the value + that would have been computed by function 1 + (optional) + </entry> + <entry>2</entry> + </row> + <row> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>3</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + GiST indexes have ten support functions, three of which are optional, + as shown in <xref linkend="xindex-gist-support-table"/>. + (For more information see <xref linkend="gist"/>.) + </para> + + <table tocentry="1" id="xindex-gist-support-table"> + <title>GiST Support Functions</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="2*"/> + <colspec colname="col2" colwidth="3*"/> + <colspec colname="col3" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Description</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry><function>consistent</function></entry> + <entry>determine whether key satisfies the + query qualifier</entry> + <entry>1</entry> + </row> + <row> + <entry><function>union</function></entry> + <entry>compute union of a set of keys</entry> + <entry>2</entry> + </row> + <row> + <entry><function>compress</function></entry> + <entry>compute a compressed representation of a key or value + to be indexed</entry> + <entry>3</entry> + </row> + <row> + <entry><function>decompress</function></entry> + <entry>compute a decompressed representation of a + compressed key</entry> + <entry>4</entry> + </row> + <row> + <entry><function>penalty</function></entry> + <entry>compute penalty for inserting new key into subtree + with given subtree's key</entry> + <entry>5</entry> + </row> + <row> + <entry><function>picksplit</function></entry> + <entry>determine which entries of a page are to be moved + to the new page and compute the union keys for resulting pages</entry> + <entry>6</entry> + </row> + <row> + <entry><function>equal</function></entry> + <entry>compare two keys and return true if they are equal</entry> + <entry>7</entry> + </row> + <row> + <entry><function>distance</function></entry> + <entry>determine distance from key to query value (optional)</entry> + <entry>8</entry> + </row> + <row> + <entry><function>fetch</function></entry> + <entry>compute original representation of a compressed key for + index-only scans (optional)</entry> + <entry>9</entry> + </row> + <row> + <entry><function>options</function></entry> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>10</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + SP-GiST indexes have six support functions, one of which is optional, as + shown in <xref linkend="xindex-spgist-support-table"/>. + (For more information see <xref linkend="spgist"/>.) + </para> + + <table tocentry="1" id="xindex-spgist-support-table"> + <title>SP-GiST Support Functions</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="2*"/> + <colspec colname="col2" colwidth="3*"/> + <colspec colname="col3" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Description</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry><function>config</function></entry> + <entry>provide basic information about the operator class</entry> + <entry>1</entry> + </row> + <row> + <entry><function>choose</function></entry> + <entry>determine how to insert a new value into an inner tuple</entry> + <entry>2</entry> + </row> + <row> + <entry><function>picksplit</function></entry> + <entry>determine how to partition a set of values</entry> + <entry>3</entry> + </row> + <row> + <entry><function>inner_consistent</function></entry> + <entry>determine which sub-partitions need to be searched for a + query</entry> + <entry>4</entry> + </row> + <row> + <entry><function>leaf_consistent</function></entry> + <entry>determine whether key satisfies the + query qualifier</entry> + <entry>5</entry> + </row> + <row> + <entry><function>options</function></entry> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>6</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + GIN indexes have seven support functions, four of which are optional, + as shown in <xref linkend="xindex-gin-support-table"/>. + (For more information see <xref linkend="gin"/>.) + </para> + + <table tocentry="1" id="xindex-gin-support-table"> + <title>GIN Support Functions</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="2*"/> + <colspec colname="col2" colwidth="3*"/> + <colspec colname="col3" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Description</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry><function>compare</function></entry> + <entry> + compare two keys and return an integer less than zero, zero, + or greater than zero, indicating whether the first key is less than, + equal to, or greater than the second + </entry> + <entry>1</entry> + </row> + <row> + <entry><function>extractValue</function></entry> + <entry>extract keys from a value to be indexed</entry> + <entry>2</entry> + </row> + <row> + <entry><function>extractQuery</function></entry> + <entry>extract keys from a query condition</entry> + <entry>3</entry> + </row> + <row> + <entry><function>consistent</function></entry> + <entry> + determine whether value matches query condition (Boolean variant) + (optional if support function 6 is present) + </entry> + <entry>4</entry> + </row> + <row> + <entry><function>comparePartial</function></entry> + <entry> + compare partial key from + query and key from index, and return an integer less than zero, zero, + or greater than zero, indicating whether GIN should ignore this index + entry, treat the entry as a match, or stop the index scan (optional) + </entry> + <entry>5</entry> + </row> + <row> + <entry><function>triConsistent</function></entry> + <entry> + determine whether value matches query condition (ternary variant) + (optional if support function 4 is present) + </entry> + <entry>6</entry> + </row> + <row> + <entry><function>options</function></entry> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>7</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + BRIN indexes have five basic support functions, one of which is optional, + as shown in <xref linkend="xindex-brin-support-table"/>. Some versions of + the basic functions require additional support functions to be provided. + (For more information see <xref linkend="brin-extensibility"/>.) + </para> + + <table tocentry="1" id="xindex-brin-support-table"> + <title>BRIN Support Functions</title> + <tgroup cols="3"> + <colspec colname="col1" colwidth="2*"/> + <colspec colname="col2" colwidth="3*"/> + <colspec colname="col3" colwidth="1*"/> + <thead> + <row> + <entry>Function</entry> + <entry>Description</entry> + <entry>Support Number</entry> + </row> + </thead> + <tbody> + <row> + <entry><function>opcInfo</function></entry> + <entry> + return internal information describing the indexed columns' + summary data + </entry> + <entry>1</entry> + </row> + <row> + <entry><function>add_value</function></entry> + <entry>add a new value to an existing summary index tuple</entry> + <entry>2</entry> + </row> + <row> + <entry><function>consistent</function></entry> + <entry>determine whether value matches query condition</entry> + <entry>3</entry> + </row> + <row> + <entry><function>union</function></entry> + <entry> + compute union of two summary tuples + </entry> + <entry>4</entry> + </row> + <row> + <entry><function>options</function></entry> + <entry> + Defines a set of options that are specific to this operator class + (optional) + </entry> + <entry>5</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Unlike search operators, support functions return whichever data + type the particular index method expects; for example in the case + of the comparison function for B-trees, a signed integer. The number + and types of the arguments to each support function are likewise + dependent on the index method. For B-tree and hash the comparison and + hashing support functions take the same input data types as do the + operators included in the operator class, but this is not the case for + most GiST, SP-GiST, GIN, and BRIN support functions. + </para> + </sect2> + + <sect2 id="xindex-example"> + <title>An Example</title> + + <para> + Now that we have seen the ideas, here is the promised example of + creating a new operator class. + (You can find a working copy of this example in + <filename>src/tutorial/complex.c</filename> and + <filename>src/tutorial/complex.sql</filename> in the source + distribution.) + The operator class encapsulates + operators that sort complex numbers in absolute value order, so we + choose the name <literal>complex_abs_ops</literal>. First, we need + a set of operators. The procedure for defining operators was + discussed in <xref linkend="xoper"/>. For an operator class on + B-trees, the operators we require are: + + <itemizedlist spacing="compact"> + <listitem><simpara>absolute-value less-than (strategy 1)</simpara></listitem> + <listitem><simpara>absolute-value less-than-or-equal (strategy 2)</simpara></listitem> + <listitem><simpara>absolute-value equal (strategy 3)</simpara></listitem> + <listitem><simpara>absolute-value greater-than-or-equal (strategy 4)</simpara></listitem> + <listitem><simpara>absolute-value greater-than (strategy 5)</simpara></listitem> + </itemizedlist> + </para> + + <para> + The least error-prone way to define a related set of comparison operators + is to write the B-tree comparison support function first, and then write the + other functions as one-line wrappers around the support function. This + reduces the odds of getting inconsistent results for corner cases. + Following this approach, we first write: + +<programlisting><![CDATA[ +#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y) + +static int +complex_abs_cmp_internal(Complex *a, Complex *b) +{ + double amag = Mag(a), + bmag = Mag(b); + + if (amag < bmag) + return -1; + if (amag > bmag) + return 1; + return 0; +} +]]> +</programlisting> + + Now the less-than function looks like: + +<programlisting><![CDATA[ +PG_FUNCTION_INFO_V1(complex_abs_lt); + +Datum +complex_abs_lt(PG_FUNCTION_ARGS) +{ + Complex *a = (Complex *) PG_GETARG_POINTER(0); + Complex *b = (Complex *) PG_GETARG_POINTER(1); + + PG_RETURN_BOOL(complex_abs_cmp_internal(a, b) < 0); +} +]]> +</programlisting> + + The other four functions differ only in how they compare the internal + function's result to zero. + </para> + + <para> + Next we declare the functions and the operators based on the functions + to SQL: + +<programlisting> +CREATE FUNCTION complex_abs_lt(complex, complex) RETURNS bool + AS '<replaceable>filename</replaceable>', 'complex_abs_lt' + LANGUAGE C IMMUTABLE STRICT; + +CREATE OPERATOR < ( + leftarg = complex, rightarg = complex, procedure = complex_abs_lt, + commutator = > , negator = >= , + restrict = scalarltsel, join = scalarltjoinsel +); +</programlisting> + It is important to specify the correct commutator and negator operators, + as well as suitable restriction and join selectivity + functions, otherwise the optimizer will be unable to make effective + use of the index. + </para> + + <para> + Other things worth noting are happening here: + + <itemizedlist> + <listitem> + <para> + There can only be one operator named, say, <literal>=</literal> + and taking type <type>complex</type> for both operands. In this + case we don't have any other operator <literal>=</literal> for + <type>complex</type>, but if we were building a practical data + type we'd probably want <literal>=</literal> to be the ordinary + equality operation for complex numbers (and not the equality of + the absolute values). In that case, we'd need to use some other + operator name for <function>complex_abs_eq</function>. + </para> + </listitem> + + <listitem> + <para> + Although <productname>PostgreSQL</productname> can cope with + functions having the same SQL name as long as they have different + argument data types, C can only cope with one global function + having a given name. So we shouldn't name the C function + something simple like <filename>abs_eq</filename>. Usually it's + a good practice to include the data type name in the C function + name, so as not to conflict with functions for other data types. + </para> + </listitem> + + <listitem> + <para> + We could have made the SQL name + of the function <filename>abs_eq</filename>, relying on + <productname>PostgreSQL</productname> to distinguish it by + argument data types from any other SQL function of the same name. + To keep the example simple, we make the function have the same + names at the C level and SQL level. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + The next step is the registration of the support routine required + by B-trees. The example C code that implements this is in the same + file that contains the operator functions. This is how we declare + the function: + +<programlisting> +CREATE FUNCTION complex_abs_cmp(complex, complex) + RETURNS integer + AS '<replaceable>filename</replaceable>' + LANGUAGE C IMMUTABLE STRICT; +</programlisting> + </para> + + <para> + Now that we have the required operators and support routine, + we can finally create the operator class: + +<programlisting><![CDATA[ +CREATE OPERATOR CLASS complex_abs_ops + DEFAULT FOR TYPE complex USING btree AS + OPERATOR 1 < , + OPERATOR 2 <= , + OPERATOR 3 = , + OPERATOR 4 >= , + OPERATOR 5 > , + FUNCTION 1 complex_abs_cmp(complex, complex); +]]> +</programlisting> + </para> + + <para> + And we're done! It should now be possible to create + and use B-tree indexes on <type>complex</type> columns. + </para> + + <para> + We could have written the operator entries more verbosely, as in: +<programlisting> + OPERATOR 1 < (complex, complex) , +</programlisting> + but there is no need to do so when the operators take the same data type + we are defining the operator class for. + </para> + + <para> + The above example assumes that you want to make this new operator class the + default B-tree operator class for the <type>complex</type> data type. + If you don't, just leave out the word <literal>DEFAULT</literal>. + </para> + </sect2> + + <sect2 id="xindex-opfamily"> + <title>Operator Classes and Operator Families</title> + + <para> + So far we have implicitly assumed that an operator class deals with + only one data type. While there certainly can be only one data type in + a particular index column, it is often useful to index operations that + compare an indexed column to a value of a different data type. Also, + if there is use for a cross-data-type operator in connection with an + operator class, it is often the case that the other data type has a + related operator class of its own. It is helpful to make the connections + between related classes explicit, because this can aid the planner in + optimizing SQL queries (particularly for B-tree operator classes, since + the planner contains a great deal of knowledge about how to work with them). + </para> + + <para> + To handle these needs, <productname>PostgreSQL</productname> + uses the concept of an <firstterm>operator + family</firstterm><indexterm><primary>operator family</primary></indexterm>. + An operator family contains one or more operator classes, and can also + contain indexable operators and corresponding support functions that + belong to the family as a whole but not to any single class within the + family. We say that such operators and functions are <quote>loose</quote> + within the family, as opposed to being bound into a specific class. + Typically each operator class contains single-data-type operators + while cross-data-type operators are loose in the family. + </para> + + <para> + All the operators and functions in an operator family must have compatible + semantics, where the compatibility requirements are set by the index + method. You might therefore wonder why bother to single out particular + subsets of the family as operator classes; and indeed for many purposes + the class divisions are irrelevant and the family is the only interesting + grouping. The reason for defining operator classes is that they specify + how much of the family is needed to support any particular index. + If there is an index using an operator class, then that operator class + cannot be dropped without dropping the index — but other parts of + the operator family, namely other operator classes and loose operators, + could be dropped. Thus, an operator class should be specified to contain + the minimum set of operators and functions that are reasonably needed + to work with an index on a specific data type, and then related but + non-essential operators can be added as loose members of the operator + family. + </para> + + <para> + As an example, <productname>PostgreSQL</productname> has a built-in + B-tree operator family <literal>integer_ops</literal>, which includes operator + classes <literal>int8_ops</literal>, <literal>int4_ops</literal>, and + <literal>int2_ops</literal> for indexes on <type>bigint</type> (<type>int8</type>), + <type>integer</type> (<type>int4</type>), and <type>smallint</type> (<type>int2</type>) + columns respectively. The family also contains cross-data-type comparison + operators allowing any two of these types to be compared, so that an index + on one of these types can be searched using a comparison value of another + type. The family could be duplicated by these definitions: + +<programlisting><![CDATA[ +CREATE OPERATOR FAMILY integer_ops USING btree; + +CREATE OPERATOR CLASS int8_ops +DEFAULT FOR TYPE int8 USING btree FAMILY integer_ops AS + -- standard int8 comparisons + OPERATOR 1 < , + OPERATOR 2 <= , + OPERATOR 3 = , + OPERATOR 4 >= , + OPERATOR 5 > , + FUNCTION 1 btint8cmp(int8, int8) , + FUNCTION 2 btint8sortsupport(internal) , + FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) , + FUNCTION 4 btequalimage(oid) ; + +CREATE OPERATOR CLASS int4_ops +DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS + -- standard int4 comparisons + OPERATOR 1 < , + OPERATOR 2 <= , + OPERATOR 3 = , + OPERATOR 4 >= , + OPERATOR 5 > , + FUNCTION 1 btint4cmp(int4, int4) , + FUNCTION 2 btint4sortsupport(internal) , + FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) , + FUNCTION 4 btequalimage(oid) ; + +CREATE OPERATOR CLASS int2_ops +DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS + -- standard int2 comparisons + OPERATOR 1 < , + OPERATOR 2 <= , + OPERATOR 3 = , + OPERATOR 4 >= , + OPERATOR 5 > , + FUNCTION 1 btint2cmp(int2, int2) , + FUNCTION 2 btint2sortsupport(internal) , + FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) , + FUNCTION 4 btequalimage(oid) ; + +ALTER OPERATOR FAMILY integer_ops USING btree ADD + -- cross-type comparisons int8 vs int2 + OPERATOR 1 < (int8, int2) , + OPERATOR 2 <= (int8, int2) , + OPERATOR 3 = (int8, int2) , + OPERATOR 4 >= (int8, int2) , + OPERATOR 5 > (int8, int2) , + FUNCTION 1 btint82cmp(int8, int2) , + + -- cross-type comparisons int8 vs int4 + OPERATOR 1 < (int8, int4) , + OPERATOR 2 <= (int8, int4) , + OPERATOR 3 = (int8, int4) , + OPERATOR 4 >= (int8, int4) , + OPERATOR 5 > (int8, int4) , + FUNCTION 1 btint84cmp(int8, int4) , + + -- cross-type comparisons int4 vs int2 + OPERATOR 1 < (int4, int2) , + OPERATOR 2 <= (int4, int2) , + OPERATOR 3 = (int4, int2) , + OPERATOR 4 >= (int4, int2) , + OPERATOR 5 > (int4, int2) , + FUNCTION 1 btint42cmp(int4, int2) , + + -- cross-type comparisons int4 vs int8 + OPERATOR 1 < (int4, int8) , + OPERATOR 2 <= (int4, int8) , + OPERATOR 3 = (int4, int8) , + OPERATOR 4 >= (int4, int8) , + OPERATOR 5 > (int4, int8) , + FUNCTION 1 btint48cmp(int4, int8) , + + -- cross-type comparisons int2 vs int8 + OPERATOR 1 < (int2, int8) , + OPERATOR 2 <= (int2, int8) , + OPERATOR 3 = (int2, int8) , + OPERATOR 4 >= (int2, int8) , + OPERATOR 5 > (int2, int8) , + FUNCTION 1 btint28cmp(int2, int8) , + + -- cross-type comparisons int2 vs int4 + OPERATOR 1 < (int2, int4) , + OPERATOR 2 <= (int2, int4) , + OPERATOR 3 = (int2, int4) , + OPERATOR 4 >= (int2, int4) , + OPERATOR 5 > (int2, int4) , + FUNCTION 1 btint24cmp(int2, int4) , + + -- cross-type in_range functions + FUNCTION 3 in_range(int4, int4, int8, boolean, boolean) , + FUNCTION 3 in_range(int4, int4, int2, boolean, boolean) , + FUNCTION 3 in_range(int2, int2, int8, boolean, boolean) , + FUNCTION 3 in_range(int2, int2, int4, boolean, boolean) ; +]]> +</programlisting> + + Notice that this definition <quote>overloads</quote> the operator strategy and + support function numbers: each number occurs multiple times within the + family. This is allowed so long as each instance of a + particular number has distinct input data types. The instances that have + both input types equal to an operator class's input type are the + primary operators and support functions for that operator class, + and in most cases should be declared as part of the operator class rather + than as loose members of the family. + </para> + + <para> + In a B-tree operator family, all the operators in the family must sort + compatibly, as is specified in detail in <xref linkend="btree-behavior"/>. + For each + operator in the family there must be a support function having the same + two input data types as the operator. It is recommended that a family be + complete, i.e., for each combination of data types, all operators are + included. Each operator class should include just the non-cross-type + operators and support function for its data type. + </para> + + <para> + To build a multiple-data-type hash operator family, compatible hash + support functions must be created for each data type supported by the + family. Here compatibility means that the functions are guaranteed to + return the same hash code for any two values that are considered equal + by the family's equality operators, even when the values are of different + types. This is usually difficult to accomplish when the types have + different physical representations, but it can be done in some cases. + Furthermore, casting a value from one data type represented in the operator + family to another data type also represented in the operator family via + an implicit or binary coercion cast must not change the computed hash value. + Notice that there is only one support function per data type, not one + per equality operator. It is recommended that a family be complete, i.e., + provide an equality operator for each combination of data types. + Each operator class should include just the non-cross-type equality + operator and the support function for its data type. + </para> + + <para> + GiST, SP-GiST, and GIN indexes do not have any explicit notion of + cross-data-type operations. The set of operators supported is just + whatever the primary support functions for a given operator class can + handle. + </para> + + <para> + In BRIN, the requirements depends on the framework that provides the + operator classes. For operator classes based on <literal>minmax</literal>, + the behavior required is the same as for B-tree operator families: + all the operators in the family must sort compatibly, and casts must + not change the associated sort ordering. + </para> + + <note> + <para> + Prior to <productname>PostgreSQL</productname> 8.3, there was no concept + of operator families, and so any cross-data-type operators intended to be + used with an index had to be bound directly into the index's operator + class. While this approach still works, it is deprecated because it + makes an index's dependencies too broad, and because the planner can + handle cross-data-type comparisons more effectively when both data types + have operators in the same operator family. + </para> + </note> + </sect2> + + <sect2 id="xindex-opclass-dependencies"> + <title>System Dependencies on Operator Classes</title> + + <indexterm> + <primary>ordering operator</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> uses operator classes to infer the + properties of operators in more ways than just whether they can be used + with indexes. Therefore, you might want to create operator classes + even if you have no intention of indexing any columns of your data type. + </para> + + <para> + In particular, there are SQL features such as <literal>ORDER BY</literal> and + <literal>DISTINCT</literal> that require comparison and sorting of values. + To implement these features on a user-defined data type, + <productname>PostgreSQL</productname> looks for the default B-tree operator + class for the data type. The <quote>equals</quote> member of this operator + class defines the system's notion of equality of values for + <literal>GROUP BY</literal> and <literal>DISTINCT</literal>, and the sort ordering + imposed by the operator class defines the default <literal>ORDER BY</literal> + ordering. + </para> + + <para> + If there is no default B-tree operator class for a data type, the system + will look for a default hash operator class. But since that kind of + operator class only provides equality, it is only able to support grouping + not sorting. + </para> + + <para> + When there is no default operator class for a data type, you will get + errors like <quote>could not identify an ordering operator</quote> if you + try to use these SQL features with the data type. + </para> + + <note> + <para> + In <productname>PostgreSQL</productname> versions before 7.4, + sorting and grouping operations would implicitly use operators named + <literal>=</literal>, <literal><</literal>, and <literal>></literal>. The new + behavior of relying on default operator classes avoids having to make + any assumption about the behavior of operators with particular names. + </para> + </note> + + <para> + Sorting by a non-default B-tree operator class is possible by specifying + the class's less-than operator in a <literal>USING</literal> option, + for example +<programlisting> +SELECT * FROM mytable ORDER BY somecol USING ~<~; +</programlisting> + Alternatively, specifying the class's greater-than operator + in <literal>USING</literal> selects a descending-order sort. + </para> + + <para> + Comparison of arrays of a user-defined type also relies on the semantics + defined by the type's default B-tree operator class. If there is no + default B-tree operator class, but there is a default hash operator class, + then array equality is supported, but not ordering comparisons. + </para> + + <para> + Another SQL feature that requires even more data-type-specific knowledge + is the <literal>RANGE</literal> <replaceable>offset</replaceable> + <literal>PRECEDING</literal>/<literal>FOLLOWING</literal> framing option + for window functions (see <xref linkend="syntax-window-functions"/>). + For a query such as +<programlisting> +SELECT sum(x) OVER (ORDER BY x RANGE BETWEEN 5 PRECEDING AND 10 FOLLOWING) + FROM mytable; +</programlisting> + it is not sufficient to know how to order by <literal>x</literal>; + the database must also understand how to <quote>subtract 5</quote> or + <quote>add 10</quote> to the current row's value of <literal>x</literal> + to identify the bounds of the current window frame. Comparing the + resulting bounds to other rows' values of <literal>x</literal> is + possible using the comparison operators provided by the B-tree operator + class that defines the <literal>ORDER BY</literal> ordering — but + addition and subtraction operators are not part of the operator class, so + which ones should be used? Hard-wiring that choice would be undesirable, + because different sort orders (different B-tree operator classes) might + need different behavior. Therefore, a B-tree operator class can specify + an <firstterm>in_range</firstterm> support function that encapsulates the + addition and subtraction behaviors that make sense for its sort order. + It can even provide more than one in_range support function, in case + there is more than one data type that makes sense to use as the offset + in <literal>RANGE</literal> clauses. + If the B-tree operator class associated with the window's <literal>ORDER + BY</literal> clause does not have a matching in_range support function, + the <literal>RANGE</literal> <replaceable>offset</replaceable> + <literal>PRECEDING</literal>/<literal>FOLLOWING</literal> + option is not supported. + </para> + + <para> + Another important point is that an equality operator that + appears in a hash operator family is a candidate for hash joins, + hash aggregation, and related optimizations. The hash operator family + is essential here since it identifies the hash function(s) to use. + </para> + </sect2> + + <sect2 id="xindex-ordering-ops"> + <title>Ordering Operators</title> + + <para> + Some index access methods (currently, only GiST and SP-GiST) support the concept of + <firstterm>ordering operators</firstterm>. What we have been discussing so far + are <firstterm>search operators</firstterm>. A search operator is one for which + the index can be searched to find all rows satisfying + <literal>WHERE</literal> + <replaceable>indexed_column</replaceable> + <replaceable>operator</replaceable> + <replaceable>constant</replaceable>. + Note that nothing is promised about the order in which the matching rows + will be returned. In contrast, an ordering operator does not restrict the + set of rows that can be returned, but instead determines their order. + An ordering operator is one for which the index can be scanned to return + rows in the order represented by + <literal>ORDER BY</literal> + <replaceable>indexed_column</replaceable> + <replaceable>operator</replaceable> + <replaceable>constant</replaceable>. + The reason for defining ordering operators that way is that it supports + nearest-neighbor searches, if the operator is one that measures distance. + For example, a query like +<programlisting><![CDATA[ +SELECT * FROM places ORDER BY location <-> point '(101,456)' LIMIT 10; +]]> +</programlisting> + finds the ten places closest to a given target point. A GiST index + on the location column can do this efficiently because + <literal><-></literal> is an ordering operator. + </para> + + <para> + While search operators have to return Boolean results, ordering operators + usually return some other type, such as float or numeric for distances. + This type is normally not the same as the data type being indexed. + To avoid hard-wiring assumptions about the behavior of different data + types, the definition of an ordering operator is required to name + a B-tree operator family that specifies the sort ordering of the result + data type. As was stated in the previous section, B-tree operator families + define <productname>PostgreSQL</productname>'s notion of ordering, so + this is a natural representation. Since the point <literal><-></literal> + operator returns <type>float8</type>, it could be specified in an operator + class creation command like this: +<programlisting><![CDATA[ +OPERATOR 15 <-> (point, point) FOR ORDER BY float_ops +]]> +</programlisting> + where <literal>float_ops</literal> is the built-in operator family that includes + operations on <type>float8</type>. This declaration states that the index + is able to return rows in order of increasing values of the + <literal><-></literal> operator. + </para> + </sect2> + + <sect2 id="xindex-opclass-features"> + <title>Special Features of Operator Classes</title> + + <para> + There are two special features of operator classes that we have + not discussed yet, mainly because they are not useful + with the most commonly used index methods. + </para> + + <para> + Normally, declaring an operator as a member of an operator class + (or family) means that the index method can retrieve exactly the set of rows + that satisfy a <literal>WHERE</literal> condition using the operator. For example: +<programlisting> +SELECT * FROM table WHERE integer_column < 4; +</programlisting> + can be satisfied exactly by a B-tree index on the integer column. + But there are cases where an index is useful as an inexact guide to + the matching rows. For example, if a GiST index stores only bounding boxes + for geometric objects, then it cannot exactly satisfy a <literal>WHERE</literal> + condition that tests overlap between nonrectangular objects such as + polygons. Yet we could use the index to find objects whose bounding + box overlaps the bounding box of the target object, and then do the + exact overlap test only on the objects found by the index. If this + scenario applies, the index is said to be <quote>lossy</quote> for the + operator. Lossy index searches are implemented by having the index + method return a <firstterm>recheck</firstterm> flag when a row might or might + not really satisfy the query condition. The core system will then + test the original query condition on the retrieved row to see whether + it should be returned as a valid match. This approach works if + the index is guaranteed to return all the required rows, plus perhaps + some additional rows, which can be eliminated by performing the original + operator invocation. The index methods that support lossy searches + (currently, GiST, SP-GiST and GIN) allow the support functions of individual + operator classes to set the recheck flag, and so this is essentially an + operator-class feature. + </para> + + <para> + Consider again the situation where we are storing in the index only + the bounding box of a complex object such as a polygon. In this + case there's not much value in storing the whole polygon in the index + entry — we might as well store just a simpler object of type + <type>box</type>. This situation is expressed by the <literal>STORAGE</literal> + option in <command>CREATE OPERATOR CLASS</command>: we'd write something like: + +<programlisting> +CREATE OPERATOR CLASS polygon_ops + DEFAULT FOR TYPE polygon USING gist AS + ... + STORAGE box; +</programlisting> + + At present, only the GiST, GIN and BRIN index methods support a + <literal>STORAGE</literal> type that's different from the column data type. + The GiST <function>compress</function> and <function>decompress</function> support + routines must deal with data-type conversion when <literal>STORAGE</literal> + is used. In GIN, the <literal>STORAGE</literal> type identifies the type of + the <quote>key</quote> values, which normally is different from the type + of the indexed column — for example, an operator class for + integer-array columns might have keys that are just integers. The + GIN <function>extractValue</function> and <function>extractQuery</function> support + routines are responsible for extracting keys from indexed values. + BRIN is similar to GIN: the <literal>STORAGE</literal> type identifies the + type of the stored summary values, and operator classes' support + procedures are responsible for interpreting the summary values + correctly. + </para> + </sect2> + +</sect1> |