author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-13 13:44:03 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-13 13:44:03 +0000
commit: 293913568e6a7a86fd1479e1cff8e2ecb58d6568
tree: fc3b469a3ec5ab71b36ea97cc7aaddb838423a0c /doc/src/sgml/html/gin-extensibility.html
parent: Initial commit.
Adding upstream version 16.2. [upstream/16.2]
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/gin-extensibility.html')
-rw-r--r-- doc/src/sgml/html/gin-extensibility.html 237
1 files changed, 237 insertions, 0 deletions
diff --git a/doc/src/sgml/html/gin-extensibility.html b/doc/src/sgml/html/gin-extensibility.html
new file mode 100644
index 0000000..1dca272
--- /dev/null
+++ b/doc/src/sgml/html/gin-extensibility.html
@@ -0,0 +1,237 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>70.3. Extensibility</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="gin-builtin-opclasses.html" title="70.2. Built-in Operator Classes" /><link rel="next" href="gin-implementation.html" title="70.4. Implementation" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">70.3. Extensibility</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="gin-builtin-opclasses.html" title="70.2. Built-in Operator Classes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="gin.html" title="Chapter 70. GIN Indexes">Up</a></td><th width="60%" align="center">Chapter 70. GIN Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="gin-implementation.html" title="70.4. Implementation">Next</a></td></tr></table><hr /></div><div class="sect1" id="GIN-EXTENSIBILITY"><div class="titlepage"><div><div><h2 class="title" style="clear: both">70.3. Extensibility <a href="#GIN-EXTENSIBILITY" class="id_link">#</a></h2></div></div></div><p>
+ The <acronym class="acronym">GIN</acronym> interface has a high level of abstraction,
+ requiring the access method implementer only to implement the semantics of
+ the data type being accessed. The <acronym class="acronym">GIN</acronym> layer itself
+ takes care of concurrency, logging and searching the tree structure.
+ </p><p>
+ All it takes to get a <acronym class="acronym">GIN</acronym> access method working is to
+ implement a few user-defined methods, which define the behavior of
+ keys in the tree and the relationships between keys, indexed items,
+ and indexable queries. In short, <acronym class="acronym">GIN</acronym> combines
+ extensibility with generality, code reuse, and a clean interface.
+ </p><p>
+ There are two methods that an operator class for
+ <acronym class="acronym">GIN</acronym> must provide:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">Datum *extractValue(Datum itemValue, int32 *nkeys,
+ bool **nullFlags)</code></span></dt><dd><p>
+ Returns a palloc'd array of keys given an item to be indexed. The
+ number of returned keys must be stored into <code class="literal">*nkeys</code>.
+ If any of the keys can be null, also palloc an array of
+ <code class="literal">*nkeys</code> <code class="type">bool</code> fields, store its address at
+ <code class="literal">*nullFlags</code>, and set these null flags as needed.
+ <code class="literal">*nullFlags</code> can be left <code class="symbol">NULL</code> (its initial value)
+ if all keys are non-null.
+ The return value can be <code class="symbol">NULL</code> if the item contains no keys.
+ </p></dd><dt><span class="term"><code class="function">Datum *extractQuery(Datum query, int32 *nkeys,
+ StrategyNumber n, bool **pmatch, Pointer **extra_data,
+ bool **nullFlags, int32 *searchMode)</code></span></dt><dd><p>
+ Returns a palloc'd array of keys given a value to be queried; that is,
+ <code class="literal">query</code> is the value on the right-hand side of an
+ indexable operator whose left-hand side is the indexed column.
+ <code class="literal">n</code> is the strategy number of the operator within the
+ operator class (see <a class="xref" href="xindex.html#XINDEX-STRATEGIES" title="38.16.2. Index Method Strategies">Section 38.16.2</a>).
+ Often, <code class="function">extractQuery</code> will need
+ to consult <code class="literal">n</code> to determine the data type of
+ <code class="literal">query</code> and the method it should use to extract key values.
+ The number of returned keys must be stored into <code class="literal">*nkeys</code>.
+ If any of the keys can be null, also palloc an array of
+ <code class="literal">*nkeys</code> <code class="type">bool</code> fields, store its address at
+ <code class="literal">*nullFlags</code>, and set these null flags as needed.
+ <code class="literal">*nullFlags</code> can be left <code class="symbol">NULL</code> (its initial value)
+ if all keys are non-null.
+ The return value can be <code class="symbol">NULL</code> if the <code class="literal">query</code> contains no keys.
+ </p><p>
+ <code class="literal">searchMode</code> is an output argument that allows
+ <code class="function">extractQuery</code> to specify details about how the search
+ will be done.
+ If <code class="literal">*searchMode</code> is set to
+ <code class="literal">GIN_SEARCH_MODE_DEFAULT</code> (the value it is
+ initialized to before the call), only items that match at least one of
+ the returned keys are considered candidate matches.
+ If <code class="literal">*searchMode</code> is set to
+ <code class="literal">GIN_SEARCH_MODE_INCLUDE_EMPTY</code>, then in addition to items
+ containing at least one matching key, items that contain no keys at
+ all are considered candidate matches. (This mode is useful for
+ implementing is-subset-of operators, for example.)
+ If <code class="literal">*searchMode</code> is set to <code class="literal">GIN_SEARCH_MODE_ALL</code>,
+ then all non-null items in the index are considered candidate
+ matches, whether they match any of the returned keys or not. (This
+ mode is much slower than the other two choices, since it requires
+ scanning essentially the entire index, but it may be necessary to
+ implement corner cases correctly. An operator that needs this mode
+ in most cases is probably not a good candidate for a GIN operator
+ class.)
+ The symbols to use for setting this mode are defined in
+ <code class="filename">access/gin.h</code>.
+ </p><p>
+ <code class="literal">pmatch</code> is an output argument for use when partial match
+ is supported. To use it, <code class="function">extractQuery</code> must allocate
+ an array of <code class="literal">*nkeys</code> <code class="type">bool</code>s and store its address at
+ <code class="literal">*pmatch</code>. Each element of the array should be set to true
+ if the corresponding key requires partial match, false if not.
+ If <code class="literal">*pmatch</code> is set to <code class="symbol">NULL</code> then GIN assumes partial match
+ is not required. The variable is initialized to <code class="symbol">NULL</code> before the call,
+ so this argument can simply be ignored by operator classes that do
+ not support partial match.
+ </p><p>
+ <code class="literal">extra_data</code> is an output argument that allows
+ <code class="function">extractQuery</code> to pass additional data to the
+ <code class="function">consistent</code> and <code class="function">comparePartial</code> methods.
+ To use it, <code class="function">extractQuery</code> must allocate
+ an array of <code class="literal">*nkeys</code> pointers and store its address at
+ <code class="literal">*extra_data</code>, then store whatever it needs in the
+ individual pointers. The variable is initialized to <code class="symbol">NULL</code> before
+ the call, so this argument can simply be ignored by operator classes that
+ do not require extra data. If <code class="literal">*extra_data</code> is set, the
+ whole array is passed to the <code class="function">consistent</code> method, and
+ the appropriate element to the <code class="function">comparePartial</code> method.
+ </p></dd></dl></div><p>
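+ The output-argument contract of <code class="function">extractValue</code> described above
+ can be modeled outside the server. The following is a minimal sketch, not
+ PostgreSQL code: <code class="literal">ModelKey</code> and
+ <code class="literal">model_extract_value</code> are hypothetical names, a plain
+ integer stands in for <code class="type">Datum</code>, and
+ <code class="function">malloc</code> stands in for <code class="function">palloc</code>.
+ </p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Datum: in this model a key is a plain integer. */
typedef int32_t ModelKey;

/*
 * Model of the extractValue contract for an item that is an array of
 * integers; item_nulls[i] (optional, may be NULL) marks a null element.
 *
 *  - the number of keys is stored into *nkeys
 *  - *nullFlags is allocated only if at least one key is null,
 *    otherwise it is left NULL
 *  - the return value is NULL when the item contains no keys
 *
 * malloc/calloc stand in for palloc; the caller frees both arrays.
 */
ModelKey *
model_extract_value(const int32_t *item, const bool *item_nulls,
                    int32_t nitems, int32_t *nkeys, bool **nullFlags)
{
    assert(nkeys != NULL && nullFlags != NULL);

    *nkeys = 0;
    *nullFlags = NULL;          /* the real caller initializes this to NULL */

    if (nitems <= 0)
        return NULL;            /* item contains no keys */

    ModelKey *keys = malloc(sizeof(ModelKey) * (size_t) nitems);
    if (keys == NULL)
        return NULL;

    /* Allocate the flag array only when some key is actually null. */
    bool any_null = false;
    for (int32_t i = 0; item_nulls != NULL && i < nitems; i++)
        any_null = any_null || item_nulls[i];

    if (any_null)
    {
        *nullFlags = calloc((size_t) nitems, sizeof(bool));
        if (*nullFlags == NULL)
        {
            free(keys);
            return NULL;
        }
    }

    for (int32_t i = 0; i < nitems; i++)
    {
        bool is_null = any_null && item_nulls[i];

        keys[i] = is_null ? 0 : item[i];   /* value of a null key is unused */
        if (any_null)
            (*nullFlags)[i] = is_null;
    }

    *nkeys = nitems;
    return keys;
}
```

+ <p>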
+
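+ The three search modes described for <code class="function">extractQuery</code>
+ differ only in which items count as candidate matches. The sketch below
+ models that rule under stated assumptions: the enum and function names are
+ hypothetical (the real symbols are the <code class="literal">GIN_SEARCH_MODE_*</code>
+ constants in <code class="filename">access/gin.h</code>), and the item is assumed
+ to be non-null.
+ </p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three search modes; values are arbitrary here. */
typedef enum
{
    MODEL_SEARCH_DEFAULT,
    MODEL_SEARCH_INCLUDE_EMPTY,
    MODEL_SEARCH_ALL
} ModelSearchMode;

/*
 * Decide whether a non-null indexed item is a candidate match.
 *   item_nkeys   - number of keys the indexed item contains
 *   matched_keys - how many of the query's keys appear in the item
 */
bool
model_is_candidate(ModelSearchMode mode, int32_t item_nkeys,
                   int32_t matched_keys)
{
    switch (mode)
    {
        case MODEL_SEARCH_DEFAULT:
            /* at least one returned key must match */
            return matched_keys > 0;
        case MODEL_SEARCH_INCLUDE_EMPTY:
            /* as above, plus items that contain no keys at all */
            return matched_keys > 0 || item_nkeys == 0;
        case MODEL_SEARCH_ALL:
            /* every non-null item, regardless of key matches */
            return true;
    }
    return false;
}
```

+ <p>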
+ An operator class must also provide a function to check if an indexed item
+ matches the query. It comes in two flavors, a Boolean <code class="function">consistent</code>
+ function, and a ternary <code class="function">triConsistent</code> function.
+ <code class="function">triConsistent</code> covers the functionality of both, so providing
+ <code class="function">triConsistent</code> alone is sufficient. However, if the Boolean
+ variant is significantly cheaper to calculate, it can be advantageous to
+ provide both. If only the Boolean variant is provided, some optimizations
+ that depend on refuting index items before fetching all the keys are
+ disabled.
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">bool consistent(bool check[], StrategyNumber n, Datum query,
+ int32 nkeys, Pointer extra_data[], bool *recheck,
+ Datum queryKeys[], bool nullFlags[])</code></span></dt><dd><p>
+ Returns true if an indexed item satisfies the query operator with
+ strategy number <code class="literal">n</code> (or might satisfy it, if the recheck
+ indication is returned). This function does not have direct access
+ to the indexed item's value, since <acronym class="acronym">GIN</acronym> does not
+ store items explicitly. Rather, what is available is knowledge
+ about which key values extracted from the query appear in a given
+ indexed item. The <code class="literal">check</code> array has length
+ <code class="literal">nkeys</code>, which is the same as the number of keys previously
+ returned by <code class="function">extractQuery</code> for this <code class="literal">query</code> datum.
+ Each element of the
+ <code class="literal">check</code> array is true if the indexed item contains the
+ corresponding query key; that is, if <code class="literal">check[i]</code> is true, the i-th key of the
+ <code class="function">extractQuery</code> result array is present in the indexed item.
+ The original <code class="literal">query</code> datum is
+ passed in case the <code class="function">consistent</code> method needs to consult it,
+ and so are the <code class="literal">queryKeys[]</code> and <code class="literal">nullFlags[]</code>
+ arrays previously returned by <code class="function">extractQuery</code>.
+ <code class="literal">extra_data</code> is the extra-data array returned by
+ <code class="function">extractQuery</code>, or <code class="symbol">NULL</code> if none.
+ </p><p>
+ When <code class="function">extractQuery</code> returns a null key in
+ <code class="literal">queryKeys[]</code>, the corresponding <code class="literal">check[]</code> element
+ is true if the indexed item contains a null key; that is, the
+ semantics of <code class="literal">check[]</code> are like <code class="literal">IS NOT DISTINCT
+ FROM</code>. The <code class="function">consistent</code> function can examine the
+ corresponding <code class="literal">nullFlags[]</code> element if it needs to tell
+ the difference between a regular value match and a null match.
+ </p><p>
+ On success, <code class="literal">*recheck</code> should be set to true if the heap
+ tuple needs to be rechecked against the query operator, or false if
+ the index test is exact. That is, a false return value guarantees
+ that the heap tuple does not match the query; a true return value with
+ <code class="literal">*recheck</code> set to false guarantees that the heap tuple does
+ match the query; and a true return value with
+ <code class="literal">*recheck</code> set to true means that the heap tuple might match
+ the query, so it needs to be fetched and rechecked by evaluating the
+ query operator directly against the originally indexed item.
+ </p></dd><dt><span class="term"><code class="function">GinTernaryValue triConsistent(GinTernaryValue check[], StrategyNumber n, Datum query,
+ int32 nkeys, Pointer extra_data[],
+ Datum queryKeys[], bool nullFlags[])</code></span></dt><dd><p>
+ <code class="function">triConsistent</code> is similar to <code class="function">consistent</code>,
+ but instead of Booleans in the <code class="literal">check</code> vector, there are
+ three possible values for each
+ key: <code class="literal">GIN_TRUE</code>, <code class="literal">GIN_FALSE</code> and
+ <code class="literal">GIN_MAYBE</code>. <code class="literal">GIN_FALSE</code> and <code class="literal">GIN_TRUE</code>
+ have the same meaning as regular Boolean values, while
+ <code class="literal">GIN_MAYBE</code> means that the presence of that key is not known.
+ When <code class="literal">GIN_MAYBE</code> values are present, the function must
+ return <code class="literal">GIN_TRUE</code> only if the item certainly matches, whether or
+ not the index item contains the corresponding query keys. Likewise, the
+ function must return <code class="literal">GIN_FALSE</code> only if the item certainly
+ does not match, whether or not it contains the <code class="literal">GIN_MAYBE</code>
+ keys. If the result depends on the <code class="literal">GIN_MAYBE</code> entries, i.e.,
+ the match cannot be confirmed or refuted based on the known query keys,
+ the function must return <code class="literal">GIN_MAYBE</code>.
+ </p><p>
+ When there are no <code class="literal">GIN_MAYBE</code> values in the <code class="literal">check</code>
+ vector, a <code class="literal">GIN_MAYBE</code> return value is equivalent to
+ setting the <code class="literal">recheck</code> flag in the
+ Boolean <code class="function">consistent</code> function.
+ </p></dd></dl></div><p>
+ </p><p>
+ In addition, GIN must have a way to sort the key values stored in the index.
+ The operator class can define the sort ordering by specifying a comparison
+ method:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">int compare(Datum a, Datum b)</code></span></dt><dd><p>
+ Compares two keys (not indexed items!) and returns an integer less than
+ zero, zero, or greater than zero, indicating whether the first key is
+ less than, equal to, or greater than the second. Null keys are never
+ passed to this function.
+ </p></dd></dl></div><p>
+
+ Alternatively, if the operator class does not provide a <code class="function">compare</code>
+ method, GIN will look up the default btree operator class for the index
+ key data type, and use its comparison function. It is recommended to
+ specify the comparison function in a GIN operator class that is meant for
+ just one data type, as looking up the btree operator class costs a few
+ cycles. However, polymorphic GIN operator classes (such
+ as <code class="literal">array_ops</code>) typically cannot specify a single comparison
+ function.
+ </p><p>
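+ The three-way result required of <code class="function">compare</code> can be
+ sketched for integer keys as follows. The function name is hypothetical and
+ plain <code class="type">int32_t</code> values stand in for
+ <code class="type">Datum</code>; the subtraction-free form is used here only to
+ avoid signed overflow.
+ </p>

```c
#include <stdint.h>

/*
 * Three-way comparison of two integer keys: negative if a < b,
 * zero if a == b, positive if a > b.  Comparing instead of
 * subtracting avoids overflow for operands of opposite sign.
 */
int
model_compare_keys(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}
```

+ <p>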
+ An operator class for <acronym class="acronym">GIN</acronym> can optionally supply the
+ following methods:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">int comparePartial(Datum partial_key, Datum key, StrategyNumber n,
+ Pointer extra_data)</code></span></dt><dd><p>
+ Compares a partial-match query key to an index key. Returns an integer
+ whose sign indicates the result: less than zero means the index key
+ does not match the query, but the index scan should continue; zero
+ means that the index key does match the query; greater than zero
+ indicates that the index scan should stop because no more matches
+ are possible. The strategy number <code class="literal">n</code> of the operator
+ that generated the partial-match query is provided, in case its
+ semantics are needed to determine when to end the scan. Also,
+ <code class="literal">extra_data</code> is the corresponding element of the extra-data
+ array made by <code class="function">extractQuery</code>, or <code class="symbol">NULL</code> if none.
+ Null keys are never passed to this function.
+ </p></dd><dt><span class="term"><code class="function">void options(local_relopts *relopts)</code></span></dt><dd><p>
+ Defines a set of user-visible parameters that control operator class
+ behavior.
+ </p><p>
+ The <code class="function">options</code> function is passed a pointer to a
+ <code class="structname">local_relopts</code> struct, which needs to be
+ filled with a set of operator class specific options. The options
+ can be accessed from other support functions using the
+ <code class="literal">PG_HAS_OPCLASS_OPTIONS()</code> and
+ <code class="literal">PG_GET_OPCLASS_OPTIONS()</code> macros.
+ </p><p>
+ Since both key extraction of indexed values and representation of the
+ key in <acronym class="acronym">GIN</acronym> are flexible, they may depend on
+ user-specified parameters.
+ </p></dd></dl></div><p>
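+ The sign convention of <code class="function">comparePartial</code> can be
+ illustrated with a prefix match over string keys. This is a minimal sketch
+ with a hypothetical function name; it assumes keys are C strings that the
+ index orders bytewise (as <code class="function">strcmp</code> would), and it
+ omits the strategy number and extra data.
+ </p>

```c
#include <string.h>

/*
 * Prefix-match model of comparePartial.
 *   0  : key starts with partial_key, so it matches
 *   >0 : key sorts after every string with this prefix, so the
 *        scan can stop
 *   <0 : key sorts before the prefix range, so it does not match
 *        but the scan should continue
 */
int
model_compare_partial_prefix(const char *partial_key, const char *key)
{
    size_t plen = strlen(partial_key);
    int    cmp = strncmp(key, partial_key, plen);

    if (cmp == 0)
        return 0;
    return (cmp > 0) ? 1 : -1;
}
```

+ <p>
+ With the prefix <code class="literal">ab</code>, the key <code class="literal">abc</code>
+ matches, <code class="literal">aa</code> is skipped, and <code class="literal">ac</code>
+ ends the scan.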
+ </p><p>
+ To support <span class="quote">“<span class="quote">partial match</span>”</span> queries, an operator class must
+ provide the <code class="function">comparePartial</code> method, and its
+ <code class="function">extractQuery</code> method must set the <code class="literal">pmatch</code>
+ parameter when a partial-match query is encountered. See
+ <a class="xref" href="gin-implementation.html#GIN-PARTIAL-MATCH" title="70.4.2. Partial Match Algorithm">Section 70.4.2</a> for details.
+ </p><p>
+ The actual data types of the various <code class="literal">Datum</code> values mentioned
+ above vary depending on the operator class. The item values passed to
+ <code class="function">extractValue</code> are always of the operator class's input type, and
+ all key values must be of the class's <code class="literal">STORAGE</code> type. The type of
+ the <code class="literal">query</code> argument passed to <code class="function">extractQuery</code>,
+ <code class="function">consistent</code> and <code class="function">triConsistent</code> is the
+ right-hand input type of the class member operator identified by the
+ strategy number. This need not be the same as the indexed type, so long as
+ key values of the correct type can be extracted from it. However, it is
+ recommended that the SQL declarations of these three support functions use
+ the opclass's indexed data type for the <code class="literal">query</code> argument, even
+ though the actual type might be something else depending on the operator.
+ </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="gin-builtin-opclasses.html" title="70.2. Built-in Operator Classes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="gin.html" title="Chapter 70. GIN Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="gin-implementation.html" title="70.4. Implementation">Next</a></td></tr><tr><td width="40%" align="left" valign="top">70.2. Built-in Operator Classes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 70.4. Implementation</td></tr></table></div></body></html> \ No newline at end of file