summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/textsearch-indexes.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/html/textsearch-indexes.html')
-rw-r--r--doc/src/sgml/html/textsearch-indexes.html80
1 files changed, 80 insertions, 0 deletions
diff --git a/doc/src/sgml/html/textsearch-indexes.html b/doc/src/sgml/html/textsearch-indexes.html
new file mode 100644
index 0000000..289e4bb
--- /dev/null
+++ b/doc/src/sgml/html/textsearch-indexes.html
@@ -0,0 +1,80 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>12.9. Preferred Index Types for Text Search</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="textsearch-debugging.html" title="12.8. Testing and Debugging Text Search" /><link rel="next" href="textsearch-psql.html" title="12.10. psql Support" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">12.9. Preferred Index Types for Text Search</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="textsearch-debugging.html" title="12.8. Testing and Debugging Text Search">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><th width="60%" align="center">Chapter 12. Full Text Search</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="textsearch-psql.html" title="12.10. psql Support">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="TEXTSEARCH-INDEXES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">12.9. Preferred Index Types for Text Search</h2></div></div></div><a id="id-1.5.11.12.2" class="indexterm"></a><p>
+ There are two kinds of indexes that can be used to speed up full text
+ searches:
+ <a class="link" href="gin.html" title="Chapter 67. GIN Indexes"><acronym class="acronym">GIN</acronym></a> and
+ <a class="link" href="gist.html" title="Chapter 65. GiST Indexes"><acronym class="acronym">GiST</acronym></a>.
+ Note that indexes are not mandatory for full text searching, but in
+ cases where a column is searched on a regular basis, an index is
+ usually desirable.
+ </p><p>
+ To create such an index, do one of:
+
+ </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">
+ <a id="id-1.5.11.12.4.1.1.1.1" class="indexterm"></a>
+
+ <code class="literal">CREATE INDEX <em class="replaceable"><code>name</code></em> ON <em class="replaceable"><code>table</code></em> USING GIN (<em class="replaceable"><code>column</code></em>);</code>
+ </span></dt><dd><p>
+ Creates a GIN (Generalized Inverted Index)-based index.
+ The <em class="replaceable"><code>column</code></em> must be of <code class="type">tsvector</code> type.
+ </p></dd><dt><span class="term">
+ <a id="id-1.5.11.12.4.1.2.1.1" class="indexterm"></a>
+
+ <code class="literal">CREATE INDEX <em class="replaceable"><code>name</code></em> ON <em class="replaceable"><code>table</code></em> USING GIST (<em class="replaceable"><code>column</code></em> [ { DEFAULT | tsvector_ops } (siglen = <em class="replaceable"><code>number</code></em>) ] );</code>
+ </span></dt><dd><p>
+ Creates a GiST (Generalized Search Tree)-based index.
+ The <em class="replaceable"><code>column</code></em> can be of <code class="type">tsvector</code> or
+ <code class="type">tsquery</code> type.
+ Optional integer parameter <code class="literal">siglen</code> determines
+ signature length in bytes (see below for details).
+ </p></dd></dl></div><p>
+ </p><p>
+ GIN indexes are the preferred text search index type. As inverted
+ indexes, they contain an index entry for each word (lexeme), with a
+ compressed list of matching locations. Multi-word searches can find
+ the first match, then use the index to remove rows that are lacking
+ additional words. GIN indexes store only the words (lexemes) of
+ <code class="type">tsvector</code> values, and not their weight labels. Thus a table
+ row recheck is needed when using a query that involves weights.
+ </p><p>
+ A GiST index is <em class="firstterm">lossy</em>, meaning that the index
+ might produce false matches, and it is necessary
+ to check the actual table row to eliminate such false matches.
+ (<span class="productname">PostgreSQL</span> does this automatically when needed.)
+ GiST indexes are lossy because each document is represented in the
+ index by a fixed-length signature. The signature length in bytes is determined
+ by the value of the optional integer parameter <code class="literal">siglen</code>.
+ The default signature length (when <code class="literal">siglen</code> is not specified) is
+ 124 bytes, the maximum signature length is 2024 bytes. The signature is generated by hashing
+ each word into a single bit in an n-bit string, with all these bits OR-ed
+ together to produce an n-bit document signature. When two words hash to
+ the same bit position there will be a false match. If all words in
+ the query have matches (real or false) then the table row must be
+ retrieved to see if the match is correct. Longer signatures lead to a more
+ precise search (scanning a smaller fraction of the index and fewer heap
+ pages), at the cost of a larger index.
+ </p><p>
+ A GiST index can be covering, i.e., use the <code class="literal">INCLUDE</code>
+ clause. Included columns can have data types without any GiST operator
+ class. Included attributes will be stored uncompressed.
+ </p><p>
+ Lossiness causes performance degradation due to unnecessary fetches of table
+ records that turn out to be false matches. Since random access to table
+ records is slow, this limits the usefulness of GiST indexes. The
+ likelihood of false matches depends on several factors, in particular the
+ number of unique words, so using dictionaries to reduce this number is
+ recommended.
+ </p><p>
+ Note that <acronym class="acronym">GIN</acronym> index build time can often be improved
+ by increasing <a class="xref" href="runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM">maintenance_work_mem</a>, while
+ <acronym class="acronym">GiST</acronym> index build time is not sensitive to that
+ parameter.
+ </p><p>
+ Partitioning of big collections and the proper use of GIN and GiST indexes
+ allows the implementation of very fast searches with online update.
+ Partitioning can be done at the database level using table inheritance,
+ or by distributing documents over
+ servers and collecting external search results, e.g., via <a class="link" href="ddl-foreign-data.html" title="5.12. Foreign Data">Foreign Data</a> access.
+ The latter is possible because ranking functions use
+ only local information.
+ </p></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="textsearch-debugging.html" title="12.8. Testing and Debugging Text Search">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="textsearch-psql.html" title="12.10. psql Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">12.8. Testing and Debugging Text Search </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 14.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 12.10. <span xmlns="http://www.w3.org/1999/xhtml" class="application">psql</span> Support</td></tr></table></div></body></html> \ No newline at end of file