diff options
Diffstat (limited to 'doc/src/sgml/html/textsearch-tables.html')
-rw-r--r-- | doc/src/sgml/html/textsearch-tables.html | 139 |
1 files changed, 139 insertions, 0 deletions
diff --git a/doc/src/sgml/html/textsearch-tables.html b/doc/src/sgml/html/textsearch-tables.html new file mode 100644 index 0000000..6ce2688 --- /dev/null +++ b/doc/src/sgml/html/textsearch-tables.html @@ -0,0 +1,139 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>12.2. Tables and Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="textsearch-intro.html" title="12.1. Introduction" /><link rel="next" href="textsearch-controls.html" title="12.3. Controlling Text Search" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">12.2. Tables and Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="textsearch-intro.html" title="12.1. Introduction">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><th width="60%" align="center">Chapter 12. Full Text Search</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="textsearch-controls.html" title="12.3. Controlling Text Search">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="TEXTSEARCH-TABLES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">12.2. Tables and Indexes</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH">12.2.1. Searching a Table</a></span></dt><dt><span class="sect2"><a href="textsearch-tables.html#TEXTSEARCH-TABLES-INDEX">12.2.2. Creating Indexes</a></span></dt></dl></div><p> + The examples in the previous section illustrated full text matching using + simple constant strings. This section shows how to search table data, + optionally using indexes. + </p><div class="sect2" id="TEXTSEARCH-TABLES-SEARCH"><div class="titlepage"><div><div><h3 class="title">12.2.1. Searching a Table</h3></div></div></div><p> + It is possible to do a full text search without an index. A simple query + to print the <code class="structname">title</code> of each row that contains the word + <code class="literal">friend</code> in its <code class="structfield">body</code> field is: + +</p><pre class="programlisting"> +SELECT title +FROM pgweb +WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend'); +</pre><p> + + This will also find related words such as <code class="literal">friends</code> + and <code class="literal">friendly</code>, since all these are reduced to the same + normalized lexeme. + </p><p> + The query above specifies that the <code class="literal">english</code> configuration + is to be used to parse and normalize the strings. Alternatively we + could omit the configuration parameters: + +</p><pre class="programlisting"> +SELECT title +FROM pgweb +WHERE to_tsvector(body) @@ to_tsquery('friend'); +</pre><p> + + This query will use the configuration set by <a class="xref" href="runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG">default_text_search_config</a>. + </p><p> + A more complex example is to + select the ten most recent documents that contain <code class="literal">create</code> and + <code class="literal">table</code> in the <code class="structname">title</code> or <code class="structname">body</code>: + +</p><pre class="programlisting"> +SELECT title +FROM pgweb +WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') +ORDER BY last_mod_date DESC +LIMIT 10; +</pre><p> + + For clarity we omitted the <code class="function">coalesce</code> function calls + which would be needed to find rows that contain <code class="literal">NULL</code> + in one of the two fields. + </p><p> + Although these queries will work without an index, most applications + will find this approach too slow, except perhaps for occasional ad-hoc + searches. Practical use of text searching usually requires creating + an index. + </p></div><div class="sect2" id="TEXTSEARCH-TABLES-INDEX"><div class="titlepage"><div><div><h3 class="title">12.2.2. Creating Indexes</h3></div></div></div><p> + We can create a <acronym class="acronym">GIN</acronym> index (<a class="xref" href="textsearch-indexes.html" title="12.9. GIN and GiST Index Types">Section 12.9</a>) to speed up text searches: + +</p><pre class="programlisting"> +CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body)); +</pre><p> + + Notice that the 2-argument version of <code class="function">to_tsvector</code> is + used. Only text search functions that specify a configuration name can + be used in expression indexes (<a class="xref" href="indexes-expressional.html" title="11.7. Indexes on Expressions">Section 11.7</a>). + This is because the index contents must be unaffected by <a class="xref" href="runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG">default_text_search_config</a>. If they were affected, the + index contents might be inconsistent because different entries could + contain <code class="type">tsvector</code>s that were created with different text search + configurations, and there would be no way to guess which was which. It + would be impossible to dump and restore such an index correctly. + </p><p> + Because the two-argument version of <code class="function">to_tsvector</code> was + used in the index above, only a query reference that uses the 2-argument + version of <code class="function">to_tsvector</code> with the same configuration + name will use that index. That is, <code class="literal">WHERE + to_tsvector('english', body) @@ 'a & b'</code> can use the index, + but <code class="literal">WHERE to_tsvector(body) @@ 'a & b'</code> cannot. + This ensures that an index will be used only with the same configuration + used to create the index entries. + </p><p> + It is possible to set up more complex expression indexes wherein the + configuration name is specified by another column, e.g.: + +</p><pre class="programlisting"> +CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector(config_name, body)); +</pre><p> + + where <code class="literal">config_name</code> is a column in the <code class="literal">pgweb</code> + table. This allows mixed configurations in the same index while + recording which configuration was used for each index entry. This + would be useful, for example, if the document collection contained + documents in different languages. Again, + queries that are meant to use the index must be phrased to match, e.g., + <code class="literal">WHERE to_tsvector(config_name, body) @@ 'a & b'</code>. + </p><p> + Indexes can even concatenate columns: + +</p><pre class="programlisting"> +CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body)); +</pre><p> + </p><p> + Another approach is to create a separate <code class="type">tsvector</code> column + to hold the output of <code class="function">to_tsvector</code>. To keep this + column automatically up to date with its source data, use a stored + generated column. This example is a + concatenation of <code class="literal">title</code> and <code class="literal">body</code>, + using <code class="function">coalesce</code> to ensure that one field will still be + indexed when the other is <code class="literal">NULL</code>: + +</p><pre class="programlisting"> +ALTER TABLE pgweb + ADD COLUMN textsearchable_index_col tsvector + GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED; +</pre><p> + + Then we create a <acronym class="acronym">GIN</acronym> index to speed up the search: + +</p><pre class="programlisting"> +CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col); +</pre><p> + + Now we are ready to perform a fast full text search: + +</p><pre class="programlisting"> +SELECT title +FROM pgweb +WHERE textsearchable_index_col @@ to_tsquery('create & table') +ORDER BY last_mod_date DESC +LIMIT 10; +</pre><p> + </p><p> + One advantage of the separate-column approach over an expression index + is that it is not necessary to explicitly specify the text search + configuration in queries in order to make use of the index. As shown + in the example above, the query can depend on + <code class="varname">default_text_search_config</code>. Another advantage is that + searches will be faster, since it will not be necessary to redo the + <code class="function">to_tsvector</code> calls to verify index matches. (This is more + important when using a GiST index than a GIN index; see <a class="xref" href="textsearch-indexes.html" title="12.9. GIN and GiST Index Types">Section 12.9</a>.) The expression-index approach is + simpler to set up, however, and it requires less disk space since the + <code class="type">tsvector</code> representation is not stored explicitly. + </p></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="textsearch-intro.html" title="12.1. Introduction">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="textsearch-controls.html" title="12.3. Controlling Text Search">Next</a></td></tr><tr><td width="40%" align="left" valign="top">12.1. Introduction </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 12.3. Controlling Text Search</td></tr></table></div></body></html>
\ No newline at end of file |