summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/datatype-textsearch.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/html/datatype-textsearch.html')
-rw-r--r--doc/src/sgml/html/datatype-textsearch.html196
1 files changed, 196 insertions, 0 deletions
diff --git a/doc/src/sgml/html/datatype-textsearch.html b/doc/src/sgml/html/datatype-textsearch.html
new file mode 100644
index 0000000..2f4e4a9
--- /dev/null
+++ b/doc/src/sgml/html/datatype-textsearch.html
@@ -0,0 +1,196 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>8.11. Text Search Types</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="datatype-bit.html" title="8.10. Bit String Types" /><link rel="next" href="datatype-uuid.html" title="8.12. UUID Type" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">8.11. Text Search Types</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="datatype-bit.html" title="8.10. Bit String Types">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="datatype.html" title="Chapter 8. Data Types">Up</a></td><th width="60%" align="center">Chapter 8. Data Types</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="datatype-uuid.html" title="8.12. UUID Type">Next</a></td></tr></table><hr /></div><div class="sect1" id="DATATYPE-TEXTSEARCH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">8.11. Text Search Types</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="datatype-textsearch.html#DATATYPE-TSVECTOR">8.11.1. <code class="type">tsvector</code></a></span></dt><dt><span class="sect2"><a href="datatype-textsearch.html#DATATYPE-TSQUERY">8.11.2. <code class="type">tsquery</code></a></span></dt></dl></div><a id="id-1.5.7.19.2" class="indexterm"></a><a id="id-1.5.7.19.3" class="indexterm"></a><p>
+ <span class="productname">PostgreSQL</span> provides two data types that
+ are designed to support full text search, which is the activity of
+ searching through a collection of natural-language <em class="firstterm">documents</em>
+ to locate those that best match a <em class="firstterm">query</em>.
+ The <code class="type">tsvector</code> type represents a document in a form optimized
+ for text search; the <code class="type">tsquery</code> type similarly represents
+ a text query.
+ <a class="xref" href="textsearch.html" title="Chapter 12. Full Text Search">Chapter 12</a> provides a detailed explanation of this
+ facility, and <a class="xref" href="functions-textsearch.html" title="9.13. Text Search Functions and Operators">Section 9.13</a> summarizes the
+ related functions and operators.
+ </p><div class="sect2" id="DATATYPE-TSVECTOR"><div class="titlepage"><div><div><h3 class="title">8.11.1. <code class="type">tsvector</code></h3></div></div></div><a id="id-1.5.7.19.5.2" class="indexterm"></a><p>
+ A <code class="type">tsvector</code> value is a sorted list of distinct
+ <em class="firstterm">lexemes</em>, which are words that have been
+ <em class="firstterm">normalized</em> to merge different variants of the same word
+ (see <a class="xref" href="textsearch.html" title="Chapter 12. Full Text Search">Chapter 12</a> for details). Sorting and
+ duplicate-elimination are done automatically during input, as shown in
+ this example:
+
+</p><pre class="programlisting">
+SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
+ tsvector
+----------------------------------------------------
+ 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'
+</pre><p>
+
+ To represent
+ lexemes containing whitespace or punctuation, surround them with quotes:
+
+</p><pre class="programlisting">
+SELECT $$the lexeme ' ' contains spaces$$::tsvector;
+ tsvector
+-------------------------------------------
+ ' ' 'contains' 'lexeme' 'spaces' 'the'
+</pre><p>
+
+ (We use dollar-quoted string literals in this example and the next one
+ to avoid the confusion of having to double quote marks within the
+ literals.) Embedded quotes and backslashes must be doubled:
+
+</p><pre class="programlisting">
+SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector;
+ tsvector
+------------------------------------------------
+ 'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
+</pre><p>
+
+ Optionally, integer <em class="firstterm">positions</em>
+ can be attached to lexemes:
+
+</p><pre class="programlisting">
+SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
+ tsvector
+-------------------------------------------------------------------​------------
+ 'a':1,6,10 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'on':5 'rat':12 'sat':4
+</pre><p>
+
+ A position normally indicates the source word's location in the
+ document. Positional information can be used for
+ <em class="firstterm">proximity ranking</em>. Position values can
+ range from 1 to 16383; larger numbers are silently set to 16383.
+ Duplicate positions for the same lexeme are discarded.
+ </p><p>
+ Lexemes that have positions can further be labeled with a
+ <em class="firstterm">weight</em>, which can be <code class="literal">A</code>,
+ <code class="literal">B</code>, <code class="literal">C</code>, or <code class="literal">D</code>.
+ <code class="literal">D</code> is the default and hence is not shown on output:
+
+</p><pre class="programlisting">
+SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
+ tsvector
+----------------------------
+ 'a':1A 'cat':5 'fat':2B,4C
+</pre><p>
+
+ Weights are typically used to reflect document structure, for example
+ by marking title words differently from body words. Text search
+ ranking functions can assign different priorities to the different
+ weight markers.
+ </p><p>
+ It is important to understand that the
+ <code class="type">tsvector</code> type itself does not perform any word
+ normalization; it assumes the words it is given are normalized
+ appropriately for the application. For example,
+
+</p><pre class="programlisting">
+SELECT 'The Fat Rats'::tsvector;
+ tsvector
+--------------------
+ 'Fat' 'Rats' 'The'
+</pre><p>
+
+ For most English-text-searching applications the above words would
+ be considered non-normalized, but <code class="type">tsvector</code> doesn't care.
+ Raw document text should usually be passed through
+ <code class="function">to_tsvector</code> to normalize the words appropriately
+ for searching:
+
+</p><pre class="programlisting">
+SELECT to_tsvector('english', 'The Fat Rats');
+ to_tsvector
+-----------------
+ 'fat':2 'rat':3
+</pre><p>
+
+ Again, see <a class="xref" href="textsearch.html" title="Chapter 12. Full Text Search">Chapter 12</a> for more detail.
+ </p></div><div class="sect2" id="DATATYPE-TSQUERY"><div class="titlepage"><div><div><h3 class="title">8.11.2. <code class="type">tsquery</code></h3></div></div></div><a id="id-1.5.7.19.6.2" class="indexterm"></a><p>
+ A <code class="type">tsquery</code> value stores lexemes that are to be
+ searched for, and can combine them using the Boolean operators
+ <code class="literal">&amp;</code> (AND), <code class="literal">|</code> (OR), and
+ <code class="literal">!</code> (NOT), as well as the phrase search operator
+ <code class="literal">&lt;-&gt;</code> (FOLLOWED BY). There is also a variant
+ <code class="literal">&lt;<em class="replaceable"><code>N</code></em>&gt;</code> of the FOLLOWED BY
+ operator, where <em class="replaceable"><code>N</code></em> is an integer constant that
+ specifies the distance between the two lexemes being searched
+ for. <code class="literal">&lt;-&gt;</code> is equivalent to <code class="literal">&lt;1&gt;</code>.
+ </p><p>
+ Parentheses can be used to enforce grouping of these operators.
+ In the absence of parentheses, <code class="literal">!</code> (NOT) binds most tightly,
+ <code class="literal">&lt;-&gt;</code> (FOLLOWED BY) next most tightly, then
+ <code class="literal">&amp;</code> (AND), with <code class="literal">|</code> (OR) binding
+ the least tightly.
+ </p><p>
+ Here are some examples:
+
+</p><pre class="programlisting">
+SELECT 'fat &amp; rat'::tsquery;
+ tsquery
+---------------
+ 'fat' &amp; 'rat'
+
+SELECT 'fat &amp; (rat | cat)'::tsquery;
+ tsquery
+---------------------------
+ 'fat' &amp; ( 'rat' | 'cat' )
+
+SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
+ tsquery
+------------------------
+ 'fat' &amp; 'rat' &amp; !'cat'
+</pre><p>
+ </p><p>
+ Optionally, lexemes in a <code class="type">tsquery</code> can be labeled with
+ one or more weight letters, which restricts them to match only
+ <code class="type">tsvector</code> lexemes with one of those weights:
+
+</p><pre class="programlisting">
+SELECT 'fat:ab &amp; cat'::tsquery;
+ tsquery
+------------------
+ 'fat':AB &amp; 'cat'
+</pre><p>
+ </p><p>
+ Also, lexemes in a <code class="type">tsquery</code> can be labeled with <code class="literal">*</code>
+ to specify prefix matching:
+</p><pre class="programlisting">
+SELECT 'super:*'::tsquery;
+ tsquery
+-----------
+ 'super':*
+</pre><p>
+ This query will match any word in a <code class="type">tsvector</code> that begins
+ with <span class="quote">“<span class="quote">super</span>”</span>.
+ </p><p>
+ Quoting rules for lexemes are the same as described previously for
+ lexemes in <code class="type">tsvector</code>; and, as with <code class="type">tsvector</code>,
+ any required normalization of words must be done before converting
+ to the <code class="type">tsquery</code> type. The <code class="function">to_tsquery</code>
+ function is convenient for performing such normalization:
+
+</p><pre class="programlisting">
+SELECT to_tsquery('Fat:ab &amp; Cats');
+ to_tsquery
+------------------
+ 'fat':AB &amp; 'cat'
+</pre><p>
+
+ Note that <code class="function">to_tsquery</code> will process prefixes in the same way
+ as other words, which means this comparison returns true:
+
+</p><pre class="programlisting">
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+ ?column?
+----------
+ t
+</pre><p>
+ because <code class="literal">postgres</code> gets stemmed to <code class="literal">postgr</code>:
+</p><pre class="programlisting">
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+ to_tsvector | to_tsquery
+---------------+------------
+ 'postgradu':1 | 'postgr':*
+</pre><p>
+ which will match the stemmed form of <code class="literal">postgraduate</code>.
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="datatype-bit.html" title="8.10. Bit String Types">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="datatype.html" title="Chapter 8. Data Types">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="datatype-uuid.html" title="8.12. UUID Type">Next</a></td></tr><tr><td width="40%" align="left" valign="top">8.10. Bit String Types </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.5 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 8.12. <acronym class="acronym">UUID</acronym> Type</td></tr></table></div></body></html> \ No newline at end of file