From 293913568e6a7a86fd1479e1cff8e2ecb58d6568 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sat, 13 Apr 2024 15:44:03 +0200
Subject: Adding upstream version 16.2.
Signed-off-by: Daniel Baumann
---
 doc/src/sgml/html/functions-textsearch.html | 1 file changed, 763 insertions(+)

9.13. Text Search Functions and Operators

Table 9.42, Table 9.43 and Table 9.44 summarize the functions and operators
that are provided for full text searching. See Chapter 12 for a detailed
explanation of PostgreSQL's text search facility.

Table 9.42. Text Search Operators

Operator / Description / Example(s)

tsvector @@ tsquery → boolean
tsquery @@ tsvector → boolean
    Does tsvector match tsquery? (The arguments can be given in either order.)
    to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') → t

text @@ tsquery → boolean
    Does text string, after implicit invocation of to_tsvector(), match
    tsquery?
    'fat cats ate rats' @@ to_tsquery('cat & rat') → t

tsvector @@@ tsquery → boolean
tsquery @@@ tsvector → boolean
    This is a deprecated synonym for @@.
    to_tsvector('fat cats ate rats') @@@ to_tsquery('cat & rat') → t

tsvector || tsvector → tsvector
    Concatenates two tsvectors. If both inputs contain lexeme positions, the
    second input's positions are adjusted accordingly.
    'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector → 'a':1 'b':2,5 'c':3 'd':4

tsquery && tsquery → tsquery
    ANDs two tsquerys together, producing a query that matches documents that
    match both input queries.
    'fat | rat'::tsquery && 'cat'::tsquery → ( 'fat' | 'rat' ) & 'cat'

tsquery || tsquery → tsquery
    ORs two tsquerys together, producing a query that matches documents that
    match either input query.
    'fat | rat'::tsquery || 'cat'::tsquery → 'fat' | 'rat' | 'cat'

!! tsquery → tsquery
    Negates a tsquery, producing a query that matches documents that do not
    match the input query.
    !! 'cat'::tsquery → !'cat'

tsquery <-> tsquery → tsquery
    Constructs a phrase query, which matches if the two input queries match at
    successive lexemes.
    to_tsquery('fat') <-> to_tsquery('rat') → 'fat' <-> 'rat'

tsquery @> tsquery → boolean
    Does first tsquery contain the second? (This considers only whether all
    the lexemes appearing in one query appear in the other, ignoring the
    combining operators.)
    'cat'::tsquery @> 'cat & rat'::tsquery → f

tsquery <@ tsquery → boolean
    Is first tsquery contained in the second? (This considers only whether all
    the lexemes appearing in one query appear in the other, ignoring the
    combining operators.)
    'cat'::tsquery <@ 'cat & rat'::tsquery → t
    'cat'::tsquery <@ '!cat & rat'::tsquery → t

In addition to these specialized operators, the usual comparison operators
shown in Table 9.1 are available for types tsvector and tsquery. These are not
very useful for text searching but allow, for example, unique indexes to be
built on columns of these types.
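As a sketch of how these operators combine in practice (the docs table and its rows here are hypothetical, purely for illustration):

```sql
-- Hypothetical sample table.
CREATE TABLE docs (id serial PRIMARY KEY, body text);
INSERT INTO docs (body) VALUES ('fat cats ate rats'), ('dogs chase cats');

-- @@ tests whether a document's tsvector matches a tsquery.
SELECT id FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & rat');

-- tsquery values can be combined with && and || before matching.
SELECT to_tsquery('fat') && (to_tsquery('cat') || to_tsquery('rat'));
```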

Table 9.43. Text Search Functions

Function / Description / Example(s)

array_to_tsvector ( text[] ) → tsvector
    Converts an array of text strings to a tsvector. The given strings are
    used as lexemes as-is, without further processing. Array elements must not
    be empty strings or NULL.
    array_to_tsvector('{fat,cat,rat}'::text[]) → 'cat' 'fat' 'rat'

get_current_ts_config ( ) → regconfig
    Returns the OID of the current default text search configuration (as set
    by default_text_search_config).
    get_current_ts_config() → english

length ( tsvector ) → integer
    Returns the number of lexemes in the tsvector.
    length('fat:2,4 cat:3 rat:5A'::tsvector) → 3

numnode ( tsquery ) → integer
    Returns the number of lexemes plus operators in the tsquery.
    numnode('(fat & rat) | cat'::tsquery) → 5
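A minimal sketch combining the inspection functions above (the annotated results are taken from the examples in this table):

```sql
SELECT array_to_tsvector(ARRAY['fat','cat','rat']);  -- lexemes used as-is
SELECT length('fat:2,4 cat:3 rat:5A'::tsvector);     -- → 3
SELECT numnode('(fat & rat) | cat'::tsquery);        -- → 5
SELECT get_current_ts_config();                      -- current default config
```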

plainto_tsquery ( [ config regconfig, ] query text ) → tsquery
    Converts text to a tsquery, normalizing words according to the specified
    or default configuration. Any punctuation in the string is ignored (it
    does not determine query operators). The resulting query matches documents
    containing all non-stopwords in the text.
    plainto_tsquery('english', 'The Fat Rats') → 'fat' & 'rat'

phraseto_tsquery ( [ config regconfig, ] query text ) → tsquery
    Converts text to a tsquery, normalizing words according to the specified
    or default configuration. Any punctuation in the string is ignored (it
    does not determine query operators). The resulting query matches phrases
    containing all non-stopwords in the text.
    phraseto_tsquery('english', 'The Fat Rats') → 'fat' <-> 'rat'
    phraseto_tsquery('english', 'The Cat and Rats') → 'cat' <2> 'rat'

websearch_to_tsquery ( [ config regconfig, ] query text ) → tsquery
    Converts text to a tsquery, normalizing words according to the specified
    or default configuration. Quoted word sequences are converted to phrase
    tests. The word "or" is understood as producing an OR operator, and a dash
    produces a NOT operator; other punctuation is ignored. This approximates
    the behavior of some common web search tools.
    websearch_to_tsquery('english', '"fat rat" or cat dog') → 'fat' <-> 'rat' | 'cat' & 'dog'
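The three constructors accept the same free-form text but build different queries; a side-by-side sketch (annotated results come from the examples above):

```sql
SELECT plainto_tsquery('english', 'The Fat Rats');   -- → 'fat' & 'rat'
SELECT phraseto_tsquery('english', 'The Fat Rats');  -- → 'fat' <-> 'rat'
-- Quoted phrases, "or", and a leading dash (NOT) are recognized here:
SELECT websearch_to_tsquery('english', '"fat rat" or cat -dog');
```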

querytree ( tsquery ) → text
    Produces a representation of the indexable portion of a tsquery. A result
    that is empty or just T indicates a non-indexable query.
    querytree('foo & ! bar'::tsquery) → 'foo'

setweight ( vector tsvector, weight "char" ) → tsvector
    Assigns the specified weight to each element of the vector.
    setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') → 'cat':3A 'fat':2A,4A 'rat':5A

setweight ( vector tsvector, weight "char", lexemes text[] ) → tsvector
    Assigns the specified weight to elements of the vector that are listed in
    lexemes. The strings in lexemes are taken as lexemes as-is, without
    further processing. Strings that do not match any lexeme in vector are
    ignored.
    setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') → 'cat':3A 'fat':2,4 'rat':5A,6A

strip ( tsvector ) → tsvector
    Removes positions and weights from the tsvector.
    strip('fat:2,4 cat:3 rat:5A'::tsvector) → 'cat' 'fat' 'rat'
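A common use of setweight is to label the parts of a document before concatenating them, so that ranking can favor some fields; a sketch assuming a hypothetical articles table with title and body columns:

```sql
SELECT setweight(to_tsvector('english', title), 'A') ||
       setweight(to_tsvector('english', body),  'D') AS document
FROM articles;  -- hypothetical table

-- strip() discards positions/weights again when only membership matters:
SELECT strip('fat:2,4 cat:3 rat:5A'::tsvector);  -- → 'cat' 'fat' 'rat'
```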

to_tsquery ( [ config regconfig, ] query text ) → tsquery
    Converts text to a tsquery, normalizing words according to the specified
    or default configuration. The words must be combined by valid tsquery
    operators.
    to_tsquery('english', 'The & Fat & Rats') → 'fat' & 'rat'

to_tsvector ( [ config regconfig, ] document text ) → tsvector
    Converts text to a tsvector, normalizing words according to the specified
    or default configuration. Position information is included in the result.
    to_tsvector('english', 'The Fat Rats') → 'fat':2 'rat':3

to_tsvector ( [ config regconfig, ] document json ) → tsvector
to_tsvector ( [ config regconfig, ] document jsonb ) → tsvector
    Converts each string value in the JSON document to a tsvector, normalizing
    words according to the specified or default configuration. The results are
    then concatenated in document order to produce the output. Position
    information is generated as though one stopword exists between each pair
    of string values. (Beware that document order of the fields of a JSON
    object is implementation-dependent when the input is jsonb; observe the
    difference in the examples.)
    to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::json) → 'dog':5 'fat':2 'rat':3
    to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::jsonb) → 'dog':1 'fat':4 'rat':5
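To make @@ searches fast, to_tsvector is typically used in an expression index; the two-argument form (explicit configuration) is required there so the indexed expression does not depend on the session default. A sketch, assuming a hypothetical articles table:

```sql
-- GIN is the usual index type for tsvector expressions.
CREATE INDEX articles_fts_idx ON articles
    USING GIN (to_tsvector('english', body));

-- Queries must spell the expression the same way to use the index:
SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'fat & rat');
```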

json_to_tsvector ( [ config regconfig, ] document json, filter jsonb ) → tsvector
jsonb_to_tsvector ( [ config regconfig, ] document jsonb, filter jsonb ) → tsvector
    Selects each item in the JSON document that is requested by the filter and
    converts each one to a tsvector, normalizing words according to the
    specified or default configuration. The results are then concatenated in
    document order to produce the output. Position information is generated as
    though one stopword exists between each pair of selected items. (Beware
    that document order of the fields of a JSON object is
    implementation-dependent when the input is jsonb.) The filter must be a
    jsonb array containing zero or more of these keywords: "string" (to
    include all string values), "numeric" (to include all numeric values),
    "boolean" (to include all boolean values), "key" (to include all keys), or
    "all" (to include all the above). As a special case, the filter can also
    be a simple JSON value that is one of these keywords.
    json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json, '["string", "numeric"]') → '123':5 'fat':2 'rat':3
    json_to_tsvector('english', '{"cat": "The Fat Rats", "dog": 123}'::json, '"all"') → '123':9 'cat':1 'dog':7 'fat':4 'rat':5

ts_delete ( vector tsvector, lexeme text ) → tsvector
    Removes any occurrence of the given lexeme from the vector. The lexeme
    string is treated as a lexeme as-is, without further processing.
    ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') → 'cat':3 'rat':5A

ts_delete ( vector tsvector, lexemes text[] ) → tsvector
    Removes any occurrences of the lexemes in lexemes from the vector. The
    strings in lexemes are taken as lexemes as-is, without further processing.
    Strings that do not match any lexeme in vector are ignored.
    ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) → 'cat':3

ts_filter ( vector tsvector, weights "char"[] ) → tsvector
    Selects only elements with the given weights from the vector.
    ts_filter('fat:2,4 cat:3b,7c rat:5A'::tsvector, '{a,b}') → 'cat':3B 'rat':5A
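ts_delete and ts_filter can be chained to prune a stored vector; a sketch using literals in the style of the examples above:

```sql
-- Remove a lexeme, then keep only the entries with weight A or B:
SELECT ts_filter(
         ts_delete('fat:2,4 cat:3b rat:5A'::tsvector, 'fat'),
         '{a,b}');
```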

ts_headline ( [ config regconfig, ] document text, query tsquery [, options text ] ) → text
    Displays, in an abbreviated form, the match(es) for the query in the
    document, which must be raw text not a tsvector. Words in the document are
    normalized according to the specified or default configuration before
    matching to the query. Use of this function is discussed in
    Section 12.3.4, which also describes the available options.
    ts_headline('The fat cat ate the rat.', 'cat') → The fat <b>cat</b> ate the rat.

ts_headline ( [ config regconfig, ] document json, query tsquery [, options text ] ) → text
ts_headline ( [ config regconfig, ] document jsonb, query tsquery [, options text ] ) → text
    Displays, in an abbreviated form, match(es) for the query that occur in
    string values within the JSON document. See Section 12.3.4 for more
    details.
    ts_headline('{"cat":"raining cats and dogs"}'::jsonb, 'cat') → {"cat": "raining <b>cats</b> and dogs"}
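The options string accepts the settings described in Section 12.3.4; StartSel, StopSel, and MaxWords are among them. A sketch:

```sql
SELECT ts_headline('english',
         'The fat cat ate the fat rat.',
         to_tsquery('english', 'fat & rat'),
         'StartSel=<em>, StopSel=</em>, MaxWords=10');
```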

ts_rank ( [ weights real[], ] vector tsvector, query tsquery [, normalization integer ] ) → real
    Computes a score showing how well the vector matches the query. See
    Section 12.3.3 for details.
    ts_rank(to_tsvector('raining cats and dogs'), 'cat') → 0.06079271

ts_rank_cd ( [ weights real[], ] vector tsvector, query tsquery [, normalization integer ] ) → real
    Computes a score showing how well the vector matches the query, using a
    cover density algorithm. See Section 12.3.3 for details.
    ts_rank_cd(to_tsvector('raining cats and dogs'), 'cat') → 0.1
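Ranking is normally combined with a matching WHERE clause and an ORDER BY; a sketch assuming a hypothetical docs table with a body column:

```sql
SELECT id,
       ts_rank(to_tsvector('english', body),
               to_tsquery('english', 'cat & rat')) AS rank
FROM docs  -- hypothetical table
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & rat')
ORDER BY rank DESC;
```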

ts_rewrite ( query tsquery, target tsquery, substitute tsquery ) → tsquery
    Replaces occurrences of target with substitute within the query. See
    Section 12.4.2.1 for details.
    ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery) → 'b' & ( 'foo' | 'bar' )

ts_rewrite ( query tsquery, select text ) → tsquery
    Replaces portions of the query according to target(s) and substitute(s)
    obtained by executing a SELECT command. See Section 12.4.2.1 for details.
    SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases') → 'b' & ( 'foo' | 'bar' )
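The SELECT form of ts_rewrite reads target/substitute pairs from a query; a sketch of the setup the example assumes (the aliases table itself is not predefined and must be created by the user):

```sql
CREATE TABLE aliases (t tsquery, s tsquery);
INSERT INTO aliases VALUES ('a'::tsquery, 'foo|bar'::tsquery);

SELECT ts_rewrite('a & b'::tsquery, 'SELECT t, s FROM aliases');
-- → 'b' & ( 'foo' | 'bar' )
```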

tsquery_phrase ( query1 tsquery, query2 tsquery ) → tsquery
    Constructs a phrase query that searches for matches of query1 and query2
    at successive lexemes (same as <-> operator).
    tsquery_phrase(to_tsquery('fat'), to_tsquery('cat')) → 'fat' <-> 'cat'

tsquery_phrase ( query1 tsquery, query2 tsquery, distance integer ) → tsquery
    Constructs a phrase query that searches for matches of query1 and query2
    that occur exactly distance lexemes apart.
    tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10) → 'fat' <10> 'cat'

tsvector_to_array ( tsvector ) → text[]
    Converts a tsvector to an array of lexemes.
    tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector) → {cat,fat,rat}

unnest ( tsvector ) → setof record ( lexeme text, positions smallint[], weights text )
    Expands a tsvector into a set of rows, one per lexeme.
    select * from unnest('cat:3 fat:2,4 rat:5A'::tsvector) →

     lexeme | positions | weights
    --------+-----------+---------
     cat    | {3}       | {D}
     fat    | {2,4}     | {D,D}
     rat    | {5}       | {A}
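unnest makes per-lexeme processing possible in plain SQL; a sketch counting occurrences via the positions array:

```sql
SELECT lexeme, array_length(positions, 1) AS occurrences
FROM unnest(to_tsvector('english', 'fat cats ate fat rats'))
ORDER BY occurrences DESC, lexeme;
```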


Note

All the text search functions that accept an optional regconfig argument will
use the configuration specified by default_text_search_config when that
argument is omitted.

The functions in Table 9.44 are listed separately because they are not usually
used in everyday text searching operations. They are primarily helpful for
development and debugging of new text search configurations.

Table 9.44. Text Search Debugging Functions

Function / Description / Example(s)

ts_debug ( [ config regconfig, ] document text ) → setof record ( alias text, description text, token text, dictionaries regdictionary[], dictionary regdictionary, lexemes text[] )
    Extracts and normalizes tokens from the document according to the
    specified or default text search configuration, and returns information
    about how each token was processed. See Section 12.8.1 for details.
    ts_debug('english', 'The Brightest supernovaes') → (asciiword,"Word, all ASCII",The,{english_stem},english_stem,{}) ...

ts_lexize ( dict regdictionary, token text ) → text[]
    Returns an array of replacement lexemes if the input token is known to the
    dictionary, or an empty array if the token is known to the dictionary but
    it is a stop word, or NULL if it is not a known word. See Section 12.8.3
    for details.
    ts_lexize('english_stem', 'stars') → {star}

ts_parse ( parser_name text, document text ) → setof record ( tokid integer, token text )
    Extracts tokens from the document using the named parser. See
    Section 12.8.2 for details.
    ts_parse('default', 'foo - bar') → (1,foo) ...

ts_parse ( parser_oid oid, document text ) → setof record ( tokid integer, token text )
    Extracts tokens from the document using a parser specified by OID. See
    Section 12.8.2 for details.
    ts_parse(3722, 'foo - bar') → (1,foo) ...

ts_token_type ( parser_name text ) → setof record ( tokid integer, alias text, description text )
    Returns a table that describes each type of token the named parser can
    recognize. See Section 12.8.2 for details.
    ts_token_type('default') → (1,asciiword,"Word, all ASCII") ...

ts_token_type ( parser_oid oid ) → setof record ( tokid integer, alias text, description text )
    Returns a table that describes each type of token a parser specified by
    OID can recognize. See Section 12.8.2 for details.
    ts_token_type(3722) → (1,asciiword,"Word, all ASCII") ...

ts_stat ( sqlquery text [, weights text ] ) → setof record ( word text, ndoc integer, nentry integer )
    Executes the sqlquery, which must return a single tsvector column, and
    returns statistics about each distinct lexeme contained in the data. See
    Section 12.4.4 for details.
    ts_stat('SELECT vector FROM apod') → (foo,10,15) ...
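ts_stat is handy for corpus-level checks, for example spotting frequent lexemes that might be candidate stop words; a sketch assuming a hypothetical docs table with a text body column:

```sql
SELECT word, ndoc, nentry
FROM ts_stat('SELECT to_tsvector(''english'', body) FROM docs')
ORDER BY nentry DESC, ndoc DESC
LIMIT 10;
```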

