1 files changed, 248 insertions, 0 deletions
diff --git a/doc/src/sgml/html/fuzzystrmatch.html b/doc/src/sgml/html/fuzzystrmatch.html
new file mode 100644
index 0000000..f02d47e
--- /dev/null
+++ b/doc/src/sgml/html/fuzzystrmatch.html
@@ -0,0 +1,248 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.17. fuzzystrmatch — determine string similarities and distance</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="file-fdw.html" title="F.16. file_fdw — access data files in the server's file system" /><link rel="next" href="hstore.html" title="F.18. hstore — hstore key/value datatype" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.17. fuzzystrmatch — determine string similarities and distance</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="file-fdw.html" title="F.16. file_fdw — access data files in the server's file system">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules and Extensions</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hstore.html" title="F.18. hstore — hstore key/value datatype">Next</a></td></tr></table><hr /></div><div class="sect1" id="FUZZYSTRMATCH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.17. fuzzystrmatch — determine string similarities and distance <a href="#FUZZYSTRMATCH" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-SOUNDEX">F.17.1. Soundex</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-DAITCH-MOKOTOFF">F.17.2. Daitch-Mokotoff Soundex</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-LEVENSHTEIN">F.17.3. Levenshtein</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-METAPHONE">F.17.4. Metaphone</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-DOUBLE-METAPHONE">F.17.5. Double Metaphone</a></span></dt></dl></div><a id="id-1.11.7.27.2" class="indexterm"></a><p>
+  The <code class="filename">fuzzystrmatch</code> module provides several
+  functions to determine similarities and distance between strings.
+ </p><div class="caution"><h3 class="title">Caution</h3><p>
+   At present, the <code class="function">soundex</code>, <code class="function">metaphone</code>,
+   <code class="function">dmetaphone</code>, and <code class="function">dmetaphone_alt</code> functions do
+   not work well with multibyte encodings (such as UTF-8).
+   Use <code class="function">daitch_mokotoff</code>
+   or <code class="function">levenshtein</code> with such data.
+  </p></div><p>
+  This module is considered <span class="quote">“<span class="quote">trusted</span>”</span>, that is, it can be
+  installed by non-superusers who have <code class="literal">CREATE</code> privilege
+  on the current database.
+ </p><div class="sect2" id="FUZZYSTRMATCH-SOUNDEX"><div class="titlepage"><div><div><h3 class="title">F.17.1. Soundex <a href="#FUZZYSTRMATCH-SOUNDEX" class="id_link">#</a></h3></div></div></div><p>
+   The Soundex system is a method of matching similar-sounding names
+   by converting them to the same code.  It was initially used by the
+   United States Census in 1880, 1900, and 1910.  Note that Soundex
+   is not very useful for non-English names.
+  </p><p>
+   The <code class="filename">fuzzystrmatch</code> module provides two functions
+   for working with Soundex codes:
+  </p><a id="id-1.11.7.27.6.4" class="indexterm"></a><a id="id-1.11.7.27.6.5" class="indexterm"></a><pre class="synopsis">
+soundex(text) returns text
+difference(text, text) returns int
+</pre><p>
+   The <code class="function">soundex</code> function converts a string to its Soundex code.
+   The <code class="function">difference</code> function converts two strings to their Soundex
+   codes and then reports the number of matching code positions.  Since
+   Soundex codes have four characters, the result ranges from zero to four,
+   with zero being no match and four being an exact match.  (Thus, the
+   function is misnamed — <code class="function">similarity</code> would have been
+   a better name.)
+  </p><p>
+   Here are some usage examples:
+  </p><pre class="programlisting">
+SELECT soundex('hello world!');
+
+SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
+SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
+SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
+
+CREATE TABLE s (nm text);
+
+INSERT INTO s VALUES ('john');
+INSERT INTO s VALUES ('joan');
+INSERT INTO s VALUES ('wobbly');
+INSERT INTO s VALUES ('jack');
+
+SELECT * FROM s WHERE soundex(nm) = soundex('john');
+
+SELECT * FROM s WHERE difference(s.nm, 'john') &gt; 2;
+</pre></div><div class="sect2" id="FUZZYSTRMATCH-DAITCH-MOKOTOFF"><div class="titlepage"><div><div><h3 class="title">F.17.2. Daitch-Mokotoff Soundex <a href="#FUZZYSTRMATCH-DAITCH-MOKOTOFF" class="id_link">#</a></h3></div></div></div><p>
+   Like the original Soundex system, Daitch-Mokotoff Soundex matches
+   similar-sounding names by converting them to the same code.
+   However, Daitch-Mokotoff Soundex is significantly more useful for
+   non-English names than the original system.
+   Major improvements over the original system include:
+
+   </p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; "><li class="listitem" style="list-style-type: disc"><p>
+      The code is based on the first six meaningful letters rather than four.
+     </p></li><li class="listitem" style="list-style-type: disc"><p>
+      A letter or combination of letters maps into ten possible codes rather
+      than seven.
+     </p></li><li class="listitem" style="list-style-type: disc"><p>
+      Where two consecutive letters have a single sound, they are coded as a
+      single number.
+     </p></li><li class="listitem" style="list-style-type: disc"><p>
+      When a letter or combination of letters may have different sounds,
+      multiple codes are emitted to cover all possibilities.
+     </p></li></ul></div><p>
+  </p><a id="id-1.11.7.27.7.3" class="indexterm"></a><p>
+   This function generates the Daitch-Mokotoff soundex codes for its input:
+  </p><pre class="synopsis">
+daitch_mokotoff(<em class="parameter"><code>source</code></em> text) returns text[]
+</pre><p>
+   The result may contain one or more codes depending on how many plausible
+   pronunciations there are, so it is represented as an array.
+  </p><p>
+   Since a Daitch-Mokotoff soundex code consists of only 6 digits,
+   <em class="parameter"><code>source</code></em> should be preferably a single word or name.
+  </p><p>
+   Here are some examples:
+  </p><pre class="programlisting">
+SELECT daitch_mokotoff('George');
+ daitch_mokotoff
+-----------------
+ {595000}
+
+SELECT daitch_mokotoff('John');
+ daitch_mokotoff
+-----------------
+ {160000,460000}
+
+SELECT daitch_mokotoff('Bierschbach');
+                      daitch_mokotoff
+-----------------------------------------------------------
+ {794575,794574,794750,794740,745750,745740,747500,747400}
+
+SELECT daitch_mokotoff('Schwartzenegger');
+ daitch_mokotoff
+-----------------
+ {479465}
+</pre><p>
+   For matching of single names, returned text arrays can be matched
+   directly using the <code class="literal">&amp;&amp;</code> operator: any overlap
+   can be considered a match.  A GIN index may
+   be used for efficiency, see <a class="xref" href="gin.html" title="Chapter 70. GIN Indexes">Chapter 70</a> and this example:
+  </p><pre class="programlisting">
+CREATE TABLE s (nm text);
+CREATE INDEX ix_s_dm ON s USING gin (daitch_mokotoff(nm)) WITH (fastupdate = off);
+
+INSERT INTO s (nm) VALUES
+  ('Schwartzenegger'),
+  ('John'),
+  ('James'),
+  ('Steinman'),
+  ('Steinmetz');
+
+SELECT * FROM s WHERE daitch_mokotoff(nm) &amp;&amp; daitch_mokotoff('Swartzenegger');
+SELECT * FROM s WHERE daitch_mokotoff(nm) &amp;&amp; daitch_mokotoff('Jane');
+SELECT * FROM s WHERE daitch_mokotoff(nm) &amp;&amp; daitch_mokotoff('Jens');
+</pre><p>
+   For indexing and matching of any number of names in any order, Full Text
+   Search features can be used. See <a class="xref" href="textsearch.html" title="Chapter 12. Full Text Search">Chapter 12</a> and this
+   example:
+  </p><pre class="programlisting">
+CREATE FUNCTION soundex_tsvector(v_name text) RETURNS tsvector
+BEGIN ATOMIC
+  SELECT to_tsvector('simple',
+                     string_agg(array_to_string(daitch_mokotoff(n), ' '), ' '))
+  FROM regexp_split_to_table(v_name, '\s+') AS n;
+END;
+
+CREATE FUNCTION soundex_tsquery(v_name text) RETURNS tsquery
+BEGIN ATOMIC
+  SELECT string_agg('(' || array_to_string(daitch_mokotoff(n), '|') || ')', '&amp;')::tsquery
+  FROM regexp_split_to_table(v_name, '\s+') AS n;
+END;
+
+CREATE TABLE s (nm text);
+CREATE INDEX ix_s_txt ON s USING gin (soundex_tsvector(nm)) WITH (fastupdate = off);
+
+INSERT INTO s (nm) VALUES
+  ('John Doe'),
+  ('Jane Roe'),
+  ('Public John Q.'),
+  ('George Best'),
+  ('John Yamson');
+
+SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('john');
+SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('jane doe');
+SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('john public');
+SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('besst, giorgio');
+SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('Jameson John');
+</pre><p>
+   If it is desired to avoid recalculation of soundex codes during index
+   rechecks, an index on a separate column can be used instead of an index on
+   an expression.  A stored generated column can be used for this; see
+   <a class="xref" href="ddl-generated-columns.html" title="5.3. Generated Columns">Section 5.3</a>.
+  </p></div><div class="sect2" id="FUZZYSTRMATCH-LEVENSHTEIN"><div class="titlepage"><div><div><h3 class="title">F.17.3. Levenshtein <a href="#FUZZYSTRMATCH-LEVENSHTEIN" class="id_link">#</a></h3></div></div></div><p>
+   This function calculates the Levenshtein distance between two strings:
+  </p><a id="id-1.11.7.27.8.3" class="indexterm"></a><a id="id-1.11.7.27.8.4" class="indexterm"></a><pre class="synopsis">
+levenshtein(source text, target text, ins_cost int, del_cost int, sub_cost int) returns int
+levenshtein(source text, target text) returns int
+levenshtein_less_equal(source text, target text, ins_cost int, del_cost int, sub_cost int, max_d int) returns int
+levenshtein_less_equal(source text, target text, max_d int) returns int
+</pre><p>
+   Both <code class="literal">source</code> and <code class="literal">target</code> can be any
+   non-null string, with a maximum of 255 characters.  The cost parameters
+   specify how much to charge for a character insertion, deletion, or
+   substitution, respectively.  You can omit the cost parameters, as in
+   the second version of the function; in that case they all default to 1.
+  </p><p>
+   <code class="function">levenshtein_less_equal</code> is an accelerated version of the
+   Levenshtein function for use when only small distances are of interest.
+   If the actual distance is less than or equal to <code class="literal">max_d</code>,
+   then <code class="function">levenshtein_less_equal</code> returns the correct
+   distance; otherwise it returns some value greater than <code class="literal">max_d</code>.
+   If <code class="literal">max_d</code> is negative then the behavior is the same as
+   <code class="function">levenshtein</code>.
+  </p><p>
+   Examples:
+  </p><pre class="screen">
+test=# SELECT levenshtein('GUMBO', 'GAMBOL');
+ levenshtein
+-------------
+           2
+(1 row)
+
+test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2, 1, 1);
+ levenshtein
+-------------
+           3
+(1 row)
+
+test=# SELECT levenshtein_less_equal('extensive', 'exhaustive', 2);
+ levenshtein_less_equal
+------------------------
+                      3
+(1 row)
+
+test=# SELECT levenshtein_less_equal('extensive', 'exhaustive', 4);
+ levenshtein_less_equal
+------------------------
+                      4
+(1 row)
+</pre></div><div class="sect2" id="FUZZYSTRMATCH-METAPHONE"><div class="titlepage"><div><div><h3 class="title">F.17.4. Metaphone <a href="#FUZZYSTRMATCH-METAPHONE" class="id_link">#</a></h3></div></div></div><p>
+   Metaphone, like Soundex, is based on the idea of constructing a
+   representative code for an input string.  Two strings are then
+   deemed similar if they have the same codes.
+  </p><p>
+   This function calculates the metaphone code of an input string:
+  </p><a id="id-1.11.7.27.9.4" class="indexterm"></a><pre class="synopsis">
+metaphone(source text, max_output_length int) returns text
+</pre><p>
+   <code class="literal">source</code> has to be a non-null string with a maximum of
+   255 characters.  <code class="literal">max_output_length</code> sets the maximum
+   length of the output metaphone code; if longer, the output is truncated
+   to this length.
+  </p><p>
+   Example:
+  </p><pre class="screen">
+test=# SELECT metaphone('GUMBO', 4);
+ metaphone
+-----------
+ KM
+(1 row)
+</pre></div><div class="sect2" id="FUZZYSTRMATCH-DOUBLE-METAPHONE"><div class="titlepage"><div><div><h3 class="title">F.17.5. Double Metaphone <a href="#FUZZYSTRMATCH-DOUBLE-METAPHONE" class="id_link">#</a></h3></div></div></div><p>
+   The Double Metaphone system computes two <span class="quote">“<span class="quote">sounds like</span>”</span> strings
+   for a given input string — a <span class="quote">“<span class="quote">primary</span>”</span> and an
+   <span class="quote">“<span class="quote">alternate</span>”</span>.  In most cases they are the same, but for non-English
+   names especially they can be a bit different, depending on pronunciation.
+   These functions compute the primary and alternate codes:
+  </p><a id="id-1.11.7.27.10.3" class="indexterm"></a><a id="id-1.11.7.27.10.4" class="indexterm"></a><pre class="synopsis">
+dmetaphone(source text) returns text
+dmetaphone_alt(source text) returns text
+</pre><p>
+   There is no length limit on the input strings.
+  </p><p>
+   Example:
+  </p><pre class="screen">
+test=# SELECT dmetaphone('gumbo');
+ dmetaphone
+------------
+ KMP
+(1 row)
+</pre></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="file-fdw.html" title="F.16. file_fdw — access data files in the server's file system">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hstore.html" title="F.18. hstore — hstore key/value datatype">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.16. file_fdw — access data files in the server's file system </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.18. hstore — hstore key/value datatype</td></tr></table></div></body></html>
+\ No newline at end of file