author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-16 19:46:48 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-16 19:46:48 +0000
commit 311bcfc6b3acdd6fd152798c7f287ddf74fa2a98 (patch)
tree 0ec307299b1dada3701e42f4ca6eda57d708261e /doc/src/sgml/html/unaccent.html
parent Initial commit. (diff)
Adding upstream version 15.4. (upstream/15.4, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/unaccent.html')
-rw-r--r-- doc/src/sgml/html/unaccent.html 131
1 files changed, 131 insertions, 0 deletions
diff --git a/doc/src/sgml/html/unaccent.html b/doc/src/sgml/html/unaccent.html
new file mode 100644
index 0000000..72554f8
--- /dev/null
+++ b/doc/src/sgml/html/unaccent.html
@@ -0,0 +1,131 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.48. unaccent</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="tsm-system-time.html" title="F.47. tsm_system_time" /><link rel="next" href="uuid-ossp.html" title="F.49. uuid-ossp" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.48. unaccent</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="tsm-system-time.html" title="F.47. tsm_system_time">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 15.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="uuid-ossp.html" title="F.49. uuid-ossp">Next</a></td></tr></table><hr /></div><div class="sect1" id="UNACCENT"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.48. unaccent</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="unaccent.html#id-1.11.7.57.6">F.48.1. Configuration</a></span></dt><dt><span class="sect2"><a href="unaccent.html#id-1.11.7.57.7">F.48.2. Usage</a></span></dt><dt><span class="sect2"><a href="unaccent.html#id-1.11.7.57.8">F.48.3. Functions</a></span></dt></dl></div><a id="id-1.11.7.57.2" class="indexterm"></a><p>
+ <code class="filename">unaccent</code> is a text search dictionary that removes accents
+ (diacritic signs) from lexemes.
+ It is a filtering dictionary: instead of ending the dictionary chain once
+ it recognizes a token, as dictionaries normally do, it always passes its
+ output to the next dictionary (if any). This allows accent-insensitive
+ processing for full text search.
+ </p><p>
+ The current implementation of <code class="filename">unaccent</code> cannot be used as a
+ normalizing dictionary for the <code class="filename">thesaurus</code> dictionary.
+ </p><p>
+ This module is considered <span class="quote">“<span class="quote">trusted</span>”</span>, that is, it can be
+ installed by non-superusers who have <code class="literal">CREATE</code> privilege
+ on the current database.
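+ As a minimal sketch, assuming a database named <code class="literal">mydb</code>
+ (as in the examples on this page) and a role holding that privilege, the
+ module can be installed with:
+</p><pre class="programlisting">
+mydb=# CREATE EXTENSION unaccent;
+</pre><p>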
+ </p><div class="sect2" id="id-1.11.7.57.6"><div class="titlepage"><div><div><h3 class="title">F.48.1. Configuration</h3></div></div></div><p>
+ An <code class="literal">unaccent</code> dictionary accepts the following options:
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ <code class="literal">RULES</code> is the base name of the file containing the list of
+ translation rules. This file must be stored in
+ <code class="filename">$SHAREDIR/tsearch_data/</code> (where <code class="literal">$SHAREDIR</code> means
+ the <span class="productname">PostgreSQL</span> installation's shared-data directory).
+ Its name must end in <code class="literal">.rules</code> (which is not to be included in
+ the <code class="literal">RULES</code> parameter).
+ </p></li></ul></div><p>
+ The rules file has the following format:
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ Each line represents one translation rule, consisting of an accented
+ character followed by its unaccented replacement. The first is translated
+ into the second. For example,
+</p><pre class="programlisting">
+À A
+Á A
+Â A
+Ã A
+Ä A
+Å A
+Æ AE
+</pre><p>
+ The two characters must be separated by whitespace, and any leading or
+ trailing whitespace on a line is ignored.
+ </p></li><li class="listitem"><p>
+ Alternatively, if only one character is given on a line, instances of
+ that character are deleted; this is useful in languages where accents
+ are represented by separate characters.
+ </p></li><li class="listitem"><p>
+ More precisely, each <span class="quote">“<span class="quote">character</span>”</span> can be any string not containing
+ whitespace, so <code class="filename">unaccent</code> dictionaries can also be used for
+ other kinds of substring substitution besides diacritic removal.
+ </p></li><li class="listitem"><p>
+ As with other <span class="productname">PostgreSQL</span> text search configuration files,
+ the rules file must be stored in UTF-8 encoding. The data is
+ automatically translated into the current database's encoding when
+ loaded. Any lines containing untranslatable characters are silently
+ ignored, so that rules files can contain rules that are not applicable in
+ the current encoding.
+ </p></li></ul></div><p>
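+ As an illustrative sketch of the format described above, a hypothetical file
+ <code class="filename">my_rules.rules</code> (the name and contents are
+ assumptions for illustration, not part of the distribution) might combine
+ a one-to-one translation, a multi-character replacement, and a deletion:
+</p><pre class="programlisting">
+é e
+ß ss
+™
+</pre><p>
+ Under these rules, é would be translated to e, ß to ss, and every ™ would
+ be deleted.
+ </p><p>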
+ A more complete example, which is directly useful for most European
+ languages, can be found in <code class="filename">unaccent.rules</code>, which is installed
+ in <code class="filename">$SHAREDIR/tsearch_data/</code> when the <code class="filename">unaccent</code>
+ module is installed. This rules file translates characters with accents
+ to the same characters without accents, and it also expands ligatures
+ into the equivalent series of simple characters (for example, Æ to
+ AE).
+ </p></div><div class="sect2" id="id-1.11.7.57.7"><div class="titlepage"><div><div><h3 class="title">F.48.2. Usage</h3></div></div></div><p>
+ Installing the <code class="literal">unaccent</code> extension creates a text
+ search template <code class="literal">unaccent</code> and a dictionary <code class="literal">unaccent</code>
+ based on it. The <code class="literal">unaccent</code> dictionary has the default
+ parameter setting <code class="literal">RULES='unaccent'</code>, which makes it immediately
+ usable with the standard <code class="filename">unaccent.rules</code> file.
+ If you wish, you can alter the parameter, for example
+
+</p><pre class="programlisting">
+mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
+</pre><p>
+
+ or create new dictionaries based on the template.
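+ A minimal sketch of the latter, assuming a hypothetical dictionary name
+ <code class="literal">my_unaccent</code> and a rules file
+ <code class="filename">my_rules.rules</code> already present in
+ <code class="filename">$SHAREDIR/tsearch_data/</code>:
+</p><pre class="programlisting">
+mydb=# CREATE TEXT SEARCH DICTIONARY my_unaccent (TEMPLATE = unaccent, RULES = 'my_rules');
+</pre><p>
+ This approach leaves the stock <code class="literal">unaccent</code> dictionary unchanged.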
+ </p><p>
+ To test the dictionary, you can try:
+</p><pre class="programlisting">
+mydb=# select ts_lexize('unaccent','Hôtel');
+ ts_lexize
+-----------
+ {Hotel}
+(1 row)
+</pre><p>
+ </p><p>
+ Here is an example showing how to insert the
+ <code class="filename">unaccent</code> dictionary into a text search configuration:
+</p><pre class="programlisting">
+mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
+mydb=# ALTER TEXT SEARCH CONFIGURATION fr
+ ALTER MAPPING FOR hword, hword_part, word
+ WITH unaccent, french_stem;
+mydb=# select to_tsvector('fr','Hôtels de la Mer');
+ to_tsvector
+-------------------
+ 'hotel':1 'mer':4
+(1 row)
+
+mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
+ ?column?
+----------
+ t
+(1 row)
+
+mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
+ ts_headline
+------------------------
+ &lt;b&gt;Hôtel&lt;/b&gt; de la Mer
+(1 row)
+</pre><p>
+ </p></div><div class="sect2" id="id-1.11.7.57.8"><div class="titlepage"><div><div><h3 class="title">F.48.3. Functions</h3></div></div></div><p>
+ The <code class="function">unaccent()</code> function removes accents (diacritic signs) from
+ a given string. It is essentially a wrapper around
+ <code class="filename">unaccent</code>-type dictionaries, but it can be used outside normal
+ text search contexts.
+ </p><a id="id-1.11.7.57.8.3" class="indexterm"></a><pre class="synopsis">
+unaccent([<span class="optional"><em class="replaceable"><code>dictionary</code></em> <code class="type">regdictionary</code>, </span>] <em class="replaceable"><code>string</code></em> <code class="type">text</code>) returns <code class="type">text</code>
+</pre><p>
+ If the <em class="replaceable"><code>dictionary</code></em> argument is
+ omitted, the text search dictionary named <code class="literal">unaccent</code> and
+ appearing in the same schema as the <code class="function">unaccent()</code>
+ function itself is used.
+ </p><p>
+ For example:
+</p><pre class="programlisting">
+SELECT unaccent('unaccent', 'Hôtel');
+SELECT unaccent('Hôtel');
+</pre><p>
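+ Because the function operates on plain <code class="type">text</code>, it can
+ also appear in ordinary comparisons. A minimal sketch using literal strings
+ (illustrative values only):
+</p><pre class="programlisting">
+SELECT unaccent('Hôtel') = unaccent('Hotel');
+</pre><p>
+ With the standard <code class="filename">unaccent.rules</code> file, both sides
+ reduce to <code class="literal">Hotel</code>, so the comparison is true.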
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="tsm-system-time.html" title="F.47. tsm_system_time">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="uuid-ossp.html" title="F.49. uuid-ossp">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.47. tsm_system_time </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 15.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.49. uuid-ossp</td></tr></table></div></body></html> \ No newline at end of file