diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:19:15 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 12:19:15 +0000 |
commit | 6eb9c5a5657d1fe77b55cc261450f3538d35a94d (patch) | |
tree | 657d8194422a5daccecfd42d654b8a245ef7b4c8 /doc/src/sgml/html/collation.html | |
parent | Initial commit. (diff) | |
download | postgresql-13-upstream.tar.xz postgresql-13-upstream.zip |
Adding upstream version 13.4.upstream/13.4upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'doc/src/sgml/html/collation.html')
-rw-r--r-- | doc/src/sgml/html/collation.html | 420 |
1 files changed, 420 insertions, 0 deletions
diff --git a/doc/src/sgml/html/collation.html b/doc/src/sgml/html/collation.html new file mode 100644 index 0000000..4639f87 --- /dev/null +++ b/doc/src/sgml/html/collation.html @@ -0,0 +1,420 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>23.2. Collation Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="locale.html" title="23.1. Locale Support" /><link rel="next" href="multibyte.html" title="23.3. Character Set Support" /></head><body id="docContent" class="container-fluid col-10"><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">23.2. Collation Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="locale.html" title="23.1. Locale Support">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 23. Localization">Up</a></td><th width="60%" align="center">Chapter 23. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="multibyte.html" title="23.3. Character Set Support">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="COLLATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">23.2. Collation Support</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="collation.html#id-1.6.10.4.4">23.2.1. Concepts</a></span></dt><dt><span class="sect2"><a href="collation.html#COLLATION-MANAGING">23.2.2. Managing Collations</a></span></dt></dl></div><a id="id-1.6.10.4.2" class="indexterm"></a><p> + The collation feature allows specifying the sort order and character + classification behavior of data per-column, or even per-operation. + This alleviates the restriction that the + <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code> settings + of a database cannot be changed after its creation. + </p><div class="sect2" id="id-1.6.10.4.4"><div class="titlepage"><div><div><h3 class="title">23.2.1. Concepts</h3></div></div></div><p> + Conceptually, every expression of a collatable data type has a + collation. (The built-in collatable data types are + <code class="type">text</code>, <code class="type">varchar</code>, and <code class="type">char</code>. + User-defined base types can also be marked collatable, and of course + a domain over a collatable data type is collatable.) If the + expression is a column reference, the collation of the expression is the + defined collation of the column. If the expression is a constant, the + collation is the default collation of the data type of the + constant. The collation of a more complex expression is derived + from the collations of its inputs, as described below. + </p><p> + The collation of an expression can be the <span class="quote">“<span class="quote">default</span>”</span> + collation, which means the locale settings defined for the + database. It is also possible for an expression's collation to be + indeterminate. In such cases, ordering operations and other + operations that need to know the collation will fail. + </p><p> + When the database system has to perform an ordering or a character + classification, it uses the collation of the input expression. This + happens, for example, with <code class="literal">ORDER BY</code> clauses + and function or operator calls such as <code class="literal"><</code>. + The collation to apply for an <code class="literal">ORDER BY</code> clause + is simply the collation of the sort key. The collation to apply for a + function or operator call is derived from the arguments, as described + below. In addition to comparison operators, collations are taken into + account by functions that convert between lower and upper case + letters, such as <code class="function">lower</code>, <code class="function">upper</code>, and + <code class="function">initcap</code>; by pattern matching operators; and by + <code class="function">to_char</code> and related functions. + </p><p> + For a function or operator call, the collation that is derived by + examining the argument collations is used at run time for performing + the specified operation. If the result of the function or operator + call is of a collatable data type, the collation is also used at parse + time as the defined collation of the function or operator expression, + in case there is a surrounding expression that requires knowledge of + its collation. + </p><p> + The <em class="firstterm">collation derivation</em> of an expression can be + implicit or explicit. This distinction affects how collations are + combined when multiple different collations appear in an + expression. An explicit collation derivation occurs when a + <code class="literal">COLLATE</code> clause is used; all other collation + derivations are implicit. When multiple collations need to be + combined, for example in a function call, the following rules are + used: + + </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> + If any input expression has an explicit collation derivation, then + all explicitly derived collations among the input expressions must be + the same, otherwise an error is raised. If any explicitly + derived collation is present, that is the result of the + collation combination. + </p></li><li class="listitem"><p> + Otherwise, all input expressions must have the same implicit + collation derivation or the default collation. If any non-default + collation is present, that is the result of the collation combination. + Otherwise, the result is the default collation. + </p></li><li class="listitem"><p> + If there are conflicting non-default implicit collations among the + input expressions, then the combination is deemed to have indeterminate + collation. This is not an error condition unless the particular + function being invoked requires knowledge of the collation it should + apply. If it does, an error will be raised at run-time. + </p></li></ol></div><p> + + For example, consider this table definition: +</p><pre class="programlisting"> +CREATE TABLE test1 ( + a text COLLATE "de_DE", + b text COLLATE "es_ES", + ... +); +</pre><p> + + Then in +</p><pre class="programlisting"> +SELECT a < 'foo' FROM test1; +</pre><p> + the <code class="literal"><</code> comparison is performed according to + <code class="literal">de_DE</code> rules, because the expression combines an + implicitly derived collation with the default collation. But in +</p><pre class="programlisting"> +SELECT a < ('foo' COLLATE "fr_FR") FROM test1; +</pre><p> + the comparison is performed using <code class="literal">fr_FR</code> rules, + because the explicit collation derivation overrides the implicit one. + Furthermore, given +</p><pre class="programlisting"> +SELECT a < b FROM test1; +</pre><p> + the parser cannot determine which collation to apply, since the + <code class="structfield">a</code> and <code class="structfield">b</code> columns have conflicting + implicit collations. Since the <code class="literal"><</code> operator + does need to know which collation to use, this will result in an + error. The error can be resolved by attaching an explicit collation + specifier to either input expression, thus: +</p><pre class="programlisting"> +SELECT a < b COLLATE "de_DE" FROM test1; +</pre><p> + or equivalently +</p><pre class="programlisting"> +SELECT a COLLATE "de_DE" < b FROM test1; +</pre><p> + On the other hand, the structurally similar case +</p><pre class="programlisting"> +SELECT a || b FROM test1; +</pre><p> + does not result in an error, because the <code class="literal">||</code> operator + does not care about collations: its result is the same regardless + of the collation. + </p><p> + The collation assigned to a function or operator's combined input + expressions is also considered to apply to the function or operator's + result, if the function or operator delivers a result of a collatable + data type. So, in +</p><pre class="programlisting"> +SELECT * FROM test1 ORDER BY a || 'foo'; +</pre><p> + the ordering will be done according to <code class="literal">de_DE</code> rules. + But this query: +</p><pre class="programlisting"> +SELECT * FROM test1 ORDER BY a || b; +</pre><p> + results in an error, because even though the <code class="literal">||</code> operator + doesn't need to know a collation, the <code class="literal">ORDER BY</code> clause does. + As before, the conflict can be resolved with an explicit collation + specifier: +</p><pre class="programlisting"> +SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR"; +</pre><p> + </p></div><div class="sect2" id="COLLATION-MANAGING"><div class="titlepage"><div><div><h3 class="title">23.2.2. Managing Collations</h3></div></div></div><p> + A collation is an SQL schema object that maps an SQL name to locales + provided by libraries installed in the operating system. A collation + definition has a <em class="firstterm">provider</em> that specifies which + library supplies the locale data. One standard provider name + is <code class="literal">libc</code>, which uses the locales provided by the + operating system C library. These are the locales that most tools + provided by the operating system use. Another provider + is <code class="literal">icu</code>, which uses the external + ICU<a id="id-1.6.10.4.5.2.4" class="indexterm"></a> library. ICU locales can only be + used if support for ICU was configured when PostgreSQL was built. + </p><p> + A collation object provided by <code class="literal">libc</code> maps to a + combination of <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code> + settings, as accepted by the <code class="literal">setlocale()</code> system library call. (As + the name would suggest, the main purpose of a collation is to set + <code class="symbol">LC_COLLATE</code>, which controls the sort order. But + it is rarely necessary in practice to have an + <code class="symbol">LC_CTYPE</code> setting that is different from + <code class="symbol">LC_COLLATE</code>, so it is more convenient to collect + these under one concept than to create another infrastructure for + setting <code class="symbol">LC_CTYPE</code> per expression.) Also, + a <code class="literal">libc</code> collation + is tied to a character set encoding (see <a class="xref" href="multibyte.html" title="23.3. Character Set Support">Section 23.3</a>). + The same collation name may exist for different encodings. + </p><p> + A collation object provided by <code class="literal">icu</code> maps to a named + collator provided by the ICU library. ICU does not support + separate <span class="quote">“<span class="quote">collate</span>”</span> and <span class="quote">“<span class="quote">ctype</span>”</span> settings, so + they are always the same. Also, ICU collations are independent of the + encoding, so there is always only one ICU collation of a given name in + a database. + </p><div class="sect3" id="id-1.6.10.4.5.5"><div class="titlepage"><div><div><h4 class="title">23.2.2.1. Standard Collations</h4></div></div></div><p> + On all platforms, the collations named <code class="literal">default</code>, + <code class="literal">C</code>, and <code class="literal">POSIX</code> are available. Additional + collations may be available depending on operating system support. + The <code class="literal">default</code> collation selects the <code class="symbol">LC_COLLATE</code> + and <code class="symbol">LC_CTYPE</code> values specified at database creation time. + The <code class="literal">C</code> and <code class="literal">POSIX</code> collations both specify + <span class="quote">“<span class="quote">traditional C</span>”</span> behavior, in which only the ASCII letters + <span class="quote">“<span class="quote"><code class="literal">A</code></span>”</span> through <span class="quote">“<span class="quote"><code class="literal">Z</code></span>”</span> + are treated as letters, and sorting is done strictly by character + code byte values. + </p><p> + Additionally, the SQL standard collation name <code class="literal">ucs_basic</code> + is available for encoding <code class="literal">UTF8</code>. It is equivalent + to <code class="literal">C</code> and sorts by Unicode code point. + </p></div><div class="sect3" id="id-1.6.10.4.5.6"><div class="titlepage"><div><div><h4 class="title">23.2.2.2. Predefined Collations</h4></div></div></div><p> + If the operating system provides support for using multiple locales + within a single program (<code class="function">newlocale</code> and related functions), + or if support for ICU is configured, + then when a database cluster is initialized, <code class="command">initdb</code> + populates the system catalog <code class="literal">pg_collation</code> with + collations based on all the locales it finds in the operating + system at the time. + </p><p> + To inspect the currently available locales, use the query <code class="literal">SELECT + * FROM pg_collation</code>, or the command <code class="command">\dOS+</code> + in <span class="application">psql</span>. + </p><div class="sect4" id="id-1.6.10.4.5.6.4"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.1. libc Collations</h5></div></div></div><p> + For example, the operating system might + provide a locale named <code class="literal">de_DE.utf8</code>. + <code class="command">initdb</code> would then create a collation named + <code class="literal">de_DE.utf8</code> for encoding <code class="literal">UTF8</code> + that has both <code class="symbol">LC_COLLATE</code> and + <code class="symbol">LC_CTYPE</code> set to <code class="literal">de_DE.utf8</code>. + It will also create a collation with the <code class="literal">.utf8</code> + tag stripped off the name. So you could also use the collation + under the name <code class="literal">de_DE</code>, which is less cumbersome + to write and makes the name less encoding-dependent. Note that, + nevertheless, the initial set of collation names is + platform-dependent. + </p><p> + The default set of collations provided by <code class="literal">libc</code> map + directly to the locales installed in the operating system, which can be + listed using the command <code class="literal">locale -a</code>. In case + a <code class="literal">libc</code> collation is needed that has different values + for <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>, or if new + locales are installed in the operating system after the database system + was initialized, then a new collation may be created using + the <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> command. + New operating system locales can also be imported en masse using + the <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.92. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a> function. + </p><p> + Within any particular database, only collations that use that + database's encoding are of interest. Other entries in + <code class="literal">pg_collation</code> are ignored. Thus, a stripped collation + name such as <code class="literal">de_DE</code> can be considered unique + within a given database even though it would not be unique globally. + Use of the stripped collation names is recommended, since it will + make one fewer thing you need to change if you decide to change to + another database encoding. Note however that the <code class="literal">default</code>, + <code class="literal">C</code>, and <code class="literal">POSIX</code> collations can be used regardless of + the database encoding. + </p><p> + <span class="productname">PostgreSQL</span> considers distinct collation + objects to be incompatible even when they have identical properties. + Thus for example, +</p><pre class="programlisting"> +SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1; +</pre><p> + will draw an error even though the <code class="literal">C</code> and <code class="literal">POSIX</code> + collations have identical behaviors. Mixing stripped and non-stripped + collation names is therefore not recommended. + </p></div><div class="sect4" id="id-1.6.10.4.5.6.5"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.2. ICU Collations</h5></div></div></div><p> + With ICU, it is not sensible to enumerate all possible locale names. ICU + uses a particular naming system for locales, but there are many more ways + to name a locale than there are actually distinct locales. + <code class="command">initdb</code> uses the ICU APIs to extract a set of distinct + locales to populate the initial set of collations. Collations provided by + ICU are created in the SQL environment with names in BCP 47 language tag + format, with a <span class="quote">“<span class="quote">private use</span>”</span> + extension <code class="literal">-x-icu</code> appended, to distinguish them from + libc locales. + </p><p> + Here are some example collations that might be created: + + </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">de-x-icu</code></span></dt><dd><p>German collation, default variant</p></dd><dt><span class="term"><code class="literal">de-AT-x-icu</code></span></dt><dd><p>German collation for Austria, default variant</p><p> + (There are also, say, <code class="literal">de-DE-x-icu</code> + or <code class="literal">de-CH-x-icu</code>, but as of this writing, they are + equivalent to <code class="literal">de-x-icu</code>.) + </p></dd><dt><span class="term"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</span></dt><dd><p> + ICU <span class="quote">“<span class="quote">root</span>”</span> collation. Use this to get a reasonable + language-agnostic sort order. + </p></dd></dl></div><p> + </p><p> + Some (less frequently used) encodings are not supported by ICU. When the + database encoding is one of these, ICU collation entries + in <code class="literal">pg_collation</code> are ignored. Attempting to use one + will draw an error along the lines of <span class="quote">“<span class="quote">collation "de-x-icu" for + encoding "WIN874" does not exist</span>”</span>. + </p></div></div><div class="sect3" id="COLLATION-CREATE"><div class="titlepage"><div><div><h4 class="title">23.2.2.3. Creating New Collation Objects</h4></div></div></div><p> + If the standard and predefined collations are not sufficient, users can + create their own collation objects using the SQL + command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a>. + </p><p> + The standard and predefined collations are in the + schema <code class="literal">pg_catalog</code>, like all predefined objects. + User-defined collations should be created in user schemas. This also + ensures that they are saved by <code class="command">pg_dump</code>. + </p><div class="sect4" id="id-1.6.10.4.5.7.4"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.1. libc Collations</h5></div></div></div><p> + New libc collations can be created like this: +</p><pre class="programlisting"> +CREATE COLLATION german (provider = libc, locale = 'de_DE'); +</pre><p> + The exact values that are acceptable for the <code class="literal">locale</code> + clause in this command depend on the operating system. On Unix-like + systems, the command <code class="literal">locale -a</code> will show a list. + </p><p> + Since the predefined libc collations already include all collations + defined in the operating system when the database instance is + initialized, it is not often necessary to manually create new ones. + Reasons might be if a different naming system is desired (in which case + see also <a class="xref" href="collation.html#COLLATION-COPY" title="23.2.2.3.3. Copying Collations">Section 23.2.2.3.3</a>) or if the operating system has + been upgraded to provide new locale definitions (in which case see + also <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.92. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a>). + </p></div><div class="sect4" id="id-1.6.10.4.5.7.5"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.2. ICU Collations</h5></div></div></div><p> + ICU allows collations to be customized beyond the basic language+country + set that is preloaded by <code class="command">initdb</code>. Users are encouraged + to define their own collation objects that make use of these facilities to + suit the sorting behavior to their requirements. + See <a class="ulink" href="http://userguide.icu-project.org/locale" target="_top">http://userguide.icu-project.org/locale</a> + and <a class="ulink" href="http://userguide.icu-project.org/collation/api" target="_top">http://userguide.icu-project.org/collation/api</a> for + information on ICU locale naming. The set of acceptable names and + attributes depends on the particular ICU version. + </p><p> + Here are some examples: + + </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</code></span></dt><dd><p>German collation with phone book collation type</p><p> + The first example selects the ICU locale using a <span class="quote">“<span class="quote">language + tag</span>”</span> per BCP 47. The second example uses the traditional + ICU-specific locale syntax. The first style is preferred going + forward, but it is not supported by older ICU versions. + </p><p> + Note that you can name the collation objects in the SQL environment + anything you want. In this example, we follow the naming style that + the predefined collations use, which in turn also follow BCP 47, but + that is not required for user-defined collations. + </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</code></span></dt><dd><p> + Root collation with Emoji collation type, per Unicode Technical Standard #51 + </p><p> + Observe how in the traditional ICU locale naming system, the root + locale is selected by an empty string. + </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');</code></span></dt><dd><p> + Sort Greek letters before Latin ones. (The default is Latin before Greek.) + </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</code></span></dt><dd><p> + Sort upper-case letters before lower-case letters. (The default is + lower-case letters first.) + </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');</code></span></dt><dd><p> + Combines both of the above options. + </p></dd><dt><span class="term"><code class="literal">CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</code><br /></span><span class="term"><code class="literal">CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</code></span></dt><dd><p> + Numeric ordering, sorts sequences of digits by their numeric value, + for example: <code class="literal">A-21</code> < <code class="literal">A-123</code> + (also known as natural sort). + </p></dd></dl></div><p> + + See <a class="ulink" href="https://www.unicode.org/reports/tr35/tr35-collation.html" target="_top">Unicode + Technical Standard #35</a> + and <a class="ulink" href="https://tools.ietf.org/html/bcp47" target="_top">BCP 47</a> for + details. The list of possible collation types (<code class="literal">co</code> + subtag) can be found in + the <a class="ulink" href="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml" target="_top">CLDR + repository</a>. + The <a class="ulink" href="https://ssl.icu-project.org/icu-bin/locexp" target="_top">ICU Locale + Explorer</a> can be used to check the details of a particular locale + definition. The examples using the <code class="literal">k*</code> subtags require + at least ICU version 54. + </p><p> + Note that while this system allows creating collations that <span class="quote">“<span class="quote">ignore + case</span>”</span> or <span class="quote">“<span class="quote">ignore accents</span>”</span> or similar (using the + <code class="literal">ks</code> key), in order for such collations to act in a + truly case- or accent-insensitive manner, they also need to be declared as not + <em class="firstterm">deterministic</em> in <code class="command">CREATE COLLATION</code>; + see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="23.2.2.4. Nondeterministic Collations">Section 23.2.2.4</a>. + Otherwise, any strings that compare equal according to the collation but + are not byte-wise equal will be sorted according to their byte values. + </p><div class="note"><h3 class="title">Note</h3><p> + By design, ICU will accept almost any string as a locale name and match + it to the closest locale it can provide, using the fallback procedure + described in its documentation. Thus, there will be no direct feedback + if a collation specification is composed using features that the given + ICU installation does not actually support. It is therefore recommended + to create application-level test cases to check that the collation + definitions satisfy one's requirements. + </p></div></div><div class="sect4" id="COLLATION-COPY"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.3. Copying Collations</h5></div></div></div><p> + The command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> can also be used to + create a new collation from an existing collation, which can be useful to + be able to use operating-system-independent collation names in + applications, create compatibility names, or use an ICU-provided collation + under a more readable name. For example: +</p><pre class="programlisting"> +CREATE COLLATION german FROM "de_DE"; +CREATE COLLATION french FROM "fr-x-icu"; +</pre><p> + </p></div></div><div class="sect3" id="COLLATION-NONDETERMINISTIC"><div class="titlepage"><div><div><h4 class="title">23.2.2.4. Nondeterministic Collations</h4></div></div></div><p> + A collation is either <em class="firstterm">deterministic</em> or + <em class="firstterm">nondeterministic</em>. A deterministic collation uses + deterministic comparisons, which means that it considers strings to be + equal only if they consist of the same byte sequence. Nondeterministic + comparison may determine strings to be equal even if they consist of + different bytes. Typical situations include case-insensitive comparison, + accent-insensitive comparison, as well as comparison of strings in + different Unicode normal forms. It is up to the collation provider to + actually implement such insensitive comparisons; the deterministic flag + only determines whether ties are to be broken using bytewise comparison. + See also <a class="ulink" href="https://www.unicode.org/reports/tr10" target="_top">Unicode Technical + Standard 10</a> for more information on the terminology. + </p><p> + To create a nondeterministic collation, specify the property + <code class="literal">deterministic = false</code> to <code class="command">CREATE + COLLATION</code>, for example: +</p><pre class="programlisting"> +CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false); +</pre><p> + This example would use the standard Unicode collation in a + nondeterministic way. In particular, this would allow strings in + different normal forms to be compared correctly. More interesting + examples make use of the ICU customization facilities explained above. + For example: +</p><pre class="programlisting"> +CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); +CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false); +</pre><p> + </p><p> + All standard and predefined collations are deterministic, all + user-defined collations are deterministic by default. While + nondeterministic collations give a more <span class="quote">“<span class="quote">correct</span>”</span> behavior, + especially when considering the full power of Unicode and its many + special cases, they also have some drawbacks. Foremost, their use leads + to a performance penalty. Note, in particular, that B-tree cannot use + deduplication with indexes that use a nondeterministic collation. Also, + certain operations are not possible with nondeterministic collations, + such as pattern matching operations. Therefore, they should be used + only in cases where they are specifically wanted. + </p><div class="tip"><h3 class="title">Tip</h3><p> + To deal with text in different Unicode normalization forms, it is also + an option to use the functions/expressions + <code class="function">normalize</code> and <code class="literal">is normalized</code> to + preprocess or check the strings, instead of using nondeterministic + collations. There are different trade-offs for each approach. + </p></div></div></div></div><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navfooter"><hr></hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="locale.html" title="23.1. Locale Support">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html" title="Chapter 23. Localization">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="multibyte.html" title="23.3. Character Set Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">23.1. Locale Support </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 13.4 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 23.3. Character Set Support</td></tr></table></div></body></html>
\ No newline at end of file |