1 files changed, 603 insertions, 0 deletions
diff --git a/doc/src/sgml/html/collation.html b/doc/src/sgml/html/collation.html
new file mode 100644
index 0000000..5336258
--- /dev/null
+++ b/doc/src/sgml/html/collation.html
@@ -0,0 +1,603 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>24.2. Collation Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="locale.html" title="24.1. Locale Support" /><link rel="next" href="multibyte.html" title="24.3. Character Set Support" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">24.2. Collation Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="locale.html" title="24.1. Locale Support">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><th width="60%" align="center">Chapter 24. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="multibyte.html" title="24.3. Character Set Support">Next</a></td></tr></table><hr /></div><div class="sect1" id="COLLATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">24.2. Collation Support <a href="#COLLATION" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="collation.html#COLLATION-CONCEPTS">24.2.1. Concepts</a></span></dt><dt><span class="sect2"><a href="collation.html#COLLATION-MANAGING">24.2.2. Managing Collations</a></span></dt><dt><span class="sect2"><a href="collation.html#ICU-CUSTOM-COLLATIONS">24.2.3. ICU Custom Collations</a></span></dt></dl></div><a id="id-1.6.11.4.2" class="indexterm"></a><p>
+   The collation feature allows specifying the sort order and character
+   classification behavior of data per-column, or even per-operation.
+   This alleviates the restriction that the
+   <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code> settings
+   of a database cannot be changed after its creation.
+  </p><div class="sect2" id="COLLATION-CONCEPTS"><div class="titlepage"><div><div><h3 class="title">24.2.1. Concepts <a href="#COLLATION-CONCEPTS" class="id_link">#</a></h3></div></div></div><p>
+    Conceptually, every expression of a collatable data type has a
+    collation.  (The built-in collatable data types are
+    <code class="type">text</code>, <code class="type">varchar</code>, and <code class="type">char</code>.
+    User-defined base types can also be marked collatable, and of course
+    a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN"><em class="glossterm"><a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a> over a
+    collatable data type is collatable.)  If the
+    expression is a column reference, the collation of the expression is the
+    defined collation of the column.  If the expression is a constant, the
+    collation is the default collation of the data type of the
+    constant.  The collation of a more complex expression is derived
+    from the collations of its inputs, as described below.
+   </p><p>
+    The collation of an expression can be the <span class="quote">“<span class="quote">default</span>”</span>
+    collation, which means the locale settings defined for the
+    database.  It is also possible for an expression's collation to be
+    indeterminate.  In such cases, ordering operations and other
+    operations that need to know the collation will fail.
+   </p><p>
+    When the database system has to perform an ordering or a character
+    classification, it uses the collation of the input expression.  This
+    happens, for example, with <code class="literal">ORDER BY</code> clauses
+    and function or operator calls such as <code class="literal">&lt;</code>.
+    The collation to apply for an <code class="literal">ORDER BY</code> clause
+    is simply the collation of the sort key.  The collation to apply for a
+    function or operator call is derived from the arguments, as described
+    below.  In addition to comparison operators, collations are taken into
+    account by functions that convert between lower and upper case
+    letters, such as <code class="function">lower</code>, <code class="function">upper</code>, and
+    <code class="function">initcap</code>; by pattern matching operators; and by
+    <code class="function">to_char</code> and related functions.
+   </p><p>
+    For a function or operator call, the collation that is derived by
+    examining the argument collations is used at run time for performing
+    the specified operation.  If the result of the function or operator
+    call is of a collatable data type, the collation is also used at parse
+    time as the defined collation of the function or operator expression,
+    in case there is a surrounding expression that requires knowledge of
+    its collation.
+   </p><p>
+    The <em class="firstterm">collation derivation</em> of an expression can be
+    implicit or explicit.  This distinction affects how collations are
+    combined when multiple different collations appear in an
+    expression.  An explicit collation derivation occurs when a
+    <code class="literal">COLLATE</code> clause is used; all other collation
+    derivations are implicit.  When multiple collations need to be
+    combined, for example in a function call, the following rules are
+    used:
+
+    </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
+       If any input expression has an explicit collation derivation, then
+       all explicitly derived collations among the input expressions must be
+       the same, otherwise an error is raised.  If any explicitly
+       derived collation is present, that is the result of the
+       collation combination.
+      </p></li><li class="listitem"><p>
+       Otherwise, all input expressions must have the same implicit
+       collation derivation or the default collation.  If any non-default
+       collation is present, that is the result of the collation combination.
+       Otherwise, the result is the default collation.
+      </p></li><li class="listitem"><p>
+       If there are conflicting non-default implicit collations among the
+       input expressions, then the combination is deemed to have indeterminate
+       collation.  This is not an error condition unless the particular
+       function being invoked requires knowledge of the collation it should
+       apply.  If it does, an error will be raised at run-time.
+      </p></li></ol></div><p>
+
+    For example, consider this table definition:
+</p><pre class="programlisting">
+CREATE TABLE test1 (
+    a text COLLATE "de_DE",
+    b text COLLATE "es_ES",
+    ...
+);
+</pre><p>
+
+    Then in
+</p><pre class="programlisting">
+SELECT a &lt; 'foo' FROM test1;
+</pre><p>
+    the <code class="literal">&lt;</code> comparison is performed according to
+    <code class="literal">de_DE</code> rules, because the expression combines an
+    implicitly derived collation with the default collation.  But in
+</p><pre class="programlisting">
+SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1;
+</pre><p>
+    the comparison is performed using <code class="literal">fr_FR</code> rules,
+    because the explicit collation derivation overrides the implicit one.
+    Furthermore, given
+</p><pre class="programlisting">
+SELECT a &lt; b FROM test1;
+</pre><p>
+    the parser cannot determine which collation to apply, since the
+    <code class="structfield">a</code> and <code class="structfield">b</code> columns have conflicting
+    implicit collations.  Since the <code class="literal">&lt;</code> operator
+    does need to know which collation to use, this will result in an
+    error.  The error can be resolved by attaching an explicit collation
+    specifier to either input expression, thus:
+</p><pre class="programlisting">
+SELECT a &lt; b COLLATE "de_DE" FROM test1;
+</pre><p>
+    or equivalently
+</p><pre class="programlisting">
+SELECT a COLLATE "de_DE" &lt; b FROM test1;
+</pre><p>
+    On the other hand, the structurally similar case
+</p><pre class="programlisting">
+SELECT a || b FROM test1;
+</pre><p>
+    does not result in an error, because the <code class="literal">||</code> operator
+    does not care about collations: its result is the same regardless
+    of the collation.
+   </p><p>
+    The collation assigned to a function or operator's combined input
+    expressions is also considered to apply to the function or operator's
+    result, if the function or operator delivers a result of a collatable
+    data type.  So, in
+</p><pre class="programlisting">
+SELECT * FROM test1 ORDER BY a || 'foo';
+</pre><p>
+    the ordering will be done according to <code class="literal">de_DE</code> rules.
+    But this query:
+</p><pre class="programlisting">
+SELECT * FROM test1 ORDER BY a || b;
+</pre><p>
+    results in an error, because even though the <code class="literal">||</code> operator
+    doesn't need to know a collation, the <code class="literal">ORDER BY</code> clause does.
+    As before, the conflict can be resolved with an explicit collation
+    specifier:
+</p><pre class="programlisting">
+SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
+</pre><p>
+   </p></div><div class="sect2" id="COLLATION-MANAGING"><div class="titlepage"><div><div><h3 class="title">24.2.2. Managing Collations <a href="#COLLATION-MANAGING" class="id_link">#</a></h3></div></div></div><p>
+    A collation is an SQL schema object that maps an SQL name to locales
+    provided by libraries installed in the operating system.  A collation
+    definition has a <em class="firstterm">provider</em> that specifies which
+    library supplies the locale data.  One standard provider name
+    is <code class="literal">libc</code>, which uses the locales provided by the
+    operating system C library.  These are the locales used by most tools
+    provided by the operating system.  Another provider
+    is <code class="literal">icu</code>, which uses the external
+    ICU<a id="id-1.6.11.4.5.2.4" class="indexterm"></a> library.  ICU locales can only be
+    used if support for ICU was configured when PostgreSQL was built.
+   </p><p>
+    A collation object provided by <code class="literal">libc</code> maps to a
+    combination of <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>
+    settings, as accepted by the <code class="literal">setlocale()</code> system library call.  (As
+    the name would suggest, the main purpose of a collation is to set
+    <code class="symbol">LC_COLLATE</code>, which controls the sort order.  But
+    it is rarely necessary in practice to have an
+    <code class="symbol">LC_CTYPE</code> setting that is different from
+    <code class="symbol">LC_COLLATE</code>, so it is more convenient to collect
+    these under one concept than to create another infrastructure for
+    setting <code class="symbol">LC_CTYPE</code> per expression.)  Also,
+    a <code class="literal">libc</code> collation
+    is tied to a character set encoding (see <a class="xref" href="multibyte.html" title="24.3. Character Set Support">Section 24.3</a>).
+    The same collation name may exist for different encodings.
+   </p><p>
+    A collation object provided by <code class="literal">icu</code> maps to a named
+    collator provided by the ICU library.  ICU does not support
+    separate <span class="quote">“<span class="quote">collate</span>”</span> and <span class="quote">“<span class="quote">ctype</span>”</span> settings, so
+    they are always the same.  Also, ICU collations are independent of the
+    encoding, so there is always only one ICU collation of a given name in
+    a database.
+   </p><div class="sect3" id="COLLATION-MANAGING-STANDARD"><div class="titlepage"><div><div><h4 class="title">24.2.2.1. Standard Collations <a href="#COLLATION-MANAGING-STANDARD" class="id_link">#</a></h4></div></div></div><p>
+    On all platforms, the collations named <code class="literal">default</code>,
+    <code class="literal">C</code>, and <code class="literal">POSIX</code> are available.  Additional
+    collations may be available depending on operating system support.
+    The <code class="literal">default</code> collation selects the <code class="symbol">LC_COLLATE</code>
+    and <code class="symbol">LC_CTYPE</code> values specified at database creation time.
+    The <code class="literal">C</code> and <code class="literal">POSIX</code> collations both specify
+    <span class="quote">“<span class="quote">traditional C</span>”</span> behavior, in which only the ASCII letters
+    <span class="quote">“<span class="quote"><code class="literal">A</code></span>”</span> through <span class="quote">“<span class="quote"><code class="literal">Z</code></span>”</span>
+    are treated as letters, and sorting is done strictly by character
+    code byte values.
+   </p><div class="note"><h3 class="title">Note</h3><p>
+     The <code class="literal">C</code> and <code class="literal">POSIX</code> locales may behave
+     differently depending on the database encoding.
+    </p></div><p>
+    Additionally, two SQL standard collation names are available:
+
+    </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">unicode</code></span></dt><dd><p>
+        This collation sorts using the Unicode Collation Algorithm with the
+        Default Unicode Collation Element Table.  It is available in all
+        encodings.  ICU support is required to use this collation.  (This
+        collation has the same behavior as the ICU root locale; see <a class="xref" href="collation.html#COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</a>.)
+       </p></dd><dt><span class="term"><code class="literal">ucs_basic</code></span></dt><dd><p>
+        This collation sorts by Unicode code point.  It is only available for
+        encoding <code class="literal">UTF8</code>.  (This collation has the same
+        behavior as the libc locale specification <code class="literal">C</code> in
+        <code class="literal">UTF8</code> encoding.)
+       </p></dd></dl></div><p>
+   </p></div><div class="sect3" id="COLLATION-MANAGING-PREDEFINED"><div class="titlepage"><div><div><h4 class="title">24.2.2.2. Predefined Collations <a href="#COLLATION-MANAGING-PREDEFINED" class="id_link">#</a></h4></div></div></div><p>
+    If the operating system provides support for using multiple locales
+    within a single program (<code class="function">newlocale</code> and related functions),
+    or if support for ICU is configured,
+    then when a database cluster is initialized, <code class="command">initdb</code>
+    populates the system catalog <code class="literal">pg_collation</code> with
+    collations based on all the locales it finds in the operating
+    system at the time.
+   </p><p>
+    To inspect the currently available locales, use the query <code class="literal">SELECT
+    * FROM pg_collation</code>, or the command <code class="command">\dOS+</code>
+    in <span class="application">psql</span>.
+   </p><div class="sect4" id="COLLATION-MANAGING-PREDEFINED-LIBC"><div class="titlepage"><div><div><h5 class="title">24.2.2.2.1. libc Collations <a href="#COLLATION-MANAGING-PREDEFINED-LIBC" class="id_link">#</a></h5></div></div></div><p>
+    For example, the operating system might
+    provide a locale named <code class="literal">de_DE.utf8</code>.
+    <code class="command">initdb</code> would then create a collation named
+    <code class="literal">de_DE.utf8</code> for encoding <code class="literal">UTF8</code>
+    that has both <code class="symbol">LC_COLLATE</code> and
+    <code class="symbol">LC_CTYPE</code> set to <code class="literal">de_DE.utf8</code>.
+    It will also create a collation with the <code class="literal">.utf8</code>
+    tag stripped off the name.  So you could also use the collation
+    under the name <code class="literal">de_DE</code>, which is less cumbersome
+    to write and makes the name less encoding-dependent.  Note that,
+    nevertheless, the initial set of collation names is
+    platform-dependent.
+   </p><p>
+    The default set of collations provided by <code class="literal">libc</code> map
+    directly to the locales installed in the operating system, which can be
+    listed using the command <code class="literal">locale -a</code>.  In case
+    a <code class="literal">libc</code> collation is needed that has different values
+    for <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>, or if new
+    locales are installed in the operating system after the database system
+    was initialized, then a new collation may be created using
+    the <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> command.
+    New operating system locales can also be imported en masse using
+    the <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.98. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a> function.
+   </p><p>
+    Within any particular database, only collations that use that
+    database's encoding are of interest.  Other entries in
+    <code class="literal">pg_collation</code> are ignored.  Thus, a stripped collation
+    name such as <code class="literal">de_DE</code> can be considered unique
+    within a given database even though it would not be unique globally.
+    Use of the stripped collation names is recommended, since it will
+    make one fewer thing you need to change if you decide to change to
+    another database encoding.  Note however that the <code class="literal">default</code>,
+    <code class="literal">C</code>, and <code class="literal">POSIX</code> collations can be used regardless of
+    the database encoding.
+   </p><p>
+    <span class="productname">PostgreSQL</span> considers distinct collation
+    objects to be incompatible even when they have identical properties.
+    Thus for example,
+</p><pre class="programlisting">
+SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
+</pre><p>
+    will draw an error even though the <code class="literal">C</code> and <code class="literal">POSIX</code>
+    collations have identical behaviors.  Mixing stripped and non-stripped
+    collation names is therefore not recommended.
+   </p></div><div class="sect4" id="COLLATION-MANAGING-PREDEFINED-ICU"><div class="titlepage"><div><div><h5 class="title">24.2.2.2.2. ICU Collations <a href="#COLLATION-MANAGING-PREDEFINED-ICU" class="id_link">#</a></h5></div></div></div><p>
+    With ICU, it is not sensible to enumerate all possible locale names.  ICU
+    uses a particular naming system for locales, but there are many more ways
+    to name a locale than there are actually distinct locales.
+    <code class="command">initdb</code> uses the ICU APIs to extract a set of distinct
+    locales to populate the initial set of collations.  Collations provided by
+    ICU are created in the SQL environment with names in BCP 47 language tag
+    format, with a <span class="quote">“<span class="quote">private use</span>”</span>
+    extension <code class="literal">-x-icu</code> appended, to distinguish them from
+    libc locales.
+   </p><p>
+    Here are some example collations that might be created:
+
+    </p><div class="variablelist"><dl class="variablelist"><dt id="COLLATION-MANAGING-PREDEFINED-ICU-DE-X-ICU"><span class="term"><code class="literal">de-x-icu</code></span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-DE-X-ICU" class="id_link">#</a></dt><dd><p>German collation, default variant</p></dd><dt id="COLLATION-MANAGING-PREDEFINED-ICU-DE-AT-X-ICU"><span class="term"><code class="literal">de-AT-x-icu</code></span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-DE-AT-X-ICU" class="id_link">#</a></dt><dd><p>German collation for Austria, default variant</p><p>
+        (There are also, say, <code class="literal">de-DE-x-icu</code>
+        or <code class="literal">de-CH-x-icu</code>, but as of this writing, they are
+        equivalent to <code class="literal">de-x-icu</code>.)
+       </p></dd><dt id="COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU"><span class="term"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU" class="id_link">#</a></dt><dd><p>
+        ICU <span class="quote">“<span class="quote">root</span>”</span> collation.  Use this to get a reasonable
+        language-agnostic sort order.
+       </p></dd></dl></div><p>
+   </p><p>
+    Some (less frequently used) encodings are not supported by ICU.  When the
+    database encoding is one of these, ICU collation entries
+    in <code class="literal">pg_collation</code> are ignored.  Attempting to use one
+    will draw an error along the lines of <span class="quote">“<span class="quote">collation "de-x-icu" for
+    encoding "WIN874" does not exist</span>”</span>.
+   </p></div></div><div class="sect3" id="COLLATION-CREATE"><div class="titlepage"><div><div><h4 class="title">24.2.2.3. Creating New Collation Objects <a href="#COLLATION-CREATE" class="id_link">#</a></h4></div></div></div><p>
+    If the standard and predefined collations are not sufficient, users can
+    create their own collation objects using the SQL
+    command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a>.
+   </p><p>
+    The standard and predefined collations are in the
+    schema <code class="literal">pg_catalog</code>, like all predefined objects.
+    User-defined collations should be created in user schemas.  This also
+    ensures that they are saved by <code class="command">pg_dump</code>.
+   </p><div class="sect4" id="COLLATION-MANAGING-CREATE-LIBC"><div class="titlepage"><div><div><h5 class="title">24.2.2.3.1. libc Collations <a href="#COLLATION-MANAGING-CREATE-LIBC" class="id_link">#</a></h5></div></div></div><p>
+     New libc collations can be created like this:
+</p><pre class="programlisting">
+CREATE COLLATION german (provider = libc, locale = 'de_DE');
+</pre><p>
+     The exact values that are acceptable for the <code class="literal">locale</code>
+     clause in this command depend on the operating system.  On Unix-like
+     systems, the command <code class="literal">locale -a</code> will show a list.
+    </p><p>
+     Since the predefined libc collations already include all collations
+     defined in the operating system when the database instance is
+     initialized, it is not often necessary to manually create new ones.
+     Reasons might be if a different naming system is desired (in which case
+     see also <a class="xref" href="collation.html#COLLATION-COPY" title="24.2.2.3.3. Copying Collations">Section 24.2.2.3.3</a>) or if the operating system has
+     been upgraded to provide new locale definitions (in which case see
+     also <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.98. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a>).
+    </p></div><div class="sect4" id="COLLATION-MANAGING-CREATE-ICU"><div class="titlepage"><div><div><h5 class="title">24.2.2.3.2. ICU Collations <a href="#COLLATION-MANAGING-CREATE-ICU" class="id_link">#</a></h5></div></div></div><p>
+     ICU collations can be created like:
+
+</p><pre class="programlisting">
+CREATE COLLATION german (provider = icu, locale = 'de-DE');
+</pre><p>
+
+     ICU locales are specified as a BCP 47 <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="24.1.5.3. Language Tag">Language Tag</a>, but can also accept most
+     libc-style locale names. If possible, libc-style locale names are
+     transformed into language tags.
+    </p><p>
+     New ICU collations can customize collation behavior extensively by
+     including collation attributes in the language tag. See <a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="24.2.3. ICU Custom Collations">Section 24.2.3</a> for details and examples.
+    </p></div><div class="sect4" id="COLLATION-COPY"><div class="titlepage"><div><div><h5 class="title">24.2.2.3.3. Copying Collations <a href="#COLLATION-COPY" class="id_link">#</a></h5></div></div></div><p>
+    The command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> can also be used to
+    create a new collation from an existing collation, which can be useful to
+    be able to use operating-system-independent collation names in
+    applications, create compatibility names, or use an ICU-provided collation
+    under a more readable name.  For example:
+</p><pre class="programlisting">
+CREATE COLLATION german FROM "de_DE";
+CREATE COLLATION french FROM "fr-x-icu";
+</pre><p>
+   </p></div></div><div class="sect3" id="COLLATION-NONDETERMINISTIC"><div class="titlepage"><div><div><h4 class="title">24.2.2.4. Nondeterministic Collations <a href="#COLLATION-NONDETERMINISTIC" class="id_link">#</a></h4></div></div></div><p>
+     A collation is either <em class="firstterm">deterministic</em> or
+     <em class="firstterm">nondeterministic</em>.  A deterministic collation uses
+     deterministic comparisons, which means that it considers strings to be
+     equal only if they consist of the same byte sequence.  Nondeterministic
+     comparison may determine strings to be equal even if they consist of
+     different bytes.  Typical situations include case-insensitive comparison,
+     accent-insensitive comparison, as well as comparison of strings in
+     different Unicode normal forms.  It is up to the collation provider to
+     actually implement such insensitive comparisons; the deterministic flag
+     only determines whether ties are to be broken using bytewise comparison.
+     See also <a class="ulink" href="https://www.unicode.org/reports/tr10" target="_top">Unicode Technical
+     Standard 10</a> for more information on the terminology.
+    </p><p>
+     To create a nondeterministic collation, specify the property
+     <code class="literal">deterministic = false</code> to <code class="command">CREATE
+     COLLATION</code>, for example:
+</p><pre class="programlisting">
+CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
+</pre><p>
+     This example would use the standard Unicode collation in a
+     nondeterministic way.  In particular, this would allow strings in
+     different normal forms to be compared correctly.  More interesting
+     examples make use of the ICU customization facilities explained above.
+     For example:
+</p><pre class="programlisting">
+CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
+CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
+</pre><p>
+    </p><p>
+     All standard and predefined collations are deterministic, all
+     user-defined collations are deterministic by default.  While
+     nondeterministic collations give a more <span class="quote">“<span class="quote">correct</span>”</span> behavior,
+     especially when considering the full power of Unicode and its many
+     special cases, they also have some drawbacks.  Foremost, their use leads
+     to a performance penalty.  Note, in particular, that B-tree cannot use
+     deduplication with indexes that use a nondeterministic collation.  Also,
+     certain operations are not possible with nondeterministic collations,
+     such as pattern matching operations.  Therefore, they should be used
+     only in cases where they are specifically wanted.
+    </p><div class="tip"><h3 class="title">Tip</h3><p>
+      To deal with text in different Unicode normalization forms, it is also
+      an option to use the functions/expressions
+      <code class="function">normalize</code> and <code class="literal">is normalized</code> to
+      preprocess or check the strings, instead of using nondeterministic
+      collations.  There are different trade-offs for each approach.
+     </p></div></div></div><div class="sect2" id="ICU-CUSTOM-COLLATIONS"><div class="titlepage"><div><div><h3 class="title">24.2.3. ICU Custom Collations <a href="#ICU-CUSTOM-COLLATIONS" class="id_link">#</a></h3></div></div></div><p>
+    ICU allows extensive control over collation behavior by defining new
+    collations with collation settings as a part of the language tag. These
+    settings can modify the collation order to suit a variety of needs. For
+    instance:
+
+</p><pre class="programlisting">
+-- ignore differences in accents and case
+CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
+
+-- upper case letters sort before lower case.
+CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
+SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
+
+-- treat digits numerically and ignore punctuation
+CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
+SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
+SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
+</pre><p>
+
+    Many of the available options are described in <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS" title="24.2.3.2. Collation Settings for an ICU Locale">Section 24.2.3.2</a>, or see <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="24.2.3.5. External References for ICU">Section 24.2.3.5</a> for more details.
+   </p><div class="sect3" id="ICU-COLLATION-COMPARISON-LEVELS"><div class="titlepage"><div><div><h4 class="title">24.2.3.1. ICU Comparison Levels <a href="#ICU-COLLATION-COMPARISON-LEVELS" class="id_link">#</a></h4></div></div></div><p>
+     Comparison of two strings (collation) in ICU is determined by a
+     multi-level process, where textual features are grouped into
+     "levels". Treatment of each level is controlled by the <a class="link" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 24.2. ICU Collation Settings">collation settings</a>. Higher
+     levels correspond to finer textual features.
+    </p><p>
+     <a class="xref" href="collation.html#ICU-COLLATION-LEVELS" title="Table 24.1. ICU Collation Levels">Table 24.1</a> shows which textual feature
+     differences are considered significant when determining equality at the
+     given level. The Unicode character <code class="literal">U+2063</code> is an
+     invisible separator, and as seen in the table, is ignored for at all
+     levels of comparison less than <code class="literal">identic</code>.
+    </p><div class="table" id="ICU-COLLATION-LEVELS"><p class="title"><strong>Table 24.1. ICU Collation Levels</strong></p><div class="table-contents"><table class="table" summary="ICU Collation Levels" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /><col class="col4" /><col class="col5" /><col class="col6" /><col class="col7" /><col class="col8" /></colgroup><thead><tr><th>Level</th><th>Description</th><th><code class="literal">'f' = 'f'</code></th><th><code class="literal">'ab' = U&amp;'a\2063b'</code></th><th><code class="literal">'x-y' = 'x_y'</code></th><th><code class="literal">'g' = 'G'</code></th><th><code class="literal">'n' = 'ñ'</code></th><th><code class="literal">'y' = 'z'</code></th></tr></thead><tbody><tr><td>level1</td><td>Base Character</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td></tr><tr><td>level2</td><td>Accents</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>level3</td><td>Case/Variants</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>level4</td><td>Punctuation</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>identic</td><td>All</td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr></tbody></table></div></div><br class="table-break" /><p>
+     At every level, even with full normalization off, basic normalization is
+     performed. For example, <code class="literal">'á'</code> may be composed of the
+     code points <code class="literal">U&amp;'\0061\0301'</code> or the single code
+     point <code class="literal">U&amp;'\00E1'</code>, and those sequences will be
+     considered equal even at the <code class="literal">identic</code> level. To treat
+     any difference in code point representation as distinct, use a collation
+     created with <code class="symbol">deterministic</code> set to
+     <code class="literal">true</code>.
+    </p><div class="sect4" id="ICU-COLLATION-LEVEL-EXAMPLES"><div class="titlepage"><div><div><h5 class="title">24.2.3.1.1. Collation Level Examples <a href="#ICU-COLLATION-LEVEL-EXAMPLES" class="id_link">#</a></h5></div></div></div><pre class="programlisting">
+CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
+CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level4');
+CREATE COLLATION identic (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-identic');
+
+-- invisible separator ignored at all levels except identic
+SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
+SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
+
+-- punctuation ignored at level3 but not at level 4
+SELECT 'x-y' = 'x_y' COLLATE level3; -- true
+SELECT 'x-y' = 'x_y' COLLATE level4; -- false
+</pre></div></div><div class="sect3" id="ICU-COLLATION-SETTINGS"><div class="titlepage"><div><div><h4 class="title">24.2.3.2. Collation Settings for an ICU Locale <a href="#ICU-COLLATION-SETTINGS" class="id_link">#</a></h4></div></div></div><p>
+     <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 24.2. ICU Collation Settings">Table 24.2</a> shows the available
+     collation settings, which can be used as part of a language tag to
+     customize a collation.
+    </p><div class="table" id="ICU-COLLATION-SETTINGS-TABLE"><p class="title"><strong>Table 24.2. ICU Collation Settings</strong></p><div class="table-contents"><table class="table" summary="ICU Collation Settings" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /><col class="col4" /></colgroup><thead><tr><th>Key</th><th>Values</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code class="literal">co</code></td><td><code class="literal">emoji</code>, <code class="literal">phonebk</code>, <code class="literal">standard</code>, <em class="replaceable"><code>...</code></em></td><td><code class="literal">standard</code></td><td>
+          Collation type. See <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="24.2.3.5. External References for ICU">Section 24.2.3.5</a> for additional options and details.
+         </td></tr><tr><td><code class="literal">ka</code></td><td><code class="literal">noignore</code>, <code class="literal">shifted</code></td><td><code class="literal">noignore</code></td><td>
+          If set to <code class="literal">shifted</code>, causes some characters
+          (e.g. punctuation or space) to be ignored in comparison. Key
+          <code class="literal">ks</code> must be set to <code class="literal">level3</code> or
+          lower to take effect. Set key <code class="literal">kv</code> to control which
+          character classes are ignored.
+         </td></tr><tr><td><code class="literal">kb</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
+          Backwards comparison for the level 2 differences. For example,
+          locale <code class="literal">und-u-kb</code> sorts <code class="literal">'àe'</code>
+          before <code class="literal">'aé'</code>.
+         </td></tr><tr><td><code class="literal">kc</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
+          <p>
+           Separates case into a "level 2.5" that falls between accents and
+           other level 3 features.
+          </p>
+          <p>
+           If set to <code class="literal">true</code> and <code class="literal">ks</code> is set
+           to <code class="literal">level1</code>, will ignore accents but take case
+           into account.
+          </p>
+         </td></tr><tr><td><code class="literal">kf</code></td><td>
+          <code class="literal">upper</code>, <code class="literal">lower</code>,
+          <code class="literal">false</code>
+         </td><td><code class="literal">false</code></td><td>
+          If set to <code class="literal">upper</code>, upper case sorts before lower
+          case. If set to <code class="literal">lower</code>, lower case sorts before
+          upper case. If set to <code class="literal">false</code>, the sort depends on
+          the rules of the locale.
+         </td></tr><tr><td><code class="literal">kn</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
+          If set to <code class="literal">true</code>, numbers within a string are
+          treated as a single numeric value rather than a sequence of
+          digits. For example, <code class="literal">'id-45'</code> sorts before
+          <code class="literal">'id-123'</code>.
+         </td></tr><tr><td><code class="literal">kk</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
+          <p>
+           Enable full normalization; may affect performance. Basic
+           normalization is performed even when set to
+           <code class="literal">false</code>. Locales for languages that require full
+           normalization typically enable it by default.
+          </p>
+          <p>
+           Full normalization is important in some cases, such as when
+           multiple accents are applied to a single character. For example,
+           the code point sequences <code class="literal">U&amp;'\0065\0323\0302'</code>
+           and <code class="literal">U&amp;'\0065\0302\0323'</code> represent
+           an <code class="literal">e</code> with circumflex and dot-below accents
+           applied in different orders. With full normalization
+           on, these code point sequences are treated as equal; otherwise they
+           are unequal.
+          </p>
+         </td></tr><tr><td><code class="literal">kr</code></td><td>
+          <code class="literal">space</code>, <code class="literal">punct</code>,
+          <code class="literal">symbol</code>, <code class="literal">currency</code>,
+          <code class="literal">digit</code>, <em class="replaceable"><code>script-id</code></em>
+         </td><td> </td><td>
+          <p>
+           Set to one or more of the valid values, or any BCP 47
+           <em class="replaceable"><code>script-id</code></em>, e.g. <code class="literal">latn</code>
+           ("Latin") or <code class="literal">grek</code> ("Greek"). Multiple values are
+           separated by "<code class="literal">-</code>".
+          </p>
+          <p>
+           Redefines the ordering of classes of characters; those characters
+           belonging to a class earlier in the list sort before characters
+           belonging to a class later in the list. For instance, the value
+           <code class="literal">digit-currency-space</code> (as part of a language tag
+           like <code class="literal">und-u-kr-digit-currency-space</code>) sorts
+           punctuation before digits and spaces.
+          </p>
+         </td></tr><tr><td><code class="literal">ks</code></td><td><code class="literal">level1</code>, <code class="literal">level2</code>, <code class="literal">level3</code>, <code class="literal">level4</code>, <code class="literal">identic</code></td><td><code class="literal">level3</code></td><td>
+          Sensitivity (or "strength") when determining equality, with
+          <code class="literal">level1</code> the least sensitive to differences and
+          <code class="literal">identic</code> the most sensitive to differences. See
+          <a class="xref" href="collation.html#ICU-COLLATION-LEVELS" title="Table 24.1. ICU Collation Levels">Table 24.1</a> for details.
+         </td></tr><tr><td><code class="literal">kv</code></td><td>
+          <code class="literal">space</code>, <code class="literal">punct</code>,
+          <code class="literal">symbol</code>, <code class="literal">currency</code>
+         </td><td><code class="literal">punct</code></td><td>
+          Classes of characters ignored during comparison at level 3. Setting
+          to a later value includes earlier values;
+          e.g. <code class="literal">symbol</code> also includes
+          <code class="literal">punct</code> and <code class="literal">space</code> in the
+          characters to be ignored. Key <code class="literal">ka</code> must be set to
+          <code class="literal">shifted</code> and key <code class="literal">ks</code> must be set
+          to <code class="literal">level3</code> or lower to take effect.
+         </td></tr></tbody></table></div></div><br class="table-break" /><p>
+     Defaults may depend on locale. The above table is not meant to be
+     complete. See <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="24.2.3.5. External References for ICU">Section 24.2.3.5</a> for additional
+     options and details.
+    </p><div class="note"><h3 class="title">Note</h3><p>
+      For many collation settings, you must create the collation with
+      <code class="option">deterministic</code> set to <code class="literal">false</code> for the
+      setting to have the desired effect (see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="24.2.2.4. Nondeterministic Collations">Section 24.2.2.4</a>). Additionally, some settings
+      only take effect when the key <code class="literal">ka</code> is set to
+      <code class="literal">shifted</code> (see <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 24.2. ICU Collation Settings">Table 24.2</a>).
+     </p></div></div><div class="sect3" id="ICU-LOCALE-EXAMPLES"><div class="titlepage"><div><div><h4 class="title">24.2.3.3. Collation Settings Examples <a href="#ICU-LOCALE-EXAMPLES" class="id_link">#</a></h4></div></div></div><div class="variablelist"><dl class="variablelist"><dt id="COLLATION-MANAGING-CREATE-ICU-DE-U-CO-PHONEBK-X-ICU"><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-DE-U-CO-PHONEBK-X-ICU" class="id_link">#</a></dt><dd><p>German collation with phone book collation type</p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-UND-U-CO-EMOJI-X-ICU"><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-UND-U-CO-EMOJI-X-ICU" class="id_link">#</a></dt><dd><p>
+         Root collation with Emoji collation type, per Unicode Technical Standard #51
+        </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KR-GREK-LATN"><span class="term"><code class="literal">CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KR-GREK-LATN" class="id_link">#</a></dt><dd><p>
+         Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
+        </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER"><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER" class="id_link">#</a></dt><dd><p>
+         Sort upper-case letters before lower-case letters.  (The default is
+         lower-case letters first.)
+        </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER-KR-GREK-LATN"><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER-KR-GREK-LATN" class="id_link">#</a></dt><dd><p>
+         Combines both of the above options.
+        </p></dd></dl></div></div><div class="sect3" id="ICU-TAILORING-RULES"><div class="titlepage"><div><div><h4 class="title">24.2.3.4. ICU Tailoring Rules <a href="#ICU-TAILORING-RULES" class="id_link">#</a></h4></div></div></div><p>
+     If the options provided by the collation settings shown above are not
+     sufficient, the order of collation elements can be changed with tailoring
+     rules, whose syntax is detailed at <a class="ulink" href="https://unicode-org.github.io/icu/userguide/collation/customization/" target="_top">https://unicode-org.github.io/icu/userguide/collation/customization/</a>.
+    </p><p>
+     This small example creates a collation based on the root locale with a
+     tailoring rule:
+</p><pre class="programlisting">
+CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&amp;V &lt;&lt; w &lt;&lt;&lt; W');
+</pre><p>
+     With this rule, the letter <span class="quote">“<span class="quote">W</span>”</span> is sorted after
+     <span class="quote">“<span class="quote">V</span>”</span>, but is treated as a secondary difference similar to an
+     accent.  Rules like this are contained in the locale definitions of some
+     languages.  (Of course, if a locale definition already contains the
+     desired rules, then they don't need to be specified again explicitly.)
+    </p><p>
+     Here is a more complex example.  The following statement sets up a
+     collation named <code class="literal">ebcdic</code> with rules to sort US-ASCII
+     characters in the order of the EBCDIC encoding.
+
+</p><pre class="programlisting">
+CREATE COLLATION ebcdic (provider = icu, locale = 'und',
+rules = $$
+&amp; ' ' &lt; '.' &lt; '&lt;' &lt; '(' &lt; '+' &lt; \|
+&lt; '&amp;' &lt; '!' &lt; '$' &lt; '*' &lt; ')' &lt; ';'
+&lt; '-' &lt; '/' &lt; ',' &lt; '%' &lt; '_' &lt; '&gt;' &lt; '?'
+&lt; '`' &lt; ':' &lt; '#' &lt; '@' &lt; \' &lt; '=' &lt; '"'
+&lt;*a-r &lt; '~' &lt;*s-z &lt; '^' &lt; '[' &lt; ']'
+&lt; '{' &lt;*A-I &lt; '}' &lt;*J-R &lt; '\' &lt;*S-Z &lt;*0-9
+$$);
+
+SELECT c
+FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c)
+ORDER BY c COLLATE ebcdic;
+ c
+---
+ !
+ a
+ b
+ ^
+ A
+ B
+ 1
+ 2
+</pre><p>
+    </p></div><div class="sect3" id="ICU-EXTERNAL-REFERENCES"><div class="titlepage"><div><div><h4 class="title">24.2.3.5. External References for ICU <a href="#ICU-EXTERNAL-REFERENCES" class="id_link">#</a></h4></div></div></div><p>
+     This section (<a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="24.2.3. ICU Custom Collations">Section 24.2.3</a>) is only a brief
+     overview of ICU behavior and language tags. Refer to the following
+     documents for technical details, additional options, and new behavior:
+    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+       <a class="ulink" href="https://www.unicode.org/reports/tr35/tr35-collation.html" target="_top">Unicode Technical Standard #35</a>
+      </p></li><li class="listitem"><p>
+       <a class="ulink" href="https://tools.ietf.org/html/bcp47" target="_top">BCP 47</a>
+      </p></li><li class="listitem"><p>
+       <a class="ulink" href="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml" target="_top">CLDR repository</a>
+      </p></li><li class="listitem"><p>
+       <a class="ulink" href="https://unicode-org.github.io/icu/userguide/locale/" target="_top">https://unicode-org.github.io/icu/userguide/locale/</a>
+      </p></li><li class="listitem"><p>
+       <a class="ulink" href="https://unicode-org.github.io/icu/userguide/collation/" target="_top">https://unicode-org.github.io/icu/userguide/collation/</a>
+      </p></li></ul></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="locale.html" title="24.1. Locale Support">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="multibyte.html" title="24.3. Character Set Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">24.1. Locale Support </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 24.3. Character Set Support</td></tr></table></div></body></html>
+\ No newline at end of file