1 files changed, 3318 insertions, 0 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
new file mode 100644
index 0000000..975b9dc
--- /dev/null
+++ b/doc/src/sgml/charset.sgml
@@ -0,0 +1,3318 @@
+<!-- doc/src/sgml/charset.sgml -->
+
+<chapter id="charset">
+ <title>Localization</title>
+
+ <para>
+  This chapter describes the available localization features from the
+  point of view of the administrator.
+  <productname>PostgreSQL</productname> supports two localization
+  facilities:
+
+   <itemizedlist>
+    <listitem>
+     <para>
+      Using the locale features of the operating system to provide
+      locale-specific collation order, number formatting, translated
+      messages, and other aspects.
+      This is covered in <xref linkend="locale"/> and
+      <xref linkend="collation"/>.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Providing a number of different character sets to support storing text
+      in all kinds of languages, and providing character set translation
+      between client and server.
+      This is covered in <xref linkend="multibyte"/>.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
+
+ <sect1 id="locale">
+  <title>Locale Support</title>
+
+  <indexterm zone="locale"><primary>locale</primary></indexterm>
+
+  <para>
+   <firstterm>Locale</firstterm> support refers to an application respecting
+   cultural preferences regarding alphabets, sorting, number
+   formatting, etc.  <productname>PostgreSQL</productname> uses the standard ISO
+   C and <acronym>POSIX</acronym> locale facilities provided by the server operating
+   system.  For additional information refer to the documentation of your
+   system.
+  </para>
+
+  <sect2 id="locale-overview">
+   <title>Overview</title>
+
+   <para>
+    Locale support is automatically initialized when a database
+    cluster is created using <command>initdb</command>.
+    <command>initdb</command> will initialize the database cluster
+    with the locale setting of its execution environment by default,
+    so if your system is already set to use the locale that you want
+    in your database cluster then there is nothing else you need to
+    do.  If you want to use a different locale (or you are not sure
+    which locale your system is set to), you can tell
+    <command>initdb</command> exactly which locale to use by
+    specifying the <option>--locale</option> option. For example:
+<screen>
+initdb --locale=sv_SE
+</screen>
+   </para>
+
+   <para>
+    This example for Unix systems sets the locale to Swedish
+    (<literal>sv</literal>) as spoken
+    in Sweden (<literal>SE</literal>).  Other possibilities might include
+    <literal>en_US</literal> (U.S. English) and <literal>fr_CA</literal> (French
+    Canadian).  If more than one character set can be used for a
+    locale then the specifications can take the form
+    <replaceable>language_territory.codeset</replaceable>.  For example,
+    <literal>fr_BE.UTF-8</literal> represents the French language (fr) as
+    spoken in Belgium (BE), with a <acronym>UTF-8</acronym> character set
+    encoding.
+   </para>
+
+   <para>
+    Which locales are available on your system, and under what names,
+    depends on what the operating system vendor provided and what was
+    installed.  On most Unix systems, the command
+    <literal>locale -a</literal> will provide a list of available locales.
+    Windows uses more verbose locale names, such as <literal>German_Germany</literal>
+    or <literal>Swedish_Sweden.1252</literal>, but the principles are the same.
+   </para>
+
+   <para>
+    Occasionally it is useful to mix rules from several locales, e.g.,
+    use English collation rules but Spanish messages.  To support that, a
+    set of locale subcategories exists that control only certain
+    aspects of the localization rules:
+
+    <informaltable>
+     <tgroup cols="2">
+      <colspec colname="col1" colwidth="1*"/>
+      <colspec colname="col2" colwidth="3*"/>
+      <tbody>
+       <row>
+        <entry><envar>LC_COLLATE</envar></entry>
+        <entry>String sort order</entry>
+       </row>
+       <row>
+        <entry><envar>LC_CTYPE</envar></entry>
+        <entry>Character classification (What is a letter? Its upper-case equivalent?)</entry>
+       </row>
+       <row>
+        <entry><envar>LC_MESSAGES</envar></entry>
+        <entry>Language of messages</entry>
+       </row>
+       <row>
+        <entry><envar>LC_MONETARY</envar></entry>
+        <entry>Formatting of currency amounts</entry>
+       </row>
+       <row>
+        <entry><envar>LC_NUMERIC</envar></entry>
+        <entry>Formatting of numbers</entry>
+       </row>
+       <row>
+        <entry><envar>LC_TIME</envar></entry>
+        <entry>Formatting of dates and times</entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </informaltable>
+
+    The category names translate into names of
+    <command>initdb</command> options to override the locale choice
+    for a specific category.  For instance, to set the locale to
+    French Canadian, but use U.S. rules for formatting currency, use
+    <literal>initdb --locale=fr_CA --lc-monetary=en_US</literal>.
+   </para>
+
+   <para>
+    If you want the system to behave as if it had no locale support,
+    use the special locale name <literal>C</literal>, or equivalently
+    <literal>POSIX</literal>.
+   </para>
+
+   <para>
+    Two locale categories, <literal>LC_COLLATE</literal> and
+    <literal>LC_CTYPE</literal>, must have their values fixed when a
+    database is created.  You can use different settings for different
+    databases, but once a database is created, you cannot change them
+    for that database anymore.  These categories affect
+    the sort order of indexes, so they must be kept fixed, or indexes on
+    text columns would become corrupt.
+    (But you can alleviate this restriction using collations, as discussed
+    in <xref linkend="collation"/>.)
+    The default values for these
+    categories are determined when <command>initdb</command> is run, and
+    those values are used when new databases are created, unless
+    specified otherwise in the <command>CREATE DATABASE</command> command.
+   </para>
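+
+   <para>
+    As a rough sketch of overriding those defaults for a single database,
+    the following assumes that a locale named
+    <literal>sv_SE.UTF-8</literal> is installed on the server operating
+    system; the database name <literal>sales_sv</literal> is purely
+    illustrative:
+<programlisting>
+-- template0 is used because the new settings may not match those of the
+-- default template database.
+CREATE DATABASE sales_sv
+    TEMPLATE template0
+    ENCODING 'UTF8'
+    LC_COLLATE 'sv_SE.UTF-8'
+    LC_CTYPE 'sv_SE.UTF-8';
+</programlisting>
+    Once created, the <literal>LC_COLLATE</literal> and
+    <literal>LC_CTYPE</literal> values of <literal>sales_sv</literal> stay
+    fixed, as described above.
+   </para>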
+
+   <para>
+    The other locale categories can be changed whenever desired
+    by setting the server configuration parameters
+    that have the same name as the locale categories (see <xref
+    linkend="runtime-config-client-format"/> for details).  The values
+    that are chosen by <command>initdb</command> are actually only written
+    into the configuration file <filename>postgresql.conf</filename> to
+    serve as defaults when the server is started.  If you remove these
+    assignments from <filename>postgresql.conf</filename> then the
+    server will inherit the settings from its execution environment.
+   </para>
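+
+   <para>
+    For instance, a session might adjust the formatting categories along
+    these lines.  The locale names are assumptions and must exist on the
+    server operating system:
+<programlisting>
+-- Change formatting categories for the current session only.
+SET lc_monetary = 'en_US.UTF-8';
+SET lc_time = 'de_DE.UTF-8';
+
+-- Inspect the values now in effect.
+SHOW lc_monetary;
+SHOW lc_time;
+
+-- Return to the session's default values.
+RESET lc_monetary;
+RESET lc_time;
+</programlisting>
+   </para>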
+
+   <para>
+    Note that the locale behavior of the server is determined by the
+    environment variables seen by the server, not by the environment
+    of any client.  Therefore, be careful to configure the correct locale settings
+    before starting the server.  A consequence of this is that if
+    client and server are set up in different locales, messages might
+    appear in different languages depending on where they originated.
+   </para>
+
+   <note>
+    <para>
+     When we speak of inheriting the locale from the execution
+     environment, this means the following on most operating systems:
+     For a given locale category, say the collation, the following
+     environment variables are consulted in this order until one is
+     found to be set: <envar>LC_ALL</envar>, <envar>LC_COLLATE</envar>
+     (or the variable corresponding to the respective category),
+     <envar>LANG</envar>.  If none of these environment variables are
+     set then the locale defaults to <literal>C</literal>.
+    </para>
+
+    <para>
+     Some message localization libraries also look at the environment
+     variable <envar>LANGUAGE</envar> which overrides all other locale
+     settings for the purpose of setting the language of messages.  If
+     in doubt, please refer to the documentation of your operating
+     system, in particular the documentation about
+     <application>gettext</application>.
+    </para>
+   </note>
+
+   <para>
+    To enable messages to be translated to the user's preferred language,
+    <acronym>NLS</acronym> must have been selected at build time
+    (<literal>configure --enable-nls</literal>).  All other locale support is
+    built in automatically.
+   </para>
+  </sect2>
+
+  <sect2 id="locale-behavior">
+   <title>Behavior</title>
+
+   <para>
+    The locale settings influence the following SQL features:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       Sort order in queries using <literal>ORDER BY</literal> or the standard
+       comparison operators on textual data
+       <indexterm><primary>ORDER BY</primary><secondary>and locales</secondary></indexterm>
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The <function>upper</function>, <function>lower</function>, and <function>initcap</function>
+       functions
+       <indexterm><primary>upper</primary><secondary>and locales</secondary></indexterm>
+       <indexterm><primary>lower</primary><secondary>and locales</secondary></indexterm>
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       Pattern matching operators (<literal>LIKE</literal>, <literal>SIMILAR TO</literal>,
+       and POSIX-style regular expressions); locales affect both
+       case-insensitive matching and the classification of characters by
+       character-class regular expressions
+       <indexterm><primary>LIKE</primary><secondary>and locales</secondary></indexterm>
+       <indexterm><primary>regular expressions</primary><secondary>and locales</secondary></indexterm>
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The <function>to_char</function> family of functions
+       <indexterm><primary>to_char</primary><secondary>and locales</secondary></indexterm>
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The ability to use indexes with <literal>LIKE</literal> clauses
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    The drawback of using locales other than <literal>C</literal> or
+    <literal>POSIX</literal> in <productname>PostgreSQL</productname> is its performance
+    impact. It slows character handling and prevents ordinary indexes
+    from being used by <literal>LIKE</literal>. For this reason use locales
+    only if you actually need them.
+   </para>
+
+   <para>
+    As a workaround to allow <productname>PostgreSQL</productname> to use indexes
+    with <literal>LIKE</literal> clauses under a non-C locale, several custom
+    operator classes exist. These allow the creation of an index that
+    performs a strict character-by-character comparison, ignoring
+    locale comparison rules. Refer to <xref linkend="indexes-opclass"/>
+    for more information.  Another approach is to create indexes using
+    the <literal>C</literal> collation, as discussed in
+    <xref linkend="collation"/>.
+   </para>
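+
+   <para>
+    Both approaches might look roughly as follows; the table
+    <literal>customers</literal> and its <type>text</type> column
+    <literal>name</literal> are hypothetical:
+<programlisting>
+-- Operator class that compares strictly character by character.
+CREATE INDEX customers_name_pattern_idx
+    ON customers (name text_pattern_ops);
+
+-- Alternatively, an index that uses the C collation.
+CREATE INDEX customers_name_c_idx
+    ON customers (name COLLATE "C");
+
+-- A left-anchored pattern such as this can then use either index.
+SELECT * FROM customers WHERE name LIKE 'Sm%';
+</programlisting>
+    Whether the planner actually chooses such an index depends on the
+    usual cost considerations.
+   </para>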
+  </sect2>
+
+  <sect2 id="locale-selecting-locales">
+   <title>Selecting Locales</title>
+
+   <para>
+    Locales can be selected in different scopes depending on requirements.
+    The above overview showed how locales are specified using
+    <command>initdb</command> to set the defaults for the entire cluster.  The
+    following list shows where locales can be selected.  Each item provides
+    the defaults for the subsequent items, and each lower item allows
+    overriding the defaults on a finer granularity.
+   </para>
+
+   <orderedlist>
+    <listitem>
+     <para>
+      As explained above, the environment of the operating system provides the
+      defaults for the locales of a newly initialized database cluster.  In
+      many cases, this is enough: If the operating system is configured for
+      the desired language/territory, then
+      <productname>PostgreSQL</productname> will by default also behave
+      according to that locale.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      As shown above, command-line options for <command>initdb</command>
+      specify the locale settings for a newly initialized database cluster.
+      Use this if the operating system does not have the locale configuration
+      you want for your database system.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      A locale can be selected separately for each database.  The SQL command
+      <command>CREATE DATABASE</command> and its command-line equivalent
+      <command>createdb</command> have options for that.  Use this for example
+      if a database cluster houses databases for multiple tenants with
+      different requirements.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Locale settings can be made for individual table columns.  This uses an
+      SQL object called <firstterm>collation</firstterm> and is explained in
+      <xref linkend="collation"/>.  Use this for example to sort data in
+      different languages or customize the sort order of a particular table.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Finally, locales can be selected for an individual query.  Again, this
+      uses SQL collation objects.  This could be used to change the sort order
+      based on run-time choices or for ad-hoc experimentation.
+     </para>
+    </listitem>
+   </orderedlist>
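+
+   <para>
+    The last two levels might be sketched as follows.  The table and
+    column names are made up, and the <literal>de_DE</literal> collation
+    is assumed to exist in the database:
+<programlisting>
+-- Column-level setting: this column sorts by German rules by default.
+CREATE TABLE cities (
+    name text COLLATE "de_DE"
+);
+
+-- Query-level setting: override the column's collation for one query.
+SELECT name FROM cities ORDER BY name COLLATE "C";
+</programlisting>
+   </para>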
+  </sect2>
+
+  <sect2 id="locale-providers">
+   <title>Locale Providers</title>
+
+   <para>
+    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
+    providers</firstterm>.  This specifies which library supplies the locale
+    data.  One standard provider name is <literal>libc</literal>, which uses
+    the locales provided by the operating system C library.  These are the
+    locales used by most tools provided by the operating system.  Another
+    provider is <literal>icu</literal>, which uses the external
+    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
+    only be used if support for ICU was configured when
+    <productname>PostgreSQL</productname> was built.
+   </para>
+
+   <para>
+    The commands and tools that select the locale settings, as described
+    above, each have an option to select the locale provider.  The examples
+    shown earlier all use the <literal>libc</literal> provider, which is the
+    default.  Here is an example to initialize a database cluster using the
+    ICU provider:
+<programlisting>
+initdb --locale-provider=icu --icu-locale=en
+</programlisting>
+    See the description of the respective commands and programs for
+    details.  Note that you can mix locale providers at different
+    granularities, for example use <literal>libc</literal> by default for the
+    cluster but have one database that uses the <literal>icu</literal>
+    provider, and then have collation objects using either provider within
+    those databases.
+   </para>
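+
+   <para>
+    A hedged sketch of such a mix, assuming a build with ICU support and a
+    cluster whose default provider is <literal>libc</literal>; the object
+    names and the libc locale name <literal>de_DE.utf8</literal> are
+    illustrative:
+<programlisting>
+-- One database that uses the ICU provider.
+CREATE DATABASE icu_db
+    TEMPLATE template0
+    LOCALE_PROVIDER icu
+    ICU_LOCALE 'en-US';
+
+-- Collation objects using either provider.
+CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');
+CREATE COLLATION german_libc (provider = libc, locale = 'de_DE.utf8');
+</programlisting>
+   </para>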
+
+   <para>
+    Which locale provider to use depends on individual requirements.  For most
+    basic uses, either provider will give adequate results.  For the libc
+    provider, it depends on what the operating system offers; some operating
+    systems are better than others.  For advanced uses, ICU offers more locale
+    variants and customization options.
+   </para>
+  </sect2>
+
+  <sect2 id="icu-locales">
+   <title>ICU Locales</title>
+
+   <sect3 id="icu-locale-names">
+    <title>ICU Locale Names</title>
+
+    <para>
+     The ICU format for the locale name is a <link
+     linkend="icu-language-tag">Language Tag</link>.
+
+<programlisting>
+CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
+CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
+</programlisting>
+    </para>
+   </sect3>
+
+   <sect3 id="icu-canonicalization">
+    <title>Locale Canonicalization and Validation</title>
+    <para>
+     When defining a new ICU collation object or database with ICU as the
+     provider, the given locale name is transformed ("canonicalized") into a
+     language tag if not already in that form. For instance,
+
+<screen>
+CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
+NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
+CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
+NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
+</screen>
+
+     If you see this notice, ensure that the <symbol>provider</symbol> and
+     <symbol>locale</symbol> are what you expect. For consistent results
+     when using the ICU provider, specify the canonical <link
+     linkend="icu-language-tag">language tag</link> instead of relying on the
+     transformation.
+    </para>
+
+    <para>
+     A locale with no language name, or the special language name
+     <literal>root</literal>, is transformed to have the language
+     <literal>und</literal> ("undetermined").
+    </para>
+
+    <para>
+     ICU can transform most libc locale names, as well as some other formats,
+     into language tags for easier transition to ICU. If a libc locale name is
+     used in ICU, it may not have precisely the same behavior as in libc.
+    </para>
+
+    <para>
+     If there is a problem interpreting the locale name, or if the locale name
+     represents a language or region that ICU does not recognize, you will see
+     the following warning:
+
+<screen>
+CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
+WARNING:  ICU locale "nonsense" has unknown language "nonsense"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION
+</screen>
+
+     <xref linkend="guc-icu-validation-level"/> controls how the message is
+     reported. Unless set to <literal>ERROR</literal>, the collation will
+     still be created, but the behavior may not be what the user intended.
+    </para>
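+
+    <para>
+     For example, a session that prefers to reject unrecognized locale
+     names outright could raise the validation level before creating the
+     collation.  This is only a sketch; the collation name is the one
+     from the example above:
+<programlisting>
+SET icu_validation_level = 'ERROR';
+
+-- With this setting the statement is expected to fail with an error
+-- instead of a warning, and no collation is created.
+CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
+</programlisting>
+    </para>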
+   </sect3>
+
+   <sect3 id="icu-language-tag">
+    <title>Language Tag</title>
+
+    <para>
+     A language tag, defined in BCP 47, is a standardized identifier used to
+     identify languages, regions, and other information about a locale.
+    </para>
+
+    <para>
+     Basic language tags are simply
+     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
+     or even just <replaceable>language</replaceable>. The
+     <replaceable>language</replaceable> is a language code
+     (e.g. <literal>fr</literal> for French), and
+     <replaceable>region</replaceable> is a region code
+     (e.g. <literal>CA</literal> for Canada). Examples:
+     <literal>ja-JP</literal>, <literal>de</literal>, or
+     <literal>fr-CA</literal>.
+    </para>
+
+    <para>
+     Collation settings may be included in the language tag to customize
+     collation behavior. ICU allows extensive customization, such as
+     sensitivity (or insensitivity) to accents, case, and punctuation;
+     treatment of digits within text; and many other options to satisfy a
+     variety of uses.
+    </para>
+
+    <para>
+     To include this additional collation information in a language tag,
+     append <literal>-u</literal>, which indicates there are additional
+     collation settings, followed by one or more
+     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
+     pairs. The <replaceable>key</replaceable> is the key for a <link
+     linkend="icu-collation-settings">collation setting</link> and
+     <replaceable>value</replaceable> is a valid value for that setting. For
+     boolean settings, the <literal>-</literal><replaceable>key</replaceable>
+     may be specified without a corresponding
+     <literal>-</literal><replaceable>value</replaceable>, which implies a
+     value of <literal>true</literal>.
+    </para>
+
+    <para>
+     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
+     means the locale with the English language in the US region, with
+     collation settings <literal>kn</literal> set to <literal>true</literal>
+     and <literal>ks</literal> set to <literal>level2</literal>. Those
+     settings mean the collation will be case-insensitive and treat a sequence
+     of digits as a single number:
+
+<screen>
+CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2');
+SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+
+SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+</screen>
+    </para>
+
+    <para>
+     See <xref linkend="icu-custom-collations"/> for details and additional
+     examples of using language tags with custom collation information for the
+     locale.
+    </para>
+   </sect3>
+  </sect2>
+
+  <sect2 id="locale-problems">
+   <title>Problems</title>
+
+   <para>
+    If locale support doesn't work according to the explanation above,
+    check that the locale support in your operating system is
+    correctly configured.  To check what locales are installed on your
+    system, you can use the command <literal>locale -a</literal> if
+    your operating system provides it.
+   </para>
+
+   <para>
+    Check that <productname>PostgreSQL</productname> is actually using the locale
+    that you think it is.  The <envar>LC_COLLATE</envar> and <envar>LC_CTYPE</envar>
+    settings are determined when a database is created, and cannot be
+    changed except by creating a new database.  Other locale
+    settings including <envar>LC_MESSAGES</envar> and <envar>LC_MONETARY</envar>
+    are initially determined by the environment the server is started
+    in, but can be changed on the fly.  You can check the active locale
+    settings using the <command>SHOW</command> command.
+   </para>
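+
+   <para>
+    A quick check might look like this; <structname>pg_database</structname>
+    is the system catalog that records the per-database settings:
+<programlisting>
+-- Categories that can be changed at run time.
+SHOW lc_messages;
+SHOW lc_monetary;
+SHOW lc_numeric;
+SHOW lc_time;
+
+-- Categories fixed at database creation time.
+SELECT datname, datcollate, datctype
+FROM pg_database
+WHERE datname = current_database();
+</programlisting>
+   </para>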
+
+   <para>
+    The directory <filename>src/test/locale</filename> in the source
+    distribution contains a test suite for
+    <productname>PostgreSQL</productname>'s locale support.
+   </para>
+
+   <para>
+    Client applications that handle server-side errors by parsing the
+    text of the error message will have problems when the
+    server's messages are in a different language.  Authors of such
+    applications are advised to make use of the error code scheme
+    instead.
+   </para>
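+
+   <para>
+    As an illustration of reacting to the error code rather than the
+    message text, here is a small <application>PL/pgSQL</application>
+    sketch; the failing statement is contrived:
+<programlisting>
+DO $$
+BEGIN
+    PERFORM 1 / 0;
+EXCEPTION
+    WHEN division_by_zero THEN
+        -- SQLSTATE holds the five-character error code, which does not
+        -- depend on the language of the server's messages.
+        RAISE NOTICE 'caught SQLSTATE %', SQLSTATE;
+END
+$$;
+</programlisting>
+    Client libraries generally expose the same code alongside the message
+    text.
+   </para>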
+
+   <para>
+    Maintaining catalogs of message translations requires the ongoing
+    efforts of many volunteers who want to see
+    <productname>PostgreSQL</productname> speak their preferred language well.
+    If messages in your language are currently not available or not fully
+    translated, your assistance would be appreciated.  If you want to
+    help, refer to <xref linkend="nls"/> or write to the developers'
+    mailing list.
+   </para>
+  </sect2>
+ </sect1>
+
+
+ <sect1 id="collation">
+  <title>Collation Support</title>
+
+  <indexterm zone="collation"><primary>collation</primary></indexterm>
+
+  <para>
+   The collation feature allows specifying the sort order and character
+   classification behavior of data per-column, or even per-operation.
+   This alleviates the restriction that the
+   <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
+   of a database cannot be changed after its creation.
+  </para>
+
+  <sect2 id="collation-concepts">
+   <title>Concepts</title>
+
+   <para>
+    Conceptually, every expression of a collatable data type has a
+    collation.  (The built-in collatable data types are
+    <type>text</type>, <type>varchar</type>, and <type>char</type>.
+    User-defined base types can also be marked collatable, and of course
+    a <glossterm linkend="glossary-domain">domain</glossterm> over a
+    collatable data type is collatable.)  If the
+    expression is a column reference, the collation of the expression is the
+    defined collation of the column.  If the expression is a constant, the
+    collation is the default collation of the data type of the
+    constant.  The collation of a more complex expression is derived
+    from the collations of its inputs, as described below.
+   </para>
+
+   <para>
+    The collation of an expression can be the <quote>default</quote>
+    collation, which means the locale settings defined for the
+    database.  It is also possible for an expression's collation to be
+    indeterminate.  In such cases, ordering operations and other
+    operations that need to know the collation will fail.
+   </para>
+
+   <para>
+    When the database system has to perform an ordering or a character
+    classification, it uses the collation of the input expression.  This
+    happens, for example, with <literal>ORDER BY</literal> clauses
+    and function or operator calls such as <literal>&lt;</literal>.
+    The collation to apply for an <literal>ORDER BY</literal> clause
+    is simply the collation of the sort key.  The collation to apply for a
+    function or operator call is derived from the arguments, as described
+    below.  In addition to comparison operators, collations are taken into
+    account by functions that convert between lower and upper case
+    letters, such as <function>lower</function>, <function>upper</function>, and
+    <function>initcap</function>; by pattern matching operators; and by
+    <function>to_char</function> and related functions.
+   </para>
+
+   <para>
+    For a function or operator call, the collation that is derived by
+    examining the argument collations is used at run time for performing
+    the specified operation.  If the result of the function or operator
+    call is of a collatable data type, the collation is also used at parse
+    time as the defined collation of the function or operator expression,
+    in case there is a surrounding expression that requires knowledge of
+    its collation.
+   </para>
+
+   <para>
+    The <firstterm>collation derivation</firstterm> of an expression can be
+    implicit or explicit.  This distinction affects how collations are
+    combined when multiple different collations appear in an
+    expression.  An explicit collation derivation occurs when a
+    <literal>COLLATE</literal> clause is used; all other collation
+    derivations are implicit.  When multiple collations need to be
+    combined, for example in a function call, the following rules are
+    used:
+
+    <orderedlist>
+     <listitem>
+      <para>
+       If any input expression has an explicit collation derivation, then
+       all explicitly derived collations among the input expressions must be
+       the same, otherwise an error is raised.  If any explicitly
+       derived collation is present, that is the result of the
+       collation combination.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       Otherwise, all input expressions must have the same implicit
+       collation derivation or the default collation.  If any non-default
+       collation is present, that is the result of the collation combination.
+       Otherwise, the result is the default collation.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       If there are conflicting non-default implicit collations among the
+       input expressions, then the combination is deemed to have indeterminate
+       collation.  This is not an error condition unless the particular
+       function being invoked requires knowledge of the collation it should
+       apply.  If it does, an error will be raised at run-time.
+      </para>
+     </listitem>
+    </orderedlist>
+
+    For example, consider this table definition:
+<programlisting>
+CREATE TABLE test1 (
+    a text COLLATE "de_DE",
+    b text COLLATE "es_ES",
+    ...
+);
+</programlisting>
+
+    Then in
+<programlisting>
+SELECT a &lt; 'foo' FROM test1;
+</programlisting>
+    the <literal>&lt;</literal> comparison is performed according to
+    <literal>de_DE</literal> rules, because the expression combines an
+    implicitly derived collation with the default collation.  But in
+<programlisting>
+SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1;
+</programlisting>
+    the comparison is performed using <literal>fr_FR</literal> rules,
+    because the explicit collation derivation overrides the implicit one.
+    Furthermore, given
+<programlisting>
+SELECT a &lt; b FROM test1;
+</programlisting>
+    the parser cannot determine which collation to apply, since the
+    <structfield>a</structfield> and <structfield>b</structfield> columns have conflicting
+    implicit collations.  Since the <literal>&lt;</literal> operator
+    does need to know which collation to use, this will result in an
+    error.  The error can be resolved by attaching an explicit collation
+    specifier to either input expression, thus:
+<programlisting>
+SELECT a &lt; b COLLATE "de_DE" FROM test1;
+</programlisting>
+    or equivalently
+<programlisting>
+SELECT a COLLATE "de_DE" &lt; b FROM test1;
+</programlisting>
+    On the other hand, the structurally similar case
+<programlisting>
+SELECT a || b FROM test1;
+</programlisting>
+    does not result in an error, because the <literal>||</literal> operator
+    does not care about collations: its result is the same regardless
+    of the collation.
+   </para>
+
+   <para>
+    The collation assigned to a function or operator's combined input
+    expressions is also considered to apply to the function or operator's
+    result, if the function or operator delivers a result of a collatable
+    data type.  So, in
+<programlisting>
+SELECT * FROM test1 ORDER BY a || 'foo';
+</programlisting>
+    the ordering will be done according to <literal>de_DE</literal> rules.
+    But this query:
+<programlisting>
+SELECT * FROM test1 ORDER BY a || b;
+</programlisting>
+    results in an error, because even though the <literal>||</literal> operator
+    doesn't need to know a collation, the <literal>ORDER BY</literal> clause does.
+    As before, the conflict can be resolved with an explicit collation
+    specifier:
+<programlisting>
+SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
+</programlisting>
+   </para>
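+
+   <para>
+    To inspect the collation that was derived for an expression, the
+    <literal>COLLATION FOR</literal> construct can help.  A sketch using
+    the <literal>test1</literal> table from above:
+<programlisting>
+-- Implicit derivation from column a combined with a constant.
+SELECT COLLATION FOR (a || 'foo') FROM test1;
+
+-- Explicit derivation from the COLLATE clause.
+SELECT COLLATION FOR (a || b COLLATE "fr_FR") FROM test1;
+</programlisting>
+   </para>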
+  </sect2>
+
+  <sect2 id="collation-managing">
+   <title>Managing Collations</title>
+
+   <para>
+    A collation is an SQL schema object that maps an SQL name to locales
+    provided by libraries installed in the operating system.  A collation
+    definition has a <firstterm>provider</firstterm> that specifies which
+    library supplies the locale data.  One standard provider name
+    is <literal>libc</literal>, which uses the locales provided by the
+    operating system C library.  These are the locales used by most tools
+    provided by the operating system.  Another provider
+    is <literal>icu</literal>, which uses the external
+    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can only be
+    used if support for ICU was configured when
+    <productname>PostgreSQL</productname> was built.
+   </para>
+
+   <para>
+    A collation object provided by <literal>libc</literal> maps to a
+    combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>
+    settings, as accepted by the <literal>setlocale()</literal> system library call.  (As
+    the name would suggest, the main purpose of a collation is to set
+    <symbol>LC_COLLATE</symbol>, which controls the sort order.  But
+    it is rarely necessary in practice to have an
+    <symbol>LC_CTYPE</symbol> setting that is different from
+    <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
+    these under one concept than to create another infrastructure for
+    setting <symbol>LC_CTYPE</symbol> per expression.)  Also,
+    a <literal>libc</literal> collation
+    is tied to a character set encoding (see <xref linkend="multibyte"/>).
+    The same collation name may exist for different encodings.
+   </para>
+
+   <para>
+    A collation object provided by <literal>icu</literal> maps to a named
+    collator provided by the ICU library.  ICU does not support
+    separate <quote>collate</quote> and <quote>ctype</quote> settings, so
+    they are always the same.  Also, ICU collations are independent of the
+    encoding, so there is always only one ICU collation of a given name in
+    a database.
+   </para>
+
+   <sect3 id="collation-managing-standard">
+    <title>Standard Collations</title>
+
+   <para>
+    On all platforms, the collations named <literal>default</literal>,
+    <literal>C</literal>, and <literal>POSIX</literal> are available.  Additional
+    collations may be available depending on operating system support.
+    The <literal>default</literal> collation selects the <symbol>LC_COLLATE</symbol>
+    and <symbol>LC_CTYPE</symbol> values specified at database creation time.
+    The <literal>C</literal> and <literal>POSIX</literal> collations both specify
+    <quote>traditional C</quote> behavior, in which only the ASCII letters
+    <quote><literal>A</literal></quote> through <quote><literal>Z</literal></quote>
+    are treated as letters, and sorting is done strictly by character
+    code byte values.
+   </para>
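+
+   <para>
+    A small sketch of the byte-value ordering, using made-up sample values:
+    under the <literal>C</literal> collation, all upper-case ASCII letters
+    sort before all lower-case ones, because their character codes are lower.
+<programlisting>
+SELECT c
+FROM (VALUES ('a'), ('B'), ('b'), ('A')) AS x(c)
+ORDER BY c COLLATE "C";
+ c
+---
+ A
+ B
+ a
+ b
+</programlisting>
+   </para>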
+
+   <note>
+    <para>
+     The <literal>C</literal> and <literal>POSIX</literal> locales may behave
+     differently depending on the database encoding.
+    </para>
+   </note>
+
+   <para>
+    Additionally, two SQL standard collation names are available:
+
+    <variablelist>
+     <varlistentry>
+      <term><literal>unicode</literal></term>
+      <listitem>
+       <para>
+        This collation sorts using the Unicode Collation Algorithm with the
+        Default Unicode Collation Element Table.  It is available in all
+        encodings.  ICU support is required to use this collation.  (This
+        collation has the same behavior as the ICU root locale; see <xref
+        linkend="collation-managing-predefined-icu-und-x-icu"/>.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><literal>ucs_basic</literal></term>
+      <listitem>
+       <para>
+        This collation sorts by Unicode code point.  It is only available for
+        encoding <literal>UTF8</literal>.  (This collation has the same
+        behavior as the libc locale specification <literal>C</literal> in
+        <literal>UTF8</literal> encoding.)
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
+  </sect3>
+
+  <sect3 id="collation-managing-predefined">
+   <title>Predefined Collations</title>
+
+   <para>
+    If the operating system provides support for using multiple locales
+    within a single program (<function>newlocale</function> and related functions),
+    or if support for ICU is configured,
+    then when a database cluster is initialized, <command>initdb</command>
+    populates the system catalog <literal>pg_collation</literal> with
+    collations based on all the locales it finds in the operating
+    system at the time.
+   </para>
+
+   <para>
+    To inspect the currently available collations, use the query <literal>SELECT
+    * FROM pg_collation</literal>, or the command <command>\dOS+</command>
+    in <application>psql</application>.
+   </para>
+
+  <sect4 id="collation-managing-predefined-libc">
+   <title>libc Collations</title>
+
+   <para>
+    For example, the operating system might
+    provide a locale named <literal>de_DE.utf8</literal>.
+    <command>initdb</command> would then create a collation named
+    <literal>de_DE.utf8</literal> for encoding <literal>UTF8</literal>
+    that has both <symbol>LC_COLLATE</symbol> and
+    <symbol>LC_CTYPE</symbol> set to <literal>de_DE.utf8</literal>.
+    It will also create a collation with the <literal>.utf8</literal>
+    tag stripped off the name.  So you could also use the collation
+    under the name <literal>de_DE</literal>, which is less cumbersome
+    to write and makes the name less encoding-dependent.  Note that,
+    nevertheless, the initial set of collation names is
+    platform-dependent.
+   </para>
+
+   <para>
+    The default set of collations provided by <literal>libc</literal> maps
+    directly to the locales installed in the operating system, which can be
+    listed using the command <literal>locale -a</literal>.  In case
+    a <literal>libc</literal> collation is needed that has different values
+    for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or if new
+    locales are installed in the operating system after the database system
+    was initialized, then a new collation may be created using
+    the <xref linkend="sql-createcollation"/> command.
+    New operating system locales can also be imported en masse using
+    the <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link> function.
+   </para>
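+
+   <para>
+    The two cases above might look as follows.  This is only a sketch: the
+    collation name <literal>german_sort_english_ctype</literal> is made up,
+    and the locales <literal>de_DE.utf8</literal> and
+    <literal>en_US.utf8</literal> are assumed to be installed in the
+    operating system and to match the database encoding.
+<programlisting>
+-- a libc collation whose LC_COLLATE and LC_CTYPE differ
+CREATE COLLATION german_sort_english_ctype
+    (provider = libc, lc_collate = 'de_DE.utf8', lc_ctype = 'en_US.utf8');
+
+-- import locales added to the operating system after initdb;
+-- the result is the number of new collation objects created
+SELECT pg_import_system_collations('pg_catalog');
+</programlisting>
+   </para>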
+
+   <para>
+    Within any particular database, only collations that use that
+    database's encoding are of interest.  Other entries in
+    <literal>pg_collation</literal> are ignored.  Thus, a stripped collation
+    name such as <literal>de_DE</literal> can be considered unique
+    within a given database even though it would not be unique globally.
+    Use of the stripped collation names is recommended, since it will
+    make one fewer thing you need to change if you decide to change to
+    another database encoding.  Note however that the <literal>default</literal>,
+    <literal>C</literal>, and <literal>POSIX</literal> collations can be used regardless of
+    the database encoding.
+   </para>
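+
+   <para>
+    One way to list only the entries relevant to the current database is to
+    filter on the encoding.  The sketch below assumes the
+    <literal>collencoding</literal> column of <literal>pg_collation</literal>,
+    where <literal>-1</literal> marks a collation that works with any
+    encoding:
+<programlisting>
+SELECT collname, collencoding
+FROM pg_collation
+WHERE collencoding IN (-1, (SELECT encoding
+                            FROM pg_database
+                            WHERE datname = current_database()))
+ORDER BY collname;
+</programlisting>
+   </para>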
+
+   <para>
+    <productname>PostgreSQL</productname> considers distinct collation
+    objects to be incompatible even when they have identical properties.
+    Thus for example,
+<programlisting>
+SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;
+</programlisting>
+    will draw an error even though the <literal>C</literal> and <literal>POSIX</literal>
+    collations have identical behaviors.  Mixing stripped and non-stripped
+    collation names is therefore not recommended.
+   </para>
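+
+   <para>
+    A sketch of one way to avoid the error, reusing the
+    <literal>test1</literal> table from the earlier examples: name the same
+    collation object on both sides, or name it on one side only so that it
+    is the single explicit collation in the comparison.
+<programlisting>
+SELECT a COLLATE "C" &lt; b COLLATE "C" FROM test1;
+SELECT a &lt; b COLLATE "C" FROM test1;
+</programlisting>
+   </para>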
+  </sect4>
+
+  <sect4 id="collation-managing-predefined-icu">
+   <title>ICU Collations</title>
+
+   <para>
+    With ICU, it is not sensible to enumerate all possible locale names.  ICU
+    uses a particular naming system for locales, but there are many more ways
+    to name a locale than there are actually distinct locales.
+    <command>initdb</command> uses the ICU APIs to extract a set of distinct
+    locales to populate the initial set of collations.  Collations provided by
+    ICU are created in the SQL environment with names in BCP 47 language tag
+    format, with a <quote>private use</quote>
+    extension <literal>-x-icu</literal> appended, to distinguish them from
+    libc locales.
+   </para>
+
+   <para>
+    Here are some example collations that might be created:
+
+    <variablelist>
+     <varlistentry id="collation-managing-predefined-icu-de-x-icu">
+      <term><literal>de-x-icu</literal></term>
+      <listitem>
+       <para>German collation, default variant</para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="collation-managing-predefined-icu-de-at-x-icu">
+      <term><literal>de-AT-x-icu</literal></term>
+      <listitem>
+       <para>German collation for Austria, default variant</para>
+       <para>
+        (There are also, say, <literal>de-DE-x-icu</literal>
+        or <literal>de-CH-x-icu</literal>, but as of this writing, they are
+        equivalent to <literal>de-x-icu</literal>.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="collation-managing-predefined-icu-und-x-icu">
+      <term><literal>und-x-icu</literal> (for <quote>undefined</quote>)</term>
+      <listitem>
+       <para>
+        ICU <quote>root</quote> collation.  Use this to get a reasonable
+        language-agnostic sort order.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+   </para>
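+
+   <para>
+    Because these names contain hyphens, they must be double-quoted when
+    used in SQL.  For instance, assuming ICU support is available and
+    reusing the <literal>test1</literal> table from the earlier examples:
+<programlisting>
+SELECT a FROM test1 ORDER BY a COLLATE "de-x-icu";
+</programlisting>
+   </para>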
+
+   <para>
+    Some (less frequently used) encodings are not supported by ICU.  When the
+    database encoding is one of these, ICU collation entries
+    in <literal>pg_collation</literal> are ignored.  Attempting to use one
+    will draw an error along the lines of <quote>collation "de-x-icu" for
+    encoding "WIN874" does not exist</quote>.
+   </para>
+  </sect4>
+  </sect3>
+
+  <sect3 id="collation-create">
+   <title>Creating New Collation Objects</title>
+
+   <para>
+    If the standard and predefined collations are not sufficient, users can
+    create their own collation objects using the SQL
+    command <xref linkend="sql-createcollation"/>.
+   </para>
+
+   <para>
+    The standard and predefined collations are in the
+    schema <literal>pg_catalog</literal>, like all predefined objects.
+    User-defined collations should be created in user schemas.  This also
+    ensures that they are saved by <command>pg_dump</command>.
+   </para>
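+
+   <para>
+    As a minimal sketch, a collation can be created in, and referenced
+    from, a user schema like this.  The schema name <literal>app</literal>
+    is hypothetical, and the locale <literal>de_DE</literal> is assumed to
+    exist in the operating system:
+<programlisting>
+CREATE SCHEMA app;
+CREATE COLLATION app.german (provider = libc, locale = 'de_DE');
+
+-- the schema-qualified name can be used wherever a collation is accepted
+SELECT a FROM test1 ORDER BY a COLLATE app.german;
+</programlisting>
+   </para>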
+
+   <sect4 id="collation-managing-create-libc">
+    <title>libc Collations</title>
+
+    <para>
+     New libc collations can be created like this:
+<programlisting>
+CREATE COLLATION german (provider = libc, locale = 'de_DE');
+</programlisting>
+     The exact values that are acceptable for the <literal>locale</literal>
+     clause in this command depend on the operating system.  On Unix-like
+     systems, the command <literal>locale -a</literal> will show a list.
+    </para>
+
+    <para>
+     Since the predefined libc collations already include all collations
+     defined in the operating system when the database instance is
+     initialized, it is rarely necessary to create new ones manually.
+     Possible reasons include wanting a different naming system (in which case
+     see also <xref linkend="collation-copy"/>) or an operating system
+     upgrade that provides new locale definitions (in which case see
+     also <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link>).
+    </para>
+   </sect4>
+
+   <sect4 id="collation-managing-create-icu">
+    <title>ICU Collations</title>
+
+    <para>
+     New ICU collations can be created like this:
+
+<programlisting>
+CREATE COLLATION german (provider = icu, locale = 'de-DE');
+</programlisting>
+
+     ICU locales are specified as a BCP 47 <link
+     linkend="icu-language-tag">Language Tag</link>, but most libc-style
+     locale names are also accepted.  Where possible, libc-style locale names
+     are transformed into language tags.
+    </para>
+    <para>
+     New ICU collations can customize collation behavior extensively by
+     including collation attributes in the language tag. See <xref
+     linkend="icu-custom-collations"/> for details and examples.
+    </para>
+   </sect4>
+   <sect4 id="collation-copy">
+   <title>Copying Collations</title>
+
+   <para>
+    The command <xref linkend="sql-createcollation"/> can also be used to
+    create a new collation from an existing collation.  This is useful for
+    giving applications operating-system-independent collation names,
+    creating compatibility names, or making an ICU-provided collation
+    available under a more readable name.  For example:
+<programlisting>
+CREATE COLLATION german FROM "de_DE";
+CREATE COLLATION french FROM "fr-x-icu";
+</programlisting>
+   </para>
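+
+   <para>
+    Application objects can then refer to the copied name rather than the
+    platform-specific one.  A sketch, in which the table
+    <literal>customers</literal> is hypothetical:
+<programlisting>
+CREATE TABLE customers (
+    name text COLLATE german   -- the copy created above
+);
+
+-- sorts using the column's collation; no COLLATE clause is needed here
+SELECT name FROM customers ORDER BY name;
+</programlisting>
+   </para>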
+   </sect4>
+   </sect3>
+
+   <sect3 id="collation-nondeterministic">
+    <title>Nondeterministic Collations</title>
+
+    <para>
+     A collation is either <firstterm>deterministic</firstterm> or
+     <firstterm>nondeterministic</firstterm>.  A deterministic collation uses
+     deterministic comparisons, which means that it considers strings to be
+     equal only if they consist of the same byte sequence.  Nondeterministic
+     comparison may determine strings to be equal even if they consist of
+     different bytes.  Typical situations include case-insensitive comparison,
+     accent-insensitive comparison, as well as comparison of strings in
+     different Unicode normal forms.  It is up to the collation provider to
+     actually implement such insensitive comparisons; the deterministic flag
+     only determines whether ties are to be broken using bytewise comparison.
+     See also <ulink url="https://www.unicode.org/reports/tr10">Unicode Technical
+     Standard 10</ulink> for more information on the terminology.
+    </para>
+
+    <para>
+     To create a nondeterministic collation, specify the property
+     <literal>deterministic = false</literal> to <command>CREATE
+     COLLATION</command>, for example:
+<programlisting>
+CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
+</programlisting>
+     This example would use the standard Unicode collation in a
+     nondeterministic way.  In particular, this would allow strings in
+     different normal forms to be compared correctly.  More interesting
+     examples make use of the ICU customization facilities explained above.
+     For example:
+<programlisting>
+CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
+CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
+</programlisting>
+    </para>
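+
+    <para>
+     With the two collations just defined, comparisons along the following
+     lines illustrate the intended behavior.  The sample strings are made
+     up, and the results shown are those implied by the settings described
+     in <xref linkend="icu-collation-settings"/>:
+<programlisting>
+SELECT 'Hello' = 'HELLO' COLLATE case_insensitive;   -- true
+SELECT 'résumé' = 'resume' COLLATE ignore_accents;   -- true
+SELECT 'Resume' = 'resume' COLLATE ignore_accents;   -- false
+</programlisting>
+    </para>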
+
+    <para>
+     All standard and predefined collations are deterministic, and all
+     user-defined collations are deterministic by default.  While
+     nondeterministic collations give a more <quote>correct</quote> behavior,
+     especially when considering the full power of Unicode and its many
+     special cases, they also have some drawbacks.  Foremost, their use leads
+     to a performance penalty.  Note, in particular, that B-tree cannot use
+     deduplication with indexes that use a nondeterministic collation.  Also,
+     certain operations are not possible with nondeterministic collations,
+     such as pattern matching operations.  Therefore, they should be used
+     only in cases where they are specifically wanted.
+    </para>
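+
+    <para>
+     Where pattern matching is not supported for a nondeterministic
+     collation, one possible workaround is to apply a deterministic
+     collation to the expression explicitly.  The following is a sketch
+     only: the table <literal>users</literal> and its column are
+     hypothetical, and <literal>case_insensitive</literal> is the collation
+     from the example above.
+<programlisting>
+CREATE TABLE users (email text COLLATE case_insensitive);
+
+-- may be rejected, since the column's collation is nondeterministic
+SELECT * FROM users WHERE email LIKE 'a%';
+
+-- explicitly use a deterministic collation for the pattern match
+SELECT * FROM users WHERE email COLLATE "C" LIKE 'a%';
+</programlisting>
+     Note that the second query matches case-sensitively, so it is not a
+     drop-in replacement for the first.
+    </para>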
+
+    <tip>
+     <para>
+      To deal with text in different Unicode normalization forms, you can
+      also use the <function>normalize</function> function and the
+      <literal>is normalized</literal> expression to preprocess or check
+      the strings, instead of using nondeterministic collations.  Each
+      approach has different trade-offs.
+     </para>
+    </tip>
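+
+    <para>
+     A brief sketch of the normalization approach, assuming a database with
+     <literal>UTF8</literal> encoding and a deterministic default collation.
+     The first string is <quote>a</quote> followed by a combining acute
+     accent; the second is the precomposed character:
+<programlisting>
+-- convert the decomposed sequence to NFC before comparing
+SELECT normalize(U&amp;'\0061\0301', NFC) = U&amp;'\00E1';   -- true
+
+-- check whether a string is already in NFC
+SELECT U&amp;'\0061\0301' IS NFC NORMALIZED;                 -- false
+</programlisting>
+    </para>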
+   </sect3>
+  </sect2>
+
+  <sect2 id="icu-custom-collations">
+   <title>ICU Custom Collations</title>
+
+   <para>
+    ICU allows extensive control over collation behavior by defining new
+    collations with collation settings as a part of the language tag. These
+    settings can modify the collation order to suit a variety of needs. For
+    instance:
+
+<programlisting>
+-- ignore differences in accents and case
+CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
+
+-- upper case letters sort before lower case.
+CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
+SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
+
+-- treat digits numerically and ignore punctuation
+CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
+SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
+SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
+</programlisting>
+
+    Many of the available options are described in <xref
+    linkend="icu-collation-settings"/>, or see <xref
+    linkend="icu-external-references"/> for more details.
+   </para>
+
+   <sect3 id="icu-collation-comparison-levels">
+    <title>ICU Comparison Levels</title>
+
+    <para>
+     Comparison of two strings (collation) in ICU is determined by a
+     multi-level process, where textual features are grouped into
+     <quote>levels</quote>. Treatment of each level is controlled by the <link
+     linkend="icu-collation-settings-table">collation settings</link>. Higher
+     levels correspond to finer textual features.
+    </para>
+
+    <para>
+     <xref linkend="icu-collation-levels"/> shows which textual feature
+     differences are considered significant when determining equality at the
+     given level. The Unicode character <literal>U+2063</literal> is an
+     invisible separator and, as the table shows, is ignored at all
+     levels of comparison less than <literal>identic</literal>.
+    </para>
+
+     <table id="icu-collation-levels">
+      <title>ICU Collation Levels</title>
+      <tgroup cols="8">
+       <colspec colname="col1" colwidth="1*"/>
+       <colspec colname="col2" colwidth="1.25*"/>
+       <colspec colname="col3" colwidth="1*"/>
+       <colspec colname="col4" colwidth="1*"/>
+       <colspec colname="col5" colwidth="1*"/>
+       <colspec colname="col6" colwidth="1*"/>
+       <colspec colname="col7" colwidth="1*"/>
+       <colspec colname="col8" colwidth="1*"/>
+
+       <thead>
+        <row>
+         <entry>Level</entry>
+         <entry>Description</entry>
+         <entry><literal>'f' = 'f'</literal></entry>
+         <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
+         <entry><literal>'x-y' = 'x_y'</literal></entry>
+         <entry><literal>'g' = 'G'</literal></entry>
+         <entry><literal>'n' = 'ñ'</literal></entry>
+         <entry><literal>'y' = 'z'</literal></entry>
+        </row>
+       </thead>
+
+       <tbody>
+        <row>
+         <entry>level1</entry>
+         <entry>Base Character</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level2</entry>
+         <entry>Accents</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level3</entry>
+         <entry>Case/Variants</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level4</entry>
+         <entry>Punctuation</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>identic</entry>
+         <entry>All</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+    <para>
+     At every level, even with full normalization off, basic normalization is
+     performed. For example, <literal>'á'</literal> may be composed of the
+     code points <literal>U&amp;'\0061\0301'</literal> or the single code
+     point <literal>U&amp;'\00E1'</literal>, and those sequences will be
+     considered equal even at the <literal>identic</literal> level. To treat
+     any difference in code point representation as distinct, use a collation
+     created with <symbol>deterministic</symbol> set to
+     <literal>true</literal>.
+    </para>
+
+    <sect4 id="icu-collation-level-examples">
+     <title>Collation Level Examples</title>
+
+<programlisting>
+CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
+CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level4');
+CREATE COLLATION identic (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-identic');
+
+-- invisible separator ignored at all levels except identic
+SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
+SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
+
+-- punctuation ignored at level3 but not at level4
+SELECT 'x-y' = 'x_y' COLLATE level3; -- true
+SELECT 'x-y' = 'x_y' COLLATE level4; -- false
+</programlisting>
+
+    </sect4>
+   </sect3>
+
+   <sect3 id="icu-collation-settings">
+    <title>Collation Settings for an ICU Locale</title>
+
+    <para>
+     <xref linkend="icu-collation-settings-table"/> shows the available
+     collation settings, which can be used as part of a language tag to
+     customize a collation.
+    </para>
+
+     <table id="icu-collation-settings-table">
+      <title>ICU Collation Settings</title>
+      <tgroup cols="4">
+       <colspec colname="col1" colwidth="1*"/>
+       <colspec colname="col2" colwidth="2*"/>
+       <colspec colname="col3" colwidth="2*"/>
+       <colspec colname="col4" colwidth="5*"/>
+
+       <thead>
+        <row>
+         <entry>Key</entry>
+         <entry>Values</entry>
+         <entry>Default</entry>
+         <entry>Description</entry>
+        </row>
+       </thead>
+
+       <tbody>
+        <row>
+         <entry><literal>co</literal></entry>
+         <entry><literal>emoji</literal>, <literal>phonebk</literal>, <literal>standard</literal>, <replaceable>...</replaceable></entry>
+         <entry><literal>standard</literal></entry>
+         <entry>
+          Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>ka</literal></entry>
+         <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
+         <entry><literal>noignore</literal></entry>
+         <entry>
+          If set to <literal>shifted</literal>, causes some characters
+          (e.g. punctuation or space) to be ignored in comparison. Key
+          <literal>ks</literal> must be set to <literal>level3</literal> or
+          lower to take effect. Set key <literal>kv</literal> to control which
+          character classes are ignored.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kb</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          Backwards comparison for the level 2 differences. For example,
+          locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+          before <literal>'aé'</literal>.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kc</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Separates case into a "level 2.5" that falls between accents and
+           other level 3 features.
+          </para>
+          <para>
+           If set to <literal>true</literal> and <literal>ks</literal> is set
+           to <literal>level1</literal>, will ignore accents but take case
+           into account.
+          </para>
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kf</literal></entry>
+         <entry>
+          <literal>upper</literal>, <literal>lower</literal>,
+          <literal>false</literal>
+         </entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>upper</literal>, upper case sorts before lower
+          case. If set to <literal>lower</literal>, lower case sorts before
+          upper case. If set to <literal>false</literal>, the sort depends on
+          the rules of the locale.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kn</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>true</literal>, numbers within a string are
+          treated as a single numeric value rather than a sequence of
+          digits. For example, <literal>'id-45'</literal> sorts before
+          <literal>'id-123'</literal>.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kk</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Enable full normalization; may affect performance. Basic
+           normalization is performed even when set to
+           <literal>false</literal>. Locales for languages that require full
+           normalization typically enable it by default.
+          </para>
+          <para>
+           Full normalization is important in some cases, such as when
+           multiple accents are applied to a single character. For example,
+           the code point sequences <literal>U&amp;'\0065\0323\0302'</literal>
+           and <literal>U&amp;'\0065\0302\0323'</literal> represent
+           an <literal>e</literal> with circumflex and dot-below accents
+           applied in different orders. With full normalization
+           on, these code point sequences are treated as equal; otherwise they
+           are unequal.
+          </para>
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kr</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>,
+          <literal>digit</literal>, <replaceable>script-id</replaceable>
+         </entry>
+         <entry></entry>
+         <entry>
+          <para>
+           Set to one or more of the valid values, or any BCP 47
+           <replaceable>script-id</replaceable>, e.g. <literal>latn</literal>
+           ("Latin") or <literal>grek</literal> ("Greek"). Multiple values are
+           separated by "<literal>-</literal>".
+          </para>
+          <para>
+           Redefines the ordering of classes of characters; those characters
+           belonging to a class earlier in the list sort before characters
+           belonging to a class later in the list. For instance, the value
+           <literal>digit-currency-space</literal> (as part of a language tag
+           like <literal>und-u-kr-digit-currency-space</literal>) sorts
+           punctuation before digits and spaces.
+          </para>
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>ks</literal></entry>
+         <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
+         <entry><literal>level3</literal></entry>
+         <entry>
+          Sensitivity (or "strength") when determining equality, with
+          <literal>level1</literal> the least sensitive to differences and
+          <literal>identic</literal> the most sensitive to differences. See
+          <xref linkend="icu-collation-levels"/> for details.
+         </entry>
+        </row>
+
+        <row>
+         <entry><literal>kv</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>
+         </entry>
+         <entry><literal>punct</literal></entry>
+         <entry>
+          Classes of characters ignored during comparison at level 3. Setting
+          to a later value includes earlier values;
+          e.g. <literal>symbol</literal> also includes
+          <literal>punct</literal> and <literal>space</literal> in the
+          characters to be ignored. Key <literal>ka</literal> must be set to
+          <literal>shifted</literal> and key <literal>ks</literal> must be set
+          to <literal>level3</literal> or lower to take effect.
+         </entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+    <para>
+     Defaults may depend on locale. The above table is not meant to be
+     complete. See <xref linkend="icu-external-references"/> for additional
+     options and details.
+    </para>
+
+    <note>
+     <para>
+      For many collation settings, you must create the collation with
+      <option>deterministic</option> set to <literal>false</literal> for the
+      setting to have the desired effect (see <xref
+      linkend="collation-nondeterministic"/>). Additionally, some settings
+      only take effect when the key <literal>ka</literal> is set to
+      <literal>shifted</literal> (see <xref
+      linkend="icu-collation-settings-table"/>).
+     </para>
+    </note>
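+
+    <para>
+     As a sketch of how <literal>ka</literal>, <literal>kv</literal>, and
+     <option>deterministic</option> interact, the following collation (the
+     name <literal>ignore_space</literal> is made up) limits the ignored
+     character classes to white space.  Going by the table above, the first
+     comparison would be expected to yield true and the second false, but
+     this is worth verifying against the ICU version in use:
+<programlisting>
+CREATE COLLATION ignore_space (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kv-space');
+
+SELECT 'a b' = 'ab' COLLATE ignore_space;  -- differs only by a space
+SELECT 'a-b' = 'ab' COLLATE ignore_space;  -- differs by punctuation
+</programlisting>
+    </para>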
+   </sect3>
+
+   <sect3 id="icu-locale-examples">
+    <title>Collation Settings Examples</title>
+
+     <variablelist>
+      <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
+       <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
+       <listitem>
+        <para>German collation with phone book collation type</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
+       <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
+       <listitem>
+        <para>
+         Root collation with Emoji collation type, per Unicode Technical Standard #51
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
+       <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
+       <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
+       <listitem>
+        <para>
+         Sort upper-case letters before lower-case letters.  (The default is
+         lower-case letters first.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
+       <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Combines both of the above options.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+   </sect3>
+
+   <sect3 id="icu-tailoring-rules">
+    <title>ICU Tailoring Rules</title>
+
+    <para>
+     If the options provided by the collation settings shown above are not
+     sufficient, the order of collation elements can be changed with tailoring
+     rules, whose syntax is detailed at <ulink
+     url="https://unicode-org.github.io/icu/userguide/collation/customization/"></ulink>.
+    </para>
+
+    <para>
+     This small example creates a collation based on the root locale with a
+     tailoring rule:
+<programlisting>
+<![CDATA[CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W');]]>
+</programlisting>
+     With this rule, the letter <quote>W</quote> is sorted after
+     <quote>V</quote>, but is treated as a secondary difference similar to an
+     accent.  Rules like this are contained in the locale definitions of some
+     languages.  (Of course, if a locale definition already contains the
+     desired rules, then they don't need to be specified again explicitly.)
+    </para>
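+
+    <para>
+     As a minimal sketch of the effect, assuming the
+     <literal>custom</literal> collation above has been created (the sample
+     strings are purely illustrative), the following queries compare the
+     tailored order with the untailored root collation
+     <literal>"und-x-icu"</literal>:
+<programlisting>
+SELECT c
+FROM (VALUES ('Va'), ('Wb'), ('Vc')) AS x(c)
+ORDER BY c COLLATE custom;
+
+SELECT c
+FROM (VALUES ('Va'), ('Wb'), ('Vc')) AS x(c)
+ORDER BY c COLLATE "und-x-icu";
+</programlisting>
+     Because the rule makes <quote>W</quote> differ from <quote>V</quote>
+     only at the secondary level, the first query should be decided by the
+     second letter of each string, whereas the second query should sort both
+     <quote>V</quote> strings ahead of <quote>Wb</quote>.
+    </para>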
+
+    <para>
+     Here is a more complex example.  The following statement sets up a
+     collation named <literal>ebcdic</literal> with rules to sort US-ASCII
+     characters in the order of the EBCDIC encoding.
+
+<programlisting>
+<![CDATA[CREATE COLLATION ebcdic (provider = icu, locale = 'und',
+rules = $$
+& ' ' < '.' < '<' < '(' < '+' < \|
+< '&' < '!' < '$' < '*' < ')' < ';'
+< '-' < '/' < ',' < '%' < '_' < '>' < '?'
+< '`' < ':' < '#' < '@' < \' < '=' < '"'
+<*a-r < '~' <*s-z < '^' < '[' < ']'
+< '{' <*A-I < '}' <*J-R < '\' <*S-Z <*0-9
+$$);]]>
+
+SELECT c
+FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c)
+ORDER BY c COLLATE ebcdic;
+ c
+---
+ !
+ a
+ b
+ ^
+ A
+ B
+ 1
+ 2
+</programlisting>
+    </para>
+   </sect3>
+
+   <sect3 id="icu-external-references">
+    <title>External References for ICU</title>
+
+    <para>
+     This section (<xref linkend="icu-custom-collations"/>) is only a brief
+     overview of ICU behavior and language tags. Refer to the following
+     documents for technical details, additional options, and new behavior:
+    </para>
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode Technical Standard #35</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR repository</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/collation/"></ulink>
+      </para>
+     </listitem>
+    </itemizedlist>
+   </sect3>
+  </sect2>
+ </sect1>
+
+ <sect1 id="multibyte">
+  <title>Character Set Support</title>
+
+  <indexterm zone="multibyte"><primary>character set</primary></indexterm>
+
+  <para>
+   The character set support in <productname>PostgreSQL</productname>
+   allows you to store text in a variety of character sets (also called
+   encodings), including
+   single-byte character sets such as the ISO 8859 series and
+   multiple-byte character sets such as <acronym>EUC</acronym> (Extended Unix
+   Code), UTF-8, and Mule internal code.  All supported character sets
+   can be used transparently by clients, but a few are not supported
+   for use within the server (that is, as a server-side encoding).
+   The default character set is selected while
+   initializing your <productname>PostgreSQL</productname> database
+   cluster using <command>initdb</command>.  It can be overridden when you
+   create a database, so you can have multiple
+   databases each with a different character set.
+  </para>
+
+  <para>
+   An important restriction, however, is that each database's character set
+   must be compatible with the database's <envar>LC_CTYPE</envar> (character
+   classification) and <envar>LC_COLLATE</envar> (string sort order) locale
+   settings. For <literal>C</literal> or
+   <literal>POSIX</literal> locale, any character set is allowed, but for other
+   libc-provided locales there is only one character set that will work
+   correctly.
+   (On Windows, however, UTF-8 encoding can be used with any locale.)
+   If you have ICU support configured, ICU-provided locales can be used
+   with most but not all server-side encodings.
+  </para>
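+
+  <para>
+   As a sketch of the first case, a database that uses the
+   <literal>C</literal> locale can be given any server-side encoding.  The
+   database name <literal>latin1_c</literal> is hypothetical:
+<programlisting>
+CREATE DATABASE latin1_c WITH ENCODING 'LATIN1' LC_COLLATE='C' LC_CTYPE='C' TEMPLATE=template0;
+</programlisting>
+  </para>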
+
+   <sect2 id="multibyte-charset-supported">
+    <title>Supported Character Sets</title>
+
+    <para>
+     <xref linkend="charset-table"/> shows the character sets available
+     for use in <productname>PostgreSQL</productname>.
+    </para>
+
+     <table id="charset-table">
+      <title><productname>PostgreSQL</productname> Character Sets</title>
+      <tgroup cols="7">
+       <colspec colname="col1" colwidth="3*"/>
+       <colspec colname="col2" colwidth="2*"/>
+       <colspec colname="col3" colwidth="2*"/>
+       <colspec colname="col4" colwidth="1.25*"/>
+       <colspec colname="col5" colwidth="1*"/>
+       <colspec colname="col6" colwidth="1*"/>
+       <colspec colname="col7" colwidth="2*"/>
+       <thead>
+        <row>
+         <entry>Name</entry>
+         <entry>Description</entry>
+         <entry>Language</entry>
+         <entry>Server?</entry>
+         <entry>ICU?</entry>
+         <!--
+          The Bytes/Char field is populated by looking at the values returned
+          by pg_wchar_table.mblen function for each encoding.
+         -->
+         <entry>Bytes/&zwsp;Char</entry>
+         <entry>Aliases</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry><literal>BIG5</literal></entry>
+         <entry>Big Five</entry>
+         <entry>Traditional Chinese</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;2</entry>
+         <entry><literal>WIN950</literal>, <literal>Windows950</literal></entry>
+        </row>
+        <row>
+         <entry><literal>EUC_CN</literal></entry>
+         <entry>Extended UNIX Code-CN</entry>
+         <entry>Simplified Chinese</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1&ndash;3</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>EUC_JP</literal></entry>
+         <entry>Extended UNIX Code-JP</entry>
+         <entry>Japanese</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1&ndash;3</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>EUC_JIS_2004</literal></entry>
+         <entry>Extended UNIX Code-JP, JIS X 0213</entry>
+         <entry>Japanese</entry>
+         <entry>Yes</entry>
+         <entry>No</entry>
+         <entry>1&ndash;3</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>EUC_KR</literal></entry>
+         <entry>Extended UNIX Code-KR</entry>
+         <entry>Korean</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1&ndash;3</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>EUC_TW</literal></entry>
+         <entry>Extended UNIX Code-TW</entry>
+         <entry>Traditional Chinese, Taiwanese</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1&ndash;4</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>GB18030</literal></entry>
+         <entry>National Standard</entry>
+         <entry>Chinese</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;4</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>GBK</literal></entry>
+         <entry>Extended National Standard</entry>
+         <entry>Simplified Chinese</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;2</entry>
+         <entry><literal>WIN936</literal>, <literal>Windows936</literal></entry>
+        </row>
+        <row>
+         <entry><literal>ISO_8859_5</literal></entry>
+         <entry>ISO 8859-5, <acronym>ECMA</acronym> 113</entry>
+         <entry>Latin/Cyrillic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>ISO_8859_6</literal></entry>
+         <entry>ISO 8859-6, <acronym>ECMA</acronym> 114</entry>
+         <entry>Latin/Arabic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>ISO_8859_7</literal></entry>
+         <entry>ISO 8859-7, <acronym>ECMA</acronym> 118</entry>
+         <entry>Latin/Greek</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>ISO_8859_8</literal></entry>
+         <entry>ISO 8859-8, <acronym>ECMA</acronym> 121</entry>
+         <entry>Latin/Hebrew</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>JOHAB</literal></entry>
+         <entry><acronym>JOHAB</acronym></entry>
+         <entry>Korean (Hangul)</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;3</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>KOI8R</literal></entry>
+         <entry><acronym>KOI</acronym>8-R</entry>
+         <entry>Cyrillic (Russian)</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>KOI8</literal></entry>
+        </row>
+        <row>
+         <entry><literal>KOI8U</literal></entry>
+         <entry><acronym>KOI</acronym>8-U</entry>
+         <entry>Cyrillic (Ukrainian)</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN1</literal></entry>
+         <entry>ISO 8859-1, <acronym>ECMA</acronym> 94</entry>
+         <entry>Western European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO88591</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN2</literal></entry>
+         <entry>ISO 8859-2, <acronym>ECMA</acronym> 94</entry>
+         <entry>Central European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO88592</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN3</literal></entry>
+         <entry>ISO 8859-3, <acronym>ECMA</acronym> 94</entry>
+         <entry>South European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO88593</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN4</literal></entry>
+         <entry>ISO 8859-4, <acronym>ECMA</acronym> 94</entry>
+         <entry>North European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO88594</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN5</literal></entry>
+         <entry>ISO 8859-9, <acronym>ECMA</acronym> 128</entry>
+         <entry>Turkish</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO88599</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN6</literal></entry>
+         <entry>ISO 8859-10, <acronym>ECMA</acronym> 144</entry>
+         <entry>Nordic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO885910</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN7</literal></entry>
+         <entry>ISO 8859-13</entry>
+         <entry>Baltic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO885913</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN8</literal></entry>
+         <entry>ISO 8859-14</entry>
+         <entry>Celtic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO885914</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN9</literal></entry>
+         <entry>ISO 8859-15</entry>
+         <entry>LATIN1 with Euro and accents</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ISO885915</literal></entry>
+        </row>
+        <row>
+         <entry><literal>LATIN10</literal></entry>
+         <entry>ISO 8859-16, <acronym>ASRO</acronym> SR 14111</entry>
+         <entry>Romanian</entry>
+         <entry>Yes</entry>
+         <entry>No</entry>
+         <entry>1</entry>
+         <entry><literal>ISO885916</literal></entry>
+        </row>
+        <row>
+         <entry><literal>MULE_INTERNAL</literal></entry>
+         <entry>Mule internal code</entry>
+         <entry>Multilingual Emacs</entry>
+         <entry>Yes</entry>
+         <entry>No</entry>
+         <entry>1&ndash;4</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>SJIS</literal></entry>
+         <entry>Shift JIS</entry>
+         <entry>Japanese</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;2</entry>
+         <entry><literal>Mskanji</literal>, <literal>ShiftJIS</literal>, <literal>WIN932</literal>, <literal>Windows932</literal></entry>
+        </row>
+        <row>
+         <entry><literal>SHIFT_JIS_2004</literal></entry>
+         <entry>Shift JIS, JIS X 0213</entry>
+         <entry>Japanese</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;2</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>SQL_ASCII</literal></entry>
+         <entry>unspecified (see text)</entry>
+         <entry><emphasis>any</emphasis></entry>
+         <entry>Yes</entry>
+         <entry>No</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>UHC</literal></entry>
+         <entry>Unified Hangul Code</entry>
+         <entry>Korean</entry>
+         <entry>No</entry>
+         <entry>No</entry>
+         <entry>1&ndash;2</entry>
+         <entry><literal>WIN949</literal>, <literal>Windows949</literal></entry>
+        </row>
+        <row>
+         <entry><literal>UTF8</literal></entry>
+         <entry>Unicode, 8-bit</entry>
+         <entry><emphasis>all</emphasis></entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1&ndash;4</entry>
+         <entry><literal>Unicode</literal></entry>
+        </row>
+        <row>
+         <entry><literal>WIN866</literal></entry>
+         <entry>Windows CP866</entry>
+         <entry>Cyrillic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ALT</literal></entry>
+        </row>
+        <row>
+         <entry><literal>WIN874</literal></entry>
+         <entry>Windows CP874</entry>
+         <entry>Thai</entry>
+         <entry>Yes</entry>
+         <entry>No</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1250</literal></entry>
+         <entry>Windows CP1250</entry>
+         <entry>Central European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1251</literal></entry>
+         <entry>Windows CP1251</entry>
+         <entry>Cyrillic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>WIN</literal></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1252</literal></entry>
+         <entry>Windows CP1252</entry>
+         <entry>Western European</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1253</literal></entry>
+         <entry>Windows CP1253</entry>
+         <entry>Greek</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1254</literal></entry>
+         <entry>Windows CP1254</entry>
+         <entry>Turkish</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1255</literal></entry>
+         <entry>Windows CP1255</entry>
+         <entry>Hebrew</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1256</literal></entry>
+         <entry>Windows CP1256</entry>
+         <entry>Arabic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1257</literal></entry>
+         <entry>Windows CP1257</entry>
+         <entry>Baltic</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry></entry>
+        </row>
+        <row>
+         <entry><literal>WIN1258</literal></entry>
+         <entry>Windows CP1258</entry>
+         <entry>Vietnamese</entry>
+         <entry>Yes</entry>
+         <entry>Yes</entry>
+         <entry>1</entry>
+         <entry><literal>ABC</literal>, <literal>TCVN</literal>, <literal>TCVN5712</literal>, <literal>VSCII</literal></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+     <para>
+      Not all client <acronym>API</acronym>s support all the listed character sets. For example, the
+      <productname>PostgreSQL</productname>
+      JDBC driver does not support <literal>MULE_INTERNAL</literal>, <literal>LATIN6</literal>,
+      <literal>LATIN8</literal>, and <literal>LATIN10</literal>.
+     </para>
+
+     <para>
+      The <literal>SQL_ASCII</literal> setting behaves considerably differently
+      from the other settings.  When the server character set is
+      <literal>SQL_ASCII</literal>, the server interprets byte values 0&ndash;127
+      according to the ASCII standard, while byte values 128&ndash;255 are taken
+      as uninterpreted characters.  No encoding conversion will be done when
+      the setting is <literal>SQL_ASCII</literal>.  Thus, this setting is not so
+      much a declaration that a specific encoding is in use, as a declaration
+      of ignorance about the encoding.  In most cases, if you are
+      working with any non-ASCII data, it is unwise to use the
+      <literal>SQL_ASCII</literal> setting because
+      <productname>PostgreSQL</productname> will be unable to help you by
+      converting or validating non-ASCII characters.
+     </para>
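+
+     <para>
+      To check which server character set the current database uses, and
+      therefore whether the caveats above apply, the read-only
+      <varname>server_encoding</varname> parameter can be inspected:
+<programlisting>
+SHOW server_encoding;
+</programlisting>
+     </para>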
+    </sect2>
+
+   <sect2 id="multibyte-setting">
+    <title>Setting the Character Set</title>
+
+    <para>
+     <command>initdb</command> defines the default character set (encoding)
+     for a <productname>PostgreSQL</productname> cluster. For example,
+
+<screen>
+initdb -E EUC_JP
+</screen>
+
+     sets the default character set to
+     <literal>EUC_JP</literal> (Extended Unix Code for Japanese).  You
+     can use <option>--encoding</option> instead of
+     <option>-E</option> if you prefer longer option strings.
+     If no <option>-E</option> or <option>--encoding</option> option is
+     given, <command>initdb</command> attempts to determine the appropriate
+     encoding to use based on the specified or default locale.
+    </para>
+
+    <para>
+     You can specify a non-default encoding at database creation time,
+     provided that the encoding is compatible with the selected locale:
+
+<screen>
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
+</screen>
+
+     This will create a database named <literal>korean</literal> that
+     uses the character set <literal>EUC_KR</literal> and the locale
+     <literal>ko_KR</literal>.
+     Another way to accomplish this is to use this SQL command:
+
+<programlisting>
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
+</programlisting>
+
+     Notice that the above commands specify copying the <literal>template0</literal>
+     database.  When copying any other database, the encoding and locale
+     settings cannot be changed from those of the source database, because
+     that might result in corrupt data.  For more information see
+     <xref linkend="manage-ag-templatedbs"/>.
+    </para>
+
+    <para>
+     The encoding for a database is stored in the system catalog
+     <literal>pg_database</literal>.  You can see it by using the
+     <command>psql</command> <option>-l</option> option or the
+     <command>\l</command> command.
+
+<screen>
+$ <userinput>psql -l</userinput>
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           |
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
+</screen>
+    </para>
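+
+    <para>
+     The same information can be read from the catalog with SQL.  This is a
+     minimal sketch that uses <function>pg_encoding_to_char</function> to
+     translate the numeric <structfield>encoding</structfield> column into
+     a name:
+<programlisting>
+SELECT datname,
+       pg_encoding_to_char(encoding) AS encoding,
+       datcollate,
+       datctype
+FROM pg_database
+ORDER BY datname;
+</programlisting>
+    </para>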
+
+    <important>
+     <para>
+      On most modern operating systems, <productname>PostgreSQL</productname>
+      can determine which character set is implied by the <envar>LC_CTYPE</envar>
+      setting, and it will enforce that only the matching database encoding is
+      used.  On older systems it is your responsibility to ensure that you use
+      the encoding expected by the locale you have selected.  A mistake in
+      this area is likely to lead to strange behavior of locale-dependent
+      operations such as sorting.
+     </para>
+
+     <para>
+      <productname>PostgreSQL</productname> will allow superusers to create
+      databases with <literal>SQL_ASCII</literal> encoding even when
+      <envar>LC_CTYPE</envar> is not <literal>C</literal> or <literal>POSIX</literal>.  As noted
+      above, <literal>SQL_ASCII</literal> does not enforce that the data stored in
+      the database has any particular encoding, and so this choice poses risks
+      of locale-dependent misbehavior.  Using this combination of settings is
+      deprecated and may someday be forbidden altogether.
+     </para>
+    </important>
+   </sect2>
+
+   <sect2 id="multibyte-automatic-conversion">
+    <title>Automatic Character Set Conversion Between Server and Client</title>
+
+    <para>
+     <productname>PostgreSQL</productname> supports automatic character
+     set conversion between server and client for many combinations of
+     character sets (<xref linkend="multibyte-conversions-supported"/>
+     shows which ones).
+    </para>
+
+    <para>
+     To enable automatic character set conversion, you have to
+     tell <productname>PostgreSQL</productname> the character set
+     (encoding) you would like to use in the client. There are several
+     ways to accomplish this:
+
+     <itemizedlist>
+      <listitem>
+       <para>
+        Using the <command>\encoding</command> command in
+        <application>psql</application>.
+        <command>\encoding</command> allows you to change client
+        encoding on the fly. For
+        example, to change the encoding to <literal>SJIS</literal>, type:
+
+<programlisting>
+\encoding SJIS
+</programlisting>
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Using the <application>libpq</application> functions that control
+        the client encoding (see <xref linkend="libpq-control"/>).
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Using <command>SET client_encoding TO</command>.
+        The client encoding can be set with this SQL command:
+
+<programlisting>
+SET CLIENT_ENCODING TO '<replaceable>value</replaceable>';
+</programlisting>
+
+        You can also use the standard SQL syntax <literal>SET NAMES</literal>
+        for this purpose:
+
+<programlisting>
+SET NAMES '<replaceable>value</replaceable>';
+</programlisting>
+
+        To query the current client encoding:
+
+<programlisting>
+SHOW client_encoding;
+</programlisting>
+
+        To return to the default encoding:
+
+<programlisting>
+RESET client_encoding;
+</programlisting>
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Using <envar>PGCLIENTENCODING</envar>. If the environment variable
+        <envar>PGCLIENTENCODING</envar> is defined in the client's
+        environment, that client encoding is automatically selected
+        when a connection to the server is made.  (This can
+        subsequently be overridden using any of the other methods
+        mentioned above.)
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+       Using the configuration variable <xref
+       linkend="guc-client-encoding"/>. If the
+       <varname>client_encoding</varname> variable is set, that client
+       encoding is automatically selected when a connection to the
+       server is made.  (This can subsequently be overridden using any
+       of the other methods mentioned above.)
+       </para>
+      </listitem>
+
+     </itemizedlist>
+    </para>
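+
+    <para>
+     As a sketch of the last method, the <varname>client_encoding</varname>
+     variable can be given a per-database or per-role default, which then
+     applies to new sessions unless the client requests a different encoding
+     when connecting.  The role name <literal>report_user</literal> is
+     hypothetical, and <literal>korean</literal> is the example database
+     created earlier:
+<programlisting>
+ALTER DATABASE korean SET client_encoding TO 'UTF8';
+ALTER ROLE report_user SET client_encoding TO 'LATIN1';
+</programlisting>
+    </para>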
+
+    <para>
+     If the conversion of a particular character is not possible
+     &mdash; suppose you chose <literal>EUC_JP</literal> for the
+     server and <literal>LATIN1</literal> for the client, and some
+     Japanese characters are returned that do not have a representation in
+     <literal>LATIN1</literal> &mdash; an error is reported.
+    </para>
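+
+    <para>
+     The same kind of failure can be reproduced without changing the client
+     encoding by calling the <function>convert_to</function> function
+     directly.  This sketch assumes a database whose server encoding is
+     <literal>UTF8</literal>; the two characters are arbitrary Japanese
+     examples written as Unicode escapes:
+<programlisting>
+<![CDATA[-- These characters have no LATIN1 representation, so this
+-- statement is expected to fail with a conversion error.
+SELECT convert_to(U&'\65E5\672C', 'LATIN1');]]>
+</programlisting>
+    </para>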
+
+    <para>
+     If the client character set is defined as <literal>SQL_ASCII</literal>,
+     encoding conversion is disabled, regardless of the server's character
+     set.  (However, if the server's character set is
+     not <literal>SQL_ASCII</literal>, the server will still check that
+     incoming data is valid for that encoding; so the net effect is as
+     though the client character set were the same as the server's.)
+     Just as for the server, use of <literal>SQL_ASCII</literal> is unwise
+     unless you are working with all-ASCII data.
+    </para>
+   </sect2>
+
+   <sect2 id="multibyte-conversions-supported">
+    <title>Available Character Set Conversions</title>
+
+    <para>
+     <productname>PostgreSQL</productname> allows conversion between any
+     two character sets for which a conversion function is listed in the
+     <link linkend="catalog-pg-conversion"><structname>pg_conversion</structname></link>
+     system catalog.  <productname>PostgreSQL</productname> comes with
+     some predefined conversions, as summarized in
+     <xref linkend="multibyte-translation-table"/> and shown in more
+     detail in <xref linkend="builtin-conversions-table"/>.  You can
+     create a new conversion using the SQL command
+     <xref linkend="sql-createconversion"/>.  (To be used for automatic
+     client/server conversions, a conversion must be marked
+     as <quote>default</quote> for its character set pair.)
+    </para>
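+
+    <para>
+     For instance, a query such as the following lists the conversions
+     available from one source encoding, together with whether each is the
+     default for its pair.  <literal>EUC_JP</literal> is only an example
+     value:
+<programlisting>
+SELECT conname,
+       pg_encoding_to_char(contoencoding) AS destination,
+       condefault
+FROM pg_conversion
+WHERE pg_encoding_to_char(conforencoding) = 'EUC_JP'
+ORDER BY conname;
+</programlisting>
+    </para>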
+
+    <table id="multibyte-translation-table">
+     <title>Built-in Client/Server Character Set Conversions</title>
+     <tgroup cols="2">
+      <colspec colname="col1" colwidth="1*"/>
+      <colspec colname="col2" colwidth="3*"/>
+      <thead>
+       <row>
+        <entry>Server Character Set</entry>
+        <entry>Available Client Character Sets</entry>
+       </row>
+      </thead>
+      <tbody>
+       <row>
+        <entry><literal>BIG5</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>EUC_CN</literal></entry>
+        <entry><emphasis>EUC_CN</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>EUC_JP</literal></entry>
+        <entry><emphasis>EUC_JP</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>SJIS</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>EUC_JIS_2004</literal></entry>
+        <entry><emphasis>EUC_JIS_2004</emphasis>,
+        <literal>SHIFT_JIS_2004</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>EUC_KR</literal></entry>
+        <entry><emphasis>EUC_KR</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>EUC_TW</literal></entry>
+        <entry><emphasis>EUC_TW</emphasis>,
+        <literal>BIG5</literal>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>GB18030</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>GBK</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><emphasis>ISO_8859_5</emphasis>,
+        <literal>KOI8R</literal>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>,
+        <literal>WIN866</literal>,
+        <literal>WIN1251</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>ISO_8859_6</literal></entry>
+        <entry><emphasis>ISO_8859_6</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>ISO_8859_7</literal></entry>
+        <entry><emphasis>ISO_8859_7</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>ISO_8859_8</literal></entry>
+        <entry><emphasis>ISO_8859_8</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>JOHAB</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><emphasis>KOI8R</emphasis>,
+        <literal>ISO_8859_5</literal>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>,
+        <literal>WIN866</literal>,
+        <literal>WIN1251</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>KOI8U</literal></entry>
+        <entry><emphasis>KOI8U</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN1</literal></entry>
+        <entry><emphasis>LATIN1</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN2</literal></entry>
+        <entry><emphasis>LATIN2</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>,
+        <literal>WIN1250</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN3</literal></entry>
+        <entry><emphasis>LATIN3</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN4</literal></entry>
+        <entry><emphasis>LATIN4</emphasis>,
+        <literal>MULE_INTERNAL</literal>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN5</literal></entry>
+        <entry><emphasis>LATIN5</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN6</literal></entry>
+        <entry><emphasis>LATIN6</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN7</literal></entry>
+        <entry><emphasis>LATIN7</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN8</literal></entry>
+        <entry><emphasis>LATIN8</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN9</literal></entry>
+        <entry><emphasis>LATIN9</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>LATIN10</literal></entry>
+        <entry><emphasis>LATIN10</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><emphasis>MULE_INTERNAL</emphasis>,
+         <literal>BIG5</literal>,
+         <literal>EUC_CN</literal>,
+         <literal>EUC_JP</literal>,
+         <literal>EUC_KR</literal>,
+         <literal>EUC_TW</literal>,
+         <literal>ISO_8859_5</literal>,
+         <literal>KOI8R</literal>,
+         <literal>LATIN1</literal> to <literal>LATIN4</literal>,
+         <literal>SJIS</literal>,
+         <literal>WIN866</literal>,
+         <literal>WIN1250</literal>,
+         <literal>WIN1251</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>SJIS</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>SHIFT_JIS_2004</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>SQL_ASCII</literal></entry>
+        <entry><emphasis>any (no conversion will be performed)</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>UHC</literal></entry>
+        <entry><emphasis>not supported as a server encoding</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>UTF8</literal></entry>
+        <entry><emphasis>all supported encodings</emphasis>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN866</literal></entry>
+        <entry><emphasis>WIN866</emphasis>,
+         <literal>ISO_8859_5</literal>,
+         <literal>KOI8R</literal>,
+         <literal>MULE_INTERNAL</literal>,
+         <literal>UTF8</literal>,
+         <literal>WIN1251</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN874</literal></entry>
+        <entry><emphasis>WIN874</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1250</literal></entry>
+        <entry><emphasis>WIN1250</emphasis>,
+         <literal>LATIN2</literal>,
+         <literal>MULE_INTERNAL</literal>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><emphasis>WIN1251</emphasis>,
+         <literal>ISO_8859_5</literal>,
+         <literal>KOI8R</literal>,
+         <literal>MULE_INTERNAL</literal>,
+         <literal>UTF8</literal>,
+         <literal>WIN866</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1252</literal></entry>
+        <entry><emphasis>WIN1252</emphasis>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1253</literal></entry>
+        <entry><emphasis>WIN1253</emphasis>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1254</literal></entry>
+        <entry><emphasis>WIN1254</emphasis>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1255</literal></entry>
+        <entry><emphasis>WIN1255</emphasis>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1256</literal></entry>
+        <entry><emphasis>WIN1256</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1257</literal></entry>
+        <entry><emphasis>WIN1257</emphasis>,
+         <literal>UTF8</literal>
+        </entry>
+       </row>
+       <row>
+        <entry><literal>WIN1258</literal></entry>
+        <entry><emphasis>WIN1258</emphasis>,
+        <literal>UTF8</literal>
+        </entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+
+    <table id="builtin-conversions-table">
+     <title>All Built-in Character Set Conversions</title>
+     <tgroup cols="3">
+      <colspec colname="col1" colwidth="2*"/>
+      <colspec colname="col2" colwidth="1*"/>
+      <colspec colname="col3" colwidth="1*"/>
+      <thead>
+       <row>
+        <entry>Conversion Name
+         <footnote>
+          <para>
+           The conversion names follow a standard naming scheme: the
+           official name of the source encoding with all
+           non-alphanumeric characters replaced by underscores, followed
+           by <literal>_to_</literal>, followed by the similarly processed
+           destination encoding name.  Therefore, these names sometimes
+           deviate from the customary encoding names shown in
+           <xref linkend="charset-table"/>.
+          </para>
+         </footnote>
+        </entry>
+        <entry>Source Encoding</entry>
+        <entry>Destination Encoding</entry>
+       </row>
+      </thead>
+
+      <tbody>
+       <row>
+        <entry><literal>big5_to_euc_tw</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+       </row>
+       <row>
+        <entry><literal>big5_to_mic</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>big5_to_utf8</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_cn_to_mic</literal></entry>
+        <entry><literal>EUC_CN</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_cn_to_utf8</literal></entry>
+        <entry><literal>EUC_CN</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_jp_to_mic</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_jp_to_sjis</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_jp_to_utf8</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_kr_to_mic</literal></entry>
+        <entry><literal>EUC_KR</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_kr_to_utf8</literal></entry>
+        <entry><literal>EUC_KR</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_tw_to_big5</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_tw_to_mic</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_tw_to_utf8</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>gb18030_to_utf8</literal></entry>
+        <entry><literal>GB18030</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>gbk_to_utf8</literal></entry>
+        <entry><literal>GBK</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_10_to_utf8</literal></entry>
+        <entry><literal>LATIN6</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_13_to_utf8</literal></entry>
+        <entry><literal>LATIN7</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_14_to_utf8</literal></entry>
+        <entry><literal>LATIN8</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_15_to_utf8</literal></entry>
+        <entry><literal>LATIN9</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_16_to_utf8</literal></entry>
+        <entry><literal>LATIN10</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_1_to_mic</literal></entry>
+        <entry><literal>LATIN1</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_1_to_utf8</literal></entry>
+        <entry><literal>LATIN1</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_2_to_mic</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_2_to_utf8</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_2_to_windows_1250</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_3_to_mic</literal></entry>
+        <entry><literal>LATIN3</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_3_to_utf8</literal></entry>
+        <entry><literal>LATIN3</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_4_to_mic</literal></entry>
+        <entry><literal>LATIN4</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_4_to_utf8</literal></entry>
+        <entry><literal>LATIN4</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_5_to_koi8_r</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_5_to_mic</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_5_to_utf8</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_5_to_windows_1251</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_5_to_windows_866</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_6_to_utf8</literal></entry>
+        <entry><literal>ISO_8859_6</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_7_to_utf8</literal></entry>
+        <entry><literal>ISO_8859_7</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_8_to_utf8</literal></entry>
+        <entry><literal>ISO_8859_8</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>iso_8859_9_to_utf8</literal></entry>
+        <entry><literal>LATIN5</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>johab_to_utf8</literal></entry>
+        <entry><literal>JOHAB</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_r_to_iso_8859_5</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_r_to_mic</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_r_to_utf8</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_r_to_windows_1251</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_r_to_windows_866</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+       </row>
+       <row>
+        <entry><literal>koi8_u_to_utf8</literal></entry>
+        <entry><literal>KOI8U</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_big5</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_euc_cn</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>EUC_CN</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_euc_jp</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_euc_kr</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>EUC_KR</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_euc_tw</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_iso_8859_1</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>LATIN1</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_iso_8859_2</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_iso_8859_3</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>LATIN3</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_iso_8859_4</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>LATIN4</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_iso_8859_5</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_koi8_r</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_sjis</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_windows_1250</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_windows_1251</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+       </row>
+       <row>
+        <entry><literal>mic_to_windows_866</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+       </row>
+       <row>
+        <entry><literal>sjis_to_euc_jp</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+       </row>
+       <row>
+        <entry><literal>sjis_to_mic</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>sjis_to_utf8</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1258_to_utf8</literal></entry>
+        <entry><literal>WIN1258</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>uhc_to_utf8</literal></entry>
+        <entry><literal>UHC</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_big5</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>BIG5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_euc_cn</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>EUC_CN</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_euc_jp</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>EUC_JP</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_euc_kr</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>EUC_KR</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_euc_tw</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>EUC_TW</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_gb18030</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>GB18030</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_gbk</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>GBK</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_1</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN1</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_10</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN6</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_13</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN7</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_14</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_15</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN9</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_16</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN10</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_2</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_3</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN3</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_4</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN4</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_5</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_6</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>ISO_8859_6</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_7</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>ISO_8859_7</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_8</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>ISO_8859_8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_iso_8859_9</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>LATIN5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_johab</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>JOHAB</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_koi8_r</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_koi8_u</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>KOI8U</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_sjis</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>SJIS</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1258</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1258</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_uhc</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>UHC</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1250</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1251</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1252</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1252</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1253</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1253</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1254</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1254</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1255</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1255</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1256</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1256</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_1257</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN1257</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_866</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_windows_874</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>WIN874</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1250_to_iso_8859_2</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+        <entry><literal>LATIN2</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1250_to_mic</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1250_to_utf8</literal></entry>
+        <entry><literal>WIN1250</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1251_to_iso_8859_5</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1251_to_koi8_r</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1251_to_mic</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1251_to_utf8</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1251_to_windows_866</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1252_to_utf8</literal></entry>
+        <entry><literal>WIN1252</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_1256_to_utf8</literal></entry>
+        <entry><literal>WIN1256</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_866_to_iso_8859_5</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+        <entry><literal>ISO_8859_5</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_866_to_koi8_r</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+        <entry><literal>KOI8R</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_866_to_mic</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+        <entry><literal>MULE_INTERNAL</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_866_to_utf8</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_866_to_windows_1251</literal></entry>
+        <entry><literal>WIN866</literal></entry>
+        <entry><literal>WIN1251</literal></entry>
+       </row>
+       <row>
+        <entry><literal>windows_874_to_utf8</literal></entry>
+        <entry><literal>WIN874</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_jis_2004_to_utf8</literal></entry>
+        <entry><literal>EUC_JIS_2004</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_euc_jis_2004</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>EUC_JIS_2004</literal></entry>
+       </row>
+       <row>
+        <entry><literal>shift_jis_2004_to_utf8</literal></entry>
+        <entry><literal>SHIFT_JIS_2004</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+       </row>
+       <row>
+        <entry><literal>utf8_to_shift_jis_2004</literal></entry>
+        <entry><literal>UTF8</literal></entry>
+        <entry><literal>SHIFT_JIS_2004</literal></entry>
+       </row>
+       <row>
+        <entry><literal>euc_jis_2004_to_shift_jis_2004</literal></entry>
+        <entry><literal>EUC_JIS_2004</literal></entry>
+        <entry><literal>SHIFT_JIS_2004</literal></entry>
+       </row>
+       <row>
+        <entry><literal>shift_jis_2004_to_euc_jis_2004</literal></entry>
+        <entry><literal>SHIFT_JIS_2004</literal></entry>
+        <entry><literal>EUC_JIS_2004</literal></entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
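+
+    <para>
+     As a minimal sketch (not part of the tables above), the built-in
+     conversions can also be listed from a running server by querying the
+     <structname>pg_conversion</structname> system catalog.  The query
+     assumes the catalog columns <structfield>conname</structfield>,
+     <structfield>conforencoding</structfield>, and
+     <structfield>contoencoding</structfield>; the exact set of rows may
+     differ between server versions:
+<programlisting>
+-- List conversions with their source and destination encodings.
+SELECT conname,
+       pg_encoding_to_char(conforencoding) AS source_encoding,
+       pg_encoding_to_char(contoencoding) AS destination_encoding
+FROM pg_conversion
+ORDER BY conname;
+
+-- Illustrates the naming scheme: LATIN1 appears under its official
+-- name, so the conversion to UTF8 is expected to be iso_8859_1_to_utf8.
+SELECT conname
+FROM pg_conversion
+WHERE pg_encoding_to_char(conforencoding) = 'LATIN1'
+  AND pg_encoding_to_char(contoencoding) = 'UTF8';
+</programlisting>
+    </para>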
+   </sect2>
+
+   <sect2 id="multibyte-further-reading">
+    <title>Further Reading</title>
+
+    <para>
+     The following sources are good starting points for learning about the
+     various encoding systems.
+
+     <variablelist>
+      <varlistentry>
+       <term><citetitle>CJKV Information Processing: Chinese, Japanese, Korean &amp; Vietnamese Computing</citetitle></term>
+
+       <listitem>
+        <para>
+         Contains detailed explanations of <literal>EUC_JP</literal>,
+         <literal>EUC_CN</literal>, <literal>EUC_KR</literal>, and
+         <literal>EUC_TW</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><ulink url="https://www.unicode.org/"></ulink></term>
+
+       <listitem>
+        <para>
+         The web site of the Unicode Consortium.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term><ulink url="https://tools.ietf.org/html/rfc3629">RFC 3629</ulink></term>
+
+       <listitem>
+        <para>
+         Defines <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
+         Format).
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </para>
+   </sect2>
+
+  </sect1>
+
+</chapter>