summaryrefslogtreecommitdiffstats
path: root/doc/src/sgml/html/locale.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src/sgml/html/locale.html')
-rw-r--r--doc/src/sgml/html/locale.html348
1 files changed, 348 insertions, 0 deletions
diff --git a/doc/src/sgml/html/locale.html b/doc/src/sgml/html/locale.html
new file mode 100644
index 0000000..65a3bc9
--- /dev/null
+++ b/doc/src/sgml/html/locale.html
@@ -0,0 +1,348 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>24.1. Locale Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="charset.html" title="Chapter 24. Localization" /><link rel="next" href="collation.html" title="24.2. Collation Support" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">24.1. Locale Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="charset.html" title="Chapter 24. Localization">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><th width="60%" align="center">Chapter 24. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="collation.html" title="24.2. Collation Support">Next</a></td></tr></table><hr /></div><div class="sect1" id="LOCALE"><div class="titlepage"><div><div><h2 class="title" style="clear: both">24.1. Locale Support <a href="#LOCALE" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="locale.html#LOCALE-OVERVIEW">24.1.1. Overview</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-BEHAVIOR">24.1.2. Behavior</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-SELECTING-LOCALES">24.1.3. Selecting Locales</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-PROVIDERS">24.1.4. Locale Providers</a></span></dt><dt><span class="sect2"><a href="locale.html#ICU-LOCALES">24.1.5. ICU Locales</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-PROBLEMS">24.1.6. Problems</a></span></dt></dl></div><a id="id-1.6.11.3.2" class="indexterm"></a><p>
+ <em class="firstterm">Locale</em> support refers to an application respecting
+ cultural preferences regarding alphabets, sorting, number
+ formatting, etc. <span class="productname">PostgreSQL</span> uses the standard ISO
+ C and <acronym class="acronym">POSIX</acronym> locale facilities provided by the server operating
+ system. For additional information refer to the documentation of your
+ system.
+ </p><div class="sect2" id="LOCALE-OVERVIEW"><div class="titlepage"><div><div><h3 class="title">24.1.1. Overview <a href="#LOCALE-OVERVIEW" class="id_link">#</a></h3></div></div></div><p>
+ Locale support is automatically initialized when a database
+ cluster is created using <code class="command">initdb</code>.
+ <code class="command">initdb</code> will initialize the database cluster
+ with the locale setting of its execution environment by default,
+ so if your system is already set to use the locale that you want
+ in your database cluster then there is nothing else you need to
+ do. If you want to use a different locale (or you are not sure
+ which locale your system is set to), you can instruct
+ <code class="command">initdb</code> exactly which locale to use by
+ specifying the <code class="option">--locale</code> option. For example:
+</p><pre class="screen">
+initdb --locale=sv_SE
+</pre><p>
+ </p><p>
+ This example for Unix systems sets the locale to Swedish
+ (<code class="literal">sv</code>) as spoken
+ in Sweden (<code class="literal">SE</code>). Other possibilities might include
+ <code class="literal">en_US</code> (U.S. English) and <code class="literal">fr_CA</code> (French
+ Canadian). If more than one character set can be used for a
+ locale then the specifications can take the form
+ <em class="replaceable"><code>language_territory.codeset</code></em>. For example,
+ <code class="literal">fr_BE.UTF-8</code> represents the French language (fr) as
+ spoken in Belgium (BE), with a <acronym class="acronym">UTF-8</acronym> character set
+ encoding.
+ </p><p>
+ What locales are available on your
+ system under what names depends on what was provided by the operating
+ system vendor and what was installed. On most Unix systems, the command
+ <code class="literal">locale -a</code> will provide a list of available locales.
+ Windows uses more verbose locale names, such as <code class="literal">German_Germany</code>
+ or <code class="literal">Swedish_Sweden.1252</code>, but the principles are the same.
+ </p><p>
+ Occasionally it is useful to mix rules from several locales, e.g.,
+ use English collation rules but Spanish messages. To support that, a
+ set of locale subcategories exist that control only certain
+ aspects of the localization rules:
+
+ </p><div class="informaltable"><table class="informaltable" border="1"><colgroup><col class="col1" /><col class="col2" /></colgroup><tbody><tr><td><code class="envar">LC_COLLATE</code></td><td>String sort order</td></tr><tr><td><code class="envar">LC_CTYPE</code></td><td>Character classification (What is a letter? Its upper-case equivalent?)</td></tr><tr><td><code class="envar">LC_MESSAGES</code></td><td>Language of messages</td></tr><tr><td><code class="envar">LC_MONETARY</code></td><td>Formatting of currency amounts</td></tr><tr><td><code class="envar">LC_NUMERIC</code></td><td>Formatting of numbers</td></tr><tr><td><code class="envar">LC_TIME</code></td><td>Formatting of dates and times</td></tr></tbody></table></div><p>
+
+ The category names translate into names of
+ <code class="command">initdb</code> options to override the locale choice
+ for a specific category. For instance, to set the locale to
+ French Canadian, but use U.S. rules for formatting currency, use
+ <code class="literal">initdb --locale=fr_CA --lc-monetary=en_US</code>.
+ </p><p>
+ If you want the system to behave as if it had no locale support,
+ use the special locale name <code class="literal">C</code>, or equivalently
+ <code class="literal">POSIX</code>.
+ </p><p>
+ Some locale categories must have their values
+ fixed when the database is created. You can use different settings
+ for different databases, but once a database is created, you cannot
+ change them for that database anymore. <code class="literal">LC_COLLATE</code>
+ and <code class="literal">LC_CTYPE</code> are these categories. They affect
+ the sort order of indexes, so they must be kept fixed, or indexes on
+ text columns would become corrupt.
+ (But you can alleviate this restriction using collations, as discussed
+ in <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>.)
+ The default values for these
+ categories are determined when <code class="command">initdb</code> is run, and
+ those values are used when new databases are created, unless
+ specified otherwise in the <code class="command">CREATE DATABASE</code> command.
+ </p><p>
+ The other locale categories can be changed whenever desired
+ by setting the server configuration parameters
+ that have the same name as the locale categories (see <a class="xref" href="runtime-config-client.html#RUNTIME-CONFIG-CLIENT-FORMAT" title="20.11.2. Locale and Formatting">Section 20.11.2</a> for details). The values
+ that are chosen by <code class="command">initdb</code> are actually only written
+ into the configuration file <code class="filename">postgresql.conf</code> to
+ serve as defaults when the server is started. If you remove these
+ assignments from <code class="filename">postgresql.conf</code> then the
+ server will inherit the settings from its execution environment.
+ </p><p>
+ Note that the locale behavior of the server is determined by the
+ environment variables seen by the server, not by the environment
+ of any client. Therefore, be careful to configure the correct locale settings
+ before starting the server. A consequence of this is that if
+ client and server are set up in different locales, messages might
+ appear in different languages depending on where they originated.
+ </p><div class="note"><h3 class="title">Note</h3><p>
+ When we speak of inheriting the locale from the execution
+ environment, this means the following on most operating systems:
+ For a given locale category, say the collation, the following
+ environment variables are consulted in this order until one is
+ found to be set: <code class="envar">LC_ALL</code>, <code class="envar">LC_COLLATE</code>
+ (or the variable corresponding to the respective category),
+ <code class="envar">LANG</code>. If none of these environment variables are
+ set then the locale defaults to <code class="literal">C</code>.
+ </p><p>
+ Some message localization libraries also look at the environment
+ variable <code class="envar">LANGUAGE</code> which overrides all other locale
+ settings for the purpose of setting the language of messages. If
+ in doubt, please refer to the documentation of your operating
+ system, in particular the documentation about
+ <span class="application">gettext</span>.
+ </p></div><p>
+ To enable messages to be translated to the user's preferred language,
+ <acronym class="acronym">NLS</acronym> must have been selected at build time
+ (<code class="literal">configure --enable-nls</code>). All other locale support is
+ built in automatically.
+ </p></div><div class="sect2" id="LOCALE-BEHAVIOR"><div class="titlepage"><div><div><h3 class="title">24.1.2. Behavior <a href="#LOCALE-BEHAVIOR" class="id_link">#</a></h3></div></div></div><p>
+ The locale settings influence the following SQL features:
+
+ </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
+ Sort order in queries using <code class="literal">ORDER BY</code> or the standard
+ comparison operators on textual data
+ <a id="id-1.6.11.3.5.2.1.1.1.2" class="indexterm"></a>
+ </p></li><li class="listitem"><p>
+ The <code class="function">upper</code>, <code class="function">lower</code>, and <code class="function">initcap</code>
+ functions
+ <a id="id-1.6.11.3.5.2.1.2.1.4" class="indexterm"></a>
+ <a id="id-1.6.11.3.5.2.1.2.1.5" class="indexterm"></a>
+ </p></li><li class="listitem"><p>
+ Pattern matching operators (<code class="literal">LIKE</code>, <code class="literal">SIMILAR TO</code>,
+ and POSIX-style regular expressions); locales affect both case
+ insensitive matching and the classification of characters by
+ character-class regular expressions
+ <a id="id-1.6.11.3.5.2.1.3.1.3" class="indexterm"></a>
+ <a id="id-1.6.11.3.5.2.1.3.1.4" class="indexterm"></a>
+ </p></li><li class="listitem"><p>
+ The <code class="function">to_char</code> family of functions
+ <a id="id-1.6.11.3.5.2.1.4.1.2" class="indexterm"></a>
+ </p></li><li class="listitem"><p>
+ The ability to use indexes with <code class="literal">LIKE</code> clauses
+ </p></li></ul></div><p>
+ </p><p>
+ The drawback of using locales other than <code class="literal">C</code> or
+ <code class="literal">POSIX</code> in <span class="productname">PostgreSQL</span> is its performance
+ impact. It slows character handling and prevents ordinary indexes
+ from being used by <code class="literal">LIKE</code>. For this reason use locales
+ only if you actually need them.
+ </p><p>
+ As a workaround to allow <span class="productname">PostgreSQL</span> to use indexes
+ with <code class="literal">LIKE</code> clauses under a non-C locale, several custom
+ operator classes exist. These allow the creation of an index that
+ performs a strict character-by-character comparison, ignoring
+ locale comparison rules. Refer to <a class="xref" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Section 11.10</a>
+ for more information. Another approach is to create indexes using
+ the <code class="literal">C</code> collation, as discussed in
+ <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>.
+ </p></div><div class="sect2" id="LOCALE-SELECTING-LOCALES"><div class="titlepage"><div><div><h3 class="title">24.1.3. Selecting Locales <a href="#LOCALE-SELECTING-LOCALES" class="id_link">#</a></h3></div></div></div><p>
+ Locales can be selected in different scopes depending on requirements.
+ The above overview showed how locales are specified using
+ <code class="command">initdb</code> to set the defaults for the entire cluster. The
+ following list shows where locales can be selected. Each item provides
+ the defaults for the subsequent items, and each lower item allows
+ overriding the defaults on a finer granularity.
+ </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
+ As explained above, the environment of the operating system provides the
+ defaults for the locales of a newly initialized database cluster. In
+ many cases, this is enough: If the operating system is configured for
+ the desired language/territory, then
+ <span class="productname">PostgreSQL</span> will by default also behave
+ according to that locale.
+ </p></li><li class="listitem"><p>
+ As shown above, command-line options for <code class="command">initdb</code>
+ specify the locale settings for a newly initialized database cluster.
+ Use this if the operating system does not have the locale configuration
+ you want for your database system.
+ </p></li><li class="listitem"><p>
+ A locale can be selected separately for each database. The SQL command
+ <code class="command">CREATE DATABASE</code> and its command-line equivalent
+ <code class="command">createdb</code> have options for that. Use this for example
+ if a database cluster houses databases for multiple tenants with
+ different requirements.
+ </p></li><li class="listitem"><p>
+ Locale settings can be made for individual table columns. This uses an
+ SQL object called <em class="firstterm">collation</em> and is explained in
+ <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>. Use this for example to sort data in
+ different languages or customize the sort order of a particular table.
+ </p></li><li class="listitem"><p>
+ Finally, locales can be selected for an individual query. Again, this
+ uses SQL collation objects. This could be used to change the sort order
+ based on run-time choices or for ad-hoc experimentation.
+ </p></li></ol></div></div><div class="sect2" id="LOCALE-PROVIDERS"><div class="titlepage"><div><div><h3 class="title">24.1.4. Locale Providers <a href="#LOCALE-PROVIDERS" class="id_link">#</a></h3></div></div></div><p>
+ <span class="productname">PostgreSQL</span> supports multiple <em class="firstterm">locale
+ providers</em>. This specifies which library supplies the locale
+ data. One standard provider name is <code class="literal">libc</code>, which uses
+ the locales provided by the operating system C library. These are the
+ locales used by most tools provided by the operating system. Another
+ provider is <code class="literal">icu</code>, which uses the external
+ ICU<a id="id-1.6.11.3.7.2.5" class="indexterm"></a> library. ICU locales can
+ only be used if support for ICU was configured when PostgreSQL was built.
+ </p><p>
+ The commands and tools that select the locale settings, as described
+ above, each have an option to select the locale provider. The examples
+ shown earlier all use the <code class="literal">libc</code> provider, which is the
+ default. Here is an example to initialize a database cluster using the
+ ICU provider:
+</p><pre class="programlisting">
+initdb --locale-provider=icu --icu-locale=en
+</pre><p>
+ See the description of the respective commands and programs for
+ details. Note that you can mix locale providers at different
+ granularities, for example use <code class="literal">libc</code> by default for the
+ cluster but have one database that uses the <code class="literal">icu</code>
+ provider, and then have collation objects using either provider within
+ those databases.
+ </p><p>
+ Which locale provider to use depends on individual requirements. For most
+ basic uses, either provider will give adequate results. For the libc
+ provider, it depends on what the operating system offers; some operating
+ systems are better than others. For advanced uses, ICU offers more locale
+ variants and customization options.
+ </p></div><div class="sect2" id="ICU-LOCALES"><div class="titlepage"><div><div><h3 class="title">24.1.5. ICU Locales <a href="#ICU-LOCALES" class="id_link">#</a></h3></div></div></div><div class="sect3" id="ICU-LOCALE-NAMES"><div class="titlepage"><div><div><h4 class="title">24.1.5.1. ICU Locale Names <a href="#ICU-LOCALE-NAMES" class="id_link">#</a></h4></div></div></div><p>
+ The ICU format for the locale name is a <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="24.1.5.3. Language Tag">Language Tag</a>.
+
+</p><pre class="programlisting">
+CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
+CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
+</pre><p>
+ </p></div><div class="sect3" id="ICU-CANONICALIZATION"><div class="titlepage"><div><div><h4 class="title">24.1.5.2. Locale Canonicalization and Validation <a href="#ICU-CANONICALIZATION" class="id_link">#</a></h4></div></div></div><p>
+ When defining a new ICU collation object or database with ICU as the
+ provider, the given locale name is transformed ("canonicalized") into a
+ language tag if not already in that form. For instance,
+
+</p><pre class="screen">
+CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
+NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
+CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
+NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
+</pre><p>
+
+ If you see this notice, ensure that the <code class="symbol">provider</code> and
+ <code class="symbol">locale</code> are the expected result. For consistent results
+ when using the ICU provider, specify the canonical <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="24.1.5.3. Language Tag">language tag</a> instead of relying on the
+ transformation.
+ </p><p>
+ A locale with no language name, or the special language name
+ <code class="literal">root</code>, is transformed to have the language
+ <code class="literal">und</code> ("undefined").
+ </p><p>
+ ICU can transform most libc locale names, as well as some other formats,
+ into language tags for easier transition to ICU. If a libc locale name is
+ used in ICU, it may not have precisely the same behavior as in libc.
+ </p><p>
+ If there is a problem interpreting the locale name, or if the locale name
+ represents a language or region that ICU does not recognize, you will see
+ the following warning:
+
+</p><pre class="screen">
+CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
+WARNING: ICU locale "nonsense" has unknown language "nonsense"
+HINT: To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION
+</pre><p>
+
+ <a class="xref" href="runtime-config-client.html#GUC-ICU-VALIDATION-LEVEL">icu_validation_level</a> controls how the message is
+ reported. Unless set to <code class="literal">ERROR</code>, the collation will
+ still be created, but the behavior may not be what the user intended.
+ </p></div><div class="sect3" id="ICU-LANGUAGE-TAG"><div class="titlepage"><div><div><h4 class="title">24.1.5.3. Language Tag <a href="#ICU-LANGUAGE-TAG" class="id_link">#</a></h4></div></div></div><p>
+ A language tag, defined in BCP 47, is a standardized identifier used to
+ identify languages, regions, and other information about a locale.
+ </p><p>
+ Basic language tags are simply
+ <em class="replaceable"><code>language</code></em><code class="literal">-</code><em class="replaceable"><code>region</code></em>;
+ or even just <em class="replaceable"><code>language</code></em>. The
+ <em class="replaceable"><code>language</code></em> is a language code
+ (e.g. <code class="literal">fr</code> for French), and
+ <em class="replaceable"><code>region</code></em> is a region code
+ (e.g. <code class="literal">CA</code> for Canada). Examples:
+ <code class="literal">ja-JP</code>, <code class="literal">de</code>, or
+ <code class="literal">fr-CA</code>.
+ </p><p>
+ Collation settings may be included in the language tag to customize
+ collation behavior. ICU allows extensive customization, such as
+ sensitivity (or insensitivity) to accents, case, and punctuation;
+ treatment of digits within text; and many other options to satisfy a
+ variety of uses.
+ </p><p>
+ To include this additional collation information in a language tag,
+ append <code class="literal">-u</code>, which indicates there are additional
+ collation settings, followed by one or more
+ <code class="literal">-</code><em class="replaceable"><code>key</code></em><code class="literal">-</code><em class="replaceable"><code>value</code></em>
+ pairs. The <em class="replaceable"><code>key</code></em> is the key for a <a class="link" href="collation.html#ICU-COLLATION-SETTINGS" title="24.2.3.2. Collation Settings for an ICU Locale">collation setting</a> and
+ <em class="replaceable"><code>value</code></em> is a valid value for that setting. For
+ boolean settings, the <code class="literal">-</code><em class="replaceable"><code>key</code></em>
+ may be specified without a corresponding
+ <code class="literal">-</code><em class="replaceable"><code>value</code></em>, which implies a
+ value of <code class="literal">true</code>.
+ </p><p>
+ For example, the language tag <code class="literal">en-US-u-kn-ks-level2</code>
+ means the locale with the English language in the US region, with
+ collation settings <code class="literal">kn</code> set to <code class="literal">true</code>
+ and <code class="literal">ks</code> set to <code class="literal">level2</code>. Those
+ settings mean the collation will be case-insensitive and treat a sequence
+ of digits as a single number:
+
+</p><pre class="screen">
+CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2');
+SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+
+SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+</pre><p>
+ </p><p>
+ See <a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="24.2.3. ICU Custom Collations">Section 24.2.3</a> for details and additional
+ examples of using language tags with custom collation information for the
+ locale.
+ </p></div></div><div class="sect2" id="LOCALE-PROBLEMS"><div class="titlepage"><div><div><h3 class="title">24.1.6. Problems <a href="#LOCALE-PROBLEMS" class="id_link">#</a></h3></div></div></div><p>
+ If locale support doesn't work according to the explanation above,
+ check that the locale support in your operating system is
+ correctly configured. To check what locales are installed on your
+ system, you can use the command <code class="literal">locale -a</code> if
+ your operating system provides it.
+ </p><p>
+ Check that <span class="productname">PostgreSQL</span> is actually using the locale
+ that you think it is. The <code class="envar">LC_COLLATE</code> and <code class="envar">LC_CTYPE</code>
+ settings are determined when a database is created, and cannot be
+ changed except by creating a new database. Other locale
+ settings including <code class="envar">LC_MESSAGES</code> and <code class="envar">LC_MONETARY</code>
+ are initially determined by the environment the server is started
+ in, but can be changed on-the-fly. You can check the active locale
+ settings using the <code class="command">SHOW</code> command.
+ </p><p>
+ The directory <code class="filename">src/test/locale</code> in the source
+ distribution contains a test suite for
+ <span class="productname">PostgreSQL</span>'s locale support.
+ </p><p>
+ Client applications that handle server-side errors by parsing the
+ text of the error message will obviously have problems when the
+ server's messages are in a different language. Authors of such
+ applications are advised to make use of the error code scheme
+ instead.
+ </p><p>
+ Maintaining catalogs of message translations requires the on-going
+ efforts of many volunteers that want to see
+ <span class="productname">PostgreSQL</span> speak their preferred language well.
+ If messages in your language are currently not available or not fully
+ translated, your assistance would be appreciated. If you want to
+ help, refer to <a class="xref" href="nls.html" title="Chapter 57. Native Language Support">Chapter 57</a> or write to the developers'
+ mailing list.
+ </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="charset.html" title="Chapter 24. Localization">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="collation.html" title="24.2. Collation Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 24. Localization </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 24.2. Collation Support</td></tr></table></div></body></html> \ No newline at end of file