Diffstat (limited to 'doc/src/sgml/html/locale.html')
-rw-r--r-- | doc/src/sgml/html/locale.html | 348 |
1 files changed, 348 insertions, 0 deletions
diff --git a/doc/src/sgml/html/locale.html b/doc/src/sgml/html/locale.html new file mode 100644 index 0000000..65a3bc9 --- /dev/null +++ b/doc/src/sgml/html/locale.html @@ -0,0 +1,348 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>24.1. Locale Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="charset.html" title="Chapter 24. Localization" /><link rel="next" href="collation.html" title="24.2. Collation Support" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">24.1. Locale Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="charset.html" title="Chapter 24. Localization">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><th width="60%" align="center">Chapter 24. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="collation.html" title="24.2. Collation Support">Next</a></td></tr></table><hr /></div><div class="sect1" id="LOCALE"><div class="titlepage"><div><div><h2 class="title" style="clear: both">24.1. Locale Support <a href="#LOCALE" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="locale.html#LOCALE-OVERVIEW">24.1.1. Overview</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-BEHAVIOR">24.1.2. 
Behavior</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-SELECTING-LOCALES">24.1.3. Selecting Locales</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-PROVIDERS">24.1.4. Locale Providers</a></span></dt><dt><span class="sect2"><a href="locale.html#ICU-LOCALES">24.1.5. ICU Locales</a></span></dt><dt><span class="sect2"><a href="locale.html#LOCALE-PROBLEMS">24.1.6. Problems</a></span></dt></dl></div><a id="id-1.6.11.3.2" class="indexterm"></a><p> + <em class="firstterm">Locale</em> support refers to an application respecting + cultural preferences regarding alphabets, sorting, number + formatting, etc. <span class="productname">PostgreSQL</span> uses the standard ISO + C and <acronym class="acronym">POSIX</acronym> locale facilities provided by the server operating + system. For additional information refer to the documentation of your + system. + </p><div class="sect2" id="LOCALE-OVERVIEW"><div class="titlepage"><div><div><h3 class="title">24.1.1. Overview <a href="#LOCALE-OVERVIEW" class="id_link">#</a></h3></div></div></div><p> + Locale support is automatically initialized when a database + cluster is created using <code class="command">initdb</code>. + <code class="command">initdb</code> will initialize the database cluster + with the locale setting of its execution environment by default, + so if your system is already set to use the locale that you want + in your database cluster then there is nothing else you need to + do. If you want to use a different locale (or you are not sure + which locale your system is set to), you can instruct + <code class="command">initdb</code> exactly which locale to use by + specifying the <code class="option">--locale</code> option. For example: +</p><pre class="screen"> +initdb --locale=sv_SE +</pre><p> + </p><p> + This example for Unix systems sets the locale to Swedish + (<code class="literal">sv</code>) as spoken + in Sweden (<code class="literal">SE</code>). 
Other possibilities might include + <code class="literal">en_US</code> (U.S. English) and <code class="literal">fr_CA</code> (French + Canadian). If more than one character set can be used for a + locale, the specification can take the form + <em class="replaceable"><code>language_territory.codeset</code></em>. For example, + <code class="literal">fr_BE.UTF-8</code> represents the French language (fr) as + spoken in Belgium (BE), with a <acronym class="acronym">UTF-8</acronym> character set + encoding. + </p><p> + Which locales are available on your + system, and under which names, depends on what was provided by the operating + system vendor and what was installed. On most Unix systems, the command + <code class="literal">locale -a</code> will provide a list of available locales. + Windows uses more verbose locale names, such as <code class="literal">German_Germany</code> + or <code class="literal">Swedish_Sweden.1252</code>, but the principles are the same. + </p><p> + Occasionally it is useful to mix rules from several locales, e.g., + use English collation rules but Spanish messages. To support that, a + set of locale subcategories exists, each controlling only certain + aspects of the localization rules: + + </p><div class="informaltable"><table class="informaltable" border="1"><colgroup><col class="col1" /><col class="col2" /></colgroup><tbody><tr><td><code class="envar">LC_COLLATE</code></td><td>String sort order</td></tr><tr><td><code class="envar">LC_CTYPE</code></td><td>Character classification (What is a letter? 
Its upper-case equivalent?)</td></tr><tr><td><code class="envar">LC_MESSAGES</code></td><td>Language of messages</td></tr><tr><td><code class="envar">LC_MONETARY</code></td><td>Formatting of currency amounts</td></tr><tr><td><code class="envar">LC_NUMERIC</code></td><td>Formatting of numbers</td></tr><tr><td><code class="envar">LC_TIME</code></td><td>Formatting of dates and times</td></tr></tbody></table></div><p> + + The category names translate into names of + <code class="command">initdb</code> options to override the locale choice + for a specific category. For instance, to set the locale to + French Canadian, but use U.S. rules for formatting currency, use + <code class="literal">initdb --locale=fr_CA --lc-monetary=en_US</code>. + </p><p> + If you want the system to behave as if it had no locale support, + use the special locale name <code class="literal">C</code>, or equivalently + <code class="literal">POSIX</code>. + </p><p> + Some locale categories must have their values + fixed when the database is created. You can use different settings + for different databases, but once a database is created, you cannot + change them for that database anymore. <code class="literal">LC_COLLATE</code> + and <code class="literal">LC_CTYPE</code> are these categories. They affect + the sort order of indexes, so they must be kept fixed, or indexes on + text columns would become corrupt. + (But you can alleviate this restriction using collations, as discussed + in <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>.) + The default values for these + categories are determined when <code class="command">initdb</code> is run, and + those values are used when new databases are created, unless + specified otherwise in the <code class="command">CREATE DATABASE</code> command. 
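+ </p><p>
+ As a minimal sketch of overriding the defaults for one database, the statement
+ below sets both fixed categories explicitly. The database name
+ <code class="literal">sales_fr</code> is hypothetical, and the example assumes that
+ the server's operating system provides a locale named
+ <code class="literal">fr_FR.UTF-8</code>. <code class="literal">template0</code> is
+ named as the template because a template created under a different locale could
+ contain data ordered by that other locale's rules:
+</p><pre class="programlisting">
+-- Hypothetical database name; the locale must exist on the server host.
+CREATE DATABASE sales_fr
+    TEMPLATE template0
+    ENCODING 'UTF8'
+    LC_COLLATE 'fr_FR.UTF-8'
+    LC_CTYPE 'fr_FR.UTF-8';
+</pre><p>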
+ </p><p> + The other locale categories can be changed whenever desired + by setting the server configuration parameters + that have the same name as the locale categories (see <a class="xref" href="runtime-config-client.html#RUNTIME-CONFIG-CLIENT-FORMAT" title="20.11.2. Locale and Formatting">Section 20.11.2</a> for details). The values + that are chosen by <code class="command">initdb</code> are actually only written + into the configuration file <code class="filename">postgresql.conf</code> to + serve as defaults when the server is started. If you remove these + assignments from <code class="filename">postgresql.conf</code> then the + server will inherit the settings from its execution environment. + </p><p> + Note that the locale behavior of the server is determined by the + environment variables seen by the server, not by the environment + of any client. Therefore, be careful to configure the correct locale settings + before starting the server. A consequence of this is that if + client and server are set up in different locales, messages might + appear in different languages depending on where they originated. + </p><div class="note"><h3 class="title">Note</h3><p> + When we speak of inheriting the locale from the execution + environment, this means the following on most operating systems: + For a given locale category, say the collation, the following + environment variables are consulted in this order until one is + found to be set: <code class="envar">LC_ALL</code>, <code class="envar">LC_COLLATE</code> + (or the variable corresponding to the respective category), + <code class="envar">LANG</code>. If none of these environment variables are + set then the locale defaults to <code class="literal">C</code>. + </p><p> + Some message localization libraries also look at the environment + variable <code class="envar">LANGUAGE</code> which overrides all other locale + settings for the purpose of setting the language of messages. 
If + in doubt, please refer to the documentation of your operating + system, in particular the documentation about + <span class="application">gettext</span>. + </p></div><p> + To enable messages to be translated to the user's preferred language, + <acronym class="acronym">NLS</acronym> must have been selected at build time + (<code class="literal">configure --enable-nls</code>). All other locale support is + built in automatically. + </p></div><div class="sect2" id="LOCALE-BEHAVIOR"><div class="titlepage"><div><div><h3 class="title">24.1.2. Behavior <a href="#LOCALE-BEHAVIOR" class="id_link">#</a></h3></div></div></div><p> + The locale settings influence the following SQL features: + + </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> + Sort order in queries using <code class="literal">ORDER BY</code> or the standard + comparison operators on textual data + <a id="id-1.6.11.3.5.2.1.1.1.2" class="indexterm"></a> + </p></li><li class="listitem"><p> + The <code class="function">upper</code>, <code class="function">lower</code>, and <code class="function">initcap</code> + functions + <a id="id-1.6.11.3.5.2.1.2.1.4" class="indexterm"></a> + <a id="id-1.6.11.3.5.2.1.2.1.5" class="indexterm"></a> + </p></li><li class="listitem"><p> + Pattern matching operators (<code class="literal">LIKE</code>, <code class="literal">SIMILAR TO</code>, + and POSIX-style regular expressions); locales affect both case + insensitive matching and the classification of characters by + character-class regular expressions + <a id="id-1.6.11.3.5.2.1.3.1.3" class="indexterm"></a> + <a id="id-1.6.11.3.5.2.1.3.1.4" class="indexterm"></a> + </p></li><li class="listitem"><p> + The <code class="function">to_char</code> family of functions + <a id="id-1.6.11.3.5.2.1.4.1.2" class="indexterm"></a> + </p></li><li class="listitem"><p> + The ability to use indexes with <code class="literal">LIKE</code> clauses + </p></li></ul></div><p> + </p><p> 
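+ </p><p>
+ The effect on sort order can be observed, as a rough sketch, by sorting the same
+ values under the <code class="literal">C</code> collation and under the database
+ default. The sample values are arbitrary. With <code class="literal">C</code>,
+ strings compare by byte value, so upper-case ASCII letters sort before lower-case
+ ones; the result of the second query depends on the locale the database was
+ created with:
+</p><pre class="programlisting">
+-- Byte-wise comparison: returns Banana, apple, cherry.
+SELECT w FROM (VALUES ('apple'), ('Banana'), ('cherry')) AS t(w)
+ORDER BY w COLLATE "C";
+
+-- Database default collation: a linguistic locale such as en_US
+-- typically returns apple, Banana, cherry instead.
+SELECT w FROM (VALUES ('apple'), ('Banana'), ('cherry')) AS t(w)
+ORDER BY w;
+</pre><p>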
+ The drawback of using locales other than <code class="literal">C</code> or + <code class="literal">POSIX</code> in <span class="productname">PostgreSQL</span> is its performance + impact. It slows character handling and prevents ordinary indexes + from being used by <code class="literal">LIKE</code>. For this reason use locales + only if you actually need them. + </p><p> + As a workaround to allow <span class="productname">PostgreSQL</span> to use indexes + with <code class="literal">LIKE</code> clauses under a non-C locale, several custom + operator classes exist. These allow the creation of an index that + performs a strict character-by-character comparison, ignoring + locale comparison rules. Refer to <a class="xref" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Section 11.10</a> + for more information. Another approach is to create indexes using + the <code class="literal">C</code> collation, as discussed in + <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>. + </p></div><div class="sect2" id="LOCALE-SELECTING-LOCALES"><div class="titlepage"><div><div><h3 class="title">24.1.3. Selecting Locales <a href="#LOCALE-SELECTING-LOCALES" class="id_link">#</a></h3></div></div></div><p> + Locales can be selected in different scopes depending on requirements. + The above overview showed how locales are specified using + <code class="command">initdb</code> to set the defaults for the entire cluster. The + following list shows where locales can be selected. Each item provides + the defaults for the subsequent items, and each lower item allows + overriding the defaults on a finer granularity. + </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> + As explained above, the environment of the operating system provides the + defaults for the locales of a newly initialized database cluster. 
In + many cases, this is enough: If the operating system is configured for + the desired language/territory, then + <span class="productname">PostgreSQL</span> will by default also behave + according to that locale. + </p></li><li class="listitem"><p> + As shown above, command-line options for <code class="command">initdb</code> + specify the locale settings for a newly initialized database cluster. + Use this if the operating system does not have the locale configuration + you want for your database system. + </p></li><li class="listitem"><p> + A locale can be selected separately for each database. The SQL command + <code class="command">CREATE DATABASE</code> and its command-line equivalent + <code class="command">createdb</code> have options for that. Use this for example + if a database cluster houses databases for multiple tenants with + different requirements. + </p></li><li class="listitem"><p> + Locale settings can be made for individual table columns. This uses an + SQL object called <em class="firstterm">collation</em> and is explained in + <a class="xref" href="collation.html" title="24.2. Collation Support">Section 24.2</a>. Use this for example to sort data in + different languages or customize the sort order of a particular table. + </p></li><li class="listitem"><p> + Finally, locales can be selected for an individual query. Again, this + uses SQL collation objects. This could be used to change the sort order + based on run-time choices or for ad-hoc experimentation. + </p></li></ol></div></div><div class="sect2" id="LOCALE-PROVIDERS"><div class="titlepage"><div><div><h3 class="title">24.1.4. Locale Providers <a href="#LOCALE-PROVIDERS" class="id_link">#</a></h3></div></div></div><p> + <span class="productname">PostgreSQL</span> supports multiple <em class="firstterm">locale + providers</em>. This specifies which library supplies the locale + data. 
One standard provider name is <code class="literal">libc</code>, which uses + the locales provided by the operating system C library. These are the + locales used by most tools provided by the operating system. Another + provider is <code class="literal">icu</code>, which uses the external + ICU<a id="id-1.6.11.3.7.2.5" class="indexterm"></a> library. ICU locales can + only be used if support for ICU was configured when PostgreSQL was built. + </p><p> + The commands and tools that select the locale settings, as described + above, each have an option to select the locale provider. The examples + shown earlier all use the <code class="literal">libc</code> provider, which is the + default. Here is an example to initialize a database cluster using the + ICU provider: +</p><pre class="programlisting"> +initdb --locale-provider=icu --icu-locale=en +</pre><p> + See the description of the respective commands and programs for + details. Note that you can mix locale providers at different + granularities, for example use <code class="literal">libc</code> by default for the + cluster but have one database that uses the <code class="literal">icu</code> + provider, and then have collation objects using either provider within + those databases. + </p><p> + Which locale provider to use depends on individual requirements. For most + basic uses, either provider will give adequate results. For the libc + provider, it depends on what the operating system offers; some operating + systems are better than others. For advanced uses, ICU offers more locale + variants and customization options. + </p></div><div class="sect2" id="ICU-LOCALES"><div class="titlepage"><div><div><h3 class="title">24.1.5. ICU Locales <a href="#ICU-LOCALES" class="id_link">#</a></h3></div></div></div><div class="sect3" id="ICU-LOCALE-NAMES"><div class="titlepage"><div><div><h4 class="title">24.1.5.1. 
ICU Locale Names <a href="#ICU-LOCALE-NAMES" class="id_link">#</a></h4></div></div></div><p> + The ICU format for the locale name is a <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="24.1.5.3. Language Tag">Language Tag</a>. + +</p><pre class="programlisting"> +CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP'); +CREATE COLLATION mycollation2 (provider = icu, locale = 'fr'); +</pre><p> + </p></div><div class="sect3" id="ICU-CANONICALIZATION"><div class="titlepage"><div><div><h4 class="title">24.1.5.2. Locale Canonicalization and Validation <a href="#ICU-CANONICALIZATION" class="id_link">#</a></h4></div></div></div><p> + When defining a new ICU collation object or database with ICU as the + provider, the given locale name is transformed ("canonicalized") into a + language tag if not already in that form. For instance: + +</p><pre class="screen"> +CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true'); +NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true" +CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8'); +NOTICE: using standard form "de-DE" for locale "de_DE.utf8" +</pre><p> + + If you see this notice, verify that the <code class="symbol">provider</code> and + <code class="symbol">locale</code> are what you intended. For consistent results + when using the ICU provider, specify the canonical <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="24.1.5.3. Language Tag">language tag</a> instead of relying on the + transformation. + </p><p> + A locale with no language name, or the special language name + <code class="literal">root</code>, is transformed to have the language + <code class="literal">und</code> ("undefined"). + </p><p> + ICU can transform most libc locale names, as well as some other formats, + into language tags for easier transition to ICU. A libc locale name + used with ICU may not behave precisely as it does in libc. 
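+ </p><p>
+ One way to check for such differences, sketched here under assumptions, is to
+ sort the same sample under both providers. The collation names
+ <code class="literal">de_libc</code> and <code class="literal">de_icu</code> are
+ hypothetical, and the example assumes a UTF-8 database, a server built with ICU
+ support, and an operating system that provides a
+ <code class="literal">de_DE.utf8</code> locale. Whether the two orderings differ
+ depends on the C library and ICU versions installed:
+</p><pre class="programlisting">
+-- Hypothetical collation names for a side-by-side comparison.
+CREATE COLLATION de_libc (provider = libc, locale = 'de_DE.utf8');
+CREATE COLLATION de_icu (provider = icu, locale = 'de-DE');
+
+SELECT w FROM (VALUES ('Zebra'), ('Äpfel'), ('apfel')) AS t(w)
+ORDER BY w COLLATE de_libc;
+
+SELECT w FROM (VALUES ('Zebra'), ('Äpfel'), ('apfel')) AS t(w)
+ORDER BY w COLLATE de_icu;
+</pre><p>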
+ </p><p> + If there is a problem interpreting the locale name, or if the locale name + represents a language or region that ICU does not recognize, you will see + the following warning: + +</p><pre class="screen"> +CREATE COLLATION nonsense (provider = icu, locale = 'nonsense'); +WARNING: ICU locale "nonsense" has unknown language "nonsense" +HINT: To disable ICU locale validation, set parameter icu_validation_level to DISABLED. +CREATE COLLATION +</pre><p> + + <a class="xref" href="runtime-config-client.html#GUC-ICU-VALIDATION-LEVEL">icu_validation_level</a> controls how the message is + reported. Unless set to <code class="literal">ERROR</code>, the collation will + still be created, but the behavior may not be what the user intended. + </p></div><div class="sect3" id="ICU-LANGUAGE-TAG"><div class="titlepage"><div><div><h4 class="title">24.1.5.3. Language Tag <a href="#ICU-LANGUAGE-TAG" class="id_link">#</a></h4></div></div></div><p> + A language tag, defined in BCP 47, is a standardized identifier used to + identify languages, regions, and other information about a locale. + </p><p> + Basic language tags are simply + <em class="replaceable"><code>language</code></em><code class="literal">-</code><em class="replaceable"><code>region</code></em>; + or even just <em class="replaceable"><code>language</code></em>. The + <em class="replaceable"><code>language</code></em> is a language code + (e.g. <code class="literal">fr</code> for French), and + <em class="replaceable"><code>region</code></em> is a region code + (e.g. <code class="literal">CA</code> for Canada). Examples: + <code class="literal">ja-JP</code>, <code class="literal">de</code>, or + <code class="literal">fr-CA</code>. + </p><p> + Collation settings may be included in the language tag to customize + collation behavior. 
ICU allows extensive customization, such as + sensitivity (or insensitivity) to accents, case, and punctuation; + treatment of digits within text; and many other options to satisfy a + variety of uses. + </p><p> + To include this additional collation information in a language tag, + append <code class="literal">-u</code>, which indicates there are additional + collation settings, followed by one or more + <code class="literal">-</code><em class="replaceable"><code>key</code></em><code class="literal">-</code><em class="replaceable"><code>value</code></em> + pairs. The <em class="replaceable"><code>key</code></em> is the key for a <a class="link" href="collation.html#ICU-COLLATION-SETTINGS" title="24.2.3.2. Collation Settings for an ICU Locale">collation setting</a> and + <em class="replaceable"><code>value</code></em> is a valid value for that setting. For + boolean settings, the <code class="literal">-</code><em class="replaceable"><code>key</code></em> + may be specified without a corresponding + <code class="literal">-</code><em class="replaceable"><code>value</code></em>, which implies a + value of <code class="literal">true</code>. + </p><p> + For example, the language tag <code class="literal">en-US-u-kn-ks-level2</code> + means the locale with the English language in the US region, with + collation settings <code class="literal">kn</code> set to <code class="literal">true</code> + and <code class="literal">ks</code> set to <code class="literal">level2</code>. 
Those + settings mean the collation will be case-insensitive and treat a sequence + of digits as a single number: + +</p><pre class="screen"> +CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2'); +SELECT 'aB' = 'Ab' COLLATE mycollation5 as result; + result +-------- + t +(1 row) + +SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result; + result +-------- + t +(1 row) +</pre><p> + </p><p> + See <a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="24.2.3. ICU Custom Collations">Section 24.2.3</a> for details and additional + examples of using language tags with custom collation information for the + locale. + </p></div></div><div class="sect2" id="LOCALE-PROBLEMS"><div class="titlepage"><div><div><h3 class="title">24.1.6. Problems <a href="#LOCALE-PROBLEMS" class="id_link">#</a></h3></div></div></div><p> + If locale support doesn't work according to the explanation above, + check that the locale support in your operating system is + correctly configured. To check what locales are installed on your + system, you can use the command <code class="literal">locale -a</code> if + your operating system provides it. + </p><p> + Check that <span class="productname">PostgreSQL</span> is actually using the locale + that you think it is. The <code class="envar">LC_COLLATE</code> and <code class="envar">LC_CTYPE</code> + settings are determined when a database is created, and cannot be + changed except by creating a new database. Other locale + settings, including <code class="envar">LC_MESSAGES</code> and <code class="envar">LC_MONETARY</code>, + are initially determined by the environment the server is started + in, but can be changed on the fly. You can check the active locale + settings using the <code class="command">SHOW</code> command. 
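+ </p><p>
+ For instance, the following sketch inspects the run-time categories with
+ <code class="command">SHOW</code> and, as one way to see the values fixed at
+ database creation, reads them from the <code class="literal">pg_database</code>
+ catalog for the current database:
+</p><pre class="programlisting">
+-- Categories that can be changed at run time.
+SHOW lc_messages;
+SHOW lc_monetary;
+SHOW lc_numeric;
+SHOW lc_time;
+
+-- Categories fixed when the current database was created.
+SELECT datname, datcollate, datctype
+FROM pg_database
+WHERE datname = current_database();
+</pre><p>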
+ </p><p> + The directory <code class="filename">src/test/locale</code> in the source + distribution contains a test suite for + <span class="productname">PostgreSQL</span>'s locale support. + </p><p> + Client applications that handle server-side errors by parsing the + text of the error message will have problems when the + server's messages are in a different language. Authors of such + applications are advised to make use of the error code scheme + instead. + </p><p> + Maintaining catalogs of message translations requires the ongoing + efforts of many volunteers who want to see + <span class="productname">PostgreSQL</span> speak their preferred language well. + If messages in your language are currently not available or not fully + translated, your assistance would be appreciated. If you want to + help, refer to <a class="xref" href="nls.html" title="Chapter 57. Native Language Support">Chapter 57</a> or write to the developers' + mailing list. + </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="charset.html" title="Chapter 24. Localization">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html" title="Chapter 24. Localization">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="collation.html" title="24.2. Collation Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 24. Localization </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 16.2 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 24.2. Collation Support</td></tr></table></div></body></html>
\ No newline at end of file