Diffstat (limited to 'doc/src/sgml/nls.sgml')
-rw-r--r-- doc/src/sgml/nls.sgml | 532
1 files changed, 532 insertions, 0 deletions
diff --git a/doc/src/sgml/nls.sgml b/doc/src/sgml/nls.sgml
new file mode 100644
index 0000000..d49f44f
--- /dev/null
+++ b/doc/src/sgml/nls.sgml
@@ -0,0 +1,532 @@
+<!-- doc/src/sgml/nls.sgml -->
+
+<chapter id="nls">
+ <title>Native Language Support</title>
+
+ <sect1 id="nls-translator">
+ <title>For the Translator</title>
+
+ <para>
+ <productname>PostgreSQL</productname>
+ programs (server and client) can issue their messages in
+ your favorite language &mdash; if the messages have been translated.
+ Creating and maintaining translated message sets needs the help of
+ people who speak their own language well and want to contribute to
+ the <productname>PostgreSQL</productname> effort. You do not have to be a
+ programmer at all
+ to do this. This section explains how to help.
+ </para>
+
+ <sect2>
+ <title>Requirements</title>
+
+ <para>
+ We won't judge your language skills &mdash; this section is about
+ software tools. Theoretically, you only need a text editor. But
+ this is only in the unlikely event that you do not want to try out
+ your translated messages. When you configure your source tree, be
+ sure to use the <option>--enable-nls</option> option. This will
+ also check for the <application>libintl</application> library and the
+ <filename>msgfmt</filename> program, which all end users will need
+ anyway. To try out your work, follow the applicable portions of
+ the installation instructions.
+ </para>
+
+ <para>
+ If you want to start a new translation effort or want to do a
+ message catalog merge (described later), you will need the
+ programs <filename>xgettext</filename> and
+ <filename>msgmerge</filename>, respectively, in a GNU-compatible
+ implementation. Later, we will try to arrange it so that if you
+ use a packaged source distribution, you won't need
+ <filename>xgettext</filename>. (If working from Git, you will still need
+ it.) <application>GNU Gettext 0.10.36</application> or later is currently recommended.
+ </para>
+
+ <para>
+ Your local gettext implementation should come with its own
+ documentation. Some of that is probably duplicated in what
+ follows, but for additional details you should look there.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Concepts</title>
+
+ <para>
+ The pairs of original (English) messages and their (possibly)
+ translated equivalents are kept in <firstterm>message
+ catalogs</firstterm>, one for each program (although related
+ programs can share a message catalog) and for each target
+ language. There are two file formats for message catalogs: The
+ first is the <quote>PO</quote> file (for Portable Object), which
+ is a plain text file with special syntax that translators edit.
+ The second is the <quote>MO</quote> file (for Machine Object),
+ which is a binary file generated from the respective PO file and
+ is used while the internationalized program is run. Translators
+ do not deal with MO files; in fact hardly anyone does.
+ </para>
+
+ <para>
+ The extension of the message catalog file is, unsurprisingly, either
+ <filename>.po</filename> or <filename>.mo</filename>. The base
+ name is either the name of the program it accompanies, or the
+ language the file is for, depending on the situation. This is a
+ bit confusing. Examples are <filename>psql.po</filename> (PO file
+ for psql) or <filename>fr.mo</filename> (MO file in French).
+ </para>
+
+ <para>
+ The file format of the PO files is illustrated here:
+<programlisting>
+# comment
+
+msgid "original string"
+msgstr "translated string"
+
+msgid "more original"
+msgstr "another translated"
+"string can be broken up like this"
+
+...
+</programlisting>
+ The msgid lines are extracted from the program source. (They need not
+ be, but this is the most common way.) The msgstr lines are
+ initially empty and are filled in with useful strings by the
+ translator. The strings can contain C-style escape characters and
+ can be continued across lines as illustrated. (Each continuation
+ line must start at the beginning of the line.)
+ </para>
+
+ <para>
+ The # character introduces a comment. If whitespace immediately
+ follows the # character, then this is a comment maintained by the
+ translator. There can also be automatic comments, which have a
+ non-whitespace character immediately following the #. These are
+ maintained by the various tools that operate on the PO files and
+ are intended to aid the translator.
+<programlisting>
+#. automatic comment
+#: filename.c:1023
+#, flags, flags
+</programlisting>
+ The #. style comments are extracted from the source file where the
+ message is used. Possibly the programmer has inserted information
+ for the translator, such as about expected alignment. The #:
+ comments indicate the exact locations where the message is used
+ in the source. The translator need not look at the program
+ source, but can if there is doubt about the correct
+ translation. The #, comments contain flags that describe the
+ message in some way. There are currently two flags:
+ <literal>fuzzy</literal> is set if the message might have become
+ outdated because of changes in the program source. The translator
+ can then verify this and possibly remove the fuzzy flag. Note
+ that fuzzy messages are not made available to the end user. The
+ other flag is <literal>c-format</literal>, which indicates that
+ the message is a <function>printf</function>-style format
+ template. This means that the translation should also be a format
+ string with the same number and type of placeholders. There are
+ tools that can verify this, which key off the c-format flag.
+ </para>
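+ <para>
+ As a rough illustration of what such a check involves, the following
+ C sketch counts the conversions in a <literal>msgid</literal> and its
+ <literal>msgstr</literal> and compares the totals per conversion
+ character. The function names <function>count_conversions</function>
+ and <function>placeholders_match</function> are hypothetical and do
+ not come from any gettext tool. The sketch deliberately ignores
+ length modifiers and argument positions, so it is much weaker than a
+ real checker.
+ </para>

```c
#include <stdbool.h>
#include <string.h>

/*
 * Count the conversions in a printf-style format, indexed by conversion
 * character (for example 's', 'd', 'u').  "%%" is a literal percent sign
 * and is skipped.  Flags, width, precision, "n$" positions, and length
 * modifiers are stepped over and not compared (a simplification).
 */
static void
count_conversions(const char *fmt, int counts[256])
{
    memset(counts, 0, 256 * sizeof(int));

    for (const char *p = fmt; *p != '\0'; p++)
    {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;           /* literal percent sign */

        /* step over everything between '%' and the conversion character */
        while (*p != '\0' && strchr("0123456789$-+ #'*.hlqjztL", *p) != NULL)
            p++;
        if (*p == '\0')
            break;              /* truncated specifier at end of string */

        counts[(unsigned char) *p]++;
    }
}

/* Hypothetical check: do both strings use the same conversions? */
bool
placeholders_match(const char *msgid, const char *msgstr)
{
    int         in_id[256];
    int         in_str[256];

    count_conversions(msgid, in_id);
    count_conversions(msgstr, in_str);
    return memcmp(in_id, in_str, sizeof(in_id)) == 0;
}
```

+ <para>
+ Because only totals are compared, a translation that reorders its
+ placeholders with <literal><replaceable>digits</replaceable>$</literal>
+ still passes, while one that drops or changes a conversion does not.
+ </para>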
+ </sect2>
+
+ <sect2>
+ <title>Creating and Maintaining Message Catalogs</title>
+
+ <para>
+ OK, so how does one create a <quote>blank</quote> message
+ catalog? First, go into the directory that contains the program
+ whose messages you want to translate. If there is a file
+ <filename>nls.mk</filename>, then this program has been prepared
+ for translation.
+ </para>
+
+ <para>
+ If there are already some <filename>.po</filename> files, then
+ someone has already done some translation work. The files are
+ named <filename><replaceable>language</replaceable>.po</filename>,
+ where <replaceable>language</replaceable> is the
+ <ulink url="https://www.loc.gov/standards/iso639-2/php/English_list.php">
+ ISO 639-1 two-letter language code (in lower case)</ulink>, e.g.,
+ <filename>fr.po</filename> for French. If there is really a need
+ for more than one translation effort per language then the files
+ can also be named
+ <filename><replaceable>language</replaceable>_<replaceable>region</replaceable>.po</filename>
+ where <replaceable>region</replaceable> is the
+ <ulink url="https://www.iso.org/iso-3166-country-codes.html">
+ ISO 3166-1 two-letter country code (in upper case)</ulink>,
+ e.g.,
+ <filename>pt_BR.po</filename> for Portuguese in Brazil. If you
+ find the language you wanted you can just start working on that
+ file.
+ </para>
+
+ <para>
+ If you need to start a new translation effort, then first run the
+ command:
+<programlisting>
+make init-po
+</programlisting>
+ This will create a file
+ <filename><replaceable>progname</replaceable>.pot</filename>.
+ (<filename>.pot</filename> to distinguish it from PO files that
+ are <quote>in production</quote>. The <literal>T</literal> stands for
+ <quote>template</quote>.)
+ Copy this file to
+ <filename><replaceable>language</replaceable>.po</filename> and
+ edit it. To make it known that the new language is available,
+ also edit the file <filename>nls.mk</filename> and add the
+ language (or language and country) code to the line that looks like:
+<programlisting>
+AVAIL_LANGUAGES := de fr
+</programlisting>
+ (Other languages can appear, of course.)
+ </para>
+
+ <para>
+ As the underlying program or library changes, messages might be
+ changed or added by the programmers. In this case you do not need
+ to start from scratch. Instead, run the command:
+<programlisting>
+make update-po
+</programlisting>
+ which will create a new blank message catalog file (the pot file
+ you started with) and will merge it with the existing PO files.
+ If the merge algorithm is not sure about a particular message it
+ marks it <quote>fuzzy</quote> as explained above. The new PO file
+ is saved with a <filename>.po.new</filename> extension.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Editing the PO Files</title>
+
+ <para>
+ The PO files can be edited with a regular text editor. The
+ translator should only change the area between the quotes after
+ the msgstr directive, add comments, and alter the fuzzy flag.
+ There is (unsurprisingly) a PO mode for Emacs, which I find quite
+ useful.
+ </para>
+
+ <para>
+ The PO files need not be completely filled in. The software will
+ automatically fall back to the original string if no translation
+ (or an empty translation) is available. It is no problem to
+ submit incomplete translations for inclusion in the source tree;
+ that gives room for other people to pick up your work. However,
+ you are encouraged to give priority to removing fuzzy entries
+ after doing a merge. Remember that fuzzy entries will not be
+ installed; they only serve as reference for what might be the right
+ translation.
+ </para>
+
+ <para>
+ Here are some things to keep in mind while editing the
+ translations:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Make sure that if the original ends with a newline, the
+ translation does, too. Similarly for tabs, etc.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If the original is a <function>printf</function> format string, the translation
+ also needs to be. The translation also needs to have the same
+ format specifiers in the same order. Sometimes the natural
+ rules of the language make this impossible or at least awkward.
+ In that case you can modify the format specifiers like this:
+<programlisting>
+msgstr "Die Datei %2$s hat %1$u Zeichen."
+</programlisting>
+ Then the first placeholder will actually use the second
+ argument from the list. The
+ <literal><replaceable>digits</replaceable>$</literal> needs to
+ follow the % immediately, before any other format manipulators.
+ (This feature really exists in the <function>printf</function>
+ family of functions. You might not have heard of it before because
+ there is little use for it outside of message
+ internationalization.)
+ </para>
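+ <para>
+ To see the effect from the program's side, here is a small C sketch.
+ It assumes a hypothetical original message
+ <literal>%u characters in file %s</literal>, whose arguments are
+ passed as the count first and the file name second, and a
+ hypothetical helper named <function>format_file_size_de</function>.
+ Numbered arguments are a POSIX feature of
+ <function>printf</function>, so this relies on a POSIX-conforming
+ C library.
+ </para>

```c
#include <stdio.h>

/*
 * Hypothetical helper: formats the German translation from the text
 * above.  The arguments are passed in the order of the (assumed)
 * English original, count first and file name second; the "n$"
 * positions in the translated format pick them out in a different order.
 */
int
format_file_size_de(char *buf, size_t size,
                    unsigned int nchars, const char *name)
{
    return snprintf(buf, size,
                    "Die Datei %2$s hat %1$u Zeichen.",
                    nchars, name);
}
```

+ <para>
+ The caller's argument list is untouched; only the translated format
+ string changes.
+ </para>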
+ </listitem>
+
+ <listitem>
+ <para>
+ If the original string contains a linguistic mistake, report
+ that (or fix it yourself in the program source) and translate
+ normally. The corrected string can be merged in when the
+ program sources have been updated. If the original string
+ contains a factual mistake, report that (or fix it yourself)
+ and do not translate it. Instead, you can mark the string with
+ a comment in the PO file.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Maintain the style and tone of the original string.
+ Specifically, messages that are not sentences (<literal>cannot
+ open file %s</literal>) should probably not start with a
+ capital letter (if your language distinguishes letter case) or
+ end with a period (if your language uses punctuation marks).
+ It might help to read <xref linkend="error-style-guide"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If you don't know what a message means, or if it is ambiguous,
+ ask on the developers' mailing list. Chances are that
+ English-speaking end users might also not understand it or find it
+ ambiguous, so it's best to improve the message.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </para>
+ </sect2>
+
+ </sect1>
+
+
+ <sect1 id="nls-programmer">
+ <title>For the Programmer</title>
+
+ <sect2 id="nls-mechanics">
+ <title>Mechanics</title>
+
+ <para>
+ This section describes how to implement native language support in a
+ program or library that is part of the
+ <productname>PostgreSQL</productname> distribution.
+ Currently, it only applies to C programs.
+ </para>
+
+ <procedure>
+ <title>Adding NLS Support to a Program</title>
+
+ <step>
+ <para>
+ Insert this code into the start-up sequence of the program:
+<programlisting>
+#ifdef ENABLE_NLS
+#include &lt;locale.h&gt;
+#endif
+
+...
+
+#ifdef ENABLE_NLS
+setlocale(LC_ALL, "");
+bindtextdomain("<replaceable>progname</replaceable>", LOCALEDIR);
+textdomain("<replaceable>progname</replaceable>");
+#endif
+</programlisting>
+ (The <replaceable>progname</replaceable> can actually be chosen
+ freely.)
+ </para>
+ </step>
+
+ <step>
+ <para>
+ Wherever a message that is a candidate for translation is found,
+ a call to <function>gettext()</function> needs to be inserted. E.g.:
+<programlisting>
+fprintf(stderr, "panic level %d\n", lvl);
+</programlisting>
+ would be changed to:
+<programlisting>
+fprintf(stderr, gettext("panic level %d\n"), lvl);
+</programlisting>
+ (<symbol>gettext</symbol> is defined as a no-op if NLS support is
+ not configured.)
+ </para>
+
+ <para>
+ This tends to add a lot of clutter. One common shortcut is to use:
+<programlisting>
+#define _(x) gettext(x)
+</programlisting>
+ Another solution is feasible if the program does much of its
+ communication through one or a few functions, such as
+ <function>ereport()</function> in the backend. Then you make this
+ function call <function>gettext</function> internally on all
+ input strings.
+ </para>
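+ <para>
+ Both approaches can be sketched as follows. The function
+ <function>report_to_buffer</function> is a hypothetical stand-in for
+ a program's central reporting function, not an existing
+ <productname>PostgreSQL</productname> API, and the fallback
+ definition of <function>gettext</function> only mimics the no-op
+ behavior described above for builds without NLS.
+ </para>

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef ENABLE_NLS
#include <libintl.h>
#else
#define gettext(x) (x)          /* no-op when NLS is not configured */
#endif

/* The common shortcut. */
#define _(x) gettext(x)

/*
 * Hypothetical central reporting function: the format string is
 * translated here, once, so callers pass plain English strings.
 */
int
report_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list     ap;
    int         len;

    va_start(ap, fmt);
    len = vsnprintf(buf, size, _(fmt), ap);
    va_end(ap);
    return len;
}
```

+ <para>
+ With this design the callers contain no
+ <function>gettext()</function> calls, so the message extraction tools
+ must be told that the wrapper's format argument is translatable (see
+ <varname>GETTEXT_TRIGGERS</varname> in the next step).
+ </para>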
+ </step>
+
+ <step>
+ <para>
+ Add a file <filename>nls.mk</filename> in the directory with the
+ program sources. This file will be read as a makefile. The
+ following variable assignments need to be made here:
+
+ <variablelist>
+ <varlistentry>
+ <term><varname>CATALOG_NAME</varname></term>
+
+ <listitem>
+ <para>
+ The program name, as provided in the
+ <function>textdomain()</function> call.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>AVAIL_LANGUAGES</varname></term>
+
+ <listitem>
+ <para>
+ List of provided translations &mdash; initially empty.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>GETTEXT_FILES</varname></term>
+
+ <listitem>
+ <para>
+ List of files that contain translatable strings, i.e., those
+ marked with <function>gettext</function> or an alternative
+ solution. Eventually, this will include nearly all source
+ files of the program. If this list gets too long you can
+ make the first <quote>file</quote> be a <literal>+</literal>
+ and the second word be a file that contains one file name per
+ line.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>GETTEXT_TRIGGERS</varname></term>
+
+ <listitem>
+ <para>
+ The tools that generate message catalogs for the translators
+ to work on need to know what function calls contain
+ translatable strings. By default, only
+ <function>gettext()</function> calls are known. If you used
+ <function>_</function> or other identifiers you need to list
+ them here. If the translatable string is not the first
+ argument, the item needs to be of the form
+ <literal>func:2</literal> (for the second argument).
+ If you have a function that supports pluralized messages,
+ the item should look like <literal>func:1,2</literal>
+ (identifying the singular and plural message arguments).
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </step>
+
+ </procedure>
+
+ <para>
+ The build system will automatically take care of building and
+ installing the message catalogs.
+ </para>
+ </sect2>
+
+ <sect2 id="nls-guidelines">
+ <title>Message-Writing Guidelines</title>
+
+ <para>
+ Here are some guidelines for writing messages that are easily
+ translatable.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Do not construct sentences at run-time, like:
+<programlisting>
+printf("Files were %s.\n", flag ? "copied" : "removed");
+</programlisting>
+ The word order within the sentence might be different in other
+ languages. Also, even if you remember to call <function>gettext()</function> on
+ each fragment, the fragments might not translate well separately. It's
+ better to duplicate a little code so that each message to be
+ translated is a coherent whole. Only numbers, file names, and
+ such-like run-time variables should be inserted at run time into
+ a message text.
+ </para>
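+ <para>
+ A minimal sketch of the recommended form, using a hypothetical
+ function <function>files_result_message</function> and the
+ <function>_</function> shortcut, with a no-op fallback assumed for
+ builds without NLS:
+ </para>

```c
#include <stdbool.h>

#ifdef ENABLE_NLS
#include <libintl.h>
#define _(x) gettext(x)
#else
#define _(x) (x)                /* no-op when NLS is not configured */
#endif

/*
 * Hypothetical helper: each branch returns one complete sentence, so
 * every message can be translated as a coherent whole.
 */
const char *
files_result_message(bool copied)
{
    if (copied)
        return _("Files were copied.\n");
    else
        return _("Files were removed.\n");
}
```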
+ </listitem>
+
+ <listitem>
+ <para>
+ For similar reasons, this won't work:
+<programlisting>
+printf("copied %d file%s", n, n!=1 ? "s" : "");
+</programlisting>
+ because it assumes how the plural is formed. If you figured you
+ could solve it like this:
+<programlisting>
+if (n==1)
+ printf("copied 1 file");
+else
+ printf("copied %d files", n);
+</programlisting>
+ then be disappointed. Some languages have more than two forms,
+ with some peculiar rules. It's often best to design the message
+ to avoid the issue altogether, for instance like this:
+<programlisting>
+printf("number of copied files: %d", n);
+</programlisting>
+ </para>
+
+ <para>
+ If you really want to construct a properly pluralized message,
+ there is support for this, but it's a bit awkward. When generating
+ a primary or detail error message in <function>ereport()</function>, you can
+ write something like this:
+<programlisting>
+errmsg_plural("copied %d file",
+ "copied %d files",
+ n,
+ n)
+</programlisting>
+ The first argument is the format string appropriate for English
+ singular form, the second is the format string appropriate for
+ English plural form, and the third is the integer control value
+ that determines which plural form to use. Subsequent arguments
+ are formatted per the format string as usual. (Normally, the
+ pluralization control value will also be one of the values to be
+ formatted, so it has to be written twice.) In English it only
+ matters whether <replaceable>n</replaceable> is 1 or not 1, but in other
+ languages there can be many different plural forms. The translator
+ sees the two English forms as a group and has the opportunity to
+ supply multiple substitute strings, with the appropriate one being
+ selected based on the run-time value of <replaceable>n</replaceable>.
+ </para>
+
+ <para>
+ If you need to pluralize a message that isn't going directly to an
+ <function>errmsg</function> or <function>errdetail</function> report, you have to use
+ the underlying function <function>ngettext</function>. See the gettext
+ documentation.
+ </para>
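+ <para>
+ For orientation, a direct call might look like the following sketch.
+ The helper <function>format_copied</function> is hypothetical, and
+ the fallback macro is only an assumption about how a build without
+ NLS could substitute the English rule.
+ </para>

```c
#include <stdio.h>

#ifdef ENABLE_NLS
#include <libintl.h>
#else
/* Assumed fallback for builds without NLS: English rule only. */
#define ngettext(s, p, n) ((n) == 1 ? (s) : (p))
#endif

/*
 * Hypothetical helper: ngettext() selects the format string for the
 * count n, and n is then passed again as the value to be formatted.
 */
int
format_copied(char *buf, size_t size, unsigned long n)
{
    return snprintf(buf, size,
                    ngettext("copied %lu file", "copied %lu files", n),
                    n);
}
```

+ <para>
+ As with <function>errmsg_plural</function>, the count appears twice:
+ once to choose the plural form and once as a formatted value.
+ </para>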
+ </listitem>
+
+ <listitem>
+ <para>
+ If you want to communicate something to the translator, such as
+ about how a message is intended to line up with other output,
+ precede the occurrence of the string with a comment that starts
+ with <literal>translator</literal>, e.g.:
+<programlisting>
+/* translator: This message is not what it seems to be. */
+</programlisting>
+ These comments are copied to the message catalog files so that
+ the translators can see them.
+ </para>
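+ <para>
+ In context, such a comment sits directly above the statement that
+ contains the translatable string. The sketch below uses a
+ hypothetical helper <function>format_open_error</function> and
+ assumes a no-op <function>_</function> fallback for builds without
+ NLS.
+ </para>

```c
#include <stdio.h>

#ifdef ENABLE_NLS
#include <libintl.h>
#define _(x) gettext(x)
#else
#define _(x) (x)                /* no-op when NLS is not configured */
#endif

/* Hypothetical helper that formats a file-open error message. */
int
format_open_error(char *buf, size_t size, const char *filename)
{
    /* translator: %s is a file name */
    return snprintf(buf, size, _("could not open file \"%s\""), filename);
}
```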
+ </listitem>
+ </itemizedlist>
+ </para>
+ </sect2>
+ </sect1>
+
+</chapter>