Diffstat (limited to 'doc/src/sgml/nls.sgml')
-rw-r--r-- | doc/src/sgml/nls.sgml | 532 |
1 files changed, 532 insertions, 0 deletions
diff --git a/doc/src/sgml/nls.sgml b/doc/src/sgml/nls.sgml new file mode 100644 index 0000000..d49f44f --- /dev/null +++ b/doc/src/sgml/nls.sgml @@ -0,0 +1,532 @@ +<!-- doc/src/sgml/nls.sgml --> + +<chapter id="nls"> + <title>Native Language Support</title> + + <sect1 id="nls-translator"> + <title>For the Translator</title> + + <para> + <productname>PostgreSQL</productname> + programs (server and client) can issue their messages in + your favorite language — if the messages have been translated. + Creating and maintaining translated message sets needs the help of + people who speak their own language well and want to contribute to + the <productname>PostgreSQL</productname> effort. You do not have to be a + programmer at all + to do this. This section explains how to help. + </para> + + <sect2> + <title>Requirements</title> + + <para> + We won't judge your language skills — this section is about + software tools. Theoretically, you only need a text editor. But + this is only in the unlikely event that you do not want to try out + your translated messages. When you configure your source tree, be + sure to use the <option>--enable-nls</option> option. This will + also check for the <application>libintl</application> library and the + <filename>msgfmt</filename> program, which all end users will need + anyway. To try out your work, follow the applicable portions of + the installation instructions. + </para> + + <para> + If you want to start a new translation effort or want to do a + message catalog merge (described later), you will need the + programs <filename>xgettext</filename> and + <filename>msgmerge</filename>, respectively, in a GNU-compatible + implementation. Later, we will try to arrange it so that if you + use a packaged source distribution, you won't need + <filename>xgettext</filename>. (If working from Git, you will still need + it.) <application>GNU Gettext 0.10.36</application> or later is currently recommended. 
+ </para> + + <para> + Your local gettext implementation should come with its own + documentation. Some of that is probably duplicated in what + follows, but for additional details you should look there. + </para> + </sect2> + + <sect2> + <title>Concepts</title> + + <para> + The pairs of original (English) messages and their (possibly) + translated equivalents are kept in <firstterm>message + catalogs</firstterm>, one for each program (although related + programs can share a message catalog) and for each target + language. There are two file formats for message catalogs: The + first is the <quote>PO</quote> file (for Portable Object), which + is a plain text file with special syntax that translators edit. + The second is the <quote>MO</quote> file (for Machine Object), + which is a binary file generated from the respective PO file and + is used while the internationalized program is run. Translators + do not deal with MO files; in fact hardly anyone does. + </para> + + <para> + The extension of the message catalog file is to no surprise either + <filename>.po</filename> or <filename>.mo</filename>. The base + name is either the name of the program it accompanies, or the + language the file is for, depending on the situation. This is a + bit confusing. Examples are <filename>psql.po</filename> (PO file + for psql) or <filename>fr.mo</filename> (MO file in French). + </para> + + <para> + The file format of the PO files is illustrated here: +<programlisting> +# comment + +msgid "original string" +msgstr "translated string" + +msgid "more original" +msgstr "another translated" +"string can be broken up like this" + +... +</programlisting> + The msgid lines are extracted from the program source. (They need not + be, but this is the most common way.) The msgstr lines are + initially empty and are filled in with useful strings by the + translator. The strings can contain C-style escape characters and + can be continued across lines as illustrated. 
(The next line must + start at the beginning of the line.) + </para> + + <para> + The # character introduces a comment. If whitespace immediately + follows the # character, then this is a comment maintained by the + translator. There can also be automatic comments, which have a + non-whitespace character immediately following the #. These are + maintained by the various tools that operate on the PO files and + are intended to aid the translator. +<programlisting> +#. automatic comment +#: filename.c:1023 +#, flags, flags +</programlisting> + The #. style comments are extracted from the source file where the + message is used. Possibly the programmer has inserted information + for the translator, such as about expected alignment. The #: + comments indicate the exact locations where the message is used + in the source. The translator need not look at the program + source, but can if there is doubt about the correct + translation. The #, comments contain flags that describe the + message in some way. There are currently two flags: + <literal>fuzzy</literal> is set if the message has possibly been + outdated because of changes in the program source. The translator + can then verify this and possibly remove the fuzzy flag. Note + that fuzzy messages are not made available to the end user. The + other flag is <literal>c-format</literal>, which indicates that + the message is a <function>printf</function>-style format + template. This means that the translation should also be a format + string with the same number and type of placeholders. There are + tools that can verify this, which key off the c-format flag. + </para> + </sect2> + + <sect2> + <title>Creating and Maintaining Message Catalogs</title> + + <para> + OK, so how does one create a <quote>blank</quote> message + catalog? First, go into the directory that contains the program + whose messages you want to translate. 
If there is a file + <filename>nls.mk</filename>, then this program has been prepared + for translation. + </para> + + <para> + If there are already some <filename>.po</filename> files, then + someone has already done some translation work. The files are + named <filename><replaceable>language</replaceable>.po</filename>, + where <replaceable>language</replaceable> is the + <ulink url="https://www.loc.gov/standards/iso639-2/php/English_list.php"> + ISO 639-1 two-letter language code (in lower case)</ulink>, e.g., + <filename>fr.po</filename> for French. If there is really a need + for more than one translation effort per language then the files + can also be named + <filename><replaceable>language</replaceable>_<replaceable>region</replaceable>.po</filename> + where <replaceable>region</replaceable> is the + <ulink url="https://www.iso.org/iso-3166-country-codes.html"> + ISO 3166-1 two-letter country code (in upper case)</ulink>, + e.g., + <filename>pt_BR.po</filename> for Portuguese in Brazil. If you + find the language you wanted you can just start working on that + file. + </para> + + <para> + If you need to start a new translation effort, then first run the + command: +<programlisting> +make init-po +</programlisting> + This will create a file + <filename><replaceable>progname</replaceable>.pot</filename>. + (<filename>.pot</filename> to distinguish it from PO files that + are <quote>in production</quote>. The <literal>T</literal> stands for + <quote>template</quote>.) + Copy this file to + <filename><replaceable>language</replaceable>.po</filename> and + edit it. To make it known that the new language is available, + also edit the file <filename>nls.mk</filename> and add the + language (or language and country) code to the line that looks like: +<programlisting> +AVAIL_LANGUAGES := de fr +</programlisting> + (Other languages can appear, of course.) 
+ </para> + + <para> + As the underlying program or library changes, messages might be + changed or added by the programmers. In this case you do not need + to start from scratch. Instead, run the command: +<programlisting> +make update-po +</programlisting> + which will create a new blank message catalog file (the pot file + you started with) and will merge it with the existing PO files. + If the merge algorithm is not sure about a particular message it + marks it <quote>fuzzy</quote> as explained above. The new PO file + is saved with a <filename>.po.new</filename> extension. + </para> + </sect2> + + <sect2> + <title>Editing the PO Files</title> + + <para> + The PO files can be edited with a regular text editor. The + translator should only change the area between the quotes after + the msgstr directive, add comments, and alter the fuzzy flag. + There is (unsurprisingly) a PO mode for Emacs, which can be quite + useful. + </para> + + <para> + The PO files need not be completely filled in. The software will + automatically fall back to the original string if no translation + (or an empty translation) is available. It is no problem to + submit incomplete translations for inclusion in the source tree; + that gives room for other people to pick up your work. However, + you are encouraged to give priority to removing fuzzy entries + after doing a merge. Remember that fuzzy entries will not be + installed; they only serve as reference for what might be the right + translation. + </para> + + <para> + Here are some things to keep in mind while editing the + translations: + <itemizedlist> + <listitem> + <para> + Make sure that if the original ends with a newline, the + translation does, too. Similarly for tabs, etc. + </para> + </listitem> + + <listitem> + <para> + If the original is a <function>printf</function> format string, the translation + also needs to be. The translation also needs to have the same + format specifiers in the same order. 
Sometimes the natural + rules of the language make this impossible or at least awkward. + In that case you can modify the format specifiers like this: +<programlisting> +msgstr "Die Datei %2$s hat %1$u Zeichen." +</programlisting> + Then the first placeholder will actually use the second + argument from the list. The + <literal><replaceable>digits</replaceable>$</literal> needs to + follow the % immediately, before any other format manipulators. + (This feature really exists in the <function>printf</function> + family of functions. You might not have heard of it before because + there is little use for it outside of message + internationalization.) + </para> + </listitem> + + <listitem> + <para> + If the original string contains a linguistic mistake, report + that (or fix it yourself in the program source) and translate + normally. The corrected string can be merged in when the + program sources have been updated. If the original string + contains a factual mistake, report that (or fix it yourself) + and do not translate it. Instead, you can mark the string with + a comment in the PO file. + </para> + </listitem> + + <listitem> + <para> + Maintain the style and tone of the original string. + Specifically, messages that are not sentences (<literal>cannot + open file %s</literal>) should probably not start with a + capital letter (if your language distinguishes letter case) or + end with a period (if your language uses punctuation marks). + It might help to read <xref linkend="error-style-guide"/>. + </para> + </listitem> + + <listitem> + <para> + If you don't know what a message means, or if it is ambiguous, + ask on the developers' mailing list. Chances are that English + speaking end users might also not understand it or find it + ambiguous, so it's best to improve the message. 
+ </para> + </listitem> + + </itemizedlist> + </para> + </sect2> + + </sect1> + + + <sect1 id="nls-programmer"> + <title>For the Programmer</title> + + <sect2 id="nls-mechanics"> + <title>Mechanics</title> + + <para> + This section describes how to implement native language support in a + program or library that is part of the + <productname>PostgreSQL</productname> distribution. + Currently, it only applies to C programs. + </para> + + <procedure> + <title>Adding NLS Support to a Program</title> + + <step> + <para> + Insert this code into the start-up sequence of the program: +<programlisting> +#ifdef ENABLE_NLS +#include &lt;locale.h&gt; +#endif + +... + +#ifdef ENABLE_NLS +setlocale(LC_ALL, ""); +bindtextdomain("<replaceable>progname</replaceable>", LOCALEDIR); +textdomain("<replaceable>progname</replaceable>"); +#endif +</programlisting> + (The <replaceable>progname</replaceable> can actually be chosen + freely.) + </para> + </step> + + <step> + <para> + Wherever a message that is a candidate for translation is found, + a call to <function>gettext()</function> needs to be inserted. E.g.: +<programlisting> +fprintf(stderr, "panic level %d\n", lvl); +</programlisting> + would be changed to: +<programlisting> +fprintf(stderr, gettext("panic level %d\n"), lvl); +</programlisting> + (<symbol>gettext</symbol> is defined as a no-op if NLS support is + not configured.) + </para> + + <para> + This tends to add a lot of clutter. One common shortcut is to use: +<programlisting> +#define _(x) gettext(x) +</programlisting> + Another solution is feasible if the program does much of its + communication through one or a few functions, such as + <function>ereport()</function> in the backend. Then you make this + function call <function>gettext</function> internally on all + input strings. + </para> + </step> + + <step> + <para> + Add a file <filename>nls.mk</filename> in the directory with the + program sources. This file will be read as a makefile. 
The + following variable assignments need to be made here: + + <variablelist> + <varlistentry> + <term><varname>CATALOG_NAME</varname></term> + + <listitem> + <para> + The program name, as provided in the + <function>textdomain()</function> call. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><varname>AVAIL_LANGUAGES</varname></term> + + <listitem> + <para> + List of provided translations — initially empty. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><varname>GETTEXT_FILES</varname></term> + + <listitem> + <para> + List of files that contain translatable strings, i.e., those + marked with <function>gettext</function> or an alternative + solution. Eventually, this will include nearly all source + files of the program. If this list gets too long you can + make the first <quote>file</quote> be a <literal>+</literal> + and the second word be a file that contains one file name per + line. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><varname>GETTEXT_TRIGGERS</varname></term> + + <listitem> + <para> + The tools that generate message catalogs for the translators + to work on need to know what function calls contain + translatable strings. By default, only + <function>gettext()</function> calls are known. If you used + <function>_</function> or other identifiers you need to list + them here. If the translatable string is not the first + argument, the item needs to be of the form + <literal>func:2</literal> (for the second argument). + If you have a function that supports pluralized messages, + the item should look like <literal>func:1,2</literal> + (identifying the singular and plural message arguments). + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + </step> + + </procedure> + + <para> + The build system will automatically take care of building and + installing the message catalogs. 
+ </para> + </sect2> + + <sect2 id="nls-guidelines"> + <title>Message-Writing Guidelines</title> + + <para> + Here are some guidelines for writing messages that are easily + translatable. + + <itemizedlist> + <listitem> + <para> + Do not construct sentences at run time, like: +<programlisting> +printf("Files were %s.\n", flag ? "copied" : "removed"); +</programlisting> + The word order within the sentence might be different in other + languages. Also, even if you remember to call <function>gettext()</function> on + each fragment, the fragments might not translate well separately. It's + better to duplicate a little code so that each message to be + translated is a coherent whole. Only numbers, file names, and + such-like run-time variables should be inserted at run time into + a message text. + </para> + </listitem> + + <listitem> + <para> + For similar reasons, this won't work: +<programlisting> +printf("copied %d file%s", n, n!=1 ? "s" : ""); +</programlisting> + because it assumes how the plural is formed. If you figured you + could solve it like this: +<programlisting> +if (n==1) + printf("copied 1 file"); +else + printf("copied %d files", n); +</programlisting> + then be disappointed. Some languages have more than two forms, + with some peculiar rules. It's often best to design the message + to avoid the issue altogether, for instance like this: +<programlisting> +printf("number of copied files: %d", n); +</programlisting> + </para> + + <para> + If you really want to construct a properly pluralized message, + there is support for this, but it's a bit awkward. 
When generating + a primary or detail error message in <function>ereport()</function>, you can + write something like this: +<programlisting> +errmsg_plural("copied %d file", + "copied %d files", + n, + n) +</programlisting> + The first argument is the format string appropriate for English + singular form, the second is the format string appropriate for + English plural form, and the third is the integer control value + that determines which plural form to use. Subsequent arguments + are formatted per the format string as usual. (Normally, the + pluralization control value will also be one of the values to be + formatted, so it has to be written twice.) In English it only + matters whether <replaceable>n</replaceable> is 1 or not 1, but in other + languages there can be many different plural forms. The translator + sees the two English forms as a group and has the opportunity to + supply multiple substitute strings, with the appropriate one being + selected based on the run-time value of <replaceable>n</replaceable>. + </para> + + <para> + If you need to pluralize a message that isn't going directly to an + <function>errmsg</function> or <function>errdetail</function> report, you have to use + the underlying function <function>ngettext</function>. See the gettext + documentation. + </para> + </listitem> + + <listitem> + <para> + If you want to communicate something to the translator, such as + how a message is intended to line up with other output, + precede the occurrence of the string with a comment that starts + with <literal>translator</literal>, e.g.: +<programlisting> +/* translator: This message is not what it seems to be. */ +</programlisting> + These comments are copied to the message catalog files so that + the translators can see them. + </para> + </listitem> + </itemizedlist> + </para> + </sect2> + </sect1> + +</chapter>