Diffstat (limited to 'upstream/archlinux/man3/Locale::Maketext::TPJ13.3perl')
-rw-r--r-- upstream/archlinux/man3/Locale::Maketext::TPJ13.3perl 855
1 files changed, 855 insertions, 0 deletions
diff --git a/upstream/archlinux/man3/Locale::Maketext::TPJ13.3perl b/upstream/archlinux/man3/Locale::Maketext::TPJ13.3perl
new file mode 100644
index 00000000..751b3478
--- /dev/null
+++ b/upstream/archlinux/man3/Locale::Maketext::TPJ13.3perl
@@ -0,0 +1,855 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Locale::Maketext::TPJ13 3perl"
+.TH Locale::Maketext::TPJ13 3perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Locale::Maketext::TPJ13 \-\- article about software localization
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 1
+\& # This is an article, not a module.
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+The following article by Sean M. Burke and Jordan Lachler
+first appeared in \fIThe Perl Journal\fR #13
+and is copyright 1999 The Perl Journal. It appears
+courtesy of Jon Orwant and The Perl Journal. This document may be
+distributed under the same terms as Perl itself.
+.SH "Localization and Perl: gettext breaks, Maketext fixes"
+.IX Header "Localization and Perl: gettext breaks, Maketext fixes"
+by Sean M. Burke and Jordan Lachler
+.PP
+This article points out cases where gettext (a common system for
+localizing software interfaces \-\- i.e., making them work in the user's
+language of choice) fails because of basic differences between human
+languages. This article then describes Maketext, a new system capable
+of correctly treating these differences.
+.SS "A Localization Horror Story: It Could Happen To You"
+.IX Subsection "A Localization Horror Story: It Could Happen To You"
+.RS 4
+"There are a number of languages spoken by human beings in this
+world."
+.Sp
+\&\-\- Harald Tveit Alvestrand, in RFC 1766, "Tags for the
+Identification of Languages"
+.RE
+.PP
+Imagine that your task for the day is to localize a piece of software
+\&\-\- and luckily for you, the only output the program emits is two
+messages, like this:
+.PP
+.Vb 1
+\& I scanned 12 directories.
+\&
+\& Your query matched 10 files in 4 directories.
+.Ve
+.PP
+So how hard could that be? You look at the code that
+produces the first item, and it reads:
+.PP
+.Vb 2
+\& printf("I scanned %g directories.",
+\& $directory_count);
+.Ve
+.PP
+You think about that, and realize that it doesn't even work right for
+English, as it can produce this output:
+.PP
+.Vb 1
+\& I scanned 1 directories.
+.Ve
+.PP
+So you rewrite it to read:
+.PP
+.Vb 5
+\& printf("I scanned %g %s.",
+\& $directory_count,
+\& $directory_count == 1 ?
+\& "directory" : "directories",
+\& );
+.Ve
+.PP
+\&...which does the Right Thing. (In case you don't recall, "%g" is for
+locale-specific number interpolation, and "%s" is for string
+interpolation.)
+.PP
+But you still have to localize it for all the languages you're
+producing this software for, so you pull Locale::gettext off of CPAN
+so you can access the \f(CW\*(C`gettext\*(C'\fR C functions you've heard are standard
+for localization tasks.
+.PP
+And you write:
+.PP
+.Vb 5
+\& printf(gettext("I scanned %g %s."),
+\& $dir_scan_count,
+\& $dir_scan_count == 1 ?
+\& gettext("directory") : gettext("directories"),
+\& );
+.Ve
+.PP
+But you then read in the gettext manual (Drepper, Miller, and Pinard 1995)
+that this is not a good idea, since how a single word like "directory"
+or "directories" is translated may depend on context \-\- and this is
+true, since in a case language like German or Russian, you may need
+these words with a different case ending in the first instance (where the
+word is the object of a verb) than in the second instance, which you haven't even
+gotten to yet (where the word is the object of a preposition, "in \f(CW%g\fR
+directories") \-\- assuming these keep the same syntax when translated
+into those languages.
+.PP
+So, on the advice of the gettext manual, you rewrite:
+.PP
+.Vb 4
+\& printf( $dir_scan_count == 1 ?
+\& gettext("I scanned %g directory.") :
+\& gettext("I scanned %g directories."),
+\& $dir_scan_count );
+.Ve
+.PP
+So, you email your various translators (the boss decides that the
+languages du jour are Chinese, Arabic, Russian, and Italian, so you
+have one translator for each), asking for translations for "I scanned
+\&\f(CW%g\fR directory." and "I scanned \f(CW%g\fR directories.". When they reply,
+you'll put that in the lexicons for gettext to use when it localizes
+your software, so that when the user is running under the "zh"
+(Chinese) locale, gettext("I scanned \f(CW%g\fR directory.") will return the
+appropriate Chinese text, with a "%g" in there where printf can then
+interpolate \f(CW$dir_scan_count\fR.
+.PP
+Your Chinese translator emails right back \-\- he says both of these
+phrases translate to the same thing in Chinese, because, in linguistic
+jargon, Chinese "doesn't have number as a grammatical category" \-\-
+whereas English does. That is, English has grammatical rules that
+refer to "number", i.e., whether something is grammatically singular
+or plural; and one of these rules is the one that forces nouns to take
+a plural suffix (generally "s") when in a plural context, as they are when
+they follow a number other than "one" (including, oddly enough, "zero").
+Chinese has no such rules, and so has just the one phrase where English
+has two. But, no problem, you can have this one Chinese phrase appear
+as the translation for the two English phrases in the "zh" gettext
+lexicon for your program.
+.PP
+Emboldened by this, you dive into the second phrase that your software
+needs to output: "Your query matched 10 files in 4 directories.". You notice
+that if you want to treat phrases as indivisible, as the gettext
+manual wisely advises, you need four cases now, instead of two, to
+cover the permutations of singular and plural on the two items,
+\&\f(CW$directory_count\fR and \f(CW$file_count\fR. So you try this:
+.PP
+.Vb 9
+\& printf( $file_count == 1 ?
+\& ( $directory_count == 1 ?
+\& gettext("Your query matched %g file in %g directory.") :
+\& gettext("Your query matched %g file in %g directories.") ) :
+\& ( $directory_count == 1 ?
+\& gettext("Your query matched %g files in %g directory.") :
+\& gettext("Your query matched %g files in %g directories.") ),
+\& $file_count, $directory_count,
+\& );
+.Ve
+.PP
+(The case of "1 file in 2 [or more] directories" could, I suppose,
+occur in the case of symlinking or something of the sort.)
+.PP
+It occurs to you that this is not the prettiest code you've ever
+written, but this seems the way to go. You mail off to the
+translators asking for translations for these four cases. The
+Chinese guy replies with the one phrase that these all translate to in
+Chinese, and that phrase has two "%g"s in it, as it should \-\- but
+there's a problem. He translates it word-for-word back: "In \f(CW%g\fR
+directories contains \f(CW%g\fR files match your query." The \f(CW%g\fR
+slots are in an order reverse to what they are in English. You wonder
+how you'll get gettext to handle that.
+.PP
+But you put it aside for the moment, and optimistically hope that the
+other translators won't have this problem, and that their languages
+will be better behaved \-\- i.e., that they will be just like English.
+.PP
+But the Arabic translator is the next to write back. First off, your
+code for "I scanned \f(CW%g\fR directory." or "I scanned \f(CW%g\fR directories."
+assumes there's only singular or plural. But, to use linguistic
+jargon again, Arabic has grammatical number, like English (but unlike
+Chinese), but it's a three-term category: singular, dual, and plural.
+In other words, the way you say "directory" depends on whether there's
+one directory, or \fItwo\fR of them, or \fImore than two\fR of them. Your
+test of \f(CW\*(C`($directory_count == 1)\*(C'\fR no longer does the job. And it means
+that where English's grammatical category of number necessitates
+only the two permutations of the first sentence based on "directory
+[singular]" and "directories [plural]", Arabic has three \-\- and,
+worse, in the second sentence ("Your query matched \f(CW%g\fR file in \f(CW%g\fR
+directory."), where English has four, Arabic has nine. You sense
+an unwelcome, exponential trend taking shape.
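+.PP
+The arithmetic behind that trend is easy to sketch. The Perl below is a
+hypothetical illustration, not part of gettext or Maketext: the names
+\&\f(CW\*(C`arabic_number_category\*(C'\fR and \f(CW\*(C`variants_needed\*(C'\fR are made up,
+and the three-way split follows only the simplified description above
+(real Arabic number agreement has further rules, and a zero count is
+simply lumped in with the plural here). Each number slot in a sentence
+multiplies the count of separate strings a string-based lexicon must hold:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\&
+\& # Hypothetical helper: singular, dual, or plural, per the simplified
+\& # three way category described in the text.
+\& sub arabic_number_category {
+\&     my ($n) = @_;
+\&     return "singular" if $n == 1;
+\&     return "dual"     if $n == 2;
+\&     return "plural";
+\& }
+\&
+\& # Strings needed when every combination is stored separately:
+\& # (number of categories) to the power (number of number slots).
+\& sub variants_needed {
+\&     my ($categories, $slots) = @_;
+\&     return $categories ** $slots;
+\& }
+\&
+\& print variants_needed(2, 2), "\en";   # English, two slots: 4
+\& print variants_needed(3, 2), "\en";   # Arabic, two slots: 9
+.Ve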
+.PP
+Your Italian translator emails you back and says that "I scanned 0
+directories" (a possible English output of your program) is stilted,
+and if you think that's fine English, that's your problem, but that
+\&\fIjust will not do\fR in the language of Dante. He insists that where
+\&\f(CW$directory_count\fR is 0, your program should produce the Italian text
+for "I \fIdidn't\fR scan \fIany\fR directories.". And ditto for "I didn't
+match any files in any directories", although he says the last part
+about "in any directories" should probably just be left off.
+.PP
+You wonder how you'll get gettext to handle this; to accommodate the
+ways Arabic, Chinese, and Italian deal with numbers in just these few
+very simple phrases, you need to write code that will ask gettext for
+different queries depending on whether the numerical values in
+question are 1, 2, more than 2, or in some cases 0, and you still haven't
+figured out the problem with the different word order in Chinese.
+.PP
+Then your Russian translator calls on the phone, to \fIpersonally\fR tell
+you the bad news about how really unpleasant your life is about to
+become:
+.PP
+Russian, like German or Latin, is an inflectional language; that is, nouns
+and adjectives have to take endings that depend on their case
+(i.e., nominative, accusative, genitive, etc.) \-\- which is roughly a matter of
+what role they have in the syntax of the sentence \-\-
+as well as on the grammatical gender (i.e., masculine, feminine, neuter)
+and number (i.e., singular or plural) of the noun, as well as on the
+declension class of the noun. But unlike with most other inflected languages,
+putting a number-phrase (like "ten" or "forty-three", or their Arabic
+numeral equivalents) in front of a noun in Russian can change that noun's
+case and number, and therefore the endings you have to put on it.
+.PP
+He elaborates: In "I scanned \f(CW%g\fR directories", you'd \fIexpect\fR
+"directories" to be in the accusative case (since it is the direct
+object in the sentence) and the plural number,
+except where \f(CW$directory_count\fR is 1, then you'd expect the singular, of
+course. Just like Latin or German. \fIBut!\fR Where \f(CW$directory_count\fR %
+10 is 1 ("%" for modulo, remember), assuming \f(CW$directory_count\fR is an
+integer, and except where \f(CW$directory_count\fR % 100 is 11, "directories"
+is forced to become grammatically singular, which means it gets the
+ending for the accusative singular... You begin to visualize the code
+it'd take to test for the problem so far, \fIand still work for Chinese
+and Arabic and Italian\fR, and how many gettext items that'd take, but
+he keeps going... But where \f(CW$directory_count\fR % 10 is 2, 3, or 4
+(except where \f(CW$directory_count\fR % 100 is 12, 13, or 14), the word for
+"directories" is forced to be genitive singular \-\- which means another
+ending... The room begins to spin around you, slowly at first... But
+with \fIall other\fR integer values, since "directory" is an inanimate
+noun, when preceded by a number and in the nominative or accusative
+cases (as it is here, just your luck!), it does stay plural, but it is
+forced into the genitive case \-\- yet another ending... And
+you never hear him get to the part about how you're going to run into
+similar (but maybe subtly different) problems with other Slavic
+languages like Polish, because the floor comes up to meet you, and you
+fade into unconsciousness.
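+.PP
+Stripped of the drama, the translator's rules are a small decision
+procedure on the integer, which is why they belong in code rather than
+in a pile of strings. Here is a minimal sketch, assuming a non-negative
+integer count and the inanimate-noun, accusative context of the example;
+the function name \f(CW\*(C`russian_quant_form\*(C'\fR is hypothetical:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\&
+\& # Hypothetical helper: which form an inanimate Russian noun takes after
+\& # the non negative integer $n in the accusative context described above.
+\& sub russian_quant_form {
+\&     my ($n) = @_;
+\&     my $n10  = $n % 10;
+\&     my $n100 = $n % 100;
+\&     # 1, 21, 31, ... 101 (but not 11, 111): accusative singular
+\&     return "accusative singular"
+\&         if $n10 == 1 && $n100 != 11;
+\&     # 2 to 4, 22 to 24, ... (but not 12 to 14): genitive singular
+\&     return "genitive singular"
+\&         if $n10 >= 2 && $n10 <= 4 && ($n100 < 12 || $n100 > 14);
+\&     # everything else (0, 5 to 20, 25 to 30, ...): genitive plural
+\&     return "genitive plural";
+\& }
+\&
+\& printf "%d: %s\en", $_, russian_quant_form($_) for 1, 3, 5, 11, 21, 22, 112;
+.Ve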
+.PP
+The above cautionary tale relates how an attempt at localization can
+lead from programmer consternation, to program obfuscation, to a need
+for sedation. But careful evaluation shows that your choice of tools
+merely needed further consideration.
+.SS "The Linguistic View"
+.IX Subsection "The Linguistic View"
+.RS 4
+"It is more complicated than you think."
+.Sp
+\&\-\- The Eighth Networking Truth, from RFC 1925
+.RE
+.PP
+The field of Linguistics has expended a great deal of effort over the
+past century trying to find grammatical patterns which hold across
+languages; it's been a constant process
+of people making generalizations that should apply to all languages,
+only to find out that, all too often, these generalizations fail \-\-
+sometimes failing for just a few languages, sometimes whole classes of
+languages, and sometimes nearly every language in the world except
+English. Broad statistical trends are evident in what the "average
+language" is like as far as what its rules can look like, must look
+like, and cannot look like. But the "average language" is just as
+unreal a concept as the "average person" \-\- it runs up against the
+fact that no language (or person) is, in fact, average. The wisdom of past
+experience leads us to believe that any given language can do whatever
+it wants, in any order, with appeal to any kind of grammatical
+categories it wants \-\- case, number, tense, real or metaphoric
+characteristics of the things that words refer to, arbitrary or
+predictable classifications of words based on what endings or prefixes
+they can take, degree or means of certainty about the truth of
+statements expressed, and so on, ad infinitum.
+.PP
+Mercifully, most localization tasks are a matter of finding ways to
+translate whole phrases, generally sentences, where the context is
+relatively set, and where the only variation in content is \fIusually\fR
+in a number being expressed \-\- as in the example sentences above.
+Translating specific, fully-formed sentences is, in practice, fairly
+foolproof \-\- which is good, because that's what's in the phrasebooks
+that so many tourists rely on. Now, a given phrase (whether in a
+phrasebook or in a gettext lexicon) in one language \fImight\fR have a
+greater or lesser applicability than that phrase's translation into
+another language \-\- for example, strictly speaking, in Arabic, the
+"your" in "Your query matched..." would take a different form
+depending on whether the user is male or female; so the Arabic
+translation "your[feminine] query" is applicable in fewer cases than
+the corresponding English phrase, which doesn't distinguish the user's
+gender. (In practice, it's not feasible to have a program know the
+user's gender, so the masculine "you" in Arabic is usually used, by
+default.)
+.PP
+But in general, such surprises are rare when entire sentences are
+being translated, especially when the functional context is restricted
+to that of a computer interacting with a user either to convey a fact
+or to prompt for a piece of information. So, for purposes of
+localization, translation by phrase (generally by sentence) is both the
+simplest and the least problematic.
+.SS "Breaking gettext"
+.IX Subsection "Breaking gettext"
+.RS 4
+"It Has To Work."
+.Sp
+\&\-\- First Networking Truth, RFC 1925
+.RE
+.PP
+Consider that sentences in a tourist phrasebook are of two types: ones
+like "How do I get to the marketplace?" that don't have any blanks to
+fill in, and ones like "How much do these _\|_\|_ cost?", where there's
+one or more blanks to fill in (and these are usually linked to a
+list of words that you can put in that blank: "fish", "potatoes",
+"tomatoes", etc.). The ones with no blanks are no problem, but the
+fill-in-the-blank ones may not be really straightforward. If it's a
+Swahili phrasebook, for example, the authors probably didn't bother to
+tell you the complicated ways that the verb "cost" changes its
+inflectional prefix depending on the noun you're putting in the blank.
+The trader in the marketplace will still understand what you're saying if
+you say "how much do these potatoes cost?" with the wrong
+inflectional prefix on "cost". After all, \fIyou\fR can't speak proper Swahili,
+\&\fIyou're\fR just a tourist. But while tourists can be stupid, computers
+are supposed to be smart; the computer should be able to fill in the
+blank, and still have the results be grammatical.
+.PP
+In other words, a phrasebook entry takes some values as parameters
+(the things that you fill in the blank or blanks), and provides a value
+based on these parameters, where the way you get that final value from
+the given values can, properly speaking, involve an arbitrarily
+complex series of operations. (In the case of Chinese, it'd be not at
+all complex, at least in cases like the examples at the beginning of
+this article; whereas in the case of Russian it'd be a rather complex
+series of operations. And in some languages, the
+complexity could be spread around differently: while the act of
+putting a number-expression in front of a noun phrase might not be
+complex by itself, it may change how you have to, for example, inflect
+a verb elsewhere in the sentence. This is what in syntax is called
+"long-distance dependencies".)
+.PP
+This talk of parameters and arbitrary complexity is just another way
+to say that an entry in a phrasebook is what in a programming language
+would be called a "function". Just so you don't miss it, this is the
+crux of this article: \fIA phrase is a function; a phrasebook is a
+bunch of functions.\fR
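+.PP
+To make that concrete, here is a sketch of a phrasebook that really is
+a bunch of functions: a plain Perl hash of anonymous subs, keyed by
+language and then by phrase. The hash layout, the phrase key
+\&\f(CW\*(C`scanned\*(C'\fR, and the helper \f(CW\*(C`say_phrase\*(C'\fR are illustrative
+inventions, not Maketext's API; the point is only that each entry can
+run whatever logic its language needs, such as the Italian zero case
+from the story above:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\&
+\& # Each entry takes the blanks as arguments and returns a whole sentence.
+\& my %phrasebook = (
+\&     en => {
+\&         scanned => sub {
+\&             my ($n) = @_;
+\&             my $noun = $n == 1 ? "directory" : "directories";
+\&             return "I scanned $n $noun.";
+\&         },
+\&     },
+\&     it => {
+\&         scanned => sub {
+\&             my ($n) = @_;
+\&             # Italian wants "I did not scan any folders", not "0 folders".
+\&             return "Non ho esaminato alcuna cartella." if $n == 0;
+\&             my $noun = $n == 1 ? "cartella" : "cartelle";
+\&             return "Ho esaminato $n $noun.";
+\&         },
+\&     },
+\& );
+\&
+\& # Look up the function for this language and phrase, then call it.
+\& sub say_phrase {
+\&     my ($lang, $key, @args) = @_;
+\&     my $phrase = $phrasebook{$lang}{$key};
+\&     return &$phrase(@args);
+\& }
+\&
+\& print say_phrase("en", "scanned", 1), "\en";
+\& print say_phrase("it", "scanned", 0), "\en";
+.Ve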
+.PP
+The reason that using gettext runs into walls (as in the above
+second-person horror story) is that you're trying to use a string (or
+worse, a choice among a bunch of strings) to do what you really need a
+function for \-\- which is futile. Performing (s)printf interpolation
+on the strings which you get back from gettext does allow you to do \fIsome\fR
+common things passably well... sometimes... sort of; but, to paraphrase
+what some people say about \f(CW\*(C`csh\*(C'\fR script programming, "it fools you
+into thinking you can use it for real things, but you can't, and you
+don't discover this until you've already spent too much time trying,
+and by then it's too late."
+.SS "Replacing gettext"
+.IX Subsection "Replacing gettext"
+So, what needs to replace gettext is a system that supports lexicons
+of functions instead of lexicons of strings. An entry in a lexicon
+from such a system should \fInot\fR look like this:
+.PP
+.Vb 1
+\& "J\*(Aqai trouv\exE9 %g fichiers dans %g r\exE9pertoires"
+.Ve
+.PP
+[\exE9 is e\-acute in Latin\-1. Some pod renderers would
+scream if I used the actual character here. \-\- SB]
+.PP
+but instead like this, bearing in mind that this is just a first stab:
+.PP
+.Vb 8
+\& sub I_found_X1_files_in_X2_directories {
+\& my( $files, $dirs ) = @_[0,1];
+\& $files = sprintf("%g %s", $files,
+\& $files == 1 ? \*(Aqfichier\*(Aq : \*(Aqfichiers\*(Aq);
+\& $dirs = sprintf("%g %s", $dirs,
+\& $dirs == 1 ? "r\exE9pertoire" : "r\exE9pertoires");
+\& return "J\*(Aqai trouv\exE9 $files dans $dirs.";
+\& }
+.Ve
+.PP
+Now, there's no particularly obvious way to store anything but strings
+in a gettext lexicon; so it looks like we just have to start over and
+make something better, from scratch. I call my shot at a
+gettext-replacement system "Maketext", or, in CPAN terms,
+Locale::Maketext.
+.PP
+When designing Maketext, I chose to plan its main features in terms of
+"buzzword compliance". And here are the buzzwords:
+.SS "Buzzwords: Abstraction and Encapsulation"
+.IX Subsection "Buzzwords: Abstraction and Encapsulation"
+The complexity of the language you're trying to output a phrase in is
+entirely abstracted inside (and encapsulated within) the Maketext module
+for that interface. When you call:
+.PP
+.Vb 2
+\& print $lang\->maketext("You have [quant,_1,piece] of new mail.",
+\& scalar(@messages));
+.Ve
+.PP
+you don't know (and in fact can't easily find out) whether this will
+involve lots of figuring, as in Russian (if \f(CW$lang\fR is a handle to the
+Russian module), or relatively little, as in Chinese. That kind of
+abstraction and encapsulation may encourage other pleasant buzzwords
+like modularization and stratification, depending on what design
+decisions you make.
+.SS "Buzzword: Isomorphism"
+.IX Subsection "Buzzword: Isomorphism"
+"Isomorphism" means "having the same structure or form"; in discussions
+of program design, the word takes on the special, specific meaning that
+your implementation of a solution to a problem \fIhas the same
+structure\fR as, say, an informal verbal description of the solution, or
+maybe of the problem itself. Isomorphism is, all things considered,
+a good thing \-\- it's what problem-solving (and solution-implementing)
+should look like.
+.PP
+What's wrong with gettext-using code like this...
+.PP
+.Vb 9
+\& printf( $file_count == 1 ?
+\& ( $directory_count == 1 ?
+\& "Your query matched %g file in %g directory." :
+\& "Your query matched %g file in %g directories." ) :
+\& ( $directory_count == 1 ?
+\& "Your query matched %g files in %g directory." :
+\& "Your query matched %g files in %g directories." ),
+\& $file_count, $directory_count,
+\& );
+.Ve
+.PP
+is first off that it's not well abstracted \-\- these ways of testing
+for grammatical number (as in the expressions like \f(CW\*(C`foo == 1 ?
+singular_form : plural_form\*(C'\fR) should be abstracted to each language
+module, since how you get grammatical number is language-specific.
+.PP
+But second off, it's not isomorphic \-\- the "solution" (i.e., the
+phrasebook entries) for Chinese maps from these four English phrases to
+the one Chinese phrase that fits for all of them. In other words, the
+informal solution would be "The way to say what you want in Chinese is
+with the one phrase 'For your question, in Y directories you would
+find X files'" \-\- and so the implemented solution should be,
+isomorphically, just a straightforward way to spit out that one
+phrase, with numerals properly interpolated. It shouldn't have to map
+from the complexity of other languages to the simplicity of this one.
+.SS "Buzzword: Inheritance"
+.IX Subsection "Buzzword: Inheritance"
+There's a great deal of room for reuse, both in sharing phrases between
+modules for related dialects and in sharing auxiliary functions
+between related languages. (By "auxiliary functions", I mean
+functions that don't produce phrase-text, but which, say, return an
+answer to "does this number require a plural noun after it?". Such
+auxiliary functions would be used in the internal logic of functions
+that actually do produce phrase-text.)
+.PP
+In the case of sharing phrases, consider that you have an interface
+already localized for American English (probably by having been
+written with that as the native locale, but that's incidental).
+Localizing it for UK English should, in practical terms, be just a
+matter of running it past a British person with the instructions to
+indicate what few phrases would benefit from a change in spelling or
+possibly minor rewording. In that case, you should be able to put in
+the UK English localization module \fIonly\fR those phrases that are
+UK-specific, and for all the rest, \fIinherit\fR from the American
+English module. (And I expect this same situation would apply with
+Brazilian and Continental Portuguese, possibly with some \fIvery\fR
+closely related languages like Czech and Slovak, and possibly with the
+slightly different "versions" of written Mandarin Chinese, as I hear exist in
+Taiwan and mainland China.)
+.PP
+As to sharing of auxiliary functions, consider the problem of Russian
+numbers from the beginning of this article; obviously, you'd want to
+write only once the hairy code that, given a numeric value, would
+return some specification of which case and number a given quantified
+noun should use. But suppose that you discover, while localizing an
+interface for, say, Ukrainian (a Slavic language related to Russian,
+spoken by tens of millions of people, many of whom would be relieved to
+find that your Web site's or software's interface is available in
+their language), that the rules in Ukrainian are the same as in Russian
+for quantification, and probably for many other grammatical functions.
+While there may well be no phrases in common between Russian and
+Ukrainian, you could still choose to have the Ukrainian module inherit
+from the Russian module, just for the sake of inheriting all the
+various grammatical methods. Or, probably better organizationally,
+you could move those functions to a module called \f(CW\*(C`_E_Slavic\*(C'\fR or
+something, which Russian and Ukrainian could inherit useful functions
+from, but which would (presumably) provide no lexicon.
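+.PP
+The released Locale::Maketext module supports this arrangement, because
+a handle looks up lexicons along its class's \f(CW@ISA\fR chain. The sketch
+below assumes a made-up project base class, \f(CW\*(C`MyApp::L10N\*(C'\fR, with
+American and British English subclasses; the British lexicon lists only
+the phrase that differs, and everything else falls through to the
+American one, whose \f(CW\*(C`_AUTO\*(C'\fR entry lets an unlisted key serve as
+its own bracket-notation phrase:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\& use Locale::Maketext;
+\&
+\& {   # Hypothetical project base class.
+\&     package MyApp::L10N;
+\&     our @ISA = ("Locale::Maketext");
+\& }
+\& {
+\&     package MyApp::L10N::en_us;
+\&     our @ISA = ("MyApp::L10N");
+\&     our %Lexicon = (_AUTO => 1);   # unknown keys are used as is
+\& }
+\& {
+\&     package MyApp::L10N::en_gb;
+\&     our @ISA = ("MyApp::L10N::en_us");   # inherit all the rest
+\&     our %Lexicon = ("Pick a color." => "Pick a colour.");
+\& }
+\&
+\& my $lh = MyApp::L10N\->get_handle("en\-gb") or die "no en\-gb handle";
+\& print $lh\->maketext("Pick a color."), "\en";
+\& print $lh\->maketext("You have [quant,_1,piece] of new mail.", 3), "\en";
+.Ve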
+.SS "Buzzword: Concision"
+.IX Subsection "Buzzword: Concision"
+Okay, concision isn't a buzzword. But it should be, so I decree that
+as a new buzzword, "concision" means that simple common things should
+be expressible in very few lines (or maybe even just a few characters)
+of code \-\- call it a special case of "making simple things easy and
+hard things possible", and see also the role it played in the
+MIDI::Simple language, discussed elsewhere in this issue [TPJ#13].
+.PP
+Consider our first stab at an entry in our "phrasebook of functions":
+.PP
+.Vb 8
+\& sub I_found_X1_files_in_X2_directories {
+\& my( $files, $dirs ) = @_[0,1];
+\& $files = sprintf("%g %s", $files,
+\& $files == 1 ? \*(Aqfichier\*(Aq : \*(Aqfichiers\*(Aq);
+\& $dirs = sprintf("%g %s", $dirs,
+\& $dirs == 1 ? "r\exE9pertoire" : "r\exE9pertoires");
+\& return "J\*(Aqai trouv\exE9 $files dans $dirs.";
+\& }
+.Ve
+.PP
+You may sense that a lexicon (to use a non-committal catch-all term for a
+collection of things you know how to say, regardless of whether they're
+phrases or words) consisting of functions \fIexpressed\fR as above would
+make for rather long-winded and repetitive code \-\- even if you wisely
+rewrote this to have quantification (as we call adding a number
+expression to a noun phrase) be a function called like:
+.PP
+.Vb 6
+\& sub I_found_X1_files_in_X2_directories {
+\& my( $files, $dirs ) = @_[0,1];
+\& $files = quant($files, "fichier");
+\& $dirs = quant($dirs, "r\exE9pertoire");
+\& return "J\*(Aqai trouv\exE9 $files dans $dirs.";
+\& }
+.Ve
+.PP
+And you may also sense that you do not want to bother your translators
+with having to write Perl code \-\- you'd much rather that they spend
+their \fIvery costly time\fR on just translation. And this is to say
+nothing of the near impossibility of finding a commercial translator
+who would know even simple Perl.
+.PP
+In a first-hack implementation of Maketext, each language-module's
+lexicon looked like this:
+.PP
+.Vb 10
+\& %Lexicon = (
+\& "I found %g files in %g directories"
+\& => sub {
+\& my( $files, $dirs ) = @_[0,1];
+\& $files = quant($files, "fichier");
+\& $dirs = quant($dirs, "r\exE9pertoire");
+\& return "J\*(Aqai trouv\exE9 $files dans $dirs.";
+\& },
+\& ... and so on with other phrase => sub mappings ...
+\& );
+.Ve
+.PP
+but I immediately went looking for some more concise way to basically
+denote the same phrase-function \-\- a way that would also serve to
+concisely denote \fImost\fR phrase-functions in the lexicon for \fImost\fR
+languages. After much time and even some actual thought, I decided on
+this system:
+.PP
+* Where a value in a \f(CW%Lexicon\fR hash is a contentful string instead of
+an anonymous sub (or, conceivably, a coderef), it would be interpreted
+as a sort of shorthand expression of what the sub does. When accessed
+for the first time in a session, it is parsed, turned into Perl code,
+and then eval'd into an anonymous sub; then that sub replaces the
+original string in that lexicon. (That way, the work of parsing and
+evaling the shorthand form for a given phrase is done no more than
+once per session.)
+.PP
+* Calls to \f(CW\*(C`maketext\*(C'\fR (as Maketext's main function is called) happen
+through a "language session handle", notionally very much like an IO
+handle, in that you open one at the start of the session, and use it
+for "sending signals" to an object in order to have it return the text
+you want.
+.PP
+So, this:
+.PP
+.Vb 2
+\& $lang\->maketext("You have [quant,_1,piece] of new mail.",
+\& scalar(@messages));
+.Ve
+.PP
+basically means this: look in the lexicon for \f(CW$lang\fR (which may inherit
+from any number of other lexicons), and find the function that we
+happen to associate with the string "You have [quant,_1,piece] of new
+mail" (which is, and should be, a functioning "shorthand" for this
+function in the native locale \-\- English in this case). If you find
+such a function, call it with \f(CW$lang\fR as its first parameter (as if it
+were a method), and then a copy of scalar(@messages) as its second,
+and then return that value. If that function was found, but was in
+string shorthand instead of being a fully specified function, parse it
+and make it into a function before calling it the first time.
+.PP
+* The shorthand uses code in brackets to indicate method calls that
+should be performed. A full explanation is not in order here, but a
+few examples will suffice:
+.PP
+.Vb 1
+\& "You have [quant,_1,piece] of new mail."
+.Ve
+.PP
+The above code is shorthand for, and will be interpreted as,
+this:
+.PP
+.Vb 8
+\& sub {
+\& my $handle = $_[0];
+\& my(@params) = @_;
+\& return join \*(Aq\*(Aq,
+\& "You have ",
+\& $handle\->quant($params[1], \*(Aqpiece\*(Aq),
+\& " of new mail.";
+\& }
+.Ve
+.PP
+where "quant" is the name of a method you're using to quantify the
+noun "piece" with the number \f(CW$params[1]\fR (\f(CW$params[0]\fR being the handle itself).
+.PP
+A string with no brackety calls, like this:
+.PP
+.Vb 1
+\& "Your search expression was malformed."
+.Ve
+.PP
+is somewhat of a degenerate case, and just gets turned into:
+.PP
+.Vb 1
+\& sub { return "Your search expression was malformed." }
+.Ve
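+.PP
+The compile-on-first-use step from the first of those points can be
+sketched in a few lines of plain Perl. This is not Locale::Maketext's
+actual parser: the names \f(CW\*(C`compile_phrase\*(C'\fR and
+\&\f(CW\*(C`maketext_lite\*(C'\fR are hypothetical, and only bare
+\&\f(CW\*(C`[_1]\*(C'\fR-style placeholders are understood, with no method calls.
+What it does show is the string being swapped for a closure the first
+time it is looked up:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\&
+\& # Hypothetical French lexicon; values start out as shorthand strings.
+\& my %Lexicon = ("Hello, [_1]!" => "Bonjour, [_1] !");
+\&
+\& # Turn "[_N]" shorthand into a closure that fills in parameter N.
+\& sub compile_phrase {
+\&     my ($text) = @_;
+\&     my @parts = split /(\e[_\ed+\e])/, $text;
+\&     return sub {
+\&         my @args = @_;
+\&         my $out  = "";
+\&         for my $part (@parts) {
+\&             if ($part =~ /^\e[_(\ed+)\e]$/) { $out .= $args[$1 \- 1] }
+\&             else                          { $out .= $part }
+\&         }
+\&         return $out;
+\&     };
+\& }
+\&
+\& # Compile on first lookup and store the sub back in place of the string,
+\& # so each phrase is parsed at most once per run.
+\& sub maketext_lite {
+\&     my ($key, @args) = @_;
+\&     my $entry = $Lexicon{$key};
+\&     $entry = $Lexicon{$key} = compile_phrase($entry) unless ref $entry;
+\&     return &$entry(@args);
+\& }
+\&
+\& print maketext_lite("Hello, [_1]!", "Sean"), "\en";
+.Ve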
+.PP
+However, not everything you can write in Perl code can be written in
+the above shorthand system \-\- not by a long shot. For example, consider
+the Italian translator from the beginning of this article, who wanted
+the Italian for "I didn't find any files" as a special case, instead
+of "I found 0 files". That couldn't be specified (at least not easily
+or simply) in our shorthand system, and it would have to be written
+out in full, like this:
+.PP
+.Vb 10
+\& sub { # pretend the English strings are in Italian
+\& my($handle, $files, $dirs) = @_[0,1,2];
+\& return "I didn\*(Aqt find any files" unless $files;
+\& return join \*(Aq\*(Aq,
+\& "I found ",
+\& $handle\->quant($files, \*(Aqfile\*(Aq),
+\& " in ",
+\& $handle\->quant($dirs, \*(Aqdirectory\*(Aq),
+\& ".";
+\& }
+.Ve
+.PP
+Next to a lexicon full of shorthand code, that sort of sticks out like a
+sore thumb \-\- but this \fIis\fR a special case, after all; and at least
+it's possible, if not as concise as usual.
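+.PP
+In Locale::Maketext, a \f(CW%Lexicon\fR value may be either a shorthand
+string or a code reference, and a code reference is called with the
+language handle followed by the parameters. Here is a minimal sketch of
+the special case above plugged into a lexicon. The package names are
+hypothetical, and English again stands in for the Italian.
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\& use feature "say";
+\&
+\& package MyApp::L10N;                  # hypothetical project base class
+\& use parent "Locale::Maketext";
+\&
+\& package MyApp::L10N::it;              # hypothetical "Italian" lexicon
+\& use parent \-norequire, "MyApp::L10N";
+\& our %Lexicon = (
+\&     # Hand\-written sub in place of a shorthand string.
+\&     "I found [quant,_1,file] in [quant,_2,directory,directories]." => sub {
+\&         my ($handle, $files, $dirs) = @_;
+\&         return "I didn\*(Aqt find any files" unless $files;  # special case
+\&         return join "",
+\&             "I found ", $handle\->quant($files, "file"),
+\&             " in ", $handle\->quant($dirs, "directory", "directories"), ".";
+\&     },
+\& );
+\&
+\& package main;
+\&
+\& my $lh  = MyApp::L10N\->get_handle("it") or die "No language handle";
+\& my $key = "I found [quant,_1,file] in [quant,_2,directory,directories].";
+\& say $lh\->maketext($key, 0, 4);   # I didn\*(Aqt find any files
+\& say $lh\->maketext($key, 2, 1);   # I found 2 files in 1 directory.
+.Ve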
+.PP
+As to how you'd implement the Russian example from the beginning of
+the article, well, There's More Than One Way To Do It, but it could be
+something like this (using English words for Russian, just so you know
+what's going on):
+.PP
+.Vb 1
+\& "I [quant,_1,directory,accusative] scanned."
+.Ve
+.PP
+This shifts the burden of complexity off to the quant method. That
+method's parameters are: the numeric value it's going to use to
+quantify something; the Russian word it's going to quantify; and the
+parameter "accusative", which you're using to mean that this
+sentence's syntax wants a noun in the accusative case there, although
+that quantification method may have to overrule it, for grammatical
+reasons you may recall from the beginning of this article.
+.PP
+Now, the Russian quant method here is responsible not only for
+implementing the strange logic necessary for figuring out how Russian
+number-phrases impose case and number on their noun-phrases, but also
+for inflecting the Russian word for "directory". How that inflection
+is to be carried out is no small issue, and among the solutions I've
+seen, some (like variations on a simple lookup in a hash where all
+possible forms are provided for all necessary words) are
+straightforward but \fIcan\fR become cumbersome when you need to inflect
+more than a few dozen words; and other solutions (like using
+algorithms to model the inflections, storing only root forms and
+irregularities) \fIcan\fR involve more overhead than is justifiable for
+all but the largest lexicons.
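+.PP
+As a rough illustration of the first approach, here is a sketch of a
+Russian \f(CW\*(C`quant\*(C'\fR override that looks up stored forms in a hash.
+The specifics are assumptions made for this example: the package names,
+the transliterated forms of the Russian word for "directory" (katalog,
+kataloga, katalogov), and the restriction to the accusative of inanimate
+nouns, whose number phrases use the same forms as in the nominative. The
+number rule it encodes is the usual one: numbers ending in 1 take one
+form, those ending in 2 to 4 take another, and everything else
+(including 11 to 14) takes a third.
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\& use feature "say";
+\&
+\& package MyApp::L10N;                  # hypothetical project base class
+\& use parent "Locale::Maketext";
+\&
+\& package MyApp::L10N::ru;              # hypothetical Russian lexicon
+\& use parent \-norequire, "MyApp::L10N";
+\& our %Lexicon = (_AUTO => 1);
+\&
+\& # Hypothetical table of stored forms, keyed by headword and then by
+\& # number class (accusative of inanimate nouns only).
+\& my %forms = (
+\&     directory => { one => "katalog", few => "kataloga", many => "katalogov" },
+\& );
+\&
+\& sub quant {
+\&     my ($handle, $num, $word, $case) = @_;  # $case is not used here
+\&     my $n100 = abs($num) % 100;
+\&     my $n10  = $n100 % 10;
+\&     my $class = ($n100 >= 11 && $n100 <= 14) ? "many"
+\&               : $n10 == 1                     ? "one"
+\&               : ($n10 >= 2 && $n10 <= 4)      ? "few"
+\&               :                                 "many";
+\&     return "$num $forms{$word}{$class}";
+\& }
+\&
+\& package main;
+\&
+\& my $lh = MyApp::L10N\->get_handle("ru") or die "No language handle";
+\& say $lh\->maketext("I [quant,_1,directory,accusative] scanned.", $_)
+\&     for 1, 3, 11, 21;
+\& # Prints "I 1 katalog scanned.", "I 3 kataloga scanned.",
+\& # "I 11 katalogov scanned.", and "I 21 katalog scanned."
+.Ve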
+.PP
+Mercifully, this design decision becomes crucial only in the hairiest
+of inflected languages, of which Russian is by no means the \fIworst\fR case
+scenario, but is worse than most. Most languages have simpler
+inflection systems; for example, in English or Swahili, there are
+generally no more than two possible inflected forms for a given noun
+("error/errors"; "kosa/makosa"), and the
+rules for producing these forms are fairly simple \-\- or at least,
+simple rules can be formulated that work for most words, and you can
+then treat the exceptions as just "irregular", at least relative to
+your ad hoc rules. A simpler inflection system (simpler rules, fewer
+forms) means that design decisions are less crucial to maintaining
+sanity, whereas the same decisions could incur
+overhead-versus-scalability problems in languages like Russian. It
+is \fIalso\fR quite possible that code (possibly in Perl, as with
+Lingua::EN::Inflect for English nouns) has already
+been written for the language in question, whether simple or complex.
+.PP
+Moreover, a third possibility may even be simpler than anything
+discussed above: just require that all possible (or at least
+applicable) forms be provided in the call to the given language's quant
+method, as in:
+.PP
+.Vb 1
+\& "I found [quant,_1,file,files]."
+.Ve
+.PP
+That way, quant just has to choose which form it needs, without having
+to look up or generate anything. While possibly not optimal for
+Russian, this should work well for most other languages, where
+quantification is not as complicated an operation.
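+.PP
+Locale::Maketext's default \f(CW\*(C`quant\*(C'\fR already works roughly this way
+for English-like languages. It takes the singular, an optional plural
+(without one, it simply appends "s"), and an optional third form to
+use when the number is zero. Here is a minimal sketch, with hypothetical
+package names:
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\& use feature "say";
+\&
+\& package MyApp::L10N;                  # hypothetical project base class
+\& use parent "Locale::Maketext";
+\&
+\& package MyApp::L10N::en;              # hypothetical English lexicon
+\& use parent \-norequire, "MyApp::L10N";
+\& our %Lexicon = (_AUTO => 1);
+\&
+\& package main;
+\&
+\& my $lh = MyApp::L10N\->get_handle("en") or die "No language handle";
+\& for my $n (0, 1, 2) {
+\&     # Forms given: singular, plural, and the form for zero.
+\&     say $lh\->maketext("I found [quant,_1,file,files,no files].", $n);
+\& }
+\& # Prints:
+\& #   I found no files.
+\& #   I found 1 file.
+\& #   I found 2 files.
+.Ve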
+.SS "The Devil in the Details"
+.IX Subsection "The Devil in the Details"
+There's plenty more to Maketext than described above \-\- for example,
+there are the details of how language tags ("en-US", "i\-pwn", "fi",
+etc.) or locale IDs ("en_US") interact with actual module naming
+("BogoQuery/Locale/en_us.pm"), and what magic can ensue; there are the
+details of how to record (and possibly negotiate) what character
+encoding Maketext will return text in (UTF\-8? Latin\-1? KOI8?). There's
+the interesting fact that Maketext is for localization, but never
+actually has a "\f(CW\*(C`use locale;\*(C'\fR" anywhere in it. For the curious,
+there are the somewhat frightening details of how I actually
+implement something like data inheritance so that searches across
+modules' \f(CW%Lexicon\fR hashes can parallel how Perl implements method
+inheritance.
+.PP
+And, most importantly, there's all the practical details of how to
+actually go about deriving from Maketext so you can use it for your
+interfaces, and the various tools and conventions for starting out and
+maintaining individual language modules.
+.PP
+That is all covered in the documentation for Locale::Maketext and the
+modules that come with it, available from CPAN. After having read this
+article, which covers the whys of Maketext, the documentation,
+which covers the hows of it, should be quite straightforward.
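+.PP
+The lexicon inheritance mentioned earlier can be seen from the outside
+in the following minimal sketch, in which the package names are
+hypothetical. The class for the language tag "en\-us" lists only the
+phrase that differs from general English. Every other phrase is found
+by searching the \f(CW%Lexicon\fR of each class up the handle's
+\f(CW@ISA\fR chain, much as Perl looks up a method.
+.PP
+.Vb 10
+\& use strict;
+\& use warnings;
+\& use feature "say";
+\&
+\& package MyApp::L10N;                  # hypothetical project base class
+\& use parent "Locale::Maketext";
+\&
+\& package MyApp::L10N::en;              # general English
+\& use parent \-norequire, "MyApp::L10N";
+\& our %Lexicon = (
+\&     colour => "colour",
+\&     greet  => "Hello!",
+\& );
+\&
+\& package MyApp::L10N::en_us;           # US English: only the differences
+\& use parent \-norequire, "MyApp::L10N::en";
+\& our %Lexicon = (colour => "color");
+\&
+\& package main;
+\&
+\& my $lh = MyApp::L10N\->get_handle("en\-us") or die "No language handle";
+\& say ref $lh;                   # MyApp::L10N::en_us
+\& say $lh\->maketext("colour");   # color   (found in en_us)
+\& say $lh\->maketext("greet");    # Hello!  (inherited from en)
+.Ve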
+.SS "The Proof in the Pudding: Localizing Web Sites"
+.IX Subsection "The Proof in the Pudding: Localizing Web Sites"
+Maketext and gettext have a notable difference: gettext is in C,
+accessible through C library calls, whereas Maketext is in Perl, and
+really can't work without a Perl interpreter (although I suppose
+something like it could be written for C). Accidents of history (and
+not necessarily lucky ones) have made C++ the most common language for
+the implementation of applications like word processors, Web browsers,
+and even many in-house applications like custom query systems. Current
+conditions make it somewhat unlikely that the next one of any of these
+kinds of applications will be written in Perl, albeit clearly more for
+reasons of custom and inertia than out of consideration of what is the
+right tool for the job.
+.PP
+However, other accidents of history have made Perl a well-accepted
+language for design of server-side programs (generally in CGI form)
+for Web site interfaces. Localization of static pages in Web sites is
+trivial, feasible either with simple language-negotiation features in
+servers like Apache, or with some kind of server-side inclusions of
+language-appropriate text into layout templates. However, I think
+that the localization of Perl-based search systems (or other kinds of
+dynamic content) in Web sites, be they public or access-restricted,
+is where Maketext will see the greatest use.
+.PP
+I presume that it would be only the exceptional Web site that gets
+localized for English \fIand\fR Chinese \fIand\fR Italian \fIand\fR Arabic
+\&\fIand\fR Russian, to recall the languages from the beginning of this
+article \-\- to say nothing of German, Spanish, French, Japanese,
+Finnish, and Hindi, to name a few languages that benefit from large
+numbers of programmers or Web viewers or both.
+.PP
+However, the ever-increasing internationalization of the Web (whether
+measured in terms of amount of content, of numbers of content writers
+or programmers, or of size of content audiences) makes it increasingly
+likely that the interface to the average Web-based dynamic content
+service will be localized for two or maybe three languages. It is my
+hope that Maketext will make that task as simple as possible, and will
+remove previous barriers to localization for languages dissimilar to
+English.
+.PP
+.Vb 1
+\& _\|_END_\|_
+.Ve
+.PP
+Sean M. Burke (sburke@cpan.org) has a Master's in linguistics
+from Northwestern University; he specializes in language technology.
+Jordan Lachler (lachler@unm.edu) is a PhD student in the Department of
+Linguistics at the University of New Mexico; he specializes in
+morphology and pedagogy of North American native languages.
+.SS References
+.IX Subsection "References"
+Alvestrand, Harald Tveit. 1995. \fIRFC 1766: Tags for the
+Identification of Languages.\fR
+\&\f(CW\*(C`<http://www.ietf.org/rfc/rfc1766.txt>\*(C'\fR
+[Now see RFC 3066.]
+.PP
+Callon, Ross, editor. 1996. \fIRFC 1925: The Twelve
+Networking Truths.\fR
+\&\f(CW\*(C`<http://www.ietf.org/rfc/rfc1925.txt>\*(C'\fR
+.PP
+Drepper, Ulrich, Peter Miller,
+and François Pinard. 1995\-2001. GNU
+\&\f(CW\*(C`gettext\*(C'\fR. Available in \f(CW\*(C`<ftp://prep.ai.mit.edu/pub/gnu/>\*(C'\fR, with
+extensive docs in the distribution tarball. [Since
+I wrote this article in 1998, I now see that the
+gettext docs are trying harder to come to terms with
+plurality. Whether useful conclusions have come from it
+is another question altogether. \-\- SMB, May 2001]
+.PP
+Forbes, Nevill. 1964. \fIRussian Grammar.\fR Third Edition, revised
+by J. C. Dumbreck. Oxford University Press.