Diffstat (limited to 'upstream/debian-unstable/man3/Locale::Maketext.3perl')
-rw-r--r-- | upstream/debian-unstable/man3/Locale::Maketext.3perl | 1509 |
1 files changed, 1509 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man3/Locale::Maketext.3perl b/upstream/debian-unstable/man3/Locale::Maketext.3perl new file mode 100644 index 00000000..63c1c3d7 --- /dev/null +++ b/upstream/debian-unstable/man3/Locale::Maketext.3perl @@ -0,0 +1,1509 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "Locale::Maketext 3perl" +.TH Locale::Maketext 3perl 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. 
+.if n .ad l +.nh +.SH NAME +Locale::Maketext \- framework for localization +.SH SYNOPSIS +.IX Header "SYNOPSIS" +.Vb 9 +\& package MyProgram; +\& use strict; +\& use MyProgram::L10N; +\& # ...which inherits from Locale::Maketext +\& my $lh = MyProgram::L10N\->get_handle() || die "What language?"; +\& ... +\& # And then any messages your program emits, like: +\& warn $lh\->maketext( "Can\*(Aqt open file [_1]: [_2]\en", $f, $! ); +\& ... +.Ve +.SH DESCRIPTION +.IX Header "DESCRIPTION" +It is a common feature of applications (whether run directly, +or via the Web) for them to be "localized" \-\- i.e., for them +to present an English interface to an English-speaker, a German +interface to a German-speaker, and so on for all languages they're +programmed with. Locale::Maketext +is a framework for software localization; it provides you with the +tools for organizing and accessing the bits of text and text-processing +code that you need for producing localized applications. +.PP +In order to make sense of Maketext and how all its +components fit together, you should probably +go read Locale::Maketext::TPJ13, and +\&\fIthen\fR read the following documentation. +.PP +You may also want to read over the source for \f(CW\*(C`File::Findgrep\*(C'\fR +and its constituent modules \-\- they are a complete (if small) +example application that uses Maketext. +.SH "QUICK OVERVIEW" +.IX Header "QUICK OVERVIEW" +The basic design of Locale::Maketext is object-oriented, and +Locale::Maketext is an abstract base class, from which you +derive a "project class". +The project class (with a name like "TkBocciBall::Localize", +which you then use in your module) is in turn the base class +for all the "language classes" for your project +(with names "TkBocciBall::Localize::it", +"TkBocciBall::Localize::en", +"TkBocciBall::Localize::fr", etc.). 
+.PP +A language class is +a class containing a lexicon of phrases as class data, +and possibly also some methods that are of use in interpreting +phrases in the lexicon, or otherwise dealing with text in that +language. +.PP +An object belonging to a language class is called a "language +handle"; it's typically a flyweight object. +.PP +The normal course of action is to call: +.PP +.Vb 6 +\& use TkBocciBall::Localize; # the localization project class +\& $lh = TkBocciBall::Localize\->get_handle(); +\& # Depending on the user\*(Aqs locale, etc., this will +\& # make a language handle from among the classes available, +\& # and any defaults that you declare. +\& die "Couldn\*(Aqt make a language handle??" unless $lh; +.Ve +.PP +From then on, you use the \f(CW\*(C`maketext\*(C'\fR function to access +entries in whatever lexicon(s) belong to the language handle +you got. So, this: +.PP +.Vb 1 +\& print $lh\->maketext("You won!"), "\en"; +.Ve +.PP +\&...emits the right text for this language. If the object +in \f(CW$lh\fR belongs to class "TkBocciBall::Localize::fr" and +\&\f(CW%TkBocciBall::Localize::fr::Lexicon\fR contains \f(CW\*(C`("You won!" +=> "Tu as gagné!")\*(C'\fR, then the above +code happily tells the user "Tu as gagné!". +.SH METHODS +.IX Header "METHODS" +Locale::Maketext offers a variety of methods, which fall +into three categories: +.IP \(bu 4 +Methods to do with constructing language handles. +.IP \(bu 4 +\&\f(CW\*(C`maketext\*(C'\fR and other methods to do with accessing \f(CW%Lexicon\fR data +for a given language handle. +.IP \(bu 4 +Methods that you may find it handy to use, from routines of +yours that you put in \f(CW%Lexicon\fR entries. +.PP +These are covered in the following section. +.SS "Construction Methods" +.IX Subsection "Construction Methods" +These are to do with constructing a language handle: +.IP \(bu 4 +\&\f(CW$lh\fR = YourProjClass\->get_handle( ...langtags... 
) || die "lg-handle?"; +.Sp +This tries loading classes based on the language-tags you give (like +\&\f(CW\*(C`("en\-US", "sk", "kon", "es\-MX", "ja", "i\-klingon")\*(C'\fR), and for the first class +that succeeds, returns YourProjClass::\fIlanguage\fR\->\fBnew()\fR. +.Sp +If it runs thru the entire given list of language-tags, and finds no classes +for those exact terms, it then tries "superordinate" language classes. +So if no "en-US" class (i.e., YourProjClass::en_us) +was found, nor classes for anything else in that list, we then try +its superordinate, "en" (i.e., YourProjClass::en), and so on thru +the other language-tags in the given list: "es". +(The other language-tags in our example list +happen to have no superordinates.) +.Sp +If none of those language-tags leads to loadable classes, we then +try classes derived from YourProjClass\->\fBfallback_languages()\fR and +then if nothing comes of that, we use classes named by +YourProjClass\->\fBfallback_language_classes()\fR. Then in the (probably +quite unlikely) event that that fails, we just return undef. +.IP \(bu 4 +\&\f(CW$lh\fR = YourProjClass\->get_handle\fB()\fR || die "lg-handle?"; +.Sp +When \f(CW\*(C`get_handle\*(C'\fR is called with an empty parameter list, magic happens: +.Sp +If \f(CW\*(C`get_handle\*(C'\fR senses that it's running in a program that was +invoked as a CGI, then it tries to get language-tags out of the +environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that +those were the languages passed as parameters to \f(CW\*(C`get_handle\*(C'\fR. +.Sp +Otherwise (i.e., if not a CGI), this tries various OS-specific ways +to get the language-tags for the current locale/language, and then +pretends that those were the value(s) passed to \f(CW\*(C`get_handle\*(C'\fR. 
+.Sp +Currently this OS-specific stuff consists of looking in the environment +variables "LANG" and "LANGUAGE"; and on MSWin machines (where those +variables are typically unused), this also tries using +the module Win32::Locale to get a language-tag for whatever language/locale +is currently selected in the "Regional Settings" (or "International"?) +Control Panel. I welcome further +suggestions for making this do the Right Thing under other operating +systems that support localization. +.Sp +If you're using localization in an application that keeps a configuration +file, you might consider something like this in your project class: +.Sp +.Vb 10 +\& sub get_handle_via_config { +\& my $class = $_[0]; +\& my $chosen_language = $Config_settings{\*(Aqlanguage\*(Aq}; +\& my $lh; +\& if($chosen_language) { +\& $lh = $class\->get_handle($chosen_language) +\& || die "No language handle for \e"$chosen_language\e"" +\& . " or the like"; +\& } else { +\& # Config file missing, maybe? +\& $lh = $class\->get_handle() +\& || die "Can\*(Aqt get a language handle"; +\& } +\& return $lh; +\& } +.Ve +.IP \(bu 4 +\&\f(CW$lh\fR = YourProjClass::langname\->\fBnew()\fR; +.Sp +This constructs a language handle. You usually \fBdon't\fR call this +directly, but instead let \f(CW\*(C`get_handle\*(C'\fR find a language class to \f(CW\*(C`use\*(C'\fR +and to then call \->new on. +.IP \(bu 4 +\&\f(CW$lh\fR\->\fBinit()\fR; +.Sp +This is called by \->new to initialize newly-constructed language handles. +If you define an init method in your class, remember that it's usually +considered a good idea to call \f(CW$lh\fR\->SUPER::init in it (presumably at the +beginning), so that all classes get a chance to initialize a new object +however they see fit. +.IP \(bu 4 +YourProjClass\->\fBfallback_languages()\fR +.Sp +\&\f(CW\*(C`get_handle\*(C'\fR appends the return value of this to the end of +whatever list of languages you pass \f(CW\*(C`get_handle\*(C'\fR. 
Unless +you override this method, your project class +will inherit Locale::Maketext's \f(CW\*(C`fallback_languages\*(C'\fR, which +currently returns \f(CW\*(C`(\*(Aqi\-default\*(Aq, \*(Aqen\*(Aq, \*(Aqen\-US\*(Aq)\*(C'\fR. +("i\-default" is defined in RFC 2277). +.Sp +This method (by having it return the name +of a language-tag that has an existing language class) +can be used for making sure that +\&\f(CW\*(C`get_handle\*(C'\fR will always manage to construct a language +handle (assuming your language classes are in an appropriate +\&\f(CW@INC\fR directory). Or you can use the next method: +.IP \(bu 4 +YourProjClass\->\fBfallback_language_classes()\fR +.Sp +\&\f(CW\*(C`get_handle\*(C'\fR appends the return value of this to the end +of the list of classes it will try using. Unless +you override this method, your project class +will inherit Locale::Maketext's \f(CW\*(C`fallback_language_classes\*(C'\fR, +which currently returns an empty list, \f(CW\*(C`()\*(C'\fR. +By setting this to some value (namely, the name of a loadable +language class), you can be sure that +\&\f(CW\*(C`get_handle\*(C'\fR will always manage to construct a language +handle. +.SS "The ""maketext"" Method" +.IX Subsection "The ""maketext"" Method" +This is the most important method in Locale::Maketext: +.PP +.Vb 1 +\& $text = $lh\->maketext(I<key>, ...parameters for this phrase...); +.Ve +.PP +This looks in the \f(CW%Lexicon\fR of the language handle +\&\f(CW$lh\fR and all its superclasses, looking +for an entry whose key is the string \fIkey\fR. Assuming such +an entry is found, various things then happen, depending on the +value found: +.PP +If the value is a scalarref, the scalar is dereferenced and returned +(and any parameters are ignored). +.PP +If the value is a coderef, we return &$value($lh, ...parameters...). +.PP +If the value is a string that \fIdoesn't\fR look like it's in Bracket Notation, +we return it (after replacing it with a scalarref, in its \f(CW%Lexicon\fR). 
+.PP +If the value \fIdoes\fR look like it's in Bracket Notation, then we compile +it into a sub, replace the string in the \f(CW%Lexicon\fR with the new coderef, +and then we return &$new_sub($lh, ...parameters...). +.PP +Bracket Notation is discussed in a later section. Note +that trying to compile a Bracket Notation string can throw +an exception if the string is not syntactically valid (say, by not +balancing brackets right). +.PP +Also, calling &$coderef($lh, ...parameters...) can throw any sort of +exception (if, say, code in that sub tries to divide by zero). But +a very common exception occurs when you have Bracket +Notation text that says to call a method "foo", but there is no such +method. (E.g., "You have [qua\fBtn\fR,_1,ball]." will throw an exception +on trying to call \f(CW$lh\fR\->qua\fBtn\fR($_[1],'ball') \-\- you presumably meant +"quant".) \f(CW\*(C`maketext\*(C'\fR catches these exceptions, but only to make the +error message more readable, at which point it rethrows the exception. +.PP +An exception \fImay\fR be thrown if \fIkey\fR is not found in any +of \f(CW$lh\fR's \f(CW%Lexicon\fR hashes. What happens if a key is not found +is discussed in a later section, "Controlling Lookup Failure". +.PP +Note that you might find it useful in some cases to override +the \f(CW\*(C`maketext\*(C'\fR method with an "after method", if you want to +translate encodings, or even scripts: +.PP +.Vb 7 +\& package YrProj::zh_cn; # Chinese with PRC\-style glyphs +\& use base (\*(AqYrProj::zh_tw\*(Aq); # Taiwan\-style +\& sub maketext { +\& my $self = shift(@_); +\& my $value = $self\->SUPER::maketext(@_); +\& return Chineeze::taiwan2mainland($value); +\& } +.Ve +.PP +Or you may want to override it with something that traps +any exceptions, if that's critical to your program: +.PP +.Vb 7 +\& sub maketext { +\& my($lh, @stuff) = @_; +\& my $out; +\& eval { $out = $lh\->SUPER::maketext(@stuff) }; +\& return $out unless $@; +\& ...otherwise deal with the exception... 
+\& } +.Ve +.PP +Other than those two situations, I don't imagine that +it's useful to override the \f(CW\*(C`maketext\*(C'\fR method. (If +you run into a situation where it is useful, I'd be +interested in hearing about it.) +.ie n .IP "$lh\->fail_with \fIor\fR $lh\->fail_with(\fIPARAM\fR)" 4 +.el .IP "\f(CW$lh\fR\->fail_with \fIor\fR \f(CW$lh\fR\->fail_with(\fIPARAM\fR)" 4 +.IX Item "$lh->fail_with or $lh->fail_with(PARAM)" +.PD 0 +.ie n .IP $lh\->failure_handler_auto 4 +.el .IP \f(CW$lh\fR\->failure_handler_auto 4 +.IX Item "$lh->failure_handler_auto" +.PD +These two methods are discussed in the section "Controlling +Lookup Failure". +.ie n .IP "$lh\->denylist(@list) <or> $lh\->blacklist(@list)" 4 +.el .IP "\f(CW$lh\fR\->denylist(@list) <or> \f(CW$lh\fR\->blacklist(@list)" 4 +.IX Item "$lh->denylist(@list) <or> $lh->blacklist(@list)" +.PD 0 +.ie n .IP "$lh\->allowlist(@list) <or> $lh\->whitelist(@list)" 4 +.el .IP "\f(CW$lh\fR\->allowlist(@list) <or> \f(CW$lh\fR\->whitelist(@list)" 4 +.IX Item "$lh->allowlist(@list) <or> $lh->whitelist(@list)" +.PD +These methods are discussed in the section "Bracket Notation +Security". +.SS "Utility Methods" +.IX Subsection "Utility Methods" +These are methods that you may find it handy to use, generally +from \f(CW%Lexicon\fR routines of yours (whether expressed as +Bracket Notation or not). 
+.ie n .IP "$language\->quant($number, $singular)" 4 +.el .IP "\f(CW$language\fR\->quant($number, \f(CW$singular\fR)" 4 +.IX Item "$language->quant($number, $singular)" +.PD 0 +.ie n .IP "$language\->quant($number, $singular, $plural)" 4 +.el .IP "\f(CW$language\fR\->quant($number, \f(CW$singular\fR, \f(CW$plural\fR)" 4 +.IX Item "$language->quant($number, $singular, $plural)" +.ie n .IP "$language\->quant($number, $singular, $plural, $negative)" 4 +.el .IP "\f(CW$language\fR\->quant($number, \f(CW$singular\fR, \f(CW$plural\fR, \f(CW$negative\fR)" 4 +.IX Item "$language->quant($number, $singular, $plural, $negative)" +.PD +This is generally meant to be called from inside Bracket Notation +(which is discussed later), as in +.Sp +.Vb 1 +\& "Your search matched [quant,_1,document]!" +.Ve +.Sp +It's for \fIquantifying\fR a noun (i.e., saying how much of it there is, +while giving the correct form of it). The behavior of this method is +handy for English and a few other Western European languages, and you +should override it for languages where it's not suitable. You can feel +free to read the source, but the current implementation is basically +as this pseudocode describes: +.Sp +.Vb 11 +\& if $number is 0 and there\*(Aqs a $negative, +\& return $negative; +\& elsif $number is 1, +\& return "1 $singular"; +\& elsif there\*(Aqs a $plural, +\& return "$number $plural"; +\& else +\& return "$number " . $singular . "s"; +\& # +\& # ...except that we actually call numf to +\& # stringify $number before returning it. +.Ve +.Sp +So for English (with Bracket Notation) +\&\f(CW"...[quant,_1,file]..."\fR is fine (for 0 it returns "0 files", +for 1 it returns "1 file", and for more it returns "2 files", etc.) +.Sp +But for "directory", you'd want \f(CW"[quant,_1,directory,directories]"\fR +so that our elementary \f(CW\*(C`quant\*(C'\fR method doesn't think that the +plural of "directory" is "directorys". 
And you might find that the +output may sound better if you specify a negative form, as in: +.Sp +.Vb 1 +\& "[quant,_1,file,files,No files] matched your query.\en" +.Ve +.Sp +Keep in mind verb agreement (or adjectives too, in +other languages), as in: +.Sp +.Vb 1 +\& "[quant,_1,document] were matched.\en" +.Ve +.Sp +Because if _1 is one, you get "1 document \fBwere\fR matched". +An acceptable hack here is to do something like this: +.Sp +.Vb 1 +\& "[quant,_1,document was,documents were] matched.\en" +.Ve +.ie n .IP $language\->numf($number) 4 +.el .IP \f(CW$language\fR\->numf($number) 4 +.IX Item "$language->numf($number)" +This returns the given number formatted nicely according to +this language's conventions. Maketext's default method is +mostly to just take the normal string form of the number +(applying sprintf "%G" for only very large numbers), and then +to add commas as necessary. (Except that +we apply \f(CW\*(C`tr/,./.,/\*(C'\fR if \f(CW$language\fR\->{'numf_comma'} is true; +that's a bit of a hack that's useful for languages that express +two million as "2.000.000" and not as "2,000,000"). +.Sp +If you want anything fancier, consider overriding this with something +that uses Number::Format, or does something else +entirely. +.Sp +Note that numf is called by quant for stringifying all quantifying +numbers. +.ie n .IP "$language\->numerate($number, $singular, $plural, $negative)" 4 +.el .IP "\f(CW$language\fR\->numerate($number, \f(CW$singular\fR, \f(CW$plural\fR, \f(CW$negative\fR)" 4 +.IX Item "$language->numerate($number, $singular, $plural, $negative)" +This returns the given noun form which is appropriate for the quantity +\&\f(CW$number\fR according to this language's conventions. \f(CW\*(C`numerate\*(C'\fR is +used internally by \f(CW\*(C`quant\*(C'\fR to quantify nouns. Use it directly \-\- +usually from bracket notation \-\- to avoid \f(CW\*(C`quant\*(C'\fR's implicit call to +\&\f(CW\*(C`numf\*(C'\fR and output of a numeric quantity. 
+.ie n .IP "$language\->sprintf($format, @items)" 4 +.el .IP "\f(CW$language\fR\->sprintf($format, \f(CW@items\fR)" 4 +.IX Item "$language->sprintf($format, @items)" +This is just a wrapper around Perl's normal \f(CW\*(C`sprintf\*(C'\fR function. +It's provided so that you can use "sprintf" in Bracket Notation: +.Sp +.Vb 1 +\& "Couldn\*(Aqt access datanode [sprintf,%10x=~[%s~],_1,_2]!\en" +.Ve +.Sp +returning... +.Sp +.Vb 1 +\& Couldn\*(Aqt access datanode Stuff=[thangamabob]! +.Ve +.ie n .IP $language\->\fBlanguage_tag()\fR 4 +.el .IP \f(CW$language\fR\->\fBlanguage_tag()\fR 4 +.IX Item "$language->language_tag()" +Currently this just takes the last bit of \f(CWref($language)\fR, turns +underscores to dashes, and returns it. So if \f(CW$language\fR is +an object of class Hee::HOO::Haw::en_us, \f(CW$language\fR\->\fBlanguage_tag()\fR +returns "en-us". (Yes, the usual representation for that language +tag is "en-US", but case is \fInever\fR considered meaningful in +language-tag comparison.) +.Sp +You may override this as you like; Maketext doesn't use it for +anything. +.ie n .IP $language\->\fBencoding()\fR 4 +.el .IP \f(CW$language\fR\->\fBencoding()\fR 4 +.IX Item "$language->encoding()" +Currently this isn't used for anything, but it's provided +(with default value of +\&\f(CW\*(C`(ref($language) && $language\->{\*(Aqencoding\*(Aq})) or "iso\-8859\-1"\*(C'\fR +) as a sort of suggestion that it may be useful/necessary to +associate encodings with your language handles (whether on a +per-class or even per-handle basis.) +.SS "Language Handle Attributes and Internals" +.IX Subsection "Language Handle Attributes and Internals" +A language handle is a flyweight object \-\- i.e., it doesn't (necessarily) +carry any data of interest, other than just being a member of +whatever class it belongs to. +.PP +A language handle is implemented as a blessed hash. Subclasses of yours +can store whatever data you want in the hash. 
Currently the only hash +entry used by any crucial Maketext method is "fail", so feel free to +use anything else as you like. +.PP +\&\fBRemember: Don't be afraid to read the Maketext source if there's +any point on which this documentation is unclear.\fR This documentation +is vastly longer than the module source itself. +.SH "LANGUAGE CLASS HIERARCHIES" +.IX Header "LANGUAGE CLASS HIERARCHIES" +These are Locale::Maketext's assumptions about the class +hierarchy formed by all your language classes: +.IP \(bu 4 +You must have a project base class, which you load, and +which you then use as the first argument in +the call to YourProjClass\->get_handle(...). It should derive +(whether directly or indirectly) from Locale::Maketext. +It \fBdoesn't matter\fR how you name this class, although assuming this +is the localization component of your Super Mega Program, +good names for your project class might be +SuperMegaProgram::Localization, SuperMegaProgram::L10N, +SuperMegaProgram::I18N, SuperMegaProgram::International, +or even SuperMegaProgram::Languages or SuperMegaProgram::Messages. +.IP \(bu 4 +Language classes are what YourProjClass\->get_handle will try to load. +It will look for them by taking each language-tag (\fBskipping\fR it +if it doesn't look like a language-tag or locale-tag!), turning it to +all lowercase, turning dashes to underscores, and appending it +to YourProjClass . "::". So this: +.Sp +.Vb 3 +\& $lh = YourProjClass\->get_handle( +\& \*(Aqen\-US\*(Aq, \*(Aqfr\*(Aq, \*(Aqkon\*(Aq, \*(Aqi\-klingon\*(Aq, \*(Aqi\-klingon\-romanized\*(Aq +\& ); +.Ve +.Sp +will try loading the classes +YourProjClass::en_us (note lowercase!), YourProjClass::fr, +YourProjClass::kon, +YourProjClass::i_klingon +and YourProjClass::i_klingon_romanized. (And it'll stop at the +first one that actually loads.) 
+.IP \(bu 4 +I assume that each language class derives (directly or indirectly) +from your project class, and also defines its \f(CW@ISA\fR, its \f(CW%Lexicon\fR, +or both. But I anticipate no dire consequences if these assumptions +do not hold. +.IP \(bu 4 +Language classes may derive from other language classes (although they +should have "use \fIThatclassname\fR" or "use base qw(\fI...classes...\fR)"). +They may derive from the project +class. They may derive from some other class altogether. Or via +multiple inheritance, they may derive from any mixture of these. +.IP \(bu 4 +I foresee no problems with having multiple inheritance in +your hierarchy of language classes. (As usual, however, Perl will +complain bitterly if you have a cycle in the hierarchy: i.e., if +any class is its own ancestor.) +.SH "ENTRIES IN EACH LEXICON" +.IX Header "ENTRIES IN EACH LEXICON" +A typical \f(CW%Lexicon\fR entry is meant to signify a phrase, +taking some number (0 or more) of parameters. An entry +is meant to be accessed via +a string \fIkey\fR in \f(CW$lh\fR\->maketext(\fIkey\fR, ...parameters...), +which should return a string that is generally meant to +be used for "output" to the user \-\- regardless of whether +this actually means printing to STDOUT, writing to a file, +or putting into a GUI widget. +.PP +While the key must be a string value (since that's a basic +restriction that Perl places on hash keys), the value in +the lexicon can currently be of several types: +a defined scalar, scalarref, or coderef. The use of these is +explained above, in the section 'The "maketext" Method', and +Bracket Notation for strings is discussed in the next section. +.PP +While you can use arbitrary unique IDs for lexicon keys +(like "_min_larger_max_error"), it is often +useful if an entry's key is itself a valid value, like +this example error message: +.PP +.Vb 1 +\& "Minimum ([_1]) is larger than maximum ([_2])!\en", +.Ve +.PP +Compare this code that uses an arbitrary ID... 
+.PP +.Vb 2 +\& die $lh\->maketext( "_min_larger_max_error", $min, $max ) +\& if $min > $max; +.Ve +.PP +\&...to this code that uses a key-as-value: +.PP +.Vb 4 +\& die $lh\->maketext( +\& "Minimum ([_1]) is larger than maximum ([_2])!\en", +\& $min, $max +\& ) if $min > $max; +.Ve +.PP +The second is, in short, more readable. In particular, it's obvious +that the number of parameters you're feeding to that phrase (two) is +the number of parameters that it \fIwants\fR to be fed. (Since you see +_1 and a _2 being used in the key there.) +.PP +Also, once a project is otherwise +complete and you start to localize it, you can scrape together +all the various keys you use, and pass them to a translator; and then +the translator's work will go faster if what he's presented is this: +.PP +.Vb 2 +\& "Minimum ([_1]) is larger than maximum ([_2])!\en", +\& => "", # fill in something here, Jacques! +.Ve +.PP +rather than this more cryptic mess: +.PP +.Vb 2 +\& "_min_larger_max_error" +\& => "", # fill in something here, Jacques +.Ve +.PP +I think that using keys that are themselves valid values makes the completed lexicon +entries more readable: +.PP +.Vb 2 +\& "Minimum ([_1]) is larger than maximum ([_2])!\en", +\& => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\en", +.Ve +.PP +Also, having valid values as keys becomes very useful if you set +up an _AUTO lexicon. _AUTO lexicons are discussed in a later +section. +.PP +I almost always use keys that are themselves +valid lexicon values. One notable exception is when the value is +quite long. For example, to get the screenful of data that +a command-line program might return when given an unknown switch, +I often just use a brief, self-explanatory key such as "_USAGE_MESSAGE". At that point I then go +and immediately define that lexicon entry in the +ProjectClass::L10N::en lexicon (since English is always my "project +language"): +.PP +.Vb 3 +\& \*(Aq_USAGE_MESSAGE\*(Aq => <<\*(AqEOSTUFF\*(Aq, +\& ...long long message... 
+\& EOSTUFF +.Ve +.PP +and then I can use it as: +.PP +.Vb 1 +\& getopt(\*(AqoDI\*(Aq, \e%opts) or die $lh\->maketext(\*(Aq_USAGE_MESSAGE\*(Aq); +.Ve +.PP +Incidentally, +note that each class's \f(CW%Lexicon\fR inherits-and-extends +the lexicons in its superclasses. This is not because these are +special hashes \fIper se\fR, but because you access them via the +\&\f(CW\*(C`maketext\*(C'\fR method, which looks for entries across all the +\&\f(CW%Lexicon\fR hashes in a language class \fIand\fR all its ancestor classes. +(This is because the idea of "class data" isn't directly implemented +in Perl, but is instead left to individual class-systems to implement +as they see fit.) +.PP +Note that you may have things stored in a lexicon +besides just phrases for output: for example, if your program +takes input from the keyboard, asking a "(Y/N)" question, +you probably need to know what the equivalent of "Y[es]/N[o]" is +in whatever language. You probably also need to know what +the equivalents of the answers "y" and "n" are. You can +store that information in the lexicon (say, under the keys +"~answer_y" and "~answer_n", and the long forms as +"~answer_yes" and "~answer_no", where "~" is just an ad-hoc +character meant to indicate to programmers/translators that +these are not phrases for output). +.PP +Or instead of storing this in the language class's lexicon, +you can (and, in some cases, really should) represent the same bit +of knowledge as code in a method in the language class. (That +leaves a tidy distinction between the lexicon as the things we +know how to \fIsay\fR, and the rest of the things in the language class +as things that we know how to \fIdo\fR.) 
Consider +this example of a processor for responses to French "oui/non" +questions: +.PP +.Vb 7 +\& sub y_or_n { +\& return undef unless defined $_[1] and length $_[1]; +\& my $answer = lc $_[1]; # smash case +\& return 1 if $answer eq \*(Aqo\*(Aq or $answer eq \*(Aqoui\*(Aq; +\& return 0 if $answer eq \*(Aqn\*(Aq or $answer eq \*(Aqnon\*(Aq; +\& return undef; +\& } +.Ve +.PP +\&...which you'd then call in a construct like this: +.PP +.Vb 7 +\& my $response; +\& until(defined $response) { +\& print $lh\->maketext("Open the pod bay door (y/n)? "); +\& $response = $lh\->y_or_n( get_input_from_keyboard_somehow() ); +\& } +\& if($response) { $pod_bay_door\->open() } +\& else { $pod_bay_door\->leave_closed() } +.Ve +.PP +Other data worth storing in a lexicon might be things like +filenames for language-targeted resources: +.PP +.Vb 10 +\& ... +\& "_main_splash_png" +\& => "/styles/en_us/main_splash.png", +\& "_main_splash_imagemap" +\& => "/styles/en_us/main_splash.incl", +\& "_general_graphics_path" +\& => "/styles/en_us/", +\& "_alert_sound" +\& => "/styles/en_us/hey_there.wav", +\& "_forward_icon" +\& => "left_arrow.png", +\& "_backward_icon" +\& => "right_arrow.png", +\& # In some other languages, left equals +\& # BACKwards, and right is FOREwards. +\& ... +.Ve +.PP +You might want to do the same thing for expressing key bindings +or the like (since hardwiring "q" as the binding for the function +that quits a screen/menu/program is useful only if your language +happens to associate "q" with "quit"!) +.SH "BRACKET NOTATION" +.IX Header "BRACKET NOTATION" +Bracket Notation is a crucial feature of Locale::Maketext. I mean +Bracket Notation to provide a replacement for the use of sprintf formatting. +Everything you do with Bracket Notation could be done with a sub block, +but bracket notation is meant to be much more concise. 
+.PP +Bracket Notation is like a miniature "template" system (in the sense +of Text::Template, not in the sense of C++ templates), +where normal text is passed thru basically as is, but text in special +regions is specially interpreted. In Bracket Notation, you use square brackets ("[...]"), +not curly braces ("{...}") to note sections that are specially interpreted. +.PP +For example, here all the areas that are taken literally are underlined with +a "^", and all the in-bracket special regions are underlined with an X: +.PP +.Vb 2 +\& "Minimum ([_1]) is larger than maximum ([_2])!\en", +\& ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^ +.Ve +.PP +When that string is compiled from bracket notation into a real Perl sub, +it's basically turned into: +.PP +.Vb 11 +\& sub { +\& my $lh = $_[0]; +\& my @params = @_; +\& return join \*(Aq\*(Aq, +\& "Minimum (", +\& ...some code here... +\& ") is larger than maximum (", +\& ...some code here... +\& ")!\en", +\& } +\& # to be called by $lh\->maketext(KEY, params...) +.Ve +.PP +In other words, text outside bracket groups is turned into string +literals. Text in brackets is rather more complex, and currently follows +these rules: +.IP \(bu 4 +Bracket groups that are empty, or which consist only of whitespace, +are ignored. (Examples: "[]", "[ ]", or a [ and a ] with returns +and/or tabs and/or spaces between them.) +.Sp +Otherwise, each group is taken to be a comma-separated group of items, +and each item is interpreted as follows: +.IP \(bu 4 +An item that is "_\fIdigits\fR" or "_\-\fIdigits\fR" is interpreted as +\&\f(CW$_\fR[\fIvalue\fR]. I.e., "_1" becomes \f(CW$_\fR[1], and "_\-3" is interpreted +as \f(CW$_\fR[\-3] (in which case \f(CW@_\fR should have at least three elements in it). +Note that \f(CW$_\fR[0] is the language handle, and is typically not named +directly. +.IP \(bu 4 +An item "_*" is interpreted to mean "all of \f(CW@_\fR except \f(CW$_\fR[0]". +I.e., \f(CW@_[1..$#_]\fR. 
Note that this is an empty list in the case +of calls like \f(CW$lh\fR\->maketext(\fIkey\fR) where there are no +parameters (except \f(CW$_\fR[0], the language handle). +.IP \(bu 4 +Otherwise, each item is interpreted as a string literal. +.PP +The group as a whole is interpreted as follows: +.IP \(bu 4 +If the first item in a bracket group looks like a method name, +then that group is interpreted like this: +.Sp +.Vb 3 +\& $lh\->that_method_name( +\& ...rest of items in this group... +\& ), +.Ve +.IP \(bu 4 +If the first item in a bracket group is "*", it's taken as shorthand +for the very commonly called "quant" method. Similarly, if the first +item in a bracket group is "#", it's taken to be shorthand for +"numf". +.IP \(bu 4 +If the first item in a bracket group is the empty-string, or "_*" +or "_\fIdigits\fR" or "_\-\fIdigits\fR", then that group is interpreted +as just the interpolation of all its items: +.Sp +.Vb 3 +\& join(\*(Aq\*(Aq, +\& ...rest of items in this group... +\& ), +.Ve +.Sp +Examples: "[_1]" and "[,_1]", which are synonymous; and +"\f(CW\*(C`[,ID\-(,_4,\-,_2,)]\*(C'\fR", which compiles as +\&\f(CW\*(C`join "", "ID\-(", $_[4], "\-", $_[2], ")"\*(C'\fR. +.IP \(bu 4 +Otherwise this bracket group is invalid. For example, in the group +"[!@#,whatever]", the first item \f(CW"!@#"\fR is neither the empty-string, +"_\fInumber\fR", "_\-\fInumber\fR", "_*", nor a valid method name; and so +Locale::Maketext will throw an exception if you try compiling an +expression containing this bracket group. +.PP +Note, incidentally, that items in each group are comma-separated, +not \f(CW\*(C`/\es*,\es*/\*(C'\fR\-separated. That is, you might expect that this +bracket group: +.PP +.Vb 1 +\& "Hoohah [foo, _1 , bar ,baz]!"
+.Ve +.PP +would compile to this: +.PP +.Vb 7 +\& sub { +\& my $lh = $_[0]; +\& return join \*(Aq\*(Aq, +\& "Hoohah ", +\& $lh\->foo( $_[1], "bar", "baz"), +\& "!", +\& } +.Ve +.PP +But it actually compiles as this: +.PP +.Vb 7 +\& sub { +\& my $lh = $_[0]; +\& return join \*(Aq\*(Aq, +\& "Hoohah ", +\& $lh\->foo(" _1 ", " bar ", "baz"), # note the <space> in " bar " +\& "!", +\& } +.Ve +.PP +In the notation discussed so far, the characters "[" and "]" are given +special meaning, for opening and closing bracket groups, and "," has +a special meaning inside bracket groups, where it separates items in the +group. This begs the question of how you'd express a literal "[" or +"]" in a Bracket Notation string, and how you'd express a literal +comma inside a bracket group. For this purpose I've adopted "~" (tilde) +as an escape character: "~[" means a literal '[' character anywhere +in Bracket Notation (i.e., regardless of whether you're in a bracket +group or not), and ditto for "~]" meaning a literal ']', and "~," meaning +a literal comma. (Altho "," means a literal comma outside of +bracket groups \-\- it's only inside bracket groups that commas are special.) +.PP +And on the off chance you need a literal tilde in a bracket expression, +you get it with "~~". +.PP +Currently, an unescaped "~" before a character +other than a bracket or a comma is taken to mean just a "~" and that +character. I.e., "~X" means the same as "~~X" \-\- i.e., one literal tilde, +and then one literal "X". However, by using "~X", you are assuming that +no future version of Maketext will use "~X" as a magic escape sequence. +In practice this is not a great problem, since first off you can just +write "~~X" and not worry about it; second off, I doubt I'll add lots +of new magic characters to bracket notation; and third off, you +aren't likely to want literal "~" characters in your messages anyway, +since it's not a character with wide use in natural language text. 
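The whitespace and escaping rules above can be tried out with a short script. This is only a sketch: the classes Demo::Esc::L10N and Demo::Esc::L10N::en and the helper method show are hypothetical names made up for the illustration, and the lexicon is assumed to be an _AUTO lexicon so that each key compiles on first use.

```perl
use strict;
use warnings;

package Demo::Esc::L10N;
use parent 'Locale::Maketext';

# Hypothetical helper method: joins its arguments with "|" so that the
# exact items received from a bracket group become visible.
sub show { my ($lh, @items) = @_; return join '|', @items }

package Demo::Esc::L10N::en;
use parent -norequire, 'Demo::Esc::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Esc::L10N->get_handle('en') or die "No language handle";

# "~[" and "~]" give literal brackets around the interpolated [_1].
my $brackets = $lh->maketext('Press ~[[_1]~] to continue', 'Enter');

# "~," keeps a comma inside one item instead of splitting the group.
my $comma = $lh->maketext('[show,a~,b,c]');

# Whitespace is kept: " _1 " is a literal string, not parameter 1.
my $spaces = $lh->maketext('[show, _1 , bar ,baz]', 'ignored');

# "~~" is one literal tilde.
my $tilde = $lh->maketext('about ~~[_1] files', 20);

print "$_\n" for $brackets, $comma, $spaces, $tilde;
```

The third call is the one to watch: because items are split on bare commas, the parameter passed to maketext is never used.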
+.PP +Brackets must be balanced \-\- every openbracket must have +one matching closebracket, and vice versa. So these are all \fBinvalid\fR: +.PP +.Vb 4 +\& "I ate [quant,_1,rhubarb pie." +\& "I ate [quant,_1,rhubarb pie[." +\& "I ate quant,_1,rhubarb pie]." +\& "I ate quant,_1,rhubarb pie[." +.Ve +.PP +Currently, bracket groups do not nest. That is, you \fBcannot\fR say: +.PP +.Vb 1 +\& "Foo [bar,baz,[quux,quuux]]\en"; +.Ve +.PP +If you need a notation that's that powerful, use normal Perl: +.PP +.Vb 11 +\& %Lexicon = ( +\& ... +\& "some_key" => sub { +\& my $lh = $_[0]; +\& join \*(Aq\*(Aq, +\& "Foo ", +\& $lh\->bar(\*(Aqbaz\*(Aq, $lh\->quux(\*(Aqquuux\*(Aq)), +\& "\en", +\& }, +\& ... +\& ); +.Ve +.PP +Or write the "bar" method so you don't need to pass it the +output from calling quux. +.PP +I do not anticipate that you will need (or particularly want) +to nest bracket groups, but you are welcome to email me with +convincing (real-life) arguments to the contrary. +.SH "BRACKET NOTATION SECURITY" +.IX Header "BRACKET NOTATION SECURITY" +Locale::Maketext does not use any special syntax to differentiate +bracket notation methods from normal class or object methods. This +design makes it vulnerable to format string attacks whenever it is +used to process strings provided by untrusted users. +.PP +Locale::Maketext does support denylist and allowlist functionality +to limit which methods may be called as bracket notation methods. +.PP +By default, Locale::Maketext denies all methods in the +Locale::Maketext namespace that begin with the '_' character, +and all methods which include Perl's namespace separator characters. 
+.PP +The default denylist for Locale::Maketext also prevents use of the +following methods in bracket notation: +.PP +.Vb 10 +\& denylist +\& encoding +\& fail_with +\& failure_handler_auto +\& fallback_language_classes +\& fallback_languages +\& get_handle +\& init +\& language_tag +\& maketext +\& new +\& allowlist +\& whitelist +\& blacklist +.Ve +.PP +This list can be extended by either deny-listing additional "known bad" +methods, or allow-listing only "known good" methods. +.PP +To prevent specific methods from being called in bracket notation, use +the \fBdenylist()\fR method: +.PP +.Vb 3 +\& my $lh = MyProgram::L10N\->get_handle(); +\& $lh\->denylist(qw{my_internal_method my_other_method}); +\& $lh\->maketext(\*(Aq[my_internal_method]\*(Aq); # dies +.Ve +.PP +To limit the allowed bracket notation methods to a specific list, use the +\&\fBallowlist()\fR method: +.PP +.Vb 4 +\& my $lh = MyProgram::L10N\->get_handle(); +\& $lh\->allowlist(\*(Aqnumerate\*(Aq, \*(Aqnumf\*(Aq); +\& $lh\->maketext(\*(Aq[_1] [numerate,_1,shoe,shoes]\*(Aq, 12); # works +\& $lh\->maketext(\*(Aq[my_internal_method]\*(Aq); # dies +.Ve +.PP +The \fBdenylist()\fR and \fBallowlist()\fR methods extend their internal lists +whenever they are called. To reset the denylist or allowlist, create +a new maketext object. +.PP +.Vb 4 +\& my $lh = MyProgram::L10N\->get_handle(); +\& $lh\->denylist(\*(Aqnumerate\*(Aq); +\& $lh\->denylist(\*(Aqnumf\*(Aq); +\& $lh\->maketext(\*(Aq[_1] [numerate,_1,shoe,shoes]\*(Aq, 12); # dies +.Ve +.PP +For lexicons that use an internal cache, translations which have already +been cached in their compiled form are not affected by subsequent changes +to the allowlist or denylist settings. Lexicons that use an external +cache will have their cache cleared whenever the allowlist or denylist +settings change. The difference between the two types of caching is explained +in the "Readonly Lexicons" section.
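As a rough illustration of the format string concern, the sketch below contrasts passing untrusted text as a parameter (where it is never compiled as bracket notation) with passing it as the key (where only the denylist stands between the text and a method call). The classes Demo::Sec::L10N and Demo::Sec::L10N::en and the method internal_secret are hypothetical; the fallback to blacklist() is an assumption made so the sketch also runs on older releases that lack denylist().

```perl
use strict;
use warnings;

package Demo::Sec::L10N;
use parent 'Locale::Maketext';

# Hypothetical application method that should never be reachable
# from a message string.
sub internal_secret { return 'TOP-SECRET' }

package Demo::Sec::L10N::en;
use parent -norequire, 'Demo::Sec::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Sec::L10N->get_handle('en') or die "No language handle";

# Older releases only provide blacklist(); prefer denylist() when present.
my $deny = $lh->can('denylist') || $lh->can('blacklist');
$lh->$deny('internal_secret');

my $untrusted = '[internal_secret]';    # pretend this came from a user

# Safer: the untrusted text is a parameter, so it is never compiled.
my $safe = $lh->maketext('You said: [_1]', $untrusted);

# Risky pattern: the untrusted text is the key. Here the denylist makes
# the compile step die instead of calling internal_secret().
my $result  = eval { $lh->maketext($untrusted) };
my $blocked = defined $result ? 0 : 1;

print "$safe\n", ($blocked ? "blocked\n" : "NOT blocked\n");
```

Keeping user-supplied text in the parameter list is the simpler habit; the denylist and allowlist are a second line of defence for the cases where keys cannot be fully trusted.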
+.PP +Methods disallowed by the denylist cannot be permitted by the +allowlist. +.PP +NOTE: \fBdenylist()\fR is the preferred method name to use instead of the +historical and non-inclusive method \fBblacklist()\fR. \fBblacklist()\fR may be +removed in a future release of this package and so its use should be +discontinued. +.PP +NOTE: \fBallowlist()\fR is the preferred method name to use instead of the +historical and non-inclusive method \fBwhitelist()\fR. \fBwhitelist()\fR may be +removed in a future release of this package and so its use should be +discontinued. +.SH "AUTO LEXICONS" +.IX Header "AUTO LEXICONS" +If maketext goes to look in an individual \f(CW%Lexicon\fR for an entry +for \fIkey\fR (where \fIkey\fR does not start with an underscore), and +sees none, \fBbut does see\fR an entry of "_AUTO" => \fIsome_true_value\fR, +then we actually define \f(CW$Lexicon\fR{\fIkey\fR} = \fIkey\fR right then and there, +and then use that value as if it had been there all +along. This happens before we even look in any superclass \f(CW%Lexicons\fR! +.PP +(This is meant to be somewhat like the AUTOLOAD mechanism in +Perl's function call system \-\- or, looked at another way, +like the AutoLoader module.) +.PP +I can picture all sorts of circumstances where you just +do not want lookup to be able to fail (since failing +normally means that maketext throws a \f(CW\*(C`die\*(C'\fR, although +see the "Controlling Lookup Failure" section for greater control over that). But +here's one circumstance where _AUTO lexicons are meant to +be \fIespecially\fR useful: +.PP +As you're writing an application, you decide as you go what messages +you need to emit.
Normally you'd go to write this: +.PP +.Vb 5 +\& if(\-e $filename) { +\& go_process_file($filename) +\& } else { +\& print qq{Couldn\*(Aqt find file "$filename"!\en}; +\& } +.Ve +.PP +but since you anticipate localizing this, you write: +.PP +.Vb 10 +\& use ThisProject::I18N; +\& my $lh = ThisProject::I18N\->get_handle(); +\& # For the moment, assume that things are set up so +\& # that we load class ThisProject::I18N::en +\& # and that that\*(Aqs the class that $lh belongs to. +\& ... +\& if(\-e $filename) { +\& go_process_file($filename) +\& } else { +\& print $lh\->maketext( +\& qq{Couldn\*(Aqt find file "[_1]"!\en}, $filename +\& ); +\& } +.Ve +.PP +Now, right after you've just written the above lines, you'd +normally have to go open the file +ThisProject/I18N/en.pm, and immediately add an entry: +.PP +.Vb 2 +\& "Couldn\*(Aqt find file \e"[_1]\e"!\en" +\& => "Couldn\*(Aqt find file \e"[_1]\e"!\en", +.Ve +.PP +But I consider that somewhat of a distraction from the work +of getting the main code working \-\- to say nothing of the fact +that I often have to play with the program a few times before +I can decide exactly what wording I want in the messages (which +in this case would require me to go changing three lines of code: +the call to maketext with that key, and then the two lines in +ThisProject/I18N/en.pm). +.PP +However, if you set "_AUTO => 1" in the \f(CW%Lexicon\fR in +ThisProject/I18N/en.pm (assuming that English (en) is +the language that all your programmers will be using for this +project's internal message keys), then you don't ever have to +go adding lines like this +.PP +.Vb 2 +\& "Couldn\*(Aqt find file \e"[_1]\e"!\en" +\& => "Couldn\*(Aqt find file \e"[_1]\e"!\en", +.Ve +.PP +to ThisProject/I18N/en.pm, because if _AUTO is true there, +then just looking for an entry with the key "Couldn't find +file \e"[_1]\e"!\en" in that lexicon will cause it to be added, +with that value!
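The behaviour described above can be observed directly. In this minimal sketch the classes Demo::Auto::L10N and Demo::Auto::L10N::en are hypothetical stand-ins for ThisProject::I18N and its English class, defined inline so the script is self-contained.

```perl
use strict;
use warnings;

package Demo::Auto::L10N;
use parent 'Locale::Maketext';

package Demo::Auto::L10N::en;
use parent -norequire, 'Demo::Auto::L10N';
our %Lexicon = ( '_AUTO' => 1 );    # nothing else is defined up front

package main;

my $lh  = Demo::Auto::L10N->get_handle('en') or die "No language handle";
my $key = qq{Couldn't find file "[_1]"!\n};

# The key is absent until the first lookup, which adds it on the fly.
my $known_before = exists $Demo::Auto::L10N::en::Lexicon{$key} ? 1 : 0;
my $message      = $lh->maketext($key, 'notes.txt');
my $known_after  = exists $Demo::Auto::L10N::en::Lexicon{$key} ? 1 : 0;

# Keys with a leading underscore are never auto-made, so this lookup
# fails and maketext dies (no "fail" attribute is set here).
my $underscore_ok = eval { $lh->maketext('_not_autoable'); 1 } ? 1 : 0;

print $message;
print "before=$known_before after=$known_after underscore_ok=$underscore_ok\n";
```

Note that with the default internal cache the entry added to the lexicon holds the compiled form of the key, not the raw string.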
+.PP +Note that the reason that keys that start with "_" +are immune to _AUTO isn't anything generally magical about +the underscore character \-\- I just wanted a way to have most +lexicon keys be autoable, except for possibly a few, and I +arbitrarily decided to use a leading underscore as a signal +to distinguish those few. +.SH "READONLY LEXICONS" +.IX Header "READONLY LEXICONS" +If your lexicon is a tied hash, the simple act of caching the compiled value can be fatal. +.PP +For example, a GDBM_File GDBM_READER tied hash will die with something like: +.PP +.Vb 1 +\& gdbm store returned \-1, errno 2, key "..." at ... +.Ve +.PP +All you need to do is turn on caching outside of the lexicon hash itself like so: +.PP +.Vb 6 +\& sub init { +\& my ($lh) = @_; +\& ... +\& $lh\->{\*(Aquse_external_lex_cache\*(Aq} = 1; +\& ... +\& } +.Ve +.PP +And then instead of storing the compiled value in the lexicon hash it will store it in \f(CW$lh\fR\->{'_external_lex_cache'}. +.SH "CONTROLLING LOOKUP FAILURE" +.IX Header "CONTROLLING LOOKUP FAILURE" +If you call \f(CW$lh\fR\->maketext(\fIkey\fR, ...parameters...), +and there's no entry \fIkey\fR in \f(CW$lh\fR's class's \f(CW%Lexicon\fR, nor +in the superclass \f(CW%Lexicon\fR hash, \fIand\fR if we can't auto-make +\&\fIkey\fR (because either it starts with a "_", or because none +of its lexicons have \f(CW\*(C`_AUTO => 1,\*(C'\fR), then we have +failed to find a normal way to maketext \fIkey\fR. What then +happens in these failure conditions depends on the \f(CW$lh\fR object's +"fail" attribute. +.PP +If the language handle has no "fail" attribute, maketext +will simply throw an exception (i.e., it calls \f(CW\*(C`die\*(C'\fR, mentioning +the \fIkey\fR whose lookup failed, and naming the line number where +the calling \f(CW$lh\fR\->maketext(\fIkey\fR,...) was). +.PP +If the language handle has a "fail" attribute whose value is a +coderef, then \f(CW$lh\fR\->maketext(\fIkey\fR,...params...)
gives up and calls: +.PP +.Vb 1 +\& return $that_subref\->($lh, $key, @params); +.Ve +.PP +Otherwise, the "fail" attribute's value should be a string denoting +a method name, so that \f(CW$lh\fR\->maketext(\fIkey\fR,...params...) can +give up with: +.PP +.Vb 1 +\& return $lh\->$that_method_name($key, @params); +.Ve +.PP +The "fail" attribute can be accessed with the \f(CW\*(C`fail_with\*(C'\fR method: +.PP +.Vb 2 +\& # Set to a coderef: +\& $lh\->fail_with( \e&failure_handler ); +\& +\& # Set to a method name: +\& $lh\->fail_with( \*(Aqfailure_method\*(Aq ); +\& +\& # Set to nothing (i.e., so failure throws a plain exception) +\& $lh\->fail_with( undef ); +\& +\& # Get the current value +\& $handler = $lh\->fail_with(); +.Ve +.PP +Now, as to what you may want to do with these handlers: Maybe you'd +want to log what key failed for what class, and then die. Maybe +you don't like \f(CW\*(C`die\*(C'\fR and instead you want to send the error message +to STDOUT (or wherever) and then merely \f(CWexit()\fR. +.PP +Or maybe you don't want to \f(CW\*(C`die\*(C'\fR at all! Maybe you could use a +handler like this: +.PP +.Vb 10 +\& # Make all lookups fall back onto an English value, +\& # but only after we log it for later fingerpointing. +\& my $lh_backup = ThisProject\->get_handle(\*(Aqen\*(Aq); +\& open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!"; +\& sub lex_fail { +\& my($failing_lh, $key, @params) = @_; +\& print LEX_FAIL_LOG scalar(localtime), "\et", +\& ref($failing_lh), "\et", $key, "\en"; +\& return $lh_backup\->maketext($key,@params); +\& } +.Ve +.PP +Some users have expressed that they think this whole mechanism of +having a "fail" attribute at all seems a rather pointless complication. +But I want Locale::Maketext to be usable for software projects of \fIany\fR +scale and type; and different software projects have different ideas +of what the right thing is to do in failure conditions.
I could simply +say that failure always throws an exception, and that if you want to be +careful, you'll just have to wrap every call to \f(CW$lh\fR\->maketext in an +eval\ {\ }. However, I want programmers to reserve the right (via +the "fail" attribute) to treat lookup failure as something other than +an exception of the same level of severity as a config file being +unreadable, or some essential resource being inaccessible. +.PP +One possibly useful value for the "fail" attribute is the method name +"failure_handler_auto". This is a method defined in the class +Locale::Maketext itself. You set it with: +.PP +.Vb 1 +\& $lh\->fail_with(\*(Aqfailure_handler_auto\*(Aq); +.Ve +.PP +Then when you call \f(CW$lh\fR\->maketext(\fIkey\fR, ...parameters...) and +there's no \fIkey\fR in any of those lexicons, maketext gives up with +.PP +.Vb 1 +\& return $lh\->failure_handler_auto($key, @params); +.Ve +.PP +But failure_handler_auto, instead of dying or anything, compiles +\&\f(CW$key\fR, caching it in +.PP +.Vb 1 +\& $lh\->{\*(Aqfailure_lex\*(Aq}{$key} = $compiled +.Ve +.PP +and then calls the compiled value, and returns that. (I.e., if +\&\f(CW$key\fR looks like bracket notation, \f(CW$compiled\fR is a sub, and we return +&{$compiled}(@params); but if \f(CW$key\fR is just a plain string, we just +return that.) +.PP +The effect of using "failure_handler_auto" +is like an _AUTO lexicon, except that it 1) compiles \f(CW$key\fR even if +it starts with "_", and 2) you have a record in the new hashref +\&\f(CW$lh\fR\->{'failure_lex'} of all the keys that have failed for +this object. This should avoid your program dying \-\- as long +as your keys aren't actually invalid as bracket code, and as +long as they don't try calling methods that don't exist. +.PP +"failure_handler_auto" may not be exactly what you want, but I +hope it at least shows you that maketext failure can be mitigated +in any number of very flexible ways.
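As a sketch of the two kinds of "fail" value discussed above, the script below first uses the built-in failure_handler_auto method and then a custom coderef. The classes Demo::Fail::L10N and Demo::Fail::L10N::en, the message keys, and the placeholder text returned by the coderef are all hypothetical choices for the illustration.

```perl
use strict;
use warnings;

package Demo::Fail::L10N;
use parent 'Locale::Maketext';

package Demo::Fail::L10N::en;
use parent -norequire, 'Demo::Fail::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );    # deliberately not an _AUTO lexicon

package main;

my $lh = Demo::Fail::L10N->get_handle('en') or die "No language handle";

# 1. Built-in handler: compile the missing key and remember it.
$lh->fail_with('failure_handler_auto');
my $auto     = $lh->maketext('No entry for [_1]', 'this');
my $recorded = exists $lh->{'failure_lex'}{'No entry for [_1]'} ? 1 : 0;

# 2. Custom coderef: called as $coderef->($lh, $key, @params).
$lh->fail_with(sub {
    my ($failing_lh, $key, @params) = @_;
    return "<<missing: $key>>";
});
my $custom = $lh->maketext('Another missing key');

print "$auto\n$custom\n";
```

The coderef form is the place to add logging or a fallback to another language handle, along the lines of the lex_fail example.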
If you can formalize exactly +what you want, you should be able to express that as a failure +handler. You can even make it default for every object of a given +class, by setting it in that class's init: +.PP +.Vb 9 +\& sub init { +\& my $lh = $_[0]; # a newborn handle +\& $lh\->SUPER::init(); +\& $lh\->fail_with(\*(Aqmy_clever_failure_handler\*(Aq); +\& return; +\& } +\& sub my_clever_failure_handler { +\& ...your clever things here... +\& } +.Ve +.SH "HOW TO USE MAKETEXT" +.IX Header "HOW TO USE MAKETEXT" +Here is a brief checklist on how to use Maketext to localize +applications: +.IP \(bu 4 +Decide what system you'll use for lexicon keys. If you insist, +you can use opaque IDs (if you're nostalgic for \f(CW\*(C`catgets\*(C'\fR), +but I have better suggestions in the +section "Entries in Each Lexicon", above. Assuming you opt for +meaningful keys that double as values (like "Minimum ([_1]) is +larger than maximum ([_2])!\en"), you'll have to settle on what +language those should be in. For the sake of argument, I'll +call this English, specifically American English, "en-US". +.IP \(bu 4 +Create a class for your localization project. This is +the name of the class that you'll use in the idiom: +.Sp +.Vb 2 +\& use Projname::L10N; +\& my $lh = Projname::L10N\->get_handle(...) || die "Language?"; +.Ve +.Sp +Assuming you call your class Projname::L10N, create a class +consisting minimally of: +.Sp +.Vb 3 +\& package Projname::L10N; +\& use base qw(Locale::Maketext); +\& ...any methods you might want all your languages to share... +\& +\& # And, assuming you want the base class to be an _AUTO lexicon, +\& # as is discussed a few sections up: +\& our %Lexicon = (\*(Aq_AUTO\*(Aq => 1); +\& +\& 1; +.Ve +.IP \(bu 4 +Create a class for the language your internal keys are in. Name +the class after the language-tag for that language, in lowercase, +with dashes changed to underscores. Assuming your project's first +language is US English, you should call this Projname::L10N::en_us.
+It should consist minimally of: +.Sp +.Vb 6 +\& package Projname::L10N::en_us; +\& use base qw(Projname::L10N); +\& our %Lexicon = ( +\& \*(Aq_AUTO\*(Aq => 1, +\& ); +\& 1; +.Ve +.Sp +(For the rest of this section, I'll assume that this "first +language class" of Projname::L10N::en_us has an +_AUTO lexicon.) +.IP \(bu 4 +Go and write your program. Everywhere in your program where +you would say: +.Sp +.Vb 1 +\& print "Foobar $thing stuff\en"; +.Ve +.Sp +instead do it thru maketext, using no variable interpolation in +the key: +.Sp +.Vb 1 +\& print $lh\->maketext("Foobar [_1] stuff\en", $thing); +.Ve +.Sp +If you get tired of constantly saying \f(CW\*(C`print $lh\->maketext\*(C'\fR, +consider making a functional wrapper for it, like so: +.Sp +.Vb 7 +\& use Projname::L10N; +\& our $lh; +\& $lh = Projname::L10N\->get_handle(...) || die "Language?"; +\& sub pmt (@) { print( $lh\->maketext(@_)) } +\& # "pmt" is short for "Print MakeText" +\& $Carp::Verbose = 1; +\& # so if maketext fails, we see what made the call to pmt +.Ve +.Sp +Besides whole phrases meant for output, anything language-dependent +should be put into the class Projname::L10N::en_us, +whether as methods, or as lexicon entries \-\- this is discussed +in the section "Entries in Each Lexicon", above. +.IP \(bu 4 +Once the program is otherwise done, and once its localization for +the first language works right (via the data and methods in +Projname::L10N::en_us), you can get together the data for translation. +If your first language lexicon isn't an _AUTO lexicon, then you already +have all the messages explicitly in the lexicon (or else you'd be +getting exceptions thrown when you call \f(CW$lh\fR\->maketext to get +messages that aren't in there). But if you were (advisedly) lazy and are +using an _AUTO lexicon, then you've got to make a list of all the phrases +that you've so far been letting _AUTO generate for you. There are very +many ways to assemble such a list.
The most straightforward is to simply +grep the source for every occurrence of "maketext" (or calls +to wrappers around it, like the above \f(CW\*(C`pmt\*(C'\fR function), and to log the +following phrase. +.IP \(bu 4 +You may at this point want to consider whether your base class +(Projname::L10N), from which all lexicons inherit (Projname::L10N::en, +Projname::L10N::es, etc.), should be an _AUTO lexicon. It may be true +that in theory, all needed messages will be in each language class; +but in the presumably unlikely or "impossible" case of lookup failure, +you should consider whether your program should throw an exception, +emit text in English (or whatever your project's first language is), +or adopt some more complex solution as described in the section +"Controlling Lookup Failure", above. +.IP \(bu 4 +Submit all messages/phrases/etc. to translators. +.Sp +(You may, in fact, want to start with localizing to \fIone\fR other language +at first, if you're not sure that you've properly abstracted the +language-dependent parts of your code.) +.Sp +Translators may request clarification of the situation in which a +particular phrase is found. For example, in English we are entirely happy +saying "\fIn\fR files found", regardless of whether we mean "I looked for files, +and found \fIn\fR of them" or the rather distinct situation of "I looked for +something else (like lines in files), and along the way I saw \fIn\fR +files." This may involve rethinking things that you thought quite clear: +should "Edit" on a toolbar be a noun ("editing") or a verb ("to edit")? Is +there already a conventionalized way to express that menu option, separate +from the target language's normal word for "to edit"? +.Sp +In all cases where the very common phenomenon of quantification +(saying "\fIN\fR files", for \fBany\fR value of N) +is involved, each translator should make clear what dependencies the +number causes in the sentence.
In many cases, dependency is +limited to words adjacent to the number, in places where you might +expect them ("I found the\-?PLURAL \fIN\fR +empty\-?PLURAL directory\-?PLURAL"), but in some cases there are +unexpected dependencies ("I found\-?PLURAL ..."!) as well as long-distance +dependencies ("The \fIN\fR directory\-?PLURAL could not be deleted\-?PLURAL"!). +.Sp +Remind the translators to consider the case where N is 0: +"0 files found" isn't exactly natural-sounding in any language, but it +may be unacceptable in many \-\- or it may condition special +kinds of agreement (similar to English "I didN'T find ANY files"). +.Sp +Remember to ask your translators about numeral formatting in their +language, so that you can override the \f(CW\*(C`numf\*(C'\fR method as +appropriate. Typical variables in number formatting are: what to +use as a decimal point (comma? period?); what to use as a thousands +separator (space? nonbreaking space? comma? period? small +middot? prime? apostrophe?); and even whether the so-called "thousands +separator" is actually for every third digit \-\- I've heard reports of +two hundred thousand being expressible as "2,00,000" for some Indian +(Subcontinental) languages, besides the less surprising "200\ 000", +"200.000", "200,000", and "200'000". Also, using a set of numeral +glyphs other than the usual ASCII "0"\-"9" might be appreciated, as via +\&\f(CW\*(C`tr/0\-9/\ex{0966}\-\ex{096F}/\*(C'\fR for getting digits in Devanagari script +(for Hindi, Konkani, others). +.Sp +The basic \f(CW\*(C`quant\*(C'\fR method that Locale::Maketext provides should be +good for many languages. For some languages, it might be useful +to modify it (or its constituent \f(CW\*(C`numerate\*(C'\fR method) +to take a plural form in the two-argument call to \f(CW\*(C`quant\*(C'\fR +(as in "[quant,_1,files]") if +it's all-around easier to infer the singular form from the plural, than +to infer the plural form from the singular.
+.Sp +But for other languages (as is discussed at length +in Locale::Maketext::TPJ13), simple +\&\f(CW\*(C`quant\*(C'\fR/\f(CW\*(C`numf\*(C'\fR is not enough. For the particularly problematic +Slavic languages, what you may need is a method which you provide +with the number, the citation form of the noun to quantify, and +the case and gender that the sentence's syntax projects onto that +noun slot. The method would then be responsible for determining +what grammatical number that numeral projects onto its noun phrase, +and what case and gender it may override the normal case and gender +with; and then it would look up the noun in a lexicon providing +all needed inflected forms. +.IP \(bu 4 +You may also wish to discuss with the translators the question of +how to relate different subforms of the same language tag, +considering how this reacts with \f(CW\*(C`get_handle\*(C'\fR's treatment of +these. For example, if a user accepts interfaces in "en, fr", and +you have interfaces available in "en-US" and "fr", what should +they get? You may wish to resolve this by establishing that "en" +and "en-US" are effectively synonymous, by having one class +zero-derive from the other. +.Sp +For some languages this issue may never come up (Danish is rarely +expressed as "da-DK", but instead is just "da"). And for other +languages, the whole concept of a "generic" form may verge on +being uselessly vague, particularly for interfaces involving voice +media in forms of Arabic or Chinese. +.IP \(bu 4 +Once you've localized your program/site/etc. for all desired +languages, be sure to show the result (whether live, or via +screenshots) to the translators. Once they approve, make every +effort to have it then checked by at least one other speaker of +that language. This holds true even when (or especially when) the +translation is done by one of your own programmers. 
Some +kinds of systems may be harder to find testers for than others, +depending on the amount of domain-specific jargon and concepts +involved \-\- it's easier to find people who can tell you whether +they approve of your translation for "delete this message" in an +email-via-Web interface, than to find people who can give you +an informed opinion on your translation for "attribute value" +in an XML query tool's interface. +.SH "SEE ALSO" +.IX Header "SEE ALSO" +I recommend reading all of these: +.PP +Locale::Maketext::TPJ13 \-\- my \fIThe Perl +Journal\fR article about Maketext. It explains many important concepts +underlying Locale::Maketext's design, and gives some insight into why +Maketext is better than the plain old approach of having +message catalogs that are just databases of sprintf formats. +.PP +File::Findgrep is a sample application/module +that uses Locale::Maketext to localize its messages. For a larger +internationalized system, see also Apache::MP3. +.PP +I18N::LangTags. +.PP +Win32::Locale. +.PP +RFC 3066, \fITags for the Identification of Languages\fR, +is at <http://sunsite.dk/RFC/rfc/rfc3066.html> +.PP +RFC 2277, \fIIETF Policy on Character Sets and Languages\fR +is at <http://sunsite.dk/RFC/rfc/rfc2277.html> \-\- much of it is +just things of interest to protocol designers, but it explains +some basic concepts, like the distinction between locales and +language-tags. +.PP +The manual for GNU \f(CW\*(C`gettext\*(C'\fR. The gettext dist is available in +\&\f(CW\*(C`<ftp://prep.ai.mit.edu/pub/gnu/>\*(C'\fR \-\- get +a recent gettext tarball and look in its "doc/" directory, there's +an easily browsable HTML version in there. The +gettext documentation asks lots of questions worth thinking +about, even if some of their answers are sometimes wonky, +particularly where they start talking about pluralization. +.PP +The Locale/Maketext.pm source. Observe that the module is much +shorter than its documentation!
+.SH "COPYRIGHT AND DISCLAIMER" +.IX Header "COPYRIGHT AND DISCLAIMER" +Copyright (c) 1999\-2004 Sean M. Burke. All rights reserved. +.PP +This library is free software; you can redistribute it and/or modify +it under the same terms as Perl itself. +.PP +This program is distributed in the hope that it will be useful, but +without any warranty; without even the implied warranty of +merchantability or fitness for a particular purpose. +.SH AUTHOR +.IX Header "AUTHOR" +Sean M. Burke \f(CW\*(C`sburke@cpan.org\*(C'\fR