.. role:: js(code)
   :language: javascript

=================
Locale management
=================

A locale is a combination of language, region, script, and regional preferences the
user wants to format their data into.

There are multiple models of locale data structures in the industry that have varying degrees
of compatibility with each other. Historically, each major platform has used its own,
and many standards bodies have provided conflicting proposals.

Mozilla, alongside most modern platforms, follows Unicode and W3C recommendations
and conforms to a standard known as `BCP 47`_, which describes a low-level textual
representation of a locale known as a *language tag*.

A few examples of language tags: *en-US*, *de*, *ar*, *zh-Hans*, *es-CL*.

Locales and Language Tags
=========================

A locale data structure consists of four primary fields:

 - Language (Example: English - *en*, French - *fr*, Serbian - *sr*)
 - Script (Example: Latin - *Latn*, Cyrillic - *Cyrl*)
 - Region (Example: United States - *US*, Canada - *CA*, Russia - *RU*)
 - Variants (Example: Mac OS - *macos*, Windows - *windows*, Linux - *linux*)

`BCP 47`_ specifies the syntax for each of those fields (called subtags) when
represented as a string. The syntax defines the allowed selection of characters,
their capitalization, and the order in which the fields should be defined.
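
As a rough illustration of these rules, the standard ECMAScript :js:`Intl.getCanonicalLocales`
function (part of ECMA-402, not a Gecko-specific API) validates tags and restores the
canonical capitalization of each subtag. A minimal sketch, assuming a JavaScript engine
with full ICU data such as a recent Node.js:

.. code-block:: javascript

    // Language subtags are lowercased, script subtags title-cased
    // and region subtags uppercased.
    const canonical = Intl.getCanonicalLocales(["EN-us", "zh-hans", "SR-cyrl-rs"]);
    // canonical: ["en-US", "zh-Hans", "sr-Cyrl-RS"]

    // Malformed input is rejected instead of being silently accepted.
    let rejected = false;
    try {
      Intl.getCanonicalLocales("en_US"); // "_" is not a BCP 47 separator
    } catch (e) {
      rejected = e instanceof RangeError;
    }
    // rejected: true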

Most of the base subtags are valid ISO codes, such as `ISO 639`_ for
the language subtag, or `ISO 3166-1`_ for the region subtag.

The examples above present language tags with several fields omitted, which is allowed
by the standard.

On top of that, a locale may contain:

 - extensions and private fields
     These fields can be used to carry additional information about a locale.
     Mozilla currently has partial support for them in the JS implementation and plans to
     extend support to all APIs.
 - extkeys and "grandfathered" tags (unfortunate language, but part of the spec)
     Mozilla does not support these yet.


An example locale can be visualized as:

.. code-block:: javascript

  {
      "language": "sr",
      "script": "Cyrl",
      "region": "RU",
      "variants": [],
      "extensions": {},
      "privateuse": [],
  }

which can then be serialized into a string: **"sr-Cyrl-RU"**.

.. important::

  Since locales are often stored and passed around the codebase as
  language tag strings, it is important to always use an appropriate
  API to parse, manipulate and serialize them.
  Avoid do-it-yourself solutions, which leave your code fragile and may
  break on unexpected language tag structures.
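
For illustration only: JavaScript code that has the standard ECMA-402 :js:`Intl.Locale`
class available can parse a tag into the fields described above and serialize it back
without any manual string splitting. This sketch is not Gecko's :js:`LocaleService`; it
assumes an engine with full ICU data, and note that :js:`Intl.Locale` reports an omitted
field as :js:`undefined` rather than an empty value.

.. code-block:: javascript

    // Parse a language tag into its fields with the standard Intl.Locale class.
    const locale = new Intl.Locale("sr-Cyrl-RU");
    const fields = {
      language: locale.language, // "sr"
      script: locale.script,     // "Cyrl"
      region: locale.region,     // "RU"
    };

    // Serialize back: baseName covers language, script, region and variants.
    const serialized = locale.baseName; // "sr-Cyrl-RU"

    // Change a field through constructor options instead of editing the string,
    // which keeps the result well-formed.
    const latin = new Intl.Locale("sr-Cyrl-RU", { script: "Latn" });
    const latinTag = latin.toString(); // "sr-Latn-RU"

    // Omitted subtags are simply undefined.
    const bare = new Intl.Locale("de");
    const bareRegion = bare.region; // undefined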

Locale Fallback Chains
======================

Locale-sensitive operations are always considered "best-effort". That means it
cannot be assumed that a perfect match will exist between what the user requested and what
the API can provide.

As a result, the best practice is to *always* operate on locale fallback chains:
ordered lists of locales according to the user's preference.

An example of a locale fallback chain may be: :js:`["es-CL", "es-ES", "es", "fr", "en"]`.

The above means a request to format the data according to Chilean Spanish if possible,
then fall back to Spanish as used in Spain, then any (generic) Spanish, then French, and
eventually English.

.. important::

  It is *always* better to use a locale fallback chain than a single locale.
  If only one locale is available, a list with one element will work
  while allowing for future extensions without a costly refactor.
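
A minimal sketch of consuming such a chain: walk it in order of preference and take
the first locale for which a resource exists. The :js:`resources` map and the
:js:`pickFromChain` helper are hypothetical, for illustration only.

.. code-block:: javascript

    // Hypothetical resources keyed by locale (for illustration only).
    const resources = {
      "es-ES": "Hola",
      fr: "Bonjour",
      en: "Hello",
    };

    // Return the first locale in the chain that has a resource, or null.
    function pickFromChain(chain, available) {
      for (const locale of chain) {
        if (Object.prototype.hasOwnProperty.call(available, locale)) {
          return locale;
        }
      }
      return null;
    }

    pickFromChain(["es-CL", "es-ES", "es", "fr", "en"], resources); // "es-ES"
    pickFromChain(["fr"], resources); // "fr": a one-element chain uses the same code path
    pickFromChain(["pl"], resources); // null: nothing in the chain matched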

Language Negotiation
====================

Because data matching is imperfect, all operations on locales should
use a language negotiation algorithm to resolve the best available set of locales,
based on the list of all available locales and an ordered list of requested locales.

Such algorithms may vary in sophistication and number of strategies. Mozilla's
solution is based on modified logic from `RFC 4647 <https://tools.ietf.org/html/rfc4647>`_
(Matching of Language Tags).

Negotiation uses three lists of locales:

 - **Available** - locales that are locally installed
 - **Requested** - locales that the user selected in decreasing order of preference
 - **Resolved** - result of the negotiation

The result of a negotiation is an ordered list of locales that are available to
the system, and the consumer is expected to attempt using the locales in the
resolved order.

Negotiation should be used in all such scenarios: selecting language resources,
calendars, number formatting, etc.

Single Locale Matching
----------------------

Every negotiation strategy goes through a list of steps in an attempt to find the
best possible match between locales.

The exact algorithm is custom and consists of a six-level strategy:

::

  1) Attempt to find an exact match for each requested locale in available
     locales.
     Example: ['en-US'] * ['en-US'] = ['en-US']

  2) Attempt to match a requested locale to an available locale treated
     as a locale range.
     Example: ['en-US'] * ['en'] = ['en']
                            ^^
                            |-- becomes 'en-*-*-*'

  3) Attempt to use the maximized version of the requested locale, to
     find the best match in available locales.
     Example: ['en'] * ['en-GB', 'en-US'] = ['en-US']
                ^^
                |-- ICU likelySubtags expands it to 'en-Latn-US'

  4) Attempt to look for a different variant of the same locale.
     Example: ['ja-JP-win'] * ['ja-JP-mac'] = ['ja-JP-mac']
                ^^^^^^^^^
                |----------- replace variant with range: 'ja-JP-*'

  5) Attempt to look for a maximized version of the requested locale,
     stripped of the region code.
     Example: ['en-CA'] * ['en-ZA', 'en-US'] = ['en-US', 'en-ZA']
                ^^^^^
                |----------- look for likelySubtag of 'en': 'en-Latn-US'

  6) Attempt to look for a different region of the same locale.
     Example: ['en-GB'] * ['en-AU'] = ['en-AU']
                ^^^^^
                |----- replace region with range: 'en-*'
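
The steps above can be approximated in plain JavaScript. This is a simplified sketch,
not Gecko's implementation: :js:`parseTag`, :js:`matches`, :js:`steps` and
:js:`negotiate` are hypothetical names, tags are split naively (so that legacy tags
such as *ja-JP-mac* still parse), likely subtags come from the standard
:js:`Intl.Locale.prototype.maximize`, and the handling of the default locale is an
assumption. The strategy names refer to the Filtering, Matching and Lookup strategies
described below.

.. code-block:: javascript

    // Simplified sketch of the six matching steps and three strategies.
    // Not Gecko's implementation; malformed tags are not handled.

    // Naive splitter for language[-Script][-REGION][-variant...]. Unlike
    // Intl.Locale it accepts legacy variants such as "mac" or "win".
    function parseTag(tag) {
      const parts = tag.split("-");
      const loc = { language: parts.shift().toLowerCase(), script: null, region: null, variants: [] };
      if (parts.length > 0 && /^[a-z]{4}$/i.test(parts[0])) {
        const s = parts.shift();
        loc.script = s[0].toUpperCase() + s.slice(1).toLowerCase();
      }
      if (parts.length > 0 && /^([a-z]{2}|\d{3})$/i.test(parts[0])) {
        loc.region = parts.shift().toUpperCase();
      }
      loc.variants = parts.map((v) => v.toLowerCase());
      return loc;
    }

    // Fill in likely script and region with the standard Intl.Locale#maximize().
    function maximize(loc) {
      const base = [loc.language, loc.script, loc.region].filter(Boolean).join("-");
      const max = new Intl.Locale(base).maximize();
      return { ...loc, script: max.script ?? null, region: max.region ?? null };
    }

    // Field-by-field comparison; on a side treated as a range,
    // a missing (null or empty) field acts as a wildcard.
    function matches(avail, req, availAsRange, reqAsRange) {
      const same = (a, b) => a === b || (availAsRange && a === null) || (reqAsRange && b === null);
      return avail.language === req.language &&
        same(avail.script, req.script) &&
        same(avail.region, req.region) &&
        same(avail.variants.join("-") || null, req.variants.join("-") || null);
    }

    // The six attempts for one requested tag: [requested form, availAsRange, reqAsRange].
    function steps(tag) {
      const req = parseTag(tag);
      const max = maximize(req);
      const noVariants = { ...max, variants: [] };
      const noRegion = { ...noVariants, region: null };
      return [
        [req, false, false],               // 1) exact match
        [req, true, false],                // 2) available locale as a range
        [max, true, false],                // 3) maximized request
        [noVariants, true, true],          // 4) any variant
        [maximize(noRegion), true, false], // 5) likely region for the language
        [noRegion, true, true],            // 6) any region
      ];
    }

    // strategy: "filtering", "matching" or "lookup" (hypothetical names).
    function negotiate(requested, available, defaultLocale, strategy = "filtering") {
      const remaining = available.map((tag) => ({ tag, loc: parseTag(tag) }));
      const result = [];
      outer: for (const reqTag of requested) {
        for (const [req, availAsRange, reqAsRange] of steps(reqTag)) {
          for (let i = 0; i < remaining.length; i++) {
            if (!matches(remaining[i].loc, req, availAsRange, reqAsRange)) continue;
            result.push(remaining[i].tag);
            remaining.splice(i--, 1); // each available locale is used at most once
            if (strategy === "lookup") break outer;
            if (strategy === "matching") continue outer;
          }
        }
      }
      // Assumption: Lookup falls back to the default locale only when nothing
      // matched; the other strategies append it when it is missing.
      if (defaultLocale && strategy === "lookup" && result.length === 0) {
        result.push(defaultLocale);
      } else if (defaultLocale && strategy !== "lookup" && !result.includes(defaultLocale)) {
        result.push(defaultLocale);
      }
      return result;
    }

    const available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];
    negotiate(["fr-CA", "en-US"], available);                        // ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]
    negotiate(["fr-CA", "en-US"], available, undefined, "matching"); // ["fr-CA", "en-GB"]
    negotiate(["fr-CA", "en-US"], available, "en-US", "lookup");     // ["fr-CA"]
    negotiate(["ja-JP-win"], ["ja-JP-mac"]);                          // ["ja-JP-mac"] (step 4)

With the inputs used in the examples of the following sections, this sketch yields the
same results; production code should call :js:`Services.locale.negotiateLanguages` instead.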

Filtering / Matching / Lookup
-----------------------------

When negotiating between lists of locales, Mozilla's :js:`LocaleService` API
offers three language negotiation strategies:

Filtering
^^^^^^^^^

This is the most common scenario, where there is an advantage in creating the
largest possible list of locales that the user may benefit from.

An example of a scenario:

.. code-block:: javascript

    let requested = ["fr-CA", "en-US"];
    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

    let result = Services.locale.negotiateLanguages(requested, available);

    // result: ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]

In the example above, the algorithm matched *"fr-CA"* perfectly,
but then found other matches as well: generic French is a very
good match, and Swiss French is also very close to the top requested language.

For the second requested locale, American English is unfortunately not available,
but British English and South African English are.

The algorithm is greedy and attempts to match as many locales
as possible. This is usually what the developer wants.

Matching
^^^^^^^^

In less common scenarios the code needs to match a single, best available locale for
each of the requested locales.

An example of this scenario:

.. code-block:: javascript

    let requested = ["fr-CA", "en-US"];
    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

    let result = Services.locale.negotiateLanguages(
      requested,
      available,
      undefined,
      Services.locale.langNegStrategyMatching);

    // result: ["fr-CA", "en-GB"]

The best available locale for *"fr-CA"* is a perfect match, and for *"en-US"* the
algorithm selected British English.

Lookup
^^^^^^

The third strategy should be used in cases where only one locale can ever be used,
no matter what. Some third-party APIs don't support fallback, and it doesn't make
sense to continue resolving after finding the first locale.

It is still advisable to treat the result as a fallback chain, in
this case a list with a single element.

.. code-block:: javascript

    let requested = ["fr-CA", "en-US"];
    let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

    let result = Services.locale.negotiateLanguages(
      requested,
      available,
      Services.locale.defaultLocale,
      Services.locale.langNegStrategyLookup);

    // result: ["fr-CA"]

Default Locale
--------------

Besides *Available*, *Requested* and *Resolved* locale lists, there's also a concept
of *DefaultLocale*, which is a single locale out of the list of available ones that
should be used in case there is no match to be found between available and
requested locales.

Every Firefox build has a single default locale. For example,
**Firefox zh-CN** has *DefaultLocale* set to *zh-CN*, since this locale is guaranteed
to be packaged in and to have all the resources, and it should be used if the negotiation
fails to return any matches.

.. code-block:: javascript

    let requested = ["fr-CA", "en-US"];
    let available = ["it", "de", "zh-CN", "pl", "sr-RU"];
    let defaultLocale = "zh-CN";

    let result = Services.locale.negotiateLanguages(requested, available, defaultLocale);

    // result: ["zh-CN"]

Chained Language Negotiation
----------------------------

In some cases, it is desirable to link one component's language selection to another component.

For example, a Firefox extension may come with its own list of available locales, which
may have locales that Firefox doesn't.

In that case, negotiation between the user's requested locales and the add-on's list may result
in a selection of locales that differs from that of Firefox itself.


.. code-block:: none

         Fx Available
        +-------------+
        |  it, fr, ar |
        +-------------+                 Fx Locales
                      |                +--------+
                      +--------------> | fr, ar |
                      |                +--------+
            Requested |
     +----------------+
     | es, fr, pl, ar |
     +----------------+                 Add-on Locales
                      |                +------------+
                      +--------------> | es, fr, ar |
     Add-on Available |                +------------+
    +-----------------+
    |  de, es, fr, ar |
    +-----------------+


In that case, an add-on may end up being displayed in Spanish, while the Firefox UI
uses French. In most cases this results in a poor user experience.

To avoid that, one can chain the add-on negotiation: take Firefox's resolved
locales as the *requested* list, and negotiate it against the add-on's *available* list.

.. code-block:: none

        Fx Available
       +-------------+
       |  it, ar, fr |
       +-------------+                Fx Locales (as Add-on Requested)
                     |                +--------+
                     +--------------> | fr, ar |
                     |                +--------+
           Requested |                         |                Add-on Locales
    +----------------+                         |                +--------+
    | es, fr, pl, ar |                         +------------->  | fr, ar |
    +----------------+                         |                +--------+
                                               |
                              Add-on Available |
                             +-----------------+
                             |  de, es, ar, fr |
                             +-----------------+
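
The chaining can be sketched in a few lines. To keep the example self-contained, the
hypothetical :js:`filterExact` helper below keeps only exact matches, which is enough
for the locale lists in the diagrams above; real code would use full negotiation (for
example :js:`Services.locale.negotiateLanguages`) at both steps.

.. code-block:: javascript

    // Hypothetical stand-in for negotiation: keep the requested locales that are
    // available, preserving the requested order.
    function filterExact(requested, available) {
      return requested.filter((locale) => available.includes(locale));
    }

    const requested = ["es", "fr", "pl", "ar"];
    const fxAvailable = ["it", "fr", "ar"];
    const addonAvailable = ["de", "es", "fr", "ar"];

    const fxLocales = filterExact(requested, fxAvailable); // ["fr", "ar"]

    // Independent negotiation: the add-on picks Spanish while Firefox uses French.
    const addonIndependent = filterExact(requested, addonAvailable); // ["es", "fr", "ar"]

    // Chained negotiation: Firefox's resolved locales become the add-on's request.
    const addonChained = filterExact(fxLocales, addonAvailable); // ["fr", "ar"]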

Available Locales
=================

In Gecko, available locales come from the `Packaged Locales` and the installed
`language packs`. Language packs are a type of web extension that provides only
localized resources for one or more languages.

The primary notion of which locales are available is based on the locales for which Gecko
has UI localization resources; other datasets, such as internationalization data, may
carry different lists of available locales.

Requested Locales
=================

The list of requested locales can be read and set using :js:`LocaleService::requestedLocales` API.

Using the API will perform necessary sanity checks and canonicalize the values.

After sanitization, the value is stored in the :js:`intl.locale.requested` pref.
The pref usually stores a comma-separated list of valid BCP 47 locale
codes, but it can also have two special meanings:

 - If the pref is not set at all, Gecko will use the default locale as the requested one.
 - If the pref is set to an empty string, Gecko will look into OS app locales as the requested.

The former is the current default setting for Firefox Desktop, and the latter is the
default setting for Firefox for Android.

If the developer wants to programmatically request the app to follow OS locales,
they can assign :js:`null` to :js:`requestedLocales`.
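
The pref semantics above can be summarized in a sketch. :js:`resolveRequestedLocales`
is a hypothetical helper, not a Gecko API: an unset pref is modeled as :js:`undefined`,
and the OS locales are passed in rather than queried from the system.

.. code-block:: javascript

    // Interpret a value of intl.locale.requested (sketch only).
    function resolveRequestedLocales(prefValue, defaultLocale, osLocales) {
      if (prefValue === undefined) {
        return [defaultLocale]; // pref not set: use the default locale
      }
      if (prefValue === "") {
        return [...osLocales]; // empty string: follow the OS app locales
      }
      // Otherwise: a comma-separated list of BCP 47 tags.
      return prefValue.split(",").map((tag) => tag.trim()).filter(Boolean);
    }

    resolveRequestedLocales(undefined, "en-US", ["de-DE"]);       // ["en-US"]
    resolveRequestedLocales("", "en-US", ["de-DE", "fr"]);        // ["de-DE", "fr"]
    resolveRequestedLocales("fr-CA,en-US", "en-US", ["de-DE"]);   // ["fr-CA", "en-US"]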

Regional Preferences
====================

Every locale comes with a set of default preferences that are specific to a culture
and region. These include the calendar system, the way time is displayed
(24h vs 12h clock), which day the week starts on, which days constitute a weekend,
and which numbering system and date/time formats a given locale uses
(for example "MM/DD" in en-US vs "DD/MM" in en-AU).
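
The standard :js:`Intl.DateTimeFormat` API applies such regional defaults as well. The
sketch below, assuming an engine with full ICU data, inspects the order of the date
fields instead of comparing formatted strings, since exact output may change between
data versions; :js:`dateFieldOrder` is a hypothetical helper name.

.. code-block:: javascript

    // Return the order of the day/month/year fields in a locale's default date format.
    function dateFieldOrder(locale) {
      const formatter = new Intl.DateTimeFormat(locale, { timeZone: "UTC" });
      return formatter
        .formatToParts(new Date(Date.UTC(2018, 2, 24)))
        .filter((part) => ["day", "month", "year"].includes(part.type))
        .map((part) => part.type);
    }

    dateFieldOrder("en-US"); // ["month", "day", "year"]
    dateFieldOrder("en-AU"); // ["day", "month", "year"]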

For all such preferences Gecko has a list of default settings for every region,
but there's also a degree of customization every user may want to make.

All major operating systems have a Settings UI for selecting those preferences,
and since Firefox does not provide its own, Gecko looks into the OS for them.

A special API :js:`mozilla::intl::OSPreferences` handles communication with the
host operating system, retrieving regional preferences and altering
internationalization formatting with user preferences.

Note that the boundary between regional preferences and language
selection is not sharp. In many cases the internationalization formats
contain language-specific terms and literals. For example, a date formatted
in Japanese may look like this: *"2018年3月24日"*, and a date format
may contain names of months or weekdays that need to be translated
("April", "Tuesday", etc.).

For that reason, it is tricky to follow regional preferences when the operating
system locale selection does not match the Firefox UI locales.

Such behavior might lead to a UI case like "Today is 24 października" in an English Firefox
with Polish date formats.

For that reason, by default, Gecko will *only* look into OS preferences if the *language*
portion of the OS locale and of the Firefox locale match.
That means that if Windows is in "**en**-AU" and Firefox is in "**en**-US", Gecko will look
into Windows Regional Preferences, but if Windows is in "**de**-CH" and Firefox
is in "**fr**-FR", it won't.
To force Gecko to look into OS preferences regardless of the language match,
set the pref :js:`intl.regional_prefs.use_os_locales` to :js:`true`.
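
The default rule can be expressed as a small predicate. :js:`shouldUseOSRegionalPrefs`
is a hypothetical name for illustration; it compares only the language subtag using the
standard :js:`Intl.Locale` class, and lets the :js:`intl.regional_prefs.use_os_locales`
pref (passed in as a boolean) override the check.

.. code-block:: javascript

    function shouldUseOSRegionalPrefs(osLocale, appLocale, useOSLocalesPref = false) {
      if (useOSLocalesPref) {
        return true; // forced by intl.regional_prefs.use_os_locales
      }
      // Only the language subtag has to match; script and region may differ.
      return new Intl.Locale(osLocale).language === new Intl.Locale(appLocale).language;
    }

    shouldUseOSRegionalPrefs("en-AU", "en-US");       // true
    shouldUseOSRegionalPrefs("de-CH", "fr-FR");       // false
    shouldUseOSRegionalPrefs("de-CH", "fr-FR", true); // true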

UI Direction
------------

Since the UI direction is so tightly coupled with the locale selection, the
main method of testing the directionality of the Gecko app lives in LocaleService.

:js:`LocaleService::IsAppLocaleRTL` returns a boolean indicating if the current
direction of the app UI is right-to-left.
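
The sketch below does not reproduce :js:`LocaleService::IsAppLocaleRTL`; it is a rough,
hypothetical approximation that derives directionality from the script subtag after
likely-subtags expansion. The :js:`RTL_SCRIPTS` set is an assumption covering only a few
common right-to-left scripts and is not exhaustive.

.. code-block:: javascript

    // A few common right-to-left scripts (assumption: not an exhaustive list).
    const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"]);

    // Maximize the tag so that e.g. "ar" gains its likely script "Arab",
    // then look the script up.
    function isLocaleRTL(tag) {
      const { script } = new Intl.Locale(tag).maximize();
      return RTL_SCRIPTS.has(script);
    }

    isLocaleRTL("ar");      // true
    isLocaleRTL("he");      // true
    isLocaleRTL("en-US");   // false
    isLocaleRTL("uz-Arab"); // true: an explicit script subtag wins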

Default and Last Fallback Locales
=================================

Every Gecko application is built with a single locale as the default one. That locale
is guaranteed to have all linguistic resources available; it is used
as the default locale when language negotiation cannot find any match, and also
as the last locale to look for in a fallback chain.

If all else fails, Gecko also supports the notion of a last fallback locale, which is
currently hardcoded to *"en-US"* and is the very final locale to try in case
nothing else (including the default locale) works.
Notice that Unicode and ICU use *"en-GB"* in that role because more English-speaking
people around the world recognize British regional preferences than American ones (metric vs.
imperial, Celsius vs. Fahrenheit, etc.).
Mozilla may switch to *"en-GB"* in the future.
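
A minimal sketch of how a complete fallback chain might be assembled: the negotiated
locales first, then the default locale, then the last fallback locale, without
duplicates. :js:`buildFallbackChain` is hypothetical, and :js:`LAST_FALLBACK` mirrors the
hardcoded *"en-US"* mentioned above.

.. code-block:: javascript

    const LAST_FALLBACK = "en-US";

    function buildFallbackChain(resolved, defaultLocale) {
      // A Set keeps insertion order and drops duplicates.
      return [...new Set([...resolved, defaultLocale, LAST_FALLBACK])];
    }

    buildFallbackChain(["pl", "de"], "zh-CN"); // ["pl", "de", "zh-CN", "en-US"]
    buildFallbackChain(["en-US"], "en-US");    // ["en-US"]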

Packaged Locales
================

When a Gecko application is packaged, it bundles a selection of locale resources
to be available within it. At the moment, for example, most Firefox for Android
builds come with almost 100 locales packaged into them, while Desktop Firefox usually
comes with just one packaged locale.

There is currently work being done on enabling more flexibility in how
the locales are packaged to allow for bundling applications with different
sets of locales in different areas - dictionaries, hyphenations, product language resources,
installer language resources, etc.

Web Exposed Locales
====================

For anti-tracking and other reasons, Gecko may expose spoofed locales to web content instead
of the default locales. This is done by setting the pref :js:`intl.locale.privacy.web_exposed`
to a comma-separated list of locales; an empty string means the default locales are exposed.

The pref has no effect while :js:`privacy.spoof_english` is set to 2, in which case *"en-US"*
is always returned.
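
The interplay of the two prefs can be sketched as follows. :js:`webExposedLocales` is a
hypothetical helper: the pref values are passed in as arguments, and
:js:`defaultLocales` stands for whatever Gecko would expose without spoofing.

.. code-block:: javascript

    function webExposedLocales(webExposedPref, spoofEnglish, defaultLocales) {
      if (spoofEnglish === 2) {
        return ["en-US"]; // privacy.spoof_english = 2 always wins
      }
      if (!webExposedPref) {
        return defaultLocales; // empty value: expose the default locales
      }
      return webExposedPref.split(",").map((tag) => tag.trim()).filter(Boolean);
    }

    webExposedLocales("en-US,de", 0, ["fr"]); // ["en-US", "de"]
    webExposedLocales("", 0, ["fr"]);         // ["fr"]
    webExposedLocales("de", 2, ["fr"]);       // ["en-US"]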

Multi-Process
=============

Locale management can operate in a client/server model. This allows a Gecko process
to manage locales (server mode) or just receive the locale selection from a parent
process (client mode).

The client mode is currently used by all child processes of Desktop Firefox, and
may be used by, for example, GeckoView to follow locale selection from a parent
process.

To check the mode the process is operating in, the :js:`LocaleService::IsServer` method is available.

Note that :js:`L10nRegistry.registerSources`, :js:`L10nRegistry.updateSources`, and
:js:`L10nRegistry.removeSources` each trigger an IPC synchronization between the parent
process and any extant content processes, which is expensive. If you need to change the
registration of multiple sources, the best way to do so is to coalesce multiple requests
into a single array and then call the method once.
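
The batching advice in a sketch. :js:`FakeRegistry` is a hypothetical stand-in that only
counts how many synchronizations would be triggered; it is not the real
:js:`L10nRegistry`, and the source names are placeholders.

.. code-block:: javascript

    // Hypothetical stand-in: every call represents one costly IPC synchronization.
    class FakeRegistry {
      constructor() {
        this.syncCount = 0;
        this.sources = [];
      }
      registerSources(sources) {
        this.syncCount += 1;
        this.sources.push(...sources);
      }
    }

    const names = ["source-a", "source-b", "source-c"];

    // Costly: one synchronization per source.
    const slow = new FakeRegistry();
    for (const name of names) {
      slow.registerSources([name]);
    }

    // Preferred: collect everything first, then register once.
    const fast = new FakeRegistry();
    fast.registerSources(names);

    // slow.syncCount is 3, fast.syncCount is 1; both registered the same sources.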

Mozilla Exceptions
==================

There is currently only a single exception to BCP 47 in use, and that is
the legacy "ja-JP-mac" locale. The "mac" subtag is a variant, and BCP 47 requires all
variants to be 5-8 characters long (or 4 characters starting with a digit).

Gecko works around this limitation by accepting 3-letter variants in its APIs, and also
provides a special :js:`appLocalesAsLangTags` method, which returns this locale in that form
(:js:`appLocalesAsBCP47` canonicalizes it into `"ja-JP-macos"`).

Code using language negotiation and related APIs should not rely on this behavior.
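
The variant rule can be checked with a small regular expression. This sketch covers only
the variant production of BCP 47 (5 to 8 alphanumerics, or a digit followed by 3
alphanumerics), not full tag validation; :js:`isValidVariant` is a hypothetical helper.

.. code-block:: javascript

    // BCP 47 variant subtag: 5-8 alphanumerics, or a digit plus 3 alphanumerics.
    const VARIANT = /^(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})$/i;

    function isValidVariant(subtag) {
      return VARIANT.test(subtag);
    }

    isValidVariant("mac");   // false: the legacy "ja-JP-mac" variant
    isValidVariant("macos"); // true: the canonical "ja-JP-macos" form
    isValidVariant("1901");  // true: a digit-led four-character variant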

Events
======

:js:`LocaleService` emits two events: :js:`intl:app-locales-changed` and
:js:`intl:requested-locales-changed` which all code can listen to.

Those events may be broadcast in response to language packs being installed or
uninstalled, or to the user's language selection changing.

In most cases, the code should observe the :js:`intl:app-locales-changed`
and react to only that event since this is the one indicating a change
in the currently used language settings that the components should follow.
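
A sketch of the recommended pattern. In Gecko, code would register with the observer
service (:js:`Services.obs.addObserver`); to keep the example self-contained,
:js:`FakeObserverService` is a hypothetical minimal stand-in that mimics the
:js:`observe(subject, topic, data)` callback shape.

.. code-block:: javascript

    // Hypothetical stand-in for the observer service.
    class FakeObserverService {
      constructor() {
        this.observers = new Map(); // topic -> array of observers
      }
      addObserver(observer, topic) {
        if (!this.observers.has(topic)) this.observers.set(topic, []);
        this.observers.get(topic).push(observer);
      }
      notifyObservers(subject, topic, data) {
        for (const observer of this.observers.get(topic) ?? []) {
          observer.observe(subject, topic, data);
        }
      }
    }

    const obs = new FakeObserverService();
    const seen = [];

    // React only to the event that signals a change in the locales in use.
    const component = {
      observe(subject, topic) {
        if (topic === "intl:app-locales-changed") {
          seen.push(topic); // e.g. re-translate or re-format the UI here
        }
      },
    };
    obs.addObserver(component, "intl:app-locales-changed");

    obs.notifyObservers(null, "intl:requested-locales-changed", null); // not subscribed: ignored
    obs.notifyObservers(null, "intl:app-locales-changed", null);
    // seen: ["intl:app-locales-changed"]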

Testing
=======

Many components may have logic encoded to react to changes in requested, available
or resolved locales.

In order to test the component's behavior, it is important to replicate
the environment in which such change may happen.

Since in most cases it is advised for a component to tie its
language negotiation to the main application (see `Chained Language Negotiation`),
it is not enough to add a new locale to trigger the language change.

First, it is necessary to add a new locale to the available ones, then change
the requested locales; only that will trigger a new negotiation and a language
change.

There are two primary ways to add a locale to available ones.

Testing Localization
--------------------

If the goal is to test that the correct localization ends up in the correct place,
the developer needs to register a new :js:`L10nFileSource` in :js:`L10nRegistry` and
provide mock data to be returned by the API.

It may look like this:

.. code-block:: javascript

    let source = L10nFileSource.createMock(
      "mock-source", "app",
      ["ko-KR", "ar"],
      "resource://mock-addon/localization/{locale}",
      [
        {
          path: "resource://mock-addon/localization/ko-KR/test.ftl",
          source: "key = Value in Korean"
        },
        {
          path: "resource://mock-addon/localization/ar/test.ftl",
          source: "key = Value in Arabic"
        }
      ]
    );

    L10nRegistry.registerSources([source]);

    let availableLocales = Services.locale.availableLocales;

    assert(availableLocales.includes("ko-KR"));
    assert(availableLocales.includes("ar"));

    Services.locale.requestedLocales = ["ko-KR"];

    let appLocales = Services.locale.appLocalesAsBCP47;
    assert(appLocales[0] === "ko-KR");

From here, a resource :js:`test.ftl` can be added to a `Localization` and for ID :js:`key`
the correct value from the mocked cache will be returned.

Testing Locale Switching
------------------------

The second method is much more limited, as it only mocks the locale availability,
but it is also simpler:

.. code-block:: javascript

    Services.locale.availableLocales = ["ko-KR", "ar"];
    Services.locale.requestedLocales = ["ko-KR"];

    let appLocales = Services.locale.appLocalesAsBCP47;
    assert(appLocales[0] === "ko-KR");

In the future, Mozilla plans to add a third way (`bug 1440969`_) that lets add-ons
disconnect their locales from the main application's locales, for either manual or
automated testing purposes.

Testing the outcome
-------------------

Apart from testing reactions to locale changes, it is advisable to avoid writing
tests that expect a certain locale to be selected, or certain internationalization
or localization data to be used.

Doing so locks the test infrastructure into being usable only when launched in
a single-locale environment, and requires those tests to be updated whenever the
underlying data changes.

In the case of testing locale selection it is best to use a fake locale like :js:`x-test`, that
will not be present at the beginning of the test.

In the case of testing for internationalization data it is best to use :js:`resolvedOptions()`,
to verify the right data is being used, rather than comparing the output string.

In the case of localization, it is best to test against the correct :js:`data-l10n-id`
being set or, in edge cases, verify that a given variable is present in the string using
:js:`String.prototype.includes`.
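
For instance, a test can inspect :js:`resolvedOptions()` of the standard :js:`Intl`
formatters rather than compare formatted strings, which may change with data updates.
A minimal sketch, assuming an engine with full ICU data:

.. code-block:: javascript

    const nf = new Intl.NumberFormat("en-US");
    const dtf = new Intl.DateTimeFormat("de-DE", { hour: "numeric" });

    // Check which locale and options were actually resolved...
    const resolvedLocale = nf.resolvedOptions().locale;            // "en-US"
    const numberingSystem = nf.resolvedOptions().numberingSystem;  // "latn"
    const hourCycle = dtf.resolvedOptions().hourCycle;             // "h23"

    // ...and, for strings, only that the interesting value is present.
    const containsDigits = nf.format(1234.5).includes("234");      // true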

Deep Dive
=========

Below is a list of articles with additional
details on selected subjects:

.. toctree::
   :maxdepth: 1

   locale_env
   locale_startup

Feedback
========

In case of questions, please consult Intl module peers.


.. _RFC 5656: https://tools.ietf.org/html/rfc5656
.. _BCP 47: https://tools.ietf.org/html/bcp47#section-2.1
.. _ISO 639: http://www.loc.gov/standards/iso639-2/php/code_list.php
.. _ISO 3166-1: https://www.iso.org/iso-3166-country-codes.html
.. _Intl.Locale: https://bugzilla.mozilla.org/show_bug.cgi?id=1433303
.. _fluent-locale: https://docs.rs/fluent-locale/
.. _bug 1440969: https://bugzilla.mozilla.org/show_bug.cgi?id=1440969