Diffstat (limited to 'intl/docs')
-rw-r--r-- | intl/docs/dataintl.rst | 348
-rw-r--r-- | intl/docs/icu.rst | 289
-rw-r--r-- | intl/docs/icu4x.rst | 52
-rw-r--r-- | intl/docs/index.rst | 27
-rw-r--r-- | intl/docs/locale.rst | 610
-rw-r--r-- | intl/docs/locale_env.rst | 37
-rw-r--r-- | intl/docs/locale_startup.rst | 96
7 files changed, 1459 insertions, 0 deletions
diff --git a/intl/docs/dataintl.rst b/intl/docs/dataintl.rst new file mode 100644 index 0000000000..d5616e08ae --- /dev/null +++ b/intl/docs/dataintl.rst @@ -0,0 +1,348 @@ +.. role:: js(code) + :language: javascript + +========================= +UI Internationalization +========================= + +There are many types of data that need to be formatted into a locale specific format, +or require locale specific API operations. + +Gecko provides a rich set of locale aware APIs for operations such as: + +* date and time formatting +* number formatting +* searching +* sorting +* plural rules +* calendar and locale information + +.. note:: + + Most of the APIs are backed by the Unicode projects `CLDR`_ and `ICU`_ and are + focused on enabling front-end code internationalization, which means the majority of + the APIs are primarily available in JavaScript, with C++ and Rust having only a small + subset of them exposed. + +JavaScript Internationalization API +=================================== + +Data internationalization APIs are formalized in the JavaScript standard `ECMA 402`_. +These APIs are supported by all major JS environments. + +It is best to consult the MDN article on the current state of the `Intl API`_. +Mozilla has an excellent support of the API and relies on it for majority +of its needs. Yet, when working on Firefox UI the :js:`Services.intl` wrapper +should be used. + +Services.intl +============= + +:js:`Services.intl` is an extension of the JS Intl API which should be used whenever +working with Gecko app user interface with chrome privileges. + +The API provides the same objects and methods as :js:`Intl.*`, but fine tunes them +to the Gecko app user preferences, including matching OS Preferences and +other locale choices that web content exposed JS Intl API cannot. + +For example, here's an example of a locale aware date formatting +using the regular :js:`Intl.DateTimeFormat`: + +.. 
code-block:: javascript + + let dtf = new Intl.DateTimeFormat(navigator.languages, { + year: "numeric", + month: "long", + day: "numeric" + }); + let value = dtf.format(new Date()); + +It will do a good job at formatting the date to the user locale, but it will +only be able to use the customization bits that are exposed to the Web, based on +the locale the user broadcasts to the Web and any additional settings. + +But that ignores bits of information that could inform the formatting. + +Public API such as :js:`Intl.*` will not be able to look into the Operating System for +regional preferences. It will also respect settings such as `Resist Fingerprinting` +by masking its timezone and locale settings. + +This is a fair tradeoff when dealing with Web Content, but in most cases, the +privileged UI of the Gecko application should be able to access all of those +additional bits and not be affected by the anti-fingerprinting masking. + +`mozIntl` is a simple wrapper which in its simplest form works exactly the same. It's +exposed on the :js:`Services.intl` object and can be used just like the regular `Intl` API: + +.. code-block:: javascript + + let dtf = new Services.intl.DateTimeFormat(undefined, { + year: "numeric", + month: "long", + day: "numeric" + }); + let value = dtf.format(new Date()); + +The difference is that this API will now use the set of locales as defined for +Gecko, and will also respect additional regional preferences that Gecko +will fetch from the Operating System. + +For those reasons, when dealing with Gecko application UI, it is always recommended +to use the :js:`Services.intl` wrapper. + +Additional APIs +================ + +On top of wrapping the `Intl` API, `mozIntl` provides a number of features +in the form of additional options to existing APIs as well as completely new APIs. + +Many of those extensions are in the process of being standardized, but are +already available to Gecko developers for internal use. 
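As a recap of the wrapper described above, here is a minimal sketch of how a module that may run both with and without chrome privileges could choose a formatter. The helper name `getDateTimeFormatCtor` is hypothetical and not part of Gecko. The sketch assumes that `Services` is only defined as a global in privileged code, so it falls back to the standard `Intl.DateTimeFormat` elsewhere.

```javascript
// Hypothetical helper (not part of Gecko): prefer the privileged
// Services.intl wrapper when it exists, otherwise fall back to Intl.
function getDateTimeFormatCtor() {
  const hasServicesIntl =
    typeof Services !== "undefined" &&
    Services.intl &&
    Services.intl.DateTimeFormat;
  return hasServicesIntl ? Services.intl.DateTimeFormat : Intl.DateTimeFormat;
}

const DateTimeFormatCtor = getDateTimeFormatCtor();

// Passing `undefined` lets the implementation pick its default locales.
const recapFormatter = new DateTimeFormatCtor(undefined, {
  year: "numeric",
  month: "long",
  day: "numeric",
});

// The exact string depends on the locale and on regional preferences.
const recapValue = recapFormatter.format(new Date(2020, 4, 29));
```

In privileged Gecko UI code the fallback branch should never be needed, since the text above recommends always using `Services.intl` there.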
+ +Below is the list of current extensions: + +mozIntl.DateTimeFormat +---------------------- + +`DateTimeFormat` in `mozIntl` gets additional options that provide greater +simplicity and consistency to the API. + +* :js:`timeStyle` and :js:`dateStyle` can take values :js:`short`, :js:`medium`, + :js:`long` and :js:`full`. + These options can replace the manual listing of tokens like :js:`year`, :js:`day`, :js:`hour` etc. + and will compose the most natural date or time format of a given style for the selected + locale. + +Using :js:`timeStyle` and :js:`dateStyle` is highly recommended over listing the tokens, +because different locales may use different default styles for displaying the same tokens. + +Additional value is that using those styles allows `mozIntl` to look into +Operating System patterns, which gives users the ability to customize those +patterns to their liking. + +Example use: + +.. code-block:: javascript + + let dtf = new Services.intl.DateTimeFormat(undefined, { + timeStyle: "short", + dateStyle: "short" + }); + let value = dtf.format(new Date()); + +This will select the best locale to match the current Gecko application locale, +then potentially check for Operating System regional preferences customizations, +produce the correct pattern for short date+time style and format the date into it. + + +mozIntl.getCalendarInfo(locale) +------------------------------- + +The API will return the following calendar information for a given locale code: + +* firstDayOfWeek + an integer in the range 1=Monday to 7=Sunday indicating the day + considered the first day of the week in calendars, e.g. 7 for en-US, + 1 for en-GB, 7 for bn-IN +* minDays + an integer in the range of 1 to 7 indicating the minimum number + of days required in the first week of the year, e.g. 1 for en-US, 4 for de +* weekend + an array with values in the range 1=Monday to 7=Sunday indicating the days + of the week considered as part of the weekend, e.g. 
[6, 7] for en-US and en-GB, + [7] for bn-IN (note that "weekend" is *not* necessarily two days) + +Those bits of information should be especially useful for any UI that works +with calendar data. + +Example: + +.. code-block:: javascript + + // omitting the `locale` argument will make the API return data for the + // current Gecko application UI locale. + let { + firstDayOfWeek, // 1 + minDays, // 4 + weekend, // [6, 7] + calendar, // "gregory" + locale, // "pl" + } = Services.intl.getCalendarInfo(); + + +mozIntl.DisplayNames(locales, options) +----------------------------------------- + +:js:`DisplayNames` API is useful to retrieve various terms available in the +internationalization API. :js:`mozIntl.DisplayNames` extends the standard +`Intl.DisplayNames`_ to additionally provide localization for date-time types. + +The API takes a locale fallback chain list, and an options object which can contain +two keys: + +* :js:`style` which can take values :js:`narrow`, :js:`short`, :js:`abbreviated`, :js:`long` +* :js:`type` which can take values :js:`language`, :js:`script`, :js:`region`, + :js:`currency`, :js:`weekday`, :js:`month`, :js:`quarter`, :js:`dayPeriod`, + :js:`dateTimeField` + +Example: + +.. 
code-block:: javascript + + let dateTimeFieldDisplayNames = new Services.intl.DisplayNames(undefined, { + type: "dateTimeField", + }); + dateTimeFieldDisplayNames.resolvedOptions().locale; // "pl" + dateTimeFieldDisplayNames.of("year"); // "rok" + + let monthDisplayNames = new Services.intl.DisplayNames(undefined, { + type: "month", style: "long", + }); + monthDisplayNames.of(1); // "styczeń" + + let weekdaysDisplayNames = new Services.intl.DisplayNames(undefined, { + type: "weekday", style: "short", + }); + weekdaysDisplayNames.of(1); // "pon" + + let dayPeriodsDisplayNames = new Services.intl.DisplayNames(undefined, { + type: "dayPeriod", style: "narrow", + }); + dayPeriodsDisplayNames.of("am"); // "AM" + + +mozIntl.RelativeTimeFormat(locales, options) +-------------------------------------------- + +API which can be used to format an interval or a date into a textual +representation of a relative time, such as **5 minutes ago** or **in 2 days**. + +This API is in the process of standardization and in its raw form will not handle +any calculations to select the best unit. It is intended to just offer a way +to format a value. + +The `mozIntl` wrapper extends the functionality by providing the calculations and +allowing the user to get the current best textual representation of the delta. + +Example: + +.. code-block:: javascript + + let rtf = new Services.intl.RelativeTimeFormat(undefined, { + style: "long", // "narrow" | "short" | "long" (default) + numeric: "auto", // "always" | "auto" (default) + }); + + let now = Date.now(); + rtf.formatBestUnit(new Date(now - 3 * 1000 * 60)); // "3 minutes ago" + +The option `numeric` has value set to `auto` by default, which means that when possible +the formatter will use special textual terms like *yesterday*, *last year*, and so on. + +Those values require specific calculations that the raw `Intl.*` API cannot provide. 
+For example, *yesterday* requires the algorithm to know not only the time delta, +but also what time of the day `now` is. 15 hours ago may be *yesterday* if it +is 10am, but will still be *today* if it is 11pm. + +For that reason the future `Intl.RelativeTimeFormat` will use *always* as default, +since terms such as *15 hours ago* are independent of the current time. + +.. note:: + + In the current form, the API should only be used to format standalone values. + Without additional capitalization rules, it cannot be freely used in sentences. + +mozIntl.getLanguageDisplayNames(locales, langCodes) +--------------------------------------------------- + +API which returns a list of language names formatted for display. + +Example: + +.. code-block:: javascript + + let langs = Services.intl.getLanguageDisplayNames(["pl"], ["fr", "de", "en"]); + // langs: ["Francuski", "Niemiecki", "Angielski"] + + +mozIntl.getRegionDisplayNames(locales, regionCodes) +--------------------------------------------------- + +API which returns a list of region names formatted for display. + +Example: + +.. code-block:: javascript + + let regs = Services.intl.getRegionDisplayNames(["pl"], ["US", "CA", "MX"]); + // regs: ["Stany Zjednoczone", "Kanada", "Meksyk"] + +mozIntl.getLocaleDisplayNames(locales, localeCodes) +--------------------------------------------------- + +API which returns a list of locale names formatted for display. + +Example: + +.. code-block:: javascript + + let locs = Services.intl.getLocaleDisplayNames(["pl"], ["sr-RU", "es-MX", "fr-CA"]); + // locs: ["Serbski (Rosja)", "Hiszpański (Meksyk)", "Francuski (Kanada)"] + +mozIntl.getAvailableLocaleDisplayNames(type) +--------------------------------------------------- + +API which returns a list of locale display name codes available for a +given type. +Available types are: "language", "region". + +Example: + +.. 
code-block:: javascript + + let codes = Services.intl.getAvailableLocaleDisplayNames("region"); + // codes: ["au", "ae", "af", ...] + +Best Practices +============== + +The most important best practice when dealing with data internationalization is to +perform it as close to the actual UI as possible, right before the UI is displayed. + +The reason for this practice is that internationalized data is considered *"opaque"*, +which means that no code should ever attempt to operate on it. Late resolution also +increases the chance that the data will be formatted in the current locale +selection and not formatted and cached prematurely. + +It's very important to not attempt to search, concatenate or in any other way +alter the output of the API. Once it gets formatted, the only thing to do with +the output should be to present it to the user. + +Testing +------- + +The above is also important in the context of testing. It is a common mistake to +attempt to write tests that verify the output of the UI with internationalized data. + +The underlying data set used to create the formatted version of the data may and will +change over time, both due to dataset improvements and also changes to the language +and regional preferences over time. +That means that tests that attempt to verify the exact output will require +a significantly higher level of maintenance and will remain brittle. + +Most of the APIs provide a special method, such as :js:`resolvedOptions`, which should be used +instead to verify that the output matches the expectations. + +Future extensions +================= + +If you find yourself in need of additional internationalization APIs not currently +supported, you can verify if the API proposal is already in the works here, +and file a bug in the component `Core::Internationalization`_ to request it. + +.. _ECMA 402: https://tc39.github.io/ecma402/ +.. _Intl API: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl +.. 
_CLDR: http://cldr.unicode.org/ +.. _ICU: http://site.icu-project.org/ +.. _Core::Internationalization: https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Internationalization +.. _Intl.DisplayNames: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DisplayNames diff --git a/intl/docs/icu.rst b/intl/docs/icu.rst new file mode 100644 index 0000000000..3a540658c4 --- /dev/null +++ b/intl/docs/icu.rst @@ -0,0 +1,289 @@ +### +ICU +### + +Introduction +============ + +Internationalization (i18n, “i” then 18 letters then “n”) is the process of handling data with respect to a particular locale: + +- The number 5 representing five US dollars might be formatted as + + - “$5.00” in American English, + - “US$5.00” in Canadian English, or + - “5,00 $US” in French. + +- A list of people’s names in a phone book would sort + + - in English alphabetically; but + - in German, where “ä”/“ö”/“ü” are often interchangeable with “ae”/“oe”/“ue”, alphabetically but with vowels with umlauts treated as their two-vowel counterparts. + +- The currency whose code is “CHF” might be formatted as + + - “Swiss Franc” in English, but + - “franc suisse” in French. + +- The Unix time 1590803313070 might format as the time string + + - “9:48:33 PM Eastern Daylight Time” in American English, but + - “21:48:33 Nordamerikanische Ostküsten-Sommerzeit” in German. + +i18n encompasses far more than this, but you get the basic idea. + +Internationalization in SpiderMonkey and Gecko +============================================== + +SpiderMonkey implements extensive i18n capabilities through the `ECMAScript Internationalization API <https://tc39.es/ecma402/>`__ and the global ``Intl`` object. Gecko requires i18n capabilities to implement text shaping, sort operations in some contexts, and various other features. 
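The locale-dependent behaviors listed in the introduction can be observed from JavaScript through the standard ``Intl`` object. The following is a minimal sketch, assuming an engine with full ICU locale data. The sample names in `phoneBookNames` are invented for illustration, and exact output strings (spacing characters in particular) can differ between ICU and CLDR versions.

```javascript
// Currency formatting: the same amount rendered for three locales.
const currencyByLocale = {};
for (const locale of ["en-US", "en-CA", "fr-FR"]) {
  currencyByLocale[locale] = new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "USD",
  }).format(5);
}

// Sorting: request German phone book collation via the "co" extension.
// The names are made-up sample data.
const phoneBookNames = ["Afrika", "Ärger", "Adler"];
const phoneBookSorted = [...phoneBookNames].sort(
  new Intl.Collator("de-u-co-phonebk").compare
);

// Display names: the localized name of the currency code "CHF".
const chfInEnglish = new Intl.DisplayNames(["en"], { type: "currency" }).of("CHF");
const chfInFrench = new Intl.DisplayNames(["fr"], { type: "currency" }).of("CHF");

// Time formatting: the Unix time from the introduction, in two locales.
const instant = new Date(1590803313070);
const timeOptions = { timeStyle: "full", timeZone: "America/New_York" };
const timeInEnglish = new Intl.DateTimeFormat("en-US", timeOptions).format(instant);
const timeInGerman = new Intl.DateTimeFormat("de", timeOptions).format(instant);
```

Printing these values with `console.log` is a quick way to see how much a single piece of data varies between locales.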
+ +SpiderMonkey and Gecko use `ICU <http://site.icu-project.org/>`__, Internationalization Components for Unicode, to implement many low-level i18n operations. (Line breaking, implemented instead in ``intl/lwbrk``, is a notable exception.) Gecko and SpiderMonkey also use ICU’s implementations of certain i18n-*adjacent* operations (for example, Unicode normalization). + +ICU date/time formatting functionality requires extensive knowledge of time zone names and when zone transitions occur. The IANA ``tzdata`` database supplies this information. + +A final note of caution: ICU carefully depends upon an exact Unicode version. Other parts of SpiderMonkey and Gecko have separate dependencies on an exact Unicode version. Updates to ICU and related components *must* be synchronized with those updates so that the entirety of SpiderMonkey, and the entirety of Gecko including SpiderMonkey within it, advance to new Unicode versions in lockstep. [#lockstep]_ + +.. [#lockstep] + The steps involved in updating Gecko-in-general’s Unicode version, and updating SpiderMonkey’s code dependent on Unicode version, are `documented on WikiMO <https://wiki.mozilla.org/I18n:Updating_Unicode_version>`__. + +Building SpiderMonkey or Gecko with ICU +======================================= + +SpiderMonkey and Gecko can be built using either a periodically-updated copy of ICU in ``intl/icu/source`` (using time zone data in ``intl/tzdata/source``), or using a system-provided ICU library (dependent on its own ``tzdata`` information). Pass ``--with-system-icu`` when configuring to use system ICU. (Using system ICU will disable some ``Intl`` functionality, such as historically accurate time zone calculations, that can’t be readily supported without a precisely-controlled ICU.) ICU version requirements advance fairly quickly as Gecko depends on features and bug fixes in newer ICU releases. You’ll get a build error if you try to use an unsupported ICU. 
+ +SpiderMonkey’s ``Intl`` API may be built or disabled by configuring ``--with-intl-api`` (the default) or ``--without-intl-api``. SpiderMonkey built without the ``Intl`` API doesn’t require ICU. However, if you build without the ``Intl`` API, some non-``Intl`` JavaScript functionality will not exist (``String.prototype.normalize``) or won’t fully work (for example, ``String.prototype.toLocale{Lower,Upper}Case`` will not respect a provided locale, and the various ``toLocaleString`` functions have best-effort behavior). + +Using ICU functionality in SpiderMonkey and Gecko +================================================= + +ICU headers are considered system headers by the Gecko build system, so they must be listed in ``config/system-headers.mozbuild``. Code that wishes to use ICU functionality may use ``#include "unicode/unorm.h"`` or similar to do so. + +Gecko and SpiderMonkey code may use ICU’s stable C API (ICU4C). These functions are stable and shouldn’t change as ICU updates occur. (ICU4C’s ``enum`` initializers are not always stable: while initializer values are stable, new initializers are sometimes added, perhaps behind ``#ifdef U_HIDE_DRAFT_API``. This may be necessary for exhaustive ``switch``\ es to add ``#ifdef``\ s around some ``case``\ s.) + +Gecko and SpiderMonkey are strongly discouraged from using ICU’s C++ API (unfortunately including all smart pointer classes), because the C++ API doesn’t provide ICU4C’s compatibility guarantees. Rarely, we tolerate C++ API use when no stable option exists. But the API has to “look” reasonably stable, and we usually want to start a discussion with upstream about adding a stable API to eventually use. Use symbols from ``namespace icu`` to access ICU C++ functionality. 
*Talk to the current imported-ICU owner (presently Jeff Walden) before you start doing any of this!* + +SpiderMonkey and Gecko’s imported ICU +===================================== + +Build system +------------ + +The system for building ICU lives in ``config/external/icu`` and ``intl/icu/icu_sources_data.py``. We generate a Mozilla-compatible build system rather than using ICU’s build system. The build system is shared by SpiderMonkey and Gecko both. + +ICU includes functionality we never use, so we don’t naively compile all of it. We extract the list of files to compile from ``intl/icu/source/{common,i18n}/Makefile.in`` and then apply a manually-maintained list of unused files (stored in ``intl/icu_sources_data.py``) when we update ICU. + +Locale and time zone data +------------------------- + +ICU contains a considerable amount of raw locale data: formatting characteristics for each locale, strings for things like currencies and languages for each locale, localized time zone specifiers, and so on. This data lives in human-readable files in ``intl/icu/source/data``. Time zone data in ``intl/tzdata/source`` is stored in partially-compiled formats (some of them only partly human-readable). + +However, a normal Gecko build never uses these files! Instead, both ICU and ``tzdata`` data are precompiled into a large, endian-specific ``icudtNNE.dat`` (``NN`` = ICU version, ``E`` = endianness) file. [#why-icudt-not-rebuilt-every-time]_ That file is added to ``config/external/icu/data/`` and is checked into the Mozilla tree, to be directly incorporated into Gecko/SpiderMonkey builds. For size reasons, only the little-endian version is checked into the tree. It is converted into a big-endian version when necessary during the build. + +ICU’s locale data covers *all* ICU internationalization features, including ones we never need. 
We trim locale data to size with a ``intl/icu/data_filter.json`` `data filter <https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md>`__ when compiling ``icudtNNE.dat``. Removing *too much* data won’t *necessarily* break the build, so it’s important that we have automated tests for the locale data we actually use in order to detect mistakes. + +.. [#why-icudt-not-rebuilt-every-time] + ``icudtNNE.dat`` isn’t compiled during a SpiderMonkey/Gecko build because it would require ICU command-line tools. And it’s a pain to either compile and run them during the build, or to require them as build dependencies. + +Local patching of ICU and CLDR +------------------------------ + +We generally don’t patch our copy of ICU except for compelling need. When we do patch, we usually only apply reasonably small patches that have been reviewed and landed upstream (so that our patch will be obsolete when we next update ICU). + +Local patches are stored in the ``intl/icu-patches`` directory. They’re applied when ICU is updated, so merely updating ICU files in place won’t persist changes across an ICU update. + +Patching ICU also allows for patching some parts and uses of CLDR, the data backing ICU operations. Note that this does not include character data, which is `updated separately <https://wiki.mozilla.org/I18n:Updating_Unicode_version>`__, and that any such patching does not affect any other CLDR uses. In particular, Fluent localization depends on Rust crates which themselves depend on CLDR data directly and separately from ICU. Any CLDR patches should remain reasonably small; larger changes such as adding support for a new locale should be done upstream. + +Updating imported code +---------------------- + +The process of updating imported i18n-relevant code is *semi*-automated. We use a series of shell and Python scripts to do the job. 
+ +Updating ICU +~~~~~~~~~~~~ + +New ICU versions are announced on the `icu-announce <https://lists.sourceforge.net/lists/listinfo/icu-announce>`__ mailing list. Both release candidates and actual releases are announced here. It’s a good idea to attempt to update ICU when a release candidate is announced, just in case some serious problem is present (especially one that would be painful to fix through local patching). + +``intl/update-icu.sh`` updates our ICU to a given ICU release: [#icu-git-argument]_ + +.. code:: bash + + $ cd "$topsrcdir/intl" + $ # Ensure certain Python modules in the tree are accessible when updating. + $ export PYTHONPATH="$topsrcdir/python/mozbuild/" + $ # <URL to ICU Git> <release tag name> + $ ./update-icu.sh https://github.com/unicode-org/icu.git release-67-1 + +.. [#icu-git-argument] + The ICU Git URL argument lets you update from a local ICU clone. This can speed up work when you’re updating to a new ICU release and need to adjust or add new local patches. + +But usually you’ll want to update to the latest commit from the corresponding ICU maintenance branch so that you pick up fixes landed post-release: + +.. code:: bash + + $ cd "$topsrcdir/intl" + $ # Ensure certain Python modules in the tree are accessible when updating. + $ export PYTHONPATH="$topsrcdir/python/mozbuild/" + $ # <URL to ICU Git> <maintenance name> + $ ./update-icu.sh https://github.com/unicode-org/icu.git maint/maint-67 + +Updating ICU will also update the language tag registry (which records language tag semantics needed to correctly implement ``Intl`` functionality). Therefore it’s likely necessary to update SpiderMonkey’s language tag handling after running this [#update-icu-warning-langtags]_. See below where the ``langtags`` mode of ``make_intl_data.py`` is discussed. + +.. [#update-icu-warning-langtags] + ``update-icu.sh`` will print a notice as a reminder of this: + + .. 
code:: bash + + INFO: Please run 'js/src/builtin/intl/make_intl_data.py langtags' to update additional language tag files for SpiderMonkey. + +``update-icu.sh`` is intended for *replayability*, not for hands-off runnability. It downloads ICU source, prunes various irrelevant files, replaces ``intl/icu/source`` with the new files – and then blindly applies local patches in fixed order. + +Often a local patch won’t apply, or new patches must be applied to successfully build. In this case you’ll have to manually edit ``update-icu.sh`` to abort after only *some* patches have been applied, make whatever changes are necessary by hand, generate a new/updated patch file by hand, then carefully reattempt updating. (The people who have updated ICU in the past, usually jwalden and anba, follow this awkward process and don’t have good ideas on how to improve it.) + +Any time ICU is updated, you’ll need to fully rebuild whichever of SpiderMonkey or Gecko you’re building. For SpiderMonkey, delete your object directory and reconfigure from scratch. For Gecko, change the message in the top-level `CLOBBER <https://searchfox.org/mozilla-central/source/CLOBBER>`__ file. + +Updating tzdata +~~~~~~~~~~~~~~~ + +ICU contains a copy of ``tzdata``, but that copy is whatever ``tzdata`` release was current at the time the ICU release was finalized. Time zone data changes much more often than that: every time some national legislature or tinpot dictator decides to alter time zones. [#tzdata-release-frequency]_ The `tz-announce <https://mm.icann.org/pipermail/tz-announce/>`__ mailing list announces changes as they occur. (Note that we can’t *immediately* update when a release occurs: ICU’s `icu-data <https://github.com/unicode-org/icu-data>`__ repository must be updated before we can update our ``tzdata``.) + +.. 
[#tzdata-release-frequency] + To give a sense of how frequently ``tzdata`` is updated, and the irregularity of releases over time: + + - 2019 had three ``tzdata`` releases, 2019a through 2019c. + - 2018 had nine ``tzdata`` releases, 2018a through 2018i. + - 2017 had three ``tzdata`` releases, 2017a through 2017c. + +Therefore, either (usually) after you update ICU *or* when a new ``tzdata`` release occurs, you’ll need to update our imported ``tzdata`` files. (If you do need to update time zone data, note that you’ll also need to additionally update SpiderMonkey’s time zone handling, described further below.) This also suitably updates ``config/external/icu/data/icudtNNE.dat``. (If you’ve just run ``update-icu.sh``, it will warn you that you need to do this. [#update-icu-warning-old-tzdata]_) + +.. [#update-icu-warning-old-tzdata] + For example: + + :: + + WARN: Local tzdata (2020a) is newer than ICU tzdata (2019c), please run './update-tzdata.sh 2020a' + +First, make sure you have a usable ``icupkg`` on your system. [#icupkg-on-system]_ Then run the ``update-tzdata.sh`` script to update ``intl/tzdata`` and ``icudtNNE.dat``: + +.. code:: bash + + $ cd "$topsrcdir/intl" + $ ./update-tzdata.sh 2020a # or whatever the latest release is + +.. [#icupkg-on-system] + To install ``icupkg`` on your system: + + - On Fedora, use ``sudo dnf install icu``. + - On Ubuntu, use ``sudo apt-get install icu-devtools``. + - On Mac OS X, use ``brew install icu4c``. + - On Windows, you’ll need to `download a binary build of ICU for Windows <https://github.com/unicode-org/icu/releases/tag/release-67-1>`__ and use the ``bin/icupkg.exe`` or ``bin64/icupkg.exe`` utility inside it. + + If you’re on Windows, or for some reason you don’t want to use the ``icupkg`` now in your ``$PATH``, you can manually specify it on the command line using the ``-e /path/to/icupkg`` flag: + + .. 
code:: bash + + $ cd "$topsrcdir/intl" + $ ./update-tzdata.sh -e /path/to/icupkg 2020a # or whatever the latest release is + + *In principle*, the ``icupkg`` you use *should* be the one from the ICU release/maintenance branch being built: if there’s a mismatch, you might encounter an ICU “format version not supported” error. If you’re on Windows, make sure to download a binary build for that release/branch. On other platforms, you might have to build your own ICU from source. The steps required to do this are left as an exercise for the reader. (In the somewhat longer term, the update commands might be changed to do this themselves.) + +If ``tzdata`` must be updated on trunk, you’ll almost certainly have to backport the update to Beta and ESR. Don’t attempt to backport the literal patch; just run the appropriate commands documented here to do so. + +Updating SpiderMonkey ``Intl`` data +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +SpiderMonkey itself can’t blindly invoke ICU to perform every i18n operation, because sometimes ICU behavior deviates from what web specifications require. Therefore, when ICU is updated, we must also update SpiderMonkey itself (including various generated tests). Such updating is performed using the various modes of ``js/src/builtin/intl/make_intl_data.py``. + +Updating SpiderMonkey time zone handling +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ECMAScript Internationalization API requires that time zone identifiers (``America/New_York``, ``Antarctica/McMurdo``, etc.) be interpreted according to `IANA <https://www.iana.org/time-zones>`__ semantics. Unfortunately, ICU doesn’t precisely implement those semantics. (See comments in ``js/src/builtin/intl/SharedIntlData.h`` for details.) Therefore SpiderMonkey has to do certain pre- and post-processing based on what’s in IANA but not in ICU, and what’s in ICU that isn’t in IANA. + +Use ``make_intl_data.py``\ ’s ``tzdata`` mode to update time zone information: + +.. 
code:: bash + + $ cd "$topsrcdir/js/src/builtin/intl" + $ # make_intl_data.py requires yaml. + $ export PYTHONPATH="$topsrcdir/third_party/python/PyYAML/lib3/" + $ python3 ./make_intl_data.py tzdata + +The ``tzdata`` mode accepts two optional arguments that generally will not be needed: + +- **``--tz``** will act using data from a local ``tzdata/`` directory containing raw ``tzdata`` source (note that this is *not* the same as what is in ``intl/tzdata/source``). It may be useful to help debug problems that arise during an update. +- **``--ignore-backzone``** will omit time zone information before 1970. SpiderMonkey and Gecko include this information by default. However, because (by deliberate policy) ``tzdata`` information before 1970 is not reliable to the same degree as data since 1970, and backzone data has a size cost, a SpiderMonkey embedding or custom Gecko build might decide to omit it. + +Updating SpiderMonkey language tag handling +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Language tags (``en``, ``de-CH``, ``ar-u-ca-islamicc``, and so on) are the primary means of specifying localization characteristics. The ECMAScript Internationalization API supports certain operations that depend upon the current state of the language tag registry (stored in the Unicode Common Locale Data Repository, CLDR, a repository of all locale-specific characteristics) that specifies subtag semantics: + +- ``Intl.getCanonicalLocales`` and ``Intl.Locale`` must replace alias subtags with their preferred forms. For example, ``ar-u-ca-islamic-civil`` uses the preferred Islamic calendar subtag, while ``ar-u-ca-islamicc`` uses an alias. +- ``Intl.Locale.prototype.maximize`` and ``Intl.Locale.prototype.minimize`` accept a language tag and add or remove “likely” subtags from it. For example, ``de`` most likely refers to German using Latin script in Germany, so it maximizes to ``de-Latn-DE`` – and in reverse, ``de-Latn-DE`` minimizes to simply ``de``. 
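The two operations in the list above can be tried directly with the standard ``Intl`` API. This is a minimal sketch using the same tags as the text. The results depend on the CLDR version behind the engine, so the comments describe what the text above leads one to expect without guaranteeing it for every engine.

```javascript
// Alias subtags are replaced with their preferred forms. Per the text
// above, the "islamicc" alias should become "islamic-civil".
const canonicalTags = Intl.getCanonicalLocales("ar-u-ca-islamicc");

// maximize() adds "likely" subtags; minimize() removes them again.
const maximizedTag = new Intl.Locale("de").maximize().toString();
const minimizedTag = new Intl.Locale("de-Latn-DE").minimize().toString();

console.log(canonicalTags, maximizedTag, minimizedTag);
```

Because these results track the language tag registry, a CLDR update can legitimately change them, which is why the generated data has to be refreshed alongside ICU.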
+ +These decisions vary over time: as countries change [#soviet-union]_, as customs change, as language prevalence in regions varies, etc. + +.. [#soviet-union] + For just one relevant example, the breakup of the Soviet Union is the cause of numerous entries in the language tag registry. ``ru-SU``, Russian as used in the Soviet Union, is now expressed as ``ru-RU``, Russian as used in Russia; ``ab-SU``, Abkhazian as used in the Soviet Union, is now expressed as ``ab-GE``, Abkhazian as used in Georgia; and so on for all the other satellite states. + +Use ``make_intl_data.py``\ ’s ``langtags`` mode to update language tag information to the same CLDR version used by ICU: + +.. code:: bash + + $ cd "$topsrcdir/js/src/builtin/intl" + $ # make_intl_data.py requires yaml. + $ export PYTHONPATH="$topsrcdir/third_party/python/PyYAML/lib3/" + $ python3 ./make_intl_data.py langtags + +The CLDR version used will be printed in the header of CLDR-sensitive generated files. For example, ``intl/components/src/LocaleGenerated.cpp`` currently begins with: + +.. code:: cpp + + // Generated by make_intl_data.py. DO NOT EDIT. + // Version: CLDR-37 + // URL: https://unicode.org/Public/cldr/37/core.zip + +Updating SpiderMonkey currency support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Currencies use different numbers of fractional digits in their preferred formatting. Most currencies use two decimal digits; a handful use no fractional digits or some other number. Currency fractional digit is maintained by ISO and must be updated as currencies change their preferred fractional digits or new currencies arise that don’t use two decimal digits. + +Currency updates are fairly uncommon, so it’ll be rare to need to update currency info. A `newsletter <https://www.currency-iso.org/en/home/amendments/newsletter.html>`__ periodically sends updates about changes. + +Use ``make_intl_data.py``\ ’s ``currency`` mode to update currency fractional digit information: + +.. 
code:: bash + + $ cd "$topsrcdir/js/src/builtin/intl" + $ # make_intl_data.py requires yaml. + $ export PYTHONPATH="$topsrcdir/third_party/python/PyYAML/lib3/" + $ python3 ./make_intl_data.py currency + +Updating SpiderMonkey measurement formatting support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``Intl`` API supports formatting numbers as measurement units (for example, “17 meters” or “42 meters per second”). It specifies a list of units that must be supported. We record that list centrally in ``js/src/builtin/intl/SanctionedSimpleUnitIdentifiers.yaml``, verify that ICU supports every unit in it, and generate supporting files from it. + +If ``Intl``\ ’s list of supported units is ever updated, two separate changes will be required. + +First, ``intl/icu/data_filter.json`` must be updated to incorporate localized strings for the new unit. These strings are stored in ``icudtNNE.dat``, so you’ll have to re-update ICU (and likely reimport ``tzdata`` as well, if it’s been updated since the last ICU update) to rewrite that file. + +Second, use ``make_intl_data.py``\ ’s ``units`` mode to update unit handling and associated tests in SpiderMonkey: + +.. code:: bash + + $ cd "$topsrcdir/js/src/builtin/intl" + $ # make_intl_data.py requires yaml. + $ export PYTHONPATH="$topsrcdir/third_party/python/PyYAML/lib3/" + $ python3 ./make_intl_data.py units + +Updating SpiderMonkey numbering systems support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``Intl`` API also supports formatting numbers in various numbering systems (for example, “123” using Latin digits or “一二三” using Han decimal digits). The list of numbering systems that we must support is stored in ``js/src/builtin/intl/NumberingSystems.yaml``. We verify that these numbering systems are supported by ICU and generate supporting files from that list. 
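A minimal sketch of what numbering system support looks like from script, using the standard ``Intl.NumberFormat`` API. The ``hanidec`` identifier is the Unicode name for Han decimal digits; whether a given identifier is accepted depends on the engine's ICU data, so this is an illustration rather than a statement about any particular build.

.. code:: javascript

    const latin = new Intl.NumberFormat("en", { numberingSystem: "latn" });
    const han = new Intl.NumberFormat("en", { numberingSystem: "hanidec" });

    const latinText = latin.format(123);
    const hanText = han.format(123);
    console.log(latinText); // "123"
    console.log(hanText);   // "一二三"

    // resolvedOptions() reports which numbering system was actually used.
    const resolvedSystem = han.resolvedOptions().numberingSystem;
    console.log(resolvedSystem); // "hanidec"
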
+ +When the list of supported numbering systems needs to be updated, run ``make_intl_data.py`` with the ``numbering`` mode to update it and associated tests in SpiderMonkey: + +.. code:: bash + + $ cd "$topsrcdir/js/src/builtin/intl" + $ # make_intl_data.py requires yaml. + $ export PYTHONPATH="$topsrcdir/third_party/python/PyYAML/lib3/" + $ python3 ./make_intl_data.py numbering diff --git a/intl/docs/icu4x.rst b/intl/docs/icu4x.rst new file mode 100644 index 0000000000..7f8e13cf3f --- /dev/null +++ b/intl/docs/icu4x.rst @@ -0,0 +1,52 @@ +##### +ICU4X +##### + +This file documents the procedures for building with `ICU4X <https://github.com/unicode-org/icu4x>`__. + +Enabling ICU4X +============== + +#. Add ``ac_add_options --enable-icu4x`` to your mozconfig. (This is the default.) +#. Do a full build. + +Updating the bundled ICU4X data +=============================== + +ICU4X data is bundled directly as a Rust crate with the ``compiled_data`` feature. + +However, each ``icu_*_data`` crate contains all locale data, so it might include data that Gecko doesn't need. ``icu_segmenter`` uses custom crates in Gecko because its default data includes unnecessary data such as the word dictionaries for East Asian languages. + +The script ``intl/update-icu4x.sh`` can generate and update this binary data. To add a new data type, modify this script and then run it. + +When using ICU4X 1.4.0 with the latest hard-coded data, run the following. The baked data of ``icu_segmenter`` is generated into ``intl/icu_segmenter_data/data``. + +.. code:: bash + + $ cd "$topsrcdir/intl" + $ ./update-icu4x.sh https://github.com/unicode-org/icu4x.git icu@1.4.0 44.0.0 release-74-1 1.4.0 + +Updating ICU4X +============== + +When you update the ICU4X crates in Gecko, check ``Cargo.toml`` in Gecko's root directory. It might contain hacks that replace the crates.io version with a custom version. + +C/C++ FFI +========= + +ICU4X provides the ``icu_capi`` crate for C/C++ FFI. 
``mozilla::intl::GetDataProvider`` returns the ``capi::ICU4XDataProvider`` from ``icu_capi``. The provider returns valid data until shutdown. + +Accessing the data provider from Rust +===================================== + +Use the ``compiled_data`` feature. With it, you don't need to deal with a data provider. + +Adding new ICU4X features to Gecko +================================== + +To reduce build time and binary size, the ICU4X embedded in Gecko uses a minimal configuration. To add new features, you have to update some files. + +#. Add the feature to the ``icu_capi`` entry in ``js/src/rust/shared/Cargo.toml``. +#. Modify the ``[features]`` section in ``intl/icu_capi/Cargo.toml`` to enable the ``compiled_data`` feature of the added crate. +#. Update the patch file for ``intl/icu_capi/Cargo.toml`` in ``intl/icu4x-patches``. +#. (Optional) Modify ``intl/update-icu4x.sh`` to add generated ICU4X data if you want to modify ICU4X baked data. diff --git a/intl/docs/index.rst b/intl/docs/index.rst new file mode 100644 index 0000000000..c9e28fa851 --- /dev/null +++ b/intl/docs/index.rst @@ -0,0 +1,27 @@ +==================== +Internationalization +==================== + +Internationalization (`"i18n"`) is a domain of computer science focused on making +software accessible across languages, regions and cultures. +A combination of those is called a `locale`. + +On the most abstract level, Gecko internationalization is a set of algorithms, +data structures and APIs that aim to enable Gecko to work with all human scripts and +languages, both as a UI toolkit and as a web engine. + +In order to achieve that, i18n has to hook into many components such as layout, gfx, dom, +widget, build, front-end, JS engine and accessibility. +It also has to be available across programming languages and frameworks used in the +platform and front-end. + +Below is a list of articles that introduce the concepts necessary to understand and +use Mozilla's I18n APIs. + +.. 
toctree:: + :maxdepth: 1 + + locale + dataintl + icu + icu4x diff --git a/intl/docs/locale.rst b/intl/docs/locale.rst new file mode 100644 index 0000000000..37f08d859e --- /dev/null +++ b/intl/docs/locale.rst @@ -0,0 +1,610 @@ +.. role:: js(code) + :language: javascript + +================= +Locale management +================= + +A locale is a combination of language, region, script, and regional preferences the +user wants to format their data into. + +There are multiple models of locale data structures in the industry that have varying degrees +of compatibility between each other. Historically, each major platform has used its own, +and many standards bodies provided conflicting proposals. + +Mozilla, along with most modern platforms, follows Unicode and W3C recommendations +and conforms to a standard known as `BCP 47`_ which describes a low-level textual +representation of a locale known as a `language tag`. + +A few examples of language tags: *en-US*, *de*, *ar*, *zh-Hans*, *es-CL*. + +Locales and Language Tags +========================= + +A locale data structure consists of four primary fields. + + - Language (Example: English - *en*, French - *fr*, Serbian - *sr*) + - Script (Example: Latin - *Latn*, Cyrillic - *Cyrl*) + - Region (Example: United States - *US*, Canada - *CA*, Russia - *RU*) + - Variants (Example: Mac OS - *macos*, Windows - *windows*, Linux - *linux*) + +`BCP 47`_ specifies the syntax for each of those fields (called subtags) when +represented as a string. The syntax defines the allowed selection of characters, +their capitalization, and the order in which the fields should be defined. + +Most of the base subtags are valid ISO codes, such as `ISO 639`_ for +the language subtag, or `ISO 3166-1`_ for the region. + +The examples above present language tags with several fields omitted, which is allowed +by the standard. 
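To see which fields a given tag actually carries, the standard ``Intl.Locale`` object can parse it. This is only a sketch using the web-exposed API; Gecko-internal code may rely on other helpers. Omitted fields come back as ``undefined``.

.. code-block:: javascript

    const full = new Intl.Locale("sr-Cyrl-RU");
    console.log(full.language, full.script, full.region); // "sr" "Cyrl" "RU"

    const partial = new Intl.Locale("zh-Hans");
    console.log(partial.language); // "zh"
    console.log(partial.script);   // "Hans"
    console.log(partial.region);   // undefined, the region subtag was omitted

    // Serializing normalizes the capitalization of each subtag.
    const normalized = new Intl.Locale("EN-us").toString();
    console.log(normalized); // "en-US"
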
+ +On top of that, a locale may contain: + + - extensions and private fields + These fields can be used to carry additional information about a locale. + Mozilla currently has partial support for them in the JS implementation and plans to + extend support to all APIs. + - extkeys and "grandfathered" tags (unfortunate language, but part of the spec) + Mozilla does not support these yet. + + +An example locale can be visualized as: + +.. code-block:: javascript + + { + "language": "sr", + "script": "Cyrl", + "region": "RU", + "variants": [], + "extensions": {}, + "privateuse": [], + } + +which can be then serialized into a string: **"sr-Cyrl-RU"**. + +.. important:: + + Since locales are often stored and passed around the codebase as + language tag strings, it is important to always use an appropriate + API to parse, manipulate and serialize them. + Avoid `Do-It-Yourself` solutions which leave your code fragile and may + break on unexpected language tag structures. + +Locale Fallback Chains +====================== + +Locale sensitive operations are always considered "best-effort". That means that it +cannot be assumed that a perfect match will exist between what the user requested and what +the API can provide. + +As a result, the best practice is to *always* operate on locale fallback chains - +ordered lists of locales according to the user preference. + +An example of a locale fallback chain may be: :js:`["es-CL", "es-ES", "es", "fr", "en"]`. + +The above means a request to format the data according to the Chilean Spanish if possible, +fall back to Spanish Spanish, then any (generic) Spanish, French and eventually to +English. + +.. important:: + + It is *always* better to use a locale fallback chain over a single locale. + In case there's only one locale available, a list with one element will work + while allowing for future extensions without a costly refactor. 
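One place where a fallback chain is accepted directly is the standard ``Intl`` constructors, which take an ordered list of locales and resolve to the first one they can support. The sketch below leaves the resolved values unstated because they depend on the data shipped with the engine.

.. code-block:: javascript

    const chain = ["es-CL", "es-ES", "es", "fr", "en"];

    // The formatter picks the best supported locale from the chain.
    const formatter = new Intl.DateTimeFormat(chain);
    const resolved = formatter.resolvedOptions().locale;
    console.log(resolved);

    // supportedLocalesOf() returns the supported entries in their original order.
    const supported = Intl.DateTimeFormat.supportedLocalesOf(chain);
    console.log(supported);
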
+ +Language Negotiation +==================== + +Due to the imperfections in data matching, all operations on locales should always +use a language negotiation algorithm to resolve the best available set of locales, +based on the list of all available locales and an ordered list of requested locales. + +Such algorithms may vary in sophistication and number of strategies. Mozilla's +solution is based on modified logic from `RFC 4647 <https://tools.ietf.org/html/rfc4647>`__. + +The three lists of locales used in negotiation: + + - **Available** - locales that are locally installed + - **Requested** - locales that the user selected in decreasing order of preference + - **Resolved** - result of the negotiation + +The result of a negotiation is an ordered list of locales that are available to +the system, and the consumer is expected to attempt using the locales in the +resolved order. + +Negotiation should be used in all scenarios, such as selecting language resources, +calendar, number formatting, etc. + +Single Locale Matching +---------------------- + +Every negotiation strategy goes through a list of steps in an attempt to find the +best possible match between locales. + +The exact algorithm is custom, and consists of a six-level strategy: + +:: + + 1) Attempt to find an exact match for each requested locale in available + locales. + Example: ['en-US'] * ['en-US'] = ['en-US'] + + 2) Attempt to match a requested locale to an available locale treated + as a locale range. + Example: ['en-US'] * ['en'] = ['en'] + ^^ + |-- becomes 'en-*-*-*' + + 3) Attempt to use the maximized version of the requested locale, to + find the best match in available locales. + Example: ['en'] * ['en-GB', 'en-US'] = ['en-US'] + ^^ + |-- ICU likelySubtags expands it to 'en-Latn-US' + + 4) Attempt to look for a different variant of the same locale. 
+ Example: ['ja-JP-win'] * ['ja-JP-mac'] = ['ja-JP-mac'] + ^^^^^^^^^ + |----------- replace variant with range: 'ja-JP-*' + + 5) Attempt to look for a maximized version of the requested locale, + stripped of the region code. + Example: ['en-CA'] * ['en-ZA', 'en-US'] = ['en-US', 'en-ZA'] + ^^^^^ + |----------- look for likelySubtag of 'en': 'en-Latn-US' + + 6) Attempt to look for a different region of the same locale. + Example: ['en-GB'] * ['en-AU'] = ['en-AU'] + ^^^^^ + |----- replace region with range: 'en-*' + +Filtering / Matching / Lookup +----------------------------- + +When negotiating between lists of locales, Mozilla's :js:`LocaleService` API +offers three language negotiation strategies: + +Filtering +^^^^^^^^^ + +This is the most common scenario, where there is an advantage in creating a +maximal possible list of locales that the user may benefit from. + +An example of a scenario: + +.. code-block:: javascript + + let requested = ["fr-CA", "en-US"]; + let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"]; + + let result = Services.locale.negotiateLanguages(requested, available); + + result == ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]; + +In the example above the algorithm was able to match *"fr-CA"* as a perfect match, +but then was able to find other matches as well - a generic French is a very +good match, and Swiss French is also very close to the top requested language. + +In case of the second of the requested locales, unfortunately American English +is not available, but British English and South African English are. + +The algorithm is greedy and attempts to match as many locales +as possible. This is usually what the developer wants. + +Matching +^^^^^^^^ + +In less common scenarios the code needs to match a single, best available locale for +each of the requested locales. + +An example of this scenario: + +.. 
code-block:: javascript + + let requested = ["fr-CA", "en-US"]; + let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"]; + + let result = Services.locale.negotiateLanguages( + requested, + available, + undefined, + Services.locale.langNegStrategyMatching); + + result == ["fr-CA", "en-GB"]; + +The best available locale for *"fr-CA"* is a perfect match, and for *"en-US"*, the +algorithm selected British English. + +Lookup +^^^^^^ + +The third strategy should be used in cases where, no matter what, only one locale +can ever be used. Some third-party APIs don't support fallback and it doesn't make +sense to continue resolving after finding the first locale. + +It is still advised to continue using this API as a fallback chain list, just in +this case with a single element. + +.. code-block:: javascript + + let requested = ["fr-CA", "en-US"]; + let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"]; + + let result = Services.locale.negotiateLanguages( + requested, + available, + Services.locale.defaultLocale, + Services.locale.langNegStrategyLookup); + + result == ["fr-CA"]; + +Default Locale +-------------- + +Besides *Available*, *Requested* and *Resolved* locale lists, there's also a concept +of *DefaultLocale*, which is a single locale out of the list of available ones that +should be used in case there is no match to be found between available and +requested locales. + +Every Firefox build has a single default locale - for example +**Firefox zh-CN** has *DefaultLocale* set to *zh-CN* since this locale is guaranteed +to be packaged in, have all the resources, and should be used if the negotiation fails +to return any matches. + +.. 
code-block:: javascript + + let requested = ["fr-CA", "en-US"]; + let available = ["it", "de", "zh-CN", "pl", "sr-RU"]; + let defaultLocale = "zh-CN"; + + let result = Services.locale.negotiateLanguages(requested, available, defaultLocale); + + result == ["zh-CN"]; + +Chained Language Negotiation +---------------------------- + +In some cases the user may want to link a language selection to another component. + +For example, a Firefox extension may come with its own list of available locales, which +may have locales that Firefox doesn't. + +In that case, negotiation between user requested locales and the add-on's list may result +in a selection of locales superseding that of Firefox itself. + + +.. code-block:: none + + Fx Available + +-------------+ + | it, fr, ar | + +-------------+ Fx Locales + | +--------+ + +--------------> | fr, ar | + | +--------+ + Requested | + +----------------+ + | es, fr, pl, ar | + +----------------+ Add-on Locales + | +------------+ + +--------------> | es, fr, ar | + Add-on Available | +------------+ + +-----------------+ + | de, es, fr, ar | + +-----------------+ + + +In that case, an add-on may end up being displayed in Spanish, while Firefox UI will +use French. In most cases this results in a bad UX. + +In order to avoid that, one can chain the add-on negotiation and take Firefox's resolved +locales as a `requested`, and negotiate that against the add-ons' `available` list. + +.. code-block:: none + + Fx Available + +-------------+ + | it, ar, fr | + +-------------+ Fx Locales (as Add-on Requested) + | +--------+ + +--------------> | fr, ar | + | +--------+ + Requested | | Add-on Locales + +----------------+ | +--------+ + | es, fr, pl, ar | +-------------> | fr, ar | + +----------------+ | +--------+ + | + Add-on Available | + +-----------------+ + | de, es, ar, fr | + +-----------------+ + +Available Locales +================= + +In Gecko, available locales come from the `Packaged Locales` and the installed +`language packs`. 
Language packs are a variant of WebExtensions providing just +localized resources for one or more languages. + +The primary notion of which locales are available is based on which locales Gecko has +UI localization resources for, and other datasets such as internationalization may +carry different lists of available locales. + +Requested Locales +================= + +The list of requested locales can be read and set using :js:`LocaleService::requestedLocales` API. + +Using the API will perform necessary sanity checks and canonicalize the values. + +After the sanitization, the value will be stored in a pref :js:`intl.locale.requested`. +The pref usually will store a comma separated list of valid BCP47 locale +codes, but it can also have two special meanings: + + - If the pref is not set at all, Gecko will use the default locale as the requested one. + - If the pref is set to an empty string, Gecko will look into OS app locales as the requested. + +The former is the current default setting for Firefox Desktop, and the latter is the +default setting for Firefox for Android. + +If the developer wants to programmatically request the app to follow OS locales, +they can assign :js:`null` to :js:`requestedLocales`. + +Regional Preferences +==================== + +Every locale comes with a set of default preferences that are specific to a culture +and region. This contains preferences such as calendar system, way to display +time (24h vs 12h clock), which day the week starts on, which days constitute a weekend, +what numbering system and date time formatting a given locale uses +(for example "MM/DD" in en-US vs "DD/MM" in en-AU). + +For all such preferences Gecko has a list of default settings for every region, +but there's also a degree of customization every user may want to make. + +All major operating systems have a Settings UI for selecting those preferences, +and since Firefox does not provide its own, Gecko looks into the OS for them. 
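The per-locale defaults can be inspected through ``resolvedOptions()`` on the standard ``Intl`` formatters. The sketch below only shows the defaults that ship with the engine's locale data; it does not involve the OS-level overrides discussed in this section.

.. code-block:: javascript

    const us = new Intl.DateTimeFormat("en-US", { hour: "numeric" });
    const de = new Intl.DateTimeFormat("de-DE", { hour: "numeric" });

    const usHour12 = us.resolvedOptions().hour12;
    const deHour12 = de.resolvedOptions().hour12;
    console.log(usHour12); // true, en-US defaults to a 12-hour clock
    console.log(deHour12); // false, de-DE defaults to a 24-hour clock
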
+ +A special API :js:`mozilla::intl::OSPreferences` handles communication with the +host operating system, retrieving regional preferences and altering +internationalization formatting with user preferences. + +One thing to notice is that the boundary between regional preferences and language +selection is not sharp. In many cases the internationalization formats +will contain language-specific terms and literals. For example, a date formatting +pattern for Japanese may look like this - *"2018年3月24日"*, or the date format +may contain names of months or weekdays to be translated +("April", "Tuesday" etc.). + +For that reason it is tricky to follow regional preferences in a scenario where the operating +system locale selection does not match the Firefox UI locales. + +Such behavior might lead to a UI case like "Today is 24 października" in an English Firefox +with Polish date formats. + +For that reason, by default, Gecko will *only* look into OS Preferences if the *language* +portion of the locale of the OS and Firefox match. +That means that if Windows is in "**en**-AU" and Firefox is in "**en**-US" Gecko will look +into Windows Regional Preferences, but if Windows is in "**de**-CH" and Firefox +is in "**fr**-FR" it won't. +In order to force Gecko to look into OS preferences regardless of the language match, +set the flag :js:`intl.regional_prefs.use_os_locales` to :js:`true`. + +UI Direction +------------ + +Since the UI direction is so tightly coupled with the locale selection, the +main method of testing the directionality of the Gecko app lives in LocaleService. + +:js:`LocaleService::IsAppLocaleRTL` returns a boolean indicating if the current +direction of the app UI is right-to-left. + +Default and Last Fallback Locales +================================= + +Every Gecko application is built with a single locale as the default one. 
Such a locale +is guaranteed to have all linguistic resources available, should be used +as the default locale in case language negotiation cannot find any match, and also +as the last locale to look for in a fallback chain. + +If all else fails, Gecko also supports the notion of a last fallback locale, which is +currently hardcoded to *"en-US"*, and is the very final locale to try in case +nothing else (including the default locale) works. +Notice that Unicode and ICU use *"en-GB"* in that role because more English-speaking +people around the world recognize British regional preferences than American (metric vs. +imperial, Fahrenheit vs. Celsius etc.). +Mozilla may switch to *"en-GB"* in the future. + +Packaged Locales +================ + +When the Gecko application is being packaged it bundles a selection of locale resources +to be available within it. At the moment, for example, most Firefox for Android +builds come with almost 100 locales packaged into them, while Desktop Firefox usually comes +with just one packaged locale. + +There is currently work being done on enabling more flexibility in how +the locales are packaged to allow for bundling applications with different +sets of locales in different areas - dictionaries, hyphenations, product language resources, +installer language resources, etc. + +Web Exposed Locales +==================== + +For anti-tracking or other reasons, we tend to expose a spoofed locale to web content instead +of the default locales. This can be done by setting the pref :js:`intl.locale.privacy.web_exposed`. +The pref is a comma-separated list of locales, and an empty string implies the default locales. + +The pref has no effect while :js:`privacy.spoof_english` is set to 2, in which case *"en-US"* will always +be returned. + +Multi-Process +============= + +Locale management can operate in a client/server model. This allows a Gecko process +to manage locales (server mode) or just receive the locale selection from a parent +process (client mode). 
+ +The client mode is currently used by all child processes of Desktop Firefox, and +may be used by, for example, GeckoView to follow locale selection from a parent +process. + +To check the mode the process is operating in, the :js:`LocaleService::IsServer` method is available. + +Note that :js:`L10nRegistry.registerSources`, :js:`L10nRegistry.updateSources`, and +:js:`L10nRegistry.removeSources` each trigger an IPC synchronization between the parent +process and any extant content processes, which is expensive. If you need to change the +registration of multiple sources, the best way to do so is to coalesce multiple requests +into a single array and then call the method once. + +Mozilla Exceptions +================== + +There's currently only a single exception to BCP47 in use, and that's +the legacy "ja-JP-mac" locale. The "mac" is a variant and BCP47 requires all variants +to be 5-8 characters long. + +Gecko works around this limitation by accepting the 3-letter variant in our APIs and also +provides a special :js:`appLocalesAsLangTags` method which returns this locale in that form. +(:js:`appLocalesAsBCP47` will canonicalize it and turn it into `"ja-JP-macos"`). + +Usage of language negotiation etc. shouldn't rely on this behavior. + +Events +====== + +:js:`LocaleService` emits two events: :js:`intl:app-locales-changed` and +:js:`intl:requested-locales-changed` which all code can listen to. + +Those events may be broadcast in response to new language packs being installed or +uninstalled, or the user's selection of languages changing. + +In most cases, the code should observe :js:`intl:app-locales-changed` +and react only to that event, since this is the one indicating a change +in the currently used language settings that the components should follow. + +Testing +======= + +Many components may have logic encoded to react to changes in requested, available +or resolved locales. 
+ +In order to test the component's behavior, it is important to replicate +the environment in which such a change may happen. + +Since in most cases it is advised for a component to tie its +language negotiation to the main application (see `Chained Language Negotiation`), +it is not enough to add a new locale to trigger the language change. + +First, it is necessary to add a new locale to the available ones, then change +the requested locales; only then will a new negotiation and language +change happen. + +There are two primary ways to add a locale to the available ones. + +Testing Localization +-------------------- + +If the goal is to test that the correct localization ends up in the correct place, +the developer needs to register a new :js:`L10nFileSource` in :js:`L10nRegistry` and +provide mock cached data to be returned by the API. + +It may look like this: + +.. code-block:: javascript + + let source = L10nFileSource.createMock( + "mock-source", "app", + ["ko-KR", "ar"], + "resource://mock-addon/localization/{locale}", + [ + { + path: "resource://mock-addon/localization/ko-KR/test.ftl", + source: "key = Value in Korean" + }, + { + path: "resource://mock-addon/localization/ar/test.ftl", + source: "key = Value in Arabic" + } + ] + ); + + L10nRegistry.registerSources([source]); + + let availableLocales = Services.locale.availableLocales; + + assert(availableLocales.includes("ko-KR")); + assert(availableLocales.includes("ar")); + + Services.locale.requestedLocales = ["ko-KR"]; + + let appLocales = Services.locale.appLocalesAsBCP47; + assert(appLocales[0] === "ko-KR"); + +From here, a resource :js:`test.ftl` can be added to a `Localization` and for ID :js:`key` +the correct value from the mocked cache will be returned. + +Testing Locale Switching +------------------------ + +The second method is much more limited, as it only mocks the locale availability, +but it is also simpler: + +.. 
code-block:: javascript + + Services.locale.availableLocales = ["ko-KR", "ar"]; + Services.locale.requestedLocales = ["ko-KR"]; + + let appLocales = Services.locale.appLocalesAsBCP47; + assert(appLocales[0] === "ko-KR"); + +In the future, Mozilla plans to add a third way for add-ons (`bug 1440969`_) +that allows an add-on's locales to be disconnected from the main application's, +for either manual or automated testing purposes. + +Testing the outcome +------------------- + +Apart from testing reactions to locale changes, it is advised to avoid writing +tests that expect a certain locale to be selected, or certain internationalization +or localization data to be used. + +Doing so locks down the test infrastructure to be only usable when launched in +a single locale environment and requires those tests to be updated whenever the underlying +data changes. + +In the case of testing locale selection it is best to use a fake locale such as :js:`x-test` that +will not be present at the beginning of the test. + +In the case of testing for internationalization data it is best to use :js:`resolvedOptions()` +to verify the right data is being used, rather than comparing the output string. + +In the case of localization, it is best to test against the correct :js:`data-l10n-id` +being set or, in edge cases, verify that a given variable is present in the string using +:js:`String.prototype.includes`. + +Deep Dive +========= + +Below is a list of articles with additional +details on selected subjects: + +.. toctree:: + :maxdepth: 1 + + locale_env + locale_startup + +Feedback +======== + +In case of questions, please consult Intl module peers. + + +.. _RFC 5656: https://tools.ietf.org/html/rfc5656 +.. _BCP 47: https://tools.ietf.org/html/bcp47#section-2.1 +.. _ISO 639: http://www.loc.gov/standards/iso639-2/php/code_list.php +.. _ISO 3166-1: https://www.iso.org/iso-3166-country-codes.html +.. _Intl.Locale: https://bugzilla.mozilla.org/show_bug.cgi?id=1433303 +.. 
_fluent-locale: https://docs.rs/fluent-locale/ +.. _bug 1440969: https://bugzilla.mozilla.org/show_bug.cgi?id=1440969 diff --git a/intl/docs/locale_env.rst b/intl/docs/locale_env.rst new file mode 100644 index 0000000000..7f9f2fc754 --- /dev/null +++ b/intl/docs/locale_env.rst @@ -0,0 +1,37 @@ +Environments +============ + +While all the concepts described above apply to all programming languages and frameworks +used by Mozilla, there are differences in completeness of the implementation. + +Below is the current list of APIs supported in each environment and examples of how to +use them: + +C++ +--- + +In C++ the core API for Locale is :js:`mozilla::intl::Locale` and the service for locale +management is :js:`mozilla::intl::LocaleService`. + +For any OSPreference operations there's :js:`mozilla::intl::OSPreferences`. + + +JavaScript +---------- + +In JavaScript users can use :js:`mozilla.org/intl/mozILocaleService` XPCOM API to access +the LocaleService and :js:`mozilla.org/intl/mozIOSPreferences` for OS preferences. + +The LocaleService API is exposed as :js:`Services.locale` object. + +There's currently no API available for operations on language tags and Locale objects, +but `Intl.Locale`_ API is in the works. + +Rust +---- + +For Rust Mozilla provides a crate `fluent-locale`_ which implements the concepts described +above. + +.. _Intl.Locale: https://bugzilla.mozilla.org/show_bug.cgi?id=1433303 +.. _fluent-locale: https://docs.rs/fluent-locale/ diff --git a/intl/docs/locale_startup.rst b/intl/docs/locale_startup.rst new file mode 100644 index 0000000000..ea6849b117 --- /dev/null +++ b/intl/docs/locale_startup.rst @@ -0,0 +1,96 @@ +Startup +======= + +There are cases where it may be important to understand how Gecko locale management +acts during the startup. 
+
+The description below covers the `server` mode. The `client` mode starts
+with no data and performs no operations of its own: it waits for the parent to fill
+the basic locale lists (`requested` and `appLocales`) and then maintain them in a
+unidirectional way.
+
+Data Types
+----------
+
+There are two primary data types involved in negotiating locales during startup:
+`requested` and `available`.
+Throughout startup, different sources for these lists become available, and
+as a result the values of those lists change.
+
+Data Sources
+------------
+
+There are three primary sources that become available during bootstrap.
+
+1) Packaged locale lists stored in the files :js:`update.locale` and :js:`multilocale.txt`.
+
+2) User preferences read from the profile.
+
+3) Language packs installed in the user profile or system-wide.
+
+Bootstrap
+---------
+
+1) Packaged Data
+^^^^^^^^^^^^^^^^
+
+In the `server` mode Gecko starts with no knowledge of `available` or `requested`
+locales.
+
+Initially, all fields are resolved lazily, so no data for available, requested,
+default or resolved locales is retrieved.
+
+If any code queries any of the APIs, it triggers the initial data fetching
+and language negotiation.
+
+The initial request comes from XPCLocale, which is initializing
+the first JS context and needs to know which locale the JS context should use as
+the default.
+
+At that moment :js:`LocaleService` fetches the list of available locales, using
+packaged locales which are retrieved via the :js:`multilocale.txt` file in the toolkit's
+package.
+This gives LocaleService information about which locales are initially available.
+
+Note that this happens before any of the language packs are registered, so
+at that point Gecko only knows about packaged locales.
+
+For requested locales, the initial request comes before user profile preferences
+are read, so the data is fetched from packaged preferences.
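The lazy resolution described above can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions, not Gecko's implementation: :js:`LazyLocaleState`, :js:`readPackagedLocales` and :js:`readDefaultLocale` are hypothetical names standing in for reading :js:`multilocale.txt` and :js:`update.locale`, and the matching is exact-string only, whereas real language negotiation is more involved.

```javascript
// Hypothetical sketch: no locale data is read until the first query arrives.
class LazyLocaleState {
  constructor({ readPackagedLocales, readDefaultLocale }) {
    this._readPackagedLocales = readPackagedLocales;
    this._readDefaultLocale = readDefaultLocale;
    this._appLocales = null; // not resolved yet
  }

  // The first access triggers data fetching and negotiation;
  // later accesses reuse the cached result.
  get appLocales() {
    if (this._appLocales === null) {
      const available = this._readPackagedLocales();
      const defaultLocale = this._readDefaultLocale();
      // Assumption: the packaged preferences set no requested locale,
      // so the default locale is the only request at this stage.
      const requested = [defaultLocale];
      const matched = requested.filter(loc => available.includes(loc));
      // Fall back to the default locale so the list is never empty.
      this._appLocales = matched.length ? matched : [defaultLocale];
    }
    return this._appLocales;
  }
}

// Count how often the packaged list is read to make the laziness visible.
let reads = 0;
const state = new LazyLocaleState({
  readPackagedLocales: () => {
    reads++;
    return ["en-US", "de"];
  },
  readDefaultLocale: () => "en-US",
});
```

Nothing is read when :js:`state` is constructed. The first read of :js:`state.appLocales` plays the role of the initial query, and repeated reads do not fetch the packaged list again.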
+
+In the case of Desktop Firefox, the :js:`intl.locale.requested` pref will not be set,
+which means Gecko will use the default locale, which is retrieved from
+the :js:`update.locale` file (also packaged).
+
+This means that the initial language negotiation runs between the packaged
+locales as available and the default locale as requested.
+
+2) Profile Prefs Read
+^^^^^^^^^^^^^^^^^^^^^
+
+Next, the profile is read and, if the user set any requested locales,
+LocaleService updates its list of requested locales and broadcasts
+the :js:`intl:requested-locales-changed` event.
+
+This may lead to language renegotiation if the requested locale is one of the packaged
+ones. In that case, :js:`intl:app-locales-changed` will be broadcast.
+
+3) Language Packs Registered
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Finally, the AddonManager registers all the language packs, which get added to
+:js:`L10nRegistry`, and as a result LocaleService's available locales get updated.
+
+That triggers language negotiation and, if the language from the language pack
+is used in the requested list, the final list of locales is set.
+
+All of that happens before any UI is built, but there's no guarantee of this
+order being preserved, so it is important to understand that, depending on when the
+code runs during startup, it may receive a different list of locales.
+
+In order to maintain the correct locale settings it is important to set an observer
+on :js:`intl:app-locales-changed` and update any dependent state when the locale list changes.
+
+That ensures the code always uses the best possible locale selection during startup,
+but also during runtime in case the user changes their requested locale list, or
+language packs are updated/removed on the fly.
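As a rough sketch of the observer pattern recommended above, the helper below registers for :js:`intl:app-locales-changed` and re-reads the locale list on every notification. :js:`watchAppLocales` is a hypothetical helper, not a Gecko API. It takes the observer service and a locale getter as parameters. In chrome-privileged code these would presumably be :js:`Services.obs` and a function returning :js:`Services.locale.appLocalesAsBCP47`.

```javascript
const APP_LOCALES_TOPIC = "intl:app-locales-changed";

// Hypothetical helper: `obs` is expected to offer addObserver/removeObserver
// in the style of nsIObserverService, and `getAppLocales` returns the
// current list of app locales.
function watchAppLocales(obs, getAppLocales, onChange) {
  const observer = {
    observe(subject, topic, data) {
      if (topic === APP_LOCALES_TOPIC) {
        // Re-read the list instead of caching it: it may change several
        // times during startup and again at runtime.
        onChange(getAppLocales());
      }
    },
  };
  obs.addObserver(observer, APP_LOCALES_TOPIC);

  // Deliver the current list right away so the caller starts from
  // whatever has been negotiated so far.
  onChange(getAppLocales());

  // Return a function that unregisters the observer.
  return () => obs.removeObserver(observer, APP_LOCALES_TOPIC);
}
```

Passing the service in as a parameter keeps the sketch independent of the environment. A caller in chrome code might write :js:`watchAppLocales(Services.obs, () => Services.locale.appLocalesAsBCP47, updateUI)`, where :js:`updateUI` is a placeholder for the caller's own logic.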