diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 19:33:14 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 19:33:14 +0000 |
commit | 36d22d82aa202bb199967e9512281e9a53db42c9 (patch) | |
tree | 105e8c98ddea1c1e4784a60a5a6410fa416be2de /intl/l10n/docs | |
parent | Initial commit. (diff) | |
download | firefox-esr-36d22d82aa202bb199967e9512281e9a53db42c9.tar.xz firefox-esr-36d22d82aa202bb199967e9512281e9a53db42c9.zip |
Adding upstream version 115.7.0esr.upstream/115.7.0esr
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'intl/l10n/docs')
-rw-r--r-- | intl/l10n/docs/crosschannel/commits.rst | 33 | ||||
-rw-r--r-- | intl/l10n/docs/crosschannel/content.rst | 129 | ||||
-rw-r--r-- | intl/l10n/docs/crosschannel/index.rst | 88 | ||||
-rw-r--r-- | intl/l10n/docs/crosschannel/repositories.rst | 14 | ||||
-rw-r--r-- | intl/l10n/docs/fluent/index.rst | 25 | ||||
-rw-r--r-- | intl/l10n/docs/fluent/review.rst | 303 | ||||
-rw-r--r-- | intl/l10n/docs/fluent/tutorial.rst | 750 | ||||
-rw-r--r-- | intl/l10n/docs/glossary.rst | 22 | ||||
-rw-r--r-- | intl/l10n/docs/index.rst | 26 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/fluent.rst | 153 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/index.rst | 53 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/legacy.rst | 642 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/localizations.rst | 42 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/overview.rst | 136 | ||||
-rw-r--r-- | intl/l10n/docs/migrations/testing.rst | 58 | ||||
-rw-r--r-- | intl/l10n/docs/overview.rst | 199 |
16 files changed, 2673 insertions, 0 deletions
diff --git a/intl/l10n/docs/crosschannel/commits.rst b/intl/l10n/docs/crosschannel/commits.rst new file mode 100644 index 0000000000..955baf734f --- /dev/null +++ b/intl/l10n/docs/crosschannel/commits.rst @@ -0,0 +1,33 @@ +Commits and Metadata +==================== + +When creating the commit for a particular revision, we need to find the +revisions on the other branches of cross-channel to unify the created +content with. + +To do so, the cross-channel algorithm keeps track of metadata associated with +a revision in the target repository. This metadata is stored in the commit +message: + +.. code-block:: bash + + X-Channel-Repo: mozilla-central + X-Channel-Converted-Revision: af4a1de0a11cb3afbb7e50bcdd0919f56c23959a + X-Channel-Repo: releases/mozilla-beta + X-Channel-Revision: 65fb3f6bce94f8696e1571c2d48104dbdc0b31e2 + X-Channel-Repo: releases/mozilla-release + X-Channel-Revision: 1c5bf69f887359645f1c3df4de0d0e3caf957e59 + X-Channel-Repo: releases/mozilla-esr68 + X-Channel-Revision: 4cbbc30e1ebc3254ec74dc041aff128c81220507 + +This metadata is appended to the original commit message when committing. +For each branch in the cross-channel configuration we have the name and +a revision. The revision that's currently converted is explicitly highlighted +by the ``-Converted-`` marker. On hg.mozilla.org, those revisions are also +marked up as links, so one can navigate from the converted changeset to the +original patch. + +When starting the update for an incremental graph from the previous section, +the metadata is read from the target repository, and the data for the +currently converted branch is updated for each commit. Each revision in +this metadata then goes into the algorithm to create the unified content. diff --git a/intl/l10n/docs/crosschannel/content.rst b/intl/l10n/docs/crosschannel/content.rst new file mode 100644 index 0000000000..01f5e1ab66 --- /dev/null +++ b/intl/l10n/docs/crosschannel/content.rst @@ -0,0 +1,129 @@ +===================== +Cross-channel Content +===================== + +When creating the actual content, there's a number of questions to answer. + +#. Where to take content from? +#. Which content to take? +#. Where to put the content? +#. What to put into each file? + +Content Sources +--------------- + +The content of each revision in ``gecko-strings`` corresponds to a given +revision in each original repository. For example, we could have + ++------------------+--------------+ +| Repository | Revision | ++==================+==============+ +| mozilla-central | 4c92802939c1 | ++------------------+--------------+ +| mozilla-beta | ace4081e8200 | ++------------------+--------------+ +| mozilla-release | 2cf08fbb92b2 | ++------------------+--------------+ +| mozilla-esr68 | 2cf9e0c91d51 | ++------------------+--------------+ +| comm-central | 3f3fc2c0d804 | ++------------------+--------------+ +| comm-beta | f95a6f4408a3 | ++------------------+--------------+ +| comm-release | dc2694f035fa | ++------------------+--------------+ +| comm-esr68 | d05d4d87d25c | ++------------------+--------------+ + +The assumption is that there's no content that's shared between ``mozilla-*`` and +``comm-*``, so we can just convert one repository and its branches at a time. + +Covered Content +--------------- + +Which content is included in ``gecko-strings`` is +controlled by the project configurations of each product, on each branch. +Currently, those are :file:`browser/locales/l10n.toml` and +:file:`mobile/android/locales/l10n.toml` in ``mozilla-central``. + +Created Content Structure +------------------------- + +The created content is laid out in the directory in the same structure as +the files in ``l10n-central``. The localizable files end up like this: + +.. code-block:: + + browser/ + browser/ + browser.ftl + chrome/ + browser.properties + toolkit/ + toolkit/ + about/aboutAbout.ftl + +This matches the file locations in ``mozilla-central`` with the +:file:`locales/en-US` part dropped. + +The project configuration files are also converted and added to the +created file structure. As they're commonly in the :file:`locales` folder +which we strip, they're added to the dedicated :file:`_configs` folder. + +.. code-block:: bash + + $ ls _configs + browser.toml devtools-client.toml devtools-shared.toml + mobile-android.toml toolkit.toml + + +L10n File Contents +------------------ + +Let's assume we have a file to localize in several revisions with different +content. + +== ======= ==== ======= +ID central beta release +== ======= ==== ======= +a one one one +b two two +c three +d four old old +== ======= ==== ======= + +The algorithm then creates content, taking localizable values from the left-most +branch, where *central* overrides *beta*, and *beta* overrides *release*. This +creates content as follows: + +== ======= +ID content +== ======= +a one +b two +c three +d four +== ======= + +If a file doesn't exist in one of the revisions, that revision is dropped +from the content generation for this particular file. + +.. note:: + + The example of the forth string here highlights the impact that changing + an existing string has. We ship one translation of *four* to central, + beta, and release. That's only a good idea if it doesn't matter which of the + two versions of the English copy got translated. + +Project configurations +---------------------- + +The TOML files for project configuration are processed, but not unified +across branches at this point. + +.. note:: + + The content of the ``-central`` branch determines what's localized + from ``gecko-strings``. Thus that TOML file needs to include all + directories across all branches for now. Removing entries requires + that the content is obsolete on all branches in cross-channel. diff --git a/intl/l10n/docs/crosschannel/index.rst b/intl/l10n/docs/crosschannel/index.rst new file mode 100644 index 0000000000..faa28d6157 --- /dev/null +++ b/intl/l10n/docs/crosschannel/index.rst @@ -0,0 +1,88 @@ +============= +Cross-channel +============= + +Firefox is localized with a process nick-named *cross-channel*. This document +explains both the general idea as well as some technical details of that +process. The gist of it is this: + + We use one localization for all release channels. + +There's a number of upsides to that: + +* Localizers maintain a single source of truth. Localizers can work on Nightly, + while updating Beta, Developer Edition or even Release and ESR. +* Localizers can work on strings at their timing. +* Uplifting string changes has less of an impact on the localization toolchain, + and their impact can be evaluated case by case. + +So the problem at hand is to have one localization source +and use that to build 5 different versions of Firefox. The goal is for that +localization to be as complete as possible for each version. While we do +allow for partial localizations, we don't want to enforce partial translations +on any version. + +The process to tackle these follows these steps: + +* Create resource to localize, ``gecko-strings``. + + * Review updates to that resource in *quarantine*. + * Expose a known good state of that resource to localizers. + +* The actual localization work happens in Pontoon. +* Write localizations back to ``l10n-central``. +* Get localizations into the builds. + +.. digraph:: full_tree + + graph [ rankdir=LR ]; + "m-c" -> "quarantine"; + "m-b" -> "quarantine"; + "m-r" -> "quarantine"; + "c-c" -> "quarantine"; + "c-b" -> "quarantine"; + "c-r" -> "quarantine"; + "quarantine" -> "gecko-strings"; + "gecko-strings" -> "Pontoon"; + "Pontoon" -> "l10n-central"; + "l10n-central" -> "Nightly"; + "l10n-central" -> "Beta"; + "l10n-central" -> "Firefox"; + "l10n-central" -> "Daily"; + "l10n-central" -> "Thunderbird"; + { + rank=same; + "quarantine"; + "gecko-strings"; + } + +.. note:: + + The concept behind the quarantine in the process above is to + protect localizers from churn on strings that have technical + problems. Examples like that could be missing localization notes + or copy that should be improved. + +The resource to localize is a Mercurial repository, unifying +all strings to localize for all covered products and branches. Each revision +of this repository holds all the strings for a particular point in time. + +There's three aspects that we'll want to unify here. + +#. Create a version history that allows the localization team + to learn where strings in the generated repository are coming from. +#. Unify the content across different branches for a single app. +#. Unify different apps, coming from different repositories. + +The last item is the easiest, as ``mozilla-*`` and ``comm-*`` don't share +code or history. Thus, they're converted individually to disjunct directories +and files in the target repository, and the Mercurial history of each is interleaved +in the target history. When parents are needed for one repository, they're +rebased over the commits for the other. + +.. toctree:: + :maxdepth: 1 + + commits + content + repositories diff --git a/intl/l10n/docs/crosschannel/repositories.rst b/intl/l10n/docs/crosschannel/repositories.rst new file mode 100644 index 0000000000..8461b32fbd --- /dev/null +++ b/intl/l10n/docs/crosschannel/repositories.rst @@ -0,0 +1,14 @@ +gecko-strings and Quarantine +============================ + +The actual generation is currently done via `taskcluster cron <https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=cross-channel>`_. +The state that is good to use by localizers at large is published at +https://hg.mozilla.org/l10n/gecko-strings/. + +The L10n team is doing a :ref:`review step <exposure-in-gecko-strings>` before publishing the strings, and while +that is ongoing, the intermediate state is published to +https://hg.mozilla.org/l10n/gecko-strings-quarantine/. + +The code is in https://hg.mozilla.org/mozilla-central/file/tip/python/l10n/mozxchannel/, +supported as a mach subcommand in https://hg.mozilla.org/mozilla-central/file/tip/tools/compare-locales/mach_commands.py, +as a taskcluster kind in https://hg.mozilla.org/mozilla-central/file/tip/taskcluster/ci/l10n-cross-channel, and scheduled in cron in https://hg.mozilla.org/mozilla-central/file/tip/.cron.yml. diff --git a/intl/l10n/docs/fluent/index.rst b/intl/l10n/docs/fluent/index.rst new file mode 100644 index 0000000000..84103db5e4 --- /dev/null +++ b/intl/l10n/docs/fluent/index.rst @@ -0,0 +1,25 @@ +====== +Fluent +====== + +`Fluent`_ is a localization system developed by Mozilla, which aims to replace +all existing localization models currently used at Mozilla. + +In case of Firefox it directly supersedes DTD and StringBundle systems, providing +a large number of features and improvements over both of them, for developers +and localizers. + +.. toctree:: + :maxdepth: 2 + + tutorial + review + +Other resources: + + * `Fluent Syntax Guide <http://projectfluent.org/fluent/guide/>`_ + * `Fluent Wiki <https://github.com/projectfluent/fluent/wiki>`_ + * `Fluent.js Wiki <https://github.com/projectfluent/fluent.js/wiki>`_ + * `Fluent DOM L10n Tutorial <https://projectfluent.org/dom-l10n-documentation/>`_ + +.. _Fluent: http://projectfluent.org/ diff --git a/intl/l10n/docs/fluent/review.rst b/intl/l10n/docs/fluent/review.rst new file mode 100644 index 0000000000..83d65ebed9 --- /dev/null +++ b/intl/l10n/docs/fluent/review.rst @@ -0,0 +1,303 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +=============================== +Guidelines for Fluent Reviewers +=============================== + +This document is intended as a guideline for developers and reviewers when +working with FTL (Fluent) files. As such, it’s not meant to replace the +`existing extensive documentation`__ about Fluent. + +__ ./tutorial.html + +`Herald`_ is used to set the group `fluent-reviewers`_ as blocking reviewer for +any patch modifying FTL files committed to Phabricator. The person from this +group performing the review will have to manually set other reviewers as +blocking, if the original developer didn’t originally do it. + + +.. hint:: + + In case of doubt, you should always reach out to the l10n team for + clarifications. + + +Message Identifiers +=================== + +While in Fluent it’s possible to use both lowercase and uppercase characters in +message identifiers, the naming convention in Gecko is to use lowercase and +hyphens (*kebab-case*), avoiding CamelCase and underscores. For example, +:js:`allow-button` should be preferred to :js:`allow_button` or +:js:`allowButton`, unless there are technically constraints – like identifiers +generated at run-time from external sources – that make this impractical. + +When importing multiple FTL files, all messages share the same scope in the +Fluent bundle. For that reason, it’s suggested to add scope to the message +identifier itself: using :js:`cancel` as an identifier increases the chances of +having a conflict, :js:`save-dialog-cancel-button` would make it less likely. + +Message identifiers are also used as the ultimate fall back in case of run-time +errors. Having a descriptive message ID would make such fall back more useful +for the user. + +Comments +======== + +When a message includes placeables (variables), there should always be a +comment explaining the format of the variable, and what kind of content it will +be replaced with. This is the format suggested for such comments: + + +.. code-block:: fluent + + # This string is used on a new line below the add-on name + # Variables: + # $name (String) - Add-on author name + cfr-doorhanger-extension-author = by { $name } + + +By default, a comment is bound to the message immediately following it. Fluent +supports both `file-level and group-level comments`__. Be aware that a group +comment will apply to all messages following that comment until the end of the +file. If that shouldn’t be the case, you’ll need to “reset” the group comment, +by adding an empty one (:js:`##`), or moving the section of messages at the end +of the file. + +__ https://projectfluent.org/fluent/guide/comments.html + +Comments are fundamental for localizers, since they don’t see the file as a +whole, or changes as a fragment of a larger patch. Their work happens on a +message at a time, and the context is only provided by comments. + +License headers are standalone comments, that is, a single :js:`#` as prefix, +and the comment is followed by at least one empty line. + +Changes to Existing Messages +============================ + +You must update the message identifier if: + +- The meaning of the sentence has changed. +- You’re changing the morphology of the message, by adding or removing attributes. + +Messages are identified in the entire localization toolchain by their ID. For +this reason, there’s no need to change attribute names. + +If your changes are relevant only for English — for example, to correct a +typographical error or to make letter case consistent — then there is generally +no need to update the message identifier. + +There is a grey area between needing a new ID or not. In some cases, it will be +necessary to look at all the existing translations to determine if a new ID +would be beneficial. You should always reach out to the l10n team in case of +doubt. + +Changing the message ID will invalidate the existing translation, the new +message will be reported as missing in all tools, and localizers will have to +retranslate it. This is the only reliable method to ensure that localizers +update existing localizations, and run-time stop using obsolete translations. + +You must also update all instances where that message identifier is used in the +source code, including localization comments. + +Non-text Elements in Messages +============================= + +When a message includes non text-elements – like anchors or images – make sure +that they have a :js:`data-l10n-name` associated to them. Additional +attributes, like the URL for an anchor or CSS classes, should not be exposed +for localization in the FTL file. More details can be found in `this page`__ +dedicated to DOM overlays. + +__ https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays#text-level-elements + +This information is not relevant if your code is using `fluent-react`_, where +DOM overlays `work differently`__. + +__ https://github.com/projectfluent/fluent.js/wiki/React-Overlays + +Message References +================== + +Consider the following example: + + +.. code-block:: fluent + + newtab-search-box-search-the-web-text = Search the Web + newtab-search-box-search-the-web-input = + .placeholder = { newtab-search-box-search-the-web-text } + .title = { newtab-search-box-search-the-web-text } + + +This might seem to reduce the work for localizers, but it actually doesn’t +help: + +- A change to the referenced message (:js:`newtab-search-box-search-the-web-text`) + would require a new ID also for all messages referencing it. +- Translation memory can help with matching text, not with message references. + +On the other hand, this approach is helpful if, for example, you want to +reference another element of the UI in your message: + + +.. code-block:: fluent + + help-button = Help + help-explanation = Click the { help-button} to access support + + +This enforces consistency and, if :js:`help-button` changes, all other messages +will need to be updated anyway. + +Terms +===== + +Fluent supports a specific type of message, called `term`_. Terms are similar +to regular messages but they can only be used as references in other messages. +They are best used to define vocabulary and glossary items which can be used +consistently across the localization of the entire product. + +Terms are typically used for brand names, like :js:`Firefox` or :js:`Mozilla`: +it allows to have them in one easily identifiable place, and raise warnings +when a localization is not using them. It helps enforcing consistency and brand +protection. If you simply need to reference a message from another message, you +don’t need a term: cross references between messages are allowed, but they +should not be abused, as already described. + +Variants and plurals +==================== + +Consider the following example: + + +.. code-block:: fluent + + items-selected = + { $num -> + [0] Select items. + [one] One item selected. + *[other] { $num } items selected. + } + + +In this example, there’s no guarantee that all localizations will have this +variant covered, since variants are private by design. The correct approach for +the example would be to have a separate message for the :js:`0` case: + + +.. code-block:: fluent + + # Separate messages which serve different purposes. + items-select = Select items + # The default variant works for all values of the selector. + items-selected = + { $num -> + [one] One item selected. + *[other] { $num } items selected. + } + + +As a rule of thumb: + +- Use variants only if the default variant makes sense for all possible values + of the selector. +- The code shouldn’t depend on the availability of a specific variant. + +More examples about selector and variant abuses can be found in `this wiki`__. + +__ https://github.com/projectfluent/fluent/wiki/Good-Practices-for-Developers#prefer-separate-messages-over-variants-for-ui-logic + +In general, also avoid putting a selector in the middle of a sentence, like in +the example below: + + +.. code-block:: fluent + + items-selected = + { $num -> + [one] One item. + *[other] { $num } items + } selected. + + +:js:`1` should only be used in case you want to cover the literal number. If +it’s a standard plural, you should use the :js:`one` category for singular. +Also make sure to always pass the variable to these messages as a number, not +as a string. + +Access Keys +=========== + +The following is a simple potential example of an access key: + +.. code-block:: fluent + + example-menu-item = + .label = Menu Item + .accesskey = M + +Access keys are used in menus in order to help provide easy keyboard shortcut access. They +are useful for both power users, and for users who have accessibility needs. It is +helpful to first read the `Access keys`__ guide in the Windows Developer documentation, +as it outlines the best practices for Windows applications. + +__ https://docs.microsoft.com/en-us/windows/uwp/design/input/access-keys + +There are some differences between operating systems. Linux mostly follows the same +practices as Windows. However, macOS in general does not have good support for accesskeys, +especially in menus. + +When choosing an access key, it's important that it's unique relative to the current level +of UI. It's preferable to avoid letters with descending parts, such as :code:`g`, +:code:`j`, :code:`p`, and :code:`q` as these will not be underlined nicely in Windows or +Linux. Other problematic characters are ones which are narrow, such as :code:`l`, +:code:`i` and :code:`I`. The underline may not be as visible as other letters in +sans-serif fonts. + +Linter +====== + +:bash:`mach lint` includes a :ref:`l10n linter <L10n>`, called :bash:`moz-l10n-lint`. It +can be run locally by developers but also runs on Treeherder: in the Build +Status section of the diff on Phabricator, open the Treeherder Jobs link and +look for the :js:`l1nt` job. + +Besides displaying errors and warnings due to syntax errors, it’s particularly +important because it also checks for message changes without new IDs, and +conflicts with the cross-channel repository used to ship localized versions of +Firefox. + + +.. warning:: + + Currently, there’s an `issue`__ preventing warnings to be displayed in + Phabricator. Checks can be run locally using :bash:`./mach lint -l l10n -W`. + + __ https://github.com/mozilla/code-review/issues/32 + + +Migrating Strings From Legacy or Fluent Files +============================================= + +If a patch is moving legacy strings (.properties, .DTD) to Fluent, it should +also include a recipe to migrate existing strings to FTL messages. The same is +applicable if a patch moves existing Fluent messages to a different file, or +changes the morphology of existing messages without actual changes to the +content. + +Documentation on how to write and test migration recipes is available in `this +page`__. + +__ ./fluent_migrations.html + + +.. _Herald: https://phabricator.services.mozilla.com/herald/ +.. _fluent-reviewers: https://phabricator.services.mozilla.com/tag/fluent-reviewers/ +.. _fluent-react: https://github.com/projectfluent/fluent.js/wiki/React-Bindings +.. _term: https://projectfluent.org/fluent/guide/terms.html diff --git a/intl/l10n/docs/fluent/tutorial.rst b/intl/l10n/docs/fluent/tutorial.rst new file mode 100644 index 0000000000..1312efd448 --- /dev/null +++ b/intl/l10n/docs/fluent/tutorial.rst @@ -0,0 +1,750 @@ +.. role:: html(code) + :language: html + +.. role:: js(code) + :language: javascript + +============================= +Fluent for Firefox Developers +============================= + + +This tutorial is intended for Firefox engineers already familiar with the previous +localization systems offered by Gecko - `DTD`_ and `StringBundle`_ - and assumes +prior experience with those systems. + +For a more hands-on tutorial of understanding Fluent from the ground up, try +following the `Fluent DOMLocalization Tutorial`__, which provides some background on +how Fluent works and walks you through creating a basic web project from scratch that +uses Fluent for localization. + +__ https://projectfluent.org/dom-l10n-documentation/ + +Using Fluent in Gecko +===================== + +`Fluent`_ is a modern localization system introduced into +the Gecko platform with a focus on quality, performance, maintenance and completeness. + +The legacy DTD system is deprecated, and Fluent should be used where possible. + +Getting a Review +---------------- + +If you work on any patch that touches FTL files, you'll need to get a review +from `fluent-reviewers`__. There's a Herald hook that automatically sets +that group as a blocking reviewer. + +__ https://phabricator.services.mozilla.com/tag/fluent-reviewers/ + +Guidelines for the review process are available `here`__. + +__ ./fluent_review.html + +To lighten the burden on reviewers, please take a moment to review some +best practices before submitting your patch for review. + +- `ProjectFluent Good Practices for Developers`_ +- `Mozilla Localization Best Practices For Developers`_ + +.. _ProjectFluent Good Practices for Developers: https://github.com/projectfluent/fluent/wiki/Good-Practices-for-Developers +.. _Mozilla Localization Best Practices For Developers: https://mozilla-l10n.github.io/documentation/localization/dev_best_practices.html + +Major Benefits +============== + +Fluent `ties tightly`__ into the domain of internationalization +through `Unicode`_, `CLDR`_ and `ICU`_. + +__ https://github.com/projectfluent/fluent/wiki/Fluent-and-Standards + +More specifically, the most observable benefits for each group of consumers are + + +Developers +---------- + + - Support for XUL, XHTML, HTML, Web Components, React, JS, Python and Rust + - Strings are available in a single, unified localization context available for both DOM and runtime code + - Full internationalization (i18n) support: date and time formatting, number formatting, plurals, genders etc. + - Strong focus on `declarative API via DOM attributes`__ + - Extensible with custom formatters, Mozilla-specific APIs etc. + - `Separation of concerns`__: localization details, and the added complexity of some languages, don't leak onto the source code and are no concern for developers + - Compound messages link a single translation unit to a single UI element + - `DOM Overlays`__ allow for localization of DOM fragments + - Simplified build system model + - No need for pre-processing instructions + - Support for pseudolocalization + +__ https://github.com/projectfluent/fluent/wiki/Get-Started +__ https://github.com/projectfluent/fluent/wiki/Design-Principles +__ https://github.com/projectfluent/fluent.js/wiki/DOM-Overlays + + +Product Quality +------------------ + + - A robust, multilevel, `error fallback system`__ prevents XML errors and runtime errors + - Simplified l10n API reduces the amount of l10n specific code and resulting bugs + - Runtime localization allows for dynamic language changes and updates over-the-air + - DOM Overlays increase localization security + +__ https://github.com/projectfluent/fluent/wiki/Error-Handling + + +Fluent Translation List - FTL +============================= + +Fluent introduces a file format designed specifically for easy readability +and the localization features offered by the system. + +At first glance the format is a simple key-value store. It may look like this: + +.. code-block:: fluent + + home-page-header = Home Page + + # The label of a button opening a new tab + new-tab-open = Open New Tab + +But the FTL file format is significantly more powerful and the additional features +quickly add up. In order to familiarize yourself with the basic features, +consider reading through the `Fluent Syntax Guide`_ to understand +a more complex example like: + +.. code-block:: fluent + + ### These messages correspond to security and privacy user interface. + ### + ### Please choose simple and non-threatening language when localizing + ### to help user feel in control when interacting with the UI. + + ## General Section + + -brand-short-name = Firefox + .gender = masculine + + pref-pane = + .title = + { PLATFORM() -> + [windows] Options + *[other] Preferences + } + .accesskey = C + + # Variables: + # $tabCount (Number) - number of container tabs to be closed + containers-disable-alert-ok-button = + { $tabCount -> + [one] Close { $tabCount } Container Tab + *[other] Close { $tabCount } Container Tabs + } + + update-application-info = + You are using { -brand-short-name } Version: { $version }. + Please read the <a>privacy policy</a>. + +The above, of course, is a particular selection of complex strings intended to exemplify +the new features and concepts introduced by Fluent. + +.. important:: + + While in Fluent it’s possible to use both lowercase and uppercase characters in message + identifiers, the naming convention in Gecko is to use lowercase and hyphens, avoiding + CamelCase and underscores. For example, `allow-button` should be preferred to + `allow_button` or `allowButton`, unless there are technically constraints – like + identifiers generated at run-time from external sources – that make this impractical. + +In order to ensure the quality of the output, a lot of checks and tooling +is part of the build system. +`Pontoon`_, the main localization tool used to translate Firefox, also supports +Fluent and its features to help localizers in their work. + + +.. _fluent-tutorial-social-contract: + +Social Contract +=============== + +Fluent uses the concept of a `social contract` between developer and localizers. +This contract is established by the selection of a unique identifier, called :js:`l10n-id`, +which carries a promise of being used in a particular place to carry a particular meaning. + +The use of unique identifiers is shared with legacy localization systems in +Firefox. + +.. important:: + + An important part of the contract is that the developer commits to treat the + localization output as `opaque`. That means that no concatenations, replacements + or splitting should happen after the translation is completed to generate the + desired output. + +In return, localizers enter the social contract by promising to provide an accurate +and clean translation of the messages that match the request. + +In Fluent, the developer is not to be bothered with inner logic and complexity that the +localization will use to construct the response. Whether `declensions`__ or other +variant selection techniques are used is up to a localizer and their particular translation. +From the developer perspective, Fluent returns a final string to be presented to +the user, with no l10n logic required in the running code. + +__ https://en.wikipedia.org/wiki/Declension + + +Markup Localization +=================== + +To localize an element in Fluent, the developer adds a new message to +an FTL file and then has to associate an :js:`l10n-id` with the element +by defining a :js:`data-l10n-id` attribute: + +.. code-block:: html + + <h1 data-l10n-id="home-page-header" /> + + <button data-l10n-id="pref-pane" /> + +Fluent will take care of the rest, populating the element with the message value +in its content and all localizable attributes if defined. + +The developer provides only a single message to localize the whole element, +including the value and selected attributes. + +The value can be a whole fragment of DOM: + +.. code-block:: html + + <p data-l10n-id="update-application-info" data-l10n-args='{"version": "60.0"}'> + <a data-l10n-name="privacy-url" href="http://www.mozilla.org/privacy" /> + </p> + +.. code-block:: fluent + + -brand-short-name = Firefox + update-application-info = + You are using { -brand-short-name } Version: { $version }. + Please read the <a data-l10n-name="privacy-url">privacy policy</a>. + + +Fluent will overlay the translation onto the source fragment preserving attributes like +:code:`class` and :code:`href` from the source and adding translations for the elements +inside. The resulting localized content will look like this: + +.. code-block:: + + <p data-l10n-id="update-application-info" data-l10n-args='{"version": "60.0"}'"> + You are using Firefox Version: 60.0. + Please read the <a href="http://www.mozilla.org/privacy">privacy policy</a>. + </p> + + +This operation is sanitized, and Fluent takes care of selecting which elements and +attributes can be safely provided by the localization. +The list of allowed elements and attributes is `maintained by the W3C`__, and if +the developer needs to allow for localization of additional attributes, they can +allow them using :code:`data-l10n-attrs` list: + +.. code-block:: html + + <label data-l10n-id="search-input" data-l10n-attrs="style" /> + +The above example adds an attribute :code:`style` to be allowed on this +particular :code:`label` element. + + +External Arguments +------------------ + +Notice in the previous example the attribute :code:`data-l10n-args`, which is +a JSON object storing variables exposed by the developer to the localizer. + +This is the main channel for the developer to provide additional variables +to be used in the localization. + +Arguments are rarely needed for situations where it’s currently possible to use +DTD, since such variables would need to be computed from the code at runtime. +It's worth noting that, when the :code:`l10n-args` are set in +the runtime code, they are in fact encoded as JSON and stored together with +:code:`l10n-id` as an attribute of the element. + +__ https://www.w3.org/TR/2011/WD-html5-20110525/text-level-semantics.html + + +Runtime Localization +==================== + +In almost every case the JS runtime code will operate on a particular document, either +XUL, XHTML or HTML. + +If the document has its markup already localized, then Fluent exposes a new +attribute on the :js:`document` element - :js:`document.l10n`. + +This property is an object of type :js:`DOMLocalization` which maintains the main +localization context for this document and exposes it to runtime code as well. + +With a focus on `declarative localization`__, the primary method of localization is +to alter the localization attributes in the DOM. Fluent provides a method to facilitate this: + +.. code-block:: javascript + + document.l10n.setAttributes(element, "new-panel-header"); + +This will set the :code:`data-l10n-id` on the element and translate it before the next +animation frame. + +This API can be used to set both the ID and the arguments at the same time. + +.. code-block:: javascript + + document.l10n.setAttributes(element, "containers-disable-alert-ok-button", { + tabCount: 5 + }); + +If only the arguments need to be updated, then it's possible to use the :code:`setArgs` +method. + +.. code-block:: javascript + + document.l10n.setArgs(element, { + tabCount: 5 + }); + +On debug builds if the Fluent arguments are not provided, then Firefox will crash. This +is done so that these errors are caught in CI. On rare occasions it may be necessary +to work around this crash by providing a blank string as an argument value. + +__ https://github.com/projectfluent/fluent/wiki/Good-Practices-for-Developers + + +Non-Markup Localization +----------------------- + +In rare cases, when the runtime code needs to retrieve the translation and not +apply it onto the DOM, Fluent provides an API to retrieve it: + +.. code-block:: javascript + + let [ msg ] = await document.l10n.formatValues([ + {id: "remove-containers-description"} + ]); + + alert(msg); + +This model is heavily discouraged and should be used only in cases where the +DOM annotation is not possible. + +.. note:: + + This API is available as asynchronous. In case of Firefox, + the only non-DOM localizable calls are used where the output goes to + a third-party like Bluetooth, Notifications etc. + All those cases should already be asynchronous. If you can't avoid synchronous + access, you can use ``mozILocalization.formatMessagesSync`` with synchronous IO. + + +Internationalization +==================== + +The majority of internationalization issues are implicitly handled by Fluent without +any additional requirement. Full Unicode support, `bidirectionality`__, and +correct number formatting work without any action required from either +developer or localizer. + +__ https://github.com/projectfluent/fluent/wiki/BiDi-in-Fluent + +.. code-block:: javascript + + document.l10n.setAttributes(element, "welcome-message", { + userName: "اليسع", + count: 5 + }); + +A message like this localized to American English will correctly wrap the user +name in directionality marks, allowing the layout engine to determine how to +display the bidirectional text. + +On the other hand, the same message localized to Arabic will use the Eastern Arabic +numeral for number "5". + + +Plural Rules +------------ + +The most common localization feature is the ability to provide different variants +of the same string depending on plural categories. Fluent ties into the Unicode CLDR +standard called `Plural Rules`_. + +In order to allow localizers to use it, all the developer has to do is to pass +an external argument number: + +.. code-block:: javascript + + document.l10n.setAttributes(element, "unread-warning", { unreadCount: 5 }); + +Localizers can use the argument to build a multi variant message if their +language requires that: + +.. code-block:: fluent + + unread-warning = + { $unreadCount -> + [one] You have { $unreadCount } unread message + *[other] You have { $unreadCount } unread messages + } + +If the variant selection is performed based on a number, Fluent matches that +number against literal numbers as well as its `plural category`__. + +If the given translation doesn't need pluralization for the string (for example +Japanese often will not), the localizer can replace it with: + +.. code-block:: fluent + + unread-warning = You have { $unreadCount } unread messages + +and the message will preserve the social contract. + +One additional feature is that the localizer can further improve the message by +specifying variants for particular values: + +.. code-block:: fluent + + unread-warning = + { $unreadCount -> + [0] You have no unread messages + [1] You have one unread message + *[other] You have { $unreadCount } unread messages + } + +The advantage here is that per-locale choices don't leak onto the source code +and the developer is not affected. + + +.. note:: + + There is an important distinction between a variant keyed on plural category + `one` and digit `1`. Although in English the two are synonymous, in other + languages category `one` may be used for other numbers. + For example in `Bosnian`__, category `one` is used for numbers like `1`, `21`, `31` + and so on, and also for fractional numbers like `0.1`. + +__ https://unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html +__ https://unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#bs + +Partially-formatted variables +----------------------------- + +When it comes to formatting data, Fluent allows the developer to provide +a set of parameters for the formatter, and the localizer can fine tune some of them. +This technique is called `partially-formatted variables`__. + +For example, when formatting a date, the developer can just pass a JS :js:`Date` object, +but its default formatting will be pretty expressive. In most cases, the developer +may want to use some of the :js:`Intl.DateTimeFormat` options to select the default +representation of the date in string: + +.. code-block:: javascript + + document.l10n.formatValue("welcome-message", { + startDate: FluentDateTime(new Date(), { + year: "numeric", + month: "long", + day: "numeric" + }) + }); + +.. code-block:: fluent + + welcome-message = Your session will start date: { $startDate } + +In most cases, that will be enough and the date would get formatted in the current +Firefox as `February 28, 2018`. + +But if in some other locale the string would get too long, the localizer can fine +tune the options as well: + +.. code-block:: fluent + + welcome-message = Początek Twojej sesji: { DATETIME($startDate, month: "short") } + +This will adjust the length of the month token in the message to short and get formatted +in Polish as `28 lut 2018`. + +At the moment Fluent supports two formatters that match JS Intl API counterparts: + + * **NUMBER**: `Intl.NumberFormat`__ + * **DATETIME**: `Intl.DateTimeFormat`__ + +With time more formatters will be added. Also, this feature is not exposed +to ``setAttributes`` at this point, as that serializes to JSON. + +__ https://projectfluent.org/fluent/guide/functions.html#partially-formatted-variables +__ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/NumberFormat +__ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat + +Registering New L10n Files +========================== + +Fluent uses a wildcard statement, packaging all localization resources into +their component's `/localization/` directory. + +That means that, if a new file is added to a component of Firefox already +covered by Fluent like `browser`, it's enough to add the new file to the +repository in a path like `browser/locales/en-US/browser/component/file.ftl`, and +the toolchain will package it into `browser/localization/browser/component/file.ftl`. + +At runtime Firefox uses a special registry for all localization data. It will +register the browser's `/localization/` directory and make all files inside it +available to be referenced. + +To make the document localized using Fluent, all the developer has to do is add +localizable resources for Fluent API to use: + +.. code-block:: html + + <link rel="localization" href="branding/brand.ftl"/> + <link rel="localization" href="browser/preferences/preferences.ftl"/> + +The URI provided to the :html:`<link/>` element are relative paths within the localization +system. + + +Custom Localizations +==================== + +The above method creates a single localization context per document. +In almost all scenarios that's sufficient. + +In rare edge cases where the developer needs to fetch additional resources, or +the same resources in another language, it is possible to create additional +Localization object manually using the `Localization` class: + +.. code-block:: javascript + + const myL10n = new Localization([ + "branding/brand.ftl", + "browser/preferences/preferences.ftl" + ]); + + + let [isDefaultMsg, isNotDefaultMsg] = + await myL10n.formatValues({id: "is-default"}, {id: "is-not-default"}); + + +.. admonition:: Example + + An example of a use case is the Preferences UI in Firefox, which uses the + main context to localize the UI but also to build a search index. + + It is common to build such search index both in a current language and additionally + in English, since a lot of documentation and online help exist only in English. + + A developer may create manually a new context with the same resources as the main one, + but hardcode it to `en-US` and then build the search index using both contexts. + + +By default, all `Localization` contexts are asynchronous. It is possible to create a synchronous +one by passing an `sync = false` argument to the constructor, or calling the `SetIsSync(bool)` method +on the class. + + +.. code-block:: javascript + + const myL10n = new Localization([ + "branding/brand.ftl", + "browser/preferences/preferences.ftl" + ], false); + + + let [isDefaultMsg, isNotDefaultMsg] = + myL10n.formatValuesSync({id: "is-default"}, {id: "is-not-default"}); + + +Synchronous contexts should be always avoided as they require synchronous I/O. If you think your use case +requires a synchronous localization context, please consult Gecko, Performance and L10n Drivers teams. + + +Designing Localizable APIs +========================== + +When designing localizable APIs, the most important rule is to resolve localization as +late as possible. That means that instead of resolving strings somewhere deep in the +codebase and then passing them on, or even caching, it is highly recommended to pass +around :code:`l10n-id` or :code:`[l10n-id, l10n-args]` pairs until the top-most code +resolves them or applies them onto the DOM element. + + +Testing +======= + +When writing tests that involve both I18n and L10n, the general rule is that +result strings are opaque. That means that the developer should not assume any particular +value and should never test against it. + +In case of raw i18n the :js:`resolvedOptions` method on all :js:`Intl.*` formatters +makes it relatively easy. In case of localization, the recommended way is to test that +the code sets the right :code:`l10n-id`/:code:`l10n-args` attributes like this: + +.. code-block:: javascript + + testedFunction(); + + const l10nAttrs = document.l10n.getAttributes(element); + + deepEquals(l10nAttrs, { + id: "my-expected-id", + args: { + unreadCount: 5 + } + }); + +If the code really has to test for particular values in the localized UI, it is +always better to scan for a variable: + +.. code-block:: javascript + + testedFunction(); + + equals(element.textContent.contains("John")); + +.. important:: + + Testing against whole values is brittle and will break when we insert Unicode + bidirectionality marks into the result string or adapt the output in other ways. + + +Manually Testing UI with Pseudolocalization +=========================================== + +When working with a Fluent-backed UI, the developer gets a new tool to test their UI +against several classes of problems. + +Pseudolocalization is a mechanism which transforms messages on the fly, using +specific logic to help emulate how the UI will look once it gets localized. + +The three classes of potential problems that this can help with are: + + - Hardcoded strings. + + Turning on pseudolocalization should expose any strings that were left + hardcoded in the source, since they won't get transformed. + + + - UI space not adapting to longer text. + + Many languages use longer strings than English. For example, German strings + may be 30% longer (or more). Turning on pseudolocalization is a quick way to + test how the layout handles such locales. Strings that don't fit the space + available are truncated and pseudolocalization can also help with detecting them. + + + - Bidi adaptation. + + For many developers, testing the UI in right-to-left mode is hard. + Pseudolocalization shows how a right-to-left locale will look like. + +To turn on pseudolocalization, open the :doc:`Browser Toolbox <../../devtools-user/browser_toolbox/index>`, +click the three dot menu in the top right corner, and choose one of the following: + + - **Enable “accented” locale** - [Ȧȧƈƈḗḗƞŧḗḗḓ Ḗḗƞɠŀīīşħ] + + This strategy replaces all Latin characters with their accented equivalents, + and duplicates some vowels to create roughly 30% longer strings. Strings are + wrapped in markers (square brackets), which help with detecting truncation. + + This option sets the :js:`intl.l10n.pseudo` pref to :js:`accented`. + + + - **Enable bidi locale** - ɥsıʅƃuƎ ıpıԐ + + This strategy replaces all Latin characters with their 180 degree rotated versions + and enforces right to left text flow using Unicode UAX#9 `Explicit Directional Embeddings`__. + In this mode, the UI directionality will also be set to right-to-left. + + This option sets the :js:`intl.l10n.pseudo` pref to :js:`bidi`. + +__ https://www.unicode.org/reports/tr9/#Explicit_Directional_Embeddings + +Inner Structure of Fluent +========================= + +The inner structure of Fluent in Gecko is out of scope of this tutorial, but +since the class and file names may show up during debugging or profiling, +below is a list of major components, each with a corresponding file in `/intl/l10n` +modules in Gecko. + +For more hands-on experience with some of the concepts below, try +following the `Fluent DOMLocalization Tutorial`__, which provides some +background on how Fluent works and walks you through creating a basic +web project from scratch that uses Fluent for localization. + +__ https://projectfluent.org/dom-l10n-documentation/overview.html + +FluentBundle +-------------- + +FluentBundle is the lowest level API. It's fully synchronous, contains a parser for the +FTL file format and a resolver for the logic. It is not meant to be used by +consumers directly. + +In the future we intend to offer this layer for standardization and it may become +part of the :js:`mozIntl.*` or even :js:`Intl.*` API sets. + +That part of the codebase is also the first that we'll be looking to port to Rust. + + +Localization +------------ + +Localization is a higher level API which uses :js:`FluentBundle` internally but +provides a full layer of compound message formatting and robust error fall-backing. + +It is intended for use in runtime code and contains all fundamental localization +methods. + + +DOMLocalization +--------------- + +DOMLocalization extends :js:`Localization` with functionality to operate on HTML, XUL +and the DOM directly including DOM Overlays and Mutation Observers. + +DocumentL10n +------------ + +DocumentL10n implements the DocumentL10n WebIDL API and allows Document to +communicate with DOMLocalization. + +Events +^^^^^^ + +DOM translation is asynchronous (e.g., setting a `data-l10n-id` attribute won't +immediately reflect the localized content in the DOM). + +We expose a :js:`Document.hasPendingL10nMutations` member that reflects whether +there are any async operations pending. When they are finished, the +`L10nMutationsFinished` event is fired on the document, so that chrome code can +be certain all the async operations are done. + +L10nRegistry +------------ + +L10nRegistry is our resource management service. It +maintains the state of resources packaged into the build and language packs, +providing an asynchronous iterator of :js:`FluentBundle` objects for a given locale set +and resources that the :js:`Localization` class uses. + + +.. _Fluent: https://projectfluent.org/ +.. _DTD: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Tutorial/Localization +.. _StringBundle: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Tutorial/Property_Files +.. _Firefox Preferences: https://bugzilla.mozilla.org/show_bug.cgi?id=1415730 +.. _Unprivileged Contexts: https://bugzilla.mozilla.org/show_bug.cgi?id=1407418 +.. _System Add-ons: https://bugzilla.mozilla.org/show_bug.cgi?id=1425104 +.. _CLDR: http://cldr.unicode.org/ +.. _ICU: http://site.icu-project.org/ +.. _Unicode: https://www.unicode.org/ +.. _Fluent Syntax Guide: https://projectfluent.org/fluent/guide/ +.. _Pontoon: https://pontoon.mozilla.org/ +.. _Plural Rules: http://cldr.unicode.org/index/cldr-spec/plural-rules diff --git a/intl/l10n/docs/glossary.rst b/intl/l10n/docs/glossary.rst new file mode 100644 index 0000000000..f8df34e819 --- /dev/null +++ b/intl/l10n/docs/glossary.rst @@ -0,0 +1,22 @@ +L10N Glossary +============= + +.. glossary:: + :sorted: + + Localization + The process of creating content in a native language, including + translation, but also customizations like Search. + + Localizability + Enabling a piece of software to be localized. This is mostly + externalizing English strings, and writing build support to + pick up localized search engines etc. + + L10n + *Numeronym* for Localization, *L*, 10 chars, *n* + + l10n-merge + nick-name for the process of merging ``en-US`` and a particular + localization into one joint artifact without any missing strings, and + without technical errors, as far as possible. diff --git a/intl/l10n/docs/index.rst b/intl/l10n/docs/index.rst new file mode 100644 index 0000000000..cbd9c3e796 --- /dev/null +++ b/intl/l10n/docs/index.rst @@ -0,0 +1,26 @@ +============ +Localization +============ + +Localization – sometimes written as l10n, where 10 is the number of letters between `l` and `n` – +is an aspect of internationalization focused on adapting software to +different cultural and regional needs. + +The boundary between internationalization and localization is fuzzy. At Mozilla +we refer to localization when we talk about adapting the user interface +and messages, while internationalization handles operations on raw data. + +.. note:: + + Localization is a broader term than translation because it involves extensive research + into the target culture, and in result touches not only text and UI translation but also + cultural adaptation of icons, communication styles, colors, and UX. + +.. toctree:: + :maxdepth: 2 + + overview + fluent/index + migrations/index + crosschannel/index + glossary diff --git a/intl/l10n/docs/migrations/fluent.rst b/intl/l10n/docs/migrations/fluent.rst new file mode 100644 index 0000000000..bc14293ed7 --- /dev/null +++ b/intl/l10n/docs/migrations/fluent.rst @@ -0,0 +1,153 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + + +=========================== +Fluent to Fluent Migrations +=========================== + +When migrating existing Fluent messages, +it's possible to copy a source directly with :python:`COPY_PATTERN`, +or to apply string replacements and other changes +by extending the :python:`TransformPattern` visitor class. + +These transforms work with individual Fluent patterns, +i.e. the body of a Fluent message or one of its attributes. + +Copying Fluent Patterns +----------------------- + +Consider for example a patch modifying an existing message to move the original +value to a :js:`alt` attribute. + +Original message: + + +.. code-block:: fluent + + about-logins-icon = Warning icon + .title = Breached website + + +New message: + + +.. code-block:: fluent + + about-logins-breach-icon = + .alt = Warning icon + .title = Breached website + + +This type of changes requires a new message identifier, which in turn causes +existing translations to be lost. It’s possible to migrate the existing +translated content with: + + +.. code-block:: python + + from fluent.migrate import COPY_PATTERN + + ctx.add_transforms( + "browser/browser/aboutLogins.ftl", + "browser/browser/aboutLogins.ftl", + transforms_from( + """ + about-logins-breach-icon = + .alt = {COPY_PATTERN(from_path, "about-logins-icon")} + .title = {COPY_PATTERN(from_path, "about-logins-icon.title")} + """,from_path="browser/browser/aboutLogins.ftl"), + ) + + +In this specific case, the destination and source files are the same. The dot +notation is used to access attributes: :js:`about-logins-icon.title` matches +the :js:`title` attribute of the message with identifier +:js:`about-logins-icon`, while :js:`about-logins-icon` alone matches the value +of the message. + + +.. warning:: + + The second argument of :python:`COPY_PATTERN` and :python:`TransformPattern` + identifies a pattern, so using the message identifier will not + migrate the message as a whole, with all its attributes, only its value. + +Transforming Fluent Patterns +---------------------------- + +To apply changes to Fluent messages, you may extend the +:python:`TransformPattern` class to create your transformation. +This is a powerful general-purpose tool, of which :python:`COPY_PATTERN` is the +simplest extension that applies no transformation to the source. + +Consider for example a patch copying an existing message to strip out its HTML +content to use as an ARIA value. + +Original message: + + +.. code-block:: fluent + + videocontrols-label = + { $position }<span data-l10n-name="duration"> / { $duration }</span> + + +New message: + + +.. code-block:: fluent + + videocontrols-scrubber = + .aria-valuetext = { $position } / { $duration } + + +A migration may be applied to create this new message with: + + +.. code-block:: python + + from fluent.migrate.transforms import TransformPattern + import fluent.syntax.ast as FTL + + class STRIP_SPAN(TransformPattern): + def visit_TextElement(self, node): + node.value = re.sub("</?span[^>]*>", "", node.value) + return node + + def migrate(ctx): + path = "toolkit/toolkit/global/videocontrols.ftl" + ctx.add_transforms( + path, + path, + [ + FTL.Message( + id=FTL.Identifier("videocontrols-scrubber"), + attributes=[ + FTL.Attribute( + id=FTL.Identifier("aria-valuetext"), + value=STRIP_SPAN(path, "videocontrols-label"), + ), + ], + ), + ], + ) + + +Note that a custom extension such as :python:`STRIP_SPAN` is not supported by +the :python:`transforms_from` utility, so the list of transforms needs to be +defined explicitly. + +Internally, :python:`TransformPattern` extends the `fluent.syntax`__ +:python:`Transformer`, which defines the :python:`FTL` AST used here. +As a specific convenience, pattern element visitors such as +:python:`visit_TextElement` are allowed to return a :python:`FTL.Pattern` +to replace themselves with more than one node. + +__ https://projectfluent.org/python-fluent/fluent.syntax/stable/ diff --git a/intl/l10n/docs/migrations/index.rst b/intl/l10n/docs/migrations/index.rst new file mode 100644 index 0000000000..e9ed12aa22 --- /dev/null +++ b/intl/l10n/docs/migrations/index.rst @@ -0,0 +1,53 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + +============================================= +Migrating Strings From Legacy or Fluent Files +============================================= + +Firefox is a project localized in over 100 languages. As the code for existing +features moves away from the old localization systems and starts using +`Fluent`_, we need to ensure that we don’t lose existing translations, which +would have the adverse effect of forcing contributors to localize hundreds of +strings from scratch. + +`Fluent Migration`_ is a Python library designed to solve this specific problem: +it allows to migrate translations from `.properties` and other legacy file formats, +not only moving strings and transforming them as needed to adapt to the `FTL` +syntax, but also replicating "blame" for each string in VCS. + +The library also includes basic support for migrating existing Fluent messages +without interpolations (e.g. variable replacements). The typical use cases +would be messages moving as-is to a different file, or changes to the +morphology of existing messages (e.g move content from an attribute to the +value of the message). + +.. toctree:: + :maxdepth: 2 + + overview + legacy + fluent + testing + localizations + +How to Get Help +=============== + +Writing migration recipes can be challenging for non trivial cases, and it can +require extensive l10n knowledge to avoid localizability issues. + +Don’t hesitate to reach out to the l10n-drivers for feedback, help to test or +write the migration recipes: + + - Francesco Lodolo (:flod) + - Eemeli Aro (:eemeli) + +.. _Fluent: http://projectfluent.org/ +.. _Fluent Migration: https://hg.mozilla.org/l10n/fluent-migration/ diff --git a/intl/l10n/docs/migrations/legacy.rst b/intl/l10n/docs/migrations/legacy.rst new file mode 100644 index 0000000000..b1287ba4d4 --- /dev/null +++ b/intl/l10n/docs/migrations/legacy.rst @@ -0,0 +1,642 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + +======================== +Migrating Legacy Formats +======================== + +Migrating from legacy formats (.dtd, .properties) is different from migrating +Fluent to Fluent. When migrating legacy code paths, you'll need to adjust the +Fluent strings for the quirks Mozilla uses in the legacy code paths. You'll +find a number of specialized functionalities here. + +Legacy Migration Tools +---------------------- + +To assist with legacy format migrations, some scripting tools are provided: + + - `XUL+DTD to Fluent`_ + - `.properties to Fluent`_ + +When creating a migration, one or both of these tools may provide a good +starting point for manual work by automating at least a part of the migration, +including recipe generation and refactoring the calling code. + +.. _XUL+DTD to Fluent: https://github.com/zbraniecki/convert_xul_to_fluent +.. _.properties to Fluent: https://github.com/mozilla/properties-to-ftl + +Basic Migration +--------------- + +Let’s consider a basic example: one string needs to be migrated, without +any further change, from a DTD file to Fluent. + +The legacy string is stored in :bash:`toolkit/locales/en-US/chrome/global/findbar.dtd`: + + +.. code-block:: dtd + + <!ENTITY next.tooltip "Find the next occurrence of the phrase"> + + +The new Fluent string is stored in :bash:`toolkit/locales/en-US/toolkit/main-window/findbar.ftl`: + + +.. code-block:: properties + + findbar-next = + .tooltiptext = Find the next occurrence of the phrase + + +This is how the migration recipe looks: + + +.. code-block:: python + + # Any copyright is dedicated to the Public Domain. + # http://creativecommons.org/publicdomain/zero/1.0/ + + from __future__ import absolute_import + import fluent.syntax.ast as FTL + from fluent.migrate.helpers import transforms_from + + def migrate(ctx): + """Bug 1411707 - Migrate the findbar XBL binding to a Custom Element, part {index}.""" + + ctx.add_transforms( + "toolkit/toolkit/main-window/findbar.ftl", + "toolkit/toolkit/main-window/findbar.ftl", + transforms_from( + """ + findbar-next = + .tooltiptext = { COPY(from_path, "next.tooltip") } + """, from_path="toolkit/chrome/global/findbar.dtd")) + + +The first important thing to notice is that the migration recipe needs file +paths relative to a localization repository, losing :bash:`locales/en-US/`: + + - :bash:`toolkit/locales/en-US/chrome/global/findbar.dtd` becomes + :bash:`toolkit/chrome/global/findbar.dtd`. + - :bash:`toolkit/locales/en-US/toolkit/main-window/findbar.ftl` becomes + :bash:`toolkit/toolkit/main-window/findbar.ftl`. + +The :python:`context.add_transforms` function takes 3 arguments: + + - Path to the target l10n file. + - Path to the reference (en-US) file. + - An array of Transforms. Transforms are AST nodes which describe how legacy + translations should be migrated. + +.. note:: + + For migrations of Firefox localizations, the target and reference path + are the same. This isn't true for all projects that use Fluent, so both + arguments are required. + +In this case there is only one Transform that migrates the string with ID +:js:`next.tooltip` from :bash:`toolkit/chrome/global/findbar.dtd`, and injects +it in the FTL fragment. The :python:`COPY` Transform allows to copy the string +from an existing file as is, while :python:`from_path` is used to avoid +repeating the same path multiple times, making the recipe more readable. Without +:python:`from_path`, this could be written as: + + +.. code-block:: python + + ctx.add_transforms( + "toolkit/toolkit/main-window/findbar.ftl", + "toolkit/toolkit/main-window/findbar.ftl", + transforms_from( + """ + findbar-next = + .tooltiptext = { COPY("toolkit/chrome/global/findbar.dtd", "next.tooltip") } + """)) + + +This method of writing migration recipes allows to take the original FTL +strings, and simply replace the value of each message with a :python:`COPY` +Transform. :python:`transforms_from` takes care of converting the FTL syntax +into an array of Transforms describing how the legacy translations should be +migrated. This manner of defining migrations is only suitable to simple strings +where a copy operation is sufficient. For more complex use-cases which require +some additional logic in Python, it’s necessary to resort to the raw AST. + + +The example above is equivalent to the following syntax, which exposes +the underlying AST structure: + + +.. code-block:: python + + ctx.add_transforms( + "toolkit/toolkit/main-window/findbar.ftl", + "toolkit/toolkit/main-window/findbar.ftl", + [ + FTL.Message( + id=FTL.Identifier("findbar-next"), + attributes=[ + FTL.Attribute( + id=FTL.Identifier("tooltiptext"), + value=COPY( + "toolkit/chrome/global/findbar.dtd", + "next.tooltip" + ) + ) + ] + ) + ] + ) + +This creates a :python:`Message`, taking the value from the legacy string +:js:`findbar-next`. A message can have an array of attributes, each with an ID +and a value: in this case there is only one attribute, with ID :js:`tooltiptext` +and :js:`value` copied from the legacy string. + +Notice how both the ID of the message and the ID of the attribute are +defined as an :python:`FTL.Identifier`, not simply as a string. + + +.. tip:: + + It’s possible to concatenate arrays of Transforms defined manually, like in + the last example, with those coming from :python:`transforms_from`, by using + the :python:`+` operator. Alternatively, it’s possible to use multiple + :python:`add_transforms`. + + The order of Transforms provided in the recipe is not relevant, the reference + file is used for ordering messages. + + +Replacing Content in Legacy Strings +----------------------------------- + +While :python:`COPY` allows to copy a legacy string as is, :python:`REPLACE` +(from `fluent.migrate`) allows to replace content while performing the +migration. This is necessary, for example, when migrating strings that include +placeholders or entities that need to be replaced to adapt to Fluent syntax. + +Consider for example the following string: + + +.. code-block:: DTD + + <!ENTITY aboutSupport.featuresTitle "&brandShortName; Features"> + + +Which needs to be migrated to: + + +.. code-block:: fluent + + features-title = { -brand-short-name } Features + + +The entity :js:`&brandShortName;` needs to be replaced with a term reference: + + +.. code-block:: python + + FTL.Message( + id=FTL.Identifier("features-title"), + value=REPLACE( + "toolkit/chrome/global/aboutSupport.dtd", + "aboutSupport.featuresTitle", + { + "&brandShortName;": TERM_REFERENCE("brand-short-name"), + }, + ) + ), + + +This creates an :python:`FTL.Message`, taking the value from the legacy string +:js:`aboutSupport.featuresTitle`, but replacing the specified text with a +Fluent term reference. + +.. note:: + :python:`REPLACE` replaces all occurrences of the specified text. + + +It’s also possible to replace content with a specific text: in that case, it +needs to be defined as a :python:`TextElement`. For example, to replace +:js:`example.com` with HTML markup: + + +.. code-block:: python + + value=REPLACE( + "browser/chrome/browser/preferences/preferences.properties", + "searchResults.sorryMessageWin", + { + "example.com": FTL.TextElement('<span data-l10n-name="example"></span>') + } + ) + + +The situation is more complex when a migration recipe needs to replace +:js:`printf` arguments like :js:`%S`. In fact, the format used for localized +and source strings doesn’t need to match, and the two following strings using +unordered and ordered argument are perfectly equivalent: + + +.. code-block:: properties + + btn-quit = Quit %S + btn-quit = Quit %1$S + + +In this scenario, replacing :js:`%S` would work on the first version, but not +on the second, and there’s no guarantee that the localized string uses the +same format as the source string. + +Consider also the following string that uses :js:`%S` for two different +variables, implicitly relying on the order in which the arguments appear: + + +.. code-block:: properties + + updateFullName = %S (%S) + + +And the target Fluent string: + + +.. code-block:: fluent + + update-full-name = { $name } ({ $buildID }) + + +As indicated, :python:`REPLACE` would replace all occurrences of :js:`%S`, so +only one variable could be set. The string needs to be normalized and treated +like: + + +.. code-block:: properties + + updateFullName = %1$S (%2$S) + + +This can be obtained by calling :python:`REPLACE` with +:python:`normalize_printf=True`: + + +.. code-block:: python + + FTL.Message( + id=FTL.Identifier("update-full-name"), + value=REPLACE( + "toolkit/chrome/mozapps/update/updates.properties", + "updateFullName", + { + "%1$S": VARIABLE_REFERENCE("name"), + "%2$S": VARIABLE_REFERENCE("buildID"), + }, + normalize_printf=True + ) + ) + + +.. attention:: + + To avoid any issues :python:`normalize_printf=True` should always be used when + replacing :js:`printf` arguments. This is the default behaviour when working + with .properties files. + +.. note:: + + :python:`VARIABLE_REFERENCE`, :python:`MESSAGE_REFERENCE`, and + :python:`TERM_REFERENCE` are helper Transforms which can be used to save + keystrokes in common cases where using the raw AST is too verbose. + + :python:`VARIABLE_REFERENCE` is used to create a reference to a variable, e.g. + :js:`{ $variable }`. + + :python:`MESSAGE_REFERENCE` is used to create a reference to another message, + e.g. :js:`{ another-string }`. + + :python:`TERM_REFERENCE` is used to create a reference to a `term`__, + e.g. :js:`{ -brand-short-name }`. + + Both Transforms need to be imported at the beginning of the recipe, e.g. + :python:`from fluent.migrate.helpers import VARIABLE_REFERENCE` + + __ https://projectfluent.org/fluent/guide/terms.html + + +Trimming Unnecessary Whitespaces in Translations +------------------------------------------------ + +.. note:: + + This section was updated in May 2020 to reflect the change to the default + behavior: legacy translations are now trimmed, unless the :python:`trim` + parameter is set explicitly. + +It’s not uncommon to have strings with unnecessary leading or trailing spaces +in legacy translations. These are not meaningful, don’t have practical results +on the way the string is displayed in products, and are added mostly for +formatting reasons. For example, consider this DTD string: + + +.. code-block:: DTD + + <!ENTITY aboutAbout.note "This is a list of “about” pages for your convenience.<br/> + Some of them might be confusing. Some are for diagnostic purposes only.<br/> + And some are omitted because they require query strings."> + + +By default, the :python:`COPY`, :python:`REPLACE`, and :python:`PLURALS` +transforms will strip the leading and trailing whitespace from each line of the +translation, as well as the empty leading and trailing lines. The above string +will be migrated as the following Fluent message, despite copious indentation +on the second and the third line in the original: + + +.. code-block:: fluent + + about-about-note = + This is a list of “about” pages for your convenience.<br/> + Some of them might be confusing. Some are for diagnostic purposes only.<br/> + And some are omitted because they require query strings. + + +To disable the default trimming behavior, set :python:`trim:"False"` or +:python:`trim=False`, depending on the context: + + +.. code-block:: python + + transforms_from( + """ + about-about-note = { COPY("toolkit/chrome/global/aboutAbout.dtd", "aboutAbout.note", trim:"False") } + """) + + FTL.Message( + id=FTL.Identifier("discover-description"), + value=REPLACE( + "toolkit/chrome/mozapps/extensions/extensions.dtd", + "discover.description2", + { + "&brandShortName;": TERM_REFERENCE("-brand-short-name") + }, + trim=False + ) + ), + + +Concatenating Strings +--------------------- + +It's best practice to only expose complete phrases to localization, and to avoid +stitching localized strings together in code. With `DTD` and `properties`, +there were few options. So when migrating to Fluent, you'll find +it quite common to concatenate multiple strings coming from `DTD` and +`properties`, for example to create sentences with HTML markup. It’s possible to +concatenate strings and text elements in a migration recipe using the +:python:`CONCAT` Transform. + +Note that in case of simple migrations using :python:`transforms_from`, the +concatenation is carried out implicitly by using the Fluent syntax interleaved +with :python:`COPY()` transform calls to define the migration recipe. + +Consider the following example: + + +.. code-block:: properties + + # %S is replaced by a link, using searchResults.needHelpSupportLink as text + searchResults.needHelp = Need help? Visit %S + + # %S is replaced by "Firefox" + searchResults.needHelpSupportLink = %S Support + + +In Fluent: + + +.. code-block:: fluent + + search-results-need-help-support-link = Need help? Visit <a data-l10n-name="url">{ -brand-short-name } Support</a> + + +This is quite a complex migration: it requires to take 2 legacy strings, and +concatenate their values with HTML markup. Here’s how the Transform is defined: + + +.. code-block:: python + + FTL.Message( + id=FTL.Identifier("search-results-help-link"), + value=REPLACE( + "browser/chrome/browser/preferences/preferences.properties", + "searchResults.needHelp", + { + "%S": CONCAT( + FTL.TextElement('<a data-l10n-name="url">'), + REPLACE( + "browser/chrome/browser/preferences/preferences.properties", + "searchResults.needHelpSupportLink", + { + "%1$S": TERM_REFERENCE("brand-short-name"), + }, + normalize_printf=True + ), + FTL.TextElement("</a>") + ) + } + ) + ), + + +:js:`%S` in :js:`searchResults.needHelpSupportLink` is replaced by a reference +to the term :js:`-brand-short-name`, migrating from :js:`%S Support` to :js:`{ +-brand-short-name } Support`. The result of this operation is then inserted +between two text elements to create the anchor markup. The resulting text is +finally used to replace :js:`%S` in :js:`searchResults.needHelp`, and used as +value for the FTL message. + + +.. important:: + + When concatenating existing strings, avoid introducing changes to the original + text, for example adding spaces or punctuation. Each language has its own + rules, and this might result in poor migrated strings. In case of doubt, + always ask for feedback. + + +When more than 1 element is passed in to concatenate, :python:`CONCAT` +disables whitespace trimming described in the section above on all legacy +Transforms passed into it: :python:`COPY`, :python:`REPLACE`, and +:python:`PLURALS`, unless the :python:`trim` parameters has been set +explicitly on them. This helps ensure that spaces around segments are not +lost during the concatenation. + +When only a single element is passed into :python:`CONCAT`, however, the +trimming behavior is not altered, and follows the rules described in the +previous section. This is meant to make :python:`CONCAT(COPY())` equivalent +to a bare :python:`COPY()`. + + +Plural Strings +-------------- + +Migrating plural strings from `.properties` files usually involves two +Transforms from :python:`fluent.migrate.transforms`: the +:python:`REPLACE_IN_TEXT` Transform takes TextElements as input, making it +possible to pass it as the foreach function of the :python:`PLURALS` Transform. + +Consider the following legacy string: + + +.. code-block:: properties + + # LOCALIZATION NOTE (disableContainersOkButton): Semi-colon list of plural forms. + # See: http://developer.mozilla.org/en/docs/Localization_and_Plurals + # #1 is the number of container tabs + disableContainersOkButton = Close #1 Container Tab;Close #1 Container Tabs + + +In Fluent: + + +.. code-block:: fluent + + containers-disable-alert-ok-button = + { $tabCount -> + [one] Close { $tabCount } Container Tab + *[other] Close { $tabCount } Container Tabs + } + + +This is how the Transform for this string is defined: + + +.. code-block:: python + + FTL.Message( + id=FTL.Identifier("containers-disable-alert-ok-button"), + value=PLURALS( + "browser/chrome/browser/preferences/preferences.properties", + "disableContainersOkButton", + VARIABLE_REFERENCE("tabCount"), + lambda text: REPLACE_IN_TEXT( + text, + { + "#1": VARIABLE_REFERENCE("tabCount") + } + ) + ) + ) + + +The `PLURALS` Transform will take care of creating the correct number of plural +categories for each language. Notice how `#1` is replaced for each of these +variants with :js:`{ $tabCount }`, using :python:`REPLACE_IN_TEXT` and +:python:`VARIABLE_REFERENCE("tabCount")`. + +In this case it’s not possible to use :python:`REPLACE` because it takes a file +path and a message ID as arguments, whereas here the recipe needs to operate on +regular text. The replacement is performed on each plural form of the original +string, where plural forms are separated by a semicolon. + +Explicit Variants +----------------- + +Explicitly creating variants of a string is useful for platform-dependent +terminology, but also in cases where you want a one-vs-many split of a string. +It’s always possible to migrate strings by manually creating the underlying AST +structure. Consider the following complex Fluent string: + + +.. code-block:: fluent + + use-current-pages = + .label = + { $tabCount -> + [1] Use Current Page + *[other] Use Current Pages + } + .accesskey = C + + +The migration for this string is quite complex: the :js:`label` attribute is +created from 2 different legacy strings, and it’s not a proper plural form. +Notice how the first string is associated to the :js:`1` case, not the :js:`one` +category used in plural forms. For these reasons, it’s not possible to use +:python:`PLURALS`, the Transform needs to be crafted recreating the AST. + + +.. code-block:: python + + + FTL.Message( + id=FTL.Identifier("use-current-pages"), + attributes=[ + FTL.Attribute( + id=FTL.Identifier("label"), + value=FTL.Pattern( + elements=[ + FTL.Placeable( + expression=FTL.SelectExpression( + selector=VARIABLE_REFERENCE("tabCount"), + variants=[ + FTL.Variant( + key=FTL.NumberLiteral("1"), + default=False, + value=COPY( + "browser/chrome/browser/preferences/main.dtd", + "useCurrentPage.label", + ) + ), + FTL.Variant( + key=FTL.Identifier("other"), + default=True, + value=COPY( + "browser/chrome/browser/preferences/main.dtd", + "useMultiple.label", + ) + ) + ] + ) + ) + ] + ) + ), + FTL.Attribute( + id=FTL.Identifier("accesskey"), + value=COPY( + "browser/chrome/browser/preferences/main.dtd", + "useCurrentPage.accesskey", + ) + ), + ], + ), + + +This Transform uses several concepts already described in this document. Notable +is the :python:`SelectExpression` inside a :python:`Placeable`, with an array +of :python:`Variant` objects. Exactly one of those variants needs to have +``default=True``. + +This example can still use :py:func:`transforms_from()``, since existing strings +are copied without interpolation. + +.. code-block:: python + + transforms_from( + """ + use-current-pages = + .label = + { $tabCount -> + [1] { COPY(main_dtd, "useCurrentPage.label") } + *[other] { COPY(main_dtd, "useMultiple.label") } + } + .accesskey = { COPY(main_dtd, "useCurrentPage.accesskey") } + """, main_dtd="browser/chrome/browser/preferences/main.dtd" + ) diff --git a/intl/l10n/docs/migrations/localizations.rst b/intl/l10n/docs/migrations/localizations.rst new file mode 100644 index 0000000000..0861d6e52b --- /dev/null +++ b/intl/l10n/docs/migrations/localizations.rst @@ -0,0 +1,42 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + +=========================================== +How Migrations Are Run on l10n Repositories +=========================================== + +Once a patch including new FTL strings and a migration recipe lands in +mozilla-central, l10n-drivers will perform a series of actions to migrate +strings in all 100+ localization repositories: + + - New Fluent strings land in `mozilla-central`, together with a migration + recipe. + - New strings are added to `gecko-strings-quarantine`_, a unified repository + including strings for all shipping versions of Firefox, and used as a buffer + before exposing strings to localizers. + - Migration recipes are run against all l10n repositories, migrating strings + from old to new files, and storing them in VCS. + - New en-US strings are pushed to the official `gecko-strings`_ repository + used by localization tools, and exposed to all localizers. + +Migration recipes could be run again within a release cycle, in order to migrate +translations for legacy strings added after the first run. They’re usually +removed from `mozilla-central` within 2 cycles, e.g. a migration recipe created +for Firefox 59 would be removed when Firefox 61 is available in Nightly. + + +.. tip:: + + A script to run migrations on all l10n repositories is available in `this + repository`__, automating part of the steps described for manual testing, and + it could be adapted to local testing. + + __ https://github.com/flodolo/fluent-migrations +.. _gecko-strings-quarantine: https://hg.mozilla.org/l10n/gecko-strings-quarantine/ +.. _gecko-strings: https://hg.mozilla.org/l10n/gecko-strings diff --git a/intl/l10n/docs/migrations/overview.rst b/intl/l10n/docs/migrations/overview.rst new file mode 100644 index 0000000000..dc9c128fb9 --- /dev/null +++ b/intl/l10n/docs/migrations/overview.rst @@ -0,0 +1,136 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + +===================================== +Migration Recipes and Their Lifecycle +===================================== + +The actual migrations are performed running Python modules called **migration +recipes**, which contain directives on how to migrate strings, which files are +involved, transformations to apply, etc. These recipes are stored in +`mozilla-central`__. + +__ https://hg.mozilla.org/mozilla-central/file/default/python/l10n/fluent_migrations + +When part of Firefox’s UI is migrated to Fluent, a migration recipe should be +attached to the same patch that adds new strings to `.ftl` files. + +Migration recipes can quickly become obsolete, because the referenced strings +and files are removed from repositories as part of ongoing development. +For these reasons, l10n-drivers periodically clean up the `fluent_migrations` +folder in mozilla-central, keeping only recipes for 2 +shipping versions (Nightly and Beta). + + +.. hint:: + + As a developer you don’t need to bother about updating migration recipes + already in `mozilla-central`: if a new patch removes a string or file that is + used in a migration recipe, simply ignore it, since the entire recipe will be + removed within a couple of cycles. + + +How to Write Migration Recipes +============================== + +The migration recipe’s filename should start with a reference to the associated +bug number, and include a brief description of the bug, e.g. +:bash:`bug_1451992_preferences_applicationManager.py` is the migration recipe +used to migrate the Application Manager window in preferences. It’s also +possible to look at existing recipes in `mozilla-central`__ for inspiration. + +__ https://hg.mozilla.org/mozilla-central/file/default/python/l10n/fluent_migrations + + +General Recipe Structure +======================== + +A migration recipe is a Python module, implementing the :py:func:`migrate` +function, which takes a :py:class:`MigrationContext` as input. The API provided +by the context is + +.. code-block:: python + + class MigrationContext: + def add_transforms(self, target, reference, transforms): + """Define transforms for target using reference as template. + + `target` is a path of the destination FTL file relative to the + localization directory. `reference` is a path to the template FTL + file relative to the reference directory. + + Each transform is an extended FTL node with `Transform` nodes as some + values. + + For transforms that merely copy legacy messages or Fluent patterns, + using `fluent.migrate.helpers.transforms_from` is recommended. + """ + +The skeleton of a migration recipe just implements the :py:func:`migrate` +function calling into :py:func:`ctx.add_transforms`, and looks like + +.. code-block:: python + + # coding=utf8 + + # Any copyright is dedicated to the Public Domain. + # http://creativecommons.org/publicdomain/zero/1.0/ + + from __future__ import absolute_import + + + def migrate(ctx): + """Bug 1552333 - Migrate feature to Fluent, part {index}""" + target = 'browser/browser/feature.ftl' + reference = 'browser/browser/feature.ftl' + ctx.add_transforms( + target, + reference, + [], # Actual transforms go here. + ) + +One can call into :py:func:`ctx.add_transforms` multiple times. In particular, one +can create migrated content in multiple files as part of a single migration +recipe by calling :py:func:`ctx.add_transforms` with different target-reference +pairs. + +The *docstring* for this function will be used +as a commit message in VCS, that’s why it’s important to make sure the bug +reference is correct, and to keep the `part {index}` section: multiple strings +could have multiple authors, and would be migrated in distinct commits (part 1, +part 2, etc.). + +Transforms +========== + +The work of the migrations is done by the transforms that are passed as +last argument to :py:func:`ctx.add_transforms`. They're instances of either Fluent +:py:class:`fluent.syntax.ast.Message` or :py:class:`Term`, and their content +can depend on existing translation sources. The skeleton of a Message looks like + +.. code-block:: python + + FTL.Message( + id=FTL.Identifier( + name="msg", + ), + value=FTL.Pattern( + elements=[ + FTL.TextElement( + value="A string", + ), + ], + ), + ) + +When migrating existing legacy translations, you'll replace an +``FTL.TextElement`` with a ``COPY(legacy_path, "old_id")``, or one of its +variations we detail :doc:`next <legacy>`. When migrating existing Fluent +translations, an ``FTL.Pattern`` is replaced with a +``COPY_PATTERN(old_path, "old-id")``. diff --git a/intl/l10n/docs/migrations/testing.rst b/intl/l10n/docs/migrations/testing.rst new file mode 100644 index 0000000000..aa9b9747f6 --- /dev/null +++ b/intl/l10n/docs/migrations/testing.rst @@ -0,0 +1,58 @@ +.. role:: bash(code) + :language: bash + +.. role:: js(code) + :language: javascript + +.. role:: python(code) + :language: python + +============================= +How to Test Migration Recipes +============================= + +To test migration recipes, use the following mach command: + +.. code-block:: bash + + ./mach fluent-migration-test python/l10n/fluent_migrations/bug_1485002_newtab.py + +This will analyze your migration recipe to check that the :python:`migrate` +function exists, and interacts correctly with the migration context. Once that +passes, it clones :bash:`gecko-strings` into :bash:`$OBJDIR/python/l10n`, creates a +reference localization by adding your local Fluent strings to the ones in +:bash:`gecko-strings`. It then runs the migration recipe, both as dry run and +as actual migration. Finally it analyzes the commits, and checks if any +migrations were actually run and the bug number in the commit message matches +the migration name. + +It will also show the diff between the migrated files and the reference, ignoring +blank lines. + +You can inspect the generated repository further by looking at + +.. code-block:: bash + + ls $OBJDIR/python/l10n/bug_1485002_newtab/en-US + +Caveats +------- + +Be aware of hard-coded English context in migration. Consider for example: + + +.. code-block:: python + + ctx.add_transforms( + "browser/browser/preferences/siteDataSettings.ftl", + "browser/browser/preferences/siteDataSettings.ftl", + transforms_from( + """ + site-usage-persistent = { site-usage-pattern } (Persistent) + """) + ) + + +This Transform will pass a manual comparison, since the two files are identical, +but will result in :js:`(Persistent)` being hard-coded in English for all +languages. diff --git a/intl/l10n/docs/overview.rst b/intl/l10n/docs/overview.rst new file mode 100644 index 0000000000..1f1d90c105 --- /dev/null +++ b/intl/l10n/docs/overview.rst @@ -0,0 +1,199 @@ +.. role:: js(code) + :language: javascript + +============ +Localization +============ + +Localization at Mozilla +======================= + +At Mozilla localizations are managed by locale communities around the world, who +are responsible for maintaining high quality linguistic and cultural adaptation +of Mozilla software into over 100 locales. + +The exact process of localization management differs from project to project, but +in the case of Gecko applications, the localization is primarily done via a web localization +system called `Pontoon`_ and stored in HG repositories under +`hg.mozilla.org/l10n-central`_. + +Developers are expected to keep their code localizable using localization +and internationalization systems, and also serve as localizers into the `en-US` locale +which is used as the `source` locale. + +In between the developers and localizers, there's a sophisticated ecosystem of tools, +tests, automation, validators and other checks on one hand, and management, release, +community and quality processes facilitated by the `L10n Drivers Team`_, on the other. + +Content vs. UI +============== + +The two main categories in localization are content localization vs UI localization. + +The former is usually involved when dealing with large blocks of text such as +documentation, help articles, marketing material and legal documents. + +The latter is the primary type when handling user interfaces for applications such +as Firefox. + +This article will focus on UI localization. + +Lifecycle & Workflow +==================== + +1) New feature +-------------- + +The typical life cycle of a localizable UI starts with a UX/UI or new feature need which +should be accompanied by the UX mockups involving so called `copy` - the original +content to be used in the new piece of UI. + +2) UX mockup + copy review +-------------------------- + +The UX mockup with copy is the first step that should be reviewed by the L10n Drivers Team. +Their aim is to identify potential cultural and localization challenges that may arise +later and ensure that the UI is ready for localization on a linguistic, cultural, +and technical level. + +3) Patch l10n review +-------------------- + +Once that is completed, the next stage is for front-end engineers to create patches +which implement the new UI. Those patches should already contain the `copy` and +place the strings in the localization resources for the source locale (`en-US`). + +The developer uses the localization API by selecting a special identifier we call +`L10n ID` and optionally a list of variables that will be passed to the translation. + +We call this "a social contract" which binds the l10n-id/args combination to a particular +source translation to use in the UI. + +The localizer expects the developer to maintain the contract by ensuring that the +translation will be used in the given location, and will correspond to the +source translation. If that contract is to be changed, the developer will be expected +to update it. More on that in part `6) String Updates`. + +The next review comes from either L10n Drivers, or experienced front end engineers +familiar with the internationalization and localization systems, making sure that +the patches properly use the right APIs and the code is ready to be landed +into `mozilla-central`. + +.. _exposure-in-gecko-strings: + +4) Exposure in `gecko-strings` +------------------------------ + +Once the patch lands in `mozilla-central`, L10n Drivers will take a final look at +the localizability of the introduced strings. In case of issues, developers might +be asked to land a follow up, or the patch could be backed out with the help of sheriffs. + +Every few days, strings are exported into a repository called `gecko-strings-quarantine`, +a unified repository that includes strings for all shipping versions of Firefox +(nightly, beta, release). This repository is used as a buffer to avoid exposing potential +issues to over 100 locales. + +As a last step, strings are pushed into `gecko-strings`, another unified repository that +is exposed to localization tools, like Pontoon, and build automation. + +5) Localization +--------------- + +From that moment localizers will work on providing translations for the new feature +either while the new strings are only in Nightly or after they are merged to Beta. +The goal is to have as much of the UI ready in as many locales as early as possible, +but the process is continuous and we're capable of releasing Firefox with incomplete +translations falling back on a backup locale in case of a missing string. + +While Nightly products use the latest version of localization available in repositories, +the L10n Drivers team is responsible for reviewing and signing off versions of each +localization shipping in Beta and Release versions of Gecko products. + +6) String updates +----------------- + +Later in the software life cycle some strings might need to be changed or removed. +As a general rule, once the strings lands in `mozilla-central`, any further update +to existing strings will need to follow these guidelines, independently from how much +time has passed since previous changes. + +If it's just a string removal, all the engineer has to do is to remove it from the UI +and from the localization resource file in `mozilla-central`. + +If it's an update, we currently have two "levels" of change severity: + +1) If the change is minor, like fixing a spelling error or case, the developer should update +the `en-US` translation without changing the l10n-id. + +2) If the change is anyhow affecting the meaning or tone of the message, the developer +is requested to update the l10n string ID. + +The latter is considered a change in the social contract between the developer and +the localizer and an update to the ID is expected. + +In case of `Fluent`_, any changes to the structure of the message such as adding/removing +attributes also requires an update of the ID. + +The new ID will be recognized by the l10n tooling as untranslated, and the old one +as obsolete. This will give the localizers an opportunity to find and update the +translation, while the old string will be removed from the build process. + +There is a gray area between the two severity levels. In case of doubt, don’t hesitate +to request feedback of review from L10n Drivers to avoid issues once the strings land. + +Selecting L10n Identifier +========================= + +Choosing an identifier for a localization message is tricky. It may seem similar +to picking a variable name, but in reality, it's much closer to designing a public +API. + +An l10n identifier, once defined, is then getting associated to a translated +message in every one of 100+ locales and it becomes very costly to attempt to +migrate that string in all locales to a different identifier. + +Additionally, in Fluent an identifier is used as a last resort string to be displayed in +an error scenario when formatting the message fails, which makes selecting +**meaningful** identifiers particularly valuable. + +Lastly, l10n resources get mixed and matched into localization contexts where +it becomes important to avoid identifier collision from two strings coming +from two different files. + +For all those reasons, a longer identifier such as :js:`privacy-exceptions-button-ok` is +preferred over short identifiers like :js:`ok` or :js:`ok-button`. + +Localization Systems +==================== + +Gecko has two main localization systems: Fluent and StringBundle, a legacy system. + +Fluent +------ + +Fluent is a modern localization system designed by Mozilla to address the challenges +and limitations of older systems. + +It's well suited for modern web development cycle, provides a number of localization +features including good internationalization model and strong bidirectionality support. + + +To learn more about Fluent, follow the `Fluent for Firefox Developers`_ guide. + +StringBundle +------------ + +StringBundle is a runtime API used primarily for localization of C++ code. +The messages are stored in `.properties` files and loaded using the StringBundle API +and then retrieved from there via imperative calls. + +The system provides external arguments which can be placed into the string, and +supports basic plural categories via a proprietary API `PluralForm.sys.mjs`. + +Adding new StringBundle messages should only be done after serious consideration, +and in particular any new use of PluralForm messages should be avoided. + +.. _Pontoon: https://pontoon.mozilla.org/ +.. _hg.mozilla.org/l10n-central: https://hg.mozilla.org/l10n-central/ +.. _L10n Drivers Team: https://wiki.mozilla.org/L10n:Mozilla_Team +.. _Fluent For Firefox Developers: ./fluent/tutorial.html |