summaryrefslogtreecommitdiffstats
path: root/toolkit/components/translations/docs/resources/02_contributing.md
blob: 227a855d2838ce0171c11d300bda9139275eee58 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
# Contributing

The following content goes more in-depth than the [Overview](./01_overview.md) section
to provide helpful information regarding contributing to Firefox Translations.

- [Source Code](#source-code)
- [Architecture](#architecture)
    - [JSActors](#jsactors)
- [Remote Settings](#remote-settings)
    - [Admin Dashboards](#admin-dashboards)
    - [Prod Admin Access](#prod-admin-access)
    - [Pulling From Different Sources](#pulling-from-different-sources)
    - [Versioning](#versioning)
      - [Non-Breaking Changes](#non-breaking-changes)
      - [Breaking Changes](#breaking-changes)
- [Building fastText](#building-fasttext)
    - [Downloading The Models](#downloading-the-models)
    - [Building the WASM Binary](#building-the-wasm-binary)
        - [Dependencies](#dependencies)
        - [Modifying the EMCXXFLAGS](#modifying-the-emcxxflags)
- [Building Bergamot](#building-bergamot)

---
## Source Code

The primary source code for Firefox Translations lives in the following directory:

> **[toolkit/components/translations]**

---
## Architecture

### JSActors

Translations functionality is divided into different classes based on which access privileges are needed.
Generally, this split is between [Parent] and [Child] versions of [JSWindowActors].

The [Parent] actors have access to privileged content and is responsible for things like downloading models
from [Remote Settings](#remote-settings), modifying privileged UI components etc.

The [Child] actors are responsible for interacting with content on the page itself, requesting content from
the [Parent] actors, and creating [Workers] to carry out tasks.

---
## Remote Settings

The machine-learning models and [WASM] binaries are all hosted in Remote Settings and are downloaded/cached when needed.

### Admin Dashboards

In order to get access to Firefox Translations content in the Remote Settings admin dashboards, you will need to request
access in the Remote Settings component on [Bugzilla].

Once you have access to Firefox Translations content in Remote Settings, you will be able to view it in the admin dashboards:

**Dev**<br>
> [https://settings.dev.mozaws.net/v0/admin](https://settings.dev.mozaws.net/v1/admin)

**Stage**<br>
> [https://remote-settings.allizom.org/v0/admin](https://settings-writer.stage.mozaws.net/v1/admin)

### Prod Admin Access

In order to access the prod admin dashboard, you must also have access to a VPN that is authorized to view the dashboard.
To gain access to the VPN, follow [Step 3] on this page in the Remote Settings documentation.

**Prod**<br>
> [https://remote-settings.mozilla.org/v1/admin](https://settings-writer.prod.mozaws.net/v1/admin)


### Pulling From Different Sources

When you are running Firefox, you can choose to pull data from **Dev**, **Stage**, or **Prod** by downloading and installing
the latest [remote-settings-devtools] Firefox extension.

### Versioning

Firefox Translations uses semantic versioning for all of its records via the **`version`** property.

```{note}
**Beta Versions**

Versions that are below **`1.0`** are considered to be a beta version. For language translation models,
this quality will be communicated to users via the UI.
```

#### Non-breaking Changes

Firefox Translations code will always retrieve the maximum compatible version of each record from Remote Settings.
If two records exist with different versions, (e.g. **`1.0`** and **`1.1`**) then only the version **`1.1`** record
will be considered.

This allows us to update and ship new versions of models that are compatible with the current source code and wasm runtimes
in both backward-compatible and forward-compatible ways. These can be released through remote settings independent of the
[Firefox Release Schedule].

#### Breaking Changes

Breaking changes for Firefox Translations are a bit more tricky. These are changes that make older-version records
incompatible with the current Firefox source code and/or [WASM] runtimes.

While a breaking change will result in a change of the semver number (e.g. **`1.1 ⟶ 2.0`**), this alone is not
sufficient. Since Firefox Translations always attempts to use the maximum compatible version, only bumping this number
would result in older versions of Firefox attempting to use a newer-version record that is no longer compatible with the
Firefox source code or [WASM] runtimes.

To handle these changes, Firefox Translations utilizes Remote Settings [Filter Expressions] to make certain records
available to only particular releases of Firefox. This will allow Firefox Translations to make different sets of Remote Settings records available to different versions
of Firefox.

```{admonition} Example

Let's say that Firefox 108 through Firefox 120 is compatible with translations model records in the **`1.*`** major-version range, however Firefox 121 and onward is compatible with only model records in the **`2.*`** major-version range.

This will allow us to mark the **`1.*`** major-version records with the following filter expression:

**`
"filter_expression": "env.version|versionCompare('108.a0') >= 0 && env.version|versionCompare('121.a0') < 0"
`**

This means that these records will only be available in Firefox versions greater than or equal to 108, and less than 121.

Similarly, we will be able to mark all of the **`2.*`** major-version records with this filter expression:

**`
"filter_expression": "env.version|versionCompare('121.a0') >= 0"
`**

This means that these records will only be available in Firefox versions greater than or equal to Firefox 121.

```

Tying breaking changes to releases in this way frees up Firefox Translations to make changes as large as entirely
switching one third-party library for another in the compiled source code, while allowing older versions of Firefox to continue utilizing the old library and allowing newer versions of Firefox to utilize the new library.

---
## Building fastText

### Downloading the Models

The fastText model that we use can be downloaded directly from the fastText website:<br>
> [https://fasttext.cc/docs/en/language-identification.html](https://fasttext.cc/docs/en/language-identification.html)

Firefox Translations uses the compressed, **`lid.176.ftz`** model.

### Building the WASM Binary

To build the fastText [WASM] binary, we can follow the steps in the [Requirements] section of the fastText website.

#### Dependencies

**C++ Compiler**<br>
Any of the C++ compilers from [Getting Set Up To Work On The Firefox Codebase] will be sufficient for this.

**emskd**<br>
Follow the [Download and Install] instructions for setting up the emscripten sdk.

#### Modifying the EMCXXFLAGS

At the time of writing, the a latest commit on the fastText repo ([3697152e0fd772d9185697fdbd4a1d340ca5571d])
is not compatible by default with the latest version of [emscripten (3.1.35)].

A few changes need to be made to the Makefile in order to generate the fastText [WASM] for use in Firefox.

**1) Disable DYNAMIC_EXECUTION**<br>
In the `Makefile` for the fastText repo, there is a variable called **`EMCXXFLAGS`**.<br>
We need to add the following flag to this variable:

```
-s "DYNAMIC_EXECUTION=0"
```

If this flag is not set to **`0`**, then emscripten will [generate functions] that use the [eval()] function.
[eval()] is not allowed in the context that fastText runs in FireFox due to security reasons.

**2) Rename EXTRA_EXPORTED_RUNTIME_METHODS**<br>
In [emscripten (2.0.18)], **`EXTRA_EXPORTED_RUNTIME_METHODS`** was deprecated in favor of **`EXPORTED_RUNTIME_METHODS`**.
The fastText Makefile still has the old flag, so we need to update the name.

**3) Use the -r Flag When Appropriate**<br>
In [emscripten (2.0.3)] the following change was made:

> "The default output format is now executable JavaScript. Previously we would default to output objecting files unless, for example, the output name ended in **`.js`**. This is contrary to behavior of clang and gcc. Now emscripten will always produce and executable unless the **`-c`**, **`-r`** or **`-shared`** flags are given. This is true even when the name of the output file ends in **`.o`**. e.g, **`emcc foo.c -o foo.o`** will produce a JavaScript file called **`foo.o`**. This might surprise some users (although it matches the behavior of existing toolchains) so we now produce a warning in this case."

The Makefile needs to be modified to use the **`-r`** flag when appropriate. These changes are modeled after comments on this [GitHub Issue].

**Cumulative Changes**<br>
Here is a diff of the full changes needed for the Makefile at the time of writing:

```diff
diff --git a/Makefile b/Makefile
index e246f79..396ae0b 100644
--- a/Makefile
+++ b/Makefile
@@ -73,7 +73,9 @@ clean:

 EMCXX = em++
-EMCXXFLAGS = --bind --std=c++11 -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s "EXTRA_EXPORTED_RUNTIME_METHODS=['addOnPostRun', 'FS']" -s "DISABLE_EXCEPTION_CATCHING=0" -s "EXCEPTION_DEBUG=1" -s "FORCE_FILESYSTEM=1" -s "MODULARIZE=1" -s "EXPORT_ES6=1" -s 'EXPORT_NAME="FastTextModule"' -Isrc/
+EMCXXFLAGS_BASE = --bind --std=c++11 -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s "EXPORTED_RUNTIME_METHODS=['addOnPostRun', 'FS']" -s "DISABLE_EXCEPTION_CATCHING=0" -s "EXCEPTION_DEBUG=0" -s "DYNAMIC_EXECUTION=0" -s "FORCE_FILESYSTEM=1" -s "MODULARIZE=1" -s "EXPORT_ES6=1" -s 'EXPORT_NAME="FastTextModule"' -Isrc/
+EMCXXFLAGS = $(EMCXXFLAGS_BASE) -r
+EMCXXFLAGS_JS = $(EMCXXFLAGS_BASE)
 EMOBJS = args.bc autotune.bc matrix.bc dictionary.bc loss.bc productquantizer.bc densematrix.bc quantmatrix.bc vector.bc model.bc utils.bc meter.bc fasttext.bc main.bc


@@ -120,6 +122,6 @@ fasttext.bc: src/fasttext.cc src/*.h
 	$(EMCXX) $(EMCXXFLAGS)  src/fasttext.cc -o fasttext.bc

 webassembly/fasttext_wasm.js: $(EMOBJS) webassembly/fasttext_wasm.cc Makefile
-	$(EMCXX) $(EMCXXFLAGS) $(EMOBJS) -o webassembly/fasttext_wasm.js
+	$(EMCXX) $(EMCXXFLAGS_JS) $(EMOBJS) -o webassembly/fasttext_wasm.js
```

After modifying the Makefile in the previous section, running **`make wasm`** in the fastText repo should run without warnings or errors and the following files will be generated in the **`webassembly`** directory:

```
webassembly
├── fasttext.js
├── fasttext_wasm.js
└── fasttext_wasm.wasm
```

#### Modifying fasttext_wasm.js

There are a few changes we need to make to the **`fasttext_wasm.js`** file to make it compatible with use in Firefox.

**1) Define a function, not a module**<br>
The generated code exports a module, but this needs to be modified into a function for use in [importScripts()] in a worker.

At the top of the file we need to make the following changes:

```diff
diff --git a/toolkit/components/translations/fasttext/fasttext_wasm.js b/toolkit/components/translations/fasttext/fasttext_wasm.js
index 64c6184a85851..4802343da2a03 100644
--- a/toolkit/components/translations/fasttext/fasttext_wasm.js
+++ b/toolkit/components/translations/fasttext/fasttext_wasm.js
@@ -1,9 +1,6 @@

-var FastTextModule = (() => {
-  var _scriptDir = import.meta.url;
-
-  return (
-async function(FastTextModule = {})  {
+async function loadFastTextModule(FastTextModule = {})  {
+  const _scriptDir = null;

 // include: shell.js
 // The Module object: Our interface to the outside world. We import
```

Here we are defining a function rather than a variable, and we are setting **`_scriptDir`** to null
because **`import.meta.url`** is only available for use within modules.

Next we need to modify the bottom of the file to match these changes:

```diff
diff --git a/toolkit/components/translations/fasttext/fasttext_wasm.js b/toolkit/components/translations/fasttext/fasttext_wasm.js
index 64c6184a85851..0a6fca3f524e4 100644
--- a/toolkit/components/translations/fasttext/fasttext_wasm.js
+++ b/toolkit/components/translations/fasttext/fasttext_wasm.js
@@ -8287,7 +8287,3 @@ run();

   return FastTextModule.ready
 }
-
-);
-})();
-export default FastTextModule;
```

**2) Remove unneeded environment checks**<br>
Next we need to remove unneeded checks for different environments:

```JavaScript
if (ENVIRONMENT_IS_NODE) {
    // ...
} else
if (ENVIRONMENT_IS_SHELL) {
    // ...
} else
if (ENVIRONMENT_IS_WEB || ENVIRONMENT_IS_WORKER) {
    // ...
} else
{
  throw new Error('environment detection error');
}
```

Since this code will only be run inside of a worker, we want to delete the blocks that deal with **`ENVIRONMENT_IS_NODE`** and **`ENVIRONMENT_IS_SHELL`**. In fact, this code will fail to be imported by [importScripts()] if we don't do this.

**3) Remove the use of `import.meta.url`**<br>
Finally, there is a use of **`import.meta.url`** that we need to remove.

```diff
diff --git a/toolkit/components/translations/fasttext/fasttext_wasm.js b/toolkit/components/translations/fasttext/fasttext_wasm.js
index 64c6184a85851..746cbae2ec952 100644
--- a/toolkit/components/translations/fasttext/fasttext_wasm.js
+++ b/toolkit/components/translations/fasttext/fasttext_wasm.js
@@ -746,7 +746,7 @@ if (Module['locateFile']) {
   }
 } else {
   // Use bundler-friendly `new URL(..., import.meta.url)` pattern; works in browsers too.
-  wasmBinaryFile = new URL('fasttext_wasm.wasm', import.meta.url).href;
+  wasmBinaryFile = null;
 }

 function getBinary(file) {
```

As mentioned before, **`import.meta.url`** is not allowed outside of modules and cannot be used with [importScripts()]
in the worker code that we are creating.

It is okay to set this to null here, because we will be providing the **`wasmBinaryFile`** via [Remote Settings].

**4) Minifying the file**<br>
The generated **`fasttext_wasm.js`** file is very large. To minimize the impact on the size of the code in the Firefox source tree, we want to minify the file using the [minify] tool.

```
Size Name
291k ├── fasttext_wasm.js (original)
109k └── fasttext_wasm.js (minified)
```

**5) Adding the license**<br>
Finally, we should add a copy of the current fastText MIT license to the top of the minified **`fasttext_wasm.js`** file.
You should be able to paste this from the generated **`fasttext.js`** file.

#### Modifying fasttext.js

```{note}
It is likely that the source file in tree already has these changes and is already sufficient,
even if **`fasttext_wasm.js`** has been recently updated. Try running it first as-is before replacing
and re-modifying.
```

Next we need to modify **`fasttext.js`** to utilize the changes that we made to **`fasttext_wasm.js`** and also to
not be a module so that we can import it using [importScripts()].

These changes do the following:

1) Define a variable called **`fastTextModule`** for use in the worker scripts.
2) Utilize the **`loadFastTextModule()`** function that we defined in **`fasttext_wasm.js`**
3) Add a function **`loadModelBinary()`** that takes the wasm binary directly, which we will provide through [Remote Settings].
4) Remove any module exports.

```diff
diff --git a/toolkit/components/translations/fasttext/fasttext.js b/toolkit/components/translations/fasttext/fasttext.js
index 86600b9ac9e28..2c49b3faaeedc 100644
--- a/toolkit/components/translations/fasttext/fasttext.js
+++ b/toolkit/components/translations/fasttext/fasttext.js
@@ -6,20 +6,30 @@
  * LICENSE file in the root directory of this source tree.
  */

-import fastTextModularized from './fasttext_wasm.js';
-const fastTextModule = fastTextModularized();
+let fastTextModule;
+
+const _initFastTextModule = async function (wasmModule) {
+  try {
+    fastTextModule = await loadFastTextModule(wasmModule);
+  } catch(e) {
+    console.error(e);
+  }
+  return true
+}

 let postRunFunc = null;
 const addOnPostRun = function(func) {
   postRunFunc = func;
 };

-fastTextModule.addOnPostRun(() => {
-  if (postRunFunc) {
-    postRunFunc();
-  }
-});

+const loadFastText = (wasmModule) => {
+  _initFastTextModule(wasmModule).then((res) => {
+    if (postRunFunc) {
+      postRunFunc();
+    }
+  })
+}
 const thisModule = this;
 const trainFileInWasmFs = 'train.txt';
 const testFileInWasmFs = 'test.txt';
@@ -41,7 +51,7 @@ const getFloat32ArrayFromHeap = (len) => {
 const heapToFloat32 = (r) => new Float32Array(r.buffer, r.ptr, r.size);

 class FastText {
-  constructor() {
+  constructor(fastTextModule) {
     this.f = new fastTextModule.FastText();
   }

@@ -77,6 +87,15 @@ class FastText {
     });
   }

+  loadModelBinary(buffer) {
+    const fastTextNative = this.f;
+    const byteArray = new Uint8Array(buffer);
+    const FS = fastTextModule.FS;
+    FS.writeFile(modelFileInWasmFs, byteArray);
+    fastTextNative.loadModel(modelFileInWasmFs);
+    return new FastTextModel(fastTextNative);
+  }
+
   _train(url, modelName, kwargs = {}, callback = null) {
     const fetchFunc = (thisModule && thisModule.fetch) || fetch;
     const fastTextNative = this.f;
@@ -515,6 +534,3 @@ class FastTextModel {
     });
   }
 }
-
-
-export {FastText, addOnPostRun};
```

---
## Building Bergamot

TODO


<!-- Hyperlinks -->
[3697152e0fd772d9185697fdbd4a1d340ca5571d]: https://github.com/facebookresearch/fastText/tree/3697152e0fd772d9185697fdbd4a1d340ca5571d
[Bugzilla]: https://bugzilla.mozilla.org/enter_bug.cgi?product=Cloud%20Services&component=Server%3A%20Remote%20Settings
[Child]: https://searchfox.org/mozilla-central/search?q=TranslationsChild
[Download and Install]: https://emscripten.org/docs/getting_started/downloads.html#download-and-install
[emscripten (2.0.3)]: https://github.com/emscripten-core/emscripten/blob/main/ChangeLog.md#203-09102020
[emscripten (2.0.18)]: https://github.com/emscripten-core/emscripten/blob/main/ChangeLog.md#2018-04232021
[emscripten (3.1.35)]: https://github.com/emscripten-core/emscripten/blob/main/ChangeLog.md#3135---040323
[Environments]: https://remote-settings.readthedocs.io/en/latest/getting-started.html#environments
[eval()]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval
[Filter Expressions]: https://remote-settings.readthedocs.io/en/latest/target-filters.html#filter-expressions
[Firefox Release Schedule]: https://wiki.mozilla.org/Release_Management/Calendar
[generate functions]: https://emscripten.org/docs/api_reference/emscripten.h.html?highlight=dynamic_execution#functions
[Getting Set Up To Work On The Firefox Codebase]: https://firefox-source-docs.mozilla.org/setup/index.html
[GitHub Issue]: https://github.com/facebookresearch/fastText/pull/1227#issuecomment-1353830003
[importScripts()]: https://developer.mozilla.org/en-US/docs/Web/API/WorkerGlobalScope/importScripts
[JSWindowActors]: https://firefox-source-docs.mozilla.org/dom/ipc/jsactors.html#jswindowactor
[minify]: https://github.com/tdewolff/minify
[Parent]: https://searchfox.org/mozilla-central/search?q=TranslationsParent
[Step 3]: https://remote-settings.readthedocs.io/en/latest/getting-started.html#create-a-new-official-type-of-remote-settings
[remote-settings-devtools]: https://github.com/mozilla-extensions/remote-settings-devtools/releases
[Remote Settings]: https://remote-settings.readthedocs.io/en/latest/
[Requirements]: https://fasttext.cc/docs/en/webassembly-module.html#requirements
[toolkit/components/translations]: https://searchfox.org/mozilla-central/search?q=toolkit%2Fcomponents%2Ftranslations
[WASM]: https://webassembly.org/
[Workers]: https://searchfox.org/mozilla-central/search?q=%2Ftranslations.*worker&path=&case=false&regexp=true