APIs
====

Running the pipeline API
::::::::::::::::::::::::

You can use the Transformers.js `pipeline` API directly to perform inference, as long
as the model is in our model hub.

The `Transformers.js documentation <https://huggingface.co/tasks>`_ provides many
examples that you can adapt with minor changes when running in Firefox.

The example below summarizes a text using the `summarization` task:

.. code-block:: javascript

  const { createEngine } = ChromeUtils.importESModule("chrome://global/content/ml/EngineProcess.sys.mjs");
  const options = {
    taskName: "summarization",
    modelId: "mozilla/text_summarization",
    modelRevision: "main"
  };

  const engine = await createEngine(options);

  const text = 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, ' +
    'and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. ' +
    'During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest ' +
    'man-made structure in the world, a title it held for 41 years until the Chrysler Building in New ' +
    'York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to ' +
    'the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the ' +
    'Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second ' +
    'tallest free-standing structure in France after the Millau Viaduct.';

  const request = { args: [text], options: { max_new_tokens: 100 } };
  const res = await engine.run(request);
  console.log(res[0]["summary_text"]);

The code sample above runs the model and returns the complete output once the computation has finished.
Alternatively, you can receive the output incrementally by using the asynchronous generator method
`runWithGenerator` provided by the engine.

.. code-block:: javascript

  let summaryText = "";
  for await (const chunk of engine.runWithGenerator(request)) {
    summaryText += chunk.text;
  }

You can use the browser console or toolbox to run this example.
To enable the browser console, flip the following option in `about:config`: **devtools.chrome.enabled**.
To get access to the full toolbox, set the **devtools.debugger.remote-enabled** option.
We recommend using the toolbox to get access to more tools. You will get a security warning
when the toolbox connects to the browser.

When running this code, Firefox will look for models in the Mozilla model hub located at https://model-hub.mozilla.org,
which contains a curated list of models.

Available Options
:::::::::::::::::

Options passed to the `createEngine` function are verified and converted into a `PipelineOptions` object.

Below are the options available:

- **taskName**: The name of the task the pipeline is configured for.
- **featureId**: The identifier for the feature to be used by the pipeline.
- **engineId**: The identifier for the engine to be used by the pipeline.
- **timeoutMS**: The maximum amount of time in milliseconds the worker will run (-1 to never expire).
- **modelHub**: The model hub to use, can be `huggingface` or `mozilla`. When used, `modelHubRootUrl` and `modelHubUrlTemplate` are ignored.
- **modelHubRootUrl**: The root URL of the model hub where models are hosted.
- **modelHubUrlTemplate**: A template URL for building the full URL for the model.
- **modelId**: The identifier for the specific model to be used by the pipeline.
- **modelRevision**: The revision for the specific model to be used by the pipeline.
- **tokenizerId**: The identifier for the tokenizer associated with the model, used for pre-processing inputs.
- **tokenizerRevision**: The revision for the tokenizer associated with the model, used for pre-processing inputs.
- **processorId**: The identifier for any processor required by the model, used for additional input processing.
- **processorRevision**: The revision for any processor required by the model, used for additional input processing.
- **logLevel**: The log level used in the worker.
- **runtimeFilename**: Name of the runtime wasm file.
- **dtype**: Quantization level, can be `fp32`, `fp16`, `q8`, `int8`, `uint8`, `q4`, `bnb4`, or `q4f16`. Defaults to `q8`.
- **device**: Device to use (`wasm` or `gpu`). Defaults to `wasm`.
- **kvCacheDtype**: Quantization level for the KV cache. Applies only to the wllama backend, and only when flash attention is enabled; otherwise it is always `f32`.
- **numContext**: Maximum context size. Applies only to the wllama backend.
- **numBatch**: Number of tokens processed in a single forward pass. Applies only to the wllama backend.
- **numUbatch**: Token batch size. Applies only to the wllama backend.
- **flashAttn**: Whether to use flash attention. Applies only to the wllama backend.
- **useMmap**: Whether to use memory mapping for the model. Applies only to the wllama backend.
- **useMlock**: Whether to lock the full model in memory. Applies only to the wllama backend.
- **numThreadsDecoding**: Number of threads used during decoding. Applies only to the wllama backend.
- **modelFile**: The name of the model file. Currently only supported for the wllama backend.
- **backend**: The backend to use, can be `onnx` or `wllama`. Defaults to `onnx`.

**taskName** and **modelId** are required. The other options are optional and are filled in automatically
using values pulled from Remote Settings when the task name is recognized.

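As a rough illustration of how explicit options combine with defaults, the sketch below merges a plain options object with the default values stated in the list above and rejects values outside the documented choices. `withDocumentedDefaults`, `DOCUMENTED_DEFAULTS` and `ALLOWED_VALUES` are hypothetical names used only for this example. They are not part of the Firefox API, and the real `createEngine` also pulls values from Remote Settings, which this sketch does not model.

```javascript
// Hypothetical helper, NOT part of the Firefox API. It only mirrors the
// defaults and allowed values listed in this document.
const DOCUMENTED_DEFAULTS = {
  engineId: "default-engine",
  dtype: "q8",
  device: "wasm",
  backend: "onnx",
};

const ALLOWED_VALUES = {
  dtype: ["fp32", "fp16", "q8", "int8", "uint8", "q4", "bnb4", "q4f16"],
  device: ["wasm", "gpu"],
  backend: ["onnx", "wllama"],
  modelHub: ["huggingface", "mozilla"],
};

function withDocumentedDefaults(options) {
  // Explicit options take precedence over the defaults.
  const merged = { ...DOCUMENTED_DEFAULTS, ...options };

  // Reject values that are outside the documented choices.
  for (const [name, allowed] of Object.entries(ALLOWED_VALUES)) {
    if (name in merged && !allowed.includes(merged[name])) {
      throw new Error(`Invalid value for ${name}: ${merged[name]}`);
    }
  }
  return merged;
}
```

Calling `withDocumentedDefaults({ taskName: "summarization", modelId: "mozilla/text_summarization", device: "gpu" })` keeps `device` as `gpu` and fills in `dtype`, `backend` and `engineId` from the defaults.
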
To learn about the different inference tasks, refer to this Hugging Face
documentation: `Tasks <https://huggingface.co/tasks>`_

**featureId** is used to uniquely identify the feature that will be used by the pipeline
and to store the corresponding options in Remote Settings (see :ref:`the ml-inference-options collection <inference-remote-settings>`).

**engineId** is used to manage the lifecycle of the engine. When not provided, it defaults to
`default-engine`. Every time a new engine is created using `createEngine`, the API ensures that
there is a single engine with the given id. If the options of the existing engine are unchanged,
the instance is reused. If they differ, the engine is reinitialized with the new options.
This limits the number of engines running at once, since each one takes a lot of resources.
To make sure your engine is not destroyed or reused elsewhere, set that value to a unique id
that matches your component.

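The reuse rule described above can be pictured with a small registry keyed by engine id. This is only a sketch of the described behavior, not the actual Firefox implementation: `EngineRegistry` and `getOrCreate` are hypothetical names, the "engine" is a plain object, and options are assumed to be a flat object of JSON-serializable values.

```javascript
// Hypothetical registry illustrating the reuse rule; NOT the Firefox code.
class EngineRegistry {
  #engines = new Map();

  getOrCreate(options) {
    // A missing engineId falls back to the documented default.
    const normalized = {
      ...options,
      engineId: options.engineId ?? "default-engine",
    };
    // Serialize with sorted keys so that key order does not matter
    // (assumes a flat options object).
    const key = JSON.stringify(normalized, Object.keys(normalized).sort());

    const existing = this.#engines.get(normalized.engineId);
    if (existing && existing.key === key) {
      // Same id and same options: reuse the instance.
      return existing.engine;
    }

    // New id, or same id with different options: (re)initialize.
    const engine = { options: normalized };
    this.#engines.set(normalized.engineId, { key, engine });
    return engine;
  }

  get size() {
    return this.#engines.size;
  }
}
```

With this model, two components that both rely on `default-engine` but pass different options keep replacing each other's engine, which is why a component-specific `engineId` is recommended.
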
When an engine is created, an inference process is started if one is not already running, and
a new worker is launched for that engine. The inference process is unique and shared by all engines.

Some values are also set from the preferences (set in `about:config`):

- **browser.ml.logLevel**: Set to "All" to see all logs, which are useful for debugging.
- **browser.ml.modelHubRootUrl**: Model hub root URL used to download models.
- **browser.ml.modelHubUrlTemplate**: Model URL template.
- **browser.ml.modelCacheTimeout**: Worker timeout in ms. Default value used for **timeoutMS**.
- **browser.ml.modelCacheMaxSize**: Maximum disk size for the ML model cache (in GiB).

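To give an idea of how a root URL and a URL template can combine into a download URL, here is a minimal sketch. The `{model}` and `{revision}` placeholder syntax, the `buildModelUrl` helper and the trailing file name are all assumptions made for illustration; this document does not specify the template format, so check the actual preference value before relying on it.

```javascript
// Hypothetical helper; the placeholder syntax is an assumption.
function buildModelUrl({ rootUrl, urlTemplate, modelId, revision = "main", file }) {
  // Substitute the assumed placeholders. Function replacers avoid any
  // special handling of "$" sequences in the inserted values.
  const path = urlTemplate
    .replace(/\{model\}/g, () => modelId)
    .replace(/\{revision\}/g, () => revision);

  // Drop trailing slashes on the root so the parts join with a single "/".
  const base = rootUrl.replace(/\/+$/, "");
  return `${base}/${path}/${file}`;
}

const url = buildModelUrl({
  rootUrl: "https://model-hub.mozilla.org/",
  urlTemplate: "{model}/{revision}",
  modelId: "mozilla/text_summarization",
  file: "config.json",
});
console.log(url);
```
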
URL allow and deny list
:::::::::::::::::::::::

We keep a Remote Settings collection called `ml-model-allow-deny-list` that contains URL prefixes
that are allowed or denied.

Each record comes with the following fields:

- urlPrefix: The URL prefix to allow or deny
- filter: Set to `ALLOW` to allow, `DENY` to deny
- description: An optional description

When the API is about to fetch a file, its URL is checked against the allow/deny list.

Examples of patterns:

- All models, all versions, from the Mozilla organization on Hugging Face: https://huggingface.co/Mozilla/
- All models, all versions, from our hub: https://model-hub.mozilla.org/
- A specific model, all versions: https://huggingface.co/typeform/distilbert-base-uncased-mnli/
- A specific model and a specific version: https://huggingface.co/Mozilla/distilvit/blob/v0.5.0/

Each URL is tested: it must match the allowlist and must not match the denylist.

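The check described above can be sketched as a prefix match over the records. `isUrlAllowed` is a hypothetical helper written for this example, not the function Firefox uses. It assumes records shaped like the fields listed earlier (`urlPrefix`, `filter`) and a plain `startsWith` comparison, and the `DENY` record in the sample data is invented for illustration.

```javascript
// Hypothetical helper illustrating the allow/deny rule; NOT the Firefox code.
function isUrlAllowed(url, records) {
  let allowed = false;
  for (const { urlPrefix, filter } of records) {
    if (!url.startsWith(urlPrefix)) {
      continue; // This record does not apply to the URL.
    }
    if (filter === "DENY") {
      return false; // Any matching DENY record rejects the URL.
    }
    if (filter === "ALLOW") {
      allowed = true;
    }
  }
  // Allowed only if at least one ALLOW record matched and no DENY did.
  return allowed;
}

const records = [
  { urlPrefix: "https://huggingface.co/Mozilla/", filter: "ALLOW" },
  { urlPrefix: "https://model-hub.mozilla.org/", filter: "ALLOW" },
  // Invented entry, used only to show a DENY taking precedence.
  { urlPrefix: "https://huggingface.co/Mozilla/blocked-model/", filter: "DENY" },
];

console.log(isUrlAllowed("https://huggingface.co/Mozilla/distilvit/blob/v0.5.0/config.json", records));
```
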
To bypass this check and allow Firefox to download any file for running models,
set the `MOZ_ALLOW_EXTERNAL_ML_HUB` environment variable.

If you want to add a new hub, organization or a specific model, ask us by
`opening a ticket <https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Machine%20Learning>`_.

Using the Hugging Face model hub
::::::::::::::::::::::::::::::::

By default, the engine uses the Mozilla model hub. To use the Hugging Face model hub instead, pass `huggingface` as `modelHub`.

The inference engine will then look for models in the Hugging Face model hub. If the URL is
not allowed (see previous section) and you still want to experiment with the model,
use `MOZ_ALLOW_EXTERNAL_ML_HUB`.

To run against a Hugging Face model, visit `this page <https://huggingface.co/models?library=transformers.js>`_ and select
`tasks` in the top left corner. You can pick a task and then choose a model.

For example, models for the `summarization` task compatible with our inference engine are listed `here <https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending>`_.

Let's say you want to pick the `Xenova/distilbart-cnn-6-6` model. All you have to do is use its id when calling
`createEngine`:

.. code-block:: javascript

  const { createEngine } = ChromeUtils.importESModule("chrome://global/content/ml/EngineProcess.sys.mjs");

  const options = {
    taskName: "summarization",
    modelId: "Xenova/distilbart-cnn-6-6",
    modelHub: "huggingface"
  };

  const engine = await createEngine(options);

  const text = 'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, ' +
    'and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. ' +
    'During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest ' +
    'man-made structure in the world, a title it held for 41 years until the Chrysler Building in New ' +
    'York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to ' +
    'the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the ' +
    'Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second ' +
    'tallest free-standing structure in France after the Millau Viaduct.';

  const request = { args: [text], options: { max_new_tokens: 100 } };
  const res = await engine.run(request);
  console.log(res[0]["summary_text"]);

Running the internal APIs
:::::::::::::::::::::::::

Some inference tasks perform more complex operations within the engine, such as image processing.
For these tasks, you can use the internal APIs to run the inference. Those tasks are prefixed with `moz`.

In the example below, an image is converted to text using the `moz-image-to-text` task.

.. code-block:: javascript

  const { createEngine } = ChromeUtils.importESModule("chrome://global/content/ml/EngineProcess.sys.mjs");

  // options needed for the task
  const options = { taskName: "moz-image-to-text" };

  // We create the engine object, using the options
  const engine = await createEngine(options);

  // Preparing a request
  const request = { url: "https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg" };

  // At this point we are ready to do some inference.
  const res = await engine.run(request);
  // The result is a string containing the text extracted from the image
  console.log(res);

The following internal tasks are supported by the machine learning engine:

.. js:autofunction:: imageToText