firefox/toolkit/components/ml/docs/architecture.rst

.. _inference-architecture:

Architecture
============

The Firefox AI Runtime uses the ONNX runtime to run models, and leverages
the Transformers.js library to simplify the inference work.


.. figure:: assets/architecture.png
   :alt: Platform Architecture
   :scale: 95%
   :align: center

   Firefox AI Runtime Architecture

(1)(2) When a content process calls the inference API, Firefox calls the Remote Settings
service to get ONNX WASM runtime if needed, and to get default options for the
inference task that is going to be executed.


(3) Firefox then creates an inference process which is a specific type of content process.
That process loads the Transformers.js and ONNX WASM runtime. It then triggers the inference call.

(4) The Transformers.js library will ask for model files depending on the
inference task to perform and the different options that were passed.
These calls are relayed to Firefox, which will look at what is available in
IndexDB. If the required files are present, it will return them to the inference
process.

(5) If they are not, it will trigger the download process by visiting the
Model Hub. And store then in IndexDB