Bug 1922481 - Improve the ML docs r=vazish

Depends on D221606

Differential Revision: https://phabricator.services.mozilla.com/D224449
Tarek Ziadé
2024-10-03 16:21:51 +00:00
parent 521c33ff8a
commit 5f3574e22b
4 changed files with 77 additions and 10 deletions


@@ -0,0 +1,31 @@
Architecture
============

The Firefox AI Platform uses the ONNX runtime to run models, and leverages
the Transformers.js library to simplify the inference work.
.. figure:: assets/architecture.png
   :alt: Platform Architecture
   :scale: 95%
   :align: center

   Firefox AI Platform Architecture
(1)(2) When a content process calls the inference API, Firefox calls the Remote Settings
service to get the ONNX WASM runtime if needed, and to get the default options for the
inference task that is going to be executed.

(3) Firefox then creates an inference process, which is a specific type of content process.
That process loads the Transformers.js library and the ONNX WASM runtime, and then triggers
the inference call.

(4) The Transformers.js library asks for model files depending on the
inference task to perform and the options that were passed.
These calls are relayed to Firefox, which looks at what is available in
IndexedDB. If the required files are present, it returns them to the inference
process.

(5) If they are not, Firefox triggers the download process by visiting the
Model Hub, and stores the files in IndexedDB.

[Binary file added: assets/architecture.png, 31 KiB]


@@ -27,6 +27,7 @@ Learn more about the platform:
.. toctree::
   :maxdepth: 1

   architecture
   api
   notifications
   models


@@ -1,6 +1,42 @@
Models management
=================
Prepare a model for Firefox
:::::::::::::::::::::::::::
Models that can be used with Firefox should have ONNX weights at different quantization levels.
To make sure we are compatible with Transformers.js, we use the conversion script
provided by that project, which checks that the model architecture is supported and has
been tested.
To do this, follow these steps (a sketch of the commands is shown after the list):

- make sure your model is published on Hugging Face with PyTorch or SafeTensors weights.
- clone https://github.com/xenova/transformers.js and check out the `v3` branch
- go into `scripts/`
- create a virtualenv there and install the requirements from the local `requirements.txt` file
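
A minimal sketch of that setup, assuming `git` and `python3` are available on your machine:

.. code-block:: bash

   # Clone Transformers.js and switch to the v3 branch
   git clone https://github.com/xenova/transformers.js
   cd transformers.js
   git checkout v3

   # Create a virtualenv in scripts/ and install the conversion requirements
   cd scripts
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt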
Then you can run:

.. code-block:: bash

   python convert.py --model_id organizationId/modelId --quantize --modes fp16 q8 q4 --task the-inference-task
You will get a new directory in `models/organizationId/modelId` that includes an `onnx` directory and
other files. Upload everything to Hugging Face.

Congratulations! You have a Firefox-compatible model. You can now try it in `about:inference`.
Notice that for encoder-decoder models with two files, you may need to rename `decoder_model_quantized.onnx`
to `decoder_model_merged_quantized.onnx`, and make similar changes for the fp16 and q4 versions, as shown
in the sketch below. You do not need to rename the encoder models.
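
A sketch of those renames, assuming the default q8 name produced by the conversion script; the
fp16 and q4 file names below follow the same pattern and are assumptions:

.. code-block:: bash

   cd models/organizationId/modelId/onnx

   # q8 (default quantized) decoder
   mv decoder_model_quantized.onnx decoder_model_merged_quantized.onnx

   # assumed names for the fp16 and q4 variants
   mv decoder_model_fp16.onnx decoder_model_merged_fp16.onnx
   mv decoder_model_q4.onnx decoder_model_merged_q4.onnx
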
Lifecycle
:::::::::
When Firefox uses a model, it will
1. read metadata stored in Remote Settings
@@ -9,7 +45,7 @@ When Firefox uses a model, it will
1. Remote Settings
------------------
We have two collections in Remote Settings:
@@ -26,7 +62,7 @@ setting a new revision for a model in Remote Settings will trigger a new downloa
2. Model Hub
------------
Our Model Hub follows the same structure as Hugging Face: each file of a model is available under
a unique URL (an example follows the list below):
@@ -38,8 +74,13 @@ Where:
- `revision` is the branch or version
- `path` is the path to the file.
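
For example, assuming the Hugging Face hub URL layout and a placeholder `organizationId/modelId`,
fetching a single file looks like this:

.. code-block:: bash

   # Download the model configuration from the main revision
   # (organizationId/modelId is a placeholder, not a real model)
   curl -L https://huggingface.co/organizationId/modelId/resolve/main/config.json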
When a model is stored in the Mozilla or Hugging Face Model Hub, it typically consists of several
files that define the model, its configuration, tokenizer, and training metadata.
Model files downloaded from the hub are stored in IndexedDB so users don't need to download them again.
Model files
:::::::::::
A model consists of several files, such as its configuration, tokenizer, training metadata, and weights.
Below are the most common files you'll encounter (a quick inspection example follows the list):
@@ -89,9 +130,3 @@ This allows the Hugging Face library to reconstruct the model exactly as it was
- ``merges.txt``: For byte pair encoding (BPE) tokenizers, this file contains the merge operations used to split words into subwords.
- ``preprocessor_config.json``: Contains configuration details for any pre-processing or feature extraction steps applied to the input before passing it to the model.
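
If you want to peek at one of these files locally, here is a quick sketch, assuming a `config.json`
in the current directory:

.. code-block:: bash

   # Print the architecture(s) declared in a model's config.json
   python -c "import json; print(json.load(open('config.json'))['architectures'])"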