Bug 1922481 - Improve the ML docs r=vazish

Depends on D221606

Differential Revision: https://phabricator.services.mozilla.com/D224449
Tarek Ziadé
2024-10-03 16:21:51 +00:00
parent 521c33ff8a
commit 5f3574e22b
4 changed files with 77 additions and 10 deletions


@@ -0,0 +1,31 @@
Architecture
============

The Firefox AI Platform uses the ONNX runtime to run models, and leverages
the Transformers.js library to simplify the inference work.
.. figure:: assets/architecture.png
   :alt: Platform Architecture
   :scale: 95%
   :align: center

   Firefox AI Platform Architecture
(1)(2) When a content process calls the inference API, Firefox calls the Remote Settings
service to get the ONNX WASM runtime if needed, and to get the default options for the
inference task that is going to be executed.

(3) Firefox then creates an inference process, which is a specific type of content process.
That process loads the Transformers.js library and the ONNX WASM runtime, and then triggers
the inference call.

(4) The Transformers.js library asks for model files depending on the
inference task to perform and the options that were passed.
These calls are relayed to Firefox, which looks at what is available in
IndexedDB. If the required files are present, it returns them to the inference
process.

(5) If they are not, Firefox triggers the download process by visiting the
Model Hub, and stores the files in IndexedDB.

[Binary file added: assets/architecture.png, 31 KiB]


@@ -27,6 +27,7 @@ Learn more about the platform:
.. toctree::
   :maxdepth: 1

   architecture
   api
   notifications
   models


@@ -1,6 +1,42 @@
Models management
=================
Prepare a model for Firefox
:::::::::::::::::::::::::::
Models that can be used with Firefox should have ONNX weights at different quantization levels.
To make sure we are compatible with Transformers.js, we use the conversion script
provided by that project, which checks that the model architecture is supported and has
been tested.
To do this, follow these steps (a sketch of the commands is shown after the list):

- make sure your model is published on Hugging Face with PyTorch or SafeTensors weights.
- clone https://github.com/xenova/transformers.js and check out the `v3` branch
- go into `scripts/`
- create a virtualenv there and install the requirements from the local `requirements.txt` file
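
A minimal sketch of that setup, assuming `git` and `python3` are available on your machine:

.. code-block:: bash

   # Clone Transformers.js and switch to the v3 branch
   git clone https://github.com/xenova/transformers.js
   cd transformers.js
   git checkout v3

   # Create a virtualenv in scripts/ and install the conversion requirements
   cd scripts
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt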
Then you can run:

.. code-block:: bash

   python convert.py --model_id organizationId/modelId --quantize --modes fp16 q8 q4 --task the-inference-task
You will get a new directory in `models/organizationId/modelId` that includes an `onnx` directory and
other files. Upload everything to Hugging Face.

Congratulations! You have a Firefox-compatible model. You can now try it in `about:inference`.
Notice that for encoder-decoder models with two files, you may need to rename `decoder_model_quantized.onnx`
to `decoder_model_merged_quantized.onnx`, and make similar changes for the fp16 and q4 versions, as shown
in the sketch below. You do not need to rename the encoder models.
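
A sketch of those renames, assuming the default q8 name produced by the conversion script; the
fp16 and q4 file names below follow the same pattern and are assumptions:

.. code-block:: bash

   cd models/organizationId/modelId/onnx

   # q8 (default quantized) decoder
   mv decoder_model_quantized.onnx decoder_model_merged_quantized.onnx

   # assumed names for the fp16 and q4 variants
   mv decoder_model_fp16.onnx decoder_model_merged_fp16.onnx
   mv decoder_model_q4.onnx decoder_model_merged_q4.onnx
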
Lifecycle
:::::::::
When Firefox uses a model, it will
1. read metadata stored in Remote Settings
@@ -9,7 +45,7 @@ When Firefox uses a model, it will
1. Remote Settings
------------------
We have two collections in Remote Settings:
@@ -26,7 +62,7 @@ setting a new revision for a model in Remote Settings will trigger a new downloa
2. Model Hub
------------
Our Model Hub follows the same structure as Hugging Face: each file of a model is available under
a unique URL (an example follows the list below):
@@ -38,8 +74,13 @@ Where:
- `revision` is the branch or version
- `path` is the path to the file.
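
For example, assuming the Hugging Face hub URL layout and a placeholder `organizationId/modelId`,
fetching a single file looks like this:

.. code-block:: bash

   # Download the model configuration from the main revision
   # (organizationId/modelId is a placeholder, not a real model)
   curl -L https://huggingface.co/organizationId/modelId/resolve/main/config.json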
When a model is stored in the Mozilla or Hugging Face Model Hub, it typically consists of several
files that define the model, its configuration, tokenizer, and training metadata.
Model files downloaded from the hub are stored in IndexedDB so users don't need to download them again.
Model files
:::::::::::
A model consists of several files, such as its configuration, tokenizer, training metadata, and weights.
Below are the most common files you'll encounter (a quick inspection example follows the list):
@@ -89,9 +130,3 @@ This allows the Hugging Face library to reconstruct the model exactly as it was
- ``merges.txt``: For byte pair encoding (BPE) tokenizers, this file contains the merge operations used to split words into subwords.
- ``preprocessor_config.json``: Contains configuration details for any pre-processing or feature extraction steps applied to the input before passing it to the model.
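
If you want to peek at one of these files locally, here is a quick sketch, assuming a `config.json`
in the current directory:

.. code-block:: bash

   # Print the architecture(s) declared in a model's config.json
   python -c "import json; print(json.load(open('config.json'))['architectures'])"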