Bug 1922481 - Improve the ML docs r=vazish
Depends on D221606

Differential Revision: https://phabricator.services.mozilla.com/D224449
toolkit/components/ml/docs/architecture.rst (new file, 31 lines)
@@ -0,0 +1,31 @@
Architecture
============

The Firefox AI Platform uses the ONNX runtime to run models, and leverages
the Transformers.js library to simplify the inference work.

.. figure:: assets/architecture.png
   :alt: Platform Architecture
   :scale: 95%
   :align: center

   Firefox AI Platform Architecture

(1)(2) When a content process calls the inference API, Firefox calls the Remote Settings
service to get the ONNX WASM runtime if needed, and to get the default options for the
inference task that is going to be executed.

(3) Firefox then creates an inference process, which is a specific type of content process.
That process loads Transformers.js and the ONNX WASM runtime, and then triggers the inference call.

(4) The Transformers.js library will ask for model files depending on the
inference task to perform and the different options that were passed.
These calls are relayed to Firefox, which looks at what is available in
IndexedDB. If the required files are present, it returns them to the inference
process.

(5) If they are not, Firefox triggers the download process by visiting the
Model Hub, and stores them in IndexedDB.
toolkit/components/ml/docs/assets/architecture.png (new binary file, 31 KiB)
@@ -27,6 +27,7 @@ Learn more about the platform:
.. toctree::
   :maxdepth: 1

   architecture
   api
   notifications
   models
@@ -1,6 +1,42 @@
Models management
=================

Prepare a model for Firefox
:::::::::::::::::::::::::::

Models that can be used with Firefox should have ONNX weights at different quantization levels.

In order to make sure we are compatible with Transformers.js, we use the conversion script
provided by that project, which checks that the model architecture will work and has
been tested.

To do this, follow these steps (a sketch of the setup is shown after the list):

- make sure your model is published on Hugging Face with PyTorch or SafeTensors weights.
- clone https://github.com/xenova/transformers.js and check out branch `v3`
- go into `scripts/`
- create a virtualenv there and install the requirements from the local `requirements.txt` file
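
A minimal sketch of those setup steps, assuming a Unix-like shell with `git` and `python3` available (the branch and paths come from the list above):

.. code-block:: bash

   # Clone Transformers.js and switch to the v3 branch.
   git clone https://github.com/xenova/transformers.js
   cd transformers.js
   git checkout v3

   # The conversion script and its requirements live in scripts/.
   cd scripts

   # Create and activate a virtualenv, then install the requirements.
   python3 -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt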

Then you can run:

.. code-block:: bash

   python convert.py --model_id organizationId/modelId --quantize --modes fp16 q8 q4 --task the-inference-task

You will get a new directory in `models/organizationId/modelId` that includes an `onnx` directory and
other files. Upload everything to Hugging Face.
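
The resulting layout might look roughly like this (illustrative only; the exact file set depends on the model architecture and the `--modes` you requested):

.. code-block:: bash

   ls models/organizationId/modelId
   # config.json  tokenizer.json  tokenizer_config.json  onnx/

   ls models/organizationId/modelId/onnx
   # model_fp16.onnx  model_q4.onnx  model_quantized.onnx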

Congratulations! You have a Firefox-compatible model. You can now try it in `about:inference`.

Notice that for encoder-decoder models with two files, you may need to rename `decoder_model_quantized.onnx`
to `decoder_model_merged_quantized.onnx`, and make similar changes for the fp16 and q4 versions (see the sketch below).
You do not need to rename the encoder models.
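
A sketch of that rename for all three quantization levels (the exact file names are an assumption; check what the conversion actually produced in the `onnx/` directory):

.. code-block:: bash

   cd models/organizationId/modelId/onnx

   # Transformers.js expects the "merged" decoder names for two-file models.
   mv decoder_model_quantized.onnx decoder_model_merged_quantized.onnx
   mv decoder_model_fp16.onnx decoder_model_merged_fp16.onnx
   mv decoder_model_q4.onnx decoder_model_merged_q4.onnx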

Lifecycle
:::::::::

When Firefox uses a model, it will

1. read metadata stored in Remote Settings
@@ -9,7 +45,7 @@ When Firefox uses a model, it will

1. Remote Settings
------------------

We have two collections in Remote Settings:
@@ -26,7 +62,7 @@ setting a new revision for a model in Remote Settings will trigger a new download

2. Model Hub
------------

Our Model Hub follows the same structure as Hugging Face: each file for a model is under
a unique URL:
@@ -38,8 +74,13 @@ Where:

- `revision` is the branch or version
- `path` is the path to the file.
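
As an illustration, assuming the hub follows Hugging Face's `resolve` URL convention (the host, ids, and file name below are placeholders, not the Mozilla hub's actual values):

.. code-block:: bash

   # Fetch the tokenizer configuration for revision "main" of a model.
   curl -L "https://huggingface.co/organizationId/modelId/resolve/main/tokenizer_config.json" \
        -o tokenizer_config.json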

When a model is stored in the Mozilla or Hugging Face Model Hub, it typically consists of several
files that define the model, its configuration, tokenizer, and training metadata.

Model files downloaded from the hub are stored in IndexedDB so users don't need to download them again.

Model files
:::::::::::

A model consists of several files, such as its configuration, tokenizer, training metadata, and weights.

Below are the most common files you’ll encounter:
@@ -89,9 +130,3 @@ This allows the Hugging Face library to reconstruct the model exactly as it was

- ``merges.txt``: For byte pair encoding (BPE) tokenizers, this file contains the merge operations used to split words into subwords.
- ``preprocessor_config.json``: Contains configuration details for any pre-processing or feature extraction steps applied to the input before passing it to the model.

3. IndexedDB
::::::::::::

Model files are stored in IndexedDB so users don't need to download them again.