utils

Encoding utilities for ONNX-based text encoders

Let’s load some default models that work well out of the box for various tasks.

FastEncode is an ONNX-based embedding model wrapper that works with most ONNX models that ship with a Hugging Face tokenizer. (The Qwen models are a bit tricky due to their padding token handling, so they need a custom wrapper, which we will add later.)
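To see why padding token handling matters, here is a minimal illustrative sketch (not FastEncode’s actual code) of the step where a batch of variable-length token-id sequences is padded to a common length alongside an attention mask — the step that breaks when a tokenizer lacks a usable pad token:

```python
import numpy as np

def pad_batch(seqs, pad_id=0):
    "Pad variable-length token-id lists to a rectangle and build an attention mask."
    maxlen = max(len(s) for s in seqs)
    ids = np.full((len(seqs), maxlen), pad_id, dtype=np.int64)   # pad_id fills the tail
    mask = np.zeros((len(seqs), maxlen), dtype=np.int64)         # 1 = real token, 0 = padding
    for i, s in enumerate(seqs):
        ids[i, :len(s)] = s
        mask[i, :len(s)] = 1
    return ids, mask

ids, mask = pad_batch([[101, 2023, 102], [101, 102]])
```

The `pad_batch` helper and `pad_id=0` default are hypothetical; a real tokenizer supplies its own pad-token id, which is exactly what some Qwen tokenizers do not define cleanly.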


source

download_model


def download_model(
    repo_id:str='onnx-community/embeddinggemma-300m-ONNX', # model repo on HF
    md:str='onnx-community/embeddinggemma-300m-ONNX', # local model dir
    token:NoneType=None, # HF token. you can also set HF_TOKEN env variable
):

Download model from HF hub


source

FastEncode


def FastEncode(
    model_dict:AttrDict={'model': 'onnx-community/embeddinggemma-300m-ONNX', 'onnx_path': 'onnx/model.onnx', 'prompt': {'document': 'Instruct: document \n document: {text}', 'query': 'Instruct: query \n query: {text}'}}, # model dict with model repo, onnx path and prompt templates
    repo_id:NoneType=None, # model repo on HF. needs to have onnx model file
    md:NoneType=None, # local model dir
    md_nm:NoneType=None, # onnx model file name
    normalize:bool=True, # normalize embeddings
    dtype:type=np.float16, # output dtype
    tti:bool=False, # use token type ids
    prompt:NoneType=None, # prompt templates
    hf_token:NoneType=None, # HF token. you can also set HF_TOKEN env variable
):

Fast ONNX-based text encoder
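The `prompt` templates in `model_dict` are plain `str.format` strings with a `{text}` placeholder; presumably `encode_document` and `encode_query` fill the matching template in before tokenizing. A minimal sketch of that step (template values taken from the default `model_dict` above; the `apply_prompt` helper is hypothetical):

```python
# Task-specific instruction templates, as in the default model_dict
prompt = {'document': 'Instruct: document \n document: {text}',
          'query':    'Instruct: query \n query: {text}'}

def apply_prompt(texts, kind, prompt=prompt):
    "Wrap raw texts in the instruction template for the given task kind."
    return [prompt[kind].format(text=t) for t in texts]

docs = apply_prompt(['This is a test'], 'document')
```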

Let’s quickly check that the encoder is working

enc=FastEncode()
2026-01-12 09:03:03.549 Python[28016:124590440] 2026-01-12 09:03:03.547587 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 122 number of nodes in the graph: 1380 number of nodes supported by CoreML: 281
*************** EP Error ***************
EP Error SystemError : 20 when using ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
enc.encode_document(['This is a test', 'Another test'])
array([[ 0.05777 ,  0.001723,  0.002573, ..., -0.0618  , -0.00662 ,
         0.03174 ],
       [ 0.02936 , -0.00818 , -0.00916 , ..., -0.02847 , -0.00226 ,
         0.02846 ]], shape=(2, 768), dtype=float16)
modern_enc=FastEncode(modernbert)
2026-01-12 09:03:15.873 Python[28016:124590440] 2026-01-12 09:03:15.873658 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 180 number of nodes in the graph: 1509 number of nodes supported by CoreML: 952
modern_enc.encode_query(['This is a test', 'Another test'])
Context leak detected, msgtracer returned -1
array([[-0.0503  , -0.04352 , -0.0171  , ..., -0.04977 ,  0.01598 ,
        -0.0706  ],
       [-0.05096 , -0.02129 , -0.0367  , ..., -0.10736 , -0.000948,
        -0.0118  ]], shape=(2, 768), dtype=float16)