utils

Encoding utilities for ONNX-based text encoders

Let’s load some default models that work well out of the box for various tasks.

FastEncode is an ONNX-based embedding model wrapper that works with most ONNX models that have a Hugging Face tokenizer. (The Qwen models are a bit tricky due to their padding-token handling, so they need a custom wrapper, which we will add later.)
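FastEncode’s internals aren’t shown on this page, but a typical ONNX embedding pipeline tokenizes the input, runs the ONNX session to get a `last_hidden_state`, mean-pools it with the attention mask, and L2-normalizes the result. Here is a minimal NumPy sketch of just the pooling and normalization stage; the function names are hypothetical and only illustrate the idea:

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq, dim); attention_mask: (batch, seq)
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)   # mask out padding tokens
    counts = mask.sum(axis=1).clip(min=1e-9)          # avoid divide-by-zero
    return summed / counts

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

hidden = np.random.default_rng(0).normal(size=(2, 4, 8)).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = l2_normalize(mean_pool(hidden, mask)).astype(np.float16)
print(emb.shape)  # → (2, 8)
```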


source

download_model

 download_model (repo_id='onnx-community/embeddinggemma-300m-ONNX',
                 md='onnx-community/embeddinggemma-300m-ONNX', token=None)

Download model from HF hub

| | Type | Default | Details |
|---|---|---|---|
| repo_id | str | onnx-community/embeddinggemma-300m-ONNX | model repo on HF |
| md | str | onnx-community/embeddinggemma-300m-ONNX | local model dir |
| token | NoneType | None | HF token; you can also set the HF_TOKEN env variable |

source

FastEncode

 FastEncode (model_dict={'model': 'onnx-community/embeddinggemma-300m-ONNX',
             'onnx_path': 'onnx/model.onnx',
             'prompt': {'document': 'Instruct: document \n document: {text}',
                        'query': 'Instruct: query \n query: {text}'}},
             repo_id=None, md=None, md_nm=None, normalize=True,
             dtype=<class 'numpy.float16'>, tti=False, prompt=None,
             hf_token=None)

Fast ONNX-based text encoder

| | Type | Default | Details |
|---|---|---|---|
| model_dict | AttrDict | {'model': 'onnx-community/embeddinggemma-300m-ONNX', 'onnx_path': 'onnx/model.onnx', 'prompt': {'document': 'Instruct: document \n document: {text}', 'query': 'Instruct: query \n query: {text}'}} | model dict with model repo, ONNX path and prompt templates |
| repo_id | NoneType | None | model repo on HF; needs to have an ONNX model file |
| md | NoneType | None | local model dir |
| md_nm | NoneType | None | ONNX model file name |
| normalize | bool | True | normalize embeddings |
| dtype | type | float16 | output dtype |
| tti | bool | False | use token type ids |
| prompt | NoneType | None | prompt templates |
| hf_token | NoneType | None | HF token; you can also set the HF_TOKEN env variable |
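The `prompt` templates in `model_dict` appear to be plain format strings with a `{text}` placeholder, so wrapping raw texts before encoding presumably amounts to a `str.format` call. A small sketch with a hypothetical `apply_prompt` helper (not part of the library):

```python
# Templates copied from the default model_dict above
prompt = {
    'document': 'Instruct: document \n document: {text}',
    'query': 'Instruct: query \n query: {text}',
}

def apply_prompt(texts, kind, prompt=prompt):
    # Wrap each raw text in the template for the given kind ('document' or 'query')
    return [prompt[kind].format(text=t) for t in texts]

print(apply_prompt(['This is a test'], 'query'))
```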

Let’s quickly check that the encoder is working.

enc=FastEncode()
enc.encode_document(['This is a test', 'Another test'])
array([[ 0.05774 ,  0.001704,  0.002562, ..., -0.06177 , -0.00661 ,
         0.03174 ],
       [ 0.02939 , -0.008194, -0.00918 , ..., -0.02846 , -0.002222,
         0.02847 ]], shape=(2, 768), dtype=float16)
modern_enc=FastEncode(modernbert)
modern_enc.encode_query(['This is a test', 'Another test'])
array([[-0.05026 , -0.04352 , -0.0171  , ..., -0.04974 ,  0.01598 ,
        -0.07056 ],
       [-0.05093 , -0.02133 , -0.0368  , ..., -0.10736 , -0.000944,
        -0.01177 ]], shape=(2, 768), dtype=float16)