Let’s load some default models that work well out of the box for various tasks.
FastEncode is an ONNX-based embedding model wrapper that works with most ONNX models that use a Hugging Face tokenizer. (The Qwen models are a bit tricky due to their padding-token handling, so they need a custom wrapper, which we will add later.)
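FastEncode applies a per-task prompt template (one for documents, one for queries) to each text before tokenizing. A minimal sketch of that step, using the default templates from the `FastEncode` signature below; the helper name `apply_prompt` is mine, not part of the library:

```python
# Default prompt templates from the FastEncode model dict.
prompt = {
    'document': 'Instruct: document \n document: {text}',
    'query': 'Instruct: query \n query: {text}',
}

def apply_prompt(texts, kind, prompt=prompt):
    "Format each raw text with the template for `kind` ('document' or 'query')."
    return [prompt[kind].format(text=t) for t in texts]

docs = apply_prompt(['This is a test'], 'document')
```

The formatted strings, not the raw texts, are what get tokenized and fed to the ONNX model.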
def download_model(
    repo_id:str='onnx-community/embeddinggemma-300m-ONNX', # model repo on HF
    md:str='onnx-community/embeddinggemma-300m-ONNX', # local model dir
    token:NoneType=None, # HF token. you can also set HF_TOKEN env variable
):
def FastEncode(
    model_dict:AttrDict={'model': 'onnx-community/embeddinggemma-300m-ONNX', 'onnx_path': 'onnx/model.onnx', 'prompt': {'document': 'Instruct: document \n document: {text}', 'query': 'Instruct: query \n query: {text}'}}, # model dict with model repo, onnx path and prompt templates
    repo_id:NoneType=None, # model repo on HF. needs to have onnx model file
    md:NoneType=None, # local model dir
    md_nm:NoneType=None, # onnx model file name
    normalize:bool=True, # normalize embeddings
    dtype:type=numpy.float16, # output dtype
    tti:bool=False, # use token type ids
    prompt:NoneType=None, # prompt templates
    hf_token:NoneType=None, # HF token. you can also set HF_TOKEN env variable
):
Fast ONNX-based text encoder
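After the ONNX session produces per-token hidden states, an encoder like this typically mean-pools them over the attention mask and, when `normalize=True`, L2-normalizes before casting to the output `dtype`. A numpy sketch of that post-processing step (illustrative only; the actual FastEncode internals may differ, and `mean_pool` is a hypothetical helper):

```python
import numpy as np

def mean_pool(hidden, mask, normalize=True, dtype=np.float16):
    """Mask-aware mean pooling over token embeddings.
    hidden: (batch, seq, dim) last hidden states from the ONNX model
    mask:   (batch, seq) attention mask from the tokenizer"""
    m = mask[..., None].astype(hidden.dtype)       # (batch, seq, 1)
    summed = (hidden * m).sum(axis=1)              # sum over real tokens only
    counts = np.clip(m.sum(axis=1), 1e-9, None)    # avoid divide-by-zero
    emb = summed / counts
    if normalize: emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb.astype(dtype)

hidden = np.random.rand(2, 4, 8).astype(np.float32)   # stand-in model output
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])         # stand-in attention mask
emb = mean_pool(hidden, mask)
```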
Let’s quickly check that the encoder is working.
enc=FastEncode()
2026-01-12 09:03:03.549 Python[28016:124590440] 2026-01-12 09:03:03.547587 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 122 number of nodes in the graph: 1380 number of nodes supported by CoreML: 281
*************** EP Error ***************
EP Error SystemError : 20 when using ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
****************************************
enc.encode_document(['This is a test', 'Another test'])
2026-01-12 09:03:15.873 Python[28016:124590440] 2026-01-12 09:03:15.873658 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 180 number of nodes in the graph: 1509 number of nodes supported by CoreML: 952
enc.encode_query(['This is a test', 'Another test'])
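Since the embeddings are L2-normalized by default, query-to-document relevance reduces to a dot product (equal to cosine similarity). A small sketch with random stand-in vectors in place of the `enc.encode_query(...)` / `enc.encode_document(...)` outputs:

```python
import numpy as np

# Stand-ins for one query and one document embedding of the same dimension;
# with real FastEncode output these would already be unit-normalized.
rng = np.random.default_rng(0)
q, d = rng.standard_normal((2, 768)).astype(np.float32)
q, d = q / np.linalg.norm(q), d / np.linalg.norm(d)

# With unit vectors, cosine similarity is a plain dot product.
score = float(q @ d)
```

In practice you would encode a batch of documents once and rank them against each query by this score.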