Let’s load some default models that work well out of the box for various tasks.
FastEncode is an ONNX-based embedding model wrapper that works with most ONNX models that have a Hugging Face tokenizer. (The Qwen models are a bit tricky due to their padding-token handling, so they need a custom wrapper, which we will add later.)
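Under the hood, a wrapper like this just pairs an onnxruntime session with a Hugging Face tokenizer. The sketch below shows that general pattern rather than FastEncode’s actual internals; the repo id, file path, input/output names, and mean pooling are all assumptions for illustration.

```python
# General ONNX-embedding pattern (illustrative only, not FastEncode's implementation).
# Assumes an exported encoder with input_ids/attention_mask inputs and a
# last_hidden_state-style first output; repo id and file path are placeholders.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

def onnx_embed(texts, onnx_path="model.onnx",
               repo_id="sentence-transformers/all-MiniLM-L6-v2"):
    tok = AutoTokenizer.from_pretrained(repo_id)
    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    batch = tok(texts, padding=True, truncation=True, return_tensors="np")
    feeds = {"input_ids": batch["input_ids"],
             "attention_mask": batch["attention_mask"]}
    hidden = sess.run(None, feeds)[0]                  # (batch, seq, dim)
    mask = batch["attention_mask"][..., None]          # (batch, seq, 1)
    emb = (hidden * mask).sum(1) / mask.sum(1)         # mean-pool over non-padding tokens
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize
```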
model dict with model repo, onnx path and prompt templates
| | Type | Default | Details |
|---|---|---|---|
| repo_id | NoneType | None | model repo on HF. needs to have onnx model file |
| md | NoneType | None | local model dir |
| md_nm | NoneType | None | onnx model file name |
| normalize | bool | True | normalize embeddings |
| dtype | type | float16 | output dtype |
| tti | bool | False | use token type ids |
| prompt | NoneType | None | prompt templates |
| hf_token | NoneType | None | HF token. you can also set HF_TOKEN env variable |
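For example, the wrapper can be pointed at a specific repo and ONNX file. The values below are placeholders, and the snippet assumes FastEncode has been imported as in the rest of this notebook:

```python
import numpy as np

# Placeholder repo and file name; any HF repo that ships an ONNX model file should work.
enc = FastEncode(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",  # model repo on HF
    md_nm="model.onnx",                                # ONNX model file name
    normalize=True,                                    # L2-normalize the embeddings
    dtype=np.float16,                                  # output dtype
)
```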
Let’s quickly check if the encoder is working
enc = FastEncode()
enc.encode_document(['This is a test', 'Another test'])
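`encode_document` should return one vector per input. Assuming the result is array-like of shape (n_texts, dim), a quick cosine-similarity check confirms the embeddings look sensible:

```python
import numpy as np

# Cast to float32 before the dot product to avoid float16 precision surprises.
embs = np.asarray(enc.encode_document(['This is a test', 'Another test']), dtype=np.float32)
# With normalize=True the rows are unit-length, so a dot product is cosine similarity.
print(embs.shape)
print(embs @ embs.T)
```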