Using The Mojo Backend
This page is about one specific goal:
"I want my Python Kayak setup to use the Mojo exact CPU backend as my normal path, not accidentally fall back to NumPy because I only installed the package."
What The Mojo Backend Actually Gives You
kayak.MOJO_EXACT_CPU_BACKEND is the exact CPU execution path for:
kayak.maxsim(...)kayak.maxsim_batch(...)kayak.search(...)kayak.search_batch(...)kayak.search_with_plan(...)
Use it for:
- exact local scoring
- exact local top-k search
- exact reranking in stage-aware search plans
- repeated and batched queries against the same index
It does not change the retrieval semantics. It changes the executor.
That means:
- the scores should match the NumPy reference path on supported inputs
- the stage structure stays the same
- the backend choice is still explicit at the call site
Make Mojo Available
The requirement is not Pixi itself. The requirement is that Kayak can invoke a
usable mojo command.
Two verified install stories are:
- put Python 3.11, Mojo, and Kayak in the same Pixi environment
- or install
kayakwith UV or pip in an environment wheremojois already available
In both cases:
- Verify
kayak.available_backends()includesmojo_exact_cpu. - Define a backend constant in your application code.
If that tuple does not include mojo_exact_cpu, stop and inspect
kayak.backend_info(kayak.MOJO_EXACT_CPU_BACKEND) before assuming anything.
UV works too if Mojo is already installed first and is discoverable:
mojo --version
uv add kayak
uv run python - <<'PY'
import kayak
print(kayak.available_backends())
print(kayak.backend_info(kayak.MOJO_EXACT_CPU_BACKEND).availability_reason)
PY
That is equally valid as long as the mojo CLI is actually visible to Kayak.
How To Make Your Own Code Default To Mojo
Low-level Kayak calls do not auto-select Mojo. If you want "default to Mojo" behavior in your own codebase, define that default yourself:
Then thread it through the operations you care about:
scores = kayak.maxsim(query, index, backend=BACKEND)
hits = kayak.search(query, index, k=10, backend=BACKEND)
hits_by_query = kayak.search_batch(batch, index, k=10, backend=BACKEND)
result = kayak.search_with_plan(query, index, plan, backend=BACKEND)
That is better than relying on hidden environment behavior because:
- it is explicit in code review
- tests can override it easily
- benchmarks stay honest about which executor ran
If you want the highest-level text workflow to do that selection for you,
open_text_retriever(...) already prefers Mojo automatically when it is
available.
Use 128-Dim Embeddings For The Mojo Path
The validated Mojo workflow is ColBERT-style 128-dimensional embeddings.
That is the right shape for:
query.to_layout("flat_dim128")index.to_layout("hybrid_flat_dim128")- the prepared packed-index fast path
If you are sketching with tiny 2D toy vectors, keep those examples on the NumPy backend. Switch to 128-dimensional vectors before you judge the Mojo path.
Fastest Normal Usage Pattern
For repeated exact search against one index, use this pattern:
import numpy as np
import kayak
def dim128(index: int) -> np.ndarray:
vector = np.zeros(128, dtype=np.float32)
vector[index] = 1.0
return vector
BACKEND = kayak.MOJO_EXACT_CPU_BACKEND
query = kayak.query(np.stack([dim128(0), dim128(1)]))
index = kayak.documents(
["doc-a", "doc-b"],
[
np.stack([dim128(0), dim128(1)]),
np.stack([dim128(0), dim128(0)]),
],
).pack()
hits = kayak.search(query, index, k=2, backend=BACKEND)
Reasons this is the right baseline:
documents.pack()gives you a reusableLateIndex- reusing the same
LateIndexinstance lets Kayak reuse prepared index state search(...)gives direct exact top-k output instead of forcing a full score vector first
When To Use Explicit Flat Layouts
If you want the 128-dimensional layout transformation to be visible in the code, convert it explicitly:
flat_query = query.to_layout("flat_dim128")
hybrid_index = index.to_layout("hybrid_flat_dim128")
scores = kayak.maxsim(
flat_query,
hybrid_index,
backend=BACKEND,
)
Choose this form when:
- you want profiling boundaries to line up with the flat layout
- you want layout choice to be part of the benchmark setup
- you are reasoning about storage and conversion cost explicitly
Use Batch APIs For Repeated Query Traffic
If the index is fixed and queries are the thing changing, use the batch API.
batch = kayak.query_batch(
[
np.stack([dim128(0), dim128(1)]),
np.stack([dim128(0), dim128(1), dim128(2)]),
]
)
scores_by_query = kayak.maxsim_batch(
batch,
index,
backend=BACKEND,
)
hits_by_query = kayak.search_batch(
batch,
index,
k=10,
backend=BACKEND,
)
This is a better fit than looping one query at a time in Python when you are running repeated local evaluation or benchmark workloads.
Use Mojo Inside Search Plans
Search plans are explicit stage pipelines. The backend only affects the stages that perform exact late-interaction scoring.
Exact full scan
plan = kayak.exact_full_scan_search_plan(final_k=10)
result = kayak.search_with_plan(query, index, plan, backend=BACKEND)
Here stage 1 is already exact, so stage 2 defaults to noop_topk.
Approximate candidate generation plus exact Mojo rerank
plan = kayak.document_proxy_search_plan(
final_k=10,
candidate_k=100,
query_vector_budget=32,
document_vector_budget=64,
)
result = kayak.search_with_plan(query, index, plan, backend=BACKEND)
In this plan:
- stage 1 uses the document-proxy candidate generator
- stage 2 uses exact late interaction on the candidate window
- the exact rerank stage is where the backend matters most
That is the standard pattern when you want a fast approximate candidate stage followed by exact scoring on a smaller set.
What To Do With It
Use the Mojo backend when you need to answer one of these questions with exact local execution:
- "What are the exact top-k results for this query and index?"
- "Did my approximate candidate generator recover the exact winners?"
- "How much exact stage-2 work did this candidate window create?"
- "What is steady-state exact search throughput on a fixed local index?"
The point is not just "make search faster." The point is "keep the exact path programmable and measurable from Python."
Troubleshooting
available_backends() only shows numpy_reference
Check:
info = kayak.backend_info(kayak.MOJO_EXACT_CPU_BACKEND)
print(info.available)
print(info.availability_reason)
Most often this means one of:
- Mojo is not installed in the active environment
mojois not onPATH- the wrong
mojobinary is being picked up
If you need to pin the binary, set KAYAK_MOJO_CLI.
KAYAK_MOJO_CLI can be either a binary path or a full command prefix such as:
Mojo reports a bundled backend/compiler mismatch
That means the installed kayak wheel was built with a different Mojo release
than the one you are invoking locally. Install matching kayak and mojo
versions, then rerun the same code.
The first Mojo call is slow
That is expected. The first call may build backend artifacts and prepare index state. Measure steady-state after warmup, not only the first invocation.
You want Mojo to be implicit everywhere
The SDK does not do that. Make it explicit in your own code with a shared
BACKEND = kayak.MOJO_EXACT_CPU_BACKEND constant or a small wrapper layer.