# Usage Patterns
This page is the shortest answer to:
- what object to build
- which function to call
- when to use each API
For complete runnable examples, download the executed notebooks:
- real-usage-with-mojo.ipynb
- batch-search-on-one-loaded-lancedb-slice.ipynb
- lancedb-to-kayak-reranking.ipynb
If you already have a vector database and want Kayak to own retrieval, see:
If your data already lives in a hosted-engine snapshot and you want one local Python reuse or scheduling surface around that snapshot, see:
If you are choosing an encoder first, read:
## Choose The API Shape
Use one retriever object:
- `kayak.open_text_retriever(...)`
- `retriever.upsert_texts(...)`
- `retriever.search_text(...)`
- `retriever.search_text_batch(...)` for repeated query traffic
Choose this when your application inputs are raw text and you do not want to manage encoder and store objects separately.
Use the low-level exact search surface:
- `kayak.query(...)`
- `kayak.documents(...).pack()`
- `kayak.search(...)`
- `kayak.search_batch(...)` when the same index serves many queries
Choose this when your application already owns token-level query and document vectors.
Keep the database for persistence and let Kayak materialize the searchable slice:
- `kayak.open_store(...)`
- `store.load_index(...)`
- `kayak.search(...)` or `kayak.search_batch(...)`
Choose this when LanceDB, PgVector, Qdrant, Weaviate, or Chroma is already part of your system.
Use an explicit search plan:
- `kayak.exact_full_scan_search_plan(...)`
- `kayak.document_proxy_search_plan(...)`
- `kayak.search_with_plan(...)`
Choose this when you need stage accounting, candidate windows, or exact reranking after an approximate first stage.
## Text Ingest Plus Search
Use this when your application starts from text and you want one SDK object for ingest plus retrieval.
```python
import kayak

retriever = kayak.open_text_retriever(
    encoder="colbert",
    store="kayak",
    encoder_kwargs={"model_name": "colbert-ir/colbertv2.0"},
    store_kwargs={"path": "./kayak-index"},
)

retriever.upsert_texts(doc_ids, texts, metadata=metadata_rows)
hits = retriever.search_text(query_text, k=5, where={"tenant": "acme"})
```
Use:
- `kayak.open_text_retriever(...)` for one high-level text workflow object
- `retriever.upsert_texts(...)` when the source corpus starts as raw text
- `retriever.search_text(...)` when you want the retriever to encode the query and load the store slice
- `retriever.load_index(...)` when you want to reuse one materialized slice across many queries
- omit `backend=...` if you want this high-level workflow to prefer Mojo automatically when available
Choose the encoder this way:
"colbert"when you have a ColBERT checkpoint on Hugging Face"callable"when you already have your own model methods
## Exact Search For One Query
Use this when you already have one query embedding and one packed index.
```python
import kayak

BACKEND = kayak.MOJO_EXACT_CPU_BACKEND

query = kayak.query(query_vectors, text=query_text)
index = kayak.documents(
    doc_ids,
    document_vectors,
    texts=document_texts,
).pack()

hits = kayak.search(query, index, k=5, backend=BACKEND)
```
Use:
- `kayak.query(...)` for one query
- `kayak.documents(...).pack()` for one reusable index
- `kayak.search(...)` when you want top-k hits only
## Exact Scores For All Documents
Use this when you need the full score vector, not only top-k hits.
Use:
- `kayak.maxsim(...)` for one query against all indexed documents
- `LateScores.topk(k)` when you want hits after inspecting scores
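To make the full score vector concrete: late-interaction (MaxSim) scoring sums, over query tokens, the best similarity any document token achieves. Below is a minimal NumPy sketch of that math — an illustration of the semantics only, not Kayak's implementation; all names here are hypothetical.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token, take the similarity of
    its best-matching document token, then sum over query tokens."""
    sims = query_vecs @ doc_vecs.T  # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

# Toy 4-dim token embeddings: one query, two documents.
query = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
docs = [
    np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]]),  # matches both query tokens
    np.array([[0.0, 0.0, 1.0, 0.0]]),  # matches neither
]

# Full score vector over all documents, then a hit after inspecting it.
scores = np.array([maxsim(query, d) for d in docs])  # -> [2.0, 0.0]
best = int(scores.argmax())                          # -> 0
```

This is the shape of the full-score workflow: compute everything, inspect, then narrow to top-k.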
## Batch Search Against The Same Index
Use this when the index stays fixed and you want to run multiple queries.
```python
batch = kayak.query_batch([query_a_vectors, query_b_vectors, query_c_vectors])

hits_by_query = kayak.search_batch(
    batch,
    index,
    k=5,
    backend=BACKEND,
)
```
Use:
- `kayak.query_batch(...)` to keep ragged query vector counts explicit
- `kayak.search_batch(...)` for top-k hits per query
- `kayak.maxsim_batch(...)` if you need full score vectors per query
If you are already using the high-level retriever and your queries still start as text, use:
```python
hits_by_query = retriever.search_text_batch(
    [query_a_text, query_b_text, query_c_text],
    k=5,
    where={"tenant": "acme"},
)
```
- `retriever.search_text_batch(...)` when you want one store load plus one batch search
- `retriever.search_query_batch(...)` when you already built a `LateQueryBatch`
This is the right public SDK shape when many queries target the same already loaded slice. It is different from generic concurrent use of one store object.
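Semantically, a batch search scores each query against the same fixed index and returns top-k per query, and the batch is ragged because queries may have different token counts. A NumPy sketch of that shape (illustration only, not Kayak's implementation):

```python
import numpy as np

def search_batch(query_batch, index, k):
    """Top-k per query against one fixed index. Each query and each
    document is a (n_tokens, dim) array; token counts may differ."""
    hits = []
    for q in query_batch:
        # MaxSim score of this query against every indexed document.
        scores = np.array([(q @ d.T).max(axis=1).sum() for d in index])
        order = np.argsort(-scores)[:k]
        hits.append([(int(i), float(scores[i])) for i in order])
    return hits

# Ragged batch: two queries with different token counts, one shared index.
index = [np.eye(3), np.eye(3)[:1]]      # doc 0: 3 tokens, doc 1: 1 token
batch = [np.eye(3)[:2], np.eye(3)[2:]]  # query 0: 2 tokens, query 1: 1 token
hits_by_query = search_batch(batch, index, k=1)
```

The point of the batch shape is that the index is loaded and scored once per batch, not once per caller.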
## Exact Baseline With Stage Profiles
Use this when you want exact retrieval plus explicit stage accounting.
```python
plan = kayak.exact_full_scan_search_plan(final_k=5)

result = kayak.search_with_plan(
    query,
    index,
    plan,
    backend=BACKEND,
)
```
Use:
- `kayak.exact_full_scan_search_plan(...)` as the exact baseline
- `kayak.search_with_plan(...)` when you want profiles, candidate-stage output, and stage metadata
## Approximate Candidate Stage Plus Exact Rerank
Use this when you want a smaller candidate window before exact late interaction.
```python
plan = kayak.document_proxy_search_plan(
    final_k=5,
    candidate_k=100,
    query_vector_budget=32,
    document_vector_budget=64,
)

result = kayak.search_with_plan(
    query,
    index,
    plan,
    backend=BACKEND,
)
```
Use:
- `candidate_k` to control candidate-window size
- `query_vector_budget` and `document_vector_budget` to control proxy-stage cost
- `result.candidate_stage.profile` and `result.stage2` to inspect the actual work done
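The budget parameters are easiest to see in a toy two-stage loop: score a cheap proxy on reduced token sets, keep `candidate_k` documents, then rerank only those with exact MaxSim. The sketch below uses simple truncation as the proxy; it illustrates the plan's shape, not Kayak's actual proxy stage.

```python
import numpy as np

def two_stage_search(query, index, final_k, candidate_k,
                     query_vector_budget, document_vector_budget):
    # Stage 1: cheap proxy scores on truncated token sets
    # (truncation stands in for whatever proxy the engine uses).
    q_proxy = query[:query_vector_budget]
    proxy = np.array(
        [(q_proxy @ d[:document_vector_budget].T).max(axis=1).sum()
         for d in index]
    )
    candidates = np.argsort(-proxy)[:candidate_k]
    # Stage 2: exact MaxSim on the candidate window only.
    exact = np.array([(query @ index[i].T).max(axis=1).sum()
                      for i in candidates])
    order = np.argsort(-exact)[:final_k]
    return [(int(candidates[i]), float(exact[i])) for i in order]

# Toy data: the exact stage only ever touches candidate_k documents.
query = np.eye(3)[:2]                                 # 2 query tokens
index = [np.eye(3)[:2], np.eye(3)[:1], np.eye(3)[2:]] # 3 small documents
hits = two_stage_search(query, index, final_k=2, candidate_k=2,
                        query_vector_budget=1, document_vector_budget=2)
```

Smaller budgets make stage 1 cheaper but riskier; `candidate_k` bounds how much exact work stage 2 can do.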
## Vector DB For Storage, Kayak For Search
Use this when your database is the durable store and Kayak is the search engine.
```python
rows = vector_db.fetch_all()

index = kayak.documents(
    [row["doc_id"] for row in rows],
    [row["vector"] for row in rows],
    texts=[row["text"] for row in rows],
).pack()

hits = kayak.search(query, index, k=10, backend=BACKEND)
```
Use:
- the vector database for durable storage
- `kayak.open_store(...)` when you want Kayak to materialize that database into one searchable slice directly
- `kayak.documents(...).pack()` as the reusable in-process search index
- `kayak.search(...)` or `kayak.search_with_plan(...)` for query-time retrieval
For the benchmark-backed version of this pattern, see:
## Many Same-Process Callers Against One Fixed Snapshot
Use this when your data already lives in a hosted-engine service root and many callers in one Python process need the same pinned snapshot.
```python
from kayak_engine import (
    PreparedExactSearchRuntimeConfig,
    prepare_exact_search_runtime,
)

runtime = prepare_exact_search_runtime(
    service_root="./.state/kayak-engine",
    collection_id="news",
    tenant_id="tenant-a",
    namespace_id="search",
    snapshot_id="snapshot-0001",
    config=PreparedExactSearchRuntimeConfig(
        concurrency_lane_count=2,
        worker_count=4,
        max_batch_size=32,
        max_batch_wait_ms=2,
    ),
)
```
Use:
- `kayak.search_batch(...)` when you already loaded one local `LateIndex`
- `kayak_engine.prepare_exact_search_runtime(...)` when many callers share one fixed hosted snapshot
- `kayak_engine.prepare_exact_search_scheduler(...)` only as a compatibility alias
- `concurrency_lane_count` when you need more than one independent runtime lane
- Hosted Engine Python for the full runtime contract
## Clause-Text Verification
Use this only when you want the text-family verifier.
You must attach:
- `text=` to the query
- `texts=` to the documents or index
```python
query = kayak.query(query_vectors, text=query_text)
index = kayak.documents(
    doc_ids,
    document_vectors,
    texts=document_texts,
).pack()

plan = kayak.exact_full_scan_search_plan(
    final_k=5,
    candidate_k=20,
    stage3_verifier=kayak.clause_text_stage3_verifier_operator(),
)

result = kayak.search_with_plan(
    query,
    index,
    plan,
    backend=BACKEND,
)
```
## Explicit 128-Dim Layouts
Use this when your embeddings are 128-dimensional and you want the layout choice to be explicit in code.
```python
flat_query = query.to_layout("flat_dim128")
hybrid_index = index.to_layout("hybrid_flat_dim128")

scores = kayak.maxsim(
    flat_query,
    hybrid_index,
    backend=BACKEND,
)
```
## Backend Selection
If your application should normally use Mojo, define it once and reuse it.
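The examples on this page already follow that advice by defining one module-level constant and passing it to every search call; a minimal sketch of the pattern, using the same constant those examples use:

```python
import kayak

# Define the backend once at module level, then pass backend=BACKEND
# to every kayak.search(...) / kayak.maxsim(...) call.
BACKEND = kayak.MOJO_EXACT_CPU_BACKEND
```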
If you want to inspect what the runtime can actually use:
## Minimal API Map

| Task | API |
|---|---|
| Text ingest plus text search | `kayak.open_text_retriever(...)` |
| One query, top-k hits | `kayak.search(...)` |
| One query, all exact scores | `kayak.maxsim(...)` |
| Many queries, top-k hits | `kayak.search_batch(...)` |
| Many queries, all exact scores | `kayak.maxsim_batch(...)` |
| Exact baseline with stage metadata | `kayak.exact_full_scan_search_plan(...)` + `kayak.search_with_plan(...)` |
| Approximate candidates + exact rerank | `kayak.document_proxy_search_plan(...)` + `kayak.search_with_plan(...)` |
| Text-family verification | `kayak.clause_text_stage3_verifier_operator()` |
| 128-dim explicit layouts | `query.to_layout(...)`, `index.to_layout(...)` |
## Notebook
The notebook uses a real text corpus, encodes it to ColBERT-style 128-dimensional vectors on CPU, builds a Kayak index, and runs:
- exact search
- flat/hybrid layout scoring
- approximate candidate generation plus exact reranking
- batch search
- clause-text verification
Notebook:
For vector-database handoff patterns, see: