Quickstart

This is the shortest verified path once installation is correct. It uses ColBERT-style 128-dimensional vectors, the shape the Mojo exact path is designed for.

One File, Mojo First

import numpy as np
import kayak


def dim128(index: int) -> np.ndarray:
    vector = np.zeros(128, dtype=np.float32)
    vector[index] = 1.0
    return vector


BACKEND = kayak.MOJO_EXACT_CPU_BACKEND

query = kayak.query(np.stack([dim128(0), dim128(1)]))
documents = kayak.documents(
    ["doc-a", "doc-b"],
    [
        np.stack([dim128(0), dim128(1)]),
        np.stack([dim128(0), dim128(0)]),
    ],
)
index = documents.pack()

hits = kayak.search(query, index, k=2, backend=BACKEND)
scores = kayak.maxsim(query, index, backend=BACKEND)

print("hits:", [(hit.doc_id, hit.score) for hit in hits])
print("scores:", scores.numpy().tolist())

kayak.query(...) wraps one token matrix as a LateQuery.
kayak.documents(...) collects aligned ids and token matrices.
.pack() materializes the search-ready LateIndex.
kayak.search(...) and kayak.maxsim(...) run exact scoring on the selected backend.
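To sanity-check the toy example above without Kayak, the following is a minimal NumPy sketch of MaxSim scoring. It assumes the standard ColBERT definition (for each query token, take the best dot product over the document's tokens, then sum); whether Kayak applies any extra normalization is not stated here. The helper name maxsim_reference is hypothetical and not part of Kayak.

```python
import numpy as np


def dim128(index: int) -> np.ndarray:
    """One-hot 128-dimensional vector, as in the quickstart."""
    vector = np.zeros(128, dtype=np.float32)
    vector[index] = 1.0
    return vector


def maxsim_reference(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Hypothetical reference scorer (an assumption, not Kayak's implementation)."""
    # similarity[i, j] = dot product of query token i and document token j
    similarity = query_tokens @ doc_tokens.T
    # Keep the best-matching document token per query token, then sum.
    return float(similarity.max(axis=1).sum())


query_tokens = np.stack([dim128(0), dim128(1)])
docs = {
    "doc-a": np.stack([dim128(0), dim128(1)]),
    "doc-b": np.stack([dim128(0), dim128(0)]),
}

reference_scores = {
    doc_id: maxsim_reference(query_tokens, tokens) for doc_id, tokens in docs.items()
}
# Rank documents by descending score, as a top-k search would.
ranked = sorted(reference_scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('doc-a', 2.0), ('doc-b', 1.0)]
```

Under this definition doc-a matches both query tokens and doc-b matches only the first, so doc-a should rank first.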

If You Start From Plain Text Instead Of Vectors

Use one encoder plus one retriever:

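import kayak

# Placeholder inputs for illustration; replace with your own corpus and query.
doc_ids = ["doc-a", "doc-b"]
texts = ["first example document", "second example document"]
query_text = "example query"

retriever = kayak.open_text_retriever(
    encoder="colbert",
    store="kayak",
    encoder_kwargs={"model_name": "colbert-ir/colbertv2.0"},
    store_kwargs={"path": "./kayak-index"},
)

# Encode and store the documents, then search with a text query.
retriever.upsert_texts(doc_ids, texts)
hits = retriever.search_text(query_text, k=10)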

For text workflows, open_text_retriever(...) prefers the Mojo backend automatically when the active environment can run it.

This path expects text strings. If your source data is PDFs, scans, tables, or other mixed-layout documents, parse or extract them before handing text or token-level vectors to Kayak.
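If extraction has already produced plain-text files, the aligned id and text lists can be built with the standard library alone. This is a minimal sketch under assumptions: one UTF-8 .txt file per document, the file stem used as the document id, and a hypothetical helper name load_text_corpus that is not part of Kayak.

```python
from pathlib import Path
from typing import List, Tuple


def load_text_corpus(directory: str, pattern: str = "*.txt") -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return aligned (doc_ids, texts) from text files."""
    doc_ids: List[str] = []
    texts: List[str] = []
    # Sort so the id order is stable across runs.
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip files whose extraction produced no text
        doc_ids.append(path.stem)
        texts.append(text)
    return doc_ids, texts
```

The two returned lists have the same length and order, which is the shape retriever.upsert_texts(doc_ids, texts) is shown taking above.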

Make The Layout Explicit When You Care About It

If you want the query and index layouts to be part of the code you benchmark or profile, convert them explicitly:

flat_query = query.to_layout("flat_dim128")
hybrid_index = index.to_layout("hybrid_flat_dim128")

scores = kayak.maxsim(
    flat_query,
    hybrid_index,
    backend=BACKEND,
)

Use this form when you want the 128-dimensional flattened layout to be visible in your code and measurements.
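For the measurement side, a small timing helper is often enough. The sketch below uses only the standard library; the name time_call, the warmup count, and the repeat count are illustrative assumptions, and the workload shown is a stand-in rather than a Kayak call.

```python
import statistics
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object], warmup: int = 2, repeats: int = 5) -> Dict[str, float]:
    """Hypothetical helper: time a zero-argument callable in seconds."""
    # Warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"min_s": min(samples), "median_s": statistics.median(samples)}


# Stand-in workload; swap in the call you actually want to measure.
timing = time_call(lambda: sum(range(10_000)))
print(timing)
```

To time the explicit-layout call from this section, one could pass lambda: kayak.maxsim(flat_query, hybrid_index, backend=BACKEND), keeping the to_layout(...) conversions outside the lambda so that only scoring is measured.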

Batch Search On The Same Index

If many queries hit the same index, use the batch API:

batch = kayak.query_batch(
    [
        np.stack([dim128(0), dim128(1)]),
        np.stack([dim128(0), dim128(1), dim128(2)]),
    ]
)

hits_by_query = kayak.search_batch(
    batch,
    index,
    k=2,
    backend=BACKEND,
)

Batch search over a shared index is one of the main reasons to install Mojo correctly in the first place.
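The batch above mixes a two-token query with a three-token query. As a reference for what batched scoring has to handle, here is a NumPy sketch that pads variable-length token matrices and masks the padding. It assumes standard ColBERT MaxSim with dot-product similarity; the helper names are hypothetical, and this is not how Kayak implements batching.

```python
import numpy as np


def pad_token_matrices(matrices):
    """Stack (tokens, dim) matrices into (n, max_tokens, dim) plus a validity mask."""
    max_tokens = max(m.shape[0] for m in matrices)
    dim = matrices[0].shape[1]
    padded = np.zeros((len(matrices), max_tokens, dim), dtype=np.float32)
    mask = np.zeros((len(matrices), max_tokens), dtype=bool)
    for row, m in enumerate(matrices):
        padded[row, : m.shape[0]] = m
        mask[row, : m.shape[0]] = True
    return padded, mask


def batched_maxsim_reference(queries, documents):
    """Hypothetical reference: return a (num_queries, num_documents) score matrix."""
    q, q_mask = pad_token_matrices(queries)
    d, d_mask = pad_token_matrices(documents)
    # sim[b, n, i, j] = dot(query b token i, document n token j)
    sim = np.einsum("bid,njd->bnij", q, d)
    # Padded document tokens must never win the max.
    sim = np.where(d_mask[None, :, None, :], sim, -np.inf)
    best = sim.max(axis=3)
    # Padded query tokens contribute nothing to the sum.
    best = np.where(q_mask[:, None, :], best, 0.0)
    return best.sum(axis=2)


basis = np.eye(128, dtype=np.float32)  # row i is the one-hot vector for dimension i
queries = [basis[[0, 1]], basis[[0, 1, 2]]]
documents = [basis[[0, 1]], basis[[0, 0]]]  # doc-a, doc-b

batch_scores = batched_maxsim_reference(queries, documents)
top_k = np.argsort(-batch_scores, axis=1)[:, :2]  # document positions, best first
print(batch_scores.tolist())  # [[2.0, 1.0], [2.0, 1.0]]
```

Both queries score doc-a at 2.0 and doc-b at 1.0 under this definition, because the extra third token in the second query matches neither document.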

Open The Next Page Based On What You Need

If you want to...	Open...
understand the search-layer boundary	Where Kayak Fits
choose the long-term API shape	Usage Patterns
pass a Hugging Face ColBERT checkpoint or your own model	Text Encoders
open a full executed walkthrough	real-usage-with-mojo.ipynb
understand the backend and layout surface	Mojo Backend
keep an existing database for storage	Storage + Search