# Storage + Search
If you already use a vector database, the default Kayak pattern is: leave persistence where it is, materialize one exact searchable slice into Kayak, and run search on that slice directly.
## Recommended Operating Model

| Situation | Default choice | Why |
|---|---|---|
| the searchable slice fits locally | full Kayak exact retrieval | simplest path and easiest thing to measure |
| the database is already your durable system of record | `open_store(...)` plus `load_index(...)` | keeps persistence where it is and gives Kayak a reusable exact slice |
| many queries hit the same fixed slice | `load_index(...)` once, then `search_batch(...)` | avoids repeated slice materialization |
| the full slice is too large or must be routed first | database candidate stage plus exact Kayak search | use the DB only when it materially reduces the working set |
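The candidate-stage row can be sketched independently of any particular database or of Kayak's API. A cheap coarse stage shrinks the working set, and exact scoring runs only on the surviving slice. The corpus shape, the low-dimensional coarse projection, and the candidate count below are all illustrative stand-ins, assuming NumPy only:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 64)).astype(np.float32)  # stand-in for DB rows
query = rng.standard_normal(64).astype(np.float32)

# Stage 1: cheap candidate stage (dot products on the first 16 dims here,
# standing in for whatever ANN index or filter the database provides).
coarse = corpus[:, :16] @ query[:16]
candidates = np.argpartition(coarse, -500)[-500:]  # working set shrinks 20x

# Stage 2: exact scoring over the materialized candidate slice only.
exact = corpus[candidates] @ query
top10 = candidates[np.argsort(exact)[::-1][:10]]
```

The point of the pattern is the second stage: exact search touches 500 rows instead of 10,000, at the cost of whatever recall the coarse stage loses.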
## Local Evidence Snapshot

These are locally measured examples from the executed notebooks: useful deployment evidence, not universal benchmark claims.

| Scenario | Measured result | Interpretation |
|---|---|---|
| BrowseComp-Plus gold slice from LanceDB | same NDCG@10; Kayak exact search 21.15x faster after a one-time 0.209 s load | when the slice fits, loading once and searching in Kayak can preserve quality while reducing search time |
| BrowseComp-Plus evidence slice from LanceDB | same NDCG@10; Kayak exact search 12.52x faster after a one-time 0.215 s load | the same storage-first, search-in-Kayak pattern held on a second slice |
| repeated-query LanceDB slice example | explicit loaded-slice `search_batch(...)` was 154.42x faster than looping `retriever.search_text(...)` | once the slice is loaded, batch search is the right public fast path |
| repeated-query example through the high-level retriever | `retriever.search_text_batch(...)` was 1.03x vs a per-query loop | retriever batching is mainly ergonomic; the main gain comes from reusing the explicit slice |
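The batch-versus-loop rows can be reproduced in spirit with plain NumPy: score one query at a time, then score all queries in a single matmul against the same loaded slice. This is a generic harness, not the notebook code, and the ratio it prints on any given machine will differ from the table's numbers:

```python
import time

import numpy as np

rng = np.random.default_rng(1)
index_vectors = rng.standard_normal((50_000, 128)).astype(np.float32)  # loaded slice
queries = rng.standard_normal((64, 128)).astype(np.float32)

def top_k(scores, k=10):
    return np.argsort(scores)[::-1][:k]

# Per-query loop: one matvec per query.
t0 = time.perf_counter()
loop_hits = [top_k(index_vectors @ q) for q in queries]
loop_s = time.perf_counter() - t0

# Batched: one matmul for all queries against the same slice.
t0 = time.perf_counter()
batch_scores = index_vectors @ queries.T  # shape (50_000, 64)
batch_hits = [top_k(batch_scores[:, i]) for i in range(len(queries))]
batch_s = time.perf_counter() - t0

assert all(np.array_equal(a, b) for a, b in zip(loop_hits, batch_hits))
print(f"loop {loop_s:.3f}s vs batch {batch_s:.3f}s")
```

The interpretation carries over: the results are identical either way, so batching is purely a throughput choice once the slice is resident.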
## Default Recommendation
Use full Kayak retrieval when the searchable slice:
- fits on the target host
- can be refreshed on your normal update cadence
- does not need a database-side candidate stage before search
That is usually the cleanest production shape:
```python
import kayak

retriever = kayak.open_text_retriever(
    encoder="colbert",
    store="kayak",
    encoder_kwargs={"model_name": "colbert-ir/colbertv2.0"},
    store_kwargs={"path": "./kayak-index"},
)
retriever.upsert_texts(doc_ids, texts, metadata=metadata_rows)
hits = retriever.search_text(query_text, k=10, where={"tenant": "acme"})
```
If you already own vectors rather than text, the equivalent lower-level shape is:
```python
import numpy as np

import kayak

rows = vector_db.fetch_all()
index = kayak.documents(
    [row["doc_id"] for row in rows],
    [np.asarray(row["vector"], dtype=np.float32) for row in rows],
    texts=[row["text"] for row in rows],
).pack()
query = kayak.query(query_vectors, text=query_text)
hits = kayak.search(
    query,
    index,
    k=10,
    backend=kayak.MOJO_EXACT_CPU_BACKEND,
)
```
## Public Store Adapters
Use the public store adapters when the storage layer is already one of the supported systems and you want to avoid writing your own row-to-index bridge.
| Store | Open with |
|---|---|
| LanceDB | kayak.open_store("lancedb", path=..., table_name=...) |
| PgVector | kayak.open_store("pgvector", dsn=... \| connection=..., table_name=..., schema_name=...) |
| Qdrant | kayak.open_store("qdrant", client=... \| path=..., collection_name=...) |
| Weaviate | kayak.open_store("weaviate", client=... \| persistence_path=..., collection_name=..., vector_name=...) |
| Chroma | kayak.open_store("chromadb", client=... \| path=..., collection_name=...) |
Prefer the context-manager form when the adapter may own cleanup-sensitive resources:
```python
with kayak.open_store("qdrant", client=my_qdrant_client, collection_name="docs") as store:
    store.upsert(documents, metadata=metadata_rows)
    index = store.load_index(include_text=True)
```
Store-specific `where=` behavior is not identical across adapters. See the Vector Databases page for the adapter-by-adapter semantics before assuming pushdown behavior.
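When you cannot rely on adapter-side pushdown, one defensive option is to over-fetch and filter on the client. The `post_filter` helper below is hypothetical, not part of Kayak's API, and it assumes hits arrive as `(doc_id, score)` pairs already sorted by score:

```python
def post_filter(hits, metadata_by_id, where, k):
    """Keep only hits whose metadata satisfies every where clause, then cut to k.

    hits: iterable of (doc_id, score) pairs, sorted by descending score.
    Hypothetical shape; adapt to whatever your search call actually returns.
    """
    kept = [
        (doc_id, score)
        for doc_id, score in hits
        if all(
            metadata_by_id.get(doc_id, {}).get(key) == value
            for key, value in where.items()
        )
    ]
    return kept[:k]

hits = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7)]
meta = {"d1": {"tenant": "acme"}, "d2": {"tenant": "other"}, "d3": {"tenant": "acme"}}
print(post_filter(hits, meta, {"tenant": "acme"}, k=2))  # [('d1', 0.9), ('d3', 0.7)]
```

Remember to over-fetch (ask the store for more than `k` raw hits) so the post-filter still has `k` survivors after dropping non-matching rows.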
## Repeated Queries Against The Same Stored Slice
If the underlying rows stay fixed for a while, the verified fast path is:
- load one exact Kayak slice once
- reuse that `LateIndex` for many queries
- use `search_batch(...)` when those queries arrive together
```python
with kayak.open_store("pgvector", dsn=dsn, table_name="docs") as store:
    index = store.load_index(where={"tenant": "acme"}, include_text=True)

batch = kayak.query_batch([query_a_vectors, query_b_vectors, query_c_vectors])
hits_by_query = kayak.search_batch(
    batch,
    index,
    k=10,
    backend=kayak.MOJO_EXACT_CPU_BACKEND,
)
```
An executed notebook example covers this path.
## What This Page Does Not Promise

The public `LateStore` protocol does not currently promise generic thread-safe concurrent use of the same store instance across every adapter.

What is verified and supported today is:

- repeated queries against one loaded `LateIndex`
- batched queries against one loaded `LateIndex`
- explicit concurrent same-snapshot serving through `kayak_engine`

If you want many same-process callers to share one fixed hosted snapshot, use the prepared scheduler from Hosted Engine Python instead of assuming that shared concurrent `open_store(...)` use is the intended scale-out path.
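Until a scheduler owns the snapshot, one generic way to let many same-process callers share a single loaded index without concurrent access is to give the index to exactly one worker and funnel requests through a queue. This is a plain-stdlib sketch of that ownership shape, not `kayak_engine` or its scheduler; the search callable is a trivial stand-in:

```python
import queue
import threading

def serve(index_search, requests_q):
    # A single worker owns the loaded index; callers never touch it directly,
    # so no concurrent use of the index (or the store) ever happens.
    while True:
        item = requests_q.get()
        if item is None:  # shutdown sentinel
            break
        query, reply_q = item
        reply_q.put(index_search(query))

requests_q = queue.Queue()
# Stand-in for "search the loaded LateIndex"; real code would close over it.
worker = threading.Thread(target=serve, args=(lambda q: q * 2, requests_q))
worker.start()

reply_q = queue.Queue()
requests_q.put((21, reply_q))
print(reply_q.get())  # 42
requests_q.put(None)
worker.join()
```

The prepared scheduler plays this single-owner role for you; the sketch only illustrates why funneling through one owner sidesteps the missing thread-safety promise.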