Storage + Search
If you already use a vector database, the default Kayak pattern is: leave persistence where it is, materialize one exact searchable slice into Kayak, and run search on that slice directly.
Recommended Operating Model
| Situation | Default choice | Why |
|---|---|---|
| the searchable slice fits locally | full Kayak exact retrieval | simplest path and easiest thing to measure |
| the database is already your durable system of record | open_store(...) plus load_index(...) | keeps persistence where it is and gives Kayak a reusable exact slice |
| many queries hit the same fixed slice | load_index(...) once, then search_batch(...) | avoids repeated slice materialization |
| the full slice is too large or must be routed first | database candidate stage plus exact Kayak search | use the DB only when it materially reduces the working set |
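
The table above can be read as a small decision procedure. The sketch below is one way to encode it. The function name, its boolean inputs, and the order in which the rows are checked are illustrative assumptions, not part of the Kayak SDK.

```python
def choose_pattern(
    fits_locally: bool,
    db_is_system_of_record: bool,
    many_queries_same_slice: bool,
    needs_routing: bool,
) -> str:
    """Return the default choice from the table for a given situation.

    The precedence used here (routing first, then reuse, then storage) is an
    assumption made for illustration only.
    """
    if needs_routing or not fits_locally:
        return "database candidate stage plus exact Kayak search"
    if many_queries_same_slice:
        return "load_index(...) once, then search_batch(...)"
    if db_is_system_of_record:
        return "open_store(...) plus load_index(...)"
    return "full Kayak exact retrieval"


# A slice that fits locally, lives in an existing database, and is queried repeatedly.
choice = choose_pattern(
    fits_locally=True,
    db_is_system_of_record=True,
    many_queries_same_slice=True,
    needs_routing=False,
)
print(choice)
```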
Recommendation First
If storage already lives somewhere else, the public recommendation is:
- keep that system for persistence
- materialize one exact Kayak slice with load_index(...)
- reuse that slice
- use search_batch(...) when queries arrive in groups
Current repeated-query fast path in the Python SDK:
- one loaded LateIndex
- kayak.search_batch(...)
- backend=kayak.MOJO_EXACT_CPU_BACKEND when available
The speed win comes mainly from reusing the loaded slice. Backend selection is secondary to avoiding repeated materialization.
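
In a long-lived process, reuse can be made explicit by keeping each materialized slice keyed by the filter that produced it. The following is a minimal sketch in plain Python. SliceCache and fake_loader are hypothetical names, and fake_loader only stands in for an expensive load such as store.load_index(...); no Kayak call is made here.

```python
from typing import Any, Callable, Dict, Hashable


class SliceCache:
    """Materialize each slice at most once and return the same object afterwards."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader
        self._slices: Dict[Hashable, Any] = {}
        self.loads = 0  # number of times the loader actually ran

    def get(self, key: Hashable) -> Any:
        if key not in self._slices:
            self._slices[key] = self._loader(key)
            self.loads += 1
        return self._slices[key]


def fake_loader(tenant: str) -> dict:
    # Hypothetical stand-in for an expensive slice materialization.
    return {"tenant": tenant, "rows": [f"{tenant}-doc-{i}" for i in range(3)]}


cache = SliceCache(fake_loader)
for _ in range(100):
    index = cache.get("acme")  # only the first call materializes the slice

print(cache.loads)  # 1
```

The cache never refreshes on its own, so it only suits slices whose underlying rows stay fixed between explicit reloads.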
Local Evidence Snapshot
These are local measured examples from the executed notebooks — useful deployment evidence, not universal benchmark claims.
| Scenario | Measured result | Interpretation |
|---|---|---|
| BrowseComp-Plus gold slice from LanceDB | same NDCG@10, Kayak exact search 21.15x faster after a one-time 0.209 second load | when the slice fits, loading once and searching in Kayak can preserve quality while reducing search time |
| BrowseComp-Plus evidence slice from LanceDB | same NDCG@10, Kayak exact search 12.52x faster after a one-time 0.215 second load | the same storage-first, search-in-Kayak pattern held on a second slice |
| repeated-query LanceDB slice example | explicit loaded-slice search_batch(...) was 154.42x faster than looping retriever.search_text(...) | once the slice is loaded, batch search is the right public fast path |
| repeated-query example through the high-level retriever | retriever.search_text_batch(...) was 1.029x vs a per-query loop | retriever batching is mainly ergonomic; the main gain comes from reusing the explicit slice |
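
A one-time load only pays off once enough queries reuse the slice. The sketch below works out that break-even point. It treats "Nx faster" as a ratio of per-query search times, which is an interpretation rather than something the table states. The load time and speedup are rounded from the first row; the table does not report a baseline per-query time, so the 0.05 second figure is a purely hypothetical placeholder to replace with your own measurement.

```python
import math


def break_even_queries(
    load_seconds: float, baseline_query_seconds: float, speedup: float
) -> int:
    """Smallest query count at which load-once search costs no more than the baseline.

    load-once total = load_seconds + n * baseline_query_seconds / speedup
    baseline total  = n * baseline_query_seconds
    """
    if speedup <= 1.0:
        raise ValueError("no break-even: loaded-slice search is not faster")
    saved_per_query = baseline_query_seconds * (1.0 - 1.0 / speedup)
    return math.ceil(load_seconds / saved_per_query)


n = break_even_queries(
    load_seconds=0.209,           # rounded from the first table row
    baseline_query_seconds=0.05,  # hypothetical; not reported in the table
    speedup=21.15,                # rounded from the first table row
)
print(n)  # 5
```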
Default Recommendation
Use full Kayak retrieval when the searchable slice:
- fits on the target host
- can be refreshed on your normal update cadence
- does not need a database-side candidate stage before search
That is usually the cleanest production shape:
import kayak

retriever = kayak.open_text_retriever(
    encoder="colbert",
    store="kayak",
    encoder_kwargs={"model_name": "colbert-ir/colbertv2.0"},
    store_kwargs={"path": "./kayak-index"},
)
retriever.upsert_texts(doc_ids, texts, metadata=metadata_rows)
hits = retriever.search_text(query_text, k=10, where={"tenant": "acme"})
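
Whether a slice "fits on the target host" can be roughly estimated before loading anything. The sketch below counts raw vector bytes only. It assumes float32 values, 128-dimensional token vectors, and an average of 180 vectors per document; all three numbers are illustrative assumptions. It ignores text, metadata, and any index overhead, so treat the result as a lower bound.

```python
def estimate_slice_bytes(
    num_docs: int,
    avg_vectors_per_doc: float,
    dim: int,
    bytes_per_value: int = 4,  # float32
) -> int:
    """Lower-bound estimate of the raw vector payload for a multi-vector slice."""
    return int(num_docs * avg_vectors_per_doc * dim * bytes_per_value)


# Hypothetical corpus: one million documents, about 180 token vectors each.
size = estimate_slice_bytes(num_docs=1_000_000, avg_vectors_per_doc=180, dim=128)
print(f"{size / 2**30:.1f} GiB")
```

If the estimate is close to or above the memory available on the host, the database candidate stage from the operating model becomes the more likely fit.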
If you already own vectors rather than text, the equivalent lower-level shape is:
import numpy as np
import kayak

rows = vector_db.fetch_all()
index = kayak.documents(
    [row["doc_id"] for row in rows],
    [np.asarray(row["vector"], dtype=np.float32) for row in rows],
    texts=[row["text"] for row in rows],
).pack()
query = kayak.query(query_vectors, text=query_text)
hits = kayak.search(
    query,
    index,
    k=10,
    backend=kayak.MOJO_EXACT_CPU_BACKEND,
)
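
For intuition about what exact search over a multi-vector slice computes, the sketch below scores documents with the ColBERT-style MaxSim rule: for each query vector, take the best-matching document vector, then sum those maxima. It then keeps the top k. The rule is assumed to apply here because the examples above use a ColBERT encoder. This is plain Python for illustration and says nothing about how Kayak's backends are implemented; maxsim and exact_search are hypothetical names.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best dot product against any document vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def exact_search(
    query_vectors: Sequence[Vector],
    docs: Dict[str, Sequence[Vector]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every document (no approximation) and return the k best as (doc_id, score)."""
    scored = ((maxsim(query_vectors, vectors), doc_id) for doc_id, vectors in docs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0]],
    "c": [[0.5, 0.25]],
}
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_hits = exact_search(toy_query, docs, k=2)
print(toy_hits)  # [('a', 2.0), ('b', 1.0)]
```

Every document in the slice is scored, which is why the working-set size matters more for this path than backend choice alone.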
Public Store Adapters
Use the public store adapters when the storage layer is already one of the supported systems and you want to avoid writing your own row-to-index bridge.
| Store | Open with |
|---|---|
| LanceDB | kayak.open_store("lancedb", path=..., table_name=...) |
| PgVector | kayak.open_store("pgvector", dsn=... \| connection=..., table_name=..., schema_name=...) |
| Qdrant | kayak.open_store("qdrant", client=... \| path=..., collection_name=...) |
| Weaviate | kayak.open_store("weaviate", client=... \| persistence_path=..., collection_name=..., vector_name=...) |
| Chroma | kayak.open_store("chromadb", client=... \| path=..., collection_name=...) |
Prefer the context-manager form when the adapter may own cleanup-sensitive resources:
with kayak.open_store("qdrant", client=my_qdrant_client, collection_name="docs") as store:
    store.upsert(documents, metadata=metadata_rows)
    index = store.load_index(include_text=True)
Store-specific where= behavior is not identical across adapters. See the Vector Databases page for the adapter-by-adapter semantics before assuming pushdown behavior.
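
When pushdown cannot be assumed, one fallback is to filter fetched rows on the client before building the slice. The sketch below shows only the simplest reading of a where= mapping: every listed field must equal the given value. That equality-only reading, the row layout with a "metadata" key, and the names matches and filter_rows are assumptions for illustration; real adapters may support richer or different semantics.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """True when every field in `where` is present in `metadata` with an equal value."""
    return all(
        field in metadata and metadata[field] == value
        for field, value in where.items()
    )


def filter_rows(rows: Iterable[Row], where: Optional[Dict[str, Any]] = None) -> List[Row]:
    """Keep rows whose metadata satisfies `where`; keep everything when it is empty."""
    if not where:
        return list(rows)
    return [row for row in rows if matches(row.get("metadata", {}), where)]


sample_rows = [
    {"doc_id": "d1", "metadata": {"tenant": "acme", "lang": "en"}},
    {"doc_id": "d2", "metadata": {"tenant": "globex", "lang": "en"}},
    {"doc_id": "d3", "metadata": {}},
]
kept = filter_rows(sample_rows, where={"tenant": "acme"})
print([row["doc_id"] for row in kept])  # ['d1']
```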
Repeated Queries Against The Same Stored Slice
If the underlying rows stay fixed for a while, the verified fast path is:
- load one exact Kayak slice once
- reuse that LateIndex for many queries
- use search_batch(...) when those queries arrive together
with kayak.open_store("pgvector", dsn=dsn, table_name="docs") as store:
    index = store.load_index(where={"tenant": "acme"}, include_text=True)

batch = kayak.query_batch([query_a_vectors, query_b_vectors, query_c_vectors])
hits_by_query = kayak.search_batch(
    batch,
    index,
    k=10,
    backend=kayak.MOJO_EXACT_CPU_BACKEND,
)
The executed example for this path is:
What This Page Does Not Promise
The public LateStore protocol does not currently promise generic thread-safe
concurrent use of the same store instance across every adapter.
What is verified and supported today is:
- repeated queries against one loaded LateIndex
- batched queries against one loaded LateIndex
- reusable loaded-slice workflows through the public Python SDK
Do not assume that one shared LateStore instance is the intended concurrency
surface across every adapter.
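
Given that caveat, one conservative pattern in a multi-threaded service is to funnel all store access through a single lock so that the store is only ever touched by one thread at a time, and the slice is loaded once. The sketch below is plain Python with a fake loader. LoadOnce and fake_load_slice are hypothetical names, and the sketch makes no claim about whether a loaded index may itself be read from several threads concurrently.

```python
import threading
from typing import Any, Callable, List


class LoadOnce:
    """Serialize calls to a loader and run it at most once."""

    def __init__(self, load_slice: Callable[[], Any]) -> None:
        self._load_slice = load_slice
        self._lock = threading.Lock()
        self._loaded = False
        self._slice: Any = None

    def get(self) -> Any:
        with self._lock:  # only one thread may touch the store at a time
            if not self._loaded:
                self._slice = self._load_slice()
                self._loaded = True
            return self._slice


load_calls: List[int] = []


def fake_load_slice() -> tuple:
    # Hypothetical stand-in for opening a store and calling load_index(...).
    load_calls.append(1)
    return ("doc-1", "doc-2")


shared = LoadOnce(fake_load_slice)
results: List[Any] = []


def worker() -> None:
    results.append(shared.get())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(load_calls), len(results))  # 1 8
```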