Where Kayak Fits

Kayak is the late-interaction search layer.

It is not a full document-intelligence or RAG-ingestion platform.

The Boundary

Use Kayak after documents have become either:

token-level query and document vectors
plain text that you pass through an explicit encoder
a materialized search slice loaded from your existing storage system

Do not expect Kayak to own:

OCR
PDF layout recovery
table extraction
handwritten annotation handling
application answer generation

Those are real systems. Keep them on your side or connect Kayak to a dedicated parser, extractor, or application stack.

The Pipeline Shape

documents, PDFs, or databases
        |
        v
parser / OCR / extractor / application ETL
        |
        v
plain text or token-level vectors
        |
        v
encoder, if vectors are not already materialized
        |
        v
Kayak: late-interaction index, search plan, exact rerank, explain data
        |
        v
application workflow, answer generation, or user interface

Kayak owns the middle search layer:

explicit late-interaction objects
exact MaxSim scoring
candidate stages plus exact reranking
vector-count and layout visibility
database handoff and search-slice materialization
repeated-query and hosted-snapshot reuse

Why The Boundary Is Narrow

Late interaction has a different cost model from one-vector dense retrieval.

Search quality, memory, and latency depend on:

query vector count
document vector count
candidate window size
layout
backend
whether stage 1 actually reaches the documents that exact reranking needs

Kayak keeps those details visible so you can measure and tune them directly. That would be harder if the product boundary hid retrieval inside a broad "upload documents and ask questions" pipeline.

What To Optimize In Kayak

Optimize Kayak when the problem is:

exact late-interaction throughput
candidate recall per unit cost
bytes per vector or bytes per document
repeated-query serving over one fixed search slice
explicit stage profiles and explainability
keeping an existing database while Kayak owns the search step

Use other systems when the problem is:

scanned document cleanup
layout-aware document parsing
structured field extraction
chunk authoring for non-late-interaction retrievers
answer synthesis and workflow automation

Plain Text Support

kayak.open_text_retriever(...) is a convenience path for plain text plus an explicit encoder.

It does not turn Kayak into an OCR or extraction engine.

Use it when your application already has text strings and you want one object to wire together:

encoder
store
search

If your source data is PDFs, scans, spreadsheets, emails, or mixed-layout documents, run the appropriate parser or extractor first. Then pass the resulting text or token-level vectors into Kayak.