Search & RAG
Two retrieval systems share one database (Postgres + pgvector) and pull in opposite directions. Universal search ranks whole entities across everything you can see so you can jump to them. Per-conversation file RAG retrieves chunks of the files attached to one chat and feeds them to Carola so she can answer from a 1000-page PDF and cite the page.
Coming from React + Django? "RAG" (retrieval-augmented generation) is just: before asking the LLM, look up the relevant paragraphs and paste them into the prompt. The model doesn't "read your PDF" — we find the right slices and hand them over. The clever part is finding them by meaning, not keywords, which is what the embedding + pgvector half buys you (a nearest-neighbour index over text meaning, like a
GiST/GIN index but over "what the sentence is about").
| | Universal search | Per-conversation file RAG |
|---|---|---|
| Question | "Where is X?" → navigate | "What does this file say about X?" → answer |
| Returns | whole entities (a project, a task, an article) | chunks of text (passages with a page number) |
| Scope | everything you're allowed to see | the files attached to this thread, owner-private |
| Consumer | a ranked list you click | the LLM prompt — Carola quotes + cites |
| Engine | hybrid FTS + trigram + vector, fused with RRF | pgvector cosine kNN, with a keyword fallback |
Both run server-only (they pull db and the embedder), and both degrade gracefully with no
embedder configured — so the app still works air-gapped.
Per-conversation file RAG
Attach a file to a Carola conversation (the composer paperclip, or the conversation's rail files panel) and she can answer questions from its contents — a partial factor buried in §6.1 of a Eurocode, a value in a 1000-page spec — quoting the exact passage and citing file + page. The file is owner-private and rides that conversation (a many-to-many join, so one file can be re-attached to many chats).
The pipeline at a glance
1 · Upload — uploadChatDocument
src/lib/chat-document-server.ts holds the server function. It's a createServerFn({ method: "POST" })
that takes multipart FormData (the file, the threadId, and a scope/scopeKey stamp). On each
upload it does three writes and then kicks off processing:
- Bytes → object storage. putObject(uploadKey("chat-docs", ext), buffer, mime) from fusion-storage: MinIO locally, Azure Blob in the cloud. The DB never holds the file bytes, only a storageKey pointer.
- A chat_document row with a short base62 id (the no-UUID rule: a user references this file by id when they rename/delete it) and status: "pending".
- A chat_document_attachment row ({ documentId, threadId }, onConflictDoNothing). This join is what makes the file belong to the conversation. Listing a chat's files (listChatDocuments) and retrieval both read through this join, keyed by threadId; there is no project-wide file list.
Coming from Django? uploadChatDocument is your DRF @api_view POST handler; requireSession() is the auth check; the chat_document row is the model instance. The difference from a classic FileField is that the bytes go to object storage and the row stores extracted-and-embedded derivatives (chunks), because the file's job here is to be searched by meaning, not downloaded.
Then kickProcessing(id) runs the pipeline out of band:
```typescript
/** Kick off processing: a Hatchet job when available, else inline on the long-running SSR runtime. */
async function kickProcessing(id: string): Promise<void> {
  if (isJobsEnabled()) {
    const { getHatchet } = await import("@tikab-interactive/fusion-jobs/hatchet");
    await getHatchet()
      .admin.runWorkflow("process-chat-document", { documentId: id })
      .catch(() => processChatDocument(id));
  } else {
    await processChatDocument(id);
  }
}
```

The upload returns immediately with status: "pending"; the UI polls the row (pending → processing → ready/failed) and shows the chip as still working. Embeddings are asynchronous: a freshly
uploaded file isn't retrievable until the pipeline reaches ready.
2 · Extract — Kreuzberg
src/lib/chat-document-process.ts → extractText() picks the extractor by file type:
- Plain-text family (text/*, .md, .csv, .json, …) is read directly in-process, with no external service. This is also the zero-dependency local-dev path.
- Everything else (PDF incl. OCR, Office, images; Kreuzberg handles ~96 formats) is POSTed to the Kreuzberg service. src/lib/kreuzberg.ts → extractWithKreuzberg() sends the bytes to its /extract endpoint and reads back the full text. Kreuzberg is a container in the example's docker-compose.yml (next to MinIO + Hatchet); KREUZBERG_URL points at it and is preset in .env.local.
Page numbers come from splitPages(): many PDF→text extractors separate pages with a form-feed
(\f), so when it's present we split on it and number the pages (page: 1, 2, …) — that's how a
citation can say "p. 412". No form-feed → the document is one unpaginated body and chunks carry
page: null.
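The form-feed rule above can be sketched in a few lines. This is a minimal illustration, not the actual splitPages() source: the name splitPagesSketch, the PageText shape, and the choice to drop blank pages while keeping their numbers are all assumptions.

```typescript
// Hypothetical shape; the real splitPages() return type may differ.
interface PageText {
  page: number | null; // 1-based page number, or null when the source is unpaginated
  text: string;
}

function splitPagesSketch(fullText: string): PageText[] {
  // No form-feed: treat the whole document as one unpaginated body.
  if (!fullText.includes("\f")) {
    return [{ page: null, text: fullText }];
  }
  // Number the pages first, then drop blank ones, so a blank page
  // never shifts the numbers of the pages after it.
  return fullText
    .split("\f")
    .map((text, i) => ({ page: i + 1, text }))
    .filter((p) => p.text.trim().length > 0);
}
```

Numbering before filtering is the design point: a citation such as "p. 412" stays correct even when an earlier page produced no text.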
Gotcha: the Kreuzberg service must be running. If KREUZBERG_URL is unset, plain text still works but any PDF/Office upload is marked failed with a message telling you to docker compose up kreuzberg. Failures are recorded on the row (status: "failed", error), never thrown into the request: a bad upload degrades to a visible error chip, it doesn't crash the chat.
3 · Chunk — chunkWithOffsets
Embedding models have a bounded context and retrieval wants passage-sized hits, so the extracted
text is split into overlapping windows. chat-document-process.ts → buildChunks() chunks one page
at a time (so offsets stay page-relative and pair cleanly with the page number) using
chunkWithOffsets() from fusion-search:
- ~1200 characters per chunk, 150 of overlap (CHUNK_CHARS / CHUNK_OVERLAP). The overlap means a sentence that straddles a boundary usually still lands wholly inside one chunk (it does whenever the part before the cut is shorter than the overlap).
- Boundary-aware: it prefers to cut at a paragraph → sentence → line break near the target size rather than mid-word, so chunks read as coherent passages.
- Offset-truthful: each chunk records [charStart, charEnd) into the cleaned page text, with the invariant content === clean.slice(charStart, charEnd) (asserted in chunk.test.ts). Those offsets let a hit deep-link to and highlight the exact matched span.
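The three properties above can be illustrated with a small chunker. This is a sketch under stated assumptions, not the fusion-search implementation: the name chunkWithOffsetsSketch, the "look for a break in the last 20% of the window" heuristic, and the separator list are invented for illustration; only the ~1200 / 150 sizes and the offset invariant come from the text.

```typescript
interface Chunk {
  content: string;
  charStart: number; // inclusive offset into the cleaned page text
  charEnd: number; // exclusive offset
}

function chunkWithOffsetsSketch(clean: string, size = 1200, overlap = 150): Chunk[] {
  if (size < 1 || overlap < 0 || overlap * 2 > size) {
    throw new Error("overlap must be between 0 and half the chunk size");
  }
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < clean.length) {
    let end = Math.min(start + size, clean.length);
    if (end < clean.length) {
      // Prefer a paragraph, then sentence, then line break near the target size.
      const floor = start + Math.floor(size * 0.8);
      const tail = clean.slice(floor, end);
      for (const sep of ["\n\n", ". ", "\n"]) {
        const at = tail.lastIndexOf(sep);
        if (at !== -1) {
          end = floor + at + sep.length;
          break;
        }
      }
    }
    // The offset invariant holds by construction: content is exactly this slice.
    chunks.push({ content: clean.slice(start, end), charStart: start, charEnd: end });
    if (end >= clean.length) break;
    // Step back by the overlap so neighbouring chunks share text.
    start = end - overlap;
  }
  return chunks;
}
```

Because every chunk is produced by slicing the cleaned text, the offsets can be trusted for highlighting without a second search pass over the page.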
Each chunk becomes a chat_document_chunk row carrying ordinal, page, the offsets, and content.
4 · Embed — embed() → pgvector
chat-document-process.ts → embedChunks() turns each chunk's text into a 768-dim vector via
embed() from fusion-ai, batched 64 at a time and run outside any transaction (it's the
slow, I/O-bound step). The vector is the chunk's meaning as a point in 768-space; "close" vectors mean
"similar meaning", which is what makes semantic retrieval possible.
The vectors are stored on the chunk rows in a pgvector column — embedding (768-dim, null
until embedded), paired with an embeddingPending flag and an HNSW index for fast cosine
search. The full chunk table from the schema (src/db/schema/chat-document.ts):
```typescript
export const chatDocumentChunk = pgTable(
  "chat_document_chunk",
  {
    // Internal row id: never referenced as an identifier (chunks are queried by documentId), so it
    // stays a generated uuid; the no-UUID rule governs user-referenced ids, not opaque rows.
    id: uuid("id").primaryKey().defaultRandom(),
    documentId: text("document_id")
      .notNull()
      .references(() => chatDocument.id, { onDelete: "cascade" }),
    // Chunk order within the document.
    ordinal: integer("ordinal").notNull(),
    // 1-based page number for the citation (null if the extractor gave no page info).
    page: integer("page"),
    // Character offsets of this chunk within its (cleaned) page text, which let a search hit
    // highlight the exact matched passage. Null for chunks written before offsets were tracked.
    charStart: integer("char_start"),
    charEnd: integer("char_end"),
    content: text("content").notNull(),
    // Null until embedded; `embeddingPending` lets a back-fill job find un-embedded rows,
    // so an embedding failure never makes a chunk silently unsearchable (it still matches
    // the keyword fallback).
    embedding: vector("embedding", { dimensions: CHAT_DOC_EMBED_DIM }),
    embeddingPending: boolean("embedding_pending").notNull().default(true),
    createdAt: timestamp("created_at").notNull().defaultNow(),
  },
  (t) => [
    index("chat_document_chunk_doc_idx").on(t.documentId, t.ordinal),
    // ANN index for cosine similarity search over chunk embeddings (pgvector hnsw).
    index("chat_document_chunk_embedding_hnsw").using("hnsw", t.embedding.op("vector_cosine_ops")),
  ],
);
```

When no embedder is configured (or a batch fails), embed() returns null and the chunk is written
with embedding: null, embeddingPending: true. It's still keyword-searchable (step 5's fallback)
and a back-fill job can find it later by that flag, so an embedding failure never makes a chunk silently
unsearchable. The row's status flips to ready once chunks are written, and
indexDocumentChunks() mirrors the finished chunks into the universal search_index, reusing the
same vectors and never re-embedding (see "How the two systems meet").
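The batching and null-fallback behaviour can be sketched as follows. This is an illustration only: EmbedFn is an assumed signature (the real embed() from fusion-ai may differ), and toBatches / embedAllSketch are hypothetical names. The batch size of 64 and the embedding / embeddingPending pairing come from the text.

```typescript
type Vector = number[];
// Assumed signature: one vector per input text, or null when no embedder is configured.
type EmbedFn = (texts: string[]) => Promise<Vector[] | null>;

interface EmbeddedRow {
  embedding: Vector | null;
  embeddingPending: boolean;
}

function toBatches<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function embedAllSketch(texts: string[], embed: EmbedFn, batchSize = 64): Promise<EmbeddedRow[]> {
  const rows: EmbeddedRow[] = [];
  for (const batch of toBatches(texts, batchSize)) {
    let vectors: Vector[] | null = null;
    try {
      vectors = await embed(batch);
    } catch {
      // A failed batch leaves its chunks un-embedded instead of failing the whole file.
      vectors = null;
    }
    for (let i = 0; i < batch.length; i++) {
      const embedding = vectors?.[i] ?? null;
      rows.push({ embedding, embeddingPending: embedding === null });
    }
  }
  return rows;
}
```

A back-fill job could later select rows with embeddingPending set and retry only those, which is why one bad batch does not block the rest of the document.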
Coming from React + Django? The HNSW index is the load-bearing trick. Without it, "find the nearest vector" is a full table scan computing cosine distance against every chunk; with it, pgvector keeps an approximate-nearest-neighbour graph so a kNN query is sub-linear: the vector analogue of reaching for a B-tree instead of WHERE … LIKE.
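To make the "full table scan" half of that comparison concrete, here is a brute-force kNN over in-memory vectors. It is only an illustration of the work an ANN index avoids; the names cosineDistance and bruteForceKnn are local to this sketch (the app runs the query in Postgres, not in TypeScript).

```typescript
// Cosine distance as pgvector defines it: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface VectorRow {
  id: string;
  embedding: number[];
}

// Exact nearest neighbours: every row is scored, so the cost grows linearly with the table.
function bruteForceKnn(query: number[], rows: VectorRow[], k: number): { id: string; distance: number }[] {
  return rows
    .map((row) => ({ id: row.id, distance: cosineDistance(row.embedding, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

An HNSW index trades this exactness for speed: it walks a graph of neighbours and may occasionally miss a true nearest row, which is usually acceptable for retrieval.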
5 · Retrieve — searchChatDocuments
At question time Carola has a tool, search_attached_documents (defined in
src/lib/agent-assistant.ts), available in every scope. The LLM calls it whenever the user asks
about the content of an attached file; it forwards to searchChatDocuments() in
src/lib/chat-document-search.ts, which is doubly scoped:
```typescript
// Files attached to THIS conversation (via the join), owner-scoped + ready.
const docs = await db
  .select({
    id: chatDocument.id,
    filename: chatDocument.filename,
    displayName: chatDocument.displayName,
  })
  .from(chatDocumentAttachment)
  .innerJoin(chatDocument, eq(chatDocument.id, chatDocumentAttachment.documentId))
  .where(
    and(
      eq(chatDocument.ownerId, opts.ownerId),
      eq(chatDocumentAttachment.threadId, opts.threadId),
      eq(chatDocument.status, "ready"),
    ),
  );
```

One user's files can never leak into another's chat: retrieval starts from the per-thread join filtered
by ownerId. Over those documents' chunks it runs semantic-first, keyword-fallback:
- Semantic (the normal path). Embed the query, then order chunks by pgvector cosine distance (cosineDistance(chunk.embedding, queryVec)) over rows where embeddingPending = false, taking the top k (default 6). This is the nearest-neighbour lookup: it finds passages about the same thing as the question even when they share no words with it.
- Keyword fallback. If there's no embedder (air-gapped) or embeddings haven't finished yet, it falls back to ilike '%term%' over the chunk text for the query's longer terms, so "find the exact value" still works immediately, before the vectors land.
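As a rough sketch of how a query might be turned into ilike patterns for the fallback: the function name keywordPatternsSketch, the minimum term length of 4, and the punctuation trimming are all assumptions, since the text only says the query's "longer terms" are used. Escaping the LIKE wildcards is a general precaution rather than something the source describes.

```typescript
function keywordPatternsSketch(query: string, minLength = 4): string[] {
  const edgePunctuation = /^[.,;:!?"'()\[\]]+|[.,;:!?"'()\[\]]+$/g;
  const terms = query
    .split(/\s+/)
    .map((t) => t.replace(edgePunctuation, "").toLowerCase())
    // Short words ("is", "the") match almost every chunk, so only longer terms are kept.
    .filter((t) => t.length >= minLength);
  const unique = Array.from(new Set(terms));
  // Escape \ % _ so user input is matched literally (backslash is the default LIKE escape in Postgres).
  return unique.map((t) => `%${t.replace(/[\\%_]/g, "\\$&")}%`);
}
```

Each returned pattern would be bound as a query parameter to an ilike condition over the chunk content, never concatenated into the SQL string.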
searchChatDocuments() returns DocExcerpt[] ({ file, page, content } per hit), and Carola is instructed to quote the
exact text and cite the file and page ("…the standard gives γM0 = 1.0 (EN 1993-1-1, p. 47)").
That's the whole RAG loop: retrieve the right chunks, stuff them into the prompt, let the model answer
grounded in them.
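The "stuff them into the prompt" step can be pictured like this. It is a sketch only: the DocExcerpt field names come from the text, but their types, the function name formatExcerptsForPrompt, and the numbered layout are assumptions about one reasonable way to present excerpts to a model.

```typescript
interface DocExcerpt {
  file: string;
  page: number | null; // null when the extractor gave no page info
  content: string;
}

function formatExcerptsForPrompt(excerpts: DocExcerpt[]): string {
  if (excerpts.length === 0) {
    return "No matching passages were found in the attached files.";
  }
  return excerpts
    .map((e, i) => {
      // Carry file and page next to each passage so the model can cite them.
      const where = e.page === null ? e.file : `${e.file}, p. ${e.page}`;
      return `[${i + 1}] (${where})\n${e.content}`;
    })
    .join("\n\n");
}
```

Keeping the source label adjacent to each passage is what lets the answer cite a file and page rather than a vague "the document says".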
Gotchas to remember
- The Kreuzberg service must be running for anything that isn't plain text. No KREUZBERG_URL → PDFs/Office land as failed. (docker compose up kreuzberg; it's preset in .env.local.)
- Embeddings are asynchronous. Upload returns at pending; a file isn't semantically retrievable until it reaches ready. The keyword fallback covers the gap.
- Restart the worker after a schema change. The Hatchet worker imports the schema at boot, so a chat_document* migration means re-running bun run worker (or the inline path keeps serving stale shapes).
Universal search
One search box, in the app header, over everything you can see — projects, tasks, protocols, news, handbook articles, people, and your own conversations + attached files. It ranks across all of them at once, owner- and membership-scoped, and never widens access.
Three verbs, never blurred. The same box does Ask (hand the query to Carola), Find (ranked results you click into), and Scope (re-lens the conversation onto a building). Each is a distinct affordance — search never silently becomes chat, and chat never silently becomes a filter.
Where it lives
| Layer | Files |
|---|---|
| Engine | fusion-search — the pure ranking algorithm (FTS + trigram + vector, fused with RRF) |
| Index | search_index (one owner-scoped table) · src/lib/search-index-server.ts (the indexer) |
| Server function | universalSearch — embeds the query, runs the three retrievers, fuses, filters by what the viewer sees |
| UI | fusion-ui UniversalSearch (the header find/go control), wired into the app shell |
How it ranks
Three retrievers run over the one search_index, then Reciprocal Rank Fusion (in
fusion-search) merges their rankings into a single list.
FTS catches exact words, trigram catches typos and partial matches, and the vector retriever catches
meaning. RRF needs no score calibration between the lanes — and that's the point. A Postgres
ts_rank and a pgvector cosine distance live on incomparable scales, so weighting them against each
other is guesswork. reciprocalRankFusion() (in rrf.ts) sidesteps it: for every lane an id appears
in it adds 1 / (k + rank) (k = 60, the canonical constant) and sums those contributions, so a
result strong in any lane — and especially one near the top of two — surfaces. Ties break by id, so
it's deterministic. With no embedder configured it degrades to FTS + trigram, so search
works air-gapped too.
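The fusion rule described above is small enough to sketch in full. This is an illustrative version, not the code in rrf.ts: the name reciprocalRankFusionSketch and the input shape (one ordered array of ids per lane) are assumptions; the 1 / (k + rank) contribution, k = 60, and the id tie-break come from the text.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

function reciprocalRankFusionSketch(lanes: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    lane.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    // Higher fused score first; equal scores fall back to id order so output is deterministic.
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
}
```

For example, with lanes ["a", "b", "c"] (FTS), ["b", "d"] (trigram), and ["c", "b"] (vector), id "b" collects 1/62 + 1/61 + 1/62 and ranks first, ahead of "c" (1/63 + 1/61), then "a" (1/61) and "d" (1/62). Dropping the vector lane simply removes one array from the input, which is how the air-gapped degradation falls out of the same function.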
One index, owner-scoped
Everything indexable mirrors into a single search_index row (entityType, title, content,
embedding, an owner/visibility key, a deep-link url). The indexer in search-index-server.ts
has a collect* per domain (collectWiki, collectTasks, collectPoints, collectPeople,
collectDocuments, …) and each one stamps the security-load-bearing visibility for its rows:
public, project / protokoll (membership-gated, with a scopeKey), or owner (your private
conversations and files). The same permission filters the apps use are applied at
query time, so a result can never reveal something the viewer couldn't already open — and rows that
can't be scoped (e.g. an orphaned protocol point) are simply not indexed, security over
completeness.
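The visibility rule can be pictured as a predicate over index rows. This is a hedged, in-memory sketch: the field names (visibility, scopeKey, ownerId), the Viewer shape, and the function canSee are hypothetical, and the real filter is presumably expressed in the SQL query rather than applied after the fact.

```typescript
type Visibility = "public" | "project" | "protokoll" | "owner";

interface IndexedRow {
  id: string;
  visibility: Visibility;
  scopeKey: string | null; // which project/protocol gates the row, if any
  ownerId: string | null; // set for owner-private rows
}

interface Viewer {
  userId: string;
  memberOf: Set<string>; // scope keys the viewer is a member of
}

function canSee(row: IndexedRow, viewer: Viewer): boolean {
  switch (row.visibility) {
    case "public":
      return true;
    case "project":
    case "protokoll":
      // Membership-gated: a row without a scope key is never shown.
      return row.scopeKey !== null && viewer.memberOf.has(row.scopeKey);
    case "owner":
      return row.ownerId === viewer.userId;
    default:
      return false; // fail closed on anything unexpected
  }
}
```

Failing closed in the default branch mirrors the "security over completeness" stance: an unrecognised or unscoped row is hidden rather than guessed at.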
search-index-server.ts keeps the index fresh incrementally — every create/update/delete of an
indexed entity re-mirrors or removes its row (reindexEntity / removeEntity), and bun run db:seed
does a full reindexAll() so search works from the first run.
Extending it
- Index a new entity → add a collect* mapping in search-index-server.ts (title + content + url + visibility) and call reindexEntity from its server mutations.
- A new retriever or weighting → it lives in fusion-search, independent of the app; the server function just calls it.
How the two systems meet
They're separate at query time but share the embedding work. When a file finishes processing,
indexDocumentChunks() (in search-index-server.ts, called from the end of processChatDocument)
projects its chunks into search_index reusing the vectors the RAG pipeline already computed — a
document is never embedded twice. collectDocuments() even sets each row's deep-link url to the
file's object route with a #page=N anchor, so a universal-search hit on a PDF jumps to the page.
So the same chunk lives in two places for two jobs:
- in chat_document_chunk, retrieved by searchChatDocuments (cosine kNN, this thread's files only) to feed Carola's prompt and get a cited answer;
- mirrored into search_index, retrieved by universalSearch (hybrid + RRF, everything you can see) to rank the file as one result you click.
RAG pulls paragraphs into the prompt; universal search ranks whole things you can open. Same Postgres, same pgvector, opposite ends of the same idea.