Search & RAG

Two retrieval systems share one database (Postgres + pgvector) and pull in opposite directions. Universal search ranks whole entities across everything you can see so you can jump to them. Per-conversation file RAG retrieves chunks of the files attached to one chat and feeds them to Carola so she can answer from a 1000-page PDF and cite the page.

Coming from React + Django? "RAG" (retrieval-augmented generation) is just: before asking the LLM, look up the relevant paragraphs and paste them into the prompt. The model doesn't "read your PDF" — we find the right slices and hand them over. The clever part is finding them by meaning, not keywords, which is what the embedding + pgvector half buys you (a nearest-neighbour index over text meaning, like a GiST/GIN index but over "what the sentence is about").

	Universal search	Per-conversation file RAG
Question	"Where is X?" → navigate	"What does this file say about X?" → answer
Returns	whole entities (a project, a task, an article)	chunks of text (passages with a page number)
Scope	everything you're allowed to see	the files attached to this thread, owner-private
Consumer	a ranked list you click	the LLM prompt — Carola quotes + cites
Engine	hybrid FTS + trigram + vector, fused with RRF	pgvector cosine kNN, with a keyword fallback

Both run server-only (they import the database client and the embedder), and both degrade gracefully when no embedder is configured, so the app still works air-gapped.

Per-conversation file RAG

Attach a file to a Carola conversation (the composer paperclip, or the conversation's rail files panel) and she can answer questions from its contents — a partial factor buried in §6.1 of a Eurocode, a value in a 1000-page spec — quoting the exact passage and citing file + page. The file is owner-private and rides that conversation (a many-to-many join, so one file can be re-attached to many chats).

The pipeline at a glance

Upload → extract (Kreuzberg) → chunk → embed into pgvector → retrieve at question time.

1 · Upload — `uploadChatDocument`

src/lib/chat-document-server.ts holds the server function. It's a createServerFn({ method: "POST" }) that takes multipart FormData (the file, the threadId, and a scope/scopeKey stamp). On each upload it does three writes and then kicks off processing:

Bytes → object storage. putObject(uploadKey("chat-docs", ext), buffer, mime) from fusion-storage — MinIO locally, Azure Blob in the cloud. The DB never holds the file bytes, only a storageKey pointer.
A chat_document row with a short base62 id (the no-UUID rule — a user references this file by id when they rename/delete it) and status: "pending".
A chat_document_attachment row ({ documentId, threadId }, onConflictDoNothing) — this join is what makes the file belong to the conversation. Listing a chat's files (listChatDocuments) and retrieval both read through this join, keyed by threadId; there is no project-wide file list.
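The no-UUID rule means the document id has to be short enough for a person to type. A minimal sketch of generating such a base62 id, assuming a hypothetical `shortId()` helper and a 10-character length (the docs don't say how long the real ids are or which generator the app uses). It uses rejection sampling so no character is more likely than another.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: the app's real id generator and id length are not shown in these docs.
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/** Random base62 id of `length` characters (the default of 10 is an assumption). */
function shortId(length = 10): string {
	let id = "";
	while (id.length < length) {
		for (const byte of randomBytes(length * 2)) {
			// 248 = 62 * 4: rejecting bytes >= 248 keeps every character equally likely (no modulo bias).
			if (byte < 248) id += BASE62[byte % 62];
			if (id.length === length) break;
		}
	}
	return id;
}
```

At 10 characters there are 62^10 ≈ 8.4 × 10^17 possible ids; the primary-key constraint on the id column still catches the unlikely collision.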

Coming from Django? uploadChatDocument is your DRF @api_view POST handler; requireSession() is the auth check; the chat_document row is the model instance. The difference from a classic FileField is that the bytes go to object storage and the row stores extracted-and-embedded derivatives (chunks), because the file's job here is to be searched by meaning, not downloaded.

Then kickProcessing(id) runs the pipeline out of band:

example/src/lib/chat-document-server.ts

/** Kick off processing: a Hatchet job when available, else inline on the long-running SSR runtime. */
async function kickProcessing(id: string): Promise<void> {
	if (isJobsEnabled()) {
		const { getHatchet } = await import("@tikab-interactive/fusion-jobs/hatchet");
		await getHatchet()
			.admin.runWorkflow("process-chat-document", { documentId: id })
			.catch(() => processChatDocument(id));
	} else {
		await processChatDocument(id);
	}
}

The upload returns immediately with status: "pending"; the UI polls the row (pending → processing → ready/failed) and shows the chip as still working. Embeddings are asynchronous — a freshly uploaded file isn't retrievable until the pipeline reaches ready.
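On the client, "polls the row" amounts to re-reading the status until it settles. A minimal sketch, assuming a hypothetical `fetchStatus` callback standing in for whatever server function returns the row, and an assumed 1.5 s interval with a cap on attempts:

```typescript
type DocStatus = "pending" | "processing" | "ready" | "failed";

/**
 * Poll until the document settles on ready or failed.
 * `fetchStatus`, the interval and the attempt cap are assumptions for illustration.
 */
async function pollUntilSettled(
	fetchStatus: () => Promise<DocStatus>,
	intervalMs = 1500,
	maxAttempts = 200,
): Promise<DocStatus> {
	let status: DocStatus = "pending";
	for (let attempt = 0; attempt < maxAttempts; attempt++) {
		status = await fetchStatus();
		if (status === "ready" || status === "failed") return status;
		await new Promise((resolve) => setTimeout(resolve, intervalMs));
	}
	return status; // still pending/processing: keep showing the "working" chip
}
```

Returning the last seen status on timeout, rather than throwing, keeps the chip honest: it stays in its working state instead of pretending the upload failed.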

2 · Extract — Kreuzberg

src/lib/chat-document-process.ts → extractText() picks the extractor by file type:

Plain-text family (text/*, .md, .csv, .json, …) is read directly in-process — no external service. This is also the zero-dependency local-dev path.
Everything else (PDF incl. OCR, Office, images — Kreuzberg handles ~96 formats) is POSTed to the Kreuzberg service. src/lib/kreuzberg.ts → extractWithKreuzberg() sends the bytes to its /extract endpoint and reads back the full text. Kreuzberg is a container in the example's docker-compose.yml (next to MinIO + Hatchet); KREUZBERG_URL points at it and is preset in .env.local.
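The routing rule in extractText() is a pure decision on MIME type and extension. A minimal sketch, assuming a hypothetical `pickExtractor()` and an extension list that only covers what the text names (.md, .csv, .json) plus .txt; the real list behind the "…" may be longer.

```typescript
type Extractor = "in-process" | "kreuzberg";

// Assumed list: the docs name .md, .csv and .json, and elide the rest.
const PLAIN_TEXT_EXTENSIONS = new Set([".txt", ".md", ".csv", ".json"]);

/** Decide where a file's text comes from: read directly, or POSTed to the Kreuzberg service. */
function pickExtractor(mime: string, filename: string): Extractor {
	if (mime.startsWith("text/")) return "in-process";
	const dot = filename.lastIndexOf(".");
	const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
	return PLAIN_TEXT_EXTENSIONS.has(ext) ? "in-process" : "kreuzberg";
}
```

Checking the extension as well as the MIME type matters because browsers often upload .md or .csv files as `application/octet-stream`.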

Page numbers come from splitPages(): many PDF→text extractors separate pages with a form-feed (\f), so when it's present we split on it and number the pages (page: 1, 2, …) — that's how a citation can say "p. 412". No form-feed → the document is one unpaginated body and chunks carry page: null.
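The form-feed rule can be sketched in a few lines. This is a hypothetical version of splitPages(), not the app's source; dropping blank pages while keeping the original numbering is an assumption, chosen so that page 4 is still cited as "p. 4" even if page 3 was empty.

```typescript
interface PageText {
	page: number | null; // 1-based, or null when the extractor gave no page breaks
	text: string;
}

/** Split extracted text on form-feeds; without one, the whole body is a single unpaginated page. */
function splitPages(text: string): PageText[] {
	if (!text.includes("\f")) return [{ page: null, text }];
	return text
		.split("\f")
		.map((pageText, i) => ({ page: i + 1, text: pageText }))
		.filter((p) => p.text.trim() !== ""); // skip blank pages, but number before filtering
}
```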

Gotcha: the Kreuzberg service must be running. If KREUZBERG_URL is unset, plain text still works, but any PDF/Office upload is marked failed with a message telling you to docker compose up kreuzberg. Failures are recorded on the row (status: "failed", error) and never thrown into the request, so a bad upload degrades to a visible error chip instead of crashing the chat.
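"Recorded on the row, never thrown" is a small wrapper pattern. A minimal sketch, assuming hypothetical `updateRow` and `pipeline` callbacks that stand in for the Drizzle update and the extract/chunk/embed steps:

```typescript
type DocStatus = "pending" | "processing" | "ready" | "failed";

interface DocRowPatch {
	status: DocStatus;
	error?: string | null;
}

/**
 * Run one document's pipeline; any error is written to the row instead of propagating.
 * `updateRow` and `pipeline` are hypothetical stand-ins, not the app's functions.
 */
async function processSafely(
	updateRow: (patch: DocRowPatch) => Promise<void>,
	pipeline: () => Promise<void>,
): Promise<void> {
	await updateRow({ status: "processing", error: null });
	try {
		await pipeline();
		await updateRow({ status: "ready", error: null });
	} catch (err) {
		const message = err instanceof Error ? err.message : String(err);
		await updateRow({ status: "failed", error: message }); // the UI renders this as the error chip
	}
}
```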

3 · Chunk — `chunkWithOffsets`

Embedding models have a bounded context and retrieval wants passage-sized hits, so the extracted text is split into overlapping windows. chat-document-process.ts → buildChunks() chunks one page at a time (so offsets stay page-relative and pair cleanly with the page number) using chunkWithOffsets() from fusion-search:

~1200 characters per chunk, 150 of overlap (CHUNK_CHARS / CHUNK_OVERLAP). The overlap means a sentence split across a boundary still lands wholly inside at least one chunk.
Boundary-aware — it prefers to cut at a paragraph → sentence → line break near the target size rather than mid-word, so chunks read as coherent passages.
Offset-truthful — each chunk records [charStart, charEnd) into the cleaned page text, with the invariant content === clean.slice(charStart, charEnd) (asserted in chunk.test.ts). Those offsets let a hit deep-link to and highlight the exact matched span.
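The three properties above can be sketched as one loop. This is a hypothetical chunker, not fusion-search's chunkWithOffsets(); it assumes the input is already the cleaned page text, and it only cuts in the second half of a window so chunks never come out tiny. Because every chunk is a plain slice, the offset invariant holds by construction.

```typescript
interface OffsetChunk {
	content: string;
	charStart: number; // inclusive
	charEnd: number; // exclusive
}

// Preferred cut points, strongest first: paragraph, sentence, line, word.
const BREAK_TIERS: string[][] = [["\n\n"], [". ", "! ", "? "], ["\n"], [" "]];

/** Pick a cut in (start, hardEnd], preferring the strongest boundary in the window's second half. */
function findBreak(text: string, start: number, hardEnd: number): number {
	const floor = start + Math.floor((hardEnd - start) / 2);
	const window = text.slice(floor, hardEnd);
	for (const tier of BREAK_TIERS) {
		let best = -1;
		for (const sep of tier) {
			const i = window.lastIndexOf(sep);
			if (i >= 0) best = Math.max(best, i + sep.length);
		}
		if (best > 0) return floor + best; // cut just after the separator
	}
	return hardEnd; // no boundary found: hard cut
}

/** Overlapping, boundary-aware windows whose offsets index back into `text`. */
function chunkWithOffsets(text: string, size = 1200, overlap = 150): OffsetChunk[] {
	const chunks: OffsetChunk[] = [];
	let start = 0;
	while (start < text.length) {
		const hardEnd = Math.min(start + size, text.length);
		const end = hardEnd < text.length ? findBreak(text, start, hardEnd) : hardEnd;
		chunks.push({ content: text.slice(start, end), charStart: start, charEnd: end });
		if (end >= text.length) break;
		start = Math.max(end - overlap, start + 1); // step back by `overlap`, but always make progress
	}
	return chunks;
}
```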

Each chunk becomes a chat_document_chunk row carrying ordinal, page, the offsets, and content.

4 · Embed — `embed()` → pgvector

chat-document-process.ts → embedChunks() turns each chunk's text into a 768-dim vector via embed() from fusion-ai, batched 64 at a time and run outside any transaction (it's the slow, I/O-bound step). The vector is the chunk's meaning as a point in 768-space; "close" vectors mean "similar meaning", which is what makes semantic retrieval possible.
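The batching can be sketched as follows. fusion-ai's embed() signature isn't shown here, so the sketch takes a hypothetical `embedBatch` callback that resolves to one vector per input, or null when no embedder is configured. A failed batch only nulls out its own chunks, which then keep `embeddingPending: true`.

```typescript
type Vector = number[];

/**
 * Embed texts in fixed-size batches; `embedBatch` is a hypothetical stand-in for embed().
 * Returns one entry per text: a vector, or null when that batch had no embedder or failed.
 */
async function embedInBatches(
	texts: string[],
	embedBatch: (batch: string[]) => Promise<Vector[] | null>,
	batchSize = 64,
): Promise<(Vector | null)[]> {
	const out: (Vector | null)[] = [];
	for (let i = 0; i < texts.length; i += batchSize) {
		const batch = texts.slice(i, i + batchSize);
		let vectors: Vector[] | null = null;
		try {
			vectors = await embedBatch(batch);
		} catch {
			vectors = null; // treat a thrown batch like "no embedder": keyword-searchable, back-filled later
		}
		for (let j = 0; j < batch.length; j++) out.push(vectors?.[j] ?? null);
	}
	return out;
}
```

Because this runs before any transaction opens, a slow or rate-limited embedding provider never holds database locks.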

The vectors are stored on the chunk rows in a pgvector column — embedding (768-dim, null until embedded), paired with an embeddingPending flag and an HNSW index for fast cosine search. The full chunk table from the schema (src/db/schema/chat-document.ts):

example/src/db/schema/chat-document.ts

export const chatDocumentChunk = pgTable(
	"chat_document_chunk",
	{
		// Internal row id — never referenced as an identifier (chunks are queried by documentId), so it
		// stays a generated uuid; the no-UUID rule governs user-referenced ids, not opaque rows.
		id: uuid("id").primaryKey().defaultRandom(),
		documentId: text("document_id")
			.notNull()
			.references(() => chatDocument.id, { onDelete: "cascade" }),
		// Chunk order within the document.
		ordinal: integer("ordinal").notNull(),
		// 1-based page number for the citation (null if the extractor gave no page info).
		page: integer("page"),
		// Character offsets of this chunk within its (cleaned) page text — lets a search hit
		// highlight the exact matched passage. Null for chunks written before offsets were tracked.
		charStart: integer("char_start"),
		charEnd: integer("char_end"),
		content: text("content").notNull(),
		// Null until embedded; `embeddingPending` lets a back-fill job find un-embedded rows,
		// so an embedding failure never makes a chunk silently unsearchable (it still matches
		// the keyword fallback).
		embedding: vector("embedding", { dimensions: CHAT_DOC_EMBED_DIM }),
		embeddingPending: boolean("embedding_pending").notNull().default(true),
		createdAt: timestamp("created_at").notNull().defaultNow(),
	},
	(t) => [
		index("chat_document_chunk_doc_idx").on(t.documentId, t.ordinal),
		// ANN index for cosine similarity search over chunk embeddings (pgvector hnsw).
		index("chat_document_chunk_embedding_hnsw").using("hnsw", t.embedding.op("vector_cosine_ops")),
	],
);

When no embedder is configured (or a batch fails), embed() returns null and the chunk is written with embedding: null, embeddingPending: true. It's still keyword-searchable (step 5's fallback) and a back-fill job can find it later by that flag — an embedding failure never makes a chunk silently unsearchable. The row's status flips to ready once chunks are written, and indexDocumentChunks() mirrors the finished chunks into the universal search_index — reusing the same vectors, never re-embedding (see below).

Coming from React + Django? The HNSW index is the load-bearing trick. Without it, "find the nearest vector" is a full table scan computing cosine distance against every chunk; with it, pgvector keeps an approximate-nearest-neighbour graph, so a kNN query is sub-linear. It's the vector analogue of adding a B-tree index so a lookup stops scanning the whole table, with one difference: HNSW trades a little recall for that speed, so its results are approximate rather than exact.
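For intuition, this is the work the index saves: an exact, brute-force cosine kNN over every row. pgvector's `<=>` operator (the one `vector_cosine_ops` indexes, used as `ORDER BY embedding <=> $1 LIMIT 6`) returns cosine distance, i.e. 1 minus cosine similarity, which the sketch reproduces. The row shape is a hypothetical simplification of chat_document_chunk.

```typescript
interface EmbeddedChunk {
	id: string;
	embedding: number[];
}

/** Cosine distance as pgvector's <=> defines it: 1 - cos(a, b). */
function cosineDistance(a: number[], b: number[]): number {
	let dot = 0;
	let normA = 0;
	let normB = 0;
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i];
		normA += a[i] * a[i];
		normB += b[i] * b[i];
	}
	return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Exact kNN: one distance computation per row, the scan an HNSW index replaces with a graph walk. */
function bruteForceKnn(query: number[], rows: EmbeddedChunk[], k = 6): EmbeddedChunk[] {
	return rows
		.map((row) => ({ row, distance: cosineDistance(query, row.embedding) }))
		.sort((x, y) => x.distance - y.distance)
		.slice(0, k)
		.map(({ row }) => row);
}
```

At a few thousand chunks this is instant; at millions of 768-dim rows it is the cost HNSW exists to avoid.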

5 · Retrieve — `searchChatDocuments`

At question time Carola has a tool, search_attached_documents (defined in src/lib/agent-assistant.ts), available in every scope. The LLM calls it whenever the user asks about the content of an attached file; it forwards to searchChatDocuments() in src/lib/chat-document-search.ts, which is doubly scoped:

example/src/lib/chat-document-search.ts

	// Files attached to THIS conversation (via the join), owner-scoped + ready.
	const docs = await db
		.select({
			id: chatDocument.id,
			filename: chatDocument.filename,
			displayName: chatDocument.displayName,
		})
		.from(chatDocumentAttachment)
		.innerJoin(chatDocument, eq(chatDocument.id, chatDocumentAttachment.documentId))
		.where(
			and(
				eq(chatDocument.ownerId, opts.ownerId),
				eq(chatDocumentAttachment.threadId, opts.threadId),
				eq(chatDocument.status, "ready"),
			),
		);

One user's files can never leak into another's chat: retrieval starts from the per-thread join filtered by ownerId. Over those documents' chunks it runs semantic-first, keyword-fallback:

Semantic (the normal path). Embed the query, then order chunks by pgvector cosine distance (cosineDistance(chunk.embedding, queryVec)) over rows where embeddingPending = false, taking the top k (default 6). This is the nearest-neighbour lookup — it finds passages about the same thing as the question even when they share no words with it.
Keyword fallback. If there's no embedder (air-gapped) or embeddings haven't finished yet, it falls back to ilike '%term%' over the chunk text for the query's longer terms — so "find the exact value" still works immediately, before the vectors land.
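The decision between the two paths can be sketched in memory. In the app both paths are SQL (cosineDistance and ilike); here a hypothetical `semantic` callback stands in for the pgvector query, and a minimum term length of 4 is an assumed reading of "the query's longer terms". Falling back when no chunk is embedded yet is one plausible reading of "embeddings haven't finished".

```typescript
interface ChunkRow {
	content: string;
	embeddingPending: boolean;
}

const MIN_TERM_LENGTH = 4; // assumption: what counts as a "longer term"

/** Semantic-first, keyword-fallback retrieval; `semantic` is a hypothetical stand-in for the kNN query. */
function retrieve(
	query: string,
	queryVec: number[] | null, // null when no embedder is configured
	chunks: ChunkRow[],
	semantic: (vec: number[], rows: ChunkRow[], k: number) => ChunkRow[],
	k = 6,
): ChunkRow[] {
	const embedded = chunks.filter((c) => !c.embeddingPending);
	if (queryVec && embedded.length > 0) return semantic(queryVec, embedded, k);

	// Keyword fallback: the in-memory equivalent of `content ILIKE '%term%'` for each longer term.
	const terms = query
		.toLowerCase()
		.split(/[^\p{L}\p{N}]+/u)
		.filter((t) => t.length >= MIN_TERM_LENGTH);
	if (terms.length === 0) return [];
	return chunks.filter((c) => terms.some((t) => c.content.toLowerCase().includes(t))).slice(0, k);
}
```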

It returns DocExcerpt[] ({ file, page, content } per hit), and Carola is instructed to quote the exact text and cite the file and page ("…the standard gives γM0 = 1.0 (EN 1993-1-1, p. 47)"). That's the whole RAG loop: retrieve the right chunks, stuff them into the prompt, let the model answer grounded in them.
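How the tool result is serialized for the model isn't shown in these docs, so here is a sketch of one plain-text rendering of DocExcerpt[] that puts the citation next to each passage. The `formatExcerpts()` name and the header format are assumptions, not Carola's actual prompt.

```typescript
interface DocExcerpt {
	file: string;
	page: number | null;
	content: string;
}

/** Render excerpts as numbered, citable passages (format is hypothetical). */
function formatExcerpts(excerpts: DocExcerpt[]): string {
	if (excerpts.length === 0) return "No matching passages were found in the attached files.";
	return excerpts
		.map((e, i) => {
			const where = e.page === null ? e.file : `${e.file}, p. ${e.page}`;
			return `[${i + 1}] (${where})\n${e.content}`;
		})
		.join("\n\n");
}
```

Keeping file and page in the header of each passage means the model never has to infer where a quote came from; it copies the citation it was handed.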

Gotchas to remember

Kreuzberg must be running for anything that isn't plain text. No KREUZBERG_URL → PDFs/Office land as failed. (KREUZBERG_URL is preset in .env.local; start the service with docker compose up kreuzberg.)
Embeddings are asynchronous. Upload returns at pending; a file isn't semantically retrievable until it reaches ready. The keyword fallback covers the gap.
Restart the worker after a schema change. The Hatchet worker imports the schema at boot, so a chat_document* migration means re-running bun run worker (or the inline path keeps serving stale shapes).

Universal search

One search box, in the app header, over everything you can see — projects, tasks, protocols, news, handbook articles, people, and your own conversations + attached files. It ranks across all of them at once, owner- and membership-scoped, and never widens access.

Three verbs, never blurred. The same box does Ask (hand the query to Carola), Find (ranked results you click into), and Scope (re-lens the conversation onto a building). Each is a distinct affordance — search never silently becomes chat, and chat never silently becomes a filter.

Where it lives

Layer	Files
Engine	`fusion-search` — the pure ranking algorithm (FTS + trigram + vector, fused with RRF)
Index	`search_index` (one owner-scoped table) · `src/lib/search-index-server.ts` (the indexer)
Server function	`universalSearch` — embeds the query, runs the three retrievers, fuses, filters by what the viewer sees
UI	fusion-ui `UniversalSearch` (the header find/go control), wired into the app shell

How it ranks

Three retrievers run over the one search_index, then Reciprocal Rank Fusion (in fusion-search) merges their rankings into a single list:

FTS, trigram and vector retrievers → reciprocalRankFusion() → one ranked list, filtered by what the viewer can see.

FTS catches exact words, trigram catches typos and partial matches, and the vector retriever catches meaning. RRF needs no score calibration between the lanes, and that's the point: a Postgres ts_rank and a pgvector cosine distance live on incomparable scales, so weighting them against each other is guesswork. reciprocalRankFusion() (in rrf.ts) sidesteps it by using only rank positions: for every lane an id appears in, it adds 1 / (k + rank) (k = 60, the canonical constant) and sums those contributions, so a result strong in any lane, and especially one near the top of two, surfaces. Ties break by id, so the output is deterministic. With no embedder configured it degrades to FTS + trigram, so search works air-gapped too.
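The fusion rule is small enough to sketch whole. This is a hypothetical `fuseWithRrf()`, not the rrf.ts source; it assumes 1-based ranks and ids that are unique within a lane, and uses k = 60 as above.

```typescript
/** Hypothetical RRF sketch. Each lane lists ids best-first. */
function fuseWithRrf(lanes: string[][], k = 60): { id: string; score: number }[] {
	const scores = new Map<string, number>();
	for (const lane of lanes) {
		lane.forEach((id, index) => {
			// 1-based rank: the top hit in a lane contributes 1 / (k + 1).
			scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
		});
	}
	return Array.from(scores, ([id, score]) => ({ id, score })).sort(
		// Higher score first; equal scores fall back to id order so the result is deterministic.
		(a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
	);
}
```

With lanes FTS = [task-1, wiki-7], trigram = [wiki-7, task-1, person-3] and vector = [wiki-7, news-2], wiki-7 scores 1/62 + 1/61 + 1/61 and wins even though it is only second in the FTS lane, because it sits near the top of all three.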

One index, owner-scoped

Everything indexable mirrors into a single search_index row (entityType, title, content, embedding, an owner/visibility key, a deep-link url). The indexer in search-index-server.ts has a collect* per domain (collectWiki, collectTasks, collectPoints, collectPeople, collectDocuments, …) and each one stamps the security-load-bearing visibility for its rows: public, project / protokoll (membership-gated, with a scopeKey), or owner (your private conversations and files). The same permission filters the apps use are applied at query time, so a result can never reveal something the viewer couldn't already open — and rows that can't be scoped (e.g. an orphaned protocol point) are simply not indexed, security over completeness.
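In the app this gate is a SQL filter applied at query time; the sketch below expresses the same rule in memory so the four visibility kinds are easy to compare. The `Visibility` and `Viewer` shapes are hypothetical, and treating a protokoll's scopeKey as a set membership check is an assumption.

```typescript
type Visibility =
	| { kind: "public" }
	| { kind: "project"; scopeKey: string }
	| { kind: "protokoll"; scopeKey: string }
	| { kind: "owner"; ownerId: string };

interface Viewer {
	userId: string;
	projectIds: Set<string>; // hypothetical membership shape
	protokollIds: Set<string>;
}

/** A search_index row is returned only if the viewer could already open the entity. */
function canSee(v: Visibility, viewer: Viewer): boolean {
	switch (v.kind) {
		case "public":
			return true;
		case "project":
			return viewer.projectIds.has(v.scopeKey);
		case "protokoll":
			return viewer.protokollIds.has(v.scopeKey);
		case "owner":
			return v.ownerId === viewer.userId;
	}
}
```

Because the union is closed, adding a fifth visibility kind without a matching case is a type error, which is the behaviour you want for a security-load-bearing check.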

search-index-server.ts keeps the index fresh incrementally — every create/update/delete of an indexed entity re-mirrors or removes its row (reindexEntity / removeEntity), and bun run db:seed does a full reindexAll() so search works from the first run.

Extending it

Index a new entity → add a collect* mapping in search-index-server.ts (title + content + url + visibility) and call reindexEntity from its server mutations.
A new retriever or weighting → it lives in fusion-search, independent of the app; the server function just calls it.
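A sketch of what a new collect* mapping plus its incremental hooks might look like, for a hypothetical `Checklist` entity. The real reindexEntity/removeEntity signatures and the search_index row type aren't shown in these docs, so the row shape, the URL route, and the in-memory Map standing in for the table are all assumptions.

```typescript
interface SearchRow {
	entityType: string;
	entityId: string;
	title: string;
	content: string;
	url: string;
	visibility: "public" | "project" | "protokoll" | "owner";
	scopeKey: string | null;
}

// Hypothetical new entity; a real one would come from its Drizzle table.
interface Checklist {
	id: string;
	projectId: string | null;
	name: string;
	items: string[];
}

/** collect* mapping: rows that can't be scoped are not indexed (security over completeness). */
function collectChecklist(c: Checklist): SearchRow | null {
	if (!c.projectId) return null;
	return {
		entityType: "checklist",
		entityId: c.id,
		title: c.name,
		content: c.items.join("\n"),
		url: `/projects/${c.projectId}/checklists/${c.id}`, // hypothetical route
		visibility: "project",
		scopeKey: c.projectId,
	};
}

// In-memory stand-in for search_index, keyed by entityType:entityId.
const searchIndex = new Map<string, SearchRow>();

/** Call from create/update mutations: re-mirror the row, or drop it if it's no longer indexable. */
function reindexChecklist(c: Checklist): void {
	const key = `checklist:${c.id}`;
	const row = collectChecklist(c);
	if (row) searchIndex.set(key, row);
	else searchIndex.delete(key);
}

/** Call from the delete mutation. */
function removeChecklist(id: string): void {
	searchIndex.delete(`checklist:${id}`);
}
```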

How the two systems meet

They're separate at query time but share the embedding work. When a file finishes processing, indexDocumentChunks() (in search-index-server.ts, called from the end of processChatDocument) projects its chunks into search_index reusing the vectors the RAG pipeline already computed — a document is never embedded twice. collectDocuments() even sets each row's deep-link url to the file's object route with a #page=N anchor, so a universal-search hit on a PDF jumps to the page.
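The projection step can be sketched as a pure mapping that copies each chunk's existing vector instead of calling the embedder again. The row shapes, the `objectUrl` field, and the `${doc}:${chunk}` entity id are hypothetical; the `#page=N` fragment is the PDF open parameter that most browser PDF viewers honour.

```typescript
interface ChunkForIndex {
	id: string;
	page: number | null;
	content: string;
	embedding: number[] | null; // computed once by the RAG pipeline
}

interface DocumentForIndex {
	id: string;
	ownerId: string;
	displayName: string;
	objectUrl: string; // the file's object route; its exact shape isn't given in these docs
}

interface DocSearchRow {
	entityType: "document";
	entityId: string;
	title: string;
	content: string;
	embedding: number[] | null;
	visibility: "owner";
	ownerId: string;
	url: string;
}

/** Project finished chunks into owner-scoped search rows, reusing each chunk's vector (no second embed call). */
function projectChunks(doc: DocumentForIndex, chunks: ChunkForIndex[]): DocSearchRow[] {
	return chunks.map((chunk): DocSearchRow => ({
		entityType: "document",
		entityId: `${doc.id}:${chunk.id}`, // hypothetical id scheme
		title: doc.displayName,
		content: chunk.content,
		embedding: chunk.embedding, // same vector the RAG pipeline stored
		visibility: "owner",
		ownerId: doc.ownerId,
		url: chunk.page === null ? doc.objectUrl : `${doc.objectUrl}#page=${chunk.page}`,
	}));
}
```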

So the same chunk lives in two places for two jobs:

in chat_document_chunk, retrieved by searchChatDocuments (cosine kNN, this thread's files only) to feed Carola's prompt and get a cited answer;
mirrored into search_index, retrieved by universalSearch (hybrid + RRF, everything you can see) to rank the file as one result you click.

RAG pulls paragraphs into the prompt; universal search ranks whole things you can open. Same Postgres, same pgvector, opposite ends of the same idea.