Fusion

Search & RAG

Two retrieval systems share one database (Postgres + pgvector) and pull in opposite directions. Universal search ranks whole entities across everything you can see so you can jump to them. Per-conversation file RAG retrieves chunks of the files attached to one chat and feeds them to Carola so she can answer from a 1000-page PDF and cite the page.

Coming from React + Django? "RAG" (retrieval-augmented generation) is just: before asking the LLM, look up the relevant paragraphs and paste them into the prompt. The model doesn't "read your PDF" — we find the right slices and hand them over. The clever part is finding them by meaning, not keywords, which is what the embedding + pgvector half buys you (a nearest-neighbour index over text meaning, like a GiST/GIN index but over "what the sentence is about").

| | Universal search | Per-conversation file RAG |
|---|---|---|
| Question | "Where is X?" → navigate | "What does this file say about X?" → answer |
| Returns | whole entities (a project, a task, an article) | chunks of text (passages with a page number) |
| Scope | everything you're allowed to see | the files attached to this thread, owner-private |
| Consumer | a ranked list you click | the LLM prompt: Carola quotes + cites |
| Engine | hybrid FTS + trigram + vector, fused with RRF | pgvector cosine kNN, with a keyword fallback |

Both run server-only (they pull db and the embedder), and both degrade gracefully with no embedder configured — so the app still works air-gapped.


Per-conversation file RAG

Attach a file to a Carola conversation (the composer paperclip, or the conversation's rail files panel) and she can answer questions from its contents — a partial factor buried in §6.1 of a Eurocode, a value in a 1000-page spec — quoting the exact passage and citing file + page. The file is owner-private and rides that conversation (a many-to-many join, so one file can be re-attached to many chats).

The pipeline at a glance

Upload → Extract → Chunk → Embed → Retrieve

1 · Upload — uploadChatDocument

src/lib/chat-document-server.ts holds the server function. It's a createServerFn({ method: "POST" }) that takes multipart FormData (the file, the threadId, and a scope/scopeKey stamp). On each upload it does three writes and then kicks off processing:

  • Bytes → object storage. putObject(uploadKey("chat-docs", ext), buffer, mime) from fusion-storage — MinIO locally, Azure Blob in the cloud. The DB never holds the file bytes, only a storageKey pointer.
  • A chat_document row with a short base62 id (the no-UUID rule — a user references this file by id when they rename/delete it) and status: "pending".
  • A chat_document_attachment row ({ documentId, threadId }, onConflictDoNothing) — this join is what makes the file belong to the conversation. Listing a chat's files (listChatDocuments) and retrieval both read through this join, keyed by threadId; there is no project-wide file list.

Coming from Django? uploadChatDocument is your DRF @api_view POST handler; requireSession() is the auth check; the chat_document row is the model instance. The difference from a classic FileField is that the bytes go to object storage and the row stores extracted-and-embedded derivatives (chunks), because the file's job here is to be searched by meaning, not downloaded.

Then kickProcessing(id) runs the pipeline out of band:

example/src/lib/chat-document-server.ts
/** Kick off processing: a Hatchet job when available, else inline on the long-running SSR runtime. */
async function kickProcessing(id: string): Promise<void> {
	if (isJobsEnabled()) {
		const { getHatchet } = await import("@tikab-interactive/fusion-jobs/hatchet");
		await getHatchet()
			.admin.runWorkflow("process-chat-document", { documentId: id })
			.catch(() => processChatDocument(id));
	} else {
		await processChatDocument(id);
	}
}

The upload returns immediately with status: "pending"; the UI polls the row (pending → processing → ready/failed) and shows the chip as still working. Embeddings are asynchronous — a freshly uploaded file isn't retrievable until the pipeline reaches ready.

2 · Extract — Kreuzberg

src/lib/chat-document-process.ts → extractText() picks the extractor by file type:

  • Plain-text family (text/*, .md, .csv, .json, …) is read directly in-process — no external service. This is also the zero-dependency local-dev path.
  • Everything else (PDF incl. OCR, Office, images — Kreuzberg handles ~96 formats) is POSTed to the Kreuzberg service. src/lib/kreuzberg.ts → extractWithKreuzberg() sends the bytes to its /extract endpoint and reads back the full text. Kreuzberg is a container in the example's docker-compose.yml (next to MinIO + Hatchet); KREUZBERG_URL points at it and is preset in .env.local.
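The routing decision above can be sketched as a small pure function. This is an illustration only, not the code in chat-document-process.ts: the name pickExtractorSketch and the "inline" / "kreuzberg" labels are hypothetical, and the extension list holds only the three extensions the text names (the real list is longer and is not reproduced here).

```typescript
// Hypothetical sketch of the "plain text in-process, everything else to Kreuzberg" routing.
// Only the extensions named in the text appear here; the real list is longer.
const PLAIN_TEXT_EXTENSIONS = [".md", ".csv", ".json"];

type Extractor = "inline" | "kreuzberg";

function pickExtractorSketch(mime: string, filename: string): Extractor {
	const lower = filename.toLowerCase();
	const isPlain =
		mime.startsWith("text/") || PLAIN_TEXT_EXTENSIONS.some((ext) => lower.endsWith(ext));
	// Plain text is read directly; anything else needs the external extraction service.
	return isPlain ? "inline" : "kreuzberg";
}
```

Checking both the MIME type and the extension matters because browsers often report a generic MIME type for files such as .md.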

Page numbers come from splitPages(): many PDF→text extractors separate pages with a form-feed (\f), so when it's present we split on it and number the pages (page: 1, 2, …) — that's how a citation can say "p. 412". No form-feed → the document is one unpaginated body and chunks carry page: null.
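A minimal sketch of that form-feed split, assuming the behaviour described above. The real splitPages() may differ in details; splitPagesSketch and the PageText shape are hypothetical names, and dropping blank pages is this sketch's own choice.

```typescript
// Hypothetical sketch: split extracted text into pages on form-feed characters.
interface PageText {
	page: number | null; // 1-based page number, or null when the text is unpaginated
	text: string;
}

function splitPagesSketch(fullText: string): PageText[] {
	// No form-feed: the whole document is one unpaginated body.
	if (!fullText.includes("\f")) {
		return [{ page: null, text: fullText }];
	}
	return fullText
		.split("\f")
		// Number pages before filtering so a blank page does not shift later page numbers.
		.map((text, i) => ({ page: i + 1, text }))
		.filter((p) => p.text.trim().length > 0);
}
```

Numbering before filtering is what keeps a citation such as "p. 412" aligned with the PDF even when some pages extract as empty.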

Gotcha: the Kreuzberg service must be running. If KREUZBERG_URL is unset, plain text still works but any PDF/Office upload is marked failed with a message telling you to docker compose up kreuzberg. Failures are recorded on the row (status: "failed", error), never thrown into the request, so a bad upload degrades to a visible error chip instead of crashing the chat.

3 · Chunk — chunkWithOffsets

Embedding models have a bounded context and retrieval wants passage-sized hits, so the extracted text is split into overlapping windows. chat-document-process.ts → buildChunks() chunks one page at a time (so offsets stay page-relative and pair cleanly with the page number) using chunkWithOffsets() from fusion-search:

  • ~1200 characters per chunk, 150 of overlap (CHUNK_CHARS / CHUNK_OVERLAP). The overlap means a sentence split across a boundary still lands wholly inside at least one chunk.
  • Boundary-aware — it prefers to cut at a paragraph → sentence → line break near the target size rather than mid-word, so chunks read as coherent passages.
  • Offset-truthful — each chunk records [charStart, charEnd) into the cleaned page text, with the invariant content === clean.slice(charStart, charEnd) (asserted in chunk.test.ts). Those offsets let a hit deep-link to and highlight the exact matched span.

Each chunk becomes a chat_document_chunk row carrying ordinal, page, the offsets, and content.
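The three properties above (overlap, boundary preference, truthful offsets) can be sketched together. This is not the chunkWithOffsets() implementation from fusion-search; chunkSketch is a hypothetical stand-in, and the "only cut in the back half of the window" rule and the ". " sentence test are simplifying assumptions.

```typescript
// Hypothetical sketch of overlapping, boundary-aware chunking with truthful offsets.
interface OffsetChunk {
	charStart: number; // inclusive
	charEnd: number; // exclusive
	content: string; // always clean.slice(charStart, charEnd)
}

function chunkSketch(clean: string, size = 1200, overlap = 150): OffsetChunk[] {
	const chunks: OffsetChunk[] = [];
	let start = 0;
	while (start < clean.length) {
		let end = Math.min(start + size, clean.length);
		if (end < clean.length) {
			// Prefer a paragraph break, then a sentence end, then a line break,
			// but only in the back half of the window so chunks do not get tiny.
			const span = clean.slice(start, end);
			const minCut = Math.floor(size / 2);
			for (const sep of ["\n\n", ". ", "\n"]) {
				const at = span.lastIndexOf(sep);
				if (at >= minCut) {
					end = start + at + sep.length;
					break;
				}
			}
		}
		chunks.push({ charStart: start, charEnd: end, content: clean.slice(start, end) });
		if (end >= clean.length) break;
		// Step back by the overlap, but always move forward by at least one character.
		start = Math.max(end - overlap, start + 1);
	}
	return chunks;
}
```

Because content is always produced by the same slice() call that defines the offsets, the invariant content === clean.slice(charStart, charEnd) holds by construction.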

4 · Embed — embed() → pgvector

chat-document-process.ts → embedChunks() turns each chunk's text into a 768-dim vector via embed() from fusion-ai, batched 64 at a time and run outside any transaction (it's the slow, I/O-bound step). The vector is the chunk's meaning as a point in 768-space; "close" vectors mean "similar meaning", which is what makes semantic retrieval possible.
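The batching can be sketched as follows. The signature of embed() in fusion-ai is not shown in this document, so the sketch takes a hypothetical embedBatch callback instead of guessing it; toBatches and embedAllSketch are illustrative names as well.

```typescript
// Hypothetical sketch: embed texts in batches of 64, tolerating a failed batch.
type EmbedBatch = (texts: string[]) => Promise<number[][] | null>;

function toBatches<T>(items: T[], size: number): T[][] {
	const batches: T[][] = [];
	for (let i = 0; i < items.length; i += size) {
		batches.push(items.slice(i, i + size));
	}
	return batches;
}

async function embedAllSketch(
	texts: string[],
	embedBatch: EmbedBatch, // assumption: stands in for the real embed() call
	batchSize = 64,
): Promise<(number[] | null)[]> {
	const out: (number[] | null)[] = [];
	for (const batch of toBatches(texts, batchSize)) {
		// A rejected or null batch leaves its chunks un-embedded instead of failing the document.
		const vectors = await embedBatch(batch).catch(() => null);
		for (let i = 0; i < batch.length; i++) {
			out.push(vectors?.[i] ?? null);
		}
	}
	return out;
}
```

The result keeps one slot per input text, so a null entry lines up with the chunk that should be written with embedding: null.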

The vectors are stored on the chunk rows in a pgvector column — embedding (768-dim, null until embedded), paired with an embeddingPending flag and an HNSW index for fast cosine search. The full chunk table from the schema (src/db/schema/chat-document.ts):

example/src/db/schema/chat-document.ts
export const chatDocumentChunk = pgTable(
	"chat_document_chunk",
	{
		// Internal row id — never referenced as an identifier (chunks are queried by documentId), so it
		// stays a generated uuid; the no-UUID rule governs user-referenced ids, not opaque rows.
		id: uuid("id").primaryKey().defaultRandom(),
		documentId: text("document_id")
			.notNull()
			.references(() => chatDocument.id, { onDelete: "cascade" }),
		// Chunk order within the document.
		ordinal: integer("ordinal").notNull(),
		// 1-based page number for the citation (null if the extractor gave no page info).
		page: integer("page"),
		// Character offsets of this chunk within its (cleaned) page text — lets a search hit
		// highlight the exact matched passage. Null for chunks written before offsets were tracked.
		charStart: integer("char_start"),
		charEnd: integer("char_end"),
		content: text("content").notNull(),
		// Null until embedded; `embeddingPending` lets a back-fill job find un-embedded rows,
		// so an embedding failure never makes a chunk silently unsearchable (it still matches
		// the keyword fallback).
		embedding: vector("embedding", { dimensions: CHAT_DOC_EMBED_DIM }),
		embeddingPending: boolean("embedding_pending").notNull().default(true),
		createdAt: timestamp("created_at").notNull().defaultNow(),
	},
	(t) => [
		index("chat_document_chunk_doc_idx").on(t.documentId, t.ordinal),
		// ANN index for cosine similarity search over chunk embeddings (pgvector hnsw).
		index("chat_document_chunk_embedding_hnsw").using("hnsw", t.embedding.op("vector_cosine_ops")),
	],
);

When no embedder is configured (or a batch fails), embed() returns null and the chunk is written with embedding: null, embeddingPending: true. It's still keyword-searchable (step 5's fallback) and a back-fill job can find it later by that flag, so an embedding failure never makes a chunk silently unsearchable. The row's status flips to ready once chunks are written, and indexDocumentChunks() mirrors the finished chunks into the universal search_index, reusing the same vectors and never re-embedding (see below).

Coming from React + Django? The HNSW index is the load-bearing trick. Without it, "find the nearest vector" is a full table scan computing cosine distance against every chunk; with it, pgvector keeps an approximate-nearest-neighbour graph so a kNN query is sub-linear — the vector analogue of reaching for a B-tree instead of WHERE … LIKE.
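To make the "full table scan" concrete, here is what an exact nearest-neighbour lookup does without an index, written as an in-memory sketch. It is an illustration of the cost HNSW avoids, not how the app queries; bruteForceKnn is a hypothetical helper, and returning distance 1 for a zero vector is this sketch's own choice.

```typescript
// Cosine distance = 1 - cosine similarity (0 means same direction, 1 means orthogonal).
function cosineDistance(a: number[], b: number[]): number {
	let dot = 0;
	let normA = 0;
	let normB = 0;
	for (let i = 0; i < a.length; i++) {
		const x = a[i] ?? 0;
		const y = b[i] ?? 0;
		dot += x * y;
		normA += x * x;
		normB += y * y;
	}
	if (normA === 0 || normB === 0) return 1; // sketch choice for degenerate vectors
	return 1 - dot / Math.sqrt(normA * normB);
}

// Exact kNN: computes a distance for every row, so the cost grows linearly with the table.
function bruteForceKnn<T extends { embedding: number[] }>(rows: T[], query: number[], k: number): T[] {
	return [...rows]
		.sort((p, q) => cosineDistance(p.embedding, query) - cosineDistance(q.embedding, query))
		.slice(0, k);
}
```

An HNSW index trades this exact scan for an approximate graph walk that visits only a small fraction of the rows.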

5 · Retrieve — searchChatDocuments

At question time Carola has a tool, search_attached_documents (defined in src/lib/agent-assistant.ts), available in every scope. The LLM calls it whenever the user asks about the content of an attached file; it forwards to searchChatDocuments() in src/lib/chat-document-search.ts, which is doubly scoped:

example/src/lib/chat-document-search.ts
	// Files attached to THIS conversation (via the join), owner-scoped + ready.
	const docs = await db
		.select({
			id: chatDocument.id,
			filename: chatDocument.filename,
			displayName: chatDocument.displayName,
		})
		.from(chatDocumentAttachment)
		.innerJoin(chatDocument, eq(chatDocument.id, chatDocumentAttachment.documentId))
		.where(
			and(
				eq(chatDocument.ownerId, opts.ownerId),
				eq(chatDocumentAttachment.threadId, opts.threadId),
				eq(chatDocument.status, "ready"),
			),
		);

One user's files can never leak into another's chat: retrieval starts from the per-thread join filtered by ownerId. Over those documents' chunks it runs semantic-first, keyword-fallback:

  • Semantic (the normal path). Embed the query, then order chunks by pgvector cosine distance (cosineDistance(chunk.embedding, queryVec)) over rows where embeddingPending = false, taking the top k (default 6). This is the nearest-neighbour lookup — it finds passages about the same thing as the question even when they share no words with it.
  • Keyword fallback. If there's no embedder (air-gapped) or embeddings haven't finished yet, it falls back to ilike '%term%' over the chunk text for the query's longer terms — so "find the exact value" still works immediately, before the vectors land.
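The keyword fallback's "longer terms" step can be sketched like this. The real cut-off and tokenisation are not given in this document, so the minimum length of 4 and the punctuation list are assumptions, and keywordPatterns is a hypothetical name.

```typescript
// Hypothetical sketch: turn a question into ILIKE patterns for its longer terms.
function keywordPatterns(query: string, minLength = 4): string[] {
	const terms = query
		.toLowerCase()
		.split(/\s+/)
		// Strip punctuation and the LIKE wildcards (% and _) so user input cannot widen the match.
		.map((t) => t.replace(/[%_\\.,;:!?"'()]/g, ""))
		.filter((t) => t.length >= minLength);
	// De-duplicate while keeping first-seen order.
	const unique = Array.from(new Set(terms));
	return unique.map((t) => `%${t}%`);
}
```

Each pattern would then be matched against the chunk text, for example one ilike condition per pattern combined with OR; how the real query combines them is not stated here.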

It returns DocExcerpt[] ({ file, page, content } per hit), and Carola is instructed to quote the exact text and cite the file and page ("…the standard gives γM0 = 1.0 (EN 1993-1-1, p. 47)"). That's the whole RAG loop: retrieve the right chunks, stuff them into the prompt, let the model answer grounded in them.
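The "stuff them into the prompt" step might look like the sketch below. The DocExcerpt field names come from the text above, but their types and the output layout are assumptions, and formatExcerpts is a hypothetical helper rather than the tool's actual return format.

```typescript
// Field names from the text above; the types are assumed.
interface DocExcerpt {
	file: string;
	page: number | null;
	content: string;
}

// Hypothetical sketch: render excerpts as numbered, citable blocks for the prompt.
function formatExcerpts(excerpts: DocExcerpt[]): string {
	return excerpts
		.map((e, i) => {
			// Unpaginated documents carry page: null, so the citation falls back to the file name.
			const where = e.page === null ? e.file : `${e.file}, p. ${e.page}`;
			return `[${i + 1}] (${where})\n${e.content}`;
		})
		.join("\n\n");
}
```

Keeping the file and page next to each passage is what lets the model cite without guessing.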

Gotchas to remember

  • The Kreuzberg service must be running for anything that isn't plain text. No KREUZBERG_URL → PDFs/Office land as failed. (Run docker compose up kreuzberg; the URL is preset in .env.local.)
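  • Embeddings are asynchronous. Upload returns at pending, and searchChatDocuments only reads documents whose status is ready, so a file isn't retrievable before then. Once it is ready, the keyword fallback covers any chunks whose embeddings are still pending.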
  • Restart the worker after a schema change. The Hatchet worker imports the schema at boot, so a chat_document* migration means re-running bun run worker (or the inline path keeps serving stale shapes).

Universal search

One search box, in the app header, over everything you can see — projects, tasks, protocols, news, handbook articles, people, and your own conversations + attached files. It ranks across all of them at once, owner- and membership-scoped, and never widens access.

Three verbs, never blurred. The same box does Ask (hand the query to Carola), Find (ranked results you click into), and Scope (re-lens the conversation onto a building). Each is a distinct affordance — search never silently becomes chat, and chat never silently becomes a filter.

Where it lives

| Layer | Files |
|---|---|
| Engine | fusion-search: the pure ranking algorithm (FTS + trigram + vector, fused with RRF) |
| Index | search_index (one owner-scoped table) · src/lib/search-index-server.ts (the indexer) |
| Server function | universalSearch: embeds the query, runs the three retrievers, fuses, filters by what the viewer sees |
| UI | fusion-ui UniversalSearch (the header find/go control), wired into the app shell |

How it ranks

Three retrievers (FTS, trigram, vector) run over the one search_index, then Reciprocal Rank Fusion (in fusion-search) merges their rankings into a single list.

FTS catches exact words, trigram catches typos and partial matches, and the vector retriever catches meaning. RRF needs no score calibration between the lanes — and that's the point. A Postgres ts_rank and a pgvector cosine distance live on incomparable scales, so weighting them against each other is guesswork. reciprocalRankFusion() (in rrf.ts) sidesteps it: for every lane an id appears in it adds 1 / (k + rank) (k = 60, the canonical constant) and sums those contributions, so a result strong in any lane — and especially one near the top of two — surfaces. Ties break by id, so it's deterministic. With no embedder configured it degrades to FTS + trigram, so search works air-gapped too.
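The fusion rule described above fits in a few lines. This is a sketch, not the reciprocalRankFusion() in rrf.ts: rrfSketch is a hypothetical name, lanes are assumed to be arrays of ids ordered best-first, and ranks are assumed to start at 1.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion over ranked id lists.
function rrfSketch(lanes: string[][], k = 60): { id: string; score: number }[] {
	const scores = new Map<string, number>();
	for (const lane of lanes) {
		lane.forEach((id, index) => {
			const rank = index + 1; // assumption: ranks are 1-based
			scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
		});
	}
	return Array.from(scores, ([id, score]) => ({ id, score })).sort(
		// Highest fused score first; ties break by id so the order is deterministic.
		(a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
	);
}
```

With lanes ["a", "b", "c"], ["b", "a"] and ["b", "d"], id "b" scores 1/62 + 1/61 + 1/61 and leads, while "d" (rank 2 in one lane) edges out "c" (rank 3 in one lane). Only ranks enter the sum, so the raw ts_rank and cosine values never need to be compared.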

One index, owner-scoped

Everything indexable mirrors into a single search_index row (entityType, title, content, embedding, an owner/visibility key, a deep-link url). The indexer in search-index-server.ts has a collect* per domain (collectWiki, collectTasks, collectPoints, collectPeople, collectDocuments, …) and each one stamps the security-load-bearing visibility for its rows: public, project / protokoll (membership-gated, with a scopeKey), or owner (your private conversations and files). The same permission filters the apps use are applied at query time, so a result can never reveal something the viewer couldn't already open — and rows that can't be scoped (e.g. an orphaned protocol point) are simply not indexed, security over completeness.
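The visibility rule can be pictured as a predicate over one indexed row. The visibility values come from the text above; everything else (the ownerId and scopeKey field names, the Viewer shape, and evaluating this in memory rather than in the query) is an assumption made for illustration.

```typescript
// Visibility values from the text; row and viewer shapes are hypothetical.
type Visibility = "public" | "project" | "protokoll" | "owner";

interface IndexedRow {
	visibility: Visibility;
	scopeKey: string | null; // set for membership-gated rows
	ownerId: string | null; // set for owner-private rows
}

interface Viewer {
	userId: string;
	memberScopeKeys: Set<string>; // scopes the viewer is a member of
}

function canSee(row: IndexedRow, viewer: Viewer): boolean {
	switch (row.visibility) {
		case "public":
			return true;
		case "project":
		case "protokoll":
			// Membership-gated: a row with no scope key is never visible.
			return row.scopeKey !== null && viewer.memberScopeKeys.has(row.scopeKey);
		case "owner":
			return row.ownerId === viewer.userId;
		default:
			return false; // fail closed on anything unexpected
	}
}
```

Failing closed mirrors the "security over completeness" stance: when in doubt, the row is not shown.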

search-index-server.ts keeps the index fresh incrementally — every create/update/delete of an indexed entity re-mirrors or removes its row (reindexEntity / removeEntity), and bun run db:seed does a full reindexAll() so search works from the first run.

Extending it

  • Index a new entity → add a collect* mapping in search-index-server.ts (title + content + url + visibility) and call reindexEntity from its server mutations.
  • A new retriever or weighting → it lives in fusion-search, independent of the app; the server function just calls it.

How the two systems meet

They're separate at query time but share the embedding work. When a file finishes processing, indexDocumentChunks() (in search-index-server.ts, called from the end of processChatDocument) projects its chunks into search_index reusing the vectors the RAG pipeline already computed — a document is never embedded twice. collectDocuments() even sets each row's deep-link url to the file's object route with a #page=N anchor, so a universal-search hit on a PDF jumps to the page.
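A sketch of that projection, to show where the vector reuse happens. The search_index field names follow the list given earlier in this page; the ChunkRow and DocInfo shapes, the "document" entity type, and the objectUrl value are hypothetical, and this is not the body of indexDocumentChunks().

```typescript
// Hypothetical shapes for illustration only.
interface ChunkRow {
	page: number | null;
	content: string;
	embedding: number[] | null;
}

interface DocInfo {
	displayName: string;
	objectUrl: string; // assumption: the file's object route
}

interface SearchIndexRow {
	entityType: string;
	title: string;
	content: string;
	embedding: number[] | null;
	visibility: "owner";
	url: string;
}

function projectChunk(chunk: ChunkRow, doc: DocInfo): SearchIndexRow {
	return {
		entityType: "document", // assumed label
		title: doc.displayName,
		content: chunk.content,
		// Reuse the vector the RAG pipeline already computed; nothing is re-embedded here.
		embedding: chunk.embedding,
		visibility: "owner",
		// Deep-link to the page when the chunk has one.
		url: chunk.page === null ? doc.objectUrl : `${doc.objectUrl}#page=${chunk.page}`,
	};
}
```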

So the same chunk lives in two places for two jobs:

  • in chat_document_chunk, retrieved by searchChatDocuments (cosine kNN, this thread's files only) to feed Carola's prompt and get a cited answer;
  • mirrored into search_index, retrieved by universalSearch (hybrid + RRF, everything you can see) to rank the file as one result you click.

RAG pulls paragraphs into the prompt; universal search ranks whole things you can open. Same Postgres, same pgvector, opposite ends of the same idea.