Fusion

Storage — object storage (files & blobs)

Anything a user uploads — a chat attachment, a profile image, a 1000-page PDF — is a blob: a chunk of bytes that's too big to belong in a row, wants to be streamed, and may one day sit behind a CDN. So it doesn't go in the database. It goes in object storage, and the database keeps only a short storageKey pointer to it.

fusion-storage is the stack's object-storage seam. It exposes a tiny server-only API — putObject / getObject / removeObject — over two interchangeable drivers: an S3-compatible MinIO container locally and on-prem, Azure Blob in the cloud. Which one runs is decided by environment variables; the calling code never knows the difference.

Coming from Django? This is Django's storage backends, made explicit. A Django FileField hides a Storage backend (FileSystemStorage by default, S3Boto3Storage from django-storages in production), and your model column stores a path/name, not the bytes. fusion-storage is the same split without the ORM magic: putObject(key, bytes, mime) plays the role of default_storage.save(name, content), getObject(key) that of default_storage.open(name), and the column on your row (storageKey) is the name you saved under. The /api/files route below is your media-serving view, and the per-user authorization is yours to write, because object storage doesn't do per-user auth (Owner-scoping below covers where that check lives).


Object storage, not the database

The rule across the stack: blobs live in object storage; the row stores a storageKey. When you upload a file to a Carola conversation, the chat_document row that's created holds the filename, MIME type, size, and a storageKey — but not the bytes (see Search & RAG for the full attachment pipeline). The bytes are in MinIO/Blob under that key.

Three reasons the blob never touches Postgres:

  • Size. A 50 MB PDF as a bytea column bloats the table, the WAL, every backup, and every SELECT * that forgets to exclude it. Object stores are built for large opaque objects; relational tables are not.
  • Streaming. Serving a file means streaming bytes to a Response, not marshalling them through the query layer. Object stores hand you a readable stream; the database would force the whole blob through a driver round-trip first.
  • CDN / offload. A storageKey + a serving URL is exactly the shape a CDN or a signed-URL scheme wants. Keeping bytes out of the row leaves that door open — in Azure the blob already lives in a storage account that a CDN can front directly.

So the database is the index; object storage is the content. The row knows about the file (and, for attachments, stores extracted-and-embedded derivatives — chunks — for RAG); the store knows the file.


The provider seam

fusion-storage/src/index.ts defines one internal interface and two implementations of it. The interface is deliberately small:

// fusion-storage/src/index.ts — the whole contract the rest of the stack codes against.
type ObjectStore = {
	put(key: string, body: Buffer, contentType: string): Promise<void>;
	get(key: string): Promise<{ body: Buffer; contentType: string } | null>;
	remove(key: string): Promise<void>;
};
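Because the contract is only three methods, satisfying it is cheap. As an illustration, here is a hypothetical in-memory implementation (memoryStore is not part of fusion-storage) of the kind you might use as a test double so unit tests never touch MinIO or Azure. It repeats the ObjectStore type so it runs on its own, and it copies bytes on the way in and out so callers can't mutate stored objects.

```typescript
// Same shape as the ObjectStore contract in fusion-storage/src/index.ts.
type ObjectStore = {
	put(key: string, body: Buffer, contentType: string): Promise<void>;
	get(key: string): Promise<{ body: Buffer; contentType: string } | null>;
	remove(key: string): Promise<void>;
};

// Hypothetical in-memory implementation (not part of fusion-storage): a test double.
function memoryStore(): ObjectStore {
	const objects = new Map<string, { body: Buffer; contentType: string }>();
	return {
		async put(key, body, contentType) {
			// Copy so later mutation of the caller's buffer can't change the stored object.
			objects.set(key, { body: Buffer.from(body), contentType });
		},
		async get(key) {
			const hit = objects.get(key);
			// A miss is null, matching the "not found is a contract" rule.
			return hit ? { body: Buffer.from(hit.body), contentType: hit.contentType } : null;
		},
		async remove(key) {
			objects.delete(key); // Removing a missing key is a no-op here.
		},
	};
}
```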

minioStore() implements it with the minio client (S3-compatible — same protocol as AWS S3, just pointed at a MinIO endpoint); azureBlobStore() implements it with @azure/storage-blob. A cached store() picks one at first use from the environment:

// fusion-storage/src/index.ts — AZURE_STORAGE_ACCOUNT set → Blob; otherwise MinIO/S3.
let cached: ObjectStore | null = null;
function store(): ObjectStore {
	if (cached) return cached;
	cached = process.env.AZURE_STORAGE_ACCOUNT
		? azureBlobStore(process.env.AZURE_STORAGE_ACCOUNT)
		: minioStore();
	return cached;
}

The public functions (putObject, getObject, removeObject) are one-line wrappers around store().put/get/remove, so nothing that calls them can see which driver was chosen. A createServerFn handler calls putObject(key, bytes, mime) and gets identical behaviour whether it's hitting a MinIO container on a developer's laptop or an Azure storage account in production.
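A sketch of how thin that layer is, assuming the shapes above. The two real drivers are replaced by a hypothetical fakeStore so the example runs without MinIO or Azure; store() and the three wrappers follow the structure described here, but this is an illustration, not the package source. It also makes one consequence of caching visible: the driver is fixed at first use, so changing AZURE_STORAGE_ACCOUNT later in the same process does not switch stores.

```typescript
type StoredObject = { body: Buffer; contentType: string };
type ObjectStore = {
	put(key: string, body: Buffer, contentType: string): Promise<void>;
	get(key: string): Promise<StoredObject | null>;
	remove(key: string): Promise<void>;
};
type NamedStore = ObjectStore & { backend: "minio" | "azure-blob" };

// Hypothetical stand-in for minioStore() / azureBlobStore(): an in-memory map tagged
// with the backend it pretends to be.
function fakeStore(backend: NamedStore["backend"]): NamedStore {
	const objects = new Map<string, StoredObject>();
	return {
		backend,
		async put(key, body, contentType) {
			objects.set(key, { body, contentType });
		},
		async get(key) {
			return objects.get(key) ?? null;
		},
		async remove(key) {
			objects.delete(key);
		},
	};
}

let cached: NamedStore | null = null;
function store(): NamedStore {
	if (cached) return cached;
	// Decided once, at first use; changing the env afterwards does not switch drivers.
	cached = process.env.AZURE_STORAGE_ACCOUNT ? fakeStore("azure-blob") : fakeStore("minio");
	return cached;
}

// The public API: thin wrappers, so callers never learn which driver is active.
const putObject = (key: string, body: Buffer, contentType: string) => store().put(key, body, contentType);
const getObject = (key: string) => store().get(key);
const removeObject = (key: string) => store().remove(key);
```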

Concern       | MinIO / S3 (local, on-prem)              | Azure Blob (cloud)
------------- | ---------------------------------------- | ---------------------------------------------------
Selected when | AZURE_STORAGE_ACCOUNT is unset           | AZURE_STORAGE_ACCOUNT is set
Client        | minio                                    | @azure/storage-blob
Container     | bucket S3_BUCKET (default uploads)       | container AZURE_STORAGE_CONTAINER (default uploads)
Endpoint      | S3_ENDPOINT (e.g. http://localhost:9000) | https://<account>.blob.core.windows.net
Auth          | S3_ACCESS_KEY / S3_SECRET_KEY            | AZURE_STORAGE_KEY (account key)
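The table's selection rule and defaults fit in one pure function. This is a hypothetical sketch (readStorageConfig and its return shape are invented here; fusion-storage reads these variables inside its own drivers) that uses only the variable names and defaults from the table, which makes the rule easy to check without touching process.env.

```typescript
// Hypothetical: the table above as a pure function (names and shape are illustrative).
type Env = Record<string, string | undefined>;

type StorageConfig =
	| { driver: "azure-blob"; account: string; accountKey: string | undefined; container: string; endpoint: string }
	| {
			driver: "minio";
			endpoint: string | undefined;
			bucket: string;
			accessKey: string | undefined;
			secretKey: string | undefined;
	  };

function readStorageConfig(env: Env): StorageConfig {
	const account = env.AZURE_STORAGE_ACCOUNT;
	if (account) {
		// AZURE_STORAGE_ACCOUNT set → Azure Blob; the endpoint is derived from the account name.
		return {
			driver: "azure-blob",
			account,
			accountKey: env.AZURE_STORAGE_KEY,
			container: env.AZURE_STORAGE_CONTAINER ?? "uploads",
			endpoint: `https://${account}.blob.core.windows.net`,
		};
	}
	// Unset → MinIO/S3; the bucket defaults to "uploads", like the Azure container.
	return {
		driver: "minio",
		endpoint: env.S3_ENDPOINT,
		bucket: env.S3_BUCKET ?? "uploads",
		accessKey: env.S3_ACCESS_KEY,
		secretKey: env.S3_SECRET_KEY,
	};
}
```

In production you would pass process.env; in a test, a literal object.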

A few details worth knowing:

  • "Not found" is a contract, not an exception. getObject returns null for a missing object, but the underlying clients throw. The MinIO path guards statObject and translates the miss via isMinioNotFound(err) (which matches both NotFound and NoSuchKey, since S3-compatible servers disagree on the code) — and rethrows anything else (auth, network, bad bucket), because those are real failures, not absences.
  • Server-only, enforced by the bundler. The package ships a "browser" export condition that resolves to browser-stub.ts, where every export is a function that throws "fusion-storage is server-only…". So a stray client-side import fails loudly instead of trying to ship native deps and credentials to the browser. Server code gets the real module because Vite picks the "node" condition during SSR. In practice you never import it from a component anyway — only from createServerFn handlers, API routes, and jobs.
  • Azure auth today, hardening later. The Blob driver authenticates with the account key (AZURE_STORAGE_KEY). Switching to a managed identity is the documented hardening step — the account key is what the deploy wires in automatically via Bicep app settings.
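The not-found translation is worth getting exactly right, because treating a real failure as a miss would turn an auth or network error into a misleading 404. A minimal sketch, assuming errors from the client carry a string code property; orNullIfMissing and its injected lookup are hypothetical names for illustration, not the package's internals.

```typescript
// True only when a storage error means "no such object". S3-compatible servers
// disagree on which code a miss gets, so both are accepted.
function isMinioNotFound(err: unknown): boolean {
	if (typeof err !== "object" || err === null) return false;
	const code = (err as { code?: unknown }).code;
	return code === "NotFound" || code === "NoSuchKey";
}

// Hypothetical helper: run a lookup, turn a miss into null, rethrow everything else.
async function orNullIfMissing<T>(lookup: () => Promise<T>): Promise<T | null> {
	try {
		return await lookup();
	} catch (err) {
		if (isMinioNotFound(err)) return null; // absence is a normal result, not an error
		throw err; // auth, network, missing bucket (NoSuchBucket): real failures
	}
}
```

In a MinIO driver, the lookup would wrap the client's statObject call. Keeping the classifier narrow is the point: NoSuchBucket or AccessDenied still throws.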

Upload → store

The canonical writer is uploadChatDocument in example/src/lib/chat-document-server.ts — the server side of attaching a file to a Carola conversation. It's a createServerFn({ method: "POST" }) that takes multipart FormData. The storage-relevant core is the key/buffer/putObject/insert sequence in the handler:

example/src/lib/chat-document-server.ts
export const uploadChatDocument = createServerFn({ method: "POST" })
	.inputValidator((data: unknown) => {
		if (!(data instanceof FormData)) throw new Error("Expected multipart FormData");
		return data;
	})
	.handler(async ({ data }) => {
		const session = await requireSession();
		const file = data.get("file");
		const scope =
			String(data.get("scope") ?? "general").trim() === "project" ? "project" : "general";
		const scopeKey = scope === "project" ? String(data.get("scopeKey") ?? "").trim() || null : null;
		const threadId = String(data.get("threadId") ?? "").trim() || null;
		if (!(file instanceof File)) throw new Error("No file provided");
		if (file.size > MAX_BYTES) throw new Error("File too large (max 50 MB)");
		if (scope === "project" && !scopeKey) throw new Error("project scope requires a scopeKey");
 
		const key = uploadKey("chat-docs", extForName(file.name));
		const buffer = Buffer.from(await file.arrayBuffer());
		await putObject(key, buffer, file.type || "application/octet-stream");
 
		const id = generateShortId();
		await db.insert(chatDocument).values({
			id,
			ownerId: session.user.id,
			scope,
			scopeKey,
			filename: file.name,
			storageKey: key,
			mimeType: file.type || "",
			sizeBytes: file.size,
			status: "pending",
		});
		// Attach to the conversation it was uploaded into — it rides this chat across messages.
		if (threadId) {
			await db
				.insert(chatDocumentAttachment)
				.values({ documentId: id, threadId })
				.onConflictDoNothing();
		}
 
		await kickProcessing(id);
		const [latest] = await db
			.select({ status: chatDocument.status })
			.from(chatDocument)
			.where(eq(chatDocument.id, id))
			.limit(1);
		return { id, status: latest?.status ?? "pending", filename: file.name };
	});

Reading the storage-relevant steps:

  1. Generate a key. uploadKey("chat-docs", ext) produces a collision-resistant key under a prefix (chat-docs/<random>.<ext>). The prefix is just a namespace inside the one bucket — it keeps chat attachments separate from, say, the uploads/ images that storage-server.ts → uploadImage writes.
  2. Read the file into a Buffer. file.arrayBuffer() pulls the multipart body; Buffer.from makes it the Buffer the store API wants.
  3. putObject(key, buffer, mime). The bytes land in object storage. This is the only place the file content lives.
  4. Store the pointer. A chat_document row records storageKey: key (plus filename, MIME, size) and a short base62 id — never a UUID (the stack's no-UUID rule). The row is the database's handle on a blob it doesn't hold.
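The key format described in step 1 (a prefix, a random part derived from randomUUID(), then the extension) can be sketched as follows. This is one possible implementation, not fusion-storage's code: the base62 encoding of the UUID's bits and the extForName rule (last dot, lower-cased) are assumptions.

```typescript
import { randomUUID } from "node:crypto";

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Re-encode a UUID's 128 bits as base62 (a v4 UUID has 122 random bits; the rest are
// fixed version/variant bits). Result: URL-safe, no hyphens, about 22 characters.
function uuidToBase62(uuid: string): string {
	let n = BigInt("0x" + uuid.replace(/-/g, ""));
	let out = "";
	do {
		out = BASE62.charAt(Number(n % 62n)) + out;
		n /= 62n;
	} while (n > 0n);
	return out;
}

// Hypothetical: lower-cased extension of the original filename, or "" if it has none.
function extForName(name: string): string {
	const dot = name.lastIndexOf(".");
	if (dot <= 0 || dot === name.length - 1) return "";
	return name.slice(dot + 1).toLowerCase();
}

// prefix/<random>.<ext>; the prefix is only a namespace inside the one bucket.
function uploadKey(prefix: string, ext: string): string {
	const id = uuidToBase62(randomUUID());
	return ext ? `${prefix}/${id}.${ext}` : `${prefix}/${id}`;
}
```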

After this the row is pending and processing is kicked off out-of-band (extract → chunk → embed). That pipeline reads the bytes back with getObject(doc.storageKey) in chat-document-process.ts — the same key, the other direction:

example/src/lib/chat-document-process.ts
		const object = await getObject(doc.storageKey);
		if (!object) throw new Error("file bytes not found in storage");
 
		const extracted = await extractText(new Uint8Array(object.body), doc.filename, doc.mimeType);

The full extract → chunk → embed → index pipeline is documented in Search & RAG — this page is just the bytes-in, bytes-out seam underneath it.

Coming from Django? uploadChatDocument is your DRF @api_view POST handler: requireSession() is the permission check, the chat_document row is the model instance, and putObject is the FileField's save. The one twist versus a plain FileField: the bytes go to the store and the row additionally keeps derived data (the extracted text, chunked and embedded), because this file's job is to be searched by meaning, not merely downloaded.

A second, smaller writer — uploadImage in storage-server.ts — shows the same shape for images (validate the content-type, uploadKey("uploads", ext), putObject, return fileUrl(key)), useful when you just need a URL and no database row.
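A sketch of that image path, with the store call injected so it runs standalone. The allow-list of image types, the extension mapping, and the name uploadImageSketch are assumptions for illustration rather than what storage-server.ts actually accepts, and the key is built with randomUUID() as a stand-in for uploadKey("uploads", ext). It takes a Blob, which a File from FormData also is.

```typescript
import { randomUUID } from "node:crypto";

type Put = (key: string, body: Buffer, contentType: string) => Promise<void>;

// Assumed allow-list: image MIME type → file extension used in the key.
const IMAGE_TYPES: Record<string, string> = {
	"image/png": "png",
	"image/jpeg": "jpg",
	"image/webp": "webp",
	"image/gif": "gif",
};

// The serving URL the /api/files route answers (the key travels in the query string).
const fileUrl = (key: string) => `/api/files?key=${encodeURIComponent(key)}`;

// Hypothetical image writer: bytes into the store, a URL back, no database row.
async function uploadImageSketch(file: Blob, put: Put): Promise<string> {
	const ext = IMAGE_TYPES[file.type];
	if (!ext) throw new Error(`Unsupported image type: ${file.type || "unknown"}`);
	const key = `uploads/${randomUUID()}.${ext}`; // stand-in for uploadKey("uploads", ext)
	await put(key, Buffer.from(await file.arrayBuffer()), file.type);
	return fileUrl(key);
}
```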


Serving — /api/files?key=…

Object storage has no per-user authorization, and the bucket isn't public. So reading a blob back goes through an app route, example/src/routes/api/files.ts, that fetches the object by key and returns its bytes:

example/src/routes/api/files.ts
import { createFileRoute } from "@tanstack/react-router";
 
import { getObject } from "@tikab-interactive/fusion-storage";
 
// Serves a stored object by key — the URL fusion-storage's `fileUrl()` builds
// (`/api/files?key=…`). Server-only API route, so the storage driver never
// enters the client bundle.
export const Route = createFileRoute("/api/files")({
	server: {
		handlers: {
			GET: async ({ request }) => {
				const key = new URL(request.url).searchParams.get("key");
				if (!key) return new Response("Missing key", { status: 400 });
				const object = await getObject(key);
				if (!object) return new Response("Not found", { status: 404 });
				return new Response(new Uint8Array(object.body), {
					headers: {
						"Content-Type": object.contentType,
						"Cache-Control": "private, max-age=3600",
					},
				});
			},
		},
	},
});

fileUrl(key) (in fusion-storage) builds the URL this route answers: /api/files?key=<urlencoded key>. Note the key travels as a query param, not a path segment — on purpose. Vite's dev server intercepts request paths ending in an asset extension (.png, .jpg, …) and 404s them before they reach the route; keeping the extension inside the query string sidesteps that, and behaves identically in production.
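To see the query-string form round-trip, here is a small sketch that pairs a fileUrl-style builder with a handler mirroring the route's three outcomes (400, 404, bytes). This fileUrl and makeFilesHandler are illustrative re-implementations, and getObject is injected so the example needs no store; it assumes Node 18+ for the global Request and Response.

```typescript
type StoredObject = { body: Buffer; contentType: string };
type GetObject = (key: string) => Promise<StoredObject | null>;

// The key (slashes, dots and all) rides in the query string, URL-encoded.
function fileUrl(key: string): string {
	return `/api/files?key=${encodeURIComponent(key)}`;
}

// Same decisions as the /api/files GET handler: 400 without a key, 404 on a miss, bytes otherwise.
function makeFilesHandler(getObject: GetObject) {
	return async (request: Request): Promise<Response> => {
		// searchParams.get() decodes what encodeURIComponent encoded.
		const key = new URL(request.url).searchParams.get("key");
		if (!key) return new Response("Missing key", { status: 400 });
		const object = await getObject(key);
		if (!object) return new Response("Not found", { status: 404 });
		return new Response(new Uint8Array(object.body), {
			headers: { "Content-Type": object.contentType, "Cache-Control": "private, max-age=3600" },
		});
	};
}
```

Because URLSearchParams decodes what encodeURIComponent encoded, a key like chat-docs/abc123.png arrives intact, while the request path stays a plain /api/files that the dev server has no reason to treat as an asset.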

Owner-scoping — a key must not leak another user's file

The route above serves any key it's given. On its own, that's a hole: an attachment is owner-private, yet whoever holds its key can read it. So the guarantee has to be that nobody ever obtains (or guesses) the key of a file they don't own, and that scoping lives at the layer that hands out the URLs, not in the raw byte route:

  • A chat's files are only ever listed through listChatDocuments / getFileContent, which join chat_document_attachment → chat_document filtered by chatDocument.ownerId = session.user.id and the threadId. You only receive a fileUrl for a file your session owns.
  • Universal search only emits a file's URL for rows whose visibility is owner and whose ownerId matches the viewer — the authorization filters run at query time, so a search hit can never surface a key you couldn't already open.

Because keys are random base62 strings (uploadKey derives them from randomUUID()), they're unguessable, and they're only ever revealed to the owner. The general principle, authorize before you hand out a key instead of trusting whoever holds one, is the same one a Django media view applies when it checks request.user before serve()-ing a protected file; the difference is that here the check runs when the URL is issued rather than when the bytes are served.
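The URL-issuing rule from the first bullet amounts to a filter that runs before any URL leaves the server. Here it is over plain arrays; in the app it is a SQL join and WHERE clause inside listChatDocuments / getFileContent, and the row shapes and the name urlsForThread are simplified, hypothetical stand-ins.

```typescript
type ChatDocumentRow = { id: string; ownerId: string; filename: string; storageKey: string };
type AttachmentRow = { documentId: string; threadId: string };

const fileUrl = (key: string) => `/api/files?key=${encodeURIComponent(key)}`;

// Hypothetical in-memory version of "join attachments → documents, filter by owner and thread".
// Only documents the session user owns, attached to this thread, ever get a URL.
function urlsForThread(
	sessionUserId: string,
	threadId: string,
	documents: ChatDocumentRow[],
	attachments: AttachmentRow[],
): { id: string; filename: string; url: string }[] {
	const attachedIds = new Set(
		attachments.filter((a) => a.threadId === threadId).map((a) => a.documentId),
	);
	return documents
		.filter((d) => attachedIds.has(d.id) && d.ownerId === sessionUserId)
		.map((d) => ({ id: d.id, filename: d.filename, url: fileUrl(d.storageKey) }));
}
```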

Deep-linking into a page

The search index doesn't just point at the file — it points at the page. When collectDocuments() in search-index-server.ts mirrors a document's chunks into search_index, it builds the file route once per chunk and stamps each row's deep-link url with a #page=N anchor when the chunk knows its page:

example/src/lib/search-index-server.ts
		const fileUrl = `/api/files?key=${encodeURIComponent(doc.storageKey)}`;
		return [
			{
				entityType: "document" as const,
				entityId: doc.id,
				chunkOrdinal: c.ordinal,
				page: c.page,
				charStart: c.charStart,
				charEnd: c.charEnd,
				visibility: "owner" as const,
				ownerId: doc.ownerId,
				// Stamp the file's scope (cheap) so a future lens can scope-filter owner documents; the
				// universal-search pre-filter still matches owner rows by ownerId, so behaviour is unchanged.
				scopeKey: doc.scopeKey,
				title: doc.displayName ?? doc.filename,
				body: c.content,
				url: c.page ? `${fileUrl}#page=${c.page}` : fileUrl,
				// … remaining fields and closing brackets elided in this excerpt

So a universal-search hit on a 400-page PDF resolves to /api/files?key=…#page=312, and the browser's PDF viewer jumps straight to the cited page. The page numbers come from the extraction step (form-feed splitting); see Search & RAG for how page gets onto each chunk.
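Both halves of the page deep-link are small. In this sketch pageAt assumes, per the text above, that extraction separates pages with form-feed characters (\f), counting pages from 1, and deepLinkUrl mirrors the url expression in collectDocuments; both names are hypothetical. The #page=N fragment never reaches the server (browsers don't send fragments in requests), so /api/files only ever sees ?key=… and the PDF viewer acts on the fragment client-side.

```typescript
// 1-based page number for a character offset, assuming pages are separated by "\f".
function pageAt(text: string, charOffset: number): number {
	let page = 1;
	for (let i = 0; i < charOffset && i < text.length; i++) {
		if (text[i] === "\f") page++;
	}
	return page;
}

// Mirrors the url expression in collectDocuments: add #page=N only when the chunk knows its page.
function deepLinkUrl(storageKey: string, page: number | null): string {
	const fileUrl = `/api/files?key=${encodeURIComponent(storageKey)}`;
	return page ? `${fileUrl}#page=${page}` : fileUrl;
}
```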


Gotcha — Docker clock skew breaks MinIO

A sharp local-dev failure mode, worth recognising on sight. S3-compatible servers (MinIO included) reject requests whose signed timestamp is too far from the server's clock with RequestTimeTooSkewed. After your Mac sleeps, the Docker VM's clock drifts behind the host, so the signature on every putObject/getObject looks stale — and uploads and file-serving start failing silently (an upload row goes failed, an image won't load), with no obvious cause in the app.

The same drift also blocks Docker from pulling new images, so you can't always just "pull a fresh container" your way out of it.

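Fix: resync the Docker VM's clock. The reliable move is to set the VM clock from inside a container, using an image that's already present locally (no pull required). The command below assumes alpine is already on your machine; any image you have whose date command supports -s will do.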

# Resync the Docker VM clock to the host (uses an already-present image — no pull needed).
docker run --rm --privileged alpine date -s "$(date -u +%T)"

Then confirm the MinIO container (and, for attachments, the Kreuzberg extractor) is back up and retry the upload. The symptom to memorise: RequestTimeTooSkewed after the machine has been asleep → it's the VM clock, not your code.
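If you'd rather confirm the diagnosis than guess, compare the MinIO container's clock with the host's. A hedged sketch: it assumes MinIO's liveness endpoint (/minio/health/live) answers with a standard HTTP Date header, and it uses a 15-minute tolerance, the window AWS documents for S3 request signatures; MinIO's exact threshold may differ. skewSeconds and checkMinioClock are hypothetical helpers, and the probe needs Node 18+ for the global fetch.

```typescript
// Seconds by which the server's clock is ahead (+) or behind (-) the local clock.
function skewSeconds(serverDateHeader: string, localNow: Date): number {
	const server = Date.parse(serverDateHeader); // HTTP-date, e.g. "Tue, 04 Jun 2024 10:00:00 GMT"
	if (Number.isNaN(server)) throw new Error(`Unparseable Date header: ${serverDateHeader}`);
	return Math.round((server - localNow.getTime()) / 1000);
}

const MAX_SKEW_SECONDS = 15 * 60; // assumed tolerance (see lead-in)

// Hypothetical probe against the local MinIO container.
async function checkMinioClock(endpoint = "http://localhost:9000"): Promise<void> {
	const res = await fetch(`${endpoint}/minio/health/live`);
	const date = res.headers.get("date");
	if (!date) throw new Error("No Date header; cannot measure skew");
	const skew = skewSeconds(date, new Date());
	const verdict = Math.abs(skew) > MAX_SKEW_SECONDS ? "too skewed: resync the Docker VM clock" : "ok";
	console.log(`MinIO clock skew: ${skew}s (${verdict})`);
}
```

Run checkMinioClock() after the machine wakes; a large negative number means the VM clock is behind the host, which is the drift described above. The Date header has one-second resolution, which is plenty for a 15-minute window.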


Local vs cloud at a glance

Locally, MinIO runs as a container in example/docker-compose.yml, with a one-shot createbucket helper that makes the uploads bucket and exits:

# example/docker-compose.yml — MinIO + a one-shot bucket creator.
minio:
  image: minio/minio
  command: server /data --console-address ":9001"
  ports:
    - "${MINIO_PORT:-9000}:9000" # S3 API
    - "${MINIO_CONSOLE_PORT:-9001}:9001" # web console
  environment:
    MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
    MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
  volumes:
    - minio_data:/data
 
createbucket: # waits for MinIO, then `mc mb local/uploads`, then exits
  image: minio/mc
  depends_on: [minio]

The matching .env.local points the S3 driver at it (S3_ENDPOINT=http://localhost:9000, S3_BUCKET=uploads, minioadmin/minioadmin), and leaves AZURE_STORAGE_ACCOUNT unset — so the MinIO driver is the one store() selects. The MinIO web console is on localhost:9001 if you want to eyeball the bucket.

In the cloud, the deploy provisions an Azure storage account with a private uploads blob container and injects AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY / AZURE_STORAGE_CONTAINER as Container App settings (Bicep, example/infra/main.bicep). The same AZURE_STORAGE_ACCOUNT being set is what flips store() over to the Azure Blob driver — no code change, just environment. That's the whole point of the seam: identical putObject/getObject calls, a different store underneath.


See also

  • Search & RAG — the attachment pipeline that writes (putObject) and reads back (getObject) blobs: extract → chunk → embed → index, and the #page=N deep-links.
  • Deploy — how the Azure storage account + uploads container are provisioned and wired through AZURE_STORAGE_*.
  • Packages — where fusion-storage sits among the stack's packages.
  • Carola — the app whose conversation attachments are the main user of object storage.