Storage — object storage (files & blobs)
Anything a user uploads — a chat attachment, a profile image, a 1000-page PDF — is a blob:
a chunk of bytes that's too big to belong in a row, wants to be streamed, and may one day
sit behind a CDN. So it doesn't go in the database. It goes in object storage, and the
database keeps only a short storageKey pointer to it.
fusion-storage is the stack's object-storage seam. It exposes a tiny
server-only API — putObject / getObject / removeObject — over two interchangeable
drivers: an S3-compatible MinIO container locally and on-prem, Azure Blob in the
cloud. Which one runs is decided by environment variables; the calling code never knows the
difference.
Coming from Django? This is Django's storage-backend system, made explicit. A Django
FileField hides a Storage backend (FileSystemStorage by default, S3Boto3Storage from django-storages in production), and your model column stores a path/name, not the bytes. fusion-storage is the same split without the ORM magic: putObject(key, bytes) ≈ default_storage.save(name, content), getObject(key) ≈ default_storage.open(name), and the column on your row (storageKey) is the name you saved under. The /api/files route below is your media-serving view; the permission check is something you write by hand, because object storage doesn't do per-user auth.
Object storage, not the database
The rule across the stack: blobs live in object storage; the row stores a storageKey. When
you upload a file to a Carola conversation, the chat_document row that's created holds the
filename, MIME type, size, and a storageKey — but not the bytes (see
Search & RAG for the full attachment pipeline). The bytes are in MinIO/Blob under
that key.
Three reasons the blob never touches Postgres:
- Size. A 50 MB PDF as a bytea column bloats the table, the WAL, every backup, and every SELECT * that forgets to exclude it. Object stores are built for large opaque objects; relational tables are not.
- Streaming. Serving a file means streaming bytes to a Response, not marshalling them through the query layer. Object stores hand you a readable stream; the database would force the whole blob through a driver round-trip first.
- CDN / offload. A storageKey + a serving URL is exactly the shape a CDN or a signed-URL scheme wants. Keeping bytes out of the row leaves that door open: in Azure the blob already lives in a storage account that a CDN can front directly.
So the database is the index; object storage is the content. The row knows about the file (and, for attachments, stores extracted-and-embedded derivatives — chunks — for RAG); the store knows the file.
The provider seam
fusion-storage/src/index.ts defines one internal interface and two implementations of it. The
interface is deliberately small:
```typescript
// fusion-storage/src/index.ts: the whole contract the rest of the stack codes against.
type ObjectStore = {
  put(key: string, body: Buffer, contentType: string): Promise<void>;
  get(key: string): Promise<{ body: Buffer; contentType: string } | null>;
  remove(key: string): Promise<void>;
};
```

minioStore() implements it with the minio client
(S3-compatible — same protocol as AWS S3, just pointed at a MinIO endpoint); azureBlobStore()
implements it with @azure/storage-blob. A cached store() picks one at first use from the
environment:
```typescript
// fusion-storage/src/index.ts: AZURE_STORAGE_ACCOUNT set → Blob; otherwise MinIO/S3.
let cached: ObjectStore | null = null;

function store(): ObjectStore {
  if (cached) return cached;
  cached = process.env.AZURE_STORAGE_ACCOUNT
    ? azureBlobStore(process.env.AZURE_STORAGE_ACCOUNT)
    : minioStore();
  return cached;
}
```

The public functions (putObject, getObject, removeObject) are one-line wrappers that
just call store().put/get/remove. The driver choice is invisible above this line. A
createServerFn handler calls putObject(key, bytes, mime) and gets identical behaviour whether
it's hitting a MinIO container on a developer's laptop or an Azure storage account in production.
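
Because the contract is only three methods, a further driver is cheap to write. The sketch below is a hypothetical in-memory implementation of the same shape, the kind of thing you might use as a test double; it is not part of fusion-storage. The names memoryStore and ObjectStoreSketch are made up for illustration, and Uint8Array stands in for Buffer so the example does not depend on Node-only types.

```typescript
// Hypothetical test double; mirrors the three-method ObjectStore shape described above.
// Assumption: Uint8Array stands in for Buffer so the sketch has no Node-only types.
type StoredObject = { body: Uint8Array; contentType: string };

type ObjectStoreSketch = {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
  get(key: string): Promise<StoredObject | null>;
  remove(key: string): Promise<void>;
};

function memoryStore(): ObjectStoreSketch {
  const objects = new Map<string, StoredObject>();
  return {
    async put(key, body, contentType) {
      // Copy the bytes so later mutation of the caller's array cannot change the stored object.
      objects.set(key, { body: body.slice(), contentType });
    },
    async get(key) {
      // A miss is null, never an exception: the same contract getObject exposes.
      return objects.get(key) ?? null;
    },
    async remove(key) {
      objects.delete(key);
    },
  };
}
```

Code written against the three methods cannot tell this apart from the MinIO or Azure driver, which is the property the seam is meant to provide.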
| Concern | MinIO / S3 (local, on-prem) | Azure Blob (cloud) |
|---|---|---|
| Selected when | AZURE_STORAGE_ACCOUNT is unset | AZURE_STORAGE_ACCOUNT is set |
| Client | minio | @azure/storage-blob |
| Container | bucket S3_BUCKET (default uploads) | container AZURE_STORAGE_CONTAINER (default uploads) |
| Endpoint | S3_ENDPOINT (e.g. http://localhost:9000) | https://<account>.blob.core.windows.net |
| Auth | S3_ACCESS_KEY / S3_SECRET_KEY | AZURE_STORAGE_KEY (account key) |
A few details worth knowing:
- "Not found" is a contract, not an exception. getObject returns null for a missing object, but the underlying clients throw. The MinIO path guards statObject and translates the miss via isMinioNotFound(err) (which matches both NotFound and NoSuchKey, since S3-compatible servers disagree on the code), and rethrows anything else (auth, network, bad bucket), because those are real failures, not absences.
- Server-only, enforced by the bundler. The package ships a "browser" export condition that resolves to browser-stub.ts, where every export is a function that throws "fusion-storage is server-only…". So a stray client-side import fails loudly instead of trying to ship native deps and credentials to the browser. Server code gets the real module because Vite picks the "node" condition during SSR. In practice you never import it from a component anyway; only from createServerFn handlers, API routes, and jobs.
- Azure auth today, hardening later. The Blob driver authenticates with the account key (AZURE_STORAGE_KEY). Switching to a managed identity is the documented hardening step; the account key is what the deploy wires in automatically via Bicep app settings.
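
The "not found is a contract" rule can be sketched as a small wrapper. This is an illustration under assumptions, not the package's code: looksLikeNotFound and nullOnMiss are hypothetical names, and the sketch assumes the client attaches a string code property to the errors it throws. The real isMinioNotFound may inspect errors differently.

```typescript
// Hypothetical helpers; isMinioNotFound in fusion-storage may be implemented differently.
// Assumption: the client attaches a string `code` to the errors it throws.
function looksLikeNotFound(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return code === "NotFound" || code === "NoSuchKey";
}

// Run a read that throws on a miss, and turn only the miss into null.
async function nullOnMiss<T>(read: () => Promise<T>): Promise<T | null> {
  try {
    return await read();
  } catch (err) {
    if (looksLikeNotFound(err)) return null; // absence: part of the contract
    throw err; // auth, network, bad bucket: real failures, surfaced to the caller
  }
}
```

The design choice worth noticing is the narrow match: anything that is not positively identified as a miss is rethrown, so a misconfigured bucket never masquerades as an empty one.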
Upload → store
The canonical writer is uploadChatDocument in example/src/lib/chat-document-server.ts — the
server side of attaching a file to a Carola conversation. It's a createServerFn({ method: "POST" })
that takes multipart FormData. The storage-relevant core is the key/buffer/putObject/insert
sequence in the handler:
```typescript
export const uploadChatDocument = createServerFn({ method: "POST" })
  .inputValidator((data: unknown) => {
    if (!(data instanceof FormData)) throw new Error("Expected multipart FormData");
    return data;
  })
  .handler(async ({ data }) => {
    const session = await requireSession();
    const file = data.get("file");
    const scope =
      String(data.get("scope") ?? "general").trim() === "project" ? "project" : "general";
    const scopeKey = scope === "project" ? String(data.get("scopeKey") ?? "").trim() || null : null;
    const threadId = String(data.get("threadId") ?? "").trim() || null;

    if (!(file instanceof File)) throw new Error("No file provided");
    if (file.size > MAX_BYTES) throw new Error("File too large (max 50 MB)");
    if (scope === "project" && !scopeKey) throw new Error("project scope requires a scopeKey");

    const key = uploadKey("chat-docs", extForName(file.name));
    const buffer = Buffer.from(await file.arrayBuffer());
    await putObject(key, buffer, file.type || "application/octet-stream");

    const id = generateShortId();
    await db.insert(chatDocument).values({
      id,
      ownerId: session.user.id,
      scope,
      scopeKey,
      filename: file.name,
      storageKey: key,
      mimeType: file.type || "",
      sizeBytes: file.size,
      status: "pending",
    });

    // Attach to the conversation it was uploaded into; it rides this chat across messages.
    if (threadId) {
      await db
        .insert(chatDocumentAttachment)
        .values({ documentId: id, threadId })
        .onConflictDoNothing();
    }

    await kickProcessing(id);

    const [latest] = await db
      .select({ status: chatDocument.status })
      .from(chatDocument)
      .where(eq(chatDocument.id, id))
      .limit(1);
    return { id, status: latest?.status ?? "pending", filename: file.name };
  });
```

Reading the storage-relevant steps:
- Generate a key. uploadKey("chat-docs", ext) produces a collision-resistant key under a prefix (chat-docs/<random>.<ext>). The prefix is just a namespace inside the one bucket; it keeps chat attachments separate from, say, the uploads/ images that storage-server.ts → uploadImage writes.
- Read the file into a Buffer. file.arrayBuffer() pulls the multipart body; Buffer.from makes it the Buffer the store API wants.
- putObject(key, buffer, mime). The bytes land in object storage. This is the only place the file content lives.
- Store the pointer. A chat_document row records storageKey: key (plus filename, MIME, size) and a short base62 id, never a UUID (the stack's no-UUID rule). The row is the database's handle on a blob it doesn't hold.
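
The real uploadKey is not shown on this page, so the following is only a sketch of how a prefix/<random>.<ext> key could be built. It assumes the random part is a UUID's hex digits re-encoded in base62; hexToBase62 and uploadKeySketch are hypothetical names, and the UUID is passed in by the caller (for example from randomUUID() in node:crypto) so the function stays pure.

```typescript
// Hypothetical sketch; the real uploadKey in fusion-storage is not shown on this page.
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Re-encode a hex string in base62 by repeated long division (no BigInt needed).
function hexToBase62(hex: string): string {
  let digits = hex.split("").map((c) => parseInt(c, 16)); // base-16 digits, most significant first
  let out = "";
  while (digits.length > 0) {
    let remainder = 0;
    const quotient: number[] = [];
    for (const d of digits) {
      const acc = remainder * 16 + d;
      const q = Math.floor(acc / 62);
      remainder = acc % 62;
      if (quotient.length > 0 || q > 0) quotient.push(q); // drop leading zeros
    }
    out = BASE62.charAt(remainder) + out; // each pass yields the next least significant digit
    digits = quotient;
  }
  return out;
}

// Assumption: the caller supplies the UUID, for example randomUUID() from node:crypto.
function uploadKeySketch(prefix: string, ext: string, uuid: string): string {
  const id = hexToBase62(uuid.replace(/-/g, ""));
  return ext ? `${prefix}/${id}.${ext}` : `${prefix}/${id}`;
}
```

A 128-bit UUID fits in at most 22 base62 characters, so keys built this way stay short while keeping the UUID's randomness.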
After this the row is pending and processing is kicked off out-of-band (extract → chunk → embed).
That pipeline reads the bytes back with getObject(doc.storageKey) in
chat-document-process.ts — the same key, the other direction:
```typescript
const object = await getObject(doc.storageKey);
if (!object) throw new Error("file bytes not found in storage");
const extracted = await extractText(new Uint8Array(object.body), doc.filename, doc.mimeType);
```

The full extract → chunk → embed → index pipeline is documented in Search & RAG; this page is just the bytes-in, bytes-out seam underneath it.
Coming from Django? uploadChatDocument is your DRF @api_view POST handler: requireSession() is the permission check, the chat_document row is the model instance, and putObject is the FileField's save. The one twist versus a plain FileField: the bytes go to the store and the row additionally keeps derived data (the extracted text, chunked and embedded), because this file's job is to be searched by meaning, not merely downloaded.
A second, smaller writer — uploadImage in storage-server.ts — shows the same shape for images
(validate the content-type, uploadKey("uploads", ext), putObject, return fileUrl(key)),
useful when you just need a URL and no database row.
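
The "validate the content-type" step of an image upload could look like the sketch below. The allow-list and the name imageExtFor are assumptions for illustration; this page does not list which types uploadImage actually accepts.

```typescript
// Hypothetical allow-list; the types uploadImage actually accepts are not listed on this page.
const IMAGE_EXTENSIONS = new Map<string, string>([
  ["image/png", "png"],
  ["image/jpeg", "jpg"],
  ["image/webp", "webp"],
  ["image/gif", "gif"],
]);

// Return the extension for an accepted image type, or null to reject the upload.
function imageExtFor(contentType: string): string | null {
  // Drop parameters such as "; charset=binary" and normalise case before the lookup.
  const base = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return IMAGE_EXTENSIONS.get(base) ?? null;
}
```

Deriving the extension from a validated content type, instead of trusting the uploaded filename, keeps the key's suffix consistent with what the serving route will later send as Content-Type.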
Serving — /api/files?key=…
Object storage has no per-user authorization, and the bucket isn't public. So reading a blob back
goes through an app route that streams it — example/src/routes/api/files.ts:
```typescript
import { createFileRoute } from "@tanstack/react-router";
import { getObject } from "@tikab-interactive/fusion-storage";

// Serves a stored object by key: the URL fusion-storage's `fileUrl()` builds
// (`/api/files?key=…`). Server-only API route, so the storage driver never
// enters the client bundle.
export const Route = createFileRoute("/api/files")({
  server: {
    handlers: {
      GET: async ({ request }) => {
        const key = new URL(request.url).searchParams.get("key");
        if (!key) return new Response("Missing key", { status: 400 });
        const object = await getObject(key);
        if (!object) return new Response("Not found", { status: 404 });
        return new Response(new Uint8Array(object.body), {
          headers: {
            "Content-Type": object.contentType,
            "Cache-Control": "private, max-age=3600",
          },
        });
      },
    },
  },
});
```

fileUrl(key) (in fusion-storage) builds the URL this route answers:
/api/files?key=<urlencoded key>. Note the key travels as a query param, not a path segment —
on purpose. Vite's dev server intercepts request paths ending in an asset extension (.png,
.jpg, …) and 404s them before they reach the route; keeping the extension inside the query string
sidesteps that, and behaves identically in production.
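
A minimal sketch of both ends of that URL, assuming nothing beyond the /api/files?key=… shape described above. fileUrlSketch and keyFromRequestUrl are illustrative names, not the package's exports.

```typescript
// Sketch of the URL shape described above; not the package's actual source.
function fileUrlSketch(key: string): string {
  // encodeURIComponent also escapes the "/" in the prefix, so the key survives as one parameter.
  return `/api/files?key=${encodeURIComponent(key)}`;
}

// The route side: recover the key from an absolute request URL (null when the parameter is absent).
function keyFromRequestUrl(requestUrl: string): string | null {
  return new URL(requestUrl).searchParams.get("key");
}
```

For a key such as chat-docs/aB3.png the path stays /api/files, with the extension only inside the query string, which is the property the paragraph above relies on.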
Owner-scoping — a key must not leak another user's file
The route above streams any key it's given. On its own, that's a hole: an attachment is owner-private, so possessing (or guessing) a key must never be enough to read someone else's file. The scoping lives at the layer that hands out the URLs, not in the raw byte route:
- A chat's files are only ever listed through listChatDocuments / getFileContent, which join chat_document_attachment → chat_document filtered by chatDocument.ownerId = session.user.id and the threadId. You only receive a fileUrl for a file your session owns.
- Universal search only emits a file's URL for rows whose visibility is owner and whose ownerId matches the viewer; the authorization filters run at query time, so a search hit can never surface a key you couldn't already open.
Because keys are random base62 (uploadKey → randomUUID()-derived), they're unguessable, and
they're only ever revealed to the owner. The general principle — don't trust the key alone,
scope at the URL-issuing layer — is the same one you'd apply to a Django media view that checks
request.user before serve()-ing a protected file.
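
Scoping at the URL-issuing layer can be pictured as a pure filter over rows. The sketch below is an in-memory stand-in for the join described above, with hypothetical row shapes and a made-up function name; the real code runs this as a database query.

```typescript
// Hypothetical row shapes; the real tables have more columns.
type DocRow = { id: string; ownerId: string; storageKey: string };
type AttachmentRow = { documentId: string; threadId: string };

// Issue file URLs only for documents attached to this thread AND owned by this user.
function fileUrlsForThread(
  userId: string,
  threadId: string,
  docs: DocRow[],
  attachments: AttachmentRow[],
): string[] {
  const attached = new Set(
    attachments.filter((a) => a.threadId === threadId).map((a) => a.documentId),
  );
  return docs
    .filter((d) => attached.has(d.id) && d.ownerId === userId)
    .map((d) => `/api/files?key=${encodeURIComponent(d.storageKey)}`);
}
```

Both conditions have to hold before a key leaves the server: being attached to the thread is not enough if the document belongs to someone else.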
Deep-linking into a page
The search index doesn't just point at the file — it points at the page. When
collectDocuments() in search-index-server.ts mirrors a document's chunks into search_index,
it builds the file route once per chunk and stamps each row's deep-link url with a #page=N
anchor when the chunk knows its page:
```typescript
const fileUrl = `/api/files?key=${encodeURIComponent(doc.storageKey)}`;
return [
  {
    entityType: "document" as const,
    entityId: doc.id,
    chunkOrdinal: c.ordinal,
    page: c.page,
    charStart: c.charStart,
    charEnd: c.charEnd,
    visibility: "owner" as const,
    ownerId: doc.ownerId,
    // Stamp the file's scope (cheap) so a future lens can scope-filter owner documents; the
    // universal-search pre-filter still matches owner rows by ownerId, so behaviour is unchanged.
    scopeKey: doc.scopeKey,
    title: doc.displayName ?? doc.filename,
    body: c.content,
    url: c.page ? `${fileUrl}#page=${c.page}` : fileUrl,
```

So a universal-search hit on a 500-page PDF resolves to /api/files?key=…#page=412, and the
browser's PDF viewer jumps straight to the cited page. The page numbers come from the extraction
step (form-feed splitting) — see Search & RAG for how page gets onto each chunk.
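
To make the two halves concrete, here is a sketch of form-feed page numbering and of the deep-link rule. pagesFromText and deepLink are hypothetical names, and the splitter is a simplification: the real extraction step is documented in Search & RAG and may handle edge cases this does not.

```typescript
// Hypothetical sketch: split extracted text on form feeds and number the pages from 1.
type Page = { page: number; text: string };

function pagesFromText(extracted: string): Page[] {
  return extracted.split("\f").map((text, i) => ({ page: i + 1, text }));
}

// Same rule as the url field above: add #page=N only when the page is known.
function deepLink(storageKey: string, page: number | null): string {
  const fileUrl = `/api/files?key=${encodeURIComponent(storageKey)}`;
  return page ? `${fileUrl}#page=${page}` : fileUrl;
}
```

Because the page travels in the URL fragment, it never reaches the server: the route serves the same bytes either way, and only the browser's PDF viewer acts on the anchor.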
Gotcha — Docker clock skew breaks MinIO
A sharp local-dev failure mode, worth recognising on sight. S3-compatible servers (MinIO included)
reject requests whose signed timestamp is too far from the server's clock with
RequestTimeTooSkewed. After your Mac sleeps, the Docker VM's clock drifts behind the host, so
the signature on every putObject/getObject looks stale — and uploads and file-serving start
failing silently (an upload row goes failed, an image won't load), with no obvious cause in
the app.
The same drift also blocks Docker from pulling new images, so you can't always just "pull a fresh container" your way out of it.
Fix: resync the Docker VM's clock. The reliable move is to set the VM clock from inside, using an image that's already present (no pull required):
```shell
# Resync the Docker VM clock to the host (uses an already-present image, so no pull is needed).
docker run --rm --privileged alpine date -s "$(date -u +%T)"
```

Then confirm the MinIO container (and, for attachments, the Kreuzberg extractor) is back up and
retry the upload. The symptom to memorise: RequestTimeTooSkewed after the machine has been
asleep → it's the VM clock, not your code.
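
If you want the app to point at this cause instead of failing quietly, a small check along these lines may help. It is a sketch under assumptions: isTooSkewed and skewHint are hypothetical helpers, the error is assumed to carry a string code property, and the 15-minute tolerance is the limit commonly documented for S3-style request signing, which your server may configure differently.

```typescript
// Assumption: a 15-minute tolerance, the limit commonly documented for S3-style request signing;
// check your server's configuration before relying on this number.
const MAX_SKEW_MS = 15 * 60 * 1000;

// True when the two clocks disagree by more than the tolerance, in either direction.
function isTooSkewed(clientMs: number, serverMs: number, toleranceMs: number = MAX_SKEW_MS): boolean {
  return Math.abs(clientMs - serverMs) > toleranceMs;
}

// Hypothetical helper: turn a storage error into a hint for the person debugging it.
function skewHint(err: unknown): string | null {
  const code =
    typeof err === "object" && err !== null ? (err as { code?: unknown }).code : undefined;
  return code === "RequestTimeTooSkewed"
    ? "Clock skew between the client and the storage server: resync the Docker VM clock."
    : null;
}
```

One possible source for serverMs is the Date header on any HTTP response from the storage endpoint, compared against Date.now() on the client.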
Local vs cloud at a glance
Locally, MinIO runs as a container in example/docker-compose.yml, with a one-shot createbucket
helper that makes the uploads bucket and exits:
```yaml
# example/docker-compose.yml: MinIO + a one-shot bucket creator.
minio:
  image: minio/minio
  command: server /data --console-address ":9001"
  ports:
    - "${MINIO_PORT:-9000}:9000" # S3 API
    - "${MINIO_CONSOLE_PORT:-9001}:9001" # web console
  environment:
    MINIO_ROOT_USER: ${S3_ACCESS_KEY:-minioadmin}
    MINIO_ROOT_PASSWORD: ${S3_SECRET_KEY:-minioadmin}
  volumes:
    - minio_data:/data
createbucket: # waits for MinIO, then `mc mb local/uploads`, then exits
  image: minio/mc
  depends_on: [minio]
```

The matching .env.local points the S3 driver at it (S3_ENDPOINT=http://localhost:9000,
S3_BUCKET=uploads, minioadmin/minioadmin), and leaves AZURE_STORAGE_ACCOUNT unset — so
the MinIO driver is the one store() selects. The MinIO web console is on
localhost:9001 if you want to eyeball the bucket.
In the cloud, the deploy provisions an Azure storage account with a private
uploads blob container and injects AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY /
AZURE_STORAGE_CONTAINER as Container App settings (Bicep, example/infra/main.bicep). The same
AZURE_STORAGE_ACCOUNT being set is what flips store() over to the Azure Blob driver — no code
change, just environment. That's the whole point of the seam: identical putObject/getObject
calls, a different store underneath.
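
The environment-driven switch can be restated as one pure function over the variables and defaults from the table earlier on this page. resolveDriver and its return shape are hypothetical; the real store() constructs client objects instead of returning a config record.

```typescript
// Hypothetical resolver; it restates the documented variables and defaults as one pure function.
type Env = Record<string, string | undefined>;

type DriverConfig =
  | { driver: "azure-blob"; account: string; container: string; endpoint: string }
  | { driver: "minio"; bucket: string; endpoint: string | undefined };

function resolveDriver(env: Env): DriverConfig {
  const account = env["AZURE_STORAGE_ACCOUNT"];
  if (account) {
    // AZURE_STORAGE_ACCOUNT set: Azure Blob, with the container defaulting to "uploads".
    return {
      driver: "azure-blob",
      account,
      container: env["AZURE_STORAGE_CONTAINER"] || "uploads",
      endpoint: `https://${account}.blob.core.windows.net`,
    };
  }
  // Otherwise MinIO/S3, with the bucket defaulting to "uploads".
  return { driver: "minio", bucket: env["S3_BUCKET"] || "uploads", endpoint: env["S3_ENDPOINT"] };
}
```

Passing the environment in as an argument, instead of reading process.env inside the function, makes the selection rule easy to unit-test for both deployments.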
See also
- Search & RAG: the attachment pipeline that writes (putObject) and reads back (getObject) blobs: extract → chunk → embed → index, and the #page=N deep-links.
- Deploy: how the Azure storage account + uploads container are provisioned and wired through AZURE_STORAGE_*.
- Packages: where fusion-storage sits among the stack's packages.
- Carola: the app whose conversation attachments are the main user of object storage.