AI
fusion-ai is the stack's AI seam. It's
provider-pluggable — local Ollama or hosted OpenRouter — chosen from the
environment or, per request, from a cookie. It's also the one AI import: it
re-exports the @tanstack/ai pieces you need (toolDefinition, maxIterations) so
features build against fusion-ai, not the underlying library. When no provider is
configured it returns null (one-shot) or a 503 (streaming), so the AI path is
always present and degrades gracefully.
Coming from Django? There's no settings.LLM_BACKEND, no model registry, no Celery task wrapping the call. fusion-ai is a thin function module (src/index.ts): it reads process.env at call time, picks a @tanstack/ai adapter, and runs it. The "config" is four env vars and a cookie. There is no singleton and no app-ready hook; the seam is stateless, so a request that flips the provider cookie hits a different backend with no restart.
The provider seam
Everything funnels through src/index.ts in fusion-ai. There are two real backends,
each a @tanstack/ai adapter. Your feature code never imports either model SDK;
the package owns that dependency:
- Ollama (@tanstack/ai-ollama → createOllamaChat): local / on-prem inference against an HTTP daemon (http://localhost:11434). The air-gapped default.
- OpenRouter (@tanstack/ai-openrouter → openRouterText): hosted inference over a single API key, fronting Anthropic / OpenAI / etc.
What selects a provider
Three selectors, from lowest to highest authority:
- envProvider(): the baseline. OLLAMA_BASE_URL set ⇒ "ollama"; else OPENROUTER_API_KEY set ⇒ "openrouter"; else null. Ollama wins when both are set: an explicit OLLAMA_BASE_URL reads as air-gap intent, so a box that has both defaults to staying local.
- providerFromRequest(request): the per-request override that streamChat uses. It regex-matches the ai_provider cookie off the request headers and defaults to OpenRouter when the cookie is absent. This is what the sandbox provider toggle flips, so you can switch backends with no reload.
- An explicit provider option: pass { provider: "ollama" } to any entry point and it wins outright. The proactive agent uses this to pin a provider for residency (below).
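The three levels above can be sketched in isolation. This is a minimal illustration, not the package source: the environment is passed in as a plain object instead of reading process.env, the cookie arrives as a header string rather than a Request, and resolveProvider is a hypothetical helper that only exists here to show the ordering. In the package, the cookie level applies to streamChat only.

```typescript
type AiProvider = "ollama" | "openrouter";
type Env = Record<string, string | undefined>;

// Baseline: Ollama wins when both variables are set (air-gap intent).
function envProvider(env: Env): AiProvider | null {
  if (env["OLLAMA_BASE_URL"]) return "ollama";
  if (env["OPENROUTER_API_KEY"]) return "openrouter";
  return null;
}

// Per-request override: read the ai_provider cookie, default to OpenRouter.
function providerFromCookieHeader(cookie: string): AiProvider {
  return /(?:^|;\s*)ai_provider=ollama(?:;|$)/.test(cookie) ? "ollama" : "openrouter";
}

// Hypothetical combinator (not a fusion-ai export): explicit option first,
// then the cookie if a request is in play, then the environment.
function resolveProvider(opts: {
  explicit?: AiProvider;
  cookie?: string;
  env: Env;
}): AiProvider | null {
  if (opts.explicit) return opts.explicit;
  if (opts.cookie !== undefined) return providerFromCookieHeader(opts.cookie);
  return envProvider(opts.env);
}
```

Note that the cookie level never yields null: an absent cookie still selects OpenRouter, and it is the adapter lookup that later reports whether that provider is actually configured.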
getTextAdapter({ provider, model }) resolves the choice to a concrete adapter:
createOllamaChat(model, url) for Ollama, openRouterText(model) for OpenRouter — or
null when the chosen provider's env var is missing. That null is the whole
graceful-degradation story: every entry point checks it and falls back rather than
throwing.
// src/index.ts: the resolver, condensed
export function getTextAdapter(options: AdapterOptions = {}) {
  const provider = options.provider ?? envProvider();
  if (provider === "ollama") {
    const url = process.env.OLLAMA_BASE_URL;
    return url ? createOllamaChat(modelFor("ollama", options.model), url) : null;
  }
  if (provider === "openrouter") {
    return process.env.OPENROUTER_API_KEY
      ? openRouterText(modelFor("openrouter", options.model))
      : null;
  }
  return null; // nothing configured → caller degrades
}
The model id
modelFor(provider, override) picks the model string: an explicit model option wins,
else the provider's env var (OLLAMA_MODEL / OPENROUTER_MODEL), else a baked-in
default from DEFAULT_MODELS. aiModelId() exposes the resolved id (and returns
"stub" when AI is off), so a response can tell the UI which model answered.
| Var | Effect |
|---|---|
| OPENROUTER_API_KEY | enable hosted inference (OpenRouter) |
| OPENROUTER_MODEL | hosted model (default anthropic/claude-haiku-4.5) |
| OLLAMA_BASE_URL | enable local inference, e.g. http://localhost:11434 |
| OLLAMA_MODEL | local model (default llama3.3:70b, sized for a GPU box) |
| OPENROUTER_REASONING_MODEL / OLLAMA_REASONING_MODEL | model used by the reasoning sandbox |
The hosted default is Haiku, not Opus — the deployments are small, so fusion-ai
favours cheap + tool-capable and lets you bump OPENROUTER_MODEL when a task needs more
power. The Ollama default llama3.3:70b is a GPU-box flagship; on a laptop set
OLLAMA_MODEL=llama3.2:1b (cheat sheet at the bottom).
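The model resolution order described above (explicit option, then the provider's env var, then a baked-in default) can be sketched as follows. The signatures are assumptions: the env is passed in explicitly, resolvedModelId is a hypothetical stand-in for aiModelId(), and treating an empty env var as unset is a choice made for this sketch.

```typescript
type AiProvider = "ollama" | "openrouter";
type Env = Record<string, string | undefined>;

// Defaults as listed in the table above.
const DEFAULT_MODELS: Record<AiProvider, string> = {
  ollama: "llama3.3:70b",
  openrouter: "anthropic/claude-haiku-4.5",
};

const MODEL_ENV_VAR: Record<AiProvider, string> = {
  ollama: "OLLAMA_MODEL",
  openrouter: "OPENROUTER_MODEL",
};

// Explicit override wins, then the env var, then the default.
// `||` (not `??`) so that an empty string falls through to the next level.
function modelFor(provider: AiProvider, override: string | undefined, env: Env): string {
  return override || env[MODEL_ENV_VAR[provider]] || DEFAULT_MODELS[provider];
}

// Hypothetical stand-in for aiModelId(): "stub" when no provider is active.
function resolvedModelId(provider: AiProvider | null, env: Env): string {
  return provider ? modelFor(provider, undefined, env) : "stub";
}
```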
Graceful degradation — the app always runs
isAiEnabled() is just Boolean(OLLAMA_BASE_URL || OPENROUTER_API_KEY). With neither
set, getTextAdapter returns null everywhere, and each entry point has a defined
no-AI behaviour:
- generateText / generateObject → return null; the caller renders a "configure AI" affordance instead of erroring (see src/lib/ai-server.ts, which maps null to { configured: false }).
- streamChat → returns a 503 Response with the provider name, so the chat UI shows it's off rather than hanging.
- the proactive agent → swaps the LLM finding-model for a deterministic demo model (demoFindingModel in src/lib/agent-config.ts) that emits a canned finding for any signal tagged [URGENT]. The whole feature is demoable and e2e-testable with no LLM at all.
So the same build boots and serves with zero AI config — handy for CI, a closed network,
or a first-run developer who hasn't touched .env.local.
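To make the deterministic fallback concrete, here is a rough sketch of what a demo finding model of this kind could look like. The DemoFinding shape and the function name are hypothetical; the real demoFindingModel and Finding type live in the app and may differ in every field.

```typescript
// Hypothetical finding shape, for illustration only.
interface DemoFinding {
  title: string;
  severity: "urgent";
  source: string;
}

// No LLM involved: any signal tagged [URGENT] yields one canned finding,
// everything else is ignored. Same input always gives the same output.
function demoFindings(signals: string[]): DemoFinding[] {
  return signals
    .filter((signal) => signal.includes("[URGENT]"))
    .map((signal) => ({
      title: "Urgent signal detected",
      severity: "urgent" as const,
      source: signal,
    }));
}
```

Because the output depends only on the input, an e2e test can seed one [URGENT] signal and assert on exactly one delivered finding.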
Two entry points
generateText(prompt, opts) — one shot, returns { text, model } or null.
The app does auth and reads the provider toggle; fusion-ai owns the AI side. Internally
it builds a single user ModelMessage, runs chat({ adapter, messages, systemPrompts }),
and collects the full reply with streamToText (no streaming to the client):
import { getRequestHeaders } from "@tanstack/react-start/server";
import { createServerFn } from "@tanstack/react-start";
import { z } from "zod";
import { type AiProvider, generateText } from "@tikab-interactive/fusion-ai";
import { getSession } from "#/lib/auth";
/**
 * Result of the AI sandbox prompt. `configured: false` is the graceful-degrade
 * path: fusion-ai returned no adapter (the chosen provider isn't configured),
 * so the page shows how to turn AI on instead of erroring.
 */
export type AiSandboxResult =
  | { configured: false }
  | { configured: true; model: string; text: string };
/** The provider the sandbox toggle chose (the `ai_provider` cookie), default OpenRouter. */
function providerFromCookie(): AiProvider {
  const cookie = new Headers(getRequestHeaders()).get("cookie") ?? "";
  return /(?:^|;\s*)ai_provider=ollama(?:;|$)/.test(cookie) ? "ollama" : "openrouter";
}
/**
 * One-shot completion for the AI sandbox. The app does auth + validation + reads
 * the provider toggle; fusion-ai's `generateText` owns the AI side and returns
 * null when the provider isn't configured.
 */
export const runAiPrompt = createServerFn({ method: "POST" })
  .inputValidator(z.object({ prompt: z.string().min(1).max(4000) }).parse)
  .handler(async ({ data }): Promise<AiSandboxResult> => {
    const session = await getSession();
    if (!session) throw new Response("Unauthorized", { status: 401 });
    const result = await generateText(data.prompt, {
      provider: providerFromCookie(),
      systemPrompts: ["You are a helpful assistant. Answer concisely."],
    });
    return result ? { configured: true, ...result } : { configured: false };
  });
streamChat(request, opts): the whole server side of a streaming chat
endpoint. It picks the provider, parses the useChat request, merges in any
client-declared tools, runs the model, and returns a Server-Sent-Events Response.
A route handler just authenticates and hands off:
import { createFileRoute } from "@tanstack/react-router";
import { streamChat } from "@tikab-interactive/fusion-ai";
import { getSession } from "#/lib/auth";
// Plain streaming chat: the foundation the streaming + persistence sandboxes use.
// The app does auth; fusion-ai's streamChat does the whole AI side (adapter +
// parse + SSE).
const SYSTEM = "You are a helpful assistant. Keep answers concise.";
export const Route = createFileRoute("/api/sandbox/chat")({
  server: {
    handlers: {
      POST: async ({ request }) => {
        const session = await getSession();
        if (!session) return new Response("Unauthorized", { status: 401 });
        return streamChat(request, { systemPrompts: [SYSTEM] });
      },
    },
  },
});
Add tools (from toolDefinition().server(...)) and agentLoopStrategy: maxIterations(n) to turn it into a tool-calling agent loop.
There's also generateObject(prompt, { schema, … }) — a structured one-shot that
forces the model to return a value matching a Zod schema (@tanstack/ai enforces
outputSchema). It's the backbone of the proactive agent's findings; on a schema miss it
does one bounded re-ask, then fails cleanly rather than delivering garbage.
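The "one bounded re-ask" behaviour can be sketched independently of any model or schema library. In this illustration ask stands in for the model call and validate stands in for the Zod schema check; both names, the corrective prompt wording, and the SchemaMissError class are assumptions, not fusion-ai's API.

```typescript
type Ask = (prompt: string) => Promise<unknown>;
type Validate<T> = (value: unknown) => T | null;

class SchemaMissError extends Error {}

// Try once, re-ask exactly once with a corrective nudge, then fail cleanly.
async function askWithOneRetry<T>(
  ask: Ask,
  validate: Validate<T>,
  prompt: string,
): Promise<T> {
  const first = validate(await ask(prompt));
  if (first !== null) return first;

  const corrective = `${prompt}\n\nReturn only JSON matching the schema.`;
  const second = validate(await ask(corrective));
  if (second !== null) return second;

  // Never hand malformed output to the caller.
  throw new SchemaMissError("model output did not match the schema after one re-ask");
}
```

The bound matters more than the retry itself: a weak model that keeps returning prose costs at most two calls, and the caller gets an error it can handle instead of a value of the wrong shape.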
The request flow — a Carola chat, end to end
A message in the home-screen assistant is the realest path through the seam. The endpoint
is src/routes/api/agent/chat.ts (a TanStack Start file route) and it is the
conversational front door to Carola / the proactive agent.
Walking the route handler (POST in chat.ts):
- Auth. getSession() → 401 if absent. Everything downstream keys off session.user.id.
- Assemble the system prompt. The route concatenates a stack of blocks into one system string: the persona (Carola's voice), a scope block that changes what she may touch (general = no project data, ChatGPT-style; project / portfolio = the building-aware assistant), a remembered-facts block (durable user memories injected up front so she addresses you right without a tool call), a docs block naming the files attached to this conversation, an ambient-context block (the page/selection you're on; it biases retrieval, never widens access), and an output-language line. This is the prompt-engineering layer, and it lives in the app, not the package, so the package stays domain-agnostic.
- Build the tools. createAgentAssistantTools(actor, scope.kind, threadId) (src/lib/agent-assistant.ts) returns a set of toolDefinition().server(...) tools: list_recent_findings, list_upcoming_deadlines, search_agent_memory, remember, run_agent_check_now, search_project_wiki, search_project_news, show_project_map, and (when the chat has attachments) search_attached_documents. Every tool is owner-scoped to session.user.id. That's the security boundary: a tool can never read another user's rows, so injected ambient context can't widen access.
- Hand off. The route calls streamChat(request, { systemPrompts: [system], tools, agentLoopStrategy: maxIterations(6) }) and returns its Response directly.
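The owner-scoping idea in the tools step can be illustrated with plain closures. This is a sketch under assumptions: an in-memory array stands in for the database, the Deadline shape is invented, and the real tools are built with toolDefinition().server(...) rather than bare functions.

```typescript
// Hypothetical row shape; the real tables live in the app schema.
interface Deadline {
  ownerId: string;
  title: string;
}

// The owner id is captured once, from the authenticated session.
// The tool takes no user id argument, so the model cannot ask for someone else's rows.
function createOwnerScopedTools(ownerId: string, rows: Deadline[]) {
  return {
    list_upcoming_deadlines: (): string[] =>
      rows.filter((row) => row.ownerId === ownerId).map((row) => row.title),
  };
}
```

The point of the shape is that the scope is fixed at construction time: whatever text reaches the prompt, the filter on ownerId is not something the model can influence.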
Inside streamChat (src/index.ts), the package does the AI plumbing:
- providerFromRequest(request) resolves the provider from the cookie (→ OpenRouter by default); getTextAdapter builds the adapter, and if that comes back null (unconfigured) streamChat returns a 503.
- chatParamsFromRequest(request) parses the useChat POST body into messages plus any client-declared tools the browser sent (e.g. the page's set_color_scheme, which runs in the browser to flip the theme).
- mergeAgentTools(serverTools, clientTools) unions the two tool sets, so the model sees server tools (DB reads) and client tools (browser actions) in one list.
- chat({ adapter, messages, systemPrompts, tools, agentLoopStrategy }) runs the loop: the model may call tools, the harness feeds results back, up to maxIterations(6) times, then it produces the answer.
- toServerSentEventsResponse(...) wraps the token stream as SSE, which useChat on the client consumes to render the reply token-by-token.
Coming from React? The client side is Vercel-style useChat from @tanstack/ai, living inside fusion-ui's Chat. You don't write the fetch or the SSE parsing; the hook POSTs to your route and streams the response. What's unusual is the server tools: the route declares functions the model may invoke mid-stream, they execute on the server against the DB, and their results re-enter the model's context. That's the agent loop, and it's the same streamChat whether the tool rolls a die in a sandbox or reads your real deadlines.
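A bounded agent loop of the kind described above can be modelled in a few lines. This is a deliberately simplified, synchronous toy: the real loop is asynchronous and streamed, the types here are invented for the sketch, and what the real harness does when the iteration cap is hit is not specified in this document, so the final return value is an assumption.

```typescript
type ModelStep =
  | { kind: "tool"; call: { tool: string; input: string } }
  | { kind: "answer"; text: string };
type Model = (transcript: string[]) => ModelStep;
type Tools = Map<string, (input: string) => string>;

function runAgentLoop(
  model: Model,
  tools: Tools,
  userMessage: string,
  maxIterations: number,
): string {
  const transcript = [`user: ${userMessage}`];
  for (let i = 0; i < maxIterations; i++) {
    const step = model(transcript);
    if (step.kind === "answer") return step.text;

    // Run the requested tool and feed its result back into the context.
    const tool = tools.get(step.call.tool);
    const result = tool ? tool(step.call.input) : `unknown tool ${step.call.tool}`;
    transcript.push(`tool ${step.call.tool}: ${result}`);
  }
  // Assumption for this sketch: stop with a fixed message once the cap is reached.
  return "iteration limit reached";
}
```

The cap is what keeps a confused model from calling tools forever; each pass through the loop is one model turn plus at most one tool execution.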
The proactive-agent harness
Beyond one-shot and chat, fusion-ai ships a proactive-agent harness at the
@tikab-interactive/fusion-ai/agent sub-path: a deterministic loop (injected clock,
model, run-store and delivery ports) that wakes on a schedule, runs checks, asks
the model for structured findings, and decides notify-now / digest / stay-quiet.
The only place it talks to an LLM is createFindingModel (src/agent/model.ts), which is
just generateObject under the hood — it turns a tick's gathered checks into
schema-valid Finding[]. Two details worth knowing if you debug it:
- Prompt-injection fencing. Signal content (gathered from external sources) is untrusted, so it's wrapped in a <signal> fence and sanitizeSignal (src/agent/guards.ts) strips any tag that could close the fence or impersonate the instruction channel. The <rules> block tells the model to treat <signal> as data, never instructions.
- Bounded re-ask. Weak local models often mis-shape JSON; on a schema miss the model does exactly one corrective re-ask, then throws FindingValidationError. It never delivers malformed findings and never crashes the worker.
It's schema-agnostic (the app injects its DB + tables) and falls back to the demo model when no provider is set. The example wires it end-to-end — see Proactive agent and the chat-first home.
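The fencing described above can be approximated as follows. This is not the source of sanitizeSignal: the function names, the exact tag list (signal and rules), and the repeat-until-stable loop are assumptions chosen to illustrate the technique.

```typescript
// Matches opening or closing <signal> / <rules> tags, with or without attributes.
const FENCE_TAG = /<\/?\s*(signal|rules)\b[^>]*>/gi;

// Strip fence-like tags until the text stops changing, so that removing one
// tag cannot splice a new one together (e.g. "<sig<signal>nal>").
function stripFenceTags(text: string): string {
  let current = text;
  let previous: string;
  do {
    previous = current;
    current = current.replace(FENCE_TAG, "");
  } while (current !== previous);
  return current;
}

// Wrap sanitized, untrusted content so the prompt can refer to it as data.
function fenceSignal(text: string): string {
  return `<signal>${stripFenceTags(text)}</signal>`;
}
```

After sanitizing, the only signal tags in the prompt are the ones the harness wrote itself, which is what lets the rules block treat everything inside the fence as data.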
Embeddings
fusion-ai also exposes embed() (src/embeddings.ts) — the vector half of the seam,
re-exported from both the root and /agent. It uses the same two providers and
precedence as chat, so whenever a chat provider is configured, embeddings work too with no
extra key:
- Ollama → POST /api/embed, model OLLAMA_EMBED_MODEL (default nomic-embed-text, 768-dim). The on-prem / air-gapped default.
- OpenRouter → POST /api/v1/embeddings (OpenAI-compatible), model OPENROUTER_EMBED_MODEL (default openai/text-embedding-3-small). It passes dimensions: EMBED_DIM so the output matches regardless of model.
EMBED_PROVIDER forces one independently of the chat provider — useful when you run
Ollama for chat but want hosted embeddings (or vice-versa). Two invariants matter:
- EMBED_DIM = 768 is pinned and must match the vector(768) columns in the app schema (AGENT_EMBED_DIM in example/src/db/schema/agent.ts). Changing it needs a DB migration; this is the most common "embeddings silently broke" trap.
- embed() returns null, never throws, on a missing provider or a failed call. Callers that write embeddings (memory, the search index, document chunks) store the row with embedding_pending = true and a back-fill job re-embeds it later (src/lib/agent-memory-jobs.ts). An embedding failure must never make a row silently unsearchable: it just stays keyword-only until Ollama comes online.
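Both invariants can be shown on the write path in one small sketch. The row shape and function name are hypothetical, and rejecting a wrong-sized vector with an exception is an assumption of this example, not documented package behaviour.

```typescript
const EMBED_DIM = 768;

// Hypothetical row shape for a table with a vector(768) column.
interface EmbeddedRow {
  content: string;
  embedding: number[] | null;
  embedding_pending: boolean;
}

// `vector` is whatever embed() produced: a vector, or null on failure.
function rowForEmbedding(content: string, vector: number[] | null): EmbeddedRow {
  if (vector !== null && vector.length !== EMBED_DIM) {
    // A mismatched dimension would not fit the column; fail loudly in this sketch.
    throw new Error(`expected ${EMBED_DIM} dimensions, got ${vector.length}`);
  }
  // A null vector still produces a row; it is flagged for the back-fill job.
  return { content, embedding: vector, embedding_pending: vector === null };
}
```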
Where the vectors feed:
| Consumer | What it embeds | Where |
|---|---|---|
| Agent memory | durable facts about the user (pgvector) | agent-store.ts, agent-memory-jobs.ts |
| Document RAG | attached-file chunks → cited excerpts | chat-document-search.ts |
| Universal search | every indexable entity → one search_index | search-index-server.ts, universal-search-server.ts |
Each retrieval site embeds the query the same way and ranks by pgvector cosine
distance, then falls back to keyword / trigram when there's no query vector — so "find
the exact value" works air-gapped or before the back-fill finishes
(searchChatDocuments in chat-document-search.ts is the clearest example).
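The vector-first, keyword-fallback pattern can be sketched in memory. This stands in for the real SQL: cosineDistance mirrors the cosine distance pgvector computes (1 minus cosine similarity), a substring match stands in for keyword / trigram search, and the Doc shape and function names are assumptions.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[] | null;
}

// Cosine distance: 1 - cosine similarity. Undefined (NaN) for zero vectors.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankDocs(docs: Doc[], queryText: string, queryVector: number[] | null): Doc[] {
  if (queryVector === null) {
    // No query vector (provider off or embed failed): keyword fallback.
    const needle = queryText.toLowerCase();
    return docs.filter((doc) => doc.text.toLowerCase().includes(needle));
  }
  const vector = queryVector;
  return docs
    .filter((doc): doc is Doc & { embedding: number[] } => doc.embedding !== null)
    .sort((l, r) => cosineDistance(l.embedding, vector) - cosineDistance(r.embedding, vector));
}
```

Rows whose embedding is still pending are invisible to the vector branch but remain reachable through the keyword branch, which is why an embedding failure does not make them unsearchable.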
Residency — local-only vs any
A classified / air-gapped tenant must never let data reach a cloud provider, even as a fallback. The seam enforces this with a residency lock, not a convention:
- AGENT_RESIDENCY=local-only (read by agentResidency() in src/lib/agent-config.ts) declares the policy for the deployment. Default is "any".
- residencyProvider(residency) (src/agent/guards.ts) returns "ollama" for local-only, else undefined. createFindingModel passes residencyProvider(...) ?? opts.provider as the provider, so a local-only tenant is forced to Ollama regardless of what env precedence or a cookie would pick.
- embed(texts, { residency }) does the same on the vector side: under local-only it uses Ollama if OLLAMA_BASE_URL is set, otherwise returns null (keyword-only). It will never call OpenRouter. Every embedding caller passes { residency: agentResidency() }.
- agentChatProvider() returns undefined under local-only (so the harness lock takes over) but prefers OpenRouter otherwise, because structured findings need a capable model, and the small local models tend to return prose instead of schema-valid JSON.
So pinning a deploy to local is one env var (AGENT_RESIDENCY=local-only) plus a
configured OLLAMA_BASE_URL: chat, findings, and embeddings all stay on the box, and the
cloud path is unreachable by construction — not just unselected. With no Ollama reachable
under that policy, embeddings degrade to keyword search rather than leaking to the cloud.
No cloud infrastructure is provisioned for AI either — it's purely env-driven, so the same build runs with or without it, and on a closed network you point it at a local Ollama.
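The residency lock reduces to two small decisions, sketched here under assumptions: the env is passed in as a plain object, findingProvider and embedProvider are hypothetical names for logic the document attributes to createFindingModel and embed(), and the EMBED_PROVIDER override is left out for brevity.

```typescript
type AiProvider = "ollama" | "openrouter";
type Residency = "any" | "local-only";
type Env = Record<string, string | undefined>;

// local-only pins Ollama; "any" expresses no preference.
function residencyProvider(residency: Residency): AiProvider | undefined {
  return residency === "local-only" ? "ollama" : undefined;
}

// Findings: the lock beats whatever provider was requested.
function findingProvider(residency: Residency, requested?: AiProvider): AiProvider | undefined {
  return residencyProvider(residency) ?? requested;
}

// Embeddings: under local-only there is no branch that can yield "openrouter".
function embedProvider(residency: Residency, env: Env): AiProvider | null {
  if (residency === "local-only") {
    return env["OLLAMA_BASE_URL"] ? "ollama" : null; // null → keyword-only
  }
  if (env["OLLAMA_BASE_URL"]) return "ollama";
  if (env["OPENROUTER_API_KEY"]) return "openrouter";
  return null;
}
```

The "unreachable by construction" claim corresponds to the first branch of embedProvider: under local-only the function returns before the OpenRouter check is ever evaluated.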
In the example
Seven sandboxes exercise the AI features through the production
fusion-ui chat components (Chat, ChatWidget, PromptInput, AiResponse),
backed by API route handlers under src/routes/api/sandbox/ that call streamChat:
| Sandbox | Shows |
|---|---|
| /sandbox/ai | one-shot prompt → answer (generateText) |
| /sandbox/streaming | token-by-token chat (useChat + SSE) |
| /sandbox/tools | tool-calling agent loop (roll_dice, get_current_time) |
| /sandbox/reasoning | a reasoning model streaming its "thinking" separately |
| /sandbox/persistence | a chat thread that survives reload (localStorage) |
| /sandbox/generative-ui | a tool call rendering a live bar chart inline |
| /sandbox/assistant | a floating widget whose client tool flips the app theme |
The real production path is the Carola endpoint above — the sandboxes are the same streamChat
seam with smaller, self-contained tool sets, so they're the fastest place to watch the
agent loop and the provider toggle in isolation.
Ollama cheat sheet
Ollama runs models locally and serves an HTTP API on
http://localhost:11434 — exactly the OLLAMA_BASE_URL the app expects.
# Install — macOS: brew (or the app from ollama.com/download)
brew install ollama
# Linux:
curl -fsSL https://ollama.com/install.sh | sh
ollama serve # start the API on :11434 (skip if the desktop app already runs it)
ollama pull llama3.2:1b # a small, laptop-friendly model
ollama run llama3.2:1b # pull-if-needed + chat in the terminal to smoke-test
| Command | What it does |
|---|---|
| ollama list | models you've pulled |
| ollama ps | models currently loaded in memory |
| ollama pull <model> | download a model |
| ollama run <model> | chat with it (pulls first if missing) |
| ollama rm <model> | delete a model |
Then point the app at it (the toggle's Ollama side, or env precedence) and open a sandbox:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:1b # the package default is llama3.3:70b, too big for most laptops
For embeddings on the same daemon, pull an embedding model too
(ollama pull nomic-embed-text) — that's the OLLAMA_EMBED_MODEL default, and it's what
RAG, agent memory, and universal search will use.
fusion-ai's default llama3.3:70b needs a serious GPU (~40 GB); on a laptop set
OLLAMA_MODEL to something small like llama3.2:1b (~1 GB). Confirm the daemon is
up with curl http://localhost:11434/api/tags.