AI

fusion-ai is the stack's AI seam. It's provider-pluggable — local Ollama or hosted OpenRouter — chosen from the environment or, per request, from a cookie. It's also the one AI import: it re-exports the @tanstack/ai pieces you need (toolDefinition, maxIterations) so features build against fusion-ai, not the underlying library. When no provider is configured it returns null (one-shot) or a 503 (streaming), so the AI path is always present and degrades gracefully.


Coming from Django? There's no settings.LLM_BACKEND, no model registry, no Celery task wrapping the call. fusion-ai is a thin function module (src/index.ts): it reads process.env at call time, picks a @tanstack/ai adapter, and runs it. The "config" is four env vars and a cookie. No singleton, no app-ready hook — the seam is stateless, so a request that flips the provider cookie hits a different backend with no restart.

The provider seam

Everything funnels through src/index.ts in fusion-ai. There are two real backends, each a @tanstack/ai adapter. Your feature code never imports either adapter package directly; fusion-ai owns that dependency:

Ollama (@tanstack/ai-ollama → createOllamaChat) — local / on-prem inference against an HTTP daemon (http://localhost:11434). The air-gapped default.
OpenRouter (@tanstack/ai-openrouter → openRouterText) — hosted inference over a single API key, fronting Anthropic / OpenAI / etc.

What selects a provider

Three mechanisms, from lowest to highest authority:

envProvider() — the baseline. OLLAMA_BASE_URL set ⇒ "ollama"; else OPENROUTER_API_KEY set ⇒ "openrouter"; else null. Ollama wins when both are set — an explicit OLLAMA_BASE_URL reads as air-gap intent, so a box that has both defaults to staying local.
providerFromRequest(request) — the per-request override streamChat uses. It regex-matches the ai_provider cookie off the request headers and defaults to OpenRouter when the cookie is absent. This is what the sandbox provider toggle flips — switch backends with no reload.
An explicit provider option — pass { provider: "ollama" } to any entry point and it wins outright. The proactive agent uses this to pin a provider for residency (below).
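The precedence above can be sketched in a few lines. This is a minimal illustration, not the package source: the env object is passed in as a parameter (the real module reads process.env at call time), and providerFromCookieHeader and resolveProvider are hypothetical names. The cookie regex is the one the example app uses in ai-server.ts.

```typescript
type AiProvider = "ollama" | "openrouter";
type Env = Record<string, string | undefined>;

// Baseline: Ollama wins when both variables are set (air-gap intent).
function envProvider(env: Env): AiProvider | null {
	if (env.OLLAMA_BASE_URL) return "ollama";
	if (env.OPENROUTER_API_KEY) return "openrouter";
	return null;
}

// Per-request override: read the ai_provider cookie, default to OpenRouter.
// Hypothetical helper: takes the raw Cookie header instead of a Request.
function providerFromCookieHeader(cookieHeader: string | null): AiProvider {
	return /(?:^|;\s*)ai_provider=ollama(?:;|$)/.test(cookieHeader ?? "")
		? "ollama"
		: "openrouter";
}

// An explicit option outranks the env baseline (mirrors `options.provider ?? envProvider()`).
function resolveProvider(explicit: AiProvider | undefined, env: Env): AiProvider | null {
	return explicit ?? envProvider(env);
}
```

Note that the cookie path never returns null: a request without the cookie is treated as OpenRouter, and the missing-key case is caught later when the adapter is built.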

getTextAdapter({ provider, model }) resolves the choice to a concrete adapter: createOllamaChat(model, url) for Ollama, openRouterText(model) for OpenRouter — or null when the chosen provider's env var is missing. That null is the whole graceful-degradation story: every entry point checks it and falls back rather than throwing.

// src/index.ts — the resolver, condensed
export function getTextAdapter(options: AdapterOptions = {}) {
	const provider = options.provider ?? envProvider();
	if (provider === "ollama") {
		const url = process.env.OLLAMA_BASE_URL;
		return url ? createOllamaChat(modelFor("ollama", options.model), url) : null;
	}
	if (provider === "openrouter") {
		return process.env.OPENROUTER_API_KEY
			? openRouterText(modelFor("openrouter", options.model))
			: null;
	}
	return null; // nothing configured → caller degrades
}

The model id

modelFor(provider, override) picks the model string: an explicit model option wins, else the provider's env var (OLLAMA_MODEL / OPENROUTER_MODEL), else a baked-in default from DEFAULT_MODELS. aiModelId() exposes the resolved id (and returns "stub" when AI is off), so a response can tell the UI which model answered.

Var	Effect
`OPENROUTER_API_KEY`	enable hosted inference (OpenRouter)
`OPENROUTER_MODEL`	hosted model — default `anthropic/claude-haiku-4.5`
`OLLAMA_BASE_URL`	enable local inference, e.g. `http://localhost:11434`
`OLLAMA_MODEL`	local model — default `llama3.3:70b` (sized for a GPU box)
`OPENROUTER_REASONING_MODEL` / `OLLAMA_REASONING_MODEL`	model used by the reasoning sandbox

The hosted default is Haiku, not Opus — the deployments are small, so fusion-ai favours cheap + tool-capable and lets you bump OPENROUTER_MODEL when a task needs more power. The Ollama default llama3.3:70b is a GPU-box flagship; on a laptop set OLLAMA_MODEL=llama3.2:1b (cheat sheet at the bottom).
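The model-id precedence (explicit option, then env var, then baked-in default) might look roughly like this. It is a sketch under assumptions: the env object is passed in rather than read from process.env, MODEL_ENV_VARS is a hypothetical lookup table, and treating an empty env var as unset is a guess at the real behaviour. The default ids are the ones listed in the table above.

```typescript
type AiProvider = "ollama" | "openrouter";
type Env = Record<string, string | undefined>;

// Defaults as documented in the env-var table.
const DEFAULT_MODELS: Record<AiProvider, string> = {
	ollama: "llama3.3:70b",
	openrouter: "anthropic/claude-haiku-4.5",
};

// Hypothetical lookup table: which env var carries each provider's model id.
const MODEL_ENV_VARS: Record<AiProvider, string> = {
	ollama: "OLLAMA_MODEL",
	openrouter: "OPENROUTER_MODEL",
};

// Explicit override wins, then the env var, then the default.
// Assumption: an empty string counts as "not set", hence || rather than ??.
function modelFor(provider: AiProvider, override: string | undefined, env: Env): string {
	return override || env[MODEL_ENV_VARS[provider]] || DEFAULT_MODELS[provider];
}
```

On a laptop, `modelFor("ollama", undefined, { OLLAMA_MODEL: "llama3.2:1b" })` would resolve to the small model without any code change.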

Graceful degradation — the app always runs

isAiEnabled() is just Boolean(OLLAMA_BASE_URL || OPENROUTER_API_KEY). With neither set, getTextAdapter returns null everywhere, and each entry point has a defined no-AI behaviour:

generateText / generateObject → return null; the caller renders a "configure AI" affordance instead of erroring (see src/lib/ai-server.ts, which maps null to { configured: false }).
streamChat → returns a 503 Response with the provider name, so the chat UI shows it's off rather than hanging.
the proactive agent → swaps the LLM finding-model for a deterministic demo model (demoFindingModel in src/lib/agent-config.ts) that emits a canned finding for any signal tagged [URGENT]. The whole feature is demoable and e2e-testable with no LLM at all.

So the same build boots and serves with zero AI config — handy for CI, a closed network, or a first-run developer who hasn't touched .env.local.
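To make the no-LLM path concrete, here is a rough sketch of what a deterministic demo model could look like. The Signal and Finding shapes and the function name are hypothetical (the real demoFindingModel lives in src/lib/agent-config.ts and its signature is not shown here); only the "[URGENT] tag yields a canned finding" rule comes from the description above.

```typescript
// Hypothetical shapes for illustration only.
interface Signal {
	id: string;
	content: string;
}
interface Finding {
	signalId: string;
	severity: "urgent";
	summary: string;
}

// Deterministic stand-in for the LLM: one canned finding per [URGENT] signal,
// nothing for everything else. No network, no randomness.
function demoFindingModelSketch(signals: Signal[]): Finding[] {
	return signals
		.filter((signal) => signal.content.includes("[URGENT]"))
		.map((signal) => ({
			signalId: signal.id,
			severity: "urgent" as const,
			summary: `Demo finding for signal ${signal.id}`,
		}));
}
```

Because the output depends only on the input, an e2e test can seed one tagged signal and assert on the exact finding that appears.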

Two entry points

generateText(prompt, opts) — one shot, returns { text, model } or null. The app does auth and reads the provider toggle; fusion-ai owns the AI side. Internally it builds a single user ModelMessage, runs chat({ adapter, messages, systemPrompts }), and collects the full reply with streamToText (no streaming to the client):

example/src/lib/ai-server.ts

import { getRequestHeaders } from "@tanstack/react-start/server";
import { createServerFn } from "@tanstack/react-start";
import { z } from "zod";
 
import { type AiProvider, generateText } from "@tikab-interactive/fusion-ai";
 
import { getSession } from "#/lib/auth";
 
/**
 * Result of the AI sandbox prompt. `configured: false` is the graceful-degrade
 * path — fusion-ai returned no adapter (the chosen provider isn't configured) —
 * so the page shows how to turn AI on instead of erroring.
 */
export type AiSandboxResult =
	| { configured: false }
	| { configured: true; model: string; text: string };
 
/** The provider the sandbox toggle chose (the `ai_provider` cookie), default OpenRouter. */
function providerFromCookie(): AiProvider {
	const cookie = new Headers(getRequestHeaders()).get("cookie") ?? "";
	return /(?:^|;\s*)ai_provider=ollama(?:;|$)/.test(cookie) ? "ollama" : "openrouter";
}
 
/**
 * One-shot completion for the AI sandbox. The app does auth + validation + reads
 * the provider toggle; fusion-ai's `generateText` owns the AI side and returns
 * null when the provider isn't configured.
 */
export const runAiPrompt = createServerFn({ method: "POST" })
	.inputValidator(z.object({ prompt: z.string().min(1).max(4000) }).parse)
	.handler(async ({ data }): Promise<AiSandboxResult> => {
		const session = await getSession();
		if (!session) throw new Response("Unauthorized", { status: 401 });
 
		const result = await generateText(data.prompt, {
			provider: providerFromCookie(),
			systemPrompts: ["You are a helpful assistant. Answer concisely."],
		});
		return result ? { configured: true, ...result } : { configured: false };
	});

streamChat(request, opts) — the whole server side of a streaming chat endpoint. It picks the provider, parses the useChat request, merges in any client-declared tools, runs the model, and returns a Server-Sent-Events Response. A route handler just authenticates and hands off:

example/src/routes/api/·/chat.ts — a minimal streaming route

import { createFileRoute } from "@tanstack/react-router";
import { streamChat } from "@tikab-interactive/fusion-ai";
 
import { getSession } from "#/lib/auth";
 
// Plain streaming chat — the foundation the streaming + persistence sandboxes use.
// The app does auth; fusion-ai's streamChat does the whole AI side (adapter +
// parse + SSE).
const SYSTEM = "You are a helpful assistant. Keep answers concise.";
 
export const Route = createFileRoute("/api/sandbox/chat")({
	server: {
		handlers: {
			POST: async ({ request }) => {
				const session = await getSession();
				if (!session) return new Response("Unauthorized", { status: 401 });
				return streamChat(request, { systemPrompts: [SYSTEM] });
			},
		},
	},
});

Add tools (from toolDefinition().server(...)) and agentLoopStrategy: maxIterations(n) to turn it into a tool-calling agent loop.

There's also generateObject(prompt, { schema, … }) — a structured one-shot that forces the model to return a value matching a Zod schema (@tanstack/ai enforces outputSchema). It's the backbone of the proactive agent's findings; on a schema miss it does one bounded re-ask, then fails cleanly rather than delivering garbage.
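The "one bounded re-ask, then fail cleanly" behaviour can be sketched independently of any model SDK. Everything here is an assumption made for illustration: Ask stands in for the model call, Validator stands in for a Zod schema check, and SchemaMissError and the corrective prompt wording are invented names, not the package's.

```typescript
// Hypothetical stand-ins: a model call and a schema check.
type Ask = (prompt: string) => Promise<unknown>;
type Validator<T> = (raw: unknown) => T | null;

class SchemaMissError extends Error {}

// Ask once; on a schema miss ask exactly one more time with a corrective
// note; if that also misses, throw instead of returning malformed data.
async function generateObjectSketch<T>(
	prompt: string,
	ask: Ask,
	validate: Validator<T>,
): Promise<T> {
	const first = validate(await ask(prompt));
	if (first !== null) return first;

	const corrective = `${prompt}\n\nYour previous reply did not match the schema. Reply with valid JSON only.`;
	const second = validate(await ask(corrective));
	if (second !== null) return second;

	throw new SchemaMissError("model output failed schema validation twice");
}
```

The bound matters: a model that keeps returning prose costs two calls at most, and the caller gets a typed value or a single well-defined error.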

The request flow — a Carola chat, end to end

A message in the home-screen assistant is the realest path through the seam. The endpoint is src/routes/api/agent/chat.ts (a TanStack Start file route) and it is the conversational front door to Carola / the proactive agent.


Walking the route handler (POST in chat.ts):

Auth. getSession() → 401 if absent. Everything downstream keys off session.user.id.
Assemble the system prompt. The route concatenates a stack of blocks into one system string: the persona (Carola's voice), a scope block that changes what she may touch (general = no project data, ChatGPT-style; project/portfolio = the building-aware assistant), a remembered-facts block (durable user memories injected up front so she addresses you right without a tool call), a docs block naming the files attached to this conversation, an ambient-context block (the page/selection you're on — biases retrieval, never widens access), and an output-language line. This is the prompt-engineering layer, and it lives in the app, not the package — the package stays domain-agnostic.
Build the tools. createAgentAssistantTools(actor, scope.kind, threadId) (src/lib/agent-assistant.ts) returns a set of toolDefinition().server(...) tools — list_recent_findings, list_upcoming_deadlines, search_agent_memory, remember, run_agent_check_now, search_project_wiki, search_project_news, show_project_map, and (when the chat has attachments) search_attached_documents. Every tool is owner-scoped to session.user.id — that's the security boundary; a tool can never read another user's rows, so injected ambient context can't widen access.
Hand off. The route calls streamChat(request, { systemPrompts: [system], tools, agentLoopStrategy: maxIterations(6) }) and returns its Response directly.

Inside streamChat (src/index.ts), the package does the AI plumbing:

providerFromRequest(request) resolves the provider from the cookie (→ OpenRouter by default); getTextAdapter then builds the adapter. If it returns null because that provider is unconfigured, streamChat responds with a 503.
chatParamsFromRequest(request) parses the useChat POST body into messages plus any client-declared tools the browser sent (e.g. the page's set_color_scheme, which runs in the browser to flip the theme).
mergeAgentTools(serverTools, clientTools) unions the two tool sets, so the model sees server tools (DB reads) and client tools (browser actions) in one list.
chat({ adapter, messages, systemPrompts, tools, agentLoopStrategy }) runs the loop: the model may call tools, the harness feeds results back, up to maxIterations(6) times, then it produces the answer.
toServerSentEventsResponse(...) wraps the token stream as SSE, which useChat on the client consumes to render the reply token-by-token.
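The loop in the chat(...) step is easier to reason about with a toy version. The sketch below is not how @tanstack/ai implements it: Tool, ModelTurn, Model and runAgentLoop are hypothetical, the model is a plain synchronous function, and the placeholder returned when the budget runs out is an assumption. It only shows the shape: call the model, run any requested tool, feed the result back, stop at the iteration cap.

```typescript
// Hypothetical shapes for illustration; the real types come from @tanstack/ai.
type Tool = { name: string; run: (args: unknown) => unknown };
type ModelTurn =
	| { kind: "tool_call"; tool: string; args: unknown }
	| { kind: "answer"; text: string };
type Model = (transcript: string[]) => ModelTurn;

function runAgentLoop(
	model: Model,
	tools: Tool[],
	userMessage: string,
	maxIterations: number,
): string {
	const transcript = [`user: ${userMessage}`];
	for (let i = 0; i < maxIterations; i++) {
		const turn = model(transcript);
		if (turn.kind === "answer") return turn.text;

		// The model asked for a tool: run it and append the result so the
		// next model call can see it.
		const tool = tools.find((candidate) => candidate.name === turn.tool);
		const result = tool ? tool.run(turn.args) : `unknown tool ${turn.tool}`;
		transcript.push(`tool ${turn.tool}: ${JSON.stringify(result)}`);
	}
	// Assumption: what happens at the cap is up to the real harness.
	return "(iteration budget exhausted)";
}
```

The cap is what keeps a confused model from looping on tool calls forever; with maxIterations(6), the worst case is six tool round-trips per message.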

Coming from React? The client side is Vercel-style useChat from @tanstack/ai, living inside fusion-ui's Chat. You don't write the fetch or the SSE parsing — the hook POSTs to your route and streams the response. What's unusual is the server tools: the route declares functions the model may invoke mid-stream, they execute on the server against the DB, and their results re-enter the model's context. That's the agent loop, and it's the same streamChat whether the tool rolls a die in a sandbox or reads your real deadlines.

The proactive-agent harness

Beyond one-shot and chat, fusion-ai ships a proactive-agent harness at the @tikab-interactive/fusion-ai/agent sub-path: a deterministic loop (injected clock, model, run-store and delivery ports) that wakes on a schedule, runs checks, asks the model for structured findings, and decides notify-now / digest / stay-quiet.

The only place it talks to an LLM is createFindingModel (src/agent/model.ts), which is just generateObject under the hood — it turns a tick's gathered checks into schema-valid Finding[]. Two details worth knowing if you debug it:

Prompt-injection fencing. Signal content (gathered from external sources) is untrusted, so it's wrapped in a <signal> fence and sanitizeSignal (src/agent/guards.ts) strips any tag that could close the fence or impersonate the instruction channel. The <rules> block tells the model to treat <signal> as data, never instructions.
Bounded re-ask. Weak local models often mis-shape JSON; on a schema miss the harness issues exactly one corrective re-ask, then throws FindingValidationError. It never delivers malformed findings and never crashes the worker.
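A minimal sketch of the fencing idea, assuming the only tags that matter are signal and rules. The regex and both function names are illustrative; the real sanitizeSignal in src/agent/guards.ts may strip more or work differently. The repeat-until-stable loop is there because a single pass can be defeated by nesting one tag inside another.

```typescript
// Matches opening or closing <signal> / <rules> tags, tolerant of spacing and case.
const FENCE_TAG = /<\s*\/?\s*(signal|rules)\b[^>]*>/gi;

// Strip fence-like tags until nothing changes, so input such as
// "</sig</signal>nal>" cannot reassemble into a closing tag after one pass.
function sanitizeSignalSketch(content: string): string {
	let previous: string;
	let current = content;
	do {
		previous = current;
		current = current.replace(FENCE_TAG, "");
	} while (current !== previous);
	return current;
}

// Wrap the cleaned, untrusted text in the fence the rules block refers to.
function fenceSignal(content: string): string {
	return `<signal>\n${sanitizeSignalSketch(content)}\n</signal>`;
}
```

After sanitizing, the fenced text contains exactly one closing tag, the one the harness wrote, so the model has an unambiguous boundary between data and instructions.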

It's schema-agnostic (the app injects its DB + tables) and falls back to the demo model when no provider is set. The example wires it end-to-end — see Proactive agent and the chat-first home.

Embeddings

fusion-ai also exposes embed() (src/embeddings.ts) — the vector half of the seam, re-exported from both the root and /agent. It uses the same two providers and precedence as chat, so whenever a chat provider is configured, embeddings work too with no extra key:

Ollama → POST /api/embed, model OLLAMA_EMBED_MODEL (default nomic-embed-text, 768-dim). The on-prem / air-gapped default.
OpenRouter → POST /api/v1/embeddings (OpenAI-compatible), model OPENROUTER_EMBED_MODEL (default openai/text-embedding-3-small). It passes dimensions: EMBED_DIM so the output matches regardless of model.
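For orientation, the two request shapes could be built like this. The paths, default model ids and the dimensions field come from the list above; the { model, input } body matches Ollama's /api/embed and the OpenAI-compatible embeddings format. The openrouter.ai host, the header set and both function names are assumptions for the sketch, not the code in src/embeddings.ts.

```typescript
const EMBED_DIM = 768;

interface EmbedRequest {
	url: string;
	headers: Record<string, string>;
	body: string;
}

// Ollama: POST {base}/api/embed with { model, input }.
function ollamaEmbedRequest(baseUrl: string, texts: string[], model = "nomic-embed-text"): EmbedRequest {
	return {
		url: `${baseUrl.replace(/\/+$/, "")}/api/embed`,
		headers: { "content-type": "application/json" },
		body: JSON.stringify({ model, input: texts }),
	};
}

// OpenRouter: OpenAI-compatible body, with dimensions pinned to EMBED_DIM
// so the vector size matches the database column regardless of model.
// Assumption: the API is served from https://openrouter.ai.
function openRouterEmbedRequest(apiKey: string, texts: string[], model = "openai/text-embedding-3-small"): EmbedRequest {
	return {
		url: "https://openrouter.ai/api/v1/embeddings",
		headers: { "content-type": "application/json", authorization: `Bearer ${apiKey}` },
		body: JSON.stringify({ model, input: texts, dimensions: EMBED_DIM }),
	};
}
```

Either descriptor can be handed to fetch as `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`.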

EMBED_PROVIDER forces one independently of the chat provider — useful when you run Ollama for chat but want hosted embeddings (or vice-versa). Two invariants matter:

EMBED_DIM = 768 is pinned and must match the vector(768) columns in the app schema (example/src/db/schema/agent.ts AGENT_EMBED_DIM). Changing it needs a DB migration — this is the most common "embeddings silently broke" trap.
embed() returns null, never throws, on a missing provider or a failed call. Callers that write embeddings (memory, the search index, document chunks) store the row with embedding_pending = true and a back-fill job re-embeds it later (src/lib/agent-memory-jobs.ts). An embedding failure must never make a row silently unsearchable: the row stays keyword-only until an embedding provider comes online and the back-fill catches up.
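The write-side contract (store the row either way, flag it for back-fill when there is no vector) can be sketched as follows. EmbedFn's return shape, the MemoryRow fields other than embedding_pending, and buildMemoryRow itself are assumptions; the real callers write to the database rather than returning an object.

```typescript
// Assumed shape: one vector per input text, or null on any failure.
type EmbedFn = (texts: string[]) => Promise<number[][] | null>;

interface MemoryRow {
	text: string;
	embedding: number[] | null;
	embedding_pending: boolean;
}

// Never drop the row: if embedding fails, keep the text (still reachable by
// keyword search) and mark it so a back-fill job can retry later.
async function buildMemoryRow(text: string, embed: EmbedFn): Promise<MemoryRow> {
	const vectors = await embed([text]);
	const embedding = vectors?.[0] ?? null;
	return { text, embedding, embedding_pending: embedding === null };
}
```

A back-fill job then only has to select rows where embedding_pending is true and run the same function again.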

Where the vectors feed:

Consumer	What it embeds	Where
Agent memory	durable facts about the user (pgvector)	`agent-store.ts`, `agent-memory-jobs.ts`
Document RAG	attached-file chunks → cited excerpts	`chat-document-search.ts`
Universal search	every indexable entity → one `search_index`	`search-index-server.ts`, `universal-search-server.ts`

Each retrieval site embeds the query the same way and ranks by pgvector cosine distance, then falls back to keyword / trigram when there's no query vector — so "find the exact value" works air-gapped or before the back-fill finishes (searchChatDocuments in chat-document-search.ts is the clearest example).
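An in-memory sketch of that retrieval rule: rank by cosine distance when a query vector exists, otherwise fall back to a text match. In the app the ranking happens in Postgres via pgvector and the fallback is keyword / trigram; here a plain substring match stands in for both, and Chunk and searchChunks are hypothetical names.

```typescript
interface Chunk {
	text: string;
	embedding: number[] | null;
}

// Cosine distance as pgvector defines it: 1 - cosine similarity.
// Assumes non-zero vectors of equal length.
function cosineDistance(a: number[], b: number[]): number {
	let dot = 0;
	let normA = 0;
	let normB = 0;
	for (let i = 0; i < a.length; i++) {
		dot += a[i] * b[i];
		normA += a[i] * a[i];
		normB += b[i] * b[i];
	}
	return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchChunks(chunks: Chunk[], query: string, queryVector: number[] | null): Chunk[] {
	if (queryVector === null) {
		// No vector (provider off or embed failed): keyword fallback.
		const needle = query.toLowerCase();
		return chunks.filter((chunk) => chunk.text.toLowerCase().includes(needle));
	}
	// Rows still awaiting back-fill have no embedding and are skipped here.
	return chunks
		.filter((chunk): chunk is Chunk & { embedding: number[] } => chunk.embedding !== null)
		.sort((x, y) => cosineDistance(x.embedding, queryVector) - cosineDistance(y.embedding, queryVector));
}
```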

Residency — local-only vs any

A classified / air-gapped tenant must never let data reach a cloud provider, even as a fallback. The seam enforces this with a residency lock, not a convention:

AGENT_RESIDENCY=local-only (read by agentResidency() in src/lib/agent-config.ts) declares the policy for the deployment. Default is "any".
residencyProvider(residency) (src/agent/guards.ts) returns "ollama" for local-only, else undefined. createFindingModel passes residencyProvider(...) ?? opts.provider as the provider — so a local-only tenant is forced to Ollama regardless of what env precedence or a cookie would pick.
embed(texts, { residency }) does the same on the vector side: under local-only it uses Ollama if OLLAMA_BASE_URL is set, otherwise returns null (keyword-only) — it will never call OpenRouter. Every embedding caller passes { residency: agentResidency() }.
agentChatProvider() returns undefined under local-only (so the harness lock takes over) but prefers OpenRouter otherwise — because structured findings need a capable model, and the small local models tend to return prose instead of schema-valid JSON.

So pinning a deploy to local is one env var (AGENT_RESIDENCY=local-only) plus a configured OLLAMA_BASE_URL: chat, findings, and embeddings all stay on the box, and the cloud path is unreachable by construction — not just unselected. With no Ollama reachable under that policy, embeddings degrade to keyword search rather than leaking to the cloud.
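The lock is small enough to sketch in full. agentResidency and residencyProvider follow the descriptions above, with the env passed in as a parameter instead of read from process.env; findingProvider and embedProviderFor are hypothetical helpers that show how the lock composes with the normal choice, and the EMBED_PROVIDER override is left out for brevity.

```typescript
type AiProvider = "ollama" | "openrouter";
type Residency = "local-only" | "any";
type Env = Record<string, string | undefined>;

// Policy for the deployment; anything other than "local-only" means "any".
function agentResidency(env: Env): Residency {
	return env.AGENT_RESIDENCY === "local-only" ? "local-only" : "any";
}

// local-only pins Ollama; otherwise no opinion.
function residencyProvider(residency: Residency): AiProvider | undefined {
	return residency === "local-only" ? "ollama" : undefined;
}

// Findings: the lock outranks whatever the caller requested.
function findingProvider(residency: Residency, requested?: AiProvider): AiProvider | undefined {
	return residencyProvider(residency) ?? requested;
}

// Embeddings: under local-only it is Ollama or nothing, never the cloud.
function embedProviderFor(residency: Residency, env: Env): AiProvider | null {
	if (residency === "local-only") return env.OLLAMA_BASE_URL ? "ollama" : null;
	if (env.OLLAMA_BASE_URL) return "ollama";
	if (env.OPENROUTER_API_KEY) return "openrouter";
	return null;
}
```

The key property is in the first branch of embedProviderFor: with the lock on, there is no code path that returns "openrouter", even when an API key is present.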

No cloud infrastructure is provisioned for AI either — it's purely env-driven, so the same build runs with or without it, and on a closed network you point it at a local Ollama.

In the example

Seven sandboxes exercise the AI features through the production fusion-ui chat components (Chat, ChatWidget, PromptInput, AiResponse). The chat-based ones are backed by API route handlers under src/routes/api/sandbox/ that call streamChat; the one-shot /sandbox/ai goes through generateText instead:

Sandbox	Shows
`/sandbox/ai`	one-shot prompt → answer (`generateText`)
`/sandbox/streaming`	token-by-token chat (`useChat` + SSE)
`/sandbox/tools`	tool-calling agent loop (`roll_dice`, `get_current_time`)
`/sandbox/reasoning`	a reasoning model streaming its "thinking" separately
`/sandbox/persistence`	a chat thread that survives reload (localStorage)
`/sandbox/generative-ui`	a tool call rendering a live bar chart inline
`/sandbox/assistant`	a floating widget whose client tool flips the app theme

The real production path is the Carola endpoint above — the sandboxes are the same streamChat seam with smaller, self-contained tool sets, so they're the fastest place to watch the agent loop and the provider toggle in isolation.

Ollama cheat sheet

Ollama runs models locally and serves an HTTP API on http://localhost:11434 — exactly the OLLAMA_BASE_URL the app expects.

# Install — macOS: brew (or the app from ollama.com/download)
brew install ollama
# Linux:
curl -fsSL https://ollama.com/install.sh | sh
 
ollama serve            # start the API on :11434 (skip if the desktop app already runs it)
ollama pull llama3.2:1b # a small, laptop-friendly model
ollama run llama3.2:1b  # pull-if-needed + chat in the terminal to smoke-test

Command	What it does
`ollama list`	models you've pulled
`ollama ps`	models currently loaded in memory
`ollama pull <model>`	download a model
`ollama run <model>`	chat with it (pulls first if missing)
`ollama rm <model>`	delete a model

Then point the app at it (the toggle's Ollama side, or env precedence) and open a sandbox:

example/.env.local

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:1b   # the package default is llama3.3:70b — too big for most laptops

For embeddings on the same daemon, pull an embedding model too (ollama pull nomic-embed-text) — that's the OLLAMA_EMBED_MODEL default, and it's what RAG, agent memory, and universal search will use.

fusion-ai's default llama3.3:70b needs a serious GPU (~40 GB); on a laptop set OLLAMA_MODEL to something small like llama3.2:1b (~1 GB). Confirm the daemon is up with curl http://localhost:11434/api/tags.