Use streaming when the user should see tokens as they arrive rather than waiting for the full response. Persist chat state separately in KV, writing it only after the stream finishes.
Minimal Route
The smallest complete shape: read recent history, call the Gateway with stream: true, and return the SSE stream.
import { Hono } from 'hono';
import { AIGatewayClient } from '@agentuity/aigateway';
import { KeyValueClient } from '@agentuity/keyvalue';
interface ChatMessage {
role: 'user' | 'assistant';
content: string;
}
const kv = new KeyValueClient();
const gateway = new AIGatewayClient();
const app = new Hono();
const DEFAULT_MODEL = 'deepseek/deepseek-v4-flash';
app.post('/api/chat', async (c) => {
const { conversationId, message, model = DEFAULT_MODEL } = await c.req.json<{
conversationId: string;
message: string;
model?: string;
}>();
const stored = await kv.get<ChatMessage[]>('chat-history', conversationId);
const messages: ChatMessage[] = stored.exists ? (stored.data ?? []) : [];
const next = [...messages, { role: 'user' as const, content: message }];
const { stream } = await gateway.streamRequest({
path: '/',
body: { model, stream: true, messages: next },
});
return new Response(stream, {
headers: { 'content-type': 'text/event-stream' },
});
});
export default app;
streamRequest() passes through the Gateway SSE stream. The SDK key authenticates the request; the model value is normal app input or configuration.
Complete Hono Route with KV History
This route reads stored history, streams the next answer, then appends the assistant turn.
npm install hono @agentuity/aigateway @agentuity/keyvalue valibot @agentuity/telemetry
import { Hono } from 'hono';
import { AIGatewayClient } from '@agentuity/aigateway';
import { KeyValueClient } from '@agentuity/keyvalue';
import { logger } from '@agentuity/telemetry';
import * as v from 'valibot';
const inputSchema = v.object({
conversationId: v.string(),
message: v.string(),
model: v.optional(v.string()),
});
const messageSchema = v.object({
role: v.picklist(['user', 'assistant']),
content: v.string(),
});
const historySchema = v.array(messageSchema);
type ChatMessage = v.InferOutput<typeof messageSchema>;
const kv = new KeyValueClient();
const gateway = new AIGatewayClient();
const app = new Hono();
const DEFAULT_MODEL = 'deepseek/deepseek-v4-flash';
app.post('/api/chat', async (c) => {
const body: unknown = await c.req.json();
const input = v.parse(inputSchema, body);
const model = input.model ?? DEFAULT_MODEL;
const stored = await kv.get<unknown>('chat-history', input.conversationId);
const history: ChatMessage[] = stored.exists
? v.parse(historySchema, stored.data)
: [];
const userMessage: ChatMessage = { role: 'user', content: input.message };
const nextHistory = [...history, userMessage];
const { stream } = await gateway.streamRequest({
path: '/',
body: {
model,
stream: true,
messages: [
{ role: 'system', content: 'You are a concise product support assistant.' },
...nextHistory,
],
},
});
const [clientStream, historyStream] = stream.tee();
void saveAssistantTurn(input.conversationId, nextHistory, historyStream).catch((error) => {
logger.error('conversation save failed', { conversationId: input.conversationId, error });
});
return new Response(clientStream, {
headers: { 'content-type': 'text/event-stream' },
});
});
async function saveAssistantTurn(
conversationId: string,
nextHistory: readonly ChatMessage[],
stream: ReadableStream<Uint8Array>,
): Promise<void> {
const text = await readOpenAICompatibleStreamText(stream);
if (!text) return;
await kv.set(
'chat-history',
conversationId,
[...nextHistory, { role: 'assistant', content: text }],
{ ttl: 60 * 60 * 24 * 30 }, // 30-day TTL
);
logger.info('conversation saved', {
conversationId,
turns: nextHistory.length + 1,
});
}
async function readOpenAICompatibleStreamText(
stream: ReadableStream<Uint8Array>,
): Promise<string> {
const reader = stream.pipeThrough(new TextDecoderStream()).getReader();
let buffer = '';
let text = '';
for (;;) {
const { value, done } = await reader.read();
if (done) break;
buffer += value;
const frames = buffer.split(/\r?\n\r?\n/);
buffer = frames.pop() ?? '';
text += frames.map(readFrameText).join('');
}
return text + readFrameText(buffer);
}
function readFrameText(frame: string): string {
const data = frame
.split(/\r?\n/)
.filter((line) => line.startsWith('data:'))
.map((line) => line.slice(5).trimStart())
.join('\n')
.trim();
if (!data || data === '[DONE]') return '';
try {
return readDeltaText(JSON.parse(data));
} catch {
return '';
}
}
function readDeltaText(event: unknown): string {
if (!isRecord(event)) return '';
const choices = event.choices;
if (!Array.isArray(choices)) return '';
return choices
.map((choice) => {
if (!isRecord(choice)) return '';
const delta = choice.delta;
if (!isRecord(delta)) return '';
const content = delta.content;
return typeof content === 'string' ? content : '';
})
.join('');
}
function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === 'object' && value !== null;
}
export default app;
streamRequest() returns a standard Web stream, so this works in Hono and any framework that accepts Web Response objects. KeyValueClient and AIGatewayClient can both authenticate from AGENTUITY_SDK_KEY. The parser above stores text from OpenAI-compatible SSE frames; if you choose a model with a different stream event shape, parse that provider's frame format before writing assistant history.
In @agentuity/hono routes use c.var.logger. In standalone framework apps like this one, import logger from @agentuity/telemetry instead.
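To illustrate the point about models with a different stream event shape, here is a minimal sketch of a delta reader for Anthropic-style events, where text arrives in `content_block_delta` events under `delta.text`. Whether the Gateway passes that shape through unchanged for a given model is an assumption to verify against a real stream, and `readAnthropicStyleDeltaText` is a hypothetical name, not part of any SDK.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Hypothetical helper: extracts text from one parsed SSE data payload,
// assuming an Anthropic-style event shape such as
// { "type": "content_block_delta", "delta": { "type": "text_delta", "text": "..." } }.
// Any other event type (message_start, ping, and so on) yields an empty string.
function readAnthropicStyleDeltaText(event: unknown): string {
  if (!isRecord(event)) return '';
  if (event.type !== 'content_block_delta') return '';
  const delta = event.delta;
  if (!isRecord(delta) || delta.type !== 'text_delta') return '';
  const text = delta.text;
  return typeof text === 'string' ? text : '';
}
```

A route targeting such a model could call this in place of readDeltaText() inside readFrameText(), leaving the frame splitting unchanged.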
Read the Stream in the Browser
For a custom UI, read the response body and append chunks as they arrive. This is the lowest-level browser shape.
async function sendMessage(conversationId: string, message: string): Promise<string> {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify({ conversationId, message }),
});
if (!response.ok) {
throw new Error(`Chat request failed with ${response.status}`);
}
if (!response.body) {
throw new Error('Chat response did not include a body');
}
const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
let assistantMessage = '';
let buffer = '';
for (;;) {
const { value, done } = await reader.read();
if (done) break;
buffer += value;
const frames = buffer.split(/\r?\n\r?\n/);
buffer = frames.pop() ?? '';
assistantMessage += frames.map(readGatewayFrameText).join('');
}
return assistantMessage + readGatewayFrameText(buffer);
}
function readGatewayFrameText(frame: string): string {
const data = frame
.split(/\r?\n/)
.filter((line) => line.startsWith('data:'))
.map((line) => line.slice(5).trimStart())
.join('\n')
.trim();
if (!data || data === '[DONE]') return '';
try {
return readGatewayDeltaText(JSON.parse(data));
} catch {
return '';
}
}
function readGatewayDeltaText(event: unknown): string {
if (!isRecord(event)) return '';
const choices = event.choices;
if (!Array.isArray(choices)) return '';
return choices
.map((choice) => {
if (!isRecord(choice)) return '';
const delta = choice.delta;
if (!isRecord(delta)) return '';
const content = delta.content;
return typeof content === 'string' ? content : '';
})
.join('');
}
function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === 'object' && value !== null;
}
The Gateway returns provider-compatible SSE frames, not plain text chunks. For OpenAI-compatible frames, readGatewayFrameText() above extracts the delta text the same way readFrameText() does in the route example, so parse each frame before appending to the UI. If you use AI SDK UI helpers on the frontend, use an AI SDK route with toUIMessageStreamResponse() so the stream format matches the UI package.
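To update the UI while the response is still arriving, the reader can hand each delta to a callback instead of only returning the final string. The sketch below assumes OpenAI-compatible frames; `streamDeltas` and `DeltaHandler` are hypothetical names introduced for illustration.

```typescript
type DeltaHandler = (delta: string) => void;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Extracts the delta text from one OpenAI-compatible SSE frame.
function frameText(frame: string): string {
  const data = frame
    .split(/\r?\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice(5).trimStart())
    .join('\n')
    .trim();
  if (!data || data === '[DONE]') return '';
  let event: unknown;
  try {
    event = JSON.parse(data);
  } catch {
    return ''; // ignore frames that are not valid JSON
  }
  if (!isRecord(event)) return '';
  const choices = event.choices;
  if (!Array.isArray(choices)) return '';
  return choices
    .map((choice: unknown) => {
      if (!isRecord(choice)) return '';
      const delta = choice.delta;
      if (!isRecord(delta)) return '';
      const content = delta.content;
      return typeof content === 'string' ? content : '';
    })
    .join('');
}

// Reads the SSE body, calls onDelta for every non-empty text delta,
// and resolves with the full assistant message once the stream ends.
async function streamDeltas(
  body: ReadableStream<Uint8Array>,
  onDelta: DeltaHandler,
): Promise<string> {
  const reader = body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  let full = '';
  const emit = (frame: string): void => {
    const delta = frameText(frame);
    if (!delta) return;
    full += delta;
    onDelta(delta);
  };
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const frames = buffer.split(/\r?\n\r?\n/);
    buffer = frames.pop() ?? ''; // keep the incomplete trailing frame
    frames.forEach(emit);
  }
  emit(buffer); // flush whatever remains after the stream closes
  return full;
}
```

In sendMessage(), this could replace the read loop, for example `streamDeltas(response.body, (delta) => { output.textContent += delta; })`, where `output` stands for whatever element the UI renders into.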
Stream with an AI SDK Provider
The direct Gateway stream above is the default path for app-owned streaming routes. When the frontend expects AI SDK output, keep the route on AI SDK and use one provider package. This example uses the Anthropic provider and returns plain text chunks with toTextStreamResponse(); return toUIMessageStreamResponse() instead when the frontend uses AI SDK UI helpers. Under agentuity dev, Anthropic SDK env wiring can route through AI Gateway when no provider key override is set.
npm install ai@latest @ai-sdk/anthropic@latest
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
export async function streamWithAnthropic(message: string): Promise<Response> {
const model = process.env.ANTHROPIC_MODEL;
if (!model) {
throw new Error('Set ANTHROPIC_MODEL to the model this route should use.');
}
const result = streamText({
model: anthropic(model),
system: 'You are a concise product support assistant.',
prompt: message,
});
return result.toTextStreamResponse();
}
Keep model IDs in configuration so each route can use the provider and tier that fits the task. Use the AI Gateway model catalog for direct Gateway calls, or the provider's model docs for provider SDK calls.
Smoke Test with curl
curl -N http://127.0.0.1:3000/api/chat \
-H "content-type: application/json" \
-d '{"conversationId":"demo","message":"Summarize Agentuity in one sentence"}'
-N disables curl's output buffering so you see chunks as they arrive.
Choose the Right Stream Shape
| Stream shape | Use it when |
|---|---|
| Gateway SSE stream | route calls AIGatewayClient.streamRequest() directly |
| toTextStreamResponse() | AI SDK route returns plain text chunks |
| toUIMessageStreamResponse() | AI SDK frontend needs tool calls, usage, finish reason |
| Durable Streams | output must survive a page refresh or be replayed later |
| Server-sent events | you need named events such as status, chunk, and done |
Durable streams persist to storage and return a URL. Use them when the generated content is large, long-running, or must remain available after the HTTP connection closes.
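For the "Server-sent events" row, a route that needs named events has to write the `event:` and `data:` lines itself. The following is a minimal sketch of a frame formatter that follows the standard SSE wire format; `formatSseEvent` is a hypothetical helper, and the event names are just the ones the table mentions.

```typescript
// Formats one Server-Sent Events frame with a named event.
// Objects are JSON-encoded; multi-line strings become one `data:` line per
// line, as the SSE format requires. A blank line terminates the frame.
function formatSseEvent(
  event: string,
  data: string | Record<string, unknown>,
): string {
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  const dataLines = payload
    .split(/\r?\n/)
    .map((line) => `data: ${line}`)
    .join('\n');
  return `event: ${event}\n${dataLines}\n\n`;
}

// Example frames for a status update, a text chunk, and completion.
const exampleFrames = [
  formatSseEvent('status', 'thinking'),
  formatSseEvent('chunk', { text: 'Hi' }),
  formatSseEvent('done', {}),
].join('');
```

Encode the frames with TextEncoder and enqueue them on a ReadableStream returned with content-type text/event-stream; a client that splits frames can then branch on the `event:` line.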
Keep History Bounded
Sending an unlimited transcript to the model on every request increases cost and latency. Keep a rolling summary and the most recent turns instead of the full history.
interface ChatMemory {
readonly summary: string;
readonly recent: readonly ChatMessage[];
}
// After the stream collector finishes, replace the flat history with a bounded memory object
const memory: ChatMemory = {
summary: existingSummary, // update periodically with a summarization call
recent: allMessages.slice(-8), // keep last 8 turns
};
await kv.set('chat-memory', input.conversationId, memory, {
ttl: 60 * 60 * 24 * 30,
});
Store a plain-text summary of older turns plus the last N messages. Pass both to the model as context: the summary as a system prompt addendum, the recent turns as the messages array. Update the summary every few turns with a separate completeText() call.
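The two steps described here, trimming to the most recent messages and passing the summary as a system prompt addendum, can be sketched as pure functions. `appendTurns`, `buildMessages`, `ModelMessage`, and the wording of the summary preamble are illustrative assumptions; the summarization call itself is left out.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface ChatMemory {
  readonly summary: string;
  readonly recent: readonly ChatMessage[];
}

type ModelMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const MAX_RECENT = 8; // same window as slice(-8) in the snippet above

// Appends new turns and keeps only the last MAX_RECENT messages.
function appendTurns(memory: ChatMemory, turns: readonly ChatMessage[]): ChatMemory {
  return {
    summary: memory.summary,
    recent: [...memory.recent, ...turns].slice(-MAX_RECENT),
  };
}

// Builds the model input: the summary is appended to the system prompt,
// and the recent turns follow as ordinary messages.
function buildMessages(systemPrompt: string, memory: ChatMemory): ModelMessage[] {
  const system = memory.summary
    ? `${systemPrompt}\n\nSummary of the earlier conversation:\n${memory.summary}`
    : systemPrompt;
  return [{ role: 'system', content: system }, ...memory.recent];
}
```

Keeping these as pure functions means the KV write and the Gateway call stay in the route, while the trimming logic can be unit tested on its own.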
Common Gotchas
| Symptom | Cause | Fix |
|---|---|---|
| History written even when stream fails | Saving to KV before the stream drains | Tee the stream and write history after the collector finishes |
| Context window errors after long conversations | Sending full history every request | Use bounded memory (summary + recent turns) |
| Client sees no chunks | Response buffered by middleware or proxy | Verify Content-Type: text/event-stream and no buffering layer |
| Tool calls missing from client | Reading the Gateway SSE stream as plain text | Parse the provider-compatible stream event shape, or use an AI SDK route when the frontend expects AI SDK UI messages |
| KV TTL errors | TTL below minimum (60 s) or above maximum (365 days) | Use ttl: null or ttl: 0 for never-expire; otherwise pass a value in [60, 31_536_000] seconds |
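For the TTL row, one way to avoid range errors is to normalize the value before calling kv.set(). This sketch uses only the limits stated in the table (60 seconds to 31_536_000 seconds, with null or 0 meaning never-expire); `normalizeTtl` is a hypothetical helper, and clamping rather than rejecting out-of-range values is a design choice that may not suit every app.

```typescript
const MIN_TTL_SECONDS = 60;
const MAX_TTL_SECONDS = 31_536_000; // 365 days

// Hypothetical helper: returns a TTL inside the range the table describes.
// null and 0 pass through unchanged (never-expire); negative or non-finite
// values are rejected; everything else is floored and clamped into range.
function normalizeTtl(seconds: number | null): number | null {
  if (seconds === null || seconds === 0) return seconds;
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new RangeError(`Invalid TTL: ${seconds}`);
  }
  return Math.min(MAX_TTL_SECONDS, Math.max(MIN_TTL_SECONDS, Math.floor(seconds)));
}
```

A call site would then pass `{ ttl: normalizeTtl(requestedTtl) }`, where `requestedTtl` stands for whatever value the app computes.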
Next Steps
- Durable Streams: persist large or long-running output outside the HTTP response
- Key-Value Storage: compact conversation state keyed by conversation ID
- Agents: wrap model calls in typed, reusable app functions
- Tool Calling: stream tool-aware chat responses with AI SDK UI messages