Cache Control
Pruning solves one half of the problem. The other half is that every call still sends the parts of the context that didn't change.
Your system prompt is the same on call ten as it was on call one. The early user prompt is the same. The tool definitions are the same. The provider has seen these tokens before, and you're paying to send them again.
Cache control is the provider's way to say "I remember this, don't process it again." Where it's supported, the cost drop is real. Where it isn't, the pattern still teaches a useful habit: separate stable context from fresh context.
Outcome
A prepareCall pipeline that prunes old tool results, then marks the stable parts of the remaining context as cacheable. On providers that support cache control, repeated input costs drop substantially.
Fast Track
- Add an
addCacheControl(messages)helper that marks stable messages cacheable - Compose it after
pruneMessagesinprepareCall - Mark the system message and early conversation as cacheable; leave the most recent messages fresh
Hands-on Exercise 5.4
Wire cache control behind the pruning step.
Requirements:
- Write
addCacheControl(messages)that returns a new array of messages withproviderOptions.cacheControlset where appropriate - The first message (system or initial user prompt) should always be cacheable
- Messages older than the last two should be cacheable
- The most recent one or two messages should stay uncached
- Update
prepareCallto calladdCacheControl(pruneMessages(...))
Implementation hints:
cacheControl: { type: "ephemeral" }is the Anthropic-flavored shape. Other providers use different keys. If you're hitting one that doesn't support cache control, the call still works, the header just gets ignored- The cache breakpoints work by prefix. Marking message 5 as cacheable caches everything up to and including message 5. The provider checks the prefix on the next call
- Don't mark recent messages cacheable. They're about to be replaced
The helper
import type { ModelMessage } from "ai";
export function addCacheControl(messages: ModelMessage[]): ModelMessage[] {
return messages.map((msg, i) => {
if (i === 0) {
return {
...msg,
providerOptions: { cacheControl: { type: "ephemeral" } },
};
}
if (i < messages.length - 2) {
return {
...msg,
providerOptions: { cacheControl: { type: "ephemeral" } },
};
}
return msg;
});
}The first message and all but the last two get the cache marker. The recent ones stay out so they don't end up cached just before the next call replaces them.
Compose with pruning
In index.ts, the prepareCall becomes a pipeline:
import { addCacheControl } from "./src/cache";
prepareCall: async (options) => {
const pruned = options.messages
? pruneMessages({
messages: options.messages,
toolCalls: "before-last-3-messages",
})
: undefined;
return {
...options,
messages: pruned ? addCacheControl(pruned) : undefined,
};
},Pruning happens first because it changes how many messages there are. Caching happens second on whatever messages survive.
What the savings look like
The numbers depend on the provider, the cache hit rate, and how much of your context is stable. For a typical Anthropic-backed agent running long sessions:
| Component | Cost without cache | Cost with cache |
|---|---|---|
| 50 calls, 200K input each | Full input every call | Stable prefix cached |
| 10M tokens at full price | ~$30 per session | ~$6 per session |
That's an order-of-magnitude swing on long sessions. On short sessions, the difference is smaller because the cache doesn't have time to amortize. The discipline still applies.
Anthropic uses cacheControl on message parts. OpenAI uses different headers and a different model. Some providers don't expose caching at the prompt level at all. The pattern (separate stable from fresh) survives those differences. The exact providerOptions shape doesn't.
Why the pattern matters even where caching isn't supported
You're forcing yourself to think about which parts of the prompt are stable. That's a useful habit. It tells you whether your system prompt is doing too much work per-call (rebuilt every time) or just right (built once, reused).
If caching disappeared tomorrow, the discipline of identifying stable context would still be worth keeping. Stable context is also easier to test, easier to version, and easier to reason about than context that changes shape every call.
Try It
Run a multi-step task and watch the token counts:
bun run index.ts . "Read package.json, tsconfig, index.ts, then summarize"If your provider returns cache hit info in the usage object, log it:
onStepFinish: ({ usage, stepNumber }) => {
console.error(
`Step ${stepNumber}: ${usage.inputTokens} input, ${usage.outputTokens} output, ${usage.cachedInputTokens ?? 0} cached`,
);
},You should see cachedInputTokens growing from step 1 onward, with inputTokens (the part you pay full price for) staying small.
npx tsc --noEmitCommit
git add src/cache.ts index.ts
git commit -m "feat(context): add cache control to stable messages"Done-When
addCacheControl(messages)marks stable messages cacheableprepareCallruns pruning then caching, in that order- The most recent message stays uncached
- On a provider that supports caching,
cachedInputTokensshows up in usage npx tsc --noEmitpasses
You've got telemetry from lesson 5.1, pruning from 5.2, caps from 5.3, and caching from this lesson. Wire them together. After each step, log: input tokens (uncached), cached tokens, output tokens, and the running cost. At session end, print the total cost and what the session would have cost without pruning and caching. The number is what makes the discipline worth doing.
Solution
import type { ModelMessage } from "ai";
export function addCacheControl(messages: ModelMessage[]): ModelMessage[] {
return messages.map((msg, i) => {
if (i === 0) {
return {
...msg,
providerOptions: { cacheControl: { type: "ephemeral" } },
};
}
if (i < messages.length - 2) {
return {
...msg,
providerOptions: { cacheControl: { type: "ephemeral" } },
};
}
return msg;
});
}prepareCall: async (options) => {
const pruned = options.messages
? pruneMessages({
messages: options.messages,
toolCalls: "before-last-3-messages",
})
: undefined;
return {
...options,
messages: pruned ? addCacheControl(pruned) : undefined,
};
},Was this helpful?