The Problem
Your agent has been getting away with something.
Through Modules 1 to 4, you've been running short tasks. Read a file, find a TODO, list the directory. Five steps, ten at most. The context window stays comfortable, the agent stays sharp, and everything is fine.
This is the part where we turn around and look at what happens on a real task. Twenty steps. Thirty. The kind of task where the agent reads three files, searches the codebase twice, runs a test, fixes a thing, runs the test again, and writes a summary.
Before we fix the problem, we need to see it. The fix is small. The seeing is the hard part.
Outcome
onStepFinish logs input and output tokens at every step. Running a multi-step task makes context growth visible: input tokens climb linearly while output stays roughly flat.
Fast Track
- Add an onStepFinish callback to ToolLoopAgent
- Log usage.inputTokens and usage.outputTokens per step
- Run a multi-step task and watch the input number climb
Hands-on Exercise 5.1
Wire token logging and run a task that does enough work to show the curve.
Requirements:
- Pass onStepFinish: ({ usage, stepNumber }) => { ... } to ToolLoopAgent
- Log to console.error (so it shows up next to the agent's normal output but stays out of stdout)
- Run a prompt that forces 4+ tool calls so the curve is visible
- Read the numbers. Don't try to fix anything yet
Implementation hints:
- onStepFinish runs after every step, not just successful ones. The usage field is the relevant one
- Use console.error for telemetry. console.log mixes with the agent's response and gets ugly
- The prompt needs to do real work. "Read package.json" is one step. "Read package.json, then tsconfig, then the entry point, then summarize" is four
Add the logging
const agent = new ToolLoopAgent({
  // ... existing config
  onStepFinish: ({ usage, stepNumber }) => {
    console.error(
      `Step ${stepNumber}: ${usage.inputTokens} input, ${usage.outputTokens} output`,
    );
  },
});
That's all of the instrumentation. The SDK calls this for you after every step.
Run a task that hurts
bun run index.ts . "Read package.json, then tsconfig.json, then index.ts, then summarize everything"
You should see something like:
Step 0: 1200 input, 450 output
Step 1: 2800 input, 200 output
Step 2: 4100 input, 180 output
Step 3: 8900 input, 350 output
Step 4: 9200 input, 600 output
The exact numbers will be different in your project. The shape won't be. Input tokens climb every step. Output tokens stay roughly flat.
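One way to read numbers like these is to look at the difference between consecutive steps, which roughly tracks how much the previous step added to the history. The sketch below is illustrative only: inputDeltas is a hypothetical helper, not part of the SDK, and the sample array is the input column from the output above.

```typescript
// Hypothetical helper: how many input tokens each step added over the one before.
function inputDeltas(inputTokens: number[]): number[] {
  const deltas: number[] = [];
  let previous: number | undefined;
  for (const current of inputTokens) {
    if (previous !== undefined) {
      deltas.push(current - previous);
    }
    previous = current;
  }
  return deltas;
}

// Input token counts copied from the sample output above.
const sampleInputTokens = [1200, 2800, 4100, 8900, 9200];
console.log(inputDeltas(sampleInputTokens)); // → [ 1600, 1300, 4800, 300 ]
```

In this sample, the jump of 4800 between steps 2 and 3 suggests that the tool result from step 2 was the largest one of the run.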
Why input tokens grow
Every step sends the entire message history to the model. The user prompt. The system prompt. Every tool call the agent has made. Every tool result the agent has received.
The package.json from step 1 is still in context at step 4, even though the agent is done with it. The tsconfig from step 2 is still there. Nothing leaves on its own.
| Component | Tokens | Behavior |
|---|---|---|
| System prompt | ~500 | Fixed, sent every call |
| Each tool result | 200 to 2,000 | Stays in history forever |
| After 20 tool calls | 4,000 to 40,000 | Linearly accumulating |
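The accumulation can be modeled with plain arithmetic. This is a minimal sketch under simplifying assumptions: it does not call the SDK, simulateInputTokens is a hypothetical name, the token counts are made up for illustration, and it ignores the tokens taken up by the model's own tool-call messages.

```typescript
// Hypothetical model of context growth: each step resends the fixed prompt
// plus every tool result collected so far.
function simulateInputTokens(
  fixedPromptTokens: number,
  toolResultTokens: number[],
): number[] {
  const inputPerStep: number[] = [];
  let history = 0; // tokens of tool results already in the message history
  for (const resultTokens of toolResultTokens) {
    // The model sees the fixed prompt plus all earlier results.
    inputPerStep.push(fixedPromptTokens + history);
    // The result produced by this step is appended and never removed.
    history += resultTokens;
  }
  // Final step: the model writes its answer with every result still in context.
  inputPerStep.push(fixedPromptTokens + history);
  return inputPerStep;
}

// Illustrative sizes only: three file reads of 1500, 800, and 4000 tokens.
console.log(simulateInputTokens(1200, [1500, 800, 4000]));
// → [ 1200, 2700, 3500, 7500 ]
```

No step ever sends fewer tokens than the one before it, because nothing is removed from the history.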
The context window is 200,000 tokens. A busy agent that reads big files can hit it in 30 to 50 steps. When it does:
- The instructions at the top get pushed out of attention
- The model starts ignoring its own system prompt
- Tool selection degrades
- The agent loops or hallucinates
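How quickly an agent reaches the window depends mostly on the size of each tool result. As a rough back-of-the-envelope sketch (stepsUntilFull is a hypothetical helper, and it assumes every result is the same size and ignores output tokens):

```typescript
// Hypothetical estimate: how many same-sized tool results fit in the context
// window alongside the fixed prompt.
function stepsUntilFull(
  contextWindow: number,
  fixedPromptTokens: number,
  avgResultTokens: number,
): number {
  if (avgResultTokens <= 0) {
    return Infinity; // nothing accumulates, so the window never fills
  }
  return Math.floor((contextWindow - fixedPromptTokens) / avgResultTokens);
}

console.log(stepsUntilFull(200000, 500, 2000)); // → 99
console.log(stepsUntilFull(200000, 500, 5000)); // → 39
```

Under these assumptions, results of about 2,000 tokens leave room for roughly a hundred steps, while results of about 5,000 tokens (large files) bring the limit into the 30 to 50 step range.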
What doesn't work
Three tempting non-fixes:
- Hoping it doesn't happen. It always happens on real tasks
- Reducing step count. Ten steps is too few for real work. Fifty is normal
- Using a bigger model. A bigger context window delays the problem, doesn't solve it. And it costs more per token
The fix is to take old tool results out of the message history before they overflow attention. Lesson 5.2 does exactly that.
This lesson doesn't fix anything. That's deliberate. You can't tell whether a fix worked unless you measured the problem first. Token logging stays in for the rest of the module so you can compare before and after at every step.
Try It
Run the multi-step task. Look at the numbers. Confirm:
- Input tokens grow with every step
- Output tokens stay relatively small
- By the last step, you're sending more than three times the tokens of step 0
bun run index.ts . "Read package.json, then tsconfig.json, then index.ts, then summarize everything"
Try a longer task and watch the curve get steeper:
bun run index.ts . "Read every .ts file in src/, then tell me what each one does"
npx tsc --noEmit
Commit
git add index.ts
git commit -m "feat(telemetry): log token usage per step"
Done-When
- onStepFinish is wired to log usage.inputTokens and usage.outputTokens
- A 4+ step task shows input tokens climbing
- Output tokens stay relatively flat across steps
- The logging goes to console.error, not console.log
- npx tsc --noEmit passes
Logging numbers to stderr is fine. Logging them to a CSV is better. Append step,inputTokens,outputTokens to a file in onStepFinish, then load it into your spreadsheet of choice. The shape of the curve tells you whether your task is read-heavy (steep climb) or compute-heavy (gentler climb). Same agent, different curve, depending on what it's doing.
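A minimal sketch of the CSV variant, using Node's built-in fs module. The helper names (formatUsageRow, appendUsageRow) and the tokens.csv path are hypothetical, and the StepUsage type is an assumption about which fields this sketch needs; it treats the token counts as possibly undefined in case a provider does not report them.

```typescript
import { appendFileSync, existsSync, writeFileSync } from "node:fs";

// Assumed shape: only the two fields this sketch reads from usage.
type StepUsage = { inputTokens?: number; outputTokens?: number };

// Hypothetical helper: format one CSV row; missing counts become empty cells.
function formatUsageRow(stepNumber: number, usage: StepUsage): string {
  return [stepNumber, usage.inputTokens ?? "", usage.outputTokens ?? ""].join(",");
}

// Hypothetical helper: append one row per step, writing the header first
// if the file does not exist yet.
function appendUsageRow(
  csvPath: string,
  stepNumber: number,
  usage: StepUsage,
): void {
  if (!existsSync(csvPath)) {
    writeFileSync(csvPath, "step,inputTokens,outputTokens\n");
  }
  appendFileSync(csvPath, formatUsageRow(stepNumber, usage) + "\n");
}

// Inside the agent config, it could be called like this:
// onStepFinish: ({ usage, stepNumber }) => {
//   appendUsageRow("tokens.csv", stepNumber, usage);
// },
```

Synchronous writes keep the rows in step order, which is usually acceptable for a file this small. Delete the file between runs if you want one curve per file.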
Solution
const agent = new ToolLoopAgent({
  model: "anthropic/claude-haiku-4-5",
  instructions: buildSystemPrompt({
    workingDirectory: cwd,
    sandboxType: sandbox.type,
    toolNames: Object.keys(tools),
    projectContext,
  }),
  tools,
  stopWhen: stepCountIs(15),
  onStepFinish: ({ usage, stepNumber }) => {
    console.error(
      `Step ${stepNumber}: ${usage.inputTokens} input, ${usage.outputTokens} output`,
    );
  },
});