Tool Output Design

Pruning takes old results out of context. That's necessary. It's not enough.

If a single tool result is 5,000 tokens, pruning doesn't save you. By the time it runs, the damage is done: the model has already seen 5,000 tokens of grep output and has to carry them for at least three more turns.

The better fix is upstream. Tools should produce small, structured, bounded output by default. Pruning is the cleanup crew. Tool design is the prevention.

Outcome

Every tool in your harness has explicit output caps (lines, matches, characters), and the truncation behavior is communicated back to the model so it can paginate when it needs more.

Fast Track

Cap read at 500 lines, with offset/limit pagination
Cap grep at 50 matches, returning the total count
Cap bash at 5,000 characters of output, keeping the tail
Every cap surfaces a truncation message the model can see and act on

Hands-on Exercise 5.3

Apply the bounded-output contract to all three tools.

Requirements:

read keeps its 500-line cap from Module 1, with offset and limit params for pagination
grep keeps its 50-match cap from Module 1, with a "(N total, showing first 50)" suffix when truncated
bash adds a 5,000-character cap on stdout. Keep the tail (last 5,000 chars), not the head, because errors usually live at the end
Each truncation appends a clear message like "... (truncated, showing last 5000 chars)"

Implementation hints:

The truncation message is the model's only signal that there's more data. It needs to be visible
For bash, slicing the tail is usually right. Build output, test failures, and stack traces tend to be at the end. Pick differently if your tool runs commands where the head matters
"Bounded" doesn't mean "tiny." 500 lines, 50 matches, 5,000 chars. Enough to answer the question, small enough to stay in context

The cap table

Tool	Cap	Why this number
`read`	500 lines	Enough to read most files. Big enough to grasp structure, small enough to not bury the model
`grep`	50 matches	A search that returns 50 results has usually answered the question. Five hundred would be a data dump
`bash`	5,000 chars	Most command output fits. `npm install` and friends produce noise the model doesn't need

These numbers aren't sacred. They're tuned by running real tasks and noticing what hurts. If your harness consistently runs commands with longer output that matters, raise the cap. If you mostly do fast searches, lower it.

Bash output, with cap

The bash tool didn't have an output cap until now. Add one:

src/tools.ts (excerpt)

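const MAX_BASH_CHARS = 5000;

// Substitute a placeholder so the model never receives an empty string.
const stdout = result.stdout || "(no output)";

// Keep the tail: slice(-N) returns the last N characters.
const cappedStdout =
  stdout.length > MAX_BASH_CHARS
    ? stdout.slice(-MAX_BASH_CHARS) +
      `\n... (truncated, showing last ${MAX_BASH_CHARS} chars)`
    : stdout;

return cappedStdout;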

Slicing from the end is intentional. Most commands the agent will run fail loudly at the end. A failed test prints the failures last. A failed build prints the error last. Keeping the tail keeps the part the agent needs to act on.
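To see the tail-keep behavior in isolation, here is a minimal sketch of the same idea as a standalone function. The name capTail and the "of N chars" wording in the message are assumptions for illustration, not part of the course code; the message borrows grep's habit of reporting the total so the model can tell how much was dropped.

```typescript
// Hypothetical helper (not from the course code): keep the last `maxChars`
// characters of `output` and append a notice when anything was dropped.
function capTail(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  // Compute the start index explicitly; slice(-0) would return the whole string.
  const tail = output.slice(output.length - maxChars);
  return (
    tail +
    `\n... (truncated, showing last ${maxChars} of ${output.length} chars)`
  );
}

// A fake test run: thousands of lines of passing noise, failure at the very end.
const noisy = "ok\n".repeat(3000) + "FAIL src/cart.test.ts: expected 2, got 3";
const capped = capTail(noisy, 5000);

console.log(capped.includes("FAIL src/cart.test.ts")); // true: the tail survived
console.log(capped.length < noisy.length); // true: the head was dropped
```

Had the function kept the head instead, the capped output would be 5,000 characters of "ok" and the failure line would never reach the model.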

Structured returns, not raw dumps

The grep tool already does this from Module 1, but it's worth restating the pattern:

src/tools.ts (grep, excerpt)

// One match per line; drop the empty strings left by blank output.
const lines = stdout.trim().split("\n").filter(Boolean);
const MAX_MATCHES = 50;
const truncated = lines.length > MAX_MATCHES;
const result = truncated ? lines.slice(0, MAX_MATCHES) : lines;

// Report the real total so the model knows how much it is not seeing.
return truncated
  ? result.join("\n") + `\n... (${lines.length} total, showing first ${MAX_MATCHES})`
  : result.join("\n") || "No matches found.";

The truncation message gives the model two pieces of information it can act on: there were more results than shown, and exactly how many. With that, the agent can decide to narrow the search or paginate.

The truncation contract

Every tool that can produce unbounded output should follow the same shape:

Cap the output at a reasonable limit
Tell the model the output was truncated and which slice it is seeing (the first 50 of N matches, the last 5,000 chars)
Provide pagination parameters where the tool supports them (offset/limit on read, narrower glob patterns on grep)

The contract is what lets the agent react. A tool that silently truncates is worse than no truncation at all, because the model thinks it has the full picture and acts on incomplete data.
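The read tool is not shown in this section, so here is a rough sketch of what the three-part contract could look like for it. Everything here is an assumption for illustration: the readPage name, the zero-based offset, and the exact message wording. It takes the file contents as a string so it runs without touching the filesystem.

```typescript
const MAX_READ_LINES = 500;

// Hypothetical pure helper: returns one page of `content`, starting at the
// zero-based line `offset`, never longer than MAX_READ_LINES lines.
function readPage(
  content: string,
  offset = 0,
  limit = MAX_READ_LINES,
): string {
  const lines = content.split("\n");
  // 1. Cap: clamp the requested limit so no caller can exceed the ceiling.
  const size = Math.min(Math.max(limit, 1), MAX_READ_LINES);
  const start = Math.min(Math.max(offset, 0), lines.length);
  const page = lines.slice(start, start + size);
  const end = start + page.length;

  // Nothing left after this page: return it without a notice.
  if (end >= lines.length) return page.join("\n");

  // 2. Tell the model what it is seeing. 3. Hand it the next offset.
  return (
    page.join("\n") +
    `\n... (truncated, showing lines ${start + 1}-${end} of ${lines.length}; ` +
    `call again with offset=${end})`
  );
}

// Usage: a 1,200-line file needs three pages.
const file = Array.from({ length: 1200 }, (_, i) => `line ${i + 1}`).join("\n");
const first = readPage(file);
console.log(first.split("\n").pop());
```

The notice carries the next offset, so the model does not have to compute it. That is a design choice, not a requirement of the contract.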

Caps are a tax the agent pays in pagination

Bounded output makes some tasks a little slower. To read a 2,000-line file, the agent now needs four read calls instead of one. That's the right tradeoff. Four bounded reads are cheaper, in tokens and cost, than one massive read that pollutes context for the rest of the session.
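The arithmetic behind "four read calls" can be sketched as a short helper that lists the offset of each call. The name pageOffsets and the zero-based offsets are assumptions for illustration; your read tool may count lines differently.

```typescript
// Hypothetical helper: the offsets an agent would pass to cover a whole file
// when each read call returns at most `limit` lines.
function pageOffsets(totalLines: number, limit: number): number[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const offsets: number[] = [];
  for (let offset = 0; offset < totalLines; offset += limit) {
    offsets.push(offset);
  }
  return offsets;
}

console.log(pageOffsets(2000, 500)); // [0, 500, 1000, 1500]: four calls
```

In practice the agent rarely needs every page. A grep for the symbol it cares about, followed by one read at the right offset, usually replaces the full sweep.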

Try It

Run a search that you know returns a lot of matches:

Terminal

bun run index.ts . "Find all import statements in this project"

You should see grep return 50 matches, followed by a truncation message that reports the total match count. If you ask the agent to keep going, it should narrow the search or use a more specific glob, not ask for an unbounded dump.

Try a command that produces a lot of output:

Terminal

bun run index.ts . "Run: ls -laR"

If the recursive listing exceeds 5,000 characters, you should see the truncation message. The agent should react by narrowing the listing or asking for a specific subdirectory.

Finally, confirm the project still type-checks:

Terminal

npx tsc --noEmit

Commit

git add src/tools.ts
git commit -m "feat(tools): cap bash output at 5000 chars with tail-keep"

Done-When

read caps at 500 lines with offset/limit pagination
grep caps at 50 matches with a (N total) suffix on truncation
bash caps stdout at 5,000 characters, keeping the tail
Every cap surfaces a clear truncation message the model can see
No tool can dump unbounded data into context
npx tsc --noEmit passes

Make the caps configurable

Hardcoded caps are a starting point. A subagent doing a quick check might need 100 lines, not 500. A deep analysis might want 2,000. Refactor your tool factories to accept a caps config object so the caller can tune them per agent. Watch for the tradeoff: configurable caps mean more knobs for the user to set wrong. Where's the right default?
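One possible shape for that config object, as a sketch under assumptions: the OutputCaps interface, DEFAULT_CAPS, and resolveCaps are hypothetical names, and the defaults are simply the three numbers from the cap table. Callers override only what they need, and anything non-positive is rejected so a typo cannot turn a cap off.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface OutputCaps {
  readLines: number;
  grepMatches: number;
  bashChars: number;
}

// Defaults taken from the cap table in this lesson.
const DEFAULT_CAPS: OutputCaps = {
  readLines: 500,
  grepMatches: 50,
  bashChars: 5000,
};

// Merge caller overrides onto the defaults and validate the result.
function resolveCaps(overrides: Partial<OutputCaps> = {}): OutputCaps {
  const caps: OutputCaps = { ...DEFAULT_CAPS, ...overrides };
  for (const key of Object.keys(caps) as (keyof OutputCaps)[]) {
    const value = caps[key];
    if (!Number.isInteger(value) || value <= 0) {
      throw new RangeError(`${key} must be a positive integer, got ${value}`);
    }
  }
  return caps;
}

// A quick-check subagent shrinks one cap and inherits the rest.
const quick = resolveCaps({ readLines: 100 });
console.log(quick);
```

A tool factory could then take the resolved object as an extra argument and read caps.bashChars instead of a local constant. Keeping the defaults in one place means a caller who passes nothing still gets bounded output.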

Solution

src/tools.ts (bash excerpt)

export function createBashTool(
  sandbox: Sandbox,
  needsApproval: (input: { command: string }) => boolean,
) {
  const MAX_BASH_CHARS = 5000;
 
  return tool({
    description: `Execute a shell command in the working directory.
WHEN TO USE: build commands, package install, tests, git, directory listings.
WHEN NOT TO USE: reading file contents (use read).
DO NOT USE FOR: reading files (use read), searching code (use grep).`,
    inputSchema: z.object({
      command: z.string().describe("Shell command to execute"),
    }),
    execute: async ({ command }) => {
      if (needsApproval({ command })) {
        return `Blocked: "${command}" requires approval.`;
      }
      const result = await sandbox.exec(command);
      const stdout = result.stdout || "(no output)";
      return stdout.length > MAX_BASH_CHARS
        ? stdout.slice(-MAX_BASH_CHARS) +
            `\n... (truncated, showing last ${MAX_BASH_CHARS} chars)`
        : stdout;
    },
  });
}