ASK KNOX
LESSON 185

Model Routing — Opus Where It Matters, Sonnet Everywhere Else

Not every agent needs the most powerful model. Strategic model routing across your agent fleet saves tokens without sacrificing quality — if you know where reasoning actually matters.

7 min read·Claude Code Operations

Running every agent on opus is like deploying your senior architect to write boilerplate. The work gets done, and you pay full senior-architect rates for it. At five agents that is tolerable. At twenty agents across a parallel swarm, it is a structural budget leak.

The routing question is the same one from basic model routing, applied at the fleet level: what is the minimum capability needed to produce an acceptable output? The answer changes depending on what type of cognitive work each agent is doing.

Why Most Agent Work Does Not Need Opus

Opus earns its place when an agent must hold contradictory information in context, reason through multi-step problems with no obvious path, synthesize disparate sources into novel insight, or make judgment calls where the stakes of being wrong are high.

Most agent work does not meet that bar.

A security agent scanning for OWASP vulnerabilities is doing pattern matching. The patterns are known. The checklist is finite. Sonnet handles this at high accuracy. A QA agent writing tests from a spec is doing mechanical translation — spec language into assertion language. Sonnet is sufficient. A research agent gathering data from multiple sources is reading and extracting structured information. Sonnet is sufficient.

Routing Tables From a Real Multi-Agent Setup

These tables reflect actual routing decisions across four agent team types. The reasoning column matters — it is the pattern, not the specific assignment.

Feature Team

| Agent | Model | Reason |
|-------|-------|--------|
| Backend Dev | opus | Complex implementation, architecture decisions, cross-cutting concerns |
| Frontend Dev | opus | Creative UI work, design system adherence, component architecture |
| QA Engineer | sonnet | Test writing from spec, verification; read-heavy, mechanical |

The feature team is the one case where most agents stay on opus. Implementation work requires holding the full project context, making judgment calls about abstractions, and producing code that has to actually work. QA gets downgraded because writing tests from a spec and verifying an existing implementation is largely mechanical.

Audit Swarm

| Agent | Model | Reason |
|-------|-------|--------|
| Security Auditor | sonnet | Pattern matching against known vulnerabilities |
| Architecture Reviewer | sonnet | Structural analysis against established patterns |
| Performance Auditor | sonnet | Anti-pattern detection in existing code |
| Test Coverage Analyst | sonnet | Coverage analysis, gap identification |

The entire audit swarm runs on sonnet. Every audit task is evaluation against known criteria — the criteria are in the prompt, the code is in the files, and the agent is checking conformance. No novel synthesis required.

Security Team

| Agent | Model | Reason |
|-------|-------|--------|
| Static Analyzer | sonnet | OWASP checklist, deterministic vulnerability patterns |
| Dependency Auditor | sonnet | Running audit tools, parsing structured output |
| Threat Modeler | opus | STRIDE analysis, attack reasoning, adversarial thinking |

The security team splits by task type: two agents on sonnet, one on opus. Static analysis and dependency auditing are pattern matching against known catalogs. Threat modeling is genuinely different: the agent must reason from the attacker's perspective, identify non-obvious attack paths, and synthesize a threat model that does not exist yet. That is opus work.

Research Team

| Agent | Model | Reason |
|-------|-------|--------|
| Researcher | sonnet | Data gathering, source extraction, structured collection |
| Strategist | opus | Strategic synthesis, insight generation, recommendations |
| Critic | sonnet | Reviewing existing output, identifying gaps; read-heavy |

The research team demonstrates the pattern most clearly. The Researcher gathers. The Critic evaluates. Both are reading and pattern matching against criteria. The Strategist is the one agent doing novel synthesis from the gathered data — translating raw research into strategic recommendations that did not exist before. That is the only role that needs opus.

The Numbers

Across these four team types with a realistic dispatch frequency, 14 of 21 agents run on sonnet. That is a substantial reduction in compute spend, with no reduction in output quality because the downgraded agents were not doing work that required opus capability.
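The 14-of-21 figure is weighted by dispatch frequency, which the tables do not list. As a rough cross-check, here is a minimal sketch that counts each role from the four tables exactly once. The `ROUTING` dictionary is just a transcription of those tables; the `tally` helper is a hypothetical name introduced for illustration.

```python
from collections import Counter

# Transcribed from the four routing tables in this lesson.
ROUTING = {
    "Feature Team": {
        "Backend Dev": "opus",
        "Frontend Dev": "opus",
        "QA Engineer": "sonnet",
    },
    "Audit Swarm": {
        "Security Auditor": "sonnet",
        "Architecture Reviewer": "sonnet",
        "Performance Auditor": "sonnet",
        "Test Coverage Analyst": "sonnet",
    },
    "Security Team": {
        "Static Analyzer": "sonnet",
        "Dependency Auditor": "sonnet",
        "Threat Modeler": "opus",
    },
    "Research Team": {
        "Researcher": "sonnet",
        "Strategist": "opus",
        "Critic": "sonnet",
    },
}


def tally(routing):
    """Count roles per model across all teams, each role counted once."""
    return Counter(
        model for team in routing.values() for model in team.values()
    )


counts = tally(ROUTING)
total = sum(counts.values())
print(f"sonnet: {counts['sonnet']} of {total} roles")  # sonnet: 9 of 13 roles
print(f"opus: {counts['opus']} of {total} roles")      # opus: 4 of 13 roles
```

Unweighted, 9 of 13 roles sit on sonnet. The dispatch-weighted share will differ depending on how often each team actually runs.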

Specifying Model in Agent Prompts

When dispatching agents via the Agent tool, include a routing table comment at the top of your orchestrator skill. This makes the routing decisions explicit and reviewable.

## Model Routing

| Agent | Model | Justification |
|-------|-------|---------------|
| Backend Dev | opus | Architecture decisions, complex implementation |
| QA Engineer | sonnet | Test writing from spec, mechanical verification |
| Security Auditor | sonnet | OWASP checklist, pattern matching |
| Threat Modeler | opus | STRIDE analysis, adversarial reasoning |

Then in each agent spawn block, annotate the model explicitly:

Task: Write integration tests for the new auth endpoints.
Model: sonnet
Allowed tools: Read, Bash, Edit, Write

The model parameter on the Agent tool call accepts "sonnet" or "opus" directly. Making it explicit in both the routing table and the agent prompt creates a paper trail for the decision and makes future cost reviews straightforward.
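One way to keep the routing table and the spawn blocks from drifting apart is a small consistency check. The sketch below is an assumption-laden illustration, not part of any Claude Code API: `parse_spawn_block` and `check_model` are hypothetical helpers, and the `Key: value` layout is taken from the spawn block shown above.

```python
# Routing table from the orchestrator skill, as a plain dict.
ROUTING_TABLE = {
    "Backend Dev": "opus",
    "QA Engineer": "sonnet",
    "Security Auditor": "sonnet",
    "Threat Modeler": "opus",
}


def parse_spawn_block(text):
    """Parse 'Key: value' lines into a dict with lowercase keys."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip().lower()] = value.strip()
    return fields


def check_model(agent, spawn_text, table=ROUTING_TABLE):
    """Return None if the spawn block matches the table, else a message."""
    expected = table.get(agent)
    actual = parse_spawn_block(spawn_text).get("model")
    if expected is None:
        return f"{agent}: not in routing table"
    if actual is None:
        return f"{agent}: spawn block has no Model line"
    if actual != expected:
        return f"{agent}: table says {expected}, spawn block says {actual}"
    return None


qa_block = """
Task: Write integration tests for the new auth endpoints.
Model: sonnet
Allowed tools: Read, Bash, Edit, Write
"""

print(check_model("QA Engineer", qa_block))     # None
print(check_model("Threat Modeler", qa_block))  # mismatch message
```

Running a check like this during a cost review surfaces any agent whose annotation no longer matches the documented decision.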

The Haiku Tier

For the simplest tasks — log parsing, file format conversion, extracting fields from structured output, verifying that a file exists — there is a tier below sonnet. Haiku handles deterministic, low-stakes tasks at a fraction of the cost.

Use haiku for agents whose entire job is running a command and reporting the exit code. Deployment verifiers, file presence checkers, output formatters. The cognitive load of these tasks is genuinely minimal and haiku meets it.

The resulting three-tier rule is: haiku for mechanical execution, sonnet for analysis and evaluation, opus for synthesis and reasoning. Most agent fleets should land predominantly in the sonnet tier.
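The three-tier split can be written down as a lookup from task type to model. This is a minimal sketch: the `TaskType` enum and `route` function are hypothetical names, and classifying a given agent is still a judgment call the code does not make for you.

```python
from enum import Enum


class TaskType(Enum):
    MECHANICAL_EXECUTION = "mechanical execution"
    ANALYSIS_EVALUATION = "analysis and evaluation"
    SYNTHESIS_REASONING = "synthesis and reasoning"


# The three-tier rule from this lesson, as a lookup table.
TIER_FOR_TASK = {
    TaskType.MECHANICAL_EXECUTION: "haiku",
    TaskType.ANALYSIS_EVALUATION: "sonnet",
    TaskType.SYNTHESIS_REASONING: "opus",
}


def route(task_type):
    """Return the model tier for a classified task type."""
    return TIER_FOR_TASK[task_type]


# Example classifications drawn from agents mentioned in the lesson.
examples = {
    "Deployment Verifier": TaskType.MECHANICAL_EXECUTION,
    "Security Auditor": TaskType.ANALYSIS_EVALUATION,
    "Strategist": TaskType.SYNTHESIS_REASONING,
}

for agent, task_type in examples.items():
    print(f"{agent}: {route(task_type)}")
```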

When NOT to Downgrade

Two failure modes to avoid:

Downgrading implementation agents. The backend and frontend developers in a feature team are doing real work. A sonnet-tier implementation agent will produce sonnet-tier code — it will work but will miss the abstractions, architectural patterns, and edge cases that opus catches. The cost savings are real; the technical debt is also real.

Downgrading strategy agents. Any agent responsible for producing recommendations, synthesizing research into a plan, or making design decisions that downstream agents execute should stay on opus. Cheap strategy is expensive execution.

The routing table is a tool, not a rule. If your sonnet-tier agent is consistently producing outputs that require significant correction, the task requires more reasoning than you estimated. Move it up.
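The "move it up" rule can be made mechanical if you log whether each run needed significant correction. The sketch below is one possible way to do that. The `threshold` and `min_runs` values are illustrative assumptions, not numbers from this lesson, and `review_routing` is a hypothetical helper.

```python
TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most capable


def next_tier(current):
    """Return the tier one step up, or the same tier if already at the top."""
    i = TIERS.index(current)
    return TIERS[min(i + 1, len(TIERS) - 1)]


def review_routing(current, needed_correction, threshold=0.3, min_runs=5):
    """Suggest a tier from a history of booleans.

    needed_correction: one bool per run, True if the output needed
    significant correction. threshold and min_runs are assumed values.
    """
    if len(needed_correction) < min_runs:
        return current  # too few runs to call the problem "consistent"
    rate = sum(needed_correction) / len(needed_correction)
    return next_tier(current) if rate >= threshold else current


history = [True, False, True, True, False, True]  # 4 of 6 runs corrected
print(review_routing("sonnet", history))  # opus
```

Tune the threshold to your own tolerance for rework. What matters is that escalation is triggered by observed correction rates, not by intuition.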

Lesson 185 Drill

Take your last agent dispatch — whatever skill or manual Agent call you ran most recently.

  1. List every agent in that dispatch
  2. For each agent, identify the cognitive task type: mechanical execution, pattern matching against criteria, or novel synthesis
  3. Assign each to the correct model tier based on that classification
  4. Compare your new assignment against what you actually used

The gap between assigned and actual is your current over-routing rate. On a fleet running daily dispatches, that gap compounds fast.
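The drill's final comparison can be sketched as a small calculation. The dispatch data below is hypothetical (a run where every agent was sent to opus), and `over_routing_rate` is an illustrative helper that treats an agent as over-routed when it ran on a higher tier than its classification calls for.

```python
TIER_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}


def over_routing_rate(dispatch):
    """dispatch maps agent name -> (assigned_tier, actual_tier).

    Returns (fraction of agents run above their assigned tier, their names).
    """
    over = [
        agent
        for agent, (assigned, actual) in dispatch.items()
        if TIER_RANK[actual] > TIER_RANK[assigned]
    ]
    rate = len(over) / len(dispatch) if dispatch else 0.0
    return rate, over


# Hypothetical last dispatch: everything ran on opus.
last_dispatch = {
    "Researcher": ("sonnet", "opus"),
    "Strategist": ("opus", "opus"),
    "Critic": ("sonnet", "opus"),
    "Output Formatter": ("haiku", "opus"),
}

rate, over = over_routing_rate(last_dispatch)
print(f"{rate:.0%} over-routed: {', '.join(over)}")
# 75% over-routed: Researcher, Critic, Output Formatter
```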