ASK KNOX
LESSON 205

Model Tier Routing — Automatic Downgrade Logic

Four model tiers, one enforcement layer. How principal-broker routes agents to the cheapest model that meets their quality ceiling, and silently downgrades anyone who tries to reach above their tier.

8 min read·FinOps for AI Agents

Not every agent task requires the same model. Routing all your agent traffic through your best model is like shipping every package overnight because overnight delivery exists. It works, but you are paying premium rates for tasks that did not need them.

Model tier routing solves this at the infrastructure level, so individual agents do not have to reason about cost.

The Four-Tier Model

The principal-broker FinOps system defines a tier order with four slots:

MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0 — Nano (free)
    "claude-haiku-4-5-20251001", # Tier 1 — Micro ($0.25/$1.25 per MTok)
    "claude-sonnet-4-6",         # Tier 2 — Standard ($3.00/$15.00 per MTok)
    "claude-opus-4-6",           # Tier 3 — Premium ($15.00/$75.00 per MTok)
]

The list is ordered from cheapest to most expensive. This ordering is not cosmetic — it is used directly in the enforcement logic. A model's tier is its index position in this list. When the system needs to compare tiers, it compares indices.
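Concretely, a model's tier is just a list index lookup. A minimal sketch using the list above; tier_of is an illustrative helper, not part of the principal-broker codebase:

```python
# Tier order as defined above: a model's index position IS its tier number
MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0 — Nano
    "claude-haiku-4-5-20251001", # Tier 1 — Micro
    "claude-sonnet-4-6",         # Tier 2 — Standard
    "claude-opus-4-6",           # Tier 3 — Premium
]

def tier_of(model: str) -> int:
    """Return a model's tier, i.e. its index in the ordered list."""
    return MODEL_TIER_ORDER.index(model)

print(tier_of("claude-sonnet-4-6"))                            # 2
print(tier_of("claude-haiku-4-5-20251001") < tier_of("claude-opus-4-6"))  # True
```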

The pricing spread is significant:

  • Nano to Micro: from free to paid ($0.25/$1.25 per MTok)
  • Micro to Standard: 12x cost increase for input, 12x for output
  • Standard to Premium: 5x cost increase for input, 5x for output

A system that routes utility agents to Haiku instead of Sonnet pays one-twelfth the price on every call those agents make. Over a full day of agent activity, that difference is material.
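Those multipliers fall straight out of the per-MTok prices. A sketch that recomputes them; the dict-of-dicts shape for MODEL_PRICING_USD_PER_MTOK is an assumption here (the map is referenced later in this lesson, but its exact shape is not shown), and price_ratio is an illustrative helper:

```python
# Assumed shape of the pricing map — keyed by model id, USD per million tokens
MODEL_PRICING_USD_PER_MTOK = {
    "claude-haiku-4-5-20251001": {"input": 0.25, "output": 1.25},
    "claude-sonnet-4-6":         {"input": 3.00, "output": 15.00},
    "claude-opus-4-6":           {"input": 15.00, "output": 75.00},
}

def price_ratio(cheap: str, expensive: str, kind: str) -> float:
    """How many times more expensive one model is than another, per kind of token."""
    p = MODEL_PRICING_USD_PER_MTOK
    return p[expensive][kind] / p[cheap][kind]

print(price_ratio("claude-haiku-4-5-20251001", "claude-sonnet-4-6", "input"))   # 12.0
print(price_ratio("claude-haiku-4-5-20251001", "claude-sonnet-4-6", "output"))  # 12.0
print(price_ratio("claude-sonnet-4-6", "claude-opus-4-6", "input"))             # 5.0
```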

Agent Type Ceilings

Model tier enforcement is applied at the agent type level, not per-agent. This is intentional. Individual agents should not negotiate their own model access — that would make every agent a potential budget escalation vector.

MODEL_TIER_CEILING = {
    "revenue-product": "claude-sonnet-4-6",
    "shared-service": "claude-sonnet-4-6",
    "content-product": "claude-sonnet-4-6",
    "tooling": "claude-haiku-4-5-20251001",
}

The mapping reflects the actual quality requirements of each agent type:

Revenue-product agents (Foresight, the sports prediction agent, the political events prediction agent) make trading decisions. The quality of reasoning directly affects P&L. Sonnet is the minimum acceptable quality level. These agents are capped at Sonnet, not Opus — adding Opus overhead to every trading decision would be expensive, and the quality delta for signal generation tasks does not justify the premium.

Shared-service agents (InDecision Engine, the semantic memory layer) power other agents. They need reliable reasoning for classification, routing, and synthesis tasks. Sonnet is the appropriate ceiling.

Content-product agents (the AI content production pipeline, Rewired Minds pipeline) generate written output at scale. Sonnet produces high-quality content for blog and video scripts. Opus is excessive for this use case and would make the content pipeline 5x more expensive per run.

Tooling agents (OpenClaw skills, MCPs) perform well-defined utility tasks: format conversion, simple classification, template filling. Haiku handles these cleanly. Routing tooling agents through Sonnet is a 12x cost premium for equivalent output quality.

Notice what is missing from the ceiling map: Opus. No agent type has a ceiling at Opus. This is a deliberate product decision — if a task genuinely requires Opus-level reasoning, it warrants a manual override and an explicit justification, not an automated pathway.

The Enforcement Function

The enforce_tier() static method is the implementation. It takes an agent type and a requested model, and returns the allowed model:

@staticmethod
def enforce_tier(agent_type: str, requested_model: str) -> str:
    """
    Enforce model tier ceiling per agent type.
    Returns the allowed model (may be downgraded).
    Unknown agent types default to the lowest paid tier (Haiku).
    """
    # Default ceiling for unknown agent types — fail closed
    default_ceiling = "claude-haiku-4-5-20251001"
    ceiling = MODEL_TIER_CEILING.get(agent_type, default_ceiling)

    if requested_model not in MODEL_TIER_ORDER:
        logger.warning(
            f"Unknown model '{requested_model}' — allowing "
            f"(not in tier order)"
        )
        return requested_model

    requested_idx = MODEL_TIER_ORDER.index(requested_model)
    ceiling_idx = MODEL_TIER_ORDER.index(ceiling)

    if requested_idx > ceiling_idx:
        logger.warning(
            f"Model tier downgrade: {requested_model} → {ceiling} "
            f"for agent type {agent_type}"
        )
        return ceiling
    return requested_model

Two design decisions worth examining:

Fail closed for unknown agent types. If an agent presents a type that is not in MODEL_TIER_CEILING, the default ceiling is Haiku — the cheapest paid model. New agent types do not automatically inherit broad model access. They start at the floor and get an explicit ceiling added to the map.

Fail open for unknown models. If the requested model is not in MODEL_TIER_ORDER, the function logs a warning and allows the call through. This is the inverse of the agent type behavior, for a different reason: a new model that has not been added to the tier system yet should not silently break agents that legitimately use it. The warning surfaces the gap so the operator can add it to the tier order.

Walking Through the Logic

Suppose a tooling agent (OpenClaw skills) requests claude-opus-4-6:

agent_type = "tooling"
requested_model = "claude-opus-4-6"

ceiling = MODEL_TIER_CEILING.get("tooling") = "claude-haiku-4-5-20251001"

requested_idx = MODEL_TIER_ORDER.index("claude-opus-4-6") = 3
ceiling_idx = MODEL_TIER_ORDER.index("claude-haiku-4-5-20251001") = 1

3 > 1 → True → downgrade
return "claude-haiku-4-5-20251001"

The agent asked for Opus. It gets Haiku. The log records the downgrade. No exception is raised, no call is blocked — the agent proceeds, just with a model that matches its allocated tier.

Now suppose a revenue-product agent (Foresight) requests claude-haiku-4-5-20251001:

agent_type = "revenue-product"
requested_model = "claude-haiku-4-5-20251001"

ceiling = MODEL_TIER_CEILING.get("revenue-product") = "claude-sonnet-4-6"

requested_idx = MODEL_TIER_ORDER.index("claude-haiku-4-5-20251001") = 1
ceiling_idx = MODEL_TIER_ORDER.index("claude-sonnet-4-6") = 2

1 > 2 → False → no downgrade
return "claude-haiku-4-5-20251001"

Foresight is allowed to use Haiku voluntarily. The ceiling is a maximum, not a minimum. An agent can always use a cheaper tier than its ceiling — it just cannot exceed it.
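Both walkthroughs, and the two fallback behaviors, reduce to four assertions. A self-contained sketch that reproduces enforce_tier from above with the logging elided:

```python
MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0
    "claude-haiku-4-5-20251001", # Tier 1
    "claude-sonnet-4-6",         # Tier 2
    "claude-opus-4-6",           # Tier 3
]
MODEL_TIER_CEILING = {
    "revenue-product": "claude-sonnet-4-6",
    "shared-service": "claude-sonnet-4-6",
    "content-product": "claude-sonnet-4-6",
    "tooling": "claude-haiku-4-5-20251001",
}

def enforce_tier(agent_type: str, requested_model: str) -> str:
    # Fail closed: unknown agent types get the Haiku ceiling
    ceiling = MODEL_TIER_CEILING.get(agent_type, "claude-haiku-4-5-20251001")
    # Fail open: unknown models pass through untouched
    if requested_model not in MODEL_TIER_ORDER:
        return requested_model
    if MODEL_TIER_ORDER.index(requested_model) > MODEL_TIER_ORDER.index(ceiling):
        return ceiling  # downgrade to the ceiling
    return requested_model

# Walkthrough 1: a tooling agent asks for Opus, gets Haiku
assert enforce_tier("tooling", "claude-opus-4-6") == "claude-haiku-4-5-20251001"
# Walkthrough 2: a revenue-product agent may use Haiku voluntarily
assert enforce_tier("revenue-product", "claude-haiku-4-5-20251001") == "claude-haiku-4-5-20251001"
# Fail closed: an unknown agent type is capped at Haiku
assert enforce_tier("brand-new-type", "claude-opus-4-6") == "claude-haiku-4-5-20251001"
# Fail open: an unknown model is allowed through (with a warning in the real code)
assert enforce_tier("tooling", "some-future-model") == "some-future-model"
```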

Where This Fits in the Call Flow

Model tier enforcement runs before the cost recording step. The sequence for every LLM call in principal-broker is:

  1. Agent requests a model
  2. enforce_tier(agent_type, requested_model) — returns the allowed model
  3. budget_enforcer.check_budget(agent_id) — checks if the agent has budget remaining
  4. LLM call executes with the allowed model
  5. cost_tracker.record(agent_id, session_id, model, input_tokens, output_tokens) — records the actual cost

Steps 2 and 3 are both required. Tier enforcement alone does not prevent overspend — it prevents model overshoot. Budget enforcement alone does not prevent model abuse — it only stops agents after they have already used expensive models to exhaust their budget. Together they form the complete control surface.
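The five-step sequence can be sketched as a single wrapper. Everything below other than enforce_tier is a stand-in: the stub classes, stub_llm_call, and the call_llm signature are assumptions for illustration, not principal-broker's real interfaces:

```python
from dataclasses import dataclass, field

MODEL_TIER_ORDER = ["gemini-2.0-flash", "claude-haiku-4-5-20251001",
                    "claude-sonnet-4-6", "claude-opus-4-6"]
MODEL_TIER_CEILING = {"tooling": "claude-haiku-4-5-20251001"}

def enforce_tier(agent_type: str, requested_model: str) -> str:
    ceiling = MODEL_TIER_CEILING.get(agent_type, "claude-haiku-4-5-20251001")
    if requested_model not in MODEL_TIER_ORDER:
        return requested_model
    if MODEL_TIER_ORDER.index(requested_model) > MODEL_TIER_ORDER.index(ceiling):
        return ceiling
    return requested_model

# --- stand-ins for the real budget enforcer, cost tracker, and LLM client ---
@dataclass
class StubBudgetEnforcer:
    remaining_usd: float = 5.0
    def check_budget(self, agent_id: str) -> bool:
        return self.remaining_usd > 0

@dataclass
class StubCostTracker:
    records: list = field(default_factory=list)
    def record(self, agent_id, session_id, model, input_tokens, output_tokens):
        self.records.append((agent_id, session_id, model, input_tokens, output_tokens))

def stub_llm_call(model, messages):
    return "ok", 6000, 1500  # (text, input tokens, output tokens)

budget_enforcer = StubBudgetEnforcer()
cost_tracker = StubCostTracker()

def call_llm(agent_id, agent_type, session_id, requested_model, messages):
    model = enforce_tier(agent_type, requested_model)                  # step 2
    if not budget_enforcer.check_budget(agent_id):                     # step 3
        raise RuntimeError(f"{agent_id}: daily budget exhausted")
    text, in_tok, out_tok = stub_llm_call(model, messages)             # step 4
    cost_tracker.record(agent_id, session_id, model, in_tok, out_tok)  # step 5
    return text

call_llm("openclaw-1", "tooling", "s-001", "claude-opus-4-6", [])
print(cost_tracker.records[0][2])  # claude-haiku-4-5-20251001 — the downgraded model
```

Note that the recorded cost is attributed to the allowed model, not the requested one: the agent asked for Opus, the tracker saw Haiku.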

The Linter Rule

Model tier enforcement in code is only effective if agents cannot bypass it. The principal-broker codebase has a linter rule that catches direct anthropic.messages.create() calls. Any agent code that calls the Anthropic API directly, without going through CostTracker.create(), fails the lint check.

This is a common pattern in infrastructure engineering: pair enforcement logic with a static analysis check that prevents developers from accidentally bypassing the enforcement. The linter catches bypasses in code review before they reach production.
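The real lint rule's implementation is not shown here, but a minimal version of such a check can be written with Python's ast module; find_direct_api_calls is an illustrative sketch, not the actual rule:

```python
import ast

def find_direct_api_calls(source: str) -> list[int]:
    """Return line numbers of direct *.messages.create() calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Unwind the attribute chain: create <- messages <- client/anthropic/...
            parts = []
            f = node.func
            while isinstance(f, ast.Attribute):
                parts.append(f.attr)
                f = f.value
            if isinstance(f, ast.Name):
                parts.append(f.id)
            if parts[:2] == ["create", "messages"]:
                hits.append(node.lineno)
    return hits

bad = "resp = client.messages.create(model='claude-opus-4-6', messages=[])"
good = "resp = CostTracker.create(agent_id, model='claude-sonnet-4-6', messages=[])"
print(find_direct_api_calls(bad))   # [1]
print(find_direct_api_calls(good))  # []
```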

Extending the Tier System

The tier system is designed to be extended, not replaced. Adding a new model requires three updates:

  1. Add the model to MODEL_PRICING_USD_PER_MTOK with its input and output pricing
  2. Insert it at the correct position in MODEL_TIER_ORDER
  3. Update any MODEL_TIER_CEILING entries for agent types that should have access to it

Adding a new agent type requires at most one update: add it to MODEL_TIER_CEILING with its appropriate ceiling. Thanks to the default-to-Haiku behavior, new agent types work immediately with no configuration at all; they run at the floor tier until you decide to add an explicit ceiling and elevate them.
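The three updates can be sketched as plain data edits. The new model name example-mid-4 and its prices are hypothetical, and the dict-of-dicts shape of MODEL_PRICING_USD_PER_MTOK is assumed:

```python
MODEL_PRICING_USD_PER_MTOK = {
    "claude-haiku-4-5-20251001": {"input": 0.25, "output": 1.25},
    "claude-sonnet-4-6":         {"input": 3.00, "output": 15.00},
    "claude-opus-4-6":           {"input": 15.00, "output": 75.00},
}
MODEL_TIER_ORDER = ["gemini-2.0-flash", "claude-haiku-4-5-20251001",
                    "claude-sonnet-4-6", "claude-opus-4-6"]
MODEL_TIER_CEILING = {"tooling": "claude-haiku-4-5-20251001",
                      "content-product": "claude-sonnet-4-6"}

# 1. Pricing for the new (hypothetical) model
MODEL_PRICING_USD_PER_MTOK["example-mid-4"] = {"input": 1.00, "output": 5.00}
# 2. Insert at the position its price implies — between Haiku and Sonnet
MODEL_TIER_ORDER.insert(MODEL_TIER_ORDER.index("claude-sonnet-4-6"), "example-mid-4")
# 3. Raise any ceilings that should now point at the new model
MODEL_TIER_CEILING["tooling"] = "example-mid-4"

print(MODEL_TIER_ORDER.index("example-mid-4"))  # 2
```

Because a tier is a relative index, inserting a model mid-list is safe: every existing comparison still respects the cheap-to-expensive ordering, even though the absolute indices of Sonnet and Opus each shift up by one.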

Cost Impact of Getting This Right

To make the stakes concrete: a fleet of 20 agents running on Sonnet across a day, with an average session of 30 turns at 6,000 input / 1,500 output tokens per turn:

Per turn: (6000/1M × $3.00) + (1500/1M × $15.00) = $0.018 + $0.0225 = $0.0405
Per 30-turn session: $1.215
20 agents × 3 sessions/day × $1.215 = $72.90/day

If 8 of those 20 agents are tooling-type agents that could run on Haiku:

Per turn (Haiku): (6000/1M × $0.25) + (1500/1M × $1.25) = $0.0015 + $0.001875 = $0.003375
Per 30-turn session: $0.101
8 agents × 3 sessions/day × $0.101 = $2.43/day vs $29.16/day on Sonnet

Routing those 8 tooling agents to their correct tier saves $26.73/day. Over a 30-day month, that is roughly $802.
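The scenario can be recomputed in a few lines; session_cost is an illustrative helper built from the per-MTok prices in the tier list:

```python
def session_cost(input_per_mtok: float, output_per_mtok: float,
                 turns: int = 30, in_tok: int = 6000, out_tok: int = 1500) -> float:
    """USD cost of one session at a flat per-turn token profile."""
    per_turn = in_tok / 1e6 * input_per_mtok + out_tok / 1e6 * output_per_mtok
    return per_turn * turns

sonnet = session_cost(3.00, 15.00)  # ~$1.215 per 30-turn session
haiku = session_cost(0.25, 1.25)    # ~$0.101 per 30-turn session

fleet_on_sonnet = 20 * 3 * sonnet   # all 20 agents, 3 sessions/day, on Sonnet
tooling_on_sonnet = 8 * 3 * sonnet  # the 8 tooling agents, on Sonnet
tooling_on_haiku = 8 * 3 * haiku    # the same 8 agents, routed to Haiku
daily_saving = tooling_on_sonnet - tooling_on_haiku

print(round(fleet_on_sonnet, 2))    # 72.9
print(round(daily_saving, 2))       # 26.73
print(round(daily_saving * 30, 2))  # 801.9
```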

This is not a hypothetical optimization. It is the cost difference between a FinOps system and running everything on the most capable model available.

Next: Per-Agent Daily Budgets

Tier enforcement caps the model. Lesson 206 adds the budget layer: per-agent daily limits with an 80% warning, a 100% stop, and the global $25 ceiling that acts as the final backstop regardless of agent tier.