ASK KNOX
LESSON 205

Model Tier Routing — Automatic Downgrade Logic

Four model tiers, one enforcement layer. How principal-broker routes agents to the cheapest model that meets their quality ceiling, and silently downgrades anyone who tries to reach above their tier.

8 min read·FinOps for AI Agents

Not every agent task requires the same model. Routing all your agent traffic through your best model is like shipping every package overnight because overnight delivery exists. It works, but you are paying premium rates for tasks that did not need them.

Model tier routing solves this at the infrastructure level, so individual agents do not have to reason about cost.

The Four-Tier Model

The principal-broker FinOps system defines a tier order with four slots:

MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0 — Nano (free)
    "claude-haiku-4-5-20251001", # Tier 1 — Micro ($0.25/$1.25 per MTok)
    "claude-sonnet-4-6",         # Tier 2 — Standard ($3.00/$15.00 per MTok)
    "claude-opus-4-6",           # Tier 3 — Premium ($15.00/$75.00 per MTok)
]

The list is ordered from cheapest to most expensive. This ordering is not cosmetic — it is used directly in the enforcement logic. A model's tier is its index position in this list. When the system needs to compare tiers, it compares indices.
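Concretely, a model's tier is just a list index lookup. A minimal sketch using the list above; tier_of is an illustrative helper, not part of the principal-broker codebase:

```python
# Tier order as defined above: a model's index position IS its tier number
MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0 — Nano
    "claude-haiku-4-5-20251001", # Tier 1 — Micro
    "claude-sonnet-4-6",         # Tier 2 — Standard
    "claude-opus-4-6",           # Tier 3 — Premium
]

def tier_of(model: str) -> int:
    """Return a model's tier, i.e. its index in the ordered list."""
    return MODEL_TIER_ORDER.index(model)

print(tier_of("claude-sonnet-4-6"))                            # 2
print(tier_of("claude-haiku-4-5-20251001") < tier_of("claude-opus-4-6"))  # True
```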

The pricing spread is significant:

  • Nano to Micro: from free to paid ($0.25/$1.25 per MTok)
  • Micro to Standard: 12x cost increase for input, 12x for output
  • Standard to Premium: 5x cost increase for input, 5x for output

A system that routes utility agents to Haiku instead of Sonnet pays one-twelfth the price on every call those agents make. Over a full day of agent activity, that difference is material.
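Those multipliers fall straight out of the per-MTok prices. A sketch that recomputes them; the dict-of-dicts shape for MODEL_PRICING_USD_PER_MTOK is an assumption here (the map is referenced later in this lesson, but its exact shape is not shown), and price_ratio is an illustrative helper:

```python
# Assumed shape of the pricing map — keyed by model id, USD per million tokens
MODEL_PRICING_USD_PER_MTOK = {
    "claude-haiku-4-5-20251001": {"input": 0.25, "output": 1.25},
    "claude-sonnet-4-6":         {"input": 3.00, "output": 15.00},
    "claude-opus-4-6":           {"input": 15.00, "output": 75.00},
}

def price_ratio(cheap: str, expensive: str, kind: str) -> float:
    """How many times more expensive one model is than another, per kind of token."""
    p = MODEL_PRICING_USD_PER_MTOK
    return p[expensive][kind] / p[cheap][kind]

print(price_ratio("claude-haiku-4-5-20251001", "claude-sonnet-4-6", "input"))   # 12.0
print(price_ratio("claude-haiku-4-5-20251001", "claude-sonnet-4-6", "output"))  # 12.0
print(price_ratio("claude-sonnet-4-6", "claude-opus-4-6", "input"))             # 5.0
```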

Agent Type Ceilings

Model tier enforcement is applied at the agent type level, not per-agent. This is intentional. Individual agents should not negotiate their own model access — that would make every agent a potential budget escalation vector.

MODEL_TIER_CEILING = {
    "revenue-product": "claude-sonnet-4-6",
    "shared-service": "claude-sonnet-4-6",
    "content-product": "claude-sonnet-4-6",
    "tooling": "claude-haiku-4-5-20251001",
}

The mapping reflects the actual quality requirements of each agent type:

Revenue-product agents (Foresight, the sports prediction agent, the political events prediction agent) make trading decisions. The quality of reasoning directly affects P&L. Sonnet is the minimum acceptable quality level. These agents are capped at Sonnet, not Opus — adding Opus overhead to every trading decision would be expensive, and the quality delta for signal generation tasks does not justify the premium.

Shared-service agents (InDecision Engine, the semantic memory layer) power other agents. They need reliable reasoning for classification, routing, and synthesis tasks. Sonnet is the appropriate ceiling.

Content-product agents (the AI content production pipeline, Rewired Minds pipeline) generate written output at scale. Sonnet produces high-quality content for blog and video scripts. Opus is excessive for this use case and would make the content pipeline 5x more expensive per run.

Tooling agents (OpenClaw skills, MCPs) perform well-defined utility tasks: format conversion, simple classification, template filling. Haiku handles these cleanly. Routing tooling agents through Sonnet is a 12x cost premium for equivalent output quality.

Notice what is missing from the ceiling map: Opus. No agent type has a ceiling at Opus. This is a deliberate product decision — if a task genuinely requires Opus-level reasoning, it warrants a manual override and an explicit justification, not an automated pathway.

The Enforcement Function

The enforce_tier() static method is the implementation. It takes an agent type and a requested model, and returns the allowed model:

@staticmethod
def enforce_tier(agent_type: str, requested_model: str) -> str:
    """
    Enforce model tier ceiling per agent type.
    Returns the allowed model (may be downgraded).
    Unknown agent types default to the lowest paid tier (Haiku).
    """
    # Default ceiling for unknown agent types — fail closed
    default_ceiling = "claude-haiku-4-5-20251001"
    ceiling = MODEL_TIER_CEILING.get(agent_type, default_ceiling)

    if requested_model not in MODEL_TIER_ORDER:
        logger.warning(
            f"Unknown model '{requested_model}' — allowing "
            f"(not in tier order)"
        )
        return requested_model

    requested_idx = MODEL_TIER_ORDER.index(requested_model)
    ceiling_idx = MODEL_TIER_ORDER.index(ceiling)

    if requested_idx > ceiling_idx:
        logger.warning(
            f"Model tier downgrade: {requested_model} → {ceiling} "
            f"for agent type {agent_type}"
        )
        return ceiling
    return requested_model

Two design decisions worth examining:

Fail closed for unknown agent types. If an agent presents a type that is not in MODEL_TIER_CEILING, the default ceiling is Haiku — the cheapest paid model. New agent types do not automatically inherit broad model access. They start at the floor and get an explicit ceiling added to the map.

Fail open for unknown models. If the requested model is not in MODEL_TIER_ORDER, the function logs a warning and allows the call through. This is the inverse of the agent type behavior, for a different reason: a new model that has not been added to the tier system yet should not silently break agents that legitimately use it. The warning surfaces the gap so the operator can add it to the tier order.

Walking Through the Logic

Suppose a tooling agent (OpenClaw skills) requests claude-opus-4-6:

agent_type = "tooling"
requested_model = "claude-opus-4-6"

ceiling = MODEL_TIER_CEILING.get("tooling") = "claude-haiku-4-5-20251001"

requested_idx = MODEL_TIER_ORDER.index("claude-opus-4-6") = 3
ceiling_idx = MODEL_TIER_ORDER.index("claude-haiku-4-5-20251001") = 1

3 > 1 → True → downgrade
return "claude-haiku-4-5-20251001"

The agent asked for Opus. It gets Haiku. The log records the downgrade. No exception is raised, no call is blocked — the agent proceeds, just with a model that matches its allocated tier.

Now suppose a revenue-product agent (Foresight) requests claude-haiku-4-5-20251001:

agent_type = "revenue-product"
requested_model = "claude-haiku-4-5-20251001"

ceiling = MODEL_TIER_CEILING.get("revenue-product") = "claude-sonnet-4-6"

requested_idx = MODEL_TIER_ORDER.index("claude-haiku-4-5-20251001") = 1
ceiling_idx = MODEL_TIER_ORDER.index("claude-sonnet-4-6") = 2

1 > 2 → False → no downgrade
return "claude-haiku-4-5-20251001"

Foresight is allowed to use Haiku voluntarily. The ceiling is a maximum, not a minimum. An agent can always use a cheaper tier than its ceiling — it just cannot exceed it.
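Both walkthroughs, and the two fallback behaviors, reduce to four assertions. A self-contained sketch that reproduces enforce_tier from above with the logging elided:

```python
MODEL_TIER_ORDER = [
    "gemini-2.0-flash",          # Tier 0
    "claude-haiku-4-5-20251001", # Tier 1
    "claude-sonnet-4-6",         # Tier 2
    "claude-opus-4-6",           # Tier 3
]
MODEL_TIER_CEILING = {
    "revenue-product": "claude-sonnet-4-6",
    "shared-service": "claude-sonnet-4-6",
    "content-product": "claude-sonnet-4-6",
    "tooling": "claude-haiku-4-5-20251001",
}

def enforce_tier(agent_type: str, requested_model: str) -> str:
    # Fail closed: unknown agent types get the Haiku ceiling
    ceiling = MODEL_TIER_CEILING.get(agent_type, "claude-haiku-4-5-20251001")
    # Fail open: unknown models pass through untouched
    if requested_model not in MODEL_TIER_ORDER:
        return requested_model
    if MODEL_TIER_ORDER.index(requested_model) > MODEL_TIER_ORDER.index(ceiling):
        return ceiling  # downgrade to the ceiling
    return requested_model

# Walkthrough 1: a tooling agent asks for Opus, gets Haiku
assert enforce_tier("tooling", "claude-opus-4-6") == "claude-haiku-4-5-20251001"
# Walkthrough 2: a revenue-product agent may use Haiku voluntarily
assert enforce_tier("revenue-product", "claude-haiku-4-5-20251001") == "claude-haiku-4-5-20251001"
# Fail closed: an unknown agent type is capped at Haiku
assert enforce_tier("brand-new-type", "claude-opus-4-6") == "claude-haiku-4-5-20251001"
# Fail open: an unknown model is allowed through (with a warning in the real code)
assert enforce_tier("tooling", "some-future-model") == "some-future-model"
```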

Where This Fits in the Call Flow

Model tier enforcement runs before the cost recording step. The sequence for every LLM call in principal-broker is:

  1. Agent requests a model
  2. enforce_tier(agent_type, requested_model) — returns the allowed model
  3. budget_enforcer.check_budget(agent_id) — checks if the agent has budget remaining
  4. LLM call executes with the allowed model
  5. cost_tracker.record(agent_id, session_id, model, input_tokens, output_tokens) — records the actual cost

Steps 2 and 3 are both required. Tier enforcement alone does not prevent overspend — it prevents model overshoot. Budget enforcement alone does not prevent model abuse — it only stops agents after they have already used expensive models to exhaust their budget. Together they form the complete control surface.
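The five-step sequence can be sketched as a single wrapper. Everything below other than enforce_tier is a stand-in: the stub classes, stub_llm_call, and the call_llm signature are assumptions for illustration, not principal-broker's real interfaces:

```python
from dataclasses import dataclass, field

MODEL_TIER_ORDER = ["gemini-2.0-flash", "claude-haiku-4-5-20251001",
                    "claude-sonnet-4-6", "claude-opus-4-6"]
MODEL_TIER_CEILING = {"tooling": "claude-haiku-4-5-20251001"}

def enforce_tier(agent_type: str, requested_model: str) -> str:
    ceiling = MODEL_TIER_CEILING.get(agent_type, "claude-haiku-4-5-20251001")
    if requested_model not in MODEL_TIER_ORDER:
        return requested_model
    if MODEL_TIER_ORDER.index(requested_model) > MODEL_TIER_ORDER.index(ceiling):
        return ceiling
    return requested_model

# --- stand-ins for the real budget enforcer, cost tracker, and LLM client ---
@dataclass
class StubBudgetEnforcer:
    remaining_usd: float = 5.0
    def check_budget(self, agent_id: str) -> bool:
        return self.remaining_usd > 0

@dataclass
class StubCostTracker:
    records: list = field(default_factory=list)
    def record(self, agent_id, session_id, model, input_tokens, output_tokens):
        self.records.append((agent_id, session_id, model, input_tokens, output_tokens))

def stub_llm_call(model, messages):
    return "ok", 6000, 1500  # (text, input tokens, output tokens)

budget_enforcer = StubBudgetEnforcer()
cost_tracker = StubCostTracker()

def call_llm(agent_id, agent_type, session_id, requested_model, messages):
    model = enforce_tier(agent_type, requested_model)                  # step 2
    if not budget_enforcer.check_budget(agent_id):                     # step 3
        raise RuntimeError(f"{agent_id}: daily budget exhausted")
    text, in_tok, out_tok = stub_llm_call(model, messages)             # step 4
    cost_tracker.record(agent_id, session_id, model, in_tok, out_tok)  # step 5
    return text

call_llm("openclaw-1", "tooling", "s-001", "claude-opus-4-6", [])
print(cost_tracker.records[0][2])  # claude-haiku-4-5-20251001 — the downgraded model
```

Note that the recorded cost is attributed to the allowed model, not the requested one: the agent asked for Opus, the tracker saw Haiku.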

The Linter Rule

Model tier enforcement in code is only effective if agents cannot bypass it. The principal-broker codebase has a linter rule that catches direct anthropic.messages.create() calls. Any agent code that calls the Anthropic API directly, without going through CostTracker.create(), fails the lint check.

This is a common pattern in infrastructure engineering: pair enforcement logic with a static analysis check that prevents developers from accidentally bypassing the enforcement. The linter catches bypasses in code review before they reach production.
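The real lint rule's implementation is not shown here, but a minimal version of such a check can be written with Python's ast module; find_direct_api_calls is an illustrative sketch, not the actual rule:

```python
import ast

def find_direct_api_calls(source: str) -> list[int]:
    """Return line numbers of direct *.messages.create() calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Unwind the attribute chain: create <- messages <- client/anthropic/...
            parts = []
            f = node.func
            while isinstance(f, ast.Attribute):
                parts.append(f.attr)
                f = f.value
            if isinstance(f, ast.Name):
                parts.append(f.id)
            if parts[:2] == ["create", "messages"]:
                hits.append(node.lineno)
    return hits

bad = "resp = client.messages.create(model='claude-opus-4-6', messages=[])"
good = "resp = CostTracker.create(agent_id, model='claude-sonnet-4-6', messages=[])"
print(find_direct_api_calls(bad))   # [1]
print(find_direct_api_calls(good))  # []
```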

Extending the Tier System

The tier system is designed to be extended, not replaced. Adding a new model requires three updates:

  1. Add the model to MODEL_PRICING_USD_PER_MTOK with its input and output pricing
  2. Insert it at the correct position in MODEL_TIER_ORDER
  3. Update any MODEL_TIER_CEILING entries for agent types that should have access to it

Adding a new agent type requires at most one update: add it to MODEL_TIER_CEILING with its appropriate ceiling. Thanks to the default-to-Haiku behavior, new agent types work immediately with no configuration at all; they run at the floor tier until you decide to add an explicit ceiling and elevate them.
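The three updates can be sketched as plain data edits. The new model name example-mid-4 and its prices are hypothetical, and the dict-of-dicts shape of MODEL_PRICING_USD_PER_MTOK is assumed:

```python
MODEL_PRICING_USD_PER_MTOK = {
    "claude-haiku-4-5-20251001": {"input": 0.25, "output": 1.25},
    "claude-sonnet-4-6":         {"input": 3.00, "output": 15.00},
    "claude-opus-4-6":           {"input": 15.00, "output": 75.00},
}
MODEL_TIER_ORDER = ["gemini-2.0-flash", "claude-haiku-4-5-20251001",
                    "claude-sonnet-4-6", "claude-opus-4-6"]
MODEL_TIER_CEILING = {"tooling": "claude-haiku-4-5-20251001",
                      "content-product": "claude-sonnet-4-6"}

# 1. Pricing for the new (hypothetical) model
MODEL_PRICING_USD_PER_MTOK["example-mid-4"] = {"input": 1.00, "output": 5.00}
# 2. Insert at the position its price implies — between Haiku and Sonnet
MODEL_TIER_ORDER.insert(MODEL_TIER_ORDER.index("claude-sonnet-4-6"), "example-mid-4")
# 3. Raise any ceilings that should now point at the new model
MODEL_TIER_CEILING["tooling"] = "example-mid-4"

print(MODEL_TIER_ORDER.index("example-mid-4"))  # 2
```

Because a tier is a relative index, inserting a model mid-list is safe: every existing comparison still respects the cheap-to-expensive ordering, even though the absolute indices of Sonnet and Opus each shift up by one.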

Cost Impact of Getting This Right

To make the stakes concrete: a fleet of 20 agents running on Sonnet across a day, with an average session of 30 turns at 6,000 input / 1,500 output tokens per turn:

Per turn: (6000/1M × $3.00) + (1500/1M × $15.00) = $0.018 + $0.0225 = $0.0405
Per 30-turn session: $1.215
20 agents × 3 sessions/day × $1.215 = $72.90/day

If 8 of those 20 agents are tooling-type agents that could run on Haiku:

Per turn (Haiku): (6000/1M × $0.25) + (1500/1M × $1.25) = $0.0015 + $0.001875 = $0.003375
Per 30-turn session: $0.101
8 agents × 3 sessions/day × $0.101 = $2.43/day vs $29.16/day on Sonnet

Routing those 8 tooling agents to their correct tier saves $26.73/day. Over a 30-day month, that is roughly $802.
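The scenario can be recomputed in a few lines; session_cost is an illustrative helper built from the per-MTok prices in the tier list:

```python
def session_cost(input_per_mtok: float, output_per_mtok: float,
                 turns: int = 30, in_tok: int = 6000, out_tok: int = 1500) -> float:
    """USD cost of one session at a flat per-turn token profile."""
    per_turn = in_tok / 1e6 * input_per_mtok + out_tok / 1e6 * output_per_mtok
    return per_turn * turns

sonnet = session_cost(3.00, 15.00)  # ~$1.215 per 30-turn session
haiku = session_cost(0.25, 1.25)    # ~$0.101 per 30-turn session

fleet_on_sonnet = 20 * 3 * sonnet   # all 20 agents, 3 sessions/day, on Sonnet
tooling_on_sonnet = 8 * 3 * sonnet  # the 8 tooling agents, on Sonnet
tooling_on_haiku = 8 * 3 * haiku    # the same 8 agents, routed to Haiku
daily_saving = tooling_on_sonnet - tooling_on_haiku

print(round(fleet_on_sonnet, 2))    # 72.9
print(round(daily_saving, 2))       # 26.73
print(round(daily_saving * 30, 2))  # 801.9
```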

This is not a hypothetical optimization. It is the cost difference between a FinOps system and running everything on the most capable model available.

Next: Per-Agent Daily Budgets

Tier enforcement caps the model. Lesson 206 adds the budget layer: per-agent daily limits with an 80% warning, a 100% stop, and the global $25 ceiling that acts as the final backstop regardless of agent tier.