Every FinOps control in the previous four lessons is deterministic and automatic. Model tier routing fires on every LLM call. Budget enforcement checks on every call. Loop detection checks on every turn. These controls run without human intervention.

Budget overrides are different. They are explicitly authorized changes to the budget system, performed by an operator who has decided a default allocation is insufficient for a specific situation. Because they are human-authorized exceptions to automated controls, they need an audit trail.

This lesson covers the override mechanism: the endpoint design, the authentication approach, and the audit pattern that makes overrides accountable.

The Admin Token Pattern

The override endpoint requires admin authorization:

def _require_admin(request: Request) -> None:
    """Raise 403 unless the caller presents the BROKER_ADMIN_TOKEN."""
    auth_header = request.headers.get("Authorization", "")
    token = auth_header[7:] if auth_header.startswith("Bearer ") else ""
    admin_token = os.environ.get("BROKER_ADMIN_TOKEN", "")
    if not admin_token or token != admin_token:
        raise HTTPException(status_code=403, detail="Admin access required")

The implementation checks for a Bearer <token> header and compares it against the BROKER_ADMIN_TOKEN environment variable.

Two security details worth noting:

The not admin_token check. If BROKER_ADMIN_TOKEN is not set in the environment, every request returns 403. This is fail-closed: an operator who forgets to set the token cannot accidentally leave the endpoint unprotected. The endpoint is locked until explicitly unlocked by setting the token.

Header extraction pattern. auth_header[7:] strips the "Bearer " prefix (7 characters). If the header does not start with "Bearer ", token becomes an empty string, which will not match any valid token. This handles malformed headers without an additional condition.
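Both details can be exercised outside the web framework. The following is a minimal sketch, not the broker's code: extract_bearer_token and is_admin are hypothetical helper names, and the env parameter exists only so the check can be tested without touching the real environment. The sketch also swaps the != comparison for the standard library's hmac.compare_digest, one possible hardening against timing analysis that the original excerpt does not use.

```python
import hmac
import os


def extract_bearer_token(auth_header: str) -> str:
    """Return the token after 'Bearer ', or '' for any other header shape."""
    prefix = "Bearer "
    return auth_header[len(prefix):] if auth_header.startswith(prefix) else ""


def is_admin(auth_header: str, env=os.environ) -> bool:
    """Fail closed: an unset or empty BROKER_ADMIN_TOKEN rejects every caller."""
    admin_token = env.get("BROKER_ADMIN_TOKEN", "")
    if not admin_token:
        return False
    token = extract_bearer_token(auth_header)
    # compare_digest avoids content-based short-circuiting during comparison.
    return hmac.compare_digest(token.encode(), admin_token.encode())
```

With the token unset, is_admin returns False for any header, which mirrors the 403-on-everything behavior described above.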

The Override Input Schema

class BudgetOverrideInput(BaseModel):
    agent_id: str = Field(..., description="Agent ID to override")
    new_budget_usd: float = Field(..., description="New daily budget in USD (must be > 0)")
    reason: str = Field(..., description="Reason for the override")

Three fields, all required. The ... in Field(...) is Pydantic's syntax for a required field with no default. A request that omits any of these three fields is rejected with a 422 before it reaches the endpoint logic.

The reason field deserves particular attention. It is a string with no minimum length in the schema — Pydantic will accept reason: "q". The enforcement is social and operational, not technical: if you send a single-character reason to a production override endpoint, the audit log will record your single-character reason, and when someone reviews that log during an incident post-mortem, they will ask why the budget was changed with no explanation.

In practice, the Knox approval flow enforces reason quality: the operator sends a Discord command with the override request, OpenClaw parses it and forwards to the broker, and the Discord message becomes the reason string — naturally descriptive because it was written for a human audience.
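If a technical floor on reason quality were ever wanted, it could look something like the sketch below. This is an illustration under stated assumptions, not part of the broker: reason_problems is a hypothetical helper, and the 10-character minimum and the single-word rule are arbitrary choices. Within the schema itself, a length floor could likely be expressed with Pydantic's min_length constraint on the field.

```python
def reason_problems(reason: str, min_chars: int = 10) -> list[str]:
    """Return a list of problems with an override reason (empty list = acceptable)."""
    problems = []
    stripped = reason.strip()
    if len(stripped) < min_chars:
        problems.append(f"reason must be at least {min_chars} characters")
    # A non-empty reason with no spaces is a single word, e.g. "q" or "testing".
    if stripped and " " not in stripped:
        problems.append("reason should be more than a single word")
    return problems
```

A check like this only filters out the lowest-effort inputs. It cannot tell a good explanation from a long bad one, which is why the lesson treats reason quality as an operational concern.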

The Endpoint Logic

@router.post("/budgets/override")
async def override_budget(inp: BudgetOverrideInput, request: Request) -> dict[str, Any]:
    """
    Override an agent's daily budget.
    Admin-only. The override takes precedence over the default until cleared.
    """
    _require_admin(request)

    if inp.new_budget_usd <= 0:
        raise HTTPException(
            status_code=422,
            detail="new_budget_usd must be greater than 0",
        )

    registry = request.app.state.registry
    if not registry.get(inp.agent_id):
        raise HTTPException(
            status_code=404,
            detail=f"Agent '{inp.agent_id}' not found in registry",
        )

    be = request.app.state.budget_enforcer
    old_budget = be.get_effective_budget(inp.agent_id)
    be.set_override(inp.agent_id, inp.new_budget_usd)

    logger.info(
        f"Budget override: agent={inp.agent_id} "
        f"old=${old_budget:.2f} new=${inp.new_budget_usd:.2f} "
        f"reason={inp.reason!r}"
    )

    return {
        "agent_id": inp.agent_id,
        "old_budget_usd": old_budget,
        "new_budget_usd": inp.new_budget_usd,
        "reason": inp.reason,
        "status": "applied",
    }

Before applying the override, the logic runs four steps in order (three checks that can reject the request, then a read that feeds the audit log):

Admin auth — 403 if token is wrong or missing
Budget validation — 422 if new_budget_usd <= 0 (you cannot set a zero or negative budget)
Registry check — 404 if the agent is not registered (prevents overrides for phantom agents)
Old budget capture — reads the current effective budget before applying the change (for the audit log)

The endpoint then applies the override and writes the log entry. The logger.info() line is the audit record: agent ID, old budget, new budget, and reason, all in a single log line that grep can find in the broker log.

The response echoes the input and adds old_budget_usd and status, so the caller can confirm the change took effect and see which budget it replaced.
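The check order determines which error a caller sees when several things are wrong at once. The sketch below models only that ordering as a pure function. override_status is a hypothetical name, and the 200 for the success case is an assumption, since the excerpt does not set an explicit status code.

```python
def override_status(token_ok: bool, new_budget_usd: float, agent_known: bool) -> int:
    """Return the HTTP status the endpoint's check order would produce."""
    if not token_ok:
        return 403  # admin auth is checked first
    if new_budget_usd <= 0:
        return 422  # then budget validation
    if not agent_known:
        return 404  # then the registry lookup
    return 200  # assumed success status; the override would be applied here
```

For example, a caller with a bad token and a negative budget gets 403 rather than 422, so an unauthenticated caller learns nothing about budget rules or which agents exist.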

The Knox Approval Flow

The admin token pattern secures the endpoint from external callers. Within this system, the flow for requesting a budget override is:

Knox sends a Discord command to OpenClaw: "Give expert-panel $5 today — quarterly review running"
OpenClaw parses the intent, identifies it as a budget override request, formats it as a broker API call
The call goes to POST /v1/finops/budgets/override with BROKER_ADMIN_TOKEN from the environment
The override is applied, logged, and the response is sent back to Knox via Discord

The reason field in this flow is the original Discord message text — human-authored, naturally descriptive, automatically captured. The audit log entry becomes: reason="Give expert-panel $5 today — quarterly review running".
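The broker call in this flow can be sketched with the standard library alone. This is a minimal illustration, not OpenClaw's code: build_override_request is a hypothetical helper, and the base URL passed to it is a placeholder. Only the path, the Bearer header, and the three body fields come from this lesson.

```python
import json
import urllib.request


def build_override_request(base_url: str, admin_token: str, agent_id: str,
                           new_budget_usd: float, reason: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a budget override."""
    body = json.dumps({
        "agent_id": agent_id,
        "new_budget_usd": new_budget_usd,
        "reason": reason,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/finops/budgets/override",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. In the flow above, the admin_token argument would come from the BROKER_ADMIN_TOKEN environment variable and the reason argument from the Discord message text.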

This flow is meaningful because it creates accountability without friction. Knox does not need to log into a dashboard, navigate to a budget management screen, fill out a form, and click save. He sends a Discord message. The system does the rest. But the audit trail is complete: who requested it, when, why, what changed.

The Audit Trail

The logger.info() call in the endpoint is the primary audit record. In production, the broker's log goes to /tmp/agent-broker.log, which is monitored by the Watchdog Service for staleness and is readable by any operator with SSH access.

The log line format:

2026-03-28 14:22:08,391 INFO Budget override: agent=expert-panel old=$2.00 new=$5.00 reason='quarterly review running, expected 3x normal token usage'

This line has everything needed for a post-mortem:

Timestamp (when)
Agent (who was affected)
Old and new budget (what changed)
Reason (why)
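Because the record is a single line with a fixed shape, it can be pulled back into structured fields during a post-mortem. The sketch below assumes the exact format shown above, including the timestamp layout and agent IDs without spaces. AUDIT_LINE and parse_override_line are hypothetical names.

```python
import re

# Pattern for the override audit line; assumes the log format shown in this lesson.
AUDIT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"Budget override: agent=(?P<agent>\S+) "
    r"old=\$(?P<old>\d+\.\d{2}) new=\$(?P<new>\d+\.\d{2}) "
    r"reason=(?P<reason>.+)$"
)


def parse_override_line(line: str):
    """Return the audit fields as a dict, or None if the line is not an override record."""
    match = AUDIT_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["old"] = float(fields["old"])
    fields["new"] = float(fields["new"])
    return fields  # "reason" keeps the quotes that !r added when logging
```

Running this over every line of the broker log and keeping the non-None results gives a chronological list of overrides.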

The implicit "who" — the operator who issued the command — is in the Discord channel history that generated the request. For direct API calls, the who is whoever held the BROKER_ADMIN_TOKEN, which is a single shared admin credential in the current implementation.

Override Lifecycle

Overrides are held in memory. The current agent-broker implementation has no persistence mechanism for override state, so overrides do not survive a service restart. This is intentional.

The rationale: overrides are operational exceptions, not permanent configuration. If a service restarts during an override, the agent returns to its default budget. For most override scenarios, this is acceptable — the override is granted for a specific session or day, and the need for the override is visible in the Discord history.

If an override needs to persist across restarts, the operator re-applies it after restart. The Discord history provides the audit trail for why it was applied again.

The clear_override() method removes an active override:

def clear_override(self, agent_id: str) -> None:
    """Remove budget override for a specific agent, reverting to default."""
    self._overrides.pop(agent_id, None)

Clearing an override that does not exist is a no-op — dict.pop() with a default silently handles the missing key. There is no "clear override" endpoint in the current API, so clearing happens programmatically or on service restart.
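The lifecycle can be illustrated with a small in-memory model. The method names set_override, clear_override, get_effective_budget, and get_all_budgets appear in this lesson, but their bodies here are assumptions. BudgetEnforcerSketch and its _defaults dict are hypothetical stand-ins, not the broker's class.

```python
class BudgetEnforcerSketch:
    """Hypothetical stand-in for the broker's budget enforcer (override logic only)."""

    def __init__(self, defaults: dict[str, float]) -> None:
        self._defaults = dict(defaults)         # default daily budgets in USD
        self._overrides: dict[str, float] = {}  # in memory only; lost on restart

    def set_override(self, agent_id: str, new_budget_usd: float) -> None:
        self._overrides[agent_id] = new_budget_usd

    def clear_override(self, agent_id: str) -> None:
        self._overrides.pop(agent_id, None)     # no-op if no override is active

    def get_effective_budget(self, agent_id: str) -> float:
        # An active override takes precedence over the default.
        if agent_id in self._overrides:
            return self._overrides[agent_id]
        return self._defaults[agent_id]

    def get_all_budgets(self) -> dict[str, float]:
        # Defaults with any active overrides layered on top.
        return {**self._defaults, **self._overrides}
```

Constructing a fresh instance plays the role of a service restart: the new object starts with an empty _overrides dict, so every agent is back on its default.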

The Budgets Endpoint

The GET /v1/finops/budgets endpoint is the read companion to the override write:

@router.get("/budgets")
async def get_budgets(request: Request) -> dict[str, Any]:
    """All agent budget allocations (defaults + any active overrides)."""
    be = request.app.state.budget_enforcer
    return {
        "budgets": be.get_all_budgets(),
        "overrides": be._overrides,
    }

Two fields:

budgets — the full allocation map with overrides applied (what each agent's effective budget is right now)
overrides — just the active overrides (which agents have non-default budgets)

Returning both lets the caller distinguish default allocations from active overrides without diffing. If expert-panel appears in overrides, it is running with a temporary exception. If it only appears in budgets, it is at its default.

Note the direct attribute access: be._overrides. This is a mild encapsulation violation — the internal dict is returned directly. For a single-operator system, this is acceptable. A more rigorous implementation would expose it through a get_active_overrides() method.
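One way such an accessor might look is sketched below. OverrideStore is a hypothetical class used only to show the idea, and returning a shallow copy is one design choice among several.

```python
class OverrideStore:
    """Hypothetical holder for override state, used only to illustrate the accessor."""

    def __init__(self) -> None:
        self._overrides: dict[str, float] = {}

    def set_override(self, agent_id: str, new_budget_usd: float) -> None:
        self._overrides[agent_id] = new_budget_usd

    def get_active_overrides(self) -> dict[str, float]:
        # Return a shallow copy so callers cannot mutate internal state.
        return dict(self._overrides)
```

The endpoint would then return the result of get_active_overrides() instead of reaching into _overrides, and code that modifies the returned dict would leave the enforcer's state unchanged.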

Putting the FinOps Layer Together

With all five components covered, the complete FinOps flow for a single LLM call in the agent broker:

agent requests LLM call
  │
  ├─ enforce_tier(agent_type, requested_model)
  │    └─ returns allowed model (may be downgraded)
  │
  ├─ check_budget(agent_id, is_critical)
  │    ├─ check global ceiling
  │    ├─ check agent budget (skip if critical)
  │    └─ returns BudgetCheckResult(allowed, warning, ...)
  │
  ├─ [if not allowed: block call, fire budget.exhausted event]
  │
  ├─ LLM call executes with allowed model
  │
  ├─ cost_tracker.record(agent_id, session_id, model, input, output)
  │    ├─ creates CostRecord
  │    ├─ updates _daily_spend
  │    └─ fires emit_callback → cost.attributed event
  │
  └─ check_turn(session_id, turn_content)  [loop detector]
       ├─ compute max Jaccard similarity against window
       ├─ update consecutive_similar counter
       └─ returns LoopCheckResult(looping, should_terminate, ...)

Each step is independent. Each step has its own module, its own tests, its own failure mode. The composition is the enforcement layer, and it runs before and after every LLM call.

The REST API surfaces this state for operator inspection and intervention:

GET  /v1/finops/spend            — live spend by agent and model
GET  /v1/finops/budgets          — current allocations with active overrides
POST /v1/finops/budgets/override — apply a temporary budget increase
GET  /v1/finops/report           — full CFO report (admin only)

Core FinOps Summary

The lessons so far have covered the foundational agent-broker cost control surface:

The “The $200 Weekend Problem” lesson: why FinOps is infrastructure, not accounting, and the failure modes of uncapped agent spend.
The “Model Tier Routing” lesson: the tier order, ceiling map, and enforce_tier() downgrade logic.
The “Per-Agent Daily Budgets” lesson: the 80% warning, 100% stop, global ceiling, and critical session override.
The “Loop Detection” lesson: Jaccard similarity, 3-turn warning, 5-turn termination.
The previous lesson: cost.attributed events, the CostRecord model, the emit callback pattern, and the CFO report structure.
The “Budget Override With Audit Trail” lesson (this one): admin token enforcement, mandatory reason field, Knox approval flow.

The system described across these lessons is not theoretical — it is the production implementation running in the agent broker. Every LLM call the agent fleet makes passes through these controls. The $25.00 global ceiling is real. The per-agent budgets are real. The loop detector has caught real loops.

Build this layer before you need it. You will know you needed it when you do not have it.

What's Next

The track continues with three applied lessons that extend these foundations into real production patterns:

The next lesson: model tier routing in cost-efficient agent architecture, covering prompt caching as the biggest single cost lever, the three-tier routing matrix, and the silent quota-error failure mode that drops jobs without a trace.
Directive SLA enforcement: governing your agent fleet with time-bound response contracts and escalation paths.
The build-without-scheduling pitfall: why agents that ship without a run schedule silently disappear from production, and how to prevent it.

Budget Override With Audit Trail — Knox Approval Flow