ASK KNOX
LESSON 197

The Authority Ceiling

Every agent in a real system needs a hard ceiling on what it can do autonomously. Financial limits, risk tiers, approval requirements — and what happens when an agent tries to exceed them.

Agent Authority & Safety Systems

When you deploy an AI agent into a live system, it will eventually try to do something it shouldn't. Not because it's malicious — because its judgment about what's appropriate will diverge from yours at some edge case you didn't anticipate.

The question is not whether this will happen. The question is what your system does when it does.

The Principal Broker answers this with a concept called the authority ceiling: a hard set of constraints attached to every agent that defines exactly what it can do without asking first.

Why Ceilings, Not Trust

The naive approach to agent safety is trust calibration. You evaluate an agent's track record, assign it a trust score, and gate its actions on that score. Agents that perform well earn more latitude.

This breaks in practice for three reasons.

First, trust is backward-looking. It tells you about past behavior in past conditions. An agent that has placed 10,000 trades correctly under normal market conditions has not demonstrated what it will do when a circuit breaker fires at 2am and liquidity evaporates. Trust scores accumulated in normal conditions do not transfer cleanly to novel conditions.

Second, trust is continuous and implicit. A ceiling is discrete and explicit. When something goes wrong, you need to know exactly what the system was and wasn't allowed to do. "The agent had a trust score of 0.87" is not an answer. "The agent was authorized to execute trades up to $500 without approval" is.

Third, trust scores require tuning. They have hyperparameters. They degrade. They require monitoring. A ceiling is a config value that a human set and owns. It does not drift.

The Principal Broker uses ceilings.

The Three Dimensions of Authority

Every agent in the system has an authority block in its Agent Card with three independently enforced constraints:

Financial ceiling (maxAutonomousDollars): The maximum dollar value of an action the agent can take without human approval. A trading bot with a $500 ceiling can place a $400 trade autonomously. A $600 trade becomes an escalation. An agent with maxAutonomousDollars: 0 cannot make any autonomous financial decisions.

Risk tier (maxRiskTier): A ranked enumeration — low, medium, high, critical. An agent ceilinged at medium risk can act on low and medium operations. A high-risk action becomes an escalation. The full ordering is enforced at comparison time.

Approval list (requiresApprovalFor): Named action types that always require explicit approval, regardless of dollar amount or risk tier. This handles cases that don't fit cleanly into the first two dimensions — actions that might be cheap and low-risk in isolation but that you've decided always need a human in the loop.
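The three constraints can be sketched as plain data. A minimal sketch, assuming a dataclass shape: the field names come from this lesson, but the dataclass itself, the RISK_TIER_ORDER list, and the "margin.open" action name are illustrative assumptions, not the broker's actual schema.

```python
from dataclasses import dataclass, field

# Assumed tier ordering, lowest to highest; comparisons use list position.
RISK_TIER_ORDER = ["low", "medium", "high", "critical"]

@dataclass
class Authority:
    # Field names match the Agent Card authority block described above;
    # the dataclass shape is an assumption.
    maxAutonomousDollars: float = 0.0
    maxRiskTier: str = "low"
    requiresApprovalFor: list = field(default_factory=list)

# A trading bot: $500 financial ceiling, medium risk tier, and the
# (hypothetical) "margin.open" action always escalates regardless of size.
trading_authority = Authority(
    maxAutonomousDollars=500.0,
    maxRiskTier="medium",
    requiresApprovalFor=["margin.open"],
)
```

Ranking the tiers as a list makes the comparison a simple index check: an action's tier index must not exceed the ceiling's index.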

The check() Method

Here is the actual check() implementation from broker/core/authority.py:

def check(self, message: A2AMessage) -> AuthorityResult:
    card = self.registry.get(message.envelope.from_agent)
    if not card:
        return AuthorityResult(
            authorized=False,
            reason="Unregistered agent — no authority granted",
        )

    authority = card.authority

    # Check financial ceiling
    implied_dollars = self._extract_financial_impact(message)
    if implied_dollars > 0 and implied_dollars > authority.maxAutonomousDollars:
        return AuthorityResult(
            authorized=False,
            reason=(
                f"Financial authority exceeded: "
                f"action implies ${implied_dollars:.2f}, "
                f"ceiling is ${authority.maxAutonomousDollars:.2f}"
            ),
        )

    # Check risk tier
    implied_risk = self._extract_risk_tier(message)
    max_risk = authority.maxRiskTier
    if (
        implied_risk in RISK_TIER_ORDER
        and max_risk in RISK_TIER_ORDER
        and RISK_TIER_ORDER.index(implied_risk)
        > RISK_TIER_ORDER.index(max_risk)
    ):
        return AuthorityResult(
            authorized=False,
            reason=(
                f"Risk tier exceeded: action is {implied_risk}, "
                f"ceiling is {max_risk}"
            ),
        )

    # Check specific approval requirements
    for requires in authority.requiresApprovalFor:
        if requires in message.subtype:
            return AuthorityResult(
                authorized=False,
                reason=f"Requires explicit approval: {requires}",
            )
        if self._payload_contains_action(message.payload, requires):
            return AuthorityResult(
                authorized=False,
                reason=f"Requires explicit approval: {requires}",
            )

    return AuthorityResult(authorized=True)

Notice what this does not do. It does not raise an exception. It does not reject the message. It returns an AuthorityResult with authorized=False and a reason. The caller — the router — decides what to do with that result.

The design is deliberate. check() is a pure function. It has no side effects. It can be called anywhere in the pipeline without worrying about what it will do to system state. The consequence of an authority breach happens in the routing layer, not the enforcement layer.

Financial Impact Extraction

The broker infers financial impact from a message's payload by scanning for known field names:

def _extract_financial_impact(self, message: A2AMessage) -> float:
    payload = message.payload
    for field_name in ("size", "amount", "value", "cost", "budget"):
        val = payload.get(field_name)
        if isinstance(val, (int, float)):
            return float(val)
    return 0.0

This check fails open by design. If none of the known field names are present, financial impact defaults to zero and the check passes. The alternative — treating unknown payload shapes as high-risk — would generate too many false positives on legitimate agent communication.

The implication: if you're building an agent that handles money, your payload schema should use one of these field names. Use amount for monetary transactions, size for trade size, budget for planning actions. This is not arbitrary — it is the naming convention the enforcement layer depends on.
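Pulled out as a standalone function, the scan is easy to poke at. The FINANCIAL_FIELDS name and the function name are mine; the logic mirrors the method above.

```python
# Known financial field names, checked in this order.
FINANCIAL_FIELDS = ("size", "amount", "value", "cost", "budget")

def extract_financial_impact(payload: dict) -> float:
    # First known field with a numeric value wins.
    for field_name in FINANCIAL_FIELDS:
        val = payload.get(field_name)
        if isinstance(val, (int, float)):
            return float(val)
    return 0.0

extract_financial_impact({"amount": 350})    # 350.0
extract_financial_impact({"notional": 350})  # 0.0: unknown field name, check passes
extract_financial_impact({"amount": "350"})  # 0.0: string values are not counted
```

Note the tuple order matters: a payload carrying both size and amount reports size, since it is checked first.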

Risk Tier Extraction

Risk tier is extracted from the riskLevel payload field (used directly) or the severity field (mapped to a tier):

def _extract_risk_tier(self, message: A2AMessage) -> str:
    payload = message.payload
    if "riskLevel" in payload:
        return str(payload["riskLevel"])
    if "severity" in payload:
        mapping = {
            "warning": "medium",
            "critical": "critical",
            "high": "high",
        }
        return mapping.get(str(payload["severity"]), "low")
    return "low"

Absent explicit risk annotation, actions default to low risk. This means if your agent is sending messages without a riskLevel field, it is effectively claiming low-risk status. For agents that operate on sensitive infrastructure — deployment, configuration changes, database operations — you should annotate messages explicitly with riskLevel: "high" or riskLevel: "critical" to ensure the authority system evaluates them correctly.
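The precedence and fallback behavior is easiest to see on concrete payloads. A standalone copy of the extraction, mirroring the method above (the SEVERITY_TO_RISK name is mine):

```python
SEVERITY_TO_RISK = {"warning": "medium", "critical": "critical", "high": "high"}

def extract_risk_tier(payload: dict) -> str:
    # riskLevel wins; then mapped severity; otherwise default to low.
    if "riskLevel" in payload:
        return str(payload["riskLevel"])
    if "severity" in payload:
        return SEVERITY_TO_RISK.get(str(payload["severity"]), "low")
    return "low"

extract_risk_tier({"riskLevel": "critical"})  # "critical"
extract_risk_tier({"severity": "warning"})    # "medium"
extract_risk_tier({"severity": "info"})       # "low": unmapped severities fall back
extract_risk_tier({})                         # "low": unannotated means low risk
```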

Approval Requirements and Payload Scanning

The requiresApprovalFor check is the most nuanced. It inspects both the message subtype and the payload recursively to find matching action names:

def _payload_contains_action(
    self, payload: dict, action: str, _depth: int = 0
) -> bool:
    if _depth > 10:
        return False
    for key, val in payload.items():
        if key == action:
            return True
        if isinstance(val, str) and val == action:
            return True
        if isinstance(val, dict) and self._payload_contains_action(
            val, action, _depth + 1
        ):
            return True
    return False

The depth limit of 10 prevents runaway traversal on pathologically nested payloads. The matching is exact — substring matches are not flagged. This means requiresApprovalFor: ["deploy"] catches a payload key named deploy but not one named deployment_target. Precision over recall: false negatives are the price, paid for by configuring the approval list carefully.
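A standalone copy of the scan makes the exact-match behavior easy to verify:

```python
def payload_contains_action(payload: dict, action: str, _depth: int = 0) -> bool:
    # Mirrors _payload_contains_action: exact matches on keys and string
    # values, recursing into nested dicts up to depth 10.
    if _depth > 10:
        return False
    for key, val in payload.items():
        if key == action or (isinstance(val, str) and val == action):
            return True
        if isinstance(val, dict) and payload_contains_action(val, action, _depth + 1):
            return True
    return False

payload_contains_action({"deploy": {"env": "prod"}}, "deploy")    # True: exact key match
payload_contains_action({"deployment_target": "prod"}, "deploy")  # False: no substrings
payload_contains_action({"meta": {"action": "deploy"}}, "deploy") # True: nested value match
```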

What Ceilings Are Not

Authority ceilings are not a replacement for monitoring. An agent that operates persistently near its ceiling — generating escalations it knows will be approved — is a signal worth investigating even if no individual action was technically unauthorized.

Authority ceilings are not a replacement for hard blocks. Ceilings define what requires approval. Hard blocks define what is never allowed regardless of approval. Those are in a separate module for a reason.

Authority ceilings are not static forever. As you build trust in an agent's behavior over time, raising its ceiling is a deliberate operational decision — not something that happens automatically. That deliberateness is the point.

Setting Ceilings for Real Agents

A trading bot whose normal trade size is $200-400 gets a ceiling of $500: enough headroom to operate, tight enough to catch runaway sizing. A content publishing agent gets maxAutonomousDollars: 0 because publishing decisions should never be purely autonomous. An infrastructure agent gets requiresApprovalFor: ["production.deploy", "launchd.daemon.create"] because those actions are cheap but consequential.
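Expressed as config data, those three examples might look like this. The plain-dict card shape, the "content.publish" action name, and the dollar and tier values not stated above are all assumptions; the approval lists follow the text.

```python
trading_bot_authority = {
    "maxAutonomousDollars": 500,  # normal trades run $200-400
    "maxRiskTier": "medium",      # assumed; tune to your risk appetite
    "requiresApprovalFor": [],
}

publisher_authority = {
    "maxAutonomousDollars": 0,    # publishing is never purely autonomous
    "maxRiskTier": "low",         # assumed
    "requiresApprovalFor": ["content.publish"],  # hypothetical action name
}

infra_authority = {
    "maxAutonomousDollars": 0,    # assumed
    "maxRiskTier": "medium",      # assumed
    "requiresApprovalFor": ["production.deploy", "launchd.daemon.create"],
}
```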

The authority ceiling is where your operational risk appetite becomes code. Set it once. Review it quarterly. Tighten it any time an agent surprises you.