
The most tempting architectural mistake when building an AI agent broker is using an AI to route messages.

The reasoning sounds plausible: the messages are complex, the org structure is nuanced, an LLM understands context better than hardcoded rules. Let the model figure out where each message should go.

This reasoning is wrong. Here is why, and here is what the right answer looks like.

Why LLMs Do Not Belong in the Router

Infrastructure has three requirements that LLMs structurally cannot meet: determinism, speed, and auditability.

Determinism. Given the same input, a router must produce the same output every time. An LLM produces probabilistically similar outputs, not identical ones. If a trade.executed message from Foresight goes to VP Trading today and to Agent Gateway tomorrow because the model interpreted the context differently, you do not have a routing layer. You have a routing lottery.

Speed. The router sits on the hot path of every message in the system. A Foresight trade execution, an InDecision Engine signal, a Monitoring System health alert: every one of them passes through the router synchronously before dispatch. An LLM API call typically adds on the order of 500 ms to 2,000 ms of latency and introduces a network dependency on every message. A Python conditional runs in microseconds with no network dependency.

Auditability. When a message is routed incorrectly, you need to know why. "The LLM decided" is not a satisfying explanation for a compliance or incident review. "Rule 3 matched because the subtype starts with trade." is a satisfying explanation. Deterministic rules can be logged, replayed, and verified. An LLM decision can be logged, but replaying it gives no guarantee of the same result, so it cannot be verified in the same way.
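To make the auditability point concrete, here is a minimal sketch of a rule check that records which rule matched and why. The names RuleMatch and explain_route are hypothetical and are not taken from broker/core/router.py; only two of the nine rules are sketched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMatch:
    """Hypothetical audit record: which rule fired and the reason."""
    rule: str
    reason: str


def explain_route(priority: str, subtype: str) -> RuleMatch:
    """Return the first matching rule plus a human-readable reason.

    Only Rule 1 and Rule 3 are sketched here; the real router has nine rules.
    """
    if priority == "critical":
        return RuleMatch("rule-1", "priority is critical")
    if subtype.startswith("trade."):
        return RuleMatch("rule-3", "subtype starts with 'trade.'")
    return RuleMatch("default", "no rule matched")


# The same input always yields the same record, so it can be logged and replayed.
match = explain_route(priority="normal", subtype="trade.executed")
print(f"{match.rule}: {match.reason}")
```

Because the reason string is produced by the same branch that makes the decision, the log entry cannot drift from the actual behavior.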

The comment at the top of broker/core/router.py makes this explicit:

"""
Deterministic org-aware message router.

9 routing rules derived from org chart and message type contracts.
NO LLM logic anywhere in this file. Rules are exhaustive.
"""

This is not a temporary decision pending a better model. It is a permanent architectural constraint. The routing logic lives in the infrastructure layer. The intelligence lives in the agents.

The 9 Rules

The router evaluates rules in order. First match wins for primary routing. Fan-out rules are additive; they do not stop evaluation. Here is the opening of the routing logic from the production codebase (the excerpt is truncated after Rule 2):

def route(self, message: A2AMessage) -> RouteDecision:
    """
    The 9 rules from A2A-SPEC.md:
    1. CRITICAL → openclaw + knox
    2. escalation → direct manager
    3. trade.* → vp-trading + cfo, mission-control
    4. indecision-engine signal.published → all trading agents
    5. service.* → vp-engineering + monitor, mission-control
    6. content.* → vp-content + mission-control
    7. memory.committed → memory-service only
    8. *.report → manager + expert-panel
    9. tool.* → sr-director-tooling (low priority)
    Default: sender's direct manager
    """
    # RULE 1: CRITICAL priority → openclaw + knox
    if message.envelope.priority == "critical":
        return RouteDecision(
            primary_recipients=["openclaw", "knox"],
            fan_out=["mission-control", "monitor"],
            store_in_memory=True,
        )

    # RULE 2: Escalations → direct manager
    if message.type == "escalation":
        manager = self._get_manager(message.envelope.from_agent)
        return RouteDecision(
            primary_recipients=[manager],
            fan_out=["mission-control"],
            store_in_memory=True,
            priority_override="high",
        )
    ...

Let's walk through each rule and the design reasoning behind it.

Rule 1: CRITICAL Priority

Any message marked with priority: critical bypasses all type-based routing and goes directly to Agent Gateway, the CEO agent (its agent ID is openclaw, which is why the code above routes to ["openclaw", "knox"]), and to Knox, the human founder. This is the circuit breaker: the one rule that overrides everything else.

Fan-out includes Mission Control (the operations dashboard) and Monitoring System (fleet health alerts). Every critical message is also stored in Semantic Memory Layer for post-incident review.

The insight here is that critical priority is not a message type — it is a severity declaration. It does not matter whether the message is a trade failure, a service crash, or an authority breach. If something is critical, it reaches the top of the chain immediately.

Rule 2: Escalations

When an agent's authority check fails, the broker converts the message to an escalation and runs it through Rule 2. The escalation goes to the sender's direct manager, not to a fixed address.

This is where the org chart becomes load-bearing. _get_manager() looks up the sender's Agent Card and reads org.reportsTo. For Foresight, that returns vp-trading. For InDecision Engine (the Sr. Director of Signals), it also returns vp-trading. The chain of command is encoded in the cards, not in the routing logic.

def _get_manager(self, agent_id: str) -> str:
    """Get an agent's direct manager from the registry."""
    card = self.registry.get(agent_id)
    if card and card.org.reportsTo:
        return card.org.reportsTo
    return "openclaw"

The fallback to Agent Gateway (return "openclaw") ensures that if an unknown agent sends a message, it still reaches a responsible party rather than getting dropped.

Rule 3: Trade Messages

The trade.* subtype namespace covers all trading activity: trade.executed, trade.completed, trade.halted, trade.failed, etc. All of it routes to VP Trading as the primary recipient. The docstring above also lists cfo and mission-control for this rule.

Two subtypes — trade.executed and trade.completed — also trigger Semantic Memory Layer storage. These are the permanent record events. A trade.entry is ephemeral; a trade.executed is history. The distinction matters for memory hygiene.
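The excerpt above stops before Rule 3, so here is a rough sketch of what the rule could look like based on the description. The function name route_trade, the trimmed RouteDecision stand-in, and the reading of cfo and mission-control as fan-out observers are all assumptions, not the production code.

```python
from dataclasses import dataclass, field
from typing import Optional

# Subtypes treated as permanent record events, per the description above.
PERSISTED_TRADE_SUBTYPES = {"trade.executed", "trade.completed"}


@dataclass
class RouteDecision:
    """Trimmed stand-in for the router's RouteDecision dataclass."""
    primary_recipients: list[str] = field(default_factory=list)
    fan_out: list[str] = field(default_factory=list)
    store_in_memory: bool = False


def route_trade(subtype: str) -> Optional[RouteDecision]:
    """Sketch of Rule 3: returns None when the subtype is not trade.*."""
    if not subtype.startswith("trade."):
        return None
    return RouteDecision(
        primary_recipients=["vp-trading"],
        # Assumption: the docstring's "+ cfo, mission-control" are fan-out observers.
        fan_out=["cfo", "mission-control"],
        # Only the permanent record events are written to memory.
        store_in_memory=subtype in PERSISTED_TRADE_SUBTYPES,
    )
```

Keeping the persisted subtypes in a named set means the memory-hygiene policy is visible in one place rather than scattered across conditionals.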

Rule 4: InDecision Engine Signal

This rule is specific to one sender and one subtype combination: from_agent == "indecision-engine" and subtype == "signal.published". When InDecision Engine publishes a signal, the broker queries the registry for all revenue-product agents and delivers to all of them simultaneously.

if (
    message.envelope.from_agent == "indecision-engine"
    and message.subtype == "signal.published"
):
    trading_agents = self._get_trading_agents()
    return RouteDecision(
        primary_recipients=trading_agents,
        fan_out=[],
        store_in_memory=False,
        priority_override="high",
    )

The fallback list — ["foresight", "sports-bot", "signal-engine", "perp-bot"] — is used if the registry query fails. This means InDecision Engine signals reach trading bots even if the registry is temporarily degraded. Revenue continuity is preserved.
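The fallback behavior can be sketched as follows. The registry class, its find_by_category method, and the choice to also fall back on an empty result are hypothetical; the post does not show the real registry API. Only the four agent IDs in the fallback list come from the text.

```python
FALLBACK_TRADING_AGENTS = ["foresight", "sports-bot", "signal-engine", "perp-bot"]


class FlakyRegistry:
    """Hypothetical registry stand-in that can simulate an outage."""

    def __init__(self, agents: list[str], healthy: bool = True) -> None:
        self._agents = agents
        self._healthy = healthy

    def find_by_category(self, category: str) -> list[str]:
        # Hypothetical method name; the real registry API is not shown in the post.
        if not self._healthy:
            raise ConnectionError("registry unavailable")
        return list(self._agents)


def get_trading_agents(registry: FlakyRegistry) -> list[str]:
    """Return registry results, or the static fallback if the query fails.

    Assumption: an empty result is also treated as a failure.
    """
    try:
        agents = registry.find_by_category("revenue-product")
    except Exception:
        return list(FALLBACK_TRADING_AGENTS)
    return agents or list(FALLBACK_TRADING_AGENTS)
```

Returning a copy of the fallback list keeps callers from mutating the module-level constant by accident.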

Rule 5: Service Messages

Infrastructure services (Semantic Memory Layer, NATS, the event bus, Watchdog Service) emit service.* messages when their status changes. These go to VP Engineering with fan-out to Monitoring System and Mission Control.

One specific subtype gets additional routing: service.offline also fans out to Agent Gateway. A service going offline is not just an engineering concern — it is an operational concern that the CEO agent needs to know about.

if message.subtype == "service.offline":
    fan_out.append("openclaw")

This is a good example of how individual subtypes can add fan-out recipients without breaking the primary routing decision.

Rules 6, 7, 8, 9

Content messages (content.*) route to VP Content with Mission Control fan-out. Content published events trigger Semantic Memory Layer storage.

Memory commits (memory.committed) go exclusively to Semantic Memory Layer. No fan-out, no manager escalation, no Mission Control. Memory commits are private internal operations.

Report messages (*.report) go to the sender's manager plus the Multi-agent Advisory System. The .endswith(".report") pattern covers pnl.report, sla.report, audience.report, and any future report type without requiring rule updates.

Tool messages (tool.*) go to the Sr. Director of Tooling at low priority. Tool operations are the lowest priority traffic in the system.
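The matching patterns for Rules 6 through 9 can be sketched in a few lines. The function classify_subtype and its string labels are hypothetical; the point is only to show prefix, exact, and suffix matching evaluated in rule order.

```python
def classify_subtype(subtype: str) -> str:
    """Map a subtype to the rule that would handle it (Rules 6 to 9, then default)."""
    if subtype.startswith("content."):   # Rule 6: prefix match on the namespace
        return "rule-6"
    if subtype == "memory.committed":    # Rule 7: exact match
        return "rule-7"
    if subtype.endswith(".report"):      # Rule 8: suffix match across namespaces
        return "rule-8"
    if subtype.startswith("tool."):      # Rule 9: prefix match, lowest priority
        return "rule-9"
    return "default"


for example in ("pnl.report", "content.report", "tool.report", "memory.committed"):
    print(example, "->", classify_subtype(example))
```

Under this ordering a subtype that fits two patterns is claimed by the earlier rule: a content.report would match Rule 6 before Rule 8, while a tool.report would match Rule 8 before Rule 9. Both of those subtypes are made up here to illustrate the ordering.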

The Default

When no rule matches, the message goes to the sender's direct manager with Mission Control fan-out. This is the safety net — no message can be dropped by the routing layer.

The RouteDecision Object

Every routing decision returns a RouteDecision dataclass:

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteDecision:
    """Where a message should go after routing."""
    primary_recipients: list[str] = field(default_factory=list)
    fan_out: list[str] = field(default_factory=list)
    store_in_memory: bool = False
    priority_override: Optional[str] = None
    converted_to_escalation: bool = False

The separation of primary_recipients and fan_out is intentional. Primary recipients are the direct targets — the agents whose inboxes this message is primarily meant for. Fan-out recipients are observers and record-keepers — Mission Control, Monitoring System, the Multi-agent Advisory System.

From the dispatcher's perspective, both lists get messages. But the distinction matters for audit queries and for understanding who was meant to act versus who was meant to observe.
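One way a dispatcher could preserve that distinction is to tag each delivery with its role. This is a sketch only: delivery_plan, the role labels, and the de-duplication policy are assumptions, and the RouteDecision here is a trimmed stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class RouteDecision:
    """Trimmed stand-in with only the two recipient lists."""
    primary_recipients: list[str] = field(default_factory=list)
    fan_out: list[str] = field(default_factory=list)


def delivery_plan(decision: RouteDecision) -> list[tuple[str, str]]:
    """Expand a decision into (recipient, role) pairs for dispatch and audit.

    Assumption: a recipient present in both lists is delivered once, as primary.
    """
    plan = [(agent, "primary") for agent in decision.primary_recipients]
    seen = set(decision.primary_recipients)
    for agent in decision.fan_out:
        if agent not in seen:
            plan.append((agent, "observer"))
            seen.add(agent)
    return plan


critical = RouteDecision(["openclaw", "knox"], ["mission-control", "monitor"])
for recipient, role in delivery_plan(critical):
    print(f"{recipient}: {role}")
```

An audit query can then filter on the role to separate who was meant to act from who was meant to observe.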

Designing Rules That Are Exhaustive

The most important property of this routing system is that the rules are exhaustive. Every possible message will match at least the default rule. There is no case where the router returns nothing.

This was achieved by designing the rules as a priority ladder:

1. Critical severity overrides everything
2. Message type overrides subtype patterns
3. Subtype prefixes cover namespaces
4. Subtype suffixes cover cross-cutting concerns (reports)
5. The default catches everything else
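The ladder can be sketched as an ordered table of predicates where the last entry always matches. This is a simplification under stated assumptions: Msg is a hypothetical flattened message, the nine rules are collapsed into five rungs, and the sender-specific Rule 4 and exact-match Rule 7 are left out.

```python
from typing import Callable, NamedTuple


class Msg(NamedTuple):
    """Hypothetical flattened message; the real A2AMessage nests these fields."""
    priority: str
    type: str
    subtype: str


PREFIXES = ("trade.", "service.", "content.", "tool.")

# Ordered ladder of (label, predicate). The final predicate always matches,
# which is what makes the rule set exhaustive.
LADDER: list[tuple[str, Callable[[Msg], bool]]] = [
    ("critical severity", lambda m: m.priority == "critical"),
    ("message type", lambda m: m.type == "escalation"),
    ("subtype prefix", lambda m: m.subtype.startswith(PREFIXES)),
    ("subtype suffix", lambda m: m.subtype.endswith(".report")),
    ("default", lambda m: True),
]


def first_match(msg: Msg) -> str:
    """Return the label of the first rung whose predicate matches."""
    return next(label for label, matches in LADDER if matches(msg))


print(first_match(Msg("normal", "event", "pnl.report")))
```

Because the final rung accepts every message, first_match can never fall off the end of the table, which mirrors the guarantee that the router never returns nothing.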

The rules were derived from the org chart and message type contracts before any code was written. This is the right order of operations: design the contracts first, implement the rules second, write the tests third.

What Deterministic Routing Enables

By keeping the router free of LLM calls, three capabilities become possible that would otherwise be fragile or impossible:

Unit testability. Every rule can be tested with a simple synthetic message. The test for Rule 3 is: construct a message with subtype="trade.executed", call router.route(message), assert primary_recipients == ["vp-trading"]. No mocks, no API calls, no probabilistic behavior.
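The shape of such a test might look like the sketch below. StubRouter is a stand-in so the example runs on its own; the real test would construct an A2AMessage and call the production router, whose constructor is not shown in this post.

```python
from dataclasses import dataclass, field


@dataclass
class RouteDecision:
    """Trimmed stand-in with only the field the test inspects."""
    primary_recipients: list[str] = field(default_factory=list)


class StubRouter:
    """Stand-in router: takes a bare subtype instead of a full A2AMessage."""

    def route(self, subtype: str) -> RouteDecision:
        if subtype.startswith("trade."):
            return RouteDecision(primary_recipients=["vp-trading"])
        return RouteDecision(primary_recipients=["openclaw"])


def test_rule_3_routes_trades_to_vp_trading() -> None:
    decision = StubRouter().route("trade.executed")
    assert decision.primary_recipients == ["vp-trading"]


# A test runner such as pytest would discover the function by name;
# calling it directly keeps the sketch dependency-free.
test_rule_3_routes_trades_to_vp_trading()
```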

Replay. The broker can reprocess any historical message and verify it would route identically today. This is essential for incident reconstruction and rule change validation.
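A replay check can be sketched as a loop that re-routes logged messages and reports any whose recipients would differ today. The JSON log format with "message" and "recipients" keys, the replay function, and toy_route are all hypothetical; the broker's actual log schema is not described here.

```python
import json
from typing import Callable


def replay(log_lines: list[str], route: Callable[[dict], list[str]]) -> list[dict]:
    """Re-route logged messages and return the entries whose recipients changed.

    Each log line is assumed to be JSON with "message" and "recipients" keys.
    """
    mismatches = []
    for line in log_lines:
        entry = json.loads(line)
        current = route(entry["message"])
        if current != entry["recipients"]:
            mismatches.append(
                {"message": entry["message"], "was": entry["recipients"], "now": current}
            )
    return mismatches


def toy_route(message: dict) -> list[str]:
    """Toy router used only to exercise the replay loop."""
    return ["vp-trading"] if message["subtype"].startswith("trade.") else ["openclaw"]


log = [
    json.dumps({"message": {"subtype": "trade.executed"}, "recipients": ["vp-trading"]}),
    json.dumps({"message": {"subtype": "service.offline"}, "recipients": ["vp-engineering"]}),
]
print(replay(log, toy_route))
```

An empty result means every historical message would route identically under the current rules; any entry in the list is a rule change worth reviewing before deployment.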

Hot-reload without risk. The routing rules are static Python code, not a model configuration. Adding a new rule is a code change with a test — not a prompt change with uncertain behavior.

The 9 rules in broker/core/router.py are the backbone of the Agent Broker's reliability guarantee. They are simple, verifiable, and intentionally free of intelligence. The intelligence belongs to the agents that send the messages, not the infrastructure that routes them.

Deterministic Routing — No LLM in the Router