
A fleet of well-designed specialist agents, each with deep expertise and proper memory infrastructure, is still a collection of isolated processes without wiring. Wiring is what turns a collection into an organization: a shared identity system, a message broker that routes work correctly, bridge scripts that connect heterogeneous services, and a directive lifecycle that gives every piece of work a traceable state.

This lesson builds the wiring.

Agent Cards: Identity in the Org

Every agent in the org has a card — a structured document that the broker uses to make routing and authority decisions. The card is the agent's identity document.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AgentCard:
    # Identity
    id: str                    # "coding-agent-01"
    name: str                  # "Coding Agent"
    version: str               # "2.1.0"
    description: str

    # Capabilities (what the agent can do)
    domains: list[str]         # ["coding", "testing", "code-review"]
    capabilities: list[str]    # ["write_code", "run_tests", "review_pr"]
    skills: list[str]          # ["feature-team", "quality-team"]

    # Authority (what the agent is allowed to do)
    authority_tier: int        # 1=read-only, 2=write, 3=deploy, 4=production
    max_blast_radius: str      # "single-repo", "org-wide", "infrastructure"
    requires_approval: list[str]  # actions that need human sign-off

    # Communication
    endpoint: str              # "http://localhost:9001"
    message_schema: str        # version of the message protocol
    timeout_seconds: int       # max time to wait for response

    # State (runtime, not stored in card file)
    status: str = "offline"    # offline|ready|busy|error
    current_directive: Optional[str] = None
    last_seen: Optional[datetime] = None

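Agent Cards are stored in a registry file and loaded by the broker on startup. The sample entries omit description, skills, and timeout_seconds for brevity; because AgentCard declares those fields without defaults, a real registry file or its loader has to supply them: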

# agents/registry.yaml
agents:
  - id: coding-agent-01
    name: Coding Agent
    version: 2.1.0
    domains: [coding, testing, code-review]
    capabilities: [write_code, run_tests, review_pr, open_pr]
    authority_tier: 2
    max_blast_radius: single-repo
    requires_approval: [deploy_to_production, delete_branch]
    endpoint: http://localhost:9001
    message_schema: v2

  - id: trading-agent-01
    name: Trading Agent
    version: 1.4.0
    domains: [trading, risk, portfolio]
    capabilities: [analyze_market, check_positions, alert_risk]
    authority_tier: 2
    max_blast_radius: single-portfolio
    requires_approval: [modify_strategy_params, increase_position_size]
    endpoint: http://localhost:9002
    message_schema: v2

  - id: content-agent-01
    name: Content Agent
    version: 1.2.0
    domains: [content, social, publishing]
    capabilities: [write_post, schedule_content, publish_article]
    authority_tier: 2
    max_blast_radius: content-pipeline
    requires_approval: [publish_to_production, modify_brand_voice]
    endpoint: http://localhost:9003
    message_schema: v2

The Agent Card registry is the org chart of the agent fleet. When a new directive arrives, the broker consults the registry to find an agent with the right domain and a ready status. No message goes to any agent until routing is decided.
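
The lookup described above can be sketched as a small in-memory index. This is a minimal illustration, not the lesson's actual registry: `Card` is a trimmed-down stand-in for AgentCard, the `entries` list is assumed to be the already-parsed `agents:` section of the registry file, and the method names simply mirror the calls the broker code makes (`get`, `by_domain`, `by_capability`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    """Trimmed-down stand-in for AgentCard with only the routing fields."""
    id: str
    domains: list
    capabilities: list
    status: str = "offline"  # runtime state, never read from the file


class AgentRegistry:
    """In-memory index over cards, keyed by agent id."""

    def __init__(self, entries: list):
        # `entries` is the parsed `agents:` list (one dict per agent).
        self._cards = {
            e["id"]: Card(e["id"], list(e["domains"]), list(e["capabilities"]))
            for e in entries
        }

    def get(self, agent_id: str) -> Optional[Card]:
        return self._cards.get(agent_id)

    def by_domain(self, domain: str) -> list:
        return [c for c in self._cards.values() if domain in c.domains]

    def by_capability(self, capability: str) -> list:
        return [c for c in self._cards.values() if capability in c.capabilities]


entries = [
    {"id": "coding-agent-01", "domains": ["coding", "testing", "code-review"],
     "capabilities": ["write_code", "run_tests", "review_pr", "open_pr"]},
    {"id": "trading-agent-01", "domains": ["trading", "risk", "portfolio"],
     "capabilities": ["analyze_market", "check_positions", "alert_risk"]},
]
registry = AgentRegistry(entries)
registry.get("coding-agent-01").status = "ready"  # e.g. set by a heartbeat

# Routing candidates: right domain AND ready, decided without messaging anyone.
candidates = [c.id for c in registry.by_domain("testing") if c.status == "ready"]
print(candidates)
```

Every card starts as offline, so an agent only becomes a routing candidate once something (here a hypothetical heartbeat) marks it ready.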

The Agent Broker Pattern

The broker is the central nervous system. Every message in the org goes through it. No agent sends directly to another agent without broker mediation.

Why? Because direct agent-to-agent communication bypasses:

Authority checks (can this agent send this type of message to that agent?)
Audit logging (was this message sent? was it received? what happened?)
Fan-out coordination (this signal needs to reach multiple agents simultaneously)
Offline handling (the target agent is down; where does the message go?)

class AgentBroker:
    def __init__(self, registry: AgentRegistry):
        self.registry = registry
        self.audit_log = AuditLog()
        self.offline_queues: dict[str, list[Message]] = {}
        self.routing_rules = RoutingRules()

    async def route(self, message: Message) -> RouteResult:
        # 1. Validate message schema
        self.validate_schema(message)

        # 2. Authenticate sender
        sender = self.registry.get(message.sender_id)
        if not sender:
            raise UnknownSender(message.sender_id)

        # 3. Authority check — can this sender send this message type?
        if not self.check_authority(sender, message.type):
            raise AuthorityViolation(
                f"{sender.id} cannot send {message.type}"
            )

        # 4. Deterministic routing — 9 rules, no LLM
        target = self.routing_rules.resolve(message, self.registry)

        # 5. Log routing decision
        self.audit_log.record(
            event="route_decided",
            message_id=message.id,
            from_agent=message.sender_id,
            to_agent=target.agent.id,  # target is a RouteDecision; the card is target.agent
            rule_applied=target.rule_used,
        )

        # 6. Deliver or queue
        if target.agent.status == "offline":
            self.enqueue_offline(target.agent.id, message)
            return RouteResult(status="queued", target=target.agent.id)

        await self.deliver(target.agent, message)
        return RouteResult(status="delivered", target=target.agent.id)
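
The "deliver or queue" step relies on an offline queue whose behavior the broker listing does not show. One possible shape is sketched below, assuming a per-agent FIFO that is replayed in arrival order when the agent comes back. The class name `OfflineQueues` and its `drain` and `depth` methods are hypothetical, and messages are plain dicts to keep the example self-contained.

```python
from collections import deque


class OfflineQueues:
    """Per-agent FIFO holding messages for agents that are offline."""

    def __init__(self):
        self._queues = {}  # agent id -> deque of pending messages

    def enqueue(self, agent_id: str, message: dict) -> None:
        self._queues.setdefault(agent_id, deque()).append(message)

    def drain(self, agent_id: str) -> list:
        """Remove and return everything queued for one agent, oldest first."""
        return list(self._queues.pop(agent_id, deque()))

    def depth(self, agent_id: str) -> int:
        return len(self._queues.get(agent_id, ()))


queues = OfflineQueues()
queues.enqueue("coding-agent-01", {"id": "d-1"})
queues.enqueue("coding-agent-01", {"id": "d-2"})

# When the agent reports ready again, the broker would replay in order.
replayed = [m["id"] for m in queues.drain("coding-agent-01")]
print(replayed)
```

An in-memory queue like this one is lost if the broker restarts; a durable store would be needed for real offline resilience.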

The routing rules are deterministic — nine explicit rules, evaluated in priority order, with no LLM reasoning:

class RoutingRules:
    """
    9 deterministic routing rules.
    Rules are evaluated in order; first match wins.
    No LLM calls. No ambiguity.
    """

    def resolve(self, message: Message, registry: AgentRegistry) -> RouteDecision:
        # Rule 1: Explicit target — honor it if authority permits
        if message.target_id:
            target = registry.get(message.target_id)
            if target and self.can_reach(message.sender_id, target):
                return RouteDecision(agent=target, rule_used="explicit-target")

        # Rule 2: Domain match — find agent with matching domain
        domain_agents = registry.by_domain(message.domain)
        ready = [a for a in domain_agents if a.status == "ready"]
        if ready:
            return RouteDecision(agent=ready[0], rule_used="domain-match")

        # Rule 3: Capability match — find agent with required capability
        if message.required_capability:
            capable = registry.by_capability(message.required_capability)
            ready = [a for a in capable if a.status == "ready"]
            if ready:
                return RouteDecision(agent=ready[0], rule_used="capability-match")

        # Rule 4: Authority tier — route up the hierarchy for escalations
        if message.type == "escalation":
            manager = registry.get_manager_for(message.sender_id)
            if manager:
                return RouteDecision(agent=manager, rule_used="escalation-up")

        # Rules 5-9: fan-out, offline-resilient delivery, fallback, etc.
        # ...

        raise NoRouteFound(f"No route for message {message.id}")
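
"First match wins" can also be expressed as data: an ordered list of rule functions that a single loop walks. The sketch below is an alternative arrangement rather than the lesson's implementation. It models only two of the rules shown above, leaves the elided rules out, and uses a hypothetical `AGENTS` dict in place of the registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    domain: str
    target_id: Optional[str] = None


# Hypothetical registry stand-in: agent id -> (domains, status).
AGENTS = {
    "coding-agent-01": ({"coding", "testing"}, "ready"),
    "content-agent-01": ({"content"}, "ready"),
}


def explicit_target(msg: Msg) -> Optional[str]:
    return msg.target_id if msg.target_id in AGENTS else None


def domain_match(msg: Msg) -> Optional[str]:
    # sorted() makes the tie-break independent of dict insertion order.
    for agent_id in sorted(AGENTS):
        domains, status = AGENTS[agent_id]
        if msg.domain in domains and status == "ready":
            return agent_id
    return None


# Priority order is the list order; adding a rule means adding a row.
RULES = [
    ("explicit-target", explicit_target),
    ("domain-match", domain_match),
]


def resolve(msg: Msg) -> tuple:
    for name, rule in RULES:
        agent_id = rule(msg)
        if agent_id is not None:
            return agent_id, name  # first match wins
    raise LookupError(f"no route for domain {msg.domain!r}")


by_domain = resolve(Msg(domain="coding"))
by_target = resolve(Msg(domain="coding", target_id="content-agent-01"))
print(by_domain, by_target)
```

Because the rule name is returned alongside the agent, the audit log can record which rule produced each routing decision, as the broker's `rule_applied` field expects.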

Bridge Scripts: Connecting Heterogeneous Services

Not every service in the org speaks the broker's message protocol. Legacy services use HTTP. Some use Redis pub/sub. Some write to files. Some trigger cron jobs. Bridge scripts translate between these protocols and the broker's message schema.

A bridge script has one rule: it is stateless. It translates and forwards. It never stores state.

# bridges/discord_bridge.py
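# Translates Discord messages into broker directives
from datetime import datetime, timezone

# Assumption: the allowlist of operator IDs is populated from configuration
OPERATOR_IDS: set = set()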

class DiscordBridge:
    def __init__(self, broker: AgentBroker, discord_client: DiscordClient):
        self.broker = broker
        self.discord = discord_client

    async def on_discord_message(self, msg: DiscordMessage) -> None:
        # Translate Discord message to broker directive
        if not msg.content.startswith("!agent"):
            return  # Not a command, ignore

        # Bridges are trust boundaries: authenticate the human BEFORE
        # stamping human authority on the directive. Anyone can type in
        # a channel — only allowlisted operators get the principal.
        if msg.author.id not in OPERATOR_IDS:
            await self.discord.reply(msg, "Not authorized to issue directives.")
            return

        directive = Directive(
            id=generate_id(),
            source="discord",
            sender_id="human-operator",  # verified operator = human authority
            type="task",
            domain=self.parse_domain(msg.content),
            description=self.parse_description(msg.content),
            priority="normal",
            created_at=datetime.now(timezone.utc),
        )

        # Forward to broker — do not store, do not modify
        result = await self.broker.route(directive)

        # Report back to Discord
        await self.discord.reply(msg, f"Directive {directive.id}: {result.status}")

Bridges are trust boundaries. A bridge is the point where unauthenticated external input enters the org, so it must verify who the human is before assigning the human-operator principal — otherwise anyone who can type in the channel inherits full human authority, and every authority check downstream is built on a lie.
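
The ordering matters: identity first, parsing second. Below is a minimal sketch of that gate, assuming a command format of `!agent <domain> <description>`. The format, the `KNOWN_DOMAINS` set, the sample operator ID, and the function names are all illustrative assumptions; the lesson does not define how `parse_domain` and `parse_description` work.

```python
from typing import NamedTuple, Optional

KNOWN_DOMAINS = {"coding", "trading", "content"}  # assumed; mirrors the registry
OPERATOR_IDS = {1001}                             # hypothetical allowlist


class Parsed(NamedTuple):
    domain: str
    description: str


def parse_command(content: str) -> Optional[Parsed]:
    """Parse '!agent <domain> <description>'; return None if malformed."""
    parts = content.split(maxsplit=2)
    if len(parts) < 3 or parts[0] != "!agent":
        return None
    domain = parts[1].lower()
    if domain not in KNOWN_DOMAINS:
        return None
    return Parsed(domain, parts[2].strip())


def authorize_and_parse(author_id: int, content: str) -> Optional[Parsed]:
    # Identity is checked first, so unauthorized input is never parsed further.
    if author_id not in OPERATOR_IDS:
        return None
    return parse_command(content)


accepted = authorize_and_parse(1001, "!agent coding fix the flaky login test")
refused = authorize_and_parse(42, "!agent coding fix the flaky login test")
print(accepted, refused)
```

Rejecting unknown domains at the bridge keeps malformed commands from ever becoming directives, so the broker only sees input that is both authenticated and well-formed.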

# bridges/cron_bridge.py
# Translates scheduled cron triggers into broker directives
from datetime import datetime, timezone

class CronBridge:
    def __init__(self, broker: AgentBroker):
        self.broker = broker
        # Read-only configuration loaded once; this is not per-directive state
        self.schedule = load_schedule("cron/schedule.yaml")

    async def trigger(self, job_name: str) -> None:
        job = self.schedule.get(job_name)
        if not job:
            raise UnknownJob(job_name)

        directive = Directive(
            id=generate_id(),
            source="cron",
            sender_id="cron-scheduler",
            type=job.directive_type,
            domain=job.domain,
            description=job.description,
            priority=job.priority,
            created_at=datetime.now(timezone.utc),
        )

        await self.broker.route(directive)
        # No state stored — the directive is in the broker now

The Directive Lifecycle

Every piece of work in the org is a directive with a defined lifecycle. The lifecycle is the audit trail.

pending → acknowledged → in_progress → completed
               ↘              ├→ failed
             rejected         └→ escalated

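from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Directive:
    id: str
    source: str                    # "discord", "cron": the bridge that created it
    type: str                      # "task", "escalation", "query", "event"
    domain: str                    # "coding", "trading", "content"
    description: str
    sender_id: str
    priority: str                  # "low", "normal", "high", "critical"
    created_at: datetime
    target_id: Optional[str] = None  # None = broker routes automatically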

    # Lifecycle state
    status: DirectiveStatus = DirectiveStatus.PENDING
    acknowledged_at: Optional[datetime] = None
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    result: Optional[dict] = None
    error: Optional[str] = None

    # Audit trail
    transitions: list[StatusTransition] = field(default_factory=list)

    def transition(self, new_status: DirectiveStatus, actor: str, note: str = ""):
        self.transitions.append(StatusTransition(
            from_status=self.status,
            to_status=new_status,
            actor=actor,
            timestamp=datetime.now(timezone.utc),
            note=note,
        ))
        self.status = new_status

The lifecycle transitions:

pending → acknowledged: The target agent received the directive and accepted it into its queue. If the agent is busy, the directive waits in the broker queue without transitioning.

acknowledged → in_progress: The agent started working on the directive. This is the moment the timer starts — if the agent spends too long in in_progress, the health monitor flags it.

in_progress → completed: The agent finished and returned a result. The audit log records the result.

in_progress → failed: The agent encountered an unrecoverable error. The broker may trigger a retry or escalation depending on the directive type and error classification.

in_progress → escalated: The agent determined the directive requires human review or a higher-authority agent. The broker routes the escalation up the hierarchy.
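
The transitions listed above can be enforced with a lookup table, so an agent cannot jump from pending straight to completed. This is a sketch under assumptions: the enum values are inferred from the state names in the diagram, the rejected branch is left out of the table because the text does not say which state it leaves, and retries after a failure are not modeled.

```python
from enum import Enum


class DirectiveStatus(Enum):
    PENDING = "pending"
    ACKNOWLEDGED = "acknowledged"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    REJECTED = "rejected"
    FAILED = "failed"
    ESCALATED = "escalated"


S = DirectiveStatus

# Only the transitions the lesson describes; states with no entry are terminal here.
ALLOWED = {
    S.PENDING: {S.ACKNOWLEDGED},
    S.ACKNOWLEDGED: {S.IN_PROGRESS},
    S.IN_PROGRESS: {S.COMPLETED, S.FAILED, S.ESCALATED},
}


def check_transition(current: DirectiveStatus, new: DirectiveStatus) -> None:
    """Raise ValueError unless current -> new is in the allowed table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")


check_transition(S.PENDING, S.ACKNOWLEDGED)      # passes silently
check_transition(S.IN_PROGRESS, S.ESCALATED)     # passes silently
```

A guard like this would sit at the top of `Directive.transition`, before the StatusTransition record is appended, so the audit trail never contains an impossible step.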

# Agent-side lifecycle management
async def process_directive(self, directive: Directive) -> None:
    # Acknowledge
    directive.transition(
        DirectiveStatus.ACKNOWLEDGED,
        actor=self.agent_id,
        note="Added to processing queue"
    )
    await self.broker.update_directive(directive)

    # Start
    directive.transition(
        DirectiveStatus.IN_PROGRESS,
        actor=self.agent_id,
    )
    await self.broker.update_directive(directive)

    try:
        result = await self.execute(directive)
        directive.result = result
        directive.transition(
            DirectiveStatus.COMPLETED,
            actor=self.agent_id,
        )
    except EscalationRequired as e:
        directive.transition(
            DirectiveStatus.ESCALATED,
            actor=self.agent_id,
            note=str(e)
        )
        await self.broker.escalate(directive, reason=str(e))
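    except Exception as e:
        # Record the error on the directive itself, not only in the transition note
        directive.error = str(e)
        directive.transition(
            DirectiveStatus.FAILED,
            actor=self.agent_id,
            note=str(e)
        )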

    await self.broker.update_directive(directive)

The directive lifecycle is the organization's audit trail. At any moment, you can query all directives in in_progress state, sorted by age, and immediately know what every agent is working on and how long it has been working. This is the operational visibility that distinguishes a platform from a collection of scripts.
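
The "in_progress, sorted by age" query might look like the following over an in-memory list. The `Row` shape, the sample data, and the 30-minute threshold are illustrative assumptions; a real platform would run the same filter and sort against wherever the broker persists directives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Row:
    id: str
    agent: str
    status: str
    started_at: datetime


def in_progress_by_age(rows: list, now: datetime) -> list:
    """Return (id, agent, age) for active directives, oldest first."""
    active = [r for r in rows if r.status == "in_progress"]
    active.sort(key=lambda r: r.started_at)
    return [(r.id, r.agent, now - r.started_at) for r in active]


def stalled(rows: list, now: datetime, limit: timedelta) -> list:
    """IDs a health monitor would flag for exceeding the time limit."""
    return [rid for rid, _, age in in_progress_by_age(rows, now) if age > limit]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed for the example
rows = [
    Row("d-1", "coding-agent-01", "in_progress", now - timedelta(minutes=5)),
    Row("d-2", "trading-agent-01", "in_progress", now - timedelta(minutes=45)),
    Row("d-3", "content-agent-01", "completed", now - timedelta(hours=2)),
]

report = in_progress_by_age(rows, now)
flagged = stalled(rows, now, limit=timedelta(minutes=30))
print(report, flagged)
```

Passing `now` in as a parameter rather than reading the clock inside the function keeps the report reproducible and easy to test.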

Putting the Wiring Together

The wiring layer of an agent operations platform has four components:

Agent Cards — identity documents in a registry; the broker uses them for all routing and authority decisions
Agent Broker — central message router using deterministic rules; maintains audit log and offline queues
Bridge Scripts — stateless translators connecting heterogeneous protocols to the broker message schema
Directive Lifecycle — every piece of work has a traceable state from pending to completed

Together they create a system where every message is auditable, every agent is accountable, and every piece of work has a traceable history from origin to outcome.

Summary

Agent Cards encode identity, capabilities, authority tier, and endpoint — the broker uses them for all decisions
The Agent Broker routes all messages through deterministic rules, not LLM reasoning
Bridge scripts translate external protocols (Discord, cron, webhooks) to the broker schema — always stateless
The directive lifecycle (pending → acknowledged → in_progress → completed) provides a full audit trail
Offline resilience requires explicit queuing — the broker holds directives for offline agents

What's Next

With the org wired, the next lesson covers the decision layer that sits at the top: the CEO agent that triages incoming work, applies authority rules, and decides what to resolve automatically versus what to escalate to the human.