The Complete Platform
End-to-end walkthrough of a production agent operations platform: how expertise, team architecture, org wiring, authority delegation, and behavioral monitoring connect into a running system — and what to build next.
The previous six lessons built the components. This lesson assembles them — an end-to-end walkthrough of a production agent operations platform processing a real task, from the moment a directive enters the system to the moment the result is delivered.
Then it answers the question every builder asks after assembling the first working system: what do I build next?
The Platform Architecture
Every component has been covered. Here is how they connect:
External World
↓
Bridge Layer (Discord, cron, webhooks, HTTP)
↓
Principal Broker
├─ Agent Card Registry
├─ Routing Rules Engine (9 deterministic rules)
├─ Audit Log
└─ Offline Queues
↓
CEO Triage Engine
├─ Structured Report Parser
├─ Triage Rules Engine (12 rules)
└─ Authority Checker
↓
Team Skills / Specialist Dispatch
├─ Team Skill Definitions (YAML)
├─ Phase Execution (parallel + sequential)
└─ Territory Enforcement
↓
Specialist Agents
├─ Boot Protocol (seed load + memory hydration)
├─ Model Routing (task-type based)
├─ Domain Execution
└─ Shutdown Protocol (memory flush)
↑
Memory Layer (Akashic Records)
├─ Per-agent namespaces
└─ Shared org knowledge
↑
Health Monitor (separate process)
├─ Stale Execution Detector
├─ Doom Spiral Detector
├─ Hallucination Validator
└─ Circuit Breakers
End-to-End Walkthrough: Feature Request
Let's trace a feature request from Discord message to an open PR with passing tests.
T=0: Discord message arrives
Knox types in the #agent-tasks Discord channel:
!agent task: Add rate limiting to the /api/signals endpoint.
Max 100 requests per minute per user.
T=0.1: Bridge script translates
# discord_bridge.py processes the message
directive = Directive(
    id="dir-a7f3c9",
    source="discord",
    sender_id="human-operator",
    type="task",
    domain="coding",
    description="Add rate limiting to /api/signals endpoint. Max 100 req/min/user.",
    priority="normal",
    created_at=datetime.utcnow(),
)
await broker.route(directive)
T=0.2: Broker routes
Routing rule R2 (domain match) fires: "coding" domain → coding-agent-01 is ready → route.
The directive transitions from pending to acknowledged as the broker delivers to the coding agent.
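Rule R2 is one of the broker's nine deterministic rules. A minimal sketch of how such a rule can be modeled as a predicate plus a target lookup; the `RoutingRule` shape and the domain-to-agent mapping are illustrative assumptions, not the broker's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutingRule:
    """One deterministic routing rule: a match predicate plus a target lookup."""
    name: str
    matches: Callable[[dict], bool]
    target: Callable[[dict], Optional[str]]


def route(directive: dict, rules: list[RoutingRule], ready: set[str]) -> Optional[str]:
    """Walk rules in order; the first rule whose target agent is ready wins."""
    for rule in rules:
        if rule.matches(directive):
            agent = rule.target(directive)
            if agent in ready:
                return agent
    return None  # no ready match: falls through to the offline queue


# Rule R2: route by declared domain (mapping here is an assumption)
r2 = RoutingRule(
    name="R2-domain-match",
    matches=lambda d: "domain" in d,
    target=lambda d: {"coding": "coding-agent-01"}.get(d["domain"]),
)

agent = route({"domain": "coding"}, [r2], ready={"coding-agent-01"})
```

Because the rules are plain predicates evaluated in a fixed order, routing decisions are reproducible from the audit log alone.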
T=0.3: CEO triage processes
The directive passes through triage: type "new task", confidence N/A (it is not a report), auto_resolvable=True for standard feature tasks, blast radius "single-repo". Rule R5 (standard auto-resolvable task, single-repo blast radius) fires, and the triage engine dispatches to the feature-team skill.
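A sketch of the triage decision at this point, reduced to the fields the lesson names (type, auto_resolvable, blast radius). The outcomes below are illustrative, not the engine's real 12-rule table:

```python
from dataclasses import dataclass


@dataclass
class TriageInput:
    kind: str               # "task" | "report"
    auto_resolvable: bool
    blast_radius: str       # "single-repo" | "multi-repo" | "org-wide"


def triage(item: TriageInput) -> str:
    """Illustrative subset of the triage rules: dispatch vs escalate."""
    if item.kind == "task" and item.auto_resolvable and item.blast_radius == "single-repo":
        return "dispatch:feature-team"   # standard feature work
    if item.blast_radius == "org-wide":
        return "escalate:human"          # too wide to auto-resolve
    return "escalate:internal"


decision = triage(TriageInput(kind="task", auto_resolvable=True, blast_radius="single-repo"))
```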
T=0.5: Team skill initialized
The feature-team skill activates two specialists for this task: Backend Developer and QA Engineer. The frontend dev is not needed for an API-only change.
skill = TeamSkill.load("feature-team")
session = await skill.execute(
    directive=directive,
    participants=["backend-dev-01", "qa-engineer-01"],
)
T=1: Backend Dev boots
# Backend agent boot protocol
seed = load_seed_file("agents/backend-dev/seed.md")
context = await mind_query(
    query="rate limiting patterns, signals endpoint, recent API changes",
    namespace="backend-dev-01",
    limit=10,
)
# Memory returns:
# - Prior rate limiting implementation on /api/trades (2026-02-15)
# - Known pattern: use Redis sliding window with lua script
# - Signals endpoint structure from last PR review
The agent starts with the prior rate limiting implementation already in context. No re-explanation needed.
T=3: Backend Dev implements
Model routing: the task is standard implementation → Sonnet.
# backend/middleware/rate_limit.py
import time
import uuid

import redis


class SlidingWindowRateLimiter:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.lua_script = self.redis.register_script("""
            local key = KEYS[1]
            local now = tonumber(ARGV[1])
            local window = tonumber(ARGV[2])
            local limit = tonumber(ARGV[3])
            -- Remove entries outside window
            redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
            -- Count current entries
            local count = redis.call('ZCARD', key)
            if count < limit then
                -- Unique member per request, so two requests in the
                -- same millisecond are not collapsed into one entry
                redis.call('ZADD', key, now, now .. '-' .. ARGV[4])
                -- Window is in milliseconds, so PEXPIRE, not EXPIRE
                redis.call('PEXPIRE', key, window)
                return 1 -- allowed
            end
            return 0 -- denied
        """)

    def check(
        self,
        user_id: str,
        endpoint: str,
        limit: int = 100,
        window_seconds: int = 60,
    ) -> bool:
        key = f"rate_limit:{endpoint}:{user_id}"
        now = int(time.time() * 1000)  # milliseconds
        result = self.lua_script(
            keys=[key],
            args=[now, window_seconds * 1000, limit, uuid.uuid4().hex],
        )
        return bool(result)
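The sliding-window semantics are easy to sanity-check in plain Python with an in-memory equivalent. This is a testing aid only; production stays on Redis so the window survives restarts and is shared across processes:

```python
from collections import defaultdict


class InMemorySlidingWindow:
    """Same sliding-window semantics as the Redis Lua script,
    using a per-key list of timestamps instead of a sorted set."""

    def __init__(self):
        self.entries = defaultdict(list)

    def check(self, key: str, now_ms: int, limit: int, window_ms: int) -> bool:
        # Drop entries outside the window (ZREMRANGEBYSCORE equivalent)
        self.entries[key] = [t for t in self.entries[key] if t > now_ms - window_ms]
        if len(self.entries[key]) < limit:
            self.entries[key].append(now_ms)
            return True
        return False


rl = InMemorySlidingWindow()
# Five requests within one 1-second window, limit of 3:
allowed = [rl.check("u1", t, limit=3, window_ms=1000) for t in range(5)]
# -> [True, True, True, False, False]
```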
T=8: Backend Dev files completion report
report = AgentReport(
    agent_id="backend-dev-01",
    report_type="completion",
    headline="Rate limiting implemented on /api/signals",
    status="ok",
    confidence=0.95,
    findings=[
        Finding(severity="info", description="Used sliding window Redis pattern from prior implementation"),
        Finding(severity="info", description="Added 3 unit tests: allow, deny, window reset"),
    ],
    recommendation="QA to run integration tests",
    auto_resolvable=True,
    blast_radius="single-repo",
)
T=8.5: QA Engineer activates
The team skill's phase gate passes (backend complete), and QA activates while the backend agent runs its shutdown protocol.
T=12: QA Engineer completes
Integration tests pass. QA files a completion report. Confidence: 0.94.
T=13: PR created
The backend agent's shutdown protocol runs:
# Store the implementation pattern to memory
await mind_remember(
    content="Rate limiting on /api/signals: SlidingWindowRateLimiter "
            "with Redis ZADD/ZREMRANGEBYSCORE lua script. "
            "100 req/min per user. Key format: rate_limit:{endpoint}:{user_id}. "
            "Tests in tests/backend/test_rate_limit.py. PR: #247.",
    category="coding",
    tags=["rate-limiting", "redis", "signals-endpoint", "implementation"],
    type="episodic",
)
PR #247 is created. CI runs. Tests pass. The broker transitions the directive to completed.
T=13.5: Discord notification
The bridge script reports back to the #agent-tasks channel:
Directive dir-a7f3c9: COMPLETED
PR #247 opened: Add rate limiting to /api/signals
Tests: 3 unit + 2 integration — all pass
Duration: 13 minutes
The total wall-clock time from Discord message to open PR with passing tests: 13 minutes. No human wrote a line of code.
Startup Sequence
Getting the platform running the first time requires a specific startup order:
#!/bin/bash
# start-platform.sh
# 1. Memory system first — agents need it on boot
docker compose up -d akashic-records
sleep 5 # Wait for Akashic to be ready
# 2. Broker — agents register with it on boot
python -m broker.main &
sleep 3
# 3. Agents — each runs boot protocol on start
python -m agents.backend_dev &
python -m agents.qa_engineer &
python -m agents.trading_agent &
python -m agents.content_agent &
sleep 5
# 4. Health monitor — needs agents to be running
python -m health.monitor &
# 5. Bridges — start accepting external input last
python -m bridges.discord_bridge &
python -m bridges.cron_bridge &
echo "Platform running"
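The fixed `sleep`s are the fragile part of this script. A readiness poll is more robust. This sketch assumes each component exposes some cheap liveness check (a `/health` endpoint, a ping); the platform does not prescribe one:

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll check() until it returns True or the timeout expires.
    Replaces the fixed sleeps in start-platform.sh: the next stage
    starts as soon as the previous one is actually ready."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


# e.g. wait_for(akashic_is_ready) before launching the broker,
# where akashic_is_ready is whatever liveness probe you expose
```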
The Daily Operating Pattern
Once running, the platform operates with minimal human input. The human's daily interaction with the platform:
Morning: Read the digest
Daily Digest — 2026-03-30
Directives: 47 total
├─ Auto-resolved: 43
├─ Escalated (internal): 3
└─ Escalated (human): 1 ← review needed
Cost: $2.14 (budget: $5.00/day)
Health: All agents green
Open PRs: 3 (2 in CI, 1 awaiting review)
1 Human Escalation:
trading-agent: "New market pattern not in seed knowledge"
→ Needs: updated strategy params or explicit guidance
As needed: Review the one escalation, provide guidance.
Weekly: Review the Akashic memory store for each agent — look for patterns in what's being stored, update seed files as operational knowledge matures.
Monthly: Update authority tiers based on demonstrated agent reliability, refine triage rules based on false escalations, retire stale memory entries.
What to Build Next
After the core platform is running and processing real work, four extensions matter most.
1. Mission Control Dashboard
Before expanding the fleet, build visibility. A dashboard showing:
- All agents: status, current directive, circuit breaker state
- Directive queue: pending, in-progress, completed, failed
- Cost per agent per day with budget burn rate
- Health monitor alerts, active and resolved
- Memory growth per agent namespace
Without visibility, you manage the fleet blind. Build this before adding agents.
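As a starting point, the dashboard can poll a single snapshot structure. The field names below are assumptions about what the broker and health monitor could expose, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentStatus:
    agent_id: str
    state: str                        # "idle" | "executing" | "tripped"
    current_directive: Optional[str]  # directive id, if any
    cost_today_usd: float


@dataclass
class PlatformSnapshot:
    """One poll of the fleet: agent states plus aggregate burn rate."""
    agents: list[AgentStatus] = field(default_factory=list)

    @property
    def burn_rate(self) -> float:
        return sum(a.cost_today_usd for a in self.agents)


snap = PlatformSnapshot(agents=[
    AgentStatus("backend-dev-01", "executing", "dir-a7f3c9", 0.42),
    AgentStatus("qa-engineer-01", "idle", None, 0.11),
])
```

Serving this as a single JSON endpoint keeps the dashboard decoupled from broker internals.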
2. Automated Knowledge Curation
The Akashic memory store grows over time. Some entries are still relevant; others are outdated. Build an automated curation process:
async def curate_memory_namespace(agent_id: str, namespace: str) -> CurationResult:
    """
    Weekly: review memory entries, promote stable ones to seed files,
    archive stale ones, surface knowledge gaps.
    """
    entries = await mind_query(namespace=namespace, limit=100)
    for entry in entries:
        age_days = (datetime.utcnow() - entry.created_at).days
        access_count = entry.access_count
        if age_days > 90 and access_count == 0:
            await mind_forget(entry.id)  # stale, unused
        elif age_days > 30 and access_count > 10:
            # Frequently accessed, stable — promote to seed file
            await promote_to_seed(entry, agent_id)
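The `promote_to_seed` helper is doing the interesting work here. A minimal sketch, assuming the seed layout from the boot protocol (`agents/<agent_id>/seed.md`) and appending promoted entries as dated sections:

```python
from pathlib import Path


async def promote_to_seed(entry, agent_id: str, seed_dir: Path = Path("agents")) -> Path:
    """Append a stable memory entry to the agent's seed file so it is
    loaded at boot instead of re-queried every session. The seed path
    layout is an assumption based on the boot protocol."""
    seed_path = seed_dir / agent_id / "seed.md"
    seed_path.parent.mkdir(parents=True, exist_ok=True)
    with seed_path.open("a") as f:
        f.write(f"\n## Promoted {entry.created_at:%Y-%m-%d}\n{entry.content}\n")
    return seed_path
```

A follow-up `mind_forget` on the promoted entry avoids double-loading the same knowledge from both seed and memory.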
3. Cross-Agent Knowledge Sharing
Some discoveries should propagate across agents. When the coding agent discovers a new API pattern, the QA agent should know. Build a knowledge propagation system:
# When an agent stores a memory entry with propagation flag
await mind_remember(
    content="Supabase RLS policy bug: policies are not evaluated for service_role key...",
    category="coding",
    tags=["supabase", "rls", "bug", "org-wide-knowledge"],
    propagate_to=["qa-engineer", "content-agent"],  # who else needs to know
)
4. Incident Replay
When things go wrong, you need to reconstruct what happened. Build an incident replay system from the audit log:
async def replay_incident(
    incident_start: datetime,
    incident_end: datetime,
) -> IncidentTimeline:
    """
    Reconstruct the sequence of events from the audit log.
    """
    events = await audit_log.query(
        from_time=incident_start,
        to_time=incident_end,
        include_directives=True,
        include_agent_state=True,
        include_health_events=True,
    )
    return build_timeline(events)
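`build_timeline` can start as nothing more than a chronological merge of the mixed event types. A sketch, assuming a flat `Event` record rather than the richer `IncidentTimeline` structure the real system would carry:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    at: datetime
    kind: str      # "directive" | "agent_state" | "health"
    detail: str


def build_timeline(events: list[Event]) -> list[str]:
    """Order mixed audit events into one chronological narrative."""
    return [
        f"{e.at:%H:%M:%S} [{e.kind}] {e.detail}"
        for e in sorted(events, key=lambda e: e.at)
    ]


lines = build_timeline([
    Event(datetime(2026, 3, 30, 9, 5), "health", "doom spiral detected"),
    Event(datetime(2026, 3, 30, 9, 1), "directive", "dir-a7f3c9 dispatched"),
])
```

Even this flat form answers the first incident question: what happened, in what order, across which components.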
The Compounding Return
A one-agent system has linear returns: more work requires proportionally more human time.
A platform has compounding returns: the work grows, but the human time required does not. Agents accumulate expertise. Triage rules improve with each false positive fixed. Memory grows richer with each session. The platform gets better at being the platform.
The first month feels like infrastructure investment. By month six, it feels like leverage.
The final measurement that matters: how much work did the platform process this week that would have required your direct attention? Track that number. Watch it grow. That is the return on the infrastructure you built.
Summary
- The platform components connect in a specific order: memory → broker → agents → health monitor → bridges
- The startup sequence matters: agents depend on Akashic being ready; health monitor depends on agents running
- A real task flows in ~13 minutes from Discord message to an open PR with passing tests
- The daily operating pattern centers on a digest — not a firehose
- Build Mission Control before expanding the fleet — visibility enables management
- Automated memory curation, cross-agent knowledge sharing, and incident replay are the highest-value extensions
- The compounding return is the whole point: the platform gets better at being the platform
Track Complete
You now have the complete blueprint: from the problem with stateless generalists to a running production platform. The concepts are transferable — the seed file pattern, the boot/shutdown protocol, deterministic routing, authority ceilings, circuit breakers. These apply whether you are building on OpenClaw, on raw Claude Code sessions, or on any other agent execution environment.
The platform that runs well at three agents scales to thirty. Build it right once.