The Audit Swarm — Four Agents, One Codebase, Zero Blind Spots
Running a single AI reviewer over a codebase gives you one perspective. Running four specialized domain agents in parallel gives you a comprehensive audit in the same amount of time — with findings categorized, prioritized, and ready to act on.
Most code reviews are sequential. One reviewer reads the codebase, takes notes, produces a report. The problem with that model is not the quality of the reviewer — it is the cognitive cost of context switching. The same person cannot simultaneously think like a security auditor probing for injection vectors, an architect analyzing coupling patterns, a performance engineer measuring hot paths, and a testing engineer identifying coverage gaps. Switching between these mental models degrades each one.
The audit swarm solves this with specialization and parallelism. Instead of one reviewer context-switching, you run four agents simultaneously, each carrying a single deep domain model, each writing to its own output file. You get four expert perspectives in the same wall-clock time it would take one reviewer to produce one perspective.
This lesson breaks down exactly how the swarm works, what each agent looks for, and what the output structure looks like. The next lessons cover prioritization and dispatch — but the foundation is the swarm architecture itself.
The Four Domain Agents
The audit swarm dispatches four agents. Each receives the same codebase path and a domain-specific system prompt. Each writes an independent findings file. None of them wait for the others.
Agent 1 — Security Auditor
The security agent operates with a threat-modeling mindset. It reads authentication and authorization code first, then follows the data flow from external input to persistence. It looks for:
- Auth bypass patterns: middleware that accepts any non-empty token, endpoints that skip authorization checks, hardcoded credentials
- Injection vectors: SQL query construction via f-strings, shell commands with interpolated arguments, deserialization of untrusted input
- Token and secret handling: raw tokens logged at INFO level, secrets stored in audit tables, tokens returned in response bodies without access control
- Privilege escalation: any agent rotating another agent's token, unregistered agents granted full authority, synthetic test endpoints exposed in production
The security agent's output format is deliberate: every finding includes the exact file path and line numbers, a description of the vulnerability class, and the potential impact. This is not soft feedback — it is a bug report.
In the Principal Broker audit, the security agent found 16 findings across P0 to P2. The two P0 findings it surfaced were deployment blockers: an auth middleware that accepted any non-empty bearer token (meaning the entire broker ran unauthenticated), and a SQL injection vector built via f-string interpolation. Both were fixed within hours of the audit completing.
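The f-string injection pattern is worth seeing concretely. A minimal sketch (table and column names are hypothetical, not the broker's actual schema):

```python
import sqlite3

def get_metrics_unsafe(conn: sqlite3.Connection, agent_id: str):
    # VULNERABLE: the pattern the security agent flags. A crafted agent_id
    # like "x' OR '1'='1" rewrites the query and returns every row.
    return conn.execute(
        f"SELECT * FROM metrics WHERE agent_id = '{agent_id}'"
    ).fetchall()

def get_metrics_safe(conn: sqlite3.Connection, agent_id: str):
    # FIXED: parameterized query; the driver treats the value as data,
    # never as SQL.
    return conn.execute(
        "SELECT * FROM metrics WHERE agent_id = ?", (agent_id,)
    ).fetchall()
```

The fix is mechanical once found, which is why findings at this level of specificity (file, line, vulnerability class) resolve in hours rather than days.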
Agent 2 — Architecture Analyst
The architecture agent reads for structure rather than behavior. It is looking for coupling violations, dead code, and systemic design problems that accumulate technical debt. It looks for:
- Dead code and missing wiring: fully implemented subsystems that are never called from the main pipeline
- Layer violations: business logic inside API handlers, infrastructure concerns leaking into domain models
- Fragile path derivation: DB paths computed by string manipulation on other paths, which fails silently when the derivation assumption breaks
- Stub implementations returning success: safety-critical functions (`_revoke_all_tokens`, `_lock_env_files`) that return `True` while doing nothing — the system reports success on operations that never happen
The architectural findings from the Principal Broker audit included three P0 discoveries that were more dangerous than they appeared. The finops subsystem (cost tracking, budget enforcement, loop detection) was fully implemented — 400+ lines of production code — and never wired into the main message pipeline. It could not execute. Same for the feedback protocols. These were not TODO comments or stubs. They were complete implementations that were never connected.
Agent 3 — Performance Engineer
The performance agent looks for patterns that degrade throughput under load. It is particularly sensitive to operations on hot paths — the code that runs on every message, every request, every event. It looks for:
- Connection-per-call anti-patterns: SQLite connections created and destroyed on every call in a crisis path, HTTP clients instantiated per-message instead of shared
- Sync operations blocking the event loop: synchronous SQLite commits inside async handlers, fsync barriers on every message
- Redundant computation: the same query called twice consecutively at startup, no caching on hot-path lookups
- Resource lifecycle problems: resources created inside handler functions that should live in application state
In the Principal Broker audit, the performance agent found 28 findings across P1 to P3. The most impactful was a connection-per-call pattern in the halt store — the crisis-path code that needed to be fast was creating a new SQLite connection on every call. Fixed: the connection is now persistent, opened once at startup, reused for the lifetime of the process.
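The before/after shape of that fix can be sketched as follows (the `halts` table and class names are hypothetical stand-ins for the halt store):

```python
import sqlite3

class HaltStorePerCall:
    """Anti-pattern: a new connection per call, on the crisis path."""

    def __init__(self, path: str):
        self.path = path

    def is_halted(self, agent_id: str) -> bool:
        conn = sqlite3.connect(self.path)  # connect + teardown every call
        try:
            row = conn.execute(
                "SELECT 1 FROM halts WHERE agent_id = ?", (agent_id,)
            ).fetchone()
            return row is not None
        finally:
            conn.close()

class HaltStorePersistent:
    """Fixed: one connection, opened at startup, reused for the process."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)

    def is_halted(self, agent_id: str) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM halts WHERE agent_id = ?", (agent_id,)
        ).fetchone()
        return row is not None
```

The query is identical in both versions; only the connection lifecycle changes, which is exactly why this class of finding is invisible unless someone is reading specifically for resource lifetimes.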
Agent 4 — Testing Engineer
The testing agent reads coverage reports alongside source code. It does not just count uncovered lines — it evaluates the risk posture of the gaps. A missing test on a utility function is different from a missing test on the composition of a safety pipeline. It looks for:
- Composition gaps: individual components tested in isolation but their wiring in the actual runtime handler untested
- Placeholder tests: test files that exist solely to prevent `pytest` exit code 5, containing only `pass` or trivial truthiness assertions
- Coverage below floor: the 90% floor is not arbitrary — the agent looks for which files are dragging coverage down and quantifies the ROI of covering them
- Missing regression tests: bug classes that are known to exist but have no test preventing re-introduction
The Principal Broker testing audit's most dangerous finding was a composition gap: every safety component (hard blocks, authority ceilings, audit-before-dispatch) had unit tests in isolation, but the 7-step message pipeline that composed them in `_make_message_handler` had zero tests. A wiring bug — wrong order, missing await, exception swallowing — would pass all existing tests and only surface in production.
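A composition test closes that gap by exercising the wiring rather than the steps. A minimal sketch, with a hypothetical handler factory and step names standing in for the real pipeline:

```python
import asyncio

def make_message_handler(steps):
    # Hypothetical stand-in for the real handler factory: composes the
    # pipeline steps in order. The test below checks the composition.
    async def handler(msg):
        for step in steps:  # order matters; a wiring bug breaks here
            msg = await step(msg)
        return msg
    return handler

def test_pipeline_composition():
    calls = []

    def record(name):
        async def step(msg):
            calls.append(name)
            return msg
        return step

    handler = make_message_handler(
        [record("hard_block"), record("authority"),
         record("audit"), record("dispatch")]
    )
    asyncio.run(handler({"body": "hi"}))
    # Wrong order, a missing await, or a swallowed exception fails this
    # assertion in the test suite, not in production.
    assert calls == ["hard_block", "authority", "audit", "dispatch"]
```

Each `record` step here is trivial on purpose: the unit tests already cover step behavior, so the composition test only needs to prove that every step runs, in order, exactly once.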
The Parallel Execution Model
The four agents run concurrently. Each one:
- Receives the codebase path and its domain-specific prompt
- Reads files relevant to its domain
- Writes findings to its output file: `security.md`, `architecture.md`, `performance.md`, or `testing.md`
- Completes independently
Because each agent writes to its own file, there is no coordination overhead and no merge conflicts. The outputs are additive.
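The dispatch loop itself is small. A minimal sketch assuming each agent is an async function that writes its own report file (the prompts are abbreviated placeholders and `run_agent` stands in for the real model call):

```python
import asyncio
from pathlib import Path

# Abbreviated placeholder prompts, one per domain.
DOMAINS = {
    "security": "You are a security auditor. Threat-model the codebase...",
    "architecture": "You are an architecture analyst. Read for structure...",
    "performance": "You are a performance engineer. Find hot-path costs...",
    "testing": "You are a testing engineer. Evaluate coverage risk...",
}

async def run_agent(domain: str, prompt: str,
                    codebase: Path, out_dir: Path) -> Path:
    # Stand-in for the real model call. Each agent writes only its own
    # file, so there is nothing to coordinate and nothing to merge.
    report = out_dir / f"{domain}.md"
    report.write_text(f"# {domain.title()} findings for {codebase}\n")
    return report

async def run_swarm(codebase: Path, out_dir: Path) -> list:
    # All four agents start at once: wall-clock time is the duration
    # of the slowest agent, not the sum of all four.
    tasks = [run_agent(d, p, codebase, out_dir)
             for d, p in DOMAINS.items()]
    return await asyncio.gather(*tasks)
```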
After all four complete, a synthesis step aggregates findings into `MASTER-SUMMARY.md`. This file:
- Consolidates all findings into a unified priority table (P0 through P3)
- Identifies systemic issues that span multiple domains
- Produces the ordered fix list for the resolution phase
The synthesis step takes minutes. The domain agents take 10-15 minutes each. Running them in parallel means the total audit time is the duration of the slowest agent, not the sum of all agents.
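The aggregation half of synthesis is mechanical. A sketch, with findings represented as plain dicts (a hypothetical shape, not the swarm's actual data model):

```python
from collections import Counter

def synthesize(findings):
    # Roll all domain findings up into the MASTER-SUMMARY priority table.
    counts = Counter(f["priority"] for f in findings)
    lines = ["| Priority | Count | Status |",
             "|----------|-------|--------|"]
    for p in ("P0", "P1", "P2", "P3"):
        fixed = sum(1 for f in findings
                    if f["priority"] == p and f["status"] == "fixed")
        lines.append(f"| {p} | {counts.get(p, 0)} | {fixed} fixed |")
    return "\n".join(lines)
```

The counting is trivial; the valuable part of synthesis is the judgment layered on top, identifying the cross-domain systemic issues and ordering the fix list.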
What the Output Looks Like
Each domain report follows a consistent schema:
| Priority | Category | Location | Issue | Status |
|----------|----------|----------|-------|--------|
Every finding has an exact location (file path, line numbers), a category (injection, auth bypass, connection anti-pattern, coverage gap, etc.), and a clear statement of impact. No vague feedback. No "consider refactoring." Specific, addressable findings.
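One way to carry that schema in code (a hypothetical sketch, not the swarm's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    priority: str   # P0 (deployment blocker) through P3 (minor)
    category: str   # e.g. "injection", "auth bypass", "coverage gap"
    location: str   # exact file path and line numbers
    issue: str      # specific, addressable statement of impact
    status: str     # "open" or "fixed"

    def as_row(self) -> str:
        # Renders one row of the domain report's markdown table.
        return (f"| {self.priority} | {self.category} | {self.location} "
                f"| {self.issue} | {self.status} |")
```

Forcing every finding through a fixed schema is what keeps "consider refactoring" out of the reports: a row with an empty location or a vague issue field is visibly incomplete.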
The `MASTER-SUMMARY.md` aggregates these into a priority table:
| Priority | Count | Status |
|----------|-------|--------|
| P0 | 7 | 7 fixed |
| P1 | 26 | 26 fixed |
| P2 | 45 | 45 fixed |
| P3 | 6 | 6 fixed |
| Total | 84 | 84 fixed, 0 open |
84 findings across four domains. Produced in parallel. Ready to prioritize and resolve.
Why Specialization Compounds
The value of domain specialization is not additive — it compounds. A security agent running through auth code without the cognitive overhead of simultaneously evaluating test coverage produces a different quality of security finding. It can hold the entire threat model in its context window. It can trace the full auth bypass path from entry point to exploitation. It can identify patterns that a general reviewer would miss because the general reviewer's attention is divided.
Multiply that effect across four domains and you have a compound advantage: not four audits added together, but four deep audits each operating at full capacity.
In the Principal Broker case, this produced 84 findings in a single session. Many of those findings would not have surfaced in a standard code review. The SQL injection via f-string in `okr_sentinel_metrics.py` is exactly the kind of finding that gets missed when the reviewer is also thinking about test coverage and coupling patterns. The security agent found it because that is all it was thinking about.
Lesson 217 Drill
Design your own audit swarm for a codebase you own. Map out:
- What four domains matter most for your system?
- What are the top 5 things each domain agent should look for, specific to your tech stack?
- What output format would make findings most actionable for your team?
- Where does your current review process produce shallow findings because of context switching?
You do not need to run the swarm today. The design exercise builds the mental model. When you are ready to run it, the structure is already there.
Bottom Line
The audit swarm is a force multiplier. Four specialized agents running in parallel produce better findings than one general reviewer running sequentially — not because any single agent is smarter, but because specialization eliminates context switching and parallelism eliminates serial wait time. The output is a structured, prioritized findings document ready for the resolution phase covered in Lesson 218.