ASK KNOX
LESSON 168

The 3-Role Audit Swarm

One agent misses what three specialized agents catch by design. Backend, DevOps, and Architect reading the same codebase through different lenses — this is the pattern that found a public API exposure, dead health-check code, and a duplicate event handler in a single pass.


One agent reading your entire codebase through every lens simultaneously is not a comprehensive review. It is a shallow one.

The InDecision ecosystem — engine, Discord bot, site, workers — went through a 3-agent audit swarm last cycle. The result was 24 issues across 11 files in a single pass. Issues that a single "comprehensive" agent had already reviewed and missed. Not because the single agent was bad. Because cognitive load is real, and specialization is the answer to it.

The 3 Roles and What Each One Sees

Each role gets the same file list. Each role gets a different instruction set. The instruction set determines what they look for — and more importantly, what they do not look for.

Backend Engineer — logic correctness, data integrity, API contracts, race conditions, error handling gaps, type annotation accuracy, duplicate event handlers, silent default returns on failure, input validation gaps.

DevOps Engineer — plist configuration, logging setup (FileHandler vs StreamHandler vs launchd stdout redirect), PATH configuration, macOS TCC restrictions, scheduling alignment, process overlap, monitoring coverage, service binding addresses.

Distinguished Architect — signal flow end-to-end, single point of failure analysis, duplicate detection patterns, concurrency model correctness, exchange and data source consistency, dead code in production paths, module coupling, interface contracts between services.

These roles are not arbitrary. They map to the three layers where production bugs actually live: the logic layer, the infrastructure layer, and the system design layer. A single agent trying to hold all three simultaneously drifts toward the layer it was implicitly prompted to focus on and produces surface-level observations in the other two.

What Each Role Found That the Others Missed

This is the empirical argument for the pattern. Same codebase. Same pass. Different roles. Different findings.

DevOps found: The com.example.indecision-api.plist had 0.0.0.0 as the bind address. The docstring in the service file explicitly said 127.0.0.1. The plist was wrong. The API was publicly exposed on all interfaces, not localhost-only. The Backend agent was focused on route logic and handler correctness — it was not scanning plists for binding discrepancies. The Architect was modeling signal flow — not process configuration.
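
This class of bug looks roughly like the fragment below. It is a hypothetical reconstruction, not the actual plist — the uvicorn invocation and port are assumptions — but it shows the shape: the bind address lives in ProgramArguments, far from the service code whose docstring promises loopback-only.

```xml
<!-- Hypothetical launchd plist fragment (invocation and port assumed).
     The bug class: the bind address here contradicts the docstring
     in the service file, and only one of them is enforced. -->
<key>ProgramArguments</key>
<array>
    <string>/usr/local/bin/uvicorn</string>
    <string>api:app</string>
    <string>--host</string>
    <string>0.0.0.0</string>    <!-- exposed on all interfaces -->
    <string>--port</string>
    <string>8000</string>
</array>
<!-- The fix: bind to loopback only, i.e. <string>127.0.0.1</string> -->
```

Nothing in the route logic hints at this. It is only visible to an agent whose job is reading process configuration.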

Architect found: is_db_degraded() had full unit test coverage, was imported in deps.py, and was never called from any production route. The /health endpoint was returning {"status": "ok"} with a hardcoded response, completely bypassing the degradation check. The function existed, the tests passed, the import was clean — and the health check was lying. The Backend agent validated the logic of is_db_degraded() itself. Neither the Backend nor the DevOps agent traced the call chain from route handler to health utility.
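
The failure shape is easy to sketch. The function name comes from the audit; the bodies below are illustrative, not the production code. The point is that every individual piece looks healthy — the check is tested, the import is clean — and the lie only appears when you trace the call chain.

```python
# Hypothetical reconstruction of the lying health check.
# is_db_degraded is the name from the audit; bodies are illustrative.

def is_db_degraded() -> bool:
    """Fully unit-tested degradation check -- imported, never called."""
    return True  # pretend the DB is degraded right now

def health_buggy() -> dict:
    # The bug: hardcoded response, bypasses the check entirely.
    return {"status": "ok"}

def health_fixed() -> dict:
    # The fix: route the response through the degradation check.
    return {"status": "degraded" if is_db_degraded() else "ok"}
```

With the database degraded, the buggy route still reports "ok" while the fixed route reports "degraded" — same function, same tests, opposite answers at the endpoint.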

Backend found: The Discord bot had a duplicate on_message handler. Both were registered. The second silently overwrote the first. One handler was the original implementation; the other was a refactored version that never replaced it cleanly. Messages were being processed only by the second handler, and the first handler's logic — including a rate-limiting check — was completely bypassed.
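
The overwrite mechanics can be reproduced without Discord at all. Any last-write-wins event registry behaves the same way — this toy sketch is not the bot's actual code, but it is the same failure mode:

```python
# Toy event registry mimicking last-write-wins handler registration:
# a second handler under the same name silently replaces the first.
handlers: dict = {}

def event(fn):
    handlers[fn.__name__] = fn  # no warning if the name already exists
    return fn

@event
def on_message(msg):
    # Original handler: includes the rate-limiting check.
    return f"rate-limited path: {msg}"

@event
def on_message(msg):  # same name -- silently overwrites the first
    # Refactored handler: never absorbed the rate-limit logic.
    return f"refactored path: {msg}"

# Only the second handler survives; the rate-limit check is gone.
```

Python raises no error here, and neither does a typical event decorator. A linter may flag the redefinition, but at runtime the first handler simply vanishes.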

| Severity | Issue                              | File                            | Line |
|----------|------------------------------------|---------------------------------|------|
| P0       | 0.0.0.0 binding in plist           | com.example.indecision-api.plist   | 12   |
| P0       | is_db_degraded() never called      | deps.py                         | 31   |
| P1       | Duplicate on_message handler       | bot.py                          | 445  |

Three P0/P1 issues. All found in a single pass. None of them would have surfaced with a general "please review this codebase" prompt.

The Prompt Discipline That Makes It Work

The swarm is not magic. It produces results because of four specific discipline choices in how you set it up.

Give each agent an explicit file list. Do not tell the DevOps agent to "explore the repo." Give it the specific files in scope: com.example.indecision-api.plist, launchd/, logging_config.py, requirements.txt. Free exploration leads to agents spending cycles on files outside their domain. Explicit scope means every cycle is used against relevant surface area.

Pre-seed with known suspects. Before the audit starts, brief each agent on the problem areas you already know about. Known race conditions, recently refactored modules, files that have had multiple bug reports. This prevents agents from spending time rediscovering things you already know. Their value is finding things you do not know. Direct their attention accordingly.

Set output format upfront. Every agent gets the same output schema: severity table with columns for Severity, Issue, File, and Line. When all three reports arrive in the same format, you can merge them directly, sort by severity, and have a unified priority queue without manual normalization. Reports in different formats require synthesis time that does not add value.
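
When the schema is shared, the merge step is a few lines. In this sketch the tuples stand in for parsed agent output — the rows echo the table above:

```python
# Merge three same-schema severity reports into one priority queue.
# Rows are (severity, issue, file, line); contents echo the audit table.
backend   = [("P1", "Duplicate on_message handler", "bot.py", 445)]
devops    = [("P0", "0.0.0.0 binding in plist", "com.example.indecision-api.plist", 12)]
architect = [("P0", "is_db_degraded() never called", "deps.py", 31)]

SEVERITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

merged = sorted(
    set(backend + devops + architect),       # set() drops exact-duplicate findings
    key=lambda row: SEVERITY_ORDER[row[0]],  # P0 first
)
```

Deduplication here is exact-match only; overlapping findings phrased differently still need a human glance, which is why the fixed column order matters.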

Fix P0 criticals while the architect agent is still running. The moment the Backend or DevOps agent confirms a P0, start fixing it. Do not wait for all three agents to complete. The Architect may surface additional P0s — good, fix those too when they arrive. Waiting for full completion before acting on identified criticals is wasted time.
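
In orchestration terms this is "consume as completed," not "gather then act." A generic asyncio sketch — the agent coroutines below are stand-ins for real sessions, with sleeps simulating different completion times:

```python
import asyncio

# Stand-in agent coroutines: each "audit" returns findings after a delay.
async def agent(name: str, delay: float, findings: list):
    await asyncio.sleep(delay)
    return name, findings

async def run_swarm():
    tasks = [
        asyncio.create_task(agent("devops", 0.01, [("P0", "0.0.0.0 binding")])),
        asyncio.create_task(agent("backend", 0.05, [("P1", "duplicate handler")])),
        asyncio.create_task(agent("architect", 0.10, [("P0", "dead health check")])),
    ]
    fix_log = []
    # Act on each report the moment it lands -- never wait for all three.
    for done in asyncio.as_completed(tasks):
        name, findings = await done
        for severity, issue in findings:
            if severity == "P0":
                fix_log.append((name, issue))  # start the fix immediately
    return fix_log
```

Running it logs the DevOps P0 long before the Architect finishes — which is the whole point of the discipline.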

Setting Up the Swarm

The mechanics are straightforward. Three concurrent Claude Code sessions, each seeded with role identity and file scope.

ROLE: You are a Senior Backend Engineer conducting a production code audit.
FOCUS: Logic correctness, data integrity, API contracts, race conditions,
       error handling, type annotation accuracy, duplicate handlers.
FILES IN SCOPE: [explicit file list]
KNOWN SUSPECTS: [list of recently changed or problematic files]
OUTPUT FORMAT: Severity table (P0/P1/P2/P3) with columns: Severity | Issue | File | Line

Repeat for DevOps and Architect with their respective focus areas. Run all three concurrently. Merge outputs into a single severity table, deduplicate any overlapping findings, and prioritize by severity.
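
Templating the three prompts keeps the role blocks consistent, so only the lens varies between sessions. A sketch — the focus lines are condensed from the lists above, and the file scopes are whatever you pass in:

```python
# Build the three role prompts from one template so only the lens varies.
TEMPLATE = """ROLE: You are a {title} conducting a production code audit.
FOCUS: {focus}
FILES IN SCOPE: {files}
KNOWN SUSPECTS: {suspects}
OUTPUT FORMAT: Severity table (P0/P1/P2/P3) with columns: Severity | Issue | File | Line"""

ROLES = {
    "backend": ("Senior Backend Engineer",
                "logic correctness, API contracts, race conditions, duplicate handlers"),
    "devops": ("DevOps Engineer",
               "plist configuration, logging setup, scheduling, binding addresses"),
    "architect": ("Distinguished Architect",
                  "signal flow, single points of failure, dead code, module coupling"),
}

def build_prompts(files: list, suspects: list) -> dict:
    return {
        name: TEMPLATE.format(title=title, focus=focus,
                              files=", ".join(files), suspects=", ".join(suspects))
        for name, (title, focus) in ROLES.items()
    }
```

Paste each generated prompt into its own session and you have the swarm; the template guarantees all three reports arrive in the mergeable format.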

The finding set you get from this will be materially different — and more actionable — than anything a single comprehensive agent produces on the same codebase. The reason is not the model. The reason is the lens.

Why Specialization Wins

Cognitive load scales with scope. An agent told to look for everything looks thoroughly for nothing. An agent told to look for one class of problem looks for that class with much greater depth.

The Backend agent reading 20 files for logic errors will catch a duplicate event handler. The same agent simultaneously trying to evaluate plist configs and system architecture will give each file a surface pass and miss the subtle second handler registration buried 445 lines into bot.py.

The three-role structure is not about using more compute. It is about using compute more precisely. Each role represents a mental model. Each mental model has its own vocabulary of what constitutes a signal vs. noise. You are not tripling the work — you are tripling the resolution within each domain.

One comprehensive agent. Three role-specialized agents. Same files. The second approach found a public API exposure, a lying health endpoint, and a silently broken message handler. The first approach missed all three.

That is the pattern. Run it on any codebase with more than a handful of services and see what surfaces.