ASK KNOX
LESSON 218

Prioritization — Why Fix Order Is the Whole Game

84 findings land on your desk. The order you fix them determines whether you ship in one session or spend two weeks in a cascade of rework. P0 → P1 → P2 → P3 is not bureaucratic labeling — it is a dependency graph that protects you from building on broken ground.

From Audit to Ship

An 84-finding audit report looks like a mountain. The instinct is to start with the easiest items — quick wins, morale building, visible progress. That instinct is wrong. It will get you tangled in rework.

The correct move is to look at the dependency structure of the findings. Some findings change how the system fundamentally works. Others build on a system that is correctly working. Fix things in the wrong order and you may resolve a P2 cleanly — then discover that the P0 you deferred changes the architecture in a way that makes your P2 fix incorrect.

Priority levels exist to encode this dependency structure. P0 through P3 is not a bureaucratic label system. It is a directed acyclic graph of fix order.
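The DAG framing can be made literal. Here is a minimal sketch (the finding names and `depends_on` edges are hypothetical) that schedules fixes so the highest-priority finding whose dependencies are already resolved always comes next, using Python's standard-library `graphlib`:

```python
from graphlib import TopologicalSorter
import heapq

# Hypothetical findings with a priority (0 = P0) and explicit
# dependencies: a finding can only be fixed after its dependencies.
findings = {
    "auth-bypass":     {"priority": 0, "depends_on": []},
    "pipeline-tests":  {"priority": 0, "depends_on": ["auth-bypass"]},
    "finops-wiring":   {"priority": 1, "depends_on": []},
    "finops-coverage": {"priority": 1, "depends_on": ["finops-wiring"]},
    "sqlite-style":    {"priority": 2, "depends_on": []},
}

# Schedule: always pick the highest-priority finding whose
# dependencies are resolved, then unlock whatever it was blocking.
ts = TopologicalSorter({f: spec["depends_on"] for f, spec in findings.items()})
ts.prepare()
heap = [(findings[f]["priority"], f) for f in ts.get_ready()]
heapq.heapify(heap)
order = []
while heap:
    _, fid = heapq.heappop(heap)
    order.append(fid)
    ts.done(fid)
    for ready in ts.get_ready():
        heapq.heappush(heap, (findings[ready]["priority"], ready))

print(order)
```

Note how `pipeline-tests` jumps ahead of the P1 and P2 items the moment its P0 dependency is fixed. That is the dependency graph, not the label, doing the work.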

The Four Priority Levels

P0 — Fix Before Deploy

P0 findings are deployment blockers. Not "fix soon" — fix before anyone runs this system in production.

The tests for P0 status are strict:

  1. Does this finding expose active users to security vulnerabilities (auth bypass, injection, privilege escalation)?
  2. Does this finding mean the system will behave incorrectly in an emergency?
  3. Does this finding mean critical safety infrastructure is reporting success while doing nothing?
  4. Does this finding leave a core runtime path entirely untested, where a wiring bug would pass all tests and fail in production?

If any of these are true, the finding is P0.
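As a sketch, the four tests collapse into a single predicate (the field names are illustrative, not from any real tracker):

```python
# Hypothetical triage check: a finding is P0 if any of the four
# strict tests holds. Field names are illustrative only.
def is_p0(finding: dict) -> bool:
    return any([
        finding.get("exposes_users_to_vuln", False),    # auth bypass, injection, privesc
        finding.get("misbehaves_in_emergency", False),  # wrong behavior under crisis
        finding.get("safety_stub_reports_success", False),
        finding.get("core_path_untested", False),       # wiring bug would pass CI
    ])

print(is_p0({"exposes_users_to_vuln": True}))  # True
print(is_p0({"priority_hint": "minor"}))       # False
```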

From the Principal Broker audit, the 7 P0 findings were:

  • Auth middleware that accepted any non-empty bearer token — the entire broker ran unauthenticated
  • Kill switch resume with no authorization check — anyone could reset the kill switch
  • SQL injection via f-string interpolation in a production metrics script
  • Command injection via interpolated daemon names in an SSH invocation
  • list_escalations that ignored its state parameter — both branches returned list_pending()
  • Kill switch Level 4 with stub implementations — _revoke_all_tokens() and _lock_env_files() returning True while doing nothing
  • Message pipeline handler with zero tests — the 7-step safety enforcement pipeline untested at the composition level

The stub implementations (finding 6) deserve a longer look. This is a recurring pattern in audits. A system is designed with a kill switch. The kill switch is wired up. Tests for the kill switch pass. But the implementation of specific levels is a stub:

def _revoke_all_tokens(self) -> bool:
    # TODO: implement token revocation
    return True

The test sees True. The log says tokens_revoked=True. The cockpit shows the kill switch fired. Agents retain valid tokens and continue operating. This is the worst class of P0: a safety system that fails silently while reporting success.

If you find this pattern in a security-critical path, treat it as an immediate blocker. There is no scenario where a stub that returns True on a safety operation is acceptable in production.
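One way to catch this class of bug in CI is to assert the observable effect rather than the return value. A sketch against a hypothetical TokenStore (not the audited system's API):

```python
# Hypothetical TokenStore used to show effect-based testing:
# assert the state change, not the boolean the kill switch reports.
class TokenStore:
    def __init__(self):
        self.active = {"agent-1": "tok-a", "agent-2": "tok-b"}

    def revoke_all(self) -> bool:
        self.active.clear()
        return True

def test_revoke_all_actually_revokes():
    store = TokenStore()
    assert store.revoke_all() is True  # a stub returning True passes this line
    assert store.active == {}          # a stub that only returns True fails here

test_revoke_all_actually_revokes()
print("effect-based test passed")
```

A test that only checks the return value would have passed against `_revoke_all_tokens()` above. The second assertion is what makes the stub visible.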

P1 — Fix This Sprint

P1 findings are not deployment blockers but they are not optional. They represent significant risks that compound if left open. The categories that consistently land at P1:

Architectural wiring failures — fully implemented subsystems that are never called. The Principal Broker's finops and feedback modules: 400+ lines of complete code that could not execute because they were never wired into main.py or the message pipeline. This is not dead code in the conventional sense. It is live code that was designed to run but was never connected. The architecture is hollow.

Security issues below the P0 bar — hardcoded audit identifiers, synthetic test endpoints exposed in production without environment guards, input fields that accept arbitrary strings with no allowlist validation. These do not expose the system to immediate compromise but they create conditions that make compromise easier.

Performance anti-patterns on hot paths — connection-per-call in a crisis handler, synchronous commits blocking the async event loop, HTTP clients constructed per-message. These do not cause failures but they cause degraded performance under exactly the conditions where performance matters most: high load, emergency response.
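The connection-per-call pattern is easy to see in miniature. A sketch using sqlite3 (the table and class names are illustrative):

```python
import sqlite3

# Anti-pattern (sketch): open a new connection on every call. The
# connect/teardown cost lands on the hot path, e.g. a crisis handler.
def record_event_per_call(db_path: str, event: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute("INSERT INTO events (name) VALUES (?)", (event,))

# Better (sketch): create the connection once and reuse it. A real
# async service would also move commits off the event loop.
class EventLog:
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS events (name TEXT)")

    def record(self, event: str) -> None:
        self.conn.execute("INSERT INTO events (name) VALUES (?)", (event,))
        self.conn.commit()
```

Note the parameterized `?` placeholder in both versions: the same habit that avoids the f-string SQL injection flagged at P0.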

Coverage below the floor — the 90% coverage requirement is a P0 in CI-gated systems (it blocks the build), but the specific uncovered paths that are driving the gap are P1 findings. Dispatchers with zero tests, escalation endpoints with zero tests, NATS connection paths with zero tests.

P2 — Significant Technical Debt

P2 findings represent real problems that should be fixed, but they do not introduce immediate risk and they do not block the current sprint's work. They accumulate into architectural debt if ignored across multiple sprints.

Common P2 patterns:

  • Edge cases and error paths with no regression tests
  • Inconsistent patterns across the codebase (some modules use persistent SQLite connections, others create connections per-call)
  • Weak but functional security configurations (confirmation phrases in source code that should be in env vars)
  • Missing chaos and integration tests (directories with __init__.py only)
  • Non-optimal but not actively harmful code patterns

P2 findings go on the backlog. They should be resolved within 2-3 sprints. A codebase with 40 open P2 findings is carrying meaningful risk — not because any single finding is dangerous, but because the accumulated weight of "works but not quite right" creates a system that is brittle under modification.

P3 — Track and Address When Convenient

P3 findings are genuine improvements that would make the codebase better. Redundant method calls that could be collapsed, log verbosity that could be tuned, documentation comments that could be cleaner. Nothing breaks without fixing them. Nothing is dangerous. They are worth tracking because a P3 today can become a P2 when the system scales.

Why Fix Order Matters More Than You Think

Consider a scenario from the Principal Broker audit. The message pipeline had zero tests (P0 testing finding). The authentication middleware accepted any non-empty token (P0 security finding).

If you fix the authentication P0 first — building a real token validation system — your message pipeline tests now need to account for authentication. The tests you would have written before fixing auth are different from the tests you write after. If you write the pipeline tests first, you build them against an unauthenticated system and then have to revise them after the auth fix.

The P0 security findings set the foundation that everything else builds on. Fix them first.

Similarly, the architectural wiring failures (P1 — finops and feedback never wired) needed to be resolved before adding coverage tests for those subsystems. Testing code that is wired incorrectly produces tests that pass against broken wiring. Fix the wiring, then write the tests.

The fix order is:

P0 security findings
→ P0 architecture findings (stubs, hollow systems)
→ P0 testing findings (composition gaps)
→ P1 architecture findings (wiring failures)
→ P1 security findings (second-tier auth issues)
→ P1 performance findings (hot path optimization)
→ P1 testing findings (coverage to floor)
→ P2 findings (technical debt, sprint by sprint)
→ P3 findings (backlog, opportunistic)
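
This chain can be encoded directly as a sort key. A sketch, with illustrative category names:

```python
# The fix-order chain above, encoded as explicit (priority, category)
# steps. "any" covers P2/P3, where category order no longer matters.
FIX_ORDER = [
    (0, "security"), (0, "architecture"), (0, "testing"),
    (1, "architecture"), (1, "security"), (1, "performance"), (1, "testing"),
    (2, "any"), (3, "any"),
]

def fix_order_key(finding: dict) -> int:
    step = (finding["priority"], finding["category"])
    if step not in FIX_ORDER:
        step = (finding["priority"], "any")
    return FIX_ORDER.index(step)

findings = [
    {"id": "coverage-gap",  "priority": 1, "category": "testing"},
    {"id": "auth-bypass",   "priority": 0, "category": "security"},
    {"id": "finops-wiring", "priority": 1, "category": "architecture"},
    {"id": "hardcoded-id",  "priority": 1, "category": "security"},
]
print([f["id"] for f in sorted(findings, key=fix_order_key)])
```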

Within each priority level, security findings typically come before architecture, and architecture before testing. The one exception above is P1, where the wiring failures jump ahead of the second-tier security issues because the coverage work depends on them. The testing of a component depends on the component being correctly implemented and wired.

The Deployment Decision

The deployment gate is clean: if any P0 findings are open, the system does not ship. No exceptions, no "we'll fix it in the next release," no partial deployment.

This sounds rigid. It is. The rigidity is the point.

An auth bypass in production is not a minor issue that can be patched on the next release cycle. A kill switch that reports success while doing nothing is not an acceptable state in a system that controls AI agents. A message pipeline handler with zero tests means you are running safety enforcement code that has never been verified at the composition level.

P0 findings exist precisely because some problems cannot be deferred. The prioritization framework gives you a clean, defensible answer to the question "can we ship this?" when the answer is no.
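The gate itself is a few lines in CI. A sketch; a real job would pull the open findings from the tracker rather than take them as an argument:

```python
import sys

def deployment_gate(open_findings: list[dict]) -> int:
    """Return a process exit code: nonzero if any P0 finding is open."""
    p0s = [f for f in open_findings if f["priority"] == 0]
    for f in p0s:
        print(f"BLOCKED by open P0: {f['id']}", file=sys.stderr)
    return 1 if p0s else 0

print(deployment_gate([{"id": "auth-bypass", "priority": 0}]))   # 1, do not ship
print(deployment_gate([{"id": "log-verbosity", "priority": 3}])) # 0, gate passes
```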

Lesson 218 Drill

Take your last code review — any size. Categorize every finding using the P0-P3 criteria. Then trace the order in which you actually fixed them: did fixing a finding before a higher-priority item force you to revisit the fix later? How many fixes needed revision because something foundational changed after they were written?

The answer to that question is the cost of ignoring fix order.

Bottom Line

84 findings prioritized is a sprint plan. 84 findings unprioritized is paralysis. The P0 to P3 framework encodes the dependency structure of your codebase's problems — which foundations need to be stable before you build on them. Fix P0 first. Always. Everything else is built on top of that ground.