ASK KNOX
beta
LESSON 167

Maintenance as a Skill

The difference between teams that maintain clean repos and teams that don't isn't discipline or intelligence. It's systematization. Clean repos stay clean because the process is automated, not because developers remember to clean.

11 min read · Repo Hygiene & Cost Discipline

The difference between teams that maintain clean repos and teams that don't isn't discipline. It isn't intelligence. It isn't that one team cares more.

It is systematization.

Clean repos stay clean because there is a process that runs on a schedule. Dirty repos stay dirty because there is an intention that never becomes a trigger. You cannot maintain a system through willpower. You maintain it through automation and cadence.

This lesson is about encoding the five-phase audit workflow into a reusable Claude Code skill — and about understanding why the cadence question matters as much as the workflow itself.

What a Claude Code Skill Is

Claude Code skills are reusable slash commands that encode workflows. A skill lives in ~/.claude/skills/<name>/SKILL.md with YAML frontmatter that defines the command's name, description, and arguments. When you type /repo-maintenance ~/Documents/Dev in any Claude Code session, the skill runs the full five-phase audit on every repo in that directory.

The skill is not a script. It is a structured prompt that instructs Claude Code on how to conduct the audit — what to check, how to classify findings, how to name branches, what to include in PRs. It encodes the institutional knowledge from Lessons 165 and 166 into a repeatable process that any engineer can invoke.

Skills are stored in the user's Claude config directory, which means they persist across projects and sessions. You build the skill once. You get the workflow forever.
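A minimal skeleton looks like this — the frontmatter fields shown (name, description) are the common ones, and the phase list is a placeholder sketch of the workflow from Lessons 165 and 166, not the definitive skill body:

```markdown
---
name: repo-maintenance
description: Run the five-phase maintenance audit on every repo in a directory
---

Run the five-phase audit on every git repository one level deep in the
target directory ($ARGUMENTS):

1. Discover repos and collect signal (CLAUDE.md, tests, CI, language).
2. Detect stub test files and other findings.
3. Generate the P0/P1/P2 maintenance plan and print it before acting.
4. Execute fixes on dedicated, consistently named branches.
5. Open one PR per repo with the findings table attached.
```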

The Discovery Loop

The skill starts with a discovery pass: find every git repository one level deep in the target directory, then collect the signal for each one before doing anything else.

# Find all git repos one level deep
find "$ARGUMENTS" -maxdepth 2 -name ".git" -type d | while read -r gitdir; do
  repo=$(dirname "$gitdir")
  echo "Repo: $repo"
  # Check for CLAUDE.md, test suite, CI
done

"One level deep" is deliberate. Nested repos — a monorepo containing multiple packages each with their own .git — are handled separately. The discovery loop is designed for a flat portfolio directory like ~/Documents/Dev, where each subdirectory is an independent project.

The output is a table: repo name, CLAUDE.md status, test suite present or absent, CI present or absent, primary language. Print it before taking any action. The table is also the artifact you share when someone asks what the audit found.
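A sketch of the signal-collection step, assuming the conventions used so far — CLAUDE.md at the repo root, a tests/ directory, workflows under .github/workflows. Adapt the checks to your own portfolio's layout:

```shell
# discover: print one signal line per git repo found one level deep in $1
discover() {
  find "$1" -maxdepth 2 -name ".git" -type d | while read -r gitdir; do
    repo=$(dirname "$gitdir")
    claude=$([ -f "$repo/CLAUDE.md" ] && echo present || echo absent)
    tests=$([ -d "$repo/tests" ] && echo present || echo absent)
    ci=$([ -d "$repo/.github/workflows" ] && echo present || echo absent)
    printf '%s\tCLAUDE.md:%s\ttests:%s\tCI:%s\n' \
      "$(basename "$repo")" "$claude" "$tests" "$ci"
  done
}

# Example: discover ~/Documents/Dev
```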

Stub Detection

The stub detection check is worth understanding in detail, because stubs are the most consequential finding the audit can surface.

find . -path "*/tests/test_*.py" -not -path "*/node_modules/*" | while read -r f; do
  count=$(grep -cE "^\s*(async )?def test_" "$f" 2>/dev/null)
  if [ "${count:-0}" -eq 0 ]; then echo "STUB: $f (0 tests)"; fi
done

The pattern ^\s*(async )?def test_ matches Python test function definitions — both sync and async. A file with zero matches for this pattern contains no real tests, no matter how many imports, fixtures, or comments it carries.
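To see the check behave on a throwaway fixture (the file contents here are hypothetical examples, not from any real repo):

```shell
# count_tests: number of test function definitions in a Python file
count_tests() {
  grep -cE '^\s*(async )?def test_' "$1" 2>/dev/null || true
}

tmp=$(mktemp -d)
# A stub: imports and a fixture, but zero test functions
printf 'import pytest\n\n@pytest.fixture\ndef client():\n    pass\n' > "$tmp/test_stub.py"
# A real test file with one test
printf 'def test_adds():\n    assert 1 + 1 == 2\n' > "$tmp/test_real.py"

count_tests "$tmp/test_stub.py"   # → 0 (flagged as stub)
count_tests "$tmp/test_real.py"   # → 1
rm -rf "$tmp"
```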

Why does this matter enough to be P0? Because CI coverage tools count stub files. A 200-line stub file with fixtures and pass implementations looks like a 200-line tested file to coverage calculators. The threshold appears to pass. The quality gate shows green. But nothing is actually being tested.

The Foresight trading bot had one stub file that was a remnant from a refactor. The function it was supposed to test had been split into three smaller functions. The stub was never updated. Coverage stayed green. The three new functions had zero test coverage for four months.

The P0/P1/P2 Plan Format

The skill generates a structured plan that doubles as a communication artifact. Here is the template format it uses:

## Maintenance Plan

### P0 — Blocking / Correctness
- [repo] Stub test file backend/tests/test_foo.py (0 tests)
- [repo] Coverage threshold mismatch: CI says 85%, CLAUDE.md says 90%

### P1 — High Value
- [repo] CLAUDE.md is 465 lines (target ≤200) — est. -280 lines
- [repo] Missing venv cache — 3 pip installs per PR (~90s wasted each)

The format is intentionally terse. Repo name, finding, estimated impact. No prose, no hedging. The plan should be scannable in 60 seconds. If the plan takes five minutes to read, it will not be read before execution starts — and execution without a shared plan produces PRs that surprise reviewers.

The Cadence Question

How often should you run this?

Monthly: a maintenance run across all active repos. Quarterly: a deep audit that reviews test coverage trends over time, not just a point-in-time snapshot.

The goal is not perfection. The goal is drift prevention. Here is why the cadence matters:

A CLAUDE.md that grows 5 lines per month hits the 200-line target in roughly three years. That is slow enough to be invisible month-to-month. A CLAUDE.md that grows 50 lines per month — sprint notes, new tool integrations, expanded examples — hits 200 lines in four months. That is fast enough to notice, but only if you are looking.

Monthly cadence catches the fast drifters before they become expensive. The slow drifters get caught in the quarterly deep audit.
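The drift arithmetic is simple enough to check in a few lines of shell. The growth rates are the illustrative ones above, starting from an empty file; the 200-line target is the lesson's:

```shell
# months_to_target: months until a CLAUDE.md growing at <rate> lines/month
# crosses the 200-line target, starting from <current> lines
months_to_target() {
  awk -v cur="$1" -v rate="$2" 'BEGIN { printf "%d\n", (200 - cur) / rate }'
}

months_to_target 0 50   # → 4 months: the fast drifter
months_to_target 0 5    # → 40 months: roughly three years
```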

The Compound Cost Calculation

Context window cost is not a one-time charge. It is a per-session charge. Every time an agent session loads a bloated CLAUDE.md, you pay for those extra tokens.

Run the math on a single repo:

  • 400-line CLAUDE.md vs. 150-line CLAUDE.md: approximately 2,500 extra tokens per session
  • 10 agent sessions per day: 25,000 extra tokens per day
  • 30 days: 750,000 extra tokens per month
  • At $15 per million input tokens: $11.25 per month, per repo, from CLAUDE.md overhead alone

That is one repo. A five-repo portfolio with bloated CLAUDE.md files in each one costs $50-60 per month in context overhead before any actual work gets done. The maintenance run that prunes those files pays for itself in 30 days.

The number will vary based on model, session frequency, and how much of the CLAUDE.md is actually loaded versus skipped. But the direction is always the same: smaller context = lower cost. There is no scenario where a 400-line CLAUDE.md is cheaper than a 150-line one.
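The per-repo arithmetic above can be made checkable. The ~10 tokens per line conversion is an assumption baked into the calculation — substitute your own tokenizer's figure:

```shell
# claude_md_overhead: monthly USD cost of loading extra CLAUDE.md lines
# args: <bloated_lines> <lean_lines> <sessions_per_day> <usd_per_mtok>
# Assumes ~10 tokens per line and a 30-day month.
claude_md_overhead() {
  awk -v big="$1" -v small="$2" -v sess="$3" -v price="$4" \
    'BEGIN { printf "%.2f\n", (big - small) * 10 * sess * 30 / 1e6 * price }'
}

claude_md_overhead 400 150 10 15   # → 11.25
```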

What Maintenance Teaches You About Your System

The practical payoff is the PRs: pruned files, filled tests, optimized CI. The deeper payoff is what you learn while doing it.

Pruning CLAUDE.md forces you to interrogate every section: does an agent session without this section make worse decisions? For most of what gets added over time, the honest answer is no. The section was added because someone thought it would be useful. It has been loaded into context hundreds of times. It has never changed an agent's behavior. Removing it is not a loss — it is correction.

Filling stub tests forces you to understand your untested surface area — concretely, function by function. You will find assumptions baked into code that nobody documented and error paths that were never tested because no one expected them to trigger.

Adding path filters forces you to map the dependency graph of your test suite against your repo's file structure. Which backend changes break E2E tests even when no frontend files changed? The path filter is a forcing function for that analysis. The analysis produces knowledge that belongs in every engineer's head, not just a YAML file.
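If the CI in question is GitHub Actions, a path filter is a few lines of workflow YAML. The paths below are illustrative, not prescriptive — the point is that writing them forces the dependency analysis:

```yaml
# Run the E2E workflow only when frontend or E2E files change
on:
  pull_request:
    paths:
      - "frontend/**"
      - "e2e/**"
```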

Systematizing the Trigger

The skill encodes the workflow. The calendar encodes the cadence. Set a monthly recurring event: "Repo maintenance run." Block 90 minutes. Run /repo-maintenance ~/Documents/Dev. Create the PRs. Merge by end of day.

Quarterly, run the deep audit: pull 90 days of coverage trend data, check whether fixed findings have regressed, review whether any CLAUDE.md files have drifted back above 200 lines. Regressions are part of the signal. A repo that regresses on CLAUDE.md size every quarter needs a stronger pruning policy or a structural change in how context is documented.
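The CLAUDE.md part of the quarterly check is scriptable in the same style as the discovery loop (this assumes CLAUDE.md sits at each repo root):

```shell
# drift_check: list CLAUDE.md files that have crept above the 200-line target
drift_check() {
  find "$1" -maxdepth 2 -name "CLAUDE.md" | while read -r f; do
    lines=$(wc -l < "$f" | tr -d ' ')
    if [ "$lines" -gt 200 ]; then
      echo "DRIFT: $f ($lines lines)"
    fi
  done
}

# Example: drift_check ~/Documents/Dev
```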

The finish line is a system that keeps the repo clean without anyone remembering to clean it. Not the audit — the systematization.