ASK KNOX
beta
LESSON 164

Quality Gate Theater

CI says 92% coverage. The PR passes. No one questions it. The 92% is built on stub files with zero real test functions. This lesson shows you how to find them, what to do with them, and why deleting them is sometimes the right call.

10 min read·Repo Hygiene & Cost Discipline

CI says 92% coverage.

The PR review takes 4 minutes. Two approvals. Merge.

Everyone moves on. The system is trusted. The 92% confers confidence — this codebase is tested, quality gates are working, bugs will be caught.

Except the 92% includes 11 stub test files. Empty bodies. pass statements. Functions defined but never implemented. The import is there. The class is there. The test count is zero.

This is quality gate theater: the appearance of rigor without the substance. The scoreboard says you are winning a game that is not being played.

What Counts as a Stub File

A stub file is any test file that exists in your test suite and imports real modules but contains zero real test functions.

In Python, the canonical diagnostic is:

grep -cE "^\s*(async )?def test_" test_file.py

If this returns 0, the file is a stub. Note the \s* — naive patterns like ^def test_ miss class-based tests where the method is indented under a TestCase subclass. The (async )? handles async test functions that are increasingly common in async-first codebases.
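To see the diagnostic in action, here is a short Python sketch applying the same pattern to two hypothetical file contents (the module and class names are invented for illustration): a class-based stub, and a real class-based test that a naive `^def test_` pattern would miss.

```python
import re

# Hypothetical stub: the import and the class exist, but no test methods do.
stub_source = """
import unittest
from app.user import UserService

class TestUserService(unittest.TestCase):
    pass  # methods were never written
"""

# Hypothetical real test: the method is indented under the TestCase subclass,
# so only a pattern with \\s* will count it.
real_source = """
import unittest
from app.user import UserService

class TestUserService(unittest.TestCase):
    def test_create_user(self):
        self.assertTrue(UserService)
"""

pattern = re.compile(r"^\s*(async )?def test_", re.MULTILINE)

print(len(pattern.findall(stub_source)))  # 0 -> stub
print(len(pattern.findall(real_source)))  # 1 -> real test, found thanks to \s*
```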

In TypeScript with Jest or Vitest, stubs look different:

describe("UserService", () => {
  it.todo("should create a user")
  it.todo("should validate email")
  xit("should handle duplicate registration", () => {
    // TODO: implement
  })
})

Every test is it.todo, xit, or xdescribe. The test runner counts the file. The coverage tool counts the imports. Zero assertions are made.

Finding Stubs at Scale

Manual inspection does not scale. This script finds every Python stub in a project:

for f in $(find . -path "*/tests/test_*.py" -not -path "*/node_modules/*"); do
  count=$(grep -cE "^\s*(async )?def test_" "$f" 2>/dev/null) || count=0
  if [ "$count" -eq 0 ]; then
    echo "STUB: $f (0 tests)"
  fi
done

Run this in any Python project root. Every file it outputs is costing you coverage points without providing coverage value. In a project with 40 test files, it is not unusual to find 5–8 stubs. That is 12–20% of the test suite contributing nothing except a falsely elevated coverage number.

For TypeScript:

for f in $(find . -name "*.test.ts" -o -name "*.spec.ts" | grep -v node_modules); do
  real_count=$(grep -cE "^\s*(it|test)\s*\(" "$f" 2>/dev/null) || real_count=0
  skip_count=$(grep -cE "^\s*(it\.todo|xit|xtest)\s*\(" "$f" 2>/dev/null) || skip_count=0
  if [ "$real_count" -eq 0 ] && [ "$skip_count" -gt 0 ]; then
    echo "ALL_SKIPPED: $f ($skip_count skipped)"
  elif [ "$real_count" -eq 0 ]; then
    echo "STUB: $f (0 real tests)"
  fi
done

The Resolution Rule

Finding a stub is the beginning, not the end. Before acting, ask one question: is this functionality tested somewhere else?

# Find test coverage of a specific module
grep -r "import UserService\|from.*UserService" tests/
grep -r "UserService\|create_user\|validate_email" tests/ --include="*.py"

This matters because stubs sometimes survive refactors where the tests moved to a different file or the module was merged into a larger integration test. The stub is orphaned overhead — it is not covering a gap, it is just taking up space and inflating the count.

If the functionality is covered elsewhere: Delete the stub. It is not a gap — it is a coverage scam. Deleting it is more honest than keeping it, because at least the coverage number will reflect reality.

If the functionality is genuinely untested: The stub becomes a real obligation. Write at minimum three test functions:

  1. Happy path — the function does what it is supposed to do under normal conditions.
  2. Error path — the function handles a failure condition correctly (bad input, missing data, network error).
  3. Edge case — the boundary value, the empty input, the concurrent call, or whatever the highest-risk scenario is for this specific module.

Three is the floor. It forces you to think about failure modes instead of just proving the function runs once.
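As a sketch of what that floor looks like in practice, here is a minimal pytest file for a hypothetical `validate_email` function (the function and its behavior are invented for illustration, and inlined so the sketch is self-contained):

```python
import pytest

# Hypothetical module under test, inlined here for illustration.
def validate_email(address):
    if not isinstance(address, str):
        raise TypeError("address must be a string")
    return "@" in address and "." in address.split("@")[-1]

def test_validate_email_happy_path():
    # 1. Happy path: a normal address validates.
    assert validate_email("user@example.com") is True

def test_validate_email_error_path():
    # 2. Error path: non-string input raises instead of failing silently.
    with pytest.raises(TypeError):
        validate_email(None)

def test_validate_email_edge_case():
    # 3. Edge case: the empty string is the boundary value.
    assert validate_email("") is False
```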

Coverage Threshold Theater

Stubs are not the only way quality gates lie. Configuration mismatches are equally dangerous and easier to miss.

# pyproject.toml
[tool.pytest.ini_options]
addopts = "--cov=src --cov-fail-under=85"
# CLAUDE.md
Testing Mandate: 90% coverage floor. Non-negotiable.

These two files exist in the same repo. CI runs pytest. The pipeline passes at 85%. The CLAUDE.md says 90%. Nobody notices the discrepancy because the pipeline is green.

The lower number wins silently. Always.

To find mismatches across a project:

# Check configured threshold
grep -r "cov-fail-under\|coverageThreshold\|branches.*[0-9]" \
  pyproject.toml pytest.ini setup.cfg .nycrc vitest.config.ts jest.config.ts 2>/dev/null

# Check documented policy
grep -i "coverage\|90%\|85%\|floor" CLAUDE.md README.md docs/CONTRIBUTING.md 2>/dev/null

If these produce different numbers, the configured number is your real policy regardless of what the docs say. Fix the configuration to match the documented standard — not the other way around.
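The comparison can be automated. A minimal Python sketch, using the two hypothetical file contents from the example above (a real check would read pyproject.toml and CLAUDE.md from disk):

```python
import re

# Hypothetical file contents, matching the example above.
pyproject = 'addopts = "--cov=src --cov-fail-under=85"'
claude_md = "Testing Mandate: 90% coverage floor. Non-negotiable."

configured = int(re.search(r"cov-fail-under=(\d+)", pyproject).group(1))
documented = int(re.search(r"(\d+)%", claude_md).group(1))

if configured < documented:
    # The configured number is what CI actually enforces.
    print(f"MISMATCH: CI enforces {configured}%, docs promise {documented}%")
```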

Why Stubs Happen

Stubs are not malicious. They are the artifact of reasonable workflows that were never completed.

The refactor case: tests are moved to a new location. The old file is left behind as a placeholder. Nobody cleans it up because the tests run and the coverage is fine.

The TDD stub case: a developer creates test file structure before implementation. The implementation ships. The tests get postponed. The postponement becomes permanent.

The agent case: a coding agent is tasked with "add test coverage for the trading module." It creates test files, defines test functions with descriptive names, and writes bodies that are either pass or TODO comments. CI passes. The agent reports success. No real assertions were made. The Foresight trading bot had exactly this happen during a refactor sprint — a stub file persisted for months, coverage stayed green, and the underlying functions had zero real test coverage the entire time.

The Audit Command

Run this before any merge that touches test infrastructure:

# Full stub audit: Python
echo "=== Python Stubs ===" && \
for f in $(find . -path "*/test*.py" -not -path "*/.venv/*" -not -path "*/node_modules/*"); do
  count=$(grep -cE "^\s*(async )?def test_" "$f" 2>/dev/null) || count=0
  [ "$count" -eq 0 ] && echo "  STUB: $f"
done

# Coverage threshold check
echo "=== Coverage Config ===" && \
grep -r "fail-under\|coverageThreshold" pyproject.toml pytest.ini setup.cfg \
  vitest.config.ts jest.config.ts 2>/dev/null | head -10

Add it to your CI pipeline as a non-blocking check first. Observe what it surfaces. After one sprint of observation, make it blocking.

The 90% floor is non-negotiable. But 90% built on stub files is 0% real confidence. The difference between those two numbers is the distance between the scoreboard and the game.