ASK KNOX
LESSON 165

CI Cost Engineering

Every CI run costs time and money. Most teams burn 100-200 minutes per week on jobs that should never have triggered. Three patterns fix 80% of the waste in under an hour.

10 min read · Repo Hygiene & Cost Discipline

A team has Playwright E2E tests. They run on every single PR, including PRs that only change CSS utility classes with nothing to do with page routing. Each run burns 8 minutes of GitHub Actions time; at 20 PRs per week, that is 160 minutes wasted weekly. The Foresight trading bot repo ran this pattern for three months before path filters were added — the E2E suite triggered on every documentation edit, every config comment update, every formatting pass.

That is not a hypothetical. That is the default state of most repositories after six months of organic growth. No one made a bad decision — people just added tests and workflows without pausing to ask what each run was costing.

CI waste is an invisible tax. It does not show up in a bug report or a postmortem. It shows up as a slower development loop, a higher GitHub bill at the end of the month, and an engineering culture that treats CI as a thing that happens to them rather than a system they own.

Three patterns fix 80% of the waste. Each takes under 20 minutes to implement.

The Three Waste Patterns

Pattern 1: Missing venv Cache

Python repos without actions/cache@v4 reinstall pip dependencies on every CI run. Call it 90 seconds per job. If three jobs in your workflow each run pip install -r requirements.txt against the same file, that is 4.5 minutes of redundant work per PR, every PR.

The fix is a cache keyed on a hash of the requirements file. When nothing has changed, the cache hits and the install step is skipped. When requirements.txt changes, the hash changes, the cache misses, and a fresh install runs. Correct behavior, with no manual invalidation.

- name: Cache virtualenv
  id: venv-cache
  uses: actions/cache@v4
  with:
    path: backend/.venv
    key: venv-${{ runner.os }}-py3.13-${{ hashFiles('backend/requirements.txt') }}

- name: Install dependencies
  if: steps.venv-cache.outputs.cache-hit != 'true'
  run: |
    python -m venv backend/.venv
    backend/.venv/bin/pip install -r backend/requirements.txt

- name: Add venv to PATH
  run: echo "${{ github.workspace }}/backend/.venv/bin" >> $GITHUB_PATH

The if: steps.venv-cache.outputs.cache-hit != 'true' condition is load-bearing. Without it, the install always runs and the cache is never used. The third step adds the venv to GITHUB_PATH so all subsequent steps see the installed packages without needing to activate the environment explicitly.

One additional rule: if multiple jobs install from the same requirements.txt, give them the same cache key. A cache entry is only shared if the key matches. Giving each job a unique key means each job stores and restores its own private copy — with the three jobs above, that is three cache entries instead of one, and every job pays for a cold install whenever its own entry is missing.
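As a minimal sketch, two jobs sharing one cache entry simply repeat the identical key expression (the job names here are illustrative; each job would still carry the gated install step shown above):

```yaml
# Both jobs compute the same key, so whichever finishes first
# populates the cache and the other restores it.
lint:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/cache@v4
      with:
        path: backend/.venv
        key: venv-${{ runner.os }}-py3.13-${{ hashFiles('backend/requirements.txt') }}

tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/cache@v4
      with:
        path: backend/.venv
        key: venv-${{ runner.os }}-py3.13-${{ hashFiles('backend/requirements.txt') }}
```

The key is deterministic — same OS, same Python version, same requirements hash — so it does not matter which job writes the entry first.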

Pattern 2: No Path Filters on Expensive Jobs

Playwright running on a CSS change is the canonical example, but the pattern applies everywhere: any slow job that runs regardless of whether the changed files are relevant to what it tests.

The dorny/paths-filter@v3 action gives you a two-job architecture. A fast changes job reads the diff and outputs boolean flags. Downstream jobs gate on those flags. A Playwright suite that takes 8 minutes now only runs when frontend pages, layout components, the app entry point, E2E specs, or backend files changed.

changes:
  runs-on: ubuntu-latest
  outputs:
    playwright: ${{ steps.filter.outputs.playwright }}
  steps:
    - uses: actions/checkout@v4
    - uses: dorny/paths-filter@v3
      id: filter
      with:
        filters: |
          playwright:
            - 'frontend/src/pages/**'
            - 'frontend/src/components/layout/**'
            - 'frontend/src/App.tsx'
            - 'e2e/**'
            - 'backend/**'

playwright:
  needs: [frontend, changes]
  if: needs.changes.outputs.playwright == 'true'

The filter list is a deliberate design decision. You are encoding what affects page behavior. Backend changes are included because a broken API breaks E2E tests even if no frontend files changed. The e2e/** glob ensures that changes to the tests themselves always trigger a run — otherwise a test fix would never run in CI.

Start conservative and expand. If you miss a path and Playwright skips a run that should have caught a regression, you add the path. The cost of a false skip is a single incident. The cost of no filters is thousands of wasted minutes over a project lifetime.

Pattern 3: Missing npm Test Aliases

This one is about developer ergonomics, not CI cost directly. But it has a real multiplier effect on team velocity.

A developer changes a date formatter. The full test suite takes 2 minutes. Half of that is rendering tests for UI components that have nothing to do with the formatter. The developer either waits the full 2 minutes every iteration, or they skip running tests locally entirely and push to let CI catch it. Neither outcome is good.

Targeted test aliases solve this:

"scripts": {
  "test": "vitest run --coverage",
  "test:logic": "vitest run src/__tests__/formatters.test.ts src/__tests__/api.test.ts",
  "test:components": "vitest run src/__tests__/ui-components.test.tsx"
}

test:logic runs in 15 seconds. test:components runs in 20 seconds. The developer who changed a formatter runs test:logic and gets a fast answer. CI still runs the full test suite with coverage. The aliases do not replace CI — they make the local feedback loop fast enough that developers actually use it.

This pattern came directly out of the mission-control dashboard repo. Backend logic and UI rendering tests were always run together. Splitting them into aliases cut the local feedback loop from 90 seconds to 12 seconds for logic changes.

The Math

Fix these three patterns in a single session:

Pattern                 | Setup Time | Weekly Savings (20 PRs)
venv cache              | 15 min     | ~90 min (3 jobs × 90s × 20 PRs)
Playwright path filter  | 20 min     | ~120 min (assumes 75% of PRs skip E2E)
npm aliases             | 10 min     | Hard to quantify — reduces local friction, improves loop

At 20 PRs per week, fixing the cache and path filter alone recovers roughly 200 minutes per week. At scale — 50 PRs per week, multiple repos — this is real money and real hours.
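The estimate is back-of-envelope arithmetic; the PR count, job count, and per-run times below are the assumptions from this article, not measurements:

```python
# Weekly CI savings, using the figures assumed in this article.
PRS_PER_WEEK = 20

# Pattern 1: venv cache removes redundant pip installs.
jobs_per_pr = 3
install_seconds = 90
cache_savings_min = jobs_per_pr * install_seconds * PRS_PER_WEEK / 60

# Pattern 2: path filter skips the 8-minute E2E suite on ~75% of PRs.
e2e_minutes = 8
skip_rate = 0.75
filter_savings_min = e2e_minutes * PRS_PER_WEEK * skip_rate

total = cache_savings_min + filter_savings_min
print(f"~{total:.0f} minutes saved per week")  # ~210 minutes saved per week
```

The ~210-minute figure is a ceiling: it assumes every cacheable install hits and exactly 75% of PRs skip E2E, so real weeks land somewhat lower.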

CI is infrastructure. Infrastructure requires maintenance. The three patterns above are the maintenance backlog for most repositories that have been running workflows for more than a few months. They are not glamorous work. But they compound — every PR runs faster, every developer gets feedback sooner, and the bill shrinks every month.

The session time investment is under an hour. The payback starts on the next PR.