Docker Cache Hides Fixes
When COPY is cached, your permission fix or code update doesn't take. Understanding Docker layer caching saves hours of debugging.
The Incident
March 29, 2026. A service was failing because a file inside a Docker container had permissions 600 (owner read/write only) when the running process needed 644 (world-readable). The fix seemed obvious: change the permissions on the source file and rebuild.
chmod 644 config/settings.json
docker compose build akashic
Build output looked encouraging. The step completed. The container was restarted. The service still failed with a permission error.
The culprit: a single word in the build output that most engineers skim past.
=> CACHED [3/6] COPY config/ /app/config/ 0.0s
The cache prevented the permission fix from reaching the image. Twice.
How Docker Layer Caching Works
Every instruction in a Dockerfile produces a layer — an immutable filesystem snapshot stored in Docker's content-addressable cache. When you rebuild, Docker checks each instruction against its cache to determine whether to reuse the stored layer or execute the instruction fresh.
The cache invalidation rules differ by instruction type:
RUN instructions — Docker compares the command string. If the string is identical to a previous build, the cached layer is used, regardless of what the command does at runtime.
COPY and ADD instructions — Docker computes a checksum of the source files' content. If the content hash matches a previous build, the cached layer is used. Metadata such as file permissions, ownership, and timestamps is not reliably part of this hash calculation across Docker versions and builders.
This last rule is the trap. A chmod 600 → 644 change modifies file metadata, not file content. The bytes of the file are identical. Docker sees an identical content hash and serves the cached layer. Your permission fix never enters the image.
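You can see the trap without Docker at all. A quick plain-shell check (the file name here is hypothetical, not from the incident) shows that chmod leaves a file's content hash untouched:

```shell
# Demo: chmod changes metadata, not content, so a content hash — the basis
# of Docker's COPY cache key — is identical before and after.
tmp=$(mktemp -d) && cd "$tmp"
echo '{"debug": false}' > settings.json
before=$(sha256sum settings.json | cut -d' ' -f1)
chmod 600 settings.json                 # tighten permissions on the host
after=$(sha256sum settings.json | cut -d' ' -f1)
[ "$before" = "$after" ] && echo "same hash: a COPY cache would still hit"
```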
The Cascade: Three Rebuilds, Three Failures
Here is exactly what happened across the three rebuild attempts on March 29:
Attempt 1 — Standard rebuild after chmod
chmod 644 config/settings.json
docker compose build akashic
docker compose up -d akashic
Result: CACHED on COPY step. Container starts with original permissions. Service fails.
Attempt 2 — docker compose build with --pull
docker compose build --pull akashic
--pull forces Docker to check for a newer base image. It does not bust the COPY cache if the base image is unchanged. Result: still CACHED on COPY. Service still fails.
Attempt 3 — --no-cache
docker compose build --no-cache akashic
docker compose up -d akashic
--no-cache disables all layer reuse. Every instruction executes from scratch. The chmod'd file is COPYed fresh. Service starts successfully.
The 21-minute cost came from the combination of --no-cache forcing a full pip install (5 min), re-downloading the embedding model (2 min), and the two wasted intermediate attempts.
When to Use --no-cache
--no-cache is a sledgehammer. It solves cache problems but abandons every performance benefit of layered builds. Use it deliberately.
Use --no-cache when:
- You changed file permissions or file ownership and the service depends on those
- You changed file timestamps (rare, but can affect behavior in some pipelines)
- You suspect a dependency has changed but the lockfile hash is identical
- A RUN command fetches external resources (git clone, curl) that may have updated
- You are debugging a "my fix isn't in the container" situation and need certainty
Do not default to --no-cache for every build. In repos with large dependency installs or model downloads, it can cost 10+ minutes per build.
Better Patterns: Eliminating the Problem
The real fix is to stop relying on host filesystem permissions and handle permissions explicitly in the Dockerfile. There are two reliable approaches, plus a hygiene practice that keeps cache behavior predictable.
Pattern 1: chmod Inside the Dockerfile
Set permissions as a RUN instruction after COPY:
COPY config/ /app/config/
RUN chmod 644 /app/config/settings.json
When the RUN instruction string changes (e.g., you update the path or mode), Docker invalidates that layer and all subsequent layers. The permission is set inside the image regardless of what the host filesystem says.
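A toy model helps show why this works. Docker's real cache key for a RUN layer is internal, but the essential property, that the instruction string itself is part of the key, can be sketched in plain shell (this is a simplification, not Docker's actual implementation):

```shell
# Toy model: hash the instruction string as a stand-in for the cache key.
# Editing the mode in `RUN chmod` yields a different key, so that layer
# and everything after it is rebuilt.
old=$(printf '%s' 'RUN chmod 600 /app/config/settings.json' | sha256sum)
new=$(printf '%s' 'RUN chmod 644 /app/config/settings.json' | sha256sum)
[ "$old" != "$new" ] && echo "keys differ: layer and everything after it rebuilds"
```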
Pattern 2: Multi-Stage Builds
Multi-stage builds allow you to separate the "dependency installation" layers (slow, rarely change) from the "application code" layers (fast, change frequently):
# Stage 1: Dependencies — cached aggressively
FROM python:3.11-slim AS deps
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Application — rebuilt on every code change
FROM deps AS app
RUN useradd --create-home appuser   # --chown below requires this user to exist in the image
COPY --chown=appuser:appuser src/ /app/src/
COPY --chown=appuser:appuser config/ /app/config/
RUN chmod 644 /app/config/*.json
With this layout, a code change busts only Stage 2's cache. The slow pip install in Stage 1 is unaffected. You get the cache performance on dependencies while ensuring application files and their permissions are always fresh.
Pattern 3: .dockerignore Discipline
Keeping a clean .dockerignore means fewer spurious files are included in the COPY checksum. When Docker only checksums the files you actually need, cache hits are more predictable and busting is less surprising.
# .dockerignore
__pycache__/
*.pyc
.git/
.env
*.log
data/
If data/ or *.log files are accidentally included in a COPY . instruction, any write to those files busts the entire application layer cache. Separate what changes frequently from what changes rarely.
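To see why noise files hurt, here is a rough stand-in for the checksum Docker computes over a COPY context (again a simplification; the directory and file names are illustrative only):

```shell
# Sketch: aggregate content hash of a build-context directory. Any write to
# a stray file — a log, a data dump — changes the aggregate, which is why
# `COPY .` busts on noise unless .dockerignore excludes it.
tmp=$(mktemp -d) && cd "$tmp"
mkdir ctx && echo 'print("app")' > ctx/app.py
h1=$(find ctx -type f | sort | xargs sha256sum | sha256sum | cut -d' ' -f1)
echo 'request served' >> ctx/debug.log && mv debug.log ctx/ 2>/dev/null || true
echo 'request served' >> ctx/debug.log   # unrelated noise file
h2=$(find ctx -type f | sort | xargs sha256sum | sha256sum | cut -d' ' -f1)
[ "$h1" != "$h2" ] && echo "context hash changed: whole COPY layer rebuilds"
```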
Verifying the Fix Is Actually In the Container
After any rebuild where you suspected a caching problem, verify the fix landed before restarting the service:
# Check the permissions inside the running container
docker exec akashic ls -la /app/config/settings.json
# Check the container's creation time (inspect the image itself for its build timestamp)
docker inspect akashic | grep -i created
# Start an interactive shell to explore
docker exec -it akashic /bin/bash
Do not trust the service behavior as the sole indicator. A cached layer can produce a container that starts successfully but fails only under specific code paths. Explicit verification takes 10 seconds and eliminates an entire class of "my fix didn't work" bugs.
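Those checks can be wrapped in a small helper. This is a sketch, not part of the incident's tooling; it is shown against the local filesystem so it runs anywhere, and the container name akashic is taken from the article:

```shell
# Hedged helper: assert a file's octal mode and fail loudly on a mismatch.
check_mode() {
  want="$1"; path="$2"
  got=$(stat -c '%a' "$path") || return 1  # GNU stat; use `stat -f '%Lp'` on BSD/macOS
  [ "$got" = "$want" ] || { echo "STALE? $path is $got, want $want" >&2; return 1; }
  echo "OK: $path is $got"
}
```

To check inside the container, run the stat through the container instead: `docker exec akashic stat -c '%a' /app/config/settings.json` and compare the output the same way.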
Reading Build Output Defensively
Develop the habit of reading docker compose build output line by line when debugging. The key signals:
=> CACHED [3/6] COPY config/ /app/config/ # Permission fix is NOT in this image
=> [3/6] COPY config/ /app/config/ # No CACHED prefix = fresh copy, fix IS in
When a layer is stale and you need it fresh, a more targeted approach is to add a throwaway ARG plus a trivial RUN that consumes it just before the affected COPY. The RUN is needed because only instructions that reference an ARG are guaranteed to miss the cache when its value changes; rebuilding that layer then forces everything below it to rebuild too:
ARG CACHE_BUST=1
RUN echo "bust=${CACHE_BUST}"
COPY config/ /app/config/
Then rebuild with:
docker compose build --build-arg CACHE_BUST=$(date +%s) akashic
This is more surgical than --no-cache — it preserves cached layers above the bust point (like the pip install stage) while forcing a fresh copy of everything below.
Key Takeaways
- Docker's COPY cache key is based on file content hashes, not file permissions or timestamps — a chmod alone will not bust the cache.
- --no-cache guarantees a full fresh build but forfeits all performance benefits; use it for debugging, not as default practice.
- Set permissions inside the Dockerfile with RUN chmod after every COPY — never depend on host filesystem metadata surviving into the image.
- Multi-stage builds let you cache slow dependency layers independently from fast-changing application code layers.
- Always verify fixes landed with docker exec ls -la rather than inferring from service behavior alone.
What's Next
A stale Docker image is a frustrating waste of time. But there is a class of problem that is not just wasteful — it is actively dangerous. In Lesson 239, we look at what happens when you have two instances of a stateful service running simultaneously, and why that scenario is a direct financial risk for any service that manages money or external state.