Permission Drift from Automation

Twenty Python files. All readable on the host. All committed to git. All imported successfully in local tests. Inside the Docker container, they caused a crash loop on startup.

The cause: umask. The AI agent session that created the files had a restrictive umask. The files were mode 600, readable only by their owner. The container ran as appuser. appuser was not the owner. Python could not import the modules.

The permission was invisible until the container tried to use it.

How umask Works

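Every process has a umask: a bitmask of permission bits to clear whenever the process creates a new file. The new file's mode is the mode the program requested with the umask's bits removed (requested & ~umask). Programs typically request 666 for regular files, and a common default umask is 022: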

Default file permissions: 666 (rw-rw-rw-)
Umask:                    022 (----w--w-)
Result (666 & ~022):      644 (rw-r--r--, owner read/write, others read)

A restrictive umask of 077 removes group and other permissions entirely:

Default file permissions: 666 (rw-rw-rw-)
Umask:                    077 (---rwxrwx)
Result (666 & ~077):      600 (rw-------, owner read/write only)
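The masking is a bitwise operation rather than subtraction. For 022 the two happen to agree, but for 077 they do not. A minimal POSIX sh sketch (the function names `masked_mode` and `created_mode` are hypothetical) that shows both the arithmetic and the mode of a real file created under each umask:

```shell
#!/bin/sh
# Sketch: how a umask turns the requested mode 666 into the final mode.
# Function names are hypothetical; needs only POSIX sh and stat.

# Arithmetic view: final = requested & ~umask, printed in octal.
masked_mode() {
    printf '%o\n' $(( 0$1 & ~0$2 ))
}

# Real-file view: create a file under the given umask and print its mode.
# The ( ) body runs in a subshell, so the umask change does not leak out.
created_mode() (
    umask "$1"
    dir=$(mktemp -d)
    : > "$dir/f"                     # shell redirection requests mode 666
    stat -c '%a' "$dir/f" 2>/dev/null || stat -f '%Lp' "$dir/f"
    rm -rf "$dir"
)

masked_mode 666 022   # prints 644
masked_mode 666 077   # prints 600 (plain subtraction would give 567)
created_mode 077      # prints 600
```

Running `created_mode` with your own umask is a quick way to see what any tool launched from that shell will produce.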

AI agent sessions such as Claude Code, along with other automated tooling and CI runners, can run with restrictive umasks, typically inherited from the shell or service that launched them. This is a security-conscious default: processes creating temporary files should not expose them to other users on the system. The problem arises when those files are not temporary. When an agent session writes production source code, the restrictive mode lands on those files in the working tree, and the working tree is what the Docker build copies into the image.

The Akashic Permission Bomb

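During a coding session, an AI agent wrote and edited approximately twenty Python files across the Akashic Records repo. The agent session had umask 077. Every file it touched, both the new files it created and the existing files it edited, ended up with mode 600. Umask only applies when a file is created, so the edited files were presumably affected because the edits were saved as new files that replaced the originals, a common atomic-write pattern.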
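Whether an edit changes a file's mode depends on how the tool saves it. A minimal sketch (POSIX sh; `mode_of` and `edit_demo` are hypothetical helpers) comparing an in-place rewrite with the write-a-temp-file-then-rename pattern, assuming both happen under umask 077:

```shell
#!/bin/sh
# Compare two ways of "editing" an existing 644 file under umask 077.

mode_of() {
    # GNU/busybox stat first; BSD/macOS stat rejects -c and falls back to -f
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

edit_demo() (
    dir=$(mktemp -d)
    umask 022                                  # original files: mode 644
    printf 'x = 1\n' > "$dir/inplace.py"
    printf 'x = 1\n' > "$dir/replaced.py"
    umask 077                                  # the agent session's umask
    # In-place write: truncates the existing file, which keeps its mode
    printf 'x = 2\n' > "$dir/inplace.py"
    # Atomic replace: write a new file, then rename it over the original
    printf 'x = 2\n' > "$dir/replaced.py.tmp"
    mv "$dir/replaced.py.tmp" "$dir/replaced.py"
    echo "inplace=$(mode_of "$dir/inplace.py") replaced=$(mode_of "$dir/replaced.py")"
    rm -rf "$dir"
)

edit_demo   # prints: inplace=644 replaced=600
```

The in-place write keeps the existing inode and its 644 mode; the renamed temp file is a new file, so it carries the session's 600.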

The files looked normal:

$ git status
# Nothing suspicious — all files tracked, no unexpected changes

$ cat akashic/query/discover.py
# Reads fine — you are the owner

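git status cannot reveal the problem: git records only whether a file is executable (mode 100644 or 100755), so a change from 644 to 600 is not a modification as far as git is concerned. cat works because the developer running the command is the file owner. Local tests pass because pytest runs as the same user. The bomb is invisible to every tool in the normal development workflow.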
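Git's blind spot is easy to confirm. A minimal sketch, assuming `git` is installed (the function name `git_recorded_mode` is hypothetical), that stages a file created under umask 077 and prints the mode git records for it:

```shell
#!/bin/sh
# Stage a mode-600 file and print the mode stored in git's index.

git_recorded_mode() (
    repo=$(mktemp -d)
    cd "$repo" || exit 1
    git init -q .
    umask 077
    printf 'x = 1\n' > discover.py        # created with mode 600
    git add discover.py
    # ls-files -s prints: <mode> <blob sha> <stage><TAB><path>
    git ls-files -s discover.py | cut -d' ' -f1
    cd / && rm -rf "$repo"
)

git_recorded_mode   # prints: 100644
```

Only the executable bit survives into the index; the mode 600 lives solely in the working tree where the agent wrote it.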

The Docker container ran as appuser — a non-root user created specifically to avoid running the service as root. appuser was not the owner of the files. When Python tried to import akashic.query.discover, it encountered mode 600 and threw:

PermissionError: [Errno 13] Permission denied: '/app/akashic/query/discover.py'

The container entered a crash loop. Docker marked it unhealthy after three failed restarts. The service was down.

The Cache Trap

The first fix attempt was to chmod the affected files and rebuild:

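# find reaches every depth; akashic/**/*.py only recurses in bash with globstar enabled
find akashic -type f -name '*.py' -exec chmod 644 {} +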
docker compose build akashic
docker compose up -d akashic

The container still crashed.

The cause: Docker build layer caching. The COPY . /app instruction in the Dockerfile had been executed and cached while the files had mode 600. When docker compose build ran again, the COPY layer was served from cache instead of re-copying the corrected files, so appuser inside the container still saw mode 600. A permission-only change on the host is not a reliable way to force that layer to rebuild.

The fix required bypassing the cache:

docker compose build --no-cache akashic
docker compose up -d akashic

With --no-cache, Docker re-executed every layer including COPY. The new 644 permissions were captured. The container started successfully.

Defense Layer 1: chmod in Dockerfile

The most robust fix embeds the permission correction in the Dockerfile itself. Regardless of what permissions the host files have when they enter the image, the Dockerfile normalizes them:

FROM python:3.12-slim

WORKDIR /app

# Create non-root user
RUN useradd -r -u 1001 -s /sbin/nologin appuser

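# Install dependencies (assumes a requirements.txt that lists uvicorn and the app's packages)
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# Copy source
COPY . /app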

# Normalize permissions regardless of host umask
RUN find /app -type f -name "*.py" -exec chmod 644 {} \; && \
    find /app -type d -exec chmod 755 {} \;

# Switch to non-root user
USER appuser

CMD ["python", "-m", "uvicorn", "akashic.main:app", "--host", "0.0.0.0", "--port", "8002"]

The RUN chmod step executes at build time, after COPY, and its result is saved as a new image layer. It normalizes Python file permissions to 644 and directory permissions to 755. Because USER appuser appears later in the Dockerfile, this step runs as root, which can chmod any file regardless of owner.

This defense is unconditional. It does not matter what umask the agent had, what the developer's local umask is, or whether someone ran chmod before building. The Dockerfile enforces the correct permissions on every build.
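The RUN step can be rehearsed outside Docker. A minimal sketch (POSIX sh; `normalize_perms` and `normalize_demo` are hypothetical names) that builds a small tree under umask 077, applies the same two find/chmod commands to it, and prints the resulting modes:

```shell
#!/bin/sh
# Rehearse the Dockerfile's permission normalization on a scratch tree.

mode_of() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

normalize_perms() {
    # Same find/chmod pair as the RUN step (batched with + instead of \;)
    find "$1" -type f -name "*.py" -exec chmod 644 {} + &&
    find "$1" -type d -exec chmod 755 {} +
}

normalize_demo() (
    root=$(mktemp -d)
    umask 077                              # simulate the agent's umask
    mkdir -p "$root/akashic/query"         # directories start at 700
    : > "$root/akashic/main.py"            # files start at 600
    : > "$root/akashic/query/discover.py"
    normalize_perms "$root"
    echo "$(mode_of "$root/akashic/main.py") $(mode_of "$root/akashic/query/discover.py") $(mode_of "$root/akashic/query")"
    rm -rf "$root"
)

normalize_demo   # prints: 644 644 755
```

Note the scope: only .py files and directories are normalized, so any other file the service reads at runtime keeps the mode it had on the host. If that matters, `chmod -R a+rX /app` is a broader alternative: it grants read to everyone, plus execute on directories and on files that already had an execute bit.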

Defense Layer 2: Pre-Build Scan

A pre-build scan catches the problem before it enters the container. Add a script to CI and to the local deploy checklist:

#!/bin/bash
# scripts/check-permissions.sh
# Run before docker compose build

VIOLATIONS=()
while IFS= read -r file; do
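    # GNU/busybox stat first; BSD/macOS stat rejects -c and falls back to -f
    mode=$(stat -c "%a" "$file" 2>/dev/null || stat -f "%Lp" "$file" 2>/dev/null)
    # appuser is neither owner nor group of the copied files, so require the other-read bit
    if [ -z "$mode" ] || (( (8#$mode & 4) == 0 )); then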
        VIOLATIONS+=("$file (mode: $mode)")
    fi
done < <(find . -type f -name "*.py" ! -path "./.git/*")

if [ ${#VIOLATIONS[@]} -gt 0 ]; then
    echo "PERMISSION VIOLATIONS:"
    for v in "${VIOLATIONS[@]}"; do
        echo "  $v"
    done
    echo ""
    echo "Run: find . -name '*.py' -exec chmod 644 {} \\;"
    exit 1
fi

echo "Permissions OK"
exit 0

Integrate this into the deploy workflow:

# In a Makefile (recipe lines must start with a tab character, not spaces)
.PHONY: deploy
deploy:
	bash scripts/check-permissions.sh
	docker compose build akashic
	docker compose up -d akashic

The scan adds under one second to the build process. It catches the problem on the host, where the fix is trivial — a single chmod command — rather than inside the container, where the diagnosis takes minutes.

Defense Layer 3: CI Permission Gate

Add a permission check to the CI workflow as an additional gate on pull requests:

# .github/workflows/test.yml
- name: Check file permissions
  run: |
    VIOLATIONS=$(find . -type f -name "*.py" ! -path "./.git/*" \
      ! -perm -044 -print)
    if [ -n "$VIOLATIONS" ]; then
      echo "Python files missing group or other read permission:"
      echo "$VIOLATIONS"
      exit 1
    fi

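This check fails the PR if any Python file lacks group or other read permission. It has a limit: because git does not record read permissions, the CI checkout's modes come from the runner's own umask, not from the agent session's. The gate catches restrictive modes produced on the runner itself, such as generated files or a runner configured with a strict umask. Files written with mode 600 on a developer machine are caught by the pre-build scan and neutralized by the Dockerfile chmod.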
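The `-perm -mode` form is easy to misread: `-perm -044` matches files that have all of the listed bits, so `! -perm -044` flags any file missing group-read or other-read. A minimal sketch (POSIX sh and find; the function name `flagged_modes` is hypothetical) that creates files with several modes and lists the ones the test reports:

```shell
#!/bin/sh
# Show which file modes `! -perm -044` flags.

flagged_modes() (
    dir=$(mktemp -d)
    for m in 644 640 604 600 755 700; do
        : > "$dir/f$m.py"
        chmod "$m" "$dir/f$m.py"
    done
    # Print the mode embedded in each flagged file name, sorted for stable output
    find "$dir" -type f -name "*.py" ! -perm -044 |
        sed 's/.*\/f\([0-7]*\)\.py$/\1/' | sort
    rm -rf "$dir"
)

flagged_modes   # prints 600, 604, 640, 700 (one per line)
```

Modes 644 and 755 pass; every mode that drops a read bit for group or other is flagged.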

Detecting Agent Sessions as the Source

When debugging a permission problem, check whether the affected files were all modified in the same coding session:

git log --format="%H %ae %aI" -- akashic/query/discover.py

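If the affected files were all last changed in commits from the same time window, and that window matches an agent session, the umask is the likely culprit. Check whether those commits came from the agent, for example by author identity or commit trailers, depending on how the agent is configured to commit. Git history shows when contents changed, not which process last wrote the file, so treat it as supporting evidence.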
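The working tree itself is more direct evidence, since it holds the actual modes. A minimal sketch (POSIX sh; `restricted_by_mtime` is a hypothetical helper) that lists Python files other users cannot read, newest first; files written in a single agent session tend to cluster in one time window:

```shell
#!/bin/sh
# List Python files lacking the other-read bit, newest first.
# Usage: restricted_by_mtime [dir]   (defaults to the current directory)
restricted_by_mtime() {
    find "${1:-.}" -type f -name "*.py" ! -path "*/.git/*" ! -perm -004 \
        -exec ls -lt {} +
}
```

Run it as `restricted_by_mtime akashic` and compare the timestamps against the agent session's start and end.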

On Linux (kernel 4.7 or later), you can also read the current umask of any running process from /proc:

# Print the umask of each process whose command line matches "claude"
for pid in $(pgrep -f claude); do
    grep Umask "/proc/$pid/status"
done

# macOS has no /proc; check the umask in the shell that launches the agent instead
umask

Key Takeaways

AI agent sessions running with a restrictive umask (e.g., 077) create Python files with mode 600, readable only by the owner.
These files look completely normal on the host because the developer is the owner. They detonate when a Docker container running as a different user tries to import them.
The first rebuild after fixing permissions can succeed while still shipping the old mode-600 files, because of Docker layer caching. Use --no-cache when fixing permission-related build failures.
The most reliable defense is a RUN chmod step in the Dockerfile, after COPY, which normalizes permissions unconditionally on every build.
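A pre-build scan on the host (for example, find . -name "*.py" ! -perm -044) catches the problem before it reaches the container. A CI gate adds a check on the runner's checkout, but it cannot see modes set on a developer machine, because git does not record them.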

What's Next

Permission drift is one source of invisible accumulation. In Lesson 236, we examine process accumulation — zombie processes, orphaned test runners, and duplicate service instances that pile up silently and consume resources without any alert firing.