Agent Memory Management: Why Your Fleet Eats Itself
Your subagent doesn't know it's competing with Docker for RAM. It doesn't know it's the eleventh process trying to do the same thing. It just keeps spawning background retries until macOS kills everything — including your 35 minutes of work.
You dispatch a code agent to write tests. The suite has 3,400 tests. The agent decides to run the full suite to establish a baseline. The full suite takes longer than the agent's execution window. It times out.
The agent retries in the background.
That background task also times out. It spawns another. Meanwhile, the first two processes haven't been cleaned up — they're still resident in memory, still consuming CPU, still holding file locks. A third retry spawns. Then a fourth.
Eleven zombie processes later, macOS kills the session. Your terminal is gone. The work in progress — gone. The 35 minutes you spent getting the agent to the right context — gone.
The output file is empty.
This is not an edge case. It is a predictable failure mode with a predictable cause: the agent had no awareness of the resource environment it was operating in.
The Resource Budget Problem
Every agent you spawn is competing for the same finite memory. The agent doesn't know this.
A subagent executing on a 24GB machine sees a machine with 24GB of RAM. What it doesn't see: Docker consuming 17GB across running containers. The parent session holding another 1-2GB of context. Background services, crons, monitoring daemons taking another 1-2GB. By the time your subagent is executing, it may have 3-4GB of usable headroom — possibly less.
No agent framework tells subagents this. Claude Code does not receive a system-level memory budget. The OpenAI Agents SDK does not pass available RAM as context. Anthropic's Claude Agent SDK does not inject resource constraints into subagent prompts.
The math is unforgiving. A 3,400-test suite that requires 6GB to run in parallel will not gracefully degrade to a smaller batch when it hits memory pressure. It will hang, then get killed by the OS, then retry the exact same operation with the exact same resource footprint.
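The arithmetic behind that claim can be made explicit. A minimal sketch, using the illustrative figures from the scenario above (none of these numbers are measured constants):

```python
# Illustrative memory budget for the 24GB machine described above.
TOTAL_RAM_GB = 24
docker_gb = 17          # running containers
parent_session_gb = 2   # orchestrator context (upper end of the 1-2GB range)
services_gb = 2         # daemons, crons, monitoring (upper end of the range)

headroom_gb = TOTAL_RAM_GB - docker_gb - parent_session_gb - services_gb
suite_requirement_gb = 6  # parallel run of the full 3,400-test suite

print(f"Usable headroom: {headroom_gb}GB")           # 3GB
print(f"Suite requires:  {suite_requirement_gb}GB")  # 6GB: guaranteed pressure
assert suite_requirement_gb > headroom_gb
```

The subagent sees 24GB. The operation it is about to attempt needs twice the headroom that actually exists.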
Background Task Sprawl
The specific failure mode that kills sessions is not a single OOM event. It is the retry loop.
Here is how it unfolds. An agent hits a timeout during a long-running operation. The operation was running in the background — a test suite, a build process, a coverage report. The agent, not seeing output, assumes the task failed. It spawns a background retry. The background retry also hangs. The agent spawns another.
Each spawned task is a new process. Each new process inherits the same resource appetite as the original. None of them are cleaned up before the next one starts. The system does not tell the agent that three previous attempts are still resident in memory. The agent has no visibility into its own process tree.
This failure is silent in the worst way: the agent appears to be working. Background tasks are running. Progress indicators may be ticking. The human waits. The machine is running out of memory. Eventually, something gets killed — and it is usually not just the agent.
The Rules That Prevent This
These are not suggestions. They are operational constraints that should be encoded into every agent system prompt that dispatches subagents to do compute-heavy work.
Subagents run targeted tests, never full suites.
A subagent that needs test coverage data should run the specific test files related to the code it changed — not the entire test suite. On a 3,400-test codebase, the subagent needs 40 tests, not 3,400. The full suite is a resource cannon. Surgical execution is the only acceptable default.
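Targeted selection can be mechanical. A sketch in Python; the path convention (src/ modules mapping to tests/test_*.py) is an assumption about repository layout, not a universal rule:

```python
from pathlib import Path

# Sketch: derive a targeted test command from the files an agent changed,
# instead of defaulting to the full suite. The src/ -> tests/test_*.py
# mapping is an assumed convention; adapt it to the repository at hand.
def targeted_test_command(changed_files: list[str]) -> list[str]:
    tests = []
    for f in changed_files:
        p = Path(f)
        if p.parts and p.parts[0] == "src" and p.suffix == ".py":
            tests.append(str(Path("tests") / f"test_{p.stem}.py"))
    # Fall back to running nothing rather than falling back to everything.
    return ["pytest", *sorted(set(tests))] if tests else []

cmd = targeted_test_command(["src/billing.py", "src/auth.py"])
# -> ['pytest', 'tests/test_auth.py', 'tests/test_billing.py']
```

The important property is the fallback: when the mapping finds nothing, the function returns an empty command, not the 3,400-test suite.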
Use CI logs as a proxy for coverage data.
If you need to know the coverage baseline, read the most recent CI run logs. Do not re-run coverage locally. CI already has the data. CI ran it in a sandboxed environment with the right resource allocation. Pulling the existing result is free. Recomputing it locally is expensive and risky.
One full test suite at a time, maximum.
If a full suite run is genuinely required, it runs alone. No parallel agent tasks. No background processes. One process with full resource access. Not two. Not three in parallel to "save time." The time you save is not worth the session you lose.
Pre-flight memory check before heavy operations.
On macOS, memory_pressure returns the system's current memory state: normal, warn, or critical. An agent that checks this before attempting a heavy operation can make an informed decision: proceed, reduce scope, or escalate to the human. One command, run before the expensive operation, prevents the entire failure class.
```shell
# Check memory pressure before heavy compute
memory_pressure
# Returns: System memory pressure: normal|warn|critical
```
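Wiring that check into a decision is a few lines. A sketch, assuming the "System memory pressure: <level>" output line shown above; the mapping from level to action is an illustrative policy, not a fixed rule:

```python
# Sketch of a pre-flight check built on macOS `memory_pressure`. Parsing
# assumes a "System memory pressure: <level>" line in the output; the
# proceed/reduce/escalate thresholds are an illustrative policy.
def classify_pressure(output: str) -> str:
    for line in output.splitlines():
        if "memory pressure" in line.lower():
            level = line.split(":")[-1].strip().lower()
            if level in ("normal", "warn", "critical"):
                return level
    return "unknown"

def preflight_decision(level: str) -> str:
    return {"normal": "proceed",
            "warn": "reduce-scope",
            "critical": "escalate"}.get(level, "escalate")

# On macOS the output would come from:
#   subprocess.run(["memory_pressure"], capture_output=True, text=True).stdout
sample_output = "System memory pressure: warn"
print(preflight_decision(classify_pressure(sample_output)))  # reduce-scope
```

Note the default: anything unparseable escalates. An agent that cannot determine its memory state should be treated as an agent under memory pressure.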
Switch to foreground execution when background tasks start failing silently.
If a background task produced no output, do not retry it in the background. Bring it to the foreground. Foreground execution surfaces failure immediately — you see the error, you see the timeout, you see what actually happened. Background tasks that fail silently will silently retry forever.
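The foreground rule can be encoded directly. A sketch; the 120-second budget is an illustrative choice, and the key behavior is that a timeout produces a report, never a retry:

```python
import subprocess

# Sketch: run a previously-backgrounded command in the foreground so that
# failure is visible immediately. On timeout, report and stop; never
# respawn. The 120-second budget is an illustrative choice.
def run_foreground(cmd: list[str], timeout_s: int = 120) -> str:
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout: escalate to the human, do not retry"
    if result.returncode != 0:
        return f"failed ({result.returncode}): {result.stderr.strip()}"
    return "ok"

print(run_foreground(["true"]))   # ok on POSIX systems
```

Every exit path returns a visible string. Nothing in this function spawns a second process, which is the entire point.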
Designing Memory-Aware Agent Architectures
The solution is not better agents. It is better architecture around the agents.
Decompose by resource profile, not just by domain.
When you assign work to subagents, think about what each task costs — not just what it does. A subagent that synthesizes text is cheap. A subagent that runs a test suite is expensive. A subagent that builds a Docker image while running tests is dangerous. These are different resource tiers. They should not compete for the same headroom without explicit sequencing.
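One way to make the tiers explicit in an orchestrator. The tier names and the "cheap tasks batch, heavy tasks run alone" policy are illustrative, not a framework API:

```python
from enum import Enum

# Sketch: tag each subagent task with a resource tier and sequence the
# expensive ones. Tier names and the scheduling policy are illustrative.
class Tier(Enum):
    CHEAP = 1       # text synthesis, log reading
    EXPENSIVE = 2   # test runs, builds
    DANGEROUS = 3   # builds + tests + containers concurrently

def schedule(tasks: list[tuple[str, Tier]]) -> list[list[str]]:
    cheap = [name for name, tier in tasks if tier is Tier.CHEAP]
    heavy = [name for name, tier in tasks if tier is not Tier.CHEAP]
    # Cheap tasks may share one parallel batch; each heavy task runs alone.
    batches = [cheap] if cheap else []
    batches.extend([name] for name in heavy)
    return batches

plan = schedule([("summarize", Tier.CHEAP), ("run-tests", Tier.EXPENSIVE),
                 ("build-image", Tier.EXPENSIVE)])
# -> [['summarize'], ['run-tests'], ['build-image']]
```

The two expensive tasks never land in the same batch, so they never compete for the same headroom.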
Delegate heavy compute to CI, not to subagents.
CI exists for a reason. It runs in isolated, correctly-resourced environments. It has the right memory allocation, the right CPU quota, the right sandbox. An agent that triggers a CI run and waits for the result is not doing less work — it is doing the right work. Subagents are not replacements for CI infrastructure. They are consumers of CI results.
Implement graceful degradation for resource-constrained execution.
A subagent system prompt that includes resource awareness might look like: "If running the test suite would take more than 2 minutes or you encounter a memory warning, stop and report what you know so far. Do not retry. Escalate to the orchestrator." This is a simple instruction. It prevents the retry loop entirely.
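The same rule can live in the execution harness, not only in the prompt. A sketch; `run_batch` is a stand-in for real test execution, and the budget mirrors the 2-minute limit in the instruction above:

```python
import time

# Sketch: enforce "stop and report, do not retry" in code. `run_batch` is
# a hypothetical stand-in for real test execution; the budget mirrors the
# 2-minute limit from the system-prompt instruction above.
def run_with_budget(batches, run_batch, budget_s=120):
    started = time.monotonic()
    completed = []
    for batch in batches:
        if time.monotonic() - started > budget_s:
            return {"status": "escalate", "completed": completed,
                    "note": "budget exhausted; reporting partial results"}
        completed.append(run_batch(batch))
    return {"status": "done", "completed": completed}

result = run_with_budget(["batch-1", "batch-2"], lambda b: f"{b}: passed")
print(result["status"])  # done
```

On budget exhaustion the orchestrator receives whatever was completed, plus an explicit escalation. There is no code path that retries.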
Use result delegation between subagents.
Subagent A runs tests and writes results to a shared file. Subagent B reads the results file and synthesizes findings. They do not both run tests. They do not duplicate compute. One produces, one consumes. The resource cost is linear, not multiplicative.
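The produce-once, consume-many handoff can be as simple as a shared file. A sketch; the JSON schema and file location are illustrative choices, not a prescribed format:

```python
import json
import tempfile
from pathlib import Path

# Sketch: subagent A writes results exactly once; subagent B only reads.
# The schema and the file path are illustrative choices.
def producer_writes(results_path: Path) -> None:
    results = {"suite": "billing", "passed": 38, "failed": 2}
    results_path.write_text(json.dumps(results))

def consumer_reads(results_path: Path) -> str:
    r = json.loads(results_path.read_text())
    return f"{r['suite']}: {r['passed']} passed, {r['failed']} failed"

path = Path(tempfile.gettempdir()) / "test-results.json"
producer_writes(path)                 # subagent A: the only compute
print(consumer_reads(path))           # subagent B: reads, never re-runs
```

The consumer has no way to trigger a test run. That constraint, not goodwill, is what keeps the resource cost linear.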
The Compound Effect
One OOM event does not kill only the task that caused it. macOS memory pressure kills processes in priority order — and your Docker containers, your background services, your launchd-managed daemons, and your cron jobs are all candidates.
The blast radius of a runaway agent extends far beyond the agent itself.
When memory pressure hits critical on a development machine running a typical stack, the OS terminates the highest memory consumers first. That may be a Docker container running your local database. Or the persistent agent session managing your other automations. Or the monitoring daemon watching your trading infrastructure. Or all three.
You can restart the agent session eventually; the work it held is gone. But so is the PostgreSQL container that was mid-transaction. And the watchdog service that was supposed to restart failed services. And two cron jobs that were in the middle of their scheduled runs.
The compounding does not stop at process termination. Killed Docker containers leave data in inconsistent states. Services that die mid-write leave corrupt lock files. Cron jobs that are killed mid-run may have partially executed — sent half a notification, written half a file, made half an API call. Cleaning up after a memory exhaustion event takes longer than the original task would have taken if it had been scoped correctly.
The economics are simple. Five minutes of disciplined scoping — targeted tests, CI log reads, memory pre-flight checks — prevents 35 minutes of recovery, re-establishing context, cleaning up partial state, and restarting killed services.
Agents that don't know their memory ceiling will eventually find it. Build the ceiling into the architecture before they do.