DevOps Engineer
Docker Sandbox: Why Every Non-Main Agent Runs in a Container
Key Takeaway
With 30 AI agents sharing one host, a single rogue command could nuke everything. Every non-main agent runs in a Docker sandbox with filesystem isolation, network restrictions, and a shell allowlist of 63 safe binaries.
The Problem
Thirty AI agents. One Linux host. Each agent can execute shell commands.
Think about that for a second.
An agent that hallucinates rm -rf / doesn't care about your uptime. An agent that runs curl to an unexpected URL doesn't care about your secrets. An agent that spawns a Bitcoin miner doesn't care about your AWS bill.
Without isolation, you're one bad inference away from a very bad day.
This isn't theoretical. During development, an agent attempted rm -rf /tmp as a "cleanup" step. On a shared host, /tmp had sockets, PID files, and shared memory segments from other running agents. That one command would have cascaded into 12 agent crashes.
We needed isolation that was strict enough to prevent damage but permissive enough that agents could still do their jobs: write code, run tests, interact with Git, and execute build commands.
The Solution
Every non-main agent runs inside a Docker container built from a custom image: mrchief-sandbox-common:bookworm-slim. The sandbox enforces filesystem isolation (agents can only write to their workspace), network restrictions (limited outbound, no host network), and a shell command allowlist.
The Process
Step 1: Custom Base Image
```dockerfile
# mrchief-sandbox-common:bookworm-slim
FROM debian:bookworm-slim

# Install everything agents need, nothing they don't
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv \
        nodejs npm \
        git openssh-client \
        curl wget jq \
        build-essential \
    && rm -rf /var/lib/apt/lists/*

# Pre-install common Python packages
RUN pip3 install --break-system-packages \
        django djangorestframework celery \
        pytest black ruff mypy \
        web3 eth-abi

# Pre-install Node packages
RUN npm install -g hardhat typescript ts-node

# Create non-root user for agent execution
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /workspace
```
Step 2: Container Configuration
Each agent gets a container with specific bind mounts and restrictions:
```yaml
# Mr.Chief sandbox configuration per agent
sandbox:
  image: mrchief-sandbox-common:bookworm-slim
  filesystem: workspaceOnly   # Can only write to /workspace
  network: limited            # Outbound to approved domains only
  hostAccess: none            # No access to host processes or filesystem
  bindMounts:
    - source: ~/.ssh
      target: /home/agent/.ssh
      readOnly: true          # Git operations work, but keys can't be modified
    - source: ~/.gitconfig
      target: /home/agent/.gitconfig
      readOnly: true
    - source: ./workspace
      target: /workspace
      readOnly: false         # Agent's working directory: full access
```
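As an illustration, a configuration like this maps naturally onto `docker run` flags. The sketch below is an assumption about how a launcher could translate it; the flag mapping, the `sandbox-egress` network name, and the `build_docker_args` helper are hypothetical, not Mr.Chief's actual implementation:

```python
from pathlib import Path

def build_docker_args(config: dict) -> list[str]:
    """Translate a sandbox config dict into a `docker run` argv (sketch)."""
    args = ["docker", "run", "--rm"]
    # "limited" is assumed to mean a user-defined network whose egress is
    # filtered (e.g. by the iptables rules shown later); "none" means no network
    net = {"none": "none", "limited": "sandbox-egress"}.get(config["network"], "bridge")
    args += ["--network", net]
    if config.get("filesystem") == "workspaceOnly":
        # Read-only root filesystem; only the rw bind mounts below are writable
        args += ["--read-only", "--tmpfs", "/tmp"]
    for mount in config.get("bindMounts", []):
        mode = "ro" if mount["readOnly"] else "rw"
        src = str(Path(mount["source"]).expanduser())
        args += ["-v", f"{src}:{mount['target']}:{mode}"]
    args.append(config["image"])
    return args

config = {
    "image": "mrchief-sandbox-common:bookworm-slim",
    "filesystem": "workspaceOnly",
    "network": "limited",
    "bindMounts": [
        {"source": "~/.ssh", "target": "/home/agent/.ssh", "readOnly": True},
        {"source": "./workspace", "target": "/workspace", "readOnly": False},
    ],
}
print(" ".join(build_docker_args(config)))
```

The key choice is `--read-only`: rather than trying to enumerate forbidden paths, the root filesystem is immutable by default and writability is granted only where a mount explicitly says `rw`.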
Step 3: Shell Allowlist
The sandbox intercepts every shell command and checks it against an allowlist:
```json
{
  "allowedBinaries": [
    "python3", "pip3", "node", "npm", "npx",
    "git", "ssh", "ssh-keygen",
    "curl", "wget", "jq",
    "cat", "head", "tail", "grep", "sed", "awk",
    "ls", "find", "wc", "sort", "uniq", "diff",
    "mkdir", "cp", "mv", "touch", "chmod",
    "tar", "gzip", "unzip",
    "echo", "printf", "date", "env",
    "hardhat", "forge", "cast",
    "pytest", "black", "ruff", "mypy",
    "docker"
  ],
  "blockedPatterns": [
    "rm -rf /",
    "rm -rf /*",
    ":(){ :|:& };:",
    "mkfs",
    "dd if=/dev/zero",
    "shutdown",
    "reboot"
  ],
  "requiresApproval": [
    "rm",
    "kill",
    "pkill",
    "apt-get install",
    "pip install"
  ]
}
```
Note: rm isn't blocked outright; it requires approval. Agents need to clean up build artifacts. But rm -rf / is pattern-blocked regardless.
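The enforcement order matters: blocked patterns are checked first, then the approval list, and only then the binary allowlist. A minimal sketch of that decision logic follows (allowlist abridged; the prefix-matching semantics are an assumption, chosen so that rm -rf /tmp falls through to the approval list rather than the rm -rf / hard block, consistent with the incident described below):

```python
import shlex

ALLOWED = {"python3", "pip3", "git", "ls", "cat", "mkdir", "pytest"}  # abridged
BLOCKED_PATTERNS = ["rm -rf /", "rm -rf /*", ":(){ :|:& };:", "mkfs",
                    "dd if=/dev/zero", "shutdown", "reboot"]
REQUIRES_APPROVAL = ["rm", "kill", "pkill", "apt-get install", "pip install"]

def matches(cmd: str, pattern: str) -> bool:
    # A pattern matches the whole command, or the command plus extra arguments.
    # Deliberately NOT a substring match: "rm -rf /tmp" must not trip "rm -rf /".
    return cmd == pattern or cmd.startswith(pattern + " ")

def check(command: str) -> str:
    cmd = " ".join(command.split())  # normalize whitespace
    # 1. Hard-blocked patterns win over everything else
    if any(matches(cmd, p) for p in BLOCKED_PATTERNS):
        return "blocked"
    # 2. Approval-listed commands pause and wait for a human
    if any(matches(cmd, p) for p in REQUIRES_APPROVAL):
        return "needs_approval"
    # 3. Otherwise the binary itself must be on the allowlist
    binary = shlex.split(cmd)[0]
    return "allowed" if binary in ALLOWED else "blocked"
```

With these rules, check("ls -la") passes, check("rm -rf /") is hard-blocked, and check("rm -rf /tmp") pauses for approval.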
Step 4: Network Restrictions
```bash
# iptables rules inside the container
# Allow outbound to approved domains only.
# Note: iptables stores IPs, not hostnames -- each domain below is resolved
# once, at the moment the rule is inserted.
iptables -A OUTPUT -d gitlab.com -j ACCEPT
iptables -A OUTPUT -d github.com -j ACCEPT
iptables -A OUTPUT -d registry.npmjs.org -j ACCEPT
iptables -A OUTPUT -d pypi.org -j ACCEPT
iptables -A OUTPUT -d files.pythonhosted.org -j ACCEPT
iptables -A OUTPUT -d render.com -j ACCEPT
iptables -A OUTPUT -d vercel.com -j ACCEPT
# Block everything else
iptables -A OUTPUT -j DROP
```
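One caveat worth knowing: because iptables resolves hostnames to IP addresses at rule-insertion time, rules go stale when DNS records change. A hedged sketch of one way to handle that, re-resolving each approved domain and emitting one ACCEPT per address on container start (the egress_rules helper is illustrative, not part of the product):

```python
import socket

def egress_rules(domains, resolve=None):
    """Emit one iptables ACCEPT rule per resolved IP, then a final DROP."""
    # Default resolver hits DNS; injectable so the logic can be tested offline
    resolve = resolve or (lambda d: socket.gethostbyname_ex(d)[2])
    rules = []
    for domain in domains:
        for ip in resolve(domain):
            rules.append(f"iptables -A OUTPUT -d {ip} -j ACCEPT  # {domain}")
    rules.append("iptables -A OUTPUT -j DROP")
    return rules

# Offline demo with a stubbed resolver (real use would query DNS)
for rule in egress_rules(["gitlab.com"], resolve=lambda d: ["172.65.251.78"]):
    print(rule)
```

Re-running this whenever the container starts keeps the allowlist aligned with the domains' current addresses.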
The real incident that validated this:
An agent, during a "clean up temporary files" step, ran:
```bash
rm -rf /tmp
```
The sandbox intercepted it. The agent received:
```text
⚠️ Command requires approval: rm -rf /tmp
Reason: 'rm' is in the requiresApproval list
Action: blocked; awaiting human approval
```
On the host, /tmp contained Unix sockets for 8 running agent sessions, PostgreSQL shared memory files, and Redis dump files. That one blocked command prevented a cascade failure across 12 agents.
The Results
| Metric | Without Sandbox | With Sandbox |
|---|---|---|
| Agents on one host | 30 | 30 |
| Destructive command incidents | 3 (in first month) | 0 (all caught) |
| Agent-to-agent interference | Possible | Impossible |
| Secret leakage risk | High | None (no host access) |
| Performance overhead | N/A | ~2% (container layer) |
| Agent capability impact | N/A | None (all tools available) |
Try It Yourself
Start with workspaceOnly filesystem isolation: it's the highest-impact, lowest-effort change, and agents rarely need to write outside their project directory. Add the shell allowlist next: enumerate what's needed, block everything else. Network restrictions come last because they require knowing your dependency hosts upfront.
Trust, but sandbox.