DevOps Engineer
Docker Sandbox: Why Every Non-Main Agent Runs in a Container
Key Takeaway
With 30 AI agents sharing one host, a single rogue command could nuke everything. Every non-main agent runs in a Docker sandbox with filesystem isolation, network restrictions, and a shell allowlist of 63 safe binaries.
The Problem
Thirty AI agents. One Linux host. Each agent can execute shell commands.
Think about that for a second.
An agent that hallucinates rm -rf / doesn't care about your uptime. An agent that runs curl to an unexpected URL doesn't care about your secrets. An agent that spawns a Bitcoin miner doesn't care about your AWS bill.
Without isolation, you're one bad inference away from a very bad day.
This isn't theoretical. During development, an agent attempted rm -rf /tmp as a "cleanup" step. On a shared host, /tmp had sockets, PID files, and shared memory segments from other running agents. That one command would have cascaded into 12 agent crashes.
We needed isolation that was strict enough to prevent damage but permissive enough that agents could still do their jobs: write code, run tests, interact with Git, and execute build commands.
The Solution
Every non-main agent runs inside a Docker container built from a custom image: mrchief-sandbox-common:bookworm-slim. The sandbox enforces filesystem isolation (agents can only write to their workspace), network restrictions (limited outbound, no host network), and a shell command allowlist.
The Process
Step 1: Custom Base Image
```dockerfile
# mrchief-sandbox-common:bookworm-slim
FROM debian:bookworm-slim

# Install everything agents need, nothing they don't
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv \
        nodejs npm \
        git openssh-client \
        curl wget jq \
        build-essential \
    && rm -rf /var/lib/apt/lists/*

# Pre-install common Python packages
RUN pip3 install --break-system-packages \
        django djangorestframework celery \
        pytest black ruff mypy \
        web3 eth-abi

# Pre-install Node packages
RUN npm install -g hardhat typescript ts-node

# Create non-root user for agent execution
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /workspace
```
Step 2: Container Configuration
Each agent gets a container with specific bind mounts and restrictions:
```yaml
# Mr.Chief sandbox configuration per agent
sandbox:
  image: mrchief-sandbox-common:bookworm-slim
  filesystem: workspaceOnly   # Can only write to /workspace
  network: limited            # Outbound to approved domains only
  hostAccess: none            # No access to host processes or filesystem
  bindMounts:
    - source: ~/.ssh
      target: /home/agent/.ssh
      readOnly: true          # Git operations work, but keys can't be modified
    - source: ~/.gitconfig
      target: /home/agent/.gitconfig
      readOnly: true
    - source: ./workspace
      target: /workspace
      readOnly: false         # Agent's working directory: full access
```
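As an illustration, a configuration like this maps naturally onto `docker run` flags. The sketch below is an assumption about how a launcher could translate it; the flag mapping, the `sandbox-egress` network name, and the `build_docker_args` helper are hypothetical, not Mr.Chief's actual implementation:

```python
from pathlib import Path

def build_docker_args(config: dict) -> list[str]:
    """Translate a sandbox config dict into a `docker run` argv (sketch)."""
    args = ["docker", "run", "--rm"]
    # "limited" is assumed to mean a user-defined network whose egress is
    # filtered (e.g. by the iptables rules shown later); "none" means no network
    net = {"none": "none", "limited": "sandbox-egress"}.get(config["network"], "bridge")
    args += ["--network", net]
    if config.get("filesystem") == "workspaceOnly":
        # Read-only root filesystem; only the rw bind mounts below are writable
        args += ["--read-only", "--tmpfs", "/tmp"]
    for mount in config.get("bindMounts", []):
        mode = "ro" if mount["readOnly"] else "rw"
        src = str(Path(mount["source"]).expanduser())
        args += ["-v", f"{src}:{mount['target']}:{mode}"]
    args.append(config["image"])
    return args

config = {
    "image": "mrchief-sandbox-common:bookworm-slim",
    "filesystem": "workspaceOnly",
    "network": "limited",
    "bindMounts": [
        {"source": "~/.ssh", "target": "/home/agent/.ssh", "readOnly": True},
        {"source": "./workspace", "target": "/workspace", "readOnly": False},
    ],
}
print(" ".join(build_docker_args(config)))
```

The key choice is `--read-only`: rather than trying to enumerate forbidden paths, the root filesystem is immutable by default and writability is granted only where a mount explicitly says `rw`.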
Step 3: Shell Allowlist
The sandbox intercepts every shell command and checks it against an allowlist:
```json
{
  "allowedBinaries": [
    "python3", "pip3", "node", "npm", "npx",
    "git", "ssh", "ssh-keygen",
    "curl", "wget", "jq",
    "cat", "head", "tail", "grep", "sed", "awk",
    "ls", "find", "wc", "sort", "uniq", "diff",
    "mkdir", "cp", "mv", "touch", "chmod",
    "tar", "gzip", "unzip",
    "echo", "printf", "date", "env",
    "hardhat", "forge", "cast",
    "pytest", "black", "ruff", "mypy",
    "docker"
  ],
  "blockedPatterns": [
    "rm -rf /",
    "rm -rf /*",
    ":(){ :|:& };:",
    "mkfs",
    "dd if=/dev/zero",
    "shutdown",
    "reboot"
  ],
  "requiresApproval": [
    "rm",
    "kill",
    "pkill",
    "apt-get install",
    "pip install"
  ]
}
```
Note: rm isn't blocked outright; it requires approval. Agents need to clean up build artifacts. But rm -rf / is pattern-blocked regardless.
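The enforcement order matters: blocked patterns are checked first, then the approval list, and only then the binary allowlist. A minimal sketch of that decision logic follows (allowlist abridged; the prefix-matching semantics are an assumption, chosen so that rm -rf /tmp falls through to the approval list rather than the rm -rf / hard block, consistent with the incident described below):

```python
import shlex

ALLOWED = {"python3", "pip3", "git", "ls", "cat", "mkdir", "pytest"}  # abridged
BLOCKED_PATTERNS = ["rm -rf /", "rm -rf /*", ":(){ :|:& };:", "mkfs",
                    "dd if=/dev/zero", "shutdown", "reboot"]
REQUIRES_APPROVAL = ["rm", "kill", "pkill", "apt-get install", "pip install"]

def matches(cmd: str, pattern: str) -> bool:
    # A pattern matches the whole command, or the command plus extra arguments.
    # Deliberately NOT a substring match: "rm -rf /tmp" must not trip "rm -rf /".
    return cmd == pattern or cmd.startswith(pattern + " ")

def check(command: str) -> str:
    cmd = " ".join(command.split())  # normalize whitespace
    # 1. Hard-blocked patterns win over everything else
    if any(matches(cmd, p) for p in BLOCKED_PATTERNS):
        return "blocked"
    # 2. Approval-listed commands pause and wait for a human
    if any(matches(cmd, p) for p in REQUIRES_APPROVAL):
        return "needs_approval"
    # 3. Otherwise the binary itself must be on the allowlist
    binary = shlex.split(cmd)[0]
    return "allowed" if binary in ALLOWED else "blocked"
```

With these rules, check("ls -la") passes, check("rm -rf /") is hard-blocked, and check("rm -rf /tmp") pauses for approval.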
Step 4: Network Restrictions
```bash
# iptables rules inside the container
# Allow outbound to approved domains only.
# Note: iptables stores IPs, not hostnames -- each domain below is resolved
# once, at the moment the rule is inserted.
iptables -A OUTPUT -d gitlab.com -j ACCEPT
iptables -A OUTPUT -d github.com -j ACCEPT
iptables -A OUTPUT -d registry.npmjs.org -j ACCEPT
iptables -A OUTPUT -d pypi.org -j ACCEPT
iptables -A OUTPUT -d files.pythonhosted.org -j ACCEPT
iptables -A OUTPUT -d render.com -j ACCEPT
iptables -A OUTPUT -d vercel.com -j ACCEPT
# Block everything else
iptables -A OUTPUT -j DROP
```
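One caveat worth knowing: because iptables resolves hostnames to IP addresses at rule-insertion time, rules go stale when DNS records change. A hedged sketch of one way to handle that, re-resolving each approved domain and emitting one ACCEPT per address on container start (the egress_rules helper is illustrative, not part of the product):

```python
import socket

def egress_rules(domains, resolve=None):
    """Emit one iptables ACCEPT rule per resolved IP, then a final DROP."""
    # Default resolver hits DNS; injectable so the logic can be tested offline
    resolve = resolve or (lambda d: socket.gethostbyname_ex(d)[2])
    rules = []
    for domain in domains:
        for ip in resolve(domain):
            rules.append(f"iptables -A OUTPUT -d {ip} -j ACCEPT  # {domain}")
    rules.append("iptables -A OUTPUT -j DROP")
    return rules

# Offline demo with a stubbed resolver (real use would query DNS)
for rule in egress_rules(["gitlab.com"], resolve=lambda d: ["172.65.251.78"]):
    print(rule)
```

Re-running this whenever the container starts keeps the allowlist aligned with the domains' current addresses.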
The real incident that validated this:
An agent, during a "clean up temporary files" step, ran:
```bash
rm -rf /tmp
```
The sandbox intercepted it. The agent received:
```text
⚠️ Command requires approval: rm -rf /tmp
Reason: 'rm' is in the requiresApproval list
Action: blocked; awaiting human approval
```
On the host, /tmp contained Unix sockets for 8 running agent sessions, PostgreSQL shared memory files, and Redis dump files. That one blocked command prevented a cascade failure across 12 agents.
The Results
| Metric | Without Sandbox | With Sandbox |
|---|---|---|
| Agents on one host | 30 | 30 |
| Destructive command incidents | 3 (in first month) | 0 (all caught) |
| Agent-to-agent interference | Possible | Impossible |
| Secret leakage risk | High | None (no host access) |
| Performance overhead | N/A | ~2% (container layer) |
| Agent capability impact | N/A | None (all tools available) |
Try It Yourself
Start with workspaceOnly filesystem isolation: it's the highest-impact, lowest-effort change, and agents rarely need to write outside their project directory. Add the shell allowlist next: enumerate what's needed, block everything else. Network restrictions come last because they require knowing your dependency hosts upfront.
Trust, but sandbox.