DevOps Engineer

Docker Sandbox β€” Why Every Non-Main Agent Runs in a Container

3 destructive incidents caught, zero damage · Engineering & DevOps · 5 min read

Key Takeaway

With 30 AI agents sharing one host, a single rogue command could nuke everything. Every non-main agent runs in a Docker sandbox with filesystem isolation, network restrictions, and a shell allowlist of 63 safe binaries.

The Problem

Thirty AI agents. One Linux host. Each agent can execute shell commands.

Think about that for a second.

An agent that hallucinates rm -rf / doesn't care about your uptime. An agent that runs curl to an unexpected URL doesn't care about your secrets. An agent that spawns a Bitcoin miner doesn't care about your AWS bill.

Without isolation, you're one bad inference away from a very bad day.

This isn't theoretical. During development, an agent attempted rm -rf /tmp as a "cleanup" step. On a shared host, /tmp had sockets, PID files, and shared memory segments from other running agents. That one command would have cascaded into 12 agent crashes.

We needed isolation that was strict enough to prevent damage but permissive enough that agents could still do their jobs β€” write code, run tests, interact with Git, execute build commands.

The Solution

Every non-main agent runs inside a Docker container built from a custom image: mrchief-sandbox-common:bookworm-slim. The sandbox enforces filesystem isolation (agents can only write to their workspace), network restrictions (limited outbound, no host network), and a shell command allowlist.

The Process

Step 1: Custom Base Image

# mrchief-sandbox-common:bookworm-slim
FROM debian:bookworm-slim

# Install everything agents need β€” nothing they don't
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-venv \
    nodejs npm \
    git openssh-client \
    curl wget jq \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Pre-install common Python packages
RUN pip3 install --break-system-packages \
    django djangorestframework celery \
    pytest black ruff mypy \
    web3 eth-abi

# Pre-install Node packages
RUN npm install -g hardhat typescript ts-node

# Create non-root user for agent execution
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /workspace

Step 2: Container Configuration

Each agent gets a container with specific bind mounts and restrictions:

# Mr.Chief sandbox configuration per agent
sandbox:
  image: mrchief-sandbox-common:bookworm-slim
  filesystem: workspaceOnly    # Can only write to /workspace
  network: limited             # Outbound to approved domains only
  hostAccess: none             # No access to host processes or filesystem
  bindMounts:
    - source: ~/.ssh
      target: /home/agent/.ssh
      readOnly: true           # Git operations work, but can't modify keys
    - source: ~/.gitconfig
      target: /home/agent/.gitconfig
      readOnly: true
    - source: ./workspace
      target: /workspace
      readOnly: false          # Agent's working directory β€” full access
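That YAML is declarative; conceptually it maps onto ordinary docker run flags. Here is a minimal launcher sketch in Python — the function names and the `sandbox-net` network are illustrative assumptions, not Mr.Chief's actual implementation:

```python
import os
import subprocess  # used only by launch(); build_docker_args is pure

IMAGE = "mrchief-sandbox-common:bookworm-slim"

def build_docker_args(workspace: str) -> list[str]:
    """Translate the sandbox config above into docker run arguments."""
    home = os.path.expanduser("~")
    return [
        "docker", "run", "--rm", "-i",
        "--user", "agent",                 # non-root, matches the Dockerfile
        "--workdir", "/workspace",
        "--network", "sandbox-net",        # assumed custom network with egress filtering
        "--pids-limit", "256",             # cheap fork-bomb insurance
        # read-only mounts: Git operations work, but keys can't be modified
        "-v", f"{home}/.ssh:/home/agent/.ssh:ro",
        "-v", f"{home}/.gitconfig:/home/agent/.gitconfig:ro",
        # the agent's workspace is the only writable bind mount
        "-v", f"{os.path.abspath(workspace)}:/workspace",
        IMAGE,
    ]

def launch(workspace: str) -> None:
    subprocess.run(build_docker_args(workspace), check=True)
```

The `--pids-limit` flag is worth the one line: even if a fork bomb slips past pattern matching, the kernel caps the damage at the container boundary.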

Step 3: Shell Allowlist

The sandbox intercepts every shell command and checks it against an allowlist:

{
  "allowedBinaries": [
    "python3", "pip3", "node", "npm", "npx",
    "git", "ssh", "ssh-keygen",
    "curl", "wget", "jq",
    "cat", "head", "tail", "grep", "sed", "awk",
    "ls", "find", "wc", "sort", "uniq", "diff",
    "mkdir", "cp", "mv", "touch", "chmod",
    "tar", "gzip", "unzip",
    "echo", "printf", "date", "env",
    "hardhat", "forge", "cast",
    "pytest", "black", "ruff", "mypy",
    "docker"
  ],
  "blockedPatterns": [
    "rm -rf /",
    "rm -rf /*",
    ":(){ :|:& };:",
    "mkfs",
    "dd if=/dev/zero",
    "shutdown",
    "reboot"
  ],
  "requiresApproval": [
    "rm",
    "kill",
    "pkill",
    "apt-get install",
    "pip install"
  ]
}

Note: rm isn't blocked β€” it requires approval. Agents need to clean up build artifacts. But rm -rf / is pattern-blocked regardless.
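A checker with those three tiers fits in a few lines of Python. This is an illustrative reimplementation of the policy above, not the sandbox's actual code — note that the blocked patterns match the exact command, which is why rm -rf /tmp falls through to the approval tier instead of being blocked outright:

```python
import shlex

ALLOWED = {"python3", "pip3", "node", "npm", "git", "curl", "cat", "grep",
           "ls", "mkdir", "cp", "mv", "rm", "kill", "pkill", "apt-get",
           "pip", "pytest", "echo"}  # abbreviated for the sketch
BLOCKED = {"rm -rf /", "rm -rf /*", ":(){ :|:& };:", "mkfs",
           "dd if=/dev/zero", "shutdown", "reboot"}
APPROVAL_BINARIES = {"rm", "kill", "pkill"}
APPROVAL_PREFIXES = ("apt-get install", "pip install")

def check_command(cmd: str) -> str:
    """Return 'block', 'approve', or 'allow' for one shell command."""
    cmd = cmd.strip()
    # Tier 1: known-destructive patterns are always blocked
    if cmd in BLOCKED:
        return "block"
    binary = shlex.split(cmd)[0]
    # Tier 2: risky-but-legitimate commands wait for a human
    if binary in APPROVAL_BINARIES or cmd.startswith(APPROVAL_PREFIXES):
        return "approve"
    # Tier 3: everything else must appear on the allowlist
    return "allow" if binary in ALLOWED else "block"
```

A production matcher would likely normalize whitespace and handle compound commands (`&&`, pipes, subshells), which this sketch deliberately skips.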

Step 4: Network Restrictions

# iptables rules inside the container
# Allow outbound to approved domains only
iptables -A OUTPUT -d gitlab.com -j ACCEPT
iptables -A OUTPUT -d github.com -j ACCEPT
iptables -A OUTPUT -d registry.npmjs.org -j ACCEPT
iptables -A OUTPUT -d pypi.org -j ACCEPT
iptables -A OUTPUT -d files.pythonhosted.org -j ACCEPT
iptables -A OUTPUT -d render.com -j ACCEPT
iptables -A OUTPUT -d vercel.com -j ACCEPT
# Block everything else
iptables -A OUTPUT -j DROP
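One caveat with rules like these: iptables resolves a hostname to IP addresses once, when the rule is inserted. CDN-backed hosts like registry.npmjs.org rotate addresses, so the rules need periodic regeneration. A refresh sketch in Python — the port-443 resolution and the rule format are assumptions of this sketch, not necessarily how the sandbox does it:

```python
import socket

APPROVED_DOMAINS = ["gitlab.com", "github.com", "registry.npmjs.org",
                    "pypi.org", "files.pythonhosted.org"]

def resolve_ipv4(domain: str) -> list[str]:
    """Resolve a domain to its current IPv4 addresses."""
    infos = socket.getaddrinfo(domain, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def rules_for(domain: str, ips: list[str]) -> list[str]:
    """Emit one ACCEPT rule per address; iptables stores IPs, not names."""
    return [f"iptables -A OUTPUT -d {ip} -j ACCEPT  # {domain}" for ip in ips]

def regenerate() -> list[str]:
    rules = []
    for domain in APPROVED_DOMAINS:
        rules += rules_for(domain, resolve_ipv4(domain))
    rules.append("iptables -A OUTPUT -j DROP")  # default-deny everything else
    return rules
```

Run on a timer (or on container start), this keeps the allowlist current without widening it.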

The real incident that validated this:

An agent, during a "clean up temporary files" step, ran:

rm -rf /tmp

The sandbox intercepted it. The agent received:

⚠️ Command requires approval: rm -rf /tmp
Reason: 'rm' is in the requiresApproval list
Action: blocked β€” awaiting human approval

On the host, /tmp contained Unix sockets for 8 running agent sessions, PostgreSQL shared memory files, and Redis dump files. That one blocked command prevented a cascade failure across 12 agents.

The Results

| Metric | Without Sandbox | With Sandbox |
| --- | --- | --- |
| Agents on one host | 30 | 30 |
| Destructive command incidents | 3 (in first month) | 0 (all caught) |
| Agent-to-agent interference | Possible | Impossible |
| Secret leakage risk | High | None (no host access) |
| Performance overhead | N/A | ~2% (container layer) |
| Agent capability impact | N/A | None (all tools available) |

Try It Yourself

Start with workspaceOnly filesystem isolation β€” it's the highest-impact, lowest-effort change. Agents rarely need to write outside their project directory. Add the shell allowlist next β€” enumerate what's needed, block everything else. Network restrictions come last because they require knowing your dependency hosts upfront.


Trust, but sandbox.

Docker · Security · AI Agents · Sandboxing · DevOps

