How to Secure 31 AI Agents Without Lobotomizing Them
The hidden trap of AI agent security: every layer you add risks cutting agents off from the knowledge that makes them useful.
The Moment We Realized Our AI Could Read Every API Key We Own
Our threat model exercise started as a checkbox. It ended with a cold sweat.
We mapped every agent's permissions: what each of our 31 AI agents could read, write, and modify. That's when we found it: the master orchestrator agent had read/write access to `.env.studio`, the single file containing every API key in the entire operation. Banking keys. OAuth tokens. Payment processing credentials. Everything.
If a prompt injection attack succeeded (say, via a malicious email the agent was asked to summarize), every credential could be exfiltrated in one request.
That was six months ago. Since then, we've built 5 independent security layers. Not "best practices" borrowed from a blog post. Not theoretical frameworks. Battle-tested layers, each born from a specific vulnerability we discovered the hard way.
Here's each one, why it matters, and the trap that almost made us undo all of it.
Layer 1: Shell Command Allowlisting
The problem: An agent with unrestricted shell access can `rm -rf /`, exfiltrate data via `curl`, install backdoors, or modify system files. When you're running 30 autonomous agents with shell access, you have 30 potential vectors for catastrophic commands.
The fix: Non-main agents can only execute commands from an approved list of ~60 binaries. Anything not on the list requires explicit human approval.
```json
{
  "exec": {
    "security": "allowlist",
    "ask": "on-miss",
    "safeBins": ["cat", "ls", "grep", "git", "python3", "node", "curl", "jq", "tree"]
  }
}
```
What it caught: During a sub-agent test, a code agent attempted to run `pkill -9` to restart a process. Under unrestricted access, this would have succeeded, potentially killing critical services. With the allowlist, the command was blocked and escalated, and the task was rerouted to the DevOps sub-agent, which had the appropriate tooling.
The key: The allowlist isn't about blocking everything. It's about creating a known-good set and escalating everything else. Agents are creative: they'll construct commands you didn't anticipate. The approval gate catches the unknowns.
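The allowlist-plus-escalation flow above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: `request_approval` stands in for whatever channel surfaces the command to a human, and the set mirrors a few of the `safeBins` entries.

```python
import shlex

# A few of the known-good binaries from the allowlist config above.
SAFE_BINS = {"cat", "ls", "grep", "git", "python3", "node", "curl", "jq", "tree"}

def gate_command(command: str, request_approval) -> bool:
    """Allow known-good binaries; escalate everything else to a human."""
    binary = shlex.split(command)[0]
    if binary in SAFE_BINS:
        return True                      # known-good: runs without friction
    return request_approval(command)     # on-miss: pause and ask a human

# A command outside the allowlist is never auto-approved:
assert gate_command("ls -la /workspace", lambda cmd: False) is True
assert gate_command("pkill -9 nginx", lambda cmd: False) is False
```

Note that a real gate must parse the full shell grammar, not just the first token: `ls; rm -rf /` starts with an allowlisted binary but hides a second command behind the separator.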
Layer 2: Filesystem Isolation
The problem: Any agent could read `.env.studio`, the file containing every API key. A successful prompt injection via a malicious email could exfiltrate every credential in one request.
The fix: Non-main agents are restricted to their own workspace directory. They cannot read `/etc/shadow`, other agents' configs, `.env` files, or system paths.
```json
{
  "workspaceOnly": true
}
```
Each agent's blast radius is proportional to its workspace, not the entire server. The master agent retains unrestricted filesystem access because it needs cross-workspace visibility for orchestration, but it's the only one.
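At its core, `workspaceOnly` enforcement is a path-containment check. A rough sketch, assuming illustrative paths: the key detail is resolving the path before comparing, so `../` traversal and symlink tricks can't escape the workspace.

```python
from pathlib import Path

def is_within_workspace(requested: str, workspace: str) -> bool:
    """True only if the fully resolved path stays inside the workspace."""
    target = Path(workspace, requested).resolve()
    root = Path(workspace).resolve()
    return target == root or root in target.parents

# Normal workspace file: allowed.
assert is_within_workspace("notes/plan.md", "/srv/agents/writer") is True
# Traversal toward the shared secrets file: denied.
assert is_within_workspace("../../.env.studio", "/srv/agents/writer") is False
# Absolute paths outside the workspace: denied.
assert is_within_workspace("/etc/shadow", "/srv/agents/writer") is False
```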
What it caught: The Anthropic Agent SDK spawns a subprocess that inherits the entire `process.env`, exposing Qonto banking keys, gateway tokens, Google OAuth credentials, and xAI API keys. We saw the leak in debug traces. Filesystem isolation plus env sanitization closed this vector permanently.
Layer 3: Docker Sandboxing
The problem: Layers 1 and 2 are configuration-based: they rely on Mr.Chief enforcing the rules. But configuration can be bypassed. A sufficiently creative agent (or a prompt injection attack) might find a path around config restrictions.
The fix: Every non-main agent session runs inside a Docker container. The container has access to the agent's workspace directory and internet connectivity, but physically cannot see the host filesystem, other agents' workspaces, environment variables, or system files.
```json
{
  "sandbox": {
    "mode": "non-main",
    "scope": "agent",
    "workspaceAccess": "rw",
    "docker": { "network": "bridge" }
  }
}
```
This is the difference between a rule that says "don't open that door" and a wall where no door exists.
What it caught: The SDK environment leak from Layer 2? With Docker sandboxing, that subprocess would have inherited an empty environment. The credentials never existed inside the container. Zero exposure, regardless of what any SDK does internally.
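The same principle applies on the host side: never let a child process inherit the full environment in the first place. A sketch of env sanitization, assuming the allowed variable names and the fake secret are illustrative:

```python
import os
import subprocess

# Pass through only what the tool actually needs to run.
ALLOWED_ENV = {"PATH", "HOME", "LANG"}

def spawn_sanitized(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a subprocess with a minimal environment: secrets are never inherited."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED_ENV}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Even if the parent process holds a secret, the child never sees it.
os.environ["FAKE_BANKING_KEY"] = "sk-test-123"   # illustrative, not a real key
result = spawn_sanitized(
    ["python3", "-c", "import os; print('FAKE_BANKING_KEY' in os.environ)"]
)
print(result.stdout.strip())  # False
```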
Layer 4: Approval Gates for Unknown Commands
The problem: The allowlist (Layer 1) handles known-good commands. But agents construct novel commands you can't anticipate. The unknown-unknowns are the real threat.
The fix: Any command not in the allowlist triggers a human approval request. The agent pauses, surfaces the exact command, and waits for explicit confirmation.
```json
{
  "ask": "on-miss"
}
```
What it caught: An investment sub-agent tried to use `ssh` to tunnel to an external service for real-time data. The command was technically valid and non-destructive, but it wasn't in the allowlist. The approval gate caught it: we reviewed the command (it was fine) and approved it. Without the gate, we'd never have known the agent was making outbound SSH connections.
The gate has taught us more about our agents' actual behavior than any logging system. When you see what commands agents try to run, you learn what they're actually doing versus what you think they're doing.
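Treating the gate as telemetry is cheap: record every escalated command and the log becomes a behavioral profile of each agent. A minimal sketch (an in-memory list stands in for whatever append-only log you use in production):

```python
from collections import Counter
from datetime import datetime, timezone

GATE_LOG: list[dict] = []   # in production, an append-only log file or table

def record_escalation(agent: str, command: str) -> None:
    """Log every command that missed the allowlist, whether or not it was approved."""
    GATE_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "command": command,
    })

def behavior_profile(agent: str) -> Counter:
    """What binaries does this agent actually try to run?"""
    return Counter(e["command"].split()[0] for e in GATE_LOG if e["agent"] == agent)

record_escalation("investment", "ssh -L 8080:feed.example.com:443 relay")
record_escalation("investment", "ssh -L 8080:feed.example.com:443 relay")
print(behavior_profile("investment"))  # Counter({'ssh': 2})
```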
Layer 5: Automated Daily Security Audit
The problem: Security hardening is a point-in-time effort. A config change, a package update, a service restart β any of these can silently undo your security setup.
The fix: A cron job runs every morning at 7am, checking:
- UFW (firewall) active and configured
- Fail2ban running and blocking brute force
- Disk usage below threshold
- Mr.Chief version current
- Auth profiles healthy
- No unexpected open ports
- Failed SSH attempts in last 24h
Reports only issues. Silent when everything is clean.
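A pared-down sketch of what such an audit can look like: each check returns a problem string or `None`, and the report surfaces only the failures. The thresholds and checks here are illustrative, not our exact script.

```python
import shutil
import socket

def check_disk(threshold_pct: int = 80):
    """Flag the root filesystem if usage crosses the threshold."""
    usage = shutil.disk_usage("/")
    pct = usage.used * 100 // usage.total
    return f"disk at {pct}% (threshold {threshold_pct}%)" if pct >= threshold_pct else None

def check_port_closed(port: int):
    """Flag a port that should never be reachable, e.g. the Docker API on 2375."""
    with socket.socket() as s:
        s.settimeout(0.5)
        is_open = s.connect_ex(("127.0.0.1", port)) == 0
    return f"unexpected open port {port}" if is_open else None

def run_audit():
    findings = [f for f in (check_disk(), check_port_closed(2375)) if f]
    return findings or None   # None means silent: everything is clean

# Cron sends the report only when run_audit() returns findings.
```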
What it caught: The morning after installing Docker, the audit flagged that Docker had opened port 2375 (Docker daemon API) on the network interface. This was an unintended side effect β the Docker API was accessible from the local network. The audit caught it before anyone could exploit it.
The principle: Every security improvement you add can be undone by the next change you make. The daily audit catches drift automatically. It's the difference between "we hardened our server once" and "our server is verified secure every morning."
The Trap: Security vs. Access (The Lesson That Almost Killed Our System)
Here's what nobody tells you about AI agent security: every layer you add creates a new isolation boundary. Each boundary is correct in isolation. But the compound effect can silently cut agents off from the shared context that makes them useful.
We implemented Docker sandboxing, filesystem isolation, and shell allowlists, and felt great about it. Then one of our agents told its human: "I don't know what we're talking about."
The agent wasn't broken. It was perfectly secure. And perfectly amnesiac.
The sandbox had cut it off from:
- The RAG knowledge base (institutional memory)
- The task registry (what everyone's working on)
- The regressions list (mistakes to never repeat)
From the agent's perspective, it had never learned any of the things the team had collectively learned. It was producing work that ignored all of that shared context, because from inside its container, none of it existed.
The fix: Read-only bind mounts.
```json
{
  "sandbox": {
    "docker": {
      "binds": [
        "/path/to/knowledge:/knowledge:ro",
        "/path/to/tasks.json:/workspace/tasks.json:ro",
        "/path/to/REGRESSIONS.md:/workspace/REGRESSIONS.md:ro"
      ]
    }
  }
}
```
The `:ro` suffix is what makes this safe: agents can read shared resources but cannot modify them. The security boundary is preserved: agents still can't access each other's workspaces, system files, or credentials. They just gain read access to the three files that represent institutional memory.
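To make the translation from config to container concrete, here is a hypothetical helper that assembles the equivalent `docker run` arguments: workspace mounted read-write, shared knowledge mounted read-only. The paths are illustrative.

```python
def docker_run_args(image: str, workspace: str,
                    shared_binds: list[tuple[str, str]]) -> list[str]:
    """Assemble docker run args: workspace read-write, shared knowledge read-only."""
    args = ["docker", "run", "--rm", "-v", f"{workspace}:/workspace:rw"]
    for host_path, container_path in shared_binds:
        args += ["-v", f"{host_path}:{container_path}:ro"]   # :ro preserves the boundary
    return args + [image]

args = docker_run_args(
    "agent-sandbox",
    "/srv/agents/writer",
    [
        ("/srv/shared/knowledge", "/knowledge"),
        ("/srv/shared/REGRESSIONS.md", "/workspace/REGRESSIONS.md"),
    ],
)
assert "/srv/shared/knowledge:/knowledge:ro" in args
assert "/srv/agents/writer:/workspace:rw" in args
```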
The meta-lesson: After every security improvement, test agent access to shared resources. "Can Agent X query the knowledge base? Can it read the regressions list? Can it see active tasks?" If the answer is no, your security improvement just created a knowledge silo.
The goal is secure isolation with shared institutional memory β not secure isolation with amnesia.
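That post-change check is easy to automate as a smoke test run from inside each agent's sandbox after every security change. A sketch, with the container paths from the bind-mount config as illustrative defaults:

```python
import os

# The three shared resources every agent should still be able to read.
SHARED_RESOURCES = {
    "knowledge base": "/knowledge",
    "task registry": "/workspace/tasks.json",
    "regressions list": "/workspace/REGRESSIONS.md",
}

def shared_access_report(resources: dict[str, str]) -> dict[str, bool]:
    """For each shared resource, can this agent still read it?"""
    return {name: os.access(path, os.R_OK) for name, path in resources.items()}

report = shared_access_report(SHARED_RESOURCES)
silos = [name for name, readable in report.items() if not readable]
if silos:
    print("knowledge silo created:", ", ".join(silos))
```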
Network Hardening (The 2-Minute Wins)
Beyond the 5 application layers, there are two infrastructure hardening steps that take 2 minutes each and close attack vectors permanently:
UFW (Firewall):
```bash
sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw allow in on tailscale0
sudo ufw enable
```
Fail2ban (Brute Force Protection):
```ini
[sshd]
enabled = true
maxretry = 3
bantime = 86400   # 24-hour ban
```
Within the first hour of enabling Fail2ban, it logged 4 blocked IPs attempting SSH brute force. Automated scanners had been hitting our SSH port continuously; we just never noticed because SSH key authentication was silently rejecting them. Now they get banned after 3 attempts instead of trying indefinitely.
The Security Stack at a Glance
| Layer | What It Does | What It Catches |
|---|---|---|
| 1. Shell Allowlist | Only ~60 approved commands | Destructive commands, unauthorized tools |
| 2. Filesystem Isolation | Agents see only their workspace | Credential exfiltration, cross-agent snooping |
| 3. Docker Sandbox | Hard container boundary | Config bypass, env leaks, host access |
| 4. Approval Gates | Human confirms unknowns | Novel commands, unexpected behaviors |
| 5. Daily Audit | Morning verification of all layers | Security drift, config regression, new ports |
| Bonus: Bind Mounts | Shared knowledge, read-only | The access-vs-security trap |
Each layer is independently verifiable. Each was born from a specific vulnerability. Together, they create defense in depth: compromise one layer, and four others still hold.
What We Built From This
These 5 layers secure 31 agents running across our AI startup studio. We packaged the same security architecture into Mr.Chief, an AI Chief of Staff that lives in your messages.
GDPR-native. EU-hosted. Not retrofitted: built this way from day one.
Because an AI assistant that can read your email should have better security than your email.
This article is part of a series on production AI agent architecture. See also: How to Run 31 AI Agents in Production, Why Your AI Agent Has Amnesia, The Real Cost of Running AI Agents.