How to Secure 31 AI Agents Without Lobotomizing Them
The hidden trap of AI agent security: every layer you add risks cutting agents off from the knowledge that makes them useful.
The Moment We Realized Our AI Could Read Every API Key We Own
Our threat model exercise started as a checkbox. It ended with a cold sweat.
We mapped every agent's permissions: what each of our 31 AI agents could read, write, and modify. That's when we found it: the master orchestrator agent had read/write access to `.env.studio`, the single file containing every API key in the entire operation. Banking keys. OAuth tokens. Payment processing credentials. Everything.
If a prompt injection attack succeeded (say, via a malicious email the agent was asked to summarize), every credential could be exfiltrated in one request.
That was six months ago. Since then, we've built 5 independent security layers. Not "best practices" borrowed from a blog post. Not theoretical frameworks. Battle-tested layers, each born from a specific vulnerability we discovered the hard way.
Here's each one, why it matters, and the trap that almost made us undo all of it.
Layer 1: Shell Command Allowlisting
The problem: An agent with unrestricted shell access can `rm -rf /`, exfiltrate data via `curl`, install backdoors, or modify system files. When you're running 30 autonomous agents with shell access, you have 30 potential vectors for catastrophic commands.
The fix: Non-main agents can only execute commands from an approved list of ~60 binaries. Anything not on the list requires explicit human approval.
```json
{
  "exec": {
    "security": "allowlist",
    "ask": "on-miss",
    "safeBins": ["cat", "ls", "grep", "git", "python3", "node", "curl", "jq", "tree"]
  }
}
```
What it caught: During a sub-agent test, a code agent attempted to run `pkill -9` to restart a process. Under unrestricted access, this would have succeeded, potentially killing critical services. With the allowlist, the command was blocked and escalated, and the task was rerouted to the DevOps sub-agent, which had the appropriate tooling.
The key: The allowlist isn't about blocking everything. It's about creating a known-good set and escalating everything else. Agents are creative: they'll construct commands you didn't anticipate. The approval gate catches the unknowns.
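The allowlist-plus-escalation flow above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: `request_approval` stands in for whatever channel surfaces the command to a human, and the set mirrors a few of the `safeBins` entries.

```python
import shlex

# A few of the known-good binaries from the allowlist config above.
SAFE_BINS = {"cat", "ls", "grep", "git", "python3", "node", "curl", "jq", "tree"}

def gate_command(command: str, request_approval) -> bool:
    """Allow known-good binaries; escalate everything else to a human."""
    binary = shlex.split(command)[0]
    if binary in SAFE_BINS:
        return True                      # known-good: runs without friction
    return request_approval(command)     # on-miss: pause and ask a human

# A command outside the allowlist is never auto-approved:
assert gate_command("ls -la /workspace", lambda cmd: False) is True
assert gate_command("pkill -9 nginx", lambda cmd: False) is False
```

Note that a real gate must parse the full shell grammar, not just the first token: `ls; rm -rf /` starts with an allowlisted binary but hides a second command behind the separator.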
Layer 2: Filesystem Isolation
The problem: Any agent could read `.env.studio`, the file containing every API key. A successful prompt injection via a malicious email could exfiltrate every credential in one request.
The fix: Non-main agents are restricted to their own workspace directory. They cannot read `/etc/shadow`, other agents' configs, `.env` files, or system paths.
```json
{
  "workspaceOnly": true
}
```
Each agent's blast radius is proportional to its workspace, not the entire server. The master agent retains unrestricted filesystem access because it needs cross-workspace visibility for orchestration, but it's the only one.
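At its core, `workspaceOnly` enforcement is a path-containment check. A rough sketch, assuming illustrative paths: the key detail is resolving the path before comparing, so `../` traversal and symlink tricks can't escape the workspace.

```python
from pathlib import Path

def is_within_workspace(requested: str, workspace: str) -> bool:
    """True only if the fully resolved path stays inside the workspace."""
    target = Path(workspace, requested).resolve()
    root = Path(workspace).resolve()
    return target == root or root in target.parents

# Normal workspace file: allowed.
assert is_within_workspace("notes/plan.md", "/srv/agents/writer") is True
# Traversal toward the shared secrets file: denied.
assert is_within_workspace("../../.env.studio", "/srv/agents/writer") is False
# Absolute paths outside the workspace: denied.
assert is_within_workspace("/etc/shadow", "/srv/agents/writer") is False
```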
What it caught: The Anthropic Agent SDK spawns a subprocess that inherits the entire `process.env`, exposing Qonto banking keys, gateway tokens, Google OAuth credentials, and xAI API keys. We saw the leak in debug traces. Filesystem isolation plus env sanitization closed this vector permanently.
Layer 3: Docker Sandboxing
The problem: Layers 1 and 2 are configuration-based: they rely on Mr.Chief enforcing the rules. But configuration can be bypassed. A sufficiently creative agent (or a prompt injection attack) might find a path around config restrictions.
The fix: Every non-main agent session runs inside a Docker container. The container has access to the agent's workspace directory and internet connectivity, but physically cannot see the host filesystem, other agents' workspaces, environment variables, or system files.
```json
{
  "sandbox": {
    "mode": "non-main",
    "scope": "agent",
    "workspaceAccess": "rw",
    "docker": { "network": "bridge" }
  }
}
```
This is the difference between a rule that says "don't open that door" and a wall where no door exists.
What it caught: The SDK environment leak from Layer 2? With Docker sandboxing, that subprocess would have inherited an empty environment. The credentials never existed inside the container. Zero exposure, regardless of what any SDK does internally.
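The same principle applies on the host side: never let a child process inherit the full environment in the first place. A sketch of env sanitization, assuming the allowed variable names and the fake secret are illustrative:

```python
import os
import subprocess

# Pass through only what the tool actually needs to run.
ALLOWED_ENV = {"PATH", "HOME", "LANG"}

def spawn_sanitized(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a subprocess with a minimal environment: secrets are never inherited."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED_ENV}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Even if the parent process holds a secret, the child never sees it.
os.environ["FAKE_BANKING_KEY"] = "sk-test-123"   # illustrative, not a real key
result = spawn_sanitized(
    ["python3", "-c", "import os; print('FAKE_BANKING_KEY' in os.environ)"]
)
print(result.stdout.strip())  # False
```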
Layer 4: Approval Gates for Unknown Commands
The problem: The allowlist (Layer 1) handles known-good commands. But agents construct novel commands you can't anticipate. The unknown-unknowns are the real threat.
The fix: Any command not in the allowlist triggers a human approval request. The agent pauses, surfaces the exact command, and waits for explicit confirmation.
```json
{
  "ask": "on-miss"
}
```
What it caught: An investment sub-agent tried to use `ssh` to tunnel to an external service for real-time data. The command was technically valid and non-destructive, but it wasn't in the allowlist. The approval gate caught it: we reviewed the command (it was fine) and approved it. Without the gate, we'd never have known the agent was making outbound SSH connections.
The gate has taught us more about our agents' actual behavior than any logging system. When you see what commands agents try to run, you learn what they're actually doing versus what you think they're doing.
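Treating the gate as telemetry is cheap: record every escalated command and the log becomes a behavioral profile of each agent. A minimal sketch (an in-memory list stands in for whatever append-only log you use in production):

```python
from collections import Counter
from datetime import datetime, timezone

GATE_LOG: list[dict] = []   # in production, an append-only log file or table

def record_escalation(agent: str, command: str) -> None:
    """Log every command that missed the allowlist, whether or not it was approved."""
    GATE_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "command": command,
    })

def behavior_profile(agent: str) -> Counter:
    """What binaries does this agent actually try to run?"""
    return Counter(e["command"].split()[0] for e in GATE_LOG if e["agent"] == agent)

record_escalation("investment", "ssh -L 8080:feed.example.com:443 relay")
record_escalation("investment", "ssh -L 8080:feed.example.com:443 relay")
print(behavior_profile("investment"))  # Counter({'ssh': 2})
```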
Layer 5: Automated Daily Security Audit
The problem: Security hardening is a point-in-time effort. A config change, a package update, a service restart β any of these can silently undo your security setup.
The fix: A cron job runs every morning at 7am, checking:
- UFW (firewall) active and configured
- Fail2ban running and blocking brute force
- Disk usage below threshold
- Mr.Chief version current
- Auth profiles healthy
- No unexpected open ports
- Failed SSH attempts in last 24h
Reports only issues. Silent when everything is clean.
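A pared-down sketch of what such an audit can look like: each check returns a problem string or `None`, and the report surfaces only the failures. The thresholds and checks here are illustrative, not our exact script.

```python
import shutil
import socket

def check_disk(threshold_pct: int = 80):
    """Flag the root filesystem if usage crosses the threshold."""
    usage = shutil.disk_usage("/")
    pct = usage.used * 100 // usage.total
    return f"disk at {pct}% (threshold {threshold_pct}%)" if pct >= threshold_pct else None

def check_port_closed(port: int):
    """Flag a port that should never be reachable, e.g. the Docker API on 2375."""
    with socket.socket() as s:
        s.settimeout(0.5)
        is_open = s.connect_ex(("127.0.0.1", port)) == 0
    return f"unexpected open port {port}" if is_open else None

def run_audit():
    findings = [f for f in (check_disk(), check_port_closed(2375)) if f]
    return findings or None   # None means silent: everything is clean

# Cron sends the report only when run_audit() returns findings.
```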
What it caught: The morning after installing Docker, the audit flagged that Docker had opened port 2375 (Docker daemon API) on the network interface. This was an unintended side effect β the Docker API was accessible from the local network. The audit caught it before anyone could exploit it.
The principle: Every security improvement you add can be undone by the next change you make. The daily audit catches drift automatically. It's the difference between "we hardened our server once" and "our server is verified secure every morning."
The Trap: Security vs. Access (The Lesson That Almost Killed Our System)
Here's what nobody tells you about AI agent security: every layer you add creates a new isolation boundary. Each boundary is correct in isolation. But the compound effect can silently cut agents off from the shared context that makes them useful.
We implemented Docker sandboxing, filesystem isolation, and shell allowlists, and felt great about it. Then one of our agents told its human: "I don't know what we're talking about."
The agent wasn't broken. It was perfectly secure. And perfectly amnesiac.
The sandbox had cut it off from:
- The RAG knowledge base (institutional memory)
- The task registry (what everyone's working on)
- The regressions list (mistakes to never repeat)
From the agent's perspective, it had never learned any of the things the team had collectively learned. It was producing work that ignored all of that shared context, because from inside its container, none of it existed.
The fix: Read-only bind mounts.
```json
{
  "sandbox": {
    "docker": {
      "binds": [
        "/path/to/knowledge:/knowledge:ro",
        "/path/to/tasks.json:/workspace/tasks.json:ro",
        "/path/to/REGRESSIONS.md:/workspace/REGRESSIONS.md:ro"
      ]
    }
  }
}
```
The `:ro` suffix is what makes this safe: agents can read shared resources but cannot modify them. The security boundary is preserved: agents still can't access each other's workspaces, system files, or credentials. They just gain read access to the three files that represent institutional memory.
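To make the translation from config to container concrete, here is a hypothetical helper that assembles the equivalent `docker run` arguments: workspace mounted read-write, shared knowledge mounted read-only. The paths are illustrative.

```python
def docker_run_args(image: str, workspace: str,
                    shared_binds: list[tuple[str, str]]) -> list[str]:
    """Assemble docker run args: workspace read-write, shared knowledge read-only."""
    args = ["docker", "run", "--rm", "-v", f"{workspace}:/workspace:rw"]
    for host_path, container_path in shared_binds:
        args += ["-v", f"{host_path}:{container_path}:ro"]   # :ro preserves the boundary
    return args + [image]

args = docker_run_args(
    "agent-sandbox",
    "/srv/agents/writer",
    [
        ("/srv/shared/knowledge", "/knowledge"),
        ("/srv/shared/REGRESSIONS.md", "/workspace/REGRESSIONS.md"),
    ],
)
assert "/srv/shared/knowledge:/knowledge:ro" in args
assert "/srv/agents/writer:/workspace:rw" in args
```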
The meta-lesson: After every security improvement, test agent access to shared resources. "Can Agent X query the knowledge base? Can it read the regressions list? Can it see active tasks?" If the answer is no, your security improvement just created a knowledge silo.
The goal is secure isolation with shared institutional memory β not secure isolation with amnesia.
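That post-change check is easy to automate as a smoke test run from inside each agent's sandbox after every security change. A sketch, with the container paths from the bind-mount config as illustrative defaults:

```python
import os

# The three shared resources every agent should still be able to read.
SHARED_RESOURCES = {
    "knowledge base": "/knowledge",
    "task registry": "/workspace/tasks.json",
    "regressions list": "/workspace/REGRESSIONS.md",
}

def shared_access_report(resources: dict[str, str]) -> dict[str, bool]:
    """For each shared resource, can this agent still read it?"""
    return {name: os.access(path, os.R_OK) for name, path in resources.items()}

report = shared_access_report(SHARED_RESOURCES)
silos = [name for name, readable in report.items() if not readable]
if silos:
    print("knowledge silo created:", ", ".join(silos))
```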
Network Hardening (The 2-Minute Wins)
Beyond the 5 application layers, there are two infrastructure hardening steps that take 2 minutes each and close attack vectors permanently:
UFW (Firewall):
```bash
sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw allow in on tailscale0
sudo ufw enable
```
Fail2ban (Brute Force Protection):
```ini
[sshd]
enabled = true
maxretry = 3
bantime = 86400   # 24-hour ban
```
Within the first hour of enabling Fail2ban, it logged 4 blocked IPs attempting SSH brute force. Automated scanners had been hitting our SSH port continuously; we just never noticed because SSH key authentication was silently rejecting them. Now they get banned after 3 attempts instead of trying indefinitely.
The Security Stack at a Glance
| Layer | What It Does | What It Catches |
|---|---|---|
| 1. Shell Allowlist | Only ~60 approved commands | Destructive commands, unauthorized tools |
| 2. Filesystem Isolation | Agents see only their workspace | Credential exfiltration, cross-agent snooping |
| 3. Docker Sandbox | Hard container boundary | Config bypass, env leaks, host access |
| 4. Approval Gates | Human confirms unknowns | Novel commands, unexpected behaviors |
| 5. Daily Audit | Morning verification of all layers | Security drift, config regression, new ports |
| Bonus: Bind Mounts | Shared knowledge, read-only | The access-vs-security trap |
Each layer is independently verifiable. Each was born from a specific vulnerability. Together, they create defense in depth: compromise one layer, and four others still hold.
What We Built From This
These 5 layers secure 31 agents running across our AI startup studio. We packaged the same security architecture into Mr.Chief, an AI Chief of Staff that lives in your messages.
GDPR-native. EU-hosted. Not retrofitted: built this way from day one.
Because an AI assistant that can read your email should have better security than your email.
This article is part of a series on production AI agent architecture. See also: How to Run 31 AI Agents in Production, Why Your AI Agent Has Amnesia, The Real Cost of Running AI Agents.