CTO

Scoring 12/13 on a Security Hardening Checklist β€” The Agent Did the Work

13/13 security scoreProductivity & Security4 min read

Key Takeaway

Johann Sathianathen published a 13-point Mr.Chief security hardening checklist β€” our AI agent scored 12/13 on the first automated pass and hit 13/13 within hours, with zero manual intervention after the initial run.

The Problem

Running AI agents on a public-facing server is a liability if you're not careful. These agents have shell access. They can execute code. They interact with APIs, databases, and the filesystem. One misconfiguration and you've given an LLM the keys to your infrastructure.

I knew our setup was solid β€” UFW, fail2ban, Docker sandboxing, the works. But "I think it's secure" isn't the same as "I've verified it's secure against a published standard."

Then Johann Sathianathen published his Mr.Chief security hardening guide. Thirteen specific requirements. A checklist with clear pass/fail criteria. Exactly what I needed: an external standard to measure against.

The question was: do I spend an afternoon manually checking each item, or do I let the agent do it?

The Solution

The Healthcheck skill in Mr.Chief can run a structured security audit against a defined checklist. I fed it Johann's 13 requirements and told it to check every single one. Report back pass or fail. Suggest fixes for anything that fails.

The Process

Here's the task I gave the agent:

View details
Run the full Johann Sathianathen security hardening checklist.
For each of the 13 items, check current system state, report
PASS or FAIL, and provide the fix command for any failures.

The agent systematically worked through every item. Here's the full checklist with results from the first run:

View details
Johann Sathianathen Mr.Chief Security Checklist
================================================

 #  β”‚ Requirement                          β”‚ Status β”‚ Detail
────┼──────────────────────────────────────┼────────┼──────────────────────────
 1  β”‚ UFW enabled, default deny inbound    β”‚ βœ… PASS β”‚ Default: deny (incoming)
 2  β”‚ UFW allow SSH (port 22)              β”‚ βœ… PASS β”‚ 22/tcp ALLOW IN
 3  β”‚ UFW allow Tailscale (41641/udp)      β”‚ βœ… PASS β”‚ 41641/udp ALLOW IN
 4  β”‚ fail2ban installed and active        β”‚ βœ… PASS β”‚ Active: active (running)
 5  β”‚ fail2ban SSH jail: 3 retries, 24h banβ”‚ βœ… PASS β”‚ maxretry=3, bantime=86400
 6  β”‚ SSH: root login disabled             β”‚ βœ… PASS β”‚ PermitRootLogin no
 7  β”‚ SSH: password authentication off     β”‚ βœ… PASS β”‚ PasswordAuthentication no
 8  β”‚ Unattended-upgrades enabled          β”‚ βœ… PASS β”‚ APT::Periodic::Unattended-Upgrade "1"
 9  β”‚ Docker sandbox for non-main agents   β”‚ βœ… PASS β”‚ sandbox: require (config.yaml)
10  β”‚ Shell allowlist configured           β”‚ βœ… PASS β”‚ 63 binaries in allowlist
11  β”‚ Filesystem isolation (workspace only)β”‚ βœ… PASS β”‚ Agent workspace mounted read-write
12  β”‚ Gateway bound to localhost/Tailscale β”‚ ❌ FAIL β”‚ Bound to 0.0.0.0:18789
13  β”‚ API authentication enabled           β”‚ βœ… PASS β”‚ Bearer token required

Score: 12/13

One failure. Item 12: the gateway was bound to 0.0.0.0 β€” listening on all interfaces instead of just localhost or the Tailscale interface.

The agent flagged it and provided the fix:

yamlShow code
# config.yaml β€” before
gateway:
  port: 18789

# config.yaml β€” after
gateway:
  host: 100.86.123.3  # Tailscale IP only
  port: 18789
bashShow code
# Apply the fix
mrchief gateway restart

After the restart, the agent re-ran the check:

View details
12  β”‚ Gateway bound to localhost/Tailscale β”‚ βœ… PASS β”‚ Bound to 100.86.123.3:18789

Score: 13/13 βœ…

Total elapsed time: under two hours from first run to perfect score. The fix itself took 30 seconds. The rest was the agent methodically verifying every item.

The Results

13

Checklist items

12/13 (92%)

First-pass score

13/13 (100%)

Final score (after fix)

1 (gateway binding)

Critical issues found

~8 minutes

Time to complete audit

~2 minutes

Time to fix + re-verify

Reviewed agent's report, approved fix

Manual effort

The gateway binding issue was real. Before the fix, if someone discovered port 18789, they could potentially reach the gateway from outside the Tailscale VPN. UFW would block it β€” so it wasn't an active vulnerability β€” but it was defense-in-depth failure. The agent caught it. I didn't.

Try It Yourself

Any Mr.Chief user can run this same checklist:

bashShow code
# Run the full security hardening audit
mrchief run --task "Run Johann Sathianathen's 13-point Mr.Chief security
hardening checklist. Check each item, report PASS/FAIL, and suggest fixes
for any failures."

# Or set it up as a weekly cron
mrchief cron add --schedule "0 8 * * 1" \
  --task "Run the full 13-point security hardening checklist and report to Telegram." \
  --channel telegram

Your server will tell you exactly where it stands. Every week. No excuses.


12/13 felt good. 13/13 felt right. The agent found the gap I missed.

Security ChecklistHardeningDockerSSHCompliance

Want results like these?

Start free with your own AI team. No credit card required.

Scoring 12/13 on a Security Hardening Checklist β€” The Agent Did the Work β€” Mr.Chief