CTO
Scoring 12/13 on a Security Hardening Checklist: The Agent Did the Work
Key Takeaway
Johann Sathianathen published a 13-point Mr.Chief security hardening checklist. Our AI agent scored 12/13 on the first automated pass and hit 13/13 within hours; my only manual work was reviewing its report and approving the fix.
The Problem
Running AI agents on a public-facing server is a liability if you're not careful. These agents have shell access. They can execute code. They interact with APIs, databases, and the filesystem. One misconfiguration and you've given an LLM the keys to your infrastructure.
I knew our setup was solid: UFW, fail2ban, Docker sandboxing, the works. But "I think it's secure" isn't the same as "I've verified it's secure against a published standard."
Then Johann Sathianathen published his Mr.Chief security hardening guide. Thirteen specific requirements. A checklist with clear pass/fail criteria. Exactly what I needed: an external standard to measure against.
The question was: do I spend an afternoon manually checking each item, or do I let the agent do it?
The Solution
The Healthcheck skill in Mr.Chief can run a structured security audit against a defined checklist. I fed it Johann's 13 requirements and told it to check every single one. Report back pass or fail. Suggest fixes for anything that fails.
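To make the idea of a structured audit concrete, here is a minimal Python sketch of how a checklist runner could be organized. It is not Mr.Chief's Healthcheck implementation: the names `CheckResult`, `ChecklistItem`, and `run_checklist` are hypothetical, and the two demo checks return hard-coded, illustrative results rather than inspecting a real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    passed: bool
    detail: str      # what was observed on the system
    fix: str = ""    # suggested remediation, only used on failure


@dataclass
class ChecklistItem:
    number: int
    requirement: str
    check: Callable[[], CheckResult]  # inspects state and reports a result


def run_checklist(items: List[ChecklistItem]) -> List[str]:
    """Run every item and return report lines, ending with a score line."""
    lines = []
    passed = 0
    for item in items:
        result = item.check()
        status = "PASS" if result.passed else "FAIL"
        passed += int(result.passed)
        lines.append(f"{item.number:>2} | {item.requirement} | {status} | {result.detail}")
        if not result.passed and result.fix:
            lines.append(f"     fix: {result.fix}")
    lines.append(f"Score: {passed}/{len(items)}")
    return lines


# Illustrative stubs only: real checks would read configs or query services.
DEMO_ITEMS = [
    ChecklistItem(1, "SSH: root login disabled",
                  lambda: CheckResult(True, "PermitRootLogin no")),
    ChecklistItem(2, "SSH: password authentication off",
                  lambda: CheckResult(False, "PasswordAuthentication yes",
                                      "set PasswordAuthentication no in sshd_config")),
]

if __name__ == "__main__":
    print("\n".join(run_checklist(DEMO_ITEMS)))
```

Keeping each requirement as a small function that returns both an observation and a suggested fix is one way to get the "report pass or fail, suggest fixes" behavior from a single pass over the list.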
The Process
Here's the task I gave the agent:
Run the full Johann Sathianathen security hardening checklist.
For each of the 13 items, check current system state, report
PASS or FAIL, and provide the fix command for any failures.
The agent systematically worked through every item. Here's the full checklist with results from the first run:
Johann Sathianathen Mr.Chief Security Checklist
================================================
 # │ Requirement                           │ Status │ Detail
───┼───────────────────────────────────────┼────────┼──────────────────────────────────────
 1 │ UFW enabled, default deny inbound     │ PASS   │ Default: deny (incoming)
 2 │ UFW allow SSH (port 22)               │ PASS   │ 22/tcp ALLOW IN
 3 │ UFW allow Tailscale (41641/udp)       │ PASS   │ 41641/udp ALLOW IN
 4 │ fail2ban installed and active         │ PASS   │ Active: active (running)
 5 │ fail2ban SSH jail: 3 retries, 24h ban │ PASS   │ maxretry=3, bantime=86400
 6 │ SSH: root login disabled              │ PASS   │ PermitRootLogin no
 7 │ SSH: password authentication off      │ PASS   │ PasswordAuthentication no
 8 │ Unattended-upgrades enabled           │ PASS   │ APT::Periodic::Unattended-Upgrade "1"
 9 │ Docker sandbox for non-main agents    │ PASS   │ sandbox: require (config.yaml)
10 │ Shell allowlist configured            │ PASS   │ 63 binaries in allowlist
11 │ Filesystem isolation (workspace only) │ PASS   │ Agent workspace mounted read-write
12 │ Gateway bound to localhost/Tailscale  │ FAIL   │ Bound to 0.0.0.0:18789
13 │ API authentication enabled            │ PASS   │ Bearer token required

Score: 12/13
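Several rows in this report come down to comparing a parsed config value against a threshold. As a hedged illustration of item 5, the sketch below reads INI-style jail settings with Python's `configparser` and checks them against "3 retries, 24h ban" (24 × 3600 = 86400 seconds). The function name `ssh_jail_ok`, the sample text, and the reading of the requirement as "at most 3 retries, at least 24 hours" are assumptions. It also only handles integer `bantime` values, not suffixed forms such as `24h`.

```python
import configparser

SAMPLE_JAIL = """
[sshd]
enabled = true
maxretry = 3
bantime = 86400
"""


def ssh_jail_ok(jail_text: str,
                max_retries: int = 3,
                min_ban_seconds: int = 24 * 3600) -> bool:
    """Return True if the [sshd] jail meets the retry and ban-time limits."""
    # Interpolation is disabled so '%' characters in values are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(jail_text)
    if not parser.has_section("sshd"):
        return False
    try:
        maxretry = parser.getint("sshd", "maxretry")
        bantime = parser.getint("sshd", "bantime")
    except (configparser.NoOptionError, ValueError):
        # Missing option, or a value this sketch cannot parse as an integer.
        return False
    return maxretry <= max_retries and bantime >= min_ban_seconds


if __name__ == "__main__":
    print("PASS" if ssh_jail_ok(SAMPLE_JAIL) else "FAIL")
```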
One failure. Item 12: the gateway was bound to 0.0.0.0, listening on all interfaces instead of just localhost or the Tailscale interface.
The agent flagged it and provided the fix:
# config.yaml (before)
gateway:
  port: 18789

# config.yaml (after)
gateway:
  host: 100.86.123.3  # Tailscale IP only
  port: 18789
# Apply the fix
mrchief gateway restart
After the restart, the agent re-ran the check:
12 │ Gateway bound to localhost/Tailscale  │ PASS   │ Bound to 100.86.123.3:18789

Score: 13/13
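The pass condition for item 12 can be written as a small predicate. This is a sketch, not the agent's actual check: the function name `gateway_bind_ok` is hypothetical, and it treats any IPv4 address in 100.64.0.0/10 as a Tailscale address. Tailscale assigns node addresses from that range, but it is shared carrier-grade NAT space, so treat this as a heuristic. IPv6 Tailscale addresses are deliberately not handled.

```python
import ipaddress

# Tailscale hands out IPv4 node addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_NET = ipaddress.ip_network("100.64.0.0/10")


def gateway_bind_ok(host: str) -> bool:
    """Return True if host is loopback or an address in the Tailscale IPv4 range."""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch treats other hostnames as a failure.
        return False
    if addr.is_unspecified:
        # 0.0.0.0 or :: means "listen on every interface".
        return False
    return addr.is_loopback or (addr.version == 4 and addr in TAILSCALE_NET)


if __name__ == "__main__":
    for host in ("0.0.0.0", "127.0.0.1", "100.86.123.3"):
        print(host, "PASS" if gateway_bind_ok(host) else "FAIL")
```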
Total elapsed time: under two hours from first run to perfect score. The fix itself took 30 seconds. The rest was the agent methodically verifying every item.
The Results
Checklist items: 13
First-pass score: 12/13 (92%)
Final score (after fix): 13/13 (100%)
Critical issues found: 1 (gateway binding)
Time to complete audit: ~8 minutes
Time to fix + re-verify: ~2 minutes
Manual effort: Reviewed agent's report, approved fix
The gateway binding issue was real. Before the fix, the gateway was listening on every interface, so only the firewall stood between port 18789 and traffic from outside the Tailscale VPN. UFW would block that traffic, so it wasn't an active vulnerability, but it was a defense-in-depth failure. The agent caught it. I didn't.
Try It Yourself
Any Mr.Chief user can run this same checklist:
# Run the full security hardening audit
mrchief run --task "Run Johann Sathianathen's 13-point Mr.Chief security
hardening checklist. Check each item, report PASS/FAIL, and suggest fixes
for any failures."
# Or set it up as a weekly cron
mrchief cron add --schedule "0 8 * * 1" \
  --task "Run the full 13-point security hardening checklist and report to Telegram." \
  --channel telegram
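The schedule string `0 8 * * 1` reads as standard five-field cron syntax: minute 0, hour 8, any day of the month, any month, day-of-week 1 (Monday). Assuming `mrchief cron` follows those conventions, which the source does not confirm, the Python sketch below shows when such a job would next fire. The helper name `next_monday_8am` is hypothetical, and it works in whatever timezone the `datetime` passed in represents.

```python
from datetime import datetime, timedelta


def next_monday_8am(now: datetime) -> datetime:
    """Return the first Monday 08:00 strictly after `now` (cron '0 8 * * 1')."""
    candidate = now.replace(hour=8, minute=0, second=0, microsecond=0)
    # Python numbers Monday as weekday 0; cron numbers it as 1.
    days_ahead = (0 - now.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # This week's Monday 08:00 has already passed.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_monday_8am(datetime.now()))
```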
Your server will tell you exactly where it stands. Every week. No excuses.
12/13 felt good. 13/13 felt right. The agent found the gap I missed.