Founder

52 Skills Installed — Which Ones Actually Get Used? The Agent Audit

14% context token savings | Productivity & Security | 5 min read

Key Takeaway

Running a skill audit across our 52 installed Mr.Chief skills revealed 8 dead-weight skills to prune, 3 outdated versions to update, and 2 custom skills worth publishing to ClawHub.

The Problem

Mr.Chief skills are powerful. Each one gives the agent a new capability — web scraping, financial analysis, email outreach, competitive intel. So I installed them aggressively. Every skill that looked useful got installed. Over 6 months, I accumulated 52 skills.

The problem with 52 skills: context window pollution.

Every installed skill adds to the agent's available context. Skill descriptions, capability summaries, and parameter schemas all consume tokens. At 52 skills, the agent was spending roughly 18,000 tokens of context on every turn just knowing what it could do, before doing anything.

Worse: some skills overlapped. I had three different ways to search the web, two approaches to email drafting, and a LinkedIn scraper I'd literally never used. The agent occasionally picked the wrong skill for a task because the descriptions were too similar.

Skills are like apps on your phone. You install with optimism, use 30% regularly, and let the rest rot. Except these apps cost tokens every single turn.

The Solution

Alfrawd runs a skill audit. Check every installed skill against actual usage data. Score by frequency, recency, context cost, and value delivered. Prune what's dead. Update what's stale. Publish what's novel.

The ClawHub skill handles the marketplace operations — search, install, update, publish. The agent handles the judgment: what stays, what goes, what's worth sharing.

The Process

Step 1: Inventory and usage analysis

# List all installed skills
mrchief skills check

# Output:
# 52 skills installed
# Last audit: never
# Context budget: ~18,000 tokens on skill descriptions alone

The agent cross-references with session logs to build a usage profile:

# Agent analyzes session transcripts for skill invocations
# Builds usage matrix:

Skill Usage Report (Last 30 Days):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill                    | Uses | Last Used  | Tokens
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
google-workspace         |  142 | today      |    380
notion                   |   89 | today      |    420
excel                    |   67 | yesterday  |    350
reddit-scraper           |   45 | 3 days ago |    290
weather                  |   38 | today      |    180
session-logs             |   34 | today      |    250
...
linkedin-influencer      |    0 | never      |    340
programmatic-seo-spy     |    0 | never      |    380
customer-win-back        |    0 | never      |    290
messaging-ab-tester      |    0 | 45 days    |    310
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
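
A usage matrix like the one above could be built along these lines. This is a minimal sketch, not the agent's actual method: it assumes session logs are JSON Lines with hypothetical `skill` and `timestamp` fields per invocation, which may not match the real Mr.Chief log format, and the sample log entries are made up for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timedelta


def build_usage(log_lines, now, window_days=30):
    """Count invocations per skill inside the window and track last use.

    Assumes each line is a JSON object with hypothetical keys
    "skill" (str) and "timestamp" (ISO 8601 string).
    """
    cutoff = now - timedelta(days=window_days)
    uses = Counter()
    last_used = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        name = event["skill"]
        ts = datetime.fromisoformat(event["timestamp"])
        # Remember the most recent use even when it falls outside the window
        if name not in last_used or ts > last_used[name]:
            last_used[name] = ts
        if ts >= cutoff:
            uses[name] += 1
    return uses, last_used


def usage_rows(installed, uses, last_used, now):
    """One row per installed skill, so never-used skills appear with 0 uses."""
    rows = []
    for name in installed:
        seen = last_used.get(name)
        days = (now - seen).days if seen else None  # None means never used
        rows.append((name, uses.get(name, 0), days))
    # Most-used skills first; ties keep their installed order
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Made-up sample data, for illustration only
sample_log = [
    '{"skill": "notion", "timestamp": "2025-01-30T09:00:00"}',
    '{"skill": "notion", "timestamp": "2025-01-29T14:30:00"}',
    '{"skill": "messaging-ab-tester", "timestamp": "2024-12-16T10:00:00"}',
]
now = datetime(2025, 1, 30, 12, 0, 0)
installed = ["notion", "messaging-ab-tester", "weather"]
uses, last_used = build_usage(sample_log, now)
for row in usage_rows(installed, uses, last_used, now):
    print(row)
```

Iterating over the installed list rather than the log is the important design choice: a skill that never appears in the logs still gets a row, which is exactly the case the audit is looking for.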

Step 2: Pruning decision logic

The agent applies a scoring framework:

# Skill value score (roughly 0-100; the context penalty can push unused skills below 0)
def score_skill(skill):
    frequency_score = min(skill.uses_30d * 2, 40)      # Max 40 points, reached at 20 uses
    # Assumption: days_since_use is None for a skill that was never used
    days = skill.days_since_use
    recency_score = 0 if days is None else max(0, 30 - days)  # Max 30 points
    context_cost = -skill.token_cost / 50                # Penalty: 1 point per 50 tokens
    value_signal = skill.user_satisfaction * 10           # From feedback signals

    return frequency_score + recency_score + context_cost + value_signal

# Decision thresholds
# Score > 50: Keep
# Score 20-50: Review (might be seasonal)
# Score < 20: Prune candidate
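
To see how the thresholds play out, here is a small worked example that applies the same formula to four skills from the usage report. The `Skill` dataclass, the `decide` helper, the use of `None` for never-used skills, and the satisfaction values (on an assumed 0-3 scale) are all hypothetical; the article does not say how satisfaction is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    uses_30d: int
    days_since_use: Optional[int]  # None means never used (assumption)
    token_cost: int
    user_satisfaction: float  # assumed 0-3 scale, hypothetical


def score_skill(skill: Skill) -> float:
    frequency_score = min(skill.uses_30d * 2, 40)
    days = skill.days_since_use
    recency_score = 0 if days is None else max(0, 30 - days)
    context_cost = -skill.token_cost / 50
    value_signal = skill.user_satisfaction * 10
    return frequency_score + recency_score + context_cost + value_signal


def decide(score: float) -> str:
    """Map a score onto the keep / review / prune thresholds."""
    if score > 50:
        return "keep"
    if score >= 20:
        return "review"
    return "prune"


# Uses, recency, and token costs come from the usage report;
# satisfaction values are invented for illustration.
skills = [
    Skill("google-workspace", 142, 0, 380, 3.0),
    Skill("reddit-scraper", 45, 3, 290, 2.0),
    Skill("messaging-ab-tester", 0, 45, 310, 0.0),
    Skill("linkedin-influencer", 0, None, 340, 0.0),
]
for s in skills:
    total = score_skill(s)
    print(f"{s.name:22} {total:6.1f}  {decide(total)}")
```

With these made-up satisfaction values, google-workspace scores 92.4 and both zero-use skills land below zero, because the context penalty is the only term left. That is why the score is better read as "roughly 0-100" than as a strict range.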

Results:

PRUNE (score < 20, unused 30+ days):
  ❌ linkedin-influencer-discovery (0 uses, 340 tokens)
  ❌ programmatic-seo-spy (0 uses, 380 tokens)
  ❌ customer-win-back-sequencer (0 uses, 290 tokens)
  ❌ messaging-ab-tester (0 uses, 310 tokens)
  ❌ champion-tracker (0 uses, 260 tokens)
  ❌ serp-feature-sniper (0 uses, 350 tokens)
  ❌ trending-ad-hook-spotter (0 uses, 280 tokens)
  ❌ tech-stack-teardown (0 uses, 320 tokens)

SAVINGS: ~2,530 tokens per turn (8 skills × avg 316 tokens)

UPDATE (newer versions available on ClawHub):
  🔄 excel v1.2.0 → v1.4.1 (new pivot table support)
  🔄 reddit-scraper v2.0.0 → v2.1.3 (rate limit fixes)
  🔄 weather v1.0.0 → v1.1.0 (Open-Meteo fallback)
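
The savings figure can be checked with a few lines of arithmetic. The token counts below are copied from the prune list; the 18,000-token starting budget is the approximate figure reported by the first skills check, so the percentage is approximate too.

```python
# Token cost of each pruned skill, as listed in the prune results
pruned = {
    "linkedin-influencer-discovery": 340,
    "programmatic-seo-spy": 380,
    "customer-win-back-sequencer": 290,
    "messaging-ab-tester": 310,
    "champion-tracker": 260,
    "serp-feature-sniper": 350,
    "trending-ad-hook-spotter": 280,
    "tech-stack-teardown": 320,
}

budget_before = 18_000  # approximate, from the initial skills check

saved = sum(pruned.values())
average = saved / len(pruned)
budget_after = budget_before - saved
percent = saved / budget_before * 100

print(f"saved {saved} tokens per turn (avg {average:.0f} per skill)")
print(f"budget {budget_before} -> {budget_after} ({percent:.1f}% smaller)")
```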

Step 3: Execute

# Prune unused skills
clawhub uninstall linkedin-influencer-discovery
clawhub uninstall programmatic-seo-spy
clawhub uninstall customer-win-back-sequencer
# ... (5 more)

# Update outdated skills
clawhub update excel
clawhub update reddit-scraper
clawhub update weather

# Verify
mrchief skills check
# 44 skills installed
# Context budget: ~15,470 tokens (-2,530 saved)
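
For a longer prune list, the same commands can be generated rather than typed. The sketch below assumes the `clawhub uninstall` and `clawhub update` subcommands behave as shown above; `plan_commands` and `run_plan` are hypothetical helper names, and the default is a dry run that only prints what would be executed.

```python
import shlex
import subprocess

PRUNE = [
    "linkedin-influencer-discovery",
    "programmatic-seo-spy",
    "customer-win-back-sequencer",
    "messaging-ab-tester",
    "champion-tracker",
    "serp-feature-sniper",
    "trending-ad-hook-spotter",
    "tech-stack-teardown",
]
UPDATE = ["excel", "reddit-scraper", "weather"]


def plan_commands(prune, update):
    """Build the argv list for each clawhub call, uninstalls first."""
    commands = [["clawhub", "uninstall", name] for name in prune]
    commands += [["clawhub", "update", name] for name in update]
    return commands


def run_plan(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the run on the first failing command
            subprocess.run(argv, check=True)


commands = plan_commands(PRUNE, UPDATE)
run_plan(commands)  # dry run: prints the commands, executes none
```

Reviewing the printed plan before calling `run_plan(commands, dry_run=False)` keeps a human in the loop for the destructive step.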

Step 4: Publish custom skills

Two skills we built internally proved general enough to share:

# Publish PyratzLabs custom skills to ClawHub
clawhub publish skills/pyratz-compliance-checker \
  --name "eu-crypto-compliance" \
  --description "MiCA and EU regulatory compliance checker for crypto projects"

clawhub publish skills/pyratz-investor-pipeline \
  --name "investor-pipeline-tracker" \
  --description "Track investor conversations, term sheets, and follow-ups"

Both published. Both available for the community.

The Results

Metric	Before Audit	After Audit
Skills installed	52	44
Context tokens on skills	~18,000	~15,470
Token savings per turn	—	2,530 (~14%)
Outdated skills	3	0
Unused skills (30d)	8	0
Skills published to ClawHub	0	2
Skill selection accuracy	~85%	~94%

The skill selection accuracy improvement was unexpected. With fewer, more distinct skills, the agent picks the right tool more often. Less confusion, fewer retries.

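The 14% context savings compound across every turn, every conversation, every day. At 2,530 tokens per turn, roughly 400 turns add up to a million tokens, so a busy agent avoids millions of tokens a month that would otherwise be spent on skills that were never invoked.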
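
How quickly the savings add up depends on how busy the agent is. The sketch below projects monthly savings from the 2,530 tokens saved per turn; the turns-per-day values are hypothetical, since the article does not report turn volume.

```python
SAVED_PER_TURN = 2_530  # tokens saved per turn, from the audit results


def monthly_savings(turns_per_day, days=30):
    """Tokens saved over a month at a given daily turn volume."""
    return SAVED_PER_TURN * turns_per_day * days


def turns_to_save(target_tokens):
    """Smallest number of turns whose savings reach the target (ceiling division)."""
    return -(-target_tokens // SAVED_PER_TURN)


for turns in (10, 50, 200):  # hypothetical daily volumes
    print(f"{turns:>4} turns/day -> {monthly_savings(turns):>12,} tokens/month")

print(f"turns needed to save 1M tokens: {turns_to_save(1_000_000)}")
```

At 10 turns a day the monthly saving stays under a million tokens; it crosses into the millions somewhere above 13 turns a day (396 turns over 30 days).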

Try It Yourself

1. Run mrchief skills check to see your current skill inventory
2. Cross-reference with actual usage from session logs
3. Score each skill: frequency + recency + value delivered − context cost
4. Prune anything unused for 30+ days (you can always reinstall)
5. Check ClawHub for updates: clawhub search [skill-name]
6. If you've built something useful, publish it: clawhub publish

Schedule the audit quarterly. Skills accumulate like browser tabs — you don't notice the bloat until you clear it.

Every installed skill is a standing cost. Audit ruthlessly. Your context window will thank you.
