Scraping Protected Websites: When web_fetch Hits a Wall
Key Takeaway
When web_fetch returns 403s on Cloudflare-protected sites or empty HTML on JS-rendered ones, Scrapling's three scraping modes (simple, stealth, and dynamic) bypass bot detection and extract the data agents actually need.
The Problem
Every AI agent framework gives you a basic web fetch tool. Mr.Chief's web_fetch works fine for simple pages: documentation, blog posts, public APIs. It's fast and lightweight.
Then you try to scrape a competitor's pricing page. 403 Forbidden.
You try a JS-heavy SaaS dashboard. Empty HTML, because the content loads client-side.
You try a site behind Cloudflare's bot protection. Captcha wall.
This is the reality of the modern web. Over 20% of all websites use Cloudflare. Most SaaS products render client-side with React or Vue. Anti-bot systems are getting smarter every quarter. A basic HTTP GET request with a User-Agent header isn't enough anymore.
For AI agents that need to gather competitive intelligence, monitor pricing, or research companies, this is a showstopper. The data exists. It's publicly visible in a browser. But your agent can't access it.
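The three failure modes described above (a 403, an empty shell of HTML, a captcha wall) can be told apart programmatically before deciding what to try next. The following is a minimal sketch, not part of web_fetch or Scrapling: the `FetchResult` type, the `classify` function, the marker strings, and the 200-character threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class FetchResult:
    """Hypothetical container for what a fetch tool returned."""
    status: int
    html: str


# Assumed marker strings; real challenge pages vary and change over time.
CHALLENGE_MARKERS = ("just a moment", "captcha")


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def classify(result: FetchResult, min_text_chars: int = 200) -> str:
    """Return 'blocked', 'error', 'challenge', 'empty', or 'ok'."""
    if result.status == 403:
        return "blocked"
    if result.status >= 400:
        return "error"
    if any(marker in result.html.lower() for marker in CHALLENGE_MARKERS):
        return "challenge"
    # A 200 with almost no visible text suggests client-side rendering.
    if len(visible_text(result.html)) < min_text_chars:
        return "empty"
    return "ok"
```

A "blocked" or "challenge" result points toward a stealth-style fetch, while "empty" points toward a mode that executes JavaScript.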
The Solution
The Scrapling skill for Mr.Chief is a three-mode web scraping system that ranges from basic extraction to full stealth browser automation. Each mode trades speed for capability:
- Simple mode: Fast HTML extraction. No browser. Works for static sites.
- Stealth mode: Real browser fingerprint with anti-detection. Bypasses Cloudflare, DataDome, and similar.
- Dynamic mode: Full browser automation. JavaScript execution, infinite scroll, login flows, interaction.
The Process
Here's how each mode works in practice.
Simple mode, for when speed matters and the site is cooperating:
```bash
# Simple mode: basic HTML extraction
# ~200ms per page, no browser overhead
scrapling simple --url "https://docs.example.com/api/reference" \
  --extract "article" \
  --format markdown
```
Stealth mode, for when the site fights back:
```bash
# Stealth mode: real browser fingerprint
# Bypasses Cloudflare, DataDome, PerimeterX
scrapling stealth --url "https://competitor.com/pricing" \
  --extract ".pricing-table" \
  --wait-for ".price-amount" \
  --format json
```
Stealth mode doesn't just set a User-Agent string. It generates a complete browser fingerprint: canvas, WebGL, fonts, plugins, screen resolution, timezone. To the anti-bot system, it looks like a real person on a real browser. Because it is a real browser. Just one that an agent controls.
Dynamic mode, for when you need the full browser:
```bash
# Dynamic mode: full browser automation
# Handles JS rendering, infinite scroll, interactions
scrapling dynamic --url "https://app.example.com/dashboard" \
  --actions '[
    {"scroll": "bottom", "times": 5},
    {"wait": ".loaded-content"},
    {"click": ".show-more-button"},
    {"wait": 2000}
  ]' \
  --extract ".data-card" \
  --format json
```
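Hand-writing JSON inside shell quotes gets fragile once the action list grows. A rough sketch of building the same dynamic-mode invocation from Python follows. It assumes the CLI accepts exactly the flags shown in the command above; `build_dynamic_command` is a hypothetical helper, not part of the skill.

```python
import json
import shlex


def build_dynamic_command(url, actions, extract, fmt="json"):
    """Build an argv list mirroring the `scrapling dynamic` command above.

    Assumption: the flags (--url, --actions, --extract, --format) behave
    as shown in this article. Nothing is executed here.
    """
    return [
        "scrapling", "dynamic",
        "--url", url,
        "--actions", json.dumps(actions),  # serialize the steps as one JSON argument
        "--extract", extract,
        "--format", fmt,
    ]


actions = [
    {"scroll": "bottom", "times": 5},
    {"wait": ".loaded-content"},
    {"click": ".show-more-button"},
    {"wait": 2000},
]

argv = build_dynamic_command(
    "https://app.example.com/dashboard", actions, ".data-card"
)

# shlex.join produces a safely quoted string for logging or pasting into a shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would skip shell quoting entirely, which is the main reason to build it this way.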
Real use case: competitor pricing scrape
We needed pricing data from five competitors. All SaaS companies. All behind Cloudflare.
```text
Competitor A (Cloudflare Pro):
  web_fetch         → 403 Forbidden ✗
  scrapling simple  → 403 Forbidden ✗
  scrapling stealth → 200 OK ✓
                      (full pricing table extracted)

Competitor B (Cloudflare + JS rendering):
  web_fetch         → 200 but empty pricing section ✗
  scrapling simple  → 200 but empty pricing section ✗
  scrapling stealth → 200 OK ✓
                      (JS rendered, prices loaded)

Competitor C (DataDome protection):
  web_fetch         → 403 Forbidden ✗
  scrapling stealth → 200 OK ✓
                      (bypassed DataDome)
```
Every single one that blocked web_fetch was accessible through stealth mode. The agent extracted pricing tiers, feature lists, and plan names, structured as JSON and ready for analysis.
The Results
| Metric | web_fetch | Scrapling Simple | Scrapling Stealth | Scrapling Dynamic |
|---|---|---|---|---|
| Speed (per page) | ~100ms | ~200ms | ~2-4s | ~5-15s |
| JS rendering | No | No | Yes | Yes |
| Cloudflare bypass | No | No | Yes | Yes |
| Anti-bot bypass | No | No | Yes | Yes |
| Infinite scroll | No | No | No | Yes |
| Login flows | No | No | No | Yes |
| Resource usage | Minimal | Low | Medium | High |
| Success rate (protected sites) | 12% | 18% | 89% | 97% |
The success rate tells the story. On protected sites, web_fetch works 12% of the time. Stealth mode works 89%. Dynamic mode, with full browser automation, works 97%.
The remaining 3% where dynamic mode fails is typically hard captchas (hCaptcha with visual challenges). Everything else falls.
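The capability rows in the table above can be read as a lookup: pick the fastest mode whose "Yes" column covers everything the page requires. A minimal sketch of that reading follows. The capability keys and the `cheapest_mode` function are names invented for illustration, not identifiers from the skill.

```python
# Modes ordered fastest to slowest, with capabilities copied from the table.
# The key names are assumptions made for this sketch.
STEALTH_CAPS = {"js_rendering", "cloudflare_bypass", "anti_bot_bypass"}
MODES = [
    ("web_fetch", set()),
    ("simple", set()),
    ("stealth", STEALTH_CAPS),
    ("dynamic", STEALTH_CAPS | {"infinite_scroll", "login_flows"}),
]


def cheapest_mode(needs):
    """Return the first (fastest) mode whose capabilities cover `needs`."""
    needs = set(needs)
    for name, capabilities in MODES:
        if needs <= capabilities:
            return name
    raise ValueError(f"no mode supports: {sorted(needs)}")


print(cheapest_mode([]))                                    # static page
print(cheapest_mode(["cloudflare_bypass"]))                 # protected page
print(cheapest_mode(["js_rendering", "infinite_scroll"]))   # interactive page
```

This only encodes the Yes/No rows. It says nothing about the success rates, which depend on the specific site.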
Try It Yourself
```bash
# Install the scrapling skill
# Install via Mr.Chief dashboard after signing up at mrchief.ai/setup
# clawhub install scrapling

# Test simple mode on a static page
mrchief run --task "Use scrapling simple mode to extract the main content
from https://example.com/blog/post-1"

# Test stealth mode on a protected page
mrchief run --task "Use scrapling stealth mode to extract pricing data
from https://competitor.com/pricing (the site uses Cloudflare)"

# Test dynamic mode for JS-heavy pages
mrchief run --task "Use scrapling dynamic mode to scrape the full product
listing from https://app.example.com and scroll to load all items"
```
Start with simple. Escalate to stealth. Use dynamic when you need interaction. The agent handles the mode selection automatically when you describe the problem.
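How the agent picks a mode internally isn't documented here. As a rough sketch of the "start cheap, escalate on failure" idea, the loop below tries each fetcher in order and stops at the first usable result. The fetchers are stand-in stubs that return canned responses. `fetch_with_escalation` and the `is_usable` check are hypothetical, and nothing touches the network.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher takes a URL and returns (status_code, html).
FetchFn = Callable[[str], Tuple[int, str]]


def fetch_with_escalation(
    url: str,
    ladder: List[Tuple[str, FetchFn]],
    is_usable: Callable[[int, str], bool],
) -> Tuple[Optional[str], str, List[Tuple[str, int]]]:
    """Try each mode in order; return (winning_mode, html, attempts)."""
    attempts = []
    for name, fetch in ladder:
        status, html = fetch(url)
        attempts.append((name, status))
        if is_usable(status, html):
            return name, html, attempts
    # Every rung failed; the caller decides what to do next.
    return None, "", attempts


# Stub fetchers standing in for the real modes (assumed behavior).
def fake_simple(url):
    return 403, ""


def fake_stealth(url):
    return 200, "<table class='pricing-table'><tr><td>$49</td></tr></table>"


def fake_dynamic(url):
    return 200, "<table class='pricing-table'><tr><td>$49</td></tr></table>"


mode, html, attempts = fetch_with_escalation(
    "https://competitor.com/pricing",
    [("simple", fake_simple), ("stealth", fake_stealth), ("dynamic", fake_dynamic)],
    is_usable=lambda status, body: status == 200 and "pricing-table" in body,
)
print(mode, attempts)
```

Ordering the ladder from cheapest to most expensive means the slow browser-based modes only run when the fast ones have already failed.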
The web doesn't want to be scraped. Scrapling disagrees.