
Scraping Protected Websites — When web_fetch Hits a Wall

89% success on protected sites · Productivity & Security · 5 min read

Key Takeaway

When web_fetch returns 403s on Cloudflare-protected and JS-rendered sites, Scrapling's three scraping modes — simple, stealth, and dynamic — bypass bot detection and extract the data agents actually need.

The Problem

Every AI agent framework gives you a basic web fetch tool. Mr.Chief's web_fetch works fine for simple pages β€” documentation, blog posts, public APIs. It's fast and lightweight.

Then you try to scrape a competitor's pricing page. 403 Forbidden.

You try a JS-heavy SaaS dashboard. Empty HTML β€” the content loads client-side.

You try a site behind Cloudflare's bot protection. Captcha wall.

This is the reality of the modern web. Over 20% of all websites use Cloudflare. Most SaaS products render client-side with React or Vue. Anti-bot systems are getting smarter every quarter. A basic HTTP GET request with a User-Agent header isn't enough anymore.

For AI agents that need to gather competitive intelligence, monitor pricing, or research companies, this is a showstopper. The data exists. It's publicly visible in a browser. But your agent can't access it.

The Solution

The Scrapling skill for Mr.Chief — a three-mode web scraping system that ranges from basic extraction to full stealth browser automation. Each mode trades speed for capability:

  • Simple mode: Fast HTML extraction. No browser. Works for static sites.
  • Stealth mode: Real browser fingerprint with anti-detection. Bypasses Cloudflare, DataDome, and similar.
  • Dynamic mode: Full browser automation. JavaScript execution, infinite scroll, login flows, interaction.
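The three modes compose into an escalate-on-failure cascade: try the cheap mode first, and move up only when the result is unusable. A minimal sketch of that pattern, with stub fetch functions standing in for the real mode backends (the stubs and their return values are illustrative, not Scrapling's API):

```python
# Sketch of the escalate-on-failure pattern the three modes enable.
# The fetch functions below are hypothetical stubs, not Scrapling calls.

def escalating_fetch(url, modes):
    """Try each (name, fetcher) pair in order; return the first usable result."""
    for name, fetch in modes:
        status, body = fetch(url)
        if status == 200 and body.strip():  # non-empty 2xx counts as usable
            return name, body
    return None, ""

# Stub backends: the cheap mode gets blocked, the stealth mode gets through.
def simple_fetch(url):
    return 403, ""

def stealth_fetch(url):
    return 200, '<table class="pricing-table">$29/mo</table>'

mode, html = escalating_fetch(
    "https://competitor.com/pricing",
    [("simple", simple_fetch), ("stealth", stealth_fetch)],
)
# mode is "stealth": the cheap mode failed, so the cascade moved up one level
```

Starting cheap matters because the cost gap between modes is large: a failed simple attempt wastes milliseconds, while defaulting everything to a full browser wastes seconds per page.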

The Process

Here's how each mode works in practice.

Simple mode — when speed matters and the site is cooperating:

# Simple mode: basic HTML extraction
# ~200ms per page, no browser overhead
scrapling simple --url "https://docs.example.com/api/reference" \
  --extract "article" \
  --format markdown

Stealth mode — when the site fights back:

# Stealth mode: real browser fingerprint
# Bypasses Cloudflare, DataDome, PerimeterX
scrapling stealth --url "https://competitor.com/pricing" \
  --extract ".pricing-table" \
  --wait-for ".price-amount" \
  --format json

Stealth mode doesn't just set a User-Agent string. It generates a complete browser fingerprint — canvas, WebGL, fonts, plugins, screen resolution, timezone. To the anti-bot system, it looks like a real person on a real browser. Because it is a real browser. Just one that an agent controls.
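To see why a lone User-Agent header fails, here is a toy bot check over a fingerprint dict. The signal names and thresholds are illustrative only; real systems like Cloudflare and DataDome probe far more signals, but the inconsistency idea is the same:

```python
# Illustrative only: a toy anti-bot check. Real detection systems probe
# many more signals and score them statistically.

def looks_like_bot(fp):
    """Flag fingerprints with the inconsistencies naive scrapers exhibit."""
    if "HeadlessChrome" in fp.get("user_agent", ""):
        return True
    if not fp.get("plugins"):  # bare headless browsers expose no plugins
        return True
    if fp.get("webgl_vendor") in (None, "Brian Paul"):  # software GL renderer
        return True
    return False

# A naive scraper sets only the User-Agent and fails every other probe.
naive = {"user_agent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0"}

# A stealth profile keeps every signal mutually consistent.
stealthy = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
    "plugins": ["PDF Viewer", "Chrome PDF Viewer"],
    "webgl_vendor": "Google Inc. (NVIDIA)",
    "canvas_hash": "a1b2c3",  # stable per-profile canvas noise
    "timezone": "America/New_York",
}
```

The point: a fingerprint is a bundle of signals that must agree with each other, so spoofing one header while leaving the rest at headless defaults is exactly what gets flagged.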

Dynamic mode — when you need the full browser:

# Dynamic mode: full browser automation
# Handles JS rendering, infinite scroll, interactions
scrapling dynamic --url "https://app.example.com/dashboard" \
  --actions '[
    {"scroll": "bottom", "times": 5},
    {"wait": ".loaded-content"},
    {"click": ".show-more-button"},
    {"wait": 2000}
  ]' \
  --extract ".data-card" \
  --format json
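The --actions argument is just a JSON list of steps the browser replays in order. A hypothetical interpreter for that list, run against a stub browser that records calls instead of driving a real one (Scrapling's real executor will differ in details):

```python
import json

# Hypothetical interpreter for the --actions JSON list above, run against
# a stub browser. Scrapling's real executor will differ in details.

class LogBrowser:
    """Records calls instead of driving a real browser."""
    def __init__(self):
        self.log = []
    def scroll(self, where):
        self.log.append(("scroll", where))
    def wait_for(self, selector):
        self.log.append(("wait_for", selector))
    def wait_ms(self, ms):
        self.log.append(("wait_ms", ms))
    def click(self, selector):
        self.log.append(("click", selector))

def run_actions(browser, actions_json):
    for step in json.loads(actions_json):
        if "scroll" in step:
            for _ in range(step.get("times", 1)):  # "times" repeats a scroll
                browser.scroll(step["scroll"])
        elif "wait" in step:
            arg = step["wait"]  # int means milliseconds, str means selector
            browser.wait_ms(arg) if isinstance(arg, int) else browser.wait_for(arg)
        elif "click" in step:
            browser.click(step["click"])

b = LogBrowser()
run_actions(b, '[{"scroll": "bottom", "times": 2},'
               ' {"wait": ".loaded-content"}, {"click": ".show-more-button"},'
               ' {"wait": 2000}]')
```

Note the overloaded "wait": a number pauses for that many milliseconds, while a string blocks until the selector appears. The string form is almost always the better choice because fixed sleeps either waste time or race the page.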

Real use case: competitor pricing scrape

We needed pricing data from five competitors. All SaaS companies. All behind Cloudflare.

Competitor A (Cloudflare Pro):
  web_fetch → 403 Forbidden ❌
  scrapling simple → 403 Forbidden ❌
  scrapling stealth → 200 OK ✅ (full pricing table extracted)

Competitor B (Cloudflare + JS rendering):
  web_fetch → 200 but empty pricing section ❌
  scrapling simple → 200 but empty pricing section ❌
  scrapling stealth → 200 OK ✅ (JS rendered, prices loaded)

Competitor C (DataDome protection):
  web_fetch → 403 Forbidden ❌
  scrapling stealth → 200 OK ✅ (bypassed DataDome)

Every single one that blocked web_fetch was accessible through stealth mode. The agent extracted pricing tiers, feature lists, and plan names — structured as JSON, ready for analysis.
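The "structured as JSON" step is ordinary post-processing once the HTML is in hand. A toy version using only the standard library, over invented markup that reuses the .price-amount selector from the stealth example above (the class names and output shape are assumptions, not the skill's actual schema):

```python
import re
from html.parser import HTMLParser

# Toy post-processing: turn a scraped pricing fragment into structured
# records. The markup and field names are invented for illustration.

class PriceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tiers = []
        self._grab = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if "plan-name" in cls:
            self._grab = "name"
        elif "price-amount" in cls:
            self._grab = "price"

    def handle_data(self, data):
        if self._grab == "name":
            self.tiers.append({"name": data.strip()})
        elif self._grab == "price":
            # strip currency symbols and "/mo" down to the number
            self.tiers[-1]["price_usd"] = float(re.sub(r"[^\d.]", "", data))
        self._grab = None

html = '<div class="plan-name">Pro</div><span class="price-amount">$49/mo</span>'
p = PriceExtractor()
p.feed(html)
# p.tiers is now [{"name": "Pro", "price_usd": 49.0}]
```

In practice you would hand the skill's --extract output to a parser like this (or let the agent do it), ending with records you can diff across competitors week over week.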

The Results

Metric                          web_fetch  Scrapling Simple  Scrapling Stealth  Scrapling Dynamic
Speed (per page)                ~100ms     ~200ms            ~2-4s              ~5-15s
JS rendering                    No         No                Yes                Yes
Cloudflare bypass               No         No                Yes                Yes
Anti-bot bypass                 No         No                Yes                Yes
Infinite scroll                 No         No                No                 Yes
Login flows                     No         No                No                 Yes
Resource usage                  Minimal    Low               Medium             High
Success rate (protected sites)  12%        18%               89%                97%

The success rate tells the story. On protected sites, web_fetch works 12% of the time. Stealth mode works 89%. Dynamic mode — with full browser automation — works 97%.

The remaining 3% of dynamic-mode failures are typically hard captchas (hCaptcha with visual challenges). Everything else falls.

Try It Yourself

# Install the scrapling skill
# Install via Mr.Chief dashboard after signing up at mrchief.ai/setup
# clawhub install scrapling

# Test simple mode on a static page
mrchief run --task "Use scrapling simple mode to extract the main content
from https://example.com/blog/post-1"

# Test stealth mode on a protected page
mrchief run --task "Use scrapling stealth mode to extract pricing data
from https://competitor.com/pricing — the site uses Cloudflare"

# Test dynamic mode for JS-heavy pages
mrchief run --task "Use scrapling dynamic mode to scrape the full product
listing from https://app.example.com — scroll to load all items"

Start with simple. Escalate to stealth. Use dynamic when you need interaction. The agent handles the mode selection automatically when you describe the problem.
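One plausible heuristic for that automatic mode selection, inferred from the failure patterns in the case study above (a blocked status means the site is actively defending; a 200 with a near-empty body means the content renders client-side). This is a sketch of the idea, not the skill's actual logic:

```python
# Illustrative heuristic: map a failed or thin fetch result to the
# scraping mode to try next. Not the skill's actual decision logic.

def classify_failure(status, body):
    """Pick a mode from how the cheap fetch failed."""
    if status in (403, 429, 503) or "captcha" in body.lower():
        return "stealth"  # active bot blocking: need a real fingerprint
    if status == 200 and len(body.strip()) < 500:
        return "dynamic"  # page loaded but content is rendered client-side
    return "simple"       # plain failure or timeout: retry cheaply first
```

This mirrors the results table: Competitor A's 403 routes to stealth, while Competitor B's "200 but empty pricing section" is the client-side-rendering signature that calls for a JS-capable mode.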


The web doesn't want to be scraped. Scrapling disagrees.

Web Scraping · Scrapling · Cloudflare · Bot Detection · Competitive Intelligence

Want results like these?

Start free with your own AI team. No credit card required.
