Scrape guild rosters and member social tags from WoWProgress — with Cloudflare bypass via Playwright.
WoWProgress protects its pages with Cloudflare challenges, which plain HTTP clients generally cannot pass. This tool uses Playwright to drive a real Chromium browser, lets the challenge complete inside that browser, and then can:
- Navigate to any guild's roster page
- Extract active members with name, rank, role, and item level
- Visit each member's profile to collect social media handles (Battle.net, Discord, Twitter, Twitch, YouTube)
- Output everything as structured JSON
Built for guild leaders, community managers, and recruitment officers who need roster data that WoWProgress doesn't expose via an API.
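The browser-driven flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `guild_url`, `fetch_html`, the `/guild/<region>/<realm>/<guild>` URL layout, and the `table` selector are all assumptions made for the example. Only the Playwright calls themselves (`sync_playwright`, `chromium.launch`, `new_context`, `goto`, `wait_for_selector`, `content`) are part of the real library.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://www.wowprogress.com"


def guild_url(region: str, realm: str, guild: str) -> str:
    """Build a guild page URL.

    Assumption: guild pages live under /guild/<region>/<realm>/<guild>,
    mirroring the /character/... URL shown in the output example.
    """
    parts = (region.lower(), realm.lower(), guild)
    return f"{BASE_URL}/guild/" + "/".join(quote(p, safe="") for p in parts)


def fetch_html(url: str, headless: bool = False,
               user_agent: Optional[str] = None,
               timeout_ms: int = 60_000) -> str:
    """Open the URL in Chromium and return the rendered HTML."""
    # Imported lazily so guild_url() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            context = browser.new_context(user_agent=user_agent)
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # Assumption: the roster is rendered as a <table>; waiting for it
            # gives any interstitial challenge page time to finish first.
            page.wait_for_selector("table", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Calling `fetch_html(guild_url("eu", "draenor", "Method"))` would open a visible Chromium window and return the page source once a table appears, provided the challenge clears without manual interaction.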
- 🛡️ Cloudflare Bypass: Uses a real browser context with a configurable user agent and optional playwright-stealth to get past Cloudflare's bot detection.
- 📋 Full Roster Extraction: Parses the guild member table, including character name, rank, role (spec), item level, and profile URL. Inactive members are filtered out automatically.
- 🔗 Social Tag Collection: Visits each member's profile page and extracts Battle.net, Discord, Twitter, Twitch, and YouTube handles via regex pattern matching.
- ⏱️ Rate Limiting: Configurable delay and random jitter between profile visits to avoid triggering rate limits or bans.
- 📄 JSON Output: Clean, structured output to stdout or a file, ready for further processing or import into spreadsheets and databases.
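To illustrate the regex-based social tag collection mentioned in the list above, here is a small sketch. The patterns and the `extract_social_tags` helper are hypothetical and deliberately simple; the patterns the scraper actually uses are not documented here and may differ.

```python
import re

# Hypothetical patterns, one per platform. Each captures the handle in group 1.
SOCIAL_PATTERNS = {
    "battlenet": re.compile(r"battle\.?net\s*[:=]?\s*([^\s#|,]{3,12}#\d{4,6})", re.I),
    "discord": re.compile(r"discord\s*[:=]\s*([\w.]{2,32}(?:#\d{4})?)", re.I),
    "twitter": re.compile(r"\btwitter\.com/(\w{1,15})", re.I),
    "twitch": re.compile(r"\btwitch\.tv/(\w{4,25})", re.I),
    "youtube": re.compile(r"\byoutube\.com/(@[\w.-]{3,30})", re.I),
}


def extract_social_tags(text: str) -> dict:
    """Return the first handle found for each platform in the given text."""
    tags = {}
    for platform, pattern in SOCIAL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            tags[platform] = match.group(1)
    return tags


profile_text = (
    "Battle.net: Thrall#1234 | Discord: user#1234 | "
    "https://twitch.tv/streamername | https://twitter.com/handle"
)
print(extract_social_tags(profile_text))
```

Platforms with no match are simply omitted from the result, which matches the shape of the `social_tags` object in the example output.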
- Python 3.10+
- Chromium browser (installed via Playwright)
# 1. Clone
git clone https://github.com/CheswickDEV/WoWProgress-Scraper.git
cd WoWProgress-Scraper
# 2. Install dependencies
pip install -r requirements.txt
# 3. Install Playwright browsers
playwright install chromium

# Basic usage: scrape a guild roster + social tags
python wowprogress_scraper.py \
--region eu \
--realm draenor \
--guild Method \
--output members.json
# Quick test with limited members
python wowprogress_scraper.py \
--region eu \
--realm draenor \
--guild Method \
--max-members 5
# Stealth mode + headless for automation
python wowprogress_scraper.py \
--region us \
--realm illidan \
--guild "Liquid" \
--headless \
--stealth \
--output liquid.json

| Parameter | Required | Default | Description |
|---|---|---|---|
| `--region` | ✅ | - | Region of the guild (eu, us, kr, etc.) |
| `--realm` | ✅ | - | Realm (server) name |
| `--guild` | ✅ | - | Guild name |
| `--output` | ❌ | stdout | Output JSON file path |
| `--max-members` | ❌ | all | Limit the number of members (useful for testing) |
| `--headless` | ❌ | false | Run the browser without a visible window |
| `--stealth` | ❌ | false | Enable playwright-stealth mitigations |
| `--user-agent` | ❌ | default | Override the browser user agent string |
| `--delay` | ❌ | 2.0 | Base delay between profile visits (seconds) |
| `--jitter` | ❌ | 0.75 | Random jitter (± seconds) added to the base delay |
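As a rough illustration of how the options in the table could map onto `argparse`, and how `--delay` and `--jitter` might combine into a per-visit wait, consider the sketch below. `build_parser` and `next_delay` are hypothetical names, and the uniform ± jitter is an assumption about what "random jitter" means here; the script's real implementation may differ.

```python
import argparse
import random


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the parameter table (flags and defaults)."""
    parser = argparse.ArgumentParser(prog="wowprogress_scraper.py")
    parser.add_argument("--region", required=True)
    parser.add_argument("--realm", required=True)
    parser.add_argument("--guild", required=True)
    parser.add_argument("--output", default=None)  # None means stdout
    parser.add_argument("--max-members", type=int, default=None)  # None means all
    parser.add_argument("--headless", action="store_true")
    parser.add_argument("--stealth", action="store_true")
    parser.add_argument("--user-agent", default=None)
    parser.add_argument("--delay", type=float, default=2.0)
    parser.add_argument("--jitter", type=float, default=0.75)
    return parser


def next_delay(delay: float, jitter: float) -> float:
    """Base delay plus a uniform random offset in [-jitter, +jitter], never negative."""
    return max(0.0, delay + random.uniform(-jitter, jitter))


args = build_parser().parse_args(
    ["--region", "eu", "--realm", "draenor", "--guild", "Method", "--max-members", "5"]
)
wait = next_delay(args.delay, args.jitter)
print(f"would sleep {wait:.2f}s before the next profile visit")
```

With the defaults, each wait falls between 1.25 and 2.75 seconds, so requests do not arrive at a perfectly regular interval.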
[
  {
    "name": "Charactername",
    "rank": "1",
    "role": null,
    "item_level": "639",
    "profile_url": "https://www.wowprogress.com/character/eu/draenor/Charactername",
    "social_tags": {
      "discord": "user#1234",
      "twitch": "streamername",
      "twitter": "@handle"
    }
  }
]

This tool is for personal use and research purposes only. Respect WoWProgress's terms of service. Use reasonable delays between requests and avoid hammering their servers. The Cloudflare bypass operates within a standard browser context; no CAPTCHA solving or token forgery is involved.
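For importing the JSON output into a spreadsheet, one option is to flatten it into CSV. The sketch below assumes the output has the shape shown in the example (string values, `role` possibly `null`, a `social_tags` object per member). `load_members` and `members_to_csv` are hypothetical helpers, not part of the scraper.

```python
import csv
import io
import json

BASE_FIELDS = ["name", "rank", "role", "item_level", "profile_url"]


def load_members(path: str) -> list:
    """Read a JSON file written via --output."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def members_to_csv(members: list) -> str:
    """Flatten members into CSV text, one column per social platform seen."""
    platforms = sorted({p for m in members for p in (m.get("social_tags") or {})})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=BASE_FIELDS + platforms,
                            lineterminator="\n")
    writer.writeheader()
    for member in members:
        tags = member.get("social_tags") or {}
        # Missing or null values become empty cells.
        row = {field: member.get(field) or "" for field in BASE_FIELDS}
        row.update({p: tags.get(p, "") for p in platforms})
        writer.writerow(row)
    return buf.getvalue()
```

For example, `print(members_to_csv(load_members("members.json")))` would emit a header row followed by one line per member.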
WoWProgress-Scraper/
├── wowprogress_scraper.py # Main scraper script
├── requirements.txt # Python dependencies
└── README.md
Dependencies:
- playwright ≥ 1.55.0
- playwright-stealth ≥ 2.0.0 (optional, for `--stealth` mode)
- 🚀 Initial release
- ✨ Guild roster extraction with Cloudflare bypass
- ✨ Social tag collection (Battle.net, Discord, Twitter, Twitch, YouTube)
- ✨ Configurable rate limiting with jitter
- ✨ JSON output to file or stdout
- ✨ Optional stealth mode via playwright-stealth
MIT — do what you want, just give credit.
Made with 🖤 by cheswick.dev