Welcome to the Bright Data web scraping workshop! This hands-on workshop demonstrates how to choose the right scraping solution based on your needs, optimize for cost efficiency, and maximize success rates.
Available in two languages: JavaScript (Node.js) and Python. See the language-specific READMEs for what each version is best suited for and its requirements.
By the end of this workshop, you'll understand:
- Cost-Efficient Scraping Strategy - Start with the cheapest solution and upgrade only when needed
- Headers & Cookies Impact - How adding proper headers/cookies dramatically increases success rates
- When to Use What - Decision framework for choosing between different Bright Data products
- Success Rate Optimization - Practical demonstrations of solving blocks and CAPTCHAs
- Bright Data account with active zones:
- Datacenter Proxy Zone
- Web Unlocker Zone
- Scraping Browser Zone
- SERP API Zone
- Residential Proxy Zone (optional)
- Google Chrome browser installed (for browser-based scripts)
- Your chosen programming language runtime (Node.js or Python)
```
PROXY OPTIONS (cheapest → more expensive):
├── Datacenter Proxy + Simple HTTP Request
├── Datacenter Proxy + Browser
├── Residential Proxy + Simple HTTP Request
└── Residential Proxy + Browser

MANAGED SOLUTIONS (no headers/cookies hassle):
├── Web Unlocker (for simple requests)
├── SERP API (for search engine results)
└── Scraping Browser (for dynamic sites)
```
Start Here:
- Try Datacenter Proxy with simple HTTP requests
- Add headers and cookies to improve success rate
- If still blocked → choose your upgrade path:
Upgrade Path A: Stick with HTTP
- If blocked → use Web Unlocker (handles headers/cookies/CAPTCHAs automatically)
Upgrade Path B: Need a Browser
- If you need JavaScript rendering → use Datacenter + Browser
- If blocked → use Scraping Browser (remote browser with auto-fingerprinting & CAPTCHA solving)
For easier demonstration and comparison, some scripts have different default settings between JavaScript and Python:
| Script | Configuration | JavaScript | Python | Why Different? |
|---|---|---|---|---|
| simple_request | USE_HEADERS | false | False | Shows impact of adding headers |
| browser_with_proxy | USE_RESIDENTIAL | false | True | Demonstrates both proxy types |
| unlocker_demo | USE_WEB_UNLOCKER | true | False | Shows regular vs unlocker comparison |
| remote_browser | USE_SCRAPING_BROWSER | false | False | Both start in local mode |
| serp_api_demo | USE_JSON_PARSING | true | False | Shows JSON vs raw HTML output |
These intentional differences let you see varied behaviors by default. Check the language-specific READMEs for complete configuration options.
Both JavaScript and Python versions include 5 core demonstrations:
Demo 1: simple_request
Cost Level: CHEAPEST (Datacenter) or MODERATE (Residential)
Demonstrates:
- Basic HTTP requests through proxies
- Impact of headers and cookies on success rate
- Baseline for cost-efficient scraping
- Compare Datacenter vs Residential IPs
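A minimal Python sketch of this pattern using `requests`. The proxy host/port, username format, and environment variable names below are placeholders, not the script's exact code; take the real values from your zone's access details and `.env`:

```python
import os
import requests

# Placeholder credential format -- check your Bright Data dashboard / .env for the real values.
CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
ZONE = os.environ.get("BRIGHTDATA_DC_ZONE", "datacenter_zone")  # swap in your residential zone to compare
PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]
PROXY_HOST = "brd.superproxy.io:33335"  # assumed endpoint; confirm in your zone settings

proxy_url = f"http://brd-customer-{CUSTOMER_ID}-zone-{ZONE}:{PASSWORD}@{PROXY_HOST}"
proxies = {"http": proxy_url, "https": proxy_url}

USE_HEADERS = True  # flip to False to see the difference in block rates
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"} if USE_HEADERS else {}

resp = requests.get("https://example.com", proxies=proxies, headers=headers, timeout=30)
print(resp.status_code, len(resp.text))
```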
Demo 2: browser_with_proxy
Cost Level: CHEAP (Datacenter) or MODERATE (Residential)
Demonstrates:
- Using real browser with proxies
- JavaScript rendering handling
- Headers/cookies in browser context
- Request blocking for efficiency
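A rough Playwright sketch of the same idea: launch a local browser through a proxy and block heavy resources to save bandwidth. The proxy endpoint, credential format, and environment variable names are assumptions, not the script's exact configuration:

```python
import os
from playwright.sync_api import sync_playwright

CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
ZONE = os.environ.get("BRIGHTDATA_DC_ZONE", "datacenter_zone")
PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        proxy={
            "server": "http://brd.superproxy.io:33335",  # assumed endpoint; confirm in your zone settings
            "username": f"brd-customer-{CUSTOMER_ID}-zone-{ZONE}",
            "password": PASSWORD,
        },
    )
    page = browser.new_page()
    # Block heavy resources to cut proxy bandwidth (and cost).
    page.route("**/*.{png,jpg,jpeg,gif,woff,woff2}", lambda route: route.abort())
    page.goto("https://example.com", wait_until="domcontentloaded")
    print(page.title())
    browser.close()
```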
Demo 3: unlocker_demo
Cost Level: HIGHER (but automated)
Demonstrates:
- Direct comparison: Regular proxy vs Web Unlocker
- Automatic CAPTCHA solving
- No manual header/cookie configuration
- Success rate improvement on protected sites
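A hedged sketch of the Web Unlocker side of that comparison: the unlocker zone is accessed proxy-style with no manual headers or cookies. The endpoint, credential format, and the `verify=False` workaround are assumptions; check your zone's access details (Bright Data also provides an SSL certificate you can install instead of disabling verification):

```python
import os
import requests

CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
UNLOCKER_ZONE = os.environ.get("BRIGHTDATA_UNLOCKER_ZONE", "web_unlocker_zone")
PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]

proxy_url = (
    f"http://brd-customer-{CUSTOMER_ID}-zone-{UNLOCKER_ZONE}:{PASSWORD}"
    "@brd.superproxy.io:33335"  # assumed endpoint; confirm in the zone's access details
)

# No headers/cookies needed -- the unlocker manages fingerprints, retries, and CAPTCHAs.
resp = requests.get(
    "https://example.com/protected-page",
    proxies={"http": proxy_url, "https": proxy_url},
    verify=False,  # or install Bright Data's SSL certificate instead
    timeout=60,
)
print(resp.status_code)
```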
Demo 4: remote_browser
Cost Level: HIGHEST (but maximum success)
Demonstrates:
- Remote browser with automatic fingerprints
- WebSocket connection to hosted browsers
- Automatic CAPTCHA solving
- Chrome DevTools integration
- Scalable to hundreds of parallel sessions
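A sketch of connecting Playwright to a hosted browser over the Chrome DevTools Protocol. The WebSocket URL format and environment variable names are assumptions; copy the exact endpoint from your Scraping Browser zone's access details:

```python
import os
from playwright.sync_api import sync_playwright

CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
SB_ZONE = os.environ.get("BRIGHTDATA_SCRAPING_BROWSER_ZONE", "scraping_browser_zone")
PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]

# Assumed WebSocket endpoint format -- copy the exact URL from your zone's access details.
ws_endpoint = f"wss://brd-customer-{CUSTOMER_ID}-zone-{SB_ZONE}:{PASSWORD}@brd.superproxy.io:9222"

with sync_playwright() as p:
    # Connect to the remote, hosted browser over CDP; fingerprinting and CAPTCHAs are handled remotely.
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com", timeout=120_000)
    print(page.title())
    browser.close()
```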
Demo 5: serp_api_demo
Cost Level: MODERATE (managed, pay-per-success)
Demonstrates:
- Search engine scraping (Google, Bing, DuckDuckGo)
- Automatic JSON parsing with `brd_json=1`
- Country and language targeting
- No manual header/cookie configuration
- 99.9% success rate on search results
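A sketch of a SERP API call with `brd_json=1`. The proxy endpoint, credential format, and the `organic` key in the parsed response are assumptions based on typical usage, not the script's exact code:

```python
import os
import requests

CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
SERP_ZONE = os.environ.get("BRIGHTDATA_SERP_ZONE", "serp_api_zone")
PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]

proxy_url = (
    f"http://brd-customer-{CUSTOMER_ID}-zone-{SERP_ZONE}:{PASSWORD}"
    "@brd.superproxy.io:33335"  # assumed endpoint; confirm in the zone's access details
)

# brd_json=1 asks the SERP API for parsed JSON instead of raw HTML; gl/hl set country and language.
resp = requests.get(
    "https://www.google.com/search",
    params={"q": "web scraping", "brd_json": 1, "gl": "us", "hl": "en"},
    proxies={"http": proxy_url, "https": proxy_url},
    verify=False,  # or install Bright Data's SSL certificate
    timeout=60,
)
results = resp.json()
for item in results.get("organic", [])[:5]:  # "organic" key assumed from typical parsed output
    print(item.get("title"), "->", item.get("link"))
```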
Adding proper headers and cookies can increase success rates from 20% to 80%+ with Datacenter proxies:
- Without: Gets blocked as "bot-like" traffic
- With Headers: Mimics real browser requests
- With Cookies: Maintains session state and authentication
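To see this effect yourself, a small hypothetical helper like the one below can count successful responses with and without browser-like headers and cookies (proxy setup omitted for brevity; see the Demo 1 sketch above):

```python
import requests

BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
COOKIES = {"session_id": "example-session"}  # placeholder; real cookies come from a prior visit

def success_rate(url: str, headers: dict, cookies: dict, attempts: int = 5) -> float:
    """Share of attempts that return HTTP 200."""
    ok = 0
    for _ in range(attempts):
        resp = requests.get(url, headers=headers, cookies=cookies, timeout=30)
        ok += resp.status_code == 200
    return ok / attempts

url = "https://example.com"
print("bare request:   ", success_rate(url, {}, {}))
print("with headers:   ", success_rate(url, BROWSER_HEADERS, {}))
print("headers+cookies:", success_rate(url, BROWSER_HEADERS, COOKIES))
```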
Don't pay for Residential IPs or managed solutions if Datacenter + headers works:
```
Always start with: DC + HTTP + Headers/Cookies
        ↓ (if blocked)
Upgrade to: Web Unlocker (for HTTP) or DC + Browser (for JS)
        ↓ (if still blocked)
Final solution: Scraping Browser (for maximum success)
```
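This upgrade path can be written as a simple escalation helper. The sketch below is hypothetical; the three fetcher functions are placeholders standing in for the approaches shown in the demos above:

```python
# Try the cheapest option first and only escalate when the response looks blocked.
def looks_blocked(resp) -> bool:
    return resp is None or resp.status_code in (403, 429, 503) or "captcha" in resp.text.lower()

def fetch_with_escalation(url, fetchers):
    """fetchers: list of (label, callable) ordered cheapest -> most expensive."""
    for label, fetch in fetchers:
        try:
            resp = fetch(url)
        except Exception:
            resp = None
        if not looks_blocked(resp):
            print(f"succeeded with: {label}")
            return resp
        print(f"blocked with: {label}, escalating...")
    raise RuntimeError("all options failed")

# fetch_datacenter_http, fetch_web_unlocker, and fetch_scraping_browser are hypothetical
# wrappers around the patterns sketched in the demos above.
# result = fetch_with_escalation("https://example.com", [
#     ("DC + HTTP + headers", fetch_datacenter_http),
#     ("Web Unlocker", fetch_web_unlocker),
#     ("Scraping Browser", fetch_scraping_browser),
# ])
```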
Web Unlocker and Scraping Browser handle the complexity for you:
- No manual header configuration
- No cookie management
- Automatic CAPTCHA solving
- Automatic retries and fingerprinting
- Higher cost but faster development time
Use HTTP (cheaper) when:
- Site returns full HTML in initial response
- No JavaScript rendering required
- APIs and server-side rendered pages
Use Browser when:
- Content loaded via JavaScript
- Need to interact with site (click, scroll, fill forms)
- Infinite scroll or AJAX-loaded content
- SPAs (Single Page Applications)
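One practical way to apply this rule is to try plain HTTP first and fall back to a browser only when a known piece of content is missing from the static HTML. A hypothetical sketch:

```python
import requests
from playwright.sync_api import sync_playwright

def fetch_html(url: str, marker: str) -> str:
    """Return page HTML, preferring a cheap HTTP request over a browser."""
    resp = requests.get(url, timeout=30)
    if resp.ok and marker in resp.text:
        return resp.text                       # server-side rendered -- cheap path
    with sync_playwright() as p:               # content needs JavaScript -- browser path
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

html = fetch_html("https://example.com", marker="Example Domain")
```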
JavaScript Developers:

```bash
cd javascript
npm install
cp .env.example .env
# Edit .env with your credentials
npm run simple
```

Python Developers:

```bash
cd python
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your credentials
python src/simple_request.py
```
Project structure:

```
.
├── javascript/              # JavaScript implementation
│   ├── src/
│   │   ├── simple_request.js
│   │   ├── browser_with_proxy.js
│   │   ├── unlocker_demo.js
│   │   ├── remote_browser.js
│   │   ├── serp_api_demo.js
│   │   └── open_chrome.js
│   ├── package.json
│   ├── .env.example
│   └── README.md
├── python/                  # Python implementation
│   ├── src/
│   │   ├── simple_request.py
│   │   ├── browser_with_proxy.py
│   │   ├── unlocker_demo.py
│   │   ├── remote_browser.py
│   │   ├── serp_api_demo.py
│   │   └── open_chrome.py
│   ├── requirements.txt
│   ├── .env.example
│   └── README.md
├── .gitignore
└── README.md                # This file
```
- Always start with the cheapest solution - Datacenter + Simple HTTP
- Add headers and cookies first - Often solves 80% of blocking issues
- Test both Datacenter and Residential - Swap zones to see which works best
- Use Web Unlocker when you don't want to manage headers/cookies - Let Bright Data handle it
- Use browsers only when necessary - HTTP is faster and cheaper
- Monitor your success rates - Upgrade only when current solution fails
- Never hardcode credentials - Always use `.env` files
- Test with small batches first - Start with 1-5 requests before scaling
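Loading credentials from `.env` typically looks like the sketch below (assuming python-dotenv; the exact variable names live in `.env.example`):

```python
import os
from dotenv import load_dotenv  # python-dotenv, assumed to be in requirements.txt

load_dotenv()  # reads .env from the current directory

CUSTOMER_ID = os.environ["BRIGHTDATA_CUSTOMER_ID"]
ZONE_PASSWORD = os.environ["BRIGHTDATA_ZONE_PASSWORD"]  # placeholder name; see .env.example
```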
Troubleshooting

Authentication errors:
- Check your `BRIGHTDATA_CUSTOMER_ID` and zone passwords in `.env`
- Verify zone names are correct (case-sensitive)
- Ensure zones are active in the Bright Data dashboard
Still getting blocked:
- Consider upgrading to Web Unlocker (for HTTP) or Scraping Browser (for browser)
- Check if the site requires Residential IPs (geo-restrictions)
- Reduce your request rate (you may be rate-limited)
Browser won't launch:
- Install the Google Chrome browser
- Update the browser path in the scripts if Chrome is in a non-standard location
Connection timeouts:
- Check your internet connection
- Verify the zone is active in the Bright Data dashboard
- Try increasing the timeout value in the script
Happy Scraping!
Remember: Start cheap, optimize headers/cookies, upgrade only when blocked!