ScrapeAlchemist/BrightData_Cookbook

Bright Data Cookbook

A collection of production-ready recipes and scripts to help you get started with Bright Data faster. Each recipe demonstrates best practices for web scraping using Bright Data's proxy infrastructure.

What's Inside

This cookbook provides ready-to-use scraping recipes that you can run immediately or use as a foundation for your own projects. Each recipe includes:

  • Complete, working code
  • Detailed documentation
  • Configuration examples
  • Best practices and tips
  • Error handling patterns

Available Recipes

Production-ready Amazon product scraper with concurrent scraping using Bright Data's Web Unlocker API.

Features:

  • ⚡ Concurrent scraping (configurable: 10-200+ concurrent requests)
  • 🔓 Web Unlocker API with automatic IP rotation and anti-bot bypass
  • 🎯 Two-phase scraping (search results → product details)
  • 📦 Structured data extraction (title, price, ratings, images, specs, reviews)
  • 📊 Search metadata (searchTerm, searchRank, searchPage)
  • 💾 Organized output with ASIN-based folders
  • 🔌 Easy integration with other projects

Use Cases:

  • Price monitoring
  • Market research
  • Competitor analysis
  • Inventory tracking

View Recipe →

Getting Started

Each recipe is self-contained in its own directory. To use a recipe:

  1. Navigate to the recipe directory:

    cd Amazon_scraper
  2. Install dependencies:

    npm install
  3. Configure your credentials:

    cp .env.example .env
    # Edit .env with your Bright Data credentials
  4. Run the recipe:

    npm run scrape

Prerequisites

  • Node.js 18+ - All recipes require Node.js 18 or higher
  • Bright Data Account - Sign up at brightdata.com
  • API Token or Proxy Zone - Depending on the recipe

Bright Data Setup

1. Create an Account

Sign up at brightdata.com if you don't have an account.

2. Get API Token (for Web Unlocker recipes)

  1. Go to Bright Data Settings
  2. Generate an API token
  3. Create a Web Unlocker zone at Zones Dashboard
  4. Add credentials to your recipe's .env file:
    BRIGHT_DATA_API_TOKEN=your_token_here
    WEB_UNLOCKER_ZONE=unlocker
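
As a rough illustration of how a recipe might consume these two variables, the sketch below validates them and builds (without sending) the request for a single page. The endpoint `https://api.brightdata.com/request` and the `zone`, `url`, and `format` body fields are assumptions about Bright Data's Web Unlocker API, so verify them against the current API reference. `buildUnlockerRequest` is a hypothetical helper name, not something the recipes are known to export.

```javascript
// Hypothetical helper: builds (but does not send) a Web Unlocker request.
// Assumption: the endpoint and body fields below match Bright Data's
// current Web Unlocker API; check the official docs before relying on them.
const UNLOCKER_ENDPOINT = 'https://api.brightdata.com/request';

function buildUnlockerRequest(env, targetUrl) {
  const token = env.BRIGHT_DATA_API_TOKEN;
  const zone = env.WEB_UNLOCKER_ZONE;
  if (!token || !zone) {
    throw new Error('Set BRIGHT_DATA_API_TOKEN and WEB_UNLOCKER_ZONE in .env');
  }
  return {
    endpoint: UNLOCKER_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      // format: 'raw' is assumed to request the target page body as-is.
      body: JSON.stringify({ zone, url: targetUrl, format: 'raw' }),
    },
  };
}

// Usage with the fetch built into Node.js 18+ (sends a real request):
// const { endpoint, options } = buildUnlockerRequest(process.env, 'https://example.com');
// const html = await (await fetch(endpoint, options)).text();
```

Note that `process.env` only contains the `.env` values if something loads the file first, for example the `dotenv` package.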

3. Or Create a Proxy Zone (for proxy-based recipes)

  1. Go to Bright Data Dashboard
  2. Click "Add Zone"
  3. Select proxy type:
    • Datacenter - Cost-efficient, good for high-volume scraping
    • Residential - Higher success rate, better for difficult sites
  4. Get your credentials:
    • Customer ID (starts with hl_)
    • Zone name
    • Zone password
  5. Add these to your recipe's .env file
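
For proxy-based recipes, the three credentials are typically combined into a single proxy URL. The sketch below shows one way to do that. The username layout (`brd-customer-<id>-zone-<zone>`), the optional `-session-<id>` suffix, and the default host and port are assumptions about Bright Data's proxy format; use the exact values shown in your zone's access details. `buildProxyUrl` and its parameter names are hypothetical.

```javascript
// Hypothetical helper: assembles a proxy URL from the zone credentials.
// Assumptions: username layout, session suffix, and default host/port
// follow Bright Data's proxy format; confirm them in your zone settings.
function buildProxyUrl({
  customerId,
  zone,
  password,
  sessionId,
  host = 'brd.superproxy.io',
  port = 33335,
}) {
  if (!customerId || !zone || !password) {
    throw new Error('customerId, zone and password are required');
  }
  let username = `brd-customer-${customerId}-zone-${zone}`;
  // A distinct session id per request is one way to ask for a different IP.
  if (sessionId) username += `-session-${sessionId}`;
  // Encode both parts so special characters in the password stay valid.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${host}:${port}`;
}
```

The resulting string can be handed to whichever HTTP client or proxy agent the recipe uses.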

Recipe Structure

Each recipe follows this structure:

recipe_name/
├── README.md              # Complete documentation
├── package.json           # Dependencies and scripts
├── .env.example           # Environment variable template
├── src/                   # Source code
│   └── *.js              # Main scripts
└── examples/              # Example scripts and tests (optional)
    └── test_connection.js

Best Practices

1. Start Small

Test with minimal settings first:

  • Small number of pages/products
  • Short delays between requests
  • Test connection before full runs

2. Use Appropriate Proxies

  • Datacenter: Start here - cheaper, good success rate
  • Residential: Use if datacenter gets blocked - higher cost, better success

3. Respect Rate Limits

  • All recipes include built-in delays
  • Lower delays gradually rather than all at once
  • Monitor success rates
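
The points above can be sketched as a small task runner that caps concurrency, pauses after each task, and reports a success rate. This is a minimal illustration, not the limiter the recipes actually use; `runLimited`, `concurrency`, and `delayMs` are hypothetical names.

```javascript
// Minimal sketch: run async tasks with a concurrency cap and a pause
// after each task. Names and defaults are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runLimited(tasks, { concurrency = 10, delayMs = 0 } = {}) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to claim
  let ok = 0;   // number of tasks that resolved

  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claimed synchronously, so workers never share a task
      try {
        results[i] = { ok: true, value: await tasks[i]() };
        ok++;
      } catch (error) {
        results[i] = { ok: false, error };
      }
      if (delayMs > 0) await sleep(delayMs);
    }
  }

  // Start at most `concurrency` workers; each pulls tasks until none remain.
  const workerCount = Math.min(concurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  const successRate = tasks.length === 0 ? null : ok / tasks.length;
  return { results, successRate };
}
```

Each task is a function returning a promise, for example `() => fetch(url)`. Failures are recorded rather than thrown, so one bad request does not abort the run.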

4. Handle Errors Gracefully

  • Check error logs in output directories
  • Review failed requests
  • Adjust settings based on error patterns
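
To make "error patterns" concrete, the sketch below groups failed requests by HTTP status and sorts the groups by frequency. The record shape (`{ url, status }`) is an assumption; the recipes may log failures differently, so adapt the field names to the actual error log.

```javascript
// Hypothetical sketch: count failed requests per HTTP status.
// Records without a status (e.g. timeouts) are grouped under 'network'.
function summarizeFailures(failures) {
  const counts = new Map();
  for (const { status } of failures) {
    const key = status ?? 'network';
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Most frequent failure type first.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([status, count]) => ({ status, count }));
}
```

A log dominated by 429 (Too Many Requests) responses, for instance, usually points at request rate rather than at parsing logic.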

5. Organize Output

  • Recipes auto-organize output
  • Keep raw HTML for debugging
  • Extract structured data separately
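
One possible layout that keeps raw HTML and structured data side by side is a folder per product. The sketch below only computes the paths; the file names `raw.html` and `product.json` and the helper `outputPaths` are hypothetical, and the actual recipes may organize output differently.

```javascript
// Hypothetical layout: one folder per product id, with the raw HTML kept
// next to the extracted data so failed extractions can be re-run offline.
function outputPaths(baseDir, productId) {
  // Replace anything outside a conservative character set so the id
  // cannot escape baseDir or produce an invalid folder name.
  const safeId = String(productId).replace(/[^A-Za-z0-9_-]/g, '_');
  const dir = `${baseDir}/${safeId}`;
  return {
    dir,
    rawHtml: `${dir}/raw.html`,
    data: `${dir}/product.json`,
  };
}
```

For an Amazon recipe, `productId` would be the ASIN, matching the ASIN-based folders mentioned in the recipe's feature list.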

Contributing

We welcome contributions! To add a new recipe:

  1. Fork this repository

  2. Create a new recipe directory:

    mkdir My_Recipe
    cd My_Recipe
  3. Follow the recipe structure:

    • Add README.md with clear documentation
    • Include .env.example for credentials
    • Provide working example code
    • Add error handling
    • Include test scripts
  4. Update this README:

    • Add your recipe to the "Available Recipes" section
    • Include a brief description and use cases
  5. Submit a Pull Request:

    • Describe what your recipe does
    • Include any special requirements
    • Provide example output

Recipe Guidelines

Your recipe should:

  • ✅ Work out of the box after configuration
  • ✅ Include comprehensive documentation in README.md
  • ✅ Handle errors gracefully
  • ✅ Use environment variables for credentials
  • ✅ Include example output/logs
  • ✅ Follow Node.js best practices
  • ✅ Implement proper rate limiting or concurrency control
  • ✅ Use Bright Data's Web Unlocker API or session rotation for unique IPs
  • ✅ Be production-ready and easily integrable

Troubleshooting

Connection Issues

# Test your Bright Data connection
cd recipe_name
npm test

Common Problems

Authentication Failed:

  • Verify the credentials in your .env file
  • Check that the zone is active in the Bright Data dashboard
  • Ensure the zone type matches the recipe's configuration

High Failure Rate:

  • Increase delays between requests
  • Switch from Datacenter to Residential proxies
  • Check if target site has changed structure

No Data Extracted:

  • Review raw HTML output
  • Check if selectors need updating
  • Verify site structure hasn't changed

License

MIT License - feel free to use these recipes for personal and commercial projects.

Support

  • Check individual recipe documentation
  • Review error logs in output directories
  • Open an issue for bugs or feature requests
  • Contribute improvements via Pull Requests

Built with Bright Data - The world's #1 web data platform

Start fast. Scrape responsibly. Build amazing things.
