A Python package for scraping GitHub topics and their top repositories.
pip install github-topic-scraper

from github_topic_scraper import GitHubTopicScraper
# Initialize the scraper
scraper = GitHubTopicScraper()
# Scrape topics and repositories
topics_df = scraper.scrape_all()
# Access the scraped data
topics_df.head()

- Scrapes GitHub topics and their descriptions
- Collects top repositories for each topic
- Saves data to CSV files
- Includes error handling and progress tracking
- Configurable output directory
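To see what a run wrote to the output directory, a small helper like the one below can list each CSV file with its row count. This is a minimal sketch, not part of the package: `summarize_csv_outputs` is a hypothetical name, the CSV file names are not documented here (so the helper matches any `*.csv`), and it assumes each file starts with a header row.

```python
import csv
from pathlib import Path


def summarize_csv_outputs(output_dir):
    """Return {file name: number of data rows} for each CSV in output_dir."""
    summary = {}
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted as data.
        summary[csv_path.name] = max(len(rows) - 1, 0)
    return summary


if __name__ == "__main__":
    # "sample" is only an example directory name; use your own output_dir.
    for name, row_count in summarize_csv_outputs("sample").items():
        print(f"{name}: {row_count} data rows")
```

An empty result means no CSV files were found in that directory.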
- Python 3.7+
- requests
- pandas
- beautifulsoup4
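The requirements above can also be checked from Python before running the scraper. The sketch below is not part of the package; `missing_requirements` and `python_is_supported` are hypothetical helper names. It relies on the fact that the `beautifulsoup4` distribution is imported as `bs4`.

```python
import sys
from importlib.util import find_spec

# Distribution name -> import name; beautifulsoup4 is imported as "bs4".
REQUIRED = {"requests": "requests", "pandas": "pandas", "beautifulsoup4": "bs4"}


def missing_requirements(required=REQUIRED):
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Check an interpreter version against the minimum supported version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.7+:", python_is_supported())
    print("Missing packages:", missing_requirements() or "none")
```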
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Make sure your virtual environment is activated
pip install requests pandas beautifulsoup4

# Make sure you're in the github_topic_scraper directory
pip install -e .

# Start Python interpreter
python
# Try importing the package
>>> from github_topic_scraper import GitHubTopicScraper
# If no error appears, the package is installed correctly
>>> exit()

from github_topic_scraper import GitHubTopicScraper
scraper = GitHubTopicScraper(output_dir="sample")
topics_df = scraper.scrape_all()

If files aren't being generated:
- Verify your virtual environment is activated (you should see (venv) in your terminal)
- Confirm all dependencies are installed: pip list
- Check you have write permissions in the output directory
- Ensure you're running the code from the correct directory
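The write-permission and working-directory checks can be done from Python as well. The following is a minimal sketch that does not use the package itself; `can_write_to` is a hypothetical helper that tries to create a temporary file in the target directory.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created in directory."""
    if not os.path.isdir(directory):
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "sample" is only an example; use the directory you pass as output_dir.
    print("Writable:", can_write_to("sample"))
    print("Current working directory:", os.getcwd())
```

A result of False for a directory that does not exist yet is expected, since the helper only tests existing directories.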
- ImportError: Make sure all dependencies are installed in your virtual environment
- Permission Error: Check write permissions in your output directory
- ModuleNotFoundError: Verify the package is installed correctly
For additional help, please open an issue on GitHub. Thanks!