A Python package for scraping GitHub topics and their top repositories.
Installation
pip install github-topic-scraper
Usage
from github_topic_scraper import GitHubTopicScraper
# Initialize the scraper
scraper = GitHubTopicScraper()
# Scrape topics and repositories
topics_df = scraper.scrape_all()
# Access the scraped data
topics_df.head()
Features
- Scrapes GitHub topics and their descriptions
- Collects top repositories for each topic
- Saves data to CSV files
- Includes error handling and progress tracking
- Configurable output directory
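The CSV output and configurable output directory listed above could work roughly like the sketch below. It is a minimal illustration using pandas and pathlib, not the package's own code. The helper name `save_to_csv`, the default directory `data`, and any file names are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_to_csv(df: pd.DataFrame, filename: str, output_dir: str = "data") -> Path:
    """Write df to output_dir/filename, creating the directory if needed.

    Hypothetical helper: the name, the "data" default, and the file naming
    are illustrative, not the package's actual API.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the configurable directory on demand
    path = out_dir / filename
    try:
        df.to_csv(path, index=False)  # omit the DataFrame index column
    except OSError as exc:
        # Re-raise with a message that names the file that failed
        raise RuntimeError(f"Could not write {path}: {exc}") from exc
    return path
```

For instance, `save_to_csv(topics_df, "topics.csv", output_dir="sample")` would write the DataFrame from the usage example to `sample/topics.csv` (both names are illustrative).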
Requirements
- Python 3.7+
- requests
- pandas
- beautifulsoup4
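To confirm an environment meets these requirements before going further, a small standard-library check like the following may help. It is a sketch: the helper names are hypothetical, and it only tests that the modules can be found, not their versions. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util
import sys

# Import names for the listed requirements; beautifulsoup4 is imported as "bs4".
REQUIRED_MODULES = ("requests", "pandas", "bs4")


def python_ok(minimum=(3, 7)) -> bool:
    """Return True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


def missing_modules(modules=REQUIRED_MODULES) -> list:
    """Return the modules that cannot be found, without actually importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_modules() or "none")
```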
Install Dependencies
- Make sure your virtual environment is activated
- pip install requests pandas beautifulsoup4
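Because the dependencies should land in the activated virtual environment, the following sketch checks whether the current interpreter is running inside one. It relies on the standard `venv` behavior where `sys.prefix` differs from `sys.base_prefix`; the `real_prefix` check covers older `virtualenv` releases. The function name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv or virtualenv environment."""
    # Older virtualenv releases set sys.real_prefix; the venv module instead
    # makes sys.prefix point at the environment and sys.base_prefix at the base install.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate it before running pip install.")
```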
Install Package in Development Mode
- Make sure you're in the github_topic_scraper directory
- pip install -e .
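`pip install -e .` builds the project found in the current directory, so that directory needs a `pyproject.toml` or `setup.py`. The sketch below is a rough, hypothetical pre-flight check for this; it does not validate the files' contents.

```python
from pathlib import Path

# pip looks for one of these files to build the project in the given directory.
BUILD_FILES = ("pyproject.toml", "setup.py")


def looks_installable(directory=".") -> bool:
    """Return True if directory contains a pyproject.toml or setup.py file."""
    root = Path(directory)
    return any((root / name).is_file() for name in BUILD_FILES)


if __name__ == "__main__":
    here = Path.cwd()
    if looks_installable(here):
        print(f"{here} looks like a package root; run: pip install -e .")
    else:
        print(f"No pyproject.toml or setup.py in {here}; cd into the package directory first.")
```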
Verify Installation
- Start the Python interpreter
- python
- Try importing the package
- from github_topic_scraper import GitHubTopicScraper
- If no error appears, the package is installed correctly; leave the interpreter with:
- exit()
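The same check can also be run non-interactively, for example from a script or CI job. This sketch uses `importlib.import_module`; the helper name `can_import` is hypothetical.

```python
import importlib
from typing import Optional


def can_import(module_name: str, attr_name: Optional[str] = None) -> bool:
    """Return True if module_name imports cleanly (and exposes attr_name, if given)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return False
    return attr_name is None or hasattr(module, attr_name)


if __name__ == "__main__":
    ok = can_import("github_topic_scraper", "GitHubTopicScraper")
    print("Package installed correctly" if ok else "Import failed; check your environment")
```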
Run the Sample File
- If the output files are generated inside the sample directory, the installation succeeded.
- If the files aren't being generated, make sure you have a virtual environment set up and the dependencies installed in it.
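To check the result of the sample run from a script, the sketch below lists the CSV files in the output directory. It assumes the directory is named `sample`, sits in the current working directory, and holds CSV output (per the feature list); the helper name is hypothetical.

```python
from pathlib import Path


def generated_csvs(directory="sample") -> list:
    """Return the sorted names of CSV files in directory, or [] if it doesn't exist."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.csv") if p.is_file())


if __name__ == "__main__":
    files = generated_csvs("sample")
    if files:
        print("Installation looks good; found:", ", ".join(files))
    else:
        print("No CSV files in ./sample; check your virtual environment and dependencies.")
```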