"Sigan Viendo" is a web scraping application designed to gather and analyze information from Dominican Republic government websites.
It does the following:
- Search for all *.gob.do domains, which represent Dominican Republic government institutions.
- Collect data from these websites: the ID (cédula), position, and salary of public employees.
- Identify employees who work in multiple government institutions.
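
The goals above can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the record field names (`cedula`, `institution`) and the sample data are assumptions made up for illustration. It shows one way to test whether a URL belongs to a *.gob.do domain and one way to group scraped records by cédula to find people listed in more than one institution.

```python
from collections import defaultdict
from urllib.parse import urlparse


def is_gob_do(url: str) -> bool:
    """Return True if the URL's host is gob.do or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "gob.do" or host.endswith(".gob.do")


def employees_in_multiple_institutions(records):
    """Map each cédula to the set of institutions it appears in,
    keeping only cédulas found in more than one institution."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["cedula"]].add(rec["institution"])
    return {cedula: insts for cedula, insts in seen.items() if len(insts) > 1}


# Made-up sample records; real scraped data may use different field names.
records = [
    {"cedula": "000-0000001-0", "institution": "a.gob.do"},
    {"cedula": "000-0000001-0", "institution": "b.gob.do"},
    {"cedula": "000-0000002-0", "institution": "a.gob.do"},
]

duplicates = employees_in_multiple_institutions(records)
```

Matching on the `.gob.do` suffix, including the leading dot, avoids false positives such as `notgob.do`.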
- Clone this repository:
    git clone https://github.com/gmarte/gobdo.git
- Change into the directory:
    cd gobdo
- Install the required Python packages:
    pip install -r requirements.txt
- Set up the MongoDB database by following the instructions in config.py.
- Run the application:
    python src/main.py
To start the web scraping process, run the main.py script located in the src directory:
    python src/main.py
The collected data is stored in the MongoDB database configured in config.py.
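
As a rough idea of what storing a scraped record in MongoDB could look like, here is a hedged sketch. It is not taken from the project's database.py: the function name `save_employee` and the field names are hypothetical. The only library call it relies on is pymongo's `Collection.update_one` with `upsert=True`, which inserts the document when no match exists and updates it otherwise, so re-running the scraper would not create duplicates.

```python
def save_employee(collection, record):
    """Upsert one scraped record, keyed by cédula and institution.

    `collection` is expected to be a pymongo Collection (or any object
    with a compatible update_one method). Field names are assumptions.
    """
    key = {"cedula": record["cedula"], "institution": record["institution"]}
    update = {"$set": {"position": record["position"], "salary": record["salary"]}}
    return collection.update_one(key, update, upsert=True)


# Usage with pymongo (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["gobdo"]["employees"]
#   save_employee(collection, {"cedula": "000-0000001-0",
#                              "institution": "a.gob.do",
#                              "position": "Analyst",
#                              "salary": 50000})
```

Keying on both cédula and institution keeps one document per person per institution, which is what a later check for employees in multiple institutions would need.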
The application is structured as follows:
- src/: Contains the Python scripts for the application.
    - main.py: Entry point for the application.
    - web_scraper.py: Contains the logic for web scraping.
    - database.py: Contains the logic for interacting with the database.
- config.py: Contains the configuration for the database.
- requirements.txt: Contains the Python packages required for this application.
If you want to contribute to this project, please fork the repository, make your changes, and open a Pull Request.
This project is licensed under the MIT License. See LICENSE for more details.