Simple webscraper made with Scrapy to output .csv files

sammyzord/trf_scraper

TRF Scraper

Simple spider made with Scrapy to scrape the contents of the TRF1 website into .csv files.

Instructions

The project requires Poetry to be installed on your machine; see the Poetry documentation for installation instructions.

After installing Poetry, you can install the dependencies with the following command:

poetry install

After installing all dependencies, you can run the project with the following command:

poetry run scrapy runspider trf_spider.py

Output

The spider will create four .csv files for each scraped item, each in its own location in the following folders:

./output/distribuicao
./output/movimentacao
./output/peticao
./output/processo

The files will be named according to the folder name and process number, as follows: {folder_name}_{process_number}.csv

If the parsed contents are empty, the .csv file will be empty as well. This is intended behaviour that should probably be tweaked, but it works well enough 🙃
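The output scheme above can be sketched in plain Python. This is a hypothetical illustration, not the project's actual code: the `write_item_csv` function, its parameters, and the sample rows are all made up for the example, but the folder layout, the `{folder_name}_{process_number}.csv` naming, and the empty-file behaviour follow what is described here.

```python
import csv
from pathlib import Path

# The four item folders named in the README.
FOLDERS = ["distribuicao", "movimentacao", "peticao", "processo"]


def write_item_csv(folder_name: str, process_number: str, rows: list[dict]) -> Path:
    """Write one scraped item to ./output/{folder_name}/{folder_name}_{process_number}.csv."""
    out_dir = Path("output") / folder_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{folder_name}_{process_number}.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        # If the parsed contents are empty, the file is created but left empty,
        # matching the behaviour described above.
        if rows:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return out_path


# Example: one populated item and one item whose parsed contents were empty.
p1 = write_item_csv("processo", "0001234-56.2020.4.01.0000",
                    [{"data": "2020-01-01", "descricao": "exemplo"}])
p2 = write_item_csv("peticao", "0001234-56.2020.4.01.0000", [])
```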
