
Non-recursive by design: the crawler visits only the child URLs listed in urls.csv and never adds new ones, so this little crawler doesn't go out to map the entire internet :)

booringreader/web-crawler


Clone the repo

    git clone https://github.com/booringreader/web-crawler.git
    cd web-crawler

Execute

  1. If the existing virtual environment doesn't work, remove the venv directory and create a new one:

         python -m venv venv # macOS/Linux

  2. Install the dependencies:

         pip install beautifulsoup4

  3. Run urls.py first, then enter the root URL (the first page, the entry point); this populates the urls.csv file.
  4. Run mails.py.
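
The two-step flow above can be sketched roughly as follows. This is a minimal illustration, not the code in urls.py or mails.py, which is not shown here. It assumes that mails.py collects email addresses (inferred only from the file name), and it uses the standard-library html.parser instead of BeautifulSoup so the sketch has no dependencies. The names `collect_child_urls`, `extract_emails`, `crawl`, and `fetch` are hypothetical.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_child_urls(root_url, html):
    """Step 1 (assumed role of urls.py): list links found on the root page only."""
    parser = LinkCollector()
    parser.feed(html)
    seen, urls = set(), []
    for href in parser.hrefs:
        absolute = urljoin(root_url, href)
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Deliberately simple pattern; an assumption, not the project's own regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the unique email-like strings in a page, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def crawl(root_url, fetch):
    """Run both steps; `fetch` is any callable mapping a URL to its HTML."""
    child_urls = collect_child_urls(root_url, fetch(root_url))

    # Stand-in for urls.csv: an in-memory CSV with one URL per row.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows([url] for url in child_urls)

    # Step 2 (assumed role of mails.py): visit only the stored URLs.
    # Links found on those pages are never added, so the crawl stays one level deep.
    buffer.seek(0)
    emails = set()
    for row in csv.reader(buffer):
        emails.update(extract_emails(fetch(row[0])))
    return child_urls, sorted(emails)
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; against a live site it could be a small wrapper around `urllib.request.urlopen` that decodes the response body.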
