Python-Web-Text-Scraper

A Python console application that retrieves clean text data from multiple websites at once and saves it to a text file.

Running The Text Scraper Program:

  1. Run the script
  2. Enter a search keyword - try "emoji" ;)

The text output is saved to a "text_data.txt" file.
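
The core of this workflow (fetch pages, strip the markup, append the text to a file) might be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `clean_text` and `scrape_to_file` are hypothetical, and since this README does not describe how a keyword is turned into a list of websites, the sketch assumes the URLs are already known.

```python
import requests
from bs4 import BeautifulSoup


def clean_text(html):
    """Return the readable text of an HTML document as one string."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are code or styling, not readable text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # strip=True trims each text fragment and skips the empty ones.
    return soup.get_text(separator=" ", strip=True)


def scrape_to_file(urls, path="text_data.txt"):
    """Fetch each URL and append its cleaned text to `path` (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as out:
        for url in urls:
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
            except requests.RequestException as exc:
                # One unreachable site should not stop the whole run.
                print(f"Skipping {url}: {exc}")
                continue
            out.write(clean_text(response.text) + "\n\n")
```

Calling `scrape_to_file(["https://example.com"])` would append that page's text to "text_data.txt" in the current directory. Opening the file in append mode is a design choice for this sketch; the real script may overwrite the file instead.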

Requires Requests and Beautiful Soup 4:

pip install requests
pip install beautifulsoup4

Requests Documentation: https://requests.readthedocs.io/en/master/

Beautiful Soup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#

Example output from "covid19" keyword searches:
Covid19 Text Corpus: https://drive.google.com/open?id=1YS8UJ-Qeamdo9aAcpIgUqVb0ohrKijHy