Buczman/WebScrapper
#####################################################
# WEB SCRAPING #
# POLISH CONSTITUTIONAL TRIBUNAL #
# #
#####################################################
#####################################################
# REQUIREMENTS:
#####################################################
*Please download wkhtmltopdf.exe, install it, and specify
the path to it in the variable pathwkthmltopdf.
*Please install the newest Selenium driver.
*Please download geckodriver and specify the path to it in
driver(executable_path=...).
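
A minimal sketch of a pre-flight check for the two executables listed above. It is not part of the scraper: the helper check_executable and the example path are hypothetical, and only the variable name pathwkthmltopdf is taken from this README. It uses the standard library only, so it can be run before Selenium is involved at all.

```python
import os
import shutil


def check_executable(path_or_name):
    """Return an absolute path to the executable, or None if it is not found.

    Accepts either a full path (e.g. to wkhtmltopdf.exe) or a bare command
    name that should be looked up on PATH (e.g. "geckodriver").
    """
    if os.path.isfile(path_or_name):
        return os.path.abspath(path_or_name)
    return shutil.which(path_or_name)


if __name__ == "__main__":
    # Hypothetical location: replace with the path on your machine.
    pathwkthmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
    for label, candidate in [("wkhtmltopdf", pathwkthmltopdf),
                             ("geckodriver", "geckodriver")]:
        found = check_executable(candidate)
        print(label, "->", found if found else "NOT FOUND, check the path")
```

If either line reports NOT FOUND, correct the corresponding path before running the scraper.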
#####################################################
The code below scrapes jurisdictions (rulings) of the Polish
Constitutional Tribunal that are accompanied by separate opinions.
Specify the filters and other required parameters in the
PARAMETRIZATION section!
Each method is described in more detail in the code itself.
#####################################################
# OUTPUT:
#####################################################
Output is as follows:
- outputL - main list containing:
* outputLDict - list of dictionaries in JSON-like form,
each containing the fields:
** id - id of a jurisdiction
** link - direct link to the jurisdiction
** sign - signature of the jurisdiction
** sep_opi - list of separate opinions (if available),
each a dictionary with the fields:
*** link - direct link to the separate
opinion
*** by - first name and surname of the
separate opinion's author
* mostcommon5 - list of tuples with the 5 most active
authors of separate opinions, in the form:
(name, number of separate opinions)
* file output, saved in folders as follows:
** /ID_SIGNATURE - stores all PDF and HTML files
relating to a single jurisdiction, each file
named ID_SIGNATURE.PDF (.HTML)
** /ID_SIGNATURE/separate_opinions - stores all
PDF and HTML files relating to each separate opinion
of a single jurisdiction, named
ID_SIGNATURE_BY.PDF (.HTML)
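
The structure above can be illustrated with a short sketch. All ids, signatures, links and names in the sample are invented, and the helpers most_common_authors and jurisdiction_paths are hypothetical: the scraper may build mostcommon5 and its file paths differently (for example, it may sanitise author names before using them in file names). The sketch assumes collections.Counter for the ranking.

```python
import os
from collections import Counter

# Invented sample shaped like outputLDict as described above.
outputLDict = [
    {"id": "1", "link": "https://example.org/j/1", "sign": "K_1_20",
     "sep_opi": [
         {"link": "https://example.org/j/1/o/1", "by": "Jan Kowalski"},
         {"link": "https://example.org/j/1/o/2", "by": "Anna Nowak"},
     ]},
    {"id": "2", "link": "https://example.org/j/2", "sign": "SK_5_19",
     "sep_opi": [
         {"link": "https://example.org/j/2/o/1", "by": "Jan Kowalski"},
     ]},
    # A jurisdiction without separate opinions has an empty list.
    {"id": "3", "link": "https://example.org/j/3", "sign": "P_7_18",
     "sep_opi": []},
]


def most_common_authors(records, n=5):
    """Return up to n (name, number of separate opinions) tuples."""
    counts = Counter(op["by"] for rec in records for op in rec["sep_opi"])
    return counts.most_common(n)


def jurisdiction_paths(rec, root="."):
    """Build the ID_SIGNATURE folder and file names for one record."""
    base = f'{rec["id"]}_{rec["sign"]}'
    folder = os.path.join(root, base)
    opinions_dir = os.path.join(folder, "separate_opinions")
    return {
        "folder": folder,
        "pdf": os.path.join(folder, base + ".PDF"),
        "html": os.path.join(folder, base + ".HTML"),
        # Assumption: the author's name is used unchanged as BY.
        "opinion_pdfs": [
            os.path.join(opinions_dir, f'{base}_{op["by"]}.PDF')
            for op in rec["sep_opi"]
        ],
    }


mostcommon5 = most_common_authors(outputLDict)
print(mostcommon5)  # [('Jan Kowalski', 2), ('Anna Nowak', 1)]
print(jurisdiction_paths(outputLDict[0], root="out")["pdf"])
```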
The program also produces a log file containing DEBUG-level information.
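
One way such a log file could be set up with Python's standard logging module is sketched below. The function make_logger, the logger name, the default file name scraper.log and the message format are all assumptions for illustration; the scraper's actual logging configuration is not described in this README.

```python
import logging


def make_logger(log_path="scraper.log", name="webscraper_sketch"):
    """Return a logger that writes DEBUG-level (and higher) records to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Attach the file handler only once, so repeated calls with the
    # same name do not produce duplicate log lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A call such as make_logger().debug("fetched jurisdiction %s", "1_K_1_20") would then append one timestamped DEBUG line to the log file.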
About
Study project for web scraping and social media scraping