
ntimotijevic6318/Scrapy

[EN]

Scrapy

Example of site scraping - IMDB

Presentation recording: Scraping

The recording covers:

  • Using the Scrapy library for recursive data scraping; a new project is created with the command scrapy startproject project-name
  • Once the project is created in a virtual environment, we define the spider's main method, parse, which scrapes data by following the instructions in the code
  • The code fetches HTML elements with response.css(.class child...), which works like a query selector
  • The spider is run with the command scrapy crawl name -o file.extension, where -o sets the output file and the extension can be .csv, .json, or .xml
  • In this example we exported to .csv because it is compatible with Excel
  • In Excel, as an example, we computed statistics: the number of films made per year and the number of films per genre
  • KNIME analytics
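
The per-year and per-genre counts that were done in Excel can also be sketched in plain Python. This is only an illustration under assumptions: the column names title, year, and genre and the sample rows are made up, and the real exported file may be laid out differently.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up sample rows standing in for the exported .csv file.
sample = io.StringIO(
    "title,year,genre\n"
    "Film A,1999,Drama\n"
    "Film B,1999,Comedy\n"
    "Film C,2004,Drama\n"
)

films_per_year = count_by_column(sample, "year")
sample.seek(0)  # rewind so the same data can be read a second time
films_per_genre = count_by_column(sample, "genre")

print(films_per_year)
print(films_per_genre)
```

To use it on a real export, open the file with open("films.csv", newline="", encoding="utf-8") and pass the file object in place of sample (the file name is again an example).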

Nikola Timotijevic 63/18 RN

