[EN]
Presentation recording: Scraping
The recording is about:
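- Using the Scrapy library for recursive data scraping; a new project is created with the command scrapy startproject project_name (Scrapy project names may contain only letters, numbers and underscores, so no hyphens)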
- Once the project is created inside a virtual environment, we define the spider's main method, parse, which scrapes data by following the instructions in the code
- Those instructions fetch HTML elements with calls such as response.css(".class child..."), which works like a query selector
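- The spider is run with the command scrapy crawl name -o output_file.extension, where name is the spider's name, -o (output) names the generated file, and the extension (.csv, .json, .xml) selects the export format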
- In this example we exported to .csv because it is compatible with Excel
- In Excel, as an example, we computed statistics on how many films were made per year and how many films there are per genre
- KNIME analytics
[SR]
Snimak prezentacije: Scraping
Na snimku se radi o:
- Korišćenju Scrapy biblioteke za rekurzivno skrejpovanje podataka; novi projekat se kreira komandom scrapy startproject project_name (ime Scrapy projekta sme da sadrži samo slova, brojeve i donje crte)
- Kada se projekat kreira u virtualnom okruženju, definišemo glavnu metodu parse, koja skrejpuje podatke idući po naredbama u kodu
- Naredbe dohvataju HTML elemente pomoću poziva kao što je response.css(".class child..."), što radi kao neka vrsta query selector-a
- Pokretanje se vrši komandom scrapy crawl name -o ime_izvedenog_fajla.extension, gde je name ime spajdera, -o (output) zadaje izlazni fajl, a ekstenzija (.csv, .json, .xml) određuje format
- Na ovom primeru smo izveli .csv jer je kompatibilan sa Excelom
- U Excelu smo za primer uradili statistiku koliko je filmova snimljeno po godinama, broj filmova po žanrovima.
- KNIME analitika
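
The per-year and per-genre counts were done in Excel in the recording. As a rough equivalent, the same counts can be sketched in Python on the exported CSV. The column names year and genre and the sample rows are assumptions made up for illustration; they are not data from the presentation.

```python
import csv
import io
from collections import Counter


def film_statistics(csv_file):
    """Count films per year and per genre from an open CSV file."""
    per_year = Counter()
    per_genre = Counter()
    for row in csv.DictReader(csv_file):
        # Assumed column names; adjust them to match the real export.
        per_year[row["year"]] += 1
        per_genre[row["genre"]] += 1
    return per_year, per_genre


# Made-up sample rows standing in for the exported file.
sample = io.StringIO(
    "title,year,genre\n"
    "Film A,1999,Drama\n"
    "Film B,1999,Comedy\n"
    "Film C,2005,Drama\n"
)

per_year, per_genre = film_statistics(sample)
print(dict(per_year))
print(dict(per_genre))
```

For a real export, the io.StringIO sample would be replaced by a file opened with open("films.csv", newline="", encoding="utf-8").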