
Project #1 – Use NLTK and other Python libraries such as Beautiful Soup to parse the Reuters-21578 corpus into documents, articles, tokens, and stems. Remove stop words and index the remaining terms.
Project #2 – Create a naive indexer, a single-term query processor, and a compressed index.
Project #3 – Create an indexer using SPIMI (single-pass in-memory indexing). Implement single-term, AND, and OR query processors, then convert the indexer into a probabilistic search engine using the BM25 formula.
Project #4 – Experiment with web crawling: scrape and index a set of web documents, cluster the documents using k-means, and use the AFINN sentiment analysis script to assign a sentiment score to each cluster.
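
As a rough illustration of the query processing and BM25 ranking mentioned for Project #3, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`tokenize`, `build_index`, `and_query`, `or_query`, `bm25_rank`), the whitespace tokenizer, the default parameters `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for the example. The index is also built fully in memory rather than in SPIMI blocks.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for NLTK tokenization.
    return text.lower().split()


def build_index(docs):
    """Build an in-memory inverted index from {doc_id: text}.

    Returns (index, doc_lengths) where index maps term -> {doc_id: tf}.
    """
    index = defaultdict(dict)
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_lengths


def and_query(index, terms):
    # Documents containing every query term (intersection of postings).
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def or_query(index, terms):
    # Documents containing at least one query term (union of postings).
    postings = [set(index.get(t, {})) for t in terms]
    return set.union(*postings) if postings else set()


def bm25_rank(index, doc_lengths, terms, k1=1.2, b=0.75):
    """Rank documents by BM25; returns [(doc_id, score)] sorted by score.

    Assumption: uses the smoothed IDF log(1 + (N - df + 0.5) / (df + 0.5)),
    one common variant; the project may use a different form.
    """
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in terms:
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue  # term absent from the collection
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

To try it, pass a small dictionary of document IDs to texts into `build_index`, then call `and_query`, `or_query`, or `bm25_rank` with a list of lowercase query terms. With equal term frequency, `bm25_rank` places shorter documents first because of the length normalization term.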

Project 1
Project 2
Project 3
Project 4
README.md
