
janeeyre912/information_retrieve


Projects of Information Retrieval and Web Search

  • Project #1 – Use NLTK and other Python libraries such as Beautiful Soup to parse the Reuters-21578 corpus into documents, articles, tokens, and stems. Stop words are removed from the terms that are indexed.
  • Project #2 – Create a naive indexer, a single-term query processor, and a compressed index.
  • Project #3 – Create an indexer using SPIMI (single-pass in-memory indexing). Implement single-term, AND, and OR query processors, and convert the indexer into a probabilistic search engine using the BM25 formula.
  • Project #4 – Experiment with web crawling: scrape and index a set of web documents, cluster the documents using k-means, and use the AFINN sentiment analysis script to assign a sentiment score to each cluster.
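
To make the querying ideas in Projects #2 and #3 concrete, here is a minimal sketch of an in-memory inverted index with AND/OR queries and BM25 ranking. It is not the repository's code. The toy corpus `DOCS`, the function names, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are all assumptions made for illustration. It also skips stemming, stop-word removal, compression, and SPIMI's block-by-block disk writes. The scoring uses the common textbook form of BM25 with `log(N/df)` as the IDF term, which may differ from the variant used in the project.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the projects index Reuters-21578 instead.
DOCS = {
    1: "oil prices rise as opec cuts output",
    2: "opec meeting ends without output deal",
    3: "grain exports rise on strong demand",
}


def build_index(docs):
    """Return (postings, lengths); postings maps term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: plain whitespace tokenizer
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, lengths


def and_query(postings, terms):
    """Doc IDs that contain every query term."""
    doc_sets = [set(postings.get(t, {})) for t in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def or_query(postings, terms):
    """Doc IDs that contain at least one query term."""
    return set().union(*(postings.get(t, {}) for t in terms))


def bm25(postings, lengths, terms, k1=1.2, b=0.75):
    """Rank documents for the query terms; returns [(doc_id, score), ...]."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in terms:
        plist = postings.get(term, {})
        if not plist:
            continue  # term not in the collection
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist.items():
            # Length normalization: longer documents are penalized via b.
            denom = k1 * ((1 - b) + b * lengths[doc_id] / avg_len) + tf
            scores[doc_id] += idf * (k1 + 1) * tf / denom
    # Highest score first; ties broken by doc ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    postings, lengths = build_index(DOCS)
    print(and_query(postings, ["opec", "output"]))
    print(or_query(postings, ["oil", "grain"]))
    print(bm25(postings, lengths, ["opec", "oil"]))
```

Storing term frequencies in the postings keeps Boolean retrieval and ranked retrieval on the same structure: AND and OR only need the doc IDs, while BM25 reads the frequencies and document lengths.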
