- Project #1 – Use the NLTK library and other Python libraries such as Beautiful Soup to parse the Reuters collection (Reuters-21578 corpus) into documents, articles, tokens, and stems. Stop words are removed before the terms are indexed.
- Project #2 – Create a naive indexer, a single-term query processor, and a compressed index.
- Project #3 – Create an indexer via SPIMI. Implement single-term, AND, and OR query processors, then convert the indexer into a probabilistic search engine using the BM25 formula.
- Project #4 – Experiment with web crawling: scrape and index a set of web documents, cluster the documents using k-means, and use the AFINN sentiment analysis script to assign a sentiment score to each cluster.
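
The BM25 ranking step from Project #3 can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `build_index` and `bm25_scores`, the in-memory dictionary index (rather than SPIMI blocks on disk), the `log(N / df)` form of the inverse document frequency, and the defaults `k1=1.2` and `b=0.75` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index from {doc_id: [token, ...]}.

    Returns (postings, doc_lengths), where postings maps
    term -> {doc_id: term frequency}. Hypothetical helper, in memory only.
    """
    postings = defaultdict(dict)
    doc_lengths = {}
    for doc_id, tokens in docs.items():
        doc_lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_lengths


def bm25_scores(query_terms, postings, doc_lengths, k1=1.2, b=0.75):
    """Rank documents for a bag-of-words query with BM25.

    k1 controls term-frequency saturation and b controls document-length
    normalization; both defaults are assumed, commonly used values.
    """
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query_terms:
        docs_with_term = postings.get(term, {})
        df = len(docs_with_term)
        if df == 0:
            continue  # term absent from the collection
        idf = math.log(n_docs / df)
        for doc_id, tf in docs_with_term.items():
            # Length-normalized saturation of the term frequency.
            norm = k1 * ((1 - b) + b * doc_lengths[doc_id] / avg_len)
            scores[doc_id] += idf * ((k1 + 1) * tf) / (norm + tf)
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample documents, already tokenized.
    sample_docs = {
        1: ["oil", "price", "rise"],
        2: ["oil", "oil", "export"],
        3: ["grain", "export"],
    }
    postings, doc_lengths = build_index(sample_docs)
    for doc_id, score in bm25_scores(["oil", "export"], postings, doc_lengths):
        print(doc_id, round(score, 3))
```

An OR query maps naturally onto this scoring loop, since every document containing at least one query term receives a score; an AND query would first intersect the postings of all query terms and score only the surviving documents.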
janeeyre912/information_retrieve