Homework 3: What movie to watch tonight?

Authors

Yves Gaetan Nana Teukam
Caterina Alfano
Meher Kavya Koppisetti

Objective

The goal of this assignment is to create a search engine for movies.
To make this possible, we downloaded 30,000 Wikipedia pages about movies and extracted the most important information from them to build an inverted index for answering queries. The results of a query are then ordered by their similarity to the query (using cosine similarity over tf-idf weights).
Lastly, we also added our own ranking method to order the results of the query.
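
The retrieval step described above can be illustrated with a minimal sketch. It is not the code in index.py or index_utils.py: the function names (`tokenize`, `build_index`, `search`), the whitespace tokenizer and the smoothed idf formula `log(1 + N/df)` are all assumptions chosen for brevity, and tf-idf has several other common variants.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def build_index(docs):
    """Build a tf-idf inverted index from a dict of doc_id -> text.

    Returns (index, norms, idf) where index maps each term to a list of
    (doc_id, weight) pairs and norms maps each doc_id to its vector length.
    """
    n_docs = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Assumed idf variant: log(1 + N/df), which is always positive.
    idf = {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}

    index = defaultdict(list)
    squared = defaultdict(float)
    for doc_id, c in counts.items():
        for term, tf in c.items():
            weight = tf * idf[term]
            index[term].append((doc_id, weight))
            squared[doc_id] += weight ** 2
    norms = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return dict(index), norms, idf


def search(query, index, norms, idf):
    """Return (doc_id, cosine similarity) pairs, best match first."""
    q_counts = Counter(tokenize(query))
    q_weights = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    if not q_weights:
        return []
    q_norm = math.sqrt(sum(w ** 2 for w in q_weights.values()))

    # Only documents that share a term with the query are ever touched.
    dots = defaultdict(float)
    for term, q_weight in q_weights.items():
        for doc_id, d_weight in index[term]:
            dots[doc_id] += q_weight * d_weight

    scored = [(doc_id, dot / (q_norm * norms[doc_id])) for doc_id, dot in dots.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection, made up for illustration only.
toy_docs = {
    1: "space adventure robot",
    2: "romantic comedy paris",
    3: "space comedy",
}
toy_index, toy_norms, toy_idf = build_index(toy_docs)
toy_results = search("space robot", toy_index, toy_norms, toy_idf)
```

Because scores are accumulated by walking the posting lists of the query terms, documents with no term in common with the query never appear in the result list.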

The homework also includes an algorithmic question: how to find the length of the longest palindromic substring in a given string.
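
As an illustration of this question, here is a minimal sketch that assumes "substring" means a contiguous run of characters. It uses the expand-around-center technique, which runs in O(n^2) time and O(1) extra space. It is not necessarily the approach taken in exercise_4.py, and the function name `longest_palindrome_length` is hypothetical.

```python
def longest_palindrome_length(s):
    """Return the length of the longest contiguous palindrome in s."""

    def expand(left, right):
        # Grow outwards while the characters on both sides match.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        # The loop overshoots by one position on each side.
        return right - left - 1

    best = 0
    for center in range(len(s)):
        odd = expand(center, center)        # palindromes like "aba"
        even = expand(center, center + 1)   # palindromes like "abba"
        best = max(best, odd, even)
    return best


print(longest_palindrome_length("babad"))  # 3 ("bab" or "aba")
print(longest_palindrome_length("cbbd"))   # 2 ("bb")
```

Every palindrome has either a single character or a gap between two characters at its center, so trying both kinds of center at each position covers all candidates.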

Repository Structure

Our repository contains the following files:

README.md: a Markdown file that explains the content of this repository.
collector.py: a Python file that contains the code needed to collect data from the HTML pages and Wikipedia.
collector_utils.py: a Python file that stores the functions used in collector.py.
parser.py: a Python file that contains the code needed to parse the entire collection of HTML pages and save the results in TSV files.
parser_utils.py: a Python file that gathers the functions used in parser.py.
index.py: a Python file that, once executed, generates the indexes of the search engine.
index_utils.py: a Python file that contains the functions used for creating the indexes.
utils.py: a Python file that gathers functions needed in more than one of the previous files.
main.py: a Python file that, once executed, builds the search engine.
exercise_4.py: a Python file that contains the implementation of the algorithm that solves problem 4.
main.ipynb: a Jupyter notebook that explains the strategies you adopted in solving the homework and the bonus point (visualization task). The notebook must be clear, complete and tidy. Here is an example of a nice notebook from last year. Avoid pushing to GitHub notebooks that contain entire long printed lists, otherwise we will not be able to open them.


cat-erina/ADM-HW3
