- Reddar is a retrieval system for Reddit data, built on Elasticsearch and Python Flask. For experimentation, we have already fetched part of the data from the Reddit "IAMA" thread; you may want to retrieve your own data through the Reddit JSON APIs to build a corpus.
- Improved search functionality with Elasticsearch, with search results organized for different needs
- Full-text search of Reddit data with Elasticsearch
- Order search results by relevance, time, or score, and distinguish results by theme (article) and replies
- Flatten Reddit content and re-arrange replies by time
- Check user history
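
The flattening step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the nested comment listing returned by the Reddit JSON APIs (`kind`, `data.children`, `replies`), and the output record layout, including the `parent` and `depth` keys, is a hypothetical choice made for this example.

```python
from typing import Any, Dict, Iterator


def flatten_comments(listing: Dict[str, Any], parent: str, depth: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per comment in a nested Reddit comment listing."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" stubs and anything that is not a comment
            continue
        data = child["data"]
        yield {
            "id": data["id"],
            "author": data.get("author"),
            "body": data.get("body", ""),
            "created_utc": data.get("created_utc"),
            "score": data.get("score"),
            "parent": parent,  # hypothetical field: id of the article or parent reply
            "depth": depth,    # hypothetical field: 0 for first-level replies
        }
        replies = data.get("replies")
        if isinstance(replies, dict):  # Reddit sends "" when a comment has no replies
            yield from flatten_comments(replies, parent=data["id"], depth=depth + 1)


# Tiny hand-written sample in the shape of a Reddit comment listing.
sample = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {
            "id": "c1", "author": "alice", "body": "Question?", "created_utc": 100, "score": 5,
            "replies": {"kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"id": "c2", "author": "bob", "body": "Answer.",
                                        "created_utc": 150, "score": 9, "replies": ""}},
            ]}},
        }},
        {"kind": "t1", "data": {"id": "c3", "author": "carol", "body": "Thanks!",
                                "created_utc": 120, "score": 2, "replies": ""}},
    ]},
}

# Flatten the tree, then re-arrange the replies by time.
flat = sorted(flatten_comments(sample, parent="article1"), key=lambda r: r["created_utc"])
```

Each record keeps enough information (`parent`, `depth`) to rebuild the original thread later.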

Article (Theme) index
- Indexes the article content separately, without replies
- Analyzer: standard tokenizer, Porter stemming, English stop words

Replies index
- Indexes the content of all replies together with each reply's parent. (Replies at depth 0 have the article (theme) as their parent.)
- Analyzer: standard tokenizer, Porter stemming, English stop words
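
As a rough sketch, the two indices could be described with settings like the ones below. The analyzer name `reddit_english`, the field names, and the added `lowercase` filter are assumptions for illustration, and the mapping syntax shown follows recent Elasticsearch versions; older versions use a different mapping layout.

```python
from typing import Any, Dict

# Custom analyzer: standard tokenizer, English stop words, Porter stemming.
# The "lowercase" filter is an assumption; it is commonly placed before stemming.
ANALYSIS: Dict[str, Any] = {
    "analyzer": {
        "reddit_english": {  # hypothetical analyzer name
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "english_stop", "porter_stem"],
        }
    },
    "filter": {
        "english_stop": {"type": "stop", "stopwords": "_english_"},
    },
}

TEXT_FIELD = {"type": "text", "analyzer": "reddit_english"}

# Hypothetical field names; only "depth" and "parent" are named in this README.
ARTICLE_PROPERTIES = {
    "title": TEXT_FIELD,
    "body": TEXT_FIELD,
    "author": {"type": "keyword"},
    "created": {"type": "date", "format": "epoch_second"},
    "score": {"type": "integer"},
}

REPLY_PROPERTIES = {
    "body": TEXT_FIELD,
    "author": {"type": "keyword"},
    "created": {"type": "date", "format": "epoch_second"},
    "score": {"type": "integer"},
    "parent": {"type": "keyword"},  # article id for depth 0, otherwise the parent reply id
    "depth": {"type": "integer"},
}


def build_index_body(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return an index-creation body that shares the same analyzer across indices."""
    return {"settings": {"analysis": ANALYSIS}, "mappings": {"properties": properties}}


article_index = build_index_body(ARTICLE_PROPERTIES)
replies_index = build_index_body(REPLY_PROPERTIES)
```

Keeping the analyzer definition in one place means both indices tokenize and stem text the same way, so a query matches articles and replies consistently.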

Text Search:
- Search all relevant articles
- Search all relevant replies
- Reconstruct the structure through depth and parent fields
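
The reconstruction step might look like the sketch below. It assumes each reply hit is a dict with `id`, `parent`, `depth`, and `created` keys; apart from `depth` and `parent`, these names are hypothetical, and the function names are illustrative only.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_thread(article_id: str, replies: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rebuild the nested reply tree of one article from flat reply hits."""
    children = defaultdict(list)
    for reply in replies:
        children[reply["parent"]].append(reply)

    def attach(parent_id: str) -> List[Dict[str, Any]]:
        # Siblings are ordered by creation time, oldest first.
        ordered = sorted(children.get(parent_id, []), key=lambda r: r["created"])
        return [{**reply, "children": attach(reply["id"])} for reply in ordered]

    return attach(article_id)


def render(nodes: List[Dict[str, Any]]) -> List[str]:
    """Return one line per reply, indented by its depth field."""
    lines = []
    for node in nodes:
        lines.append("  " * node["depth"] + node["id"])
        lines.extend(render(node["children"]))
    return lines


hits = [
    {"id": "r2", "parent": "r1", "depth": 1, "created": 30},
    {"id": "r1", "parent": "a1", "depth": 0, "created": 10},
    {"id": "r3", "parent": "a1", "depth": 0, "created": 20},
]
tree = build_thread("a1", hits)
outline = render(tree)
```

Here the `parent` field drives the tree structure, while `depth` is only used for indentation when displaying the thread.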

Author Search:
- Search all articles posted by the author
- Search all replies posted by the author
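
A minimal sketch of an author search is an exact-match query sent to both indices. The `author` field name, the index names, and the result size are assumptions for illustration; the body uses the standard Elasticsearch `term` query.

```python
from typing import Any, Dict

# Hypothetical index names for the article and replies indices.
AUTHOR_INDICES = ("articles", "replies")


def author_query(author: str, size: int = 50) -> Dict[str, Any]:
    """Build a search body that matches documents posted by exactly this author."""
    return {
        "size": size,
        "query": {"term": {"author": author}},  # assumes "author" is indexed without analysis
    }


# The same body can be sent to the search endpoint of each index.
requests_to_send = {index: author_query("some_user") for index in AUTHOR_INDICES}
```

A `term` query is used instead of a full-text match because usernames should match exactly rather than being tokenized and stemmed.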

Weighting
- We assume that a reply's relevance decreases as it goes deeper: replies close to the root (the Reddit article) are more relevant than those far from it. The first level of replies is assigned depth 0, and the weight of a reply is 1/(1+depth).
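
The weighting formula can be written as a short function. How the weight is combined with the search engine's relevance score is not stated here, so the multiplication in `weighted_relevance` is an assumption, and both function names are hypothetical.

```python
def reply_weight(depth: int) -> float:
    """Weight of a reply: 1 / (1 + depth), where first-level replies have depth 0."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 / (1 + depth)


def weighted_relevance(relevance: float, depth: int) -> float:
    """Scale a relevance score by the depth weight (assumed combination rule)."""
    return relevance * reply_weight(depth)


# Depths 0, 1, and 3 give weights 1.0, 0.5, and 0.25.
weights = [reply_weight(d) for d in (0, 1, 3)]
```

With this formula a first-level reply keeps its full relevance, a reply to that reply counts half as much, and deeper replies fade gradually without ever reaching zero.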

Sorting
- We support 3 kinds of sorting: relevance, time, and score (upvotes - downvotes)
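
The three orderings could be sketched as below. This is an illustration under assumptions: the hit keys `relevance`, `created`, `ups`, and `downs` are hypothetical, and sorting in descending order (most relevant, newest, or highest-scored first) is assumed rather than taken from the application.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]

# One sort key per supported ordering.
SORT_KEYS: Dict[str, Callable[[Hit], float]] = {
    "relevance": lambda hit: hit["relevance"],
    "time": lambda hit: hit["created"],
    "score": lambda hit: hit["ups"] - hit["downs"],  # score = upvotes - downvotes
}


def sort_results(hits: List[Hit], order: str = "relevance") -> List[Hit]:
    """Return the hits sorted in descending order by the chosen criterion."""
    if order not in SORT_KEYS:
        raise ValueError(f"unknown sort order: {order!r}")
    return sorted(hits, key=SORT_KEYS[order], reverse=True)


hits = [
    {"id": "a", "relevance": 2.5, "created": 300, "ups": 10, "downs": 8},
    {"id": "b", "relevance": 1.0, "created": 100, "ups": 40, "downs": 5},
    {"id": "c", "relevance": 4.0, "created": 200, "ups": 3, "downs": 0},
]
by_relevance = [h["id"] for h in sort_results(hits, "relevance")]
by_time = [h["id"] for h in sort_results(hits, "time")]
by_score = [h["id"] for h in sort_results(hits, "score")]
```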
A live demo with partial data was deployed on Heroku: Reddar (no longer available).
Before starting the Flask application, make sure a local Elasticsearch instance is already running; it builds the index and serves as the application's database. You can then start the application by running the following commands in a terminal.
- <elasticsearch path>/bin/elasticsearch
After Elasticsearch has started:
- python <Application path>/run.py
Want to contribute? Great!
To fix a bug or enhance an existing module, follow these steps:
1. Fork the repo
2. Create a new branch (`git checkout -b improve-feature`)
3. Make the appropriate changes in the files
4. Add changes to reflect the changes made
5. Commit your changes (`git commit -am 'Improve feature'`)
6. Push to the branch (`git push origin improve-feature`)
7. Create a Pull Request
- Elasticsearch - Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
- Masonry - Masonry is a JavaScript grid layout library. It works by placing elements in optimal position based on available vertical space, sort of like a mason fitting stones in a wall. You’ve probably seen it in use all over the Internet.
- Flask - Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions.
- Reddit APIs - Reddit JSON APIs
- Bootstrap - Build responsive, mobile-first projects on the web with the world's most popular front-end component library.
- Build a richer corpus to test the robustness of our application
- Detect URLs that point directly to media content and show that media inline in the application
- Define synonyms to explore more interesting search functionality






