ResearchDigest

Disclaimer: This is a proof-of-concept implementation of a custom arXiv paper fetcher. It is not production-ready and has limitations.

Overview

ResearchDigest is a Streamlit-based application that allows users to:

Fetch recent papers from arXiv (default category: cs.RO - Robotics).
Download PDFs and extract their text using PyMuPDF (fitz).
Summarize the extracted text using a Hugging Face t5-small summarization pipeline.
Categorize papers into basic robotics subdomains (e.g., Control Systems, Robot Vision, Robot Learning).
Display summaries and categories in an interactive Streamlit dashboard.
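
The fetch step above can be sketched roughly as follows. This is a minimal illustration that talks to the public arXiv API directly using only the standard library; main.py may do this differently. The helper names build_arxiv_query_url and parse_arxiv_feed, and the default of 5 results, are assumptions made for this example.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by arXiv feeds


def build_arxiv_query_url(category="cs.RO", max_results=5):
    """Build an arXiv API URL for the newest papers in one category.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(atom_xml):
    """Extract title, abstract, and PDF link from an Atom feed string."""
    papers = []
    root = ET.fromstring(atom_xml)
    for entry in root.iter(f"{ATOM}entry"):
        pdf_url = None
        for link in entry.findall(f"{ATOM}link"):
            # arXiv marks the PDF link with title="pdf"
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        papers.append({
            # Collapse line breaks and repeated spaces inside titles
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "summary": entry.findtext(f"{ATOM}summary", default="").strip(),
            "pdf_url": pdf_url,
        })
    return papers
```

To fetch live data, the URL returned by build_arxiv_query_url() could be opened with urllib.request.urlopen and the decoded response passed to parse_arxiv_feed().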

Screenshots

Below are some screenshots of the application in action:

Dashboard Overview
Paper Summaries

Running the Application

To run the application on Windows, use the following command in your terminal:

python -m streamlit run .\main.py

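Install the required dependencies before running the command; based on the features described above, these include Streamlit, PyMuPDF, and Hugging Face Transformers. On macOS or Linux, replace .\main.py with main.py.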

Configuration

Category Query: By default, the app fetches papers from the cs.RO (Robotics) category. To change this, edit the query argument of fetch_arxiv_papers(query="cs.RO", ...) in main.py.
Summarization Model: The app uses the t5-small model for summarization. To switch to a different model, change the model argument of the pipeline("summarization", model="...") call in the load_summarizer() function.
Categorization Keywords: Edit the categories dictionary in the categorize_paper() function to refine subdomain classification.
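
As an illustration of the keyword-based categorization described above, here is a minimal sketch. The category names come from the Overview; the keyword lists, the function name categorize_by_keywords, and the "Uncategorized" fallback are assumptions for this example and may not match the categories dictionary in main.py.

```python
# Hypothetical keyword lists; the real dictionary lives in categorize_paper().
CATEGORIES = {
    "Control Systems": ["control", "controller", "stability"],
    "Robot Vision": ["vision", "camera", "image"],
    "Robot Learning": ["reinforcement learning", "imitation learning", "policy"],
}


def categorize_by_keywords(text, categories=CATEGORIES, default="Uncategorized"):
    """Return the category whose keywords occur most often in the text."""
    lowered = text.lower()
    # Count substring occurrences of every keyword, per category
    scores = {
        name: sum(lowered.count(keyword) for keyword in keywords)
        for name, keywords in categories.items()
    }
    # On a tie, max() keeps the category listed first in the dictionary
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(categorize_by_keywords("A camera-based vision pipeline for image segmentation"))
```

Because this sketch matches substrings, a word such as "controller" also counts as a hit for "control". Adding a category means adding one more entry to the dictionary.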

Limitations

Error Handling: Minimal error handling is implemented. The app may fail silently on download or extraction errors.
Summary Lengths: Only the first 5,000 characters of the extracted text are passed to the summarizer, so content beyond that point is not reflected in the summary.
Basic Categorization: Classification relies on keyword matching, so papers that use unexpected terminology or span several subdomains may be misclassified.
Model Performance: The t5-small model is lightweight but may produce less accurate summaries for complex texts.
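
Two of the limitations above, the 5,000-character cutoff and silent failures, can be made explicit with a small sketch like the one below. It is not taken from main.py; the names truncate_for_summary and run_step, and the logger name, are hypothetical.

```python
import logging

logger = logging.getLogger("researchdigest_sketch")  # hypothetical logger name
MAX_CHARS = 5000  # cutoff stated in the Limitations list


def truncate_for_summary(text, max_chars=MAX_CHARS):
    """Keep only the first max_chars characters of the extracted text."""
    return text[:max_chars]


def run_step(step_name, func, *args, **kwargs):
    """Run one pipeline step; log the traceback and return None on failure."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception records the message together with the traceback
        logger.exception("Step %r failed", step_name)
        return None
```

A caller could wrap the download, extraction, and summarization calls in run_step and skip a paper whenever None comes back, so that failures are recorded in the log instead of going unnoticed.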

Contributing

Contributions are welcome! Feel free to fork the repository and:

Improve error handling.
Add more categories or refine keyword matching.
Integrate more powerful summarization models.

License

This project is licensed under the MIT License.
