by Rohit Gomes
A Streamlit-powered research agent that searches arXiv by topic, summarizes papers using IBM watsonx.ai Granite, generates reviewer-style notes, and exports a consolidated report in TXT, DOCX, and PDF. Built for fast literature reconnaissance and lightweight paper triage.
- Tech stack: Python, Streamlit, arXiv API, IBM watsonx.ai Granite, python-docx, ReportLab
- Author: Rohit Gomes
- GitHub: github.com/RohitXJ
- Topic-based arXiv search with adjustable number of results
- LLM-powered summarization in concise, technical bullet points
- Reviewer-style notes with strengths, weaknesses, questions, and suggestions
- One-click export of the full report in TXT, DOCX, and PDF
- Caching for arXiv queries and automatic retries for model calls
- Secure secret management (no keys committed)
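The caching and retry behaviour listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the app lists tenacity as a dependency and presumably relies on it (and on Streamlit's caching) instead, and the names `with_retries` and `cached_search` are hypothetical.

```python
import functools
import time


def with_retries(max_attempts=3, base_delay=0.5, exceptions=(Exception,)):
    """Retry the wrapped call with exponential backoff (stdlib stand-in for tenacity)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


@functools.lru_cache(maxsize=128)
def cached_search(topic: str, max_results: int) -> tuple:
    """Hypothetical cached lookup; a real version would query arXiv here."""
    return tuple(f"{topic} paper {i}" for i in range(1, max_results + 1))
```

Repeating a search with the same topic and result count is then served from the cache, while a model call wrapped in `with_retries` gets a few attempts before its error reaches the UI.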
- Enter a topic in the input field (e.g., “Sparse Mixture-of-Experts for Vision Transformers”).
- Click “Run Agent”.
- Watch the progress as the app fetches papers, summarizes them, and generates reviews.
- Preview the combined report and download as TXT/DOCX/PDF.
- Expand per-paper details for the generated summary and reviewer notes.
- app/
  - app.py
  - agents/
    - __init__.py
    - fetch.py
    - summarize.py
    - review.py
    - report.py
  - utils/
    - __init__.py
    - ibm_client.py
    - io.py
    - text.py
- .streamlit/
  - secrets.toml (local only; do NOT commit)
- requirements.txt
- .gitignore
- LICENSE
- README.md
- Python 3.9+
- Dependencies in requirements.txt:
  - streamlit
  - arxiv
  - python-docx
  - reportlab
  - ibm-watsonx-ai
  - tenacity
Install:
- python -m venv .venv
- Windows: .venv\Scripts\activate (PowerShell or cmd; in Git Bash use: source .venv/Scripts/activate)
- macOS/Linux: source .venv/bin/activate
- pip install -r requirements.txt
Create .streamlit/secrets.toml locally with IBM watsonx.ai credentials. Do not commit this file.
Example:

    [ibm]
    apikey = "YOUR_IBM_WATSONX_APIKEY"
    url = "YOUR_IBM_WATSONX_URL"
    project_id = "YOUR_IBM_WATSONX_PROJECT_ID"
    model_id = "ibm/granite-13b-instruct-v2"
    decoding_method = "greedy"
    max_new_tokens = 350
    top_p = 0.9
    temperature = 0.7
Notes:
- Use least-privilege API keys.
- For Streamlit Cloud, add the same keys in the app’s Secrets UI instead of uploading secrets.toml.
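To fail fast on a misconfigured secrets file, the [ibm] table can be validated once at startup. The sketch below is an assumption about how that might look, not the contents of app/utils/ibm_client.py: `IBMConfig`, `load_ibm_config`, and the choice of which keys are required are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Assumption: these four keys are mandatory; everything else is a decoding parameter.
REQUIRED_KEYS = ("apikey", "url", "project_id", "model_id")


@dataclass(frozen=True)
class IBMConfig:
    apikey: str
    url: str
    project_id: str
    model_id: str
    params: dict  # decoding parameters (decoding_method, max_new_tokens, ...)


def load_ibm_config(secrets: Mapping[str, Any]) -> IBMConfig:
    """Validate the [ibm] table and split credentials from decoding parameters."""
    if "ibm" not in secrets:
        raise KeyError("missing [ibm] table in secrets")
    ibm = dict(secrets["ibm"])
    missing = [key for key in REQUIRED_KEYS if not ibm.get(key)]
    if missing:
        raise KeyError(f"missing [ibm] secrets: {', '.join(missing)}")
    params = {k: v for k, v in ibm.items() if k not in REQUIRED_KEYS}
    return IBMConfig(*(ibm[k] for k in REQUIRED_KEYS), params=params)
```

Inside the app this would likely be called as `load_ibm_config(st.secrets)`, since Streamlit exposes the tables in secrets.toml through the mapping-like `st.secrets` object.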
- streamlit run app/app.py
- Open the URL shown in the terminal (usually http://localhost:8501).
- Enter a topic and click “Run Agent”.
- Push the repository to GitHub. Ensure .streamlit/secrets.toml is ignored.
- Go to Streamlit Community Cloud and click “New app”.
- Select the repo/branch and set the entry point to app/app.py.
- In Settings → Secrets, paste the [ibm] block from above.
- Deploy.
Tips:
- Subsequent code changes auto-redeploy on push.
- Update secrets in Settings → Secrets without code changes.
- Import error: “No module named app.agents; 'app' is not a package”
  - Ensure app/, app/agents/, and app/utils/ each contain an __init__.py file.
  - Run from the project root: streamlit run app/app.py (not from inside app/).
  - If needed, add a sys.path shim at the top of app/app.py.
- IBM auth errors
  - Verify apikey, url, and project_id in secrets.
  - Confirm the model_id exists and access is granted to your project.
- arXiv returns no papers
  - Try a broader query or reduce filters; lower the number of papers while testing.
- Slow generation or rate limits
  - Reduce “Number of papers” and “Max new tokens”.
  - Use “greedy” decoding for determinism and speed.
- PDF/DOCX issues
  - Ensure python-docx and reportlab are installed properly.
  - Recreate a clean virtual environment if needed.
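The sys.path shim mentioned under the import error can be as small as the following. It is a minimal sketch: the helper name `ensure_on_path` is hypothetical, and it assumes app.py lives one directory below the project root, as in the layout above.

```python
import sys
from pathlib import Path


def ensure_on_path(root: Path) -> bool:
    """Put `root` at the front of sys.path; return True if it was added."""
    root_str = str(root.resolve())
    if root_str in sys.path:
        return False
    sys.path.insert(0, root_str)
    return True


# At the top of app/app.py, before any `from app.agents ...` import:
# ensure_on_path(Path(__file__).resolve().parent.parent)
```

With the project root on sys.path, `app` resolves as a package regardless of the directory Streamlit was launched from.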
- Prompt style:
  - app/agents/summarize.py: tweak the summary format (e.g., more or fewer bullet points).
  - app/agents/review.py: adjust reviewer categories or tone.
- Controls:
  - app/app.py sidebar: expose more decoding parameters or add filters (e.g., year, author).
  - app/agents/fetch.py: add category filtering or date ranges.
- Exports:
  - app/utils/io.py: extend to JSON/Markdown exports.
  - Add metadata like authors, publish dates, and categories to the report.
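As a starting point for the JSON/Markdown export extension, something like the following could be added to app/utils/io.py. It is a sketch under assumptions: the per-paper dictionary keys (title, summary, review) and both function names are hypothetical, since the app's actual report structure is not shown here.

```python
import json
from typing import Dict, List

Paper = Dict[str, str]  # hypothetical keys: "title", "summary", "review"


def report_to_json(topic: str, papers: List[Paper]) -> str:
    """Serialize the whole report as pretty-printed JSON."""
    return json.dumps({"topic": topic, "papers": papers}, indent=2, ensure_ascii=False)


def report_to_markdown(topic: str, papers: List[Paper]) -> str:
    """Render the report as Markdown with one numbered section per paper."""
    lines = [f"# Research report: {topic}", ""]
    for i, paper in enumerate(papers, start=1):
        lines += [
            f"## {i}. {paper['title']}", "",
            "### Summary", "", paper["summary"], "",
            "### Reviewer notes", "", paper["review"], "",
        ]
    return "\n".join(lines)
```

Both functions return plain strings, so they could be handed to Streamlit's `st.download_button` in the same way as the existing TXT export presumably is.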
- Add year/category filters for arXiv queries.
- Deduplicate results and add relevance scoring beyond arXiv's default relevance ordering.
- Persist session outputs to downloadable JSON for programmatic use.
- Add an option to include abstracts verbatim in the report.
- Add a “cost/tokens” estimator and limiter per run.
- Select models from a dropdown if multiple IBM models are available.
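The year/category filters on this list could start from a small query builder. The sketch below assumes the field prefixes (`all:`, `cat:`) and the `submittedDate:[... TO ...]` range syntax described in the arXiv API user manual; check them against the current manual before relying on them. The function name `build_arxiv_query` is hypothetical.

```python
from datetime import date
from typing import Optional


def build_arxiv_query(
    topic: str,
    category: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> str:
    """Combine a topic with optional category and submission-date filters."""
    clean = topic.replace('"', "").strip()  # drop quotes so the phrase stays well-formed
    parts = [f'all:"{clean}"']
    if category:
        parts.append(f"cat:{category}")
    if start and end:
        # Timestamps are YYYYMMDDHHMM; cover both days in full.
        parts.append(f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]")
    return " AND ".join(parts)
```

The resulting string can then be passed as the `query` argument of `arxiv.Search` in app/agents/fetch.py, alongside the existing `max_results` setting.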
- Never commit secrets (secrets.toml is ignored).
- Use separate IBM keys for dev and deployment.
- Avoid logging sensitive inputs or outputs.
- Keep dependency versions updated to patch vulnerabilities.
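One way to keep secrets out of logs is a logging filter that masks known secret values before a record is emitted. This is a sketch, not something the project is known to ship: the class name `RedactSecrets` is hypothetical, and it only masks exact matches of the values it is given.

```python
import logging


class RedactSecrets(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = [s for s in secrets if s]  # ignore empty values

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-style args already applied
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg, record.args = message, None
        return True  # keep the record, now redacted
```

Attaching the filter to a handler (`handler.addFilter(RedactSecrets([apikey]))`) rather than to a single logger means records from every logger that reaches that handler pass through it.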
- arXiv Python library
- IBM watsonx.ai Granite
- Streamlit
- python-docx and ReportLab
- Name: Rohit Gomes
- GitHub: github.com/RohitXJ
If this project helps your workflow, consider starring the repo and sharing feedback or feature requests via issues.
