The RAG Summarizer is a Retrieval-Augmented Generation (RAG) model designed to generate concise summaries from input documents. By leveraging both retrieval mechanisms and generative capabilities, it produces accurate and contextually relevant summaries.
- Document Retrieval: Fetches relevant information from a predefined dataset to enhance summary generation.
- Text Summarization: Generates concise summaries using advanced natural language processing techniques.
- Streamlit Interface: Provides an interactive web application for users to input text and receive summaries.
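The retrieval and summarization features above can be illustrated with a minimal sketch. It is not the project's implementation: the function names (`tokenize`, `retrieve`, `build_prompt`) and the word-overlap scoring are assumptions chosen for brevity, and a real RAG pipeline would more likely rank chunks by embedding similarity and then send the prompt to an LLM.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, chunks, top_k=2):
    """Return up to top_k chunks that share the most word occurrences with the query."""
    query_terms = set(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, index, chunk))
    # Highest score first; ties keep the original chunk order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Chunks with no overlap at all are dropped.
    return [chunk for score, _, chunk in scored[:top_k] if score > 0]


def build_prompt(query, chunks, top_k=2):
    """Assemble the text a generative model would receive (no LLM call is made here)."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nTask: {query}"
```

The retrieved chunks are placed ahead of the task so that the generative step is grounded in the most relevant passages instead of the whole dataset.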
- Clone the Repository:

      git clone https://github.com/arico97/RAG_summarizer.git
      cd RAG_summarizer

- Set Up a Virtual Environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Dependencies:

      pip install -r requirements.txt

- Configure LLM Credentials: Create a .env file in the project root directory and add your Large Language Model (LLM) credentials:

      LLM_API_KEY=your_api_key_here

- Run the Application:

      ./run.sh

- Access the Application: Open your web browser and navigate to http://localhost:8501 to interact with the RAG Summarizer.
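For the credentials step above, the following is a rough sketch of how an application could read LLM_API_KEY at startup. It uses only the standard library; the helper names `load_env_file` and `get_llm_api_key` are hypothetical, and the project itself may load the file differently (for example, with a dedicated dotenv package).

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=value lines from a .env file into a dict, skipping blanks and comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Remove optional surrounding quotes from the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_llm_api_key(path=".env"):
    """Prefer an exported LLM_API_KEY; otherwise fall back to the .env file."""
    key = os.environ.get("LLM_API_KEY")
    if not key and os.path.exists(path):
        key = load_env_file(path).get("LLM_API_KEY")
    if not key:
        raise RuntimeError("LLM_API_KEY is not set; add it to .env or export it.")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error raised later by the LLM client.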
- src/: Contains the core modules for document retrieval and summarization.
- streamlit_app.py: Hosts the Streamlit web application interface.
- requirements.txt: Lists all necessary Python dependencies.
- Dockerfile: Defines the Docker image setup for containerized deployment.
- rag-summarizer-deployment.yaml: Kubernetes deployment configuration for the application.
- Build the Docker Image:

      docker build -t rag-summarizer:latest .

- Run the Docker Container:

      docker run -p 8501:8501 rag-summarizer:latest

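After starting the container, it can be useful to confirm that the published port is accepting connections before opening the browser. The sketch below is one way to do that; `wait_for_port` is a hypothetical helper, not part of the project, and it only checks that a TCP connection succeeds, not that the application has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8501, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Closing the connection immediately is enough to prove the port is open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` with the defaults polls localhost:8501, the port mapped by the docker run command above, for up to 30 seconds.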
- Apply the Deployment Configuration:

      kubectl apply -f rag-summarizer-deployment.yaml

- Expose the Service: Ensure the service is accessible by configuring the appropriate Kubernetes service resources.

Execute the test suite to verify the functionality of the summarizer:

    python test.py -o [option]

The -o argument specifies the type of test to run. The available options are:
- youtube ([option]=4): Tests summarization of YouTube video transcripts.
- pdf ([option]=2): Tests summarization of local PDF documents.
- pdf on web ([option]=1): Tests summarization of PDF documents hosted at a URL.
- web ([option]=3): Tests summarization of web page content.
- epub ([option]=5): Tests summarization of EPUB documents.
Before running a test, set the doc variable to either the document path or the URL of the web page, document, or YouTube video.
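The option numbers listed above could be parsed and mapped to source types along the following lines. This is a sketch under assumptions, not the contents of test.py: the `TEST_OPTIONS` table and the `parse_option` function are hypothetical names, and only the number-to-type mapping comes from the list above.

```python
import argparse

# Option numbers as listed above, mapped to the kind of source each test summarizes.
TEST_OPTIONS = {
    1: "pdf on web",
    2: "pdf",
    3: "web",
    4: "youtube",
    5: "epub",
}


def parse_option(argv=None):
    """Parse the -o argument and return (option number, source type)."""
    help_text = ", ".join(f"{num}={name}" for num, name in sorted(TEST_OPTIONS.items()))
    parser = argparse.ArgumentParser(description="Select a summarization test.")
    parser.add_argument(
        "-o",
        type=int,
        choices=sorted(TEST_OPTIONS),
        required=True,
        help=help_text,
    )
    args = parser.parse_args(argv)
    return args.o, TEST_OPTIONS[args.o]
```

Restricting `-o` with `choices` makes argparse reject unknown numbers with a usage message, so a mistyped option fails before any document is fetched.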
Example usage:

    python test.py -o 1

Contributions are welcome! Please fork the repository and create a pull request with your enhancements.
This project is licensed under the MIT License. See the LICENSE file for details.
Special thanks to the open-source community and the developers of the libraries utilized in this project.