An end-to-end Retrieval-Augmented Generation (RAG) system for answering analytical questions over company annual reports.
- Frontend: Static React UI hosted on S3, delivered via CloudFront
- Backend: FastAPI + Gunicorn API behind Application Load Balancer (ALB)
- Retrieval: OpenSearch (BM25 + dense vectors)
- Reranking: Cross-encoder–based reranker (top-k refinement)
- Caching: ElastiCache Redis (LLM response & routing cache)
- Infra / Ops: Docker, AWS EC2, CloudFront, ALB
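The retrieval and reranking stages listed above combine a keyword (BM25) ranking with a dense-vector ranking before the reranker refines the top-k. The README does not say how the two rankings are merged, so the sketch below uses reciprocal rank fusion purely as an illustration; the function name, the `k=60` constant, and the chunk IDs are assumptions, and the project may instead rely on OpenSearch's own hybrid scoring.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in; the
    scores are summed across lists. This is an illustrative choice,
    not necessarily what the project does.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical result lists from the BM25 and dense-vector queries.
bm25_hits = ["chunk-3", "chunk-1", "chunk-7"]
dense_hits = ["chunk-1", "chunk-9", "chunk-3"]

# The fused candidates would then be passed to the cross-encoder reranker.
candidates = reciprocal_rank_fusion([bm25_hits, dense_hits], top_n=3)
print(candidates)
```

Chunks that appear in both lists rise to the top, which is the behavior a hybrid retriever is generally after.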
This project is designed to run as a containerized backend on AWS, with a static frontend served via CloudFront.
Prerequisites: AWS account, Docker, OpenSearch knowledge
- Lambda Function Deployment
  - Build the Lambda Docker image from src/aws_infra/lambda_ingestion/
  - Push the image to Amazon ECR
  - Create a Lambda function using the ECR image
  - Configure environment variables (OpenSearch host, index name, OpenAI API key)
- S3 Trigger Configuration
  - Create an S3 bucket for raw documents (PDFs)
  - Add an S3 event trigger: s3:ObjectCreated:* → Lambda function
  - Upload PDFs with naming format: CompanyName-Year-DocType.pdf (e.g., Microsoft-2024-Annual-Report.pdf)
- Verification
  - Check CloudWatch Logs for ingestion progress
  - Verify chunks are indexed in OpenSearch via OpenSearch Dashboards
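The ingestion steps above depend on the CompanyName-Year-DocType.pdf naming format, so the Lambda has to recover those fields from the S3 object key. A minimal sketch of that parsing follows; the function names and the returned field names are hypothetical and are not taken from ingestion.py. It assumes the company name itself contains no hyphen, while the document type may (as in Annual-Report).

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus


def parse_document_key(key):
    """Split 'CompanyName-Year-DocType.pdf' into metadata fields."""
    # Keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    name = PurePosixPath(unquote_plus(key)).name
    if not name.lower().endswith(".pdf"):
        raise ValueError(f"not a PDF: {key!r}")
    stem = name[: -len(".pdf")]
    # Split at most twice so a hyphenated DocType stays in one piece.
    parts = stem.split("-", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        raise ValueError(f"expected CompanyName-Year-DocType.pdf, got {name!r}")
    company, year, doc_type = parts
    return {"company": company, "year": int(year), "doc_type": doc_type}


def documents_from_event(event):
    """Collect bucket and parsed metadata for each record in an S3 event."""
    docs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        docs.append({"bucket": bucket, **parse_document_key(key)})
    return docs


# Trimmed-down event in the shape S3 sends for s3:ObjectCreated:*.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "raw-documents"},  # hypothetical bucket name
                "object": {"key": "Microsoft-2024-Annual-Report.pdf"},
            }
        }
    ]
}
print(documents_from_event(sample_event))
```

Rejecting malformed names early makes a bad upload show up as a clear error in CloudWatch Logs instead of as a chunk indexed with missing metadata.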
- Provision & Configure Infrastructure
  - Create an OpenSearch domain
  - Apply the index mapping from src/aws_infra/opensearch/index_mapping.json via OpenSearch Dashboards
  - Set up ElastiCache (Redis) for caching
  - Create an ECR repository for the backend image
  - Create a .env file with required credentials (see CONFIGURATION.md)
- Container Build & Deployment
  - Build the backend Docker image locally
  - Push the image to Amazon ECR
  - Launch an EC2 instance, install Docker, and pull the image from ECR
  - Run the container exposing the FastAPI service on port 8000
- Public Access
  - Create an Application Load Balancer (ALB) pointing to the EC2 instance
  - Verify the backend is reachable via the ALB DNS endpoint

- Upload index.html to an S3 bucket
- Create a CloudFront distribution with S3 as the origin
- Update the frontend to call the backend CloudFront API endpoint
- Access the application via the frontend CloudFront URL
Browser → CloudFront (Frontend) → S3 (index.html)
↓
Browser → CloudFront (Backend API) → ALB → EC2 (FastAPI + Gunicorn)
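The EC2 node in the flow above runs FastAPI under Gunicorn on port 8000. Gunicorn is natively a WSGI server, so a FastAPI (ASGI) app is usually served through an ASGI worker class. The configuration below is a sketch of one common setup, assuming Uvicorn is installed in the image; the file name gunicorn.conf.py, the worker count, and the timeout are illustrative and are not taken from the project's Dockerfile.

```python
# gunicorn.conf.py (hypothetical file; not part of the documented layout)
import multiprocessing

# Listen on all interfaces so the ALB can reach the container on port 8000.
bind = "0.0.0.0:8000"

# FastAPI is an ASGI app; this worker class ships with Uvicorn.
# Assumption: the project may configure its workers differently.
worker_class = "uvicorn.workers.UvicornWorker"

# (2 x cores) + 1 is the starting point suggested in the Gunicorn docs.
workers = multiprocessing.cpu_count() * 2 + 1

# Illustrative value only: LLM-backed requests can run long, so the
# worker timeout is raised above Gunicorn's 30-second default.
timeout = 120
```

It would be started with something like `gunicorn -c gunicorn.conf.py backend_server.app:app`, where the module path is a guess based on backend_server/app.py in the project layout.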
.
├── assets/                         # Architecture diagrams & output screenshots
│
├── backend_server/                 # FastAPI backend
│   └── app.py
│
├── front_end/                      # Static frontend
│   └── index.html
│
├── notebooks/                      # Development & experimentation notebooks
│
├── src/                            # Core RAG logic
│   ├── aws_infra/                  # AWS-related components
│   │   ├── lambda_ingestion/       # S3 → OpenSearch ingestion
│   │   │   └── ingestion.py
│   │   └── opensearch/             # OpenSearch client & helpers
│   │       ├── client.py
│   │       └── index_mapping.json
│   ├── caching.py
│   ├── generation.py
│   ├── guardrails.py
│   ├── memory.py
│   ├── prompts.py
│   ├── rerankers.py
│   ├── retrieval.py
│   └── router.py
│
├── .dockerignore
├── .gitignore
├── Dockerfile                      # Backend container image
├── README.md                       # Project documentation
├── main.py                         # RAG pipeline entry point (router → retrieval → rerank → generate)
├── requirements.txt                # Python dependencies
└── .env                            # Environment variables (not committed)
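The layout above describes main.py as the entry point that chains router → retrieval → rerank → generate. The sketch below shows one way such a chain can be composed; the `Pipeline` class, its field names, and the stub stages are hypothetical stand-ins, not the actual interfaces of router.py, retrieval.py, rerankers.py, or generation.py.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Chains the four stages; each stage is injected as a plain callable."""

    route: Callable[[str], str]                    # question -> intent label
    retrieve: Callable[[str, str], List[str]]      # question, intent -> chunks
    rerank: Callable[[str, List[str]], List[str]]  # question, chunks -> ordered chunks
    generate: Callable[[str, List[str]], str]      # question, context -> answer
    top_k: int = 3

    def answer(self, question: str) -> str:
        intent = self.route(question)
        candidates = self.retrieve(question, intent)
        # Keep only the best-ranked chunks as context for generation.
        context = self.rerank(question, candidates)[: self.top_k]
        return self.generate(question, context)


# Stub stages so the flow can be exercised without OpenSearch or an LLM.
pipeline = Pipeline(
    route=lambda q: "financial",
    retrieve=lambda q, intent: ["c1", "c2", "c3", "c4"],
    rerank=lambda q, chunks: sorted(chunks, reverse=True),
    generate=lambda q, ctx: f"answer from {len(ctx)} chunks: {', '.join(ctx)}",
    top_k=2,
)
result = pipeline.answer("What was revenue in 2024?")
print(result)
```

Passing the stages in as callables keeps each one swappable and testable in isolation, which suits a layout where every stage lives in its own module.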
This project leverages the following tools and platforms:
- FastAPI for building the backend API
- Gunicorn for production-grade serving of the FastAPI (ASGI) app
- OpenSearch for hybrid retrieval (BM25 + vector search)
- Redis (ElastiCache) for caching LLM responses and routing decisions
- Docker for containerization
- AWS EC2 for backend hosting
- Application Load Balancer (ALB) for traffic management
- Amazon S3 for static frontend hosting
- Amazon CloudFront for CDN and secure content delivery
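Of the tools above, Redis caches LLM responses and routing decisions. The sketch below illustrates the general read-through pattern with an in-memory stand-in so that it runs without a Redis server; the key format, the TTL, and all names are assumptions rather than the contents of caching.py.

```python
import hashlib
import time


def cache_key(question, prefix="rag:answer"):
    """Derive a stable key so trivially different phrasings share an entry."""
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


class TTLCache:
    """In-memory stand-in for a Redis get / set-with-expiry pair."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def cached_answer(question, cache, generate):
    """Return a cached answer if present; otherwise generate and store it."""
    key = cache_key(question)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = generate(question)
    cache.set(key, answer)
    return answer


calls = []


def fake_llm(question):
    # Hypothetical stand-in for the real generation call.
    calls.append(question)
    return f"stub answer #{len(calls)}"


cache = TTLCache(ttl_seconds=60)
first = cached_answer("What was Microsoft's 2024 revenue?", cache, fake_llm)
second = cached_answer("  what was microsoft's 2024 REVENUE? ", cache, fake_llm)
print(first, second, len(calls))
```

The second call differs only in case and whitespace, so it maps to the same key and is served from the cache without invoking the model again.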
This project is licensed under the MIT License.
See the LICENSE file for details.
Contributions are welcome. Please open an issue or submit a pull request with a clear description of your changes.





