Bangla RAG QA is an AI-powered Retrieval-Augmented Generation (RAG) system built with FastAPI to preserve and restore real Bangla history.
It answers questions posed in Bangla using only trusted historical sources, so both the questions and the answers stay entirely in Bangla.
Its aim is to create a reliable digital archive of Bangla history that is searchable in natural Bangla, with no distortions and only facts taken from the original documents.
- Backend: FastAPI
- OCR: bangla_pdf_ocr
- Embeddings: HuggingFace SBERT (Bangla)
- Vector Store: ChromaDB
- LLM: Groq / Gemini / OpenAI via LangChain
- Text Processing: Unicode normalization & Bangla-specific cleanup
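The stack lists Unicode normalization and Bangla-specific cleanup but does not spell out the rules. Below is a minimal sketch of what such a step might look like, using only the standard library. `normalize_bangla` is a hypothetical name, and the pipe-to-danda replacement is an illustrative OCR repair, not necessarily one this project applies. NFC matters because the same Bangla vowel sign can be stored in more than one way (for example, ো as U+09CB or as ে + া), and mixed encodings can defeat exact matching and hurt retrieval.

```python
import re
import unicodedata

# Zero-width space and BOM often leak in from PDF/OCR output. ZWNJ (U+200C)
# and ZWJ (U+200D) are deliberately kept: in Bangla they can change how
# conjuncts render, so deleting them may alter the text.
_INVISIBLE = re.compile("[\u200b\ufeff]")
_WHITESPACE = re.compile(r"\s+")
DANDA = "\u0964"  # Bangla full stop (।)


def normalize_bangla(text: str) -> str:
    """Hypothetical cleanup step; the project's actual rules may differ."""
    text = _INVISIBLE.sub("", text)
    # NFC gives one canonical encoding, e.g. U+09C7 + U+09BE becomes U+09CB (ো).
    text = unicodedata.normalize("NFC", text)
    # Illustrative OCR fix (assumption): an ASCII pipe misread in place of the danda.
    text = text.replace("|", DANDA)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return _WHITESPACE.sub(" ", text).strip()
```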
This project was inspired by the open-source initiative Real History of Bangladesh, which aims to preserve and present the authentic history of Bangladesh without distortion.
The historical dataset was collected from that project. Building on this foundation, I developed the entire AI-powered RAG pipeline, including:
- Data preprocessing & Bangla text normalization
- Chunking & semantic search indexing
- Context-aware question answering system
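The pipeline mentions chunking before semantic indexing but not the chunk size or splitting rule. A minimal sketch, assuming sentences end with the danda (।), `?` or `!`, and that sentences are packed into chunks by character count with a one-sentence overlap. `chunk_bangla`, `max_chars`, and `overlap` are hypothetical names, and the defaults are illustrative rather than the project's settings.

```python
import re
from typing import List

# Split after a sentence ender (danda, ? or !), swallowing any following whitespace.
_SENTENCE_END = re.compile(r"(?<=[।?!])\s*")


def chunk_bangla(text: str, max_chars: int = 500, overlap: int = 1) -> List[str]:
    """Pack sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of a flushed chunk are repeated at the start
    of the next one. max_chars is a target, not a hard limit: a single long
    sentence is never cut, and carried-over sentences may push a chunk past it.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentences only if something else was in the chunk,
            # otherwise the next chunk would just duplicate this one.
            current = current[-overlap:] if 0 < overlap < len(current) else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Overlap keeps a sentence near a chunk boundary available to both neighbouring chunks, which helps when an answer spans two chunks. In a LangChain-based stack, LangChain's own text splitters are a common alternative to a hand-written splitter like this.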
While the source content is credited to Real History of Bangladesh, all preprocessing, pipeline design, and AI integration were implemented by me.
The goal is to make historical knowledge easily searchable in Bangla, ensuring both the questions and answers are in the native language, and that the answers are drawn only from verified historical documents.
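In a RAG system, keeping answers tied to the documents usually means retrieving the most relevant chunks and instructing the LLM to answer only from them. The sketch below shows one way to assemble such a prompt. The wording, the `build_grounded_prompt` helper, and the Bangla fallback message are assumptions for illustration, not this project's actual prompt, and an instruction like this reduces, but cannot fully guarantee against, answers drawn from outside the context.

```python
from typing import Sequence

# Hypothetical fallback reply, meaning:
# "The answer to this question was not found in the provided documents."
NOT_FOUND_BN = "প্রদত্ত নথিতে এই প্রশ্নের উত্তর পাওয়া যায়নি।"


def build_grounded_prompt(question: str, contexts: Sequence[str]) -> str:
    """Build a prompt that restricts the LLM to the retrieved chunks."""
    if not contexts:
        raise ValueError("at least one retrieved context chunk is required")
    # Number the chunks so the model (and a reviewer) can refer back to them.
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(contexts, start=1))
    return (
        "Answer ONLY from the context below.\n"
        "Write the answer in Bangla (Bengali script).\n"
        f"If the context does not contain the answer, reply exactly: {NOT_FOUND_BN}\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```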
This is an open-source effort and is open to improvement by anyone.
Anyone interested in enhancing the dataset, improving accuracy, adding more models, or expanding features is welcome to contribute.
Together, we can build a reliable, accessible, and truthful digital archive of Bangladesh’s history for future generations.
This project requires API keys to access LLMs. If you are using Google Gemini, follow these steps:
- Get a Gemini API key:
  - Sign in to Google AI Studio.
  - Go to API Keys and create a new one.
- Create a .env file in your project root:
  GOOGLE_API_KEY=your_gemini_api_key_here
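The application needs `GOOGLE_API_KEY` at runtime. Python projects often load `.env` with `load_dotenv()` from the python-dotenv package; the sketch below is a dependency-free stand-in for illustration only, with hypothetical helper names. It reads `KEY=VALUE` lines from `.env` without overriding variables that are already set in the environment.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal KEY=VALUE parser; variables already in the environment win."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key.strip(), value)


def get_google_api_key() -> str:
    """Return GOOGLE_API_KEY, loading .env from the working directory first."""
    load_env_file()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
    return key
```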
- Clone the repository:
  git clone https://github.com/your-username/your-repo.git
  cd your-repo
- Install dependencies:
  pip install -r requirements.txt
- Run the application:
  uvicorn app:app --reload
- Open your browser and go to:
  http://127.0.0.1:8000
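FastAPI generates an OpenAPI schema for every app and by default serves it at `/openapi.json`, with interactive documentation at `/docs`. Since the endpoints are not listed here, a small standard-library script like the hypothetical one below can list them while the server is running. It assumes the default host and port used above and that the app has not disabled or moved the schema URL.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen

# Keys under each OpenAPI path item that name HTTP operations
# (other keys, such as "parameters", are skipped).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi(base_url: str = "http://127.0.0.1:8000") -> Dict[str, Any]:
    """Download the schema FastAPI serves at /openapi.json by default."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


def list_endpoints(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda e: (e[1], e[0]))
```

With the server running, `print(list_endpoints(fetch_openapi()))` shows the `(method, path)` pairs the app exposes.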
