Shoaib-33/Bangla-History-Chatbot

Bangla-History-Chatbot

Sample Output

📜 Description

Bangla RAG QA is a Retrieval-Augmented Generation (RAG) system, built with FastAPI, for preserving and restoring authentic Bangla history.
It answers questions using only trusted historical sources, and both the questions and the answers stay entirely in Bangla.

🎯 Main Aim

To create a reliable digital archive of Bangla history and make it searchable in natural-language Bangla: no distortions, only facts from the original documents.

🛠 Tech Stack

  • Backend: FastAPI
  • OCR: bangla_pdf_ocr
  • Embeddings: HuggingFace SBERT (Bangla)
  • Vector Store: ChromaDB
  • LLM: Groq / Gemini / OpenAI via LangChain
  • Text Processing: Unicode normalization & Bangla-specific cleanup
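The text-processing item above can be illustrated with a small sketch. It uses only the Python standard library and is not taken from this repository; the function name `normalize_bangla` and the exact cleanup rules are assumptions chosen for illustration. It applies NFC normalization (so that split vowel signs such as U+09C7 + U+09BE become the single sign U+09CB), drops zero-width spaces and byte-order marks that often survive OCR, and collapses stray whitespace. ZWJ and ZWNJ are deliberately kept, since they can be meaningful in Bangla conjuncts.

```python
import re
import unicodedata

# Invisible characters to delete (assumption: only ZWSP and BOM are safe to drop;
# ZWJ/ZWNJ are left alone because Bangla conjunct rendering can depend on them).
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\ufeff"))


def normalize_bangla(text: str) -> str:
    """Hypothetical cleanup helper: NFC-normalize and tidy whitespace."""
    # Canonical composition, e.g. U+09C7 + U+09BE -> U+09CB.
    text = unicodedata.normalize("NFC", text)
    # Remove zero-width space and byte-order mark.
    text = text.translate(_INVISIBLE)
    # Collapse runs of spaces, tabs, and no-break spaces into one space.
    text = re.sub(r"[ \t\u00a0]+", " ", text)
    # Trim each line and drop lines that end up empty.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(normalize_bangla("\ufeffবাংলা \u200b  ইতিহাস \n\n"))
```

Running the same normalization on both the indexed documents and the incoming questions keeps visually identical strings identical at the code-point level, which helps retrieval.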

💡 Motivation

📜 Inspiration & Data Source

This project was inspired by the open-source initiative Real History of Bangladesh, which aims to preserve and present the authentic history of Bangladesh without distortion.

The historical dataset was collected from that project. Building on this foundation, I developed the entire RAG pipeline, including:

  • Data preprocessing & Bangla text normalization
  • Chunking & semantic search indexing
  • Context-aware question answering system

While the source content is credited to Real History of Bangladesh, all preprocessing, pipeline design, and AI integration were implemented by me.
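The chunking step listed above can be sketched as a sliding window with overlap. This is a minimal illustration, not the repository's implementation: the name `chunk_text` and the default sizes are assumptions, and the window is character-based for simplicity. A production version would more likely split on sentence boundaries (for example the danda, `।`) so that chunks do not cut through a word or conjunct.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk consists purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.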

The goal is to make historical knowledge easily searchable in Bangla, ensuring both the questions and answers are in the native language, and that the answers are drawn only from verified historical documents.

This is an open-source effort, and improvements are welcome.
Anyone interested in enhancing the dataset, improving accuracy, adding more models, or expanding features is invited to contribute.
Together, we can build a reliable, accessible, and truthful digital archive of Bangladesh’s history for future generations.


🔑 Environment Setup

This project requires API keys to access LLMs. If you are using Google Gemini, follow these steps:

  1. Get a Gemini API Key
  2. Create a .env file in your project root:
    GOOGLE_API_KEY=your_gemini_api_key_here
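To show how the key from the .env file might be picked up at startup, here is a minimal sketch using only the standard library. It is an assumption about usage, not code from this repository: many projects call `load_dotenv()` from the python-dotenv package instead, and the helper names `load_env_file` and `get_gemini_key` are hypothetical. The sketch prefers a real environment variable, falls back to the .env file, and fails early with a clear message if neither provides the key.

```python
import os


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines from a .env file."""
    values: dict[str, str] = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                # Skip blanks, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip().strip('"').strip("'")
    except FileNotFoundError:
        pass  # A missing file simply yields no values.
    return values


def get_gemini_key(path: str = ".env") -> str:
    """Return GOOGLE_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or the environment")
    return key
```

Keep the .env file out of version control (for example by listing it in .gitignore) so the key is not published with the repository.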

⚙️ Project Setup & Run

  1. Clone the repository:
    git clone https://github.com/your-username/your-repo.git
    cd your-repo
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the application:
    uvicorn app:app --reload
  4. Open your browser and go to:
    http://127.0.0.1:8000
