A production-ready, RAG-powered chatbot specialized in biomedical engineering topics. Built with LangChain, FAISS, and Groq LLM for intelligent document-based question answering.
- 🌟 Key Features
- 🏗️ System Architecture
- 📁 Project Structure
- 🚀 Quick Start
- 🐳 Docker Deployment
- 🧪 Testing
- 📝 Configuration
- 🎯 Usage Examples
- 🔧 Troubleshooting
- 📊 Performance Metrics
- 🛠️ Technology Stack
## 🌟 Key Features

- 🧠 RAG Architecture: Retrieval-Augmented Generation for accurate, context-based responses
- 📚 Document Processing: Automatic PDF processing with OCR fallback (see the sketch after this list)
- 💬 Dual Mode Operation: Technical Q&A + casual conversation support
- 🚀 Production Ready: Proper error handling, logging, and monitoring
- 🐳 Docker Support: One-command deployment with Docker Compose
- ✅ Tested: Unit tests with pytest
- 📊 Session Management: Track queries and conversation history
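The OCR fallback works roughly as sketched below. This is an illustrative sketch only, assuming the PyPDF + Unstructured loaders listed under Technology Stack; the actual implementation lives in `core/document_processor.py`, and the helper name here is hypothetical.

```python
# Illustrative sketch of PDF loading with an OCR fallback (hypothetical helper;
# the real logic lives in core/document_processor.py and may differ).
from langchain_community.document_loaders import PyPDFLoader, UnstructuredPDFLoader


def load_pdf_with_ocr_fallback(path: str):
    """Try fast embedded-text extraction first; OCR only scanned PDFs."""
    docs = PyPDFLoader(path).load()
    if any(doc.page_content.strip() for doc in docs):
        return docs  # the PDF has an extractable text layer
    # No extractable text found: re-parse with Unstructured, forcing OCR
    return UnstructuredPDFLoader(path, strategy="ocr_only").load()
```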
## 🏗️ System Architecture

```
┌─────────────┐
│ User UI │ (Streamlit)
└──────┬──────┘
│
┌──────▼──────────────┐
│ QA Chain Manager │
│ (LangChain + Groq) │
└──────┬──────────────┘
│
┌──────▼──────────────┐
│ Vector Store (FAISS)│
│ + Embeddings │
└──────┬──────────────┘
│
┌──────▼──────────────┐
│ PDF Documents │
│ (Technical Docs) │
└─────────────────────┘
```
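In code, that request path wires together roughly as follows. This is a minimal sketch, assuming the `langchain-community`, `langchain-huggingface`, and `langchain-groq` integration packages and the MiniLM sentence-transformer; the real wiring lives in `core/vectorstore.py` and `core/qa_chain.py`, with the model name, `k`, and index path taken from `config.py`.

```python
# Sketch of the request path: UI -> QA chain -> FAISS retriever -> Groq LLM.
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings

# Load the prebuilt FAISS index with the same embeddings used to create it
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.load_local(
    "vectorstore/db_faiss", embeddings, allow_dangerous_deserialization=True
)

# Groq-hosted LLM (model and temperature mirror config.py)
llm = ChatGroq(model="deepseek-r1-distill-llama-70b", temperature=0.0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 6}),
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "How does a pacemaker regulate heart rhythm?"})
print(result["result"])
```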
## 📁 Project Structure

```
MedTech AI/
├── app.py # Main Streamlit application
├── main.py # Alternative application entry point
├── config.py # Centralized configuration
├── requirements.txt # Python dependencies
├── Dockerfile # Docker container setup
├── docker-compose.yml # Docker Compose configuration
├── uv.lock # Dependency lock file
├── .python-version # Python version specification
├── view.png # Project view image
│
├── core/ # Core functionality modules
│ ├── __init__.py
│ ├── vectorstore.py # FAISS vector store management
│ ├── qa_chain.py # QA chain operations
│ └── document_processor.py # PDF processing & chunking
│
├── utils/ # Utility modules
│ ├── __init__.py
│ └── logger.py # Logging configuration
│
├── scripts/ # Utility scripts
│ ├── __init__.py
│ └── build_vectorstore.py # Index documents script
│
├── tests/ # Unit tests
│ ├── __init__.py
│ └── test_qa.py # Test suite
│
├── data/ # PDF documents (add your files here)
├── vectorstore/ # FAISS index storage
│ └── db_faiss/
├── logs/ # Application logs
│
├── .env.example # Environment variables template
├── .gitignore # Git ignore rules
├── .gitattributes # Git attributes
├── LICENSE # MIT License
└── README.md # This file
```
## 🚀 Quick Start

Prerequisites:

- Python 3.13+
- Groq API Key (Get one free)
- PDF documents for your knowledge base
- Clone the repository

```bash
git clone https://github.com/beastNico/MedTech-AI.git
cd MedTech-AI
```

- Create virtual environment

```bash
python -m venv venv
venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up environment variables

```bash
copy .env.example .env
# Edit .env and add your GROQ_API_KEY
```

- Add your PDF documents

```bash
# Place PDF files in the data/ directory
copy your_documents.pdf data\
```

- Build the vector store (a sketch of what this script does follows these steps)

```bash
python scripts\build_vectorstore.py
```

- Run the application

```bash
streamlit run app.py
```

Visit http://localhost:8501 in your browser!
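For reference, `scripts/build_vectorstore.py` boils down to: load the PDFs in `data/`, split them into overlapping chunks, embed the chunks, and persist a FAISS index under `vectorstore/db_faiss`. A minimal sketch, assuming the loader and embedding model shown (the real script may differ; chunk sizes mirror the defaults in `config.py`):

```python
# Rough equivalent of scripts/build_vectorstore.py: index data/*.pdf into FAISS.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFDirectoryLoader("data/").load()

# Chunk settings mirror CHUNK_SIZE / CHUNK_OVERLAP in config.py
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("vectorstore/db_faiss")
```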
## 🐳 Docker Deployment

```bash
# Build and run
docker-compose up -d
# View logs
docker-compose logs -f
# Stop
docker-compose down
```

```bash
# Build image
docker build -t medtech-ai .
# Run container
docker run -p 8501:8501 ^
-v %cd%/data:/app/data ^
-v %cd%/vectorstore:/app/vectorstore ^
-e GROQ_API_KEY=your_key_here ^
medtech-ai
```

## 🧪 Testing

Run tests with pytest:

```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=core --cov=utils
# Run specific test file
pytest tests\test_qa.py -v
```
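Tests use plain pytest functions. The sketch below shows the general shape; it is illustrative rather than the project's actual suite, and exercises only the chunking settings so it runs offline without a Groq key.

```python
# Example pytest-style test: verify chunking respects the configured sizes.
from langchain_text_splitters import RecursiveCharacterTextSplitter


def test_chunks_respect_configured_size():
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text("Biomedical engineering " * 200)
    assert chunks, "splitter should produce at least one chunk"
    assert all(len(chunk) <= 500 for chunk in chunks)
```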
## 📝 Configuration

Edit `config.py` to customize:

```python
# Model settings
LLM_MODEL = "deepseek-r1-distill-llama-70b" # Change model
LLM_TEMPERATURE = 0.0 # Adjust creativity
# Retrieval settings
RETRIEVAL_K = 6 # Number of docs to retrieve
CHUNK_SIZE = 500 # Text chunk size
CHUNK_OVERLAP = 50 # Chunk overlap
```
## 🎯 Usage Examples

**User:** Hi? Who Are You?

**Bot:** Hello. I'm a biomedical engineering assistant. It's nice to meet you. How can I assist you today?
**User:** Compare MRI and CT imaging techniques - what are their advantages and limitations?

**Bot:** Based on the provided context, here is a comparison of MRI and CT imaging techniques:

Advantages of MRI:

- Non-invasive procedure
- Does not require injecting a contrast medium
- Greater sensitivity for detecting disk problems and spinal cord involvement...
## 🔧 Troubleshooting

- Verify your `.env` file exists
- Check that `GROQ_API_KEY` is set correctly
- Ensure no quotes around the key value
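To confirm the key is actually being picked up, a quick check like the one below can help (assuming the app reads `.env` with python-dotenv, as the `.env.example` template suggests):

```python
# Quick sanity check that GROQ_API_KEY is readable from .env (assumes python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
key = os.getenv("GROQ_API_KEY")
print("GROQ_API_KEY loaded:", bool(key))
if key and key.startswith(('"', "'")):
    print("Warning: remove the quotes around the key value in .env")
```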
## 📊 Performance Metrics

- Response Time: ~2-4 seconds per query
- Accuracy: Depends on document quality
- Uptime: 99%+ with proper deployment
- Concurrent Users: Supports multiple users (subject to Streamlit's concurrency limits)
## 🛠️ Technology Stack

| Component | Technology |
|---|---|
| Frontend | Streamlit |
| LLM | Groq (DeepSeek R1 Distill) |
| Embeddings | HuggingFace (MiniLM) |
| Vector Store | FAISS |
| Framework | LangChain |
| Document Processing | PyPDF + Unstructured |
| Logging | Python logging |
| Testing | Pytest |
| Containerization | Docker |
