medicos is a Retrieval-Augmented Generation (RAG) system designed to provide accurate and well-referenced responses to medical queries. The system combines vector search capabilities, external search fallback, and large language model integration to deliver reliable medical information.
- Vector Database Storage: Uses ChromaDB to store and retrieve medical documents based on semantic similarity
- Document Processing Pipeline: Processes medical documents and websites, chunking content into manageable segments
- Google Search Fallback: Falls back to live Google search when local database lacks relevant information
- Response Validation: Validates database results for relevance before providing answers
- Response Caching: Caches responses to similar questions to improve performance
- Medical-Specific Embeddings: Uses domain-specific embedding models for better retrieval accuracy
- Source Attribution: Provides transparent source attribution for all information
- RESTful API: Exposes functionality through a FastAPI-based REST API
- Streamlit Frontend: User-friendly web interface for interacting with the system
The system is composed of four main components:
- Document Processor (`document_processing.py`): Handles document loading, chunking, and indexing
- RAG System (`rag_system.py`): Core component that manages retrieval, validation, generation, and caching
- API Server (`main.py`): FastAPI-based interface for external applications
- Streamlit Frontend (`medicos-frontend.py`): Web interface for easy interaction with the system
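As an illustration of the chunking step handled by the Document Processor, a simple fixed-size chunker with overlap might look like the sketch below. The function name and parameters are hypothetical; the actual implementation in `document_processing.py` may differ.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks of `chunk_size` words,
    overlapping by `overlap` words so that context spanning a
    chunk boundary is not lost between adjacent chunks."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping chunks are a common RAG design choice: they slightly increase storage, but keep sentences near chunk boundaries retrievable from either side.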
- Python 3.8+
- Required API keys:
- Hugging Face API key (for embedding models)
- Google API key (for search fallback)
- Google Custom Search Engine ID
- Groq API key (for LLM integration)
- Clone the repository:

```bash
git clone https://github.com/karandomguy/medicos.git
cd medicos
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Create a `.env` file with the following environment variables:

```
HUGGINGFACE_API_KEY=your_huggingface_key
GOOGLE_API_KEY=your_google_api_key
GOOGLE_CSE_ID=your_custom_search_engine_id
GROQ_API_KEY=your_groq_api_key
```
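These variables are typically loaded at startup, for example with the `python-dotenv` package. As an illustrative, stdlib-only sketch of what that loading does:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ.
    Blank lines, comments, and lines without '=' are skipped;
    variables already set in the environment take precedence."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # fall back to variables already set in the shell environment
```

In practice, prefer `python-dotenv`'s `load_dotenv()`, which handles quoting and edge cases this sketch ignores.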
The document processor can ingest documents from JSON files or URLs:
```python
from document_processing import DocumentProcessor

# Initialize processor
processor = DocumentProcessor()

# Process medical websites
medical_urls = [
    "https://www.mayoclinic.org/diseases-conditions/diabetes/symptoms-causes/syc-20371444",
    "https://www.cdc.gov/diabetes/basics/diabetes.html"
]
processor.run_pipeline(urls=medical_urls)
```

To query the system directly from Python:

```python
from rag_system import MedicalRAG

# Initialize RAG system
rag = MedicalRAG()

# Process a medical query
response = rag.process_medical_query(
    "What are the early symptoms of diabetes?",
    use_google_fallback=True,
    top_k=5
)
print(response["answer"])
```

To start the API server:

```bash
uvicorn main:app --reload
```

The API will be available at http://localhost:8000.
To launch the web interface:

```bash
streamlit run medicos-frontend.py
```

The web interface will be available at http://localhost:8501.
- `POST /api/query`: Process a medical question
  - Request body: `{"question": "your question", "use_google_fallback": true, "top_k": 5}`
  - Returns: Answer with sources
- `GET /api/health`: Check system health
  - Returns: System status
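For example, the query endpoint can be called from Python using only the standard library. This is a sketch that assumes the server is running locally on the default port; the request-building helper is hypothetical, not part of the project.

```python
import json
from urllib import request

def build_query_request(question, use_google_fallback=True, top_k=5,
                        base_url="http://localhost:8000"):
    """Construct a POST request matching the /api/query body schema."""
    payload = json.dumps({
        "question": question,
        "use_google_fallback": use_google_fallback,
        "top_k": top_k,
    }).encode("utf-8")
    return request.Request(
        f"{base_url}/api/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What are the early symptoms of diabetes?")
# Sending the request requires the API server to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["answer"])
```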
The system includes a user-friendly Streamlit web interface that provides:
- Simple question input interface
- Configurable settings for Google search fallback and source retrieval
- Comprehensive display of answers with source attribution
- Search history tracking and previous result viewing
- Tabbed interface for exploring multiple sources
- Clear indication of whether answers come from the knowledge base or external search
- Query Processing:
  1. Check the cache for similar questions
  2. Search the vector database (ChromaDB) for relevant documents
  3. Validate the relevance of retrieved documents
  4. Fall back to Google Search if needed
  5. Generate an answer using the LLM (Groq API)
  6. Cache the response for future similar queries
- Document Processing:
  1. Load documents from file or web
  2. Chunk documents into manageable segments
  3. Generate embeddings for each chunk
  4. Store chunks and embeddings in the vector database
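The query-processing flow above can be sketched as a single orchestration function. The helper callables here (`retrieve`, `validate`, `google_search`, `generate`) are hypothetical stand-ins for the real methods in `rag_system.py`:

```python
def process_medical_query(question, cache, retrieve, validate,
                          google_search, generate, top_k=5,
                          use_google_fallback=True):
    """Orchestrate: cache check -> vector search -> relevance
    validation -> optional Google fallback -> LLM generation ->
    cache update. Returns a dict with the answer and its sources."""
    cached = cache.get(question)
    if cached is not None:
        return cached                        # reuse a previous answer

    docs = retrieve(question, top_k=top_k)   # ChromaDB similarity search
    source = "knowledge_base"
    if not validate(question, docs):         # results not relevant enough
        if not use_google_fallback:
            return {"answer": None, "source": None, "sources": []}
        docs = google_search(question)       # live search fallback
        source = "google_search"

    result = {"answer": generate(question, docs),  # Groq LLM call
              "source": source, "sources": docs}
    cache[question] = result                 # cache for similar queries
    return result
```

Passing the steps in as callables keeps the control flow testable with stubs, independent of ChromaDB, Google, or Groq being available.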
