YSayaovong/Pfizer-AI-PDF-Reader

💊 PharmRAG: Pharmaceutical SDS Document Intelligence

A full Retrieval-Augmented Generation (RAG) chatbot for pharmaceutical SDS and PDF documents. Upload drug labels, clinical trial reports, safety data sheets, and pharmacology papers, then ask questions and get sourced, cited answers powered by an open-source LLM.


Pipeline

End-to-End Pipeline

PDF Upload → OCR / Extract → Chunk + Metadata → Embed → FAISS Index
                                                              ↓
Answer + Sources ← Zephyr-7B ← Prompt ← Retrieve ← User Query
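
The "Retrieve → Prompt" step of the diagram can be illustrated with a small sketch. This is not the repository's code: the `RetrievedChunk` fields, the `build_prompt` name, and the instruction wording are all assumptions made for illustration. It only shows one plausible way to number retrieved chunks so the model can cite them as [Source N].

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical container for one retrieved chunk (field names are assumptions)."""
    text: str
    source: str   # file name of the originating PDF
    page: int
    score: float  # similarity score from the vector search


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each chunk as [Source N] and ask the model to cite by that number."""
    blocks = []
    for n, chunk in enumerate(chunks, start=1):
        blocks.append(f"[Source {n}] ({chunk.source}, p. {chunk.page})\n{chunk.text}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sources below. "
        "Cite each claim as [Source N]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


chunks = [
    RetrievedChunk("Store below 25 °C.", "label.pdf", 3, 0.82),
    RetrievedChunk("Avoid contact with eyes.", "sds.pdf", 1, 0.74),
]
prompt = build_prompt("How should the product be stored?", chunks)
```

The resulting string would then be passed to the LLM. Any chat template the model expects is outside the scope of this sketch.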

UI

Gradio Chatbot Interface


Test Results

Test Results


Production Readiness

Production Architecture


Tech Stack

| Component | Technology |
|---|---|
| PDF Extraction | PyMuPDF (fitz) |
| OCR | Tesseract via pytesseract |
| Chunking | Word-level sliding window (400 words per chunk, 80-word overlap) |
| Metadata | Heuristic doc-type classifier (5 categories) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | FAISS IndexFlatIP (inner product; equals cosine similarity when embeddings are L2-normalized) |
| LLM | HuggingFaceH4/zephyr-7b-beta |
| UI | Gradio Blocks |
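
To make the chunking row concrete, here is a minimal sketch of a word-level sliding window with the stated sizes (400 words per chunk, 80 words of overlap). The function name `chunk_words` and the choice to stop once a window reaches the end of the text are assumptions, not taken from the repository.

```python
def chunk_words(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into word windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # each new window starts 320 words after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already covers the tail of the document
    return chunks


sample = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(sample)
```

With a 1,000-word input the windows start at words 0, 320, and 640, so the last 80 words of each chunk reappear at the start of the next one.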

Document Types Supported

  • SDS — Safety Data Sheets
  • Clinical Trial — Randomized trials, efficacy studies
  • Drug Label — Prescribing information, contraindications
  • Pharmacology — PK/PD, bioavailability, metabolism
  • General — Any other pharmaceutical PDF
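
A heuristic classifier over these five categories could look roughly like the sketch below. The keyword lists and the `classify_doc_type` name are invented for illustration; the repository's actual heuristics are not shown in this README and may differ.

```python
# Hypothetical keyword lists; the real classifier's rules are not documented here.
KEYWORDS = {
    "SDS": ["safety data sheet", "hazard identification", "first aid measures"],
    "Clinical Trial": ["randomized", "placebo", "double-blind", "primary endpoint"],
    "Drug Label": ["prescribing information", "contraindications", "indications and usage"],
    "Pharmacology": ["pharmacokinetic", "bioavailability", "half-life", "metabolism"],
}


def classify_doc_type(text: str) -> str:
    """Return the category with the most keyword hits, or "General" if none match."""
    lowered = text.lower()
    scores = {
        label: sum(lowered.count(keyword) for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best_label, best_score = max(scores.items(), key=lambda item: item[1])
    return best_label if best_score > 0 else "General"


label = classify_doc_type("A randomized, double-blind, placebo-controlled study.")
```

Ties go to whichever category appears first in `KEYWORDS`, which is a simplification. A production classifier would likely weight section headings more heavily than body text.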

Features

  • 🔬 Auto OCR fallback — detects scanned pages and switches to Tesseract automatically
  • 🏷️ Smart metadata tagging — classifies every document on ingest
  • 🎯 Confidence scores — per-chunk cosine similarity with visual bars
  • 🔀 Document-type filter — scope retrieval to a single category
  • 📎 Multi-file upload — ingest multiple PDFs in one session
  • 💬 Sourced answers — every claim cited with [Source N] notation
  • Live status bar — shows retrieval latency and chunk count
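
The confidence scores and document-type filter can be sketched without any external libraries. The example below is a pure-Python stand-in, not the FAISS-backed implementation: the `search` and `confidence_bar` names, the index entry layout, and the bar characters are all assumptions. It relies on the fact that the inner product of L2-normalized vectors equals their cosine similarity.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query_vec, index, doc_type=None, top_k=3):
    """Score every entry (optionally restricted to one doc type) and keep the best."""
    q = normalize(query_vec)
    hits = []
    for entry in index:
        if doc_type is not None and entry["doc_type"] != doc_type:
            continue  # document-type filter: skip chunks from other categories
        score = sum(a * b for a, b in zip(q, normalize(entry["vector"])))
        hits.append((score, entry))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]


def confidence_bar(score: float, width: int = 10) -> str:
    """Render a similarity in [0, 1] as a fixed-width text bar."""
    clamped = max(0.0, min(1.0, score))
    filled = round(clamped * width)
    return "█" * filled + "░" * (width - filled)


# Toy 2-D vectors stand in for real sentence embeddings.
index = [
    {"text": "Wear gloves.", "doc_type": "SDS", "vector": [1.0, 0.0]},
    {"text": "Half-life is 6 h.", "doc_type": "Pharmacology", "vector": [0.0, 1.0]},
    {"text": "Flush eyes with water.", "doc_type": "SDS", "vector": [1.0, 1.0]},
]
sds_hits = search([1.0, 0.0], index, doc_type="SDS")
```

Filtering before scoring keeps the sketch simple. With a real FAISS index, a common alternative is to over-fetch results and filter them afterwards by metadata.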

License

MIT
