Deep Research: Replace frontend-only module with CDR (Clinical Deep Research) integration #28

## Summary

The current Deep Research module is a **UI-only implementation** that searches PubMed/Semantic Scholar directly from the Reflex frontend process, concatenates article abstracts into a massive prompt, and sends it to the generic `/api/v1/query` endpoint for LLM synthesis. There is **no dedicated backend**, no screening, no bias assessment, no evidence grading, and no traceability. It frequently times out and silently swallows errors.

**Proposal:** Replace the entire module with an integration of [CDR (Clinical Deep Research)](https://github.com/DeepRatAI/Clinical-Deep-Research_CDR) — a production-grade systematic review engine with PICO parsing, automated screening, RoB2 bias assessment, GRADE certainty, claim verification, and PRISMA-compliant report generation.

## Current Architecture (Broken)

```
UI (state.py) → perform_scientific_research() → PubMed/SemanticScholar/DuckDuckGo
     ↓
build_scientific_research_prompt() → giant prompt (~6000 tokens)
     ↓
POST /api/v1/query → generic LLM → markdown blob
```

### Current Problems

1. **Frequent timeouts** — The prompt is enormous (all article abstracts + a massive template), and HuggingFace inference has limited throughput. `max_tokens: 6000` with a huge input regularly exceeds the 240s timeout.
2. **Silent API failures** — PubMed and Semantic Scholar errors are caught with bare `except` and `print()`. If both APIs fail, the LLM receives an almost empty prompt with no warning.
3. **`duckduckgo_search` may not be installed** — Conditional import (`HAS_DDGS`). If missing, web fallback silently returns empty.
4. **`research_phase` never transitions to `"complete"`** — State machine is broken; stays at `"researching"` forever.
5. **No screening** — Every retrieved article is included regardless of relevance.
6. **No evidence grading** — Heuristic title-based classification only.
7. **No traceability** — Claims in the output cannot be traced to source articles.
8. **No server-side orchestration** — Everything runs inside the Reflex UI's state process rather than in a dedicated backend service.
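
Problems 2 and 4 come down to error handling and state transitions. The following is a minimal sketch of one way to surface source failures and always reach a terminal phase. The names `collect_articles`, `run_research`, and `AllSourcesFailed` are hypothetical, and the `"failed"` phase is an assumption that does not exist in the current code.

```python
from typing import Callable

Fetcher = Callable[[str], list]  # hypothetical: query -> list of article dicts


class AllSourcesFailed(RuntimeError):
    """Raised when every literature source errored, so no prompt should be built."""


def collect_articles(query: str, sources: dict) -> tuple:
    """Query each source, recording failures instead of printing and moving on."""
    articles = []
    errors = {}
    for name, fetch in sources.items():
        try:
            articles.extend(fetch(query))
        except Exception as exc:  # narrow this to the client's real exception types
            errors[name] = f"{type(exc).__name__}: {exc}"
    if sources and len(errors) == len(sources):
        raise AllSourcesFailed(errors)
    return articles, errors


def run_research(query: str, sources: dict) -> dict:
    """Return a state dict whose research_phase always ends in a terminal value."""
    state = {"research_phase": "researching", "articles": [], "warnings": {}}
    try:
        state["articles"], state["warnings"] = collect_articles(query, sources)
        state["research_phase"] = "complete"
    except AllSourcesFailed as exc:
        state["warnings"] = exc.args[0]
        state["research_phase"] = "failed"  # assumed extra phase, not in the current code
    return state
```

With this shape, a partial failure still completes but carries a per-source warning the UI can display, and a total failure stops before any LLM call is made.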

## Proposed Architecture (CDR Integration)

```
UI (new research panel) → POST /api/v1/research/run {pico, config}
     ↓
CDR Pipeline (LangGraph DAG):
  PICO Parsing → Search Plan → PubMed/SemanticScholar Retrieval
       ↓
  Automated Screening (inclusion/exclusion)
       ↓
  Data Extraction + RoB2 Bias Assessment
       ↓
  Evidence Synthesis + GRADE Certainty + Claim Verification
       ↓
  PRISMA-Compliant Report Generation
     ↓
UI polls GET /api/v1/research/{run_id}/status → progress updates
UI fetches GET /api/v1/research/{run_id}/report → final report
```
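
The polling step in the diagram could look roughly like the sketch below. It is transport-agnostic: `get_status` is a hypothetical callable that would wrap `GET /api/v1/research/{run_id}/status`, and the `"state"` key and its `"complete"` / `"failed"` values are assumptions about the status payload, not CDR's documented schema.

```python
import time
from typing import Callable

TERMINAL = {"complete", "failed"}  # assumed status values; CDR's real ones may differ


def poll_until_done(
    get_status: Callable[[], dict],
    on_progress: Callable[[dict], None] = lambda status: None,
    interval_s: float = 2.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() until it reports a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        on_progress(status)  # e.g. update the progress dashboard
        if status.get("state") in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("research run did not finish in time")
        sleep(interval_s)
```

Injecting `sleep` keeps the loop easy to test, and the explicit deadline avoids the open-ended wait that the current module's broken state machine produces.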

### CDR vs Current Deep Research

| Capability | Current | CDR |
|---|---|---|
| Question parsing | Basic sub-question generation | PICO framework extraction |
| Literature search | PubMed + Semantic Scholar (basic) | PubMed + Semantic Scholar + full text |
| Screening | None | LLM-based inclusion/exclusion with reasons |
| Bias assessment | None | RoB2 + ROBINS-I |
| Evidence grading | Heuristic (title-based) | GRADE certainty framework |
| Synthesis | Single LLM prompt | Multi-step: synthesis → critique → verification |
| Output | Markdown blob | PRISMA-compliant report (MD/HTML/JSON) |
| Orchestration | Sequential async in UI state | LangGraph DAG with state management |
| Traceability | None | Full claim → source traceability |
| Quality metrics | None | RAGAs-inspired evaluation |
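
To make the traceability row concrete, here is a small sketch of a check the integration layer could run on a finished report. The claim shape (`id`, `source_ids`) and the function name `untraceable_claims` are assumptions for illustration only; CDR's actual output schema may differ.

```python
def untraceable_claims(claims: list, included_ids: set) -> list:
    """Return IDs of claims that cite nothing, or cite an article that was not included."""
    flagged = []
    for claim in claims:
        sources = set(claim.get("source_ids", []))
        # A claim is traceable only if it cites at least one source
        # and every cited source survived screening.
        if not sources or not sources <= included_ids:
            flagged.append(claim["id"])
    return flagged
```

A non-empty result would indicate a claim the UI should not render as verified.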

## Technical Tasks

### A. Backend — CDR Integration
- [ ] Add CDR as Python dependency or configure as microservice
- [ ] Create `/api/v1/research/run` endpoint — accepts PICO input, starts async CDR pipeline, returns `run_id`
- [ ] Create `/api/v1/research/{run_id}/status` — returns pipeline progress (current node, articles screened, etc.)
- [ ] Create `/api/v1/research/{run_id}/report` — returns final PRISMA report
- [ ] Add CDR dependencies to `requirements.txt` (`langgraph`, `langchain-core`, provider packages)
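
The three endpoints share one piece of state: a registry of runs keyed by `run_id`. Below is a framework-agnostic sketch of that registry, assuming an in-memory store and hypothetical names (`RunStore`, `ResearchRun`, and the `queued` / `running` / `complete` / `failed` states). The route handlers in `run_api.py` would be thin wrappers around these methods.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResearchRun:
    pico: dict
    config: dict
    state: str = "queued"  # assumed values: queued, running, complete, failed
    current_node: str = ""
    counts: dict = field(default_factory=dict)
    report: Optional[str] = None


class RunStore:
    """In-memory registry; a real deployment would likely need persistent storage."""

    def __init__(self) -> None:
        self._runs = {}
        self._lock = threading.Lock()

    def start(self, pico: dict, config: Optional[dict] = None) -> str:
        """Backs POST /api/v1/research/run: register a run and return its run_id."""
        run_id = uuid.uuid4().hex
        with self._lock:
            self._runs[run_id] = ResearchRun(pico=pico, config=config or {})
        return run_id

    def update(self, run_id: str, **changes) -> None:
        """Called by the pipeline worker as it moves between nodes."""
        with self._lock:
            run = self._runs[run_id]
            for key, value in changes.items():
                setattr(run, key, value)

    def status(self, run_id: str) -> dict:
        """Backs GET /api/v1/research/{run_id}/status."""
        with self._lock:
            run = self._runs[run_id]
            return {
                "run_id": run_id,
                "state": run.state,
                "current_node": run.current_node,
                "counts": dict(run.counts),
            }

    def report(self, run_id: str) -> str:
        """Backs GET /api/v1/research/{run_id}/report; refuses until the run is complete."""
        with self._lock:
            run = self._runs[run_id]
            if run.state != "complete" or run.report is None:
                raise LookupError(f"report for {run_id} is not ready")
            return run.report
```

An unknown `run_id` raises `KeyError` and an unfinished run raises `LookupError`, which a handler could map to 404 and 409 responses respectively.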

### B. UI — Complete Rebuild
The current UI (`scientific_search.py`, `web_search.py`, research portions of `state.py` and `app.py`) is designed around a "search + prompt LLM" flow and **cannot be adapted for CDR**. Build new:
- [ ] **PICO input form** — Population, Intervention, Comparator, Outcome fields (replaces current free-text query)
- [ ] **Pipeline progress dashboard** — Shows current stage, articles found/screened/included, with real-time updates via polling
- [ ] **PRISMA flow diagram** — Visual representation of the screening funnel
- [ ] **Evidence table** — Articles with GRADE certainty levels, RoB2 badges
- [ ] **Claim list with traceability** — Each claim linked to source articles
- [ ] **Report viewer** — Render the final PRISMA-compliant report (markdown)
- [ ] **Export** — PDF/HTML/JSON export of the complete report
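
As a starting point for the PICO form, the sketch below validates the four fields and builds the `{pico, config}` body for `POST /api/v1/research/run`. `PicoQuestion` and `build_run_request` are hypothetical names, and treating the comparator as optional is an assumption; the field names CDR actually expects should be checked against its repository.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PicoQuestion:
    population: str
    intervention: str
    comparator: str = ""  # assumed optional; some questions have no explicit comparator
    outcome: str = ""

    def missing_fields(self) -> list:
        """Names of required fields that are blank after trimming whitespace."""
        required = {
            "population": self.population,
            "intervention": self.intervention,
            "outcome": self.outcome,
        }
        return [name for name, value in required.items() if not value.strip()]


def build_run_request(pico: PicoQuestion, config: Optional[dict] = None) -> dict:
    """Validate the form input and return the JSON-serializable request body."""
    missing = pico.missing_fields()
    if missing:
        raise ValueError("missing PICO fields: " + ", ".join(missing))
    cleaned = {name: value.strip() for name, value in asdict(pico).items()}
    return {"pico": cleaned, "config": config or {}}
```

Validating before submission lets the form flag incomplete questions immediately instead of starting a pipeline run that cannot succeed.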

### C. Cleanup
- [ ] Deprecate `ui/medex_ui/scientific_search.py` (1092 lines)
- [ ] Deprecate `ui/medex_ui/web_search.py` (343 lines)
- [ ] Remove `perform_scientific_research` / `build_scientific_research_prompt` imports from `state.py`
- [ ] Remove old `research_*` state variables that no longer apply

## Files Affected

| File | Change |
|---|---|
| `run_api.py` | Add 3 new research endpoints |
| `ui/medex_ui/state.py` | Replace research state + handlers |
| `ui/medex_ui/app.py` | New research panel components |
| `ui/medex_ui/scientific_search.py` | **Deprecate** |
| `ui/medex_ui/web_search.py` | **Deprecate** |
| `requirements.txt` | Add CDR dependencies |
| New: `src/medex/research/` | CDR integration service layer |

## Dependencies

- [CDR repository](https://github.com/DeepRatAI/Clinical-Deep-Research_CDR) must be packaged or HTTP-accessible
- LLM provider configuration (Gemini/Groq/HuggingFace) shared with CDR
- PubMed E-utilities and Semantic Scholar APIs (free; API keys are optional but raise rate limits)

