Developers are drowning in code they didn't write.
flowchart TB
A[👨💻 Developer]
B[📚 Documentation]
C[🔍 New Codebases]
D[🎧 Passive Learning]
E[📖 Code Reviews]
A --> B
A --> C
A --> D
A --> E
B --> B1[⏳ Reading is time-consuming]
C --> C1[🕒 Understanding takes hours/days]
D --> D1[🚫 Can't learn while commuting]
E --> E1[😴 Reviews are dry & boring]
B1 --> F[❌ Productivity Loss]
C1 --> F
D1 --> F
E1 --> F
Atlas Forensic Vault transforms any GitHub repository into an engaging AI-generated podcast narrated in a Film Noir detective style.
"In this city, every line of code tells a story. Most of them are tragedies. Some are comedies. But in my precinct? They're all mysteries until I say otherwise."
— Det. Mongo D. Bane
flowchart LR
A[🧾 1. Submit<br/>GitHub Repository] --> B[🕵️ 2. Investigate<br/>AI Code Analysis]
B --> C[🎙️ 3. Listen<br/>Generated Podcast]
C --> D[🧠 4. Learn<br/>Deep Understanding]
flowchart TB
subgraph Client["🖥️ Client Layer"]
UI["Next.js 16 Frontend"]
Player["Reel-to-Reel Audio Player"]
Transcript["Live Transcript Viewer"]
end
subgraph API["⚡ API Layer"]
Analyze["/api/analyze"]
Generate["/api/generate-audio"]
Stream["/api/podcasts/audio"]
end
subgraph Services["🧠 AI Services"]
GitHub["📦 GitHub API"]
Gemini["🧠 Gemini 2.5 Flash"]
Eleven["🎙️ ElevenLabs TTS"]
end
subgraph Database["🍃 MongoDB Atlas"]
Podcasts[("Podcasts Collection")]
Vector["🔍 Vector Search"]
Changes["📡 Change Streams"]
end
UI --> Analyze
Analyze --> GitHub
GitHub --> Gemini
Gemini --> Podcasts
Podcasts --> Generate
Generate --> Eleven
Eleven --> Podcasts
Podcasts --> Stream
Stream --> Player
Changes -.->|Real-time Updates| UI
Podcasts --> Transcript
sequenceDiagram
autonumber
participant User as 👤 User
participant App as 🖥️ Next.js
participant GitHub as 📦 GitHub
participant Gemini as 🧠 Gemini
participant DB as 🍃 MongoDB
participant Voice as 🎙️ ElevenLabs
User->>App: Submit Repository URL
App->>DB: Create Podcast Record
App->>GitHub: Fetch Repo Metadata
GitHub-->>App: Files & Structure
App->>DB: Update Progress 25%
App->>Gemini: Generate Script
Gemini-->>App: Noir-Style Script
App->>DB: Store Script 75%
App->>Voice: Generate Audio
Voice-->>App: Audio Buffers
App->>DB: Store Audio 100%
DB-->>User: Real-time Progress
User->>App: Play Podcast
App-->>User: Stream Audio + Transcript
The Atlas Forensic Vault has evolved from a simple audio player into a high-density Repository Intelligence Unit. Before the detective delivers his audio verdict, the system performs a multi-layered autopsy of the "Code Crime Scene."
Instead of a generic landing page, investigators are now redirected to the /case dashboard—a thematic, 3-column intelligence hub that interrogates the repository in real-time.
flowchart TB
subgraph Dashboard["🕵️ CSI DASHBOARD: FORENSIC ANALYTICS"]
direction TB
subgraph Evidence["📁 1. THE EVIDENCE MAP"]
EM1["Atlas-Powered Indexing"]
EM2["Metadata retrieval < 30ms"]
EM3["Color-coded Churn Analysis"]
end
subgraph Interrogation["💬 2. THE INTERROGATION ROOM"]
IR1["Gemini 2.5 Flash Intelligence"]
IR2["Atlas Vector Search Context"]
IR3["RAG-based Logic Interrogation"]
end
subgraph MoneyTrail["📡 3. THE MONEY TRAIL"]
MT1["Ingress: Suspect Entry Points"]
MT2["Laundering: Logic Distribution"]
MT3["Fallout: Real-time Risk Audit"]
end
end
Repo[(GitHub Repository)] --> Dashboard
Dashboard --> Verdict{FORENSIC VERDICT}
Verdict --> Podcast[🎙️ Generate Audio Dossier]
We use MongoDB Aggregation Pipelines to surface the technical "Rap Sheet" of every repository.
The system tracks which "accomplices" have touched the most volatile parts of the code.
pie title Code Crime Contribution (By Churn)
"Lead Developer (Mastermind)" : 45
"Senior Dev (Accomplice)" : 25
"Middleware Specialist" : 15
"Bug Fixer (Cleaner)" : 15
The autopsy calculates the "Motive" (Architecture Summary) vs. the "Execution" (Implementation Quality).
pie title "Repository Forensic Health Distribution"
"Security (Clean Record)" : 95
"Performance (Velocity)" : 80
"Scalability (Expansion)" : 70
"Readability (Legibility)" : 60
"Logic (Complexity)" : 85
Proof of our 16.7x speedup via the Atlas Forensic Vault caching layer.
xychart-beta
title "Latency: Cold Request vs. Vault Retrieval"
x-axis ["GitHub Fetch", "Gemini Analysis", "ElevenLabs TTS", "Atlas Vault Read"]
y-axis "Latency (ms)" 0 --> 5000
line [4500, 3200, 2500, 30]
graph LR
A[🎬 Select Style] --> B[🕵️ True Crime]
A --> C[⚽ Sports]
A --> D[🦁 Documentary]
B --> E["Detective Voice<br/>Film Noir"]
C --> F["Dual Commentators<br/>Play-by-Play"]
D --> G["Attenborough Style<br/>Nature Doc"]
E --> H[🎙️ Generate Podcast]
F --> H
G --> H
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 16, React 19, TypeScript | Server-side rendering, type safety |
| Styling | Tailwind CSS 4, Framer Motion | Responsive design, animations |
| 3D Graphics | Three.js, React Three Fiber | Immersive UI elements |
| Database | MongoDB Atlas | Document storage, vector search |
| AI - Script | Google Gemini 2.5 Flash | Codebase analysis, script generation |
| AI - Voice | ElevenLabs Multilingual v2 | High-quality text-to-speech |
| Security | Cloudflare Workers | DDoS protection, edge caching |
| Hosting | Vercel (Pro) | Serverless deployment, 300s timeout |
| API | GitHub REST API | Repository data fetching |
| Feature | Description |
|---|---|
| 🎙️ AI Code Narration | GitHub repo → AI podcast |
| 🎛️ Retro Audio Player | Reel animations · Vintage UI |
| 📜 Live Transcript | Real-time sync · Click-to-seek |
| 🔍 MongoDB Atlas | Vector Search · Change Streams |
| 📄 Export Reports | Redacted · Classified |
Problem: Users waiting for entire podcast generation before playback.
Our Solution: Chunked streaming with MongoDB GridFS
Let
Traditional approach:
Our chunked approach:
Perceived speedup:
For a typical 3-minute podcast generation with polling every 2 seconds:
Traditional Polling:
With Change Streams:
Bandwidth Reduction:
Network Traffic Saved:
Assuming average request size
For 1000 users per day:
Without caching, for
With MongoDB caching (cache hit rate
Where
Savings:
Real numbers from our testing:
- Gemini API: $0.10 per 1M tokens → ~$0.02 per analysis
- MongoDB read: $0.001 per analysis
- Cache hit rate: 87% after first week
Using cosine similarity between query vector
Performance Analysis:
Brute force comparison with
where
MongoDB Atlas Vector Search (using HNSW index):
Speedup for 10,000 repositories:
Result: Recommendations in <100ms even with thousands of repos in the database.
For an audio file of size
Traditional approach (load entire file):
GridFS streaming (load only current chunk):
Memory savings for 10MB file:
Concurrent user scalability:
With
Result: Support 40x more concurrent users with the same server resources.
Without edge caching:
With Cloudflare CDN:
Performance improvement:
Bandwidth Cost Optimization:
Monthly bandwidth without CDN (1,000 podcasts × 10MB × 100 plays):
With Cloudflare CDN (95% cache hit rate):
Cost savings: 95% reduction in origin bandwidth costs
Spin up Atlas Forensic Vault locally in minutes.
Ensure the following are installed and ready
- Node.js ≥ 18 (LTS recommended)
- MongoDB Atlas cluster (free tier works)
- API Keys
- Google Gemini
- ElevenLabs (Text-to-Speech)
- (Optional) GitHub token for higher API rate limits
Clone the repository and install dependencies
git clone https://github.com/SoumyaEXE/Atlas-Forensic-Vault.git
cd Atlas-Forensic-Vault
npm installCreate a local environment file
cp .env.example .env.localAdd the required keys:
MONGODB_URI=mongodb+srv://<username>:<password>@cluster.mongodb.net/atlas_forensic_vaultGEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_keyGITHUB_TOKEN=your_github_tokenStart the development server -
npm run devThe app will boot with hot reload enabled.
Open in your browser -
http://localhost:3000You're ready to investigate repositories. 🕵️
| Focus Area | What We Delivered |
|---|---|
| 🍃 MongoDB Atlas Excellence | Vector Search · Change Streams · Flexible Schema · GridFS |
| 💡 Product Innovation | Code-to-podcast experience with Film Noir narrative |
| 🧠 AI-First Architecture | Gemini for deep analysis · ElevenLabs for narration |
| 🔒 Security & Performance | Cloudflare DDoS protection · Edge caching · IP filtering |
| 🚀 Production Readiness | Fully deployed, live, and scalable on Vercel |
| 🛠️ Developer Impact | Faster onboarding and deeper code understanding |
<--
| 👨💻 Soumya | 👨💻 Subarna | 👨💻 Saikat | 👨💻 Sourish |
|---|---|---|---|
| Full Stack Developer | Android Developer | DevOps Engineer | Competitive Programmer |
-->
