Skip to content

Your Open Source Repo Investigator. Helps you analyze repositories you’re interested in contributing to by surfacing structure, intent, and contribution-ready insights.

Notifications You must be signed in to change notification settings

Dronzer2code/Atlas-Forensic-Vault

Repository files navigation

Atlas Forensic Vault Banner

Atlas Forensic Vault Typing Animation

"Every Repository Has a Story. We Make It Talk."

Next.js MongoDB Atlas ElevenLabs
Gemini Vercel

🎯 The Problem :

Developers are drowning in code they didn't write.

flowchart TB
    A[👨‍💻 Developer]
    B[📚 Documentation]
    C[🔍 New Codebases]
    D[🎧 Passive Learning]
    E[📖 Code Reviews]

    A --> B
    A --> C
    A --> D
    A --> E

    B --> B1[⏳ Reading is time-consuming]
    C --> C1[🕒 Understanding takes hours/days]
    D --> D1[🚫 Can't learn while commuting]
    E --> E1[😴 Reviews are dry & boring]

    B1 --> F[❌ Productivity Loss]
    C1 --> F
    D1 --> F
    E1 --> F
Loading

💡 Our Solution :

Atlas Forensic Vault transforms any GitHub repository into an engaging AI-generated podcast narrated in a Film Noir detective style.

"In this city, every line of code tells a story. Most of them are tragedies. Some are comedies. But in my precinct? They're all mysteries until I say otherwise."

Det. Mongo D. Bane

🎬 How It Works

flowchart LR
    A[🧾 1. Submit<br/>GitHub Repository] --> B[🕵️ 2. Investigate<br/>AI Code Analysis]
    B --> C[🎙️ 3. Listen<br/>Generated Podcast]
    C --> D[🧠 4. Learn<br/>Deep Understanding]
Loading

🏗️ System Architecture :

High-Level Overview

flowchart TB
    subgraph Client["🖥️ Client Layer"]
        UI["Next.js 16 Frontend"]
        Player["Reel-to-Reel Audio Player"]
        Transcript["Live Transcript Viewer"]
    end

    subgraph API["⚡ API Layer"]
        Analyze["/api/analyze"]
        Generate["/api/generate-audio"]
        Stream["/api/podcasts/audio"]
    end

    subgraph Services["🧠 AI Services"]
        GitHub["📦 GitHub API"]
        Gemini["🧠 Gemini 2.5 Flash"]
        Eleven["🎙️ ElevenLabs TTS"]
    end

    subgraph Database["🍃 MongoDB Atlas"]
        Podcasts[("Podcasts Collection")]
        Vector["🔍 Vector Search"]
        Changes["📡 Change Streams"]
    end

    UI --> Analyze
    Analyze --> GitHub
    GitHub --> Gemini
    Gemini --> Podcasts
    Podcasts --> Generate
    Generate --> Eleven
    Eleven --> Podcasts
    Podcasts --> Stream
    Stream --> Player
    Changes -.->|Real-time Updates| UI
    Podcasts --> Transcript
Loading

🔄 Data Flow Sequence

sequenceDiagram
    autonumber
    participant User as 👤 User
    participant App as 🖥️ Next.js
    participant GitHub as 📦 GitHub
    participant Gemini as 🧠 Gemini
    participant DB as 🍃 MongoDB
    participant Voice as 🎙️ ElevenLabs

    User->>App: Submit Repository URL
    App->>DB: Create Podcast Record
    App->>GitHub: Fetch Repo Metadata
    GitHub-->>App: Files & Structure
    App->>DB: Update Progress 25%
    App->>Gemini: Generate Script
    Gemini-->>App: Noir-Style Script
    App->>DB: Store Script 75%
    App->>Voice: Generate Audio
    Voice-->>App: Audio Buffers
    App->>DB: Store Audio 100%
    DB-->>User: Real-time Progress
    User->>App: Play Podcast
    App-->>User: Stream Audio + Transcript
Loading

🔍 Forensic Case Analysis (CSI Dashboard)

The Atlas Forensic Vault has evolved from a simple audio player into a high-density Repository Intelligence Unit. Before the detective delivers his audio verdict, the system performs a multi-layered autopsy of the "Code Crime Scene."

🏛️ The Interrogation Workflow

Instead of a generic landing page, investigators are now redirected to the /case dashboard—a thematic, 3-column intelligence hub that interrogates the repository in real-time.

flowchart TB
    subgraph Dashboard["🕵️ CSI DASHBOARD: FORENSIC ANALYTICS"]
        direction TB
        
        subgraph Evidence["📁 1. THE EVIDENCE MAP"]
            EM1["Atlas-Powered Indexing"]
            EM2["Metadata retrieval < 30ms"]
            EM3["Color-coded Churn Analysis"]
        end

        subgraph Interrogation["💬 2. THE INTERROGATION ROOM"]
            IR1["Gemini 2.5 Flash Intelligence"]
            IR2["Atlas Vector Search Context"]
            IR3["RAG-based Logic Interrogation"]
        end

        subgraph MoneyTrail["📡 3. THE MONEY TRAIL"]
            MT1["Ingress: Suspect Entry Points"]
            MT2["Laundering: Logic Distribution"]
            MT3["Fallout: Real-time Risk Audit"]
        end
    end

    Repo[(GitHub Repository)] --> Dashboard
    Dashboard --> Verdict{FORENSIC VERDICT}
    Verdict --> Podcast[🎙️ Generate Audio Dossier]
Loading

📊 Repository Forensic Analytics

We use MongoDB Aggregation Pipelines to surface the technical "Rap Sheet" of every repository.

🕵️‍♂️ Contributor "Suspect" Analysis

The system tracks which "accomplices" have touched the most volatile parts of the code.

pie title Code Crime Contribution (By Churn)
    "Lead Developer (Mastermind)" : 45
    "Senior Dev (Accomplice)" : 25
    "Middleware Specialist" : 15
    "Bug Fixer (Cleaner)" : 15
Loading

🦹🏻 Technical Debt "Crime Rate"

The autopsy calculates the "Motive" (Architecture Summary) vs. the "Execution" (Implementation Quality).

pie title "Repository Forensic Health Distribution"
    "Security (Clean Record)" : 95
    "Performance (Velocity)" : 80
    "Scalability (Expansion)" : 70
    "Readability (Legibility)" : 60
    "Logic (Complexity)" : 85
Loading

🏎️ Vault Retrieval Velocity

Proof of our 16.7x speedup via the Atlas Forensic Vault caching layer.

xychart-beta
    title "Latency: Cold Request vs. Vault Retrieval"
    x-axis ["GitHub Fetch", "Gemini Analysis", "ElevenLabs TTS", "Atlas Vault Read"]
    y-axis "Latency (ms)" 0 --> 5000
    line [4500, 3200, 2500, 30]
Loading

🎭 Narrative Styles

graph LR
    A[🎬 Select Style] --> B[🕵️ True Crime]
    A --> C[⚽ Sports]
    A --> D[🦁 Documentary]
    
    B --> E["Detective Voice<br/>Film Noir"]
    C --> F["Dual Commentators<br/>Play-by-Play"]
    D --> G["Attenborough Style<br/>Nature Doc"]
    
    E --> H[🎙️ Generate Podcast]
    F --> H
    G --> H
Loading

🔧 Tech Stack :

Category Technologies
Frontend Next.js React TypeScript Tailwind CSS
Animation / UI Framer Motion shadcn/ui
Database MongoDB Atlas
AI Services Gemini ElevenLabs
Deployment Vercel Cloudflare

📦 Detailed Stack :

Layer Technology Purpose
Frontend Next.js 16, React 19, TypeScript Server-side rendering, type safety
Styling Tailwind CSS 4, Framer Motion Responsive design, animations
3D Graphics Three.js, React Three Fiber Immersive UI elements
Database MongoDB Atlas Document storage, vector search
AI - Script Google Gemini 2.5 Flash Codebase analysis, script generation
AI - Voice ElevenLabs Multilingual v2 High-quality text-to-speech
Security Cloudflare Workers DDoS protection, edge caching
Hosting Vercel (Pro) Serverless deployment, 300s timeout
API GitHub REST API Repository data fetching

✨ Key Features :

Feature Description
🎙️ AI Code Narration GitHub repo → AI podcast
🎛️ Retro Audio Player Reel animations · Vintage UI
📜 Live Transcript Real-time sync · Click-to-seek
🔍 MongoDB Atlas Vector Search · Change Streams
📄 Export Reports Redacted · Classified

📊 Performance Mathematics :

🚀 Audio Streaming Optimization

Problem: Users waiting for entire podcast generation before playback.

Our Solution: Chunked streaming with MongoDB GridFS

Let $T_{\text{total}}$ = total generation time and $T_{\text{first}}$ = time to first playback

Traditional approach:

$$T_{\text{wait}} = T_{\text{total}} = 180\text{s}$$

Our chunked approach:

$$T_{\text{wait}} = T_{\text{first}} = 30\text{s}$$

Perceived speedup:

$$\text{Speedup Factor} = \frac{T_{\text{total}}}{T_{\text{first}}} = \frac{180}{30} = 6\times \text{ faster}$$

📡 MongoDB Change Streams Efficiency

For a typical 3-minute podcast generation with polling every 2 seconds:

Traditional Polling:

$$N_{\text{requests}} = \frac{180\text{s}}{2\text{s/request}} = 90 \text{ requests}$$

With Change Streams:

$$N_{\text{updates}} = 4 \text{ (at 25\%, 50\%, 75\%, 100\%)}$$

Bandwidth Reduction:

$$\text{Efficiency Gain} = \left(1 - \frac{N_{\text{updates}}}{N_{\text{requests}}}\right) \times 100\% = \left(1 - \frac{4}{90}\right) \times 100\% = 95.6\%$$

Network Traffic Saved:

Assuming average request size $S_{\text{req}} = 2\text{KB}$:

$$\text{Traffic}_{\text{polling}} = 90 \times 2\text{KB} = 180\text{KB}$$ $$\text{Traffic}_{\text{streams}} = 4 \times 2\text{KB} = 8\text{KB}$$ $$\text{Savings} = 180 - 8 = 172\text{KB per generation}$$

For 1000 users per day:

$$\text{Daily Savings} = 172\text{KB} \times 1000 = 172\text{MB/day} = 5.2\text{GB/month}$$

💰 Cost Optimization with MongoDB Caching

Without caching, for $N$ identical requests:

$$\text{Cost}_{\text{uncached}} = N \times C_{\text{api}}$$

With MongoDB caching (cache hit rate $h = 0.85$):

$$\text{Cost}_{\text{cached}} = N \times [(1-h) \times C_{\text{api}} + h \times C_{\text{db}}]$$

Where $C_{\text{db}} \ll C_{\text{api}}$ (database reads are ~100x cheaper than API calls)

$$\text{Cost}_{\text{cached}} \approx N \times 0.15 \times C_{\text{api}}$$

Savings:

$$\text{Cost Reduction} = \frac{\text{Cost}_{\text{uncached}} - \text{Cost}_{\text{cached}}}{\text{Cost}_{\text{uncached}}} \times 100\% = 85\%$$

Real numbers from our testing:

  • Gemini API: $0.10 per 1M tokens → ~$0.02 per analysis
  • MongoDB read: $0.001 per analysis
  • Cache hit rate: 87% after first week
$$\text{Monthly Savings (10K analyses)} = 10000 \times 0.87 \times (\$0.02 - \$0.001) = \$165$$

🔍 Vector Search Performance

Using cosine similarity between query vector $\vec{q}$ and document vector $\vec{d}$:

$$\text{similarity}(\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \cdot |\vec{d}|} = \frac{\sum_{i=1}^{1536} q_i \times d_i}{\sqrt{\sum_{i=1}^{1536} q_i^2} \times \sqrt{\sum_{i=1}^{1536} d_i^2}}$$

Performance Analysis:

Brute force comparison with $N$ documents:

$$\text{Time Complexity}_{\text{brute}} = O(N \times d)$$

where $d = 1536$ dimensions

MongoDB Atlas Vector Search (using HNSW index):

$$\text{Time Complexity}_{\text{vector}} = O(\log N \times d)$$

Speedup for 10,000 repositories:

$$\text{Speedup} = \frac{O(10000 \times 1536)}{O(\log_2(10000) \times 1536)} \approx \frac{10000}{13.3} \approx 752\times$$

Result: Recommendations in <100ms even with thousands of repos in the database.

💾 GridFS Memory Efficiency

For an audio file of size $S$ bytes with chunk size $C = 255\text{KB}$:

Traditional approach (load entire file):

$$\text{Memory}_{\text{traditional}} = S$$

GridFS streaming (load only current chunk):

$$\text{Memory}_{\text{GridFS}} = C$$

Memory savings for 10MB file:

$$\text{Reduction} = \frac{S - C}{S} \times 100\% = \frac{10\text{MB} - 255\text{KB}}{10\text{MB}} \times 100\% = 97.5\%$$

Concurrent user scalability:

With $N$ concurrent users streaming audio:

$$\text{RAM}_{\text{traditional}} = N \times S = 100 \times 10\text{MB} = 1\text{GB}$$ $$\text{RAM}_{\text{GridFS}} = N \times C = 100 \times 255\text{KB} = 25\text{MB}$$

Result: Support 40x more concurrent users with the same server resources.

⚡ Cloudflare CDN Performance

Without edge caching:

$$\text{Latency}_{\text{origin}} = 200-500\text{ms (database query + transfer)}$$

With Cloudflare CDN:

$$\text{Latency}_{\text{edge}} = 20-50\text{ms (edge cache hit)}$$

Performance improvement:

$$\text{Speedup} = \frac{500\text{ms}}{30\text{ms}} \approx 16.7 \times$$

Bandwidth Cost Optimization:

Monthly bandwidth without CDN (1,000 podcasts × 10MB × 100 plays):

$$\text{Bandwidth}_{\text{origin}} = 1000 \times 10\text{MB} \times 100 = 1\text{TB}$$

With Cloudflare CDN (95% cache hit rate):

$$\text{Bandwidth}_{\text{origin}} = 1\text{TB} \times 0.05 = 50\text{GB}$$

Cost savings: 95% reduction in origin bandwidth costs

🚀 Getting Started :

Spin up Atlas Forensic Vault locally in minutes.

🧰 Requirements :

Ensure the following are installed and ready

  • Node.js ≥ 18 (LTS recommended)
  • MongoDB Atlas cluster (free tier works)
  • API Keys
    • Google Gemini
    • ElevenLabs (Text-to-Speech)
  • (Optional) GitHub token for higher API rate limits

📦 Project Setup :

Clone the repository and install dependencies

git clone https://github.com/SoumyaEXE/Atlas-Forensic-Vault.git
cd Atlas-Forensic-Vault
npm install

🔐 Environment Configuration :

Create a local environment file

cp .env.example .env.local

Add the required keys:

🥬 MongoDB Atlas :

MONGODB_URI=mongodb+srv://<username>:<password>@cluster.mongodb.net/atlas_forensic_vault

🤖 AI Services :

GEMINI_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

✒️ GitHub (optional – improves rate limits)

GITHUB_TOKEN=your_github_token

▶️ Run the App :

Start the development server -

npm run dev

The app will boot with hot reload enabled.

🌐 Access the Application :

Open in your browser -

http://localhost:3000

You're ready to investigate repositories. 🕵️

🏆 Hackathon Highlights :

Focus Area What We Delivered
🍃 MongoDB Atlas Excellence Vector Search · Change Streams · Flexible Schema · GridFS
💡 Product Innovation Code-to-podcast experience with Film Noir narrative
🧠 AI-First Architecture Gemini for deep analysis · ElevenLabs for narration
🔒 Security & Performance Cloudflare DDoS protection · Edge caching · IP filtering
🚀 Production Readiness Fully deployed, live, and scalable on Vercel
🛠️ Developer Impact Faster onboarding and deeper code understanding

<--

👥 Team LowEndCorp. Members :

👨‍💻 Soumya 👨‍💻 Subarna 👨‍💻 Saikat 👨‍💻 Sourish
Full Stack Developer Android Developer DevOps Engineer Competitive Programmer
GitHub GitHub GitHub GitHub

-->

"🕵️ Case Closed."
Built with ❤️ for Hackers!

MongoDB Atlas Cloudflare ElevenLabs Google Gemini

About

Your Open Source Repo Investigator. Helps you analyze repositories you’re interested in contributing to by surfacing structure, intent, and contribution-ready insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages