KunjShah01/codebase-oracle

Codebase Swarm

A Multi-Agent AI System for Comprehensive Code Analysis

Python 3.8+ License: MIT Streamlit OpenAI

Quick Start • Features • Architecture • Usage Examples • Contributing


Why I Created This

As a developer, I was tired of juggling 10 different tools to understand my codebase:

Security: Bandit, Snyk, Semgrep

Performance: Profilers, linters, manual code review

Testing: Coverage.py, pytest, mutation testing

Architecture: Graphviz, manual tracing, whiteboard sessions

Refactoring: IDE hints, gut feelings, Stack Overflow

Each tool gave me fragmented insights, but none understood the big picture. I wanted something that could:

Think holistically about my codebase like a senior architect

Connect the dots between security, performance, and design

Generate actual fixes, not just warnings

Learn and adapt to my team's specific patterns

So I built Codebase Swarm: a team of AI agents that collaborate to give you complete codebase intelligence in one place.

What is Codebase Swarm?

Codebase Swarm is a multi-agent AI system that analyzes your codebase using specialized AI agents working together. Think of it as hiring a team of expert consultants (security auditor, performance engineer, test architect, etc.) who:

Collaborate to solve complex problems

Specialize in their domain but understand the big picture

Generate actionable fixes with proof-of-concept exploits

Visualize your architecture in real-time

Predict issues before they hit production

Unlike traditional static analysis tools, Codebase Swarm uses LLMs + AST parsing + Graph analysis to understand intent, not just syntax.

Problems It Solves

1. "I have 50 security warnings, which ones actually matter?"

Problem: Traditional tools flood you with false positives.

Solution: Security Agent generates proof-of-concept exploits and ranks by actual risk, not just pattern matching.

2. "Will this code scale to 1000 RPS?"

Problem: Performance issues only appear in production.

Solution: Performance Agent simulates load and predicts bottlenecks with estimated RPS limits.

3. "What tests should I write?"

Problem: 40% test coverage, but which 40% matters?

Solution: Tester Agent maps critical paths and generates targeted tests for untested error handling.

4. "If I change this function, what breaks?"

Problem: Fear of refactoring due to unknown dependencies.

Solution: Architect Agent builds a call graph and shows exact impact of changes.

5. "How do I fix this vulnerability?"

Problem: Tools tell you what's wrong, but not how to fix it.

Solution: Refactorer Agent generates ready-to-apply patches with before/after code.
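
The impact question in item 4 can be approximated with the standard-library `ast` module. The sketch below is not the Architect Agent's actual implementation: `build_call_graph` and `impacted_by` are hypothetical names, the sample module is made up, and only direct calls to plain function names within one module are resolved.

```python
import ast
from typing import Dict, Set

# Hypothetical sample module; the function names are illustrative only.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    return len(parse(path))
'''


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each function name to the plain names it calls directly."""
    graph: Dict[str, Set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only `name(...)` calls are tracked; `obj.method(...)` is skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return graph


def impacted_by(graph: Dict[str, Set[str]], changed: str) -> Set[str]:
    """Return every function that calls `changed` directly or transitively."""
    impacted: Set[str] = set()
    frontier = [changed]
    while frontier:
        target = frontier.pop()
        for caller, callees in graph.items():
            if target in callees and caller not in impacted:
                impacted.add(caller)
                frontier.append(caller)
    return impacted


graph = build_call_graph(SOURCE)
print(sorted(impacted_by(graph, "load")))  # ['parse', 'report']
```

A real analysis would also need to resolve methods, imports, and dynamic dispatch, which this sketch deliberately ignores.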

Key Features

| Feature | Description | Impact |
| --- | --- | --- |
| 🔒 Security Agent | Finds SQLi, XSS, hardcoded secrets with exploits | Prevents breaches before deployment |
| ⚡ Performance Agent | Predicts RPS limits, detects N+1 queries, blocking calls | Scales confidently |
| 🧪 Tester Agent | Identifies coverage gaps, generates missing tests | Reaches 80%+ coverage efficiently |
| 🏗️ Architect Agent | Maps call graphs, detects circular dependencies | Refactors safely |
| 🔧 Refactorer Agent | Auto-generates patches for all issues | Fixes in minutes, not hours |
| 🕸️ Interactive Graphs | D3.js call graph with clickable nodes | Visualize architecture |
| 📊 Risk Scoring | 0-10 risk scores per category | Prioritize work |
| 📝 Git Integration | Analyzes commit history, generates patches | Seamless workflow |
| 🎯 Custom Rules | YAML-based architecture rules | Enforce team standards |
| 🌐 Multi-language | Python + extendable to JS/TS, Go, Rust | Polyglot support |

Architecture

Supported Languages

Codebase Swarm includes built-in parsers and scaffolding for multiple languages. The following languages currently have lightweight support:

  • Python (fully implemented AST parsing)
  • JavaScript / JSX (heuristic parser stub; integrating Tree-sitter or esprima is recommended for production)
  • Go (heuristic parser stub)

Broad language support

The project now includes a generic, heuristic parser that provides basic coverage across many languages (Java, Kotlin, C#, PHP, Ruby, Rust, C/C++, Swift, Scala, Perl, and more). Heuristic parsers can detect simple function and class declarations but are not a substitute for full AST-based parsing.
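
To make the limits of heuristic parsing concrete, here is a minimal regex-based sketch. It is not the project's generic parser: `DECLARATION_PATTERNS` and `heuristic_parse` are hypothetical names, and the patterns cover only a handful of declaration keywords.

```python
import re
from typing import Dict, List

# Hypothetical patterns; real heuristics would need per-language tuning.
DECLARATION_PATTERNS = {
    "functions": re.compile(
        r"^\s*(?:pub\s+)?(?:func|fn|function|def)\s+([A-Za-z_]\w*)", re.MULTILINE
    ),
    "classes": re.compile(
        r"^\s*(?:pub\s+|public\s+|abstract\s+|final\s+)*"
        r"(?:class|struct|interface)\s+([A-Za-z_]\w*)",
        re.MULTILINE,
    ),
}


def heuristic_parse(source: str) -> Dict[str, List[str]]:
    """Return declared names found by line-anchored regexes, without an AST."""
    return {kind: pattern.findall(source) for kind, pattern in DECLARATION_PATTERNS.items()}


GO_SNIPPET = '''
package main

type Server struct{}

func handle(w Writer) {}
func main() {}
'''

RUST_SNIPPET = '''
pub struct Point { x: i32 }
pub fn distance(a: &Point) -> i32 { 0 }
// fn commented_out() {}
'''

print(heuristic_parse(GO_SNIPPET))
print(heuristic_parse(RUST_SNIPPET))
```

The Go result misses `Server` because the `type Name struct` form puts the name before the keyword. This is the kind of gap that a full AST parser closes.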

For production-grade, accurate parsing across all languages, integrate Tree-sitter or language-specific AST tools. We provide a clear hook: add a parser under swarm/tools/parsers/ implementing the BaseParser interface and call register_parser('<language>', parser_instance).

Optional: to enable Tree-sitter parsing, install a Python tree-sitter package and configure compiled language libraries. Example (not included):

pip install tree_sitter
# then build language bundles per Tree-sitter docs

Tree-sitter scaffold

The repository now includes a Tree-sitter integration scaffold at swarm/tools/parsers/tree_sitter_parser.py.

  • It will register a tree_sitter parser automatically if the tree_sitter Python package is installed and a compiled languages bundle is available (see TREE_SITTER_LANG_DIR env var or vendor/tree_sitter_languages.so).
  • The scaffold is intentionally minimal; extend it to load specific Language objects and map file extensions to those languages for accurate AST queries.

Example quick-start:

pip install tree_sitter
# Build a combined language bundle (see Tree-sitter docs) and export:
export TREE_SITTER_LANG_DIR=/path/to/compiled/bundle

# Then run the analyzer; CodeParser will prefer tree-sitter when available.
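
The automatic registration described above amounts to an availability check. The following sketch shows one way such a check could look. It is an assumption about the logic, not the contents of tree_sitter_parser.py, and `tree_sitter_available` and `choose_parser` are hypothetical names.

```python
import importlib.util
import os
from pathlib import Path
from typing import Mapping, Optional

# Fallback bundle location mentioned in this README.
DEFAULT_BUNDLE = Path("vendor/tree_sitter_languages.so")


def tree_sitter_available(
    env: Optional[Mapping[str, str]] = None,
    default_bundle: Path = DEFAULT_BUNDLE,
) -> bool:
    """True when the tree_sitter package is importable and a bundle path exists."""
    env = os.environ if env is None else env
    if importlib.util.find_spec("tree_sitter") is None:
        return False
    lang_dir = env.get("TREE_SITTER_LANG_DIR")
    if lang_dir and Path(lang_dir).exists():
        return True
    return default_bundle.is_file()


def choose_parser(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer tree-sitter when available, otherwise fall back to heuristics."""
    return "tree_sitter" if tree_sitter_available(env) else "heuristic"


print(choose_parser())
```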

To add a new language parser, implement swarm/tools/parsers/<your_parser>.py following the BaseParser interface and register it via swarm/tools/parsers/__init__.py using register_parser(name, parser_instance).
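
As a rough illustration of that hook, the sketch below uses stand-ins for `BaseParser` and `register_parser`. Only the two names come from this README. The `parse` method, its return shape, the `get_parser` helper, and the `LuaParser` example are assumptions, so check the real interface in swarm/tools/parsers/ before copying anything.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseParser(ABC):
    """Stand-in for the project's BaseParser; the real interface may differ."""

    @abstractmethod
    def parse(self, source: str) -> Dict[str, List[str]]:
        """Return declared names grouped by kind."""


_PARSERS: Dict[str, BaseParser] = {}


def register_parser(name: str, parser_instance: BaseParser) -> None:
    """Stand-in registry keyed by lower-cased language name."""
    if not isinstance(parser_instance, BaseParser):
        raise TypeError("parser_instance must implement BaseParser")
    _PARSERS[name.lower()] = parser_instance


def get_parser(name: str) -> BaseParser:
    """Hypothetical lookup helper for the stand-in registry."""
    return _PARSERS[name.lower()]


class LuaParser(BaseParser):
    """Toy parser: finds top-level `function name(...)` declarations."""

    def parse(self, source: str) -> Dict[str, List[str]]:
        functions = [
            line.split("(")[0].split()[-1]
            for line in source.splitlines()
            if line.strip().startswith("function ")
        ]
        return {"functions": functions, "classes": []}


register_parser("lua", LuaParser())
result = get_parser("Lua").parse("function greet(name)\n  print(name)\nend\n")
print(result)  # {'functions': ['greet'], 'classes': []}
```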

graph TD
    A[User CLI/Streamlit] --> B[Swarm Orchestrator];
    B --> C[Architect Agent];
    B --> D[Security Agent];
    B --> E[Performance Agent];
    B --> F[Tester Agent];
    B --> G[Refactorer Agent];
    
    C --> H[Call Graph Builder];
    C --> I[Import Analyzer];
    
    D --> J[Security Scanner];
    D --> K[Exploit Generator];
    
    E --> L[Performance Analyzer];
    E --> M[Complexity Profiler];
    
    F --> N[Test Generator];
    F --> O[Coverage Analyzer];
    
    G --> P[Patch Generator];
    G --> Q[AST Transformer];
    
    H --> R[Shared State];
    J --> R;
    L --> R;
    N --> R;
    P --> R;
    
    R --> S[Final Report];
    P --> T[fixes.patch];
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style S fill:#bfb,stroke:#333,stroke-width:2px
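
The diagram's flow (orchestrator, agents, shared state, final report) can be sketched in a few lines. This is a simplified illustration, not `SwarmOrchestrator` itself: the agent functions, their outputs, and the sequential ordering are assumptions, and the result keys merely mirror the ones used in the Python API example.

```python
from typing import Any, Callable, Dict, List, Tuple

# An agent takes the analysis target plus the shared state and returns findings.
Agent = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def architect(target: str, state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical output; a real agent would parse the target.
    return {"functions": ["authenticate", "charge"]}


def security(target: str, state: Dict[str, Any]) -> Dict[str, Any]:
    # Reads the architect's findings from the shared state.
    risky = [name for name in state["architecture"]["functions"] if name == "authenticate"]
    return {"vulnerabilities": [{"function": name, "type": "SQL Injection"} for name in risky]}


def refactorer(target: str, state: Dict[str, Any]) -> Dict[str, Any]:
    # Turns each reported vulnerability into a fix description.
    vulnerabilities = state["security"]["vulnerabilities"]
    return {"fixes": [f"parameterize query in {v['function']}" for v in vulnerabilities]}


PIPELINE: List[Tuple[str, Agent]] = [
    ("architecture", architect),
    ("security", security),
    ("fixes", refactorer),
]


def run_swarm(target: str, pipeline: List[Tuple[str, Agent]] = PIPELINE) -> Dict[str, Any]:
    """Run agents in order, letting each one read what earlier agents wrote."""
    state: Dict[str, Any] = {}
    for name, agent in pipeline:
        state[name] = agent(target, state)
    return state


report = run_swarm("src/auth.py")
print(report["fixes"]["fixes"])  # ['parameterize query in authenticate']
```

Running agents one after another keeps the sketch short. The diagram's fan-out suggests the real orchestrator may dispatch agents independently.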

Installation

Option 1: From PyPI (Recommended)

pip install codebase-swarm

Option 2: From Source

git clone https://github.com/KunjShah01/codebase-swarm.git
cd codebase-swarm
pip install -r requirements.txt
pip install .

Option 3: Docker

docker pull kunjshah01/codebase-swarm:latest
docker run -v "$(pwd)":/code kunjshah01/codebase-swarm /code

Quick Start

30-Second Test

# Run on the included sample project
swarm examples/sample_project --output report.md

# Apply fixes automatically
swarm examples/sample_project --apply-fixes

2-Minute Deep Dive

# Interactive CLI with beautiful UI
swarm --mode interactive

# Or launch Streamlit dashboard
streamlit run streamlit_app.py

Usage Examples

CLI Mode

# Basic analysis
swarm /path/to/your/project

# Save report
swarm /path/to/project -o security_report.md

# Security only
swarm /path/to/project --agents security

# Auto-fix critical issues
swarm /path/to/project --apply-fixes --severity critical

Streamlit Mode

# Launch web interface
streamlit run streamlit_app.py -- --target /path/to/project

Python API

from swarm.orchestrator import SwarmOrchestrator
from swarm.models import Task

# Initialize
orchestrator = SwarmOrchestrator()

# Create task
task = Task(
    description="Find security issues in auth module",
    target="src/auth.py"
)

# Run analysis
result = orchestrator.solve(task)

# Access results
print(f"Found {len(result['security']['vulnerabilities'])} vulnerabilities")
print(f"Generated {len(result['fixes']['fixes'])} fixes")

The Agents

🔒 Security Agent

Expertise: OWASP Top 10, cryptography, secure coding
Tools: Static analyzer, pattern matcher, exploit generator
Output: CVE-style reports with PoC exploits

# Example: Finds and exploits SQL injection
vulnerability = {
    "type": "SQL Injection",
    "cwe": "CWE-89",
    "severity": "critical",
    "exploit": {
        "payload": "' OR '1'='1'; DROP TABLE users; --",
        "impact": "Complete database compromise",
        "proof_of_concept": "python exploit.py --url http://target.com/login"
    }
}
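
The static-analysis half of such a finding can be illustrated with `ast`. The sketch below flags f-strings that start with an SQL keyword and interpolate a value. It is a simplified stand-in, not the Security Agent's scanner: `find_sql_fstrings` and the sample module are hypothetical.

```python
import ast
from typing import Dict, List, Union

# Hypothetical module under analysis.
SAMPLE = '''
def authenticate(db, username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query)

def lookup(db, user_id):
    return db.execute("SELECT * FROM users WHERE id = ?", (user_id,))
'''

SQL_KEYWORDS = ("select ", "insert ", "update ", "delete ")


def find_sql_fstrings(source: str) -> List[Dict[str, Union[str, int]]]:
    """Report f-strings that look like SQL and interpolate at least one value."""
    findings: List[Dict[str, Union[str, int]]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):
            continue
        # Join the literal pieces of the f-string to inspect its text.
        literal = "".join(
            part.value
            for part in node.values
            if isinstance(part, ast.Constant) and isinstance(part.value, str)
        )
        interpolates = any(isinstance(part, ast.FormattedValue) for part in node.values)
        if interpolates and literal.lower().lstrip().startswith(SQL_KEYWORDS):
            findings.append({"type": "SQL Injection", "cwe": "CWE-89", "line": node.lineno})
    return findings


print(find_sql_fstrings(SAMPLE))
# [{'type': 'SQL Injection', 'cwe': 'CWE-89', 'line': 3}]
```

The parameterized query in `lookup` is not flagged because it is a plain string literal with no interpolation.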

⚡ Performance Agent

Expertise: Algorithms, concurrency, database optimization
Tools: Profiler, complexity analyzer, load simulator
Output: RPS predictions with optimization patches

# Example: Predicts scaling limit
prediction = {
    "estimated_rps": 500,
    "bottleneck_at": "3 critical bottlenecks",
    "failure_mode": "Database connection pool exhaustion",
    "scaling_limit": "Will fail at ~1000 RPS"
}
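
A complexity analyzer is the easiest of these tools to approximate. The sketch below estimates cyclomatic complexity per function by counting branch points. It is an illustrative approximation, not the agent's profiler, and `cyclomatic_complexity` and `complexity_report` are hypothetical names.

```python
import ast
from typing import Dict

# Node types that each add one decision point.
BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)

# Hypothetical function under analysis.
SAMPLE = '''
def charge(order, user):
    if not user.active or user.banned:
        raise ValueError("inactive user")
    total = 0
    for item in order.items:
        if item.price > 0:
            total += item.price
    return total
'''


def cyclomatic_complexity(func: ast.AST) -> int:
    """Start at 1 and add one per branch point or extra boolean operand."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score


def complexity_report(source: str) -> Dict[str, int]:
    """Map each function name in the source to its estimated complexity."""
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


scores = complexity_report(SAMPLE)
too_complex = [name for name, score in scores.items() if score > 10]
print(scores)  # {'charge': 5}
```

The threshold of 10 matches the `max_complexity` value shown in the sample swarm.yaml.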

🧪 Tester Agent

Expertise: TDD, pytest, test doubles
Tools: Coverage analyzer, test generator, mutation tester
Output: Missing tests, prioritized to reach 80%+ coverage

# Example: Generates missing test
generated_test = """
def test_process_payment_raises_on_invalid_amount():
    with pytest.raises(ValueError):
        process_payment(user_id=1, amount=-100)
"""

πŸ—οΈ Architect Agent

Expertise: Design patterns, clean architecture, scalability
Tools: Call graph builder, import analyzer, dependency mapper

graph TD
    A[API Layer] --> B[Service Layer];
    B --> C[Database Layer];
    A -.-> C;
    %% Violation: the API layer reaches the database layer directly
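
Besides layering violations like this one, the feature list mentions circular dependencies. A depth-first search over an import graph is one common way to find them. The sketch below assumes the graph has already been extracted as a dict, and `find_cycle` is a hypothetical helper rather than part of the project.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the first dependency cycle found by DFS, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # `dep` is already on the current path, so the path loops back to it.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical module-level import graphs.
LAYERED = {"api": ["services"], "services": ["database"], "database": []}
CYCLIC = {"api": ["services"], "services": ["database"], "database": ["api"]}

print(find_cycle(LAYERED))  # None
print(find_cycle(CYCLIC))   # ['api', 'services', 'database', 'api']
```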

🔧 Refactorer Agent

Expertise: Refactoring, code style, modern Python
Tools: AST transformer, code generator, patch applier
Output: Git-ready patches with before/after

- query = f"SELECT * FROM users WHERE id = {user_id}"
+ query = "SELECT * FROM users WHERE id = ?"
+ cursor.execute(query, (user_id,))
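
A before/after patch like this one can be produced with the standard-library `difflib`. This is only a sketch of the idea and not the Refactorer Agent's patch generator: `make_patch`, the file path, and the two snippets are hypothetical.

```python
import difflib


def make_patch(path: str, before: str, after: str) -> str:
    """Build a unified diff between two versions of one file."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Hypothetical before/after contents of a small file.
BEFORE = (
    'query = f"SELECT * FROM users WHERE id = {user_id}"\n'
    "cursor.execute(query)\n"
)
AFTER = (
    'query = "SELECT * FROM users WHERE id = ?"\n'
    "cursor.execute(query, (user_id,))\n"
)

patch = make_patch("src/users.py", BEFORE, AFTER)
print(patch)
```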

Sample Output

CLI Report

🐝 CODEBASE SWARM ANALYSIS
═══════════════════════════════════════════════════════════

🏗️ Architecture: 47 functions, 12 classes
🔒 Security: 3 critical, 5 high, 2 medium vulnerabilities
⚡ Performance: 2 critical bottlenecks (estimated RPS: 500)
🧪 Testing: 45% coverage, 8 test gaps identified
🔧 Fixes: 18 auto-generated patches ready

🚨 CRITICAL ISSUES:
   • SQL Injection in auth.py:42
   • Blocking call in payment.py:67
   • N+1 query in orders.py:23

📊 Risk Score: 7.2/10 (High Risk)

Streamlit Dashboard

diff --git a/src/auth.py b/src/auth.py
--- a/src/auth.py
+++ b/src/auth.py
@@ -42,5 +42,5 @@ def authenticate(username, password):
-    query = f"SELECT * FROM users WHERE username = '{username}'"
-    result = db.execute(query)
+    query = "SELECT * FROM users WHERE username = ?"
+    result = db.execute(query, (username,))
     
     if result:
         return User(**result)
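
To see why the patched query is safer, the following self-contained demonstration runs both forms against an in-memory SQLite database. The table, rows, and payload are made up for illustration. The payload is a shortened variant of the one shown earlier, since `sqlite3` rejects multi-statement strings in `execute`.

```python
import sqlite3

# In-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

payload = "' OR '1'='1"

# Vulnerable: the payload becomes part of the SQL text and matches every row.
unsafe_query = f"SELECT username FROM users WHERE username = '{payload}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Patched: the payload is bound as data, so it is compared as a literal string.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (payload,)
).fetchall()

print(len(unsafe_rows))  # 2
print(safe_rows)         # []
```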

Advanced Configuration

Custom Rules (swarm.yaml)

# Architecture rules
architecture:
  forbidden_patterns:
    - pattern: "api/.*\\.py"
      forbidden_imports: ["database", "models"]
      severity: "error"
  
  max_function_length: 50
  max_class_methods: 10

# Security rules
security:
  custom_vulnerabilities:
    - name: "Internal API Key"
      pattern: 'internal_api_key\s*=\s*["''][^"'']+["'']'
      severity: "high"

# Performance thresholds
performance:
  max_complexity: 10
  min_rps_threshold: 1000
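
To show how a `forbidden_patterns` rule might be evaluated, the sketch below mirrors the architecture rule as a Python dict and checks a file's imports with `ast`. The matching semantics are assumptions: it uses `re.search` on the path and compares only top-level module names, and `imported_modules` and `check_file` are hypothetical helpers.

```python
import ast
import re
from typing import Dict, List, Set

# The architecture rule from swarm.yaml, written as a dict to avoid a YAML dependency.
RULES = [
    {"pattern": r"api/.*\.py", "forbidden_imports": ["database", "models"], "severity": "error"},
]


def imported_modules(source: str) -> Set[str]:
    """Collect the top-level module names imported by the source."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def check_file(path: str, source: str, rules: List[dict] = RULES) -> List[Dict[str, str]]:
    """Return one violation per forbidden import in a file matched by a rule."""
    violations: List[Dict[str, str]] = []
    for rule in rules:
        if not re.search(rule["pattern"], path):
            continue
        for module in sorted(imported_modules(source) & set(rule["forbidden_imports"])):
            violations.append({"file": path, "import": module, "severity": rule["severity"]})
    return violations


violations = check_file("api/users.py", "import database\nfrom services import users\n")
print(violations)
# [{'file': 'api/users.py', 'import': 'database', 'severity': 'error'}]
```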

Environment Variables

export OPENAI_API_KEY="sk-..."
export SWARM_CONFIG="swarm.yaml"
export SWARM_OUTPUT_DIR="./reports"
export SWARM_AUTO_APPLY="false"
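
These variables could be read into a typed settings object along the following lines. This is a sketch: `SwarmSettings` and `load_settings` are hypothetical names, and the defaults simply reuse the example values above rather than documented behavior.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class SwarmSettings:
    api_key: Optional[str]
    config_path: Path
    output_dir: Path
    auto_apply: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> SwarmSettings:
    """Read the SWARM_* variables, falling back to the example values."""
    env = os.environ if env is None else env
    auto_apply = env.get("SWARM_AUTO_APPLY", "false").strip().lower()
    return SwarmSettings(
        api_key=env.get("OPENAI_API_KEY"),
        config_path=Path(env.get("SWARM_CONFIG", "swarm.yaml")),
        output_dir=Path(env.get("SWARM_OUTPUT_DIR", "./reports")),
        auto_apply=auto_apply in {"1", "true", "yes"},
    )


# Passing a dict instead of os.environ keeps the example reproducible.
settings = load_settings({"SWARM_AUTO_APPLY": "TRUE", "SWARM_OUTPUT_DIR": "out"})
print(settings.auto_apply, settings.output_dir)
```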

Roadmap

Q2 2025
GitHub Action - Automated PR comments
VS Code Extension - Real-time analysis in IDE
JavaScript/TypeScript Support - Full AST parsing
Enterprise SSO - SAML/OAuth integration

Q3 2025
Machine Learning Model - Custom bug prediction
Cloud Cost Estimator - AWS/GCP cost analysis
Interactive Playground - Fix vulnerabilities in-browser
Team Dashboard - Organization-wide metrics

Q4 2025
Self-Healing Mode - Auto-fix on commit
Multi-repo Analysis - Microservices architecture view
Custom Agent SDK - Build your own agents
Enterprise Edition - On-premise deployment

Contributing

We love contributions! Here's how to help:

Quick Start for Contributors

git clone https://github.com/KunjShah01/codebase-swarm.git
cd codebase-swarm
pip install -r requirements-dev.txt
pre-commit install

Adding a New Agent

# Create swarm/agents/custom_agent.py
from swarm.agents.base_agent import BaseAgent

class CustomAgent(BaseAgent):
    def execute(self, task, context):
        # Your logic here
        return {"custom_metric": 42}
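
For a slightly fuller picture, here is a self-contained sketch of a custom agent. The `BaseAgent` class below is a stand-in so the example runs without the package installed. Only the `execute(self, task, context)` signature comes from the snippet above, while the shape of `context` and the `TodoAgent` itself are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseAgent(ABC):
    """Stand-in for swarm.agents.base_agent.BaseAgent; the real class may differ."""

    @abstractmethod
    def execute(self, task: Any, context: Dict[str, Any]) -> Dict[str, Any]:
        """Analyze the task using the shared context and return findings."""


class TodoAgent(BaseAgent):
    """Hypothetical agent that counts TODO comments in the files it is given."""

    def execute(self, task: Any, context: Dict[str, Any]) -> Dict[str, Any]:
        todos = [
            {"file": path, "line": number}
            for path, source in context.get("files", {}).items()
            for number, line in enumerate(source.splitlines(), start=1)
            if "TODO" in line
        ]
        return {"todo_count": len(todos), "todos": todos}


# Assumed context shape: a mapping of file paths to source text.
context = {"files": {"src/auth.py": "def login():\n    pass  # TODO: rate limit\n"}}
result = TodoAgent().execute(task="Count TODO comments", context=context)
print(result)  # {'todo_count': 1, 'todos': [{'file': 'src/auth.py', 'line': 2}]}
```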

Running Tests

pytest tests/ --cov=swarm --cov-report=html

License

MIT License - see LICENSE file for details.

Acknowledgments

OpenAI - GPT-4 for agent intelligence
Tree-sitter - Blazing-fast AST parsing
Rich - Beautiful CLI interfaces
Streamlit - Interactive dashboards
All contributors - Making this better every day

Support

Discord: Join our community
GitHub Issues: Report bugs
Documentation: Full docs
Email: kunjkshahdeveloper@gmail.com

About

Because your codebase deserves a team of experts
