Skip to content

ipospelov/vibe-coding-architectures

Repository files navigation

Research on Codebase Architectures for AI Coding Tools

Abstract

This research investigates the impact of different architectural approaches on the effectiveness of code generation using artificial intelligence tools. The study addresses the question: "How can we design codebases to be optimal for AI coding tools?" An experimental comparison of four architectural patterns was conducted using eight key performance metrics.

1. Introduction

1.1 Research Motivation

With the advancement of AI coding tools, there is a growing need to reconsider approaches to software architecture design. Traditional architectural patterns may not be optimal for AI-assisted development, as AI tools have different requirements for context understanding and code navigation compared to human developers.

1.2 Problem Statement

The primary research questions addressed in this study are:

  1. How can we design codebases to optimize code generation by AI agents?
  2. Does architecture really matter for AI coding tools? Is this problem worth solving?

1.3 Key Components of AI Code Generation

The success of AI code generation depends on four fundamental components:

  1. Prompt - The model has no built-in understanding of tasks; it only generates the most probable continuation for the input text
  2. Model - Determines the probabilities and quality of generated responses
  3. Context - All data available to the model beyond the manual prompt (primarily the codebase)
  4. Tools - Functions that enable LLM to expand its context (MCP, agent functions)

This research focuses specifically on context optimization through codebase architecture design.

1.4 Research Hypotheses

The study is based on the following hypotheses:

  • Codebases must be understandable not only to humans but also to AI
  • Token-efficient codebases are crucial for AI tools performance
  • Managing context effectively leads to better AI-generated results
  • Different architectural patterns will show varying performance with AI tools

2. Literature Review and Architectural Patterns

2.1 Atomic Composable Architecture (ACA)

Atomic Composable Architecture borrows metaphors from Brad Frost's Atomic Design (atoms → molecules → organisms) and applies them to application code organization.

Core Principle: Build complex systems from predictably simple components.

Level Content Approximate Size
Atom 1 function/class/constant, no dependencies 5-50 LOC
Molecule Small modules with several atoms + tests 50-300 LOC
Organism Complete subsystems (services, CLI utilities, jobs) 300-1500 LOC

Advantages:

  • High modularity and reusable components
  • Easy to test individual components
  • Good scalability when functionality grows
  • Simple pattern for AI tools to follow

Disadvantages:

  • Chain reaction problem when modifying low-level abstractions
  • Requires discipline to maintain clear dependencies
  • Chain problems lead to larger context windows

2.2 Layered Architecture

A classic application design pattern where the system is divided into clearly defined layers. Each layer has narrowly defined responsibilities and interacts only with adjacent layers (above and below).

Core Principle: Clear separation of responsibilities into levels to minimize impact of changes in one layer on others.

Typical structure:

  • User interface details (UI, API) - top layer
  • Domain rules (business logic) - middle layer
  • Data access and external services - bottom layer

Advantages:

  • Well-established pattern familiar to both engineers and AI
  • Clear separation of concerns and abstractions
  • Well-understood responsibility boundaries
  • LLMs have seen this pattern frequently in training data

Disadvantages:

  • AI tools must operate across multiple levels
  • Requires importing extensive context for database operations

2.3 Vertical Slice Architecture

Each feature has its own separate directory with minimal coupling between modules. Each "slice" includes everything needed for one business feature - from HTTP endpoints to data access and tests.

Core Principle: "One feature, one folder, one dependency graph."

Advantages:

  • Code organized by features rather than technical abstractions
  • Ability to set context with a single prompt (read path to directory)
  • Minimal shared dependencies / loosely coupled modules
  • Easy to test and maintain individual features

Disadvantages:

  • Code duplication across features
  • Cannot create shared utility modules
  • May lead to inconsistencies across features

2.4 Pipeline Architecture

Suitable for data processing and sequential transformations. Data flows through a series of processing stages, each performing specific transformations.

Core Principle: Sequential data processing through clearly defined stages.

Advantages:

  • Excellent for stream processing
  • LLMs work well with explicit types and transformations
  • Easy to add/remove/reorder processing steps

Disadvantages:

  • Not suitable for interactive applications
  • Can be inefficient for non-linear processes
  • Poor fit for event-driven systems

3. Methodology

3.1 Experimental Setup

Test Application: Snake game implementation Test Modification: Addition of randomly generated maze functionality LLM: Claude Sonnet 3.7 Code Generation Tool: RooCode Prompt Caching: Enabled Isolation: New functionality added from new chat (no context sharing) Sample Size: 5 runs for each architecture

3.2 Evaluation Metrics

Initial Generation Metrics:

  1. One-shot generation success - Application works correctly on first attempt (binary)
  2. Architecture adherence - Generated app follows specified architecture (binary)
  3. Token consumption - Cached and non-cached tokens for initial generation
  4. Context window size - Final context length after generation

Modification Metrics:

  1. One-shot modification success - New feature works on first attempt (binary)
  2. Architecture preservation - Architecture maintained after modification (binary)
  3. Modification token consumption - Tokens used for feature addition
  4. Final context window size - Context length after modification

3.3 Test Prompts

Generation Prompt: "Generate application based on description from @/snake_game.md designed as described in @/architecture.md. Run the app when you completed implementation."

Modification Prompt: "Generate a maze (labyrinth) for every new game. If the user hits a wall, the game is over. Keep the architecture as described in @/architecture.md"

4. Results

4.1 Atomic Composable Architecture Results

Metric Average Result
One-shot generation success 5/5 (100%)
Architecture adherence (initial) 5/5 (100%)
Initial token consumption 25.88k ↑ / 340.6k ↓
Initial context length 27.02k
One-shot modification success 3/5 (60%)
Architecture adherence (modification) 1/5 (20%)
Modification token consumption 26.16k ↑ / 508.6k ↓
Final context length 35.36k

4.2 Layered Architecture Results

Metric Average Result
One-shot generation success 5/5 (100%)
Architecture adherence (initial) 5/5 (100%)
Initial token consumption 16.22k ↑ / 248k ↓
Initial context length 25.66k
One-shot modification success 5/5 (100%)
Architecture adherence (modification) 5/5 (100%)
Modification token consumption 22.6k ↑ / 380.6k ↓
Final context length 31.8k

4.3 Vertical Slice Architecture Results

Metric Average Result
One-shot generation success 3/5 (60%)
Architecture adherence (initial) 5/5 (100%)
Initial token consumption 18.8k ↑ / 443.6k ↓
Initial context length 27.4k
One-shot modification success 3/3 (100%)*
Architecture adherence (modification) 3/3 (100%)*
Modification token consumption 19.07k ↑ / 379.33k ↓
Final context length 30.03k

*Note: Only successful initial generations were tested for modifications

4.4 Pipeline Architecture Results

Metric Average Result
One-shot generation success 1/5 (20%)
Architecture adherence (initial) 5/5 (100%)
Initial token consumption 25.6k ↑ / 210k ↓
Initial context length 25.7k
One-shot modification success 1/1 (100%)*
Architecture adherence (modification) 1/1 (100%)*
Modification token consumption 18.2k ↑ / 299.5k ↓
Final context length 29.3k

*Note: Only one successful initial generation was achieved

5. Analysis and Discussion

5.1 Performance Ranking

Based on overall performance across all metrics:

  1. Layered Architecture - Clear winner with 100% success rates and optimal token efficiency
  2. Atomic Composable Architecture - Good for initial generation, struggles with modifications
  3. Vertical Slice Architecture - Inconsistent initial generation, but strong modification performance
  4. Pipeline Architecture - Poor fit for interactive applications

5.2 Key Findings

Token Efficiency

Layered Architecture demonstrated the best token efficiency for initial generation (16.22k ↑ / 248k ↓), indicating that familiar, well-established patterns require less computational overhead.

Architecture Preservation

Only Layered Architecture maintained perfect architecture adherence during modifications (5/5), while Atomic Composable Architecture showed significant degradation (1/5).

Chain Reaction Problem

Atomic Composable Architecture suffered from the "chain reaction" problem - modifying low-level atoms required changes in molecules and organisms, leading to larger context windows and architectural drift.

Pattern Familiarity

Well-established patterns (Layered) performed better than newer or less common patterns, suggesting that LLM training data distribution affects performance.

5.3 Subjective Observations

During testing, Layered Architecture not only performed better quantitatively but also generated more sensible maze layouts, suggesting that familiar architectural patterns may help AI understand problem semantics better.

6. Implications and Recommendations

6.1 Does Architecture Really Matter?

Short Answer: Yes, architecture significantly impacts AI coding tool effectiveness.

Detailed Analysis:

  • Current State: Good architecture simplifies context management for both developers and AI
  • Future Outlook: As LLM capabilities evolve, architecture may become less critical
  • Persistent Importance: Precise context management will remain important
  • Resource Efficiency: Well-structured code is cost-effective in terms of time, tokens, and money
  • Application Dependency: Architecture importance varies by application type

6.2 Architecture Selection Guidelines

Architecture Best Use Cases
Layered Applications with clear UI/Logic/Data separation (MVC pattern)
Vertical Slice Applications with independent features
Atomic Composable Rich functionality that composes in different ways
Pipeline Sequential data processing and transformation tasks

6.3 Best Practices for AI-Optimized Codebases

  1. Prioritize Familiar Patterns: Use well-established architectural patterns that LLMs have seen frequently
  2. Optimize Token Efficiency: Design codebases that require minimal context for AI understanding
  3. Maintain Clear Boundaries: Ensure architectural layers and components have well-defined responsibilities
  4. Consider Chain Effects: Be aware of how changes propagate through the architecture
  5. Context Management: Design for effective context window utilization

7. Limitations and Future Work

7.1 Study Limitations

  • Limited to one test application (Snake game)
  • Single LLM model tested (Claude Sonnet 3.7)
  • Small sample size (5 runs per architecture)
  • Specific tool ecosystem (RooCode)
  • Binary success metrics may not capture nuanced performance differences

7.2 Future Research Directions

  1. Broader Application Testing: Test across different types of applications and domains
  2. Multiple LLM Comparison: Evaluate performance across different language models
  3. Longitudinal Studies: Assess architecture performance over extended development cycles
  4. Hybrid Architectures: Explore combinations of architectural patterns
  5. Dynamic Architecture Adaptation: Investigate architectures that adapt based on AI tool feedback

8. Conclusion

This research demonstrates that codebase architecture significantly impacts the effectiveness of AI coding tools. Layered Architecture emerged as the most effective pattern for AI-assisted development, showing superior performance in both initial generation and modification tasks while maintaining optimal token efficiency.

The study confirms that designing codebases with AI tools in mind is not only worthwhile but essential for maximizing the benefits of AI-assisted development. As AI coding tools continue to evolve, understanding and optimizing the relationship between architecture and AI performance will become increasingly important for software development teams.

The key insight is that context management through architectural design is crucial for AI coding effectiveness. Teams should consider AI tool requirements alongside traditional architectural considerations when designing system architecture.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published