This research investigates the impact of different architectural approaches on the effectiveness of code generation using artificial intelligence tools. The study addresses the question: "How can we design codebases to be optimal for AI coding tools?" An experimental comparison of four architectural patterns was conducted using eight key performance metrics.
With the advancement of AI coding tools, there is a growing need to reconsider approaches to software architecture design. Traditional architectural patterns may not be optimal for AI-assisted development, as AI tools have different requirements for context understanding and code navigation compared to human developers.
The primary research questions addressed in this study are:
- How can we design codebases to optimize code generation by AI agents?
- Does architecture really matter for AI coding tools? Is this problem worth solving?
The success of AI code generation depends on four fundamental components:
- Prompt - The model has no built-in understanding of tasks; it only generates the most probable continuation for the input text
- Model - Determines the probabilities and quality of generated responses
- Context - All data available to the model beyond the manually written prompt (primarily the codebase)
- Tools - Functions that let the LLM expand its own context (MCP, agent functions)
This research focuses specifically on context optimization through codebase architecture design.
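To make the decomposition concrete, the sketch below bundles the four components into one structure. This is purely schematic: it is not the API of any particular coding tool, and the class, field, and model names are illustrative assumptions, not taken from the study.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GenerationRequest:
    """Illustrative bundle of the four components that shape a completion."""
    prompt: str                          # explicit task description written by the user
    model: str                           # which LLM assigns probabilities to continuations
    context: list[str] = field(default_factory=list)  # codebase files, docs, prior turns
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # functions the agent may call


def read_file(path: str) -> str:
    """A typical tool: lets the model pull more of the codebase into its context."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


request = GenerationRequest(
    prompt="Add a randomly generated maze to the snake game.",
    model="claude-sonnet",               # placeholder identifier, not a real API value
    context=["snake_game.md", "architecture.md"],
    tools={"read_file": read_file},
)
```

Of the four fields, this study varies only the `context`: how the codebase is organized determines what the model has to read.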
The study is based on the following hypotheses:
- Codebases must be understandable not only to humans but also to AI
- Token-efficient codebases are crucial for AI tool performance
- Managing context effectively leads to better AI-generated results
- Different architectural patterns will show varying performance with AI tools
Atomic Composable Architecture borrows metaphors from Brad Frost's Atomic Design (atoms → molecules → organisms) and applies them to application code organization.
Core Principle: Build complex systems from predictably simple components.
| Level | Content | Approximate Size |
|---|---|---|
| Atom | 1 function/class/constant, no dependencies | 5-50 LOC |
| Molecule | Small modules with several atoms + tests | 50-300 LOC |
| Organism | Complete subsystems (services, CLI utilities, jobs) | 300-1500 LOC |
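A minimal Python sketch of the three levels, shown as a single listing for brevity; the module names and functions are illustrative assumptions, not code from the study's test application.

```python
# atoms.py: single-purpose functions with no project-internal dependencies (5-50 LOC each)
def clamp(value: int, low: int, high: int) -> int:
    """Keep a value inside an inclusive range."""
    return max(low, min(high, value))


def wrap(value: int, size: int) -> int:
    """Wrap a coordinate around a board of the given size."""
    return value % size


# molecules.py: small modules that combine a few atoms
def move_point(x: int, y: int, dx: int, dy: int, size: int) -> tuple[int, int]:
    """Shift a point by a delta, wrapping around the board edges."""
    return wrap(x + dx, size), wrap(y + dy, size)


# organisms.py: a complete subsystem built from molecules
class Board:
    """Game board exposing movement as one high-level operation."""

    def __init__(self, size: int) -> None:
        self.size = size

    def step(self, position: tuple[int, int], direction: tuple[int, int]) -> tuple[int, int]:
        (x, y), (dx, dy) = position, direction
        return move_point(x, y, dx, dy, self.size)
```

Changing the signature of a low-level atom such as `wrap` ripples into `move_point` and then into `Board`, which is the chain-reaction problem listed under the disadvantages below.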
Advantages:
- High modularity and reusable components
- Easy to test individual components
- Good scalability when functionality grows
- Simple pattern for AI tools to follow
Disadvantages:
- Chain reaction problem when modifying low-level abstractions
- Requires discipline to maintain clear dependencies
- These chain reactions lead to larger context windows
A classic application design pattern where the system is divided into clearly defined layers. Each layer has narrowly defined responsibilities and interacts only with adjacent layers (above and below).
Core Principle: Clear separation of responsibilities into levels to minimize impact of changes in one layer on others.
Typical structure:
- User interface details (UI, API) - top layer
- Domain rules (business logic) - middle layer
- Data access and external services - bottom layer
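A minimal Python sketch of the three layers, with each layer depending only on the one directly beneath it; the score-tracking feature and all names are illustrative assumptions, not code from the study.

```python
# data layer: persistence and external services
class ScoreRepository:
    """In-memory stand-in for a real data store."""

    def __init__(self) -> None:
        self._scores: list[int] = []

    def save(self, score: int) -> None:
        self._scores.append(score)

    def top(self, n: int) -> list[int]:
        return sorted(self._scores, reverse=True)[:n]


# domain layer: business rules; talks only to the layer below
class ScoreService:
    def __init__(self, repository: ScoreRepository) -> None:
        self._repository = repository

    def record_game(self, score: int) -> None:
        if score < 0:
            raise ValueError("score must be non-negative")
        self._repository.save(score)

    def leaderboard(self) -> list[int]:
        return self._repository.top(10)


# presentation layer (UI/API): talks only to the domain layer
def print_leaderboard(service: ScoreService) -> None:
    for rank, score in enumerate(service.leaderboard(), start=1):
        print(f"{rank}. {score}")
```

A change such as adding a new field still touches all three layers, which is why the disadvantages below mention operating across multiple layers, but each layer's responsibility stays narrow.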
Advantages:
- Well-established pattern familiar to both engineers and AI
- Clear separation of concerns and abstractions
- Well-understood responsibility boundaries
- LLMs have seen this pattern frequently in training data
Disadvantages:
- AI tools must operate across multiple layers for a single change
- Cross-layer work such as database operations requires importing extensive context
Each feature has its own separate directory with minimal coupling between modules. Each "slice" includes everything needed for one business feature - from HTTP endpoints to data access and tests.
Core Principle: "One feature, one folder, one dependency graph."
Advantages:
- Code organized by features rather than technical abstractions
- Context can be set with a single prompt (point the AI tool at the feature's directory)
- Minimal shared dependencies / loosely coupled modules
- Easy to test and maintain individual features
Disadvantages:
- Code duplication across features
- Cannot create shared utility modules
- May lead to inconsistencies across features
Suitable for data processing and sequential transformations. Data flows through a series of processing stages, each performing specific transformations.
Core Principle: Sequential data processing through clearly defined stages.
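A minimal Python sketch of explicit, reorderable stages; the record format and stage names are illustrative assumptions, not code from the study.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: turn raw comma-separated lines into structured records."""
    for line in lines:
        name, value = line.split(",")
        yield {"name": name.strip(), "value": int(value)}


def validate(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: drop records that fail a simple rule."""
    return (record for record in records if record["value"] >= 0)


def enrich(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 3: add a derived field."""
    for record in records:
        yield {**record, "doubled": record["value"] * 2}


def run_pipeline(lines: Iterable[str]) -> list[dict]:
    """Compose the stages; adding, removing, or reordering one is a single edit."""
    return list(enrich(validate(parse(lines))))


print(run_pipeline(["a, 1", "b, -2", "c, 3"]))
```

Each stage has an explicit input and output shape, which is why this style pairs well with the explicit types and transformations mentioned in the advantages below.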
Advantages:
- Excellent for stream processing
- LLMs work well with explicit types and transformations
- Easy to add/remove/reorder processing steps
Disadvantages:
- Not suitable for interactive applications
- Can be inefficient for non-linear processes
- Poor fit for event-driven systems
- Test Application: Snake game implementation
- Test Modification: Addition of randomly generated maze functionality
- LLM: Claude Sonnet 3.7
- Code Generation Tool: RooCode
- Prompt Caching: Enabled
- Isolation: New functionality added from a new chat (no context sharing)
- Sample Size: 5 runs for each architecture
- One-shot generation success - Application works correctly on first attempt (binary)
- Architecture adherence - Generated app follows specified architecture (binary)
- Token consumption - Cached and non-cached tokens for initial generation
- Context window size - Final context length after generation
- One-shot modification success - New feature works on first attempt (binary)
- Architecture preservation - Architecture maintained after modification (binary)
- Modification token consumption - Tokens used for feature addition
- Final context window size - Context length after modification
Generation Prompt: "Generate application based on description from @/snake_game.md designed as described in @/architecture.md. Run the app when you completed implementation."
Modification Prompt: "Generate a maze (labyrinth) for every new game. If the user hits a wall, the game is over. Keep the architecture as described in @/architecture.md"
| Metric | Result (Atomic Composable Architecture) |
|---|---|
| One-shot generation success | 5/5 (100%) |
| Architecture adherence (initial) | 5/5 (100%) |
| Initial token consumption | 25.88k ↑ / 340.6k ↓ |
| Initial context length | 27.02k |
| One-shot modification success | 3/5 (60%) |
| Architecture adherence (modification) | 1/5 (20%) |
| Modification token consumption | 26.16k ↑ / 508.6k ↓ |
| Final context length | 35.36k |
| Metric | Result (Layered Architecture) |
|---|---|
| One-shot generation success | 5/5 (100%) |
| Architecture adherence (initial) | 5/5 (100%) |
| Initial token consumption | 16.22k ↑ / 248k ↓ |
| Initial context length | 25.66k |
| One-shot modification success | 5/5 (100%) |
| Architecture adherence (modification) | 5/5 (100%) |
| Modification token consumption | 22.6k ↑ / 380.6k ↓ |
| Final context length | 31.8k |
| Metric | Result (Vertical Slice Architecture) |
|---|---|
| One-shot generation success | 3/5 (60%) |
| Architecture adherence (initial) | 5/5 (100%) |
| Initial token consumption | 18.8k ↑ / 443.6k ↓ |
| Initial context length | 27.4k |
| One-shot modification success | 3/3 (100%)* |
| Architecture adherence (modification) | 3/3 (100%)* |
| Modification token consumption | 19.07k ↑ / 379.33k ↓ |
| Final context length | 30.03k |
*Note: Only successful initial generations were tested for modifications
| Metric | Result (Pipeline Architecture) |
|---|---|
| One-shot generation success | 1/5 (20%) |
| Architecture adherence (initial) | 5/5 (100%) |
| Initial token consumption | 25.6k ↑ / 210k ↓ |
| Initial context length | 25.7k |
| One-shot modification success | 1/1 (100%)* |
| Architecture adherence (modification) | 1/1 (100%)* |
| Modification token consumption | 18.2k ↑ / 299.5k ↓ |
| Final context length | 29.3k |
*Note: Only one successful initial generation was achieved
Based on overall performance across all metrics:
- Layered Architecture - Clear winner, with 100% success rates and the best token efficiency
- Atomic Composable Architecture - Good for initial generation, struggles with modifications
- Vertical Slice Architecture - Inconsistent initial generation, but strong modification performance
- Pipeline Architecture - Poor fit for interactive applications
Layered Architecture demonstrated the best token efficiency for initial generation (16.22k ↑ / 248k ↓), suggesting that familiar, well-established patterns require fewer tokens to reach a working result.
Only Layered Architecture maintained perfect architecture adherence during modifications (5/5), while Atomic Composable Architecture showed significant degradation (1/5).
Atomic Composable Architecture suffered from the "chain reaction" problem - modifying low-level atoms required changes in molecules and organisms, leading to larger context windows and architectural drift.
Well-established patterns (Layered) performed better than newer or less common patterns, suggesting that LLM training data distribution affects performance.
During testing, Layered Architecture not only performed better quantitatively but also generated more sensible maze layouts, suggesting that familiar architectural patterns may help AI understand problem semantics better.
Short Answer: Yes, architecture significantly impacts AI coding tool effectiveness.
Detailed Analysis:
- Current State: Good architecture simplifies context management for both developers and AI
- Future Outlook: As LLM capabilities evolve, architecture may become less critical
- Persistent Importance: Precise context management will remain important
- Resource Efficiency: Well-structured code is cost-effective in terms of time, tokens, and money
- Application Dependency: Architecture importance varies by application type
| Architecture | Best Use Cases |
|---|---|
| Layered | Applications with clear UI/Logic/Data separation (MVC pattern) |
| Vertical Slice | Applications with independent features |
| Atomic Composable | Rich functionality that composes in different ways |
| Pipeline | Sequential data processing and transformation tasks |
- Prioritize Familiar Patterns: Use well-established architectural patterns that LLMs have seen frequently
- Optimize Token Efficiency: Design codebases that require minimal context for AI understanding
- Maintain Clear Boundaries: Ensure architectural layers and components have well-defined responsibilities
- Consider Chain Effects: Be aware of how changes propagate through the architecture
- Context Management: Design for effective context window utilization
- Limited to one test application (Snake game)
- Single LLM model tested (Claude Sonnet 3.7)
- Small sample size (5 runs per architecture)
- Specific tool ecosystem (RooCode)
- Binary success metrics may not capture nuanced performance differences
- Broader Application Testing: Test across different types of applications and domains
- Multiple LLM Comparison: Evaluate performance across different language models
- Longitudinal Studies: Assess architecture performance over extended development cycles
- Hybrid Architectures: Explore combinations of architectural patterns
- Dynamic Architecture Adaptation: Investigate architectures that adapt based on AI tool feedback
This research demonstrates that codebase architecture significantly impacts the effectiveness of AI coding tools. Layered Architecture emerged as the most effective pattern for AI-assisted development, showing superior performance in both initial generation and modification tasks while also being the most token-efficient of the patterns tested.
The study confirms that designing codebases with AI tools in mind is not only worthwhile but essential for maximizing the benefits of AI-assisted development. As AI coding tools continue to evolve, understanding and optimizing the relationship between architecture and AI performance will become increasingly important for software development teams.
The key insight is that context management through architectural design is crucial for AI coding effectiveness. Teams should consider AI tool requirements alongside traditional architectural considerations when designing system architecture.