vibecoding-arena

Evaluating the Capabilities of LLM-Assisted Coding Tools in Developing Hyper-Structures on the Hyperware stack.

Overview

This repository is a structured survey of LLM-assisted coding tools that evaluates how effectively each one completes specific programming tasks. Each task is attempted with several tools and prompting strategies so that their capabilities and limitations can be compared.

Repository Structure

.
├── tasks/                    # Each subdirectory is a specific task to accomplish
│   ├── task1/               # e.g., "Build a REST API"
│   │   ├── README.md        # Task description, acceptance criteria, evaluation metrics
│   │   ├── attempts/        # Different attempts with various tools
│   │   │   ├── claude/      # Claude-specific attempt
│   │   │   │   ├── prompt.md    # Prompt used
│   │   │   │   ├── session.md   # Session transcript
│   │   │   │   └── result/      # Resulting code/solution
│   │   │   ├── windsurf/    # Windsurf-specific attempt
│   │   │   └── kibitz/      # Kibitz-specific attempt
│   │   └── evaluation.md    # Comparative analysis of different attempts
│   └── task2/
├── tools/                    # Documentation about each tool being tested
│   ├── claude.md            # Tool-specific setup, capabilities, 
│   
└── evaluation/              # Overall analysis and findings
    ├── metrics.md           # Evaluation criteria and scoring system
    └── results.md           # Comparative analysis across all tasks
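
The layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not the repository's own tooling: the function name `scaffold_task` and its parameters are hypothetical, and it assumes every attempt folder gets the same `prompt.md`, `session.md`, and `result/` entries shown for `claude/`.

```python
from pathlib import Path


def scaffold_task(root: Path, task: str, tools: list[str]) -> Path:
    """Create tasks/<task>/ with empty placeholder files (hypothetical helper)."""
    task_dir = root / "tasks" / task
    task_dir.mkdir(parents=True, exist_ok=True)

    # Task-level files: description and comparative analysis.
    for name in ("README.md", "evaluation.md"):
        (task_dir / name).touch(exist_ok=True)

    # One attempt folder per tool, mirroring the claude/ example in the tree.
    for tool in tools:
        attempt = task_dir / "attempts" / tool
        (attempt / "result").mkdir(parents=True, exist_ok=True)
        for name in ("prompt.md", "session.md"):
            (attempt / name).touch(exist_ok=True)

    return task_dir
```

For example, `scaffold_task(Path("."), "task1", ["claude", "windsurf", "kibitz"])` would create the `tasks/task1/` subtree shown above with empty files.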

Tools Being Evaluated

Tool	Website	Type
Claude Code	Link	CLI with MCP
Windsurf	Link	IDE
Kibitz	Link	Chat with MCP tools

Evaluation Metrics

Success Rate: Did the tool accomplish the task?
Accuracy: How close was the solution to the requirements?
Efficiency: Number of iterations/prompts needed
Code Quality: Readability, maintainability, best practices
Error Handling: How well does the solution handle edge cases?
Documentation: Quality of generated documentation
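
One way to record the six metrics above for a single attempt is a small data structure like the following. This is a sketch under stated assumptions: the 0 to 5 integer scale, the unweighted mean, and the names `AttemptScore` and `mean` are all hypothetical. The scoring system actually used is the one described in evaluation/metrics.md.

```python
from dataclasses import dataclass, fields


@dataclass
class AttemptScore:
    """Scores for one attempt; the 0-5 scale is an assumption, not the repo's."""
    success_rate: int
    accuracy: int
    efficiency: int
    code_quality: int
    error_handling: int
    documentation: int

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-5 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be between 0 and 5, got {value}")

    def mean(self) -> float:
        """Unweighted average across all six metrics (assumed aggregation)."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)
```

Keeping each metric as a separate field makes it easy to compare attempts metric by metric before collapsing them into a single number.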

Contributing

To add a new task or tool evaluation:

Follow the directory structure outlined above
Include detailed prompts and session transcripts
Document any setup or environment requirements
Provide a comprehensive evaluation based on the metrics
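
Before submitting, an attempt folder can be checked against the structure described earlier. The sketch below is illustrative only: the function `missing_items` is hypothetical, and the list of required entries is an assumption taken from the `claude/` example in the directory tree.

```python
from pathlib import Path

# Entries assumed to be required in every attempt folder (from the tree above).
REQUIRED_FILES = ("prompt.md", "session.md")
REQUIRED_DIRS = ("result",)


def missing_items(attempt_dir: Path) -> list[str]:
    """Return the names of expected files or folders that are absent."""
    missing = [name for name in REQUIRED_FILES if not (attempt_dir / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (attempt_dir / name).is_dir()]
    return missing
```

An empty list means the attempt folder contains the prompt, the session transcript, and a result directory.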

License

See LICENSE for details.

Uncentered-Systems/vibecoding-arena
