# Home
mcpbr (Model Context Protocol Benchmark Runner) is a comprehensive framework for evaluating MCP servers against real-world software engineering benchmarks. It provides apples-to-apples comparisons between tool-assisted and baseline agent performance.
## Learning
- Getting Started - Your first benchmark in 5 minutes
- Core Concepts - Understanding mcpbr's architecture
- Tutorials - Step-by-step guides for common workflows
## Building
- Contributing Guide - How to contribute to mcpbr
- Development Setup - Setting up your dev environment
- Architecture Deep Dive - How mcpbr works internally
## Using
- Configuration Reference - All configuration options
- CLI Reference - Complete command documentation
- Benchmark Guide - Working with different benchmarks
- Best Practices - Tips for effective benchmarking
## Community
- Code of Conduct - Our commitment to an inclusive community
- FAQ - Frequently asked questions
- Troubleshooting - Common issues and solutions
MCP servers promise to make LLMs better at coding tasks. But how do you prove it? How do you measure improvement objectively?
mcpbr runs controlled experiments:
- Same model, same tasks, same environment
- Only variable: your MCP server
- Real GitHub issues from SWE-bench (not toy examples)
- Reproducible results via Docker containers
Hard numbers showing whether your MCP server improves agent performance.
```text
Evaluation Results Summary
+-----------------+-----------+----------+
| Metric          | MCP Agent | Baseline |
+-----------------+-----------+----------+
| Resolved        | 8/25      | 5/25     |
| Resolution Rate | 32.0%     | 20.0%    |
+-----------------+-----------+----------+

Improvement: +60.0%
```
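The improvement figure is the relative gain over the baseline resolution rate. A minimal sketch of the arithmetic behind the summary above:

```python
# Resolution counts from the example run above (25 tasks per arm).
mcp_resolved, baseline_resolved, total = 8, 5, 25

mcp_rate = mcp_resolved / total            # 8/25  = 32.0%
baseline_rate = baseline_resolved / total  # 5/25  = 20.0%

# Relative improvement is measured against the baseline rate,
# not as an absolute percentage-point difference.
improvement = (mcp_rate - baseline_rate) / baseline_rate * 100
print(f"Improvement: {improvement:+.1f}%")  # Improvement: +60.0%
```

Note that this is a relative measure: the same +12 percentage points would read as a larger improvement against a weaker baseline.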
| Benchmark | Type | Focus | Status |
|---|---|---|---|
| SWE-bench | Bug Fixing | Real GitHub issues requiring patches | Stable |
| CyberGym | Security | PoC exploit generation for vulnerabilities | Stable |
| TerminalBench | DevOps | Shell scripting and system administration | Beta |
| MCPToolBench++ | Tool Use | MCP-specific tool evaluation | Planned |
We're building the de facto standard for MCP server benchmarking!
- GitHub Discussions - Ask questions, share ideas
- Issue Tracker - Report bugs, request features
- Roadmap - See what's coming next
- 200+ features planned for v1.0!
- Report Bugs - Found an issue? Open a bug report
- Suggest Features - Have an idea? Open a feature request
- Improve Docs - Help make our documentation better
- Submit PRs - Check out good first issues
- Star the Repo - Show your support!
- MCPToolBench++ integration planned - Comprehensive MCP tool use evaluation (see #223)
- Enhanced analytics coming - Tool coverage, error patterns, performance profiling
- Multi-benchmark runs - Run multiple benchmarks in one command
- Cross-benchmark analysis - Compare MCP effectiveness across different task types
- Added CyberGym and TerminalBench support
- Introduced benchmark abstraction layer
- Improved error handling and logging
```bash
# Install
pip install mcpbr

# Initialize configuration
mcpbr init

# Run your first benchmark
mcpbr run -c mcpbr.yaml -n 5 -v
```

See Getting Started for detailed instructions.
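The `mcpbr.yaml` passed via `-c` holds the experiment configuration. A rough sketch of what such a file might contain is shown below; every key name here is an illustrative assumption, not the verified schema, so consult the Configuration Reference for the actual options:

```yaml
# Hypothetical mcpbr.yaml sketch -- key names are assumptions,
# not the verified schema; see the Configuration Reference.
benchmark: swe-bench        # which benchmark to evaluate against
model: <model-name>         # same model is used for both agents
tasks: 5                    # task count (can also be set with -n)
mcp_server:
  command: node my-server/index.js   # hypothetical launch command for the server under test
```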
This project exists thanks to all the people who contribute!
Want to see your name here? Check out our Contributing Guide!
MIT License - see LICENSE for details.
Built with ❤️ by the mcpbr community

GitHub • PyPI • Documentation