CodeGraphTheory edited this page Jan 20, 2026 · 2 revisions

Welcome to mcpbr! πŸ‘‹

MCPBR Logo

The de facto standard for MCP server benchmarking

PyPI version Python 3.11+ License: MIT


What is mcpbr?

mcpbr (Model Context Protocol Benchmark Runner) is a comprehensive framework for evaluating MCP servers against real-world software engineering benchmarks. It provides apples-to-apples comparisons of agent performance with and without your MCP server.

Quick Links

πŸ“š Learning

πŸ› οΈ Building

πŸš€ Using

🀝 Community


Why mcpbr?

The Problem

MCP servers promise to make LLMs better at coding tasks. But how do you prove it? How do you measure improvement objectively?

The Solution

mcpbr runs controlled experiments:

  • βœ… Same model, same tasks, same environment
  • βœ… Only variable: your MCP server
  • βœ… Real GitHub issues from SWE-bench (not toy examples)
  • βœ… Reproducible results via Docker containers
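Conceptually, the controlled experiment above boils down to running the same tasks twice and changing only the tool set. The sketch below is purely illustrative: `paired_evaluation`, `run_task`, and `fake_run` are hypothetical names, not the mcpbr API.

```python
# Hypothetical sketch of the paired-evaluation idea (NOT the mcpbr API):
# the same model and the same tasks run twice, and the only variable
# is whether the agent has access to the MCP server.
def paired_evaluation(tasks, run_task):
    """run_task(task, use_mcp) -> True if the task was resolved."""
    results = {"mcp": 0, "baseline": 0}
    for task in tasks:
        results["mcp"] += run_task(task, use_mcp=True)
        results["baseline"] += run_task(task, use_mcp=False)
    return results

# Toy stand-in for an agent run: pretend the MCP-assisted agent resolves
# even-numbered tasks, while the baseline only resolves multiples of four.
def fake_run(task, use_mcp):
    return task % 2 == 0 if use_mcp else task % 4 == 0

print(paired_evaluation(range(8), fake_run))
# prints {'mcp': 4, 'baseline': 2}
```

The point of the design is that any difference in the two counts can be attributed to the MCP server, not to the model, tasks, or environment.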

The Result

Hard numbers showing whether your MCP server improves agent performance.

Evaluation Results Summary
+-----------------+-----------+----------+
| Metric          | MCP Agent | Baseline |
+-----------------+-----------+----------+
| Resolved        | 8/25      | 5/25     |
| Resolution Rate | 32.0%     | 20.0%    |
+-----------------+-----------+----------+

Improvement: +60.0%
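The improvement figure is the relative gain in resolution rate, which you can verify from the table above:

```python
# Recomputing the summary table: resolution rate is resolved / total,
# and "Improvement" is the MCP agent's relative gain over the baseline.
mcp_resolved, baseline_resolved, total = 8, 5, 25

mcp_rate = mcp_resolved / total            # 32.0%
baseline_rate = baseline_resolved / total  # 20.0%
improvement = (mcp_rate - baseline_rate) / baseline_rate * 100

print(f"MCP: {mcp_rate:.1%}, baseline: {baseline_rate:.1%}, "
      f"improvement: +{improvement:.1f}%")
# prints "MCP: 32.0%, baseline: 20.0%, improvement: +60.0%"
```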

Supported Benchmarks

+----------------+------------+--------------------------------------------+------------+
| Benchmark      | Type       | Focus                                      | Status     |
+----------------+------------+--------------------------------------------+------------+
| SWE-bench      | Bug Fixing | Real GitHub issues requiring patches       | βœ… Stable  |
| CyberGym       | Security   | PoC exploit generation for vulnerabilities | βœ… Stable  |
| TerminalBench  | DevOps     | Shell scripting and system administration  | βœ… Beta    |
| MCPToolBench++ | Tool Use   | MCP-specific tool evaluation               | 🚧 Planned |
+----------------+------------+--------------------------------------------+------------+

Community

We're building the de facto standard for MCP server benchmarking!

  • πŸ’¬ GitHub Discussions - Ask questions, share ideas
  • πŸ› Issue Tracker - Report bugs, request features
  • πŸ“‹ Roadmap - See what's coming next
  • 🌟 200+ features planned for v1.0!

Ways to Contribute

  1. πŸ› Report Bugs - Found an issue? Open a bug report
  2. πŸ’‘ Suggest Features - Have an idea? Open a feature request
  3. πŸ“ Improve Docs - Help make our documentation better
  4. πŸ”§ Submit PRs - Check out good first issues
  5. ⭐ Star the Repo - Show your support!

Recent Updates

January 2026

  • ✨ MCPToolBench++ integration planned - Comprehensive MCP tool use evaluation (see #223)
  • πŸ“Š Enhanced analytics coming - Tool coverage, error patterns, performance profiling
  • πŸš€ Multi-benchmark runs - Run multiple benchmarks in one command
  • 🎯 Cross-benchmark analysis - Compare MCP effectiveness across different task types

December 2025

  • Added CyberGym and TerminalBench support
  • Introduced benchmark abstraction layer
  • Improved error handling and logging

Quick Start

# Install
pip install mcpbr

# Initialize configuration
mcpbr init

# Run your first benchmark
mcpbr run -c mcpbr.yaml -n 5 -v

See Getting Started for detailed instructions.


Contributors

This project exists thanks to all the people who contribute!

Want to see your name here? Check out our Contributing Guide!


License

MIT License - see LICENSE for details.


Built with ❀️ by the mcpbr community

GitHub β€’ PyPI β€’ Documentation
