feat: add MCP contract testing for distributed AI systems #21

Merged
hidai25 merged 3 commits into main from claude/eval-distributed-ai-systems-vV2v0 on Feb 7, 2026

hidai25 commented Feb 7, 2026

Summary

  • Adds MCP (Model Context Protocol) contract testing to detect interface drift in external MCP servers
  • Includes a contract diffing engine, an MCP adapter, and CLI integration
  • Adds a unit test suite and documentation

Test plan

  • Unit tests added in tests/test_mcp_contracts.py (495 lines)
  • Manual testing of the evalview CLI with the MCP contract commands
  • Verify that the action.yml changes work in CI

🤖 Generated with Claude Code

…tection

Adds the ability to snapshot external MCP server tool definitions and detect
breaking changes (removed tools, new required params, type changes) before
running tests. This addresses Scenario 2 of distributed AI evaluation: the
case where you don't own the MCP server code.

New commands: evalview mcp snapshot/check/list/show/delete
New flag: evalview run --contracts --fail-on CONTRACT_DRIFT
New CI status: CONTRACT_DRIFT (joins REGRESSION, TOOLS_CHANGED, OUTPUT_CHANGED)
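The breaking-change rules described above (removed tools, new required params, type changes) can be sketched as a diff over two tool lists. This is a minimal illustration, not the PR's diffing engine: `diff_contract`, `Change`, and the `kind` labels are hypothetical. The only assumption about the data is that each tool definition carries a `name` and a JSON Schema `inputSchema` with `properties` and `required`, which is the shape MCP servers return from `tools/list`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical change record: which tool changed, how, and whether it breaks callers."""
    tool: str
    kind: str
    breaking: bool


def _schema(tool: dict) -> tuple[dict, set]:
    """Return (properties, required) from a tool's JSON Schema inputSchema."""
    schema = tool.get("inputSchema") or {}
    return schema.get("properties") or {}, set(schema.get("required") or [])


def diff_contract(snapshot: list[dict], current: list[dict]) -> list[Change]:
    """Compare a saved snapshot of tool definitions with the server's current tools."""
    old = {t["name"]: t for t in snapshot}
    new = {t["name"]: t for t in current}  # if a name repeats, the last entry wins
    changes: list[Change] = []
    for name in sorted(old.keys() - new.keys()):
        changes.append(Change(name, "tool_removed", breaking=True))
    for name in sorted(new.keys() - old.keys()):
        changes.append(Change(name, "tool_added", breaking=False))  # informational
    for name in sorted(old.keys() & new.keys()):
        old_props, old_req = _schema(old[name])
        new_props, new_req = _schema(new[name])
        for param in sorted(new_req - old_req):
            changes.append(Change(name, f"new_required_param:{param}", breaking=True))
        for param in sorted(old_props.keys() & new_props.keys()):
            if old_props[param].get("type") != new_props[param].get("type"):
                changes.append(Change(name, f"type_changed:{param}", breaking=True))
    return changes


snapshot = [
    {"name": "search", "inputSchema": {"type": "object",
                                       "properties": {"q": {"type": "string"}},
                                       "required": ["q"]}},
    {"name": "fetch", "inputSchema": {"type": "object", "properties": {}}},
]
current = [
    {"name": "search", "inputSchema": {"type": "object",
                                       "properties": {"q": {"type": "integer"},
                                                      "limit": {"type": "integer"}},
                                       "required": ["q", "limit"]}},
]
changes = diff_contract(snapshot, current)
drift = any(c.breaking for c in changes)  # a breaking change would map to CONTRACT_DRIFT
```

Here removing `fetch`, making `limit` required, and changing `q` from string to integer all count as breaking. Treating added tools (and new optional params) as non-breaking is a judgment call made by this sketch.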

https://claude.ai/code/session_019CvwYcAoNoitWBdhEYUbxV

1. BUG: asyncio.run() was nested inside run_in_executor in _run_async,
   which already runs in an async context. Fixed to await directly.

2. BUG: The contract check ran before fail_on was resolved from config.yaml
   defaults, so a config-based fail_on: [CONTRACT_DRIFT] was ignored.
   Moved the contract check after config loading.

3. BUG: _discover_tools_http did not send notifications/initialized after
   initialize (inconsistent with the stdio path and a violation of the MCP protocol).

4. BUG: _discover_tools_http didn't check the initialize response for errors.

5. BUG: The DiffStatus docstring said "Four states" but there are now five.

Also fixed: an unused Optional import, duplicate datetime imports, and
variable shadowing (adapter/result in the _run_async contract block).
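Fixes 1 and 2 both concern ordering inside _run_async. The sketch below shows the corrected shape under stated assumptions: `CONFIG`, `check_contracts`, `resolve_fail_on`, the CLI-over-config precedence, and the integer exit code are all hypothetical, not evalview's code. It resolves fail_on (including config defaults) first, then awaits the contract check directly in the already-running event loop.

```python
from __future__ import annotations

import asyncio

# Hypothetical stand-in for the parsed config.yaml.
CONFIG = {"fail_on": ["CONTRACT_DRIFT"]}


async def check_contracts() -> list[str]:
    """Hypothetical async contract check; here it always reports drift."""
    await asyncio.sleep(0)  # placeholder for real MCP I/O
    return ["CONTRACT_DRIFT"]


def resolve_fail_on(cli_fail_on: list[str] | None, config: dict) -> set[str]:
    """Assumed precedence: an explicit CLI value wins, else config.yaml defaults."""
    return set(cli_fail_on) if cli_fail_on else set(config.get("fail_on", []))


async def run_async(cli_fail_on: list[str] | None = None) -> int:
    # Fix 2: resolve fail_on (including config defaults) before the check runs.
    fail_on = resolve_fail_on(cli_fail_on, CONFIG)
    # Fix 1: this coroutine already runs inside an event loop, so await the
    # check directly rather than wrapping asyncio.run() in run_in_executor.
    statuses = await check_contracts()
    return 1 if fail_on & set(statuses) else 0  # hypothetical exit code


exit_code = asyncio.run(run_async())
```

With the config-only `fail_on: [CONTRACT_DRIFT]`, the drift now fails the run; the old ordering would have checked against an unresolved fail_on and let it pass.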

https://claude.ai/code/session_019CvwYcAoNoitWBdhEYUbxV
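Fixes 3 and 4 follow the MCP lifecycle: send an initialize request, check the reply for a JSON-RPC error, send the notifications/initialized notification, then request tools/list. A minimal sketch of that sequence, assuming a hypothetical `send` callable that posts one JSON-RPC message and returns the parsed reply (HTTP specifics such as session headers and SSE responses are left out). `discover_tools` and `DiscoveryError` are illustrative names, not the PR's `_discover_tools_http`.

```python
from __future__ import annotations

import itertools
from typing import Callable, Optional

# Hypothetical transport: posts one JSON-RPC message, returns the parsed reply
# (or None when the server sends no response, as for notifications).
Send = Callable[[dict], Optional[dict]]


class DiscoveryError(RuntimeError):
    """Raised when the server answers a request with a JSON-RPC error."""


def discover_tools(send: Send) -> list[dict]:
    ids = itertools.count(1)  # only requests draw an id; notifications do not

    def request(method: str, params: dict | None = None) -> dict:
        msg = {"jsonrpc": "2.0", "id": next(ids), "method": method}
        if params is not None:
            msg["params"] = params
        reply = send(msg)
        # Fix 4: fail loudly on a missing reply or a JSON-RPC error object.
        if reply is None or "error" in reply:
            raise DiscoveryError(f"{method} failed: {reply and reply.get('error')}")
        return reply["result"]

    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "contract-sketch", "version": "0.0.0"},
    })
    # Fix 3: confirm initialization before any other request. No "id" field.
    send({"jsonrpc": "2.0", "method": "notifications/initialized"})
    return request("tools/list").get("tools", [])


# Usage with an in-memory fake server that records what it receives.
sent: list[dict] = []


def fake_send(msg: dict) -> Optional[dict]:
    sent.append(msg)
    if "id" not in msg:  # notification: no response
        return None
    if msg["method"] == "initialize":
        return {"jsonrpc": "2.0", "id": msg["id"], "result": {"capabilities": {}}}
    return {"jsonrpc": "2.0", "id": msg["id"], "result": {"tools": [{"name": "search"}]}}


tools = discover_tools(fake_send)
```

Because the notification never draws from the id counter, the tools/list request gets id 2 directly after initialize's id 1.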
… test gaps

1. action.yml: When both diff=true and contracts=true, --fail-on was
   appended twice. Refactored to set it once, after both flags are handled.

2. mcp_adapter: _discover_tools_http incremented _request_id before sending
   the notification. JSON-RPC notifications don't carry an id, so the
   increment was semantically wrong (it wasted an id for the next call).

3. Tests: Removed an unused Path import. Added a test for summary() with
   mixed breaking and informational changes. Added an edge-case test for
   duplicate tool names in current_tools.
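The action.yml fix amounts to collecting the failing statuses first and emitting --fail-on once. It is sketched here in Python rather than the action's own YAML/shell, with loud assumptions: `build_run_args` is hypothetical, mapping diff=true to REGRESSION is a guess, and joining several statuses with commas is an assumed --fail-on format that the PR does not document.

```python
from __future__ import annotations


def build_run_args(diff: bool, contracts: bool) -> list[str]:
    """Hypothetical assembly of an `evalview run` command line for CI."""
    args = ["evalview", "run"]
    statuses: list[str] = []
    if diff:
        statuses.append("REGRESSION")  # assumed status for diff mode
    if contracts:
        args.append("--contracts")
        statuses.append("CONTRACT_DRIFT")
    # Emit --fail-on a single time, after both inputs have been processed,
    # instead of once per input (the bug when diff and contracts were both on).
    if statuses:
        args += ["--fail-on", ",".join(statuses)]  # assumed comma-separated format
    return args


cmd = build_run_args(diff=True, contracts=True)
```

Building the status list before touching the argument vector makes the single-flag invariant structural rather than something each branch has to remember.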

https://claude.ai/code/session_019CvwYcAoNoitWBdhEYUbxV
hidai25 merged commit 8c66d16 into main on Feb 7, 2026
7 checks passed