From 2057e879a147d7a589f81267c2561b77998d8be3 Mon Sep 17 00:00:00 2001 From: Felix Weinberger Date: Fri, 13 Feb 2026 16:21:37 +0000 Subject: [PATCH 1/4] feat: add specVersions classification to conformance scenarios Each scenario declares which spec versions it applies to as a list. Scenarios that carry forward (e.g. initialize) list all applicable versions ['2025-06-18', '2025-11-25']. Scenarios removed from newer specs (e.g. backcompat auth) only list their original version ['2025-03-26']. - specVersions list on Scenario and ClientScenario interfaces - --spec-version CLI filter uses simple .includes() - Tier-check conformance matrix (Server / Client: Core / Client: Auth) with per-version columns and unique All* count - 7 unit tests for specVersions helpers - Updated tier-audit skill docs with matrix format --- .claude/skills/mcp-sdk-tier-audit/SKILL.md | 39 +++-- .../references/tier-requirements.md | 11 +- src/index.ts | 100 +++++++++++- src/scenarios/client/auth/basic-cimd.ts | 3 +- .../client/auth/client-credentials.ts | 9 +- .../client/auth/discovery-metadata.ts | 1 + .../client/auth/march-spec-backcompat.ts | 4 +- src/scenarios/client/auth/pre-registration.ts | 8 +- .../client/auth/resource-mismatch.ts | 3 +- src/scenarios/client/auth/scope-handling.ts | 7 +- .../client/auth/token-endpoint-auth.ts | 3 +- src/scenarios/client/elicitation-defaults.ts | 3 +- src/scenarios/client/initialize.ts | 8 +- src/scenarios/client/sse-retry.ts | 8 +- src/scenarios/client/tools_call.ts | 3 +- src/scenarios/index.ts | 33 +++- src/scenarios/server/dns-rebinding.ts | 3 +- src/scenarios/server/elicitation-defaults.ts | 3 +- src/scenarios/server/elicitation-enums.ts | 3 +- src/scenarios/server/json-schema-2020-12.ts | 3 +- src/scenarios/server/lifecycle.ts | 3 +- src/scenarios/server/prompts.ts | 7 +- src/scenarios/server/resources.ts | 8 +- src/scenarios/server/sse-multiple-streams.ts | 3 +- src/scenarios/server/sse-polling.ts | 3 +- src/scenarios/server/tools.ts | 13 +- 
src/scenarios/server/utils.ts | 5 +- src/scenarios/spec-version.test.ts | 94 ++++++++++++ .../checks/test-conformance-results.ts | 18 ++- src/tier-check/output.ts | 145 +++++++++++++++++- src/tier-check/types.ts | 3 + src/types.ts | 9 ++ 32 files changed, 509 insertions(+), 57 deletions(-) create mode 100644 src/scenarios/spec-version.test.ts diff --git a/.claude/skills/mcp-sdk-tier-audit/SKILL.md b/.claude/skills/mcp-sdk-tier-audit/SKILL.md index 234a257..382f637 100644 --- a/.claude/skills/mcp-sdk-tier-audit/SKILL.md +++ b/.claude/skills/mcp-sdk-tier-audit/SKILL.md @@ -66,7 +66,9 @@ npm run --silent tier-check -- \ If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped). -The CLI output includes server conformance pass rate, client conformance pass rate, issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4. +The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4. + +The conformance results now include a `specVersions` field on each detail entry, enabling per-version pass rate analysis. The `list` command also shows spec version tags: `node dist/index.js list` shows `[2025-06-18]`, `[2025-11-25]`, `[draft]`, or `[extension]` next to each scenario. ### Conformance Baseline Check @@ -143,17 +145,21 @@ If any Tier 2 requirement is not met, the SDK is Tier 3. - If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both. 
- If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone. -**Client Conformance Splits:** +**Conformance Breakdown:** + +The **full suite** pass rates (server total, client total) are used for tier threshold checks. To interpret them, present a single conformance matrix combining server and client results. Each detail entry in the tier-check JSON has a `specVersions` field; client category is derived from the scenario name (`auth/` prefix = Auth, everything else = Core). Server scenarios are all Core. -When reporting client conformance, always break results into three categories: +Example: -1. **Core suite** — Non-auth scenarios (e.g. initialize, tools_call, elicitation, sse-retry) -2. **Auth suite** — OAuth/authorization scenarios (any scenario starting with `auth/`) -3. **Full suite** — All scenarios combined +| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\* | +| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ | +| Server | — | 26/26 | 4/4 | — | — | 30/30 (100%) | +| Client: Core | — | 2/2 | 2/2 | — | — | 4/4 (100%) | +| Client: Auth | 0/2 | 3/3 | 6/11 | 0/1 | 0/2 | 9/19 (47%) | -The **full suite** number is used for tier threshold checks. However, the core vs auth split provides essential context. Always present both numbers in the report. +This immediately shows where failures concentrate. Failures clustered in Client: Auth / `2025-11-25` means "new auth features not yet implemented" — a scope gap, not a quality problem. Failures in Server or Client: Core are more concerning. -If the SDK has a `baseline.yml` or expected-failures file, note which failures are known/tracked vs. unexpected regressions. A low full-suite score where all failures are auth scenarios documented in the baseline is a scope gap (OAuth not yet implemented), not a quality problem — flag it accordingly in the assessment. 
+If the SDK has a `baseline.yml` or expected-failures file, cross-reference with the matrix to identify whether baselined failures cluster in a specific cell (e.g. all in `2025-11-25` / Client: Auth = scope gap). **P0 Label Audit Guidance:** @@ -199,10 +205,15 @@ After the subagents finish, output a short executive summary directly to the use | Check | Value | T2 | T1 | |-------|-------|----|----| -| Server Conformance | / (%) | ✓/✗ | ✓/✗ | -| Client Conformance (full) | / (%) | ✓/✗ | ✓/✗ | -| — Core scenarios | / (%) | — | — | -| — Auth scenarios | / (%) | — | — | +Conformance matrix: + +| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 | +|--------------|------------|------------|------------|-------|-----------|-------|----|----| +| Server | — | pass/total | pass/total | — | — | pass/total (rate%) | ✓/✗ | ✓/✗ | +| Client: Core | — | pass/total | pass/total | — | — | pass/total (rate%) | — | — | +| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — | +| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** | + | Issue Triage | % (/) | ✓/✗ | ✓/✗ | | Labels | / | ✓/✗ | ✓/✗ | | P0 Resolution | open | ✓/✗ | ✓/✗ | @@ -213,8 +224,8 @@ After the subagents finish, output a short executive summary directly to the use | Versioning Policy | | N/A | ✓/✗ | | Stable Release | | ✓/✗ | ✓/✗ | -If a baseline file was found, add a note below the table: -> **Baseline**: {N} failures in `baseline.yml` ({list of categories, e.g. "18 auth scenarios"}). Core suite: {core_rate}%. +If a baseline file was found, add a note below the conformance matrix: +> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}). 
--- diff --git a/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md b/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md index b36a2d2..077917b 100644 --- a/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md +++ b/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md @@ -32,12 +32,13 @@ Source: `modelcontextprotocol/docs/community/sdk-tiers.mdx` in the spec reposito ## Conformance Score Calculation -Conformance scores are calculated against **applicable required tests** only: +Every scenario in the conformance suite has a `specVersions` field listing the spec versions it applies to. The valid values are defined by the `SpecVersion` type in `src/types.ts` — run `node dist/index.js list` to see the current mapping of scenarios to spec versions. -- Tests for the specification version the SDK targets -- Excluding tests marked as pending or skipped -- Excluding tests for experimental features -- Excluding legacy backward-compatibility tests (unless the SDK claims legacy support) +Date-versioned scenarios (e.g. `2025-06-18`, `2025-11-25`) count toward tier scoring. `draft` and `extension` scenarios are listed separately as informational. + +The `--spec-version` CLI flag keeps only the scenarios whose `specVersions` list includes the given version (e.g. `--spec-version 2025-06-18` selects scenarios that list `2025-06-18` and skips those that list only `2025-03-26`). `draft` and `extension` are matched the same way. + +The tier-check output includes a per-version pass rate breakdown alongside the aggregate. 
## Tier Relegation Rules diff --git a/src/index.ts b/src/index.ts index d51eb0d..537fb16 100644 --- a/src/index.ts +++ b/src/index.ts @@ -19,8 +19,13 @@ import { listMetadataScenarios, listCoreScenarios, listExtensionScenarios, - listBackcompatScenarios + listBackcompatScenarios, + listScenariosForSpec, + listClientScenariosForSpec, + getScenarioSpecVersions, + ALL_SPEC_VERSIONS } from './scenarios'; +import type { SpecVersion } from './scenarios'; import { ConformanceCheck } from './types'; import { ClientOptionsSchema, ServerOptionsSchema } from './schemas'; import { @@ -31,6 +36,32 @@ import { import { createTierCheckCommand } from './tier-check'; import packageJson from '../package.json'; +function resolveSpecVersion(value: string): SpecVersion { + if (ALL_SPEC_VERSIONS.includes(value as SpecVersion)) { + return value as SpecVersion; + } + console.error(`Unknown spec version: ${value}`); + console.error(`Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`); + process.exit(1); +} + +// Note on naming: `command` refers to which CLI command is calling this. +// The `client` command tests Scenario objects (which test clients), +// and the `server` command tests ClientScenario objects (which test servers). +// This matches the inverted naming in scenarios/index.ts. +function filterScenariosBySpecVersion( + allScenarios: string[], + version: SpecVersion, + command: 'client' | 'server' +): string[] { + const versionScenarios = + command === 'client' + ? 
listScenariosForSpec(version) + : listClientScenariosForSpec(version); + const allowed = new Set(versionScenarios); + return allScenarios.filter((s) => allowed.has(s)); +} + const program = new Command(); program @@ -53,12 +84,19 @@ program 'Path to YAML file listing expected failures (baseline)' ) .option('-o, --output-dir ', 'Save results to this directory') + .option( + '--spec-version ', + 'Filter scenarios by spec version (matches scenarios whose specVersions include it)' + ) .option('--verbose', 'Show verbose output') .action(async (options) => { try { const timeout = parseInt(options.timeout, 10); const verbose = options.verbose ?? false; const outputDir = options.outputDir; + const specVersionFilter = options.specVersion + ? resolveSpecVersion(options.specVersion) + : undefined; // Handle suite mode if (options.suite) { @@ -85,7 +123,14 @@ program process.exit(1); } - const scenarios = suites[suiteName](); + let scenarios = suites[suiteName](); + if (specVersionFilter) { + scenarios = filterScenariosBySpecVersion( + scenarios, + specVersionFilter, + 'client' + ); + } console.log( `Running ${suiteName} suite (${scenarios.length} scenarios) in parallel...\n` ); @@ -262,6 +307,10 @@ program 'Path to YAML file listing expected failures (baseline)' ) .option('-o, --output-dir ', 'Save results to this directory') + .option( + '--spec-version ', + 'Filter scenarios by spec version (matches scenarios whose specVersions include it)' + ) .option('--verbose', 'Show verbose output (JSON instead of pretty print)') .action(async (options) => { try { @@ -270,6 +319,9 @@ program const verbose = options.verbose ?? false; const outputDir = options.outputDir; + const specVersionFilter = options.specVersion + ? 
resolveSpecVersion(options.specVersion) + : undefined; // If a single scenario is specified, run just that one if (validated.scenario) { @@ -317,6 +369,14 @@ program process.exit(1); } + if (specVersionFilter) { + scenarios = filterScenariosBySpecVersion( + scenarios, + specVersionFilter, + 'server' + ); + } + console.log( `Running ${suite} suite (${scenarios.length} scenarios) against ${validated.url}\n` ); @@ -393,11 +453,29 @@ program .description('List available test scenarios') .option('--client', 'List client scenarios') .option('--server', 'List server scenarios') + .option( + '--spec-version ', + 'Filter scenarios by spec version (matches scenarios whose specVersions include it)' + ) .action((options) => { + const specVersionFilter = options.specVersion + ? resolveSpecVersion(options.specVersion) + : undefined; + if (options.server || (!options.client && !options.server)) { console.log('Server scenarios (test against a server):'); - const serverScenarios = listClientScenarios(); - serverScenarios.forEach((s) => console.log(` - ${s}`)); + let serverScenarios = listClientScenarios(); + if (specVersionFilter) { + serverScenarios = filterScenariosBySpecVersion( + serverScenarios, + specVersionFilter, + 'server' + ); + } + serverScenarios.forEach((s) => { + const v = getScenarioSpecVersions(s); + console.log(` - ${s}${v ? ` [${v}]` : ''}`); + }); } if (options.client || (!options.client && !options.server)) { @@ -405,8 +483,18 @@ program console.log(''); } console.log('Client scenarios (test against a client):'); - const clientScenarios = listScenarios(); - clientScenarios.forEach((s) => console.log(` - ${s}`)); + let clientScenarioNames = listScenarios(); + if (specVersionFilter) { + clientScenarioNames = filterScenariosBySpecVersion( + clientScenarioNames, + specVersionFilter, + 'client' + ); + } + clientScenarioNames.forEach((s) => { + const v = getScenarioSpecVersions(s); + console.log(` - ${s}${v ? 
` [${v}]` : ''}`); + }); } }); diff --git a/src/scenarios/client/auth/basic-cimd.ts b/src/scenarios/client/auth/basic-cimd.ts index 64c87d5..9e5fe67 100644 --- a/src/scenarios/client/auth/basic-cimd.ts +++ b/src/scenarios/client/auth/basic-cimd.ts @@ -1,5 +1,5 @@ import type { Scenario, ConformanceCheck } from '../../../types'; -import { ScenarioUrls } from '../../../types'; +import { ScenarioUrls, SpecVersion } from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { ServerLifecycle } from './helpers/serverLifecycle'; @@ -22,6 +22,7 @@ export const CIMD_CLIENT_METADATA_URL = */ export class AuthBasicCIMDScenario implements Scenario { name = 'auth/basic-cimd'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests OAuth flow with Client ID Metadata Documents (SEP-991/URL-based client IDs). Server advertises client_id_metadata_document_supported=true and client should use URL as client_id instead of DCR.'; private authServer = new ServerLifecycle(); diff --git a/src/scenarios/client/auth/client-credentials.ts b/src/scenarios/client/auth/client-credentials.ts index b82b5e2..79ab1b2 100644 --- a/src/scenarios/client/auth/client-credentials.ts +++ b/src/scenarios/client/auth/client-credentials.ts @@ -1,6 +1,11 @@ import * as jose from 'jose'; import type { CryptoKey } from 'jose'; -import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types'; +import type { + Scenario, + ConformanceCheck, + ScenarioUrls, + SpecVersion +} from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { ServerLifecycle } from './helpers/serverLifecycle'; @@ -32,6 +37,7 @@ async function generateTestKeypair(): Promise<{ */ export class ClientCredentialsJwtScenario implements Scenario { name = 'auth/client-credentials-jwt'; + specVersions: SpecVersion[] = ['extension']; description = 
'Tests OAuth client_credentials flow with private_key_jwt authentication (SEP-1046)'; @@ -250,6 +256,7 @@ export class ClientCredentialsJwtScenario implements Scenario { */ export class ClientCredentialsBasicScenario implements Scenario { name = 'auth/client-credentials-basic'; + specVersions: SpecVersion[] = ['extension']; description = 'Tests OAuth client_credentials flow with client_secret_basic authentication'; diff --git a/src/scenarios/client/auth/discovery-metadata.ts b/src/scenarios/client/auth/discovery-metadata.ts index 6fc09a8..3dd64b7 100644 --- a/src/scenarios/client/auth/discovery-metadata.ts +++ b/src/scenarios/client/auth/discovery-metadata.ts @@ -87,6 +87,7 @@ function createMetadataScenario(config: MetadataScenarioConfig): Scenario { return { name: `auth/${config.name}`, + specVersions: ['2025-11-25'], description: `Tests Basic OAuth metadata discovery flow. **PRM:** ${config.prmLocation}${config.inWwwAuth ? '' : ' (not in WWW-Authenticate)'} diff --git a/src/scenarios/client/auth/march-spec-backcompat.ts b/src/scenarios/client/auth/march-spec-backcompat.ts index 4f0a5ae..3bc857e 100644 --- a/src/scenarios/client/auth/march-spec-backcompat.ts +++ b/src/scenarios/client/auth/march-spec-backcompat.ts @@ -1,5 +1,5 @@ import type { Scenario, ConformanceCheck } from '../../../types'; -import { ScenarioUrls } from '../../../types'; +import { ScenarioUrls, SpecVersion } from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { ServerLifecycle } from './helpers/serverLifecycle'; @@ -8,6 +8,7 @@ import { SpecReferences } from './spec-references'; export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario { name = 'auth/2025-03-26-oauth-metadata-backcompat'; + specVersions: SpecVersion[] = ['2025-03-26']; description = 'Tests 2025-03-26 spec OAuth flow: no PRM (Protected Resource Metadata), OAuth metadata at root location'; private server = new 
ServerLifecycle(); @@ -68,6 +69,7 @@ export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario { export class Auth20250326OEndpointFallbackScenario implements Scenario { name = 'auth/2025-03-26-oauth-endpoint-fallback'; + specVersions: SpecVersion[] = ['2025-03-26']; description = 'Tests OAuth flow with no metadata endpoints, relying on fallback to standard OAuth endpoints at server root (2025-03-26 spec behavior)'; private server = new ServerLifecycle(); diff --git a/src/scenarios/client/auth/pre-registration.ts b/src/scenarios/client/auth/pre-registration.ts index 0d95e33..00673df 100644 --- a/src/scenarios/client/auth/pre-registration.ts +++ b/src/scenarios/client/auth/pre-registration.ts @@ -1,4 +1,9 @@ -import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types'; +import type { + Scenario, + ConformanceCheck, + ScenarioUrls, + SpecVersion +} from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { ServerLifecycle } from './helpers/serverLifecycle'; @@ -19,6 +24,7 @@ const PRE_REGISTERED_CLIENT_SECRET = 'pre-registered-secret'; */ export class PreRegistrationScenario implements Scenario { name = 'auth/pre-registration'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests OAuth flow with pre-registered client credentials. 
Server does not support DCR.'; diff --git a/src/scenarios/client/auth/resource-mismatch.ts b/src/scenarios/client/auth/resource-mismatch.ts index b38968f..dd76c68 100644 --- a/src/scenarios/client/auth/resource-mismatch.ts +++ b/src/scenarios/client/auth/resource-mismatch.ts @@ -1,5 +1,5 @@ import type { Scenario, ConformanceCheck } from '../../../types.js'; -import { ScenarioUrls } from '../../../types.js'; +import { ScenarioUrls, SpecVersion } from '../../../types.js'; import { createAuthServer } from './helpers/createAuthServer.js'; import { createServer } from './helpers/createServer.js'; import { ServerLifecycle } from './helpers/serverLifecycle.js'; @@ -27,6 +27,7 @@ import { MockTokenVerifier } from './helpers/mockTokenVerifier.js'; */ export class ResourceMismatchScenario implements Scenario { name = 'auth/resource-mismatch'; + specVersions: SpecVersion[] = ['draft']; description = 'Tests that client rejects when PRM resource does not match server URL'; allowClientError = true; diff --git a/src/scenarios/client/auth/scope-handling.ts b/src/scenarios/client/auth/scope-handling.ts index b9865bd..166a9c3 100644 --- a/src/scenarios/client/auth/scope-handling.ts +++ b/src/scenarios/client/auth/scope-handling.ts @@ -1,5 +1,5 @@ import type { Scenario, ConformanceCheck } from '../../../types'; -import { ScenarioUrls } from '../../../types'; +import { ScenarioUrls, SpecVersion } from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { ServerLifecycle } from './helpers/serverLifecycle'; @@ -15,6 +15,7 @@ import type { Request, Response, NextFunction } from 'express'; */ export class ScopeFromWwwAuthenticateScenario implements Scenario { name = 'auth/scope-from-www-authenticate'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client uses scope parameter from WWW-Authenticate header when provided'; private authServer = new ServerLifecycle(); @@ 
-100,6 +101,7 @@ export class ScopeFromWwwAuthenticateScenario implements Scenario { */ export class ScopeFromScopesSupportedScenario implements Scenario { name = 'auth/scope-from-scopes-supported'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client uses all scopes from scopes_supported when scope not in WWW-Authenticate header'; private authServer = new ServerLifecycle(); @@ -195,6 +197,7 @@ export class ScopeFromScopesSupportedScenario implements Scenario { */ export class ScopeOmittedWhenUndefinedScenario implements Scenario { name = 'auth/scope-omitted-when-undefined'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client omits scope parameter when scopes_supported is undefined'; private authServer = new ServerLifecycle(); @@ -281,6 +284,7 @@ export class ScopeOmittedWhenUndefinedScenario implements Scenario { */ export class ScopeStepUpAuthScenario implements Scenario { name = 'auth/scope-step-up'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client handles step-up authentication with different scope requirements per operation'; private authServer = new ServerLifecycle(); @@ -477,6 +481,7 @@ export class ScopeStepUpAuthScenario implements Scenario { */ export class ScopeRetryLimitScenario implements Scenario { name = 'auth/scope-retry-limit'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client implements retry limits to prevent infinite authorization loops on repeated 403 responses'; allowClientError = true; diff --git a/src/scenarios/client/auth/token-endpoint-auth.ts b/src/scenarios/client/auth/token-endpoint-auth.ts index 80b80f7..2b32444 100644 --- a/src/scenarios/client/auth/token-endpoint-auth.ts +++ b/src/scenarios/client/auth/token-endpoint-auth.ts @@ -1,5 +1,5 @@ import type { Scenario, ConformanceCheck } from '../../../types.js'; -import { ScenarioUrls } from '../../../types.js'; +import { ScenarioUrls, SpecVersion } from 
'../../../types.js'; import { createAuthServer } from './helpers/createAuthServer.js'; import { createServer } from './helpers/createServer.js'; import { ServerLifecycle } from './helpers/serverLifecycle.js'; @@ -45,6 +45,7 @@ const AUTH_METHOD_NAMES: Record = { class TokenEndpointAuthScenario implements Scenario { name: string; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description: string; private expectedAuthMethod: AuthMethod; private authServer = new ServerLifecycle(); diff --git a/src/scenarios/client/elicitation-defaults.ts b/src/scenarios/client/elicitation-defaults.ts index 88e6bf0..73bc07c 100644 --- a/src/scenarios/client/elicitation-defaults.ts +++ b/src/scenarios/client/elicitation-defaults.ts @@ -11,7 +11,7 @@ import { ListToolsRequestSchema, ElicitResultSchema } from '@modelcontextprotocol/sdk/types.js'; -import type { Scenario, ConformanceCheck } from '../../types'; +import type { Scenario, ConformanceCheck, SpecVersion } from '../../types'; import express, { Request, Response } from 'express'; import { ScenarioUrls } from '../../types'; import { createRequestLogger } from '../request-logger'; @@ -474,6 +474,7 @@ function createServer(checks: ConformanceCheck[]): { export class ElicitationClientDefaultsScenario implements Scenario { name = 'elicitation-sep1034-client-defaults'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests client applies default values for omitted elicitation fields (SEP-1034)'; private app: express.Application | null = null; diff --git a/src/scenarios/client/initialize.ts b/src/scenarios/client/initialize.ts index a351700..70fb0d1 100644 --- a/src/scenarios/client/initialize.ts +++ b/src/scenarios/client/initialize.ts @@ -1,9 +1,15 @@ import http from 'http'; -import { Scenario, ScenarioUrls, ConformanceCheck } from '../../types'; +import { + Scenario, + ScenarioUrls, + ConformanceCheck, + SpecVersion +} from '../../types'; import { clientChecks } from '../../checks/index'; export class 
InitializeScenario implements Scenario { name = 'initialize'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = 'Tests MCP client initialization handshake'; private server: http.Server | null = null; diff --git a/src/scenarios/client/sse-retry.ts b/src/scenarios/client/sse-retry.ts index 6e44a06..c8f0792 100644 --- a/src/scenarios/client/sse-retry.ts +++ b/src/scenarios/client/sse-retry.ts @@ -8,10 +8,16 @@ */ import http from 'http'; -import { Scenario, ScenarioUrls, ConformanceCheck } from '../../types.js'; +import { + Scenario, + ScenarioUrls, + ConformanceCheck, + SpecVersion +} from '../../types.js'; export class SSERetryScenario implements Scenario { name = 'sse-retry'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Tests that client respects SSE retry field timing and reconnects properly (SEP-1699)'; diff --git a/src/scenarios/client/tools_call.ts b/src/scenarios/client/tools_call.ts index b074773..807ad55 100644 --- a/src/scenarios/client/tools_call.ts +++ b/src/scenarios/client/tools_call.ts @@ -4,7 +4,7 @@ import { CallToolRequestSchema, ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js'; -import type { Scenario, ConformanceCheck } from '../../types'; +import type { Scenario, ConformanceCheck, SpecVersion } from '../../types'; import express, { Request, Response } from 'express'; import { ScenarioUrls } from '../../types'; import { createRequestLogger } from '../request-logger'; @@ -115,6 +115,7 @@ function createServerApp(checks: ConformanceCheck[]): express.Application { export class ToolsCallScenario implements Scenario { name = 'tools_call'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = 'Tests calling tools with various parameter types'; private app: express.Application | null = null; private httpServer: any = null; diff --git a/src/scenarios/index.ts b/src/scenarios/index.ts index 6e17101..d67fae4 100644 --- a/src/scenarios/index.ts +++ b/src/scenarios/index.ts @@ 
-1,4 +1,4 @@ -import { Scenario, ClientScenario } from '../types'; +import { Scenario, ClientScenario, SpecVersion } from '../types'; import { InitializeScenario } from './client/initialize'; import { ToolsCallScenario } from './client/tools_call'; import { ElicitationClientDefaultsScenario } from './client/elicitation-defaults'; @@ -211,3 +211,34 @@ export function listBackcompatScenarios(): string[] { } export { listMetadataScenarios }; + +// All valid spec versions, used by the CLI to validate --spec-version input. +export const ALL_SPEC_VERSIONS: SpecVersion[] = [ + '2025-03-26', + '2025-06-18', + '2025-11-25', + 'draft', + 'extension' +]; + +export function listScenariosForSpec(version: SpecVersion): string[] { + return scenariosList + .filter((s) => s.specVersions.includes(version)) + .map((s) => s.name); +} + +export function listClientScenariosForSpec(version: SpecVersion): string[] { + return allClientScenariosList + .filter((s) => s.specVersions.includes(version)) + .map((s) => s.name); +} + +export function getScenarioSpecVersions( + name: string +): SpecVersion[] | undefined { + return ( + scenarios.get(name)?.specVersions ?? clientScenarios.get(name)?.specVersions + ); +} + +export type { SpecVersion }; diff --git a/src/scenarios/server/dns-rebinding.ts b/src/scenarios/server/dns-rebinding.ts index 2f21f8f..cd6c5f4 100644 --- a/src/scenarios/server/dns-rebinding.ts +++ b/src/scenarios/server/dns-rebinding.ts @@ -5,7 +5,7 @@ * to prevent DNS rebinding attacks. See GHSA-w48q-cv73-mx4w for details. 
*/ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { request } from 'undici'; const SPEC_REFERENCES = [ @@ -85,6 +85,7 @@ async function sendRequestWithHostAndOrigin( export class DNSRebindingProtectionScenario implements ClientScenario { name = 'dns-rebinding-protection'; + specVersions: SpecVersion[] = ['2025-11-25']; description = `Test DNS rebinding protection for localhost servers. **Scope:** This test applies to localhost MCP servers running without HTTPS and without diff --git a/src/scenarios/server/elicitation-defaults.ts b/src/scenarios/server/elicitation-defaults.ts index 2be114c..a458fec 100644 --- a/src/scenarios/server/elicitation-defaults.ts +++ b/src/scenarios/server/elicitation-defaults.ts @@ -2,12 +2,13 @@ * SEP-1034: Elicitation default values test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; import { ElicitRequestSchema } from '@modelcontextprotocol/sdk/types.js'; export class ElicitationDefaultsScenario implements ClientScenario { name = 'elicitation-sep1034-defaults'; + specVersions: SpecVersion[] = ['2025-11-25']; description = `Test elicitation with default values for all primitive types (SEP-1034). 
**Server Implementation Requirements:** diff --git a/src/scenarios/server/elicitation-enums.ts b/src/scenarios/server/elicitation-enums.ts index e5c1fa3..c9eb598 100644 --- a/src/scenarios/server/elicitation-enums.ts +++ b/src/scenarios/server/elicitation-enums.ts @@ -2,12 +2,13 @@ * SEP-1330: Elicitation enum schema improvements test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; import { ElicitRequestSchema } from '@modelcontextprotocol/sdk/types.js'; export class ElicitationEnumsScenario implements ClientScenario { name = 'elicitation-sep1330-enums'; + specVersions: SpecVersion[] = ['2025-11-25']; description = `Test elicitation with enum schema improvements (SEP-1330). **Server Implementation Requirements:** diff --git a/src/scenarios/server/json-schema-2020-12.ts b/src/scenarios/server/json-schema-2020-12.ts index 2cfd08b..be20f2a 100644 --- a/src/scenarios/server/json-schema-2020-12.ts +++ b/src/scenarios/server/json-schema-2020-12.ts @@ -6,7 +6,7 @@ * or additionalProperties fields. */ -import { ClientScenario, ConformanceCheck } from '../../types.js'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types.js'; import { connectToServer } from './client-helper.js'; const EXPECTED_TOOL_NAME = 'json_schema_2020_12_tool'; @@ -14,6 +14,7 @@ const EXPECTED_SCHEMA_DIALECT = 'https://json-schema.org/draft/2020-12/schema'; export class JsonSchema2020_12Scenario implements ClientScenario { name = 'json-schema-2020-12'; + specVersions: SpecVersion[] = ['2025-11-25']; description = `Validates JSON Schema 2020-12 keyword preservation (SEP-1613). 
**Server Implementation Requirements:** diff --git a/src/scenarios/server/lifecycle.ts b/src/scenarios/server/lifecycle.ts index d9b341e..392a932 100644 --- a/src/scenarios/server/lifecycle.ts +++ b/src/scenarios/server/lifecycle.ts @@ -2,11 +2,12 @@ * Lifecycle test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; export class ServerInitializeScenario implements ClientScenario { name = 'server-initialize'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test basic server initialization handshake. **Server Implementation Requirements:** diff --git a/src/scenarios/server/prompts.ts b/src/scenarios/server/prompts.ts index 436564b..f62faac 100644 --- a/src/scenarios/server/prompts.ts +++ b/src/scenarios/server/prompts.ts @@ -2,11 +2,12 @@ * Prompts test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; export class PromptsListScenario implements ClientScenario { name = 'prompts-list'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test listing available prompts. **Server Implementation Requirements:** @@ -87,6 +88,7 @@ export class PromptsListScenario implements ClientScenario { export class PromptsGetSimpleScenario implements ClientScenario { name = 'prompts-get-simple'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test getting a simple prompt without arguments. 
**Server Implementation Requirements:** @@ -171,6 +173,7 @@ Implement a prompt named \`test_simple_prompt\` with no arguments that returns: export class PromptsGetWithArgsScenario implements ClientScenario { name = 'prompts-get-with-args'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test parameterized prompt. **Server Implementation Requirements:** @@ -266,6 +269,7 @@ Returns (with args \`{arg1: "hello", arg2: "world"}\`): export class PromptsGetEmbeddedResourceScenario implements ClientScenario { name = 'prompts-get-embedded-resource'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test prompt with embedded resource content. **Server Implementation Requirements:** @@ -371,6 +375,7 @@ Returns: export class PromptsGetWithImageScenario implements ClientScenario { name = 'prompts-get-with-image'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test prompt with image content. **Server Implementation Requirements:** diff --git a/src/scenarios/server/resources.ts b/src/scenarios/server/resources.ts index a4ed241..ec3c5fc 100644 --- a/src/scenarios/server/resources.ts +++ b/src/scenarios/server/resources.ts @@ -2,7 +2,7 @@ * Resources test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; import { TextResourceContents, @@ -11,6 +11,7 @@ import { export class ResourcesListScenario implements ClientScenario { name = 'resources-list'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test listing available resources. 
**Server Implementation Requirements:** @@ -91,6 +92,7 @@ export class ResourcesListScenario implements ClientScenario { export class ResourcesReadTextScenario implements ClientScenario { name = 'resources-read-text'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test reading text resource. **Server Implementation Requirements:** @@ -177,6 +179,7 @@ Implement resource \`test://static-text\` that returns: export class ResourcesReadBinaryScenario implements ClientScenario { name = 'resources-read-binary'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test reading binary resource. **Server Implementation Requirements:** @@ -261,6 +264,7 @@ Implement resource \`test://static-binary\` that returns: export class ResourcesTemplateReadScenario implements ClientScenario { name = 'resources-templates-read'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test reading resource from template. **Server Implementation Requirements:** @@ -362,6 +366,7 @@ Returns (for \`uri: "test://template/123/data"\`): export class ResourcesSubscribeScenario implements ClientScenario { name = 'resources-subscribe'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test subscribing to resource updates. **Server Implementation Requirements:** @@ -432,6 +437,7 @@ Example request: export class ResourcesUnsubscribeScenario implements ClientScenario { name = 'resources-unsubscribe'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test unsubscribing from resource. **Server Implementation Requirements:** diff --git a/src/scenarios/server/sse-multiple-streams.ts b/src/scenarios/server/sse-multiple-streams.ts index 7cda3f2..ea025de 100644 --- a/src/scenarios/server/sse-multiple-streams.ts +++ b/src/scenarios/server/sse-multiple-streams.ts @@ -9,13 +9,14 @@ * Multiple concurrent streams are achieved via POST requests, each getting their own stream. 
*/ -import { ClientScenario, ConformanceCheck } from '../../types.js'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types.js'; import { EventSourceParserStream } from 'eventsource-parser/stream'; import { Client } from '@modelcontextprotocol/sdk/client/index.js'; import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'; export class ServerSSEMultipleStreamsScenario implements ClientScenario { name = 'server-sse-multiple-streams'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Test server supports multiple concurrent POST SSE streams (SEP-1699)'; diff --git a/src/scenarios/server/sse-polling.ts b/src/scenarios/server/sse-polling.ts index 5a3a240..30deee7 100644 --- a/src/scenarios/server/sse-polling.ts +++ b/src/scenarios/server/sse-polling.ts @@ -8,7 +8,7 @@ * - Replaying events when client reconnects with Last-Event-ID */ -import { ClientScenario, ConformanceCheck } from '../../types.js'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types.js'; import { EventSourceParserStream } from 'eventsource-parser/stream'; import { Client } from '@modelcontextprotocol/sdk/client/index.js'; import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'; @@ -67,6 +67,7 @@ function createLoggingFetch(checks: ConformanceCheck[]) { export class ServerSSEPollingScenario implements ClientScenario { name = 'server-sse-polling'; + specVersions: SpecVersion[] = ['2025-11-25']; description = 'Test server SSE polling via test_reconnection tool that closes stream mid-call (SEP-1699)'; diff --git a/src/scenarios/server/tools.ts b/src/scenarios/server/tools.ts index e445a5c..7ecbfdd 100644 --- a/src/scenarios/server/tools.ts +++ b/src/scenarios/server/tools.ts @@ -2,7 +2,7 @@ * Tools test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from 
'../../types'; import { connectToServer, NotificationCollector } from './client-helper'; import { CallToolResultSchema, @@ -13,6 +13,7 @@ import { export class ToolsListScenario implements ClientScenario { name = 'tools-list'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test listing available tools. **Server Implementation Requirements:** @@ -95,6 +96,7 @@ export class ToolsListScenario implements ClientScenario { export class ToolsCallSimpleTextScenario implements ClientScenario { name = 'tools-call-simple-text'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test calling a tool that returns simple text. **Server Implementation Requirements:** @@ -179,6 +181,7 @@ Implement tool \`test_simple_text\` with no arguments that returns: export class ToolsCallImageScenario implements ClientScenario { name = 'tools-call-image'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test calling a tool that returns image content. **Server Implementation Requirements:** @@ -266,6 +269,7 @@ Implement tool \`test_image_content\` with no arguments that returns: export class ToolsCallMultipleContentTypesScenario implements ClientScenario { name = 'tools-call-mixed-content'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool returning multiple content types. **Server Implementation Requirements:** @@ -366,6 +370,7 @@ Implement tool \`test_multiple_content_types\` with no arguments that returns: export class ToolsCallWithLoggingScenario implements ClientScenario { name = 'tools-call-with-logging'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool that sends log messages during execution. **Server Implementation Requirements:** @@ -454,6 +459,7 @@ Implement tool \`test_tool_with_logging\` with no arguments. 
export class ToolsCallErrorScenario implements ClientScenario { name = 'tools-call-error'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool error reporting. **Server Implementation Requirements:** @@ -538,6 +544,7 @@ Implement tool \`test_error_handling\` with no arguments. export class ToolsCallWithProgressScenario implements ClientScenario { name = 'tools-call-with-progress'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool that reports progress notifications. **Server Implementation Requirements:** @@ -657,6 +664,7 @@ If no progress token provided, just execute with delays. export class ToolsCallSamplingScenario implements ClientScenario { name = 'tools-call-sampling'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool that requests LLM sampling from client. **Server Implementation Requirements:** @@ -784,6 +792,7 @@ Implement tool \`test_sampling\` with argument: export class ToolsCallElicitationScenario implements ClientScenario { name = 'tools-call-elicitation'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test tool that requests user input (elicitation) from client. **Server Implementation Requirements:** @@ -914,6 +923,7 @@ Implement tool \`test_elicitation\` with argument: export class ToolsCallAudioScenario implements ClientScenario { name = 'tools-call-audio'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test calling a tool that returns audio content. **Server Implementation Requirements:** @@ -1008,6 +1018,7 @@ Implement tool \`test_audio_content\` with no arguments that returns: export class ToolsCallEmbeddedResourceScenario implements ClientScenario { name = 'tools-call-embedded-resource'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test calling a tool that returns embedded resource content. 
**Server Implementation Requirements:** diff --git a/src/scenarios/server/utils.ts b/src/scenarios/server/utils.ts index 07b2ca6..0a4a391 100644 --- a/src/scenarios/server/utils.ts +++ b/src/scenarios/server/utils.ts @@ -2,11 +2,12 @@ * Utilities test scenarios for MCP servers */ -import { ClientScenario, ConformanceCheck } from '../../types'; +import { ClientScenario, ConformanceCheck, SpecVersion } from '../../types'; import { connectToServer } from './client-helper'; export class LoggingSetLevelScenario implements ClientScenario { name = 'logging-set-level'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test setting logging level. **Server Implementation Requirements:** @@ -85,6 +86,7 @@ export class LoggingSetLevelScenario implements ClientScenario { export class PingScenario implements ClientScenario { name = 'ping'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test ping utility for connection health check. **Server Implementation Requirements:** @@ -174,6 +176,7 @@ export class PingScenario implements ClientScenario { export class CompletionCompleteScenario implements ClientScenario { name = 'completion-complete'; + specVersions: SpecVersion[] = ['2025-06-18', '2025-11-25']; description = `Test completion endpoint. 
**Server Implementation Requirements:** diff --git a/src/scenarios/spec-version.test.ts b/src/scenarios/spec-version.test.ts new file mode 100644 index 0000000..0b8e652 --- /dev/null +++ b/src/scenarios/spec-version.test.ts @@ -0,0 +1,94 @@ +import { describe, it, expect } from 'vitest'; +import { + listScenarios, + listClientScenarios, + listScenariosForSpec, + getScenarioSpecVersions, + ALL_SPEC_VERSIONS +} from './index'; + +describe('specVersions helpers', () => { + it('every Scenario has specVersions', () => { + for (const name of listScenarios()) { + const versions = getScenarioSpecVersions(name); + expect( + versions, + `scenario "${name}" is missing specVersions` + ).toBeDefined(); + expect(versions!.length).toBeGreaterThan(0); + for (const v of versions!) { + expect(ALL_SPEC_VERSIONS).toContain(v); + } + } + }); + + it('every ClientScenario has specVersions', () => { + for (const name of listClientScenarios()) { + const versions = getScenarioSpecVersions(name); + expect( + versions, + `client scenario "${name}" is missing specVersions` + ).toBeDefined(); + expect(versions!.length).toBeGreaterThan(0); + for (const v of versions!) 
{ + expect(ALL_SPEC_VERSIONS).toContain(v); + } + } + }); + + it('listScenariosForSpec returns scenarios that include that version', () => { + const scenarios = listScenariosForSpec('2025-06-18'); + expect(scenarios.length).toBeGreaterThan(0); + for (const name of scenarios) { + expect(getScenarioSpecVersions(name)).toContain('2025-06-18'); + } + }); + + it('2025-11-25 includes scenarios carried forward from 2025-06-18', () => { + const base = listScenariosForSpec('2025-06-18'); + const current = listScenariosForSpec('2025-11-25'); + // scenarios tagged with both versions should appear in both lists + const currentSet = new Set(current); + // at least some overlap (carried-forward scenarios) + const overlap = base.filter((s) => currentSet.has(s)); + expect(overlap.length).toBeGreaterThan(0); + // current should have more total (new 2025-11-25-only scenarios) + expect(current.length).toBeGreaterThan(overlap.length); + }); + + it('2025-11-25 does not include 2025-03-26-only scenarios', () => { + const backcompat = listScenariosForSpec('2025-03-26'); + const current = listScenariosForSpec('2025-11-25'); + const currentSet = new Set(current); + // backcompat-only scenarios should not appear in 2025-11-25 + for (const name of backcompat) { + const versions = getScenarioSpecVersions(name)!; + if (!versions.includes('2025-11-25')) { + expect(currentSet.has(name)).toBe(false); + } + } + }); + + it('draft and extension scenarios are isolated', () => { + const draft = listScenariosForSpec('draft'); + for (const name of draft) { + expect(getScenarioSpecVersions(name)).toContain('draft'); + } + const ext = listScenariosForSpec('extension'); + for (const name of ext) { + expect(getScenarioSpecVersions(name)).toContain('extension'); + } + }); + + it('draft scenarios are not in dated versions', () => { + const draft = listScenariosForSpec('draft'); + const dated = new Set([ + ...listScenariosForSpec('2025-03-26'), + ...listScenariosForSpec('2025-06-18'), + 
...listScenariosForSpec('2025-11-25') + ]); + for (const name of draft) { + expect(dated.has(name)).toBe(false); + } + }); +}); diff --git a/src/tier-check/checks/test-conformance-results.ts b/src/tier-check/checks/test-conformance-results.ts index 55e4f6d..6637136 100644 --- a/src/tier-check/checks/test-conformance-results.ts +++ b/src/tier-check/checks/test-conformance-results.ts @@ -3,7 +3,11 @@ import { mkdtempSync, readFileSync, existsSync, globSync } from 'fs'; import { join, dirname } from 'path'; import { tmpdir } from 'os'; import { ConformanceResult } from '../types'; -import { listScenarios, listActiveClientScenarios } from '../../scenarios'; +import { + listScenarios, + listActiveClientScenarios, + getScenarioSpecVersions +} from '../../scenarios'; import { ConformanceCheck } from '../../types'; /** @@ -105,6 +109,15 @@ function reconcileWithExpected( }) ); + // Attach specVersion to existing detail entries + for (const detail of result.details) { + let name = stripTimestamp(detail.scenario); + if (resultPrefix) { + name = name.replace(new RegExp(`^${resultPrefix}-`), ''); + } + detail.specVersions = getScenarioSpecVersions(name); + } + for (const expected of expectedScenarios) { if (!reportedNames.has(expected)) { result.failed++; @@ -113,7 +126,8 @@ function reconcileWithExpected( scenario: expected, passed: false, checks_passed: 0, - checks_failed: 0 + checks_failed: 0, + specVersions: getScenarioSpecVersions(expected) }); } } diff --git a/src/tier-check/output.ts b/src/tier-check/output.ts index d7e9fc8..d4a857f 100644 --- a/src/tier-check/output.ts +++ b/src/tier-check/output.ts @@ -1,4 +1,4 @@ -import { TierScorecard, CheckStatus } from './types'; +import { TierScorecard, CheckStatus, ConformanceResult } from './types'; const COLORS = { RESET: '\x1b[0m', @@ -23,6 +23,79 @@ function statusIcon(status: CheckStatus): string { } } +const SPEC_VERSIONS = [ + '2025-03-26', + '2025-06-18', + '2025-11-25', + 'draft', + 'extension' +] as const; + +type 
Cell = { passed: number; total: number }; + +interface MatrixRow { + cells: Map<string, Cell>; + unique: Cell; +} + +function newRow(): MatrixRow { + return { cells: new Map(), unique: { passed: 0, total: 0 } }; +} + +interface ConformanceMatrix { + server: MatrixRow; + clientCore: MatrixRow; + clientAuth: MatrixRow; +} + +function buildConformanceMatrix( + server: ConformanceResult, + client: ConformanceResult +): ConformanceMatrix { + const matrix: ConformanceMatrix = { + server: newRow(), + clientCore: newRow(), + clientAuth: newRow() + }; + + for (const d of server.details) { + matrix.server.unique.total++; + if (d.passed) matrix.server.unique.passed++; + for (const v of d.specVersions ?? ['unknown']) { + const cell = matrix.server.cells.get(v) ?? { passed: 0, total: 0 }; + cell.total++; + if (d.passed) cell.passed++; + matrix.server.cells.set(v, cell); + } + } + + for (const d of client.details) { + const row = d.scenario.startsWith('auth/') + ? matrix.clientAuth + : matrix.clientCore; + row.unique.total++; + if (d.passed) row.unique.passed++; + for (const v of d.specVersions ?? ['unknown']) { + const cell = row.cells.get(v) ??
{ passed: 0, total: 0 }; + cell.total++; + if (d.passed) cell.passed++; + row.cells.set(v, cell); + } + } + + return matrix; +} + +function formatCell(cell: Cell | undefined): string { + if (!cell || cell.total === 0) return '\u2014'; + return `${cell.passed}/${cell.total}`; +} + +function formatRate(cell: Cell): string { + if (cell.total === 0) return '0/0'; + return `${cell.passed}/${cell.total} (${Math.round((cell.passed / cell.total) * 100)}%)`; +} + export function formatJson(scorecard: TierScorecard): string { return JSON.stringify(scorecard, null, 2); } @@ -42,12 +115,33 @@ export function formatMarkdown(scorecard: TierScorecard): string { lines.push(''); lines.push('| Check | Status | Detail |'); lines.push('|-------|--------|--------|'); - lines.push( - `| Server Conformance | ${c.conformance.status} | ${c.conformance.passed}/${c.conformance.total} scenarios pass (${Math.round(c.conformance.pass_rate * 100)}%) |` + // Conformance matrix + const matrix = buildConformanceMatrix( + c.conformance as ConformanceResult, + c.client_conformance as ConformanceResult ); + + lines.push(''); + lines.push(`| | ${SPEC_VERSIONS.join(' | ')} | All* |`); + lines.push(`|---|${SPEC_VERSIONS.map(() => '---|').join('')}---|`); + + const mdRows: [string, MatrixRow][] = [ + ['Server', matrix.server], + ['Client: Core', matrix.clientCore], + ['Client: Auth', matrix.clientAuth] + ]; + + for (const [label, row] of mdRows) { + lines.push( + `| ${label} | ${SPEC_VERSIONS.map((v) => formatCell(row.cells.get(v))).join(' | ')} | ${formatRate(row.unique)} |` + ); + } + + lines.push(''); lines.push( - `| Client Conformance | ${c.client_conformance.status} | ${c.client_conformance.passed}/${c.client_conformance.total} scenarios pass (${Math.round(c.client_conformance.pass_rate * 100)}%) |` + '_* unique scenarios — a scenario may apply to multiple spec versions_' ); + lines.push(''); lines.push( `| Labels | ${c.labels.status} | ${c.labels.present}/${c.labels.required} required 
labels${c.labels.missing.length > 0 ? ` (missing: ${c.labels.missing.join(', ')})` : ''} |` ); @@ -100,14 +194,51 @@ export function formatTerminal(scorecard: TierScorecard): void { if (scorecard.version) console.log(`Version: ${scorecard.version}`); console.log(`Timestamp: ${scorecard.timestamp}\n`); - console.log(`${COLORS.BOLD}Check Results:${COLORS.RESET}\n`); + console.log(`${COLORS.BOLD}Conformance:${COLORS.RESET}\n`); + + // Conformance matrix + const matrix = buildConformanceMatrix( + c.conformance as ConformanceResult, + c.client_conformance as ConformanceResult + ); + + const vw = 10; // column width for version cells + const lw = 14; // label column width + const tw = 16; // total column width + const rp = (s: string, w: number) => s.padStart(w); + const lp = (s: string, w: number) => s.padEnd(w); + + console.log( + ` ${COLORS.DIM}${lp('', lw + 2)} ${SPEC_VERSIONS.map((v) => rp(v, vw)).join(' ')} ${rp('All*', tw)}${COLORS.RESET}` + ); + + const rows: [string, MatrixRow, CheckStatus | null, boolean][] = [ + ['Server', matrix.server, c.conformance.status, true], + ['Client: Core', matrix.clientCore, null, false], + ['Client: Auth', matrix.clientAuth, null, false] + ]; + + for (const [label, row, status, bold] of rows) { + const icon = status ? statusIcon(status) + ' ' : ' '; + const b = bold ? COLORS.BOLD : ''; + const r = bold ? 
COLORS.RESET : ''; + console.log( + ` ${icon}${b}${lp(label, lw)}${r} ${SPEC_VERSIONS.map((v) => rp(formatCell(row.cells.get(v)), vw)).join(' ')} ${b}${rp(formatRate(row.unique), tw)}${r}` + ); + } + // Client total line + const clientTotal: Cell = { + passed: matrix.clientCore.unique.passed + matrix.clientAuth.unique.passed, + total: matrix.clientCore.unique.total + matrix.clientAuth.unique.total + }; console.log( - ` ${statusIcon(c.conformance.status)} Server Conformance ${c.conformance.passed}/${c.conformance.total} (${Math.round(c.conformance.pass_rate * 100)}%)` + ` ${statusIcon(c.client_conformance.status)} ${COLORS.BOLD}${lp('Client Total', lw)}${COLORS.RESET} ${' '.repeat(SPEC_VERSIONS.length * (vw + 1) - 1)} ${COLORS.BOLD}${rp(formatRate(clientTotal), tw)}${COLORS.RESET}` ); console.log( - ` ${statusIcon(c.client_conformance.status)} Client Conformance ${c.client_conformance.passed}/${c.client_conformance.total} (${Math.round(c.client_conformance.pass_rate * 100)}%)` + `\n ${COLORS.DIM}* unique scenarios — a scenario may apply to multiple spec versions${COLORS.RESET}` ); + console.log(`\n${COLORS.BOLD}Repository Health:${COLORS.RESET}\n`); console.log( ` ${statusIcon(c.labels.status)} Labels ${c.labels.present}/${c.labels.required} required labels` ); diff --git a/src/tier-check/types.ts b/src/tier-check/types.ts index ff6be5d..a9830f4 100644 --- a/src/tier-check/types.ts +++ b/src/tier-check/types.ts @@ -1,3 +1,5 @@ +import type { SpecVersion } from '../types'; + export type CheckStatus = 'pass' | 'fail' | 'partial' | 'skipped'; export interface CheckResult { @@ -15,6 +17,7 @@ export interface ConformanceResult extends CheckResult { passed: boolean; checks_passed: number; checks_failed: number; + specVersions?: SpecVersion[]; }>; } diff --git a/src/types.ts b/src/types.ts index 5dd6421..c086e65 100644 --- a/src/types.ts +++ b/src/types.ts @@ -23,6 +23,13 @@ export interface ConformanceCheck { logs?: string[]; } +export type SpecVersion = + | '2025-03-26' 
+ | '2025-06-18' + | '2025-11-25' + | 'draft' + | 'extension'; + export interface ScenarioUrls { serverUrl: string; authUrl?: string; @@ -36,6 +43,7 @@ export interface ScenarioUrls { export interface Scenario { name: string; description: string; + specVersions: SpecVersion[]; /** * If true, a non-zero client exit code is expected and will not cause the test to fail. * Use this for scenarios where the client is expected to error (e.g., rejecting invalid auth). @@ -49,5 +57,6 @@ export interface Scenario { export interface ClientScenario { name: string; description: string; + specVersions: SpecVersion[]; run(serverUrl: string): Promise<ConformanceCheck[]>; } From 97f6d4e14cc0fd9e7c7d53afb079ff52679ec3bd Mon Sep 17 00:00:00 2001 From: Felix Weinberger Date: Fri, 13 Feb 2026 17:36:08 +0000 Subject: [PATCH 2/4] fix: fix console output template for tier audit skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The template had an orphan table header (Check | Value | T2 | T1) with no rows above the conformance matrix, causing an empty table to render. The scorecard rows below the matrix also lacked their own header. Fix: two self-contained tables with clear labels — 'Conformance:' for the per-version matrix, 'Scorecard:' for the check rows.
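
For readers unfamiliar with the per-version matrix referred to above, here is a rough sketch of how such a tally can work. It is not the code in `src/tier-check/output.ts`: the `Detail` shape, the `tally` helper, and the sample data are hypothetical and only illustrate the counting rule. A scenario counts once toward the unique "All*" total and once in every spec-version column it lists, so column totals can add up to more than the unique total.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Cell = { passed: number; total: number };

interface Detail {
  scenario: string;
  passed: boolean;
  specVersions?: string[];
}

// Count each scenario once in `unique` and once per spec version it lists.
// Scenarios without specVersions fall into an 'unknown' column.
function tally(details: Detail[]): { cells: Map<string, Cell>; unique: Cell } {
  const cells = new Map<string, Cell>();
  const unique: Cell = { passed: 0, total: 0 };
  for (const d of details) {
    unique.total++;
    if (d.passed) unique.passed++;
    for (const v of d.specVersions ?? ['unknown']) {
      const cell = cells.get(v) ?? { passed: 0, total: 0 };
      cell.total++;
      if (d.passed) cell.passed++;
      cells.set(v, cell);
    }
  }
  return { cells, unique };
}

// Made-up sample results: one carried-forward scenario, one 2025-11-25-only.
const sample: Detail[] = [
  { scenario: 'ping', passed: true, specVersions: ['2025-06-18', '2025-11-25'] },
  { scenario: 'server-sse-polling', passed: false, specVersions: ['2025-11-25'] }
];
const row = tally(sample);
```

With this sample, the 2025-11-25 column sees both scenarios while the unique count stays at two, which is why the footnote on the "All*" column is needed.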
--- .claude/skills/mcp-sdk-tier-audit/SKILL.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/.claude/skills/mcp-sdk-tier-audit/SKILL.md b/.claude/skills/mcp-sdk-tier-audit/SKILL.md index 382f637..c4b3ac9 100644 --- a/.claude/skills/mcp-sdk-tier-audit/SKILL.md +++ b/.claude/skills/mcp-sdk-tier-audit/SKILL.md @@ -203,9 +203,7 @@ After the subagents finish, output a short executive summary directly to the use ``` ## — Tier -| Check | Value | T2 | T1 | -|-------|-------|----|----| -Conformance matrix: +Conformance: | | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 | |--------------|------------|------------|------------|-------|-----------|-------|----|----| @@ -214,6 +212,13 @@ Conformance matrix: | Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — | | **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** | +If a baseline file was found, add a note below the conformance table: +> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}). + +Scorecard: + +| Check | Value | T2 | T1 | +|-------|-------|----|----| | Issue Triage | % (/) | ✓/✗ | ✓/✗ | | Labels | / | ✓/✗ | ✓/✗ | | P0 Resolution | open | ✓/✗ | ✓/✗ | @@ -224,9 +229,6 @@ Conformance matrix: | Versioning Policy | | N/A | ✓/✗ | | Stable Release | | ✓/✗ | ✓/✗ | -If a baseline file was found, add a note below the conformance matrix: -> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}). 
- --- **High-Priority Fixes:** From 7deb00f4e179ca2e439380b1bdba2dea38fcbebf Mon Sep 17 00:00:00 2001 From: Felix Weinberger Date: Fri, 13 Feb 2026 17:45:11 +0000 Subject: [PATCH 3/4] fix: align skill console template with tier-check script output MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add asterisk footnote ('unique scenarios — a scenario may apply to multiple spec versions') and rename 'Scorecard' to 'Repository Health' to match the labels used by the tier-check CLI output. --- .claude/skills/mcp-sdk-tier-audit/SKILL.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.claude/skills/mcp-sdk-tier-audit/SKILL.md b/.claude/skills/mcp-sdk-tier-audit/SKILL.md index c4b3ac9..ca45df9 100644 --- a/.claude/skills/mcp-sdk-tier-audit/SKILL.md +++ b/.claude/skills/mcp-sdk-tier-audit/SKILL.md @@ -212,10 +212,12 @@ Conformance: | Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — | | **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** | +\* unique scenarios — a scenario may apply to multiple spec versions + If a baseline file was found, add a note below the conformance table: > **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}). -Scorecard: +Repository Health: | Check | Value | T2 | T1 | |-------|-------|----|----| From dfa903644cf82dbda733e2f8d86bbfcf31ca102e Mon Sep 17 00:00:00 2001 From: Felix Weinberger Date: Fri, 13 Feb 2026 17:54:47 +0000 Subject: [PATCH 4/4] fix: add specVersions to CrossAppAccessCompleteFlowScenario Added in 83c446d on main after this branch diverged. 
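
To illustrate why every scenario needs a `specVersions` tag, the following sketch shows a version filter based on `.includes()`. It is an assumption-laden example rather than the repository's implementation: the `registry` array and the `namesForSpec` helper are hypothetical, and only the scenario names and version tags are taken from this patch series.

```typescript
type SpecVersion =
  | '2025-03-26'
  | '2025-06-18'
  | '2025-11-25'
  | 'draft'
  | 'extension';

interface TaggedScenario {
  name: string;
  specVersions: SpecVersion[];
}

// Hypothetical registry; the real project keeps scenarios elsewhere.
const registry: TaggedScenario[] = [
  { name: 'initialize', specVersions: ['2025-06-18', '2025-11-25'] },
  { name: 'dns-rebinding-protection', specVersions: ['2025-11-25'] },
  { name: 'auth/cross-app-access-complete-flow', specVersions: ['extension'] }
];

// Return the names of scenarios that declare the requested version.
function namesForSpec(version: SpecVersion): string[] {
  return registry
    .filter((s) => s.specVersions.includes(version))
    .map((s) => s.name);
}
```

Under this scheme a scenario with no tag would never match any filter, so an untagged scenario silently drops out of every per-version run.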
--- src/scenarios/client/auth/cross-app-access.ts | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/scenarios/client/auth/cross-app-access.ts b/src/scenarios/client/auth/cross-app-access.ts index 077889d..05a351b 100644 --- a/src/scenarios/client/auth/cross-app-access.ts +++ b/src/scenarios/client/auth/cross-app-access.ts @@ -1,7 +1,12 @@ import * as jose from 'jose'; import type { CryptoKey } from 'jose'; import express, { type Request, type Response } from 'express'; -import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types'; +import type { + Scenario, + ConformanceCheck, + ScenarioUrls, + SpecVersion +} from '../../../types'; import { createAuthServer } from './helpers/createAuthServer'; import { createServer } from './helpers/createServer'; import { MockTokenVerifier } from './helpers/mockTokenVerifier'; @@ -55,6 +60,7 @@ async function createIdpIdToken( */ export class CrossAppAccessCompleteFlowScenario implements Scenario { name = 'auth/cross-app-access-complete-flow'; + specVersions: SpecVersion[] = ['extension']; description = 'Tests complete SEP-990 flow: token exchange + JWT bearer grant (Enterprise Managed OAuth)';