modelcontextprotocol · pcarleton · Feb 13, 2026
diff --git a/.claude/skills/mcp-sdk-tier-audit/SKILL.md b/.claude/skills/mcp-sdk-tier-audit/SKILL.md
@@ -66,7 +66,9 @@ npm run --silent tier-check -- \
 
 If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).
 
-The CLI output includes server conformance pass rate, client conformance pass rate, issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
+The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
+
+The conformance results now include a `specVersions` field on each detail entry, enabling per-version pass rate analysis. The `list` command also tags each scenario with its spec version: `node dist/index.js list` prints `[2025-03-26]`, `[2025-06-18]`, `[2025-11-25]`, `[draft]`, or `[extension]` next to each scenario name.
 
 ### Conformance Baseline Check
 
@@ -143,17 +145,21 @@ If any Tier 2 requirement is not met, the SDK is Tier 3.
 - If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both.
 - If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone.
 
-**Client Conformance Splits:**
+**Conformance Breakdown:**
+
+The **full suite** pass rates (server total, client total) are used for tier threshold checks. To interpret them, present a single conformance matrix combining server and client results. Each detail entry in the tier-check JSON has a `specVersions` field; client category is derived from the scenario name (`auth/` prefix = Auth, everything else = Core). Server scenarios are all Core.
 
-When reporting client conformance, always break results into three categories:
+Example:
 
-1. **Core suite** — Non-auth scenarios (e.g. initialize, tools_call, elicitation, sse-retry)
-2. **Auth suite** — OAuth/authorization scenarios (any scenario starting with `auth/`)
-3. **Full suite** — All scenarios combined
+|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\*        |
+| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ |
+| Server       | —          | 26/26      | 4/4        | —     | —         | 30/30 (100%) |
+| Client: Core | —          | 2/2        | 2/2        | —     | —         | 4/4 (100%)   |
+| Client: Auth | 0/2        | 3/3        | 6/11       | 0/1   | 0/2       | 9/19 (47%)   |
 
-The **full suite** number is used for tier threshold checks. However, the core vs auth split provides essential context. Always present both numbers in the report.
+This shows at a glance where failures concentrate. Failures clustered in the Client: Auth / `2025-11-25` cell mean "new auth features not yet implemented": a scope gap, not a quality problem. Failures in the Server or Client: Core rows are more concerning.
 
-If the SDK has a `baseline.yml` or expected-failures file, note which failures are known/tracked vs. unexpected regressions. A low full-suite score where all failures are auth scenarios documented in the baseline is a scope gap (OAuth not yet implemented), not a quality problem — flag it accordingly in the assessment.
+If the SDK has a `baseline.yml` or expected-failures file, cross-reference with the matrix to identify whether baselined failures cluster in a specific cell (e.g. all in `2025-11-25` / Client: Auth = scope gap).
 
 **P0 Label Audit Guidance:**
 
@@ -197,12 +203,24 @@ After the subagents finish, output a short executive summary directly to the use
 ```
 ## <sdk-name> — Tier <X>
 
+Conformance:
+
+|              | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 |
+|--------------|------------|------------|------------|-------|-----------|-------|----|----|
+| Server       | —          | pass/total | pass/total | —     | —         | pass/total (rate%) | ✓/✗ | ✓/✗ |
+| Client: Core | —          | pass/total | pass/total | —     | —         | pass/total (rate%) | — | — |
+| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
+| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |
+
+\* unique scenarios — a scenario may apply to multiple spec versions
+
+If a baseline file was found, add a note below the conformance table:
+> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}).
+
+Repository Health:
+
 | Check | Value | T2 | T1 |
 |-------|-------|----|----|
-| Server Conformance | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
-| Client Conformance (full) | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
-|   — Core scenarios | <core_pass>/<core_total> (<rate>%) | — | — |
-|   — Auth scenarios | <auth_pass>/<auth_total> (<rate>%) | — | — |
 | Issue Triage | <rate>% (<triaged>/<total>) | ✓/✗ | ✓/✗ |
 | Labels | <present>/<required> | ✓/✗ | ✓/✗ |
 | P0 Resolution | <count> open | ✓/✗ | ✓/✗ |
@@ -213,9 +231,6 @@ After the subagents finish, output a short executive summary directly to the use
 | Versioning Policy | <summary> | N/A | ✓/✗ |
 | Stable Release | <version> | ✓/✗ | ✓/✗ |
 
-If a baseline file was found, add a note below the table:
-> **Baseline**: {N} failures in `baseline.yml` ({list of categories, e.g. "18 auth scenarios"}). Core suite: {core_rate}%.
-
 ---
 
 **High-Priority Fixes:**

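The conformance breakdown described in the SKILL.md changes can be computed mechanically from the tier-check JSON. The TypeScript sketch below is illustrative only: the `specVersions` field and the `auth/` prefix rule come from the text above, but the other entry fields (`scenario`, `side`, `passed`), the helper names, and the assumption that `baseline.yml` lists scenario names are hypothetical. It also assumes one detail entry per scenario.

```typescript
// Sketch: fold tier-check detail entries into the conformance matrix.
// Only `specVersions` is documented; `scenario`, `side`, and `passed` are
// assumed field names, and the code assumes one entry per scenario.
type SpecVersion =
  | '2025-03-26'
  | '2025-06-18'
  | '2025-11-25'
  | 'draft'
  | 'extension';

interface DetailEntry {
  scenario: string; // e.g. "auth/basic-cimd"
  side: 'server' | 'client'; // assumed: which suite produced the entry
  specVersions: SpecVersion[];
  passed: boolean;
}

type Row = 'Server' | 'Client: Core' | 'Client: Auth';

interface Tally {
  passed: number;
  total: number;
}

interface MatrixRow {
  byVersion: Map<SpecVersion, Tally>;
  all: Tally; // the All* column: each scenario counted once
}

// Server scenarios are all Core; client category comes from the name prefix.
function rowFor(entry: DetailEntry): Row {
  if (entry.side === 'server') return 'Server';
  return entry.scenario.startsWith('auth/') ? 'Client: Auth' : 'Client: Core';
}

function buildMatrix(entries: DetailEntry[]): Map<Row, MatrixRow> {
  const matrix = new Map<Row, MatrixRow>();
  for (const entry of entries) {
    const key = rowFor(entry);
    let row = matrix.get(key);
    if (!row) {
      row = {
        byVersion: new Map<SpecVersion, Tally>(),
        all: { passed: 0, total: 0 }
      };
      matrix.set(key, row);
    }
    // A scenario tagged with several versions is counted in each version cell...
    for (const v of entry.specVersions) {
      const cell = row.byVersion.get(v) ?? { passed: 0, total: 0 };
      cell.total += 1;
      if (entry.passed) cell.passed += 1;
      row.byVersion.set(v, cell);
    }
    // ...but only once in All*, so version cells can sum to more than All*.
    row.all.total += 1;
    if (entry.passed) row.all.passed += 1;
  }
  return matrix;
}

// Render "pass/total (rate%)" as used in the All* column.
function formatAll(t: Tally): string {
  const rate = t.total === 0 ? 0 : Math.round((100 * t.passed) / t.total);
  return `${t.passed}/${t.total} (${rate}%)`;
}

// Baseline note: baselined failures grouped by matrix cell. A failure
// tagged with several versions is listed under each of its cells.
function baselineNote(entries: DetailEntry[], baseline: Set<string>): string {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    if (entry.passed || !baseline.has(entry.scenario)) continue;
    for (const v of entry.specVersions) {
      const cell = `${rowFor(entry)}/${v}`;
      counts.set(cell, (counts.get(cell) ?? 0) + 1);
    }
  }
  return Array.from(counts)
    .map(([cell, n]) => `${n} in ${cell}`)
    .join(', ');
}
```

Each `Tally` maps to one `pass/total` cell, and `formatAll` reproduces the `9/19 (47%)` style of the `All*` column. Because a multi-version scenario lands in every cell it is tagged with, a row's version cells can sum to more than its `All*` count, which is why the template footnotes that column as unique scenarios.
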
diff --git a/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md b/.claude/skills/mcp-sdk-tier-audit/references/tier-requirements.md
@@ -32,12 +32,13 @@ Source: `modelcontextprotocol/docs/community/sdk-tiers.mdx` in the spec reposito
 
 ## Conformance Score Calculation
 
-Conformance scores are calculated against **applicable required tests** only:
+Every scenario in the conformance suite has a `specVersions` field listing the spec version(s) it targets. The valid values are defined in `src/types.ts`, both as the `SpecVersion` type and as a list; run `node dist/index.js list` to see the current mapping of scenarios to spec versions.
 
-- Tests for the specification version the SDK targets
-- Excluding tests marked as pending or skipped
-- Excluding tests for experimental features
-- Excluding legacy backward-compatibility tests (unless the SDK claims legacy support)
+Date-versioned scenarios (e.g. `2025-06-18`, `2025-11-25`) count toward tier scoring. `draft` and `extension` scenarios are reported separately, for information only.
+
+The `--spec-version` CLI flag filters scenarios cumulatively for date versions (e.g. `--spec-version 2025-06-18` includes both `2025-03-26` and `2025-06-18` scenarios). For `draft` and `extension`, it returns exact matches only.
+
+The tier-check output includes a per-version pass rate breakdown alongside the aggregate.
 
 ## Tier Relegation Rules
 

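The cumulative rule for `--spec-version` can be sketched as follows. This is a hypothetical illustration, not the package's `listScenariosForSpec`: it assumes the five tag values used in this PR, a flat registry of `{ name, specVersions }` records, and that a scenario matches when any of its tags is accepted.

```typescript
// Hypothetical sketch of the `--spec-version` filtering rule described above.
// The registry shape and function name are assumptions, not the real API.
type SpecVersion =
  | '2025-03-26'
  | '2025-06-18'
  | '2025-11-25'
  | 'draft'
  | 'extension';

// Date versions in chronological order. ISO dates also sort lexically,
// but an explicit list avoids relying on string comparison.
const DATE_VERSIONS: SpecVersion[] = ['2025-03-26', '2025-06-18', '2025-11-25'];

interface ScenarioInfo {
  name: string;
  specVersions: SpecVersion[];
}

function scenariosForSpec(all: ScenarioInfo[], target: SpecVersion): string[] {
  const idx = DATE_VERSIONS.indexOf(target);
  // Date version: accept it and every earlier date version (cumulative).
  // draft/extension: not in DATE_VERSIONS, so accept exact matches only.
  const accepted = new Set<SpecVersion>(
    idx >= 0 ? DATE_VERSIONS.slice(0, idx + 1) : [target]
  );
  return all
    .filter((s) => s.specVersions.some((v) => accepted.has(v)))
    .map((s) => s.name);
}
```

With a registry containing `auth/2025-03-26-oauth-metadata-backcompat`, a `2025-06-18` scenario, `auth/basic-cimd`, and `auth/client-credentials-basic`, asking for `2025-06-18` returns the first two, while `extension` returns only `auth/client-credentials-basic`.
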
diff --git a/src/index.ts b/src/index.ts
@@ -19,8 +19,13 @@ import {
   listMetadataScenarios,
   listCoreScenarios,
   listExtensionScenarios,
-  listBackcompatScenarios
+  listBackcompatScenarios,
+  listScenariosForSpec,
+  listClientScenariosForSpec,
+  getScenarioSpecVersions,
+  ALL_SPEC_VERSIONS
 } from './scenarios';
+import type { SpecVersion } from './scenarios';
 import { ConformanceCheck } from './types';
 import { ClientOptionsSchema, ServerOptionsSchema } from './schemas';
 import {
@@ -31,6 +36,32 @@ import {
 import { createTierCheckCommand } from './tier-check';
 import packageJson from '../package.json';
 
+function resolveSpecVersion(value: string): SpecVersion {
+  if (ALL_SPEC_VERSIONS.includes(value as SpecVersion)) {
+    return value as SpecVersion;
+  }
+  console.error(`Unknown spec version: ${value}`);
+  console.error(`Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`);
+  process.exit(1);
+}
+
+// Note on naming: `command` refers to which CLI command is calling this.
+// The `client` command tests Scenario objects (which test clients),
+// and the `server` command tests ClientScenario objects (which test servers).
+// This matches the inverted naming in scenarios/index.ts.
+function filterScenariosBySpecVersion(
+  allScenarios: string[],
+  version: SpecVersion,
+  command: 'client' | 'server'
+): string[] {
+  const versionScenarios =
+    command === 'client'
+      ? listScenariosForSpec(version)
+      : listClientScenariosForSpec(version);
+  const allowed = new Set(versionScenarios);
+  return allScenarios.filter((s) => allowed.has(s));
+}
+
 const program = new Command();
 
 program
@@ -53,12 +84,19 @@ program
     'Path to YAML file listing expected failures (baseline)'
   )
   .option('-o, --output-dir <path>', 'Save results to this directory')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .option('--verbose', 'Show verbose output')
   .action(async (options) => {
     try {
       const timeout = parseInt(options.timeout, 10);
       const verbose = options.verbose ?? false;
       const outputDir = options.outputDir;
+      const specVersionFilter = options.specVersion
+        ? resolveSpecVersion(options.specVersion)
+        : undefined;
 
       // Handle suite mode
       if (options.suite) {
@@ -85,7 +123,14 @@ program
           process.exit(1);
         }
 
-        const scenarios = suites[suiteName]();
+        let scenarios = suites[suiteName]();
+        if (specVersionFilter) {
+          scenarios = filterScenariosBySpecVersion(
+            scenarios,
+            specVersionFilter,
+            'client'
+          );
+        }
         console.log(
           `Running ${suiteName} suite (${scenarios.length} scenarios) in parallel...\n`
         );
@@ -262,6 +307,10 @@ program
     'Path to YAML file listing expected failures (baseline)'
   )
   .option('-o, --output-dir <path>', 'Save results to this directory')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .option('--verbose', 'Show verbose output (JSON instead of pretty print)')
   .action(async (options) => {
     try {
@@ -270,6 +319,9 @@ program
 
       const verbose = options.verbose ?? false;
       const outputDir = options.outputDir;
+      const specVersionFilter = options.specVersion
+        ? resolveSpecVersion(options.specVersion)
+        : undefined;
 
       // If a single scenario is specified, run just that one
       if (validated.scenario) {
@@ -317,6 +369,14 @@ program
           process.exit(1);
         }
 
+        if (specVersionFilter) {
+          scenarios = filterScenariosBySpecVersion(
+            scenarios,
+            specVersionFilter,
+            'server'
+          );
+        }
+
         console.log(
           `Running ${suite} suite (${scenarios.length} scenarios) against ${validated.url}\n`
         );
@@ -393,20 +453,48 @@ program
   .description('List available test scenarios')
   .option('--client', 'List client scenarios')
   .option('--server', 'List server scenarios')
+  .option(
+    '--spec-version <version>',
+    'Filter scenarios by spec version (cumulative for date versions)'
+  )
   .action((options) => {
+    const specVersionFilter = options.specVersion
+      ? resolveSpecVersion(options.specVersion)
+      : undefined;
+
     if (options.server || (!options.client && !options.server)) {
       console.log('Server scenarios (test against a server):');
-      const serverScenarios = listClientScenarios();
-      serverScenarios.forEach((s) => console.log(`  - ${s}`));
+      let serverScenarios = listClientScenarios();
+      if (specVersionFilter) {
+        serverScenarios = filterScenariosBySpecVersion(
+          serverScenarios,
+          specVersionFilter,
+          'server'
+        );
+      }
+      serverScenarios.forEach((s) => {
+        const v = getScenarioSpecVersions(s);
+        console.log(`  - ${s}${v ? ` [${v}]` : ''}`);
+      });
     }
 
     if (options.client || (!options.client && !options.server)) {
       if (options.server || (!options.client && !options.server)) {
         console.log('');
       }
       console.log('Client scenarios (test against a client):');
-      const clientScenarios = listScenarios();
-      clientScenarios.forEach((s) => console.log(`  - ${s}`));
+      let clientScenarioNames = listScenarios();
+      if (specVersionFilter) {
+        clientScenarioNames = filterScenariosBySpecVersion(
+          clientScenarioNames,
+          specVersionFilter,
+          'client'
+        );
+      }
+      clientScenarioNames.forEach((s) => {
+        const v = getScenarioSpecVersions(s);
+        console.log(`  - ${s}${v ? ` [${v}]` : ''}`);
+      });
     }
   });
 

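`resolveSpecVersion` checks membership with `ALL_SPEC_VERSIONS.includes(value as SpecVersion)` and then casts. A common TypeScript alternative, shown here as a sketch, derives the union type from a `const` tuple and narrows with a type guard so no cast is needed at the call site. The tuple contents are assumed from the tags used in this PR, and `isSpecVersion`/`parseSpecVersion` are hypothetical names; the real declarations in `src/types.ts` may differ.

```typescript
// Assumed values; a const tuple keeps the runtime list and the type in sync.
const ALL_SPEC_VERSIONS = [
  '2025-03-26',
  '2025-06-18',
  '2025-11-25',
  'draft',
  'extension'
] as const;

// The union type is derived from the tuple, so adding a version updates both.
type SpecVersion = (typeof ALL_SPEC_VERSIONS)[number];

// Type guard: a `true` result narrows `value` to SpecVersion for the caller.
function isSpecVersion(value: string): value is SpecVersion {
  return (ALL_SPEC_VERSIONS as readonly string[]).includes(value);
}

// Hypothetical variant of resolveSpecVersion that throws instead of exiting.
function parseSpecVersion(value: string): SpecVersion {
  if (isSpecVersion(value)) return value;
  throw new Error(
    `Unknown spec version: ${value}. Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`
  );
}
```

Throwing rather than exiting leaves the reporting decision to the caller; the CLI would still need to catch the error and exit non-zero.
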
diff --git a/src/scenarios/client/auth/basic-cimd.ts b/src/scenarios/client/auth/basic-cimd.ts
@@ -1,5 +1,5 @@
 import type { Scenario, ConformanceCheck } from '../../../types';
-import { ScenarioUrls } from '../../../types';
+import { ScenarioUrls, SpecVersion } from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -22,6 +22,7 @@ export const CIMD_CLIENT_METADATA_URL =
  */
 export class AuthBasicCIMDScenario implements Scenario {
   name = 'auth/basic-cimd';
+  specVersions: SpecVersion[] = ['2025-11-25'];
   description =
     'Tests OAuth flow with Client ID Metadata Documents (SEP-991/URL-based client IDs). Server advertises client_id_metadata_document_supported=true and client should use URL as client_id instead of DCR.';
   private authServer = new ServerLifecycle();

diff --git a/src/scenarios/client/auth/client-credentials.ts b/src/scenarios/client/auth/client-credentials.ts
@@ -1,6 +1,11 @@
 import * as jose from 'jose';
 import type { CryptoKey } from 'jose';
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -32,6 +37,7 @@ async function generateTestKeypair(): Promise<{
  */
 export class ClientCredentialsJwtScenario implements Scenario {
   name = 'auth/client-credentials-jwt';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests OAuth client_credentials flow with private_key_jwt authentication (SEP-1046)';
 
@@ -250,6 +256,7 @@ export class ClientCredentialsJwtScenario implements Scenario {
  */
 export class ClientCredentialsBasicScenario implements Scenario {
   name = 'auth/client-credentials-basic';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests OAuth client_credentials flow with client_secret_basic authentication';
 

diff --git a/src/scenarios/client/auth/cross-app-access.ts b/src/scenarios/client/auth/cross-app-access.ts
@@ -1,7 +1,12 @@
 import * as jose from 'jose';
 import type { CryptoKey } from 'jose';
 import express, { type Request, type Response } from 'express';
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { MockTokenVerifier } from './helpers/mockTokenVerifier';
@@ -55,6 +60,7 @@ async function createIdpIdToken(
  */
 export class CrossAppAccessCompleteFlowScenario implements Scenario {
   name = 'auth/cross-app-access-complete-flow';
+  specVersions: SpecVersion[] = ['extension'];
   description =
     'Tests complete SEP-990 flow: token exchange + JWT bearer grant (Enterprise Managed OAuth)';
 

diff --git a/src/scenarios/client/auth/discovery-metadata.ts b/src/scenarios/client/auth/discovery-metadata.ts
@@ -87,6 +87,7 @@ function createMetadataScenario(config: MetadataScenarioConfig): Scenario {
 
   return {
     name: `auth/${config.name}`,
+    specVersions: ['2025-11-25'],
     description: `Tests Basic OAuth metadata discovery flow.
 
 **PRM:** ${config.prmLocation}${config.inWwwAuth ? '' : ' (not in WWW-Authenticate)'}

diff --git a/src/scenarios/client/auth/march-spec-backcompat.ts b/src/scenarios/client/auth/march-spec-backcompat.ts
@@ -1,5 +1,5 @@
 import type { Scenario, ConformanceCheck } from '../../../types';
-import { ScenarioUrls } from '../../../types';
+import { ScenarioUrls, SpecVersion } from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -8,6 +8,7 @@ import { SpecReferences } from './spec-references';
 
 export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {
   name = 'auth/2025-03-26-oauth-metadata-backcompat';
+  specVersions: SpecVersion[] = ['2025-03-26'];
   description =
     'Tests 2025-03-26 spec OAuth flow: no PRM (Protected Resource Metadata), OAuth metadata at root location';
   private server = new ServerLifecycle();
@@ -68,6 +69,7 @@ export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {
 
 export class Auth20250326OEndpointFallbackScenario implements Scenario {
   name = 'auth/2025-03-26-oauth-endpoint-fallback';
+  specVersions: SpecVersion[] = ['2025-03-26'];
   description =
     'Tests OAuth flow with no metadata endpoints, relying on fallback to standard OAuth endpoints at server root (2025-03-26 spec behavior)';
   private server = new ServerLifecycle();

diff --git a/src/scenarios/client/auth/pre-registration.ts b/src/scenarios/client/auth/pre-registration.ts
@@ -1,4 +1,9 @@
-import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
+import type {
+  Scenario,
+  ConformanceCheck,
+  ScenarioUrls,
+  SpecVersion
+} from '../../../types';
 import { createAuthServer } from './helpers/createAuthServer';
 import { createServer } from './helpers/createServer';
 import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -19,6 +24,7 @@ const PRE_REGISTERED_CLIENT_SECRET = 'pre-registered-secret';
  */
 export class PreRegistrationScenario implements Scenario {
   name = 'auth/pre-registration';
+  specVersions: SpecVersion[] = ['2025-11-25'];
   description =
     'Tests OAuth flow with pre-registered client credentials. Server does not support DCR.';
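
Since the conformance matrix relies on every scenario carrying a tag, a registry-wide check is a natural companion to these per-scenario `specVersions` additions. A minimal sketch, assuming a simplified, hypothetical scenario shape with an optional `specVersions` field (if the real `Scenario` interface makes the field required, the compiler already enforces presence, though not non-emptiness):

```typescript
// Hypothetical check: every scenario should declare at least one spec
// version, or it cannot be placed in any column of the conformance matrix.
type SpecVersion =
  | '2025-03-26'
  | '2025-06-18'
  | '2025-11-25'
  | 'draft'
  | 'extension';

// Simplified stand-in for the real Scenario interface (assumption).
interface TaggedScenario {
  name: string;
  specVersions?: SpecVersion[];
}

// Returns the names of scenarios with a missing or empty specVersions list.
function findUntagged(scenarios: TaggedScenario[]): string[] {
  return scenarios
    .filter((s) => !s.specVersions || s.specVersions.length === 0)
    .map((s) => s.name);
}
```
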