Merged
45 changes: 30 additions & 15 deletions .claude/skills/mcp-sdk-tier-audit/SKILL.md
@@ -66,7 +66,9 @@ npm run --silent tier-check -- \

If no client-cmd was detected, omit the `--client-cmd` flag (client conformance will be skipped).

The CLI output includes server conformance pass rate, client conformance pass rate, issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.
The CLI output includes server conformance pass rate, client conformance pass rate (with per-spec-version breakdown), issue triage compliance, P0 resolution times, label taxonomy, stable release status, policy signal files, and spec tracking gap. Parse the JSON output to feed into Step 4.

The conformance results now include a `specVersions` field on each detail entry, enabling per-version pass rate analysis. The `list` command (`node dist/index.js list`) also prints a spec version tag next to each scenario: `[2025-06-18]`, `[2025-11-25]`, `[draft]`, or `[extension]`.

### Conformance Baseline Check

@@ -143,17 +145,21 @@ If any Tier 2 requirement is not met, the SDK is Tier 3.
- If GitHub issue labels are not set up per SEP-1730, triage metrics cannot be computed. Note this as a gap. However, repos may use GitHub's native issue types instead of type labels — the CLI checks for both.
- If client conformance was skipped (no client command found), note this as a gap but do not block tier advancement based on it alone.

**Client Conformance Splits:**
**Conformance Breakdown:**

The **full suite** pass rates (server total, client total) are used for tier threshold checks. To interpret them, present a single conformance matrix combining server and client results. Each detail entry in the tier-check JSON has a `specVersions` field; client category is derived from the scenario name (`auth/` prefix = Auth, everything else = Core). Server scenarios are all Core.

When reporting client conformance, always break results into three categories:
Example:

1. **Core suite** — Non-auth scenarios (e.g. initialize, tools_call, elicitation, sse-retry)
2. **Auth suite** — OAuth/authorization scenarios (any scenario starting with `auth/`)
3. **Full suite** — All scenarios combined
| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All\* |
| ------------ | ---------- | ---------- | ---------- | ----- | --------- | ------------ |
| Server | — | 26/26 | 4/4 | — | — | 30/30 (100%) |
| Client: Core | — | 2/2 | 2/2 | — | — | 4/4 (100%) |
| Client: Auth | 0/2 | 3/3 | 6/11 | 0/1 | 0/2 | 9/19 (47%) |

The **full suite** number is used for tier threshold checks. However, the core vs auth split provides essential context. Always present both numbers in the report.
The matrix shows at a glance where failures concentrate. Failures clustered in the Client: Auth / `2025-11-25` cell mean that new auth features are not yet implemented: a scope gap, not a quality problem. Failures in the Server or Client: Core rows are more concerning.

If the SDK has a `baseline.yml` or expected-failures file, note which failures are known/tracked vs. unexpected regressions. A low full-suite score where all failures are auth scenarios documented in the baseline is a scope gap (OAuth not yet implemented), not a quality problem — flag it accordingly in the assessment.
If the SDK has a `baseline.yml` or expected-failures file, cross-reference with the matrix to identify whether baselined failures cluster in a specific cell (e.g. all in `2025-11-25` / Client: Auth = scope gap).
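A minimal TypeScript sketch of how the matrix above could be assembled from tier-check detail entries. Only the `specVersions` field and the `auth/` naming rule come from this document; the `side`, `scenario`, and `passed` fields and the helper names `buildMatrix`, `formatCell`, `renderRow`, and `baselineByCell` are hypothetical, not part of the tier-check CLI. It assumes one detail entry per unique scenario, which is what lets the All column count each scenario once even when it carries several version tags.

```typescript
// Sketch only: field names other than `specVersions` are hypothetical.
type Row = 'Server' | 'Client: Core' | 'Client: Auth';

interface DetailEntry {
  side: 'server' | 'client'; // hypothetical: which suite produced the entry
  scenario: string; // e.g. 'auth/basic-cimd'
  passed: boolean; // hypothetical pass/fail flag
  specVersions: string[]; // e.g. ['2025-11-25']
}

interface Cell {
  pass: number;
  total: number;
}

const COLUMNS = ['2025-03-26', '2025-06-18', '2025-11-25', 'draft', 'extension'];

// Server scenarios are all Core; client scenarios with an `auth/` prefix are Auth.
function rowFor(e: DetailEntry): Row {
  if (e.side === 'server') return 'Server';
  return e.scenario.startsWith('auth/') ? 'Client: Auth' : 'Client: Core';
}

// Assumes one entry per unique scenario: a scenario tagged with several
// versions is counted once in each of its version columns, but only once in 'All'.
function buildMatrix(entries: DetailEntry[]): Map<Row, Map<string, Cell>> {
  const matrix = new Map<Row, Map<string, Cell>>();
  for (const e of entries) {
    const row = rowFor(e);
    const cells = matrix.get(row) ?? new Map<string, Cell>();
    for (const col of [...e.specVersions, 'All']) {
      const cell = cells.get(col) ?? { pass: 0, total: 0 };
      cell.total += 1;
      if (e.passed) cell.pass += 1;
      cells.set(col, cell);
    }
    matrix.set(row, cells);
  }
  return matrix;
}

// Empty cells render as '-'; the All column also shows a rounded percentage.
function formatCell(c: Cell | undefined, withRate: boolean): string {
  if (!c) return '-';
  const ratio = `${c.pass}/${c.total}`;
  return withRate ? `${ratio} (${Math.round((100 * c.pass) / c.total)}%)` : ratio;
}

function renderRow(row: Row, cells: Map<string, Cell> | undefined): string {
  const versionCells = COLUMNS.map((v) => formatCell(cells?.get(v), false));
  return `| ${row} | ${versionCells.join(' | ')} | ${formatCell(cells?.get('All'), true)} |`;
}

// Groups failing, baselined scenarios by matrix cell ("<n> in <row>/<version>").
// `baselined` is a hypothetical set of scenario names read from baseline.yml.
// A failure tagged with several versions is counted in each of its cells.
function baselineByCell(entries: DetailEntry[], baselined: Set<string>): string[] {
  const counts = new Map<string, number>();
  for (const e of entries) {
    if (e.passed || !baselined.has(e.scenario)) continue;
    for (const v of e.specVersions) {
      const key = `${rowFor(e)}/${v}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return Array.from(counts, ([cell, n]) => `${n} in ${cell}`);
}
```

For the example table, a Client: Auth All cell of 9 passes out of 19 would render as `9/19 (47%)`, matching the hand-written value.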

**P0 Label Audit Guidance:**

@@ -197,12 +203,24 @@ After the subagents finish, output a short executive summary directly to the use
```
## <sdk-name> — Tier <X>

Conformance:

| | 2025-03-26 | 2025-06-18 | 2025-11-25 | draft | extension | All* | T2 | T1 |
|--------------|------------|------------|------------|-------|-----------|-------|----|----|
| Server | — | pass/total | pass/total | — | — | pass/total (rate%) | ✓/✗ | ✓/✗ |
| Client: Core | — | pass/total | pass/total | — | — | pass/total (rate%) | — | — |
| Client: Auth | pass/total | pass/total | pass/total | pass/total | pass/total | pass/total (rate%) | — | — |
| **Client Total** | | | | | | **pass/total (rate%)** | **✓/✗** | **✓/✗** |

\* unique scenarios — a scenario may apply to multiple spec versions

If a baseline file was found, add a note below the conformance table:
> **Baseline**: {N} failures in `baseline.yml` ({list by cell, e.g. "6 in Client: Auth/2025-11-25, 2 in Client: Auth/extension"}).

Repository Health:

| Check | Value | T2 | T1 |
|-------|-------|----|----|
| Server Conformance | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
| Client Conformance (full) | <passed>/<total> (<rate>%) | ✓/✗ | ✓/✗ |
| — Core scenarios | <core_pass>/<core_total> (<rate>%) | — | — |
| — Auth scenarios | <auth_pass>/<auth_total> (<rate>%) | — | — |
| Issue Triage | <rate>% (<triaged>/<total>) | ✓/✗ | ✓/✗ |
| Labels | <present>/<required> | ✓/✗ | ✓/✗ |
| P0 Resolution | <count> open | ✓/✗ | ✓/✗ |
@@ -213,9 +231,6 @@ After the subagents finish, output a short executive summary directly to the use
| Versioning Policy | <summary> | N/A | ✓/✗ |
| Stable Release | <version> | ✓/✗ | ✓/✗ |

If a baseline file was found, add a note below the table:
> **Baseline**: {N} failures in `baseline.yml` ({list of categories, e.g. "18 auth scenarios"}). Core suite: {core_rate}%.

---

**High-Priority Fixes:**
@@ -32,12 +32,13 @@ Source: `modelcontextprotocol/docs/community/sdk-tiers.mdx` in the spec reposito

## Conformance Score Calculation

Conformance scores are calculated against **applicable required tests** only:
Every scenario in the conformance suite has a `specVersions` field indicating which spec version it targets. The valid values are defined as the `SpecVersion` type (as a list) in `src/types.ts` — run `node dist/index.js list` to see the current mapping of scenarios to spec versions.

- Tests for the specification version the SDK targets
- Excluding tests marked as pending or skipped
- Excluding tests for experimental features
- Excluding legacy backward-compatibility tests (unless the SDK claims legacy support)
Date-versioned scenarios (e.g. `2025-06-18`, `2025-11-25`) count toward tier scoring. `draft` and `extension` scenarios are listed separately as informational.

The `--spec-version` CLI flag filters scenarios cumulatively for date versions (e.g. `--spec-version 2025-06-18` includes `2025-03-26` + `2025-06-18`). For `draft`/`extension`, it returns exact matches only.

The tier-check output includes a per-version pass rate breakdown alongside the aggregate.
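The `--spec-version` selection rule can be illustrated with a small, self-contained sketch. The hard-coded version list, the `ScenarioInfo` shape, and the function names are assumptions for illustration: the authoritative values are the `SpecVersion` type in `src/types.ts`, and the CLI's own filtering goes through `listScenariosForSpec` / `listClientScenariosForSpec`, whose internals are not shown here. The sketch relies on ISO dates comparing correctly as strings, which may differ from how the CLI implements the cumulative rule.

```typescript
// Sketch only: the version list is hard-coded here as an assumption.
const DATE_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
type DateVersion = (typeof DATE_VERSIONS)[number];
type SpecVersion = DateVersion | 'draft' | 'extension';

interface ScenarioInfo {
  name: string;
  specVersions: SpecVersion[];
}

function isDateVersion(v: SpecVersion): v is DateVersion {
  return (DATE_VERSIONS as readonly string[]).includes(v);
}

// Date versions are cumulative: selecting 2025-06-18 also selects every
// earlier date version. `draft` and `extension` match exactly.
function selectedVersions(target: SpecVersion): Set<SpecVersion> {
  if (!isDateVersion(target)) return new Set<SpecVersion>([target]);
  // YYYY-MM-DD strings sort lexicographically in date order.
  return new Set<SpecVersion>(DATE_VERSIONS.filter((v) => v <= target));
}

function filterBySpecVersion(scenarios: ScenarioInfo[], target: SpecVersion): string[] {
  const wanted = selectedVersions(target);
  return scenarios
    .filter((s) => s.specVersions.some((v) => wanted.has(v)))
    .map((s) => s.name);
}

// Only date-versioned scenarios count toward tier scoring;
// draft and extension results are informational.
function countsTowardTier(s: ScenarioInfo): boolean {
  return s.specVersions.some(isDateVersion);
}
```

With this rule, `filterBySpecVersion(scenarios, '2025-06-18')` keeps scenarios tagged `2025-03-26` or `2025-06-18`, while `'extension'` keeps only extension scenarios.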

## Tier Relegation Rules

100 changes: 94 additions & 6 deletions src/index.ts
@@ -19,8 +19,13 @@ import {
listMetadataScenarios,
listCoreScenarios,
listExtensionScenarios,
listBackcompatScenarios
listBackcompatScenarios,
listScenariosForSpec,
listClientScenariosForSpec,
getScenarioSpecVersions,
ALL_SPEC_VERSIONS
} from './scenarios';
import type { SpecVersion } from './scenarios';
import { ConformanceCheck } from './types';
import { ClientOptionsSchema, ServerOptionsSchema } from './schemas';
import {
@@ -31,6 +36,32 @@ import {
import { createTierCheckCommand } from './tier-check';
import packageJson from '../package.json';

function resolveSpecVersion(value: string): SpecVersion {
if (ALL_SPEC_VERSIONS.includes(value as SpecVersion)) {
return value as SpecVersion;
}
console.error(`Unknown spec version: ${value}`);
console.error(`Valid versions: ${ALL_SPEC_VERSIONS.join(', ')}`);
process.exit(1);
}

// Note on naming: `command` refers to which CLI command is calling this.
// The `client` command tests Scenario objects (which test clients),
// and the `server` command tests ClientScenario objects (which test servers).
// This matches the inverted naming in scenarios/index.ts.
function filterScenariosBySpecVersion(
allScenarios: string[],
version: SpecVersion,
command: 'client' | 'server'
): string[] {
const versionScenarios =
command === 'client'
? listScenariosForSpec(version)
: listClientScenariosForSpec(version);
const allowed = new Set(versionScenarios);
return allScenarios.filter((s) => allowed.has(s));
}

const program = new Command();

program
@@ -53,12 +84,19 @@ program
'Path to YAML file listing expected failures (baseline)'
)
.option('-o, --output-dir <path>', 'Save results to this directory')
.option(
'--spec-version <version>',
'Filter scenarios by spec version (cumulative for date versions)'
)
.option('--verbose', 'Show verbose output')
.action(async (options) => {
try {
const timeout = parseInt(options.timeout, 10);
const verbose = options.verbose ?? false;
const outputDir = options.outputDir;
const specVersionFilter = options.specVersion
? resolveSpecVersion(options.specVersion)
: undefined;

// Handle suite mode
if (options.suite) {
@@ -85,7 +123,14 @@ program
process.exit(1);
}

const scenarios = suites[suiteName]();
let scenarios = suites[suiteName]();
if (specVersionFilter) {
scenarios = filterScenariosBySpecVersion(
scenarios,
specVersionFilter,
'client'
);
}
console.log(
`Running ${suiteName} suite (${scenarios.length} scenarios) in parallel...\n`
);
@@ -262,6 +307,10 @@ program
'Path to YAML file listing expected failures (baseline)'
)
.option('-o, --output-dir <path>', 'Save results to this directory')
.option(
'--spec-version <version>',
'Filter scenarios by spec version (cumulative for date versions)'
)
.option('--verbose', 'Show verbose output (JSON instead of pretty print)')
.action(async (options) => {
try {
@@ -270,6 +319,9 @@ program

const verbose = options.verbose ?? false;
const outputDir = options.outputDir;
const specVersionFilter = options.specVersion
? resolveSpecVersion(options.specVersion)
: undefined;

// If a single scenario is specified, run just that one
if (validated.scenario) {
@@ -317,6 +369,14 @@ program
process.exit(1);
}

if (specVersionFilter) {
scenarios = filterScenariosBySpecVersion(
scenarios,
specVersionFilter,
'server'
);
}

console.log(
`Running ${suite} suite (${scenarios.length} scenarios) against ${validated.url}\n`
);
@@ -393,20 +453,48 @@ program
.description('List available test scenarios')
.option('--client', 'List client scenarios')
.option('--server', 'List server scenarios')
.option(
'--spec-version <version>',
'Filter scenarios by spec version (cumulative for date versions)'
)
.action((options) => {
const specVersionFilter = options.specVersion
? resolveSpecVersion(options.specVersion)
: undefined;

if (options.server || (!options.client && !options.server)) {
console.log('Server scenarios (test against a server):');
const serverScenarios = listClientScenarios();
serverScenarios.forEach((s) => console.log(` - ${s}`));
let serverScenarios = listClientScenarios();
if (specVersionFilter) {
serverScenarios = filterScenariosBySpecVersion(
serverScenarios,
specVersionFilter,
'server'
);
}
serverScenarios.forEach((s) => {
const v = getScenarioSpecVersions(s);
console.log(` - ${s}${v ? ` [${v}]` : ''}`);
});
}

if (options.client || (!options.client && !options.server)) {
if (options.server || (!options.client && !options.server)) {
console.log('');
}
console.log('Client scenarios (test against a client):');
const clientScenarios = listScenarios();
clientScenarios.forEach((s) => console.log(` - ${s}`));
let clientScenarioNames = listScenarios();
if (specVersionFilter) {
clientScenarioNames = filterScenariosBySpecVersion(
clientScenarioNames,
specVersionFilter,
'client'
);
}
clientScenarioNames.forEach((s) => {
const v = getScenarioSpecVersions(s);
console.log(` - ${s}${v ? ` [${v}]` : ''}`);
});
}
});
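`resolveSpecVersion` relies on `as SpecVersion` casts and calls `process.exit` on an unknown value, so it cannot easily be exercised in a test. One possible side-effect-free variant is sketched below, together with the ` [tag]` line format used by `list`. `isSpecVersion`, `parseSpecVersion`, and `formatListLine` are hypothetical names, the hard-coded list stands in for `ALL_SPEC_VERSIONS`, and the sketch assumes the version lookup yields a display string or `undefined`, which this diff does not confirm.

```typescript
// Hypothetical stand-in for ALL_SPEC_VERSIONS exported from ./scenarios.
const ALL_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25', 'draft', 'extension'] as const;
type SpecVersion = (typeof ALL_SPEC_VERSIONS)[number];

// Type guard: narrows a CLI string to SpecVersion without an `as` cast.
function isSpecVersion(value: string): value is SpecVersion {
  return (ALL_SPEC_VERSIONS as readonly string[]).includes(value);
}

// Returns an Error instead of exiting, leaving the exit decision to the caller.
function parseSpecVersion(value: string): SpecVersion | Error {
  if (isSpecVersion(value)) return value;
  return new Error(
    `Unknown spec version: ${value}\nValid versions: ${ALL_SPEC_VERSIONS.join(', ')}`
  );
}

// Mirrors the `list` output: the tag is appended only when a version is known.
function formatListLine(name: string, versions: string | undefined): string {
  return ` - ${name}${versions ? ` [${versions}]` : ''}`;
}
```

A CLI action could then print the error message and exit only when `parseSpecVersion` returns an `Error`, keeping the validation itself testable.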

3 changes: 2 additions & 1 deletion src/scenarios/client/auth/basic-cimd.ts
@@ -1,5 +1,5 @@
import type { Scenario, ConformanceCheck } from '../../../types';
import { ScenarioUrls } from '../../../types';
import { ScenarioUrls, SpecVersion } from '../../../types';
import { createAuthServer } from './helpers/createAuthServer';
import { createServer } from './helpers/createServer';
import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -22,6 +22,7 @@ export const CIMD_CLIENT_METADATA_URL =
*/
export class AuthBasicCIMDScenario implements Scenario {
name = 'auth/basic-cimd';
specVersions: SpecVersion[] = ['2025-11-25'];
description =
'Tests OAuth flow with Client ID Metadata Documents (SEP-991/URL-based client IDs). Server advertises client_id_metadata_document_supported=true and client should use URL as client_id instead of DCR.';
private authServer = new ServerLifecycle();
9 changes: 8 additions & 1 deletion src/scenarios/client/auth/client-credentials.ts
@@ -1,6 +1,11 @@
import * as jose from 'jose';
import type { CryptoKey } from 'jose';
import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
import type {
Scenario,
ConformanceCheck,
ScenarioUrls,
SpecVersion
} from '../../../types';
import { createAuthServer } from './helpers/createAuthServer';
import { createServer } from './helpers/createServer';
import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -32,6 +37,7 @@ async function generateTestKeypair(): Promise<{
*/
export class ClientCredentialsJwtScenario implements Scenario {
name = 'auth/client-credentials-jwt';
specVersions: SpecVersion[] = ['extension'];
description =
'Tests OAuth client_credentials flow with private_key_jwt authentication (SEP-1046)';

@@ -250,6 +256,7 @@ export class ClientCredentialsJwtScenario implements Scenario {
*/
export class ClientCredentialsBasicScenario implements Scenario {
name = 'auth/client-credentials-basic';
specVersions: SpecVersion[] = ['extension'];
description =
'Tests OAuth client_credentials flow with client_secret_basic authentication';

8 changes: 7 additions & 1 deletion src/scenarios/client/auth/cross-app-access.ts
@@ -1,7 +1,12 @@
import * as jose from 'jose';
import type { CryptoKey } from 'jose';
import express, { type Request, type Response } from 'express';
import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
import type {
Scenario,
ConformanceCheck,
ScenarioUrls,
SpecVersion
} from '../../../types';
import { createAuthServer } from './helpers/createAuthServer';
import { createServer } from './helpers/createServer';
import { MockTokenVerifier } from './helpers/mockTokenVerifier';
@@ -55,6 +60,7 @@ async function createIdpIdToken(
*/
export class CrossAppAccessCompleteFlowScenario implements Scenario {
name = 'auth/cross-app-access-complete-flow';
specVersions: SpecVersion[] = ['extension'];
description =
'Tests complete SEP-990 flow: token exchange + JWT bearer grant (Enterprise Managed OAuth)';

1 change: 1 addition & 0 deletions src/scenarios/client/auth/discovery-metadata.ts
@@ -87,6 +87,7 @@ function createMetadataScenario(config: MetadataScenarioConfig): Scenario {

return {
name: `auth/${config.name}`,
specVersions: ['2025-11-25'],
description: `Tests Basic OAuth metadata discovery flow.

**PRM:** ${config.prmLocation}${config.inWwwAuth ? '' : ' (not in WWW-Authenticate)'}
4 changes: 3 additions & 1 deletion src/scenarios/client/auth/march-spec-backcompat.ts
@@ -1,5 +1,5 @@
import type { Scenario, ConformanceCheck } from '../../../types';
import { ScenarioUrls } from '../../../types';
import { ScenarioUrls, SpecVersion } from '../../../types';
import { createAuthServer } from './helpers/createAuthServer';
import { createServer } from './helpers/createServer';
import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -8,6 +8,7 @@ import { SpecReferences } from './spec-references';

export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {
name = 'auth/2025-03-26-oauth-metadata-backcompat';
specVersions: SpecVersion[] = ['2025-03-26'];
description =
'Tests 2025-03-26 spec OAuth flow: no PRM (Protected Resource Metadata), OAuth metadata at root location';
private server = new ServerLifecycle();
@@ -68,6 +69,7 @@ export class Auth20250326OAuthMetadataBackcompatScenario implements Scenario {

export class Auth20250326OEndpointFallbackScenario implements Scenario {
name = 'auth/2025-03-26-oauth-endpoint-fallback';
specVersions: SpecVersion[] = ['2025-03-26'];
description =
'Tests OAuth flow with no metadata endpoints, relying on fallback to standard OAuth endpoints at server root (2025-03-26 spec behavior)';
private server = new ServerLifecycle();
8 changes: 7 additions & 1 deletion src/scenarios/client/auth/pre-registration.ts
@@ -1,4 +1,9 @@
import type { Scenario, ConformanceCheck, ScenarioUrls } from '../../../types';
import type {
Scenario,
ConformanceCheck,
ScenarioUrls,
SpecVersion
} from '../../../types';
import { createAuthServer } from './helpers/createAuthServer';
import { createServer } from './helpers/createServer';
import { ServerLifecycle } from './helpers/serverLifecycle';
@@ -19,6 +24,7 @@ const PRE_REGISTERED_CLIENT_SECRET = 'pre-registered-secret';
*/
export class PreRegistrationScenario implements Scenario {
name = 'auth/pre-registration';
specVersions: SpecVersion[] = ['2025-11-25'];
description =
'Tests OAuth flow with pre-registered client credentials. Server does not support DCR.';
