Skip to content

Comments

fix(mcp-common): classify upstream 4xx errors correctly instead of returning 500#294

Draft
irvinebroque wants to merge 7 commits intomainfrom
fix/5xx-error-classification
Draft

fix(mcp-common): classify upstream 4xx errors correctly instead of returning 500#294
irvinebroque wants to merge 7 commits intomainfrom
fix/5xx-error-classification

Conversation

@irvinebroque
Copy link
Collaborator

Summary

Several error paths in mcp-common unconditionally throw McpError with status 500 or plain Error (which catch blocks convert to 500), regardless of the actual upstream HTTP status. This makes it impossible for clients to distinguish between retriable server errors and actionable client errors like expired tokens or invalid grants.

This PR fixes error classification across 4 files in packages/mcp-common/src/ so that upstream status codes are preserved and Sentry noise from expected client errors is eliminated.

Before/After behavior

refreshAuthToken / getAuthToken (cloudflare-auth.ts)

Scenario Upstream status Before After
Expired/revoked refresh token 400 from dash.cloudflare.com 500 "Failed to get OAuth token", reportToSentry=true 400 with upstream error_description, reportToSentry=false
Bad client credentials 401 from dash.cloudflare.com 500 "Failed to get OAuth token", reportToSentry=true 401 with upstream error_description, reportToSentry=false
Insufficient permissions 403 from dash.cloudflare.com 500 "Failed to get OAuth token", reportToSentry=true 403 with upstream error_description, reportToSentry=false
Rate limited 429 from dash.cloudflare.com 500 "Failed to get OAuth token", reportToSentry=true 429 with upstream error_description, reportToSentry=false
Upstream server error 500+ from dash.cloudflare.com 500 "Failed to get OAuth token", reportToSentry=true 502 "Upstream token service unavailable", reportToSentry=true
Non-JSON error body 400 with plain text 500 "Failed to get OAuth token" 400 "Token refresh failed" (fallback message)

handleTokenExchangeCallback (cloudflare-oauth-handler.ts)

Scenario Before After
Account token refresh attempt 500 "Internal Server Error" 400 "Account tokens cannot be refreshed", reportToSentry=false
Missing refresh token in grant props 500 "Missing refreshToken" 400 "No refresh token available for this grant", reportToSentry=false

getUserAndAccounts (cloudflare-oauth-handler.ts)

Scenario Upstream status Before After
Token revoked, user fetch fails 401 from api.cloudflare.com 500 "Failed to fetch user", reportToSentry=true 401 "Failed to fetch user", reportToSentry=false
Upstream server error on /user 500 from api.cloudflare.com 500 "Failed to fetch user", reportToSentry=true 502 "Upstream user service unavailable", reportToSentry=true
Upstream server error on /accounts 500 from api.cloudflare.com 500 "Failed to fetch accounts", reportToSentry=true 502 "Upstream accounts service unavailable", reportToSentry=true

fetchCloudflareApi (cloudflare-api.ts)

Scenario Upstream status Before After
Resource not found 404 Generic Error (no status code) McpError with status 404, reportToSentry=false
Forbidden 403 Generic Error (no status code) McpError with status 403, reportToSentry=false
Rate limited 429 Generic Error (no status code) McpError with status 429, reportToSentry=false
Upstream server error 500+ Generic Error (no status code) McpError with status preserved, reportToSentry=true

OAuth state validation (workers-oauth-utils.ts)

Scenario Before After
Missing state parameter throw new Error(...) → caught as 500 throw new OAuthError('invalid_request', ..., 400) → caught and returned as JSON
Failed to decode state throw new Error(...) → 500 OAuthError('invalid_request', ..., 400)
Invalid/expired state throw new Error(...) → 500 OAuthError('invalid_request', ..., 400)
Missing session cookie throw new Error(...) → 500 OAuthError('invalid_request', ..., 400)
CSRF state mismatch throw new Error(...) → 500 OAuthError('access_denied', ..., 403)
PKCE violation throw new Error(...) → 500 OAuthError('invalid_request', ..., 400)
Non-POST to approval endpoint throw new Error(...) → 500 OAuthError('invalid_request', ..., 405)
CSRF token mismatch in form throw new Error(...) → 500 OAuthError('access_denied', ..., 403)
Malformed state encoding JSON.parse(atob(...)) → unhandled → 500 OAuthError('invalid_request', ..., 400)

What this does NOT change

  • No breaking changes. McpError extends Error, OAuthError extends Error. All function signatures and return types are identical. Existing catch blocks using instanceof Error continue to work.
  • No changes to the @cloudflare/workers-oauth-provider package. The audit identified a missing error boundary in OAuthProvider.handleRefreshTokenGrant() — that requires an upstream fix to the provider package and is out of scope here.
  • No changes to McpError or OAuthError class definitions.

Test coverage

Added 75 tests across 5 new spec files:

  • cloudflare-auth.spec.ts (14 tests) — every upstream status code path for both getAuthToken and refreshAuthToken
  • cloudflare-oauth-handler.spec.ts (6 tests) — account token, missing refresh token, successful refresh, upstream error propagation, non-refresh grant types
  • cloudflare-api.spec.ts (7 tests) — 404, 403, 429, 500, 502 from Cloudflare API with correct status and Sentry flag
  • workers-oauth-utils.spec.ts (16 tests) — every OAuthError throw path in parseRedirectApproval and validateOAuthState
  • sentry.spec.ts (7 tests) — reportToSentry flag correctness for each error classification

All tests use fetchMock from cloudflare:test for outbound HTTP mocking. Type checking passes clean.

…turning 500

Upstream OAuth token errors (expired grants, bad credentials, rate limits)
from dash.cloudflare.com were all being reclassified as 500, making it
impossible for clients to distinguish retriable errors from auth failures.

- Pass upstream HTTP status through in getAuthToken/refreshAuthToken
  (4xx passed through, 5xx becomes 502 bad gateway)
- Return 400 instead of 500 for account token refresh and missing
  refresh token in handleTokenExchangeCallback
- Use OAuthError instead of plain Error in workers-oauth-utils.ts so
  catch blocks in createAuthHandlers handle them properly
- Preserve upstream status in fetchCloudflareApi using McpError
- Set reportToSentry=false for upstream 4xx to reduce Sentry noise
- Guard JSON.parse(atob(...)) in parseRedirectApproval
…ry on all errors

- fetchCloudflareApi now returns 502 Bad Gateway for upstream 5xx,
  consistent with cloudflare-auth.ts (was passing through raw 500/502/503)
- All McpError throws now set reportToSentry: true per policy
- Updated tests to match
…, parse ordering

- Set reportToSentry=false for 4xx errors (was true everywhere, contradicting
  PR intent to reduce Sentry noise from expected client errors)
- Stop forwarding raw upstream error bodies to clients in fetchCloudflareApi;
  use generic client-facing message, preserve raw detail in internalMessage
- Map OAuth error codes to safe messages in cloudflare-auth instead of
  forwarding raw error_description from upstream token endpoint
- Remove security mechanism names (CSRF, PKCE) from OAuthError descriptions
  returned to clients; use generic messages
- Reorder getUserAndAccounts to check response.ok before Zod parsing,
  preventing ZodError from masking the new status-aware error handling
- Add safeStatusCode() helper to clamp non-standard HTTP status codes to
  valid ContentfulStatusCode values instead of unsafe 'as' casts
- Rewrite sentry.spec.ts to test actual production code paths instead of
  tautologically constructing McpError objects with hardcoded flags
- Update all test assertions to match corrected behavior
All OAuth endpoint error responses now return JSON with 'error' and
'error_description' fields per draft-ietf-oauth-v2-1-13 Section 3.2.4,
instead of plain text responses.

- Add mcpErrorToOAuthResponse helper mapping HTTP status to OAuth codes
- Fix all 3 route handler catch blocks (authorize, authorize POST, callback)
- Convert handleTokenExchangeCallback to throw OAuthError (invalid_grant)
- Wrap refreshAuthToken call to convert McpError to OAuthError at boundary
- Update tests to assert OAuthError with correct error codes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant