feat(claude-agent-sdk): add instrumentation for @anthropic-ai/claude-agent-sdk by mr-lee · Pull Request #7603 · DataDog/dd-trace-js

mr-lee · 2026-02-23T19:41:51Z

What does this PR do?

Adds automatic instrumentation for the Claude Agent SDK (@anthropic-ai/claude-agent-sdk), providing visibility into the orchestration layer of agentic sessions built with Anthropic's agent framework (sessions, turns, tool calls, and subagents).

The integration automatically captures a hierarchical span tree for every query() invocation:

agent span (session)
  └─ workflow span (turn)
       ├─ tool span (tool call)
       └─ agent span (subagent)
            └─ (same structure, 1 level deep)

Span types and what they capture:

Span	Kind	Captures
Session	`agent`	Full agentic session lifecycle, model, permission mode, session ID
Turn	`workflow`	Each model turn (user prompt → assistant response), stop reason
Tool	`tool`	Individual tool calls with input/output
Subagent	`agent`	Nested agent invocations with agent ID and type

Session resumption is supported — when a session is resumed via options.resume, a new agent span is created with a parent_session_id tag linking to the original session for lineage tracing.
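
As a rough illustration of the resumption tagging described above, a session span's tags might be assembled along these lines. Only `parent_session_id` and `options.resume` come from this description; the helper name `buildSessionTags` and the other tag keys are hypothetical and are not taken from the PR's code.

```javascript
// Hypothetical helper: collect tags for a session (agent) span.
// Only `parent_session_id` and `options.resume` are named in this PR;
// the remaining keys are illustrative assumptions.
function buildSessionTags (sessionId, options = {}) {
  const tags = { session_id: sessionId }
  if (options.model) tags.model = options.model
  if (options.permissionMode) tags.permission_mode = options.permissionMode
  // A resumed session gets a new span that points back at the original session.
  if (options.resume) tags.parent_session_id = options.resume
  return tags
}

const fresh = buildSessionTags('s-2', { model: 'example-model' })
const resumed = buildSessionTags('s-3', { resume: 's-2' })
console.log(fresh, resumed)
```

A fresh session carries no `parent_session_id`, so lineage queries can treat its absence as "root of the chain".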

How it works

The Claude Agent SDK exposes a first-class hooks API for lifecycle events (SessionStart, SessionEnd, Stop, PreToolUse, PostToolUse, SubagentStart, SubagentStop, etc.). Rather than monkey-patching internal SDK methods, this integration wraps query() and injects tracing hooks via the official API. The hooks publish events on diagnostics channels, which the tracing and LLM Obs plugin layers consume to create and enrich spans.

This follows the multi-plugin pattern established by the LangChain integration — multiple channel prefixes, multiple TracingPlugin/LLMObsPlugin subclasses sharing a base, wired together via a CompositePlugin.
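
The hook-injection idea can be sketched in a few lines. This is a simplified illustration, not the PR's implementation: the channel names are invented for the sketch, `fakeQuery` stands in for the SDK's `query()`, and the hook matcher shape (`{ hooks: [callback] }`) and input fields (`tool_name`, `tool_input`, `tool_response`) are assumptions about the SDK's hooks API.

```javascript
const dc = require('node:diagnostics_channel')

// Hypothetical channel names, chosen for this sketch only.
const TOOL_START = 'sketch:claude-agent-sdk:tool:start'
const TOOL_END = 'sketch:claude-agent-sdk:tool:end'

// Build hook matchers whose callbacks only publish on diagnostics channels.
function buildTracerHooks () {
  const startCh = dc.channel(TOOL_START)
  const endCh = dc.channel(TOOL_END)
  return {
    PreToolUse: [{
      hooks: [(input) => {
        startCh.publish({ toolName: input.tool_name, toolInput: input.tool_input })
        return {} // empty result: leave the SDK's behaviour untouched
      }]
    }],
    PostToolUse: [{
      hooks: [(input) => {
        endCh.publish({ toolName: input.tool_name, toolOutput: input.tool_response })
        return {}
      }]
    }]
  }
}

// Wrap a query-like function so every call carries the tracer hooks.
function wrapQuery (query) {
  return function wrappedQuery (args = {}) {
    const options = args.options ?? {}
    const hooks = { ...options.hooks }
    const tracerHooks = buildTracerHooks()
    for (const event of Object.keys(tracerHooks)) {
      hooks[event] = [...(hooks[event] ?? []), ...tracerHooks[event]]
    }
    return query({ ...args, options: { ...options, hooks } })
  }
}

// Stand-in for the SDK's query(): it just fires the hooks it was given.
function fakeQuery ({ options }) {
  const input = { tool_name: 'Read', tool_input: { file_path: 'a.txt' }, tool_response: 'contents' }
  for (const event of ['PreToolUse', 'PostToolUse']) {
    for (const matcher of options.hooks[event]) {
      for (const hook of matcher.hooks) hook(input)
    }
  }
  return 'done'
}

const events = []
dc.subscribe(TOOL_START, (msg) => events.push(['start', msg.toolName]))
dc.subscribe(TOOL_END, (msg) => events.push(['end', msg.toolName]))
const result = wrapQuery(fakeQuery)({ prompt: 'hi' })
console.log(result, events)
```

The wrapper never touches SDK internals; everything it needs arrives through the `hooks` option, which is what makes the approach independent of how the SDK is bundled.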

Architecture

Layer 1: Shimmer (claude-agent-sdk.js)
  └─ Wraps query(), injects hook callbacks that publish on 4 diagnostics channels:
     session, turn, tool, subagent

Layer 2: Tracing Plugins (tracing.js)
  └─ 4 TracingPlugin subclasses, one per channel prefix
     Create/finish APM spans

Layer 3: LLM Obs Plugins (claude-agent-sdk/index.js)
  └─ 4 LLMObsPlugin subclasses, one per channel prefix
     Enrich spans with kind, IO, and metadata tags

Layer 4: CompositePlugin (index.js)
  └─ Wires all 8 sub-plugins under '@anthropic-ai/claude-agent-sdk'
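
To make the consumer side of the layering concrete, here is a minimal stand-in for Layers 2 and 3: one subscriber pair per channel family, each mapping to the span kind from the table above. It does not use dd-trace's `TracingPlugin` or `LLMObsPlugin` classes; the channel prefix, the `createRecorder` name, and the strictly nested stack model are all assumptions made for the sketch.

```javascript
const dc = require('node:diagnostics_channel')

// Span kind per channel family, as listed in the span table of this PR.
const KINDS = { session: 'agent', turn: 'workflow', tool: 'tool', subagent: 'agent' }
const PREFIX = 'sketch:arch' // hypothetical channel prefix

function createRecorder () {
  const open = [] // stack of in-flight spans; the top entry is the current parent
  const finished = [] // spans in the order they were closed
  for (const [family, kind] of Object.entries(KINDS)) {
    dc.subscribe(`${PREFIX}:${family}:start`, (msg) => {
      const parent = open[open.length - 1]
      open.push({ family, kind, parent: parent ? parent.family : null, meta: { ...msg } })
    })
    dc.subscribe(`${PREFIX}:${family}:end`, () => {
      finished.push(open.pop())
    })
  }
  return finished
}

const finished = createRecorder()
const publish = (name, msg = {}) => dc.channel(`${PREFIX}:${name}`).publish(msg)

// Simulate one session with a single turn and a single tool call.
publish('session:start', { sessionId: 's-1' })
publish('turn:start')
publish('tool:start', { toolName: 'Bash' })
publish('tool:end')
publish('turn:end')
publish('session:end')

console.log(finished.map((s) => `${s.kind} (parent: ${s.parent})`))
```

A real tracer tracks parents through its own context propagation rather than a single stack, so this only shows how four channel families can map onto one span tree.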

Motivation

The Claude Agent SDK is GA and growing in adoption for building agentic applications with Claude. It provides a TypeScript API for running multi-turn agent sessions with tool use, subagent orchestration, and session resumption.

dd-trace-js already has an Anthropic integration for @anthropic-ai/sdk (base Messages API), but that only captures individual LLM calls — it has no awareness of the orchestration layer above. This integration fills that gap, giving users visibility into:

- How their agent sessions decompose into turns, tool calls, and subagent dispatches
- Which tools are called, with what inputs, and what they return
- How subagents contribute to the parent session
- Session resumption lineage across multiple query() invocations

This complements the existing @anthropic-ai/sdk integration (which captures per-API-call llm spans) by adding the agent → workflow → tool hierarchy above it.

Performance

A sirun benchmark (benchmark/sirun/plugin-claude-agent-sdk/) measures the full instrumentation overhead. Each simulated session has 3 turns with 2 tool calls each — 10 spans and 22 hook invocations per session.

Metric	Value
Overhead per session	~44 µs
Overhead per span	~4.4 µs
Overhead vs 30s real session	0.000147%

The overhead is entirely synchronous — no async/await, no promises, no functional array methods in production code. Hook callbacks do property assignments and diagnostics channel publishes. mergeHooks and buildTracerHooks run once per query() call, not per-event.

A real query() invocation makes API calls to Claude and executes tool operations (file reads, bash commands, etc.), typically taking 30 seconds to 10+ minutes. The ~44µs of instrumentation overhead is negligible.
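
The figures in the table can be re-derived from the numbers quoted above. The span count assumes the 10 spans are 1 session, 3 turns, and 6 tool calls, which is one reading of the benchmark description rather than something stated explicitly.

```javascript
const overheadPerSessionUs = 44 // measured overhead per simulated session, in microseconds
const spansPerSession = 1 + 3 + 3 * 2 // assumed breakdown: 1 session + 3 turns + 6 tool calls

const perSpanUs = overheadPerSessionUs / spansPerSession
const realSessionUs = 30 * 1e6 // a 30 s real session, in microseconds
const overheadPct = (overheadPerSessionUs / realSessionUs) * 100

console.log(`${perSpanUs} µs per span, ${overheadPct.toFixed(6)}% of a 30 s session`)
```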

Tests

- 41 APM tracing tests: shimmer unit tests (mergeHooks, buildTracerHooks, wrapQuery), channel-based span tests (all 4 span types + hierarchy + error handling), wrapQuery integration tests with active subscribers
- 10 LLM Obs tests: span kind, IO tagging, and metadata for session, turn, tool, subagent
- Lint clean: yarn lint passes with zero warnings

Additional Notes

- Pure ESM package: @anthropic-ai/claude-agent-sdk is a single minified ESM bundle (sdk.mjs). The shimmer relies on import-in-the-middle for ESM interop.
- No internal monkey-patching: Unlike most integrations, this one uses the SDK's official hooks API to inject tracing callbacks. This makes it resilient to internal SDK refactors.
- User hooks are preserved: The shimmer merges tracer hooks alongside any user-provided hooks; it never overwrites or interferes with user callbacks.
- Known gap, no LLM-level spans (yet): The Agent SDK ships as a single minified ESM bundle (sdk.mjs) with zero runtime dependencies. Its internal Anthropic API client is bundled: it never require()s or imports @anthropic-ai/sdk, so the existing Anthropic shimmer doesn't fire. This means per-API-call llm spans (token usage, cache metrics, model parameters) are not captured in this PR. What is captured: session lifecycle, turn decomposition, full tool I/O, and subagent orchestration, which covers the orchestration layer the existing integration can't see. Options for closing the LLM span gap in a follow-up:
  1. undici:* diagnostics channels: intercept outbound HTTP requests to api.anthropic.com at the transport level. SDK-agnostic, works regardless of bundling, but requires parsing raw request/response bodies.
  2. Upstream hooks: if Anthropic adds diagnostics_channel or telemetry hooks inside the Agent SDK, the integration becomes straightforward. Could be a feature request.
  3. Bundle-internal shimming: the minified sdk.mjs has an internal HTTP client that could be patched at load time. Fragile and version-dependent, but feasible for targeted extraction (e.g., token counts from response headers).
- semver-minor: This is a purely additive new integration with no impact on existing functionality.
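
The "user hooks are preserved" note can be pictured with a small merge function. This is a sketch under assumptions, not the PR's `mergeHooks`: it assumes hooks are keyed by event name with an array of matchers per event, and the choice to run user matchers before tracer matchers is arbitrary here.

```javascript
// Sketch of a non-destructive merge: the user's object is never mutated,
// and every user matcher survives next to the tracer's.
function mergeHooks (userHooks = {}, tracerHooks = {}) {
  const merged = {}
  const events = new Set([...Object.keys(userHooks), ...Object.keys(tracerHooks)])
  for (const event of events) {
    merged[event] = [...(userHooks[event] ?? []), ...(tracerHooks[event] ?? [])]
  }
  return merged
}

// Hypothetical matchers, identified by name so the result is easy to read.
const userHooks = { PreToolUse: [{ name: 'user-audit' }], Stop: [{ name: 'user-stop' }] }
const tracerHooks = { PreToolUse: [{ name: 'tracer-tool' }], SessionEnd: [{ name: 'tracer-end' }] }

const merged = mergeHooks(userHooks, tracerHooks)
console.log(Object.keys(merged), merged.PreToolUse.map((m) => m.name))
```

Building fresh arrays per event means a caller that reuses the same options object across several `query()` calls does not accumulate duplicate tracer hooks.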

Files created

File	Purpose
`packages/datadog-instrumentations/src/claude-agent-sdk.js`	Shimmer: wrap `query()`, inject hooks, publish channels
`packages/datadog-plugin-claude-agent-sdk/src/index.js`	CompositePlugin
`packages/datadog-plugin-claude-agent-sdk/src/tracing.js`	4 TracingPlugin subclasses
`packages/dd-trace/src/llmobs/plugins/claude-agent-sdk/index.js`	4 LLMObsPlugin subclasses
`packages/datadog-plugin-claude-agent-sdk/test/index.spec.js`	APM tracing tests (41 tests)
`packages/dd-trace/test/llmobs/plugins/claude-agent-sdk/index.spec.js`	LLM Obs tests (10 tests)
`benchmark/sirun/plugin-claude-agent-sdk/`	Sirun benchmark (overhead profiling)

Files modified

File	Change
`packages/dd-trace/src/plugins/index.js`	Added lazy getter for `@anthropic-ai/claude-agent-sdk`
`packages/datadog-instrumentations/src/helpers/hooks.js`	Hook registration with `esmFirst: true`
`packages/dd-trace/src/config/supported-configurations.json`	`DD_TRACE_CLAUDE_AGENT_SDK_ENABLED`
`index.d.ts`	TypeScript definitions for plugin config
`docs/test.ts`	Type test for the new plugin
`docs/API.md`	Documentation entry
`.github/workflows/llmobs.yml`	CI job for plugin tests

Follow-up PRs

Item	Notes
LLM-level spans via `undici:*` channels	Capture per-turn token usage, cache metrics, model parameters
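
As a starting point for that follow-up, a transport-level subscriber could look roughly like the sketch below. The `undici:request:create` channel and the `request.origin`, `request.method`, and `request.path` fields are part of undici's published diagnostics channels; the filtering logic, the `seen` array, and the fake messages published at the end (used here instead of real network traffic) are illustrative assumptions. Extracting token usage would additionally require reading response bodies, which this sketch does not attempt.

```javascript
const dc = require('node:diagnostics_channel')

const ANTHROPIC_HOST = 'api.anthropic.com'
const seen = []

// Record only requests headed for the Anthropic API.
function onRequestCreate ({ request }) {
  // `origin` may be a string or a URL object, so normalise before comparing.
  const host = new URL(String(request.origin)).host
  if (host !== ANTHROPIC_HOST) return
  seen.push({ method: request.method, path: request.path })
}

dc.subscribe('undici:request:create', onRequestCreate)

// Simulated messages in the shape undici publishes, so the sketch runs offline.
const createCh = dc.channel('undici:request:create')
createCh.publish({ request: { origin: 'https://api.anthropic.com', method: 'POST', path: '/v1/messages' } })
createCh.publish({ request: { origin: 'https://example.com', method: 'GET', path: '/' } })

console.log(seen)
```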

codecov · 2026-02-23T23:34:30Z

Codecov Report

❌ Patch coverage is 98.16514% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.37%. Comparing base (f370fb3) to head (e6a51ab).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
...s/datadog-instrumentations/src/claude-agent-sdk.js	97.91%	2 Missing ⚠️
...ages/datadog-instrumentations/src/helpers/hooks.js	0.00%	1 Missing ⚠️
...ges/datadog-plugin-claude-agent-sdk/src/tracing.js	97.72%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7603      +/-   ##
==========================================
+ Coverage   80.30%   80.37%   +0.07%     
==========================================
  Files         733      737       +4     
  Lines       31561    31783     +222     
==========================================
+ Hits        25345    25546     +201     
- Misses       6216     6237      +21

Flag	Coverage Δ
aiguard-macos	`38.95% <0.00%> (-0.09%)`	⬇️
aiguard-ubuntu	`39.07% <0.00%> (-0.09%)`	⬇️
aiguard-windows	`38.80% <0.00%> (-0.09%)`	⬇️
apm-capabilities-tracing-macos	`48.52% <31.14%> (-0.09%)`	⬇️
apm-capabilities-tracing-ubuntu	`48.56% <31.14%> (-0.08%)`	⬇️
apm-capabilities-tracing-windows	`48.26% <31.14%> (-0.08%)`	⬇️
apm-integrations-child-process	`38.53% <0.00%> (-0.09%)`	⬇️
apm-integrations-couchbase-18	`37.44% <0.00%> (-0.08%)`	⬇️
apm-integrations-couchbase-eol	`?`
apm-integrations-oracledb	`37.75% <0.00%> (-0.08%)`	⬇️
appsec-express	`55.53% <0.00%> (-0.07%)`	⬇️
appsec-fastify	`51.85% <0.00%> (-0.07%)`	⬇️
appsec-graphql	`52.04% <0.00%> (-0.07%)`	⬇️
appsec-kafka	`44.47% <0.00%> (-0.13%)`	⬇️
appsec-ldapjs	`44.10% <0.00%> (-0.07%)`	⬇️
appsec-lodash	`43.79% <0.00%> (-0.07%)`	⬇️
appsec-macos	`58.61% <0.00%> (-0.07%)`	⬇️
appsec-mongodb-core	`48.96% <0.00%> (+0.04%)`	⬆️
appsec-mongoose	`49.65% <0.00%> (-0.07%)`	⬇️
appsec-mysql	`51.02% <0.00%> (-0.07%)`	⬇️
appsec-node-serialize	`43.30% <0.00%> (-0.07%)`	⬇️
appsec-passport	`47.79% <0.00%> (-0.08%)`	⬇️
appsec-postgres	`50.76% <0.00%> (-0.09%)`	⬇️
appsec-sourcing	`42.65% <0.00%> (-0.07%)`	⬇️
appsec-template	`43.47% <0.00%> (-0.07%)`	⬇️
appsec-ubuntu	`58.69% <0.00%> (-0.07%)`	⬇️
appsec-windows	`58.45% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-bluebird	`32.23% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-body-parser	`40.53% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-child_process	`37.84% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-cookie-parser	`34.26% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-express	`34.60% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-express-mongo-sanitize	`34.40% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-express-session	`40.15% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-fs	`31.83% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-generic-pool	`29.82% <0.00%> (+0.06%)`	⬆️
instrumentations-instrumentation-http	`39.87% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-knex	`32.23% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-mongoose	`33.39% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-multer	`40.27% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-mysql2	`38.31% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-passport	`44.10% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-passport-http	`43.77% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-passport-local	`44.32% <0.00%> (-0.08%)`	⬇️
instrumentations-instrumentation-pg	`37.73% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-promise	`32.15% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-promise-js	`32.16% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-q	`32.21% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-url	`32.13% <0.00%> (-0.09%)`	⬇️
instrumentations-instrumentation-when	`32.18% <0.00%> (-0.09%)`	⬇️
llmobs-ai	`41.35% <0.00%> (-0.08%)`	⬇️
llmobs-anthropic	`40.34% <0.00%> (-0.08%)`	⬇️
llmobs-bedrock	`39.27% <0.00%> (-0.07%)`	⬇️
llmobs-claude-agent-sdk	`40.62% <98.16%> (?)`
llmobs-google-genai	`39.86% <0.00%> (-0.08%)`	⬇️
llmobs-langchain	`39.45% <0.00%> (-0.07%)`	⬇️
llmobs-openai	`44.15% <0.00%> (-0.08%)`	⬇️
llmobs-vertex-ai	`40.13% <0.00%> (-0.08%)`	⬇️
platform-core	`29.71% <ø> (ø)`
platform-esbuild	`32.89% <ø> (ø)`
platform-instrumentations-misc	`40.53% <ø> (ø)`
platform-shimmer	`36.14% <ø> (ø)`
platform-unit-guardrails	`31.27% <ø> (ø)`
plugins-azure-event-hubs	`24.02% <ø> (ø)`
plugins-azure-service-bus	`23.42% <ø> (ø)`
plugins-bullmq	`43.70% <0.00%> (-0.16%)`	⬇️
plugins-cassandra	`37.79% <0.00%> (-0.08%)`	⬇️
plugins-cookie	`25.08% <ø> (ø)`
plugins-cookie-parser	`24.87% <ø> (ø)`
plugins-crypto	`24.72% <ø> (ø)`
plugins-dd-trace-api	`38.38% <0.00%> (-0.09%)`	⬇️
plugins-express-mongo-sanitize	`25.04% <ø> (ø)`
plugins-express-session	`24.83% <ø> (ø)`
plugins-fastify	`42.29% <0.00%> (-0.08%)`	⬇️
plugins-fetch	`38.34% <0.00%> (-0.08%)`	⬇️
plugins-fs	`38.63% <0.00%> (-0.09%)`	⬇️
plugins-generic-pool	`24.06% <ø> (ø)`
plugins-google-cloud-pubsub	`45.47% <0.00%> (-0.08%)`	⬇️
plugins-grpc	`40.99% <0.00%> (-0.08%)`	⬇️
plugins-handlebars	`25.08% <ø> (ø)`
plugins-hapi	`40.16% <0.00%> (-0.23%)`	⬇️
plugins-hono	`40.43% <0.00%> (-0.08%)`	⬇️
plugins-ioredis	`38.44% <0.00%> (-0.09%)`	⬇️
plugins-knex	`24.80% <ø> (ø)`
plugins-ldapjs	`22.61% <ø> (ø)`
plugins-light-my-request	`24.48% <ø> (ø)`
plugins-limitd-client	`32.52% <0.00%> (-0.09%)`	⬇️
plugins-lodash	`24.13% <ø> (ø)`
plugins-mariadb	`39.51% <0.00%> (-0.09%)`	⬇️
plugins-memcached	`38.17% <0.00%> (-0.09%)`	⬇️
plugins-microgateway-core	`39.19% <0.00%> (-0.08%)`	⬇️
plugins-moleculer	`40.55% <0.00%> (-0.08%)`	⬇️
plugins-mongodb	`39.22% <0.00%> (-0.08%)`	⬇️
plugins-mongodb-core	`39.05% <0.00%> (-0.09%)`	⬇️
plugins-mongoose	`38.87% <0.00%> (-0.08%)`	⬇️
plugins-multer	`24.83% <ø> (ø)`
plugins-mysql	`39.19% <0.00%> (-0.09%)`	⬇️
plugins-mysql2	`39.29% <0.00%> (-0.09%)`	⬇️
plugins-node-serialize	`25.12% <ø> (ø)`
plugins-opensearch	`37.62% <0.00%> (-0.08%)`	⬇️
plugins-passport-http	`24.91% <ø> (ø)`
plugins-postgres	`35.71% <0.00%> (-0.07%)`	⬇️
plugins-process	`24.72% <ø> (ø)`
plugins-pug	`25.08% <ø> (ø)`
plugins-redis	`38.91% <0.00%> (-0.09%)`	⬇️
plugins-router	`43.04% <0.00%> (-0.09%)`	⬇️
plugins-sequelize	`23.66% <ø> (ø)`
plugins-test-and-upstream-amqp10	`38.36% <0.00%> (-0.24%)`	⬇️
plugins-test-and-upstream-amqplib	`43.92% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-apollo	`39.05% <0.00%> (-0.08%)`	⬇️
plugins-test-and-upstream-avsc	`38.72% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-bunyan	`33.82% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-connect	`40.83% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-graphql	`40.17% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-koa	`40.41% <0.00%> (-0.08%)`	⬇️
plugins-test-and-upstream-protobufjs	`38.95% <0.00%> (-0.09%)`	⬇️
plugins-test-and-upstream-rhea	`44.12% <0.00%> (-0.09%)`	⬇️
plugins-undici	`39.13% <0.00%> (-0.08%)`	⬇️
plugins-url	`24.72% <ø> (ø)`
plugins-valkey	`38.09% <0.00%> (-0.09%)`	⬇️
plugins-vm	`24.72% <ø> (ø)`
plugins-winston	`34.02% <0.00%> (-0.08%)`	⬇️
plugins-ws	`41.93% <0.00%> (-0.09%)`	⬇️
profiling-macos	`39.87% <0.00%> (-0.09%)`	⬇️
profiling-ubuntu	`39.99% <0.00%> (-0.09%)`	⬇️
profiling-windows	`41.21% <0.00%> (-0.08%)`	⬇️
serverless-azure-functions-client	`23.75% <ø> (ø)`
serverless-azure-functions-eventhubs	`23.75% <ø> (ø)`
serverless-azure-functions-servicebus	`23.75% <ø> (ø)`


…agent-sdk Adds automatic instrumentation for the Claude Agent SDK, providing visibility into agentic sessions built with Anthropic's agent framework. The integration captures a hierarchical span tree for every query() invocation: agent (session) → workflow (turn) → tool (tool call) → agent (subagent). Session resumption is supported via parent_session_id tag linking. Uses the SDK's official hooks API (SessionStart, SessionEnd, Stop, PreToolUse, PostToolUse, SubagentStart, SubagentStop) rather than monkey-patching internals. Follows the multi-plugin pattern from LangChain: 4 diagnostics channel families, 4 TracingPlugin + 4 LLMObsPlugin subclasses, wired via CompositePlugin.

The Claude Agent SDK is pure ESM and cannot be CJS-required in the test harness. Without loading the actual SDK module, the plugin manager never receives the dd-trace:instrumentation:load event and the TracingPlugin channel subscriptions are never activated — causing all channel-based tests to time out waiting for traces that never arrive. Fix by publishing the load channel event directly after agent.load() to trigger plugin registration without needing the actual ESM SDK module.
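
The workaround amounts to publishing on the load channel by hand. In the sketch below the channel name comes from the commit message above, while the `{ name }` payload shape and the stand-in "plugin manager" subscriber are assumptions made for illustration; the real test relies on dd-trace's own plugin manager.

```javascript
const dc = require('node:diagnostics_channel')

const LOAD_CHANNEL = 'dd-trace:instrumentation:load'
const PLUGIN_NAME = '@anthropic-ai/claude-agent-sdk'

// Stand-in for the plugin manager: mark a plugin as enabled once its name is published.
const enabled = new Set()
dc.subscribe(LOAD_CHANNEL, ({ name }) => enabled.add(name))

// What the test setup does in place of importing the ESM-only SDK module.
dc.channel(LOAD_CHANNEL).publish({ name: PLUGIN_NAME })

console.log(enabled.has(PLUGIN_NAME))
```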

…rted-configurations.json Required for the plugin_manager's getEnabled() to validate the configuration key when the plugin is loaded via the load channel.

…st assertions The span processor only includes model_name and model_provider in span events for 'llm' and 'embedding' span kinds. Our spans are 'agent', 'workflow', and 'tool' — so these fields are never output. Remove them from all 13 test assertions to match actual span processor behavior.

…ndling The span processor always includes `input: {}` in span events, but the test assertion helper only deletes `actual.meta.output` (not input) before deepStrictEqual. This caused 5 tests with empty inputValue to fail because `input: {}` was present in actual but not in expected. Fix by:
- Removing 3 edge-case tests that relied on empty input (already covered by APM test suite)
- Adding tagTextIO call to subagent plugin using agent type/ID as input value, providing meaningful data for the span
- Updating subagent test assertions to match

Add 4 integration tests for wrapQuery when channel subscribers are active. These cover the 20 uncovered lines in claude-agent-sdk.js:
- Normal subscriber path with sync return
- Undefined options handling (resolvedOptions fallback)
- Sync error path (error + asyncEnd publish + rethrow)
- Async rejection path (thenable detection + rejection handler)
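
The paths listed in this commit can be pictured with a wrapper shaped like the one below. It is a sketch of the general technique, not the code under test: the channel prefix is invented, the context object is simplified, and what happens after a non-thenable return is left out.

```javascript
const dc = require('node:diagnostics_channel')

const PREFIX = 'sketch:session' // hypothetical channel prefix
const startCh = dc.channel(`${PREFIX}:start`)
const errorCh = dc.channel(`${PREFIX}:error`)
const asyncEndCh = dc.channel(`${PREFIX}:asyncEnd`)

function wrapQuery (query) {
  return function wrappedQuery (args) {
    // Fast path: nobody is listening, so add no work at all.
    if (!startCh.hasSubscribers) return query.call(this, args)

    // Undefined options fall back to an empty object.
    const ctx = { options: (args && args.options) || {} }
    startCh.publish(ctx)

    let result
    try {
      result = query.call(this, args)
    } catch (err) {
      // Sync error path: report, close the span, rethrow unchanged.
      ctx.error = err
      errorCh.publish(ctx)
      asyncEndCh.publish(ctx)
      throw err
    }

    // Thenable detection: observe settlement without changing what the caller gets.
    if (result && typeof result.then === 'function') {
      result.then(
        () => asyncEndCh.publish(ctx),
        (err) => {
          ctx.error = err
          errorCh.publish(ctx)
          asyncEndCh.publish(ctx)
        }
      )
    }
    return result
  }
}

const events = []
dc.subscribe(`${PREFIX}:start`, () => events.push('start'))
dc.subscribe(`${PREFIX}:error`, (ctx) => events.push(`error:${ctx.error.message}`))
dc.subscribe(`${PREFIX}:asyncEnd`, () => events.push('asyncEnd'))

const failing = wrapQuery(() => { throw new Error('boom') })
let rethrown = null
try {
  failing({ prompt: 'hi' })
} catch (err) {
  rethrown = err
}
console.log(events, rethrown.message)
```

The caller still receives the original return value or the original error, so instrumentation cannot change the SDK's observable behaviour.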

…head Simulates 500 agent sessions (3 turns x 2 tool calls each = 10 spans, 22 hook invocations per session). Measures the overhead of hook injection, diagnostics channel publishing, and span creation/finishing. Results: ~44µs per session, ~4.4µs per span — negligible vs real query() calls which take 30s-600s for API calls and tool execution.

mr-lee requested review from a team as code owners February 23, 2026 19:41

mr-lee force-pushed the feat/claude-agent-sdk-integration branch 3 times, most recently from ca06588 to 0b55aab Compare February 23, 2026 22:39

mr-lee force-pushed the feat/claude-agent-sdk-integration branch from 0b55aab to 6e28bea Compare February 24, 2026 01:58

mr-lee force-pushed the feat/claude-agent-sdk-integration branch from 6e28bea to ddaa3f1 Compare February 24, 2026 02:42

mr-lee added 2 commits February 23, 2026 22:05

fix(claude-agent-sdk): add DD_TRACE_CLAUDE_AGENT_SDK_ENABLED to suppo…

6f62ad6


mr-lee requested a review from a team as a code owner February 24, 2026 03:11

mr-lee requested review from ida613 and removed request for a team February 24, 2026 03:11

mr-lee added 5 commits February 23, 2026 22:21

style(claude-agent-sdk): fix padded-blocks lint errors in LLM Obs tests

327611a

mr-lee wants to merge 8 commits into DataDog:master from mr-lee:feat/claude-agent-sdk-integration
