feat(claude-agent-sdk): add instrumentation for @anthropic-ai/claude-agent-sdk #7603

Open
mr-lee wants to merge 8 commits into DataDog:master from mr-lee:feat/claude-agent-sdk-integration

mr-lee commented Feb 23, 2026

What does this PR do?

Adds automatic instrumentation for the Claude Agent SDK (@anthropic-ai/claude-agent-sdk), providing full visibility into agentic sessions built with Anthropic's agent framework.

The integration automatically captures a hierarchical span tree for every query() invocation:

agent span (session)
  └─ workflow span (turn)
       ├─ tool span (tool call)
       └─ agent span (subagent)
            └─ (same structure, 1 level deep)

Span types and what they capture:

| Span | Kind | Captures |
|---|---|---|
| Session | agent | Full agentic session lifecycle, model, permission mode, session ID |
| Turn | workflow | Each model turn (user prompt → assistant response), stop reason |
| Tool | tool | Individual tool calls with input/output |
| Subagent | agent | Nested agent invocations with agent ID and type |

Session resumption is supported — when a session is resumed via options.resume, a new agent span is created with a parent_session_id tag linking to the original session for lineage tracing.
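
As a rough illustration of the lineage tag, the sketch below derives session-span tags from the query() options. Only `options.resume` and the `parent_session_id` tag name come from this description; the `sessionTags` helper is hypothetical, and it assumes `options.resume` holds the ID of the session being resumed.

```javascript
'use strict'

// Hypothetical helper, not the PR's code: derive tags for the session span.
// Assumption: `options.resume` is the ID of the session being resumed.
function sessionTags (options = {}) {
  const tags = {}
  if (typeof options.resume === 'string' && options.resume !== '') {
    // Link the new agent span back to the original session.
    tags.parent_session_id = options.resume
  }
  return tags
}

const freshTags = sessionTags({})
const resumedTags = sessionTags({ resume: 'session-abc' })
console.log(freshTags, resumedTags)
```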

How it works

The Claude Agent SDK exposes a first-class hooks API for lifecycle events (SessionStart, SessionEnd, Stop, PreToolUse, PostToolUse, SubagentStart, SubagentStop, etc.). Rather than monkey-patching internal SDK methods, this integration wraps query() and injects tracing hooks via the official API. The hooks publish events on diagnostics channels, which the tracing and LLM Obs plugin layers consume to create and enrich spans.
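
The wrap-and-inject approach can be sketched roughly as follows. This is a minimal illustration, not the shimmer in this PR: the channel name is hypothetical, the hook shape (`{ EventName: [{ hooks: [callback] }] }`) and the `tool_name`/`tool_input` input fields are assumptions about the SDK, and `fakeQuery` stands in for the real query().

```javascript
'use strict'
const dc = require('node:diagnostics_channel')

// Hypothetical channel name; the real channel names are not listed here.
const TOOL_START = 'sketch:claude-agent-sdk:tool:start'
const toolStartCh = dc.channel(TOOL_START)

// Wrap query() so a tracer hook is appended next to any user hooks.
function wrapQuery (query) {
  return function wrappedQuery (args = {}) {
    const options = args.options || {}
    const userHooks = options.hooks || {}
    const tracerEntry = {
      hooks: [function onPreToolUse (input) {
        if (toolStartCh.hasSubscribers) {
          toolStartCh.publish({ toolName: input.tool_name, toolInput: input.tool_input })
        }
        return {} // the tracer hook never alters SDK behavior
      }]
    }
    const hooks = {
      ...userHooks,
      PreToolUse: [...(userHooks.PreToolUse || []), tracerEntry]
    }
    return query.call(this, { ...args, options: { ...options, hooks } })
  }
}

// Stand-in for the SDK: fires every PreToolUse hook once, then returns.
function fakeQuery ({ options }) {
  for (const entry of options.hooks.PreToolUse) {
    for (const hook of entry.hooks) hook({ tool_name: 'Read', tool_input: { path: 'a.txt' } })
  }
  return 'done'
}

const seenTools = []
dc.subscribe(TOOL_START, message => seenTools.push(message.toolName))
const queryResult = wrapQuery(fakeQuery)({ prompt: 'hi', options: {} })
console.log(seenTools, queryResult)
```

Because the hook only publishes when the channel has subscribers, the wrapper does almost no work when tracing is disabled.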

This follows the multi-plugin pattern established by the LangChain integration — multiple channel prefixes, multiple TracingPlugin/LLMObsPlugin subclasses sharing a base, wired together via a CompositePlugin.

Architecture

Layer 1: Shimmer (claude-agent-sdk.js)
  └─ Wraps query(), injects hook callbacks that publish on 4 diagnostics channels:
     session, turn, tool, subagent

Layer 2: Tracing Plugins (tracing.js)
  └─ 4 TracingPlugin subclasses, one per channel prefix
     Create/finish APM spans

Layer 3: LLM Obs Plugins (claude-agent-sdk/index.js)
  └─ 4 LLMObsPlugin subclasses, one per channel prefix
     Enrich spans with kind, IO, and metadata tags

Layer 4: CompositePlugin (index.js)
  └─ Wires all 8 sub-plugins under '@anthropic-ai/claude-agent-sdk'
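
To make the layering concrete, here is a self-contained sketch of layers 2 and 4 using stand-in classes. These are not dd-trace's TracingPlugin or CompositePlugin; the class names, the `sketch:` channel names, and the stack-based parenting are all assumptions made for illustration. Only the four channel families and their span kinds come from this description.

```javascript
'use strict'
const dc = require('node:diagnostics_channel')

// Span kind per channel family, as listed in the span table above.
const KIND_BY_FAMILY = { session: 'agent', turn: 'workflow', tool: 'tool', subagent: 'agent' }

// Stand-in for a per-family plugin: opens a span on start, closes it on finish.
class SketchSpanPlugin {
  constructor (family, state) {
    this.family = family
    this.state = state
    dc.subscribe(`sketch:claude-agent-sdk:${family}:start`, ctx => this.start(ctx))
    dc.subscribe(`sketch:claude-agent-sdk:${family}:finish`, () => this.finish())
  }

  start (ctx) {
    const { open, spans } = this.state
    // The innermost open span becomes the parent of the new span.
    const parent = open.length > 0 ? open[open.length - 1] : null
    const span = { name: ctx.name, kind: KIND_BY_FAMILY[this.family], parent: parent && parent.name, finished: false }
    spans.push(span)
    open.push(span)
  }

  finish () {
    const span = this.state.open.pop()
    if (span) span.finished = true
  }
}

// Stand-in for the composite layer: one object owning all sub-plugins.
class SketchCompositePlugin {
  constructor () {
    this.state = { spans: [], open: [] }
    this.plugins = Object.keys(KIND_BY_FAMILY).map(family => new SketchSpanPlugin(family, this.state))
  }
}

const composite = new SketchCompositePlugin()
const publish = (family, event, ctx = {}) =>
  dc.channel(`sketch:claude-agent-sdk:${family}:${event}`).publish(ctx)

publish('session', 'start', { name: 'session' })
publish('turn', 'start', { name: 'turn-1' })
publish('tool', 'start', { name: 'Read' })
publish('tool', 'finish')
publish('turn', 'finish')
publish('session', 'finish')

const tree = composite.state.spans.map(s => `${s.kind}:${s.name}<-${s.parent}`)
console.log(tree)
```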

Motivation

The Claude Agent SDK is GA and growing in adoption for building agentic applications with Claude. It provides a TypeScript API for running multi-turn agent sessions with tool use, subagent orchestration, and session resumption.

dd-trace-js already has an Anthropic integration for @anthropic-ai/sdk (base Messages API), but that only captures individual LLM calls — it has no awareness of the orchestration layer above. This integration fills that gap, giving users visibility into:

  • How their agent sessions decompose into turns, tool calls, and subagent dispatches
  • Which tools are called, with what inputs, and what they return
  • How subagents contribute to the parent session
  • Session resumption lineage across multiple query() invocations

This complements the existing @anthropic-ai/sdk integration (which captures per-API-call llm spans) by adding the agent → workflow → tool hierarchy above it.

Performance

A sirun benchmark (benchmark/sirun/plugin-claude-agent-sdk/) measures the full instrumentation overhead. Each simulated session has 3 turns with 2 tool calls each — 10 spans and 22 hook invocations per session.

| Metric | Value |
|---|---|
| Overhead per session | ~44 µs |
| Overhead per span | ~4.4 µs |
| Overhead vs 30s real session | 0.000147% |

The overhead is entirely synchronous — no async/await, no promises, no functional array methods in production code. Hook callbacks do property assignments and diagnostics channel publishes. mergeHooks and buildTracerHooks run once per query() call, not per-event.

A real query() invocation makes API calls to Claude and executes tool operations (file reads, bash commands, etc.), typically taking 30 seconds to 10+ minutes. The ~44µs of instrumentation overhead is negligible.
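
The derived figures in the table follow directly from the measured ~44 µs per session. A quick check of the arithmetic, assuming the 10 spans break down as 1 session, 3 turns, and 6 tool calls:

```javascript
'use strict'

const overheadPerSessionUs = 44
// Assumed breakdown: 1 session span + 3 turn spans + (3 turns x 2 tool calls).
const spansPerSession = 1 + 3 + 3 * 2
const perSpanUs = overheadPerSessionUs / spansPerSession

const realSessionUs = 30 * 1e6 // a 30 s session expressed in microseconds
const percent = overheadPerSessionUs / realSessionUs * 100

console.log(perSpanUs.toFixed(1)) // prints "4.4"
console.log(percent.toFixed(6)) // prints "0.000147"
```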

Tests

  • 41 APM tracing tests — shimmer unit tests (mergeHooks, buildTracerHooks, wrapQuery), channel-based span tests (all 4 span types + hierarchy + error handling), wrapQuery integration tests with active subscribers
  • 10 LLM Obs tests — span kind, IO tagging, and metadata for session, turn, tool, subagent
  • Lint clean: yarn lint passes with zero warnings

Additional Notes

  • Pure ESM package: @anthropic-ai/claude-agent-sdk is a single minified ESM bundle (sdk.mjs). The shimmer relies on import-in-the-middle for ESM interop.
  • No internal monkey-patching: Unlike most integrations, this one uses the SDK's official hooks API to inject tracing callbacks. This makes it resilient to internal SDK refactors.
  • User hooks are preserved: The shimmer merges tracer hooks alongside any user-provided hooks — it never overwrites or interferes with user callbacks.
  • Known gap — no LLM-level spans (yet): The Agent SDK ships as a single minified ESM bundle (sdk.mjs) with zero runtime dependencies. Its internal Anthropic API client is bundled — it never require()s or imports @anthropic-ai/sdk, so the existing Anthropic shimmer doesn't fire. This means per-API-call llm spans (token usage, cache metrics, model parameters) are not captured in this PR. What is captured: session lifecycle, turn decomposition, full tool I/O, and subagent orchestration — which covers the orchestration layer the existing integration can't see. Options for closing the LLM span gap in a follow-up:
    1. undici:* diagnostics channels — intercept outbound HTTP requests to api.anthropic.com at the transport level. SDK-agnostic, works regardless of bundling, but requires parsing raw request/response bodies.
    2. Upstream hooks — if Anthropic adds diagnostics_channel or telemetry hooks inside the Agent SDK, the integration becomes straightforward. Could be a feature request.
    3. Bundle-internal shimming — the minified sdk.mjs has an internal HTTP client that could be patched at load time. Fragile and version-dependent, but feasible for targeted extraction (e.g., token counts from response headers).
  • semver-minor: This is a purely additive new integration with no impact on existing functionality.
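
As an illustration of the "user hooks are preserved" note, a merge along these lines would keep user callbacks intact. The function name `mergeHooks` appears in this PR, but the body below is only a sketch and assumes hooks are shaped as `{ EventName: [{ hooks: [callback] }] }`.

```javascript
'use strict'

// Sketch only: append tracer entries after user entries without mutating
// the user's hooks object or its arrays.
function mergeHooks (userHooks, tracerHooks) {
  const merged = {}
  for (const event of Object.keys(userHooks || {})) {
    merged[event] = userHooks[event].slice() // copy, never mutate user arrays
  }
  for (const event of Object.keys(tracerHooks)) {
    merged[event] = (merged[event] || []).concat(tracerHooks[event])
  }
  return merged
}

const userEntry = { hooks: [() => ({})] }
const userHooks = { PreToolUse: [userEntry] }
const tracerHooks = {
  PreToolUse: [{ hooks: [() => ({})] }],
  SessionStart: [{ hooks: [() => ({})] }]
}

const merged = mergeHooks(userHooks, tracerHooks)
console.log(merged.PreToolUse.length, userHooks.PreToolUse.length)
```

In this sketch the user's entry stays first, so user callbacks run in their original order ahead of the tracer's.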

Files created

| File | Purpose |
|---|---|
| packages/datadog-instrumentations/src/claude-agent-sdk.js | Shimmer: wrap query(), inject hooks, publish channels |
| packages/datadog-plugin-claude-agent-sdk/src/index.js | CompositePlugin |
| packages/datadog-plugin-claude-agent-sdk/src/tracing.js | 4 TracingPlugin subclasses |
| packages/dd-trace/src/llmobs/plugins/claude-agent-sdk/index.js | 4 LLMObsPlugin subclasses |
| packages/datadog-plugin-claude-agent-sdk/test/index.spec.js | APM tracing tests (41 tests) |
| packages/dd-trace/test/llmobs/plugins/claude-agent-sdk/index.spec.js | LLM Obs tests (10 tests) |
| benchmark/sirun/plugin-claude-agent-sdk/ | Sirun benchmark (overhead profiling) |

Files modified

| File | Change |
|---|---|
| packages/dd-trace/src/plugins/index.js | Added lazy getter for @anthropic-ai/claude-agent-sdk |
| packages/datadog-instrumentations/src/helpers/hooks.js | Hook registration with esmFirst: true |
| packages/dd-trace/src/config/supported-configurations.json | DD_TRACE_CLAUDE_AGENT_SDK_ENABLED |
| index.d.ts | TypeScript definitions for plugin config |
| docs/test.ts | Type test for the new plugin |
| docs/API.md | Documentation entry |
| .github/workflows/llmobs.yml | CI job for plugin tests |

Follow-up PRs

| Item | Notes |
|---|---|
| LLM-level spans via undici:* channels | Capture per-turn token usage, cache metrics, model parameters |
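
A transport-level follow-up might start from something like the sketch below. `undici:request:create` is a diagnostics channel published by undici with a `{ request }` message; everything else here is an assumption, including that the SDK's outbound calls go through undici in the instrumented process. The two `publish` calls simulate requests so the snippet runs without network access.

```javascript
'use strict'
const dc = require('node:diagnostics_channel')

const ANTHROPIC_ORIGIN = 'https://api.anthropic.com'
const seenRequests = []

// Record only requests whose origin is the Anthropic API.
dc.subscribe('undici:request:create', ({ request }) => {
  if (String(request.origin) !== ANTHROPIC_ORIGIN) return
  seenRequests.push({ method: request.method, path: request.path })
})

// Simulated messages standing in for real undici traffic.
const createCh = dc.channel('undici:request:create')
createCh.publish({ request: { origin: ANTHROPIC_ORIGIN, method: 'POST', path: '/v1/messages' } })
createCh.publish({ request: { origin: 'https://example.com', method: 'GET', path: '/' } })

console.log(seenRequests)
```

Token usage and model parameters would still have to be parsed from request and response bodies, which this sketch does not attempt.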

mr-lee requested review from a team as code owners on February 23, 2026 19:41
mr-lee force-pushed the feat/claude-agent-sdk-integration branch 3 times, most recently from ca06588 to 0b55aab, on February 23, 2026 22:39
codecov bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 98.16514% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.37%. Comparing base (f370fb3) to head (e6a51ab).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...s/datadog-instrumentations/src/claude-agent-sdk.js 97.91% 2 Missing ⚠️
...ages/datadog-instrumentations/src/helpers/hooks.js 0.00% 1 Missing ⚠️
...ges/datadog-plugin-claude-agent-sdk/src/tracing.js 97.72% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7603      +/-   ##
==========================================
+ Coverage   80.30%   80.37%   +0.07%     
==========================================
  Files         733      737       +4     
  Lines       31561    31783     +222     
==========================================
+ Hits        25345    25546     +201     
- Misses       6216     6237      +21     
Flag Coverage Δ
aiguard-macos 38.95% <0.00%> (-0.09%) ⬇️
aiguard-ubuntu 39.07% <0.00%> (-0.09%) ⬇️
aiguard-windows 38.80% <0.00%> (-0.09%) ⬇️
apm-capabilities-tracing-macos 48.52% <31.14%> (-0.09%) ⬇️
apm-capabilities-tracing-ubuntu 48.56% <31.14%> (-0.08%) ⬇️
apm-capabilities-tracing-windows 48.26% <31.14%> (-0.08%) ⬇️
apm-integrations-child-process 38.53% <0.00%> (-0.09%) ⬇️
apm-integrations-couchbase-18 37.44% <0.00%> (-0.08%) ⬇️
apm-integrations-couchbase-eol ?
apm-integrations-oracledb 37.75% <0.00%> (-0.08%) ⬇️
appsec-express 55.53% <0.00%> (-0.07%) ⬇️
appsec-fastify 51.85% <0.00%> (-0.07%) ⬇️
appsec-graphql 52.04% <0.00%> (-0.07%) ⬇️
appsec-kafka 44.47% <0.00%> (-0.13%) ⬇️
appsec-ldapjs 44.10% <0.00%> (-0.07%) ⬇️
appsec-lodash 43.79% <0.00%> (-0.07%) ⬇️
appsec-macos 58.61% <0.00%> (-0.07%) ⬇️
appsec-mongodb-core 48.96% <0.00%> (+0.04%) ⬆️
appsec-mongoose 49.65% <0.00%> (-0.07%) ⬇️
appsec-mysql 51.02% <0.00%> (-0.07%) ⬇️
appsec-node-serialize 43.30% <0.00%> (-0.07%) ⬇️
appsec-passport 47.79% <0.00%> (-0.08%) ⬇️
appsec-postgres 50.76% <0.00%> (-0.09%) ⬇️
appsec-sourcing 42.65% <0.00%> (-0.07%) ⬇️
appsec-template 43.47% <0.00%> (-0.07%) ⬇️
appsec-ubuntu 58.69% <0.00%> (-0.07%) ⬇️
appsec-windows 58.45% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-bluebird 32.23% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-body-parser 40.53% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-child_process 37.84% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-cookie-parser 34.26% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-express 34.60% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 34.40% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-express-session 40.15% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-fs 31.83% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-generic-pool 29.82% <0.00%> (+0.06%) ⬆️
instrumentations-instrumentation-http 39.87% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-knex 32.23% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-mongoose 33.39% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-multer 40.27% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-mysql2 38.31% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-passport 44.10% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-passport-http 43.77% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-passport-local 44.32% <0.00%> (-0.08%) ⬇️
instrumentations-instrumentation-pg 37.73% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-promise 32.15% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-promise-js 32.16% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-q 32.21% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-url 32.13% <0.00%> (-0.09%) ⬇️
instrumentations-instrumentation-when 32.18% <0.00%> (-0.09%) ⬇️
llmobs-ai 41.35% <0.00%> (-0.08%) ⬇️
llmobs-anthropic 40.34% <0.00%> (-0.08%) ⬇️
llmobs-bedrock 39.27% <0.00%> (-0.07%) ⬇️
llmobs-claude-agent-sdk 40.62% <98.16%> (?)
llmobs-google-genai 39.86% <0.00%> (-0.08%) ⬇️
llmobs-langchain 39.45% <0.00%> (-0.07%) ⬇️
llmobs-openai 44.15% <0.00%> (-0.08%) ⬇️
llmobs-vertex-ai 40.13% <0.00%> (-0.08%) ⬇️
platform-core 29.71% <ø> (ø)
platform-esbuild 32.89% <ø> (ø)
platform-instrumentations-misc 40.53% <ø> (ø)
platform-shimmer 36.14% <ø> (ø)
platform-unit-guardrails 31.27% <ø> (ø)
plugins-azure-event-hubs 24.02% <ø> (ø)
plugins-azure-service-bus 23.42% <ø> (ø)
plugins-bullmq 43.70% <0.00%> (-0.16%) ⬇️
plugins-cassandra 37.79% <0.00%> (-0.08%) ⬇️
plugins-cookie 25.08% <ø> (ø)
plugins-cookie-parser 24.87% <ø> (ø)
plugins-crypto 24.72% <ø> (ø)
plugins-dd-trace-api 38.38% <0.00%> (-0.09%) ⬇️
plugins-express-mongo-sanitize 25.04% <ø> (ø)
plugins-express-session 24.83% <ø> (ø)
plugins-fastify 42.29% <0.00%> (-0.08%) ⬇️
plugins-fetch 38.34% <0.00%> (-0.08%) ⬇️
plugins-fs 38.63% <0.00%> (-0.09%) ⬇️
plugins-generic-pool 24.06% <ø> (ø)
plugins-google-cloud-pubsub 45.47% <0.00%> (-0.08%) ⬇️
plugins-grpc 40.99% <0.00%> (-0.08%) ⬇️
plugins-handlebars 25.08% <ø> (ø)
plugins-hapi 40.16% <0.00%> (-0.23%) ⬇️
plugins-hono 40.43% <0.00%> (-0.08%) ⬇️
plugins-ioredis 38.44% <0.00%> (-0.09%) ⬇️
plugins-knex 24.80% <ø> (ø)
plugins-ldapjs 22.61% <ø> (ø)
plugins-light-my-request 24.48% <ø> (ø)
plugins-limitd-client 32.52% <0.00%> (-0.09%) ⬇️
plugins-lodash 24.13% <ø> (ø)
plugins-mariadb 39.51% <0.00%> (-0.09%) ⬇️
plugins-memcached 38.17% <0.00%> (-0.09%) ⬇️
plugins-microgateway-core 39.19% <0.00%> (-0.08%) ⬇️
plugins-moleculer 40.55% <0.00%> (-0.08%) ⬇️
plugins-mongodb 39.22% <0.00%> (-0.08%) ⬇️
plugins-mongodb-core 39.05% <0.00%> (-0.09%) ⬇️
plugins-mongoose 38.87% <0.00%> (-0.08%) ⬇️
plugins-multer 24.83% <ø> (ø)
plugins-mysql 39.19% <0.00%> (-0.09%) ⬇️
plugins-mysql2 39.29% <0.00%> (-0.09%) ⬇️
plugins-node-serialize 25.12% <ø> (ø)
plugins-opensearch 37.62% <0.00%> (-0.08%) ⬇️
plugins-passport-http 24.91% <ø> (ø)
plugins-postgres 35.71% <0.00%> (-0.07%) ⬇️
plugins-process 24.72% <ø> (ø)
plugins-pug 25.08% <ø> (ø)
plugins-redis 38.91% <0.00%> (-0.09%) ⬇️
plugins-router 43.04% <0.00%> (-0.09%) ⬇️
plugins-sequelize 23.66% <ø> (ø)
plugins-test-and-upstream-amqp10 38.36% <0.00%> (-0.24%) ⬇️
plugins-test-and-upstream-amqplib 43.92% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-apollo 39.05% <0.00%> (-0.08%) ⬇️
plugins-test-and-upstream-avsc 38.72% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-bunyan 33.82% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-connect 40.83% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-graphql 40.17% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-koa 40.41% <0.00%> (-0.08%) ⬇️
plugins-test-and-upstream-protobufjs 38.95% <0.00%> (-0.09%) ⬇️
plugins-test-and-upstream-rhea 44.12% <0.00%> (-0.09%) ⬇️
plugins-undici 39.13% <0.00%> (-0.08%) ⬇️
plugins-url 24.72% <ø> (ø)
plugins-valkey 38.09% <0.00%> (-0.09%) ⬇️
plugins-vm 24.72% <ø> (ø)
plugins-winston 34.02% <0.00%> (-0.08%) ⬇️
plugins-ws 41.93% <0.00%> (-0.09%) ⬇️
profiling-macos 39.87% <0.00%> (-0.09%) ⬇️
profiling-ubuntu 39.99% <0.00%> (-0.09%) ⬇️
profiling-windows 41.21% <0.00%> (-0.08%) ⬇️
serverless-azure-functions-client 23.75% <ø> (ø)
serverless-azure-functions-eventhubs 23.75% <ø> (ø)
serverless-azure-functions-servicebus 23.75% <ø> (ø)

mr-lee force-pushed the feat/claude-agent-sdk-integration branch from 0b55aab to 6e28bea on February 24, 2026 01:58
…agent-sdk

Adds automatic instrumentation for the Claude Agent SDK, providing
visibility into agentic sessions built with Anthropic's agent framework.

The integration captures a hierarchical span tree for every query()
invocation: agent (session) → workflow (turn) → tool (tool call) →
agent (subagent). Session resumption is supported via parent_session_id
tag linking.

Uses the SDK's official hooks API (SessionStart, SessionEnd, Stop,
PreToolUse, PostToolUse, SubagentStart, SubagentStop) rather than
monkey-patching internals. Follows the multi-plugin pattern from
LangChain: 4 diagnostics channel families, 4 TracingPlugin + 4
LLMObsPlugin subclasses, wired via CompositePlugin.
mr-lee force-pushed the feat/claude-agent-sdk-integration branch from 6e28bea to ddaa3f1 on February 24, 2026 02:42
The Claude Agent SDK is pure ESM and cannot be CJS-required in the test
harness. Without loading the actual SDK module, the plugin manager never
receives the dd-trace:instrumentation:load event and the TracingPlugin
channel subscriptions are never activated — causing all channel-based
tests to time out waiting for traces that never arrive.

Fix by publishing the load channel event directly after agent.load() to
trigger plugin registration without needing the actual ESM SDK module.
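
The workaround can be pictured with plain diagnostics channels. In this sketch the subscriber is a stand-in for the plugin manager, and the `{ name }` payload shape is an assumption; only the channel name `dd-trace:instrumentation:load` comes from the commit message above.

```javascript
'use strict'
const dc = require('node:diagnostics_channel')

const LOAD_CHANNEL = 'dd-trace:instrumentation:load'
const activated = []

// Stand-in for the plugin manager: react to a "module loaded" event.
dc.subscribe(LOAD_CHANNEL, ({ name }) => {
  activated.push(name)
})

// What the test setup does in place of loading the ESM-only SDK:
// publish the load event by hand so plugin registration is triggered.
dc.channel(LOAD_CHANNEL).publish({ name: '@anthropic-ai/claude-agent-sdk' })

console.log(activated)
```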
…rted-configurations.json

Required for the plugin_manager's getEnabled() to validate the
configuration key when the plugin is loaded via the load channel.
mr-lee requested a review from a team as a code owner on February 24, 2026 03:11
mr-lee requested review from ida613 and removed request for a team on February 24, 2026 03:11
…st assertions

The span processor only includes model_name and model_provider in span
events for 'llm' and 'embedding' span kinds. Our spans are 'agent',
'workflow', and 'tool' — so these fields are never output. Remove them
from all 13 test assertions to match actual span processor behavior.
…ndling

The span processor always includes `input: {}` in span events, but the
test assertion helper only deletes `actual.meta.output` (not input)
before deepStrictEqual. This caused 5 tests with empty inputValue to
fail because `input: {}` was present in actual but not in expected.

Fix by:
- Removing 3 edge-case tests that relied on empty input (already
  covered by APM test suite)
- Adding tagTextIO call to subagent plugin using agent type/ID as
  input value, providing meaningful data for the span
- Updating subagent test assertions to match
Add 4 integration tests for wrapQuery when channel subscribers are
active. These cover the 20 uncovered lines in claude-agent-sdk.js:
- Normal subscriber path with sync return
- Undefined options handling (resolvedOptions fallback)
- Sync error path (error + asyncEnd publish + rethrow)
- Async rejection path (thenable detection + rejection handler)
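
The four paths listed above can be sketched as follows. This is an illustration under assumptions rather than the PR's wrapQuery: the `sketch:` channel names are hypothetical, and it assumes that a successful session is closed elsewhere (for example by a session-end hook), so only failures publish asyncEnd here.

```javascript
'use strict'
const dc = require('node:diagnostics_channel')

const PREFIX = 'sketch:claude-agent-sdk:session' // hypothetical prefix
const startCh = dc.channel(`${PREFIX}:start`)
const errorCh = dc.channel(`${PREFIX}:error`)
const asyncEndCh = dc.channel(`${PREFIX}:asyncEnd`)

function wrapQuery (query) {
  return function wrappedQuery (args) {
    if (!startCh.hasSubscribers) return query.apply(this, arguments)

    // Undefined options fall back to an empty object.
    const ctx = { options: (args && args.options) || {} }
    startCh.publish(ctx)

    const fail = error => {
      ctx.error = error
      errorCh.publish(ctx)
      asyncEndCh.publish(ctx)
    }

    let result
    try {
      result = query.apply(this, arguments)
    } catch (error) {
      fail(error) // sync error path: publish, then rethrow unchanged
      throw error
    }
    // Thenable detection: observe rejections without replacing the result.
    if (result && typeof result.then === 'function') result.then(undefined, fail)
    return result
  }
}

const events = []
dc.subscribe(`${PREFIX}:start`, () => events.push('start'))
dc.subscribe(`${PREFIX}:error`, ctx => events.push(`error:${ctx.error.message}`))
dc.subscribe(`${PREFIX}:asyncEnd`, () => events.push('asyncEnd'))

let rethrown = null
try {
  wrapQuery(() => { throw new Error('sync boom') })({ prompt: 'hi' })
} catch (error) {
  rethrown = error.message
}
const syncEvents = events.slice()

// Async rejection path: the caller still sees (and must handle) the rejection.
wrapQuery(() => Promise.reject(new Error('async boom')))().catch(() => {})
```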
…head

Simulates 500 agent sessions (3 turns x 2 tool calls each = 10 spans,
22 hook invocations per session). Measures the overhead of hook injection,
diagnostics channel publishing, and span creation/finishing.

Results: ~44µs per session, ~4.4µs per span — negligible vs real query()
calls which take 30s-600s for API calls and tool execution.