feat(claude-agent-sdk): add instrumentation for @anthropic-ai/claude-agent-sdk #7603
Open
mr-lee wants to merge 8 commits into DataDog:master
ca06588 to 0b55aab
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #7603 +/- ##
==========================================
+ Coverage 80.30% 80.37% +0.07%
==========================================
Files 733 737 +4
Lines 31561 31783 +222
==========================================
+ Hits 25345 25546 +201
- Misses 6216 6237 +21
0b55aab to 6e28bea
…agent-sdk
Adds automatic instrumentation for the Claude Agent SDK, providing visibility into agentic sessions built with Anthropic's agent framework.
The integration captures a hierarchical span tree for every query() invocation: agent (session) → workflow (turn) → tool (tool call) → agent (subagent). Session resumption is supported via parent_session_id tag linking.
Uses the SDK's official hooks API (SessionStart, SessionEnd, Stop, PreToolUse, PostToolUse, SubagentStart, SubagentStop) rather than monkey-patching internals.
Follows the multi-plugin pattern from LangChain: 4 diagnostics channel families, 4 TracingPlugin + 4 LLMObsPlugin subclasses, wired via CompositePlugin.
6e28bea to ddaa3f1
The Claude Agent SDK is pure ESM and cannot be CJS-required in the test harness. Without loading the actual SDK module, the plugin manager never receives the dd-trace:instrumentation:load event and the TracingPlugin channel subscriptions are never activated — causing all channel-based tests to time out waiting for traces that never arrive. Fix by publishing the load channel event directly after agent.load() to trigger plugin registration without needing the actual ESM SDK module.
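The workaround described in this commit can be sketched with Node's built-in `diagnostics_channel` module. The channel name comes from the commit message; the `{ name }` payload shape and the `onLoad` subscriber standing in for the plugin manager are assumptions for illustration, not dd-trace's actual internals.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

// Channel name taken from the commit message above.
const LOAD_CHANNEL = 'dd-trace:instrumentation:load'

// Hypothetical stand-in for the plugin manager: it records which
// integrations it has been told were loaded.
const registered = []
function onLoad (message) {
  registered.push(message.name)
}
dc.subscribe(LOAD_CHANNEL, onLoad)

// In the test harness the ESM-only SDK is never require()d, so nothing
// publishes on the load channel. Publishing by hand triggers registration.
// The `{ name }` payload shape is an assumption.
dc.channel(LOAD_CHANNEL).publish({ name: '@anthropic-ai/claude-agent-sdk' })

dc.unsubscribe(LOAD_CHANNEL, onLoad)
```

Publishing directly keeps the test independent of whether the real module can be loaded, at the cost of bypassing the module-load path that production code goes through.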
…rted-configurations.json
Required for the plugin_manager's getEnabled() to validate the configuration key when the plugin is loaded via the load channel.
…st assertions
The span processor only includes model_name and model_provider in span events for 'llm' and 'embedding' span kinds. Our spans are 'agent', 'workflow', and 'tool', so these fields are never output. Remove them from all 13 test assertions to match actual span processor behavior.
…ndling
The span processor always includes `input: {}` in span events, but the
test assertion helper only deletes `actual.meta.output` (not input)
before deepStrictEqual. This caused 5 tests with empty inputValue to
fail because `input: {}` was present in actual but not in expected.
Fix by:
- Removing 3 edge-case tests that relied on empty input (already
covered by APM test suite)
- Adding tagTextIO call to subagent plugin using agent type/ID as
input value, providing meaningful data for the span
- Updating subagent test assertions to match
Add 4 integration tests for wrapQuery when channel subscribers are active. These cover the 20 uncovered lines in claude-agent-sdk.js:
- Normal subscriber path with sync return
- Undefined options handling (resolvedOptions fallback)
- Sync error path (error + asyncEnd publish + rethrow)
- Async rejection path (thenable detection + rejection handler)
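The four paths those tests exercise can be sketched roughly as follows. This is not the PR's `wrapQuery`; the channel names are hypothetical, and leaving a sync return unfinished (to be closed later by a session-end hook) is an assumption.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

// Hypothetical channel names; the PR's real channel prefixes are not shown here.
const startCh = dc.channel('example:claude-agent-sdk:query:start')
const errorCh = dc.channel('example:claude-agent-sdk:query:error')
const asyncEndCh = dc.channel('example:claude-agent-sdk:query:asyncEnd')

function wrapQuery (query) {
  return function wrappedQuery (args) {
    // No subscribers: call through untouched.
    if (!startCh.hasSubscribers) return query.call(this, args)

    // Undefined options fall back to an empty object.
    const resolvedOptions = (args && args.options) || {}
    const ctx = { prompt: args && args.prompt, options: resolvedOptions }
    startCh.publish(ctx)

    let result
    try {
      result = query.call(this, { ...args, options: resolvedOptions })
    } catch (error) {
      // Sync error path: publish error and asyncEnd, then rethrow.
      ctx.error = error
      errorCh.publish(ctx)
      asyncEndCh.publish(ctx)
      throw error
    }

    // Async rejection path: detect a thenable and attach handlers.
    if (result && typeof result.then === 'function') {
      result.then(
        function onResolve () { asyncEndCh.publish(ctx) },
        function onReject (error) {
          ctx.error = error
          errorCh.publish(ctx)
          asyncEndCh.publish(ctx)
        }
      )
    }

    // Sync return: assumed to be finished later, for example by a session-end hook.
    return result
  }
}
```

The caller still receives the original return value, so a rejected thenable is still the caller's to handle; the wrapper only observes it.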
…head
Simulates 500 agent sessions (3 turns x 2 tool calls each = 10 spans, 22 hook invocations per session). Measures the overhead of hook injection, diagnostics channel publishing, and span creation/finishing.
Results: ~44µs per session, ~4.4µs per span, which is negligible compared with real query() calls that take 30s-600s for API calls and tool execution.
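The span count and per-span figure quoted above can be checked with a little arithmetic. The breakdown used here (one agent span per session, one workflow span per turn, one tool span per tool call, no subagents) is an assumption that happens to match the stated total of 10.

```javascript
'use strict'

// Session shape stated in the benchmark description.
const TURNS_PER_SESSION = 3
const TOOL_CALLS_PER_TURN = 2

// Assumed breakdown: 1 agent span + 1 workflow span per turn
// + 1 tool span per tool call.
function spansPerSession (turns, toolCallsPerTurn) {
  return 1 + turns + turns * toolCallsPerTurn
}

const spans = spansPerSession(TURNS_PER_SESSION, TOOL_CALLS_PER_TURN)

// Reported overhead (~44 microseconds per session) spread across the spans.
const REPORTED_US_PER_SESSION = 44
const usPerSpan = REPORTED_US_PER_SESSION / spans

console.log({ spans, usPerSpan }) // { spans: 10, usPerSpan: 4.4 }
```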
What does this PR do?
Adds automatic instrumentation for the Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`), providing full visibility into agentic sessions built with Anthropic's agent framework.

The integration automatically captures a hierarchical span tree for every `query()` invocation: agent (session) → workflow (turn) → tool (tool call) → agent (subagent).

Span types and what they capture:
- `agent`: session
- `workflow`: turn
- `tool`: tool call
- `agent`: subagent

Session resumption is supported: when a session is resumed via `options.resume`, a new `agent` span is created with a `parent_session_id` tag linking to the original session for lineage tracing.

How it works
The Claude Agent SDK exposes a first-class hooks API for lifecycle events (`SessionStart`, `SessionEnd`, `Stop`, `PreToolUse`, `PostToolUse`, `SubagentStart`, `SubagentStop`, etc.). Rather than monkey-patching internal SDK methods, this integration wraps `query()` and injects tracing hooks via the official API. The hooks publish events on diagnostics channels, which the tracing and LLM Obs plugin layers consume to create and enrich spans.

This follows the multi-plugin pattern established by the LangChain integration: multiple channel prefixes, multiple TracingPlugin/LLMObsPlugin subclasses sharing a base, wired together via a CompositePlugin.
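A minimal sketch of the hook-injection idea, under stated assumptions: the function names `mergeHooks` and `buildTracerHooks` appear in this PR's description, but their bodies here are illustrative guesses, the channel names are hypothetical, and the `{ EventName: [{ hooks: [callback] }] }` shape is an assumption about the SDK's `hooks` option.

```javascript
'use strict'

const dc = require('node:diagnostics_channel')

// Hypothetical channel names, for illustration only.
const toolStartCh = dc.channel('example:claude-agent-sdk:tool:start')
const toolEndCh = dc.channel('example:claude-agent-sdk:tool:end')

// Built once per query() call. Each callback only publishes on a channel
// and returns an empty result so the SDK continues as normal.
function buildTracerHooks () {
  return {
    PreToolUse: [{ hooks: [function (input) { toolStartCh.publish(input); return {} }] }],
    PostToolUse: [{ hooks: [function (input) { toolEndCh.publish(input); return {} }] }]
  }
}

// Combine user-supplied hooks with tracer hooks without mutating either.
function mergeHooks (userHooks, tracerHooks) {
  const merged = {}
  for (const event of Object.keys(userHooks || {})) {
    merged[event] = userHooks[event].slice()
  }
  for (const event of Object.keys(tracerHooks)) {
    merged[event] = (merged[event] || []).concat(tracerHooks[event])
  }
  return merged
}

// Wrap query() so every call carries the tracing hooks alongside the user's own.
function wrapQuery (query) {
  return function wrappedQuery (args) {
    const options = (args && args.options) || {}
    const hooks = mergeHooks(options.hooks, buildTracerHooks())
    return query.call(this, { ...args, options: { ...options, hooks } })
  }
}
```

Because the tracer hooks are appended rather than substituted, hooks the user registered for the same event keep running, and a subscriber on the channels (the plugin layer in this design) decides what to do with each event.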
Architecture
Motivation
The Claude Agent SDK is GA and growing in adoption for building agentic applications with Claude. It provides a TypeScript API for running multi-turn agent sessions with tool use, subagent orchestration, and session resumption.
dd-trace-js already has an Anthropic integration for `@anthropic-ai/sdk` (base Messages API), but that only captures individual LLM calls and has no awareness of the orchestration layer above. This integration fills that gap, giving users visibility into `query()` invocations.

This complements the existing `@anthropic-ai/sdk` integration (which captures per-API-call `llm` spans) by adding the `agent` → `workflow` → `tool` hierarchy above it.

Performance
A sirun benchmark (`benchmark/sirun/plugin-claude-agent-sdk/`) measures the full instrumentation overhead. Each simulated session has 3 turns with 2 tool calls each, for 10 spans and 22 hook invocations per session.

The overhead is entirely synchronous: no async/await, no promises, no functional array methods in production code. Hook callbacks do property assignments and diagnostics channel publishes. `mergeHooks` and `buildTracerHooks` run once per `query()` call, not per event.

A real `query()` invocation makes API calls to Claude and executes tool operations (file reads, bash commands, etc.), typically taking 30 seconds to 10+ minutes. The ~44µs of per-session instrumentation overhead is negligible.

Tests
- `yarn lint` passes with zero warnings

Additional Notes
- `@anthropic-ai/claude-agent-sdk` is a single minified ESM bundle (`sdk.mjs`). The shimmer relies on `import-in-the-middle` for ESM interop.
- The SDK ships as a single file (`sdk.mjs`) with zero runtime dependencies. Its internal Anthropic API client is bundled: it never `require()`s or `import`s `@anthropic-ai/sdk`, so the existing Anthropic shimmer doesn't fire. This means per-API-call `llm` spans (token usage, cache metrics, model parameters) are not captured in this PR. What is captured: session lifecycle, turn decomposition, full tool I/O, and subagent orchestration, which covers the orchestration layer the existing integration can't see. Options for closing the LLM span gap in a follow-up:
  - `undici:*` diagnostics channels: intercept outbound HTTP requests to `api.anthropic.com` at the transport level. SDK-agnostic, works regardless of bundling, but requires parsing raw request/response bodies.
  - If `diagnostics_channel` or telemetry hooks are added inside the Agent SDK, the integration becomes straightforward. Could be a feature request.
  - `sdk.mjs` has an internal HTTP client that could be patched at load time. Fragile and version-dependent, but feasible for targeted extraction (e.g., token counts from response headers).

Files created
- `packages/datadog-instrumentations/src/claude-agent-sdk.js`: wrap `query()`, inject hooks, publish channels
- `packages/datadog-plugin-claude-agent-sdk/src/index.js`
- `packages/datadog-plugin-claude-agent-sdk/src/tracing.js`
- `packages/dd-trace/src/llmobs/plugins/claude-agent-sdk/index.js`
- `packages/datadog-plugin-claude-agent-sdk/test/index.spec.js`
- `packages/dd-trace/test/llmobs/plugins/claude-agent-sdk/index.spec.js`
- `benchmark/sirun/plugin-claude-agent-sdk/`

Files modified
- `packages/dd-trace/src/plugins/index.js`: `@anthropic-ai/claude-agent-sdk`
- `packages/datadog-instrumentations/src/helpers/hooks.js`: `esmFirst: true`
- `packages/dd-trace/src/config/supported-configurations.json`: `DD_TRACE_CLAUDE_AGENT_SDK_ENABLED`
- `index.d.ts`
- `docs/test.ts`
- `docs/API.md`
- `.github/workflows/llmobs.yml`

Follow-up PRs
- `undici:*` channels