Export and analyze workflow trace data from LangSmith projects for performance insights and optimization.
This toolkit provides comprehensive capabilities for LangSmith trace analysis:
- Data Export (export_langsmith_traces.py) - Export trace data from LangSmith using the SDK API
- Performance Analysis (analyze_traces.py) - Analyze exported traces for latency, bottlenecks, and parallel execution (Phase 3A)
- Cost Analysis (analyze_cost.py) - Calculate workflow costs with configurable pricing models (Phase 3B)
- Failure Pattern Analysis (analyze_failures.py) - Detect failures, retry sequences, and error patterns (Phase 3C)
Designed for users on Individual Developer plans, which lack bulk export features. The tools include robust error handling, rate limiting, and comprehensive analysis capabilities, and all modules follow strict TDD methodology with 99+ tests and full type safety.
- Export N most recent traces from any LangSmith project
- Automatic pagination - Handles large exports (> 100 records) seamlessly with progress indication
- Environment variable support - Configure once via a .env file for simplified usage
- Hierarchical data export - Includes child_runs with the --include-children flag
- Automatic rate limiting with exponential backoff (see the sketch after this list)
- Progress indication for long-running exports
- Comprehensive error handling (auth, network, rate limits)
- Structured JSON output with metadata
- Type-safe implementation with full type hints
- Test-driven development with pytest suite (33 tests, high coverage)
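
The rate-limiting item above can be pictured as a standard exponential-backoff wrapper around each API call. This is only an illustrative sketch (the real logic lives in export_langsmith_traces.py and may differ); the call argument is a stand-in for any LangSmith request.

import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_backoff(call: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0) -> T:
    """Retry a callable with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch only rate-limit / transient network errors
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
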
- Latency Distribution Analysis - Calculate p50/p95/p99 percentiles, identify outliers
- Bottleneck Identification - Rank nodes by execution time, identify primary bottlenecks
- Parallel Execution Verification - Detect parallel vs sequential execution, calculate time savings
- CSV Export - Export analysis results to CSV files for reporting
- Interactive Jupyter Notebook - Visual analysis workflow with automated reporting
- 31 comprehensive tests, 100% type-safe with mypy strict mode
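
To make the latency metrics listed above concrete, the percentile calculation can be sketched with numpy. This is an illustrative sketch, not the exact implementation in analyze_traces.py; the 23-minute outlier threshold matches the one referenced elsewhere in this README.

from typing import Dict, List

import numpy as np

def latency_percentiles(durations_seconds: List[float]) -> Dict[str, float]:
    """Compute p50/p95/p99 latency in minutes plus a simple outlier count."""
    minutes = np.asarray(durations_seconds, dtype=float) / 60.0
    return {
        "p50": float(np.percentile(minutes, 50)),
        "p95": float(np.percentile(minutes, 95)),
        "p99": float(np.percentile(minutes, 99)),
        "mean": float(minutes.mean()),
        "outliers_above_23min": float((minutes > 23).sum()),
    }
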
- Python 3.8 or higher
- LangSmith API key (Individual Developer plan or higher)
- Virtual environment manager (uv or venv)
- Additional packages: numpy (for analysis), jupyter (for notebook)
git clone <repository-url>
cd export-langsmith-data

Option A: Using uv (recommended)
uv venv
.venv/Scripts/activate # Windows
# or
source .venv/bin/activate  # Linux/Mac

Option B: Using venv
python -m venv .venv
.venv/Scripts/activate # Windows
# or
source .venv/bin/activate  # Linux/Mac

Install dependencies:

uv pip install -r requirements.txt
# or
pip install -r requirements.txt

Create a .env file from the template:
cp .env.example .env

Edit .env and add your LangSmith API key:
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here

Get your API key from: https://smith.langchain.com/settings
Set up your .env file once:
cp .env.example .env
# Edit .env with your values:
# LANGSMITH_API_KEY=lsv2_pt_your_key_here
# LANGSMITH_PROJECT=your-project-name
# LANGSMITH_LIMIT=150

Then use simple commands:
python export_langsmith_traces.py --output "traces_export.json"

Or pass all values explicitly via CLI arguments:

python export_langsmith_traces.py \
--api-key "lsv2_pt_..." \
--project "your-project-name" \
--limit 150 \
--output "traces_export.json"# Override just the project while using env vars for api-key and limit
python export_langsmith_traces.py \
--project "different-project" \
--output "traces_export.json"--api-key(optional): LangSmith API key for authentication (default:LANGSMITH_API_KEYenv var)--project(optional): LangSmith project name or ID (default:LANGSMITH_PROJECTenv var)--limit(optional): Number of most recent traces to export (default:LANGSMITH_LIMITenv var)- For limits > 100, the tool automatically handles pagination across multiple API calls
- If fewer records exist in the project, you'll receive a warning and all available records
--output(required): Output JSON file path
Note: Although --api-key, --project, and --limit are optional on the command line, each value must still be provided either as a CLI argument or via an environment variable.
Using environment variables:
# Set up .env file once
echo "LANGSMITH_API_KEY=lsv2_pt_abc123..." >> .env
echo "LANGSMITH_PROJECT=your-project-name" >> .env
echo "LANGSMITH_LIMIT=200" >> .env
# Simple usage
python export_langsmith_traces.py --output "traces_2025-11-28.json"

Using CLI arguments:
python export_langsmith_traces.py \
--api-key "lsv2_pt_abc123..." \
--project "your-project-name" \
--limit 200 \
--output "traces_2025-11-28.json"Mixed usage:
# Use env vars for api-key and project, override limit
python export_langsmith_traces.py \
--limit 500 \
--output "large_export.json"The LangSmith API limits results to 100 records per call. This tool automatically handles pagination for larger exports with progress indication:
Example: Exporting 500 records
python export_langsmith_traces.py --limit 500 --output large_export.json

Output:
🚀 Exporting 500 traces from project 'my-project'...
✓ Connected to LangSmith API
📥 Fetching traces...
📄 Fetching 500 runs across 5 pages...
✓ Page 1/5: 100 runs (Total: 100)
✓ Page 2/5: 100 runs (Total: 200)
✓ Page 3/5: 100 runs (Total: 300)
✓ Page 4/5: 100 runs (Total: 400)
✓ Page 5/5: 100 runs (Total: 500)
✓ Fetched 500 traces
🔄 Formatting trace data...
✓ Data formatted
💾 Exporting to large_export.json...
✅ Export complete! Saved to large_export.json
If Project Has Fewer Records:
⚠️ Warning: Fetched 250 runs (requested 500)
Pagination Features:
- Automatic chunking into 100-record pages
- Progress indication for multi-page exports
- Rate limiting between pages (500ms delay)
- Retry logic per page for reliability
- Warning when fewer records available than requested
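
Conceptually, the pagination loop behaves like the sketch below. Here fetch_page is a hypothetical stand-in for the underlying LangSmith SDK call (taking an offset and a page size); the script's actual implementation may differ.

import math
import time
from typing import Any, Callable, Dict, List

PAGE_SIZE = 100           # LangSmith API cap per call
PAGE_DELAY_SECONDS = 0.5  # rate-limit delay between pages

def fetch_all(fetch_page: Callable[[int, int], List[Dict[str, Any]]], limit: int) -> List[Dict[str, Any]]:
    """Fetch up to `limit` runs in 100-record pages, pausing 500 ms between pages."""
    pages = math.ceil(limit / PAGE_SIZE)
    runs: List[Dict[str, Any]] = []
    for page in range(pages):
        size = min(PAGE_SIZE, limit - len(runs))
        batch = fetch_page(len(runs), size)  # (offset, page size) -> list of run dicts
        runs.extend(batch)
        print(f"Page {page + 1}/{pages}: {len(batch)} runs (Total: {len(runs)})")
        if len(batch) < size:  # project has fewer records than requested
            print(f"Warning: fetched {len(runs)} runs (requested {limit})")
            break
        if page < pages - 1:
            time.sleep(PAGE_DELAY_SECONDS)
    return runs
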
Once you have exported trace data, use the analysis tools to gain performance insights.
- Export trace data with hierarchical information:

  python export_langsmith_traces.py \
    --limit 100 \
    --output traces_export.json \
    --include-children

- Launch the Jupyter notebook:

  jupyter notebook notebooks/langsmith_trace_performance_analysis.ipynb

- Update the file path in cell 2 to point to your export file.

- Run all cells to generate:
  - Latency distribution metrics (p50/p95/p99)
  - Bottleneck analysis with node rankings
  - Parallel execution verification
  - CSV exports in the output/ directory
After generating analysis results, use the verification tool to ensure accuracy:
# Basic verification - Phase 3A only (default)
python verify_analysis_report.py traces_export.json
# Verify all phases (3A + 3B + 3C)
python verify_analysis_report.py traces_export.json --phases all
# Verify specific phases
python verify_analysis_report.py traces_export.json --phases 3b
python verify_analysis_report.py traces_export.json --phases 3c
python verify_analysis_report.py traces_export.json --phases "3a,3b"
# Verify against expected values
python verify_analysis_report.py traces_export.json --expected-values expected.json
# Use custom pricing model for cost analysis
python verify_analysis_report.py traces_export.json --phases 3b --pricing-model gemini_1.5_pro

The verification tool:
- Regenerates all calculations from raw data
- Provides deterministic verification of findings
- Optionally compares against expected values (PASS/FAIL indicators)
- Supports selective phase verification (3a, 3b, 3c, or all)
- Useful for auditing and validating reports
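
The expected-values comparison amounts to a per-metric tolerance check. The sketch below assumes a flat dict of metrics and a 1% relative tolerance; the tool's exact matching rules may differ.

from typing import Dict

def compare_metrics(computed: Dict[str, float], expected: Dict[str, float], rel_tol: float = 0.01) -> bool:
    """Print PASS/FAIL for each expected metric; return True only if all are within tolerance."""
    all_pass = True
    for name, want in expected.items():
        got = computed.get(name)
        ok = got is not None and abs(got - want) <= rel_tol * max(abs(want), 1e-9)
        print(f"{name}: computed={got} expected={want} -> {'PASS' if ok else 'FAIL'}")
        all_pass = all_pass and ok
    return all_pass
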
Example expected values JSON:
{
  "sample_size": 10,
  "latency": {
    "p50": 25.25,
    "p95": 46.03,
    "mean": 26.23
  },
  "parallel": {
    "parallel_pct": 30.0,
    "savings_s": 201.5
  }
}

Analyze workflow costs based on token usage with configurable pricing models:
from analyze_cost import (
    analyze_costs,
    PricingConfig,
    EXAMPLE_PRICING_CONFIGS,
)
from analyze_traces import load_from_json

# Load exported trace data
dataset = load_from_json("traces_export.json")

# Option 1: Use example pricing config
pricing = EXAMPLE_PRICING_CONFIGS["gemini_1.5_pro"]

# Option 2: Create custom pricing config
pricing = PricingConfig(
    model_name="Custom Model",
    input_tokens_per_1k=0.001,   # $1.00 per 1M input tokens
    output_tokens_per_1k=0.003,  # $3.00 per 1M output tokens
    cache_read_per_1k=0.0001,    # $0.10 per 1M cache read tokens (optional)
)

# Run cost analysis
results = analyze_costs(
    workflows=dataset.workflows,
    pricing_config=pricing,
    scaling_factors=[1, 10, 100, 1000],  # Optional, defaults to [1, 10, 100, 1000]
    monthly_workflow_estimate=10000,     # Optional, for monthly cost projections
)

# Access results
print(f"Average cost per workflow: ${results.avg_cost_per_workflow:.4f}")
print(f"Median cost: ${results.median_cost_per_workflow:.4f}")
print(f"Top cost driver: {results.top_cost_driver}")

# View node-level breakdown
for node in results.node_summaries[:3]:  # Top 3 nodes
    print(f"  {node.node_name}:")
    print(f"    Total cost: ${node.total_cost:.4f}")
    print(f"    Executions: {node.execution_count}")
    print(f"    Avg per execution: ${node.avg_cost_per_execution:.6f}")
    print(f"    % of total: {node.percent_of_total_cost:.1f}%")

# View scaling projections
for scale_label, projection in results.scaling_projections.items():
    print(f"{scale_label}: ${projection.total_cost:.2f} for {projection.workflow_count} workflows")
    if projection.cost_per_month_30days:
        print(f"  Monthly estimate: ${projection.cost_per_month_30days:.2f}/month")

Cost Analysis Features:
- Configurable pricing for any LLM provider (not hard-coded)
- Token usage extraction (input/output/cache tokens)
- Workflow-level cost aggregation
- Node-level cost breakdown with percentages
- Scaling projections at 1x, 10x, 100x, 1000x volume
- Optional monthly cost estimates
- Data quality reporting for missing token data
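
The per-call arithmetic behind these features follows directly from the per-1K-token fields on PricingConfig shown above; a minimal sketch (not necessarily the module's exact code):

def call_cost(
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
    cache_read_per_1k: float = 0.0,
) -> float:
    """Cost of a single LLM call given token counts and per-1K-token prices."""
    return (
        input_tokens / 1000 * input_per_1k
        + output_tokens / 1000 * output_per_1k
        + cache_read_tokens / 1000 * cache_read_per_1k
    )

# Example: 12,000 input + 2,500 output tokens at $0.001 / $0.003 per 1K tokens
# -> 12.0 * 0.001 + 2.5 * 0.003 = $0.0195
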
Detect and analyze failure patterns, retry sequences, and error distributions:
from analyze_failures import (
    analyze_failures,
    FAILURE_STATUSES,
    ERROR_PATTERNS,
)
from analyze_traces import load_from_json

# Load exported trace data
dataset = load_from_json("traces_export.json")

# Run failure analysis
results = analyze_failures(workflows=dataset.workflows)

# Overall metrics
print(f"Total workflows: {results.total_workflows}")
print(f"Success rate: {results.overall_success_rate_percent:.1f}%")
print(f"Failed workflows: {results.failed_workflows}")

# Node failure breakdown
print("\nTop 5 nodes by failure rate:")
for node in results.node_failure_stats[:5]:
    print(f"  {node.node_name}:")
    print(f"    Failure rate: {node.failure_rate_percent:.1f}%")
    print(f"    Failures: {node.failure_count}/{node.total_executions}")
    print(f"    Retry sequences: {node.retry_sequences_detected}")
    print(f"    Common errors: {node.common_error_types}")

# Error distribution
print("\nError type distribution:")
for error_type, count in results.error_type_distribution.items():
    print(f"  {error_type}: {count}")

# Retry analysis
print(f"\nTotal retry sequences detected: {results.total_retry_sequences}")
if results.retry_success_rate_percent:
    print(f"Retry success rate: {results.retry_success_rate_percent:.1f}%")

# Example retry sequence details
for retry_seq in results.retry_sequences[:3]:  # First 3 retry sequences
    print(f"\nRetry sequence in {retry_seq.node_name}:")
    print(f"  Attempts: {retry_seq.attempt_count}")
    print(f"  Final status: {retry_seq.final_status}")
    print(f"  Total duration: {retry_seq.total_duration_seconds:.1f}s")

Failure Analysis Features:
- Status-based failure detection (error, failed, cancelled)
- Regex-based error classification (validation, timeout, import, LLM errors)
- Heuristic retry sequence detection:
  - Multiple executions of the same node within a 5-minute window
  - Ordered by start time
- Node-level failure statistics
- Retry success rate calculation
- Error distribution across workflows
- Quality risk identification (placeholder for future enhancement)
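
The retry heuristic described above (repeated executions of the same node within a 5-minute window, ordered by start time) can be sketched as follows; this is a simplified illustration, and analyze_failures.py may group and count sequences differently.

from datetime import datetime, timedelta
from typing import Dict, List, Tuple

RETRY_WINDOW = timedelta(minutes=5)

def count_retry_sequences(executions: List[Tuple[str, datetime]]) -> int:
    """Count retry sequences in one workflow.

    `executions` is a list of (node_name, start_time) pairs.
    A sequence is two or more runs of the same node within the retry window.
    """
    by_node: Dict[str, List[datetime]] = {}
    for node, start in executions:
        by_node.setdefault(node, []).append(start)

    sequences = 0
    for starts in by_node.values():
        starts.sort()  # ordered by start time
        in_sequence = False
        for prev, curr in zip(starts, starts[1:]):
            if curr - prev <= RETRY_WINDOW:
                if not in_sequence:
                    sequences += 1
                    in_sequence = True
            else:
                in_sequence = False
    return sequences
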
You can also use the analysis functions programmatically:
from analyze_traces import (
    load_from_json,
    analyze_latency_distribution,
    identify_bottlenecks,
    verify_parallel_execution,
)

# Load exported trace data
dataset = load_from_json("traces_export.json")

# Analyze latency distribution
latency = analyze_latency_distribution(dataset.workflows)
print(f"p50: {latency.p50_minutes:.1f} min")
print(f"p95: {latency.p95_minutes:.1f} min")
print(f"Outliers above 23 min: {len(latency.outliers_above_23min)}")

# Identify bottlenecks
bottlenecks = identify_bottlenecks(dataset.workflows)
print(f"Primary bottleneck: {bottlenecks.primary_bottleneck}")
for node in bottlenecks.node_performances[:5]:
    print(f"  {node.node_name}: {node.avg_duration_seconds:.1f}s")

# Verify parallel execution
parallel = verify_parallel_execution(dataset.workflows)
print(f"Parallel execution: {parallel.is_parallel}")
print(f"Time savings: {parallel.avg_time_savings_seconds:.1f}s")

# Export to CSV
with open("output/latency.csv", "w") as f:
    f.write(latency.to_csv())

The analysis generates CSV files in the output/ directory:
- latency_distribution.csv - Percentile metrics and outlier analysis
  - p50/p95/p99 latency values
  - Min/max/mean/std_dev statistics
  - Outlier counts (>23 min, <7 min)
  - % within claimed 7-23 minute range
- bottleneck_analysis.csv - Node-level performance breakdown
  - Node name, execution count
  - Average/median duration, std deviation
  - % of total workflow time
  - Total time across all workflows
- parallel_execution_analysis.csv - Parallel execution evidence
  - Parallel vs sequential workflow counts
  - Average start time deltas
  - Time savings calculation
  - Confidence level assessment
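
One way to picture the parallel-execution evidence in that last CSV: sibling node runs are treated as parallel when their start times cluster closely and their time intervals overlap. A simplified sketch under those assumptions (the 5-second start-delta threshold is illustrative, not the module's actual value):

from typing import List, Tuple

def looks_parallel(intervals: List[Tuple[float, float]], max_start_delta_s: float = 5.0) -> bool:
    """Treat sibling runs as parallel if their start times cluster within a small
    delta and at least one pair of (start, end) intervals overlaps."""
    if len(intervals) < 2:
        return False
    starts = sorted(start for start, _ in intervals)
    start_delta = starts[-1] - starts[0]
    overlapping = any(
        a_start < b_end and b_start < a_end
        for i, (a_start, a_end) in enumerate(intervals)
        for b_start, b_end in intervals[i + 1:]
    )
    return start_delta <= max_start_delta_s and overlapping
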
After exporting data, use the validation utility to check data quality and statistical validity:
python validate_export.py traces_export.json

The validator provides:
- Dataset overview: Workflow counts, hierarchical data status
- Workflow statistics: Validator presence, duration stats, unique nodes
- Statistical validity assessment: Sample size adequacy for each analysis type
- Recommendations: Whether the dataset is ready for analysis
Example output:
DATASET OVERVIEW
Total workflows: 384
Hierarchical data: Yes
Workflows with validators: 15
STATISTICAL VALIDITY ASSESSMENT
Latency Analysis: EXCELLENT (n >= 100)
Bottleneck Analysis: EXCELLENT (n >= 100)
Parallel Analysis: WEAK (10 <= n < 20, low confidence)
RECOMMENDATION
Status: READY FOR COMPREHENSIVE ANALYSIS
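
The validity labels in the example above suggest simple sample-size cutoffs. A sketch assuming those cutoffs (the EXCELLENT and WEAK thresholds come from the example output; the intermediate and lowest labels are assumptions, and the actual validator may differ):

def validity_label(n: int) -> str:
    """Map a sample size to a rough statistical-validity label."""
    if n >= 100:
        return "EXCELLENT (n >= 100)"
    if n >= 20:
        return "GOOD (20 <= n < 100)"
    if n >= 10:
        return "WEAK (10 <= n < 20, low confidence)"
    return "INSUFFICIENT (n < 10)"
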
The script generates a JSON file with the following structure:
{
  "export_metadata": {
    "export_timestamp": "2025-11-28T12:00:00Z",
    "project_name": "project-name",
    "total_traces": 150,
    "langsmith_api_version": "0.4.x"
  },
  "traces": [
    {
      "id": "run_id",
      "name": "workflow_name",
      "start_time": "2025-11-28T10:00:00Z",
      "end_time": "2025-11-28T10:15:00Z",
      "duration_seconds": 900,
      "status": "success",
      "inputs": {},
      "outputs": {},
      "error": null,
      "run_type": "chain",
      "child_runs": []
    }
  ]
}

Export module tests (33 tests):
# Run all export tests
pytest test_export_langsmith_traces.py -v
# Run specific test class
pytest test_export_langsmith_traces.py::TestArgumentParsing -v
# Run with coverage
pytest --cov=export_langsmith_traces test_export_langsmith_traces.py

Analysis module tests (31 tests):
# Run all analysis tests
pytest test_analyze_traces.py -v
# Run specific phase tests
pytest test_analyze_traces.py::TestLatencyDistribution -v
pytest test_analyze_traces.py::TestBottleneckIdentification -v
pytest test_analyze_traces.py::TestParallelExecutionVerification -v
pytest test_analyze_traces.py::TestCSVExport -v
# Run with coverage
pytest --cov=analyze_traces test_analyze_traces.py

Cost analysis module tests (20 tests):
# Run all cost analysis tests
pytest test_analyze_cost.py -v
# Run specific test classes
pytest test_analyze_cost.py::TestPricingConfig -v
pytest test_analyze_cost.py::TestTokenExtraction -v
pytest test_analyze_cost.py::TestCostCalculation -v
# Run with coverage
pytest --cov=analyze_cost test_analyze_cost.py

Failure analysis module tests (15 tests):
# Run all failure analysis tests
pytest test_analyze_failures.py -v
# Run specific test classes
pytest test_analyze_failures.py::TestFailureDetection -v
pytest test_analyze_failures.py::TestRetryDetection -v
pytest test_analyze_failures.py::TestNodeFailureAnalysis -v
# Run with coverage
pytest --cov=analyze_failures test_analyze_failures.py

Run all tests:
# Run all 99 tests (33 export + 31 analysis + 20 cost + 15 failure)
pytest -v
# Run with coverage
pytest --cov=. -v

export-langsmith-data/
├── .env.example # API key configuration template
├── .gitignore # Git ignore patterns
├── requirements.txt # Python dependencies
├── PLAN.md # PDCA implementation plan
├── export-langsmith-requirements.md # Export requirements specification
├── export_langsmith_traces.py # Data export script
├── test_export_langsmith_traces.py # Export test suite (33 tests)
├── validate_export.py # Export validation utility
├── test_validate_export.py # Validation test suite (7 tests)
├── analyze_traces.py # Performance analysis module (Phase 3A)
├── test_analyze_traces.py # Analysis test suite (31 tests)
├── analyze_cost.py # Cost analysis module (Phase 3B)
├── test_analyze_cost.py # Cost analysis test suite (20 tests)
├── analyze_failures.py # Failure pattern analysis module (Phase 3C)
├── test_analyze_failures.py # Failure analysis test suite (15 tests)
├── verify_analysis_report.py # Verification tool for all phases
├── notebooks/
│ └── langsmith_trace_performance_analysis.ipynb # Interactive analysis notebook
├── output/ # Generated CSV analysis results
│ ├── latency_distribution.csv
│ ├── bottleneck_analysis.csv
│ └── parallel_execution_analysis.csv
└── README.md # This file
This project follows the PDCA (Plan-Do-Check-Act) framework with strict Test-Driven Development:
- Plan: Comprehensive analysis and implementation plan (see PLAN.md)
- Do: TDD implementation with Red-Green-Refactor cycles
- Check: Validation against requirements and quality standards
- Act: Retrospection and continuous improvement
Data Export Module:
- ✅ Project setup with virtual environment (uv/venv)
- ✅ Dependencies configuration with CI/CD quality gates
- ✅ CLI argument parsing with validation
- ✅ Environment variable support - Optional .env file configuration for simplified usage
- ✅ Automatic pagination - Handles the API's 100-record limit with multi-page fetching and progress indication
- ✅ Hierarchical data export - --include-children flag for complete workflow structures
- ✅ LangSmith client initialization with authentication
- ✅ Run fetching with exponential backoff rate limiting
- ✅ Data formatting and transformation with safe field extraction
- ✅ JSON export functionality with error handling
- ✅ Comprehensive error scenario handling
- ✅ Main orchestration with user-friendly progress feedback
- ✅ End-to-end integration testing
- ✅ Test suite: 33 tests, high coverage
- ✅ Code quality: Black, Ruff, mypy, Bandit, Safety checks passing
Performance Analysis Module:
- ✅ Data loading from JSON exports with hierarchical support
- ✅ Latency distribution analysis - p50/p95/p99 percentiles, outlier detection
- ✅ Bottleneck identification - Node-level performance ranking and metrics
- ✅ Parallel execution verification - Detect parallel validators, calculate time savings
- ✅ CSV export functionality - Export all analysis results to CSV format
- ✅ Interactive Jupyter notebook - Complete analysis workflow with visualizations
- ✅ Test suite: 31 tests (10 Phase 1 + 5 Phase 2 + 6 Phase 3 + 6 Phase 4 + 4 Phase 5)
- ✅ Type-safe implementation with mypy strict mode
- ✅ Code quality: Black, Ruff, mypy checks passing
- ✅ TDD methodology: Strict RED-GREEN-REFACTOR cycles across all 5 phases
Cost Analysis Module (Phase 3B):
- ✅ Configurable pricing models for any LLM provider
- ✅ Token usage extraction from trace metadata
- ✅ Cost calculation with input/output/cache token pricing
- ✅ Workflow-level cost aggregation
- ✅ Node-level cost breakdown with percentages
- ✅ Scaling projections (1x, 10x, 100x, 1000x)
- ✅ Test suite: 20 tests, full coverage
- ✅ Code quality: Black, Ruff, mypy, Bandit checks passing
Failure Pattern Analysis Module (Phase 3C):
- ✅ Status-based failure detection
- ✅ Regex-based error classification (5 patterns + unknown)
- ✅ Heuristic retry sequence detection
- ✅ Node-level failure statistics
- ✅ Retry success rate calculation
- ✅ Error distribution tracking
- ✅ Test suite: 15 tests, full coverage
- ✅ Code quality: Black, Ruff, mypy, Bandit checks passing
- ⏸️ Progress indication (tqdm) - Skipped in favor of simple console output
- ⏸️ Validator effectiveness analysis - Placeholder in Phase 3C for future enhancement
- ⏸️ Cache effectiveness analysis - Placeholder in Phase 3B for future enhancement
- Verify your API key is correct in .env or on the command line
- Check that the API key has not expired at https://smith.langchain.com/settings
- Script automatically retries with exponential backoff
- Consider reducing export frequency if hitting limits repeatedly
- Check internet connectivity
- Verify access to https://api.smith.langchain.com
- Check firewall/proxy settings
Property of Stride http://www.stride.build
This project was developed using Claude Code with the PDCA framework. See export-langsmith-implementation-plan.md for implementation details.