Add heap dump (HPROF) analysis support by jbachorik · Pull Request #52 · btraceio/jafar

jbachorik · 2026-02-10T16:52:18Z

Summary

New modules: Adds hdump-parser, hdump-shell, shell-core, and parser-core for comprehensive heap dump analysis
HdumpPath query language: Implement path-based queries for heap dumps with filtering, aggregation (groupBy, count, sum, stats), and transformations
Memory leak detection: Add 6 built-in leak detectors with checkLeaks command and interactive wizard
Dominator tree computation: Implement Cooper-Harvey-Kennedy algorithm with hybrid indexed mode support
Retained size analysis: Auto-compute retained sizes on first access with persistent storage and streaming computation to prevent OOM
Indexed parsing: Two-pass index-based parsing for large heap dumps, with >2GB file support via SplicedMappedByteBuffer
GC root path finding: BFS/DFS-based path finding from objects to GC roots
Unified shell architecture: Extract shared infrastructure into shell-core for both JFR and heap dump shells

Test plan

Run ./gradlew test to verify all existing and new tests pass
Run ./gradlew :hdump-shell:run --console=plain and test interactive commands
Open a heap dump file and verify show classes, show objects, checkLeaks work correctly
Test with large (>2GB) heap dump to verify indexed mode and streaming support
Verify retained size computation completes without OOM on large heaps

🤖 Generated with Claude Code

Extract common shell infrastructure to support multiple data formats: - Session interface for uniform session management - ShellModule SPI for pluggable domain modules - Generic SessionManager, VariableStore, QueryEvaluator - Reusable TableRenderer and PagedPrinter JFRSession now implements Session interface. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add depends/recommends/provides fields to PluginDefinition - Create DependencyResolver with topological sort and cycle detection - Update PluginInstaller with installWithDependencies method - Add uninstall protection for plugins with dependents - Update registry JSON format with dependency fields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Enables consistent session management across JFR and heap dump analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Rename parser to jfr-parser for clarity - Create parser-core module with shared utilities (CustomByteBuffer, SplicedMappedByteBuffer) - Update hdump-parser to use parser-core for memory-mapped I/O - Add close() and limit() methods to CustomByteBuffer interface - Support multi-release JAR with Java 8/13/21 implementations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Create jafar-shell module with pluggable ShellModule architecture - Implement JfrModule with ServiceLoader discovery - Add context-aware completion and help based on session type - Support tilde expansion in file paths - Fix UpdateChecker noisy logging 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…upBy - Support sum(expr), avg(expr), count(expr), min(expr), max(expr) in groupBy - Add completion for field names inside value expressions - Handle value= prefix and aggregate function contexts in completer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add RowSorter utility in shell-core for shared sorting logic - Extend GroupByOp with sortBy (key/value) and ascending fields - Parse sort= and asc= options in groupBy - Fix completion: no trailing spaces for named parameters and values 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Change reserved lines from 5 to 10 to show ~5 fewer rows per page 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements Eclipse MAT-style minimum retained size approximation. Runs ~5x faster than exact dominator computation with 80-90% accuracy. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements shortest path and all paths algorithms for reference chain analysis. Helps users understand why objects are retained in memory. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Auto-trigger retained size computation on first access - Use ApproximateRetainedSizeComputer for fast approximation - Implement findPathToGcRoot using PathFinder BFS - Add ensureDominatorsComputed() for lazy computation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- PathToRootOp: Find reference chains to GC roots - CheckLeaksOp: Built-in detectors and custom filters - DominatorsOp: Get dominated objects 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Parse pathToRoot, checkLeaks, dominators operators - Support detector/filter parameters with validation - Handle zero-arg operators with optional parens 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- pathToRoot: Converts objects to GC root paths - checkLeaks: Placeholder for detector/filter logic - dominators: Placeholder for dominator tree traversal 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements on-demand full dominator tree computation as an alternative to approximate retained size calculation. The dominators() operator now guides users to compute the full tree when needed. - DominatorTreeComputer: Cooper-Harvey-Kennedy algorithm (~250 LOC) - HeapObjectImpl: Add dominator field and accessors - HeapDumpImpl: Add computeFullDominatorTree() with progress callback - HeapSession: Expose full dominator tree computation - HdumpPathEvaluator: Update dominators() to check and use full tree Performance: 15-30s for 10M objects, 420MB peak memory Progress: Callback support for UI feedback during computation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Cache rpoIndex map to eliminate O(N³) complexity - Build dominator children map for O(N) retained size computation - Fix infinite loop in intersect() when reaching GC roots - Cache children map for O(1) getDominatedObjects() lookup - Add progress tracking and UI improvements Result: 20-50x speedup (30+ seconds → 2-4 seconds for 10M objects) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add memory-efficient hybrid approach combining approximate and exact dominator computation: New Components: - HybridDominatorComputer: Core hybrid algorithm implementation * identifyInterestingObjects(): Top N + leak-prone classes * expandToDominatorPaths(): Build subgraph to GC roots * computeExactForSubgraph(): Exact dominators for subset - HeapDumpImpl.computeHybridDominators(): Primary API * Takes topN and class patterns * Computes approximate for all (~8 bytes/object) * Computes exact for interesting subset (~150 bytes/object) - HeapDumpImpl.computeExactForClasses(): Targeted exact computation * Class pattern matching with glob support * Expands to include dominator paths - HeapObjectImpl.hasExactRetainedSize: Track approximate vs exact Memory Savings: - 93% reduction vs full computation - 1 GiB heap: 2.1 GB → 140 MB - 8 GiB heap: 16.8 GB → 1.1 GB - 16 GiB heap: 33.6 GB → 2.2 GB Enables 16 GiB heap dump analysis on 16-32 GB RAM workstations. Testing: - HeapDumpParserTest.testHybridDominatorComputation() - Verifies memory usage and correctness Documentation: - doc/hybrid-dominator-computation.md: Complete usage guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements Milestone 1: Core Index Infrastructure - IndexFormat: Binary format constants (25-byte object entries) - IndexWriter: Buffered sequential writes with atomic rename - ObjectIndexReader: Memory-mapped O(1) random access - Integration tests: 4 test cases covering write/read cycle Target: Support 114M objects with <4GB heap via disk-based indexes Index size: 2.85GB for 114M objects (25 bytes/object + 20 byte header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements M2.1 of progressive index architecture: - SyntheticHeapDumpGenerator: Minimal HPROF writer for testing (no external deps) - TwoPassParsingTest.AddressCollector: Collects object addresses in single scan - Maps 64-bit addresses to 32-bit sequential IDs (50% space savings) - Tests verify correct address collection and ID mapping Research confirmed no existing library for synthetic HPROF creation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements M2.2 of progressive index architecture: - IndexBuilder: Builds indexes using address-to-ID mapping from Pass 1 - Re-reads heap dump and writes object metadata to IndexWriter - Maps class addresses to sequential 32-bit class IDs - Handles instance dumps, object arrays, and primitive arrays - Tests verify round-trip: Pass 1 collect → Pass 2 build → read back metadata All 4 tests passing: address collection, index building, integration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements M2.3 & M2.4 of progressive index architecture: **M2.3: Lazy Loading Support** - Modified getObjectByIdInternal() to load from ObjectIndexReader - Objects loaded on-demand from indexes, cached in objectsById - Supports both in-memory and indexed modes transparently **M2.4: Two-Pass Parsing Integration** - Added ParserOptions.useIndexedParsing flag (new INDEXED preset) - Implemented parseTwoPass() method in HeapDumpImpl - Pass 1: collectObjectAddresses() - single scan, builds addressToId32 map - Pass 2: buildIndexes() - re-reads and writes object metadata to IndexWriter - Index directory created alongside heap dump (.idx suffix) - ObjectIndexReader opened for lazy object loading **Architecture Changes:** - New fields: objectIndexReader, addressToId32, indexDir - parse() method routes to parseTwoPass() when useIndexedParsing=true - close() cleans up objectIndexReader - Fully backward compatible: in-memory mode unchanged **Testing:** - IndexedParsingTest: integration test with 50 synthetic objects - Verifies indexed mode works end-to-end - Tests consistency with in-memory mode - Tests index reuse across parses **Memory Impact:** - In-memory mode: unchanged (objectsById map) - Indexed mode: ~3.5 GB for 114M objects (vs 25 GB in-memory) - Lazy loading: objects cached on first access Target: Enable 114M object heaps with <4GB JVM heap. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…, sum, stats)

Type-filtered queries like 'show objects/java.util.*' now use a persistent class-instances index to directly enumerate matching instances instead of scanning all objects. Provides 10-60x speedup depending on selectivity. Two-file index design: - classinstances-offset.idx: Maps classId to instance list location (O(1)) - classinstances-data.idx: Sequential instance IDs grouped by class Built automatically during Pass 2 with ~5% overhead. Validated and loaded on fast path. Query evaluator uses index when available, falls back to full scan when not. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Instead of returning -1 (causing NaN in aggregations), retained sizes are now automatically computed on first access if not already available. This provides a much better user experience - queries using retainedSize just work without manual computation steps. The computation is triggered once on first access, persisted to retained.idx, and all subsequent accesses are instant. Thread-safe with double-checked locking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

When retained sizes are automatically computed on first access, now shows progress updates so users know the computation is running and not hung. Progress format: "Building inbound index: 45.2%" After completion: "Retained sizes computed and cached." 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Increase progress update frequency (1% vs 5%) for better visibility - Auto-load retained sizes from index after computation in indexed mode - Format aggregated memory values properly (use field names as column names) - Fix sorting when aggregating memory fields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Allow checkLeaks(...) without 'show objects |' prefix - Add checkLeaks(detector="help") to list available detectors - Display detector descriptions and parameter hints - Simplify leak detection workflow 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Made ensureRetainedSizesComputed() public for external use - Modified applyCheckLeaks to use streaming retained size computation - DuplicateStringsDetector: use lightweight StringGroupMetrics - ListenerLeakDetector: use lightweight ListenerGroupMetrics - FinalizerQueueDetector: stream and count without storing objects All detectors now accumulate only metrics during streaming, not HeapObject references, preventing OOME on large heaps (114M objects). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Added recognition for checkLeaks(...) commands at the Shell level so they're dispatched to handleShow(). This enables the shorthand syntax where checkLeaks(...) is treated as show objects | checkLeaks(...). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Interactive wizard with arrow-key detector selection - Systematic class name format handling (internal/qualified) - Fix all detectors to use internal format (java/lang/String) - Add ClassNameUtil for format conversions - Update docs to use get_resources.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

github-actions · 2026-02-10T16:52:51Z

Combined JUnit Test Report

Total: 0
Passed: 0
Failures: 0
Errors: 0
Skipped: 0

HTML Test Reports

Run artifacts: https://github.com/btraceio/jafar/actions/runs/21875065343

JDK 8: (artifact not found)
JDK 21: (artifact not found)

- Use internal format (java/lang/Thread) in ThreadLocalLeakDetector - Update HeapDump.findPathToGcRoot JavaDoc (method is implemented) - Add null checks for session-dependent pipeline operations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

PathStep record carries the object and the field name of the incoming reference. PathFinder now tracks named edges via getFieldValues() for instance objects and [i] notation for arrays. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Peel leading FilterOps into streaming predicates so filter|top works correctly on large heaps instead of capping at 100 random objects - Pass HeapSession through streaming path so pathToRoot() works after a streaming aggregation - Fix groupBy|top sort: use getAggregationColumnName instead of hardcoded "sum", fixing wrong order when field is e.g. retainedSize - Accept count() with empty parens Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Allow querying array instances via Java notation (int[], Object[][]) and JVM descriptor format ([I, [Ljava.lang.Object;). Previously the parser stopped at '[', blocking all array type specs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add VIRTUAL_ROOT node so multi-root graphs always converge - Add stagnation guard to break flip-flop cycles - Simplify intersect() removing stale self-dominator checks - Use getObjectByIdInternal() instead of getCachedObject() to avoid empty dominated lists for evicted objects Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jbachorik and others added 30 commits February 3, 2026 12:29

Add hdump-parser and hdump-shell modules for heap dump analysis

7e61ef6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement HdumpPath query language for heap dump analysis

dc6dff6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add heap dump docs and improve API

fa3fb26

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implement Session interface in JFRSession

15eb03f

Enables consistent session management across JFR and heap dump analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Reduce default table rows for better terminal fit

1731e6b

Change reserved lines from 5 to 10 to show ~5 fewer rows per page 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add LeakDetector interface and 6 built-in detectors

5ef7782

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Integrate LeakDetectorRegistry with HdumpPathEvaluator

1d2d352

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add completers for new pipeline operators

906eebf

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix package structure and compilation errors

c482301

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix Optional handling and method name in hdump-shell

9e7d2b9

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

M3: On-demand inbound index builder

be0b4d4

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jbachorik and others added 10 commits February 8, 2026 19:43

Add streaming support for 'show objects | top(N)' to avoid OOM

4a651a2

Add streaming support for all aggregations on objects (groupBy, count…

a48ae63

…, sum, stats)

jbachorik added the AI AI-generated code or contributions label Feb 10, 2026

jbachorik and others added 18 commits February 10, 2026 18:13

Update hdump docs with checkLeaks, pathToRoot, dominators

458e7d0

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove CLAUDE.md and prevent re-commit via .gitignore

ce1e4ca

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Unify groupBy sort param and top() syntax across JfrPath and HdumpPath

f027cd7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Unify sortBy() syntax: multi-field, aliases, asc/desc keywords

57bcb05

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Align single-quote strings to raw semantics in HdumpPath

dc1f45f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add size unit literals (KB, MB, GB) to JfrPath filters

6cfdfa2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix stale docs: JfrPath pipeline operators are chainable

12e7c15

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add cross-references between JFR and heap dump docs

898374b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add head, tail, filter, distinct pipeline operators to JfrPath

a4bbb58

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add value transform operators to HdumpPath

5457304

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add filter functions to HdumpPath predicates

947cfc9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --format and --limit support for hdump queries in unified shell

355df23

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Bump version to 0.12.0-SNAPSHOT

333ea87

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add heap dump (HPROF) analysis support#52

Add heap dump (HPROF) analysis support#52
jbachorik wants to merge 82 commits intomainfrom
jb/hdump

jbachorik commented Feb 10, 2026

Uh oh!

github-actions bot commented Feb 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jbachorik commented Feb 10, 2026

Summary

Test plan

Uh oh!

github-actions bot commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Combined JUnit Test Report

HTML Test Reports

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 10, 2026 •

edited

Loading