-
Notifications
You must be signed in to change notification settings - Fork 1.9k
GROOVY-10307: Improve invokedynamic performance with optimized caching #2374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: GROOVY_4_0_X
Are you sure you want to change the base?
GROOVY-10307: Improve invokedynamic performance with optimized caching #2374
Conversation
This commit significantly improves invokedynamic performance in Grails 7 / Groovy 4 by implementing several key optimizations: 1. **Disabled Global SwitchPoint Guard**: Removed the global SwitchPoint that caused ALL call sites to invalidate when ANY metaclass changed. In Grails and similar frameworks with frequent metaclass modifications, this was causing massive guard failures and performance degradation. Individual call sites now manage their own invalidation via cache clearing. Can be re-enabled with: -Dgroovy.indy.switchpoint.guard=true 2. **Monomorphic Fast-Path**: Added volatile fields (latestClassName/ latestMethodHandleWrapper) to CacheableCallSite to skip synchronization and map lookups for call sites that consistently see the same types, which is the common case in practice. 3. **Call Site Cache Invalidation**: Added call site registration and cache clearing mechanism. When metaclass changes, all registered call sites have their caches cleared and targets reset to the default (fromCache) path, ensuring metaclass changes are visible. 4. **Optimized Cache Operations**: Added get() and putIfAbsent() methods to CacheableCallSite for more efficient cache access patterns, and clearCache() for metaclass change invalidation. 5. **Pre-Guard Method Handle Storage**: Selector now stores the method handle before argument type guards are applied (handleBeforeArgGuards), enabling potential future optimizations for direct invocation. The thresholds can be tuned via system properties: - groovy.indy.optimize.threshold (default: 10000) - groovy.indy.fallback.threshold (default: 10000) - groovy.indy.callsite.cache.size (default: 4) - groovy.indy.switchpoint.guard (default: false)
|
I do not believe the |
Remove monomorphic fast-path, optimized cache ops, and pre-guard method handle storage. Testing showed these provide no measurable benefit in Grails 7 benchmarks while adding complexity. The minimal optimization (disabled SwitchPoint guard + cache invalidation) achieves the same performance improvement with 75% less code (69 lines vs 277 lines). Removed: - Monomorphic fast-path (latestClassName/latestMethodHandleWrapper fields) - Optimized cache operations (get/putIfAbsent methods) - Pre-guard method handle storage (handleBeforeArgGuards, directMethodHandle) Retained: - Disabled global SwitchPoint guard by default - Cache invalidation on metaclass changes - Call site registration for invalidation tracking
Optimizations Retained
Optimizations Removed (No Measurable Benefit)
Testing showed these additional optimizations provide no measurable performance benefit in the Grails 7 benchmark while adding complexity. |
| // sufficient, combined with cache invalidation on metaclass changes. | ||
| // | ||
| // If you need strict metaclass change detection, set groovy.indy.switchpoint.guard=true | ||
| if (SystemUtil.getBooleanSafe("groovy.indy.switchpoint.guard")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one question I have about this is whether this check ought to be inverted. That is, should the default behaviour be the old way, which we know is less performant but more battle-tested, or this way? Could Grails simply turn this flag on by default, thus solving their problem, without impacting other users?
Also, as a nitpicky thing, I reckon this string ought to be a public static final field so that we could add some tests around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this PRs approach is viable, but does not prove to be generally valuable for all/most code run with Indy on, than I think the inverse would be fine for Grails and we could set that flag in default generated applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of why I'd like to run this branch against the current benchmarks on CI. That would likely provide some confidence as to whether this was "safe" to merge without impacting anyone's expectations about how the language behaves.
|
I'm also well outside my expertise here, but so long as we're doing this, I'd be curious to see how this runs against the performance tests we have, compared to baseline. I tried running That seems comparable, so my assumption is this hasn't hurt the most basic performance measures we do. Main reason I'd want CI confirmation is to make sure it wasn't a quirk of my machine. |
This PR significantly improves invokedynamic performance in Grails 7 / Groovy 4 by implementing several key optimizations:
https://issues.apache.org/jira/browse/GROOVY-10307
apache/grails-core#15293
Using https://github.com/jamesfredley/grails7-performance-regression as a test application
10K/10K = groovy.indy.optimize.threshold/groovy.indy.fallback.threshold
Key Results
Problem Statement
Root Cause Analysis
The performance in the Grails 7 Test Application was traced to the global SwitchPoint guard in Groovy's invokedynamic implementation:
Optimizations Applied
Disabled Global SwitchPoint Guard
-Dgroovy.indy.switchpoint.guard=trueCall Site Cache Invalidation
ALL_CALL_SITESset with weak references)Optimizations Removed (No Measurable Benefit)
Testing showed these additional optimizations provide no measurable performance benefit while adding complexity:
Monomorphic Fast-Path(REMOVED)Added volatile fields (latestClassName/latestMethodHandleWrapper) toCacheableCallSiteSkips synchronization and map lookups for call sites that consistently see the same typesOptimized Cache Operations(REMOVED)Addedget()method for cache lookup without automatic putAddedputIfAbsent()for thread-safe cache populationgetAndPut()sufficientPre-Guard Method Handle Storage(REMOVED)Selector stores the method handle before argument type guards are applied (handleBeforeArgGuards)Enables potential future optimizations for direct invocationWhat Was NOT Changed
Performance Results
Test Environment
Runtime Performance Comparison (4GB Heap - Recommended)
Startup Time Comparison
Visual Performance Summary (4GB Heap)
Configuration Test Matrix
All tests performed on January 30, 2026 with JDK 25.0.2 (Corretto), 4GB heap (-Xmx4g -Xms2g). Runtime values are stabilized averages (runs 3-5).
Failed Optimization Attempts
These approaches were tried but caused test failures. They are documented here to prevent repeating the same mistakes.
Failed Attempt 1: Skip Metaclass Guards for Cache Hits
What it tried: Store method handle before guards, use unguarded handle for cache hits.
Why it failed: Call sites can be polymorphic (called with multiple receiver types). The unguarded handles contain
explicitCastArgumentswith embedded type casts that throwClassCastExceptionwhen different types pass through.Failed Attempt 2: Version-Based Metaclass Guards
What it tried: Use metaclass version instead of identity for guards, skip guards for "cache hits."
Why it failed: Same polymorphic call site issue - skipping guards for cache hits is fundamentally unsafe.
Failed Attempt 3: Skip Argument Guards After Warmup
What it tried: After warmup period, use handles without argument type guards.
Why it failed: Call sites can receive different argument types over their lifetime. Skipping argument guards causes
ClassCastExceptionorIncompatibleClassChangeError.Key Learning
Any optimization that bypasses guards based on cache state is unsafe because:
explicitCastArgumentswill throw if types don't matchSafe optimizations must:
Remaining Performance Gap
The optimized indy is still ~2.3x slower than non-indy (139ms vs 60ms). The remaining gap comes from:
Potential Future Optimization Areas
Configuration Reference
System Properties
groovy.indy.optimize.thresholdgroovy.indy.fallback.thresholdgroovy.indy.callsite.cache.sizegroovy.indy.switchpoint.guardFiles Modified
Source Files
Selector.javaIndyInterface.javaCacheableCallSite.javaclearCache()method