perf(evm): optimize EVMC execute entry path with address-based module cache and instance reuse#366
starwarfan wants to merge 1 commit into DTVMStack:main
Conversation
… cache and instance reuse

Replace per-call overhead (~2.4ms) with an address-based module cache and object reuse, shared by both interpreter and multipass modes:

- Address-based module cache: `code_address + revision` key with first/last 256-byte content validation to avoid re-parsing bytecode on repeated calls to the same contract
- Stale-entry eviction when the code at a cached address changes
- `EVMInstance` reuse via `resetForNewCall()` instead of alloc/free per call
- `InterpreterExecContext` reuse with deque-to-vector conversion and cross-call capacity caching to avoid ~32KB frame re-allocation
- Interpreter fast path bypasses `Runtime::callEVMMain` for direct dispatch
- Multipass path uses the same cache with `callEVMMain` for JIT execution

Benchmark impact (vs evmone baseline):

- Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
- Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

Made-with: Cursor
Force-pushed from 24c5bfe to 381db72.
Pull request overview
This PR optimizes the EVMC execute() path by caching loaded EVMModules by (code_address, revision) and reusing execution objects (notably EVMInstance and interpreter execution context) to reduce per-call overhead in both interpreter and multipass/JIT modes.
Changes:
- Added an address+revision keyed module cache with code-content validation and stale-entry eviction.
- Implemented `EVMInstance::resetForNewCall()` and reused a cached instance across EVMC calls.
- Switched interpreter frame storage from `deque` to `vector` and added `InterpreterExecContext::resetForNewCall()`; added an interpreter fast path bypassing `Runtime::callEVMMain`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/vm/dt_evmc_vm.cpp | Introduces address-based module cache, eviction, cached instance/context reuse, and interpreter fast path dispatch. |
| src/runtime/evm_instance.h | Declares resetForNewCall() to enable cross-call instance reuse. |
| src/runtime/evm_instance.cpp | Implements instance state reset for reuse (gas/memory/message stack/caches). |
| src/evm/interpreter.h | Changes frame stack container to vector and adds context reset for reuse across calls. |
```cpp
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
  return false;
if (CodeSize > 256) {
  size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
  size_t TailOffset = CodeSize - TailLen;
```
This file now uses `std::min` in `validateCodeMatch()`, but dt_evmc_vm.cpp does not include `<algorithm>`. Please include `<algorithm>` explicitly to avoid relying on transitive includes (which can break builds depending on the standard library implementation and compile flags).
```cpp
// L0 disabled: pointer comparison is unsafe when callers reuse addresses
// for different bytecode (e.g. test frameworks, repeated allocations).
// Fall through to L1 address-based lookup with content validation.

EVMModule *Mod = nullptr;

// L1: Address-based map lookup
CodeAddrRevKey AddrKey{Msg->code_address, Rev};
auto It = VM->AddrCache.find(AddrKey);
if (It != VM->AddrCache.end() &&
    validateCodeMatch(Code, CodeSize, It->second)) {
  Mod = It->second;
} else {
  // Cold path: full module load
  // If validation failed for an existing entry, evict the stale module
  if (It != VM->AddrCache.end()) {
    EVMModule *OldMod = It->second;
    if (VM->CachedInst && VM->CachedInst->getModule() == OldMod) {
      VM->Iso->deleteEVMInstance(VM->CachedInst);
      VM->CachedInst = nullptr;
    }
    if (VM->L0Mod == OldMod)
      VM->L0Mod = nullptr;
    VM->RT->unloadEVMModule(OldMod);
    VM->AddrCache.erase(It);
  }
  std::string ModName = "mod_" + std::to_string(VM->ModCounter++);
  auto ModRet = VM->RT->loadEVMModule(ModName, Code, CodeSize, Rev);
  if (!ModRet)
    return nullptr;
  Mod = *ModRet;
  VM->AddrCache[AddrKey] = Mod;
}

// Update L0 cache
VM->LastCodePtr = Code;
VM->LastCodeSize = CodeSize;
VM->L0Mod = Mod;
return Mod;
```
findModuleCached() states that the L0 pointer-based cache is disabled, but it still updates LastCodePtr/LastCodeSize/L0Mod and maintains invalidation logic for L0Mod. Since these members are otherwise unused, this adds confusion and maintenance cost. Either remove the L0 members/updates entirely, or re-enable the L0 lookup with a safe validation scheme (e.g., pointer+size plus a hash/content check).
```cpp
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.
```
The comment says resetForNewCall() is for reuse in interpreter mode, but the new dt_evmc_vm path calls it for multipass/JIT executions as well (getOrCreateInstance()). Please update the comment to reflect that the reset is used for general cross-call instance reuse across modes, or restrict usage if it truly must be interpreter-only.
```diff
-/// Reset instance state for reuse in interpreter mode.
-/// Avoids the cost of destroy + recreate on every EVMC execute() call.
+/// Reset instance state for reuse across EVMC execute() calls, regardless
+/// of execution mode (interpreter, multipass, or JIT). This avoids the cost
+/// of destroying and recreating the instance for each call.
```
```diff
 class InterpreterExecContext {
 private:
   runtime::EVMInstance *Inst;
-  std::deque<EVMFrame> FrameStack;
+  std::vector<EVMFrame> FrameStack;
   evmc_status_code Status = EVMC_SUCCESS;
```
Changing FrameStack from std::deque to std::vector can invalidate pointers/references to existing frames on growth (reallocation). This is unsafe here because allocTopFrame() stores pointers to Frame.Msg in EVMInstance::MessageStack and interpreter code keeps EVMFrame* (e.g., parent frame) across nested calls; a reallocation would turn those pointers into dangling pointers and can cause memory corruption. Use a container with stable addresses (e.g., keep std::deque), or reserve a hard upper bound up-front and enforce max depth so FrameStack never reallocates, or store frames/messages in separately allocated stable storage.
```cpp
/// Reset state for reuse across calls. Keeps allocated capacity to avoid
/// re-allocating the ~32KB EVMFrame on every call.
void resetForNewCall(runtime::EVMInstance *NewInst) {
  Inst = NewInst;
  FrameStack.clear(); // keeps vector capacity
  Status = EVMC_SUCCESS;
  ReturnData.clear(); // keeps vector capacity
  IsJump = false;
  ExeResult = evmc::Result{EVMC_SUCCESS, 0, 0};
}
```
resetForNewCall() intends to preserve ReturnData/FrameStack capacity, but many interpreter paths set return data via setReturnData(std::vector<uint8_t>()) / move-assigning a new vector, which typically discards the existing buffer and defeats cross-call capacity caching. To realize the intended optimization, consider switching those "clear return data" sites to a clear() on the existing buffer (or add a dedicated clearReturnData() API) instead of replacing the vector instance.
```cpp
/// Validate that the cached module's code matches the provided code.
/// Checks code_size + first 256 bytes + last 256 bytes.
bool validateCodeMatch(const uint8_t *Code, size_t CodeSize,
                       const EVMModule *Mod) {
  if (CodeSize != Mod->CodeSize)
    return false;
  if (CodeSize == 0)
    return true;
  auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
  size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
  if (std::memcmp(Code, ModCode, HeadLen) != 0)
    return false;
  if (CodeSize > 256) {
    size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
    size_t TailOffset = CodeSize - TailLen;
    if (std::memcmp(Code + TailOffset, ModCode + TailOffset, TailLen) != 0)
      return false;
  }
  return true;
}
```
validateCodeMatch() only compares code size plus the first/last 256 bytes. Two different bytecode blobs can share the same prefix/suffix while differing in the middle, which would incorrectly treat modified code as a cache hit and execute the wrong module. For correctness, the cache validation needs a full-code identity check (e.g., keccak256/CRC over the full code, or a host-provided code hash) rather than a partial window comparison.
Replace per-call overhead (~2.4ms) with address-based module cache and object reuse, shared by both interpreter and multipass modes:

- Address-based module cache: `code_address + revision` key with first/last 256-byte content validation to avoid re-parsing bytecode on repeated calls to the same contract
- `EVMInstance` reuse via `resetForNewCall()` instead of alloc/free per call
- Interpreter fast path bypasses `Runtime::callEVMMain` for direct dispatch
- Multipass path uses the same cache with `callEVMMain` for JIT execution

1. Does this PR affect any open issues? (Y/N) and add issue references (e.g. "fix #123", "re #123"):

2. What is the scope of this PR (e.g. component or file name):

evm, runtime, dt_evmc_vm

3. Provide a description of the PR (e.g. more details, effects, motivations or doc link):

This PR optimizes the EVMC `execute()` entry path to eliminate redundant per-call overhead. The main bottleneck was that every EVMC call re-parsed the bytecode module and allocated fresh execution objects.

Address-based module cache: Uses `code_address + revision` as the key in an `unordered_map`, with first/last 256-byte content validation to guard against address reuse with different bytecode. On a cache hit with matching content, the existing parsed `EVMModule` is reused directly, avoiding the full module load path. When the code at a cached address changes, the stale entry is evicted and the module is unloaded before the new one is loaded.

Object reuse: `EVMInstance` (~33KB) is reused via `resetForNewCall()` instead of being allocated and freed per call. `InterpreterExecContext` uses a vector-based frame stack (replacing deque) with cross-call capacity caching to avoid repeated heap allocations of ~32KB frames.

Fast path for interpreter: When the module is cached, the interpreter fast path bypasses `Runtime::callEVMMain` and dispatches directly via `BaseInterpreter::interpret()`, saving additional function-call and setup overhead.

Multipass path: Uses the same address-based cache and instance reuse, but dispatches through `callEVMMain` for JIT execution.

Benchmark impact (vs evmone baseline):

- Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
- Multipass: fixed overhead reduced from ~2.4ms to microseconds

4. Are there any breaking changes? (Y/N) and describe the breaking changes (e.g. more details, motivations or doc link):

5. Are there test cases for these changes? (Y/N) select and add more details, references or doc links:

Benchmark results using evmone-bench (Release mode, vs evmone baseline):

6. Release note