
perf(evm): optimize EVMC execute entry path with address-based module cache and instance reuse #366

Open
starwarfan wants to merge 1 commit into DTVMStack:main from starwarfan:opt-evmc-cache

Conversation

starwarfan (Contributor) commented Feb 26, 2026

Replace per-call overhead (~2.4ms) with address-based module cache and object reuse, shared by both interpreter and multipass modes:

  • Address-based module cache: code_address + revision key with first/last 256-byte content validation to avoid re-parsing bytecode on repeated calls to the same contract
  • Stale-entry eviction when code at a cached address changes
  • EVMInstance reuse via resetForNewCall() instead of alloc/free per call
  • InterpreterExecContext reuse with deque-to-vector conversion and cross-call capacity caching to avoid ~32KB frame re-allocation
  • Interpreter fast path bypasses Runtime::callEVMMain for direct dispatch
  • Multipass path uses same cache with callEVMMain for JIT execution

1. Does this PR affect any open issues? (Y/N) and add issue references (e.g. "fix #123", "re #123"):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

evm, runtime, dt_evmc_vm

3. Provide a description of the PR (e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

This PR optimizes the EVMC execute() entry path to eliminate redundant per-call overhead. The main bottleneck was that every EVMC call re-parsed the bytecode module and allocated fresh execution objects.

Address-based module cache: Uses code_address + revision as key in an unordered_map, with first/last 256-byte content validation to guard against address reuse with different bytecode. On cache hit with matching content, the existing parsed EVMModule is reused directly, avoiding the full module load path. When the code at a cached address changes, the stale entry is evicted and the module is unloaded before loading the new one.
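The keying scheme described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the names `AddrRevKey`, `AddrRevKeyHash`, and `ModuleCache`, and the FNV-1a hash, are assumptions; the real cache maps to `EVMModule*` rather than `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical sketch of an (address, revision)-keyed module cache.
struct AddrRevKey {
  uint8_t Addr[20] = {}; // 20-byte EVM code address
  int Rev = 0;           // EVM revision the module was loaded for
  bool operator==(const AddrRevKey &O) const {
    return Rev == O.Rev && std::memcmp(Addr, O.Addr, sizeof(Addr)) == 0;
  }
};

struct AddrRevKeyHash {
  size_t operator()(const AddrRevKey &K) const {
    // FNV-1a over the address bytes, then mix in the revision.
    uint64_t H = 1469598103934665603ull;
    for (uint8_t B : K.Addr)
      H = (H ^ B) * 1099511628211ull;
    H = (H ^ static_cast<uint64_t>(K.Rev)) * 1099511628211ull;
    return static_cast<size_t>(H);
  }
};

// The cached value would be the parsed EVMModule*; int keeps the sketch short.
using ModuleCache = std::unordered_map<AddrRevKey, int, AddrRevKeyHash>;
```

A lookup on the hot path is then a single hash-map probe followed by the content validation the PR describes; a revision bump for the same address misses the cache and triggers a fresh load.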

Object reuse: EVMInstance (~33KB) is reused via resetForNewCall() instead of allocating/freeing per call. InterpreterExecContext uses a vector-based frame stack (replacing deque) with cross-call capacity caching to avoid repeated heap allocations of ~32KB frames.
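The clear-but-keep-capacity pattern behind this reuse can be sketched as below. Member names are simplified assumptions; the real `InterpreterExecContext` holds `EVMFrame` objects and additional state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified sketch of the reset-for-reuse pattern described above.
struct ExecContextSketch {
  std::vector<uint64_t> FrameStack; // stands in for std::vector<EVMFrame>
  std::vector<uint8_t> ReturnData;
  void resetForNewCall() {
    FrameStack.clear(); // size -> 0, allocated capacity preserved
    ReturnData.clear(); // ditto: the next call reuses the same buffers
  }
};
```

Because `vector::clear()` leaves `capacity()` unchanged, the ~32KB frame allocation from the previous call is reused instead of being freed and re-acquired.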

Fast path for interpreter: When the module is cached, the interpreter fast path bypasses Runtime::callEVMMain and dispatches directly via BaseInterpreter::interpret(), saving additional function-call and setup overhead.

Multipass path: Uses the same address-based cache and instance reuse, but dispatches through callEVMMain for JIT execution.

Benchmark impact (vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

5. Are there any breaking changes? (Y/N) and describe the breaking changes (e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes? (Y/N) Select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

Benchmark results using evmone-bench (Release mode, vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

6. Release note

perf(evm): optimize EVMC execute entry path with address-based module cache (code_address + revision key with content validation) and instance reuse, halving interpreter fixed overhead and reducing multipass overhead to microseconds.


Copilot AI left a comment


Pull request overview

This PR optimizes the EVMC execute() path by caching loaded EVMModules by (code_address, revision) and reusing execution objects (notably EVMInstance and interpreter execution context) to reduce per-call overhead in both interpreter and multipass/JIT modes.

Changes:

  • Added an address+revision keyed module cache with code-content validation and stale-entry eviction.
  • Implemented EVMInstance::resetForNewCall() and reused a cached instance across EVMC calls.
  • Switched interpreter frame storage from deque to vector and added InterpreterExecContext::resetForNewCall(); added an interpreter fast path bypassing Runtime::callEVMMain.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Changed files:

  • src/vm/dt_evmc_vm.cpp — introduces the address-based module cache, eviction, cached instance/context reuse, and interpreter fast-path dispatch.
  • src/runtime/evm_instance.h — declares resetForNewCall() to enable cross-call instance reuse.
  • src/runtime/evm_instance.cpp — implements instance state reset for reuse (gas/memory/message stack/caches).
  • src/evm/interpreter.h — changes the frame stack container to vector and adds context reset for reuse across calls.


Comment on lines +74 to +80
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;

Copilot AI Feb 28, 2026

This file now uses std::min in validateCodeMatch(), but dt_evmc_vm.cpp does not include `<algorithm>`. Please include `<algorithm>` explicitly to avoid relying on transitive includes (which can break builds depending on the standard library implementation or compile flags).

Comment on lines +191 to +229
// L0 disabled: pointer comparison is unsafe when callers reuse addresses
// for different bytecode (e.g. test frameworks, repeated allocations).
// Fall through to L1 address-based lookup with content validation.

EVMModule *Mod = nullptr;

// L1: Address-based map lookup
CodeAddrRevKey AddrKey{Msg->code_address, Rev};
auto It = VM->AddrCache.find(AddrKey);
if (It != VM->AddrCache.end() &&
validateCodeMatch(Code, CodeSize, It->second)) {
Mod = It->second;
} else {
// Cold path: full module load
// If validation failed for an existing entry, evict the stale module
if (It != VM->AddrCache.end()) {
EVMModule *OldMod = It->second;
if (VM->CachedInst && VM->CachedInst->getModule() == OldMod) {
VM->Iso->deleteEVMInstance(VM->CachedInst);
VM->CachedInst = nullptr;
}
if (VM->L0Mod == OldMod)
VM->L0Mod = nullptr;
VM->RT->unloadEVMModule(OldMod);
VM->AddrCache.erase(It);
}
std::string ModName = "mod_" + std::to_string(VM->ModCounter++);
auto ModRet = VM->RT->loadEVMModule(ModName, Code, CodeSize, Rev);
if (!ModRet)
return nullptr;
Mod = *ModRet;
VM->AddrCache[AddrKey] = Mod;
}

// Update L0 cache
VM->LastCodePtr = Code;
VM->LastCodeSize = CodeSize;
VM->L0Mod = Mod;
return Mod;

Copilot AI Feb 28, 2026


findModuleCached() states that the L0 pointer-based cache is disabled, but it still updates LastCodePtr/LastCodeSize/L0Mod and maintains invalidation logic for L0Mod. Since these members are otherwise unused, this adds confusion and maintenance cost. Either remove the L0 members/updates entirely, or re-enable the L0 lookup with a safe validation scheme (e.g., pointer+size plus a hash/content check).

Comment on lines +104 to +105
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.

Copilot AI Feb 28, 2026


The comment says resetForNewCall() is for reuse in interpreter mode, but the new dt_evmc_vm path calls it for multipass/JIT executions as well (getOrCreateInstance()). Please update the comment to reflect that the reset is used for general cross-call instance reuse across modes, or restrict usage if it truly must be interpreter-only.

Suggested change
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.
/// Reset instance state for reuse across EVMC execute() calls, regardless
/// of execution mode (interpreter, multipass, or JIT). This avoids the cost
/// of destroying and recreating the instance for each call.

Comment on lines 69 to 73
class InterpreterExecContext {
private:
runtime::EVMInstance *Inst;
std::deque<EVMFrame> FrameStack;
std::vector<EVMFrame> FrameStack;
evmc_status_code Status = EVMC_SUCCESS;

Copilot AI Feb 28, 2026


Changing FrameStack from std::deque to std::vector can invalidate pointers/references to existing frames on growth (reallocation). This is unsafe here because allocTopFrame() stores pointers to Frame.Msg in EVMInstance::MessageStack and interpreter code keeps EVMFrame* (e.g., parent frame) across nested calls; a reallocation would turn those pointers into dangling pointers and can cause memory corruption. Use a container with stable addresses (e.g., keep std::deque), or reserve a hard upper bound up-front and enforce max depth so FrameStack never reallocates, or store frames/messages in separately allocated stable storage.
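The "reserve a hard upper bound up-front" mitigation suggested here can be sketched as follows. The names are illustrative, and `MaxDepth = 1024` assumes the EVM call-depth limit; with capacity fixed, `push_back` never reallocates, so pointers to existing frames stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: a vector-backed frame stack with pointer stability guaranteed
// by reserving the maximum depth up-front and refusing to grow past it.
struct FrameSketch { int Depth; };

struct StableFrameStack {
  static constexpr std::size_t MaxDepth = 1024; // assumed EVM depth limit
  std::vector<FrameSketch> Frames;
  StableFrameStack() { Frames.reserve(MaxDepth); }
  FrameSketch *push(int Depth) {
    if (Frames.size() >= MaxDepth)
      return nullptr; // enforce the bound instead of reallocating
    Frames.push_back({Depth});
    return &Frames.back();
  }
};
```

This keeps the vector's cache-friendly layout while restoring the address stability that std::deque provided.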

Comment on lines +82 to +91
/// Reset state for reuse across calls. Keeps allocated capacity to avoid
/// re-allocating the ~32KB EVMFrame on every call.
void resetForNewCall(runtime::EVMInstance *NewInst) {
Inst = NewInst;
FrameStack.clear(); // keeps vector capacity
Status = EVMC_SUCCESS;
ReturnData.clear(); // keeps vector capacity
IsJump = false;
ExeResult = evmc::Result{EVMC_SUCCESS, 0, 0};
}

Copilot AI Feb 28, 2026


resetForNewCall() intends to preserve ReturnData/FrameStack capacity, but many interpreter paths set return data via setReturnData(std::vector<uint8_t>()) / move-assigning a new vector, which typically discards the existing buffer and defeats cross-call capacity caching. To realize the intended optimization, consider switching those "clear return data" sites to a clear() on the existing buffer (or add a dedicated clearReturnData() API) instead of replacing the vector instance.
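The capacity-discarding pitfall described here is easy to demonstrate; the helper names below are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Contrast of the two "clear return data" styles discussed above.
inline std::size_t capacityAfterReplace(std::vector<uint8_t> &V) {
  V = std::vector<uint8_t>(); // replacing the vector discards its buffer
  return V.capacity();
}

inline std::size_t capacityAfterClear(std::vector<uint8_t> &V) {
  V.clear(); // size -> 0, but the allocation is kept for reuse
  return V.capacity();
}
```

Only the `clear()` variant lets the cross-call capacity caching in resetForNewCall() actually pay off.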

Comment on lines +66 to +84
/// Validate that the cached module's code matches the provided code.
/// Checks code_size + first 256 bytes + last 256 bytes.
bool validateCodeMatch(const uint8_t *Code, size_t CodeSize,
const EVMModule *Mod) {
if (CodeSize != Mod->CodeSize)
return false;
if (CodeSize == 0)
return true;
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;
if (std::memcmp(Code + TailOffset, ModCode + TailOffset, TailLen) != 0)
return false;
}
return true;

Copilot AI Feb 28, 2026


validateCodeMatch() only compares code size plus the first/last 256 bytes. Two different bytecode blobs can share the same prefix/suffix while differing in the middle, which would incorrectly treat modified code as a cache hit and execute the wrong module. For correctness, the cache validation needs a full-code identity check (e.g., keccak256/CRC over the full code, or a host-provided code hash) rather than a partial window comparison.
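One possible full-content identity check, as a sketch: a table-driven CRC-32 (IEEE, reflected, polynomial 0xEDB88320) over the whole bytecode. A production cache would more likely use keccak256 or the host-provided code hash, as the comment suggests; this only illustrates the shape of a full-code check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC-32 over the full code blob; the table is built lazily on first use.
inline uint32_t crc32(const uint8_t *Data, std::size_t Size) {
  static uint32_t Table[256];
  static bool Init = false;
  if (!Init) {
    for (uint32_t I = 0; I < 256; ++I) {
      uint32_t C = I;
      for (int K = 0; K < 8; ++K)
        C = (C & 1u) ? (0xEDB88320u ^ (C >> 1)) : (C >> 1);
      Table[I] = C;
    }
    Init = true;
  }
  uint32_t Crc = 0xFFFFFFFFu;
  for (std::size_t I = 0; I < Size; ++I)
    Crc = Table[(Crc ^ Data[I]) & 0xFFu] ^ (Crc >> 8);
  return Crc ^ 0xFFFFFFFFu;
}
```

Storing this checksum alongside the cached module turns validation into a size check plus one 32-bit compare, while catching mid-body edits that the 256-byte windows miss.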
