
perf(evm): optimize EVMC execute entry path with address-based module cache and instance reuse #366

Open
starwarfan wants to merge 1 commit into DTVMStack:main from starwarfan:opt-evmc-cache

Conversation

starwarfan (Contributor) commented Feb 26, 2026

Replace per-call overhead (~2.4ms) with address-based module cache and object reuse, shared by both interpreter and multipass modes:

  • Address-based module cache: code_address + revision key with first/last 256-byte content validation to avoid re-parsing bytecode on repeated calls to the same contract
  • Stale-entry eviction when code at a cached address changes
  • EVMInstance reuse via resetForNewCall() instead of alloc/free per call
  • InterpreterExecContext reuse with deque-to-vector conversion and cross-call capacity caching to avoid ~32KB frame re-allocation
  • Interpreter fast path bypasses Runtime::callEVMMain for direct dispatch
  • Multipass path uses same cache with callEVMMain for JIT execution

1. Does this PR affect any open issues? (Y/N) and add issue references (e.g. "fix #123", "re #123"):

  • N
  • Y

2. What is the scope of this PR (e.g. component or file name):

evm, runtime, dt_evmc_vm

3. Provide a description of the PR (e.g. more details, effects, motivations or doc link):

  • Affects user behaviors
  • Contains CI/CD configuration changes
  • Contains documentation changes
  • Contains experimental features
  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Other

This PR optimizes the EVMC execute() entry path to eliminate redundant per-call overhead. The main bottleneck was that every EVMC call re-parsed the bytecode module and allocated fresh execution objects.

Address-based module cache: Uses code_address + revision as key in an unordered_map, with first/last 256-byte content validation to guard against address reuse with different bytecode. On cache hit with matching content, the existing parsed EVMModule is reused directly, avoiding the full module load path. When the code at a cached address changes, the stale entry is evicted and the module is unloaded before loading the new one.
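The keying scheme described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the names `AddrRevKey`, `AddrRevKeyHash`, and `ModuleCache`, and the FNV-1a hash, are assumptions; the real cache maps to `EVMModule*` rather than `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical sketch of an (address, revision)-keyed module cache.
struct AddrRevKey {
  uint8_t Addr[20] = {}; // 20-byte EVM code address
  int Rev = 0;           // EVM revision the module was loaded for
  bool operator==(const AddrRevKey &O) const {
    return Rev == O.Rev && std::memcmp(Addr, O.Addr, sizeof(Addr)) == 0;
  }
};

struct AddrRevKeyHash {
  size_t operator()(const AddrRevKey &K) const {
    // FNV-1a over the address bytes, then mix in the revision.
    uint64_t H = 1469598103934665603ull;
    for (uint8_t B : K.Addr)
      H = (H ^ B) * 1099511628211ull;
    H = (H ^ static_cast<uint64_t>(K.Rev)) * 1099511628211ull;
    return static_cast<size_t>(H);
  }
};

// The cached value would be the parsed EVMModule*; int keeps the sketch short.
using ModuleCache = std::unordered_map<AddrRevKey, int, AddrRevKeyHash>;
```

A lookup on the hot path is then a single hash-map probe followed by the content validation the PR describes; a revision bump for the same address misses the cache and triggers a fresh load.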

Object reuse: EVMInstance (~33KB) is reused via resetForNewCall() instead of allocating/freeing per call. InterpreterExecContext uses a vector-based frame stack (replacing deque) with cross-call capacity caching to avoid repeated heap allocations of ~32KB frames.
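The clear-but-keep-capacity pattern behind this reuse can be sketched as below. Member names are simplified assumptions; the real `InterpreterExecContext` holds `EVMFrame` objects and additional state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified sketch of the reset-for-reuse pattern described above.
struct ExecContextSketch {
  std::vector<uint64_t> FrameStack; // stands in for std::vector<EVMFrame>
  std::vector<uint8_t> ReturnData;
  void resetForNewCall() {
    FrameStack.clear(); // size -> 0, allocated capacity preserved
    ReturnData.clear(); // ditto: the next call reuses the same buffers
  }
};
```

Because `vector::clear()` leaves `capacity()` unchanged, the ~32KB frame allocation from the previous call is reused instead of being freed and re-acquired.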

Fast path for interpreter: When the module is cached, the interpreter fast path bypasses Runtime::callEVMMain and dispatches directly via BaseInterpreter::interpret(), saving additional function-call and setup overhead.

Multipass path: Uses the same address-based cache and instance reuse, but dispatches through callEVMMain for JIT execution.

Benchmark impact (vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

5. Are there any breaking changes? (Y/N) and describe the breaking changes (e.g. more details, motivations or doc link):

  • N
  • Y

5. Are there test cases for these changes? (Y/N) Select and add more details, references or doc links:

  • Unit test
  • Integration test
  • Benchmark (add benchmark stats below)
  • Manual test (add detailed scripts or steps below)
  • Other

Benchmark results using evmone-bench (Release mode, vs evmone baseline):

  • Interpreter: fixed overhead halved from ~2.4ms to ~1.2ms
  • Multipass: fixed overhead reduced from ~2.4ms to microseconds (e.g. loop_v1: 2.8us, ADD/b0: 15.5us)

6. Release note

perf(evm): optimize EVMC execute entry path with address-based module cache (code_address + revision key with content validation) and instance reuse, halving interpreter fixed overhead and reducing multipass overhead to microseconds.


Copilot AI left a comment


Pull request overview

This PR optimizes the EVMC execute() path by caching loaded EVMModules by (code_address, revision) and reusing execution objects (notably EVMInstance and interpreter execution context) to reduce per-call overhead in both interpreter and multipass/JIT modes.

Changes:

  • Added an address+revision keyed module cache with code-content validation and stale-entry eviction.
  • Implemented EVMInstance::resetForNewCall() and reused a cached instance across EVMC calls.
  • Switched interpreter frame storage from deque to vector and added InterpreterExecContext::resetForNewCall(); added an interpreter fast path bypassing Runtime::callEVMMain.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Changed files:

  • src/vm/dt_evmc_vm.cpp — introduces the address-based module cache, eviction, cached instance/context reuse, and interpreter fast-path dispatch.
  • src/runtime/evm_instance.h — declares resetForNewCall() to enable cross-call instance reuse.
  • src/runtime/evm_instance.cpp — implements instance state reset for reuse (gas/memory/message stack/caches).
  • src/evm/interpreter.h — changes the frame stack container to vector and adds context reset for reuse across calls.


Comment on lines +74 to +80
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;

Copilot AI Feb 28, 2026

This file now uses std::min in validateCodeMatch(), but dt_evmc_vm.cpp does not include `<algorithm>`. Please include `<algorithm>` explicitly to avoid relying on transitive includes (which can break builds depending on the standard library implementation or compile flags).

Comment on lines +191 to +229
// L0 disabled: pointer comparison is unsafe when callers reuse addresses
// for different bytecode (e.g. test frameworks, repeated allocations).
// Fall through to L1 address-based lookup with content validation.

EVMModule *Mod = nullptr;

// L1: Address-based map lookup
CodeAddrRevKey AddrKey{Msg->code_address, Rev};
auto It = VM->AddrCache.find(AddrKey);
if (It != VM->AddrCache.end() &&
validateCodeMatch(Code, CodeSize, It->second)) {
Mod = It->second;
} else {
// Cold path: full module load
// If validation failed for an existing entry, evict the stale module
if (It != VM->AddrCache.end()) {
EVMModule *OldMod = It->second;
if (VM->CachedInst && VM->CachedInst->getModule() == OldMod) {
VM->Iso->deleteEVMInstance(VM->CachedInst);
VM->CachedInst = nullptr;
}
if (VM->L0Mod == OldMod)
VM->L0Mod = nullptr;
VM->RT->unloadEVMModule(OldMod);
VM->AddrCache.erase(It);
}
std::string ModName = "mod_" + std::to_string(VM->ModCounter++);
auto ModRet = VM->RT->loadEVMModule(ModName, Code, CodeSize, Rev);
if (!ModRet)
return nullptr;
Mod = *ModRet;
VM->AddrCache[AddrKey] = Mod;
}

// Update L0 cache
VM->LastCodePtr = Code;
VM->LastCodeSize = CodeSize;
VM->L0Mod = Mod;
return Mod;

Copilot AI Feb 28, 2026


findModuleCached() states that the L0 pointer-based cache is disabled, but it still updates LastCodePtr/LastCodeSize/L0Mod and maintains invalidation logic for L0Mod. Since these members are otherwise unused, this adds confusion and maintenance cost. Either remove the L0 members/updates entirely, or re-enable the L0 lookup with a safe validation scheme (e.g., pointer+size plus a hash/content check).

Comment on lines +104 to +105
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.

Copilot AI Feb 28, 2026


The comment says resetForNewCall() is for reuse in interpreter mode, but the new dt_evmc_vm path calls it for multipass/JIT executions as well (getOrCreateInstance()). Please update the comment to reflect that the reset is used for general cross-call instance reuse across modes, or restrict usage if it truly must be interpreter-only.

Suggested change
/// Reset instance state for reuse in interpreter mode.
/// Avoids the cost of destroy + recreate on every EVMC execute() call.
/// Reset instance state for reuse across EVMC execute() calls, regardless
/// of execution mode (interpreter, multipass, or JIT). This avoids the cost
/// of destroying and recreating the instance for each call.

Comment on lines 69 to 73
class InterpreterExecContext {
private:
runtime::EVMInstance *Inst;
std::deque<EVMFrame> FrameStack;
std::vector<EVMFrame> FrameStack;
evmc_status_code Status = EVMC_SUCCESS;

Copilot AI Feb 28, 2026


Changing FrameStack from std::deque to std::vector can invalidate pointers/references to existing frames on growth (reallocation). This is unsafe here because allocTopFrame() stores pointers to Frame.Msg in EVMInstance::MessageStack and interpreter code keeps EVMFrame* (e.g., parent frame) across nested calls; a reallocation would turn those pointers into dangling pointers and can cause memory corruption. Use a container with stable addresses (e.g., keep std::deque), or reserve a hard upper bound up-front and enforce max depth so FrameStack never reallocates, or store frames/messages in separately allocated stable storage.
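The "reserve a hard upper bound up-front" mitigation suggested here can be sketched as follows. The names are illustrative, and `MaxDepth = 1024` assumes the EVM call-depth limit; with capacity fixed, `push_back` never reallocates, so pointers to existing frames stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: a vector-backed frame stack with pointer stability guaranteed
// by reserving the maximum depth up-front and refusing to grow past it.
struct FrameSketch { int Depth; };

struct StableFrameStack {
  static constexpr std::size_t MaxDepth = 1024; // assumed EVM depth limit
  std::vector<FrameSketch> Frames;
  StableFrameStack() { Frames.reserve(MaxDepth); }
  FrameSketch *push(int Depth) {
    if (Frames.size() >= MaxDepth)
      return nullptr; // enforce the bound instead of reallocating
    Frames.push_back({Depth});
    return &Frames.back();
  }
};
```

This keeps the vector's cache-friendly layout while restoring the address stability that std::deque provided.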

Comment on lines +82 to +91
/// Reset state for reuse across calls. Keeps allocated capacity to avoid
/// re-allocating the ~32KB EVMFrame on every call.
void resetForNewCall(runtime::EVMInstance *NewInst) {
Inst = NewInst;
FrameStack.clear(); // keeps vector capacity
Status = EVMC_SUCCESS;
ReturnData.clear(); // keeps vector capacity
IsJump = false;
ExeResult = evmc::Result{EVMC_SUCCESS, 0, 0};
}

Copilot AI Feb 28, 2026


resetForNewCall() intends to preserve ReturnData/FrameStack capacity, but many interpreter paths set return data via setReturnData(std::vector<uint8_t>()) / move-assigning a new vector, which typically discards the existing buffer and defeats cross-call capacity caching. To realize the intended optimization, consider switching those "clear return data" sites to a clear() on the existing buffer (or add a dedicated clearReturnData() API) instead of replacing the vector instance.
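The capacity-discarding pitfall described here is easy to demonstrate; the helper names below are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Contrast of the two "clear return data" styles discussed above.
inline std::size_t capacityAfterReplace(std::vector<uint8_t> &V) {
  V = std::vector<uint8_t>(); // replacing the vector discards its buffer
  return V.capacity();
}

inline std::size_t capacityAfterClear(std::vector<uint8_t> &V) {
  V.clear(); // size -> 0, but the allocation is kept for reuse
  return V.capacity();
}
```

Only the `clear()` variant lets the cross-call capacity caching in resetForNewCall() actually pay off.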

Comment on lines +66 to +84
/// Validate that the cached module's code matches the provided code.
/// Checks code_size + first 256 bytes + last 256 bytes.
bool validateCodeMatch(const uint8_t *Code, size_t CodeSize,
const EVMModule *Mod) {
if (CodeSize != Mod->CodeSize)
return false;
if (CodeSize == 0)
return true;
auto *ModCode = reinterpret_cast<const uint8_t *>(Mod->Code);
size_t HeadLen = std::min(CodeSize, static_cast<size_t>(256));
if (std::memcmp(Code, ModCode, HeadLen) != 0)
return false;
if (CodeSize > 256) {
size_t TailLen = std::min(CodeSize, static_cast<size_t>(256));
size_t TailOffset = CodeSize - TailLen;
if (std::memcmp(Code + TailOffset, ModCode + TailOffset, TailLen) != 0)
return false;
}
return true;

Copilot AI Feb 28, 2026


validateCodeMatch() only compares code size plus the first/last 256 bytes. Two different bytecode blobs can share the same prefix/suffix while differing in the middle, which would incorrectly treat modified code as a cache hit and execute the wrong module. For correctness, the cache validation needs a full-code identity check (e.g., keccak256/CRC over the full code, or a host-provided code hash) rather than a partial window comparison.
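One possible full-content identity check, as a sketch: a table-driven CRC-32 (IEEE, reflected, polynomial 0xEDB88320) over the whole bytecode. A production cache would more likely use keccak256 or the host-provided code hash, as the comment suggests; this only illustrates the shape of a full-code check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC-32 over the full code blob; the table is built lazily on first use.
inline uint32_t crc32(const uint8_t *Data, std::size_t Size) {
  static uint32_t Table[256];
  static bool Init = false;
  if (!Init) {
    for (uint32_t I = 0; I < 256; ++I) {
      uint32_t C = I;
      for (int K = 0; K < 8; ++K)
        C = (C & 1u) ? (0xEDB88320u ^ (C >> 1)) : (C >> 1);
      Table[I] = C;
    }
    Init = true;
  }
  uint32_t Crc = 0xFFFFFFFFu;
  for (std::size_t I = 0; I < Size; ++I)
    Crc = Table[(Crc ^ Data[I]) & 0xFFu] ^ (Crc >> 8);
  return Crc ^ 0xFFFFFFFFu;
}
```

Storing this checksum alongside the cached module turns validation into a size check plus one 32-bit compare, while catching mid-body edits that the 256-byte windows miss.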
