
Modular refactor #252

Open

hazemawadalla wants to merge 2 commits into mlcommons:TF_KVCache from hazemawadalla:modular-refactor

Conversation

@hazemawadalla
Contributor

Pushed a fix for the benchmark stall that occurs when running NVMe-only configurations (cpu=0, gpu=0) with preconditioning enabled. The root cause was a thread race in eviction: concurrent threads double-decremented the memory tracker until it thought the disk was empty, which disabled further eviction, filled the filesystem, and stalled the worker threads. This was compounded by capacity guards that assumed a next tier always exists to cascade into, and by the preconditioning loop having no failure handling. The fix adds an existence check before decrementing, detects when NVMe is the terminal tier and relaxes the guards accordingly, and bails out of preconditioning after 50 consecutive failures.

I also fixed a performance issue in the eviction loop: previously, every time we needed to evict one entry, we re-scanned and re-sorted the entire cache to find the oldest item, so evicting 100 entries meant 100 full sorts of 60k entries. Now we sort once, walk through the list with an index, and skip any entries another thread already removed. Same eviction order, roughly 100x less CPU work at scale. Also fixed a TOCTOU race in NVMe file deletion and switched to os.statvfs for more accurate capacity detection.
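For illustration, the single-sort eviction walk could look roughly like this. This is a sketch, not the actual diff: `cache_entries`, `metadata_lock`, and `nvme_memory_used` are names used by the change, while `last_access`, `size_bytes`, and `_evict_entry` are hypothetical.

```python
def evict_bytes(self, bytes_to_free):
    """Free at least bytes_to_free by walking one sorted LRU snapshot."""
    with self.metadata_lock:
        # Sort once (oldest first) instead of re-sorting the whole cache
        # every time a single entry needs to be evicted.
        candidates = sorted(self.cache_entries,
                            key=lambda k: self.cache_entries[k].last_access)
    freed = 0
    for key in candidates:
        if freed >= bytes_to_free:
            break
        with self.metadata_lock:
            entry = self.cache_entries.get(key)
            if entry is None:
                continue  # another thread already evicted this key; skip it
            size = entry.size_bytes
        if self._evict_entry(key):  # hypothetical helper; False if the entry vanished meanwhile
            freed += size
    return freed
```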

All 206 tests pass, including new test classes covering 3-tier cascade (GPU→CPU→NVMe→delete), NVMe-only eviction with concurrent threads, and an educational 7-part test that traces the full request flow from user simulation through KV cache sizing, the 4-level latency hierarchy, .npy file I/O, and waterfall eviction.

```
pytest tests/test_kv_cache.py -v -k "TestThreeTierEvictionCascade"
pytest tests/test_kv_cache.py -v -k "TestNVMeOnlyEviction"
pytest tests/test_kv_cache.py -v -s --log-cli-level=DEBUG -k "TestVisualizeUserRequestFlow"
```

Benchmark stalls when all I/O targets NVMe (cpu=0, gpu=0) with
preconditioning enabled. Three root causes fixed, plus an O(n²)
eviction optimization:

1. Thread race in eviction: concurrent threads evict the same LRU
   entry, double-decrementing nvme_memory_used until it hits ~0.
   Fix: check entry existence under metadata_lock before decrementing;
   use live size from cache_entries; clean up entry_locks for evicted
   keys (sketched after this list).

2. Eviction guards reject writes on the terminal tier: the 95% size
   cap, 80% target, and low-data bailout all assume a next tier exists.
   Fix: detect the terminal tier (is_last_tier) and relax all three
   guards (also sketched below).

3. Preconditioning spins forever — failed allocations never increment
   written_bytes. Fix: consecutive-failure bailout (50) with backoff
   (sketched below).

4. O(n²) LRU scan — each eviction re-scanned and re-sorted the full
   entry list. Fix: single sorted snapshot with index walk; refresh
   only if exhausted (2 scans max instead of thousands).
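A minimal sketch of the item 1 fix (the `_evict_entry` helper referenced in the earlier sketch). `metadata_lock`, `cache_entries`, `entry_locks`, and `nvme_memory_used` are named in this PR; `size_bytes` and `_delete_nvme_file` are hypothetical:

```python
def _evict_entry(self, key):
    """Evict a single entry; safe to call concurrently for the same key."""
    with self.metadata_lock:
        entry = self.cache_entries.pop(key, None)
        if entry is None:
            # Another thread already evicted this key: do not decrement again,
            # which is what drove nvme_memory_used toward zero before the fix.
            return False
        self.nvme_memory_used -= entry.size_bytes  # live size from cache_entries
        self.entry_locks.pop(key, None)            # drop the per-entry lock for evicted keys
    self._delete_nvme_file(key)  # hypothetical helper that unlinks the backing .npy file
    return True
```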
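A sketch of the item 2 relaxation, assuming the guards live in a write-admission check. `is_last_tier` is the flag named above; the other attribute and method names are assumptions:

```python
def _write_allowed(self, incoming_bytes):
    """Capacity guards only make sense when a next tier exists to cascade into."""
    if self.is_last_tier:
        # Terminal tier (NVMe with nothing below it): accept the write and rely on
        # eviction-with-delete instead of rejecting writes outright.
        return True
    projected = self.nvme_memory_used + incoming_bytes
    if projected > 0.95 * self.nvme_capacity_bytes:   # 95% hard cap
        return False
    if projected > 0.80 * self.nvme_capacity_bytes:   # 80% target: cascade to the next tier first
        self._cascade_evict()
    return True
```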
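And a sketch of the item 3 bailout; the 50-failure limit comes from the fix, while the loop shape and backoff constant are assumptions:

```python
import time

MAX_CONSECUTIVE_FAILURES = 50

def precondition(cache, target_bytes, make_entry):
    """Write entries until target_bytes is reached, bailing out if writes keep failing."""
    written_bytes = 0
    consecutive_failures = 0
    while written_bytes < target_bytes:
        size = cache.put(make_entry())  # hypothetical allocation call; returns 0 on failure
        if size:
            written_bytes += size
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                break  # previously this loop spun forever on repeated failures
            time.sleep(0.01 * consecutive_failures)  # simple linear backoff
    return written_bytes
```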

Supporting fixes:
- os.statvfs for NVMe capacity (f_bavail excludes reserved blocks)
- path.unlink(missing_ok=True) for NVMe delete TOCTOU race
- Fallback "all tiers full" path now tracks nvme_memory_used

Tests: new test classes
- TestThreeTierEvictionCascade (3 tests: GPU→CPU→NVMe→delete cascade via fake GPU backend)
- TestNVMeOnlyEviction (4 tests: allocation, file deletion, no negative drift, concurrent threads)
- TestVisualizeUserRequestFlow (7 tests: educational trace of the full request pipeline)
Model config count updated 5→9 with deepseek-v3, qwen3-32b, gpt-oss-120b, gpt-oss-20b.

Docs: Move MLPerf proposal and sources.md into docs/ subdirectory.

Files changed:
  kv_cache/cache.py      — eviction logic, capacity detection, fallback tracking
  kv_cache/benchmark.py  — preconditioning stall protection
  kv_cache/backends.py   — NVMe delete race fix
  tests/test_kv_cache.py — model configs, 3 new test classes
  docs/                  — moved from project root
@hazemawadalla requested a review from a team February 20, 2026 23:12
@hazemawadalla requested a review from a team as a code owner February 20, 2026 23:12
@github-actions

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form](https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@HazemAwadallah
You can retrigger this bot by commenting recheck in this Pull Request
