cpu-o3: using reverse ordered tick & refactor the stalls logic by tastynoob · Pull Request #756 · OpenXiangShan/GEM5

tastynoob · 2026-02-09T11:32:25Z

Change-Id: I9b599a4e0d704215ad1a3bf543dbd075384fe1f4

Summary by CodeRabbit

Refactor
- Major pipeline rewrite: new time-buffered stage flow, consolidated per-thread local buffers, unified stall signaling, and reordered tick sequence for steadier instruction movement.
Bug Fixes
- Improved commit/ROB readiness and stall behavior to reduce starvation and incorrect ordering.
- Safer load/memory handling and guarded diagnostics to avoid unsafe prints.
New Features
- Perf recording now captures per-instruction Result values for richer output.

coderabbitai · 2026-02-09T11:32:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Consolidates per-stage stall signaling into a new public StallSignals struct, replaces many per-thread FIFO/skid queues with per-thread boost::circular_buffer fixed buffers across fetch/decode/rename/iew/commit, rewires stage wiring and tick order, renames several APIs, and adjusts commit/ROB/LSQ logging and commit buffering.

Changes

Cohort / File(s)	Summary
Communication & TimeStruct `src/cpu/o3/comm.hh`	Introduce `StallSignals` (per-thread `blockFetch/blockDecode/blockRename/blockIEW`); remove scattered stall booleans; adjust `IewComm`/`CommitComm` fields (add `robheadSeqNum`, remove some counters) and reorganize TimeStruct wiring.
Commit Stage `src/cpu/o3/commit.hh`, `src/cpu/o3/commit.cc`	Add per-thread `fixedbuffer` (boost::circular_buffer), `StallSignals* stallSig` + setter; rename `getInsts()`→`moveInstsToBuffer()`; implement per-thread move-to-buffer logic, robheadSeqNum broadcasting, and adjust commit debounce/ROB insertion flow.
CPU top & TimeBuffer wiring `src/cpu/o3/cpu.hh`, `src/cpu/o3/cpu.cc`, `docs/.../gem5_new_pipelilne.md`	Replace per-stage queues with `*Timebuffer` variants; add stall-signal wiring and `stallSignals` member; change tick order (commit→iew→rename→decode→fetch); adapt drain/squash iterator APIs and advance new timebuffers; add explanatory doc.
Fetch `src/cpu/o3/fetch.hh`, `src/cpu/o3/fetch.cc`	Wire in `StallSignals* stallSig` via setter; remove many SMT-selection helpers and per-thread stall state; simplify cache/stall handling and send-to-decode logic to consult stall signals (current single-thread placeholder).
Decode `src/cpu/o3/decode.hh`, `src/cpu/o3/decode.cc`	Replace per-thread queues/skids with `fixedbuffer[]` and global `stallBuffer` (`boost::circular_buffer`); rename APIs (`sortInsts`→`moveInstsToBuffer`, `squash`→`selfSquash`, `updateStatus`→`updateActivate`); add `setStallSignals`.
Rename `src/cpu/o3/rename.hh`, `src/cpu/o3/rename.cc`	Switch to per-thread `fixedbuffer[]`, remove many per-thread counters/state, add `setStallSignals`, rename `sortInsts`→`moveInstsToBuffer`, change `canRename()` to infer counts from buffers.
IEW `src/cpu/o3/iew.hh`, `src/cpu/o3/iew.cc`	Replace skid/inst deques with `fixedbuffer[]`; add `StallSignals* stallSig` and `serializeOnNextInst[]`; rename/rewire several APIs (`sortInsts`→`moveInstsToBuffer`, dispatch APIs parameter changes), consolidate dispatch/LSQ logic around fixedbuffer.
ROB / Commit integration `src/cpu/o3/rob.hh`, `src/cpu/o3/rob.cc`	Adjust head-group commit readiness, suppress same-cycle squash progression, and rename `findInst` parameter to `seqnum`.
LSQ, Regfile, Misc fixes `src/cpu/o3/lsq_unit.cc`, `src/cpu/o3/lsq.cc`, `src/cpu/o3/regfile.hh`, `src/cpu/o3/inst_queue.cc`, `src/cpu/o3/issue_queue.cc`, `src/cpu/o3/rename_map.cc`, `src/cpu/o3/dyn_inst.hh`, `src/cpu/reg_class.hh`	Guard and buffer memData prints, change debug categories to `Scoreboard`, remove `DynInst::isTempSerializeAfter`, add defensive reg-class guard, tweak panic/assert messages.
Perf / Meta changes `src/cpu/o3/perfCCT.hh`, `src/cpu/o3/perfCCT.cc`, `src/cpu/o3/BaseO3CPU.py`, `configs/common/xiangshan.py`	Add `Result` detail and `InstMeta::value` tracking/serialization; increase `PerfCCT::MaxMetas`; add `Result` to PerfRecord; adjust SQL typing for `result*` columns.
Header/API updates `src/cpu/o3/{commit,fetch,decode,iew,rename}.hh`, `src/cpu/o3/cpu.hh`	Add `setStallSignals(StallSignals*)` setters across stages; change several method signatures (`sortInsts`→`moveInstsToBuffer`, `getInsts`→`moveInstsToBuffer`, `squashInstIt` return-type change); swap containers to circular buffers/deque where applicable.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CPU as CPU (tick)
    participant Commit as Commit
    participant IEW as IEW
    participant Rename as Rename
    participant Decode as Decode
    participant Fetch as Fetch
    participant Stall as StallSignals

    rect rgba(200,200,255,0.5)
    CPU->>Commit: tick -> moveInstsToBuffer()
    Commit->>Commit: insert into ROB, update robheadSeqNum
    Commit->>Stall: set blockRename/blockIEW as needed
    end

    rect rgba(200,255,200,0.5)
    CPU->>IEW: tick -> moveInstsToBuffer()/dispatchInsts()
    IEW->>Stall: consult blockIEW / blockRename
    IEW->>Rename: hand off dispatchable insts
    end

    rect rgba(255,220,180,0.5)
    CPU->>Rename: tick -> moveInstsToBuffer() -> rename
    Rename->>Stall: consult blockDecode / blockRename
    Rename->>Decode: transfer renamed insts
    end

    rect rgba(255,200,200,0.5)
    CPU->>Decode: tick -> moveInstsToBuffer() -> decode
    Decode->>Stall: consult blockFetch / blockDecode
    Decode->>Fetch: request more fetch
    end

    rect rgba(220,220,220,0.5)
    CPU->>Fetch: tick -> sendInstructionsToDecode()
    Fetch->>Stall: respect blockFetch
    Fetch->>Decode: enqueue into decode fixedbuffer
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

cpu-pred-btb: remove fetch stream use fetch target #744 — Modifies comm.hh IewComm/CommitComm and stall/ROB signaling; directly overlaps with comm/time-struct and commit changes here.
cpu-o3: move resolve queue merging from iew to fetch stage #635 — Changes resolved-CFI propagation and IewComm resolvedCFIs; closely related to the comm.hh restructuring in this PR.
cpu-o3: rename execWB to toCommit wire for instruction squashing #678 — Updates IEW→Commit wiring and toCommit/IEW latency; overlaps with IEW/commit rewire and delay/getWire adjustments in this PR.

Poem

🐰
I hopped through buffers, round and neat,
Bundled stall flags, synced the beat.
Rings now spin where skids once lay,
Stalls in one bundle show the way.
Commit crunches — carrot, hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.79% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'cpu-o3: using reverse ordered tick & refactor the stalls logic' clearly and specifically describes the main changes in the pull request: reverse-ordered tick implementation and stalls logic refactoring across multiple O3 pipeline stages.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor-stalls

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)

src/cpu/o3/fetch.hh (1)

634-641: ⚠️ Potential issue | 🟡 Minor

Remove dead Stalls struct from fetch header; clarify incomplete drain handling.

The Stalls struct at lines 634-638 is unused since stalls[MaxThreads] was replaced by StallSignals* stallSig. Unlike the Rename stage which still actively uses its Stalls struct, fetch only references stallSig with its blockFetch member. The struct definition can be removed.

Additionally, StallSignals provides no drain equivalent, and drainStall() in fetch.cc is an empty stub. This suggests drain handling was intentionally removed or remains incomplete. Clarify whether drain functionality is intentional or needs implementation. The stallSig pointer initialization via setStallSignals() called during CPU construction is safe and does not require nullptr initialization.
src/cpu/o3/commit.cc (1)
1789-1857: ⚠️ Potential issue | 🟠 Major

Protect fixedbuffer intake when prior batch hasn’t drained.

getInsts() assumes the per-thread fixedbuffer is empty whenever rename delivers a new batch. If that invariant ever breaks, the new batch is mixed with older entries. A runtime guard makes this robust even in non-assert builds.
🛠️ Suggested guard
-    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber : -1;
-    if (tid != -1) assert(fixedbuffer[tid].empty());
+    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber
+                                         : InvalidThreadID;
+    if (tid != InvalidThreadID && !fixedbuffer[tid].empty()) {
+        stallSig->blockIEW[tid] = true;
+        DPRINTF(Commit, "[tid:%i] Fixedbuffer not drained; deferring rename intake\n", tid);
+        return;
+    }
src/cpu/o3/rename.cc (2)
652-675: ⚠️ Potential issue | 🟠 Major

Rename stage never deactivates.

any_unblocking is hardcoded to true, so the stage stays Active even when idle, preventing CPU idling. Compute it from stall signals/buffer state.
🛠️ Suggested fix
-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+            any_unblocking = true;
+            break;
+        }
+    }
477-606: ⚠️ Potential issue | 🟡 Minor

Fix DPRINTF format/argument mismatch causing undefined behavior.

The format string at lines 501-502 has 2 specifiers (%llu, %s) but 3 arguments are passed (tid, inst->seqNum, inst->pcState()). The tid argument is missing from the format string, which causes the printf-style formatter to misalign arguments and trigger undefined behavior.
Suggested fix
-            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
-                    tid, inst->seqNum, inst->pcState());
+            DPRINTF(Rename, "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+                    tid, inst->seqNum, inst->pcState());
src/cpu/o3/decode.cc (1)
336-371: ⚠️ Potential issue | 🟡 Minor

updateActivate is effectively never called.

status_change is never set in tick(), so updateActivate() won’t reflect block/unblock transitions. Call it unconditionally or compute status_change from stall-signal changes.
🛠️ Minimal fix
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();
Also applies to: 464-469
src/cpu/o3/iew.hh (1)
556-559: ⚠️ Potential issue | 🟡 Minor

Typo in comment: "fetrenamech" should be "rename".
Proposed fix
-        /** Distribution of number of fetrenamech stall reasons each tick. */
+        /** Distribution of number of rename stall reasons each tick. */

🤖 Fix all issues with AI agents

In `@src/cpu/o3/comm.hh`:
- Around line 328-335: The StallSignals struct's bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) are left uninitialized causing undefined
behavior; update StallSignals to value-initialize these arrays (e.g., add
default member initializers or a constructor that sets all elements to false) so
every entry for MaxThreads is explicitly false on construction, ensuring any
reader sees deterministic values.

In `@src/cpu/o3/commit.hh`:
- Around line 179-182: Initialize the raw pointer member stallSig to nullptr in
the commit class declaration and add a direct include for
<boost/circular_buffer.hpp> to this header; specifically, in commit.hh set
StallSignals* stallSig = nullptr; (so uses in commit.cc at symbols referencing
stallSig are safe) and add `#include` <boost/circular_buffer.hpp> near other
includes so fixedbuffer (boost::circular_buffer<DynInstPtr>
fixedbuffer[MaxThreads]) no longer relies on a transitive include.

In `@src/cpu/o3/cpu.hh`:
- Around line 529-546: Remove the redundant member declaration stallSig from the
class (it duplicates the existing StallSignals stallSignals and is never used);
delete the line declaring StallSignals stallSig; and ensure no other code relies
on that member (the stages use setStallSignals() and their own stallSig
members), then rebuild to verify there are no references to the removed symbol.

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing but must purge
any per-thread buffered state for the given ThreadID to avoid stale entries when
TIDs are reused; implement it to remove/clear entries belonging to tid from
Decode's per-thread containers (e.g., fixedBuffer and stallBuffer) and reset any
other per-thread bookkeeping used by Decode so removeThread() no longer leaves
stale instructions. Locate Decode::clearStates and ensure it iterates the
fixedBuffer and stallBuffer structures, erasing elements whose ThreadID matches
the tid argument (and clearing any related per-thread maps/counters), leaving
other threads' data intact.

In `@src/cpu/o3/decode.hh`:
- Around line 186-187: The member pointer stallSig is declared uninitialized
(StallSignals* stallSig;) and should be set to a null pointer to avoid
indeterminate-value access before setStallSignals is called; initialize stallSig
to nullptr at declaration or assign nullptr in the class constructor (the class
that contains stallSig and the setStallSignals method) so all code paths see a
well-defined value until setStallSignals populates it.

In `@src/cpu/o3/fetch.cc`:
- Around line 1294-1325: The code hardcodes ThreadID tid = 0 which disables SMT;
replace this with the proper thread selection logic (use selectFetchThread() or
the same selection used by sendInstructionsToDecode()) so tid is set to the
chosen thread before checking stallSig->blockFetch[tid] and pulling from
fetchQueue[tid]; if SMT selection is not yet implemented, add a fail-fast
(ASSERT/ fatal) when numThreads > 1 to avoid silent starvation and document that
selectFetchThread()/sendInstructionsToDecode() must be used to pick tid.

In `@src/cpu/o3/iew.cc`:
- Around line 820-837: The loop currently sets stallSig->blockRename for both
threads when multiple active threads are found (using tid,
stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls all SMT
threads; change it to allow exactly one active thread to proceed (e.g., pick a
single thread to keep blockRename=false and set blockRename=true for all other
active threads) and implement a simple selection policy (round-robin using a
persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
- Line 429: The IEW→Commit timing is broken because toCommit is reading
iewQueue->getWire(0) while Commit expects the data at offset -iewToCommitDelay;
restore the intended zero-latency alignment by changing the producer read to use
the negative delay: set toCommit = iewQueue->getWire(-iewToCommitDelay)
(symbols: toCommit, iewQueue, getWire, iewToCommitDelay, fromIEW, Commit) so
both IEW and Commit use the same buffer offset convention.
- Around line 798-811: The issue is that canInsertLDSTQue(ThreadID) performs
side-effectful calls getAndResetLastLQPopEntries/getAndResetLastSQPopEntries
which are only executed when stallSig->blockIEW[i] is true (due to short-circuit
&&), so counters aren't reset otherwise; fix by invoking the reset logic
unconditionally and then using its result in the block condition: call a new or
existing getter that performs
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries (or call
canInsertLDSTQue(i) and store its bool result) before computing bool block =
stallSig->blockIEW[i] && !... so that the reset side-effects always run, then
use the stored boolean in the original condition (or refactor canInsertLDSTQue
into pure check + separate reset function and call reset unconditionally in
dispatchInsts()).
- Around line 724-748: When detecting a squash in the loop
(fromCommit->commitInfo[i].squash) clear fixedbuffer[i] before calling squash(i)
and returning so any stale instructions are discarded; specifically, move or add
fixedbuffer[i].clear() immediately when fromCommit->commitInfo[i].squash is true
(before squash(i) and the return) because IEW::squash() does not clear the
buffer unlike decode/rename; keep the existing clear for the robSquashing path
unchanged.
- Around line 118-119: The postfix increment and decrement operators for the
wire class are returning references to temporaries; change the signatures of
wire::operator++(int) and wire::operator--(int) to return by value (wire)
instead of wire& and update their definitions accordingly so they return the
temporary (wire(this, i)) by value; ensure both the declaration in the class and
the out-of-line definitions (operator++(int) and operator--(int)) are adjusted
to match the new return type.
- Around line 755-759: The assert fixedbuffer[tid].empty() can fire legitimately
when dispatch stalls; change it to tolerate non-empty buffers by replacing the
unconditional assert with a conditional that either clears retained instructions
or asserts only if blockRename[tid] is false. Specifically, in the block
handling insts_from_rename/fromRename, replace "if (tid != -1)
assert(fixedbuffer[tid].empty());" with logic that: if tid != -1 and
fixedbuffer[tid] is not empty, then either call a clearing/rollback helper to
remove the newly moved instructions (so moveInstsToBuffer() leaves no leftovers)
or assert(!blockRename[tid]) (i.e., only assert emptiness when blockRename[tid]
is not set); reference functions/fields: insts_from_rename, fromRename->insts,
fixedbuffer, IEW::tick(), dispatchInsts(), blockRename, moveInstsToBuffer().

In `@src/cpu/o3/iew.hh`:
- Line 157: Initialize the raw pointer StallSignals* stallSig to nullptr (e.g.,
in IEW's member initializer or declaration) and update uses to check/assert
before dereferencing: add a null-check or assertion at the start of methods that
use stallSig (notably squash and squashDueToBranch) and in any code paths that
might run before setStallSignals() is called; ensure setStallSignals() still
assigns the pointer when available.
- Line 51: Remove the unused boost include: delete the line with `#include`
<boost/circular_buffer.hpp> from iew.hh because this header is not referenced in
this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.

In `@src/cpu/o3/regfile.hh`:
- Around line 423-427: The debug print in the VecPredRegClass case uses the
wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).

In `@src/cpu/o3/rename.cc`:
- Around line 389-410: The bug is that releaseSeq is computed using
historyBuffer->empty() which only checks thread 0; in Rename::releasePhysRegs
you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.

🧹 Nitpick comments (5)

src/cpu/o3/commit.cc (1)
168-187: Prefer fatal_if for the renameToROBDelay invariant.

This constraint is fundamental to the new buffering path; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change
-    assert(renameToROBDelay == 1);
+    fatal_if(renameToROBDelay != 1,
+             "renameToROBDelay must be 1 for O3 commit (got %d).",
+             renameToROBDelay);
src/cpu/o3/rename.cc (1)
76-84: Prefer fatal_if for the decodeToRenameDelay invariant.

This is a hard requirement for the refactor; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change
-    assert(decodeToRenameDelay == 1);
+    fatal_if(decodeToRenameDelay != 1,
+             "decodeToRenameDelay must be 1 for O3 rename (got %d).",
+             decodeToRenameDelay);
src/cpu/o3/iew.hh (1)

391-391: Consider renaming fixedbuffer to something more descriptive.

The name fixedbuffer is vague — it doesn't convey that this is the per-thread dispatch buffer between rename and IEW. A name like renameBuffer or dispatchBuffer would improve readability and align with the role described in the AI summary. Also, std::deque is not "fixed" in any meaningful sense.
src/cpu/o3/iew.cc (2)
896-901: Commented-out serialize stall check should be removed or restored.

This entire block is commented out with no explanation. If the serialize handling is intentionally removed as part of this refactor, delete the dead code. If it's temporarily disabled for debugging, add a TODO or FIXME comment explaining why.

1650-1653: Tick ordering: moveInstsToBuffer() before checkSquash() may process instructions that should be squashed.

In the reverse-ordered tick, moveInstsToBuffer() runs first (line 1651), populating fixedbuffer with instructions from rename. Then checkSquash() runs (line 1652), potentially discovering a squash from commit. As noted in a separate comment, checkSquash doesn't clear fixedbuffer on the squash path. Even if it did, the ordering means one wasted cycle of work moving instructions that will immediately be discarded. Consider swapping the order so checkSquash() runs first.
Proposed reordering
     // dispatch
-    moveInstsToBuffer();
     checkSquash();
+    moveInstsToBuffer();
     dispatchInsts();

src/cpu/o3/comm.hh

src/cpu/o3/commit.hh

src/cpu/o3/cpu.hh

src/cpu/o3/decode.cc

src/cpu/o3/decode.hh

src/cpu/o3/iew.cc

src/cpu/o3/iew.hh

src/cpu/o3/regfile.hh

src/cpu/o3/rename.cc

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cpu/o3/rename.cc (1)

592-613: ⚠️ Potential issue | 🟠 Major

Rename stage is forced active every cycle.

any_unblocking is initialized to true, so the deactivation branch is unreachable.

Suggested fix

-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (auto tid : *activeThreads) {
+        if (!stallSig->blockRename[tid]) {
+            any_unblocking = true;
+            break;
+        }
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 592 - 613, any_unblocking is incorrectly
initialized to true causing the rename stage to be forced Active every cycle;
change the initialization of any_unblocking so it reflects the real unblocking
condition (e.g., start as false and set to true only when you detect unblocking
events such as free physical regs, completed renames, or other conditions used
elsewhere in this file), then keep the existing branch that flips _status
between Inactive and Active and calls cpu->activateStage(CPU::RenameIdx) /
cpu->deactivateStage(CPU::RenameIdx); locate the variable any_unblocking and the
surrounding status logic (references: any_unblocking, _status, Inactive, Active,
cpu->activateStage, cpu->deactivateStage, CPU::RenameIdx) and compute its value
from the proper signals instead of hardcoding true.

src/cpu/o3/cpu.cc (1)

1497-1515: ⚠️ Potential issue | 🔴 Critical

Fix iterator underflow in squashInstIt (can hit UB at list boundaries).

Line 1515 always decrements instIt even when the iterator is at instList.begin() (or the list becomes empty after erase). That can cause undefined behavior during squash walks.

Proposed fix

-CPU::ListIt
-CPU::squashInstIt(ListIt &instIt, ThreadID tid)
+CPU::ListIt
+CPU::squashInstIt(ListIt instIt, ThreadID tid)
 {
+    const bool atBegin = (instIt == instList.begin());
+
     if ((*instIt)->threadNumber == tid) {
         DPRINTF(O3CPU, "Squashing instruction, "
                 "[tid:%i] [sn:%lli] PC %s\n",
                 (*instIt)->threadNumber,
                 (*instIt)->seqNum,
                 (*instIt)->pcState());

         // Mark it as squashed.
         (*instIt)->setSquashed();

         // `@todo`: Formulate a consistent method for deleting
         // instructions from the instruction list
         // Remove the instruction from the list.
         instIt = instList.erase(instIt);
     }
-    return --instIt;
+    if (instList.empty()) {
+        return instList.end();
+    }
+    if (atBegin) {
+        return instList.begin();
+    }
+    --instIt;
+    return instIt;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 1497 - 1515, The code in CPU::squashInstIt
decrements instIt unconditionally after erasing, which underflows if erase
returned begin() or the list became empty; change the logic so after
instList.erase(instIt) you check for boundary conditions: if instList.empty()
return instList.end(); otherwise if instIt == instList.begin() return instIt;
else return --instIt; this ensures you never pre-decrement begin() and handles
the empty-list case safely.

♻️ Duplicate comments (6)

src/cpu/o3/fetch.cc (1)

1305-1305: ⚠️ Potential issue | 🟠 Major

Hardcoded tid = 0 still disables SMT fetch/dispatch.

Line 1305 and Line 1675 keep single-thread hardcoding, so non-zero threads won’t make progress when numThreads > 1.

Minimal fail-fast until SMT selection is restored

 ThreadID
 Fetch::selectFetchThread()
 {
-    ThreadID tid = 0; // TODO: smt support
+    fatal_if(numThreads > 1,
+             "SMT fetch selection is not implemented; set numThreads=1.");
+    ThreadID tid = 0; // TODO: smt support

Also applies to: 1675-1675

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` at line 1305, Replace the hardcoded ThreadID tid = 0
usages with the actual thread identifier used by the fetch/dispatch code path so
SMT can make progress; locate the two occurrences where ThreadID tid is set (the
one currently written as "ThreadID tid = 0" and the duplicate at the other
occurrence) and initialize tid from the caller/context (e.g., use the
method/argument that carries thread index or the FetchState/ThreadContext’s
threadId) and ensure any subsequent SMT selection logic (in the same function
and the corresponding dispatch path) uses this tid rather than a constant 0.

src/cpu/o3/cpu.hh (1)

529-547: ⚠️ Potential issue | 🟡 Minor

Remove the redundant CPU-level stallSig member.

Line 529 already defines the shared stall state as stallSignals, but Line 546 adds another StallSignals instance. Keeping both invites accidental split wiring.

Suggested cleanup

     StallSignals stallSignals;
@@
-    StallSignals stallSig;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.hh` around lines 529 - 547, Remove the redundant CPU-level
member StallSignals stallSig: delete the declaration of stallSig and replace any
usage sites referencing stallSig to use the existing stallSignals instance
instead (search for stallSig and update references in methods/functions that
currently access it to use stallSignals). Ensure there are no remaining
duplicate definitions, update include/forward declarations if needed, and run
the build/tests to confirm no references remain to stallSig.

src/cpu/o3/rename.cc (1)

401-406: ⚠️ Potential issue | 🟠 Major

releaseSeq still checks the wrong history buffer.

historyBuffer->empty() does not check the current thread’s buffer and can corrupt per-thread release progression.

Suggested fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0
+                                                    : historyBuffer[tid].back().instSeqNum;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 401 - 406, The code uses
historyBuffer->empty() which checks the container itself instead of the
per-thread buffer and can corrupt per-thread release progression; change the
check to historyBuffer[tid].empty() and compute releaseSeq from
historyBuffer[tid].back().instSeqNum only when that per-thread buffer is
non-empty (i.e., replace historyBuffer->empty() ? 0 :
historyBuffer[tid].back().instSeqNum with historyBuffer[tid].empty() ? 0 :
historyBuffer[tid].back().instSeqNum) to ensure releaseSeq is derived from the
correct thread buffer (symbols: fromCommit, commitInfo, tid, releaseSeq,
historyBuffer).

src/cpu/o3/decode.cc (1)

112-115: ⚠️ Potential issue | 🟠 Major

clearStates still doesn’t clear per-thread decode buffers.

This remains a no-op, so buffered entries for tid can survive remove/reuse cycles.

Suggested fix outline

 void
 Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it = eachstallSize.begin(); it != eachstallSize.end();) {
+        const int size = *it;
+        auto start = delIt;
+        auto end = start + size;
+        if ((*start)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start, end);
+            it = eachstallSize.erase(it);
+        } else {
+            delIt = end;
+            ++it;
+        }
+    }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 112 - 115, Implement
Decode::clearStates(ThreadID tid) so it actually clears all per-thread decode
buffers and associated indices/state for the given tid: locate the
Decode::clearStates(ThreadID tid) definition and add logic to remove/clear any
per-thread buffer containers, queues or maps (the per-thread decode buffers and
their entries), reset head/tail or index counters and any cached decode state
for tid, and free or shrink underlying storage as needed so buffered entries
cannot survive remove/reuse cycles.

src/cpu/o3/iew.cc (2)

424-430: ⚠️ Potential issue | 🟠 Major

Validate IEW→Commit wire offset; code and comment are currently inconsistent.

The comment states IEW writes [-1] and Commit reads [-1], but Line 430 uses getWire(0). If Commit still consumes -iewToCommitDelay, this changes latency/visibility semantics.

#!/bin/bash
# Verify IEW producer and Commit consumer offsets for IEW queue.
rg -n 'toCommit\s*=\s*iewQueue->getWire' src/cpu/o3/iew.cc -C2
rg -n 'fromIEW\s*=\s*iewQueue->getWire' src/cpu/o3/commit.cc src/cpu/o3/commit.hh -C2
rg -n 'iewToCommitDelay' src/cpu/o3/iew.cc src/cpu/o3/commit.cc src/cpu/o3/commit.hh -C2

If zero-latency alignment is still intended

-    toCommit = iewQueue->getWire(0);
+    toCommit = iewQueue->getWire(-iewToCommitDelay);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 424 - 430, Comment and code disagree on the
IEW→Commit wire offset: the comment says IEW writes at [-1] but the code sets
toCommit = iewQueue->getWire(0); update the code so the IEW producer and Commit
consumer use the same queue offset (or update the comment to match the intended
behavior). Specifically, either change the assignment in iew.cc to use
iewQueue->getWire(-iewToCommitDelay) so it matches the "[-1]"/zero-latency
intent, or if zero-latency is not intended, change the comment to reflect
getWire(0); also verify the consumer binding in commit.cc/commit.hh (fromIEW)
uses the same offset and make them consistent (ensure symbols toCommit,
iewToCommitDelay, and fromIEW reference the same offset policy).

805-817: ⚠️ Potential issue | 🟠 Major

Avoid short-circuiting canInsertLDSTQue(); it has reset side effects.

canInsertLDSTQue() calls getAndResetLastLQPopEntries() / getAndResetLastSQPopEntries(). With stallSig->blockIEW[i] || !canInsertLDSTQue(i), those resets are skipped whenever blockIEW[i] is true.

Proposed fix

     for (int i = 0; i < numThreads; i++) {
-        bool block = stallSig->blockIEW[i] || !canInsertLDSTQue(i);
+        bool canInsert = canInsertLDSTQue(i);
+        bool block = stallSig->blockIEW[i] || !canInsert;
         bool active = !block && !fixedbuffer[i].empty();

Also applies to: 829-833

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 805 - 817, The expression using
short-circuiting with stallSig->blockIEW[i] prevents canInsertLDSTQue(i) from
being called and thus skips its
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries side effects; fix by
ensuring canInsertLDSTQue(i) is always invoked (e.g., evaluate and store its
result in a local bool before combining with stallSig->blockIEW[i], or
explicitly call the reset/getters when blockIEW is true) so the reset side
effects in canInsertLDSTQue (and its internal calls to
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries) always run; apply the
same change for the similar occurrence around the code referenced (lines
829-833) where short-circuiting currently skips the call.

🧹 Nitpick comments (1)

src/cpu/o3/issue_queue.cc (1)

331-331: Minor: Missing newline in panic message.

The panic format string is missing a trailing \n, which is inconsistent with other panic/DPRINTF calls in this file (e.g., lines 335, 348).

Suggested fix

-            if (!dst_inst->isLoad()) panic("dst[sn:%llu] is not load, src[sn:%llu]", dst_inst->seqNum, inst->seqNum);
+            if (!dst_inst->isLoad()) panic("dst[sn:%llu] is not load, src[sn:%llu]\n", dst_inst->seqNum, inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` at line 331, The panic call in the issue queue
check (the line calling panic(...) that verifies dst_inst->isLoad()) is missing
a trailing "\n" in its format string; update the panic format to include a
newline so it matches other panic/DPRINTF usage (e.g., add "\n" at the end of
the format string referencing dst_inst->seqNum and inst->seqNum).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/commit.cc`:
- Around line 1936-1937: The check uses rob->findInst(0, inst->seqNum) and will
panic for non-zero threads; update the call to use the instruction's thread
identifier (e.g., rob->findInst(inst->thread, inst->seqNum) or the correct field
name on Instruction if different) so the ROB lookup uses the instruction's
thread, keeping the panic_if and message unchanged.
- Around line 1387-1392: The code unsafely copies head_inst->effSize bytes into
an 8-byte buffer and then reads via an unaligned cast; instead, ensure effSize
is bounded and avoid the unaligned cast by copying into a properly typed
uint64_t: declare uint64_t value = 0; if (head_inst->memData &&
head_inst->effSize > 0) std::memcpy(&value, head_inst->memData,
std::min<std::size_t>(head_inst->effSize, sizeof(value))); then set load_value =
value and call loadTripleCounter.update(load_pc, load_addr, load_value); this
prevents overflow and UB from unaligned/oversized reads while preserving zero
extension for short loads.

In `@src/cpu/o3/decode.cc`:
- Around line 434-435: The activation path in tick() is never reached because
status_change is set to false and never updated; change the initialization to
call updateActivate() (or assign status_change = updateActivate()) so the
boolean reflects the actual activation transition, and ensure toRenameIndex is
still reset afterwards; apply the same fix to the second occurrence (the block
around the other initialization). Locate tick(), the local variable
status_change, the updateActivate() call, and toRenameIndex to implement this
swap so the activation logic runs when updateActivate() indicates a change.

In `@src/cpu/o3/fetch.cc`:
- Around line 1301-1304: In sendInstructionsToDecode(), when
stallSig->blockFetch[tid] is true the code must not drain fetchQueue[tid] or
forward instructions for that tid; add a guard around the per-thread
drain/forward logic (referencing fetchQueue[tid] and tid) to skip processing
when blockFetch[tid] is set. Also ensure that when no threads are active
(any_thread_active is false) you still reset or clear numInst before returning
so numInst cannot remain stale; move or add the numInst reset/clear immediately
before the early return that checks any_thread_active. Apply the same changes in
the nearby blocks mentioned (1308-1326, 1340-1341) where per-thread draining and
numInst handling occur.

In `@src/cpu/o3/iew.cc`:
- Around line 734-744: The loop currently returns after handling the first
thread with fromCommit->commitInfo[i].squash, skipping later threads; change the
control flow so all threads are processed: replace the early `return` in the
block that calls squash(i) with a `continue` (or otherwise let the loop proceed)
so squash(i), localSquashVer.update(...), fetchRedirect[i] = false and the
iewStats/stallEvents/ROBWalk and setAllStalls(StallReason::CommitSquash) logic
run for every thread that has commitInfo[i].squash; ensure any shared post-loop
behavior still executes only once if needed.

In `@src/cpu/o3/lsq.cc`:
- Around line 1664-1671: The code in SingleReq::recvTimingResp uses a fixed
8-byte local buffer and memcpy(buffer, pkt->getPtr<char>(), pkt->getSize())
which can overflow if pkt->getSize() > 8; change the logic in recvTimingResp to
copy at most sizeof(buffer) bytes (e.g., use std::min(pkt->getSize(), sizeof
buffer)) into buffer and, if pkt->getSize() is smaller than 8, zero-fill the
remaining bytes so the subsequent *((uint64_t*)buffer) read is safe; reference
the symbols recvTimingResp, pkt->getSize(), pkt->getPtr, buffer, and the DPRINTF
line when making the fix.

In `@src/cpu/o3/rename.cc`:
- Around line 483-485: The DPRINTF in rename.cc has a format/argument mismatch:
it prints "[sn:%llu] ... %s" but passes tid first and an extra argument; fix it
by making the sequence number the first argument and removing tid (or add a
corresponding format if tid should be logged). Concretely, update the
DPRINTF(...) call to pass (unsigned long long)inst->seqNum as the first argument
and inst->pcState() as the second (keep the "%llu" and "%s" format), referencing
the DPRINTF invocation and the inst->seqNum / inst->pcState() symbols to locate
the change.

In `@src/cpu/o3/rename.hh`:
- Around line 211-212: stallsSig is a raw pointer that may be dereferenced in
tick() before setStallSignals() is called; initialize stallSig to nullptr at its
declaration and add a defensive null-check (or assert) inside tick() before any
dereference to avoid undefined behavior, and apply the same initialization/check
fix for the other pointer instance noted around line 342; reference the member
name stallSig, the setter setStallSignals(), and the consumer tick() when making
the changes.

---

Outside diff comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 1497-1515: The code in CPU::squashInstIt decrements instIt
unconditionally after erasing, which underflows if erase returned begin() or the
list became empty; change the logic so after instList.erase(instIt) you check
for boundary conditions: if instList.empty() return instList.end(); otherwise if
instIt == instList.begin() return instIt; else return --instIt; this ensures you
never pre-decrement begin() and handles the empty-list case safely.

In `@src/cpu/o3/rename.cc`:
- Around line 592-613: any_unblocking is incorrectly initialized to true causing
the rename stage to be forced Active every cycle; change the initialization of
any_unblocking so it reflects the real unblocking condition (e.g., start as
false and set to true only when you detect unblocking events such as free
physical regs, completed renames, or other conditions used elsewhere in this
file), then keep the existing branch that flips _status between Inactive and
Active and calls cpu->activateStage(CPU::RenameIdx) /
cpu->deactivateStage(CPU::RenameIdx); locate the variable any_unblocking and the
surrounding status logic (references: any_unblocking, _status, Inactive, Active,
cpu->activateStage, cpu->deactivateStage, CPU::RenameIdx) and compute its value
from the proper signals instead of hardcoding true.

---

Duplicate comments:
In `@src/cpu/o3/cpu.hh`:
- Around line 529-547: Remove the redundant CPU-level member StallSignals
stallSig: delete the declaration of stallSig and replace any usage sites
referencing stallSig to use the existing stallSignals instance instead (search
for stallSig and update references in methods/functions that currently access it
to use stallSignals). Ensure there are no remaining duplicate definitions,
update include/forward declarations if needed, and run the build/tests to
confirm no references remain to stallSig.

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Implement Decode::clearStates(ThreadID tid) so it
actually clears all per-thread decode buffers and associated indices/state for
the given tid: locate the Decode::clearStates(ThreadID tid) definition and add
logic to remove/clear any per-thread buffer containers, queues or maps (the
per-thread decode buffers and their entries), reset head/tail or index counters
and any cached decode state for tid, and free or shrink underlying storage as
needed so buffered entries cannot survive remove/reuse cycles.

In `@src/cpu/o3/fetch.cc`:
- Line 1305: Replace the hardcoded ThreadID tid = 0 usages with the actual
thread identifier used by the fetch/dispatch code path so SMT can make progress;
locate the two occurrences where ThreadID tid is set (the one currently written
as "ThreadID tid = 0" and the duplicate at the other occurrence) and initialize
tid from the caller/context (e.g., use the method/argument that carries thread
index or the FetchState/ThreadContext’s threadId) and ensure any subsequent SMT
selection logic (in the same function and the corresponding dispatch path) uses
this tid rather than a constant 0.

In `@src/cpu/o3/iew.cc`:
- Around line 424-430: Comment and code disagree on the IEW→Commit wire offset:
the comment says IEW writes at [-1] but the code sets toCommit =
iewQueue->getWire(0); update the code so the IEW producer and Commit consumer
use the same queue offset (or update the comment to match the intended
behavior). Specifically, either change the assignment in iew.cc to use
iewQueue->getWire(-iewToCommitDelay) so it matches the "[-1]"/zero-latency
intent, or if zero-latency is not intended, change the comment to reflect
getWire(0); also verify the consumer binding in commit.cc/commit.hh (fromIEW)
uses the same offset and make them consistent (ensure symbols toCommit,
iewToCommitDelay, and fromIEW reference the same offset policy).
- Around line 805-817: The expression using short-circuiting with
stallSig->blockIEW[i] prevents canInsertLDSTQue(i) from being called and thus
skips its getAndResetLastLQPopEntries/getAndResetLastSQPopEntries side effects;
fix by ensuring canInsertLDSTQue(i) is always invoked (e.g., evaluate and store
its result in a local bool before combining with stallSig->blockIEW[i], or
explicitly call the reset/getters when blockIEW is true) so the reset side
effects in canInsertLDSTQue (and its internal calls to
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries) always run; apply the
same change for the similar occurrence around the code referenced (lines
829-833) where short-circuiting currently skips the call.

In `@src/cpu/o3/rename.cc`:
- Around line 401-406: The code uses historyBuffer->empty() which checks the
container itself instead of the per-thread buffer and can corrupt per-thread
release progression; change the check to historyBuffer[tid].empty() and compute
releaseSeq from historyBuffer[tid].back().instSeqNum only when that per-thread
buffer is non-empty (i.e., replace historyBuffer->empty() ? 0 :
historyBuffer[tid].back().instSeqNum with historyBuffer[tid].empty() ? 0 :
historyBuffer[tid].back().instSeqNum) to ensure releaseSeq is derived from the
correct thread buffer (symbols: fromCommit, commitInfo, tid, releaseSeq,
historyBuffer).

---

Nitpick comments:
In `@src/cpu/o3/issue_queue.cc`:
- Line 331: The panic call in the issue queue check (the line calling panic(...)
that verifies dst_inst->isLoad()) is missing a trailing "\n" in its format
string; update the panic format to include a newline so it matches other
panic/DPRINTF usage (e.g., add "\n" at the end of the format string referencing
dst_inst->seqNum and inst->seqNum).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc39e5 and 33b75e7.

📒 Files selected for processing (23)

src/cpu/o3/comm.hh
src/cpu/o3/commit.cc
src/cpu/o3/commit.hh
src/cpu/o3/cpu.cc
src/cpu/o3/cpu.hh
src/cpu/o3/decode.cc
src/cpu/o3/decode.hh
src/cpu/o3/dyn_inst.hh
src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh
src/cpu/o3/iew.cc
src/cpu/o3/iew.hh
src/cpu/o3/inst_queue.cc
src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/regfile.hh
src/cpu/o3/rename.cc
src/cpu/o3/rename.hh
src/cpu/o3/rename_map.cc
src/cpu/o3/rob.cc
src/cpu/o3/rob.hh
src/cpu/reg_class.hh

💤 Files with no reviewable changes (2)

src/cpu/o3/dyn_inst.hh
src/cpu/o3/inst_queue.cc

🚧 Files skipped from review as they are similar to previous changes (3)

src/cpu/o3/regfile.hh
src/cpu/o3/lsq_unit.cc
src/cpu/o3/fetch.hh

src/cpu/o3/commit.cc

src/cpu/o3/decode.cc

src/cpu/o3/fetch.cc

src/cpu/o3/iew.cc

src/cpu/o3/lsq.cc

src/cpu/o3/rename.cc

src/cpu/o3/rename.hh

github-actions · 2026-02-28T07:43:14Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1727`	-
This PR	`1.9349`	📉 `-0.2378` (`-10.94%`)

✅ Difftest smoke test passed!

github-actions · 2026-02-28T07:45:52Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2026-02-28T09:48:43Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2026-02-28T09:54:46Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.1727`	-
This PR	`1.9349`	📉 `-0.2378` (`-10.94%`)

✅ Difftest smoke test passed!

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cpu/o3/rename.cc (1)

590-603: ⚠️ Potential issue | 🟠 Major

updateActivate() currently cannot deactivate Rename.

Line 592 sets any_unblocking to true unconditionally, so the inactive/deactivate path is never taken.

Proposed fix

 void
 Rename::updateActivate()
 {
-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+            any_unblocking = true;
+            break;
+        }
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 590 - 603, Rename::updateActivate
currently hardcodes any_unblocking = true which prevents deactivation; change it
to compute the real unblocking condition (replace the constant true with the
actual boolean expression that checks Rename's internal state: e.g., whether
there are pending entries, free rename/rob slots, or other wakeup conditions
used elsewhere in Rename) and then handle the deactivation path: when
any_unblocking is false and _status == Active set _status = Inactive, log
DPRINTF(Activity, "Deactivating stage.\n"); and call
cpu->deactivateStage(CPU::RenameIdx) so the stage can properly deactivate.
Ensure you update references to any_unblocking, _status, Inactive, Active,
cpu->activateStage and cpu->deactivateStage in Rename::updateActivate.

src/cpu/o3/cpu.cc (1)

1497-1515: ⚠️ Potential issue | 🔴 Critical

squashInstIt() can decrement an invalid iterator after erase.

At Line 1515, return --instIt; is unsafe when instIt is at begin() or the list becomes empty after erase. This can trigger undefined behavior in squash paths.

Suggested fix

 CPU::ListIt
 CPU::squashInstIt(ListIt &instIt, ThreadID tid)
 {
     if ((*instIt)->threadNumber == tid) {
@@
-        instIt = instList.erase(instIt);
+        instIt = instList.erase(instIt);
+        if (instList.empty() || instIt == instList.begin()) {
+            return instIt;
+        }
+        auto prev = instIt;
+        --prev;
+        return prev;
     }
-    return --instIt;
+    if (instIt == instList.begin()) {
+        return instIt;
+    }
+    auto prev = instIt;
+    --prev;
+    return prev;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 1497 - 1515, The function CPU::squashInstIt
currently does an unconditional return --instIt which can decrement begin() or
an invalid iterator after erase; change the logic so after possibly erasing (in
CPU::squashInstIt) you check if instIt == instList.begin() (or instIt ==
instList.end() && instList.empty()) and return instIt as-is, otherwise decrement
and return (--instIt); ensure both the erased-path and non-erased-path use this
safe check so you never decrement begin() or an empty list iterator.

♻️ Duplicate comments (13)

src/cpu/o3/rename.cc (1)

401-406: ⚠️ Potential issue | 🟠 Major

Use the current thread’s history buffer when computing releaseSeq.

On Line 405, historyBuffer->empty() checks only thread 0. In SMT, this can compute the wrong release boundary for other threads.

Proposed fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 401 - 406, The releaseSeq computation uses
historyBuffer->empty() which checks only thread 0; change it to test the current
thread's history buffer (use historyBuffer[tid].empty()) and then use
historyBuffer[tid].back().instSeqNum for releaseSeq when not empty; update the
block around fromCommit->commitInfo[tid], tid, releaseSeq, and historyBuffer
references accordingly so SMT threads use their own history buffers.

src/cpu/o3/comm.hh (1)

330-337: ⚠️ Potential issue | 🟠 Major

Initialize StallSignals arrays to deterministic defaults.

Line 333-Line 336 declares raw bool arrays with no initialization, so early reads may observe garbage values.

Proposed fix

 struct StallSignals
 {
-    bool blockFetch[MaxThreads];// decode to fetch
-    bool blockDecode[MaxThreads];// rename to decode
-    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
-    bool blockIEW[MaxThreads];// commit to iew
+    bool blockFetch[MaxThreads]{};   // decode to fetch
+    bool blockDecode[MaxThreads]{};  // rename to decode
+    bool blockRename[MaxThreads]{};  // iew to rename
+    bool blockIEW[MaxThreads]{};     // commit to iew
 };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/comm.hh` around lines 330 - 337, StallSignals declares raw bool
arrays (blockFetch, blockDecode, blockRename, blockIEW) that are uninitialized;
add deterministic initialization by either providing an in-struct initializer
(e.g., = {false} / = {} for each array) or adding a default constructor
StallSignals() that zeroes all MaxThreads entries for blockFetch, blockDecode,
blockRename, and blockIEW so every element is false on construction.

src/cpu/o3/commit.cc (2)

1936-1937: ⚠️ Potential issue | 🟠 Major

Use the instruction’s thread in ROB lookup.

Line 1936 hardcodes rob->findInst(0, inst->seqNum). This is wrong for non-zero threads and can panic incorrectly.

Proposed fix

-            panic_if(!rob->findInst(0, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
+            panic_if(!rob->findInst(inst->threadNumber, inst->seqNum),
+                     "[sn:%llu] Committed instruction not found in ROB",
                      inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1936 - 1937, The ROB lookup is using a
hardcoded thread 0: change the call rob->findInst(0, inst->seqNum) to use the
instruction's actual thread identifier (e.g. rob->findInst(inst->threadId(),
inst->seqNum) or rob->findInst(inst->thread->id, inst->seqNum) depending on the
project API) so the lookup uses inst's thread rather than 0 and avoids false
panics; update the call site in commit.cc where rob->findInst and inst->seqNum
are referenced.

1387-1392: ⚠️ Potential issue | 🔴 Critical

Fix unsafe load-value extraction (overflow + UB risk).

Line 1389 copies effSize bytes into an 8-byte buffer without bounds, and Line 1391 reads through a casted pointer. This can overflow and trigger undefined behavior.

Proposed fix

-                    char buffer[8] = {0};
-                    if (head_inst->memData) {
-                        std::memcpy(buffer, head_inst->memData, head_inst->effSize);
-                    }
-                    Addr load_value = *((uint64_t *)buffer);
+                    uint64_t load_value_u64 = 0;
+                    if (head_inst->memData && head_inst->effSize > 0) {
+                        const size_t copy_size =
+                            std::min<size_t>(head_inst->effSize, sizeof(load_value_u64));
+                        std::memcpy(&load_value_u64, head_inst->memData, copy_size);
+                    }
+                    Addr load_value = static_cast<Addr>(load_value_u64);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1387 - 1392, The code unsafely copies
head_inst->effSize bytes into an 8-byte stack buffer and then reads it via a
uint64_t cast, which can overflow and cause UB; fix by guarding and building the
64-bit value safely: ensure head_inst->memData is non-null, clamp the copy
length to at most 8 bytes (e.g., size_t n = std::min<size_t>(head_inst->effSize,
8)), zero-initialize the destination (or initialize a uint64_t value = 0), then
copy only n bytes into that uint64_t using memcpy or a byte-wise loop to avoid
alignment/aliasing issues, and finally use that constructed load_value when
calling loadTripleCounter.update(load_pc, load_addr, load_value).

src/cpu/o3/decode.cc (2)

112-115: ⚠️ Potential issue | 🟠 Major

clearStates() should clear per-thread buffered state.

Line 112-Line 115 is currently a no-op, so stale instructions for tid can remain in fixedbuffer/stall queues during thread reuse.

Proposed fix

 void
 Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+        int size = *it0;
+        auto start_it = delIt;
+        auto end_it = start_it + size;
+        if ((*start_it)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start_it, end_it);
+            it0 = eachstallSize.erase(it0);
+        } else {
+            delIt = end_it;
+            ++it0;
+        }
+    }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 112 - 115, Decode::clearStates currently
does nothing, leaving per-thread buffered instructions alive across thread
reuse; implement Decode::clearStates(ThreadID tid) to remove/clear any entries
associated with the given tid from per-thread structures such as fixedbuffer and
any stall queues (and other per-thread buffers maintained by Decode), e.g.
iterate fixedbuffer and stall queue containers and erase or reset entries whose
thread id equals tid, and reset any per-thread bookkeeping counters/flags so the
thread starts with a clean state.

434-473: ⚠️ Potential issue | 🟠 Major

updateActivate() is never reached due a dead gate flag.

status_change is set to false on Line 434 and never updated, so the call on Line 471 does not run.

Proposed fix

-    bool status_change = false;
+    bool status_change = false;
@@
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 434 - 473, status_change is initialized
false and never updated so updateActivate() never runs; change the code so any
operation that can flip thread activation returns a bool and ORs into
status_change (for example make moveInstsToBuffer() and checkSquash() return a
bool and call status_change |= moveInstsToBuffer(); status_change |=
checkSquash(); or have those functions set a provided reference flag), and also
ensure decodeInsts(tid) reports/returns any activation changes (status_change |=
decodeInsts(tid); or decodeInsts sets the flag) before the final if
(status_change) updateActivate(); so updateActivate() is invoked when any of
those routines change status.

src/cpu/o3/iew.cc (3)

828-830: ⚠️ Potential issue | 🟠 Major

canInsertLDSTQue() should be evaluated unconditionally each tick.

canInsertLDSTQue() performs getAndResetLastLQPopEntries() / getAndResetLastSQPopEntries() side effects. With the current || expression, those resets are skipped whenever stallSig->blockIEW[i] is true.

Suggested fix

     for (int i = 0; i < numThreads; i++) {
-        bool block = stallSig->blockIEW[i] || !canInsertLDSTQue(i);
+        bool can_insert_ldst = canInsertLDSTQue(i);
+        bool block = stallSig->blockIEW[i] || !can_insert_ldst;
         bool active = !block && !fixedbuffer[i].empty();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 828 - 830, The expression currently
short-circuits so canInsertLDSTQue(i) isn't called when stallSig->blockIEW[i] is
true; evaluate canInsertLDSTQue(i) every tick into a local bool (e.g., bool
canInsert = canInsertLDSTQue(i)) and then compute block as bool block =
stallSig->blockIEW[i] || !canInsert; keep the subsequent active calculation
(bool active = !block && !fixedbuffer[i].empty()) unchanged so the
side-effecting getAndResetLastLQPopEntries()/getAndResetLastSQPopEntries()
always run.

833-841: ⚠️ Potential issue | 🟠 Major

Current multi-thread arbitration can starve non-zero thread IDs.

When multiple threads are active, the logic keeps the first discovered thread as tid and blocks rename for both, which can repeatedly favor lower-index threads and degrade SMT fairness.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 833 - 841, The current arbitration in the
active-thread selection block (variables tid, i, InvalidThreadID and
stallSig->blockRename in iew.cc) always keeps the first-seen active thread and
marks both threads blocked, which biases lower-index threads; change this to a
fair round-robin/rotating selection: introduce or use a rotating start index
(e.g., lastChosenThread) and scan threads beginning after it to pick the next
active tid, then only set stallSig->blockRename for other threads as appropriate
(instead of marking both), and update lastChosenThread when you pick tid so
subsequent arbitration rotates priority and prevents starvation of non-zero
thread IDs.

734-744: ⚠️ Potential issue | 🟠 Major

Do not return on first squash inside checkSquash() loop.

At Line 743, returning after the first squashing thread skips squash handling for later threads in the same tick.

Suggested fix

-            return;
+            continue;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 734 - 744, The loop in checkSquash()
currently returns after handling the first squashing thread, which skips
processing subsequent threads; change the control flow to handle all threads by
replacing the early "return" with a "continue" (or otherwise removing the
return) so squash(i), localSquashVer.update(...), fetchRedirect[i] = false,
iewStats.stallEvents[ROBWalk]++, and setAllStalls(StallReason::CommitSquash)
execute for each thread that has fromCommit->commitInfo[i].squash; ensure the
remaining loop iterations still run and that any shared side-effects remain
consistent after this change.

src/cpu/o3/decode.hh (1)

186-186: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to nullptr to avoid indeterminate pointer state.

Without default initialization, any pre-wiring access path can observe garbage pointer values.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.hh` at line 186, The pointer member StallSignals* stallSig
is uninitialized and can contain garbage; initialize it to a known null state by
setting stallSig = nullptr where it's declared or by adding stallSig(nullptr) to
the constructor initializer list for the class that contains it (reference
symbol: stallSig / StallSignals* stallSig) so any pre-wiring access sees a
defined null pointer.

src/cpu/o3/fetch.cc (2)

1301-1312: ⚠️ Potential issue | 🟠 Major

Block/no-active paths should not forward instructions and must clear numInst.

At Line 1301, sendInstructionsToDecode() returns without clearing numInst, and at Line 1308 it still drains fetchQueue[tid] after detecting stallSig->blockFetch[tid]. This can leak stale per-cycle fetch accounting and violate decode backpressure.

Suggested fix

     if (!any_thread_active) {
         // All threads are blocked, no instructions to send
+        numInst = 0;
         return;
     }
@@
     if (stallSig->blockFetch[tid]) {
         // If decode stalled, use decode's stall reason
         DPRINTF(Fetch, "[tid:%i] Fetch stalled\n", tid);
         setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason);
+        numInst = 0;
+        return;
     }

Also applies to: 1341-1341

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1301 - 1312, sendInstructionsToDecode()
currently returns early when no threads are active or when
stallSig->blockFetch[tid] is set but leaves per-cycle counters and still drains
fetchQueue, which can leak stale numInst and break decode backpressure; update
the logic so that before each early return (both the any_thread_active false
path and the stallSig->blockFetch[tid] path referenced in the fetch loop and the
similar path around the other occurrence) you explicitly set numInst = 0 for the
corresponding ThreadID and refrain from draining fetchQueue[tid]; ensure you
still call setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason) on stall
and return immediately after clearing numInst so no instructions are forwarded
or accounted for in that cycle.

1305-1305: ⚠️ Potential issue | 🟠 Major

Hardcoded tid = 0 still disables SMT progress.

Line 1305 and Line 1675 pin behavior to thread 0. With numThreads > 1, other threads can be starved indefinitely.

Minimal fail-fast until SMT scheduling is restored

-    ThreadID tid = 0; // TODO: smt support
+    fatal_if(numThreads > 1,
+             "SMT fetch selection is not implemented; set numThreads=1.");
+    ThreadID tid = 0; // TODO: smt support

Also applies to: 1675-1675

src/cpu/o3/iew.hh (1)

159-159: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to nullptr at declaration.

stallSig currently has an indeterminate value until setStallSignals() is called, which is unsafe for any early-access path.
Suggested fix
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.hh` at line 159, Initialize the member pointer stallSig to
nullptr where it's declared (StallSignals* stallSig) in the iew.hh class
declaration so it has a defined value before setStallSignals() is called; update
the declaration to assign nullptr and ensure any early-access code guards
against a null stallSig until setStallSignals() assigns the real pointer.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/commit.cc`:
- Line 1246: The file uses DPRINTF with the debug categories CommitTrace and IEW
(e.g., the DPRINTF call that prints head_inst->genDisassembly()) but does not
include their debug headers; add the missing debug header includes for those
categories (for example include the corresponding debug/CommitTrace.hh and
debug/IEW.hh headers) near the other includes in the file so the CommitTrace and
IEW symbols are declared and compilation no longer depends on transitive
includes.

In `@src/cpu/o3/rename_map.cc`:
- Around line 111-121: The file uses DPRINTF(Scoreboard, ...) (seen around the
map assignment and refcount logs) but doesn't include the Scoreboard debug
header; add an include for "debug/Scoreboard.hh" alongside the existing
"debug/Rename.hh" include in src/cpu/o3/rename_map.cc so the Scoreboard debug
flag symbol is defined and builds resolve (search for the DPRINTF(Scoreboard,
...) calls to place the include near other debug headers).

---

Outside diff comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 1497-1515: The function CPU::squashInstIt currently does an
unconditional return --instIt which can decrement begin() or an invalid iterator
after erase; change the logic so after possibly erasing (in CPU::squashInstIt)
you check if instIt == instList.begin() (or instIt == instList.end() &&
instList.empty()) and return instIt as-is, otherwise decrement and return
(--instIt); ensure both the erased-path and non-erased-path use this safe check
so you never decrement begin() or an empty list iterator.

In `@src/cpu/o3/rename.cc`:
- Around line 590-603: Rename::updateActivate currently hardcodes any_unblocking
= true which prevents deactivation; change it to compute the real unblocking
condition (replace the constant true with the actual boolean expression that
checks Rename's internal state: e.g., whether there are pending entries, free
rename/rob slots, or other wakeup conditions used elsewhere in Rename) and then
handle the deactivation path: when any_unblocking is false and _status == Active
set _status = Inactive, log DPRINTF(Activity, "Deactivating stage.\n"); and call
cpu->deactivateStage(CPU::RenameIdx) so the stage can properly deactivate.
Ensure you update references to any_unblocking, _status, Inactive, Active,
cpu->activateStage and cpu->deactivateStage in Rename::updateActivate.

---

Duplicate comments:
In `@src/cpu/o3/comm.hh`:
- Around line 330-337: StallSignals declares raw bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) that are uninitialized; add deterministic
initialization by either providing an in-struct initializer (e.g., = {false} / =
{} for each array) or adding a default constructor StallSignals() that zeroes
all MaxThreads entries for blockFetch, blockDecode, blockRename, and blockIEW so
every element is false on construction.

In `@src/cpu/o3/commit.cc`:
- Around line 1936-1937: The ROB lookup is using a hardcoded thread 0: change
the call rob->findInst(0, inst->seqNum) to use the instruction's actual thread
identifier (e.g. rob->findInst(inst->threadId(), inst->seqNum) or
rob->findInst(inst->thread->id, inst->seqNum) depending on the project API) so
the lookup uses inst's thread rather than 0 and avoids false panics; update the
call site in commit.cc where rob->findInst and inst->seqNum are referenced.
- Around line 1387-1392: The code unsafely copies head_inst->effSize bytes into
an 8-byte stack buffer and then reads it via a uint64_t cast, which can overflow
and cause UB; fix by guarding and building the 64-bit value safely: ensure
head_inst->memData is non-null, clamp the copy length to at most 8 bytes (e.g.,
size_t n = std::min<size_t>(head_inst->effSize, 8)), zero-initialize the
destination (or initialize a uint64_t value = 0), then copy only n bytes into
that uint64_t using memcpy or a byte-wise loop to avoid alignment/aliasing
issues, and finally use that constructed load_value when calling
loadTripleCounter.update(load_pc, load_addr, load_value).

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing, leaving
per-thread buffered instructions alive across thread reuse; implement
Decode::clearStates(ThreadID tid) to remove/clear any entries associated with
the given tid from per-thread structures such as fixedbuffer and any stall
queues (and other per-thread buffers maintained by Decode), e.g. iterate
fixedbuffer and stall queue containers and erase or reset entries whose thread
id equals tid, and reset any per-thread bookkeeping counters/flags so the thread
starts with a clean state.
- Around line 434-473: status_change is initialized false and never updated so
updateActivate() never runs; change the code so any operation that can flip
thread activation returns a bool and ORs into status_change (for example make
moveInstsToBuffer() and checkSquash() return a bool and call status_change |=
moveInstsToBuffer(); status_change |= checkSquash(); or have those functions set
a provided reference flag), and also ensure decodeInsts(tid) reports/returns any
activation changes (status_change |= decodeInsts(tid); or decodeInsts sets the
flag) before the final if (status_change) updateActivate(); so updateActivate()
is invoked when any of those routines change status.

In `@src/cpu/o3/decode.hh`:
- Line 186: The pointer member StallSignals* stallSig is uninitialized and can
contain garbage; initialize it to a known null state by setting stallSig =
nullptr where it's declared or by adding stallSig(nullptr) to the constructor
initializer list for the class that contains it (reference symbol: stallSig /
StallSignals* stallSig) so any pre-wiring access sees a defined null pointer.

In `@src/cpu/o3/fetch.cc`:
- Around line 1301-1312: sendInstructionsToDecode() currently returns early when
no threads are active or when stallSig->blockFetch[tid] is set but leaves
per-cycle counters and still drains fetchQueue, which can leak stale numInst and
break decode backpressure; update the logic so that before each early return
(both the any_thread_active false path and the stallSig->blockFetch[tid] path
referenced in the fetch loop and the similar path around the other occurrence)
you explicitly set numInst = 0 for the corresponding ThreadID and refrain from
draining fetchQueue[tid]; ensure you still call
setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason) on stall and return
immediately after clearing numInst so no instructions are forwarded or accounted
for in that cycle.

In `@src/cpu/o3/iew.cc`:
- Around line 828-830: The expression currently short-circuits so
canInsertLDSTQue(i) isn't called when stallSig->blockIEW[i] is true; evaluate
canInsertLDSTQue(i) every tick into a local bool (e.g., bool canInsert =
canInsertLDSTQue(i)) and then compute block as bool block =
stallSig->blockIEW[i] || !canInsert; keep the subsequent active calculation
(bool active = !block && !fixedbuffer[i].empty()) unchanged so the
side-effecting getAndResetLastLQPopEntries()/getAndResetLastSQPopEntries()
always run.
- Around line 833-841: The current arbitration in the active-thread selection
block (variables tid, i, InvalidThreadID and stallSig->blockRename in iew.cc)
always keeps the first-seen active thread and marks both threads blocked, which
biases lower-index threads; change this to a fair round-robin/rotating
selection: introduce or use a rotating start index (e.g., lastChosenThread) and
scan threads beginning after it to pick the next active tid, then only set
stallSig->blockRename for other threads as appropriate (instead of marking
both), and update lastChosenThread when you pick tid so subsequent arbitration
rotates priority and prevents starvation of non-zero thread IDs.
- Around line 734-744: The loop in checkSquash() currently returns after
handling the first squashing thread, which skips processing subsequent threads;
change the control flow to handle all threads by replacing the early "return"
with a "continue" (or otherwise removing the return) so squash(i),
localSquashVer.update(...), fetchRedirect[i] = false,
iewStats.stallEvents[ROBWalk]++, and setAllStalls(StallReason::CommitSquash)
execute for each thread that has fromCommit->commitInfo[i].squash; ensure the
remaining loop iterations still run and that any shared side-effects remain
consistent after this change.

In `@src/cpu/o3/iew.hh`:
- Line 159: Initialize the member pointer stallSig to nullptr where it's
declared (StallSignals* stallSig) in the iew.hh class declaration so it has a
defined value before setStallSignals() is called; update the declaration to
assign nullptr and ensure any early-access code guards against a null stallSig
until setStallSignals() assigns the real pointer.

In `@src/cpu/o3/rename.cc`:
- Around line 401-406: The releaseSeq computation uses historyBuffer->empty()
which checks only thread 0; change it to test the current thread's history
buffer (use historyBuffer[tid].empty()) and then use
historyBuffer[tid].back().instSeqNum for releaseSeq when not empty; update the
block around fromCommit->commitInfo[tid], tid, releaseSeq, and historyBuffer
references accordingly so SMT threads use their own history buffers.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33b75e7 and f77f257.

📒 Files selected for processing (23)

src/cpu/o3/comm.hh
src/cpu/o3/commit.cc
src/cpu/o3/commit.hh
src/cpu/o3/cpu.cc
src/cpu/o3/cpu.hh
src/cpu/o3/decode.cc
src/cpu/o3/decode.hh
src/cpu/o3/dyn_inst.hh
src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh
src/cpu/o3/iew.cc
src/cpu/o3/iew.hh
src/cpu/o3/inst_queue.cc
src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/regfile.hh
src/cpu/o3/rename.cc
src/cpu/o3/rename.hh
src/cpu/o3/rename_map.cc
src/cpu/o3/rob.cc
src/cpu/o3/rob.hh
src/cpu/reg_class.hh

💤 Files with no reviewable changes (2)

src/cpu/o3/dyn_inst.hh
src/cpu/o3/inst_queue.cc

✅ Files skipped from review due to trivial changes (1)

src/cpu/o3/regfile.hh

🚧 Files skipped from review as they are similar to previous changes (6)

src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/lsq.cc
src/cpu/o3/rob.hh
src/cpu/o3/fetch.hh
src/cpu/o3/rob.cc

src/cpu/o3/commit.cc

src/cpu/o3/rename_map.cc

Change-Id: I850ca92264f83e4d3a1178aedac71050d758b8cf

Change-Id: I4e03eee48edb3e816147896ffe2ce3cf3ac7f122

Change-Id: I33c41a75ec20c23becf9e778baef11620b2a3673

github-actions · 2026-03-05T10:26:51Z

🚀 Performance test triggered: spec06-0.8c

github-actions · 2026-03-05T10:40:19Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.2203`	-
This PR	`0.9934`	📉 `-1.2268` (`-55.26%`)

✅ Difftest smoke test passed!

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cpu/o3/cpu.cc (1)

1497-1516: ⚠️ Potential issue | 🔴 Critical

Iterator underflow/UB in squashInstIt.

After erase, Line 1515 always returns --instIt. This can decrement begin() (or end() of an empty list), which is undefined behavior.

Proposed fix

 CPU::ListIt
 CPU::squashInstIt(ListIt &instIt, ThreadID tid)
 {
     if ((*instIt)->threadNumber == tid) {
         DPRINTF(O3CPU, "Squashing instruction, "
                 "[tid:%i] [sn:%lli] PC %s\n",
                 (*instIt)->threadNumber,
                 (*instIt)->seqNum,
                 (*instIt)->pcState());
         (*instIt)->setSquashed();
-        instIt = instList.erase(instIt);
+        instIt = instList.erase(instIt); // iterator to next element (or end)
     }
-    return --instIt;
+    if (instList.empty()) {
+        return instList.end();
+    }
+    if (instIt == instList.begin()) {
+        return instIt;
+    }
+    if (instIt == instList.end()) {
+        return std::prev(instIt);
+    }
+    return std::prev(instIt);
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 1497 - 1516, The function CPU::squashInstIt
performs instList.erase(instIt) and then unconditionally returns --instIt which
can underflow; instead capture the iterator returned by instList.erase (e.g.,
auto nextIt = instList.erase(instIt)), then if nextIt == instList.begin() return
nextIt (no decrement), otherwise return std::prev(nextIt); update
CPU::squashInstIt to use nextIt and std::prev to avoid decrementing
begin()/end() and reference instList, squashInstIt, erase, and seqNum/pcState
for locating the code.

src/cpu/o3/rename.cc (1)

590-613: ⚠️ Potential issue | 🟠 Major

updateActivate() currently can’t deactivate Rename.

Line 592 initializes any_unblocking to true, so the inactive path is never taken. This keeps Rename active even when idle.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 590 - 613, The function
Rename::updateActivate currently hardcodes any_unblocking = true which prevents
the inactive branch and keeps the stage always active; replace that placeholder
with the actual predicate that determines whether Rename is unblocking (e.g.,
evaluate the rename's ready/blocked state using existing helpers or checks used
elsewhere in the class instead of the literal true), so that any_unblocking
reflects real work availability and allows the _status/Inactve -> Active and
Active -> Inactive transitions to call cpu->activateStage(CPU::RenameIdx) and
cpu->deactivateStage(CPU::RenameIdx) appropriately (make the change inside
Rename::updateActivate and use the class' existing fields/methods that indicate
blocking status rather than the hardcoded flag).

♻️ Duplicate comments (20)

src/cpu/o3/lsq.cc (1)

1666-1673: ⚠️ Potential issue | 🟠 Major

Unsafe debug data dump can overflow and trigger UB.

At Line 1668, copying pkt->getSize() into an 8-byte buffer can overflow. At Line 1672, casting buffer to uint64_t* is unsafe, and %d is the wrong format for 64-bit data.

Proposed fix

-    if (debug::LSQ) {
-        char buffer[8];
-        std::memcpy(buffer, pkt->getPtr<char>(), pkt->getSize());
-        DPRINTF(LSQ, "Single Req::recvTimingResp: inst: %llu, pkt: %#lx, isLoad: %d, "
-                    "isLLSC: %d, isUncache: %d, isCachehit: %d, data: %d\n",
-                    pkt->req->getReqInstSeqNum(), pkt->getAddr(), isLoad(), mainReq()->isLLSC(),
-                    mainReq()->isUncacheable(), cacheHit, *((uint64_t*)buffer));
-    }
+    if (debug::LSQ) {
+        uint64_t data = 0;
+        const size_t copy_sz = std::min<size_t>(pkt->getSize(), sizeof(data));
+        std::memcpy(&data, pkt->getPtr<char>(), copy_sz);
+        DPRINTF(LSQ, "Single Req::recvTimingResp: inst: %llu, pkt: %#lx, isLoad: %d, "
+                    "isLLSC: %d, isUncache: %d, isCachehit: %d, data: %#llx\n",
+                    pkt->req->getReqInstSeqNum(), pkt->getAddr(), isLoad(), mainReq()->isLLSC(),
+                    mainReq()->isUncacheable(), cacheHit,
+                    static_cast<unsigned long long>(data));
+    }

#!/bin/bash
# Verify the unsafe pattern still exists in the current file.
rg -n -C2 'char buffer\[8\]|memcpy\(buffer,\s*pkt->getPtr<char>\(\),\s*pkt->getSize\(\)\)|\*\(\(uint64_t\*\)buffer\)|data:\s*%d' src/cpu/o3/lsq.cc

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/lsq.cc` around lines 1666 - 1673, The debug dump is unsafe:
replace the fixed 8-byte buffer and raw cast with a uint64_t local (e.g.,
uint64_t data = 0), copy at most sizeof(data) bytes from pkt->getPtr<char>()
using std::memcpy(&data, pkt->getPtr<char>(), std::min(pkt->getSize(),
sizeof(data))), and update the DPRINTF in recvTimingResp (the block with
DPRINTF(LSQ,...), pkt->getPtr<char>(), pkt->getSize(), mainReq(), etc.) to print
the value using the correct 64-bit format (e.g., cast to (unsigned long long)
and use %llu) instead of casting buffer to uint64_t* and using %d.

src/cpu/o3/regfile.hh (1)

424-426: ⚠️ Potential issue | 🟡 Minor

Use the predicate regClass formatter in VecPredRegClass logging.

At Line [425], the value string is formatted via vectorRegFile.regClass, but this branch writes vecPredRegFile. The formatter should match the predicate register file.

🐛 Proposed fix

           case VecPredRegClass:
             DPRINTF(Scoreboard, "RegFile: Setting predicate register %i to %s\n",
-                    idx, vectorRegFile.regClass.valString(val));
+                    idx, vecPredRegFile.regClass.valString(val));
             vecPredRegFile.set(idx, val);
             break;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/regfile.hh` around lines 424 - 426, The DPRINTF call uses
vectorRegFile.regClass.valString(val) but the code is setting a predicate
register with vecPredRegFile.set(idx, val), so replace the formatter to use the
predicate register file's regClass (vecPredRegFile.regClass.valString(val)) so
the logged value string matches the predicate register class; update the DPRINTF
invocation accordingly (symbols: DPRINTF, vectorRegFile, vecPredRegFile,
regClass, valString, vecPredRegFile.set).

src/cpu/o3/rename.hh (1)

342-342: ⚠️ Potential issue | 🟠 Major

Initialize stallSig defensively.

Line 342 leaves stallSig indeterminate until explicitly set. This is unsafe if tick()/helpers touch it before wiring.

Proposed fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.hh` at line 342, The field StallSignals* stallSig is left
uninitialized; initialize it defensively (e.g., set to nullptr) so any early
calls in tick() or helper methods can't read an indeterminate pointer. Update
the declaration or the owning class's constructor (the class that contains
stallSig) to initialize stallSig to nullptr and ensure any code that
dereferences it (e.g., tick(), helper methods) checks for null before use.

src/cpu/o3/cpu.hh (1)

546-546: ⚠️ Potential issue | 🟡 Minor

Remove redundant stallSig member.

Line 546 introduces a second StallSignals object (stallSig) alongside stallSignals (Line 529). Keeping both is error-prone and obscures which object is actually wired.
Proposed fix
-    StallSignals stallSig;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.hh` at line 546, There are two redundant StallSignals members
declared (stallSignals and stallSig) causing confusion; remove the duplicate
declaration stallSig and update any references to use the existing stallSignals
member. Search for usages of stallSig in the class and implementation files and
replace them with stallSignals, then delete the stallSig field from the class
(ensure constructors/initializers and any initialization lists no longer
reference stallSig).

src/cpu/o3/comm.hh (1)

333-336: ⚠️ Potential issue | 🟠 Major

Initialize stall signal arrays before first use.

Line 333–336 declare per-thread stall flags without initialization. First-cycle reads can observe garbage values and cause spurious stalls/unblocks.

Proposed fix

 struct StallSignals
 {
-    bool blockFetch[MaxThreads];// decode to fetch
-    bool blockDecode[MaxThreads];// rename to decode
-    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
-    bool blockIEW[MaxThreads];// commit to iew
+    bool blockFetch[MaxThreads]{};  // decode to fetch
+    bool blockDecode[MaxThreads]{}; // rename to decode
+    bool blockRename[MaxThreads]{}; // iew to rename
+    bool blockIEW[MaxThreads]{};    // commit to iew
 };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/comm.hh` around lines 333 - 336, The per-thread stall flag arrays
blockFetch, blockDecode, blockRename, and blockIEW are declared without
initialization and can contain garbage on first-cycle reads; initialize them to
false before first use. Fix by initializing these arrays to all-false (either
via inline initialization in their declaration or by setting every element to
false in the containing class/struct constructor or init routine that runs
before use) so no spurious stalls occur when functions referencing
blockFetch/blockDecode/blockRename/blockIEW run on cycle 0.

src/cpu/o3/commit.cc (3)

1402-1406: ⚠️ Potential issue | 🔴 Critical

Fix unsafe load-value extraction path.

Line 1402–1406 can overflow the 8-byte buffer when effSize > 8, and the cast-based read is undefined on some targets.

Proposed fix

-                    char buffer[8] = {0};
-                    if (head_inst->memData) {
-                        std::memcpy(buffer, head_inst->memData, head_inst->effSize);
-                    }
-                    Addr load_value = *((uint64_t *)buffer);
+                    uint64_t load_value_u64 = 0;
+                    if (head_inst->memData && head_inst->effSize > 0) {
+                        const size_t n =
+                            std::min<size_t>(head_inst->effSize, sizeof(load_value_u64));
+                        std::memcpy(&load_value_u64, head_inst->memData, n);
+                    }
+                    Addr load_value = static_cast<Addr>(load_value_u64);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1402 - 1406, The current extraction uses a
fixed 8-byte stack buffer and a casted read which can overflow when
head_inst->effSize > 8 and is undefined on some architectures; replace this by
declaring a uint64_t tmp = 0, then if (head_inst->memData) memcpy(&tmp,
head_inst->memData, std::min(head_inst->effSize, sizeof(tmp))); and set Addr
load_value = tmp. Update references around head_inst, memData, effSize and
load_value to use this safe, size-capped memcpy instead of the 8-byte buffer and
pointer cast.

1948-1949: ⚠️ Potential issue | 🟠 Major

ROB lookup uses wrong thread ID.

Line 1948 hardcodes thread 0 in rob->findInst, which can raise false panics for other threads.

Proposed fix

-            panic_if(!rob->findInst(0, inst->seqNum), "[sn:%llu] Committed instruction not found in ROB",
+            panic_if(!rob->findInst(inst->threadNumber, inst->seqNum),
+                     "[sn:%llu] Committed instruction not found in ROB",
                      inst->seqNum);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1948 - 1949, The ROB lookup is using a
hardcoded thread id (0) in rob->findInst, causing false panics for non-zero
threads; change the call to use the instruction's actual thread identifier
(replace the literal 0 with the instruction's thread id accessor, e.g.,
inst->threadId() or inst->threadId depending on the codebase) so the panic_if
line uses rob->findInst(inst->threadId(), inst->seqNum) (or the equivalent
accessor) instead of rob->findInst(0, inst->seqNum).

1261-1261: ⚠️ Potential issue | 🟠 Major

Use declared debug flags (or include corresponding debug headers).

Line 1261 and Line 1868 use CommitTrace/IEW debug categories in this translation unit without direct declarations. This can break builds depending on include order.

Proposed fix (reuse an existing local debug category)

-                DPRINTF(CommitTrace, "CT: %s\n", head_inst->genDisassembly());
+                DPRINTF(Commit, "CT: %s\n", head_inst->genDisassembly());
...
-                DPRINTF(IEW, "Multiple active threads detected, blocking all threads\n");
+                DPRINTF(Commit, "Multiple active threads detected, blocking all threads\n");

Also applies to: 1868-1868

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` at line 1261, This translation unit uses
DPRINTF(CommitTrace, ...) and DPRINTF(IEW, ...) but does not declare those debug
flags; fix by either including the corresponding debug headers (add includes for
debug/CommitTrace.hh and debug/IEW.hh) near the top of src/cpu/o3/commit.cc so
the flags are declared, or change the DPRINTF calls to an already-declared local
debug category used in this file (reuse the file's existing debug symbol) so the
build doesn't depend on include order.

src/cpu/o3/commit.hh (1)

181-181: ⚠️ Potential issue | 🟠 Major

Initialize stallSig at declaration.

Line 181 keeps stallSig uninitialized. If any path dereferences before setStallSignals(), behavior is undefined.

Proposed fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.hh` at line 181, The member pointer StallSignals* stallSig
is declared uninitialized; initialize it at declaration (e.g., to nullptr) to
avoid undefined behavior if accessed before setStallSignals() is called. Update
the declaration of stallSig in commit.hh so the pointer has a safe default
value, ensuring any early dereferences can be guarded against or checked against
null in methods that use stallSig (such as setStallSignals() and any commit
class methods that access stallSig).

src/cpu/o3/rename.cc (2)

401-406: ⚠️ Potential issue | 🟠 Major

releaseSeq still checks the wrong thread’s history buffer.

Line 405 uses historyBuffer->empty(), which checks thread 0. This can compute incorrect releaseSeq for other threads.

Suggested fix

-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 401 - 406, The code computes releaseSeq
using historyBuffer->empty(), which inspects thread-0 instead of the current
thread; change the emptiness check to historyBuffer[tid].empty() and keep using
historyBuffer[tid].back().instSeqNum so releaseSeq is derived from the correct
thread's history; update the conditional to: releaseSeq =
historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum
(referencing fromCommit, commitInfo[tid], tid, finalCommitSeq, releaseSeq, and
historyBuffer[tid]).

483-485: ⚠️ Potential issue | 🟡 Minor

Fix DPRINTF format/argument mismatch in squashed-inst log.

The format string expects [sn:%llu] ... %s, but arguments pass tid first and one extra parameter.

Suggested fix

-            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
-                    tid, inst->seqNum, inst->pcState());
+            DPRINTF(Rename,
+                    "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+                    tid, inst->seqNum, inst->pcState());

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 483 - 485, The DPRINTF in
src/cpu/o3/rename.cc has a format/argument mismatch: the format string expects
"[sn:%llu] ... %s" but the call passes tid first and an extra parameter. Fix by
making the arguments match the format — pass inst->seqNum as the %llu and
inst->pcState() (or its string/printable representation) as the %s, and remove
the stray tid argument; alternatively, if tid should be logged, update the
format string to include a specifier for tid and order the arguments accordingly
(refer to the DPRINTF call and the variables tid, inst->seqNum,
inst->pcState()).

src/cpu/o3/decode.hh (1)

108-109: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to nullptr at declaration.

Line 186 currently leaves stallSig indeterminate until setStallSignals() is called.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

Also applies to: 186-187

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.hh` around lines 108 - 109, The member pointer stallSig is
left uninitialized until setStallSignals() is called; locate the StallSignals*
stallSig declaration in the class in decode.hh and initialize it to nullptr
(e.g., StallSignals* stallSig = nullptr) so it has a defined value by default;
keep setStallSignals(StallSignals* stall_signals) as-is to assign the real
pointer later.

src/cpu/o3/fetch.cc (2)

1316-1335: ⚠️ Potential issue | 🟠 Major

Blocked fetch path still forwards instructions to decode.

At Line 1317, fetch is marked blocked, but Lines 1325-1335 still drain fetchQueue[tid] and push to decode. This violates the stall contract.

Suggested fix

     if (stallSig->blockFetch[tid]) {
         // If decode stalled, use decode's stall reason
         DPRINTF(Fetch, "[tid:%i] Fetch stalled\n", tid);
         setAllFetchStalls(fromDecode->decodeInfo[tid].blockReason);
+        numInst = 0;
+        return;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1316 - 1335, The code marks fetch stalled
via stallSig->blockFetch[tid] and calls setAllFetchStalls(...), but still drains
fetchQueue[tid] into toDecode->insts; change the control flow in the fetch
routine so that when stallSig->blockFetch[tid] is true you do not execute the
loop that pops from fetchQueue and writes to toDecode (either return/continue
early or wrap the draining loop in an else), ensuring you preserve the stall
reason from fromDecode->decodeInfo[tid].blockReason and do not modify
wroteToTimeBuffer, fetchQueue[tid], or toDecode->size while blocked.

1314-1315: ⚠️ Potential issue | 🟠 Major

SMT is still effectively disabled by hardcoded tid = 0.

Line 1314 and Line 1681 force thread 0 selection. With numThreads > 1, other threads can starve silently.

Suggested fail-fast until SMT selection is restored

 ThreadID
 Fetch::selectFetchThread()
 {
-    ThreadID tid = 0; // TODO: smt support
+    fatal_if(numThreads > 1,
+             "SMT fetch selection is not implemented; set numThreads=1.");
+    ThreadID tid = 0; // TODO: smt support

Also applies to: 1679-1695

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1314 - 1315, The code currently hardcodes
ThreadID tid = 0 in fetch.cc which effectively disables SMT and can starve
threads; update the ThreadID selection logic in the fetch stage (the
declaration/assignment of ThreadID tid and the other places that force tid = 0)
to fail-fast when the system is configured with numThreads > 1: detect the
multi-thread configuration (e.g., via numThreads or threads()->numThreads())
and, instead of silently using tid = 0, log a clear fatal error or assert and
abort (or return an error) indicating SMT selection is not implemented yet so
the misconfiguration is obvious; apply the same fail-fast change to the other
occurrences that currently force tid = 0 so multi-thread setups do not silently
starve.

src/cpu/o3/iew.hh (1)

159-160: ⚠️ Potential issue | 🟠 Major

Initialize stallSig to a safe default before first use.

stallSig is declared uninitialized at Line 159. If any path dereferences it before setStallSignals(), behavior is undefined.

Suggested fix

-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

Also applies to: 300-300

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.hh` around lines 159 - 160, StallSignals* stallSig is declared
uninitialized; initialize it to a safe default (nullptr) where declared or in
the containing class/struct constructor, and ensure any code that dereferences
it (before setStallSignals is called) either checks for nullptr or only runs
after setStallSignals has been invoked; update the declaration of stallSig and
relevant constructors and usages (e.g., setStallSignals, any methods in the IEW
class that read stallSig) to enforce the nullptr-initialized invariant.

src/cpu/o3/iew.cc (3)

828-830: ⚠️ Potential issue | 🟠 Major

canInsertLDSTQue() side effects are still short-circuited.

At Line 829, stallSig->blockIEW[i] || ... can skip canInsertLDSTQue(i), so getAndResetLast*PopEntries() may not run every tick.

Suggested fix

-        bool block = stallSig->blockIEW[i] || !canInsertLDSTQue(i);
+        bool can_insert_lsq = canInsertLDSTQue(i);
+        bool block = stallSig->blockIEW[i] || !can_insert_lsq;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 828 - 830, The current expression
"stallSig->blockIEW[i] || !canInsertLDSTQue(i)" short-circuits and can skip
calling canInsertLDSTQue(i), preventing its side-effects (e.g.,
getAndResetLast*PopEntries()) from running each tick; change the code to call
canInsertLDSTQue(i) unconditionally (e.g., bool canInsert = canInsertLDSTQue(i);
bool block = stallSig->blockIEW[i] || !canInsert;) so the side-effecting
function always executes while preserving the original block logic that uses its
return value; update the loop that computes "active" (which uses fixedbuffer[i])
to use the new block variable.

734-744: ⚠️ Potential issue | 🟠 Major

checkSquash() exits after first squashing thread.

Line 743 returns from the function, so later threads in the same tick are skipped.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 734 - 744, The function currently returns
after handling the first squashed thread, skipping remaining threads; change the
control flow in the loop inside checkSquash (the block that calls squash(i),
localSquashVer.update(...), sets fetchRedirect[i]=false, increments
iewStats.stallEvents[ROBWalk], and calls
setAllStalls(StallReason::CommitSquash)) so it does not return
immediately—replace the final return with continue (or otherwise let the loop
proceed) so all threads in the tick are inspected and processed; keep any
necessary outer return behavior after the loop if intended.

424-431: ⚠️ Potential issue | 🟠 Major

toCommit wire offset and comment are inconsistent; verify IEW→Commit timing alignment.

Lines 426-429 document [-1] behavior, but Line 430 uses getWire(0). Please verify producer/consumer offsets are intentionally matched with reverse tick ordering.

#!/bin/bash
# Verify IEW producer and Commit consumer offsets for IEW queue communication.
rg -n 'setIEWQueue|toCommit\s*=|fromIEW\s*=|getWire\(' src/cpu/o3/iew.cc src/cpu/o3/commit.cc src/cpu/o3/commit.hh -C2

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 424 - 431, Comment and code disagree about
IEW→Commit wire offset: the text describes writing/reading at position [-1]
given iewToCommitDelay == 1, but the code sets toCommit = iewQueue->getWire(0).
Inspect and align the producer/consumer offsets by either changing getWire(0) to
getWire(-1) (or the equivalent reverse-index API) in iew.cc where toCommit is
assigned, or update the comment to document why getWire(0) is correct
(referencing iewToCommitDelay, iewQueue->getWire(), toCommit, and
Commit::fromIEW). Also verify corresponding reads in Commit (fromIEW) and any
setIEWQueue usage to ensure both sides use the same offset convention and adjust
one side if needed so producer and consumer offsets match.

src/cpu/o3/decode.cc (2)

112-115: ⚠️ Potential issue | 🟠 Major

clearStates() still does not clear per-thread decode buffers.

clearStates(ThreadID tid) is empty at Line 112, so per-thread buffered instructions can survive thread lifecycle transitions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 112 - 115, Decode::clearStates(ThreadID
tid) is empty and must clear all per-thread decode buffers and state for the
given tid; implement it to locate the per-thread buffers used by Decode (e.g.,
arrays or maps keyed by ThreadID such as decodeBuffer[tid], decodeQueue[tid], or
perThreadNextPC/validFlags) and reset/clear them (empty queues, reset
indices/counters, clear any pending instructions and valid flags) so no buffered
instructions survive thread lifecycle transitions; ensure you only clear the
entry for the provided tid and leave other threads untouched.

434-473: ⚠️ Potential issue | 🟠 Major

updateActivate() remains unreachable in tick().

status_change is initialized false at Line 434 and never updated, so the call at Line 471 never executes.

Suggested fix

-    bool status_change = false;
+    bool status_change = false;
@@
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/decode.cc` around lines 434 - 473, status_change is never set so
updateActivate() never runs; fix by setting status_change = true whenever we
change stall/fetch state. Specifically, when assigning stallSig->blockFetch[i] =
block (and in the branch where you set stallSig->blockFetch[tid] and
stallSig->blockFetch[i] = true) compare the new value to the previous value and
set status_change = true if it differs; this ensures the updateActivate() call
(and any other post-decode logic) runs when stall/fetch status actually changes.
Refer to status_change, stallSig->blockFetch, tid, and updateActivate().

🧹 Nitpick comments (3)

src/cpu/reg_class.hh (1)

215-218: Broaden enum bounds validation before indexing regClassStrings.

Nice defensive guard for InvalidRegClass, but Line 218 can still read out-of-bounds if regClass is any other invalid enum value (e.g., casted/corrupted value). Consider a full range check before indexing.

Proposed hardening patch

     constexpr const char*
     className() const
     {
-        if (regClass == InvalidRegClass) {
+        if (regClass < IntRegClass || regClass > MiscRegClass) {
             return "InvalidRegClass";
         }
-        return regClassStrings[regClass];
+        return regClassStrings[static_cast<size_t>(regClass)];
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/reg_class.hh` around lines 215 - 218, The current code only guards
against InvalidRegClass but still indexes regClassStrings with any other
out-of-range/corrupted enum value; update the lookup to validate the enum range
before indexing (e.g., check regClass is >= 0 and < the enum upper
bound/NumRegClasses or compare against RegClass_LAST) and return a safe fallback
string when the check fails; refer to regClass, InvalidRegClass and
regClassStrings when making this change.

src/cpu/o3/perfCCT.hh (1)

72-72: Consider deriving MaxMetas from a single source of truth.

The hardcoded 3000 can drift from actual in-flight instruction capacity over time; binding this to an existing CPU/config constant would reduce silent overwrite risk in modulo indexing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/perfCCT.hh` at line 72, Replace the hardcoded const int MaxMetas =
3000 with a derived constant tied to the canonical in-flight instruction
capacity (e.g., use the existing MaxNum from DynInst or the project-wide
in-flight instruction constant) so the size stays in sync; change the definition
of MaxMetas to reference that canonical symbol (for example replace 3000 with
DynInst::MaxNum or the equivalent global MAX_IN_FLIGHT_INSTS) and update any
dependent uses (modulo indexing logic that refers to MaxMetas) to rely on this
derived value.

src/cpu/o3/perfCCT.cc (1)

117-119: Please replace duplicated mask literals with a named constant.

Using 0x0fffffffffffffffllu inline in multiple fields obscures intent and risks divergence later; a shared constant plus a short comment (why this mask) would make this safer to maintain.

♻️ Suggested cleanup

+namespace {
+constexpr uint64_t PerfCCTSqlMask = 0x0fffffffffffffffULL;
+}
...
-    ss << "," << (meta->value & 0x0fffffffffffffffllu);
+    ss << "," << (meta->value & PerfCCTSqlMask);
...
-    ss << "," << (meta->pc & 0x0fffffffffffffffllu);
+    ss << "," << (meta->pc & PerfCCTSqlMask);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/perfCCT.cc` around lines 117 - 119, Duplicate literal mask
0x0fffffffffffffffllu is used when printing meta->value and meta->pc; define a
single named constant (e.g., constexpr uint64_t ADDR_MASK =
0x0fffffffffffffffllu; with a short comment explaining purpose) in perfCCT.cc
near the top of the translation unit, then replace the inline literals in the
serialization/printing sites (references: the streaming lines that use
meta->value and meta->pc) to use ADDR_MASK instead; ensure the constant type
matches the operands (uint64_t/unsigned long long as appropriate) so no implicit
conversion warnings occur.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/fetch.hh`:
- Line 641: The member variable stallSig in class/struct where it's declared as
StallSignals* stallSig is left uninitialized; change its declaration to
initialize it defensively (e.g., StallSignals* stallSig = nullptr) so it has a
defined null value until wiring/assignment occurs, ensuring no
indeterminate-pointer UB if construction/init order changes; update any
constructors or initializer lists accordingly if you prefer initializing there
but ensure the pointer is set to nullptr by default.

In `@src/cpu/o3/iew.cc`:
- Around line 1089-1092: The serialize path in the classify/dispatch loop exits
via "break" without setting the dispatch stall flags; update the block inside
the if (checkSerialize(inst)) in src/cpu/o3/iew.cc to set the appropriate
dispatch stall indicators (e.g., set breakDispatch = true and set blockReason to
the serialize stall reason used elsewhere in this file/enum) immediately before
breaking so downstream stall reporting sees the serialize reason.

In `@src/cpu/o3/regfile.hh`:
- Around line 308-309: The DPRINTF call in RegFile (the line with
DPRINTF(Scoreboard, "RegFile: trying to access %s register %i \n", type, idx);)
uses %s for a RegClassType enum causing UB; replace the format/argument to print
the register class name string by calling RegId(type, 0).className() (i.e., use
the same pattern as in src/cpu/o3/rename.cc) and keep the %i for idx so the call
becomes DPRINTF(Scoreboard, "RegFile: trying to access %s register %i \n",
RegId(type, 0).className(), idx); ensure this change is applied where
getReg/virt_reg.correct are used in the same function.

---

Outside diff comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 1497-1516: The function CPU::squashInstIt performs
instList.erase(instIt) and then unconditionally returns --instIt which can
underflow; instead capture the iterator returned by instList.erase (e.g., auto
nextIt = instList.erase(instIt)), then if nextIt == instList.begin() return
nextIt (no decrement), otherwise return std::prev(nextIt); update
CPU::squashInstIt to use nextIt and std::prev to avoid decrementing
begin()/end() and reference instList, squashInstIt, erase, and seqNum/pcState
for locating the code.

In `@src/cpu/o3/rename.cc`:
- Around line 590-613: The function Rename::updateActivate currently hardcodes
any_unblocking = true which prevents the inactive branch and keeps the stage
always active; replace that placeholder with the actual predicate that
determines whether Rename is unblocking (e.g., evaluate the rename's
ready/blocked state using existing helpers or checks used elsewhere in the class
instead of the literal true), so that any_unblocking reflects real work
availability and allows the _status/Inactve -> Active and Active -> Inactive
transitions to call cpu->activateStage(CPU::RenameIdx) and
cpu->deactivateStage(CPU::RenameIdx) appropriately (make the change inside
Rename::updateActivate and use the class' existing fields/methods that indicate
blocking status rather than the hardcoded flag).

---

Duplicate comments:
In `@src/cpu/o3/comm.hh`:
- Around line 333-336: The per-thread stall flag arrays blockFetch, blockDecode,
blockRename, and blockIEW are declared without initialization and can contain
garbage on first-cycle reads; initialize them to false before first use. Fix by
initializing these arrays to all-false (either via inline initialization in
their declaration or by setting every element to false in the containing
class/struct constructor or init routine that runs before use) so no spurious
stalls occur when functions referencing
blockFetch/blockDecode/blockRename/blockIEW run on cycle 0.

In `@src/cpu/o3/commit.cc`:
- Around line 1402-1406: The current extraction uses a fixed 8-byte stack buffer
and a casted read which can overflow when head_inst->effSize > 8 and is
undefined on some architectures; replace this by declaring a uint64_t tmp = 0,
then if (head_inst->memData) memcpy(&tmp, head_inst->memData,
std::min(head_inst->effSize, sizeof(tmp))); and set Addr load_value = tmp.
Update references around head_inst, memData, effSize and load_value to use this
safe, size-capped memcpy instead of the 8-byte buffer and pointer cast.
- Around line 1948-1949: The ROB lookup is using a hardcoded thread id (0) in
rob->findInst, causing false panics for non-zero threads; change the call to use
the instruction's actual thread identifier (replace the literal 0 with the
instruction's thread id accessor, e.g., inst->threadId() or inst->threadId
depending on the codebase) so the panic_if line uses
rob->findInst(inst->threadId(), inst->seqNum) (or the equivalent accessor)
instead of rob->findInst(0, inst->seqNum).
- Line 1261: This translation unit uses DPRINTF(CommitTrace, ...) and
DPRINTF(IEW, ...) but does not declare those debug flags; fix by either
including the corresponding debug headers (add includes for debug/CommitTrace.hh
and debug/IEW.hh) near the top of src/cpu/o3/commit.cc so the flags are
declared, or change the DPRINTF calls to an already-declared local debug
category used in this file (reuse the file's existing debug symbol) so the build
doesn't depend on include order.

In `@src/cpu/o3/commit.hh`:
- Line 181: The member pointer StallSignals* stallSig is declared uninitialized;
initialize it at declaration (e.g., to nullptr) to avoid undefined behavior if
accessed before setStallSignals() is called. Update the declaration of stallSig
in commit.hh so the pointer has a safe default value, ensuring any early
dereferences can be guarded against or checked against null in methods that use
stallSig (such as setStallSignals() and any commit class methods that access
stallSig).

In `@src/cpu/o3/cpu.hh`:
- Line 546: There are two redundant StallSignals members declared (stallSignals
and stallSig) causing confusion; remove the duplicate declaration stallSig and
update any references to use the existing stallSignals member. Search for usages
of stallSig in the class and implementation files and replace them with
stallSignals, then delete the stallSig field from the class (ensure
constructors/initializers and any initialization lists no longer reference
stallSig).

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates(ThreadID tid) is empty and must clear
all per-thread decode buffers and state for the given tid; implement it to
locate the per-thread buffers used by Decode (e.g., arrays or maps keyed by
ThreadID such as decodeBuffer[tid], decodeQueue[tid], or
perThreadNextPC/validFlags) and reset/clear them (empty queues, reset
indices/counters, clear any pending instructions and valid flags) so no buffered
instructions survive thread lifecycle transitions; ensure you only clear the
entry for the provided tid and leave other threads untouched.
- Around line 434-473: status_change is never set so updateActivate() never
runs; fix by setting status_change = true whenever we change stall/fetch state.
Specifically, when assigning stallSig->blockFetch[i] = block (and in the branch
where you set stallSig->blockFetch[tid] and stallSig->blockFetch[i] = true)
compare the new value to the previous value and set status_change = true if it
differs; this ensures the updateActivate() call (and any other post-decode
logic) runs when stall/fetch status actually changes. Refer to status_change,
stallSig->blockFetch, tid, and updateActivate().

In `@src/cpu/o3/decode.hh`:
- Around line 108-109: The member pointer stallSig is left uninitialized until
setStallSignals() is called; locate the StallSignals* stallSig declaration in
the class in decode.hh and initialize it to nullptr (e.g., StallSignals*
stallSig = nullptr) so it has a defined value by default; keep
setStallSignals(StallSignals* stall_signals) as-is to assign the real pointer
later.

In `@src/cpu/o3/fetch.cc`:
- Around line 1316-1335: The code marks fetch stalled via
stallSig->blockFetch[tid] and calls setAllFetchStalls(...), but still drains
fetchQueue[tid] into toDecode->insts; change the control flow in the fetch
routine so that when stallSig->blockFetch[tid] is true you do not execute the
loop that pops from fetchQueue and writes to toDecode (either return/continue
early or wrap the draining loop in an else), ensuring you preserve the stall
reason from fromDecode->decodeInfo[tid].blockReason and do not modify
wroteToTimeBuffer, fetchQueue[tid], or toDecode->size while blocked.
- Around line 1314-1315: The code currently hardcodes ThreadID tid = 0 in
fetch.cc which effectively disables SMT and can starve threads; update the
ThreadID selection logic in the fetch stage (the declaration/assignment of
ThreadID tid and the other places that force tid = 0) to fail-fast when the
system is configured with numThreads > 1: detect the multi-thread configuration
(e.g., via numThreads or threads()->numThreads()) and, instead of silently using
tid = 0, log a clear fatal error or assert and abort (or return an error)
indicating SMT selection is not implemented yet so the misconfiguration is
obvious; apply the same fail-fast change to the other occurrences that currently
force tid = 0 so multi-thread setups do not silently starve.

In `@src/cpu/o3/iew.cc`:
- Around line 828-830: The current expression "stallSig->blockIEW[i] ||
!canInsertLDSTQue(i)" short-circuits and can skip calling canInsertLDSTQue(i),
preventing its side-effects (e.g., getAndResetLast*PopEntries()) from running
each tick; change the code to call canInsertLDSTQue(i) unconditionally (e.g.,
bool canInsert = canInsertLDSTQue(i); bool block = stallSig->blockIEW[i] ||
!canInsert;) so the side-effecting function always executes while preserving the
original block logic that uses its return value; update the loop that computes
"active" (which uses fixedbuffer[i]) to use the new block variable.
- Around line 734-744: The function currently returns after handling the first
squashed thread, skipping remaining threads; change the control flow in the loop
inside checkSquash (the block that calls squash(i), localSquashVer.update(...),
sets fetchRedirect[i]=false, increments iewStats.stallEvents[ROBWalk], and calls
setAllStalls(StallReason::CommitSquash)) so it does not return
immediately—replace the final return with continue (or otherwise let the loop
proceed) so all threads in the tick are inspected and processed; keep any
necessary outer return behavior after the loop if intended.
- Around line 424-431: Comment and code disagree about IEW→Commit wire offset:
the text describes writing/reading at position [-1] given iewToCommitDelay == 1,
but the code sets toCommit = iewQueue->getWire(0). Inspect and align the
producer/consumer offsets by either changing getWire(0) to getWire(-1) (or the
equivalent reverse-index API) in iew.cc where toCommit is assigned, or update
the comment to document why getWire(0) is correct (referencing iewToCommitDelay,
iewQueue->getWire(), toCommit, and Commit::fromIEW). Also verify corresponding
reads in Commit (fromIEW) and any setIEWQueue usage to ensure both sides use the
same offset convention and adjust one side if needed so producer and consumer
offsets match.

In `@src/cpu/o3/iew.hh`:
- Around line 159-160: StallSignals* stallSig is declared uninitialized;
initialize it to a safe default (nullptr) where declared or in the containing
class/struct constructor, and ensure any code that dereferences it (before
setStallSignals is called) either checks for nullptr or only runs after
setStallSignals has been invoked; update the declaration of stallSig and
relevant constructors and usages (e.g., setStallSignals, any methods in the IEW
class that read stallSig) to enforce the nullptr-initialized invariant.

In `@src/cpu/o3/lsq.cc`:
- Around line 1666-1673: The debug dump is unsafe: replace the fixed 8-byte
buffer and raw cast with a uint64_t local (e.g., uint64_t data = 0), copy at
most sizeof(data) bytes from pkt->getPtr<char>() using std::memcpy(&data,
pkt->getPtr<char>(), std::min(pkt->getSize(), sizeof(data))), and update the
DPRINTF in recvTimingResp (the block with DPRINTF(LSQ,...), pkt->getPtr<char>(),
pkt->getSize(), mainReq(), etc.) to print the value using the correct 64-bit
format (e.g., cast to (unsigned long long) and use %llu) instead of casting
buffer to uint64_t* and using %d.

In `@src/cpu/o3/regfile.hh`:
- Around line 424-426: The DPRINTF call uses
vectorRegFile.regClass.valString(val) but the code is setting a predicate
register with vecPredRegFile.set(idx, val), so replace the formatter to use the
predicate register file's regClass (vecPredRegFile.regClass.valString(val)) so
the logged value string matches the predicate register class; update the DPRINTF
invocation accordingly (symbols: DPRINTF, vectorRegFile, vecPredRegFile,
regClass, valString, vecPredRegFile.set).

In `@src/cpu/o3/rename.cc`:
- Around line 401-406: The code computes releaseSeq using
historyBuffer->empty(), which inspects thread-0 instead of the current thread;
change the emptiness check to historyBuffer[tid].empty() and keep using
historyBuffer[tid].back().instSeqNum so releaseSeq is derived from the correct
thread's history; update the conditional to: releaseSeq =
historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum
(referencing fromCommit, commitInfo[tid], tid, finalCommitSeq, releaseSeq, and
historyBuffer[tid]).
- Around line 483-485: The DPRINTF in src/cpu/o3/rename.cc has a format/argument
mismatch: the format string expects "[sn:%llu] ... %s" but the call passes tid
first and an extra parameter. Fix by making the arguments match the format —
pass inst->seqNum as the %llu and inst->pcState() (or its string/printable
representation) as the %s, and remove the stray tid argument; alternatively, if
tid should be logged, update the format string to include a specifier for tid
and order the arguments accordingly (refer to the DPRINTF call and the variables
tid, inst->seqNum, inst->pcState()).

In `@src/cpu/o3/rename.hh`:
- Line 342: The field StallSignals* stallSig is left uninitialized; initialize
it defensively (e.g., set to nullptr) so any early calls in tick() or helper
methods can't read an indeterminate pointer. Update the declaration or the
owning class's constructor (the class that contains stallSig) to initialize
stallSig to nullptr and ensure any code that dereferences it (e.g., tick(),
helper methods) checks for null before use.

---

Nitpick comments:
In `@src/cpu/o3/perfCCT.cc`:
- Around line 117-119: Duplicate literal mask 0x0fffffffffffffffllu is used when
printing meta->value and meta->pc; define a single named constant (e.g.,
constexpr uint64_t ADDR_MASK = 0x0fffffffffffffffllu; with a short comment
explaining purpose) in perfCCT.cc near the top of the translation unit, then
replace the inline literals in the serialization/printing sites (references: the
streaming lines that use meta->value and meta->pc) to use ADDR_MASK instead;
ensure the constant type matches the operands (uint64_t/unsigned long long as
appropriate) so no implicit conversion warnings occur.

In `@src/cpu/o3/perfCCT.hh`:
- Line 72: Replace the hardcoded const int MaxMetas = 3000 with a derived
constant tied to the canonical in-flight instruction capacity (e.g., use the
existing MaxNum from DynInst or the project-wide in-flight instruction constant)
so the size stays in sync; change the definition of MaxMetas to reference that
canonical symbol (for example replace 3000 with DynInst::MaxNum or the
equivalent global MAX_IN_FLIGHT_INSTS) and update any dependent uses (modulo
indexing logic that refers to MaxMetas) to rely on this derived value.

In `@src/cpu/reg_class.hh`:
- Around line 215-218: The current code only guards against InvalidRegClass but
still indexes regClassStrings with any other out-of-range/corrupted enum value;
update the lookup to validate the enum range before indexing (e.g., check
regClass is >= 0 and < the enum upper bound/NumRegClasses or compare against
RegClass_LAST) and return a safe fallback string when the check fails; refer to
regClass, InvalidRegClass and regClassStrings when making this change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2910bb1a-4da0-4a6c-9ca2-51fa9d52c57d

📥 Commits

Reviewing files that changed from the base of the PR and between f77f257 and fdba12d.

📒 Files selected for processing (27)

configs/common/xiangshan.py
src/cpu/o3/BaseO3CPU.py
src/cpu/o3/comm.hh
src/cpu/o3/commit.cc
src/cpu/o3/commit.hh
src/cpu/o3/cpu.cc
src/cpu/o3/cpu.hh
src/cpu/o3/decode.cc
src/cpu/o3/decode.hh
src/cpu/o3/dyn_inst.hh
src/cpu/o3/fetch.cc
src/cpu/o3/fetch.hh
src/cpu/o3/iew.cc
src/cpu/o3/iew.hh
src/cpu/o3/inst_queue.cc
src/cpu/o3/issue_queue.cc
src/cpu/o3/lsq.cc
src/cpu/o3/lsq_unit.cc
src/cpu/o3/perfCCT.cc
src/cpu/o3/perfCCT.hh
src/cpu/o3/regfile.hh
src/cpu/o3/rename.cc
src/cpu/o3/rename.hh
src/cpu/o3/rename_map.cc
src/cpu/o3/rob.cc
src/cpu/o3/rob.hh
src/cpu/reg_class.hh

💤 Files with no reviewable changes (2)

src/cpu/o3/dyn_inst.hh
src/cpu/o3/inst_queue.cc

🚧 Files skipped from review as they are similar to previous changes (2)

src/cpu/o3/rob.hh
src/cpu/o3/issue_queue.cc

src/cpu/o3/fetch.hh

src/cpu/o3/iew.cc

src/cpu/o3/regfile.hh

XiangShanRobot · 2026-03-05T12:53:52Z

[Generated by GEM5 Performance Robot]
commit: fdba12d
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

	PR	Master	Diff(%)
Score	20.43	20.36	+0.33 🟢

Change-Id: I9751e3b2518d4a707153f46d2285069798134b7f

coderabbitai

🧹 Nitpick comments (1)

docs/Gem5_Docs/top/gem5_new_pipelilne.md (1)
140-142: Clarify the semantics of getWire(-delay) to avoid misreading the time direction.

Line 141 states: getWire(-delay)：读取延迟 delay 拍后才可见的槽位

This phrasing is ambiguous. "delay 拍后才可见" (becomes visible delay cycles later) could be misread as accessing a future slot, when the semantics actually mean reading data written delay cycles in the past. The TimeBuffer implementation confirms that negative indices access slots from previous cycles (via the ring buffer's base offset). Recommend clarifying with an example: getWire(-delay) reads data that was written delay cycles ago, not data that will appear in the future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Gem5_Docs/top/gem5_new_pipelilne.md` around lines 140 - 142, The
phrasing for getWire(-delay) is ambiguous about time direction; update the
wording around getWire and TimeBuffer to state clearly that negative indices
read past slots (data written delay cycles ago), not future slots—e.g., replace
"getWire(-delay)：读取延迟 delay 拍后才可见的槽位" with a clarifying sentence such as
"getWire(-delay)：读取 delay 拍之前写入的槽位（即 delay 拍前的数据），不是未来的数据），示例：getWire(-2) 读取 2
拍前写入的值。" Also mention that this behavior comes from TimeBuffer's ring-base
offset and that advance() moves the time window forward.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/Gem5_Docs/top/gem5_new_pipelilne.md`:
- Around line 140-142: The phrasing for getWire(-delay) is ambiguous about time
direction; update the wording around getWire and TimeBuffer to state clearly
that negative indices read past slots (data written delay cycles ago), not
future slots—e.g., replace "getWire(-delay)：读取延迟 delay 拍后才可见的槽位" with a
clarifying sentence such as "getWire(-delay)：读取 delay 拍之前写入的槽位（即 delay
拍前的数据），不是未来的数据），示例：getWire(-2) 读取 2 拍前写入的值。" Also mention that this behavior
comes from TimeBuffer's ring-base offset and that advance() moves the time
window forward.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b92e8ee9-03be-422a-a021-2af6891a3650

📥 Commits

Reviewing files that changed from the base of the PR and between fdba12d and 85fa332.

📒 Files selected for processing (1)

docs/Gem5_Docs/top/gem5_new_pipelilne.md

github-actions · 2026-03-06T07:29:51Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.2203`	-
This PR	`2.2219`	📈 `+0.0016` (`+0.07%`)

✅ Difftest smoke test passed!

Change-Id: I734133d4557a01281e0de4f962181f36a5cefea8

github-actions · 2026-03-06T08:14:43Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.2203`	-
This PR	`0.9931`	📉 `-1.2272` (`-55.27%`)

✅ Difftest smoke test passed!

Change-Id: I4dff1631acaf7a07f663f05c878d7bd40cc9740a

github-actions · 2026-03-06T08:47:47Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.2203`	-
This PR	`0.9931`	📉 `-1.2272` (`-55.27%`)

✅ Difftest smoke test passed!

Change-Id: I76d9e187dcef57de818907d9804a8b05b4a76cff

github-actions · 2026-03-06T10:28:40Z

🚀 Coremark Smoke Test Results

Branch	IPC	Change
Base (`xs-dev`)	`2.2203`	-
This PR	`2.2354`	📈 `+0.0152` (`+0.68%`)

✅ Difftest smoke test passed!

tastynoob requested review from happy-lx and jensen-yan February 9, 2026 11:32

coderabbitai bot reviewed Feb 9, 2026

View reviewed changes

tastynoob force-pushed the refactor-stalls branch from 4fc39e5 to 33b75e7 Compare February 28, 2026 07:31

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

tastynoob added the perf label Feb 28, 2026

tastynoob force-pushed the refactor-stalls branch from 33b75e7 to f77f257 Compare February 28, 2026 09:42

tastynoob added perf and removed perf labels Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

src/cpu/o3/commit.cc Show resolved Hide resolved

src/cpu/o3/rename_map.cc Show resolved Hide resolved

tastynoob added 2 commits March 5, 2026 10:55

cpu-o3: using reverse ordered tick & refactor the stalls logic

d966055

Change-Id: I850ca92264f83e4d3a1178aedac71050d758b8cf

cpu-o3: perfcct support result record

e6d58ac

Change-Id: I4e03eee48edb3e816147896ffe2ce3cf3ac7f122

tastynoob force-pushed the refactor-stalls branch from f77f257 to fdba12d Compare March 5, 2026 10:26

tastynoob removed the perf label Mar 5, 2026

cpu-o3: fix reverse ordered tick

fdba12d

Change-Id: I33c41a75ec20c23becf9e778baef11620b2a3673

tastynoob added the perf label Mar 5, 2026

coderabbitai bot reviewed Mar 5, 2026

View reviewed changes

src/cpu/o3/fetch.hh Show resolved Hide resolved

src/cpu/o3/iew.cc Show resolved Hide resolved

src/cpu/o3/regfile.hh Show resolved Hide resolved

cpu-o3: add pr docs

85fa332

Change-Id: I9751e3b2518d4a707153f46d2285069798134b7f

coderabbitai bot reviewed Mar 6, 2026

View reviewed changes

happy-lx previously approved these changes Mar 6, 2026

View reviewed changes

jensen-yan dismissed happy-lx’s stale review via 4079192 March 6, 2026 07:54

cpu-o3: Fix reverse-tick topdown propagation

4079192

Change-Id: I734133d4557a01281e0de4f962181f36a5cefea8

cpu-o3: Refine reverse-tick stall attribution

669eca4

Change-Id: I4dff1631acaf7a07f663f05c878d7bd40cc9740a

jensen-yan previously approved these changes Mar 6, 2026

View reviewed changes

jensen-yan dismissed their stale review via 0acbad5 March 6, 2026 10:14

cpu-o3: Delay resolve updates after reverse tick

0acbad5

Change-Id: I76d9e187dcef57de818907d9804a8b05b4a76cff

jensen-yan self-requested a review March 6, 2026 10:32

jensen-yan approved these changes Mar 6, 2026

View reviewed changes

jensen-yan merged commit 1464005 into xs-dev Mar 6, 2026
1 check passed

jensen-yan deleted the refactor-stalls branch March 6, 2026 10:34

Conversation

tastynoob commented Feb 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Feb 28, 2026

🚀 Coremark Smoke Test Results

Uh oh!

github-actions bot commented Feb 28, 2026

Uh oh!

github-actions bot commented Feb 28, 2026

Uh oh!

github-actions bot commented Feb 28, 2026

🚀 Coremark Smoke Test Results

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 5, 2026

🚀 Coremark Smoke Test Results

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

XiangShanRobot commented Mar 5, 2026

Ideal BTB Performance

Overall Score

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 6, 2026

🚀 Coremark Smoke Test Results

Uh oh!

github-actions bot commented Mar 6, 2026

🚀 Coremark Smoke Test Results

Uh oh!

github-actions bot commented Mar 6, 2026

tastynoob commented Feb 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 9, 2026 •

edited

Loading