diff --git a/.codex/skills/branch-predictability-triage/SKILL.md b/.codex/skills/branch-predictability-triage/SKILL.md
new file mode 100644
index 0000000000..aae2a994c8
--- /dev/null
+++ b/.codex/skills/branch-predictability-triage/SKILL.md
@@ -0,0 +1,355 @@
+---
+name: branch-predictability-triage
+description: "Analyzes the ELF / function / source-code semantics behind a branch PC and judges whether the branch is inherently hard to predict or the predictor simply failed to learn it. Applies to SPEC06 checkpoints, benchmark ELFs, topMispredictsByBranch.csv, and single branch-PC attribution."
+---
+
+# Branch Predictability Attribution Skill
+
+## When to use
+
+- You have one or more branch PCs and want to know whether they belong to:
+  - the benchmark proper
+  - runtime / toolchain (e.g. `libgcc`, `glibc`, `jemalloc`)
+  - `bbl` / kernel / high-address runtime
+- You want to map a branch PC, as far as possible, to:
+  - an ELF
+  - a function name
+  - a source file / code block
+- You want to judge whether a branch is more likely:
+  - semantically, inherently hard to predict
+  - structurally easy to predict, but the predictor failed to capture it
+- You want to analyze SPEC06 checkpoints, for example:
+  - new profile: `/nfs/home/share/checkpoints_profiles/.../checkpoint-0-0-0`
+  - old profile: `/nfs/share/zyy/spec06_rv64gcb_O3_20m_gcc12.2.0-intFpcOff-jeMalloc/...`
+
+## Goal
+
+Produce a conclusion oriented toward branch-prediction analysis, not just address translation. The final conclusion must answer at least:
+
+1. What class of code this branch PC belongs to.
+2. Whether it can be mapped to the benchmark's own source code.
+3. What the control-flow pattern of this branch is.
+4. Whether it is more like "inherently hard to predict" or "the predictor still has headroom".
+
+## Input priority
+
+Collect the following first:
+
+1. The list of branch PCs
+2. The checkpoint name or benchmark name
+3. The checkpoint root directory
+4. `topMispredictsByBranch.csv` / `topMisrateByBranch.csv`, if available
+5. The user's local benchmark source tree, if any
+
+If the user only gives PCs without a checkpoint name, a coarse classification by address range is still possible.
+
+## Address classification rules
+
+First decide whether the PC falls within the benchmark ELF's load range; do not jump straight to source-line lookup.
+
+### A. Low addresses inside the benchmark ELF `.text`
+
+Typical signs:
+
+- a static executable image starting around `0x10000`
+- `llvm-symbolizer` / `nm` resolve it to a benchmark function
+
+Strategy:
+
+- Treat it first as benchmark code, or as runtime/helper code statically linked into the benchmark
+
+### B. High addresses such as `0x8000xxxx`
+
+Typical signs:
+
+- not inside any LOAD segment of the benchmark ELF
+- more likely `bbl` / opensbi / kernel / some other runtime
+
+Strategy:
+
+- Do not force-resolve it with the benchmark ELF
+- State first that the address most likely does not belong to the benchmark proper
+- Without a matching runtime ELF, you can only stop at "non-benchmark code"
+
+## Common paths in the repository
+
+### New profiles
+
+- checkpoint root: `/nfs/home/share/checkpoints_profiles//checkpoint-0-0-0`
+- ELF directory: `/nfs/home/share/checkpoints_profiles//elf`
+
+Common mappings:
+
+- `gcc_typeck` / `gcc_scilab` / `gcc_expr2` / `gcc_200` -> `elf/gcc`
+- `perlbench_splitmail` / `perlbench_diffmail` -> `elf/perlbench`
+- `bzip2_*` -> `elf/bzip2`
+- `gobmk_*` -> `elf/gobmk`
+- `astar_*` -> `elf/astar`
+- `gamess_*` -> `elf/gamess`
+- `mcf` -> `elf/mcf`
+- `sjeng` -> `elf/sjeng`
+
+### Old profiles
+
+- root: `/nfs/share/zyy/spec06_rv64gcb_O3_20m_gcc12.2.0-intFpcOff-jeMalloc`
+- benchmark ELF: `elf/_base.riscv64-linux-gnu-gcc12.2.0`
+- run image: `bin/*-bbl-linux-spec.bin`
+
+Note:
+
+- `bin/*-bbl-linux-spec.bin` is usually not an ELF and cannot be fed to `addr2line` directly
+- what is actually usable for static semantic analysis is normally the benchmark ELF under `elf/`
+
+### Local source tree
+
+Common SPEC2006 source path:
+
+- `/nfs/home/yanyue/tools/cpu2006_analyze/benchspec/CPU2006`
+
+For example:
+
+- `400.perlbench/src`
+- `403.gcc/src`
+- `429.mcf/src`
+- `458.sjeng/src`
+
+## Recommended tools
+
+Prefer:
+
+- `file`
+- `readelf -S`
+- `readelf -Wl`
+- `nm -n`
+- `llvm-symbolizer`
+- `llvm-objdump -d --line-numbers --source`
+- `rg`
+
+When necessary:
+
+- `gdb -batch -ex 'info line *ADDR'`
+- `readelf --debug-dump=decodedline`
+
+Do not rely on the system `addr2line` by default: under some RISC-V + DWARF combinations it may only resolve function names and cannot reliably produce source lines.
+
+## Standard analysis flow
+
+### Step 1: confirm the ELF is usable
+
+First check (`<elf>` is a placeholder for the benchmark ELF path):
+
+```bash
+file <elf>
+readelf -S <elf> | rg 'debug|symtab|strtab'
+readelf -Wl <elf>
+```
+
+Goals:
+
+- confirm it is actually an ELF
+- whether it carries `debug_info`
+- what the code load-address range is
+
+### Step 2: decide whether the PC belongs to this ELF
+
+If the PC is clearly outside the LOAD segment range:
+
+- mark it directly as "not an address in this benchmark ELF"
+- do not keep fabricating a mapping
+
+### Step 3: get to function level first
+
+Obtain the function name first:
+
+```bash
+llvm-symbolizer --obj=<elf> 0xPC
+nm -n <elf> | rg '<nearby-symbol>'
+```
+
+If you can only reach the function name, do not stop there. Function-level location plus local sources is usually enough for semantic analysis.
+
+### Step 4: inspect the branch context inside the function
+
+```bash
+llvm-objdump -d --line-numbers --source \
+  --start-address=<start> \
+  --stop-address=<end> \
+  <elf>
+```
+
+Focus on:
+
+- the load / and / shift / compare feeding the comparison
+- whether the branch is:
+  - `beqz/bnez`
+  - `blt/bge`
+  - a loop back-edge
+  - an early-exit condition
+  - a "refresh the maximum" style select branch
+
+### Step 5: map to a local source block
+
+If the line table is not reliable enough:
+
+- search the local source tree for the function definition by name
+- then match the assembly semantics to a source block
+
+Example:
+
+```bash
+rg -n '^.*\bpush_slidE\b\s*\(' /nfs/home/yanyue/tools/cpu2006_analyze/benchspec/CPU2006/458.sjeng/src/*.c
+```
+
+The goal of this step is not to force "an exact line", but to locate:
+
+- which function
+- which `if/else/loop`
+- what its input dependencies are
+
+### Step 6: classify the branch
+
+Assign every branch to at least one of:
+
+- `loop-exit`
+- `guard / fastpath`
+- `predicate-result`
+- `max/min update`
+- `pointer/null/empty check`
+- `state-machine / parser / regex`
+- `runtime/helper`
+
+### Step 7: output the predictability verdict
+
+The conclusion must at least state:
+
+- whether the branch is more "structurally easy to predict" or "semantically hard to predict"
+- if the predictor performs poorly, what to suspect first:
+  - the predictor model / history modeling / aliasing / capacity
+  - or an input distribution that is inherently irregular
+
+## Predictability heuristics
+
+The following are default heuristics, not absolute rules.
+
+### Usually easier to predict
+
+- `for/while` loop-exit conditions
+- scanning until a boundary / empty value / sentinel
+- length lower-bound checks
+- null pointer / empty square / `npiece` / `frame` / null-checks
+- stable mode bits, e.g. `captures`, `mode`, `flag` unchanged for long stretches
+- heavily biased error paths / rare paths
+
+Typical behavior:
+
+- many consecutive taken, then a single not-taken
+- many consecutive not-taken, then a single taken
+- strong bias within a given phase
+
+If such a branch mispredicts heavily, it is worth suspecting:
+
+- the predictor failed to learn a simple structure
+- too many contexts aliased onto the same PC
+- table aliasing or capacity conflicts
+
+### Usually harder to predict
+
+- decisions driven by regex / parser / symbol-table / search state
+- "refresh the max/min" branches such as `if (value > best)`
+- classification/comparison on dynamically `load`-ed values
+- filter / predicate results that depend on the input's true/false distribution
+- heuristics that depend on multiple pieces of global state
+- match success/failure, table hit/miss, search-pruning hit/miss
+
+Typical behavior:
+
+- the same PC behaves very differently across phases
+- taken ratio close to the middle
+- the outcome depends strongly on input content or state-machine position
+
+If such a branch mispredicts heavily, it does not necessarily indicate a predictor problem; it may simply be semantically hard.
+
+## Typical case templates
+
+### Case A: sliding-piece move generation
+
+Similar to:
+
+- `board[target] == npiece`
+- `board[target] != frame`
+
+Verdict:
+
+- a classic scan-type branch
+- usually well-structured and relatively easy to predict
+- if prediction is poor, first suspect the predictor failed to learn the ray-length/phase pattern
+
+### Case B: "refresh the maximum" in search ordering
+
+Similar to:
+
+- `if (move_ordering[i] > best)`
+
+Verdict:
+
+- a data-dependent branch
+- depends on the move-ordering distribution
+- clearly harder than loop-exit
+- poor prediction is not necessarily a predictor bug
+
+### Case C: regex / match success or failure
+
+Similar to:
+
+- `if (!s) goto nope;`
+- `if (CALLREGEXEC(...))`
+
+Verdict:
+
+- depends strongly on input text, state, and match position
+- usually harder to predict than a length check
+
+## Suggested output format
+
+For each branch PC, output at least the following fields:
+
+- `pc`
+- `benchmark`
+- `elf`
+- `belongs_to`
+  - `benchmark`
+  - `runtime/toolchain`
+  - `bbl/high-address`
+- `function`
+- `source_candidate`
+- `semantic_pattern`
+- `predictability`
+  - `easy`
+  - `medium`
+  - `hard`
+- `why`
+- `tage_interpretation`
+  - `more_like_predictor_issue`
+  - `more_like_semantically_hard`
+  - `mixed`
+
+## Usage notes
+
+- Do not mistake "cannot resolve the exact line" for "cannot analyze at all".
+- For benchmark code, function-level location plus the local source block is usually enough for a predictability judgment.
+- For addresses like `0x8000xxxx`, rule out runtime/bbl before talking about source code.
+- For `libgcc`/`glibc` helpers, tell the user explicitly: this is not a branch in the benchmark's own algorithm.
+- If the user's goal is comparing predictor designs, prioritize:
+  - branches that should be structurally easy to predict but mispredict heavily
+- If the user's goal is explaining workload difficulty, prioritize:
+  - branches with strong semantic data dependence
+
+## Default conclusion style
+
+Prefer to answer with:
+
+1. What code the PC belongs to
+2. Roughly which source logic it corresponds to
+3. Which branch pattern it belongs to
+4. Why I judge it easy / hard
+5. Whether I attribute the result to the predictor or to workload semantics
diff --git a/.codex/skills/frontend-pmu-analysis/SKILL.md b/.codex/skills/frontend-pmu-analysis/SKILL.md
new file mode 100644
index 0000000000..605fbbe5e2
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/SKILL.md
@@ -0,0 +1,64 @@
+---
+name: frontend-pmu-analysis
+description: "BPU counter extraction and batch summarization only (machine-readable JSON/CSV). The config file only needs raw stats counter names."
+---
+
+# BPU Counter Analysis Skill (minimal)
+
+## When to use
+- You have finished gem5 runs and only want to batch-extract raw BPU-related counters.
+- You do not want complex derivations in the script, only raw values.
+- You need machine-readable results for downstream scripts/tables.
+
+## Core principles
+- Extract raw counters only; no formula derivation.
+- The config file lists counter names only.
+- No strict directory layout; `stats.txt` is discovered recursively.
+- If some branches mispredict especially often, also check `topMispredictsByBranch.csv` next to `stats.txt`; it records which branches are mispredicted the most.
+- When needed, use `--enable-bp-db tage` to enable TAGE trace-db analysis.
+
+## Entry script
+- `.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py`
+
+## Default config
+- `.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt`
+
+## Usage
+```bash
+python3 .codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py \
+  --debug-dir /tmp/debug/tage-new8
+```
+
+Specify a custom counter file:
+
+```bash
+python3 .codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --counters-file /path/to/my_counters.txt
+```
+
+## Output
+- `bpu_counters_summary.json`
+- `bpu_counters_summary.csv`
+
+Output fields only include:
+- case path
+- stats path
+- `values` (matched counters and their values)
+- `missing` (missing counters)
+- `errors` (parse errors)
+
+## Counter file format
+`txt` is recommended (one counter per line):
+
+```txt
+system.cpu.ipc
+system.cpu.commit.branchMispredicts
+system.cpu.commit.branches
+```
+
+To batch-analyze more counters, add them to the txt under `configs`.
+
+Also supported:
+- `yaml`: `counters: [ ... ]` or a plain list
+- `csv`: the first column, or a `counter` column, holds the counter names
diff --git a/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt b/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt
new file mode 100644
index 0000000000..eb17457b9a
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt
@@ -0,0 +1,28 @@
+# Keep only raw counter names. No formulas.
+system.cpu.ipc
+system.cpu.frontendBound
+system.cpu.badSpecBound
+system.cpu.backendBound
+system.cpu.commit.branchMispredicts
+system.cpu.commit.branches
+system.cpu.branchPred.condMiss
+system.cpu.branchPred.condNum
+system.cpu.branchPred.predsOfEachStage::0
+system.cpu.branchPred.predsOfEachStage::2
+system.cpu.branchPred.overrideCount
+system.cpu.branchPred.commitOverrideCount
+system.cpu.branchPred.tage.updateAllocSuccess
+system.cpu.branchPred.tage.updateAllocFailure
+system.cpu.branchPred.tage.updateBankConflict
+system.cpu.branchPred.tage.updateAccessPerBank::0
+system.cpu.branchPred.tage.updateAccessPerBank::1
+system.cpu.branchPred.tage.updateAccessPerBank::2
+system.cpu.branchPred.tage.updateAccessPerBank::3
+system.cpu.branchPred.ittage.commitPredCorrect
+system.cpu.branchPred.ittage.commitPredWrong
+system.cpu.branchPred.ubtb.predHit
+system.cpu.branchPred.ubtb.predMiss
+system.cpu.branchPred.abtb.predHit
+system.cpu.branchPred.abtb.predMiss
+system.cpu.branchPred.mbtb.predHit
+system.cpu.branchPred.mbtb.predMiss
diff --git a/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py b/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py
new file mode 100644
index 0000000000..36aaf6b29f
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py
@@ -0,0 +1,243 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+import concurrent.futures
+import csv
+import json
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+BEGIN = "---------- Begin Simulation Statistics ----------"
+END = "---------- End Simulation Statistics ----------"
+DEFAULT_COUNTERS = Path(__file__).resolve().parent.parent / "configs" / "bpu_counters.txt"
+
+
+@dataclass
+class CaseRecord:
+    case_path: str
+    stats_path: str
+    values: Dict[str, float]
+    missing: List[str]
+    errors: List[str]
+
+
+def now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
+
+
+def parse_last_stats_block(path: Path) -> Dict[str, float]:
+    lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
+    blocks: List[List[str]] = []
+    in_block = False
+    current: List[str] = []
+
+    for line in lines:
+        stripped = line.strip()
+        if stripped == BEGIN:
+            in_block = True
+            current = []
+            continue
+        if stripped == END and in_block:
+            blocks.append(current)
+            in_block = False
+            continue
+        if in_block:
+            current.append(line)
+
+    target = blocks[-1] if blocks else lines
+    stats: Dict[str, float] = {}
+    for line in target:
+        if not line or line.startswith("-"):
+            continue
+        parts = line.split()
+        if len(parts) < 2:
+            continue
+        key, value = parts[0], parts[1]
+        try:
+            stats[key] = float(value)
+        except ValueError:
+            continue
+    return stats
+
+
+def load_counters(path: Path) -> List[str]:
+    suffix = path.suffix.lower()
+    if suffix in {".txt", ""}:
+        counters = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
+        counters = [c for c in counters if c and not c.startswith("#")]
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    if suffix in {".yml", ".yaml"}:
+        import yaml
+
+        payload = yaml.safe_load(path.read_text(encoding="utf-8"))
+        if isinstance(payload, list):
+            counters = [str(x).strip() for x in payload if str(x).strip()]
+        elif isinstance(payload, dict):
+            raw = payload.get("counters", [])
+            counters = [str(x).strip() for x in raw if str(x).strip()]
+        else:
+            raise ValueError("yaml must be list or object with counters")
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    if suffix == ".csv":
+        counters: List[str] = []
+        with path.open(encoding="utf-8", newline="") as fp:
+            reader = csv.DictReader(fp)
+            if reader.fieldnames is None:
+                raise ValueError(f"invalid csv with no header: {path}")
+            column = "counter" if "counter" in reader.fieldnames else reader.fieldnames[0]
+            for row in reader:
+                value = str(row.get(column, "")).strip()
+                if value:
+                    counters.append(value)
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    raise ValueError("counter file must be .txt/.yml/.yaml/.csv")
+
+
+def analyze_one(stats_path: Path, debug_dir: Path, counters: List[str]) -> CaseRecord:
+    case_rel = stats_path.parent.relative_to(debug_dir)
+    record = CaseRecord(
+        case_path=str(case_rel),
+        stats_path=str(stats_path),
+        values={},
+        missing=[],
+        errors=[],
+    )
+
+    try:
+        stats = parse_last_stats_block(stats_path)
+    except Exception as exc:
+        record.errors.append(f"parse stats failed: {exc}")
+        return record
+
+    values: Dict[str, float] = {}
+    missing: List[str] = []
+    for counter in counters:
+        if counter in stats:
+            values[counter] = stats[counter]
+        else:
+            missing.append(counter)
+
+    record.values = values
+    record.missing = missing
+    return record
+
+
+def write_outputs(debug_dir: Path, counters_file: Path, counters: List[str],
+                  records: List[CaseRecord]) -> Tuple[Path, Path]:
+    summary_json = debug_dir / "bpu_counters_summary.json"
+    summary_csv = debug_dir / "bpu_counters_summary.csv"
+
+    payload = {
+        "generated_at": now_iso(),
+        "debug_dir": str(debug_dir),
+        "counters_file": str(counters_file),
+        "counters": counters,
+        "cases": [
+            {
+                "case_path": r.case_path,
+                "stats_path": r.stats_path,
+                "values": r.values,
+                "missing": r.missing,
+                "errors": r.errors,
+            }
+            for r in sorted(records, key=lambda x: x.case_path)
+        ],
+    }
+    summary_json.write_text(json.dumps(payload, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
+
+    headers = ["case_path", "stats_path", "missing_count", "error_count", *counters]
+    with summary_csv.open("w", encoding="utf-8", newline="") as fp:
+        writer = csv.DictWriter(fp, fieldnames=headers)
+        writer.writeheader()
+        for record in sorted(records, key=lambda x: x.case_path):
+            row = {
+                "case_path": record.case_path,
+                "stats_path": record.stats_path,
+                "missing_count": len(record.missing),
+                "error_count": len(record.errors),
+            }
+            for counter in counters:
+                row[counter] = record.values.get(counter, "")
+            writer.writerow(row)
+
+    return summary_json, summary_csv
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Extract raw BPU counters from gem5 stats.txt")
+    parser.add_argument("--debug-dir", type=str, required=True, help="Root directory to scan")
+    parser.add_argument(
+        "--counters-file",
+        type=str,
+        default=str(DEFAULT_COUNTERS),
+        help="Counter list file (.txt/.yml/.yaml/.csv)",
+    )
+    parser.add_argument(
+        "--stats-glob",
+        type=str,
+        default="**/stats.txt",
+        help="Glob under debug-dir to find stats files",
+    )
+    parser.add_argument("--max-workers", type=int, default=8)
+    return parser
+
+
+def main() -> int:
+    args = build_parser().parse_args()
+
+    debug_dir = Path(args.debug_dir).resolve()
+    counters_file = Path(args.counters_file).resolve()
+
+    if not debug_dir.exists():
+        raise FileNotFoundError(f"debug dir not found: {debug_dir}")
+    if not counters_file.is_file():
+        raise FileNotFoundError(f"counters file not found: {counters_file}")
+
+    counters = load_counters(counters_file)
+    stats_files = sorted(debug_dir.glob(args.stats_glob))
+
+    records: List[CaseRecord] = []
+    with concurrent.futures.ThreadPoolExecutor(max_workers=args.max_workers) as executor:
+        future_map = {
+            executor.submit(analyze_one, stats_path, debug_dir, counters): stats_path
+            for stats_path in stats_files
+            if stats_path.is_file()
+        }
+        for future in concurrent.futures.as_completed(future_map):
+            stats_path = future_map[future]
+            try:
+                records.append(future.result())
+            except Exception as exc:
+                case_rel = stats_path.parent.relative_to(debug_dir)
+                records.append(
+                    CaseRecord(
+                        case_path=str(case_rel),
+                        stats_path=str(stats_path),
+                        values={},
+                        missing=counters,
+                        errors=[f"unhandled analysis exception: {exc}"],
+                    )
+                )
+
+    summary_json, summary_csv = write_outputs(debug_dir, counters_file, counters, records)
+    print(f"wrote: {summary_json}")
+    print(f"wrote: {summary_csv}")
+    print(f"stats files: {len(records)}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/.codex/skills/mgsc-table-probe/SKILL.md b/.codex/skills/mgsc-table-probe/SKILL.md
new file mode 100644
index 0000000000..b9f7b2e0af
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/SKILL.md
@@ -0,0 +1,86 @@
+---
+name: mgsc-table-probe
+description: "Analyze the effect of XiangShan MGSC/SC on frontend micro-tests. Use for: (1) batch-running mgsc_test with A/B profiles such as off/l_only/g_only/i_only/full; (2) comparing topMispredictsByBranch.csv and stats.txt across profiles; (3) using MGSCTRACE in bp.db to attribute per-branch gains/losses to specific SC tables; (4) deciding how to design new tests for the Global or IMLI tables."
+---
+
+# MGSC Table Probe
+
+## Overview
+Runs standardized SC-table A/B experiments and produces branch-level attribution:
+- `summary.csv`: per-case summary of `off` versus each profile's delta.
+- `branch_delta.csv`: per-branch mispredict deltas, plus SC fix/hurt counts and per-table contribution ratios.
+- `report.md`: a sorted report for quick human reading.
+
+The goal of this skill is fast iteration on SC test quality, not full performance tuning.
+
+## Quick start
+
+1) Probe all `mgsc_test` binaries:
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe \
+  --profiles off,l_only,g_only,i_only,full \
+  --max-workers 4
+```
+
+2) Quickly check a single case:
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe_smoke \
+  --tests fp_sc_alias_pair \
+  --profiles off,g_only,i_only \
+  --max-workers 1
+```
+
+3) Rebuild reports only (no gem5 rerun):
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe \
+  --profiles off,l_only,g_only,i_only,full \
+  --skip-run
+```
+
+## Workflow
+
+1) **Baseline + single-table isolation profiles**
+- Always include `off`.
+- Add single-table profiles (e.g. `g_only`, `i_only`) before examining `full`.
+
+2) **Quickly screen valuable tests**
+- In `summary.csv`, prioritize cases with `condMiss_delta < 0` and `mgsc_net_use > 0`.
+
+3) **Screen valuable branches**
+- In `branch_delta.csv`, prioritize rows with:
+  - `delta_misp < 0`
+  - a high `focus_decisive_ratio`
+  - a high `focus_agree_fix_ratio`
+
+4) **Decide the next micro-test direction**
+- If `g_only` rarely helps and `focus_decisive_ratio(g)` is low, the global-history patterns are weak.
+- If `i_only` never helps, the loop/iteration phase signal is not exposed enough.
+- Consult the patterns in `references/test-patterns.md` when writing the next test.
+
+## Outputs
+
+- `debug/sc_table_probe/summary.csv`
+- `debug/sc_table_probe/branch_delta.csv`
+- `debug/sc_table_probe/report.md`
+- `debug/sc_table_probe/report.json`
+
+## Key options
+
+- `--profiles`: choose from `off,l_only,g_only,i_only,full`.
+- `--tests`: comma-separated test names (no suffix), e.g. `fp_sc_alias_pair,imli_iter`.
+- `--extra-param`: pass extra gem5 `--param` values through.
+- `--copy-cpt-to-tmp`: avoid path-access issues.
+- `--skip-run`: only generate reports, without running.
+
+## Notes
+
+- Keep `microtage` disabled for SC sub-table attribution unless you explicitly want to evaluate interaction effects.
+- Use the same set of checkpoints across profiles; otherwise the deltas are invalid.
+- For branch-PC mapping, use the `*-riscv64-xs.txt` disassembly files in the mgsc_test build directory.
+
+## References
+
+- See `references/test-patterns.md` for G/IMLI-oriented micro-test patterns.
diff --git a/.codex/skills/mgsc-table-probe/agents/openai.yaml b/.codex/skills/mgsc-table-probe/agents/openai.yaml
new file mode 100644
index 0000000000..1dc68f3ace
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "MGSC Table Probe"
+  short_description: "Batch analyze SC sub-table effects and branch-level fixes"
+  default_prompt: "Use this skill to run mgsc_test A/B profiles (off/l_only/g_only/i_only/full), parse topMispredictsByBranch.csv and MGSCTRACE, and identify which SC table fixes TAGE on which branch."
diff --git a/.codex/skills/mgsc-table-probe/references/test-patterns.md b/.codex/skills/mgsc-table-probe/references/test-patterns.md
new file mode 100644
index 0000000000..5bdeaa6ef2
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/references/test-patterns.md
@@ -0,0 +1,51 @@
+# SC Test Patterns (G / IMLI)
+
+## Goal
+Build micro-tests on which TAGE is weak, but one specific SC table can correct the branch direction.
+
+## GTable (global history) patterns
+
+Use these when the branch outcome depends on recent cross-branch results rather than a purely local periodic pattern.
+
+- Keep one target branch PC stable.
+- Add 2 to 4 feeder branches before the target branch.
+- Make the target's direction depend on feeder outcomes from earlier iterations.
+- Inject low-amplitude noise branches to deliberately lower TAGE's confidence.
+
+Example idea:
+- `b0` period 3, `b1` period 5, `b2` period 7.
+- Target: `t = last_b0 ^ last_b1 ^ (b2_now & 1)`.
+- Expectation: `g_only` may improve some PCs; `l_only` likely will not.
+
+## ITable (IMLI) patterns
+
+Use when the branch direction depends on the loop-iteration phase, especially on the backward-taken count.
+
+- Use a fixed trip count, e.g. 16/24/32.
+- Keep one in-loop branch at the same static PC.
+- Flip one specific phase branch near the loop tail or head.
+- Optionally alternate the outer phase to move the flip position.
+
+Example idea:
+- For each outer iteration:
+  - inner `i in [0, 31]`.
+  - the target branch is taken only at `i == 17` (or at `i == 17/18` by phase).
+- Expectation: `i_only` should help if the phase signal is exposed and stable.
+
+## Acceptance criteria for a new micro-test
+
+A test is valuable if all of the following hold:
+
+1. In `summary.csv`: the target profile shows `condMiss_delta < 0`.
+2. In `branch_delta.csv`: at least one hot branch shows `delta_misp < 0`.
+3. The same hot branch has a positive `net_use`.
+4. The target-table metrics on that branch are meaningful:
+   - `focus_decisive_ratio` must not be near 0.
+   - `focus_agree_fix_ratio` should be high enough.
+
+## Common failure modes
+
+- The branch is purely locally periodic -> LTable dominates; G/I effects are hard to demonstrate.
+- Too much randomness -> every table degrades; no stable gain.
+- The target branch is not hot enough -> too much statistical noise.
+- Multiple branch PCs mixed together -> attribution becomes blurry.
diff --git a/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py b/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py
new file mode 100644
index 0000000000..5e7af5eed4
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py
@@ -0,0 +1,589 @@
+#!/usr/bin/env python3
+"""Probe SC table effectiveness on mgsc_test workloads.
+
+This script helps answer:
+1) Which existing micro-tests are sensitive to SC (vs SC off)?
+2) For a target table (e.g., G / IMLI), can that table alone improve mispredicts?
+3) For improved branches, does MGSCTRACE indicate SC is fixing TAGE mistakes?
+
+Typical usage:
+    python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+        --outdir debug/sc_table_probe \
+        --profiles off,l_only,g_only,i_only,full \
+        --max-workers 4
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import dataclasses
+import json
+import shutil
+import sqlite3
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from pathlib import Path
+from typing import Dict, Iterable, List, Optional, Tuple
+
+
+# .codex/skills/mgsc-table-probe/scripts/ -> repo root is four levels up.
+REPO_ROOT = Path(__file__).resolve().parents[4]
+DEFAULT_GEM5 = REPO_ROOT / "build" / "RISCV" / "gem5.opt"
+DEFAULT_CONFIG = REPO_ROOT / "configs" / "example" / "kmhv3.py"
+DEFAULT_CPT_DIR = Path("/nfs/home/yanyue/tools/nexus-am/tests/frontendtest/mgsc_test/build")
+DEFAULT_SRC_DIR = Path("/nfs/home/yanyue/tools/nexus-am/tests/frontendtest/mgsc_test/tests")
+
+TOP_CSV = "topMispredictsByBranch.csv"
+STATS_TXT = "stats.txt"
+BP_DB = "bp.db"
+
+TABLE_COLS = {
+    "bw": "bwPercsum",
+    "l": "lPercsum",
+    "i": "iPercsum",
+    "g": "gPercsum",
+    "p": "pPercsum",
+    "bias": "biasPercsum",
+}
+
+
+@dataclasses.dataclass(frozen=True)
+class Profile:
+    name: str
+    params: Tuple[str, ...]
+    focus_table: Optional[str]
+    enable_db: bool = True
+
+
+@dataclasses.dataclass
+class Case:
+    name: str
+    bin_path: Path
+    disasm_path: Optional[Path]
+    src_path: Optional[Path]
+
+
+@dataclasses.dataclass
+class RunResult:
+    case: Case
+    profile: Profile
+    run_dir: Path
+    ok: bool
+    cmd: List[str]
+    stats: Dict[str, float]
+    top: Dict[int, Dict[str, float]]
+    db_overall: Dict[str, float]
+    db_by_pc: Dict[int, Dict[str, float]]
+    error: str = ""
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="SC table probe harness")
+    parser.add_argument("--gem5-bin", default=str(DEFAULT_GEM5))
+    parser.add_argument("--config", default=str(DEFAULT_CONFIG))
+    parser.add_argument("--cpt-dir", default=str(DEFAULT_CPT_DIR))
+    parser.add_argument("--src-dir", default=str(DEFAULT_SRC_DIR))
+    parser.add_argument("--outdir", default=str(REPO_ROOT / "debug" / "sc_table_probe"))
+    parser.add_argument(
+        "--profiles",
+        default="off,l_only,g_only,i_only,full",
+        help="Comma separated profile names",
+    )
+    parser.add_argument("--tests", default="", help="Comma separated test names, empty means all")
+    parser.add_argument("--extra-param", action="append", default=[])
+    parser.add_argument("--max-workers", type=int, default=1)
+    parser.add_argument("--skip-run", action="store_true", help="Reuse existing outdir results")
+    parser.add_argument("--copy-cpt-to-tmp", action="store_true", default=True)
+    parser.add_argument("--no-copy-cpt-to-tmp", action="store_false", dest="copy_cpt_to_tmp")
+    parser.add_argument("--top-branch-limit", type=int, default=200)
+    return parser.parse_args()
+
+
+def builtin_profiles() -> Dict[str, Profile]:
+    return {
+        "off": Profile(
+            name="off",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table=None,
+            enable_db=False,
+        ),
+        "full": Profile(
+            name="full",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=True",
+                "system.cpu[0].branchPred.mgsc.enableLTable=True",
+                "system.cpu[0].branchPred.mgsc.enableITable=True",
+                "system.cpu[0].branchPred.mgsc.enableGTable=True",
+                "system.cpu[0].branchPred.mgsc.enablePTable=True",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=True",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table=None,
+        ),
+        "l_only": Profile(
+            name="l_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=True",
+                "system.cpu[0].branchPred.mgsc.enableITable=False",
+                "system.cpu[0].branchPred.mgsc.enableGTable=False",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="l",
+        ),
+        "g_only": Profile(
+            name="g_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=False",
+                "system.cpu[0].branchPred.mgsc.enableITable=False",
+                "system.cpu[0].branchPred.mgsc.enableGTable=True",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="g",
+        ),
+        "i_only": Profile(
+            name="i_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=False",
+                "system.cpu[0].branchPred.mgsc.enableITable=True",
+                "system.cpu[0].branchPred.mgsc.enableGTable=False",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="i",
+        ),
+    }
+
+
+def parse_hex_or_int(v: str) -> int:
+    s = v.strip().lower()
+    if not s:
+        return 0
+    if s.startswith("0x"):
+        return int(s, 16)
+    if any(ch in "abcdef" for ch in s):
+        return int(s, 16)
+    return int(s, 10)
+
+
+def parse_stats(path: Path) -> Dict[str, float]:
+    keys = {
+        "system.cpu.ipc",
+        "system.cpu.fetch.rate",
+        "system.cpu.branchPred.condNum",
+        "system.cpu.branchPred.condMiss",
+        "system.cpu.commit.branchMispredicts",
+        "system.cpu.branchPred.mgsc.scUsed",
+        "system.cpu.branchPred.mgsc.scCorrectTageWrong",
+        "system.cpu.branchPred.mgsc.scWrongTageCorrect",
+        "simTicks",
+    }
+    out: Dict[str, float] = {}
+    if not path.exists():
+        return out
+    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
+        parts = line.split()
+        if len(parts) < 2:
+            continue
+        if parts[0] not in keys:
+            continue
+        try:
+            out[parts[0]] = float(parts[1])
+        except ValueError:
+            continue
+    return out
+
+
+def parse_top_csv(path: Path, limit: int) -> Dict[int, Dict[str, float]]:
+    out: Dict[int, Dict[str, float]] = {}
+    if not path.exists():
+        return out
+    with path.open(encoding="utf-8", newline="") as fp:
+        rows = list(csv.DictReader(fp))
+    for row in rows[:limit]:
+        try:
+            pc_text = (row.get("pc", "") or "").strip()
+            # topMispredictsByBranch.csv stores PC in hex form without "0x" prefix.
+            pc = int(pc_text, 16) if pc_text else 0
+            out[pc] = {
+                "mispredicts": float(row.get("mispredicts", 0)),
+                "total": float(row.get("total", 0)),
+                "misPermil": float(row.get("misPermil", 0)),
+                "dirMiss": float(row.get("dirMiss", 0)),
+            }
+        except (ValueError, TypeError):
+            continue
+    return out
+
+
+def pct(old: float, new: float) -> float:
+    if old == 0:
+        return 0.0
+    return (new - old) / old * 100.0
+
+
+def query_mgsc_db(db_path: Path) -> Tuple[Dict[str, float], Dict[int, Dict[str, float]]]:
+    if not db_path.exists():
+        return {}, {}
+    con = sqlite3.connect(str(db_path))
+    cur = con.cursor()
+    cur.execute("PRAGMA temp_store=MEMORY")
+
+    overall_row = cur.execute(
+        """
+        SELECT
+            COUNT(*) AS rows,
+            SUM(CASE WHEN useSc=1 THEN 1 ELSE 0 END) AS use_sc_rows,
+            SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken THEN 1 ELSE 0 END) AS fix_use,
+            SUM(CASE WHEN useSc=1 AND tagePred=actualTaken AND scPred!=actualTaken THEN 1 ELSE 0 END) AS hurt_use
+        FROM MGSCTRACE
+        """
+    ).fetchone()
+    overall = {
+        "rows": float(overall_row[0] or 0),
+        "use_sc_rows": float(overall_row[1] or 0),
+        "fix_use": float(overall_row[2] or 0),
+        "hurt_use": float(overall_row[3] or 0),
+    }
+    overall["net_use"] = overall["fix_use"] - overall["hurt_use"]
+
+    select_cols = [
+        "branchPC",
+        "COUNT(*) AS rows",
+        "SUM(CASE WHEN useSc=1 THEN 1 ELSE 0 END) AS use_sc",
+        "SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken THEN 1 ELSE 0 END) AS fix_use",
+        "SUM(CASE WHEN useSc=1 AND tagePred=actualTaken AND scPred!=actualTaken THEN 1 ELSE 0 END) AS hurt_use",
+    ]
+    for short, col in TABLE_COLS.items():
+        select_cols.append(
+            f"SUM(CASE WHEN useSc=1 AND ((totalSum>=0) != ((totalSum - {col})>=0)) THEN 1 ELSE 0 END) AS {short}_decisive"
+        )
+        select_cols.append(
+            f"SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken "
+            f"AND (({col}>=0)=actualTaken) THEN 1 ELSE 0 END) AS {short}_agree_fix"
+        )
+
+    rows = cur.execute(
+        f"""
+        SELECT {", ".join(select_cols)}
+        FROM MGSCTRACE
+        GROUP BY branchPC
+        """
+    ).fetchall()
+    con.close()
+
+    by_pc: Dict[int, Dict[str, float]] = {}
+    for row in rows:
+        idx = 0
+        pc = int(row[idx]); idx += 1
+        rows_cnt = float(row[idx] or 0); idx += 1
+        use_sc = float(row[idx] or 0); idx += 1
+        fix_use = float(row[idx] or 0); idx += 1
+        hurt_use = float(row[idx] or 0); idx += 1
+
+        ent: Dict[str, float] = {
+            "rows": rows_cnt,
+            "use_sc": use_sc,
+            "fix_use": fix_use,
+            "hurt_use": hurt_use,
+            "net_use": fix_use - hurt_use,
+        }
+        for short in TABLE_COLS:
+            decisive = float(row[idx] or 0); idx += 1
+            agree_fix = float(row[idx] or 0); idx += 1
+            ent[f"{short}_decisive"] = decisive
+            ent[f"{short}_agree_fix"] = agree_fix
+            ent[f"{short}_decisive_ratio"] = decisive / use_sc if use_sc else 0.0
+            ent[f"{short}_agree_fix_ratio"] = agree_fix / fix_use if fix_use else 0.0
+        by_pc[pc] = ent
+    return overall, by_pc
+
+
+def discover_cases(cpt_dir: Path, src_dir: Path, selected: Optional[Iterable[str]]) -> List[Case]:
+    allow = set(selected) if selected else None
+    cases: List[Case] = []
+    for bin_path in sorted(cpt_dir.glob("*-riscv64-xs.bin")):
+        stem = bin_path.name.replace("-riscv64-xs.bin", "")
+        if allow is not None and stem not in allow:
+            continue
+        disasm = cpt_dir / f"{stem}-riscv64-xs.txt"
+        src = src_dir / f"{stem}.c"
+        cases.append(
+            Case(
+                name=stem,
+                bin_path=bin_path,
+                disasm_path=disasm if disasm.exists() else None,
+                src_path=src if src.exists() else None,
+            )
+        )
+    return cases
+
+
+def maybe_copy_to_tmp(case: Case, run_dir: Path) -> Path:
+    tmp_path = Path("/tmp") / f"{case.name}-riscv64-xs.bin"
+    shutil.copy2(case.bin_path, tmp_path)
+    return tmp_path
+
+
+def run_one(
+    case: Case,
+    profile: Profile,
+    args: argparse.Namespace,
+    outdir: Path,
+) -> RunResult:
+    run_dir = outdir / profile.name / case.name
+    run_dir.mkdir(parents=True, exist_ok=True)
+    cmd = [
+        str(Path(args.gem5_bin)),
+        "--outdir",
+        str(run_dir),
+        str(Path(args.config)),
+        "--raw-cpt",
+    ]
+    cpt_path = maybe_copy_to_tmp(case, run_dir) if args.copy_cpt_to_tmp else case.bin_path
+    cmd.extend(["--generic-rv-cpt", str(cpt_path)])
+    if profile.enable_db:
+        cmd.extend(["--enable-bp-db", "mgsc"])
+    for p in profile.params:
+        cmd.extend(["--param", p])
+    for p in args.extra_param:
+        cmd.extend(["--param", p])
+
+    ok = True
+    err = ""
+    if not args.skip_run:
+        stdout = (run_dir / "gem5.stdout").open("w", encoding="utf-8")
+        stderr = (run_dir / "gem5.stderr").open("w", encoding="utf-8")
+        try:
+            proc = subprocess.run(cmd, stdout=stdout, stderr=stderr, text=True)
+            ok = proc.returncode == 0
+            if not ok:
+                err = f"returncode={proc.returncode}"
+        finally:
+            stdout.close()
+            stderr.close()
+    else:
+        ok = (run_dir / STATS_TXT).exists()
+        if not ok:
+            err = "skip-run but stats not found"
+
+    stats = parse_stats(run_dir / STATS_TXT)
+    top = parse_top_csv(run_dir / TOP_CSV, args.top_branch_limit)
+    db_overall, db_by_pc = query_mgsc_db(run_dir / BP_DB) if profile.enable_db else ({}, {})
+    return RunResult(
+        case=case,
+        profile=profile,
+        run_dir=run_dir,
+        ok=ok,
+        cmd=cmd,
+        stats=stats,
+        top=top,
+        db_overall=db_overall,
+        db_by_pc=db_by_pc,
+        error=err,
+    )
+
+
+def build_reports(results: List[RunResult], profiles: List[Profile], outdir: Path) -> None:
+    baseline: Dict[str, RunResult] = {}
+    for r in results:
+        if r.profile.name == "off":
+            baseline[r.case.name] = r
+
+    summary_rows: List[Dict[str, object]] = []
+    branch_rows: List[Dict[str, object]] = []
+
+    for r in results:
+        base = baseline.get(r.case.name)
+        off_cond_miss = base.stats.get("system.cpu.branchPred.condMiss", 0.0) if base else 0.0
+        on_cond_miss = r.stats.get("system.cpu.branchPred.condMiss", 0.0)
+        off_cond_num = base.stats.get("system.cpu.branchPred.condNum", 0.0) if base else 0.0
+        on_cond_num = r.stats.get("system.cpu.branchPred.condNum", 0.0)
+        off_rate = off_cond_miss / off_cond_num if off_cond_num else 0.0
+        on_rate = on_cond_miss / on_cond_num if on_cond_num else 0.0
+
+        summary_rows.append(
+            {
+                "case": r.case.name,
+                "profile": r.profile.name,
+                "ok": int(r.ok),
+                "off_condMiss": off_cond_miss,
+                "on_condMiss": on_cond_miss,
+                "condMiss_delta": on_cond_miss - off_cond_miss,
+                "off_condMissRate": off_rate,
+                "on_condMissRate": on_rate,
+                "condMissRate_delta_pct": pct(off_rate, on_rate),
+                "off_branchMisp": base.stats.get("system.cpu.commit.branchMispredicts", 0.0) if base else 0.0,
+                "on_branchMisp": r.stats.get("system.cpu.commit.branchMispredicts", 0.0),
+                "mgsc_fix_use": r.db_overall.get("fix_use", 0.0),
+                "mgsc_hurt_use": r.db_overall.get("hurt_use", 0.0),
+                "mgsc_net_use": r.db_overall.get("net_use", 0.0),
+                "source": str(r.case.src_path) if r.case.src_path else "",
+            }
+        )
+
+        if base is None or r.profile.name == "off":
+            continue
+        pcs = set(base.top.keys()) | set(r.top.keys())
+        for pc in sorted(pcs):
+            off = base.top.get(pc, {})
+            on = r.top.get(pc, {})
+            off_m = float(off.get("mispredicts", 0.0))
+            on_m = float(on.get("mispredicts", 0.0))
+            db = r.db_by_pc.get(pc, {})
+            row = {
+                "case": r.case.name,
+                "profile": r.profile.name,
+                "pc_hex": f"0x{pc:x}",
+                "off_misp": off_m,
+                "on_misp": on_m,
+                "delta_misp": on_m - off_m,
+                "off_total": float(off.get("total", 0.0)),
+                "on_total": float(on.get("total", 0.0)),
+                "fix_use": db.get("fix_use", 0.0),
+                "hurt_use": db.get("hurt_use", 0.0),
+                "net_use": db.get("net_use", 0.0),
+                "use_sc": db.get("use_sc", 0.0),
+            }
+            for short in TABLE_COLS:
+                row[f"{short}_decisive_ratio"] = db.get(f"{short}_decisive_ratio", 0.0)
+                row[f"{short}_agree_fix_ratio"] = db.get(f"{short}_agree_fix_ratio", 0.0)
+            if r.profile.focus_table:
+                focus = r.profile.focus_table
+                row["focus_table"] = focus
+                row["focus_decisive_ratio"] = row[f"{focus}_decisive_ratio"]
+                row["focus_agree_fix_ratio"] = row[f"{focus}_agree_fix_ratio"]
+            else:
+                row["focus_table"] = ""
+                row["focus_decisive_ratio"] = 0.0
+                row["focus_agree_fix_ratio"] = 0.0
+            branch_rows.append(row)
+
+    summary_csv = outdir / "summary.csv"
+ 
branch_csv = outdir / "branch_delta.csv" + write_csv(summary_csv, summary_rows) + write_csv(branch_csv, branch_rows) + + md_lines = render_markdown(summary_rows, branch_rows, profiles) + (outdir / "report.md").write_text("\n".join(md_lines), encoding="utf-8") + (outdir / "report.json").write_text( + json.dumps({"summary": summary_rows, "branch_delta": branch_rows}, indent=2), + encoding="utf-8", + ) + + +def write_csv(path: Path, rows: List[Dict[str, object]]) -> None: + if not rows: + path.write_text("", encoding="utf-8") + return + keys = list(rows[0].keys()) + with path.open("w", encoding="utf-8", newline="") as fp: + writer = csv.DictWriter(fp, fieldnames=keys) + writer.writeheader() + writer.writerows(rows) + + +def render_markdown( + summary_rows: List[Dict[str, object]], + branch_rows: List[Dict[str, object]], + profiles: List[Profile], +) -> List[str]: + lines: List[str] = [] + lines.append("# SC Table Probe Report") + lines.append("") + lines.append("## Profiles") + lines.append("") + for p in profiles: + focus = p.focus_table if p.focus_table else "-" + lines.append(f"- `{p.name}`: focus={focus}, db={'on' if p.enable_db else 'off'}") + lines.append("") + + lines.append("## Overall (sorted by condMiss reduction)") + lines.append("") + lines.append("| case | profile | off condMiss | on condMiss | delta | off rate | on rate | delta% | net_use |") + lines.append("| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |") + sorted_rows = sorted(summary_rows, key=lambda x: float(x.get("condMiss_delta", 0.0))) + for r in sorted_rows[:80]: + lines.append( + f"| {r['case']} | {r['profile']} | {r['off_condMiss']:.0f} | {r['on_condMiss']:.0f} | " + f"{r['condMiss_delta']:.0f} | {r['off_condMissRate']:.4f} | {r['on_condMissRate']:.4f} | " + f"{r['condMissRate_delta_pct']:+.2f}% | {r['mgsc_net_use']:.0f} |" + ) + lines.append("") + + lines.append("## G / I candidate branches (best improvements)") + lines.append("") + lines.append("| case | profile | pc | off 
misp | on misp | delta | net_use | focus decisive | focus agree_fix |") + lines.append("| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |") + focus_rows = [r for r in branch_rows if r.get("focus_table") in {"g", "i"} and r["off_misp"] >= 50] + focus_rows.sort(key=lambda x: float(x["delta_misp"])) + for r in focus_rows[:80]: + lines.append( + f"| {r['case']} | {r['profile']} | {r['pc_hex']} | {r['off_misp']:.0f} | {r['on_misp']:.0f} | " + f"{r['delta_misp']:.0f} | {r['net_use']:.0f} | {r['focus_decisive_ratio']:.3f} | " + f"{r['focus_agree_fix_ratio']:.3f} |" + ) + lines.append("") + lines.append("Interpretation tips:") + lines.append("- `delta<0` means SC profile improves that branch against `off`.") + lines.append("- High `focus_decisive_ratio` means the focus table often changes SC final sign.") + lines.append("- High `focus_agree_fix_ratio` means focus table sign aligns with real outcome on SC-fix events.") + return lines + + +def main() -> int: + args = parse_args() + outdir = Path(args.outdir) + outdir.mkdir(parents=True, exist_ok=True) + + builtins = builtin_profiles() + profile_names = [x.strip() for x in args.profiles.split(",") if x.strip()] + profiles: List[Profile] = [] + for name in profile_names: + if name not in builtins: + raise ValueError(f"Unknown profile: {name}. 
Choose from {sorted(builtins)}")
+        profiles.append(builtins[name])
+    if "off" not in {p.name for p in profiles}:
+        profiles.insert(0, builtins["off"])
+
+    selected = [x.strip() for x in args.tests.split(",") if x.strip()] or None
+    cases = discover_cases(Path(args.cpt_dir), Path(args.src_dir), selected)
+    if not cases:
+        print("No test cases found.")
+        return 1
+
+    tasks = [(case, profile) for case in cases for profile in profiles]
+    results: List[RunResult] = []
+    with ThreadPoolExecutor(max_workers=max(1, args.max_workers)) as ex:
+        futures = [
+            ex.submit(run_one, case=case, profile=profile, args=args, outdir=outdir)
+            for case, profile in tasks
+        ]
+        for fut in as_completed(futures):
+            res = fut.result()
+            results.append(res)
+            status = "OK" if res.ok else "FAIL"
+            print(f"[{status}] {res.profile.name}/{res.case.name}")
+
+    build_reports(results, profiles, outdir)
+    print(f"Report written to: {outdir / 'report.md'}")
+    print(f"CSV written to: {outdir / 'summary.csv'} and {outdir / 'branch_delta.csv'}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/.codex/skills/run-cpt-regression/SKILL.md b/.codex/skills/run-cpt-regression/SKILL.md
new file mode 100644
index 0000000000..a446c72afb
--- /dev/null
+++ b/.codex/skills/run-cpt-regression/SKILL.md
@@ -0,0 +1,53 @@
+---
+name: run-cpt-regression
+description: "Only batch-runs gem5 checkpoints (one or two runs per slice). Does no analysis."
+---
+
+# Batch CPT run skill (run only)
+
+## When to use
+- You only want to batch-run checkpoints / small tests.
+- You want the run step fully decoupled from analysis.
+
+## Core principles
+- This skill does **no analysis**.
+- It only produces run directories, `stats.txt`, `gem5.stdout`, and `gem5.stderr`.
+
+## Entry script
+- `.codex/skills/run-cpt-regression/scripts/run_cpt_back.py`
+
+## Typical usage
+Batch run (default: ref + opt):
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8
+```
+
+Run opt only (skip ref):
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --skip-ref
+```
+
+Run only selected slices:
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --slices 2fetch coremark10
+```
+
+Run a single slice with extra parameters, using `-P`:
+```bash
+GCBV_REF_SO= \
+./build/RISCV/gem5.opt ./configs/example/kmhv3.py \
+  --raw-cpt \
+  --generic-rv-cpt= \
+  -P "system.cpu[0].branchPred.mgsc.enabled=True"
+```
+
+## Follow-up analysis
+Use the separate skill: `frontend-pmu-analysis`.
diff --git a/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py b/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py
new file mode 100755
index 0000000000..03639529da
--- /dev/null
+++ b/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+import concurrent.futures
+import logging
+import os
+import subprocess
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, List
+
+REPO_ROOT = Path(__file__).resolve().parents[4]
+GEM5_BUILD_DIR = REPO_ROOT / "build" / "RISCV"
+KMHV3_CONFIG = REPO_ROOT / "configs" / "example" / "kmhv3.py"
+
+
+@dataclass
+class SimConfig:
+    binary: str
+    slice_name: str
+    checkpoint: str
+    outdir: Path
+    args: List[str]
+
+
+class GEM5Runner:
+    def __init__(self, max_workers: int, debug_dir: str, kmhv3_params: List[str], skip_ref: bool):
+        self.max_workers = max_workers
+        debug_path = Path(debug_dir)
+        if not debug_path.is_absolute():
+            debug_path = REPO_ROOT / debug_path
+        self.debug_dir = debug_path
+
+        self.kmhv3_params = kmhv3_params
+        self.skip_ref = skip_ref
+
+        logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+        self.logger = logging.getLogger(__name__)
+
+        self.slices: Dict[str, str] = {
+            "coremark10": "/nfs/home/share/gem5_ci/checkpoints/coremark-riscv64-xs.bin",
+        }
+        self.load_frontend_tests()
+
+    def load_frontend_tests(self) -> None:
+        am_home = os.environ.get("AM_HOME")
+        if not am_home:
+            self.logger.warning("AM_HOME is not set; skipping frontend test discovery")
+            return
+
+        base = 
Path(am_home) / "tests" / "frontendtest"
+        build_dirs = [
+            base / "build",
+            base / "br_target_test" / "build",
+            base / "cond_br_test" / "build",
+            base / "mgsc_test" / "build",
+        ]
+
+        discovered = 0
+        for build_dir in build_dirs:
+            if not build_dir.exists():
+                self.logger.warning("Frontend test directory not found: %s", build_dir)
+                continue
+            for binary in build_dir.glob("*-riscv64-xs.bin"):
+                name = binary.stem
+                suffix = "-riscv64-xs"
+                if name.endswith(suffix):
+                    name = name[: -len(suffix)]
+                if name not in self.slices:
+                    discovered += 1
+                    self.slices[name] = str(binary)
+
+        if discovered:
+            self.logger.info("Discovered %d frontend tests via AM_HOME", discovered)
+
+    def generate_configs(self) -> List[SimConfig]:
+        configs: List[SimConfig] = []
+        for slice_name, checkpoint in self.slices.items():
+            if not self.skip_ref:
+                configs.append(
+                    SimConfig(
+                        binary="gem5.opt.ref",
+                        slice_name=slice_name,
+                        checkpoint=checkpoint,
+                        outdir=self.debug_dir / f"{slice_name}_ref",
+                        args=[],
+                    )
+                )
+            configs.append(
+                SimConfig(
+                    binary="gem5.opt",
+                    slice_name=slice_name,
+                    checkpoint=checkpoint,
+                    outdir=self.debug_dir / f"{slice_name}_opt",
+                    args=[],
+                )
+            )
+        return configs
+
+    def run_single(self, config: SimConfig) -> bool:
+        config.outdir.mkdir(parents=True, exist_ok=True)
+        stdout_file = config.outdir / "gem5.stdout"
+        stderr_file = config.outdir / "gem5.stderr"
+
+        cmd: List[str] = [
+            str(GEM5_BUILD_DIR / config.binary),
+            "--outdir",
+            str(config.outdir),
+            str(KMHV3_CONFIG),
+            "--generic-rv-cpt",
+            str(config.checkpoint),
+            "--raw-cpt",
+            *config.args,
+        ]
+        for param in self.kmhv3_params:
+            cmd.extend(["-P", param])
+
+        self.logger.info("Run %s with %s", config.slice_name, config.binary)
+        with stdout_file.open("w", encoding="utf-8") as out, stderr_file.open("w", encoding="utf-8") as err:
+            proc = subprocess.run(cmd, stdout=out, stderr=err, text=True)
+
+        if proc.returncode == 0:
+            return True
+
+        err_text = stderr_file.read_text(encoding="utf-8", 
errors="ignore").strip() + self.logger.error("Simulation failed: %s %s: %s", config.slice_name, config.binary, err_text) + return False + + def run_all(self) -> int: + configs = self.generate_configs() + success = 0 + fail = 0 + + with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor: + future_map = {executor.submit(self.run_single, cfg): cfg for cfg in configs} + for future in concurrent.futures.as_completed(future_map): + cfg = future_map[future] + try: + if future.result(): + success += 1 + else: + fail += 1 + except Exception as exc: + fail += 1 + self.logger.error("Unhandled simulation exception on %s: %s", cfg.slice_name, exc) + + self.logger.info("Simulation done. success=%d failed=%d", success, fail) + return 0 if fail == 0 else 1 + + +def build_parser() -> argparse.ArgumentParser: + parser = argparse.ArgumentParser(description="Run gem5 checkpoint batch only (no analysis)") + parser.add_argument("--max-workers", type=int, default=64) + parser.add_argument("--debug-dir", type=str, default="debug/test1") + parser.add_argument("--slices", type=str, nargs="+", help="Run only selected slices") + parser.add_argument("--skip-ref", action="store_true", help="Skip gem5.opt.ref runs") + parser.add_argument("--param", action="append", default=[], help="Repeatable kmhv3 -P argument") + return parser + + +def main() -> int: + args = build_parser().parse_args() + + runner = GEM5Runner( + max_workers=args.max_workers, + debug_dir=args.debug_dir, + kmhv3_params=args.param, + skip_ref=args.skip_ref, + ) + + if args.slices: + runner.slices = {k: v for k, v in runner.slices.items() if k in args.slices} + if not runner.slices: + runner.logger.error("No valid slices specified") + return 1 + + return runner.run_all() + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/configs/common/xiangshan.py b/configs/common/xiangshan.py index 1190d5486d..ab0f5c6eef 100644 --- a/configs/common/xiangshan.py +++ b/configs/common/xiangshan.py 
@@ -369,7 +369,7 @@ def build_xiangshan_system(args): enable_bp_db = len(args.enable_bp_db) > 1 if enable_bp_db: - bp_db_switches = args.enable_bp_db[1] + ['basic'] + bp_db_switches = list(args.enable_bp_db[1]) print("BP db switches:", bp_db_switches) else: bp_db_switches = [] diff --git a/configs/example/kmhv3.py b/configs/example/kmhv3.py index 71844d9478..9c75771831 100644 --- a/configs/example/kmhv3.py +++ b/configs/example/kmhv3.py @@ -105,9 +105,17 @@ def setKmhV3Params(args, system): cpu.branchPred.mbtb.enabled = True cpu.branchPred.tage.enabled = True cpu.branchPred.ittage.enabled = True - cpu.branchPred.mgsc.enabled = False + cpu.branchPred.mgsc.enabled = True cpu.branchPred.ras.enabled = True + # RTL alignment: only enable bias + path + IMLI tables, disable PC threshold + cpu.branchPred.mgsc.enableBwTable = False + cpu.branchPred.mgsc.enableLTable = False + cpu.branchPred.mgsc.enableITable = True + cpu.branchPred.mgsc.enableGTable = False + cpu.branchPred.mgsc.enablePTable = True + cpu.branchPred.mgsc.enableBiasTable = True + # l1 cache per core if args.caches: cpu.icache.size = '64kB' diff --git a/src/cpu/pred/BranchPredictor.py b/src/cpu/pred/BranchPredictor.py index a5132f48af..6de883fa63 100644 --- a/src/cpu/pred/BranchPredictor.py +++ b/src/cpu/pred/BranchPredictor.py @@ -1057,7 +1057,7 @@ class BTBTAGE(TimedBaseBTBPredictor): useAltOnNaSize = Param.Unsigned(128, "Size of the useAltOnNa table") useAltOnNaWidth = Param.Unsigned(7, "Width of the useAltOnNa table") numBanks = Param.Unsigned(4, "Number of banks for bank conflict simulation") - enableBankConflict = Param.Bool(True, "Enable bank conflict simulation") + enableBankConflict = Param.Bool(False, "Enable bank conflict simulation") numDelay = 2 class MicroTAGE(BTBTAGE): @@ -1152,6 +1152,7 @@ class BTBMGSC(TimedBaseBTBPredictor): enablePTable = Param.Bool(True, "Enable P (path) table") enableBiasTable = Param.Bool(True, "Enable Bias table") enablePCThreshold = Param.Bool(False, "Enable PC-indexed 
threshold table") + focusBranchPC = Param.Addr(0, "Only write MGSCTRACE for this branch PC when non-zero") numDelay = 2 diff --git a/src/cpu/pred/btb/btb_mgsc.cc b/src/cpu/pred/btb/btb_mgsc.cc index 0211ad7eaa..9011dbbce6 100755 --- a/src/cpu/pred/btb/btb_mgsc.cc +++ b/src/cpu/pred/btb/btb_mgsc.cc @@ -157,6 +157,7 @@ BTBMGSC::BTBMGSC() enablePTable(true), enableBiasTable(true), enablePCThreshold(false), + focusBranchPC(0), mgscStats() { // Test-only small config: keep tables tiny and deterministic for fast unit tests. @@ -204,6 +205,7 @@ BTBMGSC::BTBMGSC(const Params &p) enablePTable(p.enablePTable), enableBiasTable(p.enableBiasTable), enablePCThreshold(p.enablePCThreshold), + focusBranchPC(p.focusBranchPC), mgscStats(this) { DPRINTF(MGSC, "BTBMGSC constructor\n"); @@ -413,6 +415,9 @@ BTBMGSC::generateSinglePrediction(const BTBEntry &btb_entry, const Addr &startPC int p_update_thres = enablePCThreshold ? findThreshold(pUpdateThreshold, btb_entry.pc) : 0; int total_thres = (updateThreshold / 8) + p_update_thres; + // Threshold is used as a confidence gate; avoid negative values which + // effectively disable the gate (abs(sum) > negative is almost always true). + total_thres = std::max(total_thres, 0); bool use_sc_pred = forceUseSC; // Force use SC if configured if (!use_sc_pred) { @@ -656,6 +661,11 @@ void BTBMGSC::updateGlobalThreshold(Addr pc, bool update_direction) { updateCounter(update_direction, updateThresholdWidth, updateThreshold); + // Keep global threshold non-negative; negative thresholds make SC gating + // degenerate and can cause overuse of SC. 
+ if (updateThreshold < 0) { + updateThreshold = 0; + } } void @@ -771,7 +781,7 @@ BTBMGSC::updateSinglePredictor(const BTBEntry &entry, bool actual_taken, const M #ifndef UNIT_TEST // Write trace record - if (enableDB) { + if (enableDB && (focusBranchPC == 0 || entry.pc == focusBranchPC)) { MgscTrace t; t.set(entry.pc, tage_pred_taken, pred.tage_conf_high, pred.tage_conf_mid, pred.tage_conf_low, @@ -784,7 +794,7 @@ BTBMGSC::updateSinglePredictor(const BTBEntry &entry, bool actual_taken, const M #endif // Only update tables if prediction was wrong or confidence was low - if (sc_pred_taken != actual_taken || abs(total_sum) < total_thres) { + if (sc_pred_taken != actual_taken || abs(total_sum) < (total_thres / 2)) { // get weight table index from startPC Addr weightTableIdx = getPcIndex(stream.startPC, weightTableIdxWidth); bool threshold_inc = (sc_pred_taken != actual_taken); diff --git a/src/cpu/pred/btb/btb_mgsc.hh b/src/cpu/pred/btb/btb_mgsc.hh index ee94023d1d..100fc639a4 100755 --- a/src/cpu/pred/btb/btb_mgsc.hh +++ b/src/cpu/pred/btb/btb_mgsc.hh @@ -351,6 +351,7 @@ class BTBMGSC : public TimedBaseBTBPredictor bool enablePTable; bool enableBiasTable; bool enablePCThreshold; + Addr focusBranchPC; // Folded history for index calculation std::vector indexBwFoldedHist; @@ -522,6 +523,7 @@ class BTBMGSC : public TimedBaseBTBPredictor static bool &enablePTable(BTBMGSC &mgsc) { return mgsc.enablePTable; } static bool &enableBiasTable(BTBMGSC &mgsc) { return mgsc.enableBiasTable; } static bool &enablePCThreshold(BTBMGSC &mgsc) { return mgsc.enablePCThreshold; } + static Addr &focusBranchPC(BTBMGSC &mgsc) { return mgsc.focusBranchPC; } static auto &bwTable(BTBMGSC &mgsc) { return mgsc.bwTable; } static auto &lTable(BTBMGSC &mgsc) { return mgsc.lTable; }
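The two clamps added in `btb_mgsc.cc` share one rationale: the SC sum only overrides the TAGE prediction when `abs(total_sum)` clears the confidence threshold, so a negative threshold passes every sum and the gate disappears. A minimal Python sketch of that degenerate case and the clamp; the helper names here are illustrative, not part of the gem5 API:

```python
# Illustrative model of the SC confidence gate patched above.
# use_sc / clamp_threshold are invented names for this sketch only.

def use_sc(total_sum: int, total_thres: int) -> bool:
    """SC overrides TAGE only when its summed vote is confident,
    i.e. |sum| exceeds the threshold."""
    return abs(total_sum) > total_thres

def clamp_threshold(update_threshold: int, p_update_thres: int) -> int:
    """Mirror of the patch: derive the gate from the adaptive counters,
    but never let it go negative, which would disable the gate."""
    return max(update_threshold // 8 + p_update_thres, 0)

# A sane positive threshold gates off weak (low-|sum|) SC votes.
assert use_sc(total_sum=2, total_thres=6) is False
assert use_sc(total_sum=9, total_thres=6) is True

# A negative threshold is degenerate: even a zero-confidence sum passes,
# so SC would override TAGE on every branch.
assert use_sc(total_sum=0, total_thres=-3) is True

# The clamp restores the weakest meaningful gate (threshold 0) instead.
assert clamp_threshold(update_threshold=-16, p_update_thres=0) == 0
```

This also explains the companion clamp in `updateGlobalThreshold`: once `updateThreshold` drifts negative, `total_thres` can go negative too, and SC is overused.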