diff --git a/.codex/skills/branch-predictability-triage/SKILL.md b/.codex/skills/branch-predictability-triage/SKILL.md
new file mode 100644
index 0000000000..aae2a994c8
--- /dev/null
+++ b/.codex/skills/branch-predictability-triage/SKILL.md
@@ -0,0 +1,355 @@
+---
+name: branch-predictability-triage
+description: "Analyzes the ELF / function / source-code semantics behind a branch PC and judges whether the branch is inherently hard to predict or the predictor simply failed to learn it. Applies to SPEC06 checkpoints, benchmark ELFs, topMispredictsByBranch.csv, and single branch-PC attribution."
+---
+
+# Branch Predictability Attribution Skill
+
+## When to use
+
+- You have one or more branch PCs and want to know whether they belong to:
+  - the benchmark proper
+  - runtime / toolchain (e.g. `libgcc`, `glibc`, `jemalloc`)
+  - `bbl` / kernel / high-address runtime
+- You want to map a branch PC, as far as possible, to:
+  - an ELF
+  - a function name
+  - a source file / code block
+- You want to judge whether a branch is more likely:
+  - semantically, inherently hard to predict
+  - structurally easy to predict, but the predictor failed to capture it
+- You want to analyze SPEC06 checkpoints, for example:
+  - new profile: `/nfs/home/share/checkpoints_profiles/.../checkpoint-0-0-0`
+  - old profile: `/nfs/share/zyy/spec06_rv64gcb_O3_20m_gcc12.2.0-intFpcOff-jeMalloc/...`
+
+## Goal
+
+Produce a conclusion oriented toward branch-prediction analysis, not just address translation. The final conclusion must answer at least:
+
+1. What class of code this branch PC belongs to.
+2. Whether it can be mapped to the benchmark's own source code.
+3. What the control-flow pattern of this branch is.
+4. Whether it is more like "inherently hard to predict" or "the predictor still has headroom".
+
+## Input priority
+
+Collect the following first:
+
+1. The list of branch PCs
+2. The checkpoint name or benchmark name
+3. The checkpoint root directory
+4. `topMispredictsByBranch.csv` / `topMisrateByBranch.csv`, if available
+5. The user's local benchmark source tree, if any
+
+If the user only gives PCs without a checkpoint name, a coarse classification by address range is still possible.
+
+## Address classification rules
+
+First decide whether the PC falls within the benchmark ELF's load range; do not jump straight to source-line lookup.
+
+### A. Low addresses inside the benchmark ELF `.text`
+
+Typical signs:
+
+- a static executable image starting around `0x10000`
+- `llvm-symbolizer` / `nm` resolve it to a benchmark function
+
+Strategy:
+
+- Treat it first as benchmark code, or as runtime/helper code statically linked into the benchmark
+
+### B. High addresses such as `0x8000xxxx`
+
+Typical signs:
+
+- not inside any LOAD segment of the benchmark ELF
+- more likely `bbl` / opensbi / kernel / some other runtime
+
+Strategy:
+
+- Do not force-resolve it with the benchmark ELF
+- State first that the address most likely does not belong to the benchmark proper
+- Without a matching runtime ELF, you can only stop at "non-benchmark code"
+
+## Common paths in the repository
+
+### New profiles
+
+- checkpoint root: `/nfs/home/share/checkpoints_profiles//checkpoint-0-0-0`
+- ELF directory: `/nfs/home/share/checkpoints_profiles//elf`
+
+Common mappings:
+
+- `gcc_typeck` / `gcc_scilab` / `gcc_expr2` / `gcc_200` -> `elf/gcc`
+- `perlbench_splitmail` / `perlbench_diffmail` -> `elf/perlbench`
+- `bzip2_*` -> `elf/bzip2`
+- `gobmk_*` -> `elf/gobmk`
+- `astar_*` -> `elf/astar`
+- `gamess_*` -> `elf/gamess`
+- `mcf` -> `elf/mcf`
+- `sjeng` -> `elf/sjeng`
+
+### Old profiles
+
+- root: `/nfs/share/zyy/spec06_rv64gcb_O3_20m_gcc12.2.0-intFpcOff-jeMalloc`
+- benchmark ELF: `elf/_base.riscv64-linux-gnu-gcc12.2.0`
+- run image: `bin/*-bbl-linux-spec.bin`
+
+Note:
+
+- `bin/*-bbl-linux-spec.bin` is usually not an ELF and cannot be fed to `addr2line` directly
+- what is actually usable for static semantic analysis is normally the benchmark ELF under `elf/`
+
+### Local source tree
+
+Common SPEC2006 source path:
+
+- `/nfs/home/yanyue/tools/cpu2006_analyze/benchspec/CPU2006`
+
+For example:
+
+- `400.perlbench/src`
+- `403.gcc/src`
+- `429.mcf/src`
+- `458.sjeng/src`
+
+## Recommended tools
+
+Prefer:
+
+- `file`
+- `readelf -S`
+- `readelf -Wl`
+- `nm -n`
+- `llvm-symbolizer`
+- `llvm-objdump -d --line-numbers --source`
+- `rg`
+
+When necessary:
+
+- `gdb -batch -ex 'info line *ADDR'`
+- `readelf --debug-dump=decodedline`
+
+Do not rely on the system `addr2line` by default: under some RISC-V + DWARF combinations it may only resolve function names and cannot reliably produce source lines.
+
+## Standard analysis flow
+
+### Step 1: confirm the ELF is usable
+
+First check (`<elf>` is a placeholder for the benchmark ELF path):
+
+```bash
+file <elf>
+readelf -S <elf> | rg 'debug|symtab|strtab'
+readelf -Wl <elf>
+```
+
+Goals:
+
+- confirm it is actually an ELF
+- whether it carries `debug_info`
+- what the code load-address range is
+
+### Step 2: decide whether the PC belongs to this ELF
+
+If the PC is clearly outside the LOAD segment range:
+
+- mark it directly as "not an address in this benchmark ELF"
+- do not keep fabricating a mapping
+
+### Step 3: get to function level first
+
+Obtain the function name first:
+
+```bash
+llvm-symbolizer --obj=<elf> 0xPC
+nm -n <elf> | rg '<nearby-symbol>'
+```
+
+If you can only reach the function name, do not stop there. Function-level location plus local sources is usually enough for semantic analysis.
+
+### Step 4: inspect the branch context inside the function
+
+```bash
+llvm-objdump -d --line-numbers --source \
+  --start-address=<start> \
+  --stop-address=<end> \
+  <elf>
+```
+
+Focus on:
+
+- the load / and / shift / compare feeding the comparison
+- whether the branch is:
+  - `beqz/bnez`
+  - `blt/bge`
+  - a loop back-edge
+  - an early-exit condition
+  - a "refresh the maximum" style select branch
+
+### Step 5: map to a local source block
+
+If the line table is not reliable enough:
+
+- search the local source tree for the function definition by name
+- then match the assembly semantics to a source block
+
+Example:
+
+```bash
+rg -n '^.*\bpush_slidE\b\s*\(' /nfs/home/yanyue/tools/cpu2006_analyze/benchspec/CPU2006/458.sjeng/src/*.c
+```
+
+The goal of this step is not to force "an exact line", but to locate:
+
+- which function
+- which `if/else/loop`
+- what its input dependencies are
+
+### Step 6: classify the branch
+
+Assign every branch to at least one of:
+
+- `loop-exit`
+- `guard / fastpath`
+- `predicate-result`
+- `max/min update`
+- `pointer/null/empty check`
+- `state-machine / parser / regex`
+- `runtime/helper`
+
+### Step 7: output the predictability verdict
+
+The conclusion must at least state:
+
+- whether the branch is more "structurally easy to predict" or "semantically hard to predict"
+- if the predictor performs poorly, what to suspect first:
+  - the predictor model / history modeling / aliasing / capacity
+  - or an input distribution that is inherently irregular
+
+## Predictability heuristics
+
+The following are default heuristics, not absolute rules.
+
+### Usually easier to predict
+
+- `for/while` loop-exit conditions
+- scanning until a boundary / empty value / sentinel
+- length lower-bound checks
+- null pointer / empty square / `npiece` / `frame` / null-checks
+- stable mode bits, e.g. `captures`, `mode`, `flag` unchanged for long stretches
+- heavily biased error paths / rare paths
+
+Typical behavior:
+
+- many consecutive taken, then a single not-taken
+- many consecutive not-taken, then a single taken
+- strong bias within a given phase
+
+If such a branch mispredicts heavily, it is worth suspecting:
+
+- the predictor failed to learn a simple structure
+- too many contexts aliased onto the same PC
+- table aliasing or capacity conflicts
+
+### Usually harder to predict
+
+- decisions driven by regex / parser / symbol-table / search state
+- "refresh the max/min" branches such as `if (value > best)`
+- classification/comparison on dynamically `load`-ed values
+- filter / predicate results that depend on the input's true/false distribution
+- heuristics that depend on multiple pieces of global state
+- match success/failure, table hit/miss, search-pruning hit/miss
+
+Typical behavior:
+
+- the same PC behaves very differently across phases
+- taken ratio close to the middle
+- the outcome depends strongly on input content or state-machine position
+
+If such a branch mispredicts heavily, it does not necessarily indicate a predictor problem; it may simply be semantically hard.
+
+## Typical case templates
+
+### Case A: sliding-piece move generation
+
+Similar to:
+
+- `board[target] == npiece`
+- `board[target] != frame`
+
+Verdict:
+
+- a classic scan-type branch
+- usually well-structured and relatively easy to predict
+- if prediction is poor, first suspect the predictor failed to learn the ray-length/phase pattern
+
+### Case B: "refresh the maximum" in search ordering
+
+Similar to:
+
+- `if (move_ordering[i] > best)`
+
+Verdict:
+
+- a data-dependent branch
+- depends on the move-ordering distribution
+- clearly harder than loop-exit
+- poor prediction is not necessarily a predictor bug
+
+### Case C: regex / match success or failure
+
+Similar to:
+
+- `if (!s) goto nope;`
+- `if (CALLREGEXEC(...))`
+
+Verdict:
+
+- depends strongly on input text, state, and match position
+- usually harder to predict than a length check
+
+## Suggested output format
+
+For each branch PC, output at least the following fields:
+
+- `pc`
+- `benchmark`
+- `elf`
+- `belongs_to`
+  - `benchmark`
+  - `runtime/toolchain`
+  - `bbl/high-address`
+- `function`
+- `source_candidate`
+- `semantic_pattern`
+- `predictability`
+  - `easy`
+  - `medium`
+  - `hard`
+- `why`
+- `tage_interpretation`
+  - `more_like_predictor_issue`
+  - `more_like_semantically_hard`
+  - `mixed`
+
+## Usage notes
+
+- Do not mistake "cannot resolve the exact line" for "cannot analyze at all".
+- For benchmark code, function-level location plus the local source block is usually enough for a predictability judgment.
+- For addresses like `0x8000xxxx`, rule out runtime/bbl before talking about source code.
+- For `libgcc`/`glibc` helpers, tell the user explicitly: this is not a branch in the benchmark's own algorithm.
+- If the user's goal is comparing predictor designs, prioritize:
+  - branches that should be structurally easy to predict but mispredict heavily
+- If the user's goal is explaining workload difficulty, prioritize:
+  - branches with strong semantic data dependence
+
+## Default conclusion style
+
+Prefer to answer with:
+
+1. What code the PC belongs to
+2. Roughly which source logic it corresponds to
+3. Which branch pattern it belongs to
+4. Why I judge it easy / hard
+5. Whether I attribute the result to the predictor or to workload semantics
diff --git a/.codex/skills/frontend-pmu-analysis/SKILL.md b/.codex/skills/frontend-pmu-analysis/SKILL.md
new file mode 100644
index 0000000000..605fbbe5e2
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/SKILL.md
@@ -0,0 +1,64 @@
+---
+name: frontend-pmu-analysis
+description: "BPU counter extraction and batch summarization only (machine-readable JSON/CSV). The config file only needs raw stats counter names."
+---
+
+# BPU Counter Analysis Skill (minimal)
+
+## When to use
+- You have finished gem5 runs and only want to batch-extract raw BPU-related counters.
+- You do not want complex derivations in the script, only raw values.
+- You need machine-readable results for downstream scripts/tables.
+
+## Core principles
+- Extract raw counters only; no formula derivation.
+- The config file lists counter names only.
+- No strict directory layout; `stats.txt` is discovered recursively.
+- If some branches mispredict especially often, also check `topMispredictsByBranch.csv` next to `stats.txt`; it records which branches are mispredicted the most.
+- When needed, use `--enable-bp-db tage` to enable TAGE trace-db analysis.
+
+## Entry script
+- `.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py`
+
+## Default config
+- `.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt`
+
+## Usage
+```bash
+python3 .codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py \
+  --debug-dir /tmp/debug/tage-new8
+```
+
+Specify a custom counter file:
+
+```bash
+python3 .codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --counters-file /path/to/my_counters.txt
+```
+
+## Output
+- `bpu_counters_summary.json`
+- `bpu_counters_summary.csv`
+
+Output fields only include:
+- case path
+- stats path
+- `values` (matched counters and their values)
+- `missing` (missing counters)
+- `errors` (parse errors)
+
+## Counter file format
+`txt` is recommended (one counter per line):
+
+```txt
+system.cpu.ipc
+system.cpu.commit.branchMispredicts
+system.cpu.commit.branches
+```
+
+To batch-analyze more counters, add them to the txt under `configs`.
+
+Also supported:
+- `yaml`: `counters: [ ... ]` or a plain list
+- `csv`: the first column, or a `counter` column, holds the counter names
diff --git a/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt b/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt
new file mode 100644
index 0000000000..eb17457b9a
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/configs/bpu_counters.txt
@@ -0,0 +1,28 @@
+# Keep only raw counter names. No formulas.
+system.cpu.ipc
+system.cpu.frontendBound
+system.cpu.badSpecBound
+system.cpu.backendBound
+system.cpu.commit.branchMispredicts
+system.cpu.commit.branches
+system.cpu.branchPred.condMiss
+system.cpu.branchPred.condNum
+system.cpu.branchPred.predsOfEachStage::0
+system.cpu.branchPred.predsOfEachStage::2
+system.cpu.branchPred.overrideCount
+system.cpu.branchPred.commitOverrideCount
+system.cpu.branchPred.tage.updateAllocSuccess
+system.cpu.branchPred.tage.updateAllocFailure
+system.cpu.branchPred.tage.updateBankConflict
+system.cpu.branchPred.tage.updateAccessPerBank::0
+system.cpu.branchPred.tage.updateAccessPerBank::1
+system.cpu.branchPred.tage.updateAccessPerBank::2
+system.cpu.branchPred.tage.updateAccessPerBank::3
+system.cpu.branchPred.ittage.commitPredCorrect
+system.cpu.branchPred.ittage.commitPredWrong
+system.cpu.branchPred.ubtb.predHit
+system.cpu.branchPred.ubtb.predMiss
+system.cpu.branchPred.abtb.predHit
+system.cpu.branchPred.abtb.predMiss
+system.cpu.branchPred.mbtb.predHit
+system.cpu.branchPred.mbtb.predMiss
diff --git a/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py b/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py
new file mode 100644
index 0000000000..36aaf6b29f
--- /dev/null
+++ b/.codex/skills/frontend-pmu-analysis/scripts/analyze_bpu_counters.py
@@ -0,0 +1,243 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+import concurrent.futures
+import csv
+import json
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+BEGIN = "---------- Begin Simulation Statistics ----------"
+END = "---------- End Simulation Statistics ----------"
+DEFAULT_COUNTERS = Path(__file__).resolve().parent.parent / "configs" / "bpu_counters.txt"
+
+
+@dataclass
+class CaseRecord:
+    case_path: str
+    stats_path: str
+    values: Dict[str, float]
+    missing: List[str]
+    errors: List[str]
+
+
+def now_iso() -> str:
+    return datetime.now(timezone.utc).isoformat()
+
+
+def parse_last_stats_block(path: Path) -> Dict[str, float]:
+    lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
+    blocks: List[List[str]] = []
+    in_block = False
+    current: List[str] = []
+
+    for line in lines:
+        stripped = line.strip()
+        if stripped == BEGIN:
+            in_block = True
+            current = []
+            continue
+        if stripped == END and in_block:
+            blocks.append(current)
+            in_block = False
+            continue
+        if in_block:
+            current.append(line)
+
+    target = blocks[-1] if blocks else lines
+    stats: Dict[str, float] = {}
+    for line in target:
+        if not line or line.startswith("-"):
+            continue
+        parts = line.split()
+        if len(parts) < 2:
+            continue
+        key, value = parts[0], parts[1]
+        try:
+            stats[key] = float(value)
+        except ValueError:
+            continue
+    return stats
+
+
+def load_counters(path: Path) -> List[str]:
+    suffix = path.suffix.lower()
+    if suffix in {".txt", ""}:
+        counters = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
+        counters = [c for c in counters if c and not c.startswith("#")]
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    if suffix in {".yml", ".yaml"}:
+        import yaml
+
+        payload = yaml.safe_load(path.read_text(encoding="utf-8"))
+        if isinstance(payload, list):
+            counters = [str(x).strip() for x in payload if str(x).strip()]
+        elif isinstance(payload, dict):
+            raw = payload.get("counters", [])
+            counters = [str(x).strip() for x in raw if str(x).strip()]
+        else:
+            raise ValueError("yaml must be list or object with counters")
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    if suffix == ".csv":
+        counters: List[str] = []
+        with path.open(encoding="utf-8", newline="") as fp:
+            reader = csv.DictReader(fp)
+            if reader.fieldnames is None:
+                raise ValueError(f"invalid csv with no header: {path}")
+            column = "counter" if "counter" in reader.fieldnames else reader.fieldnames[0]
+            for row in reader:
+                value = str(row.get(column, "")).strip()
+                if value:
+                    counters.append(value)
+        if not counters:
+            raise ValueError(f"no counters found in {path}")
+        return counters
+
+    raise ValueError("counter file must be .txt/.yml/.yaml/.csv")
+
+
+def analyze_one(stats_path: Path, debug_dir: Path, counters: List[str]) -> CaseRecord:
+    case_rel = stats_path.parent.relative_to(debug_dir)
+    record = CaseRecord(
+        case_path=str(case_rel),
+        stats_path=str(stats_path),
+        values={},
+        missing=[],
+        errors=[],
+    )
+
+    try:
+        stats = parse_last_stats_block(stats_path)
+    except Exception as exc:
+        record.errors.append(f"parse stats failed: {exc}")
+        return record
+
+    values: Dict[str, float] = {}
+    missing: List[str] = []
+    for counter in counters:
+        if counter in stats:
+            values[counter] = stats[counter]
+        else:
+            missing.append(counter)
+
+    record.values = values
+    record.missing = missing
+    return record
+
+
+def write_outputs(debug_dir: Path, counters_file: Path, counters: List[str],
+                  records: List[CaseRecord]) -> Tuple[Path, Path]:
+    summary_json = debug_dir / "bpu_counters_summary.json"
+    summary_csv = debug_dir / "bpu_counters_summary.csv"
+
+    payload = {
+        "generated_at": now_iso(),
+        "debug_dir": str(debug_dir),
+        "counters_file": str(counters_file),
+        "counters": counters,
+        "cases": [
+            {
+                "case_path": r.case_path,
+                "stats_path": r.stats_path,
+                "values": r.values,
+                "missing": r.missing,
+                "errors": r.errors,
+            }
+            for r in sorted(records, key=lambda x: x.case_path)
+        ],
+    }
+    summary_json.write_text(json.dumps(payload, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
+
+    headers = ["case_path", "stats_path", "missing_count", "error_count", *counters]
+    with summary_csv.open("w", encoding="utf-8", newline="") as fp:
+        writer = csv.DictWriter(fp, fieldnames=headers)
+        writer.writeheader()
+        for record in sorted(records, key=lambda x: x.case_path):
+            row = {
+                "case_path": record.case_path,
+                "stats_path": record.stats_path,
+                "missing_count": len(record.missing),
+                "error_count": len(record.errors),
+            }
+            for counter in counters:
+                row[counter] = record.values.get(counter, "")
+            writer.writerow(row)
+
+    return summary_json, summary_csv
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Extract raw BPU counters from gem5 stats.txt")
+    parser.add_argument("--debug-dir", type=str, required=True, help="Root directory to scan")
+    parser.add_argument(
+        "--counters-file",
+        type=str,
+        default=str(DEFAULT_COUNTERS),
+        help="Counter list file (.txt/.yml/.yaml/.csv)",
+    )
+    parser.add_argument(
+        "--stats-glob",
+        type=str,
+        default="**/stats.txt",
+        help="Glob under debug-dir to find stats files",
+    )
+    parser.add_argument("--max-workers", type=int, default=8)
+    return parser
+
+
+def main() -> int:
+    args = build_parser().parse_args()
+
+    debug_dir = Path(args.debug_dir).resolve()
+    counters_file = Path(args.counters_file).resolve()
+
+    if not debug_dir.exists():
+        raise FileNotFoundError(f"debug dir not found: {debug_dir}")
+    if not counters_file.is_file():
+        raise FileNotFoundError(f"counters file not found: {counters_file}")
+
+    counters = load_counters(counters_file)
+    stats_files = sorted(debug_dir.glob(args.stats_glob))
+
+    records: List[CaseRecord] = []
+    with concurrent.futures.ThreadPoolExecutor(max_workers=args.max_workers) as executor:
+        future_map = {
+            executor.submit(analyze_one, stats_path, debug_dir, counters): stats_path
+            for stats_path in stats_files
+            if stats_path.is_file()
+        }
+        for future in concurrent.futures.as_completed(future_map):
+            stats_path = future_map[future]
+            try:
+                records.append(future.result())
+            except Exception as exc:
+                case_rel = stats_path.parent.relative_to(debug_dir)
+                records.append(
+                    CaseRecord(
+                        case_path=str(case_rel),
+                        stats_path=str(stats_path),
+                        values={},
+                        missing=counters,
+                        errors=[f"unhandled analysis exception: {exc}"],
+                    )
+                )
+
+    summary_json, summary_csv = write_outputs(debug_dir, counters_file, counters, records)
+    print(f"wrote: {summary_json}")
+    print(f"wrote: {summary_csv}")
+    print(f"stats files: {len(records)}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/.codex/skills/mgsc-table-probe/SKILL.md b/.codex/skills/mgsc-table-probe/SKILL.md
new file mode 100644
index 0000000000..b9f7b2e0af
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/SKILL.md
@@ -0,0 +1,86 @@
+---
+name: mgsc-table-probe
+description: "Analyze the effect of XiangShan MGSC/SC on frontend micro-tests. Use for: (1) batch-running mgsc_test with A/B profiles such as off/l_only/g_only/i_only/full; (2) comparing topMispredictsByBranch.csv and stats.txt across profiles; (3) using MGSCTRACE in bp.db to attribute per-branch gains/losses to specific SC tables; (4) deciding how to design new tests for the Global or IMLI tables."
+---
+
+# MGSC Table Probe
+
+## Overview
+Runs standardized SC-table A/B experiments and produces branch-level attribution:
+- `summary.csv`: per-case summary of `off` versus each profile's delta.
+- `branch_delta.csv`: per-branch mispredict deltas, plus SC fix/hurt counts and per-table contribution ratios.
+- `report.md`: a sorted report for quick human reading.
+
+The goal of this skill is fast iteration on SC test quality, not full performance tuning.
+
+## Quick start
+
+1) Probe all `mgsc_test` binaries:
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe \
+  --profiles off,l_only,g_only,i_only,full \
+  --max-workers 4
+```
+
+2) Quickly check a single case:
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe_smoke \
+  --tests fp_sc_alias_pair \
+  --profiles off,g_only,i_only \
+  --max-workers 1
+```
+
+3) Rebuild reports only (no gem5 rerun):
+```bash
+python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+  --outdir debug/sc_table_probe \
+  --profiles off,l_only,g_only,i_only,full \
+  --skip-run
+```
+
+## Workflow
+
+1) **Baseline + single-table isolation profiles**
+- Always include `off`.
+- Add single-table profiles (e.g. `g_only`, `i_only`) before examining `full`.
+
+2) **Quickly screen valuable tests**
+- In `summary.csv`, prioritize cases with `condMiss_delta < 0` and `mgsc_net_use > 0`.
+
+3) **Screen valuable branches**
+- In `branch_delta.csv`, prioritize rows with:
+  - `delta_misp < 0`
+  - a high `focus_decisive_ratio`
+  - a high `focus_agree_fix_ratio`
+
+4) **Decide the next micro-test direction**
+- If `g_only` rarely helps and `focus_decisive_ratio(g)` is low, the global-history patterns are weak.
+- If `i_only` never helps, the loop/iteration phase signal is not exposed enough.
+- Consult the patterns in `references/test-patterns.md` when writing the next test.
+
+## Outputs
+
+- `debug/sc_table_probe/summary.csv`
+- `debug/sc_table_probe/branch_delta.csv`
+- `debug/sc_table_probe/report.md`
+- `debug/sc_table_probe/report.json`
+
+## Key options
+
+- `--profiles`: choose from `off,l_only,g_only,i_only,full`.
+- `--tests`: comma-separated test names (no suffix), e.g. `fp_sc_alias_pair,imli_iter`.
+- `--extra-param`: pass extra gem5 `--param` values through.
+- `--copy-cpt-to-tmp`: avoid path-access issues.
+- `--skip-run`: only generate reports, without running.
+
+## Notes
+
+- Keep `microtage` disabled for SC sub-table attribution unless you explicitly want to evaluate interaction effects.
+- Use the same set of checkpoints across profiles; otherwise the deltas are invalid.
+- For branch-PC mapping, use the `*-riscv64-xs.txt` disassembly files in the mgsc_test build directory.
+
+## References
+
+- See `references/test-patterns.md` for G/IMLI-oriented micro-test patterns.
diff --git a/.codex/skills/mgsc-table-probe/agents/openai.yaml b/.codex/skills/mgsc-table-probe/agents/openai.yaml
new file mode 100644
index 0000000000..1dc68f3ace
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "MGSC Table Probe"
+  short_description: "Batch analyze SC sub-table effects and branch-level fixes"
+  default_prompt: "Use this skill to run mgsc_test A/B profiles (off/l_only/g_only/i_only/full), parse topMispredictsByBranch.csv and MGSCTRACE, and identify which SC table fixes TAGE on which branch."
diff --git a/.codex/skills/mgsc-table-probe/references/test-patterns.md b/.codex/skills/mgsc-table-probe/references/test-patterns.md
new file mode 100644
index 0000000000..5bdeaa6ef2
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/references/test-patterns.md
@@ -0,0 +1,51 @@
+# SC Test Patterns (G / IMLI)
+
+## Goal
+Build micro-tests on which TAGE is weak, but one specific SC table can correct the branch direction.
+
+## GTable (global history) patterns
+
+Use these when the branch outcome depends on recent cross-branch results rather than a purely local periodic pattern.
+
+- Keep one target branch PC stable.
+- Add 2 to 4 feeder branches before the target branch.
+- Make the target's direction depend on feeder outcomes from earlier iterations.
+- Inject low-amplitude noise branches to deliberately lower TAGE's confidence.
+
+Example idea:
+- `b0` period 3, `b1` period 5, `b2` period 7.
+- Target: `t = last_b0 ^ last_b1 ^ (b2_now & 1)`.
+- Expectation: `g_only` may improve some PCs; `l_only` likely will not.
+
+## ITable (IMLI) patterns
+
+Use when the branch direction depends on the loop-iteration phase, especially on the backward-taken count.
+
+- Use a fixed trip count, e.g. 16/24/32.
+- Keep one in-loop branch at the same static PC.
+- Flip one specific phase branch near the loop tail or head.
+- Optionally alternate the outer phase to move the flip position.
+
+Example idea:
+- For each outer iteration:
+  - inner `i in [0, 31]`.
+  - the target branch is taken only at `i == 17` (or at `i == 17/18` by phase).
+- Expectation: `i_only` should help if the phase signal is exposed and stable.
+
+## Acceptance criteria for a new micro-test
+
+A test is valuable if all of the following hold:
+
+1. In `summary.csv`: the target profile shows `condMiss_delta < 0`.
+2. In `branch_delta.csv`: at least one hot branch shows `delta_misp < 0`.
+3. The same hot branch has a positive `net_use`.
+4. The target-table metrics on that branch are meaningful:
+   - `focus_decisive_ratio` must not be near 0.
+   - `focus_agree_fix_ratio` should be high enough.
+
+## Common failure modes
+
+- The branch is purely locally periodic -> LTable dominates; G/I effects are hard to demonstrate.
+- Too much randomness -> every table degrades; no stable gain.
+- The target branch is not hot enough -> too much statistical noise.
+- Multiple branch PCs mixed together -> attribution becomes blurry.
diff --git a/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py b/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py
new file mode 100644
index 0000000000..5e7af5eed4
--- /dev/null
+++ b/.codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py
@@ -0,0 +1,589 @@
+#!/usr/bin/env python3
+"""Probe SC table effectiveness on mgsc_test workloads.
+
+This script helps answer:
+1) Which existing micro-tests are sensitive to SC (vs SC off)?
+2) For a target table (e.g., G / IMLI), can that table alone improve mispredicts?
+3) For improved branches, does MGSCTRACE indicate SC is fixing TAGE mistakes?
+
+Typical usage:
+    python3 .codex/skills/mgsc-table-probe/scripts/mgsc_table_probe.py \
+        --outdir debug/sc_table_probe \
+        --profiles off,l_only,g_only,i_only,full \
+        --max-workers 4
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import dataclasses
+import json
+import shutil
+import sqlite3
+import subprocess
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from pathlib import Path
+from typing import Dict, Iterable, List, Optional, Tuple
+
+
+# .codex/skills/mgsc-table-probe/scripts/ -> repo root is four levels up.
+REPO_ROOT = Path(__file__).resolve().parents[4]
+DEFAULT_GEM5 = REPO_ROOT / "build" / "RISCV" / "gem5.opt"
+DEFAULT_CONFIG = REPO_ROOT / "configs" / "example" / "kmhv3.py"
+DEFAULT_CPT_DIR = Path("/nfs/home/yanyue/tools/nexus-am/tests/frontendtest/mgsc_test/build")
+DEFAULT_SRC_DIR = Path("/nfs/home/yanyue/tools/nexus-am/tests/frontendtest/mgsc_test/tests")
+
+TOP_CSV = "topMispredictsByBranch.csv"
+STATS_TXT = "stats.txt"
+BP_DB = "bp.db"
+
+TABLE_COLS = {
+    "bw": "bwPercsum",
+    "l": "lPercsum",
+    "i": "iPercsum",
+    "g": "gPercsum",
+    "p": "pPercsum",
+    "bias": "biasPercsum",
+}
+
+
+@dataclasses.dataclass(frozen=True)
+class Profile:
+    name: str
+    params: Tuple[str, ...]
+    focus_table: Optional[str]
+    enable_db: bool = True
+
+
+@dataclasses.dataclass
+class Case:
+    name: str
+    bin_path: Path
+    disasm_path: Optional[Path]
+    src_path: Optional[Path]
+
+
+@dataclasses.dataclass
+class RunResult:
+    case: Case
+    profile: Profile
+    run_dir: Path
+    ok: bool
+    cmd: List[str]
+    stats: Dict[str, float]
+    top: Dict[int, Dict[str, float]]
+    db_overall: Dict[str, float]
+    db_by_pc: Dict[int, Dict[str, float]]
+    error: str = ""
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="SC table probe harness")
+    parser.add_argument("--gem5-bin", default=str(DEFAULT_GEM5))
+    parser.add_argument("--config", default=str(DEFAULT_CONFIG))
+    parser.add_argument("--cpt-dir", default=str(DEFAULT_CPT_DIR))
+    parser.add_argument("--src-dir", default=str(DEFAULT_SRC_DIR))
+    parser.add_argument("--outdir", default=str(REPO_ROOT / "debug" / "sc_table_probe"))
+    parser.add_argument(
+        "--profiles",
+        default="off,l_only,g_only,i_only,full",
+        help="Comma separated profile names",
+    )
+    parser.add_argument("--tests", default="", help="Comma separated test names, empty means all")
+    parser.add_argument("--extra-param", action="append", default=[])
+    parser.add_argument("--max-workers", type=int, default=1)
+    parser.add_argument("--skip-run", action="store_true", help="Reuse existing outdir results")
+    parser.add_argument("--copy-cpt-to-tmp", action="store_true", default=True)
+    parser.add_argument("--no-copy-cpt-to-tmp", action="store_false", dest="copy_cpt_to_tmp")
+    parser.add_argument("--top-branch-limit", type=int, default=200)
+    return parser.parse_args()
+
+
+def builtin_profiles() -> Dict[str, Profile]:
+    return {
+        "off": Profile(
+            name="off",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table=None,
+            enable_db=False,
+        ),
+        "full": Profile(
+            name="full",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=True",
+                "system.cpu[0].branchPred.mgsc.enableLTable=True",
+                "system.cpu[0].branchPred.mgsc.enableITable=True",
+                "system.cpu[0].branchPred.mgsc.enableGTable=True",
+                "system.cpu[0].branchPred.mgsc.enablePTable=True",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=True",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table=None,
+        ),
+        "l_only": Profile(
+            name="l_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=True",
+                "system.cpu[0].branchPred.mgsc.enableITable=False",
+                "system.cpu[0].branchPred.mgsc.enableGTable=False",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="l",
+        ),
+        "g_only": Profile(
+            name="g_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=False",
+                "system.cpu[0].branchPred.mgsc.enableITable=False",
+                "system.cpu[0].branchPred.mgsc.enableGTable=True",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="g",
+        ),
+        "i_only": Profile(
+            name="i_only",
+            params=(
+                "system.cpu[0].branchPred.mgsc.enabled=True",
+                "system.cpu[0].branchPred.mgsc.enableBwTable=False",
+                "system.cpu[0].branchPred.mgsc.enableLTable=False",
+                "system.cpu[0].branchPred.mgsc.enableITable=True",
+                "system.cpu[0].branchPred.mgsc.enableGTable=False",
+                "system.cpu[0].branchPred.mgsc.enablePTable=False",
+                "system.cpu[0].branchPred.mgsc.enableBiasTable=False",
+                "system.cpu[0].branchPred.microtage.enabled=False",
+            ),
+            focus_table="i",
+        ),
+    }
+
+
+def parse_hex_or_int(v: str) -> int:
+    s = v.strip().lower()
+    if not s:
+        return 0
+    if s.startswith("0x"):
+        return int(s, 16)
+    if any(ch in "abcdef" for ch in s):
+        return int(s, 16)
+    return int(s, 10)
+
+
+def parse_stats(path: Path) -> Dict[str, float]:
+    keys = {
+        "system.cpu.ipc",
+        "system.cpu.fetch.rate",
+        "system.cpu.branchPred.condNum",
+        "system.cpu.branchPred.condMiss",
+        "system.cpu.commit.branchMispredicts",
+        "system.cpu.branchPred.mgsc.scUsed",
+        "system.cpu.branchPred.mgsc.scCorrectTageWrong",
+        "system.cpu.branchPred.mgsc.scWrongTageCorrect",
+        "simTicks",
+    }
+    out: Dict[str, float] = {}
+    if not path.exists():
+        return out
+    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
+        parts = line.split()
+        if len(parts) < 2:
+            continue
+        if parts[0] not in keys:
+            continue
+        try:
+            out[parts[0]] = float(parts[1])
+        except ValueError:
+            continue
+    return out
+
+
+def parse_top_csv(path: Path, limit: int) -> Dict[int, Dict[str, float]]:
+    out: Dict[int, Dict[str, float]] = {}
+    if not path.exists():
+        return out
+    with path.open(encoding="utf-8", newline="") as fp:
+        rows = list(csv.DictReader(fp))
+    for row in rows[:limit]:
+        try:
+            pc_text = (row.get("pc", "") or "").strip()
+            # topMispredictsByBranch.csv stores PC in hex form without "0x" prefix.
+            pc = int(pc_text, 16) if pc_text else 0
+            out[pc] = {
+                "mispredicts": float(row.get("mispredicts", 0)),
+                "total": float(row.get("total", 0)),
+                "misPermil": float(row.get("misPermil", 0)),
+                "dirMiss": float(row.get("dirMiss", 0)),
+            }
+        except (ValueError, TypeError):
+            continue
+    return out
+
+
+def pct(old: float, new: float) -> float:
+    if old == 0:
+        return 0.0
+    return (new - old) / old * 100.0
+
+
+def query_mgsc_db(db_path: Path) -> Tuple[Dict[str, float], Dict[int, Dict[str, float]]]:
+    if not db_path.exists():
+        return {}, {}
+    con = sqlite3.connect(str(db_path))
+    cur = con.cursor()
+    cur.execute("PRAGMA temp_store=MEMORY")
+
+    overall_row = cur.execute(
+        """
+        SELECT
+            COUNT(*) AS rows,
+            SUM(CASE WHEN useSc=1 THEN 1 ELSE 0 END) AS use_sc_rows,
+            SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken THEN 1 ELSE 0 END) AS fix_use,
+            SUM(CASE WHEN useSc=1 AND tagePred=actualTaken AND scPred!=actualTaken THEN 1 ELSE 0 END) AS hurt_use
+        FROM MGSCTRACE
+        """
+    ).fetchone()
+    overall = {
+        "rows": float(overall_row[0] or 0),
+        "use_sc_rows": float(overall_row[1] or 0),
+        "fix_use": float(overall_row[2] or 0),
+        "hurt_use": float(overall_row[3] or 0),
+    }
+    overall["net_use"] = overall["fix_use"] - overall["hurt_use"]
+
+    select_cols = [
+        "branchPC",
+        "COUNT(*) AS rows",
+        "SUM(CASE WHEN useSc=1 THEN 1 ELSE 0 END) AS use_sc",
+        "SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken THEN 1 ELSE 0 END) AS fix_use",
+        "SUM(CASE WHEN useSc=1 AND tagePred=actualTaken AND scPred!=actualTaken THEN 1 ELSE 0 END) AS hurt_use",
+    ]
+    for short, col in TABLE_COLS.items():
+        select_cols.append(
+            f"SUM(CASE WHEN useSc=1 AND ((totalSum>=0) != ((totalSum - {col})>=0)) THEN 1 ELSE 0 END) AS {short}_decisive"
+        )
+        select_cols.append(
+            f"SUM(CASE WHEN useSc=1 AND tagePred!=actualTaken AND scPred=actualTaken "
+            f"AND (({col}>=0)=actualTaken) THEN 1 ELSE 0 END) AS {short}_agree_fix"
+        )
+
+    rows = cur.execute(
+        f"""
+        SELECT {", ".join(select_cols)}
+        FROM MGSCTRACE
+        GROUP BY branchPC
+        """
+    ).fetchall()
+    con.close()
+
+    by_pc: Dict[int, Dict[str, float]] = {}
+    for row in rows:
+        idx = 0
+        pc = int(row[idx]); idx += 1
+        rows_cnt = float(row[idx] or 0); idx += 1
+        use_sc = float(row[idx] or 0); idx += 1
+        fix_use = float(row[idx] or 0); idx += 1
+        hurt_use = float(row[idx] or 0); idx += 1
+
+        ent: Dict[str, float] = {
+            "rows": rows_cnt,
+            "use_sc": use_sc,
+            "fix_use": fix_use,
+            "hurt_use": hurt_use,
+            "net_use": fix_use - hurt_use,
+        }
+        for short in TABLE_COLS:
+            decisive = float(row[idx] or 0); idx += 1
+            agree_fix = float(row[idx] or 0); idx += 1
+            ent[f"{short}_decisive"] = decisive
+            ent[f"{short}_agree_fix"] = agree_fix
+            ent[f"{short}_decisive_ratio"] = decisive / use_sc if use_sc else 0.0
+            ent[f"{short}_agree_fix_ratio"] = agree_fix / fix_use if fix_use else 0.0
+        by_pc[pc] = ent
+    return overall, by_pc
+
+
+def discover_cases(cpt_dir: Path, src_dir: Path, selected: Optional[Iterable[str]]) -> List[Case]:
+    allow = set(selected) if selected else None
+    cases: List[Case] = []
+    for bin_path in sorted(cpt_dir.glob("*-riscv64-xs.bin")):
+        stem = bin_path.name.replace("-riscv64-xs.bin", "")
+        if allow is not None and stem not in allow:
+            continue
+        disasm = cpt_dir / f"{stem}-riscv64-xs.txt"
+        src = src_dir / f"{stem}.c"
+        cases.append(
+            Case(
+                name=stem,
+                bin_path=bin_path,
+                disasm_path=disasm if disasm.exists() else None,
+                src_path=src if src.exists() else None,
+            )
+        )
+    return cases
+
+
+def maybe_copy_to_tmp(case: Case, run_dir: Path) -> Path:
+    tmp_path = Path("/tmp") / f"{case.name}-riscv64-xs.bin"
+    shutil.copy2(case.bin_path, tmp_path)
+    return tmp_path
+
+
+def run_one(
+    case: Case,
+    profile: Profile,
+    args: argparse.Namespace,
+    outdir: Path,
+) -> RunResult:
+    run_dir = outdir / profile.name / case.name
+    run_dir.mkdir(parents=True, exist_ok=True)
+    cmd = [
+        str(Path(args.gem5_bin)),
+        "--outdir",
+        str(run_dir),
+        str(Path(args.config)),
+        "--raw-cpt",
+    ]
+    cpt_path = maybe_copy_to_tmp(case, run_dir) if args.copy_cpt_to_tmp else case.bin_path
+    cmd.extend(["--generic-rv-cpt", str(cpt_path)])
+    if profile.enable_db:
+        cmd.extend(["--enable-bp-db", "mgsc"])
+    for p in profile.params:
+        cmd.extend(["--param", p])
+    for p in args.extra_param:
+        cmd.extend(["--param", p])
+
+    ok = True
+    err = ""
+    if not args.skip_run:
+        stdout = (run_dir / "gem5.stdout").open("w", encoding="utf-8")
+        stderr = (run_dir / "gem5.stderr").open("w", encoding="utf-8")
+        try:
+            proc = subprocess.run(cmd, stdout=stdout, stderr=stderr, text=True)
+            ok = proc.returncode == 0
+            if not ok:
+                err = f"returncode={proc.returncode}"
+        finally:
+            stdout.close()
+            stderr.close()
+    else:
+        ok = (run_dir / STATS_TXT).exists()
+        if not ok:
+            err = "skip-run but stats not found"
+
+    stats = parse_stats(run_dir / STATS_TXT)
+    top = parse_top_csv(run_dir / TOP_CSV, args.top_branch_limit)
+    db_overall, db_by_pc = query_mgsc_db(run_dir / BP_DB) if profile.enable_db else ({}, {})
+    return RunResult(
+        case=case,
+        profile=profile,
+        run_dir=run_dir,
+        ok=ok,
+        cmd=cmd,
+        stats=stats,
+        top=top,
+        db_overall=db_overall,
+        db_by_pc=db_by_pc,
+        error=err,
+    )
+
+
+def build_reports(results: List[RunResult], profiles: List[Profile], outdir: Path) -> None:
+    baseline: Dict[str, RunResult] = {}
+    for r in results:
+        if r.profile.name == "off":
+            baseline[r.case.name] = r
+
+    summary_rows: List[Dict[str, object]] = []
+    branch_rows: List[Dict[str, object]] = []
+
+    for r in results:
+        base = baseline.get(r.case.name)
+        off_cond_miss = base.stats.get("system.cpu.branchPred.condMiss", 0.0) if base else 0.0
+        on_cond_miss = r.stats.get("system.cpu.branchPred.condMiss", 0.0)
+        off_cond_num = base.stats.get("system.cpu.branchPred.condNum", 0.0) if base else 0.0
+        on_cond_num = r.stats.get("system.cpu.branchPred.condNum", 0.0)
+        off_rate = off_cond_miss / off_cond_num if off_cond_num else 0.0
+        on_rate = on_cond_miss / on_cond_num if on_cond_num else 0.0
+
+        summary_rows.append(
+            {
+                "case": r.case.name,
+                "profile": r.profile.name,
+                "ok": int(r.ok),
+                "off_condMiss": off_cond_miss,
+                "on_condMiss": on_cond_miss,
+                "condMiss_delta": on_cond_miss - off_cond_miss,
+                "off_condMissRate": off_rate,
+                "on_condMissRate": on_rate,
+                "condMissRate_delta_pct": pct(off_rate, on_rate),
+                "off_branchMisp": base.stats.get("system.cpu.commit.branchMispredicts", 0.0) if base else 0.0,
+                "on_branchMisp": r.stats.get("system.cpu.commit.branchMispredicts", 0.0),
+                "mgsc_fix_use": r.db_overall.get("fix_use", 0.0),
+                "mgsc_hurt_use": r.db_overall.get("hurt_use", 0.0),
+                "mgsc_net_use": r.db_overall.get("net_use", 0.0),
+                "source": str(r.case.src_path) if r.case.src_path else "",
+            }
+        )
+
+        if base is None or r.profile.name == "off":
+            continue
+        pcs = set(base.top.keys()) | set(r.top.keys())
+        for pc in sorted(pcs):
+            off = base.top.get(pc, {})
+            on = r.top.get(pc, {})
+            off_m = float(off.get("mispredicts", 0.0))
+            on_m = float(on.get("mispredicts", 0.0))
+            db = r.db_by_pc.get(pc, {})
+            row = {
+                "case": r.case.name,
+                "profile": r.profile.name,
+                "pc_hex": f"0x{pc:x}",
+                "off_misp": off_m,
+                "on_misp": on_m,
+                "delta_misp": on_m - off_m,
+                "off_total": float(off.get("total", 0.0)),
+                "on_total": float(on.get("total", 0.0)),
+                "fix_use": db.get("fix_use", 0.0),
+                "hurt_use": db.get("hurt_use", 0.0),
+                "net_use": db.get("net_use", 0.0),
+                "use_sc": db.get("use_sc", 0.0),
+            }
+            for short in TABLE_COLS:
+                row[f"{short}_decisive_ratio"] = db.get(f"{short}_decisive_ratio", 0.0)
+                row[f"{short}_agree_fix_ratio"] = db.get(f"{short}_agree_fix_ratio", 0.0)
+            if r.profile.focus_table:
+                focus = r.profile.focus_table
+                row["focus_table"] = focus
+                row["focus_decisive_ratio"] = row[f"{focus}_decisive_ratio"]
+                row["focus_agree_fix_ratio"] = row[f"{focus}_agree_fix_ratio"]
+            else:
+                row["focus_table"] = ""
+                row["focus_decisive_ratio"] = 0.0
+                row["focus_agree_fix_ratio"] = 0.0
+            branch_rows.append(row)
+
+    summary_csv = outdir / "summary.csv"
+ 
branch_csv = outdir / "branch_delta.csv" + write_csv(summary_csv, summary_rows) + write_csv(branch_csv, branch_rows) + + md_lines = render_markdown(summary_rows, branch_rows, profiles) + (outdir / "report.md").write_text("\n".join(md_lines), encoding="utf-8") + (outdir / "report.json").write_text( + json.dumps({"summary": summary_rows, "branch_delta": branch_rows}, indent=2), + encoding="utf-8", + ) + + +def write_csv(path: Path, rows: List[Dict[str, object]]) -> None: + if not rows: + path.write_text("", encoding="utf-8") + return + keys = list(rows[0].keys()) + with path.open("w", encoding="utf-8", newline="") as fp: + writer = csv.DictWriter(fp, fieldnames=keys) + writer.writeheader() + writer.writerows(rows) + + +def render_markdown( + summary_rows: List[Dict[str, object]], + branch_rows: List[Dict[str, object]], + profiles: List[Profile], +) -> List[str]: + lines: List[str] = [] + lines.append("# SC Table Probe Report") + lines.append("") + lines.append("## Profiles") + lines.append("") + for p in profiles: + focus = p.focus_table if p.focus_table else "-" + lines.append(f"- `{p.name}`: focus={focus}, db={'on' if p.enable_db else 'off'}") + lines.append("") + + lines.append("## Overall (sorted by condMiss reduction)") + lines.append("") + lines.append("| case | profile | off condMiss | on condMiss | delta | off rate | on rate | delta% | net_use |") + lines.append("| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |") + sorted_rows = sorted(summary_rows, key=lambda x: float(x.get("condMiss_delta", 0.0))) + for r in sorted_rows[:80]: + lines.append( + f"| {r['case']} | {r['profile']} | {r['off_condMiss']:.0f} | {r['on_condMiss']:.0f} | " + f"{r['condMiss_delta']:.0f} | {r['off_condMissRate']:.4f} | {r['on_condMissRate']:.4f} | " + f"{r['condMissRate_delta_pct']:+.2f}% | {r['mgsc_net_use']:.0f} |" + ) + lines.append("") + + lines.append("## G / I candidate branches (best improvements)") + lines.append("") + lines.append("| case | profile | pc | off 
misp | on misp | delta | net_use | focus decisive | focus agree_fix |") + lines.append("| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |") + focus_rows = [r for r in branch_rows if r.get("focus_table") in {"g", "i"} and r["off_misp"] >= 50] + focus_rows.sort(key=lambda x: float(x["delta_misp"])) + for r in focus_rows[:80]: + lines.append( + f"| {r['case']} | {r['profile']} | {r['pc_hex']} | {r['off_misp']:.0f} | {r['on_misp']:.0f} | " + f"{r['delta_misp']:.0f} | {r['net_use']:.0f} | {r['focus_decisive_ratio']:.3f} | " + f"{r['focus_agree_fix_ratio']:.3f} |" + ) + lines.append("") + lines.append("Interpretation tips:") + lines.append("- `delta<0` means SC profile improves that branch against `off`.") + lines.append("- High `focus_decisive_ratio` means the focus table often changes SC final sign.") + lines.append("- High `focus_agree_fix_ratio` means focus table sign aligns with real outcome on SC-fix events.") + return lines + + +def main() -> int: + args = parse_args() + outdir = Path(args.outdir) + outdir.mkdir(parents=True, exist_ok=True) + + builtins = builtin_profiles() + profile_names = [x.strip() for x in args.profiles.split(",") if x.strip()] + profiles: List[Profile] = [] + for name in profile_names: + if name not in builtins: + raise ValueError(f"Unknown profile: {name}. 
Choose from {sorted(builtins)}")
+        profiles.append(builtins[name])
+    if "off" not in {p.name for p in profiles}:
+        profiles.insert(0, builtins["off"])
+
+    selected = [x.strip() for x in args.tests.split(",") if x.strip()] or None
+    cases = discover_cases(Path(args.cpt_dir), Path(args.src_dir), selected)
+    if not cases:
+        print("No test cases found.")
+        return 1
+
+    tasks = [(case, profile) for case in cases for profile in profiles]
+    results: List[RunResult] = []
+    with ThreadPoolExecutor(max_workers=max(1, args.max_workers)) as ex:
+        futures = [
+            ex.submit(run_one, case=case, profile=profile, args=args, outdir=outdir)
+            for case, profile in tasks
+        ]
+        for fut in as_completed(futures):
+            res = fut.result()
+            results.append(res)
+            status = "OK" if res.ok else "FAIL"
+            print(f"[{status}] {res.profile.name}/{res.case.name}")
+
+    build_reports(results, profiles, outdir)
+    print(f"Report written to: {outdir / 'report.md'}")
+    print(f"CSV written to: {outdir / 'summary.csv'} and {outdir / 'branch_delta.csv'}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/.codex/skills/run-cpt-regression/SKILL.md b/.codex/skills/run-cpt-regression/SKILL.md
new file mode 100644
index 0000000000..a446c72afb
--- /dev/null
+++ b/.codex/skills/run-cpt-regression/SKILL.md
@@ -0,0 +1,53 @@
+---
+name: run-cpt-regression
+description: "Only batch-runs gem5 checkpoints (one or two runs per slice). Does no analysis."
+---
+
+# Batch CPT run skill (run only)
+
+## When to use
+- You only want to batch-run checkpoints / small tests.
+- You want the run step fully decoupled from analysis.
+
+## Core principles
+- This skill does **no analysis**.
+- It only produces run directories, `stats.txt`, `gem5.stdout`, and `gem5.stderr`.
+
+## Entry script
+- `.codex/skills/run-cpt-regression/scripts/run_cpt_back.py`
+
+## Typical usage
+Batch run (default: ref + opt):
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8
+```
+
+Run opt only (skip ref):
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --skip-ref
+```
+
+Run only selected slices:
+
+```bash
+python3 .codex/skills/run-cpt-regression/scripts/run_cpt_back.py \
+  --debug-dir /tmp/debug/tage-new8 \
+  --slices 2fetch coremark10
+```
+
+Run a single slice with extra parameters, using `-P`:
+```bash
+GCBV_REF_SO= \
+./build/RISCV/gem5.opt ./configs/example/kmhv3.py \
+  --raw-cpt \
+  --generic-rv-cpt= \
+  -P "system.cpu[0].branchPred.mgsc.enabled=True"
+```
+
+## Follow-up analysis
+Use the separate skill: `frontend-pmu-analysis`.
diff --git a/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py b/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py
new file mode 100755
index 0000000000..03639529da
--- /dev/null
+++ b/.codex/skills/run-cpt-regression/scripts/run_cpt_back.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import argparse
+import concurrent.futures
+import logging
+import os
+import subprocess
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, List
+
+REPO_ROOT = Path(__file__).resolve().parents[4]
+GEM5_BUILD_DIR = REPO_ROOT / "build" / "RISCV"
+KMHV3_CONFIG = REPO_ROOT / "configs" / "example" / "kmhv3.py"
+
+
+@dataclass
+class SimConfig:
+    binary: str
+    slice_name: str
+    checkpoint: str
+    outdir: Path
+    args: List[str]
+
+
+class GEM5Runner:
+    def __init__(self, max_workers: int, debug_dir: str, kmhv3_params: List[str], skip_ref: bool):
+        self.max_workers = max_workers
+        debug_path = Path(debug_dir)
+        if not debug_path.is_absolute():
+            debug_path = REPO_ROOT / debug_path
+        self.debug_dir = debug_path
+
+        self.kmhv3_params = kmhv3_params
+        self.skip_ref = skip_ref
+
+        logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+        self.logger = logging.getLogger(__name__)
+
+        self.slices: Dict[str, str] = {
+            "coremark10": "/nfs/home/share/gem5_ci/checkpoints/coremark-riscv64-xs.bin",
+        }
+        self.load_frontend_tests()
+
+    def load_frontend_tests(self) -> None:
+        am_home = os.environ.get("AM_HOME")
+        if not am_home:
+            self.logger.warning("AM_HOME is not set; skipping frontend test discovery")
+            return
+
+        base = 
Path(am_home) / "tests" / "frontendtest"
+        build_dirs = [
+            base / "build",
+            base / "br_target_test" / "build",
+            base / "cond_br_test" / "build",
+            base / "mgsc_test" / "build",
+        ]
+
+        discovered = 0
+        for build_dir in build_dirs:
+            if not build_dir.exists():
+                self.logger.warning("Frontend test directory not found: %s", build_dir)
+                continue
+            for binary in build_dir.glob("*-riscv64-xs.bin"):
+                name = binary.stem
+                suffix = "-riscv64-xs"
+                if name.endswith(suffix):
+                    name = name[: -len(suffix)]
+                if name not in self.slices:
+                    discovered += 1
+                    self.slices[name] = str(binary)
+
+        if discovered:
+            self.logger.info("Discovered %d frontend tests via AM_HOME", discovered)
+
+    def generate_configs(self) -> List[SimConfig]:
+        configs: List[SimConfig] = []
+        for slice_name, checkpoint in self.slices.items():
+            if not self.skip_ref:
+                configs.append(
+                    SimConfig(
+                        binary="gem5.opt.ref",
+                        slice_name=slice_name,
+                        checkpoint=checkpoint,
+                        outdir=self.debug_dir / f"{slice_name}_ref",
+                        args=[],
+                    )
+                )
+            configs.append(
+                SimConfig(
+                    binary="gem5.opt",
+                    slice_name=slice_name,
+                    checkpoint=checkpoint,
+                    outdir=self.debug_dir / f"{slice_name}_opt",
+                    args=[],
+                )
+            )
+        return configs
+
+    def run_single(self, config: SimConfig) -> bool:
+        config.outdir.mkdir(parents=True, exist_ok=True)
+        stdout_file = config.outdir / "gem5.stdout"
+        stderr_file = config.outdir / "gem5.stderr"
+
+        cmd: List[str] = [
+            str(GEM5_BUILD_DIR / config.binary),
+            "--outdir",
+            str(config.outdir),
+            str(KMHV3_CONFIG),
+            "--generic-rv-cpt",
+            str(config.checkpoint),
+            "--raw-cpt",
+            *config.args,
+        ]
+        for param in self.kmhv3_params:
+            cmd.extend(["-P", param])
+
+        self.logger.info("Run %s with %s", config.slice_name, config.binary)
+        with stdout_file.open("w", encoding="utf-8") as out, stderr_file.open("w", encoding="utf-8") as err:
+            proc = subprocess.run(cmd, stdout=out, stderr=err, text=True)
+
+        if proc.returncode == 0:
+            return True
+
+        err_text = stderr_file.read_text(encoding="utf-8", 
errors="ignore").strip() + self.logger.error("Simulation failed: %s %s: %s", config.slice_name, config.binary, err_text) + return False + + def run_all(self) -> int: + configs = self.generate_configs() + success = 0 + fail = 0 + + with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor: + future_map = {executor.submit(self.run_single, cfg): cfg for cfg in configs} + for future in concurrent.futures.as_completed(future_map): + cfg = future_map[future] + try: + if future.result(): + success += 1 + else: + fail += 1 + except Exception as exc: + fail += 1 + self.logger.error("Unhandled simulation exception on %s: %s", cfg.slice_name, exc) + + self.logger.info("Simulation done. success=%d failed=%d", success, fail) + return 0 if fail == 0 else 1 + + +def build_parser() -> argparse.ArgumentParser: + parser = argparse.ArgumentParser(description="Run gem5 checkpoint batch only (no analysis)") + parser.add_argument("--max-workers", type=int, default=64) + parser.add_argument("--debug-dir", type=str, default="debug/test1") + parser.add_argument("--slices", type=str, nargs="+", help="Run only selected slices") + parser.add_argument("--skip-ref", action="store_true", help="Skip gem5.opt.ref runs") + parser.add_argument("--param", action="append", default=[], help="Repeatable kmhv3 -P argument") + return parser + + +def main() -> int: + args = build_parser().parse_args() + + runner = GEM5Runner( + max_workers=args.max_workers, + debug_dir=args.debug_dir, + kmhv3_params=args.param, + skip_ref=args.skip_ref, + ) + + if args.slices: + runner.slices = {k: v for k, v in runner.slices.items() if k in args.slices} + if not runner.slices: + runner.logger.error("No valid slices specified") + return 1 + + return runner.run_all() + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/configs/common/xiangshan.py b/configs/common/xiangshan.py index 1190d5486d..ab0f5c6eef 100644 --- a/configs/common/xiangshan.py +++ b/configs/common/xiangshan.py 
@@ -369,7 +369,7 @@ def build_xiangshan_system(args): enable_bp_db = len(args.enable_bp_db) > 1 if enable_bp_db: - bp_db_switches = args.enable_bp_db[1] + ['basic'] + bp_db_switches = list(args.enable_bp_db[1]) print("BP db switches:", bp_db_switches) else: bp_db_switches = [] diff --git a/configs/example/kmhv3.py b/configs/example/kmhv3.py index 71844d9478..9c75771831 100644 --- a/configs/example/kmhv3.py +++ b/configs/example/kmhv3.py @@ -105,9 +105,17 @@ def setKmhV3Params(args, system): cpu.branchPred.mbtb.enabled = True cpu.branchPred.tage.enabled = True cpu.branchPred.ittage.enabled = True - cpu.branchPred.mgsc.enabled = False + cpu.branchPred.mgsc.enabled = True cpu.branchPred.ras.enabled = True + # RTL alignment: only enable bias + path + IMLI tables, disable PC threshold + cpu.branchPred.mgsc.enableBwTable = False + cpu.branchPred.mgsc.enableLTable = False + cpu.branchPred.mgsc.enableITable = True + cpu.branchPred.mgsc.enableGTable = False + cpu.branchPred.mgsc.enablePTable = True + cpu.branchPred.mgsc.enableBiasTable = True + # l1 cache per core if args.caches: cpu.icache.size = '64kB' diff --git a/src/cpu/pred/BranchPredictor.py b/src/cpu/pred/BranchPredictor.py index a5132f48af..6de883fa63 100644 --- a/src/cpu/pred/BranchPredictor.py +++ b/src/cpu/pred/BranchPredictor.py @@ -1057,7 +1057,7 @@ class BTBTAGE(TimedBaseBTBPredictor): useAltOnNaSize = Param.Unsigned(128, "Size of the useAltOnNa table") useAltOnNaWidth = Param.Unsigned(7, "Width of the useAltOnNa table") numBanks = Param.Unsigned(4, "Number of banks for bank conflict simulation") - enableBankConflict = Param.Bool(True, "Enable bank conflict simulation") + enableBankConflict = Param.Bool(False, "Enable bank conflict simulation") numDelay = 2 class MicroTAGE(BTBTAGE): @@ -1152,6 +1152,7 @@ class BTBMGSC(TimedBaseBTBPredictor): enablePTable = Param.Bool(True, "Enable P (path) table") enableBiasTable = Param.Bool(True, "Enable Bias table") enablePCThreshold = Param.Bool(False, "Enable PC-indexed 
threshold table") + focusBranchPC = Param.Addr(0, "Only write MGSCTRACE for this branch PC when non-zero") numDelay = 2 diff --git a/src/cpu/pred/btb/btb_mgsc.cc b/src/cpu/pred/btb/btb_mgsc.cc index 0211ad7eaa..9011dbbce6 100755 --- a/src/cpu/pred/btb/btb_mgsc.cc +++ b/src/cpu/pred/btb/btb_mgsc.cc @@ -157,6 +157,7 @@ BTBMGSC::BTBMGSC() enablePTable(true), enableBiasTable(true), enablePCThreshold(false), + focusBranchPC(0), mgscStats() { // Test-only small config: keep tables tiny and deterministic for fast unit tests. @@ -204,6 +205,7 @@ BTBMGSC::BTBMGSC(const Params &p) enablePTable(p.enablePTable), enableBiasTable(p.enableBiasTable), enablePCThreshold(p.enablePCThreshold), + focusBranchPC(p.focusBranchPC), mgscStats(this) { DPRINTF(MGSC, "BTBMGSC constructor\n"); @@ -413,6 +415,9 @@ BTBMGSC::generateSinglePrediction(const BTBEntry &btb_entry, const Addr &startPC int p_update_thres = enablePCThreshold ? findThreshold(pUpdateThreshold, btb_entry.pc) : 0; int total_thres = (updateThreshold / 8) + p_update_thres; + // Threshold is used as a confidence gate; avoid negative values which + // effectively disable the gate (abs(sum) > negative is almost always true). + total_thres = std::max(total_thres, 0); bool use_sc_pred = forceUseSC; // Force use SC if configured if (!use_sc_pred) { @@ -656,6 +661,11 @@ void BTBMGSC::updateGlobalThreshold(Addr pc, bool update_direction) { updateCounter(update_direction, updateThresholdWidth, updateThreshold); + // Keep global threshold non-negative; negative thresholds make SC gating + // degenerate and can cause overuse of SC. 
+ if (updateThreshold < 0) { + updateThreshold = 0; + } } void @@ -771,7 +781,7 @@ BTBMGSC::updateSinglePredictor(const BTBEntry &entry, bool actual_taken, const M #ifndef UNIT_TEST // Write trace record - if (enableDB) { + if (enableDB && (focusBranchPC == 0 || entry.pc == focusBranchPC)) { MgscTrace t; t.set(entry.pc, tage_pred_taken, pred.tage_conf_high, pred.tage_conf_mid, pred.tage_conf_low, @@ -784,7 +794,7 @@ BTBMGSC::updateSinglePredictor(const BTBEntry &entry, bool actual_taken, const M #endif // Only update tables if prediction was wrong or confidence was low - if (sc_pred_taken != actual_taken || abs(total_sum) < total_thres) { + if (sc_pred_taken != actual_taken || abs(total_sum) < (total_thres / 2)) { // get weight table index from startPC Addr weightTableIdx = getPcIndex(stream.startPC, weightTableIdxWidth); bool threshold_inc = (sc_pred_taken != actual_taken); diff --git a/src/cpu/pred/btb/btb_mgsc.hh b/src/cpu/pred/btb/btb_mgsc.hh index ee94023d1d..100fc639a4 100755 --- a/src/cpu/pred/btb/btb_mgsc.hh +++ b/src/cpu/pred/btb/btb_mgsc.hh @@ -351,6 +351,7 @@ class BTBMGSC : public TimedBaseBTBPredictor bool enablePTable; bool enableBiasTable; bool enablePCThreshold; + Addr focusBranchPC; // Folded history for index calculation std::vector indexBwFoldedHist; @@ -522,6 +523,7 @@ class BTBMGSC : public TimedBaseBTBPredictor static bool &enablePTable(BTBMGSC &mgsc) { return mgsc.enablePTable; } static bool &enableBiasTable(BTBMGSC &mgsc) { return mgsc.enableBiasTable; } static bool &enablePCThreshold(BTBMGSC &mgsc) { return mgsc.enablePCThreshold; } + static Addr &focusBranchPC(BTBMGSC &mgsc) { return mgsc.focusBranchPC; } static auto &bwTable(BTBMGSC &mgsc) { return mgsc.bwTable; } static auto &lTable(BTBMGSC &mgsc) { return mgsc.lTable; }
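The two clamps added in `btb_mgsc.cc` share one rationale: the SC sum only overrides the TAGE prediction when `abs(total_sum)` clears the confidence threshold, so a negative threshold passes every sum and the gate disappears. A minimal Python sketch of that degenerate case and the clamp; the helper names here are illustrative, not part of the gem5 API:

```python
# Illustrative model of the SC confidence gate patched above.
# use_sc / clamp_threshold are invented names for this sketch only.

def use_sc(total_sum: int, total_thres: int) -> bool:
    """SC overrides TAGE only when its summed vote is confident,
    i.e. |sum| exceeds the threshold."""
    return abs(total_sum) > total_thres

def clamp_threshold(update_threshold: int, p_update_thres: int) -> int:
    """Mirror of the patch: derive the gate from the adaptive counters,
    but never let it go negative, which would disable the gate."""
    return max(update_threshold // 8 + p_update_thres, 0)

# A sane positive threshold gates off weak (low-|sum|) SC votes.
assert use_sc(total_sum=2, total_thres=6) is False
assert use_sc(total_sum=9, total_thres=6) is True

# A negative threshold is degenerate: even a zero-confidence sum passes,
# so SC would override TAGE on every branch.
assert use_sc(total_sum=0, total_thres=-3) is True

# The clamp restores the weakest meaningful gate (threshold 0) instead.
assert clamp_threshold(update_threshold=-16, p_update_thres=0) == 0
```

This also explains the companion clamp in `updateGlobalThreshold`: once `updateThreshold` drifts negative, `total_thres` can go negative too, and SC is overused.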