Update MCU SoC post-P&R test data #44
Closed
github-actions[bot] wants to merge 11 commits into main
ed4b23f to bdd3e71
bdd3e71 to 8ec3348
This was referenced Feb 27, 2026
Add --timing-vcd flag that produces timing-accurate VCD output where signal transitions are offset from clock edges by their computed arrival times. The GPU kernel already computes per-gate arrival times for setup/hold checking; this feature writes them to global memory so the host can produce sub-cycle-accurate output.

Changes:
- GPU kernels (Metal/CUDA): write shared_writeout_arrival to global memory at arrival_state_offset when enabled
- FlattenedScriptV1: add timing_arrivals_enabled, arrival_state_offset fields; update effective_state_size() for 3-section layout
- vcd_io: add expand_states_for_arrivals(), split_arrival_states(), write_output_vcd_timed() with ps-to-timescale conversion
- loom CLI: wire --timing-vcd flag, SimParams.arrival_state_offset, and timed VCD writer dispatch

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
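The ps-to-timescale conversion mentioned above can be illustrated with a minimal Python sketch. It is not the vcd_io implementation: the function names, the round-to-nearest policy, and the 10,000 ps clock period are assumptions chosen only to show how an arrival time offsets a transition from its clock edge.

```python
def ps_to_ticks(time_ps: int, timescale_ps: int) -> int:
    """Convert picoseconds to VCD ticks (assumed policy: round to nearest tick)."""
    if timescale_ps <= 0:
        raise ValueError("timescale_ps must be positive")
    return (time_ps + timescale_ps // 2) // timescale_ps


def timed_timestamp(cycle: int, clock_period_ps: int,
                    arrival_ps: int, timescale_ps: int) -> int:
    """VCD timestamp of a transition launched by the clock edge of `cycle`.

    The transition is placed `arrival_ps` after the edge instead of on it.
    """
    edge_ps = cycle * clock_period_ps
    return ps_to_ticks(edge_ps + arrival_ps, timescale_ps)


# Hypothetical 10 ns clock; 1323 ps is the inv_chain arrival quoted in this PR.
print(timed_timestamp(cycle=2, clock_period_ps=10_000,
                      arrival_ps=1323, timescale_ps=1))   # 21323
print(timed_timestamp(cycle=2, clock_period_ps=10_000,
                      arrival_ps=1323, timescale_ps=10))  # 2132
```

With a 1 ps timescale the arrival is preserved exactly; coarser timescales lose sub-tick resolution, which is why the rounding policy matters.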
f43e281 to 7fdcde9
Add detailed section to Known Issues explaining why Loom only supports edge-triggered DFFs, why CVC's test suite can't be reused as reference tests (NAND-latch flip-flops), and what would be needed to add latch support (new DriverType, two-phase evaluation, GPU kernel changes).

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
- Change IdCode::from(0) to IdCode(0) for vcd_ng tuple struct API
- Make write_output_vcd_timed generic over W: Write for testability
- Remove writer.flush() calls (vcd_ng::Writer has no flush method)
- Add 8 comprehensive tests for expand/split/write timing arrivals

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
The Metal kernel uses a double-buffered read pattern where t4_5 holds the current stage's data while the next stage's data is pre-loaded. The gate_delay extraction was incorrectly placed AFTER the t4_5 overwrite, causing it to read the next stage's padding slot instead of the current one. For single-stage designs (like inv_chain), this read garbage/zeros.

Fix: extract gate_delay from t4_5.c4 before overwriting t4_5.

Also fix arrival tracking to add gate_delay even for pass-through positions (orb == 0xFFFFFFFF) across all hierarchy levels, since pass-throughs can represent physical cells (e.g., inverter chains) with accumulated delays.

Also fix load_timing_from_sdf to iterate all cell origins per AIG pin instead of only the first, enabling correct delay accumulation for inverter chains collapsed to a single AIG wire.

Verified: inv_chain test produces correct 1323ps arrival delay matching the analytical SDF sum (CLK→Q=350ps + 16 inverters=973ps).

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
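To make the "iterate all cell origins" fix concrete, here is a small Python sketch of per-pin delay accumulation. It is not load_timing_from_sdf: the CellOrigin type, the function names, and the way the chain is lumped into one entry are hypothetical. Only the 350 ps and 973 ps totals come from the commit message.

```python
from typing import NamedTuple, Sequence


class CellOrigin(NamedTuple):
    """Hypothetical record: one physical cell (or group) behind an AIG pin."""
    name: str
    delay_ps: int


def pin_delay_first_only(origins: Sequence[CellOrigin]) -> int:
    """Reads only the first origin, so later cells on the wire are ignored."""
    return origins[0].delay_ps if origins else 0


def pin_delay_all(origins: Sequence[CellOrigin]) -> int:
    """Accumulates the delay of every origin collapsed onto the pin."""
    return sum(o.delay_ps for o in origins)


# Totals from the commit message; the 16 inverters are lumped into one entry
# because their individual delays are not given.
origins = [
    CellOrigin("dff CLK->Q", 350),
    CellOrigin("16 inverters (lumped)", 973),
]

print(pin_delay_first_only(origins))  # 350
print(pin_delay_all(origins))         # 1323
```

Summing every origin reproduces the analytical 1323 ps figure, while stopping at the first origin under-counts the path.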
Suppress unused variable warnings (staged, num_srams, num_ios, num_dup, part_end) and remove dead assignments (offset before break, script_pi before break) that were cluttering build output.

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
- tb_cvc.v: CVC testbench with SDF annotation for inv_chain timing validation (expected total delay: 1323ps)
- inv_chain_stimulus.vcd: Input stimulus for timing VCD tests
- compare_vcd.py: VCD comparison script for Loom vs CVC output
- watchlist.json: Signal watchlist for timing_sim_cpu tracing
- CI workflow: CVC reference simulation job for automated validation

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
Dockerfile builds CVC (open-src-cvc) from source on linux/amd64 with gcc/binutils for its native code compilation. run_cvc.sh builds the image, runs the inv_chain testbench with SDF back-annotation, and compares against Loom's timing output.

Results: CVC reports 1235ps total delay vs Loom's 1323ps, an 88ps (7.1%) conservative overestimate. This is expected: Loom uses max(rise, fall) per cell since the GPU kernel processes 32 packed signals and cannot track per-signal transition direction. CVC tracks actual rise/fall transitions through the inverter chain.

The 88ps decomposes as:
  8 inverter stages × 10ps IOPATH rise/fall asymmetry = 80ps
  8 interconnect wires × 1ps rise/fall asymmetry = 8ps

Usage: bash tests/timing_test/cvc/run_cvc.sh

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
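The gap between the two simulators can be illustrated with a short Python sketch contrasting a max(rise, fall) model with a direction-aware one on a 16-inverter chain. The absolute per-stage delays (60 ps rise, 50 ps fall) are hypothetical; only the 10 ps asymmetry, the 1 ps wire asymmetry, and the 1235/1323 ps totals come from the results above.

```python
def conservative_delay(stages):
    """Loom-style model: charge max(rise, fall) for every stage."""
    return sum(max(rise, fall) for rise, fall in stages)


def direction_aware_delay(stages, input_rising):
    """CVC-style model: follow the actual edge direction through inverters."""
    total = 0
    rising = input_rising
    for rise, fall in stages:
        rising = not rising            # each inverter flips the edge
        total += rise if rising else fall
    return total


# Hypothetical chain: 16 inverters, output-rise 60 ps, output-fall 50 ps.
chain = [(60, 50)] * 16

gap = conservative_delay(chain) - direction_aware_delay(chain, input_rising=True)
print(gap)  # 80: half of the 16 stages take the faster edge, 8 x 10 ps

# Arithmetic from the reported results.
total_gap = 8 * 10 + 8 * 1
print(total_gap, 1323 - 1235)            # 88 88
print(round(100 * total_gap / 1235, 1))  # 7.1
```

Because the edge alternates at every inverter, only half of the stages take the faster transition, which is why the decomposition counts 8 stages rather than 16.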
Add detailed section to timing-simulation.md covering the three independent sources of timing overestimation:

1. max(rise, fall) per cell: GPU can't track transition direction across 32 packed signals (80ps / 6.5% for inv_chain)
2. max wire delay across multi-input pins: single wire delay per cell regardless of which input is critical (8ps for inv_chain)
3. max arrival across 32 packed signals per thread: mitigated by timing-aware bit packing (0ps for inv_chain, larger in practice)

Documents CVC reference validation: Loom 1323ps vs CVC 1235ps (88ps / 7.1% conservative overestimate) for the inv_chain design. Updates implementation phases to reflect completed GPU arrival tracking and timing-aware VCD output.

Co-developed-by: Claude Code v2.1.62 (claude-opus-4-6)
40 outputs at 5 logic depths (3, 5, 9, 13, 17) exercise Source 3 overestimation in timing-aware bit packing. CVC reference shows distinct arrival times per group (513ps to 1286ps), confirming the conservative timing model. Includes hand-crafted SDF, stimulus VCD, CVC testbench, and Docker runner script.

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
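Source 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not Loom's packing algorithm: it assumes each 32-signal word reports the worst arrival of its members, uses sorting by arrival as a stand-in for timing-aware packing, and the five arrival values are hypothetical round numbers rather than the CVC results.

```python
def packed_overestimate(arrivals_ps, width=32):
    """Total pessimism when each word of `width` signals reports its max arrival."""
    total = 0
    for start in range(0, len(arrivals_ps), width):
        word = arrivals_ps[start:start + width]
        worst = max(word)
        total += sum(worst - a for a in word)
    return total


# Hypothetical arrivals for 5 depth groups of 8 outputs each (40 outputs).
groups = [500, 700, 900, 1100, 1300]
interleaved = [g for _ in range(8) for g in groups]   # depths mixed in each word
timing_sorted = sorted(interleaved)                   # similar arrivals adjacent

print(packed_overestimate(interleaved))    # 16000
print(packed_overestimate(timing_sorted))  # 9600
```

Grouping signals with similar arrival times lowers the summed pessimism but does not remove it unless every word holds signals of a single depth.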
The previous fallback logic used `find | sort -r | head -1` which grabbed a pre-PnR SDF (step 08) alphabetically instead of the post-PnR SDF from STAPostPNR (step 51) that includes interconnect delays. Now explicitly searches for stapostpnr nom_tt SDF first.

Co-developed-by: Claude Code v2.1.44 (claude-opus-4-6)
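The "search for the post-PnR SDF first, then fall back" idea can be sketched as below. It is written in Python rather than the shell used by the actual script, the function name is hypothetical, and the example paths are invented for illustration; they are not the real librelane run layout.

```python
def pick_sdf(paths):
    """Prefer a stapostpnr nom_tt SDF; otherwise fall back to the last path
    in alphabetical order (the old `sort -r | head -1` behavior)."""
    sdfs = [p for p in paths if p.lower().endswith(".sdf")]
    preferred = [p for p in sdfs
                 if "stapostpnr" in p.lower() and "nom_tt" in p.lower()]
    if preferred:
        return sorted(preferred)[-1]
    return sorted(sdfs)[-1] if sdfs else None


# Invented paths, for illustration only.
candidates = [
    "run/08-pre-pnr/design.sdf",
    "run/51-stapostpnr/nom_tt/design.sdf",
    "run/60-other/z.sdf",
    "run/51-stapostpnr/nom_tt/report.txt",
]

print(pick_sdf(candidates))             # run/51-stapostpnr/nom_tt/design.sdf
print(sorted(candidates)[-1])           # run/60-other/z.sdf (plain reverse sort)
```

Matching on the step name and corner makes the choice independent of how the other run directories happen to sort.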
7fdcde9 to 6f2a2bd
Automated rebuild of tests/mcu_soc/data/ using librelane.

Trigger: workflow_dispatch

The mcu-soc-metal CI job will validate simulation.