fix test welford #5719
base: main
Review updated until commit eec699a

Description
| Relevant files |
|---|
| Bug fix |
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review: Include dependencies |
Greptile Summary
Dynamically calculates buffer size thresholds for Welford reduction tests based on actual cluster configuration instead of using hardcoded values.
Key Changes:
This makes the test portable across different GPU architectures with varying cluster sizes (similar to PR #5717 for
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Test as Translate1Welford Test
participant Runtime as FusionExecutorCache
participant Scheduler as Welford Scheduler
participant Device as GPU Device Info
Note over Test: Test Setup
Test->>Test: Create fusion with Welford op
Note over Test: Small Inner Size (Translated)
Test->>Runtime: run_test(64)
Runtime->>Scheduler: Schedule welford reduction
Scheduler-->>Runtime: Translation applied (>2 exprs)
Runtime-->>Test: Returns runtime1
Test->>Test: Verify translation occurred
Note over Test: Large Inner Size (Not Translated)
Test->>Device: getMaxClusterSize()
Device-->>Test: sm_per_cluster
Test->>Device: deviceAvailableSharedMemoryBytes()
Device-->>Test: available_memory
Test->>Test: Calculate total_elements based on cluster config
Note over Test: If cluster size = 1: use smem_buffer_count<br/>Else: use regs_buffer_count * sm_per_cluster
Test->>Runtime: run_test(total_elements + 1024)
Runtime->>Scheduler: Schedule welford reduction
Scheduler-->>Runtime: No translation (WelfordOp preserved)
Runtime-->>Test: Returns runtime2
Test->>Test: Verify WelfordOp exists in segments
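
The threshold selection described in the diagram can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `DeviceInfo` struct, the `nonTranslatedInnerSize` helper, and the way `smem_buffer_count` and `regs_buffer_count` are derived (bytes available divided by element size) are all assumptions made for the example. Only the branch on cluster size and the `+ 1024` margin come from the diagram.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the values the test queries from the device.
struct DeviceInfo {
  int64_t sm_per_cluster;         // e.g. what getMaxClusterSize() returns
  int64_t smem_bytes_per_sm;      // e.g. deviceAvailableSharedMemoryBytes()
  int64_t register_bytes_per_sm;  // assumed register file capacity in bytes
};

// Returns an inner size just past the assumed persistent-buffer capacity,
// so that the scheduler is expected to leave the WelfordOp untranslated.
inline int64_t nonTranslatedInnerSize(
    const DeviceInfo& dev,
    int64_t bytes_per_element) {
  assert(dev.sm_per_cluster >= 1 && bytes_per_element > 0);

  // Assumption: buffer counts are simply capacity / element size.
  const int64_t smem_buffer_count = dev.smem_bytes_per_sm / bytes_per_element;
  const int64_t regs_buffer_count =
      dev.register_bytes_per_sm / bytes_per_element;

  // Cluster size 1: block reduction may use shared memory persistence.
  // Otherwise: cluster reduction uses register persistence on every SM
  // in the cluster.
  const int64_t total_elements = dev.sm_per_cluster == 1
      ? smem_buffer_count
      : regs_buffer_count * dev.sm_per_cluster;

  // Same margin as in the diagram: run_test(total_elements + 1024).
  return total_elements + 1024;
}
```

With this shape, a device that reports a larger cluster size automatically gets a larger test input, which is the portability property the summary describes.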
2 files reviewed, 1 comment
!test
7bc4c9e to e59425a
!test
1 file reviewed, 1 comment
!test
1 file reviewed, 1 comment
// size is larger than both register and shared memory size of a single SM.
// Context: cluster reduction only uses register persistence while block
// reduction may also use shared memory persistence.
const int64_t sm_per_cluster = scheduler_utils::getMaxClusterSize();
syntax: Add `#include <scheduler/utils.h>` at the top of the file (after line 14) to declare `scheduler_utils::getMaxClusterSize()`. Without this include, the code will not compile.
Reference: `tests/cpp/test_cluster.cpp` includes this header on line 17.
!test
Similar to #5717