[Build Speed] Dynamic type improvements #5739
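Reduces cumulative template instantiation time by 22%, as measured with clang's -ftime-trace.
Moves the definitions of isSame, ceildiv, max, fmax, min, and fmin from the header into the cpp file to reduce template instantiation cost. These functions use PolymorphicValue operators that trigger ForAllTypes recursion, so keeping their bodies in the header repeats that instantiation work in every translation unit that includes it.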
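The header-to-cpp move described above can be sketched as follows. This is a minimal single-file illustration, not the code from this PR: `sketch::Value` is a hypothetical stand-in for PolymorphicValue built on `std::variant` (C++17), and the signatures of `ceildiv` and `max` are assumptions. The point is only that the header keeps plain declarations, while the bodies that instantiate the variant machinery are compiled once.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <variant>

namespace sketch {

// Hypothetical stand-in for PolymorphicValue: holds an int64_t or a double.
struct Value {
  std::variant<int64_t, double> v;
  Value(int64_t i) : v(i) {}
  Value(double d) : v(d) {}
};

// ---- What would stay in the header: declarations only. ----
// Includers see no function bodies, so they instantiate no visitors.
Value ceildiv(const Value& a, const Value& b);
Value max(const Value& a, const Value& b);

// ---- What would move to the .cpp: definitions, compiled once. ----
namespace {

// True when both operands hold integers.
bool bothInt(const Value& a, const Value& b) {
  return std::holds_alternative<int64_t>(a.v) &&
         std::holds_alternative<int64_t>(b.v);
}

// Converts whichever alternative is held to double.
double toDouble(const Value& x) {
  return std::visit([](auto e) { return static_cast<double>(e); }, x.v);
}

}  // namespace

Value ceildiv(const Value& a, const Value& b) {
  if (bothInt(a, b)) {
    int64_t x = std::get<int64_t>(a.v);
    int64_t y = std::get<int64_t>(b.v);
    // Assumption of this sketch: non-negative dividend, positive divisor.
    assert(x >= 0 && y > 0);
    return Value((x + y - 1) / y);
  }
  // Mixed or floating operands: divide as doubles and round up.
  return Value(std::ceil(toDouble(a) / toDouble(b)));
}

Value max(const Value& a, const Value& b) {
  if (bothInt(a, b)) {
    int64_t x = std::get<int64_t>(a.v);
    int64_t y = std::get<int64_t>(b.v);
    return Value(x > y ? x : y);
  }
  // Mixed or floating operands: compare and return as double.
  double x = toDouble(a);
  double y = toDouble(b);
  return Value(x > y ? x : y);
}

}  // namespace sketch
```

In this sketch, a call such as `sketch::ceildiv(int64_t{7}, int64_t{2})` only needs the declaration at the call site; the `std::visit` and `std::get` instantiations happen once, where the definitions live.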
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review: Template instantiation |
Test failures
(High, 44) NCCL NVLS multicast binding errors in multi-device distributed tests (multidevice/* on dlcluster_viking_ci)

| Test Name | H100 (dist.) |
|---|---|
| tests.python.multidevice.test_communication.test_allgather | ❌ |
| tests.python.multidevice.test_communication.test_allgather_expanded_broadcast | ❌ |
| tests.python.multidevice.test_communication.test_allreduce | ❌ |
| tests.python.multidevice.test_communication.test_reduce_scatter | ❌ |
| tests.python.multidevice.test_communication.test_reduce_scatter_noncontiguous | ❌ |
| tests.python.multidevice.test_dtensor.test_column_parallel_linear | ❌ |
| tests.python.multidevice.test_dtensor.test_plus_one | ❌ |
| tests.python.multidevice.test_dtensor.test_row_parallel_linear | ❌ |
| tests.python.multidevice.test_expert_parallel.test_dispatch_and_combine | ❌ |
| tests.python.multidevice.test_matmul.test_column_parallel_grouped_mm | ❌ |

... with 34 more test failures omitted. Check internal logs.

(High, 1) NCCL invalid group usage in multidevice overlap test_overlap_allgather_matmul_shard_outermost

| Test Name | H100 (dist.) |
|---|---|
| tests.python.multidevice.test_overlap.test_overlap_allgather_matmul_shard_outermost[backend_type=CommunicatorBackend.cuda] | ❌ |