
Conversation

@Chao1Han (Contributor) commented Jan 6, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 6, 2026 06:35
Copilot AI left a comment


Pull request overview

This PR adds test coverage for the Premul Sum operation in the XCCL distributed communication backend. Premul Sum multiplies each input element-wise by a factor before the reduction, fusing the two steps into a single collective for better performance (see the sketch after the list below).

  • Adds Premul Sum tests to test_allreduce_ops covering multiple data types (half, float, double) and factor types (scalar and tensor)
  • Includes commented-out test code for reduce operations with a note that Premul Sum is not supported for reduce ops
  • Adds Premul Sum testing to test_reduce_scatter_ops comparing results against separate multiplication and sum operations
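For orientation, here is a hedged sketch of the pattern such a test exercises. It assumes the `torch.distributed._make_nccl_premul_sum` helper (used by the analogous NCCL tests) also builds the premul-sum reduce op for XCCL, and that an XCCL process group is already initialized; the device string, sizes, and `rank` variable are illustrative, not taken from the PR.

```python
import torch
import torch.distributed as dist

# Sketch only: assumes dist.init_process_group("xccl", ...) has already run
# and `rank` holds this process's rank.
factor = 3.0
premul_sum = dist._make_nccl_premul_sum(factor)  # builds ReduceOp.PREMUL_SUM

x = torch.ones(4, device=f"xpu:{rank}", dtype=torch.float)
dist.all_reduce(x, op=premul_sum)  # fused: multiply by factor, then sum

# Reference: the same result computed as two separate steps.
ref = torch.ones(4, device=f"xpu:{rank}", dtype=torch.float) * factor
dist.all_reduce(ref, op=dist.ReduceOp.SUM)
assert torch.allclose(x, ref)
```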


Comment on lines +772 to +781
for factor in (3.0, torch.tensor([5.0], device=self.rank)):
    if isinstance(factor, torch.Tensor):
        factor_ref = factor.cpu().item()
    else:
        factor_ref = factor
    output = [t.float() for t in output]
    tensor_lists = [[t.float() for t in tl] for tl in tensor_lists]
    output_ref = [t.float() for t in output]
    tensor_lists_ref = [
        [t.float() * factor_ref for t in tl] for tl in tensor_lists
Copilot AI Jan 6, 2026

The variables output and tensor_lists are reassigned inside the loop body, so subsequent iterations operate on already-converted float tensors rather than the original inputs, which can cause errors or mask dtype-specific behavior. These conversions should either be moved outside the loop or use fresh copies for each iteration.

Suggested change
- for factor in (3.0, torch.tensor([5.0], device=self.rank)):
-     if isinstance(factor, torch.Tensor):
-         factor_ref = factor.cpu().item()
-     else:
-         factor_ref = factor
-     output = [t.float() for t in output]
-     tensor_lists = [[t.float() for t in tl] for tl in tensor_lists]
-     output_ref = [t.float() for t in output]
-     tensor_lists_ref = [
-         [t.float() * factor_ref for t in tl] for tl in tensor_lists
+ # Create baseline float tensors once, and clone them inside the loop to
+ # avoid accumulating mutations across iterations.
+ base_output = [t.float() for t in output]
+ base_tensor_lists = [[t.float() for t in tl] for tl in tensor_lists]
+ for factor in (3.0, torch.tensor([5.0], device=self.rank)):
+     if isinstance(factor, torch.Tensor):
+         factor_ref = factor.cpu().item()
+     else:
+         factor_ref = factor
+     # Fresh copies for this iteration
+     output = [t.clone() for t in base_output]
+     tensor_lists = [[t.clone() for t in tl] for tl in base_tensor_lists]
+     output_ref = [t.clone() for t in base_output]
+     tensor_lists_ref = [
+         [t.clone() * factor_ref for t in tl] for tl in base_tensor_lists
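
To see the pitfall in isolation, here is a minimal, self-contained sketch (hypothetical tensors, not the test's actual data) showing how rebinding the loop-carried names leaks converted tensors into later iterations:

```python
import torch

# Minimal illustration of the bug flagged above: after the first iteration,
# `output` holds float32 tensors, so the second iteration never exercises
# the original half-precision inputs.
output = [torch.ones(2, dtype=torch.half)]
for factor in (3.0, 5.0):
    output = [t.float() for t in output]
    print(output[0].dtype)  # torch.float32 on both iterations
```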

Comment on lines +773 to +776
if isinstance(factor, torch.Tensor):
    factor_ref = factor.cpu().item()
else:
    factor_ref = factor
Copilot AI Jan 6, 2026

This pattern of extracting a scalar reference from a factor (lines 773-776) is duplicated from earlier in the test file. Consider extracting it into a helper function to reduce duplication and improve maintainability; a possible shape is sketched below.
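
One possible helper, as a hedged sketch (the name `_premul_factor_to_scalar` is illustrative, not from the PR):

```python
import torch

def _premul_factor_to_scalar(factor):
    """Normalize a premul-sum factor (python scalar or single-element
    tensor) to a plain python number for the reference computation."""
    if isinstance(factor, torch.Tensor):
        return factor.cpu().item()
    return factor
```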
