
Conversation

chunhuanMeng (Contributor) commented on Dec 5, 2025:

Updates the SYCL kernel compilation flags in the set_build_flags macro to better control floating-point behavior and enable fused multiply-add (FMA) optimizations for both MSVC and GNU compilers.

Compiler flag changes for floating-point behavior and FMA:

  • For MSVC: Added /Qfma to enable FMA instructions, and /Qftz- to disable flush-to-zero mode.
  • For GNU: Replaced several fine-grained floating-point flags with -fno-fast-math for strict floating-point compliance and -fma to enable FMA instructions (see the sketch below).
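A minimal CMake sketch of what the updated branch in set_build_flags could look like; only the flag names come from the description above and the SYCL_KERNEL_OPTIONS variable from the reviewed diff further down, while the if(MSVC) guard and overall structure are assumptions, not the exact change:

# Hypothetical sketch: the MSVC/GNU split and its placement inside
# set_build_flags are assumed from the description, not copied from the diff.
if(MSVC)
  # Enable FMA instruction generation and keep denormals (disable flush-to-zero).
  set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} /Qfma /Qftz-)
else()
  # Strict floating-point semantics, with FMA contraction enabled separately.
  set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-fast-math -fma)
endif()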

Copilot AI review requested due to automatic review settings December 5, 2025 05:44
Copilot AI left a comment:

Pull request overview

This PR optimizes math-related compiler flags for GNU compiler builds by simplifying floating-point behavior control. The change replaces four specific floating-point flags with two more comprehensive options.

Key Changes:

  • Consolidated multiple floating-point flags into -fno-fast-math and -fma for simpler and more predictable floating-point behavior
  • Maintained strict floating-point semantics while enabling fused multiply-add optimizations


@chunhuanMeng added the windows_ci ("Only for Windows CI trigger") label on Dec 24, 2025
@chunhuanMeng changed the title from "Optimize mah related build option" to "Optimize math related build option" on Dec 25, 2025
EikanWang (Contributor) commented:
@chunhuanMeng , I suppose CUDA also enables FMA in the PyTorch build system, right?

chunhuanMeng (Author) replied:

> @chunhuanMeng , I suppose CUDA also enables FMA in the PyTorch build system, right?
Yes, CUDA enables FMA by default.
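For reference, a minimal sketch of how that default is usually expressed on the CUDA side, assuming nvcc's --fmad option (which defaults to true); the CUDA_NVCC_FLAGS list below is illustrative and not PyTorch's actual build code:

# nvcc fuses multiply-add by default (--fmad=true); listing it only makes the
# default explicit. Building with --fmad=false instead disables contraction,
# which helps when isolating numerical differences caused by FMA.
list(APPEND CUDA_NVCC_FLAGS --fmad=true)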

EikanWang (Contributor) commented:
Please help collect the performance data.

set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-associative-math)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-approx-func)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -Wno-absolute-value)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-fast-math)
A reviewer (Contributor) commented on these lines:

@chunhuanMeng , would it be better to enable -ffp-contract=fast?

chunhuanMeng (Author) replied:

-ffp-contract=fast enables floating-point contraction, allowing FMA formation; -fma has the same effect.
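A minimal sketch of the two interchangeable spellings in the GNU branch; which one the SYCL toolchain prefers is an assumption here, not stated in the thread:

# Either line allows FMA formation; append one of them, not both.
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fma)
# set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -ffp-contract=fast)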

EikanWang (Contributor) commented:

Before landing the PR, let's collect performance data.

@EikanWang requested a review from kdrozd-dev on January 5, 2026 12:52
@intel deleted 3 comments from the github-actions bot on Jan 7, 2026
mengfei25 (Contributor) commented:

No performance regression: across the 3 dynamo benchmark suites tested, overall eager is 1.000x and inductor is 1.000x compared with the main branch (2ce9db8).

PR run: https://github.com/intel/torch-xpu-ops/actions/runs/20703251204
main run: https://github.com/intel/torch-xpu-ops/actions/runs/20703257119

github-actions bot commented Jan 7, 2026

Performance outliers, please check!

  • 🔴 Ratio in [-1, 80%): likely a regression

Category                      Model     Target vs. Baseline [Eager]  Target vs. Baseline [Inductor]
torchbench_bfloat16_training  resnet18  0.882706                     0.748951
