Optimize math related build option #2462
Pull request overview
This PR optimizes math-related compiler flags for GNU compiler builds by simplifying floating-point behavior control. The change replaces four specific floating-point flags with two more comprehensive options.
Key Changes:
- Consolidated multiple floating-point flags into `-fno-fast-math` and `-fma` for simpler and more predictable floating-point behavior
- Maintained strict floating-point semantics while enabling fused multiply-add optimizations
@chunhuanMeng, I suppose CUDA also enables
Please help collect the performance data.
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-associative-math)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-approx-func)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -Wno-absolute-value)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-fast-math)
@chunhuanMeng, would it be better to enable `-ffp-contract=fast`?
`-ffp-contract=fast` enables floating-point contraction, allowing FMA formation, and `-fma` does the same.
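To make the contraction under discussion concrete, here is a small host-side C++ sketch (not taken from the PR) of what forming an FMA changes numerically: a fused multiply-add rounds once, while a separate multiply and add round twice. The function names and input values are illustrative assumptions, and the `volatile` intermediate is only there so the compiler cannot contract the unfused path on its own.

```cpp
#include <cassert>
#include <cmath>

// Multiply, then add, with two roundings. The volatile intermediate keeps the
// compiler from contracting the expression into an FMA by itself.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // rounded to double here
    return product + c;               // rounded again here
}

// Fused multiply-add: a * b + c with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hypothetical demo: inputs chosen so that the exact product, 1 - 2^-54,
// is not representable as a double.
void demo_contraction() {
    const double a = 1.0 + std::ldexp(1.0, -27);  // 1 + 2^-27
    const double b = 1.0 - std::ldexp(1.0, -27);  // 1 - 2^-27
    const double c = -1.0;

    // The product rounds to 1.0, so the unfused result is exactly zero.
    assert(mul_then_add(a, b, c) == 0.0);
    // The fused form keeps the low-order term and returns -2^-54.
    assert(fused_mul_add(a, b, c) == -std::ldexp(1.0, -54));
}
```

Calling `demo_contraction()` from a `main` should pass both assertions in a default (non-fast-math) build. The two results differ only in the last bits, which is the kind of difference a contraction setting can introduce in kernel results and one reason to check accuracy alongside performance.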
Before landing the PR, let's collect performance data.
No performance regression in the PR run: https://github.com/intel/torch-xpu-ops/actions/runs/20703251204
Performance outliers, please check!
Updates the SYCL kernel compilation flags in the `set_build_flags` macro to better control floating-point behavior and enable fused multiply-add (FMA) optimizations for both MSVC and GNU compilers.

Compiler flag changes for floating-point behavior and FMA:
- MSVC: `/Qfma` to enable FMA instructions, and `/Qftz-` to disable flush-to-zero mode.
- GNU: `-fno-fast-math` for strict floating-point compliance and `-fma` to enable FMA instructions.
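
As a rough illustration of what disabling flush-to-zero preserves, the host-side C++ sketch below (not part of the PR; the helper names are hypothetical) halves the smallest normal `float` and checks whether the result survives as a subnormal. It assumes a default host build without fast-math; behavior inside SYCL device kernels depends on the compiler and hardware.

```cpp
#include <cmath>
#include <limits>

// Halve the smallest normal float. volatile forces the division to happen at
// run time rather than being folded by the compiler.
float half_of_smallest_normal() {
    volatile float smallest_normal = std::numeric_limits<float>::min();
    return smallest_normal / 2.0f;
}

// True when the value was kept as a subnormal instead of being flushed to zero.
bool kept_as_subnormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

With flush-to-zero off, `kept_as_subnormal(half_of_smallest_normal())` should be true. With flush-to-zero on, the same division would typically return `0.0f`, silently discarding very small values.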