Add fltflt division and fltflt operator overloads #1121
Conversation
This PR includes several enhancements for float-float support:

- Add division support for the float-float (fltflt) type
- Add operator overloads for +, -, *, /, <, >, etc.
- Add a matx::sqrt() overload for the fltflt type and introduce a dispatch function for sqrt() that can use this function
- Update the sar_bp operator kernel to use the overloaded operator types
- Add extensive unit testing for the float-float functions, including checks on the number of effective mantissa bits

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
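As a rough illustration of what the new overloads enable (a sketch only: the constructors, header path, and namespace placement below are assumptions, not the PR's exact API), fltflt arithmetic that previously required free functions such as fltflt_add and fltflt_mul can now be written directly:

```cpp
// Hypothetical usage sketch. Assumes a float -> fltflt constructor and the
// operators described in the PR; exact names and headers may differ in MatX.
#include "matx/kernels/fltflt.h"

__MATX_HOST__ __MATX_DEVICE__ void fltflt_example(float x, float y) {
  matx::fltflt a(x);
  matx::fltflt b(y);

  matx::fltflt sum  = a + b;             // was fltflt_add(a, b)
  matx::fltflt prod = a * b;             // was fltflt_mul(a, b)
  matx::fltflt quot = a / b;             // division support added in this PR
  matx::fltflt root = matx::sqrt(prod);  // new sqrt overload with dispatch

  if (a < b) {                           // comparison overloads
    float  f = static_cast<float>(sum);  // conversion operators
    double d = static_cast<double>(sum);
    (void)f; (void)d;
  }
  (void)quot; (void)root;
}
```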
/build
Greptile Summary

Enhances float-float (fltflt) support with division operations, operator overloads, and sqrt functionality. The PR modernizes the fltflt API by adding C++ operator overloads (+, -, *, /, ==, !=, <, >, <=, >=) and conversion operators, allowing more natural arithmetic expressions.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User Code
    participant Operator as Operator Overload
    participant FltFlt as fltflt Function
    participant Detail as detail:: Helpers
    participant CUDA as CUDA Intrinsics
    User->>Operator: a + b (fltflt)
    Operator->>FltFlt: fltflt_add(a, b)
    FltFlt->>FltFlt: fltflt_two_sum(a.hi, b.hi)
    FltFlt->>Detail: fadd_rn(a, b)
    Detail->>CUDA: __fadd_rn (device) or a+b (host)
    CUDA-->>Detail: result
    Detail-->>FltFlt: sum and error
    FltFlt->>FltFlt: fltflt_fast_two_sum
    FltFlt-->>Operator: normalized fltflt
    Operator-->>User: result
    User->>Operator: a * b (fltflt)
    Operator->>FltFlt: fltflt_mul(a, b)
    FltFlt->>FltFlt: fltflt_two_prod_fma(a.hi, b.hi)
    FltFlt->>Detail: fmaf_rn(a, b, c)
    Detail->>CUDA: __fmaf_rn (device) or ::fmaf (host)
    CUDA-->>Detail: product
    Detail-->>FltFlt: error-free product
    FltFlt-->>Operator: result
    Operator-->>User: result
    User->>Operator: a / b (fltflt)
    Operator->>FltFlt: fltflt_div(a, b)
    FltFlt->>Detail: fdividef_rn(1.0f, b.hi)
    Detail->>CUDA: __fdividef (device) or a/b (host)
    CUDA-->>Detail: approximation
    FltFlt->>FltFlt: refine with Newton iteration
    FltFlt-->>Operator: high-precision result
    Operator-->>User: result
    User->>Operator: sqrt(a) (fltflt)
    Operator->>FltFlt: matx::sqrt → fltflt_sqrt
    FltFlt->>Detail: fltflt_rsqrt(a.hi)
    Detail->>CUDA: rsqrtf (device) or 1/sqrtf (host)
    CUDA-->>Detail: reciprocal sqrt
    FltFlt->>FltFlt: refine with correction
    FltFlt-->>Operator: high-precision sqrt
    Operator-->>User: result
```
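For context on the diagram's fltflt_two_sum and fltflt_two_prod_fma steps: these match the classic error-free transformations used in double-float arithmetic (Knuth's TwoSum and the FMA-based TwoProd). Below is a minimal host-only sketch of the technique; the PR's helpers route through the fadd_rn/fmaf_rn wrappers shown above rather than plain float operations, so details differ.

```cpp
#include <cmath>
#include <utility>

// Knuth TwoSum: s = fl(a + b) and e is the exact rounding error,
// so a + b == s + e holds exactly (barring overflow).
std::pair<float, float> two_sum(float a, float b) {
  float s  = a + b;
  float bb = s - a;                      // portion of b absorbed into s
  float e  = (a - (s - bb)) + (b - bb);  // recover what rounding dropped
  return {s, e};
}

// FMA-based TwoProd: p = fl(a * b) and e is the exact error term,
// recovered in one fused multiply-add: a * b == p + e exactly.
std::pair<float, float> two_prod_fma(float a, float b) {
  float p = a * b;
  float e = std::fmaf(a, b, -p);
  return {p, e};
}
```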
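The division and sqrt paths in the diagram follow the usual approximate-then-refine pattern: a fast hardware estimate of the reciprocal (or reciprocal square root) followed by a Newton step. A plain-float sketch of the refinement idea follows; the PR's fltflt_div and fltflt_sqrt operate on the hi/lo pair and compute the residual with error-free primitives, so this is illustrative only.

```cpp
#include <cmath>

// One Newton iteration for 1/b: y' = y * (2 - b*y) roughly doubles
// the number of correct bits in the estimate y.
float refined_reciprocal(float b) {
  float y = 1.0f / b;          // stands in for the fast __fdividef estimate
  return y * (2.0f - b * y);   // Newton refinement step
}

// One Newton iteration for 1/sqrt(a): y' = y * (1.5 - 0.5*a*y*y),
// after which sqrt(a) is approximately a * y'.
float refined_sqrt(float a) {
  float y = 1.0f / std::sqrt(a);      // stands in for the rsqrtf estimate
  y = y * (1.5f - 0.5f * a * y * y);  // Newton refinement step
  return a * y;
}
```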
6 files reviewed, 2 comments
include/matx/kernels/fltflt.h (Outdated)
```cpp
fltflt::operator double() const {
  return fltflt_to_double(*this);
}

fltflt::operator float() const {
  return fltflt_to_float(*this);
}
```
syntax: Conversion operator definitions missing __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ decorators
Suggested change:

```cpp
__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ fltflt::operator double() const {
  return fltflt_to_double(*this);
}

__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ fltflt::operator float() const {
  return fltflt_to_float(*this);
}
```
include/matx/kernels/sar_bp.cuh (Outdated)
```diff
  bin = static_cast<loose_compute_t>(
-     fltflt_to_float(fltflt_mul(diffR_ff, dr_inv_fltflt)) + bin_offset);
+     static_cast<loose_compute_t>(diffR_ff * dr_inv_fltflt) + bin_offset);
```
style: Redundant static_cast<loose_compute_t> - the inner cast already returns loose_compute_t
Suggested change:

```cpp
bin = static_cast<loose_compute_t>(diffR_ff * dr_inv_fltflt) + bin_offset;
```