
Conversation

@tbensonatl
Collaborator

This PR includes several enhancements for float-float support:

  • Add division support for the float-float (fltflt) type
  • Add operator overloads for +, -, *, /, <, >, etc. (a minimal sketch of how the overloads fit together follows this list)
  • Add a matx::sqrt() overload for the fltflt type and introduce a sqrt() dispatch function that can invoke it
  • Update the sar_bp operator kernel to use the overloaded operators
  • Add extensive unit testing for the float-float functions, including checks on the number of effective mantissa bits.
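
To make the mechanics concrete, here is a minimal sketch of the float-float representation and the two-sum-based addition path that the operator overloads delegate to. This is illustrative plain C++ under assumed signatures, not the PR's code; the real functions live in include/matx/kernels/fltflt.h and carry the __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ decorators.

struct fltflt {
  float hi;  // leading component
  float lo;  // trailing rounding-error term, |lo| <= 0.5 ulp(hi)
};

// Knuth two-sum: returns the rounded sum of two floats plus the exact
// rounding error, with no assumption on the operands' magnitudes.
inline fltflt fltflt_two_sum(float a, float b) {
  const float s  = a + b;
  const float bb = s - a;
  const float e  = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// Dekker fast two-sum: cheaper, but only valid when |a| >= |b|;
// used here to renormalize the result.
inline fltflt fltflt_fast_two_sum(float a, float b) {
  const float s = a + b;
  return {s, b - (s - a)};
}

inline fltflt fltflt_add(const fltflt &a, const fltflt &b) {
  fltflt s = fltflt_two_sum(a.hi, b.hi);
  return fltflt_fast_two_sum(s.hi, s.lo + a.lo + b.lo);
}

// The new operator overloads simply delegate, so call sites can write
// a + b instead of fltflt_add(a, b):
inline fltflt operator+(const fltflt &a, const fltflt &b) {
  return fltflt_add(a, b);
}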


Signed-off-by: Thomas Benson <tbenson@nvidia.com>
@tbensonatl tbensonatl self-assigned this Jan 17, 2026
@copy-pr-bot

copy-pr-bot bot commented Jan 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tbensonatl
Collaborator Author

/build

@greptile-apps

greptile-apps bot commented Jan 17, 2026

Greptile Summary

Enhances float-float (fltflt) support with division operations, operator overloads, and sqrt functionality. The PR modernizes the fltflt API by adding C++ operator overloads (+, -, *, /, ==, !=, <, >, <=, >=) and conversion operators, allowing more natural arithmetic expressions. The sar_bp.cuh kernel now uses these overloaded operators instead of explicit function calls, improving code readability. A new matx::sqrt overload enables the unary operator dispatch system to work with fltflt types through ADL. Comprehensive unit tests validate >=44 mantissa bits of precision.
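
On the precision claim: one common way to quantify "effective mantissa bits" is to compare a float-float result against a double-precision reference and take -log2 of the relative error. A hedged sketch of that kind of check follows (illustrative only; the PR's actual tests are in test/00_misc/FloatFloatTests.cu, and the helper name here is hypothetical):

#include <cmath>

// Effective mantissa bits of an fltflt result vs. a double reference:
// rel_err ~= 2^(-bits)  =>  bits = -log2(rel_err).
inline double effective_mantissa_bits(fltflt got, double ref) {
  const double val = static_cast<double>(got.hi) + static_cast<double>(got.lo);
  const double rel = std::fabs((val - ref) / ref);
  return rel == 0.0 ? 48.0 : -std::log2(rel);  // cap exact results at 2x24 bits
}

// Usage sketch (GoogleTest-style): assert >= 44 bits over many operand pairs.
// EXPECT_GE(effective_mantissa_bits(a / b, ref_a / ref_b), 44.0);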

Key Changes:

  • Added fltflt_div, fltflt_div_float_denom, and fltflt_div_float_numer division functions
  • Implemented operator overloads for all arithmetic and comparison operations
  • Added explicit conversion operators to double and float
  • Introduced detail:: namespace helpers for host/device CUDA intrinsic wrappers
  • Updated SAR backprojection kernel to use cleaner operator syntax
  • Added ADL-based sqrt dispatch in scalar_internal.h using an unqualified call with using-declarations (sketched after this list, together with the detail:: wrappers)
  • Added 1093-line comprehensive test suite with mantissa bit precision validation
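
Two of the listed mechanisms are easy to miss in a diff, so here is a hedged sketch of both under assumed signatures: a detail:: wrapper that selects a round-to-nearest CUDA intrinsic on device and plain arithmetic on host, and the ADL-based sqrt dispatch. The MatX decorator macros are assumed to be in scope, and fltflt_sqrt plus the dispatch_sqrt name are assumptions for illustration; the real code is in fltflt.h and scalar_internal.h.

#include <cmath>  // std::sqrt

namespace matx {
namespace detail {
// Host/device wrapper around the round-to-nearest add intrinsic.
// On the host, a + b already rounds to nearest under IEEE 754.
__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ float fadd_rn(float a, float b) {
#ifdef __CUDA_ARCH__
  return __fadd_rn(a, b);
#else
  return a + b;
#endif
}
} // namespace detail

// matx::sqrt overload for fltflt; found via ADL because fltflt is
// declared in namespace matx.
__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ fltflt sqrt(const fltflt &x) {
  return fltflt_sqrt(x);  // the PR's high-precision sqrt
}
} // namespace matx

// Dispatch (sketch of the scalar_internal.h idea): an unqualified call
// with a using-declaration lets overload resolution pick std::sqrt for
// builtin types and matx::sqrt for fltflt via argument-dependent lookup.
template <typename T>
__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ auto dispatch_sqrt(T v) {
  using std::sqrt;
  return sqrt(v);  // unqualified on purpose
}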

Confidence Score: 5/5

  • This PR is safe to merge with no issues found
  • The code is well-architected with proper host/device abstraction, comprehensive testing, and careful attention to numerical precision. The operator overloads follow C++ best practices and maintain backward compatibility. The implementation correctly handles edge cases (division by zero, signed zero comparisons) and uses established algorithms from academic literature. The extensive test suite validates precision guarantees.
  • No files require special attention

Important Files Changed

include/matx/kernels/fltflt.h: Added division operators, operator overloads (+, -, *, /, comparisons), sqrt overload, and conversion operators. Code is well-structured with proper host/device abstraction.
include/matx/kernels/sar_bp.cuh: Refactored to use overloaded operators instead of explicit function calls for fltflt arithmetic. Makes code cleaner and more readable.
include/matx/operators/scalar_internal.h: Added ADL-based sqrt dispatch using an unqualified call to enable the matx::sqrt overload for fltflt. Clean integration with existing infrastructure.
test/00_misc/FloatFloatTests.cu: Comprehensive test suite for fltflt operations with mantissa bit precision validation. Tests addition, multiplication, subtraction, division, sqrt, abs, and all comparison operators.

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant Operator as Operator Overload
    participant FltFlt as fltflt Function
    participant Detail as detail:: Helpers
    participant CUDA as CUDA Intrinsics

    User->>Operator: a + b (fltflt)
    Operator->>FltFlt: fltflt_add(a, b)
    FltFlt->>FltFlt: fltflt_two_sum(a.hi, b.hi)
    FltFlt->>Detail: fadd_rn(a, b)
    Detail->>CUDA: __fadd_rn (device) or a+b (host)
    CUDA-->>Detail: result
    Detail-->>FltFlt: sum and error
    FltFlt->>FltFlt: fltflt_fast_two_sum
    FltFlt-->>Operator: normalized fltflt
    Operator-->>User: result

    User->>Operator: a * b (fltflt)
    Operator->>FltFlt: fltflt_mul(a, b)
    FltFlt->>FltFlt: fltflt_two_prod_fma(a.hi, b.hi)
    FltFlt->>Detail: fmaf_rn(a, b, c)
    Detail->>CUDA: __fmaf_rn (device) or ::fmaf (host)
    CUDA-->>Detail: product
    Detail-->>FltFlt: error-free product
    FltFlt-->>Operator: result
    Operator-->>User: result

    User->>Operator: a / b (fltflt)
    Operator->>FltFlt: fltflt_div(a, b)
    FltFlt->>Detail: fdividef_rn(1.0f, b.hi)
    Detail->>CUDA: __fdividef (device) or a/b (host)
    CUDA-->>Detail: approximation
    FltFlt->>FltFlt: refine with Newton iteration
    FltFlt-->>Operator: high-precision result
    Operator-->>User: result

    User->>Operator: sqrt(a) (fltflt)
    Operator->>FltFlt: matx::sqrt → fltflt_sqrt
    FltFlt->>Detail: fltflt_rsqrt(a.hi)
    Detail->>CUDA: rsqrtf (device) or 1/sqrtf (host)
    CUDA-->>Detail: reciprocal sqrt
    FltFlt->>FltFlt: refine with correction
    FltFlt-->>Operator: high-precision sqrt
    Operator-->>User: result
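
The multiplication and division flows above rest on the FMA-based error-free product. Below is a hedged sketch of both, reusing the helpers from the earlier addition sketch; the division shown is the classic double-float quotient-plus-correction recipe (the diagram's reciprocal/Newton variant is algebraically similar), not necessarily the PR's exact formulation.

#include <cmath>  // fmaf

// Error-free product: p = round(a*b), e = exact residual a*b - p,
// recovered with a single fused multiply-add (__fmaf_rn on device).
inline fltflt fltflt_two_prod_fma(float a, float b) {
  const float p = a * b;
  const float e = fmaf(a, b, -p);
  return {p, e};
}

inline fltflt fltflt_mul(const fltflt &a, const fltflt &b) {
  fltflt p = fltflt_two_prod_fma(a.hi, b.hi);
  p.lo += a.hi * b.lo + a.lo * b.hi;  // cross terms, rounded
  return fltflt_fast_two_sum(p.hi, p.lo);
}

inline fltflt fltflt_div(const fltflt &a, const fltflt &b) {
  const float q1 = a.hi / b.hi;              // first quotient estimate
  // Residual r = a - q1*b, with the dominant term kept exact via FMA:
  const fltflt p = fltflt_two_prod_fma(q1, b.hi);
  const float r = (a.hi - p.hi) + a.lo - p.lo - q1 * b.lo;
  const float q2 = r / b.hi;                 // correction term
  return fltflt_fast_two_sum(q1, q2);
}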


@greptile-apps greptile-apps bot left a comment


6 files reviewed, 2 comments


Comment on lines 291 to 296

fltflt::operator double() const {
  return fltflt_to_double(*this);
}
fltflt::operator float() const {
  return fltflt_to_float(*this);
}

syntax: Conversion operator definitions missing __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ decorators

Suggested change:

__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ fltflt::operator double() const {
  return fltflt_to_double(*this);
}
__MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ fltflt::operator float() const {
  return fltflt_to_float(*this);
}

Comment on lines 262 to 263

    bin = static_cast<loose_compute_t>(
-       fltflt_to_float(fltflt_mul(diffR_ff, dr_inv_fltflt)) + bin_offset);
+       static_cast<loose_compute_t>(diffR_ff * dr_inv_fltflt) + bin_offset);

style: Redundant static_cast<loose_compute_t> - the inner cast already returns loose_compute_t

Suggested change:

bin = static_cast<loose_compute_t>(diffR_ff * dr_inv_fltflt) + bin_offset;

Signed-off-by: Thomas Benson <tbenson@nvidia.com>