question15:add nvidia branch #903
base: main
Conversation
Pull request overview
This pull request adds NVIDIA/CUDA implementations for 5 mathematical operators to support "question 15". The implementation includes complete test infrastructure, operator registration, CUDA kernels, CPU fallbacks, and Python bindings.
Key changes:
- Added 5 new operators: `atanh`, `addcmul`, `cdist`, `binary_cross_entropy_with_logits`, and `reciprocal`
- Implemented CUDA kernels for NVIDIA devices with CPU fallbacks
- Added comprehensive test suites for both infiniop and infinicore layers
- Updated Python bindings to expose new operators
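For reference, here are scalar sketches of the element-wise math behind four of the five operators (standard definitions matching PyTorch's documented semantics, which the test suites compare against; the actual kernels operate on whole tensors, and `cdist` computes pairwise p-norm distances between row batches rather than an element-wise map):

```python
import math

def atanh(x):
    # Inverse hyperbolic tangent: 0.5 * ln((1 + x) / (1 - x)), defined for |x| < 1.
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def reciprocal(x):
    # Element-wise multiplicative inverse.
    return 1.0 / x

def addcmul(inp, t1, t2, value=1.0):
    # input + value * tensor1 * tensor2, element-wise.
    return inp + value * t1 * t2

def bce_with_logits(logit, target):
    # Numerically stable form: max(x, 0) - x*t + log(1 + exp(-|x|)),
    # which avoids overflow for large-magnitude logits.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

print(round(atanh(0.5), 6))                 # 0.549306
print(round(bce_with_logits(0.0, 1.0), 6))  # 0.693147 (= ln 2)
```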
Reviewed changes
Copilot reviewed 88 out of 88 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/infiniop/*.py | Test files for all 5 operators with test cases and tolerance maps |
| test/infiniop/libinfiniop/op_register.py | Registration of operator API bindings |
| src/infiniop/ops/*/operator.cc | Operator dispatchers for device-specific implementations |
| src/infiniop/ops/\*/nvidia/\*.cu | CUDA kernel implementations |
| src/infiniop/ops/\*/cpu/\*.cc | CPU implementations |
| src/infinicore/ops/\*/\*.cc | InfiniCore operator implementations |
| src/infinicore/pybind11/ops/*.hpp | Python binding definitions |
| python/infinicore/ops/*.py | Python wrapper functions |
| include/infiniop/ops/*.h | C API header files |
| include/infinicore/ops/*.hpp | C++ API header files |
Copilot (AI) · Jan 9, 2026 — the same note appears on five of the new test files (163, 171, 154, 156, and 165 added lines, each opening with `import torch` and `import ctypes`):

Module 'ctypes' is imported with both 'import' and 'import from'.
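Mixing the two import forms is redundant rather than incorrect: the `from` import already loads the module, so either style alone suffices. A standalone illustration of the flagged pattern and one way to resolve it (a sketch, not the PR's actual code):

```python
import ctypes                   # module import
from ctypes import c_uint64     # second form importing from the same module

# Both names refer to the same type, so one import style is enough.
assert c_uint64 is ctypes.c_uint64

# Keeping only the module import, qualified access still works:
size = ctypes.c_uint64(128)
assert size.value == 128
```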
Copilot (AI) · Jan 9, 2026 — on a 154-line test file beginning:

```python
import torch
import ctypes
from ctypes import c_uint64, c_float, c_char_p
```

Import of 'c_float' is not used.
Copilot (AI) · Jan 9, 2026 — on a 156-line test file beginning:

```python
import torch
import ctypes
from ctypes import c_uint64, c_float, c_double
```

Import of 'c_float' is not used.
```cpp
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
```
Copilot (AI) · Jan 9, 2026:
BCETensorInfoDevice stores shape and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info sets ndim directly from info.ndim and only truncates the arrays when copying. If info.ndim exceeds BCE_MAX_DIMS, later calls to indexToOffset(idx, logits_info.ndim, logits_info.shape, logits_info.strides) will index past the end of these arrays, producing out-of-bounds offsets and causing GPU memory corruption or unintended reads/writes on the logits/targets/outputs buffers. To harden this, either enforce info.ndim <= BCE_MAX_DIMS at descriptor creation or clamp and validate ndim before launching the CUDA kernel so that higher-rank tensors are rejected or handled safely.
Added the NVIDIA implementation for question 15.