question15:add nvidia branch #903
base: main
Conversation
Pull request overview
This pull request adds NVIDIA/CUDA implementations for 5 mathematical operators to support "question 15". The implementation includes complete test infrastructure, operator registration, CUDA kernels, CPU fallbacks, and Python bindings.
Key changes:
- Added 5 new operators: `atanh`, `addcmul`, `cdist`, `binary_cross_entropy_with_logits`, and `reciprocal`
- Implemented CUDA kernels for NVIDIA devices with CPU fallbacks
- Added comprehensive test suites for both infiniop and infinicore layers
- Updated Python bindings to expose new operators
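For reference, here are scalar sketches of the element-wise math behind four of the five operators (standard definitions matching PyTorch's documented semantics, which the test suites compare against; the actual kernels operate on whole tensors, and `cdist` computes pairwise p-norm distances between row batches rather than an element-wise map):

```python
import math

def atanh(x):
    # Inverse hyperbolic tangent: 0.5 * ln((1 + x) / (1 - x)), defined for |x| < 1.
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def reciprocal(x):
    # Element-wise multiplicative inverse.
    return 1.0 / x

def addcmul(inp, t1, t2, value=1.0):
    # input + value * tensor1 * tensor2, element-wise.
    return inp + value * t1 * t2

def bce_with_logits(logit, target):
    # Numerically stable form: max(x, 0) - x*t + log(1 + exp(-|x|)),
    # which avoids overflow for large-magnitude logits.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

print(round(atanh(0.5), 6))                 # 0.549306
print(round(bce_with_logits(0.0, 1.0), 6))  # 0.693147 (= ln 2)
```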
Reviewed changes
Copilot reviewed 88 out of 88 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/infiniop/*.py | Test files for all 5 operators with test cases and tolerance maps |
| test/infiniop/libinfiniop/op_register.py | Registration of operator API bindings |
| src/infiniop/ops/*/operator.cc | Operator dispatchers for device-specific implementations |
| src/infiniop/ops/\*/nvidia/\*.cu | CUDA kernel implementations |
| src/infiniop/ops/\*/cpu/\*.cc | CPU implementations |
| src/infinicore/ops/\*/\*.cc | InfiniCore operator implementations |
| src/infinicore/pybind11/ops/*.hpp | Python binding definitions |
| python/infinicore/ops/*.py | Python wrapper functions |
| include/infiniop/ops/*.h | C API header files |
| include/infinicore/ops/*.hpp | C++ API header files |
Copilot (AI) · Jan 9, 2026 — the same note appears on five of the new test files (163, 171, 154, 156, and 165 added lines, each opening with `import torch` and `import ctypes`):

Module 'ctypes' is imported with both 'import' and 'import from'.
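Mixing the two import forms is redundant rather than incorrect: the `from` import already loads the module, so either style alone suffices. A standalone illustration of the flagged pattern and one way to resolve it (a sketch, not the PR's actual code):

```python
import ctypes                   # module import
from ctypes import c_uint64     # second form importing from the same module

# Both names refer to the same type, so one import style is enough.
assert c_uint64 is ctypes.c_uint64

# Keeping only the module import, qualified access still works:
size = ctypes.c_uint64(128)
assert size.value == 128
```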
Copilot (AI) · Jan 9, 2026 — on a 154-line test file beginning:

```python
import torch
import ctypes
from ctypes import c_uint64, c_float, c_char_p
```

Import of 'c_float' is not used.
Copilot (AI) · Jan 9, 2026 — on a 156-line test file beginning:

```python
import torch
import ctypes
from ctypes import c_uint64, c_float, c_double
```

Import of 'c_float' is not used.
```cpp
constexpr int BCE_MAX_DIMS = 8;

struct BCETensorInfoDevice {
    size_t ndim;
    size_t shape[BCE_MAX_DIMS];
    ptrdiff_t strides[BCE_MAX_DIMS];
};

static inline BCETensorInfoDevice make_device_info(const BCETensorInfo &info) {
    BCETensorInfoDevice dev{};
    dev.ndim = info.ndim;
    for (size_t i = 0; i < info.ndim && i < static_cast<size_t>(BCE_MAX_DIMS); ++i) {
        dev.shape[i] = info.dims[i];
        dev.strides[i] = info.stride[i];
    }
```
Copilot (AI) · Jan 9, 2026:
BCETensorInfoDevice stores shape and strides in fixed-size arrays of length BCE_MAX_DIMS, but make_device_info sets ndim directly from info.ndim and only truncates the arrays when copying. If info.ndim exceeds BCE_MAX_DIMS, later calls to indexToOffset(idx, logits_info.ndim, logits_info.shape, logits_info.strides) will index past the end of these arrays, producing out-of-bounds offsets and causing GPU memory corruption or unintended reads/writes on the logits/targets/outputs buffers. To harden this, either enforce info.ndim <= BCE_MAX_DIMS at descriptor creation or clamp and validate ndim before launching the CUDA kernel so that higher-rank tensors are rejected or handled safely.
Added the NVIDIA implementation for question 15.