
Enables calling CUDA kernels directly from Python #140

Open

Vipul-Cariappa wants to merge 1 commit into compiler-research:master from Vipul-Cariappa:dev/CUDA-func

Conversation

@Vipul-Cariappa
Collaborator

Required for benchmarks.
Enables calling CUDA kernels directly from Python, without any C/C++ wrapper around the CUDA kernel.
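
For context, before this change a `__global__` kernel could not be launched from Python at all: the `<<<grid, block>>>` launch had to live in a host-side C/C++ wrapper, and Python could only call that wrapper. A rough sketch of the wrapper-based approach is below; the wrapper name `launchVectorAdd` is hypothetical, and the `cppjit` API is assumed to be the same one used in the test snippet further down.

import cppjit
from cppjit import cpp

cppjit.cppdef(
    r"""
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    c[i] = a[i] + b[i];
  }
}

// Host-side wrapper: before this PR, the only part Python could call
void launchVectorAdd(float *a, float *b, float *c, int n,
                     int blocksPerGrid, int threadsPerBlock) {
  vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
}
"""
)

# After allocating a, b, c as in the test below, Python calls the wrapper,
# never the kernel itself:
# cpp.launchVectorAdd(a, b, c, N, blocksPerGrid, threadsPerBlock)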

TODO
Add a test to cppyy. The following may be sufficient.

import ctypes

import cppjit
from cppjit import cpp

cppjit.cppdef(
    r"""
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    c[i] = a[i] + b[i];
  }
}
"""
)

N = 10
size = N * ctypes.sizeof(ctypes.c_float)  # buffer size in bytes for N floats

# Allocate unified (managed) memory so the buffers are visible to both host and device
a = ctypes.POINTER(ctypes.c_float)()
b = ctypes.POINTER(ctypes.c_float)()
c = ctypes.POINTER(ctypes.c_float)()

cpp.cudaMallocManaged(a, size)
cpp.cudaMallocManaged(b, size)
cpp.cudaMallocManaged(c, size)

print("BEFORE INIT")
for i in range(N):
    print(f"{a[i] = }, {b[i] = }, {c[i] = }")
    a[i] = i * 1.0
    b[i] = i * 2.0

print("AFTER INIT")
for i in range(N):
    print(f"{a[i] = }, {b[i] = }, {c[i] = }")

# One thread per element; round the block count up so all N elements are covered
threadsPerBlock = 256
blocksPerGrid = (N + threadsPerBlock - 1) // threadsPerBlock

# Launch the kernel directly from Python; the [] subscript supplies the grid and block dimensions
cpp.vectorAdd[blocksPerGrid, threadsPerBlock](a, b, c, N)
cpp.cudaDeviceSynchronize()

# Surface any launch or runtime error
err = cpp.cudaGetLastError()
if err:
    print(f"{cpp.cudaGetErrorName(err)}: {cpp.cudaGetErrorString(err)}")

print("AFTER COMPUTE")
for i in range(N):
    print(f"{a[i] = }, {b[i] = }, {c[i] = }")

Depends On
