Question about CUDA compatibility #18

As CUDA and NVML have been upgraded, some functions have gained a "_v2" variant, such as `nvmlDeviceGetMemoryInfo` and `nvmlDeviceGetMemoryInfo_v2`. Upper-level applications may prefer the v2 function and fall back to v1 only when libcuda.so or libnvidia-ml.so does not export the v2 symbol, as in this code snippet: https://github.com/XuehaiPan/nvitop/blob/470245dc3da0d9f4e3106b2c981d63d23440a5a5/nvitop/api/libnvml.py#L861-L879

However, this fallback breaks down when we implement a hook library like nvshare. If the hook exports the v2 function to stay compatible with newer versions, the application sees that v2 exists and calls it. The hook then tries to forward the call to the v2 function in the real library. If the real library is an older version that does not have the v2 function, that forwarding fails and the application gets an error, even though calling v1 directly would have worked.

For instance, the code at https://github.com/grgalex/nvshare/blob/main/src/hook.c#L598 returns CUDA_ERROR_NOT_INITIALIZED when the real libcuda.so has no `cuGetProcAddress_v2` function, which might cause the user program to malfunction.
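
One possible direction, sketched here rather than taken from nvshare, is for the hook to probe the real library for the v2 symbol and fall back to the v1 symbol when it is missing. The helper name `resolve_first` and its parameters are hypothetical. The sketch only covers symbol lookup; v1 and v2 functions can have different signatures, so a real wrapper would still have to adapt the arguments for each function.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of nvshare): try each candidate symbol name
 * in order and return the first one that the library behind `handle` exports.
 * On success, *resolved_name (if non-NULL) is set to the name that matched.
 * Returns NULL when none of the candidates exist.
 */
static void *resolve_first(void *handle, const char *const *names,
                           size_t count, const char **resolved_name)
{
    for (size_t i = 0; i < count; i++) {
        dlerror(); /* clear any stale error state before the lookup */
        void *sym = dlsym(handle, names[i]);
        if (sym != NULL) {
            if (resolved_name != NULL)
                *resolved_name = names[i];
            return sym;
        }
    }
    if (resolved_name != NULL)
        *resolved_name = NULL;
    return NULL;
}
```

A hook could call this with a candidate list such as `{"cuGetProcAddress_v2", "cuGetProcAddress"}` against the handle of the real libcuda.so, then branch on which name was resolved to decide how to forward the call.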
