Description
As CUDA and NVML have been upgraded, some functions have gained variants with a "_v2" suffix, for example nvmlDeviceGetMemoryInfo and nvmlDeviceGetMemoryInfo_v2. Upper-level applications often try the v2 function first and fall back to the v1 function only if libcuda.so or libnvidia-ml.so does not export the v2 symbol, as in this code snippet https://github.com/XuehaiPan/nvitop/blob/470245dc3da0d9f4e3106b2c981d63d23440a5a5/nvitop/api/libnvml.py#L861-L879 .
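The caller-side fallback described above can be sketched roughly as follows. This is a minimal illustration, not nvitop's actual code: the helper name resolve_first_available and the FakeOldLibrary stand-in are hypothetical. It assumes the library object behaves like a ctypes.CDLL, where looking up a symbol that is not exported raises AttributeError.

```python
def resolve_first_available(lib, names):
    """Return (name, function) for the first symbol in `names` that `lib` exports.

    `lib` is expected to behave like a ctypes.CDLL: attribute lookup of a
    symbol that is not exported raises AttributeError.
    """
    for name in names:
        try:
            return name, getattr(lib, name)
        except AttributeError:
            continue  # not exported by this library version; try the next name
    raise LookupError(f"none of {names!r} are exported by the library")


class FakeOldLibrary:
    """Hypothetical stand-in for an old libnvidia-ml.so that only has the v1 symbol."""

    def nvmlDeviceGetMemoryInfo(self, *args):
        return 0  # NVML_SUCCESS


# Prefer v2, fall back to v1 when the library does not export v2.
name, func = resolve_first_available(
    FakeOldLibrary(),
    ["nvmlDeviceGetMemoryInfo_v2", "nvmlDeviceGetMemoryInfo"],
)
print(name)
```

With a real driver installed, the first argument would instead be something like ctypes.CDLL("libnvidia-ml.so.1"), and the v1 and v2 variants may take differently shaped structs, so the caller still has to pass arguments that match whichever symbol was resolved.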
However, this breaks down for a hook library like nvshare. To stay compatible with newer versions, the hook library exports the v2 version of the function and forwards the call to the v2 function in the real library. Because the hook library itself exports the v2 symbol, the application never takes its v1 fallback path. If the real library is an older version that lacks the v2 function, the forwarded call cannot be resolved, and the hook returns an error or otherwise fails.
For instance, the code at https://github.com/grgalex/nvshare/blob/main/src/hook.c#L598 returns CUDA_ERROR_NOT_INITIALIZED when the real libcuda.so has no cuGetProcAddress_v2 function, which can cause the user program to malfunction.
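One possible way to handle this is for the hook to fall back to the real v1 entry point when the real v2 entry point is missing. The sketch below models only that decision logic in Python; it is not nvshare's code. The real_symbols dictionary is a hypothetical stand-in for what dlsym() finds in the real libcuda.so, and returning CUDA_ERROR_NOT_FOUND when neither symbol exists is an assumption about what a sensible last resort would be. The real cuGetProcAddress_v2 also takes an extra symbolStatus out-parameter that v1 lacks, which a C implementation would have to fill in itself on the fallback path; the model leaves it out.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_NOT_INITIALIZED = 3
CUDA_ERROR_NOT_FOUND = 500


def make_hooked_get_proc_address_v2(real_symbols):
    """Build the hook's cuGetProcAddress_v2 from whatever the real library exports.

    `real_symbols` is a hypothetical dict of symbol name -> callable, standing
    in for the results of dlsym() on the real libcuda.so. Each callable
    returns a (result_code, function_pointer) pair.
    """
    real_v2 = real_symbols.get("cuGetProcAddress_v2")
    real_v1 = real_symbols.get("cuGetProcAddress")

    def hooked_v2(symbol, cuda_version, flags):
        if real_v2 is not None:
            # Newer driver: forward straight to the real v2 entry point.
            return real_v2(symbol, cuda_version, flags)
        if real_v1 is not None:
            # Older driver: forward to v1 instead of failing outright.
            return real_v1(symbol, cuda_version, flags)
        # Neither entry point exists (assumed choice of error code).
        return CUDA_ERROR_NOT_FOUND, None

    return hooked_v2


# Simulate an older driver that only exports the v1 entry point.
old_driver = {
    "cuGetProcAddress": lambda symbol, version, flags: (CUDA_SUCCESS, "ptr:" + symbol),
}
hooked = make_hooked_get_proc_address_v2(old_driver)
result = hooked("cuInit", 11030, 0)
print(result)
```

In this model the application still calls the v2 function exported by the hook, but on an older driver the call is served by v1 rather than ending in CUDA_ERROR_NOT_INITIALIZED.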