Question about cuda compatibility #18

@pokerfaceSad

As CUDA and NVML have been upgraded, some functions have gained a variant with a "_v2" suffix, for example nvmlDeviceGetMemoryInfo and nvmlDeviceGetMemoryInfo_v2. Upper-level applications may prefer the v2 function and fall back to the v1 version when libcuda.so or libnvidia-ml.so does not export it, as in this code snippet https://github.com/XuehaiPan/nvitop/blob/470245dc3da0d9f4e3106b2c981d63d23440a5a5/nvitop/api/libnvml.py#L861-L879 .
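To make the caller-side fallback concrete, here is a minimal sketch in C using dlsym. The helper name lookup_with_fallback is hypothetical and is not taken from nvitop or NVML; it only illustrates the "try v2, then v1" lookup described above.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: return the address of `preferred`
 * (e.g. "nvmlDeviceGetMemoryInfo_v2") if the library behind `handle`
 * exports it, otherwise the address of `fallback`
 * (e.g. "nvmlDeviceGetMemoryInfo"). Returns NULL if neither exists. */
void *lookup_with_fallback(void *handle, const char *preferred,
                           const char *fallback)
{
    void *sym = dlsym(handle, preferred);
    if (sym == NULL)
        sym = dlsym(handle, fallback);
    return sym;
}
```

A caller would obtain `handle` from dlopen on the real library and cast the result to the matching function pointer type. Since the v1 and v2 variants may take different argument types, a real caller would probably also need to record which of the two names was resolved.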

However, this becomes a problem when we implement a hook library like nvshare. To stay compatible with newer versions, the hook exports the v2 version of a function and forwards the call to the v2 version in the real library. If the real library is an older version that does not have the v2 function, the forwarded call cannot be served. Because the application already found the v2 symbol in the hook, its own fallback to v1 is never triggered, which can lead to an exception.

For instance, the code at https://github.com/grgalex/nvshare/blob/main/src/hook.c#L598 returns CUDA_ERROR_NOT_INITIALIZED when the real libcuda.so has no cuGetProcAddress_v2 function, which might cause the user program to malfunction.
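One possible way around this, sketched below, is for the hook's v2 entry point to fall back to the real library's v1 function when v2 is missing. This is not nvshare's code. All types, constants, and names here (sketch_result, hooked_get_proc_v2, fake_old_driver_get_proc, and so on) are stand-ins invented so the example compiles without cuda.h; real code would use the definitions from cuda.h and fill the two pointers with dlsym against the real libcuda.so.

```c
#include <stddef.h>

/* Stand-ins that only imitate the shape of the driver API (assumption). */
typedef int sketch_result;        /* stands in for the driver's result code */
typedef int sketch_query_status;  /* stands in for the v2 status out-param  */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR_NOT_FOUND = 1 };
enum { SKETCH_QUERY_OK = 0, SKETCH_QUERY_SYMBOL_NOT_FOUND = 1 };

typedef sketch_result (*get_proc_v1_fn)(const char *symbol, void **pfn,
                                        int version, unsigned long long flags);
typedef sketch_result (*get_proc_v2_fn)(const char *symbol, void **pfn,
                                        int version, unsigned long long flags,
                                        sketch_query_status *status);

/* Resolved from the real library at load time; either may stay NULL
 * when the installed driver is too old to export that variant. */
get_proc_v1_fn real_get_proc_v1 = NULL;
get_proc_v2_fn real_get_proc_v2 = NULL;

/* Hook for the v2 entry point: prefer real v2, else emulate it with v1. */
sketch_result hooked_get_proc_v2(const char *symbol, void **pfn, int version,
                                 unsigned long long flags,
                                 sketch_query_status *status)
{
    if (real_get_proc_v2 != NULL)
        return real_get_proc_v2(symbol, pfn, version, flags, status);

    if (real_get_proc_v1 != NULL) {
        sketch_result rc = real_get_proc_v1(symbol, pfn, version, flags);
        if (status != NULL)  /* v1 has no status output, so derive one */
            *status = (rc == SKETCH_SUCCESS) ? SKETCH_QUERY_OK
                                             : SKETCH_QUERY_SYMBOL_NOT_FOUND;
        return rc;
    }

    /* Neither variant is available in the real library. */
    if (pfn != NULL)
        *pfn = NULL;
    if (status != NULL)
        *status = SKETCH_QUERY_SYMBOL_NOT_FOUND;
    return SKETCH_ERROR_NOT_FOUND;
}

/* Demo only: pretends to be an old driver that exports just v1. */
sketch_result fake_old_driver_get_proc(const char *symbol, void **pfn,
                                       int version, unsigned long long flags)
{
    static int dummy_entry_point;
    (void)symbol; (void)version; (void)flags;
    *pfn = &dummy_entry_point;
    return SKETCH_SUCCESS;
}
```

With only real_get_proc_v1 set (for example to fake_old_driver_get_proc), a call to hooked_get_proc_v2 is served through v1 instead of failing. Whether emulating v2 through v1 is always semantically safe depends on the specific function pair, so this should be read as an illustration of the idea rather than a drop-in fix.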
