Description
Hello, I am a student at National Taiwan University of Science and Technology. I would like to base some research on nvshare, but after carefully reading the code and deploying it on Kubernetes, I ran into the following deadlock/loop:
TensorFlow in the workload calls cuInit for the first time. This enters the wrapper, which runs initialize_client through pthread_once. initialize_client starts the client_fn thread, and before initialize_client completes, client_fn calls real_cuInit.
Inside real_cuInit, dlsym("cuInit") is invoked. Your interposer intercepts that lookup and returns the wrapper instead of the driver's function, so the call re-enters the wrapper cuInit, which again calls pthread_once(&init_done, initialize_client). Because initialize_client has not finished, pthread_once blocks and cuInit hangs.
Moreover, based on the program logic and my actual runs, I think that even without the client_fn issue, real_cuInit and the wrapper cuInit can end up calling each other repeatedly, so the call never completes.
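To make the cycle concrete, here is a minimal, self-contained sketch of the pattern together with one possible way to break it: a thread-local re-entrancy flag. It is not nvshare's code. `wrapper_cuInit`, `driver_cuInit`, `hooked_lookup`, `inside_interposer`, `driver_calls` and `init_runs` are hypothetical names, `driver_cuInit` merely stands in for the driver's real entry point, and the `pthread_join` inside `initialize_client` is my assumption to make the hang deterministic. In a real interposer the guarded branch would still need a pointer to the true `cuInit`, obtained through a lookup that the interposer does not hook.

```c
#include <pthread.h>

typedef int (*cu_init_fn)(unsigned int);

static pthread_once_t init_done = PTHREAD_ONCE_INIT;
/* Per-thread flag: set while this thread is already inside interposer code. */
static _Thread_local int inside_interposer = 0;
static int driver_calls = 0;   /* how often the stand-in driver was reached */
static int init_runs = 0;      /* how often initialize_client ran */

int wrapper_cuInit(unsigned int flags);

/* Hypothetical stand-in for the driver's own cuInit. */
static int driver_cuInit(unsigned int flags) {
    (void)flags;
    driver_calls++;
    return 0;
}

/* Models the intercepted dlsym("cuInit") from the report: it hands back
 * the wrapper instead of the driver function. */
static cu_init_fn hooked_lookup(void) {
    return wrapper_cuInit;
}

static int real_cuInit(unsigned int flags) {
    return hooked_lookup()(flags);
}

static void *client_fn(void *arg) {
    (void)arg;
    inside_interposer = 1;     /* this thread is interposer-internal */
    real_cuInit(0);            /* re-enters the wrapper, which now skips pthread_once */
    inside_interposer = 0;
    return NULL;
}

static void initialize_client(void) {
    pthread_t tid;
    init_runs++;
    /* Assumption: initialization waits for the client thread. Without the
     * guard, that thread would block in pthread_once and this join would hang. */
    if (pthread_create(&tid, NULL, client_fn, NULL) == 0)
        pthread_join(tid, NULL);
}

int wrapper_cuInit(unsigned int flags) {
    if (inside_interposer)     /* re-entrant call: go straight to the driver */
        return driver_cuInit(flags);

    pthread_once(&init_done, initialize_client);

    inside_interposer = 1;     /* also stops wrapper <-> real_cuInit recursion */
    int rc = real_cuInit(flags);
    inside_interposer = 0;
    return rc;
}
```

With the `inside_interposer` check removed, the first `wrapper_cuInit(0)` call in this sketch hangs inside `pthread_once`, which matches the behaviour I observe. With the check in place, the client thread and the calling thread each reach `driver_cuInit` exactly once, and the wrapper never recurses into itself.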
I would greatly appreciate your advice and help. Thank you very much.