Hi,
I'm running out of CUDA memory while training on a single NVIDIA Tesla V100 with 32 GB of VRAM. During training I see the following warning:
```
tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
```
The warning indicates that FullyFusedMLP requires compute capability 7.5+, while the V100 is compute capability 7.0, so tiny-cuda-nn falls back to CutlassMLP. Since compute capability is a hardware property, I can't simply raise it on this card, and I'm unsure whether this fallback is related to the memory issue at all.
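For reference, this is roughly how the network is set up; the dimensions and layer sizes below are placeholders, not my exact config:

```python
import torch
import tinycudann as tcnn

# The V100 reports compute capability (7, 0); FullyFusedMLP needs (7, 5)+.
print(torch.cuda.get_device_capability(0))  # -> (7, 0) on a V100

# Placeholder dimensions and sizes -- not my actual configuration.
network = tcnn.Network(
    n_input_dims=32,
    n_output_dims=16,
    network_config={
        "otype": "FullyFusedMLP",   # falls back to CutlassMLP on CC 7.0
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
)
```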
Training still runs out of memory on the single 32 GB V100. Would adding more GPUs help, or does the DataParallel strategy only perform data parallelism, where each GPU loads a full copy of the model independently?
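For context, the multi-GPU option I had in mind is just the standard PyTorch wrapper (a sketch; `build_model` stands in for however my actual network is constructed):

```python
import torch

# Sketch only: `build_model` is a placeholder, not a function from my code.
model = build_model().cuda()

# DataParallel replicates the full model on every visible GPU and splits
# each input batch across them, so every GPU still holds a complete copy
# of the model's parameters.
model = torch.nn.DataParallel(model)
```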
What I need help with:
- How can I solve the CUDA out-of-memory issue on a single V100, or should I consider adding more GPUs? (A memory-logging sketch follows this list.)
- Does the warning about FullyFusedMLP and the 7.5+ architecture requirement affect memory usage? Should I resolve this warning, and would doing so reduce memory consumption?
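In case it helps with diagnosis, here is roughly how I can log memory usage around the failing step (a sketch; `train_step` is a placeholder for my actual training step):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    # Allocated vs. reserved memory on device 0, in GiB.
    alloc = torch.cuda.memory_allocated(0) / 2**30
    reserved = torch.cuda.memory_reserved(0) / 2**30
    peak = torch.cuda.max_memory_allocated(0) / 2**30
    print(f"[{tag}] allocated={alloc:.2f} GiB "
          f"reserved={reserved:.2f} GiB peak={peak:.2f} GiB")

log_cuda_memory("before step")
# train_step(batch)  # placeholder for the actual training step
log_cuda_memory("after step")
```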
Thanks in advance!