
Inconsistent GPU usage when embeddings exist vs. when they don’t #6

@stormliucong

Description

When an HPO embedding cache already exists, the run emits CUDA initialization warnings ("Unexpected error from cudaGetDeviceCount() ... Error 803" and "Can't initialize NVML"), and the computation falls back to CPU (see Example 1).

However, when no embedding cache exists, these warnings do not appear and inference runs on the GPU as expected (see Example 2).
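A quick way to confirm the CPU fallback is to run the same check PyTorch performs at import time. A minimal sketch, assuming the same PyTorch environment that inference.py uses (only standard torch calls, nothing PhenoGPT2-specific):

import torch

# When the Error 803 / NVML warnings from Example 1 appear,
# is_available() returns False and everything downstream silently runs on CPU.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("current device:", torch.cuda.current_device())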

Environment:
This behavior was observed on a Slurm-managed HPC cluster (HMS Biogrid). The exact environment setup may be a contributing factor, but I’m not entirely sure.
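Since this runs as a Slurm job, it may also help to dump what the allocated node actually exposes in each of the two cases. A rough diagnostic sketch; the SLURM_* variable names depend on the cluster's GRES configuration, and pynvml is only needed if it happens to be installed:

import os

for var in ("CUDA_VISIBLE_DEVICES", "SLURM_JOB_GPUS", "SLURM_GPUS_ON_NODE", "LD_LIBRARY_PATH"):
    print(f"{var}={os.environ.get(var)}")

# Try to reproduce the "Can't initialize NVML" warning outside of torch:
try:
    import pynvml
    pynvml.nvmlInit()
    print("NVML OK, driver version:", pynvml.nvmlSystemGetDriverVersion())
except Exception as exc:
    print("NVML init failed:", exc)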

Example 1

Executing: python.phenogpt2 /home/ch262025/PhenoGPT2/inference.py -i "/home/ch262025/PhenoGPT2/data/example/task_list_subset.json" -o "/home/ch262025/PhenoGPT2/data/results/example_testing" -model_dir "/programs/local/biogrids/phenogpt2/models/PhenoGPT2-EHR" -index "0" -negation --text_only
/programs/x86_64-linux/phenogpt2/51acdf1/.pixi/envs/default/lib/python3.11/site-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at /home/conda/feedstock_root/build_artifacts/libtorch_1744247799952/work/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
/programs/x86_64-linux/phenogpt2/51acdf1/.pixi/envs/default/lib/python3.11/site-packages/torch/cuda/__init__.py:734: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
`torch_dtype` is deprecated! Use `dtype` instead!
Detected existing HPO Database Embeddings

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 79.09it/s]
start phenogpt2
/home/ch262025/PhenoGPT2/data/results/example_testing
use_vision: False

  0%|          | 0/10 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Example 2

Executing: python.phenogpt2 /home/ch262025/PhenoGPT2/inference.py -i "/home/ch262025/PhenoGPT2/data/example/task_list_subset.json" -o "/home/ch262025/PhenoGPT2/data/results/example_testing" -model_dir "/programs/local/biogrids/phenogpt2/models/PhenoGPT2-EHR" -index "0" -negation --text_only
No existing HPO Database Embeddings are stored - Running embedding now
Embedding HPO database::   0%|          | 0/40451 [00:00<?, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

Embedding HPO database::   0%|          | 1/40451 [00:00<1:59:52,  5.62it/s]
Embedding HPO database::   0%|          | 19/40451 [00:00<08:13, 81.94it/s] 
...
Embedding HPO database:: 100%|██████████| 40451/40451 [03:04<00:00, 219.69it/s]
`torch_dtype` is deprecated! Use `dtype` instead!

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [00:01<00:03,  1.13s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [00:02<00:02,  1.11s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [00:03<00:01,  1.10s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.26it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.10it/s]
start phenogpt2
/home/ch262025/PhenoGPT2/data/results/example_testing
use_vision: False

  0%|          | 0/10 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
2025-10-02 16:14:02.715 | DEBUG    | PyRuSH.PyRuSHSentencizer:predict:100 - ....
...
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Merging all chunks together: 0it [00:00, ?it/s]
Merging all chunks together: 1it [00:00, 32768.00it/s]

 10%|█         | 1/10 [00:06<00:58,  6.53s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
...
Merging all chunks together: 0it [00:00, ?it/s]
Merging all chunks together: 1it [00:00, 38130.04it/s]

100%|██████████| 10/10 [00:38<00:00,  3.29s/it]
100%|██████████| 10/10 [00:38<00:00,  3.89s/it]

