Low GPU utilization #20292
Unanswered
jbuckman asked this question in DDP / multi-GPU / multi-node
This is my first time trying out PyTorch Lightning. On a node with 8 H100s, I'm running the code from this example: https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning?tab=overview

All three versions (small, medium, large) get extremely low GPU utilization: nvidia-smi fluctuates but typically sits around 50%. Is the example code incorrect? What changes need to be made?
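One way to narrow this down is to check whether the time goes into the model or into fetching batches. Below is a minimal sketch using Lightning's built-in simple profiler on a toy stand-in; `RandomTokens` and `TinyLM` are placeholders, not code from the Studio example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

import lightning as L


class RandomTokens(Dataset):
    """Stand-in dataset of random token ids; the real example streams tokenized text."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return torch.randint(0, 50_000, (1024,))


class TinyLM(L.LightningModule):
    """Toy next-token model standing in for the Studio's GPT configuration."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(50_000, 512)
        self.head = nn.Linear(512, 50_000)

    def training_step(self, batch, batch_idx):
        x, y = batch[:, :-1], batch[:, 1:]
        logits = self.head(self.embed(x))
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), y.reshape(-1)
        )

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=3e-4)


if __name__ == "__main__":
    loader = DataLoader(RandomTokens(), batch_size=8, num_workers=4)
    trainer = L.Trainer(
        accelerator="gpu",
        devices=8,                # one DDP process per H100
        strategy="ddp",
        precision="bf16-mixed",
        max_steps=100,            # short run, only to collect timings
        profiler="simple",        # prints a per-hook timing table when fit() returns
    )
    trainer.fit(TinyLM(), train_dataloaders=loader)
```

In the timing table the simple profiler prints at the end of the run, a large share of time in the data-fetch actions (such as `train_dataloader_next`) relative to `run_training_batch` points at the input pipeline rather than the model.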
Replies: 1 comment

- I'm experiencing almost the same issue. After completing a 50,000-step training run with two A40 GPUs, I observed severe fluctuations in GPU utilization; the only difference is that mine peaked at 100%. I suspect this is caused by a throughput mismatch between CPU preprocessing and GPU processing. Have you found any effective solutions? Thanks in advance.
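If the mismatch really is on the CPU side, the usual first lever is the DataLoader configuration, so that preprocessing overlaps with GPU work instead of serializing with it. The sketch below shows the relevant knobs for a map-style dataset; `build_train_loader` is an illustrative helper and the values are starting points to tune, not settings taken from the Studio example.

```python
from torch.utils.data import DataLoader, Dataset


def build_train_loader(dataset: Dataset, batch_size: int = 8) -> DataLoader:
    """Illustrative DataLoader settings aimed at keeping the GPUs fed."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=8,            # per DDP rank; raise until spare CPU cores run out
        pin_memory=True,          # page-locked buffers speed up host-to-device copies
        persistent_workers=True,  # keep workers alive instead of re-forking each epoch
        prefetch_factor=4,        # batches each worker keeps queued ahead of the GPU
        drop_last=True,
    )
```

Note that under DDP each rank builds its own loader, so `num_workers` is per process: with two A40s and `num_workers=8` that is 16 worker processes, which should stay below the number of free CPU cores. If tokenization itself is the expensive part, pre-tokenizing the corpus offline removes it from the training loop entirely.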