Low GPU utilization #20292
Unanswered
jbuckman asked this question in DDP / multi-GPU / multi-node
This is my first time trying out PyTorch Lightning. On a node with 8 H100s, I'm running the code from this example: https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning?tab=overview

All three versions (small, medium, large) get extremely low GPU utilization: nvidia-smi fluctuates but typically sits around 50%. Is the example code incorrect? What changes need to be made?
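One way to narrow this down is to check whether the time goes into the model or into fetching batches. Below is a minimal sketch using Lightning's built-in simple profiler on a toy stand-in; `RandomTokens` and `TinyLM` are placeholders, not code from the Studio example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

import lightning as L


class RandomTokens(Dataset):
    """Stand-in dataset of random token ids; the real example streams tokenized text."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        return torch.randint(0, 50_000, (1024,))


class TinyLM(L.LightningModule):
    """Toy next-token model standing in for the Studio's GPT configuration."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(50_000, 512)
        self.head = nn.Linear(512, 50_000)

    def training_step(self, batch, batch_idx):
        x, y = batch[:, :-1], batch[:, 1:]
        logits = self.head(self.embed(x))
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), y.reshape(-1)
        )

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=3e-4)


if __name__ == "__main__":
    loader = DataLoader(RandomTokens(), batch_size=8, num_workers=4)
    trainer = L.Trainer(
        accelerator="gpu",
        devices=8,                # one DDP process per H100
        strategy="ddp",
        precision="bf16-mixed",
        max_steps=100,            # short run, only to collect timings
        profiler="simple",        # prints a per-hook timing table when fit() returns
    )
    trainer.fit(TinyLM(), train_dataloaders=loader)
```

In the timing table the simple profiler prints at the end of the run, a large share of time in the data-fetch actions (such as `train_dataloader_next`) relative to `run_training_batch` points at the input pipeline rather than the model.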
Replies: 1 comment

- I'm experiencing almost the same issue. After completing a 50,000-step training run with two A40 GPUs, I observed severe fluctuations in GPU utilization; the only difference is that mine peaked at 100%. I suspect this is caused by a throughput mismatch between CPU preprocessing and GPU processing. Have you found any effective solutions? Thanks in advance.
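If the mismatch really is on the CPU side, the usual first lever is the DataLoader configuration, so that preprocessing overlaps with GPU work instead of serializing with it. The sketch below shows the relevant knobs for a map-style dataset; `build_train_loader` is an illustrative helper and the values are starting points to tune, not settings taken from the Studio example.

```python
from torch.utils.data import DataLoader, Dataset


def build_train_loader(dataset: Dataset, batch_size: int = 8) -> DataLoader:
    """Illustrative DataLoader settings aimed at keeping the GPUs fed."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=8,            # per DDP rank; raise until spare CPU cores run out
        pin_memory=True,          # page-locked buffers speed up host-to-device copies
        persistent_workers=True,  # keep workers alive instead of re-forking each epoch
        prefetch_factor=4,        # batches each worker keeps queued ahead of the GPU
        drop_last=True,
    )
```

Note that under DDP each rank builds its own loader, so `num_workers` is per process: with two A40s and `num_workers=8` that is 16 worker processes, which should stay below the number of free CPU cores. If tokenization itself is the expensive part, pre-tokenizing the corpus offline removes it from the training loop entirely.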