Speaker Diarization Inference: GPU Utilization 0%, is this normal? #7969
-
Been monitoring the GPU utilization during NeMo diarization inference (MSDD) and the GPU utilization is always at 0%, even after setting the device to 'cuda'. I find the TitaNet embedding extraction to be quite slow for long videos; currently using an NVIDIA T4 for inference. Just wondering if this is normal behavior. @tango4j sorry to bother you, but do you perhaps know? Currently running the inference in a nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04 container; also tried CUDA 11.8, no difference.
Replies: 6 comments
-
Hi, this behavior is not normal. NeMo diarization checks GPU usage when cuda=True in the CI tests.
-
'cuda is on' gets printed when I try this. The GPU utilization is still at 0% while diarizing, although the GPU memory usage does go up a bit. So I'm not sure what could be wrong; the torch version is 2.1.1+cu118 and CUDA is 11.8. When I monitor my other container that runs Whisper, the GPU utilization spikes when it is in use, so there shouldn't be anything wrong with the monitoring itself. I just followed the diarization_inference tutorial when implementing this.
-
I also get this output before the clustering process. So the NeMo toolkit can't detect my GPU? However, running torch.cuda.is_available() returns True. 😫
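A quick way to separate "PyTorch cannot see the GPU" from "the diarizer was built on CPU" is a check along these lines (minimal sketch; `msdd_model` is just a placeholder name for whatever NeuralDiarizer instance the tutorial builds, and the assumption here is that its internal MSDD module exposes regular torch parameters):

```python
import torch

# 1. Does PyTorch itself see the GPU inside this container?
print(torch.__version__, torch.version.cuda)
print("torch.cuda.is_available():", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())

# 2. Which device did the diarizer actually end up on?
#    `msdd_model` is assumed to be the NeuralDiarizer from the tutorial;
#    inspecting its parameters shows whether it was built on cuda or cpu.
print("MSDD parameters on:", next(msdd_model.msdd_model.parameters()).device)
```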
-
For some reason cuda=False is getting fed. Make sure
cfg.device="cuda"
is holding at the line below.
https://github.com/tango4j/NeMo/blob/d81beac52423dbd04b48e4e04567b17df2428e3a/examples/speaker_tasks/diarization/neural_diarizer/multiscale_diar_decoder_infer.py#L32
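For reference, a minimal sketch of forcing the device when building the diarizer yourself rather than through the example script. The config and manifest paths are placeholders, not from this thread; the cfg.device override before NeuralDiarizer is instantiated is the part this reply points at:

```python
import torch
from omegaconf import OmegaConf
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

# Placeholder paths -- replace with your local copies from the NeMo repo / tutorial.
cfg = OmegaConf.load("diar_infer_general.yaml")
cfg.diarizer.manifest_filepath = "input_manifest.json"
cfg.diarizer.out_dir = "diar_outputs"

# The key part: set the device *before* NeuralDiarizer is instantiated,
# otherwise the embedding extraction and MSDD inference can silently run on CPU.
cfg.device = "cuda" if torch.cuda.is_available() else "cpu"

diarizer = NeuralDiarizer(cfg=cfg)
diarizer.diarize()
```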
-
Thank you! Added the line cfg.device="cuda".
-
Hi, I'm having the same problem as above. I tried to create the model with:
and even
On one machine (a Jupyter notebook on AWS SageMaker, running the Data Science 3.0 image on an ml.g4dn.xlarge instance), all of these options worked with the GPU flawlessly and it only takes 6 seconds for my audio file. But when I deploy it on an AWS SageMaker Endpoint (using the PyTorch 2.0.0 with HuggingFace Transformers container, also on ml.g4dn.xlarge), NeMo keeps telling me:
just as above. This took 112 seconds to complete with the same audio file. I don't understand why, with the same audio file and the same code, it results in two completely different outcomes. I appreciate any help, thank you in advance.
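One way to make the notebook/endpoint difference visible is to log the runtime environment at model-load time and force the device explicitly instead of relying on a default. A hedged sketch only: `model_fn` is the usual entry point for the SageMaker PyTorch serving container, and the config path is a placeholder:

```python
import logging
import torch
from omegaconf import OmegaConf
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

logger = logging.getLogger(__name__)

def model_fn(model_dir):
    """SageMaker PyTorch container entry point: build the diarizer once per worker."""
    # Log what the endpoint container actually sees -- compare this against the notebook.
    logger.info("torch %s, cuda runtime %s, cuda available: %s, devices: %d",
                torch.__version__, torch.version.cuda,
                torch.cuda.is_available(), torch.cuda.device_count())

    cfg = OmegaConf.load(f"{model_dir}/diar_infer_general.yaml")  # placeholder config path
    # Force the device explicitly rather than relying on the config default.
    cfg.device = "cuda" if torch.cuda.is_available() else "cpu"
    return NeuralDiarizer(cfg=cfg)
```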