Speaker Diarization Inference: GPU Utilization 0%, is this normal? #7969
-
Been monitoring the GPU utilization during NeMo diarization inference (MSDD) and the GPU utilization is always at 0%, even after setting the device to 'cuda'. I find the TitaNet embedding extraction to be quite slow for long videos; currently using an NVIDIA T4 for inference. Just wondering if this is normal behavior. @tango4j sorry to bother you, but do you perhaps know? Currently running the inference in a nvidia/cuda:12.0.0-cudnn8-devel-ubuntu22.04 container; also tried CUDA 11.8, no difference.
Replies: 6 comments
-
Hi, this behavior is not normal. NeMo diarization checks GPU usage when cuda=True in the CI tests.
-
'cuda is on' gets printed when I try this. The GPU utilization is still at 0% while diarizing, although the GPU memory usage does go up a bit. So I'm not sure what could be wrong; the torch version is 2.1.1+cu118 and CUDA is 11.8. When I monitor my other container that runs Whisper, the GPU utilization spikes when it is in use, so there shouldn't be anything wrong with the monitoring itself. I just followed the diarization_inference tutorial when implementing this.
-
I also get this output before the clustering process. So the NeMo toolkit can't detect my GPU? However, running torch.cuda.is_available() returns True. 😫
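A quick way to separate "PyTorch cannot see the GPU" from "the diarizer was built on CPU" is a check along these lines (minimal sketch; `msdd_model` is just a placeholder name for whatever NeuralDiarizer instance the tutorial builds, and the assumption here is that its internal MSDD module exposes regular torch parameters):

```python
import torch

# 1. Does PyTorch itself see the GPU inside this container?
print(torch.__version__, torch.version.cuda)
print("torch.cuda.is_available():", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())

# 2. Which device did the diarizer actually end up on?
#    `msdd_model` is assumed to be the NeuralDiarizer from the tutorial;
#    inspecting its parameters shows whether it was built on cuda or cpu.
print("MSDD parameters on:", next(msdd_model.msdd_model.parameters()).device)
```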
-
For some reason cuda=False is getting fed. Make sure
cfg.device="cuda"
is holding at the line below.
https://github.com/tango4j/NeMo/blob/d81beac52423dbd04b48e4e04567b17df2428e3a/examples/speaker_tasks/diarization/neural_diarizer/multiscale_diar_decoder_infer.py#L32
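For reference, a minimal sketch of forcing the device when building the diarizer yourself rather than through the example script. The config and manifest paths are placeholders, not from this thread; the cfg.device override before NeuralDiarizer is instantiated is the part this reply points at:

```python
import torch
from omegaconf import OmegaConf
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

# Placeholder paths -- replace with your local copies from the NeMo repo / tutorial.
cfg = OmegaConf.load("diar_infer_general.yaml")
cfg.diarizer.manifest_filepath = "input_manifest.json"
cfg.diarizer.out_dir = "diar_outputs"

# The key part: set the device *before* NeuralDiarizer is instantiated,
# otherwise the embedding extraction and MSDD inference can silently run on CPU.
cfg.device = "cuda" if torch.cuda.is_available() else "cpu"

diarizer = NeuralDiarizer(cfg=cfg)
diarizer.diarize()
```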
-
Thank you! Added the line cfg.device="cuda".
-
Hi, I'm having the same problem as above. I tried to create the model with:
and even
On one machine (a Jupyter notebook on AWS SageMaker, running the Data Science 3.0 image on an ml.g4dn.xlarge instance), all of these options worked with the GPU flawlessly and it only takes 6 seconds for my audio file. But when I deploy it on an AWS SageMaker Endpoint (using the PyTorch 2.0.0 with HuggingFace Transformers container, also on ml.g4dn.xlarge), NeMo keeps telling me:
just as above. This took 112 seconds to complete with the same audio file. I don't understand why, with the same audio file and the same code, it results in two completely different outcomes. I appreciate any help, thank you in advance.
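One way to make the notebook/endpoint difference visible is to log the runtime environment at model-load time and force the device explicitly instead of relying on a default. A hedged sketch only: `model_fn` is the usual entry point for the SageMaker PyTorch serving container, and the config path is a placeholder:

```python
import logging
import torch
from omegaconf import OmegaConf
from nemo.collections.asr.models.msdd_models import NeuralDiarizer

logger = logging.getLogger(__name__)

def model_fn(model_dir):
    """SageMaker PyTorch container entry point: build the diarizer once per worker."""
    # Log what the endpoint container actually sees -- compare this against the notebook.
    logger.info("torch %s, cuda runtime %s, cuda available: %s, devices: %d",
                torch.__version__, torch.version.cuda,
                torch.cuda.is_available(), torch.cuda.device_count())

    cfg = OmegaConf.load(f"{model_dir}/diar_infer_general.yaml")  # placeholder config path
    # Force the device explicitly rather than relying on the config default.
    cfg.device = "cuda" if torch.cuda.is_available() else "cpu"
    return NeuralDiarizer(cfg=cfg)
```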