Timing scaden train on cpu vs gpu #101

nagendraKU · 2021-07-02T13:27:09Z

I am running scaden train on a cluster node with a Tesla V100 GPU, but (on casual observation) I don't see a difference in training time whether the GPU is enabled or disabled.

I do get the following message when the GPU is disabled, so it looks like scaden can "see" the GPU? I have tensorflow-gpu installed.

INFO Training M256 Model ... train.py:54
2021-07-02 15:14:22.035782: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

Is there a way to check whether the GPU is being used by scaden? And at a practical level, is it simpler to just let scaden train run on a 40-core CPU than to get the GPU part working?

KevinMenden · 2021-07-02T13:47:14Z

Hi @nagendraKU,

it looks like there is some issue with the CUDA installation, and TensorFlow somehow can't connect to it. That can happen for various reasons; it's hard to tell from here!
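To check what TensorFlow actually sees, here is a minimal sketch, assuming scaden is running on a TensorFlow 2.x install. It prints whether the TensorFlow build has CUDA support and which GPUs are visible, and if a GPU is found, it runs a small matrix multiply with device-placement logging turned on so you can see where it ran. The function name `report_gpu_status` is hypothetical, just for illustration.

```python
import os


def report_gpu_status():
    """Print what TensorFlow reports about CUDA and visible GPUs.

    Returns the list of visible GPU devices, or None if TensorFlow
    cannot be imported in this environment.
    """
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs from CUDA, which is one
    # common way to "disable" the GPU for a run.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not importable in this environment")
        return None
    print("TensorFlow version:", tf.__version__)
    print("Built with CUDA:", tf.test.is_built_with_cuda())
    gpus = tf.config.list_physical_devices("GPU")
    print("Visible GPUs:", gpus)
    return gpus


if __name__ == "__main__":
    gpus = report_gpu_status()
    if gpus:
        import tensorflow as tf

        # Log the device each op is placed on, then run a small op to confirm.
        tf.debugging.set_log_device_placement(True)
        x = tf.random.uniform((1000, 1000))
        y = tf.matmul(x, x)
        print("matmul ran on:", y.device)
```

Running `nvidia-smi` on the node while scaden train is running is another quick check: if the GPU is in use, the training process should show up in its process list.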

But from a practical standpoint, yes, you're right :) I think you'll be just fine with your 40-core CPU; training shouldn't take too long anyway. So in that case it might not be worth the effort to get the GPU running. It's not a huge model!
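If you want to put a number on the CPU vs GPU difference instead of eyeballing it, a rough sketch is to time the same training command twice: once with CUDA_VISIBLE_DEVICES set to an empty string (which hides all GPUs from CUDA) and once with the environment left as-is. The script name `time_train.py` and the helper `time_command` are hypothetical; the script passes your usual scaden train arguments through unchanged rather than assuming any particular options.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, hide_gpus):
    """Run cmd once and return its wall-clock time in seconds.

    If hide_gpus is True, CUDA_VISIBLE_DEVICES is set to an empty string so
    CUDA (and therefore TensorFlow) sees no GPU for this run. Otherwise the
    current environment is passed through unchanged.
    """
    env = dict(os.environ)
    if hide_gpus:
        env["CUDA_VISIBLE_DEVICES"] = ""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)  # raises if the command fails
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass the full training command after the script name, e.g.
    #   python time_train.py scaden train <your usual arguments>
    cmd = sys.argv[1:]
    if not cmd:
        print("usage: python time_train.py <command> [args...]")
    else:
        cpu_s = time_command(cmd, hide_gpus=True)
        gpu_s = time_command(cmd, hide_gpus=False)
        print(f"CPU only: {cpu_s:.1f} s, GPU allowed: {gpu_s:.1f} s")
```

A single run of each is only a rough comparison. If the command writes to an output directory, the second run may overwrite the first, so point each run at its own directory if you want to keep both models.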

Cheers,
Kevin

nagendraKU · 2021-07-02T14:08:54Z

Thanks for the input, Kevin!

nagendraKU closed this as completed Jul 2, 2021
