Improving the GPU usage efficiency during training #75
Comments
There are two stages; pacemaker uses only a single GPU.
I output the CPU and GPU utilization every 5 minutes; the GPUs are actually fully used, but the CPU is under-utilized.
I can probably try L-BFGS-B later; I was just surprised that the CPU usage is so low. JJ
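(For reference, a minimal sketch of calling L-BFGS-B through SciPy, which is the usual Python home of that optimizer; the quadratic loss below is a stand-in for illustration only, not pacemaker's actual objective.)

```python
# Illustration only: L-BFGS-B is the limited-memory variant of BFGS.
# It keeps a short history of gradients instead of a dense Hessian
# approximation, so each step is cheaper on the CPU.
import numpy as np
from scipy.optimize import minimize

def loss_and_grad(w):
    # Placeholder quadratic loss; a real fit would evaluate the model here.
    r = w - np.arange(w.size, dtype=float)
    return 0.5 * float(r @ r), r

w0 = np.zeros(100)
res = minimize(loss_and_grad, w0, method="L-BFGS-B", jac=True,
               options={"maxiter": 500})
print(res.fun, res.nit)
```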
Are you sure that a 5-minute sampling rate is good enough to see in-epoch CPU/GPU utilization? The easiest way is to ssh to the computational node and use an interactive tool like nvitop.
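(To illustrate finer-grained sampling than a 5-minute check, here is a small polling loop using pynvml and psutil; the 1 s interval and device index 0 are arbitrary choices for the sketch.)

```python
# Sketch: poll CPU and GPU utilization once per second for a minute.
# Requires: pip install nvidia-ml-py psutil
import psutil
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust as needed

for _ in range(60):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    cpu = psutil.cpu_percent(interval=1.0)  # blocks for the 1 s window
    print(f"cpu {cpu:5.1f}%  gpu {util.gpu:3d}%  gpu-mem {util.memory:3d}%")

pynvml.nvmlShutdown()
```

Interactive tools like nvitop show the same numbers live, which makes short utilization spikes between optimizer steps much easier to spot than any 5-minute average.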
Dear authors,
When I use a Tesla T4 GPU to train a model, I find that the node utilization is quite low.
For example, only 5.1% of the CPU cores are being used. While the time/eval of ~70 mcs/at for 640 atomic functions is acceptable, I think the usage efficiency can be improved.
Here are more details about the node:
Since BFGS runs entirely on the CPU, I expect that improving CPU usage could shorten the training time. Also, would increasing the number of GPUs shorten the training time?
Best,
JJ
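(On the point above that BFGS runs entirely on the CPU: one generic knob to check, assuming the TensorFlow backend that pacemaker builds on, is the size of TensorFlow's CPU thread pools; the values below are illustrative, not recommendations.)

```python
# Sketch: widen TensorFlow's CPU thread pools before building the model.
# These must be set before TensorFlow executes any op.
import tensorflow as tf

tf.config.threading.set_intra_op_parallelism_threads(16)  # threads within one op
tf.config.threading.set_inter_op_parallelism_threads(2)   # ops run concurrently
```

Whether this helps depends on where the CPU time actually goes, which is exactly what finer-grained monitoring (see the nvitop suggestion above) would reveal.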