
Not seeing the inference speed up on cuda using the sparse trainer notebook #27

Open
HamidShojanazeri opened this issue Sep 9, 2021 · 1 comment

HamidShojanazeri commented Sep 9, 2021

Hi @madlag ,
I have tried the notebook, which is very similar to the one you shared in issue #5, but I am not seeing any speedup when I move the models to CUDA, although I do see about a 1.3x speedup on CPU. I am running this on an EC2 g4dn.2xlarge instance, which has a T4 card.

This is my training code and this is the inference code. I wonder if I am missing something here.

The parameter count shows the reduction, but the inference speed is ~9 ms for both the pruned and non-pruned models.

prunebert_model.num_parameters() / bert_model_original.num_parameters() = 0.6118184376136527
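For reference, a common pitfall when timing models on CUDA is that kernel launches are asynchronous, so naive wall-clock timing can report launch latency rather than actual compute time. Below is a minimal timing sketch, assuming PyTorch; `time_inference` is a hypothetical helper (not from the notebook), and the `Linear` model is a stand-in for the BERT models:

```python
import time
import torch

def time_inference(model, batch, n_warmup=10, n_runs=50):
    """Return the median forward-pass latency in ms.

    Synchronizes around each run so asynchronous CUDA kernels
    are fully counted, and warms up first to exclude one-time costs.
    """
    device = next(model.parameters()).device
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(n_warmup):
            model(batch)
        for _ in range(n_runs):
            if device.type == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(batch)
            if device.type == "cuda":
                torch.cuda.synchronize()
            times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]

# Toy usage (a stand-in for prunebert_model / bert_model_original):
model = torch.nn.Linear(128, 128)
batch = torch.randn(8, 128)
latency_ms = time_inference(model, batch)
```

The same helper can be pointed at both the pruned and original models to get comparable numbers.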

Thanks for your help and the great work.


madlag commented Oct 21, 2021

Hello @HamidShojanazeri,

Not 100% sure, but this is probably because the GPU does not have enough computation to do at once for the difference to show. Try increasing the batch size and see if that changes the results.
(Today's GPUs are huge parallel machines; if you are not sending them enough work, they sit mostly idle regardless of the task. You only observe latency, not 100% use of the compute power.)
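The suggestion above can be checked with a small batch-size sweep: at tiny batches both models are launch-latency bound, and only at larger batches does the reduced compute of the pruned model have a chance to show. A minimal sketch, assuming PyTorch; `bench` and the toy `Sequential` model are hypothetical stand-ins for the notebook's models:

```python
import time
import torch

def bench(model, batch_size, feat_dim=128, runs=20):
    """Average forward-pass time (ms) at a given batch size."""
    device = next(model.parameters()).device
    x = torch.randn(batch_size, feat_dim, device=device)
    with torch.no_grad():
        for _ in range(5):  # warm-up runs, excluded from timing
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / runs * 1000.0

# Toy model standing in for prunebert_model / bert_model_original:
model = torch.nn.Sequential(
    torch.nn.Linear(128, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 128),
)
results = {bs: bench(model, bs) for bs in (1, 8, 64, 256)}
for bs, ms in results.items():
    print(f"batch={bs:4d}  {ms:.3f} ms/step")
```

On a GPU, the expectation is that per-step time barely grows at small batches (the device is underutilized) and that the pruned/unpruned gap only opens up once the batch is large enough to saturate the compute units.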
