
The training speed is very slow #126

Open
songsenIng opened this issue Nov 13, 2023 · 6 comments
Comments

@songsenIng

Hello, thank you so much for your excellent work!
I made my own dataset using colmap, but training this dataset was very slow (30,000 iterations took one day), what could be the reason for this phenomenon? What should I do? I look forward to your reply. Thank you very much.

@songsenIng
Author

I hope you can help. Thank you very much.

@olkovi

olkovi commented Nov 13, 2023

I had a similar training speed on one machine where the training was CPU limited, so you should check your GPU load. If it only reaches the expected value intermittently, as opposed to hovering above 95% most of the time, then this is likely the problem.

Try launching NeuS on a different machine if this indeed is the case.
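For reference, one way to poll GPU load from Python is a small sketch like the following (it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and returns `None` when no driver is visible):

```python
import shutil
import subprocess

def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None
    if nvidia-smi is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [int(v) for v in out.stdout.split()]

# If these numbers only spike intermittently instead of staying above
# ~95% during training, the GPU is being starved by the CPU-side pipeline.
print(gpu_utilization())
```

Running this a few times during training gives a rough picture without needing a separate monitoring tool.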

@songsenIng
Author

songsenIng commented Nov 14, 2023

> I had a similar training speed on one machine where the training was CPU limited, so you should check your GPU load. If it only reaches the expected value intermittently, as opposed to hovering above 95% most of the time, then this is likely the problem.
>
> Try launching NeuS on a different machine if this indeed is the case.

Hello, thank you for your reply! While training on my own dataset, both GPU and CPU utilization stayed low. Only RAM usage was high, at about 60% (25 GB). So I suspect training is slow because the model is not actually running on the GPU. But I don't know what step in preparing the dataset caused this "slow training" problem. Can you help me?

@olkovi

olkovi commented Nov 14, 2023

> Hello, thank you for your reply! While training my own data set, both the GPU and CPU were kept low. Only the running memory usage is higher, about 60% (25G). Therefore, it is speculated that the training speed may be slow because the training model does not call the GPU. But I don't know what caused the "slow training" problem during the production of the data set. Can you help me?

Well, you've found the reason then, more or less. You need to feed data to the GPU fast enough that it stays fully loaded. Ultimately my solution was just to use the same environment on a different machine, so I can't point you to anything more precise.

Does your environment match the requirements.txt, by the way? If not, try building one that does.
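To check the installed versions programmatically, a small stdlib-only sketch (the package names shown are placeholders; read the real ones from requirements.txt):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Map each package name to its installed version string,
    or to None when the package is missing from the environment."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found

# Placeholder names; substitute the entries from requirements.txt.
print(installed_versions(["torch", "numpy"]))
```

Comparing the printed versions against the pins in requirements.txt shows at a glance which packages diverge.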

You could also try increasing the batch size. That didn't work for me; if it doesn't work for you either, you'll have to profile the code to see which parts consume the most time, and then try to cut them down.
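A generic way to find the hot spots, sketched with Python's built-in cProfile (`train_step` here is a hypothetical stand-in for one iteration of the training loop, not a NeuS function):

```python
import cProfile
import pstats

def train_step():
    # Hypothetical stand-in for one training iteration; replace with
    # the real per-iteration call from the training loop.
    sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    train_step()
profiler.disable()

# Show the ten functions with the largest cumulative time; whatever
# dominates here (data loading, host-to-device copies, ...) is the
# first thing to cut down.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```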

@songsenIng
Author

Well, thank you for your answer. It's very helpful to me.

@Zeng-Fan-Yi

> Well, thank you for your answer. It's very helpful to me.

I am also facing a similar situation as you. Have you solved the problem?
