
The training speed is very slow #126

Open
songsenIng opened this issue Nov 13, 2023 · 6 comments
Comments

@songsenIng

Hello, thank you so much for your excellent work!
I made my own dataset using colmap, but training this dataset was very slow (30,000 iterations took one day), what could be the reason for this phenomenon? What should I do? I look forward to your reply. Thank you very much.

@songsenIng
Author

I hope you can help. Thank you very much.

@olkovi

olkovi commented Nov 13, 2023

I had a similar training speed on one machine where the training was CPU limited, so you should check your GPU load. If it only reaches the expected value intermittently, as opposed to hovering above 95% most of the time, then this is likely the problem.

Try launching NeuS on a different machine if this indeed is the case.
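For reference, one way to poll GPU load from Python is a small sketch like the following (it assumes an NVIDIA GPU with `nvidia-smi` on the PATH, and returns `None` when no driver is visible):

```python
import shutil
import subprocess

def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None
    if nvidia-smi is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [int(v) for v in out.stdout.split()]

# If these numbers only spike intermittently instead of staying above
# ~95% during training, the GPU is being starved by the CPU-side pipeline.
print(gpu_utilization())
```

Running this a few times during training gives a rough picture without needing a separate monitoring tool.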

@songsenIng
Author

songsenIng commented Nov 14, 2023

> I had a similar training speed on one machine where the training was CPU limited, so you should check your GPU load. If it only reaches the expected value intermittently, as opposed to hovering above 95% most of the time, then this is likely the problem.
>
> Try launching NeuS on a different machine if this indeed is the case.

Hello, thank you for your reply! While training on my own dataset, both GPU and CPU utilization stayed low. Only RAM usage was high, at about 60% (25 GB). So I suspect training is slow because the model is not actually running on the GPU. But I don't know what step in preparing the dataset caused this "slow training" problem. Can you help me?

@olkovi

olkovi commented Nov 14, 2023

> Hello, thank you for your reply! While training my own data set, both the GPU and CPU were kept low. Only the running memory usage is higher, about 60% (25G). Therefore, it is speculated that the training speed may be slow because the training model does not call the GPU. But I don't know what caused the "slow training" problem during the production of the data set. Can you help me?

Well, you've found the reason then, more or less. You need to feed data to the GPU fast enough that it stays fully loaded. Ultimately my solution was just to use the same environment on a different machine, so I can't point you to anything more precise.

Does your environment match the requirements.txt, by the way? If not, try building one that does.
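To check the installed versions programmatically, a small stdlib-only sketch (the package names shown are placeholders; read the real ones from requirements.txt):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Map each package name to its installed version string,
    or to None when the package is missing from the environment."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found

# Placeholder names; substitute the entries from requirements.txt.
print(installed_versions(["torch", "numpy"]))
```

Comparing the printed versions against the pins in requirements.txt shows at a glance which packages diverge.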

You could also try increasing the batch size. That didn't work for me; if it doesn't work for you either, you'll have to profile the code to see which parts consume the most time, and then try to cut them down.
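A generic way to find the hot spots, sketched with Python's built-in cProfile (`train_step` here is a hypothetical stand-in for one iteration of the training loop, not a NeuS function):

```python
import cProfile
import pstats

def train_step():
    # Hypothetical stand-in for one training iteration; replace with
    # the real per-iteration call from the training loop.
    sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    train_step()
profiler.disable()

# Show the ten functions with the largest cumulative time; whatever
# dominates here (data loading, host-to-device copies, ...) is the
# first thing to cut down.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```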

@songsenIng
Author

Well, thank you for your answer. It's very helpful to me.

@Zeng-Fan-Yi

> Well, thank you for your answer. It's very helpful to me.

I am also facing a similar situation as you. Have you solved the problem?
