fix import error, Copy-on-read Overhead ( called memory leak in repository ) and slightly refactor dist_utils.py for improved readability #418
Thanks for your PR addressing these problems. For If
It's a multiprocessing problem that occurs when the DataLoader's num_workers > 0. As a result, both the train and evaluation datasets cause unnecessary memory usage. It sounds strange that there is no problem without the evaluation operation. Even the memory_check.py code that I provided doesn't perform either the train or the evaluation operation.
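To illustrate why merely reading a dataset in forked workers can grow memory, here is a minimal sketch (not code from this PR). It shows that holding a second reference to a CPython object changes its reference count, which is a write to the object's header. In a forked worker, such a write is enough for the OS to copy the memory page that holds the object. The `record` dict and the function name are hypothetical.

```python
import sys

# Hypothetical stand-in for one dataset annotation record.
record = {"image_id": 1, "bbox": [0.0, 0.0, 10.0, 20.0]}


def refcount_delta_on_read(obj):
    """Return how much obj's reference count grows while another name holds it."""
    before = sys.getrefcount(obj)
    alias = obj  # a plain "read": binds a second name, the dict contents are untouched
    during = sys.getrefcount(obj)
    del alias
    return during - before


delta = refcount_delta_on_read(record)
print(delta)  # a positive value means the object's header was modified by the read
```

With num_workers > 0, every worker performs such reads on its own forked copy of the dataset, so the pages holding the Python objects gradually stop being shared.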
From what I see, the problem starts in the synchronization and accumulation part of det_solver.py.
@VladKatsman
Unless you aren't using CPython objects, I expect memory efficiency will definitely improve. This only reduces memory usage; it does not eliminate it altogether. If you don't have enough memory to begin with, it may seem like it doesn't help. Or, as you said, the det_solver.py problem may also exist at the same time.
I am sorry, I will reply from a high-level point of view, without code. That is the command I used before, with the train params: It takes about 21 GB out of 24 GB of each GPU's memory and about 20 GB of RAM. During evaluation, RAM usage rises above 128 GB (which is my total RAM size). Your updated code did not solve that problem either; there is still a SEGFAULT error. I also evaluated the model using 1 GPU and 1 process, and it took about 50 GB of RAM, which is a huge number as well. I don't know where to start looking for the problem; it looks like the evaluation code itself is somehow not memory efficient. If we choose to use your project, I will be happy to debug it and commit fixes and changes.
Currently, memory usage explodes in the COCO dataset class.
#93 #172 #207
The cause is copy-on-read of forked CPython objects.
If you want to explore this problem, see the blog post Demystify-RAM-Usage-in-Multiprocess-DataLoader.
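One common way to avoid copy-on-read is to keep the dataset's records out of per-item Python objects by pickling them into a single NumPy byte buffer plus an offsets array. The sketch below illustrates that general idea only; it is not the CocoDetection_share_memory implementation from this PR, and the class name `SerializedList` and the sample annotations are hypothetical.

```python
import pickle

import numpy as np


class SerializedList:
    """Hypothetical helper: stores picklable items in two flat NumPy arrays."""

    def __init__(self, items):
        # Pickle each item into a uint8 array.
        blobs = [
            np.frombuffer(pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL), dtype=np.uint8)
            for item in items
        ]
        # _ends[i] is the end offset of item i inside the flat buffer.
        self._ends = np.cumsum([len(b) for b in blobs], dtype=np.int64)
        self._data = np.concatenate(blobs) if blobs else np.empty(0, dtype=np.uint8)

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = 0 if idx == 0 else int(self._ends[idx - 1])
        end = int(self._ends[idx])
        # Deserialize only the requested item; the buffer itself is only read.
        return pickle.loads(self._data[start:end].tobytes())


annotations = [{"image_id": i, "bbox": [0.0, 0.0, float(i), float(i)]} for i in range(3)]
shared = SerializedList(annotations)
print(len(shared), shared[2])
```

The design trade-off is a small deserialization cost on every access in exchange for having only two long-lived array objects instead of many small Python objects, so far fewer reference counts are touched in each worker.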
The CocoDetection_share_memory class uses less total PSS memory than the COCO dataset class currently in the repository.
This can be seen in memory_check.py.
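For readers unfamiliar with PSS (proportional set size), the following sketch shows one way to read it on Linux from `/proc/<pid>/smaps_rollup`. It is an illustration only and does not reproduce memory_check.py; the function names and the sample text are hypothetical.

```python
from pathlib import Path


def parse_pss_kb(smaps_rollup_text):
    """Extract the Pss value (in kB) from the text of a smaps_rollup file."""
    for line in smaps_rollup_text.splitlines():
        if line.startswith("Pss:"):
            return int(line.split()[1])
    raise ValueError("no 'Pss:' line found")


def read_pss_kb(pid="self"):
    """Read the PSS of a process on Linux (needs a kernel that provides smaps_rollup)."""
    return parse_pss_kb(Path(f"/proc/{pid}/smaps_rollup").read_text())


# Parsing a hand-written sample so the example does not depend on the host OS.
sample = "Rss:   2048 kB\nPss:   1024 kB\nPss_Anon:  512 kB\n"
print(parse_pss_kb(sample))
```

PSS divides each shared page among the processes that map it, so summing PSS over the main process and all DataLoader workers gives a total that does not double-count shared memory, which is why it is a fair metric for comparing the two dataset classes.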