
Cuda out of memory issue #15

Open

LaFeuilleMorte opened this issue Dec 13, 2024 · 4 comments

Comments

LaFeuilleMorte commented Dec 13, 2024

Hi, I've run into a CUDA OOM issue even with a small dataset of 126 images. I use MCMC Gaussian splatting and set cap_max=150,000 to reduce the memory footprint, but the process still crashed with an OOM error on my A100 GPU.

| File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap |  
-- | -- | --
  |   | 2024-12-13 16:15:09.130 | self.run() |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/multiprocessing/process.py", line 108, in run |  
  |   | 2024-12-13 16:15:09.130 | self._target(*self._args, **self._kwargs) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/slam.py", line 310, in tracking |  
  |   | 2024-12-13 16:15:09.130 | self.frontend(timestamp, image, depth, intrinsic, gt_pose, static_mask=static_mask) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl |  
  |   | 2024-12-13 16:15:09.130 | return self._call_impl(*args, **kwargs) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl |  
  |   | 2024-12-13 16:15:09.130 | return forward_call(*args, **kwargs) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context |  
  |   | 2024-12-13 16:15:09.130 | return func(*args, **kwargs) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/frontend.py", line 39, in forward |  
  |   | 2024-12-13 16:15:09.130 | self.optimizer() # Local Bundle Adjustment |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/frontend.py", line 220, in call |  
  |   | 2024-12-13 16:15:09.130 | self.__update() |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/frontend.py", line 100, in __update |  
  |   | 2024-12-13 16:15:09.130 | self.graph.rm_factors(self.graph.age > self.max_age, store=True) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/system-default/envs/droidsplat/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast |  
  |   | 2024-12-13 16:15:09.130 | return func(*args, **kwargs) |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/factor_graph.py", line 178, in rm_factors |  
  |   | 2024-12-13 16:15:09.130 | self.corr = self.corr[~mask] |  
  |   | 2024-12-13 16:15:09.130 | File "/aistudio/workspace/aigc/wangqihang013/aigc3d/repos/neural_rendering/sfm/DROID-Splat/src/modules/corr.py", line 72, in getitem |  
  |   | 2024-12-13 16:15:09.130 | self.corr_pyramid[i] = self.corr_pyramid[i][index] |  
  |   | 2024-12-13 16:15:09.130 | torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 334.00 MiB. GPU 0 has a total capacty of 39.42 GiB of which 289.06 MiB is free. Process 48412 has 31.97 GiB memory in use. Process 65823 has 2.55 GiB memory in use. Process 67515 has 1.78 GiB memory in use. Process 68107 has 416.00 MiB memory in use. Process 69865 has 2.02 GiB memory in use. Process 70462 has 416.00 MiB memory in use. Of the allocated memory 652.43 MiB is allocated by PyTorch, and 647.57 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
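Reading the error message itself: the processes listed hold nearly all of the 39.42 GiB (one alone holds 31.97 GiB), while the crashing process only has about 650 MiB allocated by PyTorch when it fails. A minimal sketch (plain PyTorch, not DROID-Splat code) for checking how much memory is actually free before a run and for opting into the allocator option the message mentions; the max_split_size_mb value of 128 is just an example:

import os

# Must be set before CUDA is first used in this process; max_split_size_mb is
# the fragmentation knob the OOM message points to (128 is an arbitrary value).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

free_b, total_b = torch.cuda.mem_get_info(0)  # bytes free / total on GPU 0
print(f"GPU 0: {free_b / 2**30:.2f} GiB free of {total_b / 2**30:.2f} GiB")

# Memory held by *this* process only; other processes on a shared GPU do not
# show up here, only in the free/total numbers above.
print(f"allocated by this process: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"reserved  by this process: {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")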

ChenHoy (Owner) commented Dec 14, 2024

That is very odd. What resolution do the images have? Can you maybe share the log of what parameters you used?

LaFeuilleMorte (Author) commented Dec 17, 2024

> That is very odd. What resolution do the images have? Can you maybe share the log of what parameters you used?

Sorry for the late reply. My config was:

cam:
  # original camera parameters
  H: 960
  W: 960
  H_out: 480 # 360
  W_out: 480 # 640
  H_edge: 0
  W_edge: 0
  # We calibrated the camera once in prgbd mode without any scale optimization,
  # which roughly gives the right parameters
  fx: 275 # heuristic: 1296.0
  fy: 275 # heuristic: 1296.0
  cx: 480 # heuristic: 960
  cy: 480 # heuristic: 540
  calibration_txt: ''
  camera_model: "pinhole"

And my run command:

python run.py data=Custom/hd.yaml \
    data.input_folder={MY_DATA_FOLDER} \
    tracking=base \
    stride=1 \
    mode=rgb \
    mapping.mcmc.cap_max=150000

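For context on how far the calibrated focal length is from the heuristic one in the config comments, a quick sanity check (plain trigonometry, not DROID-Splat code), assuming the intrinsics are specified at the original 960x960 resolution:

import math

def horizontal_fov_deg(fx: float, width: int) -> float:
    """Horizontal field of view implied by a pinhole focal length."""
    return math.degrees(2.0 * math.atan(width / (2.0 * fx)))

W = 960
print(horizontal_fov_deg(275.0, W))   # ~120 deg with the calibrated fx
print(horizontal_fov_deg(1296.0, W))  # ~41 deg with the heuristic fx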

ChenHoy (Owner) commented Dec 17, 2024

Hey,
I don't understand the calibration part: so you only roughly get the right parameters after calibration? Does the OOM happen with intrinsics optimization or without?

Your resolution is not too big, you don't have a lot of images, and you don't seem to have a lot of Gaussians either, so I don't really understand why this OOM would happen. Can you give more info on where/when it is triggered? Could you try to run the SLAM system without the backend by setting run_backend=False? That way we can rule out that it is the global Bundle Adjustment during which we OOM.
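
For reference, a sketch of that run with the backend disabled, reusing the overrides from the earlier comment and appending the run_backend flag mentioned above (assuming it is passed as an override like the others):

python run.py data=Custom/hd.yaml \
    data.input_folder={MY_DATA_FOLDER} \
    tracking=base \
    stride=1 \
    mode=rgb \
    mapping.mcmc.cap_max=150000 \
    run_backend=False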
