RuntimeError: CUDA error: invalid device ordinal #3
Hi! Thanks for the question and the interest in the work. When developing this code, I was using a machine with 8 GPUs. I just pushed a change to make the code compatible with more machines; see 833f421. Note: for a 2-GPU machine, you may also want to try running with […]. Let me know if you are still running into problems, and thanks for the issue!
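For readers landing here later: "invalid device ordinal" means a process asked for a GPU index the machine doesn't have (e.g. `cuda:7` on a 2-GPU box). Below is a minimal sketch of the kind of guard that avoids it, assuming the workers pick a GPU by worker index; the function is named after the `inference_worker` seen in the traceback further down, but its body here is purely illustrative, not the repo's actual code:

```python
import torch
from multiprocessing import Process

def inference_worker(worker_idx: int) -> None:
    # Illustrative body: wrap the requested index around the GPUs that
    # actually exist, so 8 workers can share a 2-GPU machine instead of
    # requesting cuda:7 and raising "invalid device ordinal".
    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        raise RuntimeError("no visible CUDA devices")
    device = torch.device(f"cuda:{worker_idx % num_gpus}")
    print(f"worker {worker_idx} -> {device}")
    # ... build the model / ai2thor controller on `device` here ...

if __name__ == "__main__":
    workers = [Process(target=inference_worker, args=(i,)) for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```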
I have the same problem as issue #4. Here is the log after I press Ctrl-C. All eight workers (Process-1 through Process-8) print the same interleaved traceback; one representative copy, plus the main process:

```
Traceback (most recent call last):
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process Process-1:
Traceback (most recent call last):
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pi/Desktop/RL_learning/cow/robothor_challenge.py", line 267, in inference_worker
    controller = ai2thor.controller.Controller(**controller_kwargs)
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/site-packages/ai2thor/controller.py", line 498, in __init__
    host=host,
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/site-packages/ai2thor/controller.py", line 1299, in start
    self.last_event = self.server.receive()
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/site-packages/ai2thor/fifo_server.py", line 182, in receive
    metadata, files = self._recv_message()
  File "/home/pi/anaconda3/envs/cow/lib/python3.7/site-packages/ai2thor/fifo_server.py", line 103, in _recv_message
    self.server_pipe = open(self.server_pipe_path, "rb")
KeyboardInterrupt
```

My computer isn't out of memory:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04    Driver Version: 525.116.04    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M5000        Off  | 00000000:03:00.0  On |                  Off |
| 42%   47C    P5    27W / 150W |   6109MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro M5000        Off  | 00000000:A1:00.0 Off |                  Off |
| 39%   43C    P8    12W / 150W |   4849MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1842      G   /usr/lib/xorg/Xorg                 65MiB |
|    0   N/A  N/A      3181      G   /usr/lib/xorg/Xorg                530MiB |
|    0   N/A  N/A      3381      G   /usr/bin/gnome-shell               64MiB |
|    0   N/A  N/A      5141      G   ...2gtk-4.0/WebKitWebProcess       52MiB |
|    0   N/A  N/A     28551      G   ...RendererForSitePerProcess       21MiB |
|    0   N/A  N/A    235930      G   ...RendererForSitePerProcess       10MiB |
|    0   N/A  N/A   3011973      G   ...RendererForSitePerProcess      167MiB |
|    0   N/A  N/A   3258465      G   ...300715944505616879,262144       30MiB |
|    0   N/A  N/A   3268641      G   ...155906284107188537,131072      126MiB |
|    0   N/A  N/A   3489731      G   ...093122278100996567,262144       25MiB |
|    0   N/A  N/A   3798582      G   ...626843.log --shared-files      120MiB |
|    0   N/A  N/A   3823746      C   ...onda3/envs/cow/bin/python     1207MiB |
|    0   N/A  N/A   3823847      C   ...onda3/envs/cow/bin/python     1207MiB |
|    0   N/A  N/A   3823947      C   ...onda3/envs/cow/bin/python     1207MiB |
|    0   N/A  N/A   3824047      C   ...onda3/envs/cow/bin/python     1207MiB |
|    1   N/A  N/A      1842      G   /usr/lib/xorg/Xorg                  3MiB |
|    1   N/A  N/A      3181      G   /usr/lib/xorg/Xorg                  3MiB |
|    1   N/A  N/A   3823797      C   ...onda3/envs/cow/bin/python     1207MiB |
|    1   N/A  N/A   3823895      C   ...onda3/envs/cow/bin/python     1207MiB |
|    1   N/A  N/A   3823997      C   ...onda3/envs/cow/bin/python     1207MiB |
|    1   N/A  N/A   3824097      C   ...onda3/envs/cow/bin/python     1207MiB |
+-----------------------------------------------------------------------------+
```
Are the processes running at all, or are the threads locking?
Yes, the processes are running. As for thread locking, I didn't lock any threads manually.
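(For anyone else debugging this: a quick way to tell whether the workers are alive but blocked, rather than dead, is to poll the `Process` handles. A minimal sketch, assuming you kept the handles in a list; `procs` is an illustrative name. Pairing it with `py-spy dump --pid <PID>` shows exactly where a blocked worker is stuck.)

```python
def report_workers(procs: list) -> None:
    # An alive worker that never makes progress is blocked, not crashed;
    # in this thread's traceback that would be the open() call on the
    # ai2thor FIFO server pipe.
    for p in procs:
        print(f"{p.name}: pid={p.pid} alive={p.is_alive()} exitcode={p.exitcode}")
```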
I have the same issue as @Bailey-24.
Can it only run with 8 GPUs? I also want to know about the GUI issues.
@Bailey-24 was your only change switching to an 8 GPU machine?
Yes, my only change was switching to an 8 GPU machine.
Interesting. I will close this issue, but will open a new issue for <8 GPU testing.
I ran the command

```
python pasture_runner.py -a src.models.agent_fbe_owl -n 8 --arch B32 --center
```

Why did this happen, and how do I solve it? I have followed the solutions on StackOverflow and GitHub, but I still hit the same problem. Could it be that my CUDA version is incorrect? I'm eager to run with both of my GPUs.
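(One way to check what the Python process can actually see, and to reproduce the ordinal error deliberately; a minimal sketch where `cuda:7` is just an example index beyond a 2-GPU machine:)

```python
import torch

print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # expect 2 on this machine
print(torch.version.cuda)          # CUDA version PyTorch was built against

# Any index >= device_count() fails with
# "RuntimeError: CUDA error: invalid device ordinal":
x = torch.zeros(1, device="cuda:7")
```

If the count comes back as 2, the error has to come from whatever maps workers to device indices (see the sketch under the maintainer's first reply); setting `CUDA_VISIBLE_DEVICES` can also restrict which GPUs a run sees.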