
queue.Empty #4

Open
Southyang opened this issue May 24, 2023 · 13 comments

Comments

@Southyang

I am deploying this project on a GPU. After I run this command:

python pasture_runner.py -a src.models.agent_fbe_owl -n 1 --arch B32 --center

the following error appears:

Traceback (most recent call last):
  File "pasture_runner.py", line 278, in <module>
    main()
  File "pasture_runner.py", line 273, in main
    test=False
  File "/home/southyang/southyang/code/cow/robothor_challenge.py", line 470, in inference
    timeout=1000)
  File "/home/southyang/anaconda3/envs/cow/lib/python3.7/multiprocessing/queues.py", line 105, in get
    raise Empty
_queue.Empty

I read the corresponding code but couldn't find the problem. How can I solve it?

@sagadre
Collaborator

sagadre commented May 24, 2023

While running this command, can you also run watch -n 0.5 nvidia-smi to check whether the processes are running on the GPU? You should see power utilization go up on the GPU you are running on.
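If you want a log you can save alongside the run instead of a live watch window, the same check can be scripted. The sketch below polls nvidia-smi with its standard --query-gpu and --format=csv,noheader,nounits options and parses the CSV. The helper names parse_gpu_csv and poll_gpus are hypothetical and are not part of this repository.

```python
import shutil
import subprocess
import time

QUERY = "index,power.draw,utilization.gpu"


def _to_float(field):
    try:
        return float(field)
    except ValueError:  # e.g. "[N/A]" on GPUs that do not report power draw
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, power, util = [field.strip() for field in line.split(",")]
        rows.append({"index": int(index),
                     "power_w": _to_float(power),
                     "util_pct": _to_float(util)})
    return rows


def poll_gpus(interval=0.5, samples=10):
    """Print power draw and utilization per GPU every `interval` seconds."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    cmd = ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"]
    for _ in range(samples):
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for gpu in parse_gpu_csv(out):
            print("GPU {}: power={} W, util={} %".format(
                gpu["index"], gpu["power_w"], gpu["util_pct"]))
        time.sleep(interval)
```

Call poll_gpus() from a second terminal while pasture_runner.py is running; if power and utilization stay flat on the target GPU, the workers are probably not reaching the rendering or inference stage.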

Unfortunately, I am not able to reproduce this issue on my end. Could you also post the exact conda environment you are using (conda env export > environment.yml)? I can investigate it further.

@Bailey-24

After waiting one hour:
[screenshot]

Running watch -n 0.5 nvidia-smi together with python pasture_runner.py -a src.models.agent_fbe_owl -n 4 --arch B32 --center:
[screenshot]

Output of conda env export > environment.yml:

name: cow
channels:
  - aihabitat
  - pytorch
  - defaults
  - conda-forge
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - attrs=21.4.0=pyhd3eb1b0_0
  - brotli=1.0.9=he6710b0_2
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.18.1=h7f8727e_0
  - ca-certificates=2022.6.15=ha878542_0
  - certifi=2022.6.15=py37h89c1867_0
  - cmake=3.14.0=h52cb24c_0
  - cycler=0.11.0=pyhd3eb1b0_0
  - dbus=1.13.18=hb2f20db_0
  - expat=2.4.4=h295c915_0
  - ffmpeg=4.3=hf484d3e_0
  - fontconfig=2.13.1=h6c09931_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.11.0=h70c0345_0
  - giflib=5.2.1=h7b6447c_0
  - gitdb=4.0.9=pyhd8ed1ab_0
  - gitpython=3.1.27=pyhd8ed1ab_0
  - glib=2.69.1=h4ff587b_1
  - gmp=6.2.1=h295c915_3
  - gnutls=3.6.15=he1e5248_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=h28cd5cc_2
  - habitat-sim-mutex=1.0=headless_nobullet
  - headless=2.0=0
  - icu=58.2=he6710b0_3
  - imageio=2.19.3=pyhcf75d05_0
  - imageio-ffmpeg=0.4.7=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jpeg=9e=h166bdaf_1
  - kiwisolver=1.4.2=py37h7cecad7_1
  - krb5=1.19.2=hac12032_0
  - lame=3.100=h7f98852_1001
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - lerc=2.2.1=h2531618_0
  - libblas=3.9.0=15_linux64_openblas
  - libcblas=3.9.0=15_linux64_openblas
  - libcurl=7.82.0=h0b77cf5_0
  - libdeflate=1.7=h27cfd23_5
  - libedit=3.1.20210910=h7f8727e_0
  - libev=4.33=h7f8727e_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1234567_1
  - libgfortran-ng=12.1.0=h69a702a_16
  - libgfortran5=12.1.0=hdcd56e2_16
  - libgomp=11.2.0=h1234567_1
  - libiconv=1.16=h7f8727e_2
  - libidn2=2.3.2=h7f8727e_0
  - liblapack=3.9.0=15_linux64_openblas
  - libllvm11=11.1.0=h3826bc1_1
  - libnghttp2=1.46.0=hce63b2e_0
  - libopenblas=0.3.20=h043d6bf_1
  - libpng=1.6.37=h21135ba_2
  - libssh2=1.10.0=h8f2d780_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.16.0=h27cfd23_0
  - libtiff=4.3.0=hf544144_1
  - libunistring=0.9.10=h27cfd23_0
  - libuuid=1.0.3=h7f8727e_2
  - libwebp=1.2.2=h55f646e_0
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.13=h7f98852_1004
  - libxml2=2.9.14=h74e7548_0
  - llvmlite=0.38.0=py37h4ff587b_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - matplotlib=3.5.1=py37h06a4308_1
  - matplotlib-base=3.5.1=py37ha18d171_1
  - munkres=1.1.4=py_0
  - ncurses=6.3=h7f8727e_2
  - nettle=3.7.3=hbbd107a_1
  - numba=0.55.1=py37h51133e4_0
  - numpy=1.21.6=py37h976b520_0
  - olefile=0.46=pyh9f0ad1d_1
  - openh264=2.1.1=h4ff587b_0
  - openjpeg=2.4.0=hb52868f_1
  - openssl=1.1.1o=h7f8727e_0
  - packaging=21.3=pyhd3eb1b0_0
  - pcre=8.45=h295c915_0
  - pip=21.2.2=py37h06a4308_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - pyqt=5.9.2=py37h05f1152_2
  - python=3.7.13=h12debd9_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python_abi=3.7=2_cp37m
  - qt=5.9.7=h5867ecd_1
  - quaternion=2022.4.1=py37h540881e_0
  - readline=8.1.2=h7f8727e_1
  - rhash=1.4.1=h3c74f83_1
  - scipy=1.7.3=py37hf2a6cf1_0
  - setuptools=61.2.0=py37h06a4308_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.16.0=pyhd3eb1b0_1
  - smmap=3.0.5=pyhd3eb1b0_0
  - sqlite=3.38.5=hc218d9a_0
  - tbb=2021.5.0=hd09550d_0
  - tk=8.6.12=h1ccaba5_0
  - tornado=6.1=py37h540881e_3
  - tqdm=4.64.0=py37h06a4308_0
  - typing-extensions=4.2.0=hd8ed1ab_1
  - typing_extensions=4.2.0=pyha770c72_1
  - wheel=0.37.1=pyhd3eb1b0_0
  - xorg-fixesproto=5.0=h7f98852_1002
  - xorg-inputproto=2.3.2=h7f98852_1002
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libx11=1.7.2=h7f98852_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxcursor=1.2.0=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h7f98852_1
  - xorg-libxfixes=5.0.3=h7f98852_1004
  - xorg-libxi=1.7.10=h7f98852_0
  - xorg-libxinerama=1.1.4=h9c3ff4c_1001
  - xorg-libxrandr=1.5.2=h7f98852_1
  - xorg-libxrender=0.9.10=h7f98852_1003
  - xorg-randrproto=1.5.0=h7f98852_1001
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h7f98852_1002
  - xorg-xproto=7.0.31=h27cfd23_1007
  - xz=5.2.5=h7f8727e_1
  - zlib=1.2.12=h7f8727e_2
  - zstd=1.5.2=ha4553b6_0
  - pip:
    - absl-py==1.1.0
    - ai2thor==4.3.0
    - aiohttp==3.8.1
    - aiosignal==1.2.0
    - allenact==0.5.1
    - allenact-plugins==0.5.1
    - astunparse==1.6.3
    - async-timeout==4.0.2
    - asynctest==0.13.0
    - aws-requests-auth==0.4.3
    - botocore==1.27.18
    - box2d-py==2.3.8
    - cachetools==5.2.0
    - charset-normalizer==2.0.12
    - click==8.1.3
    - cloudpickle==1.6.0
    - colour==0.1.5
    - datasets==2.3.2
    - decorator==4.4.2
    - dill==0.3.5.1
    - docker-pycreds==0.4.0
    - filelock==3.7.1
    - flask==2.1.2
    - flatbuffers==1.12
    - frozenlist==1.3.0
    - fsspec==2022.5.0
    - ftfy==6.1.1
    - gast==0.4.0
    - google-auth==2.8.0
    - google-auth-oauthlib==0.4.6
    - google-pasta==0.2.0
    - grpcio==1.47.0
    - gym==0.19.0
    - gym-minigrid==1.0.3
    - gym-notices==0.0.8
    - h5py==3.7.0
    - habitat-sim==0.2.1
    - huggingface-hub==0.8.1
    - idna==3.3
    - importlib-metadata==4.12.0
    - itsdangerous==2.1.2
    - jinja2==3.1.2
    - jmespath==1.0.1
    - joblib==1.1.0
    - keras==2.9.0
    - keras-preprocessing==1.1.2
    - libclang==14.0.1
    - markdown==3.3.7
    - markupsafe==2.1.1
    - moviepy==1.0.3
    - msgpack==1.0.4
    - multidict==6.0.2
    - multiprocess==0.70.13
    - networkx==2.6.3
    - oauthlib==3.2.0
    - opencv-python==4.6.0.66
    - opt-einsum==3.3.0
    - pandas==1.3.5
    - pathtools==0.1.2
    - patsy==0.5.2
    - pickle5==0.0.12
    - pillow==8.4.0
    - proglog==0.1.10
    - progressbar2==4.0.0
    - promise==2.3
    - protobuf==3.19.4
    - psutil==5.9.1
    - pyarrow==8.0.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pyglet==1.5.26
    - pyquaternion==0.9.9
    - python-utils==3.3.3
    - python-xlib==0.31
    - pytz==2022.1
    - pyyaml==6.0
    - regex==2022.6.2
    - requests==2.28.0
    - requests-oauthlib==1.3.1
    - responses==0.18.0
    - rsa==4.8
    - scikit-learn==1.0.2
    - sentry-sdk==1.9.0
    - setproctitle==1.2.3
    - shortuuid==1.0.9
    - tensorboard==2.9.1
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorboardx==2.5.1
    - tensorflow==2.9.1
    - tensorflow-estimator==2.9.0
    - tensorflow-io-gcs-filesystem==0.26.0
    - termcolor==1.1.0
    - threadpoolctl==3.1.0
    - timm==0.6.7
    - tokenizers==0.12.1
    - torch==1.11.0
    - torchaudio==0.11.0
    - torchvision==0.12.0
    - transformers==4.21.1
    - trimesh==3.14.0
    - urllib3==1.26.9
    - wandb==0.13.2
    - wcwidth==0.2.5
    - werkzeug==2.1.2
    - wrapt==1.14.1
    - xxhash==3.0.0
    - yacs==0.1.8
    - yarl==1.7.2
    - zipp==3.8.0
prefix: /home/pi/anaconda3/envs/cow

@tyz1030

tyz1030 commented May 27, 2023


I'm having the same issue

@Southyang
Author

[screenshot]
The program gets stuck here. Is this a problem with ai2thor?
robothor_challenge.py, line 267

@Southyang
Author

As I continued debugging, I found that it gets stuck here.
Does this file exist?
ai2thor/fifo_server.py

[screenshot]
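One plausible way for execution to hang here without any error message: if, as its name suggests, fifo_server.py talks to the Unity process through named pipes (FIFOs), then opening a pipe blocks until the process on the other end opens it too. If the Unity build never starts (for example because it cannot open the X display), the Python side waits indefinitely and the parent's queue.get eventually times out. The sketch below demonstrates only this POSIX blocking behavior with a throwaway pipe (Linux/macOS); demo_fifo_blocking is a hypothetical helper and does not reproduce ai2thor's code.

```python
import os
import tempfile
import threading
import time


def demo_fifo_blocking(wait=0.5):
    """Show that opening a FIFO for reading blocks until a writer opens it."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo_pipe")
    os.mkfifo(path)
    opened = threading.Event()

    def reader():
        fd = os.open(path, os.O_RDONLY)  # blocks: no writer has opened the pipe yet
        opened.set()
        os.close(fd)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    time.sleep(wait)
    blocked_without_writer = not opened.is_set()

    # Opening the write end (the role the other process would play) releases the reader.
    wfd = os.open(path, os.O_WRONLY)
    opened.wait(timeout=5)
    os.close(wfd)
    thread.join(timeout=5)
    os.remove(path)
    os.rmdir(tmpdir)
    return blocked_without_writer, opened.is_set()


print(demo_fifo_blocking())
```

If the hang has this cause, the file existing is not enough: the process that should open the other end of the pipe has to start successfully.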

@sagadre
Collaborator

sagadre commented Jun 11, 2023

@Southyang are you still running into issues?

@Southyang
Author

Yes, I still have this issue, and I have run into another problem.
I want to run the Grad-CAM localization strategy on its own, so I wrote the following code:

import numpy as np
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
from PIL import Image

# EnvTypes, ClassTypes, get_env_class_vars and ClipGrad must be imported from the
# CoW repository (module paths not shown here).


def main():
    prompts_path = "./prompt_templates/simple_template.json"
    env_type = EnvTypes.ROBOTHOR
    class_type = ClassTypes.REGULAR
    classes, classes_clip, agent_height, floor_tolerance, negate_action, templates = get_env_class_vars(prompts_path, env_type, class_type)

    clip_model_name = "ViT-B/32"
    threshold = 0.625  # clip weight

    device_number = 0
    device = torch.device("cpu")
    if torch.cuda.is_available():
        device = torch.device("cuda:{0}".format(device_number))

    center_only = False

    print(clip_model_name, classes, classes_clip, templates, threshold, device, center_only)

    grad_model = ClipGrad(clip_model_name, classes, classes_clip,
                          templates, threshold, device,
                          center_only=center_only)
    # print(grad_model.class_to_language_feature['HousePlant'])
    pic = Image.open("./scene2.png").convert("RGB")  # drop a possible PNG alpha channel
    image = grad_model.preprocess(pic).unsqueeze(0).to(device)
    image_relevance = grad_model.forward(image, 'HousePlant')  # 2-D map, here 224 x 224
    print(image_relevance.shape)
    bg_img = plt.imread('./scene2.png')

    # Upsample the relevance map to the image's (height, width) with bilinear
    # interpolation. np.resize only repeats/truncates values, expects
    # (rows, cols) order, and cannot take a CUDA tensor directly.
    relevance = F.interpolate(image_relevance.detach()[None, None].float(),
                              size=bg_img.shape[:2], mode="bilinear",
                              align_corners=False)
    adjusted = relevance.squeeze().cpu().numpy()

    # normalize to [0, 1]
    denominator = adjusted.max() - adjusted.min()
    if denominator != 0:
        normalized = (adjusted - adjusted.min()) / denominator
    else:
        normalized = adjusted

    plt.imshow(bg_img)
    plt.imshow(normalized, alpha=0.2, cmap='hot')
    plt.title('Grad-CAM Blended')
    plt.show()


if __name__ == '__main__':
    main()

But the output looks like this:

logits_per_image: tensor([[19.7161]], device='cuda:0', grad_fn=<MmBackward0>)
image_relevance:
tensor([[8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        [8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        [8.9781e-04, 8.9781e-04, 8.9781e-04,  ..., 2.7663e-04, 2.7663e-04,
         2.7663e-04],
        ...,
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05],
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05],
        [1.4318e-03, 1.4318e-03, 1.4318e-03,  ..., 9.6540e-05, 9.6540e-05,
         9.6540e-05]], device='cuda:0')
image_relevance * self.gradient_scalar > self.threshold:
tensor([[False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        ...,
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False]], device='cuda:0')
torch.Size([224, 224])

After entering the interpret_vit function (clip_grad.py, line 84), the values become very small, so I am unable to draw the heat map.
[Screenshot 2023-06-12 10-22-42]
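This shows why the mask printed above is all False: the relevance values are on the order of 1e-4 to 1e-3, so unless gradient_scalar is several hundred or larger, image_relevance * gradient_scalar never exceeds threshold = 0.625. The sketch contrasts thresholding the raw values with thresholding after min-max normalization (the same normalization used for the plot). relevance_mask and the default gradient_scalar=1.0 are hypothetical stand-ins, since ClipGrad's actual gradient_scalar is not shown in this thread; normalizing before thresholding is an alternative to try, not necessarily what ClipGrad intends.

```python
import numpy as np


def relevance_mask(relevance, threshold=0.625, gradient_scalar=1.0, normalize=False):
    """Boolean mask of pixels whose (optionally min-max normalized) relevance exceeds threshold."""
    rel = np.asarray(relevance, dtype=np.float64) * gradient_scalar
    if normalize:
        span = rel.max() - rel.min()
        # Rescale to [0, 1]; a constant map has no salient region.
        rel = (rel - rel.min()) / span if span > 0 else np.zeros_like(rel)
    return rel > threshold


# Corner values copied from the printed image_relevance above.
sample = np.array([[8.9781e-04, 2.7663e-04],
                   [1.4318e-03, 9.6540e-05]])
print(relevance_mask(sample))                  # all False, as in the output above
print(relevance_mask(sample, normalize=True))  # only the 1.4318e-03 entry is True
```

With normalization, the top-left value maps to about 0.60, just below the threshold, so only the strongest corner survives; whether a relative threshold is appropriate depends on how the downstream localization uses the mask.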

@OrmosiaCui

When I run this command:
python scripts/startx.py

it always shows:
_XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
_XSERVTransMakeAllCOTSServerListeners: server already running
(EE)
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
(EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.

Is this the cause of the queue.Empty problem?
Has anyone solved the queue.Empty problem?
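The X.Org message says another X server already owns that display number. On most Linux systems a running X server leaves a socket such as /tmp/.X11-unix/X0 (plus a lock file /tmp/.X0-lock), so listing those sockets is a quick way to see which displays are taken. This is a sketch under that assumption; running_x_displays is a hypothetical helper and is not part of scripts/startx.py.

```python
import os
import re

SOCKET_RE = re.compile(r"^X(\d+)$")


def running_x_displays(socket_dir="/tmp/.X11-unix"):
    """Return display numbers whose X socket exists, e.g. [0, 2] for X0 and X2."""
    if not os.path.isdir(socket_dir):
        return []
    found = []
    for name in os.listdir(socket_dir):
        match = SOCKET_RE.match(name)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


# Usage: print([":%d" % n for n in running_x_displays()])
```

If a display is already listed, starting another server on the same number will fail with this error; pointing the project at the existing display, or starting the new server on a free number, may avoid it.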

@CatLiZi

CatLiZi commented Oct 17, 2024

I'm having the same issue too. Has anyone found a solution yet?


@LinqingZhong

@CatLiZi @Southyang I have encountered the same issue. Have you solved this problem?

@anotheryia


I think the problem is at line 475. When receive_queue.get(timeout=1000) at line 470 times out, it does not raise a TimeoutError; it raises queue.Empty. So I think lines 475 and 484 just need to be changed to except queue.Empty: to catch the Empty exception, as line 274 already does. The author may have found this issue and fixed it at line 274 but forgotten the other two.
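The exception behavior described here is easy to confirm in isolation: in Python 3, queue.Empty derives from Exception, not from TimeoutError, so an except TimeoutError: clause never catches a timed-out multiprocessing.Queue.get. The sketch below is a standalone demonstration; caught_by is a hypothetical helper, not code from robothor_challenge.py.

```python
import multiprocessing as mp
import queue


def caught_by(exc_type, timeout=0.1):
    """Return True if `exc_type` catches the exception raised by an empty get()."""
    q = mp.Queue()
    try:
        q.get(timeout=timeout)  # nothing was put, so this times out
    except exc_type:
        return True
    except Exception:
        return False
    finally:
        q.close()
    return False


print(caught_by(TimeoutError))  # False: queue.Empty is not a TimeoutError
print(caught_by(queue.Empty))   # True
```

Catching queue.Empty only changes how the timeout is handled. If the worker never puts results on the queue (for example because it is blocked while creating the Controller), the run will still make no progress.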

@CatLiZi

CatLiZi commented Oct 24, 2024


@LinqingZhong @Southyang As I dug deeper into debugging, I found that the queue is empty for the same reason a previous commenter reported: execution gets stuck at this line, the thread blocks, and the subsequent inference code never runs.

[screenshot]

Has anyone found a solution? I can't dig any deeper into it myself.
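When a run hangs silently, Python's standard faulthandler module can show where every thread is blocked. The sketch arms a watchdog that dumps all thread stacks if the process is still running after a given number of seconds, and optionally installs a signal handler for on-demand dumps. It must be armed inside the process that actually hangs (for example, a worker process that creates the ai2thor Controller, if that is where execution stops). arm_hang_dump, disarm_hang_dump and enable_signal_dump are hypothetical names; faulthandler.dump_traceback_later, cancel_dump_traceback_later and register are the real stdlib calls.

```python
import faulthandler
import signal
import sys


def arm_hang_dump(seconds, out=sys.stderr):
    """Dump every thread's stack to `out` if the process is still running after `seconds`."""
    # `out` must stay open for as long as the watchdog is armed.
    faulthandler.dump_traceback_later(seconds, repeat=False, file=out, exit=False)


def disarm_hang_dump():
    """Cancel a pending dump, e.g. once inference has finished normally."""
    faulthandler.cancel_dump_traceback_later()


def enable_signal_dump(sig=None):
    """Dump all thread stacks on demand with `kill -USR1 <pid>` (Unix only)."""
    sig = signal.SIGUSR1 if sig is None else sig
    faulthandler.register(sig, file=sys.stderr, all_threads=True)
```

For example, calling arm_hang_dump(1200) at the start of the worker prints the exact blocking line (such as a pipe open or socket read) instead of only the parent's queue.Empty.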

@anotheryia

anotheryia commented Oct 24, 2024

For me, the hang came from the x_display setting defined at line 439. I run the project on a server over ssh. First, (possibly) if no graphical desktop session is logged in on the server, initializing the Controller at line 267 fails. Second, the Controller only initializes successfully with x_display set to :2; :0.0, :1.0, :2.1 and similar values all fail to open. cow/issues/7 may help in finding which values you can use. On a headless server without a display, CloudRendering may be needed.
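A sketch of choosing between an X display and headless rendering when building the ai2thor Controller. It assumes the ai2thor 4.x API (the environment above lists ai2thor==4.3.0), where Controller accepts an x_display argument and headless rendering is selected with platform=CloudRendering from ai2thor.platform. controller_kwargs is a hypothetical helper; how robothor_challenge.py builds its Controller at line 267 is not reproduced here, and the right display string depends on your server (:2 in the comment above).

```python
def controller_kwargs(x_display=None):
    """Build keyword arguments for ai2thor.controller.Controller (hypothetical helper).

    With a working X display, pass it through as `x_display`; otherwise fall back
    to CloudRendering, which renders without an X server.
    """
    if x_display:
        # Passed through unchanged; use the same string format robothor_challenge.py uses.
        return {"x_display": x_display}
    # Imported lazily so the helper can be used where ai2thor is not installed.
    from ai2thor.platform import CloudRendering
    return {"platform": CloudRendering}


# Usage (assumption):
# from ai2thor.controller import Controller
# controller = Controller(**controller_kwargs(x_display), **other_settings)
```

Keeping the choice in one place makes it easier to switch a run between an ssh session with a working display and a headless machine without editing several call sites.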

8 participants