AttributeError: 'RoboPaligemma' object has no attribute 'image_processor' #14

Open
stephonjou opened this issue Jan 24, 2025 · 5 comments

Comments

@stephonjou

bash scripts/run.sh configs/calvin_finetune/finetune_paligemma_cont-lstm-post_full-ft_text_vision_wd=0_ws-8_act-10.json

I get the following error:
[rank0]: Traceback (most recent call last):
[rank0]: File "main.py", line 350, in <module>
[rank0]: experiment(variant=configs)
[rank0]: File "main.py", line 153, in experiment
[rank0]: image_preprocess = model.model.image_processor
[rank0]: File "/home/zxlong10/.conda/envs/robovlms/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'RoboPaligemma' object has no attribute 'image_processor'

@cheng-haha

Same issue. Can the author fix this bug?

@lixinghang12

Sorry for the typo. Please change line 8 of robovlms/model/backbone/robopaligemma.py from

return self.model.processor

to

return self.processor

We will update the repo as well.
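
One possible explanation for why the traceback names 'image_processor' rather than 'processor': if image_processor is defined as a property (an assumption, since the thread does not show the file), an AttributeError raised inside the property getter makes Python fall back to __getattr__, which then reports the property's own name. A minimal sketch with hypothetical stand-in classes (Backbone and Wrapper are not RoboVLMs code, and the __getattr__ below only mimics the message format seen in the traceback):

```python
class Backbone:
    """Hypothetical stand-in for the wrapped model; it has no `processor`."""


class Wrapper:
    """Hypothetical stand-in for RoboPaligemma, not the real class."""

    def __init__(self):
        self.model = Backbone()
        self.processor = "processor-object"  # placeholder value

    def __getattr__(self, name):
        # Called only after normal attribute lookup raised AttributeError,
        # which is also when torch.nn.Module's fallback runs.
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )

    @property
    def broken_image_processor(self):
        # Looks on the wrapped model, which lacks the attribute.
        return self.model.processor

    @property
    def fixed_image_processor(self):
        # Looks on the wrapper itself, as in the suggested fix.
        return self.processor


def describe_failure(obj, attr):
    """Return the AttributeError message for obj.attr, or None on success."""
    try:
        getattr(obj, attr)
    except AttributeError as exc:
        return str(exc)
    return None
```

Here describe_failure(Wrapper(), "broken_image_processor") reports the missing attribute as 'broken_image_processor' on Wrapper, hiding the underlying missing 'processor' on Backbone, while fixed_image_processor resolves normally.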

@Ericonaldo
Collaborator

Should be fixed now.

@cheng-haha

It trains now, but the following warning is printed over and over:
"You are passing both text and images to PaliGemmaProcessor. The processor expects special image tokens in the text, as many tokens as there are images per each text. It is recommended to add <image> tokens in the very beginning of your text and <bos> token after that. For this call, we will infer how many images each text has and add special tokens."

@lixinghang12

@cheng-haha We use PaliGemmaProcessor only as an image / text processor and concatenate the two modalities in the backbone, so our processing procedure differs from the original processor, which uses <image> as a placeholder and replaces it with the image token sequence. You can ignore the warning.
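
If the repeated output gets in the way, one option is to let the warning through once and drop the repeats. This is only a sketch using Python's standard logging module. It assumes the warning is emitted through a logger named transformers.models.paligemma.processing_paligemma, which you should check against your installed version, and DropRepeatedMessage is a hypothetical helper, not part of transformers or RoboVLMs:

```python
import logging


class DropRepeatedMessage(logging.Filter):
    """Let a log message containing `needle` through once, then drop repeats."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.seen = False

    def filter(self, record):
        if self.needle not in record.getMessage():
            return True  # unrelated messages pass through untouched
        if self.seen:
            return False  # repeat of the targeted warning: drop it
        self.seen = True
        return True


# Assumed logger name; verify it against your installed transformers version.
LOGGER_NAME = "transformers.models.paligemma.processing_paligemma"
NEEDLE = "You are passing both text and images to PaliGemmaProcessor"

# A filter attached to a logger applies to records emitted by that logger.
logging.getLogger(LOGGER_NAME).addFilter(DropRepeatedMessage(NEEDLE))
```

Running this once before training starts (for example near the top of main.py) should be enough. With distributed training each rank is a separate process, so the warning would still appear once per rank.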
