Add 3DSRBench #708

Merged
merged 1 commit into from
Jan 20, 2025

Conversation

wufeim
Contributor

@wufeim wufeim commented Jan 4, 2025

Add support for 3DSRBench. Data files are hosted on huggingface.co.

@PhoenixZ810
Collaborator

Hi,
Thank you for submitting 3DSRBench. We have reviewed your contribution and noticed that, while the link to 3DSRBench has been provided, the specific evaluation metrics have not yet been included.

Please provide the detailed evaluation method for your benchmark. Once we have this information, we will be able to proceed with adding 3DSRBench to VLMEvalKit.

Best regards

@wufeim
Contributor Author

wufeim commented Jan 4, 2025

Thanks for the feedback. I see from the documentation that certain functions may need to be implemented. However, 3DSRBench follows the standard multiple-choice VQA format and can be evaluated with the following command:

python3 run.py --data 3DSRBenchv1 --model GPT4o_20240806

I assume no other functions need to be implemented? If they do, could you point me to the relevant functions or metrics that need to be provided? Thanks!

@PhoenixZ810
Collaborator

Understood. We will validate this PR shortly. Thank you for your patience.

@PhoenixZ810
Collaborator

Hi,
We have evaluated 3DSRBench using LLaVA-Next-8B-llama3, and the results are shown below. Our score of 51.31 is notably higher than the 45.5 reported in the paper, which is unexpected.

Have you had the opportunity to run similar evaluations using VLMEvalKit? If so, we would greatly appreciate it if you could share your results, as they might help us understand this discrepancy.
[Image]

@kennymckormick
Member

@wufeim, I have read the paper and found that 3DSRBench adopts CircularEval for evaluation. However, the current implementation does not use CircularEval.
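
For context, here is a minimal sketch of how CircularEval-style scoring differs from plain per-question accuracy. It assumes the usual formulation, in which each multiple-choice question is asked once per circular shift of its options and counts as correct only if every shifted variant is answered correctly. This is an illustration only, not the VLMEvalKit or 3DSRBench implementation; the function names and the `qid` / `correct` record fields are hypothetical.

```python
from collections import defaultdict


def circular_variants(choices, answer_idx):
    """Return (rotated_choices, new_answer_idx) for every circular shift.

    Shift 0 is the original ordering, so the original question is always
    one of the variants.
    """
    n = len(choices)
    variants = []
    for shift in range(n):
        rotated = choices[shift:] + choices[:shift]
        # An option originally at index i lands at index (i - shift) % n.
        variants.append((rotated, (answer_idx - shift) % n))
    return variants


def vanilla_accuracy(records):
    """Fraction of individual records (one per asked variant) answered correctly."""
    return sum(r["correct"] for r in records) / len(records)


def circular_accuracy(records):
    """Fraction of questions whose variants were ALL answered correctly."""
    by_question = defaultdict(list)
    for r in records:
        by_question[r["qid"]].append(r["correct"])
    return sum(all(flags) for flags in by_question.values()) / len(by_question)


# Hypothetical toy results: q1 is right on both variants, q2 on only one.
toy_records = [
    {"qid": "q1", "correct": True},
    {"qid": "q1", "correct": True},
    {"qid": "q2", "correct": True},
    {"qid": "q2", "correct": False},
]
toy_vanilla = vanilla_accuracy(toy_records)    # 0.75
toy_circular = circular_accuracy(toy_records)  # 0.5
```

Under this formulation the circular score cannot exceed the score on the original ordering alone, so an evaluation that skips the circular pass would be expected to report a higher number. Whether that fully accounts for the gap between 51.31 and 45.5 is not established in this thread.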

@PhoenixZ810 PhoenixZ810 self-assigned this Jan 14, 2025
@PhoenixZ810 PhoenixZ810 merged commit 9568fe9 into open-compass:main Jan 20, 2025
1 check passed