This repository contains the code for the paper "An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics".
Recently, reference-free metrics such as CLIPScore (Hessel et al., 2021), UMIC (Lee et al., 2021), and PAC-S (Sarto et al., 2023) have been proposed for automatic evaluation of image captions, demonstrating a high correlation with human judgment. We provide insights into the strengths and limitations of reference-free metrics for image captioning evaluation, guiding future improvements in this area.
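As a rough illustration of what a reference-free metric computes, the sketch below follows the CLIPScore definition from Hessel et al. (2021): the cosine similarity between an image embedding and a caption embedding, clipped at zero and scaled by w = 2.5. It is not this repository's code. The function names are hypothetical, and the embeddings are assumed to come from a CLIP image and text encoder, which is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb, caption_emb, w=2.5):
    """CLIPScore-style value: w * max(cos(caption, image), 0).

    image_emb and caption_emb are assumed to be embeddings produced by a
    CLIP model (hypothetical inputs here). No reference captions are needed.
    """
    return w * max(cosine_similarity(image_emb, caption_emb), 0.0)
```

Because the score depends only on the image and the candidate caption, any weakness in the underlying embeddings carries over directly to the metric, which is the kind of behaviour a robustness study looks at.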
- Dataset: Download the dataset from here. We also provide a file containing the scores of all baselines for each metric.
Please download both and unzip them into the ./dataset directory.
- TODO: Update the README and clean up the code. Please do not hesitate to reach out with any issues.
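
For the dataset step above, a minimal sketch of the unzip step is given below. It assumes the downloads are `.zip` archives that have already been placed in `./dataset`; the archive names are not specified in this README, so the helper (a hypothetical `extract_archives`, not part of the repository) simply extracts every `.zip` file it finds there.

```python
import zipfile
from pathlib import Path


def extract_archives(dataset_dir="./dataset"):
    """Extract every .zip archive found directly inside dataset_dir.

    Assumption: the downloaded files are .zip archives saved in dataset_dir.
    Returns the names of the archives that were extracted.
    """
    root = Path(dataset_dir)
    root.mkdir(parents=True, exist_ok=True)  # create ./dataset if missing
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)  # unpack next to the archive
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives()` from the repository root unpacks the archives in place and returns an empty list if no `.zip` files are present.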