BART-MMSS

[Paper] accepted at SIGIR 2023:

Adapting Generative Pretrained Language Model for Open-domain Multimodal Sentence Summarization, by Dengtian Lin, Liqiang Jing, Xuemeng Song*, Meng Liu, Teng Sun, and Liqiang Nie.

We release our trained model on Google Drive.

Abstract

Multimodal sentence summarization (MMSS), which aims to generate a brief summary of a source sentence and its paired image, is a new yet challenging task. Although existing methods have achieved compelling success, they still suffer from two key limitations: 1) they do not adapt generative pre-trained language models to open-domain MMSS, and 2) they lack explicit modeling of critical information. To address these limitations, we propose the BART-MMSS framework, which adopts BART as its backbone. Specifically, we propose a prompt-guided image encoding module to extract the source image feature: it leverages several soft, to-be-learned prompts for image patch embedding, which facilitates injecting visual content into BART for open-domain MMSS. Thereafter, we devise an explicit source critical token learning module to directly capture the critical tokens of the source sentence with reference to the source image, where we incorporate explicit supervision to improve performance. Extensive experiments on a public dataset fully validate the superiority of the proposed method. In addition, the tokens predicted by the vision-guided key-token highlighting module are easily understood by humans, which improves the interpretability of our model.
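
To make the prompt-guided image encoding idea concrete, here is a minimal PyTorch sketch (not the released implementation): a few soft, learnable prompts attend over image patch features (e.g., from a frozen vision encoder) and are projected into BART's embedding space. The class name, dimensions, and single cross-attention design are our assumptions.

```python
import torch
import torch.nn as nn

class PromptGuidedImageEncoder(nn.Module):
    """Sketch: soft prompts condense image patches into a few visual tokens."""
    def __init__(self, patch_dim=768, bart_dim=768, num_prompts=4):
        super().__init__()
        # Soft, to-be-learned prompt vectors (the "several soft prompts" above)
        self.prompts = nn.Parameter(torch.randn(num_prompts, patch_dim) * 0.02)
        self.attn = nn.MultiheadAttention(patch_dim, num_heads=8, batch_first=True)
        self.proj = nn.Linear(patch_dim, bart_dim)  # map into BART's hidden space

    def forward(self, patch_feats):
        # patch_feats: (batch, num_patches, patch_dim), e.g. from a frozen ViT
        b = patch_feats.size(0)
        queries = self.prompts.unsqueeze(0).expand(b, -1, -1)
        # Prompts attend over the patches, yielding a compact visual summary
        visual, _ = self.attn(queries, patch_feats, patch_feats)
        return self.proj(visual)  # (batch, num_prompts, bart_dim)

feats = torch.randn(2, 196, 768)  # 196 patches = 14x14 grid of a 224x224 image
print(PromptGuidedImageEncoder()(feats).shape)  # torch.Size([2, 4, 768])
```

The resulting visual tokens can be prepended to BART's encoder inputs, so the pretrained text backbone consumes image content without architectural surgery.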

Model

Figure 1: The proposed scheme based on the BART backbone, which consists of four vital modules: Source Sentence Encoding, Prompt-guided Source Image Encoding, Explicit Critical Token Learning, and Multimodal Summary Generation.
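
Below is a hedged sketch of the Explicit Critical Token Learning module, assuming it reduces to vision-guided token scoring trained with binary "critical token" supervision; the actual module may differ, and `CriticalTokenScorer`, the label format, and the loss choice are illustrative only.

```python
import torch
import torch.nn as nn

class CriticalTokenScorer(nn.Module):
    """Sketch: score each source token's importance with reference to the image."""
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        # Each source token attends to the visual tokens, so its importance
        # score is computed with the image as reference
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scorer = nn.Linear(dim, 1)

    def forward(self, text_hidden, visual_tokens):
        fused, _ = self.cross_attn(text_hidden, visual_tokens, visual_tokens)
        return self.scorer(fused).squeeze(-1)  # (batch, seq_len) logits

scorer = CriticalTokenScorer()
text = torch.randn(2, 20, 768)   # BART encoder states for the source sentence
visual = torch.randn(2, 4, 768)  # visual tokens from the image encoder sketch
labels = torch.randint(0, 2, (2, 20)).float()  # hypothetical critical-token labels
loss = nn.BCEWithLogitsLoss()(scorer(text, visual), labels)  # explicit supervision
```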

Data

We use the Multimodal Sentence Summarization (MMSS) dataset, which has been widely used to evaluate multimodal summarization models. It contains 66,000 sentence-image pairs: 62,000 for training, 2,000 for validation, and 2,000 for testing.
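
As a sanity check on the split sizes above, here is a hypothetical layout verifier; the `data/mmss` path, file names, and one-sample-per-line format are assumptions, not the dataset's actual release format.

```python
from pathlib import Path

EXPECTED = {"train": 62_000, "valid": 2_000, "test": 2_000}

def check_splits(root="data/mmss"):
    # File names and layout are assumptions about the MMSS release
    for split, expected in EXPECTED.items():
        path = Path(root) / f"{split}.src"  # one source sentence per line (assumed)
        n = sum(1 for _ in path.open(encoding="utf-8"))
        assert n == expected, f"{split}: found {n} lines, expected {expected}"

if __name__ == "__main__":
    check_splits()
```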

Citations

If you find the BART-MMSS models helpful, feel free to cite the following publication:

@inproceedings{bart-mmss,
  author       = {Dengtian Lin and
                  Liqiang Jing and
                  Xuemeng Song and
                  Meng Liu and
                  Teng Sun and
                  Liqiang Nie},
  title        = {Adapting Generative Pretrained Language Model for Open-domain Multimodal
                  Sentence Summarization},
  booktitle    = {Proceedings of the 46th International {ACM} {SIGIR} Conference on
                  Research and Development in Information Retrieval, {SIGIR} 2023, Taipei,
                  Taiwan, July 23-27, 2023},
  pages        = {195--204},
  publisher    = {{ACM}},
  year         = {2023},
}
