
whuscity/multimodal-dataset-for-humanitarian-information-identification


Multimodal dataset for humanitarian information identification

This dataset contains 4,383 crisis-related Twitter text-image pairs annotated for the humanitarian information identification task. It was built from two existing data sources: CrisisLexT26 and Twitter Datasets from Crises.

Annotation details

Given a text-image pair, annotators determine which of the following categories of humanitarian information it contains.

  • Caution and advice
  • Needs and offers
  • Other
  • Affected individuals
  • Infrastructure and utility damage
  • Response

Each positive sample (i.e., a text-image pair that contains humanitarian information) can be assigned one or more labels. A text-image pair that does not contain any humanitarian information is labeled as Not humanitarian.
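
Because a sample can carry several labels at once, the labels are naturally handled as a multi-label target. The following is a minimal sketch of one possible encoding, not something prescribed by this dataset: it assumes the label strings match the category names listed above exactly, and it chooses to map Not humanitarian to an all-zero vector. The names `CATEGORIES`, `NEGATIVE`, and `to_multi_hot` are hypothetical.

```python
# Category names copied from the list above; exact spelling in the data is an assumption.
CATEGORIES = [
    "Caution and advice",
    "Needs and offers",
    "Other",
    "Affected individuals",
    "Infrastructure and utility damage",
    "Response",
]
NEGATIVE = "Not humanitarian"


def to_multi_hot(labels):
    """Turn a list of label strings into a 0/1 vector ordered like CATEGORIES.

    A sample labeled only "Not humanitarian" becomes an all-zero vector
    (a design choice made for this sketch).
    """
    unknown = [name for name in labels if name not in CATEGORIES and name != NEGATIVE]
    if unknown:
        raise ValueError(f"unexpected label(s): {unknown}")
    if NEGATIVE in labels and len(labels) > 1:
        raise ValueError("'Not humanitarian' cannot be combined with other labels")
    return [1 if category in labels else 0 for category in CATEGORIES]


vector = to_multi_hot(["Response", "Caution and advice"])
```

The all-zero encoding keeps the vector length equal to the number of humanitarian categories; a separate seventh position for the negative class would be an equally reasonable choice.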

More details can be found in our paper.

Data format

This resource contains the following files or folders.

  • annotation.csv: Contains four fields: "tweet_id", "tweet_text", "image_path", and "label". Multiple labels for a sample are separated by ";".
  • image: This folder contains all the images in the dataset. It can be downloaded here.
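
As a rough illustration of the file layout described above, the sketch below reads rows with the four documented fields and splits the "label" field on ";". It assumes the CSV has a header row using those field names; the helper `load_annotations` is hypothetical, and the sample rows are made up for demonstration and are not taken from the dataset.

```python
import csv
import io


def load_annotations(handle):
    """Read annotation rows and split the ';'-separated label field into a list."""
    rows = []
    for row in csv.DictReader(handle):
        labels = [part.strip() for part in row["label"].split(";") if part.strip()]
        rows.append(
            {
                "tweet_id": row["tweet_id"],
                "tweet_text": row["tweet_text"],
                "image_path": row["image_path"],
                "labels": labels,
            }
        )
    return rows


# Made-up rows for illustration only; they do not come from annotation.csv.
SAMPLE = io.StringIO(
    "tweet_id,tweet_text,image_path,label\n"
    "1,example text a,image/1.jpg,Response;Affected individuals\n"
    "2,example text b,image/2.jpg,Not humanitarian\n"
)

sample_rows = load_annotations(SAMPLE)
```

To read the real file, the same function could be called on a handle opened with `open("annotation.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and may need adjusting.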

Citation request

Please cite the following paper if you use this resource in your research.

Wu, X., Mao, J., Xie, H., & Li, G. (2022). Identifying humanitarian information for emergency response by modeling the correlation and independence between text and images. Information Processing & Management, 59(4), 102977.
