This dataset contains 4,383 crisis-related Twitter text-image pairs annotated for the humanitarian information identification task. It was constructed based on two existing data sources, namely CrisisLexT26 and Twitter Datasets from Crises.
For each text-image pair, annotators determine which of the following categories of humanitarian information it contains.
- Caution and advice
- Needs and offers
- Other
- Affected individuals
- Infrastructure and utility damage
- Response
Each positive sample (i.e., a text-image pair that contains humanitarian information) can be assigned one or more labels. A text-image pair that does not contain any humanitarian information is labeled as Not humanitarian.
More details can be found in our paper.
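Because a positive sample can carry several labels, the annotation scheme above maps naturally onto a multi-label (multi-hot) target. The following is a minimal sketch of such an encoding, not the encoding used in the paper. It assumes that the label strings stored in the annotation file match the category names listed above exactly, and that negative samples carry the single label "Not humanitarian". The names `CATEGORIES`, `NEGATIVE_LABEL`, and `encode_labels` are hypothetical helpers introduced for illustration.

```python
from typing import List

# Assumption: these strings match the label values in the annotation file.
CATEGORIES = [
    "Caution and advice",
    "Needs and offers",
    "Other",
    "Affected individuals",
    "Infrastructure and utility damage",
    "Response",
]
# Assumption: negative samples are marked with exactly this string.
NEGATIVE_LABEL = "Not humanitarian"


def encode_labels(label_field: str) -> List[int]:
    """Turn a ';'-separated label string into a multi-hot vector over CATEGORIES."""
    labels = [part.strip() for part in label_field.split(";") if part.strip()]
    # A negative sample has no humanitarian category, so every position is 0.
    if labels == [NEGATIVE_LABEL]:
        return [0] * len(CATEGORIES)
    unknown = [name for name in labels if name not in CATEGORIES]
    if unknown:
        raise ValueError(f"Unexpected label(s): {unknown}")
    return [1 if name in labels else 0 for name in CATEGORIES]
```

For example, `encode_labels("Response;Caution and advice")` sets the first and last positions of the vector, while `encode_labels("Not humanitarian")` returns an all-zero vector. If the actual label strings differ in spelling or capitalization, `CATEGORIES` needs to be adjusted accordingly.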
This resource contains the following files or folders.
- annotation.csv: This file contains four fields: "tweet_id", "tweet_text", "image_path", and "label" (multiple labels of a sample are separated by ";").
- image: This folder contains all the images in our dataset. It can be downloaded here.
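The file layout described above could be read with the Python standard library along these lines. This is a sketch under stated assumptions: that annotation.csv is UTF-8 encoded, that it has a header row using the four field names given above, and that "image_path" is relative to some root directory. The function `load_annotations` and its `image_root` parameter are hypothetical and not part of this resource.

```python
import csv
from pathlib import Path


def load_annotations(csv_path, image_root="."):
    """Read annotation.csv into a list of dicts, one per text-image pair."""
    samples = []
    # Assumption: UTF-8 encoding and a header row with the four field names.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            samples.append({
                # Kept as a string so long tweet IDs are not altered.
                "tweet_id": row["tweet_id"],
                "tweet_text": row["tweet_text"],
                # Assumption: image_path is relative to image_root.
                "image_path": Path(image_root) / row["image_path"],
                # Multiple labels of a sample are separated by ";".
                "labels": [p.strip() for p in row["label"].split(";") if p.strip()],
            })
    return samples
```

Calling `load_annotations("annotation.csv", image_root=".")` would then return one dictionary per sample, with "labels" already split into a list. Whether the stored image paths already include the image folder name is not stated here, so `image_root` may need adjusting after checking the first few rows.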
Please cite the following paper if you use this resource in your research.
Wu, X., Mao, J., Xie, H., & Li, G. (2022). Identifying humanitarian information for emergency response by modeling the correlation and independence between text and images. Information Processing & Management, 59(4), 102977.