This repo is the official implementation of "Biting Off More Than You Can Detect: Retrieval-Augmented Multimodal Experts for Short Video Hate Detection", accepted at WWW 2025.
data                  # dir of each dataset
  - HateMM
  - MultiHateClip     # i.e., MHClip-B and MHClip-Y
    - en
    - zh
retrieval             # code of retrieval
src                   # code of MoRE
  - config            # training config
  - model             # model implementation
  - utils             # training utils
  - data              # dataloader of MoRE
We provide video IDs for each dataset in both temporal and five-fold splits. Due to copyright restrictions, the raw datasets are not included. You can obtain the datasets from their respective original project sites.
HateMM: access the full dataset from the hate-alert/HateMM repository.
MultiHateClip: access the full dataset from the Social-AI-Studio/MultiHateClip repository, the official repository for the ACM Multimedia'24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili".
To set up the environment, run the following commands:
conda create --name py312 python=3.12
conda activate py312
pip install torch transformers tqdm loguru pandas torchmetrics scikit-learn colorama wandb hydra-core
- Sample 16 frames from each video in the dataset.
- Extract on-screen text from the keyframes using PaddleOCR.
- Extract audio transcripts from each video's audio track using Whisper-v3.
- Encode visual features from each video using a pre-trained ViT model.
- Encode audio features as MFCCs with librosa.
- Encode textual features using a pre-trained BERT model.
- Encode the audio transcripts using BERT to build the audio memory bank.
- Encode the titles and descriptions using BERT to build the textual memory bank.
- Encode the 16 sampled frames using ViT to build the visual memory bank.
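As an illustration of the first step in the list above, the sketch below picks 16 frame indices for a video. The list does not say how the frames are chosen, so evenly spaced sampling is an assumption here, and `sample_frame_indices` is a hypothetical helper rather than a function from this repo.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 16) -> list[int]:
    """Pick `num_samples` frame indices spread evenly across a video.

    Assumption: uniform sampling. The actual preprocessing code may
    select frames differently.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Split the video into `num_samples` equal-width segments and take
    # the midpoint of each. Videos shorter than `num_samples` frames
    # repeat some indices so the output length stays fixed.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_samples))
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A 160-frame video yields one index per 10-frame segment.
    print(sample_frame_indices(160))
```

The returned indices can then be passed to whichever video reader is used to decode the frames. Keeping the count fixed at 16 means every video produces a visual feature tensor of the same shape.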
# Run retrieval over the memory banks
python retrieve/make_retrieval_result.py
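The sketch below shows the general idea behind retrieving from a memory bank: each video's embedding is compared against all others, and the most similar entries are returned. It is a minimal illustration only. Cosine similarity, the dictionary layout of the bank, and the names `cosine` and `retrieve_top_k` are assumptions, not the behavior of the script above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_id: str, bank: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the IDs of the k bank entries most similar to `query_id`.

    The query video itself is excluded so it cannot retrieve itself.
    """
    query = bank[query_id]
    scored = [
        (cosine(query, vec), vid)
        for vid, vec in bank.items()
        if vid != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vid for _, vid in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for BERT or ViT features.
    toy_bank = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(retrieve_top_k("a", toy_bank, k=2))
```

With real embeddings, the same ranking would normally be computed as a batched matrix product on normalized features rather than with Python loops.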
# Run MoRE on the HateMM dataset
python src/main.py --config-name HateMM_MoRE
# Run MoRE on the MHClip-Y dataset
python src/main.py --config-name MHClipEN_MoRE
# Run MoRE on the MHClip-B dataset
python src/main.py --config-name MHClipZH_MoRE
If you find our research useful, please cite this paper:
@inproceedings{lang2025biting,
author = {Lang, Jian and Hong, Rongpei and Xu, Jin and Li, Yili and Xu, Xovee and Zhou, Fan},
booktitle = {The {Web} {Conference} ({WWW})},
year = {2025},
organization = {ACM},
title = {Biting Off More Than You Can Detect: Retrieval-Augmented Multimodal Experts for Short Video Hate Detection},
}