MoRE: Mixture of Retrieval-Augmented Multimodal Experts

This repository is the official implementation of "Biting Off More Than You Can Detect: Retrieval-Augmented Multimodal Experts for Short Video Hate Detection", accepted at WWW 2025.

Source Code Structure

data        # directory for each dataset
- HateMM
- MultiHateClip     # i.e., MHClip-B and MHClip-Y
    - en
    - zh

retrieval   # retrieval code

src         # MoRE source code
- config    # training configs
- model     # model implementation
- utils     # training utilities
- data      # dataloaders for MoRE

Dataset

We provide video IDs for each dataset in both temporal and five-fold splits. Due to copyright restrictions, the raw datasets are not included. You can obtain the datasets from their respective original project sites.

HateMM

Access the full dataset from hate-alert/HateMM.

MHClip-B and MHClip-Y

Access the full dataset from Social-AI-Studio/MultiHateClip, the official repository for the ACM Multimedia '24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili".

Usage

Requirements

To set up the environment, run the following commands:

conda create --name py312 python=3.12
conda activate py312
pip install torch transformers tqdm loguru pandas torchmetrics scikit-learn colorama wandb hydra-core

Data Preprocess

  1. Sample 16 frames from each video in the dataset.

  2. Extract on-screen text from keyframes using Paddle-OCR.

  3. Extract audio transcripts from video audio using Whisper-v3.

  4. Encode visual features from each video using a pre-trained ViT model.

  5. Encode audio features as MFCCs using librosa.

  6. Encode textual features using a pre-trained BERT model. (A sketch of these steps follows below.)
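
For reference, below is a minimal sketch of these preprocessing steps. The checkpoint names (openai/whisper-large-v3, google/vit-base-patch16-224-in21k, bert-base-uncased) and helper functions are illustrative assumptions, not necessarily the repo's exact choices, and the sketch additionally needs opencv-python, pillow, and librosa installed.

# Minimal preprocessing sketch (checkpoint names are assumptions, not the repo's exact choices)
import cv2
import librosa
import numpy as np
import torch
from PIL import Image
from transformers import BertModel, BertTokenizer, ViTImageProcessor, ViTModel, pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Uniformly sample 16 frames from a video.
def sample_frames(video_path, num_frames=16):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, total - 1, num_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames

# 2-3. On-screen text comes from Paddle-OCR on the sampled frames; transcripts from
# Whisper-v3, shown here via the Hugging Face ASR pipeline.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device=device)

def transcribe(audio_path):
    return asr(audio_path)["text"]

# 4. Visual features: per-frame [CLS] embeddings from a pre-trained ViT.
vit_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k").to(device).eval()

@torch.no_grad()
def encode_frames(frames):
    inputs = vit_processor(images=frames, return_tensors="pt").to(device)
    return vit(**inputs).last_hidden_state[:, 0]  # (16, 768)

# 5. Audio features: MFCCs with librosa.
def encode_audio(audio_path, n_mfcc=40):
    wav, sr = librosa.load(audio_path, sr=16000)
    return librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)

# 6. Textual features: [CLS] embeddings from a pre-trained BERT
# (OCR text, transcript, title/description).
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").to(device).eval()

@torch.no_grad()
def encode_text(text):
    inputs = bert_tokenizer(text, truncation=True, max_length=512, return_tensors="pt").to(device)
    return bert(**inputs).last_hidden_state[:, 0]  # (1, 768)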

Retrieval

  1. Encode audio transcripts using BERT to build the audio memory bank.

  2. Encode titles and descriptions using BERT to build the textual memory bank.

  3. Encode 16 frames using ViT to build the visual memory bank. (A retrieval sketch follows the command below.)

# conduct retrieval
python retrieve/make_retrieval_result.py
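
For reference, a minimal sketch of memory-bank retrieval via cosine-similarity top-k search; the function name and tensor shapes are illustrative assumptions, and the actual logic lives in the script above:

import torch
import torch.nn.functional as F

def retrieve_top_k(query, memory_bank, k=5):
    # query: (d,) embedding of one video; memory_bank: (N, d) embeddings of the training set.
    q = F.normalize(query.unsqueeze(0), dim=-1)     # (1, d)
    bank = F.normalize(memory_bank, dim=-1)         # (N, d)
    scores, indices = (q @ bank.T).topk(k, dim=-1)  # cosine similarities, top-k
    return indices.squeeze(0), scores.squeeze(0)

# Repeat per modality: retrieve neighbors from the audio, textual, and visual memory banks.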

Run

# Run MoRE on the HateMM dataset
python src/main.py --config-name HateMM_MoRE

# Run MoRE on the MHClip-Y dataset
python src/main.py --config-name MHClipEN_MoRE

# Run MoRE on the MHClip-B dataset
python src/main.py --config-name MHClipZH_MoRE

Citation

If you find our research useful, please cite this paper:

@inproceedings{lang2025biting,
	author = {Lang, Jian and Hong, Rongpei and Xu, Jin and Li, Yili and Xu, Xovee and Zhou, Fan},
	booktitle = {The {Web} {Conference} ({WWW})},
	year = {2025},
	organization = {ACM},
	title = {Biting Off More Than You Can Detect: Retrieval-Augmented Multimodal Experts for Short Video Hate Detection},
}
