- Release data.
- Release the code for ambiguous action recognition.
- Release the code for transparent solution tracking.
The challenge of replicating research results has been a significant impediment in molecular biology. Since modern intelligent systems have driven notable progress across many domains, we investigate intelligent monitoring systems as a way to tackle the reproducibility crisis. Specifically, we first curate a comprehensive multimodal dataset, named ProBio, as an initial step toward this goal. The dataset comprises fine-grained hierarchical annotations for studying activity understanding in the Molecular Biology Lab (BioLab). Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties of activity understanding in BioLab settings. Finally, we provide a thorough experimental evaluation of contemporary video understanding models, highlighting their limitations in this specialized domain and identifying potential avenues for future research. We hope ProBio and its associated benchmarks draw increased attention from the modern AI community to problems in molecular biology.
This paper makes three primary contributions:
- We introduce ProBio, the first protocol-guided dataset with dense hierarchical annotations in BioLab, to facilitate the standardization of protocols and the development of intelligent monitoring systems for mitigating the reproducibility crisis.
- We propose two challenging benchmarking tasks to measure models’ capability in leveraging both visual observations and language protocols for fine-grained multimodal video understanding, especially for ambiguous actions and environment states.
- We provide an extensive experimental analysis of the proposed tasks to highlight the limitations of existing multimodal video understanding models and point out future research directions.
This code requires `python==3.8.16`. You can install all the requirements via:

```bash
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
Install randaugment via this link.
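Optionally, a quick sanity check (a minimal sketch of ours, not part of the repo) to confirm the pinned CUDA build of PyTorch can see the GPU:

```python
# Sanity-check sketch (not part of the repo): verify the pinned CUDA 11.1
# build of PyTorch is installed and a GPU is visible before training.
import torch

print(torch.__version__)          # expect 1.8.1+cu111
print(torch.cuda.is_available())  # expect True on a CUDA 11.1 machine
```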
- Download our processed data and put it in ./ProBio.
- Extract videos into frames and split them into train, test, and val sets (a frame-extraction sketch follows the directory layout below).
```bash
git clone https://github.com/jiemingcui/ProBio.git
cd ProBio/
unzip ProBio_dataset
mkdir -p ./data/autobio
mv ProBio_dataset/Kinetics/* ./data/autobio/
bash ./proprecess/reorg.sh
```
- Reorganize all data into the structure below:

```
|-- data
|   |-- autobio
|   |   |-- clip_images
|   |   |-- videos
|   |   |-- autobio_labels.csv
|   |   |-- autobio_test.csv
|   |   |-- autobio_test_frames.txt
|   |   |-- autobio_train.csv
|   |   |-- autobio_train_frames.txt
|   |   |-- autobio_val.csv
|   |   |-- autobio_val_frames.txt
|   |   |-- ambiguity.json
```
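If you need to extract frames yourself (e.g., to populate `clip_images`), here is a minimal sketch using OpenCV; the output naming scheme and the example paths are assumptions of ours, so adapt them to whatever `reorg.sh` and the data loaders expect:

```python
# Frame-extraction sketch (assumptions: OpenCV is installed; the frame naming
# scheme below is illustrative, not the repo's exact convention).
import os
import cv2

def extract_frames(video_path: str, out_dir: str) -> int:
    """Dump every frame of `video_path` into `out_dir` as numbered JPEGs."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        cv2.imwrite(os.path.join(out_dir, f"img_{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx

# Hypothetical usage:
# extract_frames("./data/autobio/videos/clip0001.mp4",
#                "./data/autobio/clip_images/clip0001")
```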
This model is trained on a single NVIDIA RTX 3090 GPU.
```bash
# train
python train.py --config ./configs/k400/autobio_train.yaml
```
If training hangs after a certain number of steps, the WandB logging is a likely culprit; try to locate and fix the WandB problem (one possible workaround is sketched below).
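One workaround (an assumption on our part, not an official fix) is to force WandB into offline mode before training starts, so network-bound logging cannot stall the run:

```python
# Workaround sketch (assumption, not an official fix): run WandB offline so
# logging cannot block training; sync the run later with `wandb sync`.
import os
os.environ["WANDB_MODE"] = "offline"  # must be set before wandb initializes
```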
```bash
# test
python test.py --config ./configs/k400/autobio_test.yaml
```
After submission, we changed the video-sampling strategy and further optimized the model, which improved performance. The updated results are shown in the table below:
| Category | Top1 | Top5 | Mean Top1 | Mean Top5 |
|---|---|---|---|---|
| Overall | 76.44 | 99.43 | 73.58 | 99.60 |
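For reference, Top1/Top5 presumably denote overall clip accuracy, while Mean Top1/Top5 are per-class averages; below is a minimal sketch of that interpretation (ours, not the repo's exact evaluation code):

```python
# Metric sketch (our interpretation of the table's columns, not the repo's
# exact evaluation code): overall vs. mean per-class top-k accuracy.
from collections import defaultdict

def topk_hit(scores, label, k):
    """1 if `label` is among the k highest-scoring classes, else 0."""
    topk = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    return int(label in topk)

def accuracies(all_scores, labels, k=1):
    hits, per_class = 0, defaultdict(list)
    for scores, y in zip(all_scores, labels):
        h = topk_hit(scores, y, k)
        hits += h
        per_class[y].append(h)
    overall = hits / len(labels)                  # e.g. Top1 for k=1
    mean = sum(sum(v) / len(v) for v in per_class.values()) / len(per_class)
    return overall, mean                          # (Top-k, Mean Top-k)
```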
Special thanks to Putao for his significant contribution to achieving this result.
Our code is based on ActionCLIP, CLIP, and STM. Thanks to the authors of these great projects.
```bibtex
@inproceedings{cui2023probio,
  title={ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab},
  author={Cui, Jieming and Gong, Ziren and Jia, Baoxiong and Huang, Siyuan and Zheng, Zilong and Ma, Jianzhu and Zhu, Yixin},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
```