Welcome to the 4M (Massively Multimodal Masked Modeling) Tutorial! This tutorial is an introduction to working with a multimodal foundation model, specifically 4M (https://4m.epfl.ch/), and will guide you through its setup and hands-on exercises for generation and retrieval tasks.
Before running the notebooks, follow Environment.md to install the required dependencies.
Once the setup is complete, start exploring pre-trained 4M models by completing the following Jupyter notebooks:
- Part 01: Multi-modal Generation – Run `Part_01_4M_generation.ipynb`.
- Part 02: Multi-modal Retrieval – Run `Part_02_4M_retrieval.ipynb`.
This tutorial will help you understand the core functionalities of 4M and allow you to experiment with multimodal generation and retrieval. This should give you a good foundation for later implementing and training your own nano4M from scratch!