
The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning [NeurIPS 2023 Spotlight]

Artyom Gadetsky, Maria Brbić

Project page | BibTeX


This repo contains the PyTorch implementation of the HUME algorithm. HUME is a model-agnostic framework for inferring human labeling of a given dataset without any external supervision. For more details, please check our paper The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning (NeurIPS '23).

Dependencies

The code is built with the following libraries:

Data Preparation

You can download the prepared representations that we used in our experiments by running:

wget https://brbiclab.epfl.ch/wp-content/uploads/2023/11/data.zip
unzip data.zip
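
The archive should unpack into the data/representations/ and data/labels/ folders referenced by the commands below. As a quick sanity check, you can verify in Python that the representations and labels are aligned (using the CIFAR-10 files from the training commands as an example):

import numpy as np

phi1 = np.load("data/representations/mocov2/cifar10train_l2.npy")
targets = np.load("data/labels/cifar10train_targets.npy")

# Representations and labels must be aligned example-by-example.
print(phi1.shape, targets.shape)
assert len(phi1) == len(targets)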

You can download the tasks found by HUME for evaluation by running:

wget https://brbiclab.epfl.ch/wp-content/uploads/2023/11/tasks.zip
unzip tasks.zip

You can also use your own representations and datasets. HUME is compatible with any pretrained representations. The rule of thumb is to use self-supervised representations pretrained on the dataset of interest as ϕ1. Do not forget to normalize the representations ϕ1 to have unit norm (see Section 2.2 in the paper for details). As the second representation space ϕ2, you can use representations from any large pretrained model suitable for your dataset.
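
For example, a minimal NumPy sketch of the normalization step (features.npy and the output filename are placeholders for your own files):

import numpy as np

# Hypothetical input: raw phi_1 features, one row per example.
features = np.load("features.npy")
# Scale each row to unit L2 norm, as required for phi_1 (Section 2.2).
features = features / np.linalg.norm(features, axis=1, keepdims=True)
np.save("features_l2.npy", features)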

Training

To check the available hyperparameters you can run:

python hume.py --help

The default hyperparameters correspond to those used for the STL-10, CIFAR-10, and CIFAR-100-20 datasets.

For example, to run HUME on CIFAR-10 in the inductive setting with MOCOv2 self-supervised representations and DINOv2 pretrained representations, run:

python hume.py \
--phi1_path data/representations/mocov2/cifar10train_l2.npy \
--phi2_path data/representations/dino/cifar10train.npy \
--gt_labels_path data/labels/cifar10train_targets.npy \
--exp_path tasks/inductive/moco_dino/cifar10/ \
--k 10 \
--seed 42 # Choose random seed

For STL-10 and CIFAR-100-20, change the paths accordingly and set --k to the corresponding number of classes (10 and 20, respectively).

Similarly, to run the same experiment in the transductive setting, run:

python hume.py \
--phi1_path data/representations/mocov2/cifar10traintest_l2.npy \
--phi2_path data/representations/dino/cifar10traintest.npy \
--gt_labels_path data/labels/cifar10traintest_targets.npy \
--exp_path tasks/transductive/moco_dino/cifar10/ \
--k 10 \
--seed 42 # Choose random seed

To run HUME on ImageNet-1000 in the inductive setting with MOCOv2 self-supervised representations and DINOv2 pretrained representations, run:

python hume.py \
--phi1_path data/representations/mocov2/imagenet1000train_l2.npy \
--phi1_path_val data/representations/mocov2/imagenet1000test_l2.npy \
--phi2_path data/representations/dino/imagenet1000train.npy \
--phi2_path_val data/representations/dino/imagenet1000test.npy \
--gt_labels_path data/labels/imagenet1000test_targets.npy \
--exp_path tasks/inductive/moco_dino/imagenet1000/ \
--k 1000 \
--outer_lr 0.1 \
--inner_lr 0.1 \
--adaptation_steps 100 \
--subset_size 20000 \
--train_fraction 0.7 \
--no_anneal \
--no_rand_init \
--seed 42 # Choose random seed

Evaluation

To evaluate the obtained tasks, use evaluate.py. For example, to evaluate 100 tasks obtained on CIFAR-10 in the inductive setting, run:

python evaluate.py \
--phi1_path data/representations/mocov2/cifar10test_l2.npy \
--tasks_path tasks/inductive/moco_dino/cifar10/ \
--gt_labels_path data/labels/cifar10test_targets.npy
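
evaluate.py scores the inferred tasks against the ground-truth labels. For reference, agreement up to a permutation of class indices is typically measured with clustering accuracy via Hungarian matching; a minimal sketch of that metric (not necessarily the repo's exact implementation):

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred, k):
    """Accuracy of y_pred w.r.t. y_true up to a permutation of the k classes."""
    # Contingency table: counts[p, t] = examples predicted p with true label t.
    counts = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian matching finds the class permutation maximizing agreement.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)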

Acknowledgements

While developing HUME, we greatly benefited from the following open-source repositories:

Citing

If you find our code useful, please consider citing:

@inproceedings{gadetsky2023pursuit,
    title={The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning},
    author={Gadetsky, Artyom and Brbi\'c, Maria},
    booktitle={Advances in Neural Information Processing Systems},
    year={2023},
}