This repository contains the code for the experiments in our paper published at ICML 2023.
Check out the `ICML2023` branch for a more readable version of the repository, containing the code relevant for the paper at the time of conference publication, with largely the same features. The `master` branch may contain more up-to-date code, more features (e.g. more datasets) and configs, but may as a consequence be more difficult to navigate. Other branches (e.g. `dev/master`) might be unstable.
This README covers:
- Running the experiments
- A brief overview of the code structure
- Some details on config files and run options
The Python dependencies of the project are listed in `environment.yaml`.
Working with this repository is easiest with conda installed. Additionally, in order to run the experiments in parallel, the program should be run on a machine (or cluster) with the Slurm workload manager and GPUs available; otherwise the job requirements are never satisfied.
Assuming conda is installed, the following commands will set up the environment and run a simple test:
```bash
# Create and activate the conda environment
conda env create -f environment.yaml
conda activate nc

# Run a simple test
cd src
python3 main.py --config ../config/debug.yaml
```
This will produce checkpoints, measurements, and plots in `logs/debug/`.
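As a rough idea, the resulting directory might be laid out along these lines (an illustrative sketch inferred from the outputs named above; the actual file and folder names may differ):

```
logs/debug/
├── <checkpoints>        # model checkpoints saved during training
├── <measure_name>.csv   # one CSV per measure (see the code overview below)
└── <plots>              # plots generated from the measurement data
```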
The actual experiments are time-consuming enough that it is highly advisable to use Slurm on a GPU cluster. To run the actual experiments under Slurm, run:
```bash
python3 main.py --config ../config/matrix/matrix_papyan_mseloss.yaml --matrix --slurm          # For ResNets
python3 main.py --config ../config/matrix/matrix_customnets_param_search.yaml --matrix --slurm # For other networks
```
which will send the jobs to the Slurm scheduler.
To conserve computational resources, it can be useful to first run a minimal example:
```bash
python3 main.py --config ../config/matrix_debug.yaml --matrix --slurm
```
The experiments can also be run without the Slurm scheduler by dropping the `--slurm` flag, but they will then run serially.
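For example, the minimal matrix example above can be run serially as:

```bash
# Same minimal matrix example, run serially without Slurm
python3 main.py --config ../config/matrix_debug.yaml --matrix
```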
Any experiment is run by specifying a config file: `python3 main.py --config path/to/config.yaml`.
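For reference, a config file might look roughly like the following. This is a hypothetical sketch: apart from the `Logging: log-dir` entry described below, the section and key names are illustrative assumptions, not the repository's actual schema.

```yaml
# Hypothetical config sketch -- apart from Logging/log-dir (documented
# below), all section and key names here are illustrative assumptions.
Logging:
  log-dir: logs/my_experiment/
Data:
  dataset: cifar10
Model:
  name: resnet18
Optimizer:
  name: sgd
  lr: 0.1
```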
In its simplest form this will:
- Create an `Experiment` object (from `Experiment.py`), passing the config file from the given path.
- After the config file has been parsed, the `Experiment` contains all specifications of the experiment in its attributes:
  - The dataset (a `DatasetWrapper`)
  - The model (a `Models.ForwardHookedOutput`)
  - The optimizer (a `WrappedOptimizer`)
  - A logging handler (a `Logger`)
  - The IDs of the measures (a dictionary pointing to the different `Measure` classes)
- Then, handled by the `Experiment` functions:
  - The model is trained using the specified dataset (calling `experiment.train()`), saving checkpoints along the way.
  - For each checkpoint, for each specified layer, the `Measure.measure(...)` methods are called, producing a `measure_name.csv` file.
- Finally, `NCPlotter.plot_runs` is run on the directory where the measures are saved, creating the relevant plots from the measurement data.
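Putting the pieces together, the flow described above looks roughly like this. It is a conceptual sketch only: the exact import paths, constructor, and call signatures are assumptions, not the repository's documented API.

```python
# Conceptual sketch of the experiment lifecycle described above; the
# import paths and signatures are assumptions, not the actual API.
from Experiment import Experiment
import NCPlotter

experiment = Experiment("path/to/config.yaml")  # parse config into attributes
experiment.train()                              # train, saving checkpoints

# Each checkpoint/layer pair gets its Measure.measure(...) results
# written to a measure_name.csv file; the plotter then reads them:
NCPlotter.plot_runs("logs/dir/path")
```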
All the files will be put in a log directory specified by the config entry `Logging: log-dir: logs/dir/path`, typically under `logs/`.
Some of the config files (typically `config/matrix/somename.yaml`) contain a `Matrix` parameter.
When calling `main.py` with the `--matrix` (or `-m`) flag, this will generate subfolders named after the hyperparameters/config used in each specific run, and place the relevant config file in the corresponding subfolder.
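As a rough illustration, a `Matrix` section might sweep over hyperparameters like this (a hypothetical sketch; only the `Matrix` key itself is documented here, everything beneath it is an illustrative assumption):

```yaml
# Hypothetical Matrix sketch -- only the Matrix key itself is documented
# above; everything beneath it is an illustrative assumption.
Matrix:
  Optimizer:
    lr: [0.1, 0.01]
  Model:
    name: [resnet18, resnet50]
```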
Depending on the additional flags set:
- If no additional flags are set, the runs will be performed sequentially. This is not recommended, as it might take days to run.
- If the `-s` (`--slurm`) flag is set, the tasks will be submitted in parallel to a local Slurm job manager. This is the recommended way to run the experiments.
- A dry run will be performed if `-d` (`--dry_run`) is set, showing details of the different runs and configs without actually running the experiments or creating the folder structure (see the example after this list).
- For other flags, refer to `src/main.py` or run `python3 main.py -h`.
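For instance, to preview what the minimal matrix run above would do without starting any jobs:

```bash
# Show the planned runs and configs without executing anything
python3 main.py --config ../config/matrix_debug.yaml --matrix --dry_run
```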