Neutrino_Flows

A public repository for a minimal working example in the neutrino flows project

(Figure: a diagram showing the full neutrino-flows setup.)

This repository facilitates the steps required to produce and save a fully trained conditional normalising flow for neutrino regression.

Configuration

The entire session is configured by the three yaml files found in config/:

  • data.yaml: Controls the data loading: where to find the files, which variables to use, which scalers to apply, etc.
  • flow.yaml: Controls the configuration of the conditional INN, from the number of layers in the flow to the size of the deep set.
  • train.yaml: Controls the training session; here one can configure learning rate schedulers, gradient clipping, early stopping, etc.

Each of these yaml files essentially defines a hierarchy of keyword arguments which are passed to the functions in nureg/. In each of these files there are extensive comments outlining what each option does. A rough illustration of how the configs are loaded and unpacked is sketched below.
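As a rough illustration of how this hierarchy of keyword arguments might be consumed, the sketch below reads the three files with PyYAML; the commented-out constructor names are hypothetical placeholders, not the actual API of nureg/.

    # Minimal sketch: each yaml file becomes a nested dictionary whose keys
    # mirror the keyword arguments of the corresponding objects in nureg/.
    # The commented-out constructors are hypothetical placeholders.
    import yaml

    def load_config(path):
        """Read a single yaml file into a plain python dictionary."""
        with open(path, "r") as f:
            return yaml.safe_load(f)

    data_cfg = load_config("config/data.yaml")
    flow_cfg = load_config("config/flow.yaml")
    train_cfg = load_config("config/train.yaml")

    # dataset = Dataset(**data_cfg)    # hypothetical; see nureg/ for the real classes
    # flow    = Flow(**flow_cfg)
    # trainer = Trainer(**train_cfg)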

Setup

  1. Set up the environment
    • Either use the requirements.txt file to install the appropriate Python packages.
      • This project was tested with Python 3.9
    • Alternatively, use the Dockerfile to build an image which can run the package
  2. Download the data
    • The data files are fairly large and thus are not stored in this repository
    • You can find them on Zenodo:
      • doi: 10.5281/zenodo.6782987
    • Make sure that the "path" keyword in config/train.yaml points to the downloaded folder
  3. Specify the save path for the flow
    • This is done by setting the "base_kwargs/name" and "base_kwargs/save_dir" in config/flow.yaml
    • The code will try to create the directory if it does not exist
  4. Run the script
    • Simply run "python train.py" and watch it go! (A short pre-flight check of the configured paths is sketched below.)
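Before starting a long run it can be worth sanity-checking the paths referenced above. The snippet below is only a sketch: it assumes the data location sits under a top-level "path" key in config/train.yaml and the save location under "base_kwargs" in config/flow.yaml, as described in steps 2 and 3; adjust the keys if the shipped configs nest them differently.

    # Hedged pre-flight check: the key names are assumed from the setup steps above.
    from pathlib import Path
    import yaml

    with open("config/train.yaml") as f:
        train_cfg = yaml.safe_load(f)
    with open("config/flow.yaml") as f:
        flow_cfg = yaml.safe_load(f)

    data_path = Path(train_cfg["path"])  # should point to the downloaded Zenodo folder
    save_dir = Path(flow_cfg["base_kwargs"]["save_dir"]) / flow_cfg["base_kwargs"]["name"]

    assert data_path.exists(), f"Data folder not found: {data_path}"
    save_dir.mkdir(parents=True, exist_ok=True)  # train.py also creates this if missing
    print(f"data: {data_path}\nsave dir: {save_dir}")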

What the train script does

The main executable script is train.py and it performs the following steps:

  1. Loads in the three configuration dictionaries for the session
  2. Initialises the dataset for training using the data config
  3. Initialises the flow for training using the flow config and the dimensionality of samples in the dataset
  4. Creates a save directory for the flow into which it stores:
    • Copies of the configs
    • Histograms of the raw dataset features
    • Preprocessing scalers, which are fit on the dataset
    • Histograms of the pre-processed dataset features
    • During training this will also be filled with the loss values, model checkpoints, and the "best" model based on lowest validation loss
  5. Splits the dataset into a training and a holdout validation set
  6. Initialises a Trainer using the train config, and starts the training session
    • The Trainer class performs model fitting via gradient descent
    • During this it facilitates moving data to and from the network device, checkpoint saving, early stopping, etc. A generic sketch of such a training loop is shown below.
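The Trainer itself lives in nureg/, but the pattern it implements is a standard one. The sketch below is a generic, self-contained illustration, using a toy dataset and a plain linear model in place of the flow, of the holdout split, device handling, best-model checkpointing on validation loss, and early stopping described above; it is not the actual nureg Trainer.

    # Generic training-loop sketch (toy data, placeholder model); not the nureg Trainer.
    import copy
    import torch
    from torch.utils.data import DataLoader, TensorDataset, random_split

    dataset = TensorDataset(torch.randn(1000, 6), torch.randn(1000, 3))  # toy stand-in
    model = torch.nn.Linear(6, 3)  # placeholder for the conditional flow

    train_set, valid_set = random_split(dataset, [900, 100])  # holdout validation split
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    valid_loader = DataLoader(valid_set, batch_size=64)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    best_loss, best_state, patience, bad_epochs = float("inf"), None, 5, 0
    for epoch in range(100):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)  # move data to the network device
            loss = torch.nn.functional.mse_loss(model(x), y)
            optim.zero_grad()
            loss.backward()
            optim.step()

        model.eval()
        with torch.no_grad():
            val = sum(
                torch.nn.functional.mse_loss(model(x.to(device)), y.to(device)).item()
                for x, y in valid_loader
            ) / len(valid_loader)

        if val < best_loss:  # keep the "best" model by lowest validation loss
            best_loss, best_state, bad_epochs = val, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # early stopping
                break

    model.load_state_dict(best_state)  # restore the best checkpoint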
