Docker container version of the SCSNL preprocessing pipeline
The goal is to begin dockerizing the available workflows located at the SCSNL GitHub page, starting with the preprocessing workflow. Dockerizing/containerizing these workflows makes them scalable, reproducible, and runnable in any environment that supports docker.
- Docker installation (docker docs)
- check whether docker is installed and running
docker info
- clone the latest GitHub version by either clicking here or running
git clone [email protected]:cdla/scsnl_preproc_docker.git
- unzip the downloaded archive (if needed) and cd into the directory
unzip scsnl_preproc_docker.zip;
cd scsnl_preproc_docker
- build the docker image
docker build -t scsnl/preproc_spm12 .
- (not tested/functional, but this is how the docker workflow would be run) run the workflow, mounting the project directory, raw data directory, output directory, and config file as volumes and passing the subject index as the argument
docker run -t \
  -v /oak/project_location:/project_dir/ \
  -v /oak/raw_data_location/:/raw_data/ \
  -v /oak/output_dir/:/output_dir/ \
  -v /oak/config_file_location.txt:/config.m \
  scsnl/preproc_spm12 subject_index
- use docker2singularity to create a singularity image so that the docker workflow can be run on research computing clusters such as Sherlock (a rough sketch of this step appears after this list)
- generate the same environment using neurodocker
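A rough sketch of the docker2singularity conversion and a subsequent run on a cluster: the output directory, bind paths, and generated image filename below are assumptions (docker2singularity names the image itself), so check the docker2singularity and Sherlock documentation for specifics.

# convert the built docker image into a singularity image
docker run --privileged -t --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /oak/singularity_images:/output \
  quay.io/singularity/docker2singularity \
  scsnl/preproc_spm12

# run the resulting image on the cluster, binding the same directories as the docker run above
singularity run \
  -B /oak/project_location:/project_dir \
  -B /oak/raw_data_location:/raw_data \
  -B /oak/output_dir:/output_dir \
  -B /oak/config_file_location.txt:/config.m \
  /oak/singularity_images/scsnl_preproc_spm12.simg subject_index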
Upon doing some preliminary research, it looks like there are some tools in the field that have already done the lion's share of the work in this process, including:
- the spm bids-app
- I was surprised when I ran across this tool. Using the spm12 standalone MCR version, it runs a preset preprocessing pipeline that has already been written in spm_batch format (a rough example of invoking it appears after this list).
- neurodocker
- this tool looks particularly interesting and I will likely make time to familiarize myself with it. It's a command line tool that generates Dockerfiles and Singularity images. Ultimately, dockerization of workflows typically also needs to be followed by the creation of singularity images, because singularity images can be run on university HPC resources like Sherlock: docker containers/daemons require root/admin privileges, whereas singularity images can be run without that requirement.
- a common tool to translate docker containers to singularity images is docker2singularity.
- the official spm docker
- This repository looks like it was created three months ago and recently updated to include both the MATLAB Compiler Runtime (MCR) version as well as the octave version. The SPM documentation says that it is not officially supported, and there are currently some known issues, as indicated here.
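For reference, BIDS-Apps share a common command line convention; a rough sketch of running the spm bids-app at the participant level is below (the bids/spm image name, paths, and participant label are assumptions).

# run the spm bids-app on a BIDS dataset (image name and paths are assumptions)
docker run -i --rm \
  -v /oak/bids_dataset:/bids_dataset:ro \
  -v /oak/outputs:/outputs \
  bids/spm \
  /bids_dataset /outputs participant --participant_label 01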
I. use official spm dockerfile
II. translate existing pipeline to nipype and then dockerize nipype environment with something like neurodocker
III. create MCR version of preprocessing scripts and add to official spm docker
IV. use neurodocker framework to create environment
route I.
- the official spm docker works mainly off spm_batch formatted language. I would need to figure out how to translate the wrapped command line functions (such as this), as well as how to translate the pipeline's use of fsl to its spm analogues (like when reorienting the data / "FlipZ", here).
route II
- this route is the one that I would be most comfortable with, given my relative comfort with nipype compared to other frameworks, but I think it would take the longest.
route III (I chose to go this route)
- I think this route will have the best translation and will be easiest for users who are familiar with the existing pipeline to move over to
- for this route, the goal is to create a dockerfile that has the spm12 MCR and fsl available (see the task list below)
- will need to modify the existing scripts to use the spm12 mcr
route IV
- neurodocker seems like a very useful tool that I should familiarize myself with. I could see myself using this tool in the future.
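As a rough sketch of route IV, neurodocker can generate a comparable Dockerfile from the command line; the base image, spm12/fsl versions, and exact flag names below are assumptions and vary across neurodocker releases.

# generate a Dockerfile with the spm12 standalone/MCR and fsl, then build it
# (versions, base image, and tag are assumptions)
neurodocker generate docker \
  --pkg-manager apt \
  --base-image debian:bullseye \
  --spm12 version=r7771 \
  --fsl version=6.0.5.1 \
  > Dockerfile
docker build -t scsnl/preproc_spm12_neurodocker .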
Steps for route III:
- make a dockerfile that supports SPM12 MCR and FSL (relevant commit)
- remove/update script locations to docker-relevant places (relevant commit)
- modify existing scripts to take the data location, project location, and output location as arguments to the command (relevant commit)
- due to the nature of containers, these directories will have to be mounted as volumes within the container.
- within the preproc functions and utils, remove hard-coded filepaths (fsl commands and added toolboxes) (relevant commit)
- modify existing scripts to change spm run calls to unix matlab commands that invoke the spm12-mcr compiled version (standalone usage docs)
- example:
spm_jobman('run', BatchFile);
turns to
system(sprintf('spm12 batch %s',BatchFile));
- compile the SCSNL preprocessing scripts (including the ARTRepair toolbox) into an executable using mcc (relevant script) (relevant commit); a rough sketch of the mcc call appears after this list
- test that the spm functions, artrepair functions, and unix/fsl functions are running appropriately within the matlab compiled app on a sample dataset
- this will determine whether the interaction between the matlab compiler runtime and docker requires restructuring the workflow to handle passing data to the container.
- integrate the scsnl standalone app into the dockerfile (add volume mounts from the modified scsnl preproc scripts)
- test the dockerfile
- compare against the non-dockerized version of the scripts to make sure no hidden bugs arise
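A rough sketch of the mcc compilation step: the wrapper name is taken from preprocessingfmri.m mentioned below, but the toolbox paths, output directory, and output name are assumptions; the actual invocation lives in the relevant script referenced above.

# compile the preprocessing wrapper and bundle the ARTRepair toolbox and scsnl utils
# (all paths and the output name are assumptions)
mcc -m preprocessingfmri.m \
  -a /path/to/ARTRepair \
  -a /path/to/scsnl/utils \
  -d ./standalone \
  -o scsnl_preproc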
Possible Issues:
- the coreg function references OldNorm templates for the spm12 workflow, so I would need to get a copy of that nifti
- verify ARTRepair version (spm8 version referenced within workflow)
- the slicetiming file being optional (how to handle this with dockerized file mappings)
- if mcr mapping of filepaths does not work with the sample dataset, restructure the filepath mappings to be done in the dockerfile instead of within the preprocessingfmri.m wrapper.