A group of workflows for using openneuro data
Steps required:
- Before you start - set these environment variables
- Downloading data
- Running fMRIprep
Destination - because we are trying to make this a general-purpose repo, we will avoid nesting the data as a subdataset (the layout I usually prefer).
if in the terminal - load the datalad module
Set these two environment variables to get everything going
## set OPENNEURO_DSID to the openneuro dataset id
## set the second environment variable to get the base directory
This is where data is sitting on the scc
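As a minimal sketch of this step, the two variables could be set and checked like this. The dataset id `ds000030` and the path under `/scratch` are placeholders chosen for illustration, not values from this repo, and `check_env` is a hypothetical helper.

```shell
# Placeholder values for illustration only; substitute your own.
export OPENNEURO_DSID="ds000030"                        # assumed example dataset id
export BASEDIR="/scratch/your_user/openneuro_work"      # assumed base directory

# Hypothetical helper: fail early if either variable is empty.
check_env() {
  if [ -z "${OPENNEURO_DSID:-}" ]; then
    echo "ERROR: OPENNEURO_DSID is not set" >&2
    return 1
  fi
  if [ -z "${BASEDIR:-}" ]; then
    echo "ERROR: BASEDIR is not set" >&2
    return 1
  fi
  echo "Using dataset ${OPENNEURO_DSID} under ${BASEDIR}"
}

check_env
```

Running `check_env` again at the top of each later step is a cheap way to catch a fresh login shell where the variables were never exported.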
To get started, let's make sure we have these scripts in the code folder
mkdir ${BASEDIR}/code
cd ${BASEDIR}/code
git clone [email protected]:krembilneuroinformatics/openneuro_preproc.git
Note: for this to work, you need to add an SSH key for SciNet to your GitHub account.
├── code
│ └── openneuro_preproc # a clone of this repo
├── containers
│ └── fmriprep-20.2.7.simg # the singularity image used to run fmriprep (need to run steps below to get this first!)
├── ${OPENNEURO_DSID} # folder for the dataset
│ ├── bids # the bids data is the data downloaded from openneuro
│ ├── derived # holds derivatives derived from the bids data
│ └── logs # logs from jobs run on cluster
└── fmriprep_home # an extra folder with pre-downloaded fmriprep templates (see setup section)
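The empty folders in the layout above can be created in one go. This is only a sketch: it falls back to a temporary directory and a placeholder dataset id when `BASEDIR` and `OPENNEURO_DSID` are not already set, so it is safe to try out. The `bids` folder is deliberately left out on the assumption that the `datalad clone` step creates it.

```shell
# Fall back to throwaway values so the sketch can be tried safely (assumptions).
BASEDIR="${BASEDIR:-$(mktemp -d)}"
OPENNEURO_DSID="${OPENNEURO_DSID:-ds000030}"

# Create every folder from the tree except bids/, which the clone step makes.
mkdir -p \
  "${BASEDIR}/code" \
  "${BASEDIR}/containers" \
  "${BASEDIR}/${OPENNEURO_DSID}/derived" \
  "${BASEDIR}/${OPENNEURO_DSID}/logs" \
  "${BASEDIR}/fmriprep_home"

# Show what was created at the top level.
ls "${BASEDIR}"
```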
## git annex is already on all nodes
source /external/rprshnas01/netdata_kcni/edlab/venvs/datalad-0-15-5/bin/activate
## loading Erin's datalad environment on the SciNet system
module load git-annex/8.20200618 # git annex is needed by datalad
module use /project/a/arisvoin/edickie/modules # this lets you read modules from Erin's folder
module load datalad/0.15.5 # this is the datalad module in Erin's folder
datalad clone https://github.com/OpenNeuroDatasets/${OPENNEURO_DSID}.git bids
The above bit would "clone" the dataset - meaning it will only download the little files and download instructions. To actually download the imaging data we need to use "datalad get".
This is useful - because we can limit downloading time/space by exploring the dataset and only downloading what we are really interested in.
Let's start by getting all the anatomical MRI images - we always need these
datalad get */anat/*
Next - let's grab the resting-state fMRI data and associated files. Under the BIDS convention, they are always in the "func" folder and all have "task-rest" in their filename.
datalad get */func/*task-rest*
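Before downloading, it can help to see how many files a pattern would match. The sketch below uses a hypothetical helper, `count_matches`, and is meant to be run from inside the cloned `bids` folder. It assumes the default git-annex layout, where files that are not yet downloaded appear as symlinks that do not resolve.

```shell
# Hypothetical helper: count the paths a glob expands to.
# An unmatched glob is passed through literally, so each path is checked.
count_matches() {
  n=0
  for path in "$@"; do
    # -L also counts annexed files whose symlink target is not yet present
    if [ -e "$path" ] || [ -L "$path" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Run from inside the bids folder.
echo "anat files: $(count_matches */anat/*)"
echo "rest files: $(count_matches */func/*task-rest*)"
```

A count of zero for the rest pattern usually means the dataset has no resting-state runs, or that the current directory is not the `bids` folder.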
building fmriprep container on scinet
This step was run by Erin
module load tools/singularity/3.8.5 # the module load may not be necessary, but do run the other steps; singularity gets the recipe to do 'science' from docker
# singularity build /my_images/fmriprep-<version>.simg docker://nipreps/fmriprep:<version>
mkdir ${BASEDIR}/containers
singularity build ${BASEDIR}/containers/fmriprep-20.2.7.simg \
The above step is downloading ALL the fmriprep software and putting it in a 'tupperware' container (according to Erin).
Testing and setting up for the singularity run..
We need a copy of the FreeSurfer license at the path below. You can get this from the FreeSurfer website or from within the SCC (our option).
ls ${BASEDIR}/fmriprep_home/.freesurfer.txt
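A slightly stricter version of this check also catches an empty file. This is a sketch only; `check_license` is a hypothetical helper, and the fallback to the current directory when `BASEDIR` is unset is there just so the snippet runs on its own.

```shell
# Hypothetical helper: succeed only if the license file exists and is non-empty.
check_license() {
  license_file="$1"
  if [ ! -s "$license_file" ]; then
    echo "missing or empty: ${license_file}" >&2
    return 1
  fi
  echo "found: ${license_file}"
}

# BASEDIR should already be set; "." is only a fallback for trying this out.
check_license "${BASEDIR:-.}/fmriprep_home/.freesurfer.txt" \
  || echo "Copy the license into place before continuing."
```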
Testing the singularity binds..
singularity shell --cleanenv \
-B ${BASEDIR}/fmriprep_home:/home/fmriprep --home /home/fmriprep \
From inside the container - set up templateflow (note: do this before submitting a job)
python -c "from templateflow.api import get; get(['MNI152NLin2009cAsym', 'MNI152NLin6Asym'])"
python -c "from templateflow.api import get; get(['fsaverage', 'fsLR'])"
python -c "from templateflow.api import get; get(['OASIS30ANTs'])"
Note: this step uses an estimated 24 hrs of processing time per participant! So even if all participants run at once (on our parallel cluster), it will still take a day to run.
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## don't forget to make sure that $BASEDIR and $OPENNEURO_DSID are defined..
module load singularity/3.8.0
## go to the repo and pull new changes
cd ${BASEDIR}/code/openneuro_preproc
git pull
## calculate the length of the array-job given the number of subjects and SUB_SIZE (make sure SUB_SIZE is set first)
N_SUBJECTS=$(( $( wc -l ${BASEDIR}/${OPENNEURO_DSID}/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ${BASEDIR}/code/openneuro_preproc/code/01_fmriprep_anat_scinet.sh
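To make the array-job arithmetic concrete, here is a small worked sketch on a made-up `participants.tsv` with 23 subjects. The value `SUB_SIZE=5` and the `subjects_for_task` helper are assumptions for illustration; the real job script in this repo may pick its subjects differently.

```shell
# Build a fake participants.tsv: one header row plus 23 subjects (illustration only).
tsv="$(mktemp)"
printf 'participant_id\n' > "$tsv"
i=1
while [ "$i" -le 23 ]; do
  printf 'sub-%02d\n' "$i" >> "$tsv"
  i=$((i + 1))
done

SUB_SIZE=5                                        # assumed batch size
N_SUBJECTS=$(( $(wc -l < "$tsv") - 1 ))           # subtract the header row
array_job_length=$(( N_SUBJECTS / SUB_SIZE ))     # integer division, like bc
echo "subjects=${N_SUBJECTS} array=0-${array_job_length}"

# Hypothetical selection rule: array task k takes the k-th block of SUB_SIZE rows.
subjects_for_task() {
  task_id="$1"
  first=$(( task_id * SUB_SIZE + 2 ))             # +2 skips the header (sed is 1-based)
  last=$(( first + SUB_SIZE - 1 ))
  sed -n "${first},${last}p" "$tsv" | cut -f1
}

subjects_for_task 4                               # the last, partial batch
```

With 23 subjects the range `0-4` gives five tasks, and the last one picks up the three leftover subjects. When the subject count is an exact multiple of `SUB_SIZE`, the same formula yields one extra task with no subjects, so it is worth checking how the job script handles an empty batch.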
Running the functional step looks pretty similar to running the anat step. The time taken and resources needed will depend on how many functional tasks exist in the experiment - fMRIprep will try to run these in parallel if resources are available to do that.
Note - the enclosed script uses some interesting extra options:
- it defaults to running all the fMRI tasks - a flag can be used to filter from there
- it is outputting CIFTI files (HCP fsLR 91k space, as well as MNI and native space outputs)
- it is running synthetic distortion correction by default, instead of trying to work with the dataset's available fieldmaps, because fieldmap correction can go wrong.
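For orientation, the options described above roughly correspond to the fMRIPrep flags assembled below. This is a sketch, not a copy of the script in this repo: `build_fmriprep_opts` is a hypothetical helper, and the script's actual option list may differ. It only prints the options so they can be inspected without running anything.

```shell
# Hypothetical helper: print fMRIPrep options matching the notes above.
# An optional first argument restricts processing to a single task label.
build_fmriprep_opts() {
  task="${1:-}"
  # CIFTI output at 91k, skip the dataset's fieldmaps, use fieldmap-less (synthetic) correction
  set -- --cifti-output 91k --ignore fieldmaps --use-syn-sdc
  if [ -n "$task" ]; then
    set -- "$@" --task-id "$task"
  fi
  echo "$@"
}

build_fmriprep_opts          # all tasks
build_fmriprep_opts rest     # only the resting-state task
```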
## note step one is to make sure you are on one of the login nodes
ssh niagara.scinet.utoronto.ca
## don't forget to make sure that $BASEDIR and $OPENNEURO_DSID are defined..
module load singularity/3.8.0
## go to the repo and pull new changes
cd ${BASEDIR}/code/openneuro_preproc
git pull
## figuring out appropriate array-job size
SUB_SIZE=1 # for func the sub size is moving to 1 participant because there are two runs and 8 tasks per run..
N_SUBJECTS=$(( $( wc -l ${BASEDIR}/${OPENNEURO_DSID}/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
array_job_length=$(echo "$N_SUBJECTS/${SUB_SIZE}" | bc)
echo "number of array is: ${array_job_length}"
## submit the array job to the queue
sbatch --array=0-${array_job_length} ${BASEDIR}/code/openneuro_preproc/code/02_fmriprep_func_scinet.sh
Before running this, make sure that the fmriprep container exists and that you have put the FreeSurfer license in place, following the instructions above.
Also don't forget about setting the environment variables for $BASEDIR
## note step one is to make sure you are on one of the submit nodes
ssh dev02
## don't forget to make sure that $BASEDIR and $OPENNEURO_DSID are defined..
## go to the repo and pull new changes
cd ${BASEDIR}/code/openneuro_preproc
git pull
## figuring out appropriate array-job size
N_SUBJECTS=$(( $( wc -l ${BASEDIR}/${OPENNEURO_DSID}/bids/participants.tsv | cut -f1 -d' ' ) - 1 ))
echo "number of array is: ${N_SUBJECTS}"
## submit the array job to the queue
sbatch --array=0-${N_SUBJECTS} ${BASEDIR}/code/openneuro_preproc/code/01_fmriprep_func_scc.sh