Rewrite of parts of Bella Brezovec's brainsss code (https://github.com/ClandininLab/brainsss).
It uses snakemake to schedule tasks which builds a list of output files that are needed and dynamically schedules jobs to create each of the output files.
The advantage of snakemake is a great increase in running the pipeline.
The best way to enjoy snake_brainsss is to use stimpack to write experimental metadata on the Bruker imaging PC and to collect fictrac behavioral data on the fictrac PC.
Currently, no visual stimulus presentation has been implemented.
Todo: Implement a median z-score (instead of mean z-score)
https://github.com/pimentel/sherlock-notes/blob/master/README.md (maybe useful info)
ml python/3.9.0
(on sherlock only)
python3 -m venv .env_snakemake
# Create environment (#Note for myself: Local Mac is '.env_snake_brainsss')
source .env_snakemake/bin/activate
# Activate environment
pip install --upgrade pip
pip3 install antspyx
# Install this first as it's the most fragile package
pip install snakemake
pip install natsort
pip install lxml
pip install openpyxl
pip install h5py
pip install pyfiglet
pip3 install opencv-python-headless
pip install scipy
pip install PuLP==2.7.0
# manually downgrade version for snakemake compatibility
pip install urllib3==1.26.6
# manually downgrade version for snakemake compatibility
In your favorite folder on sherlock type:
git clone https://github.com/DavidTadres/snake_brainsss
Before the code works as expected, a config file with you name needs to be prepared. It can be arbitrarily named, just remember the name.
The config file is in the folder 'users' and contains several lines (see dtadres.json for example):
-
One line is called 'datatset_path' defines the root path where all the preprocessed data lives.
-
Another line is called 'autotransferred_stimpack'. If you use David's fork of Brukerbridge and stimpack to create metadata (including fictrac tracking) you can set this to True to automate fly building (i.e. no need to manually copy anything).
-
One line called 'fictrac_path' points the fly_builder to where to collect fictrac '.dat' files. This line is optional and will be ignored if 'autotransferred_stimpack' is set to True.
-
The 'shell_python_command' should be set to 'python3' on sherlock. If you use your own PC it might need to be set to 'python': Open a terminal, enter the snakebrainsss environment and type 'python3'. If you get an error, use 'python'
-
The 'moco_temp_folder' should be '/scratch/groups/trc' on sherlock. Only change this to another path if you use snake_brainsss on your own PC.
See an existing file, i.e. 'dtadres.json' for an example.
If you are (successfully) using autransferred_stimpack: True the below does not apply!
Before building your first fly, please read this part carefully to make sure it'll run as expected:
-
fictrac: Your fictrac data must be in the folder define in the 'user.json' file. It must have the following name: DATE_FLY_FUNC
For example if you want to build the flies in the imports folder "/oak/stanford/groups/trc/data/David/Bruker/imports/20240326" you'll see that there are 4 flies in that folder.
Each of those should have a fictrac folder. For the script to associate the correct fictrac data with the imaging data you must rename the fictrac folders to: "/oak/stanford/groups/trc/data/David/Bruker/Fictrac/20240328_fly1_func0"
Since it's possible to have more than one 'func' folder per fly, you must define which imaging data a particular fictrac dataset is associated with.
-
Each fly must have a 'fly.json' in the 'imports' folder. For example, in
""/oak/stanford/groups/trc/data/David/Bruker/imports/20240326/fly1" you will find such a fly.json file.The file contains 3 essential and several self-explanatory optional fields. Here we focus on the essentials:
a) genotype - this is essential because the flybuilder will group flies per genotype. For example, if genotype is "SS84990_DNa03_x_GCaMP6f" the fly builder will go to the 'dataset_folder' defined in the user.json (for dtadres.json that would be "/oak/stanford/groups/trc/data/David/Bruker/preprocessed") and create a folder "SS84990_DNa03_x_GCaMP6f" and deposit 'fly_001, fly_002...' in there.
! Hence, even a slight change in the genotype can be confusing as you'll create a lot of genotype folder. Make sure to be consistent !
b) anatomy_channel - This is the static imaging channel: For example if we have tdTomato on Channel 1 and GCaMP on channel 2, you would type "channel_1" (exactly like this!). There currently can only be a single anatomy channel for snakebrainsss to work as expected.
c) functional_channel - This is the dynamic imaging channel. For example if we have tdTomato on Channel 1 and GCaMP on channel 2, you would type "['channel_2']" (exactly like this!).
If you have two imaging channels, i.e. rGECI on Channel 1, gCarvi on channel 2 and mCardinal on Channel 3 you would type "['channel_1', 'channel_2']" (exactly like this!). Note that this hasn't been tested yet.
Now you are ready to build a fly.
to build flies in this folder'/oak/stanford/groups/trc/data/David/Bruker/imports/20240326'
On Sherlock type:
ml python/3.9.0
source .env_snakemake/bin/activate
cd to your snakebrainsss folder. If you don't know what this means try:
cd snake_brainsss/workflow/
Then type:
snakemake --config user=dtadres --profile profiles/simple_slurm -s build_fly.smk --directory /oak/stanf ord/groups/trc/data/David/Bruker/imports/20240326 -np
The '-np' at the end makes this a test run. Check the output.
It will says which flies it wants to create in the genotype folder. If it looks ok, re-run the code without the '-np' flag:
snakemake --config user=dtadres --profile profiles/simple_slurm -s build_fly.smk --directory /oak/stanf ord/groups/trc/data/David/Bruker/imports/20240326
Building the flies will take a few minutes (~2 minutes per fly)
You will get an email when the job starts and another email when it ends (or fails, cancels etc.)
Before continuing, make sure that the fictrac and imaging data is in the expected folder!
Visual stimulus not implemented TBD!
Fictrac fps is currently hardcoded to 100 fps (preprocess_fly.smk, line 37). To be changed in the future
Once the fly is built, the preprocessing is straightforward. For example, to preprocess fl "oak/stanford/groups/trc/data/David/Bruker/preprocessed/SS84990_DNa03_x_GCaMP6f/fly_037" you just type:
snakemake --config user=dtadres --profile profiles/simple_slurm -s preprocess_fly.smk --directory /oak/stanf ord/groups/trc/data/David/Bruker/preprocessed/SS84990_DNa03_x_GCaMP6f/fly_037
This should take ~2 hours and you should have the same output as with the original brainsss.
You will get an email when the job starts and another email when it ends (or fails, cancels etc.). This is useful for debugging so that it's set to send email. To be discussed to turn it off in the future.
It's easy to automatically move all incoming emails into a given folder so that snakebrainsss doesn't clog you inbox.
- the imaging files in 'func' and 'anat' folders all are called the same: instead of calling them 'functional_channel_1.nii' or 'anatomical_channel_1.nii' everything is just called 'channel_1.nii'. Reason: Makes working with wildcards in snakemake much easier.
- fly_builder enforces having a fly.json file. It MUST contain the following fields: 'Genotype', 'anatomy channel' and
'functional channel':
- Genotype is used to create subfolders for each genotype which contain 'fly_001', 'fly_002' etc.
- anatomy channel is a string with one of the following three values: 'channel1', 'channel2' or 'channel3'
- functional channel is a list of strings, for example ['channel2',''], or ['channel1','channel2']
- A given 'fly' will be assumed to always have the same anatomical and functional channels! I.e. if a fly is GCaMP and tdTomato, all recordings (of that fly, so that's anat, func0, func1 etc.) are assumed to be made with both the red and green channel!
- correlation is done with a vectorized function I wrote instead of the scipy-pearson function which only works in 1D. Correlation now takes <1 minute. Result is expected to be almost identical (float32 as opposed to scipy float64) and precision can be increased if need be.
- make_mean_brain was previously calculated with standard settings which casts a float64 array. For example: meanbrain = np.mean(brain_data, axis=-1), meanbrain.dtype -> dtype('float64') However, we never seem to be in float64 space so I changed it to: meanbrain = np.mean(brain_data, axis=-1, dtype=np.float32), meanbrain.dtype -> dtype('float32')
- We assume that a given fly was imaged with a given set of channels. i.e. if fly001 has GCaMP and tdTomato it is assumed that ALL folders inside fly001 have data with two channels!
- I'm unsure about whether the conversion from uint16 to float32 is necessary and whether it's ok to convert back to uint16 (happens for example when writing the h5 file back to a nii file).
- Why use nibabel at all? e.g. motion correction ants.registration takes a numpy array it seems. > Probably because it can be opened natively in fiji?
- align_anat calls: Why do we provide a resolution (e.g. res_ant = (0.653, 0.653, 1)) and then resample with (2,2,2)? It helps with plotting! Bella made the FDA 2um isotropic. The resolution would be higher but we don't really see more and it make plotting easier/faster
- Dangerous: in fictrac_utils.smooth_and_interp_fictrac the scipy.signal.savgol filter has a fixed smoothing length that does NOT depend on the framerate. This will yield different smoothing results depending on the recording framerate!
- Check line 146/147 in motion_correction.py in brainsss. Data is loaded as uint16 and then converted to float32 Meanbrain is made by calling np.mean which returns a float64 array. I think this might lead to a loss of precision Better to avoid using uint16 in line 146 and use directly float32.
- Dangerous: line 61 in 'temporal_high_pass_filter.py' in brainsss - sigma is defined as 200 and not dependent on the speed at which data was collected! I think this will yield different filtering results depending on the aquisition speed!
- Unclear what's happening/what the consequence is: in align_anat.py starting line 135: the variable is called fwdtranforms_save_dirs but it contains moco['invtransforms']. I think it does copy invtransforms but not 100% sure.
- Try parallelization of motion correction. This is by far the slowest step and would profit greatly from parallelization. -> Done
- Test what happens if we make supervoxels per volume (instead of z-slice). TBD but reason for supervoxel per z-slice is to keep temporal resolution as high as possible: Each slice is recorded ~20 ms later compared to the previous one. If we do 3D supervoxels we'll time-smear
- Keep track of dtypes more consistenly. -> Done
- Correct axis metadata in nii files! Problem is that for us z=0 is posterior whereas for field it's anterior.
- Resultion/voxel size should go into the nii file which should help with alignments.
did: pip list showed that I had urllib3==2.1.0 installed
https://stackoverflow.com/questions/76187256/importerror-urllib3-v2-0-only-supports-openssl-1-1-1-currently-the-ssl-modu suggested to just do:
pip install urllib3==1.26.6
Warning indeed went away
To install on my M2 Mac (also see: ANTsX/ANTsPy#519) install homebrew: https://brew.sh/
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install libpng
in environment: pip install cmake
pip install itk
pip install Downloads/antspyx-0.3.8-cp310-cp310-macosx_13_0_arm64.whl
pip install snakemake
pip install h5py
pip install pyfiglet
pip install natsort
Create environment
python3 -m venv .env_snake_brainsss
source .env_snake_brainsss/bin/activate
download wheel for arm Mac https://github.com/dipterix/rpyANTs/releases/tag/0.0.1.9000
pip install YOUR WHEEL, for example: pip install antspyx-0.3.8-cp310-cp310-macosx_13_0_arm64.whl
python3 -m venv .env_snake_brainsss source .env_snake_brainsss/bin/activate
(### Install mamba https://snakemake.readthedocs.io/en/stable/tutorial/setup.html
conda activate base mamba create -c conda-forge -c bioconda -n snakemake snakemake
To activate env: conda activate snakemake)