Tutorials
To get started, please follow the Installation instructions to install STIM either through Conda or by building it from source. There are two examples based on the storage layout: one with a single slice and one with multiple slices. Therefore, we first explain the basics of our storage layout.
For the tutorials, please download the example Visium data by clicking here and navigate to the folder where the data is stored. We assume you installed STIM using Conda and have the appropriate Conda environment active. If you compiled STIM from source, the executables may not be in your `$PATH`. In this case, call them with the full path (e.g., `./st-explorer` if you installed them in the current directory).
Note: your browser might automatically unzip the data; we cover both cases during the resaving step in the tutorials below.
A spatial transcriptomics dataset can consist of a single 2-dimensional (2d) slice, or a container that contains several 2d slices and thereby forms a 3d volume. Note that for any 3d volume (container-dataset), each 2d slice can also be addressed as an individual dataset (slice-dataset). Most commands support both types of datasets, while some require a container (e.g. alignment).
Slice-datasets can either be saved in an AnnData-conforming layout, where the expression values, locations and annotations are stored in `/X`, `/obsm/spatial` and `/obs`, respectively; or in a generic hierarchical layout, where the arrays are stored in `/expressionValues`, `/locations` and `/annotations`, respectively. The N5 API is used to read and write these layouts using the N5, Zarr, or HDF5 backend. If your slice(s) are stored in `.csv` files, you can use the `st-resave` command (see below) to resave your data into one of the supported formats by specifying the extension of the output as `.h5` (generic HDF5), `.n5` (generic N5), or `.zarr` (generic Zarr); an additional suffix `ad` is used to indicate the AnnData-conforming layout (e.g. `.h5ad` for HDF5-backed AnnData).
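If you want to check which of the two layouts a given slice-dataset uses, you can inspect the file directly. The following is a minimal sketch (not part of STIM) that assumes an HDF5-backed file such as the `slice1.h5ad` created in the first tutorial below and the `h5py` Python package:

```python
# Minimal sketch: inspect an HDF5-backed slice-dataset and report its layout.
# Assumes h5py is installed and a file such as slice1.h5ad exists (see below).
import h5py

with h5py.File("slice1.h5ad", "r") as f:
    # print the full hierarchy with dataset shapes
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))

    if "X" in f:                        # AnnData-conforming layout
        print("expression: /X, locations: /obsm/spatial, annotations: /obs")
    elif "expressionValues" in f:       # generic hierarchical layout
        print("expression: /expressionValues, locations: /locations, annotations: /annotations")
```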
For a slice-dataset, you can:
- interactively view it using `st-explorer` (explore all genes & annotations) or `st-bdv-view` (view multiple genes in parallel);
- render the dataset in ImageJ/Fiji and save the rendering, e.g., as TIFF, using `st-render`;
- normalize the dataset using `st-normalize`;
- add annotations (e.g., cell types) using `st-add-annotations`;
- create a container-dataset from one or more slice-datasets (see below).
To align several slices, they have to be grouped into an N5 container so that additional annotations can be stored. In addition to all commands listed above for slice-datasets, the following commands can be used for container-datasets:
- create a container-dataset containing one or more existing slice-datasets using `st-add-slice`;
- add a slice-dataset to a pre-existing container-dataset using `st-add-slice`;
- perform pairwise alignment of slices using `st-align-pairs` (pre-processing);
- visualize aligned pairs of slices using `st-align-pairs-view` (optional user verification);
- perform global alignment of all slices using `st-align-global` (yielding the actual transformation for each slice-dataset);
- visualize globally aligned data in BigDataViewer using `st-bdv-view`.
- First, we need to convert the data we just downloaded as CSV into one of the supported formats for efficient storage of and access to the dataset. We want the first slice of the data to be saved in an AnnData file called `slice1.h5ad`. Assuming the data are in the downloaded `visium.zip` file in the same directory as the executables, execute the following:
st-resave -i visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad
This will automatically load the `*.csv` files from within the zipped file and create a `slice1.h5ad` file in the current directory (alternatively, you could extract the `*.csv` files and link them). The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change `visium.zip` to the respective folder name, most likely `visium`.
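To quickly verify the resaved file from Python, you can open it with the `anndata` package. This is just an optional sanity check under the assumption that `anndata` is installed; STIM itself does not need it:

```python
# Optional sanity check (not required by STIM): open the resaved slice with anndata.
import anndata

adata = anndata.read_h5ad("slice1.h5ad")
print(adata)                       # summary: number of locations x number of genes
print(adata.obsm["spatial"][:5])   # first few sequenced locations
print(list(adata.var_names[:5]))   # first few gene names
```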
- Next, we will simply take a look at the slice-dataset directly:
st-explorer -i slice1.h5ad -c '0,110'
First, type `calm2` into the 'search gene' box. Using `-c '0,110'` we already set the display range to more or less match this dataset. You can change it manually by clicking in the BigDataViewer window and pressing `s` to bring up the brightness dialog. Feel free to play with the Visualization Options in the explorer, e.g. move Gauss Rendering to 0.5 to get a sharper image and then play with the Median Filter radius to filter the data.
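To give an intuition for what the Gauss Rendering option does, here is a rough, hypothetical illustration of the idea in Python. This is not STIM's actual implementation; it assumes `slice1.h5ad` from step 1, the `anndata`, `numpy` and `scipy` packages, and that the gene is stored under the name `Calm2`:

```python
# Rough illustration of Gaussian rendering (not STIM's implementation):
# rasterize the sparse per-location expression of one gene onto a pixel grid,
# then smooth it; a smaller sigma corresponds to a sharper rendering.
import anndata
import numpy as np
from scipy.ndimage import gaussian_filter

adata = anndata.read_h5ad("slice1.h5ad")
xy = np.asarray(adata.obsm["spatial"])
x = adata[:, "Calm2"].X                      # assumes this gene name exists in the file
expr = np.asarray(x.todense()).ravel() if hasattr(x, "todense") else np.asarray(x).ravel()

scale = 0.1                                   # shrink coordinates to a manageable grid
ij = ((xy - xy.min(axis=0)) * scale).astype(int)
img = np.zeros(tuple(ij.max(axis=0) + 1))
np.add.at(img, (ij[:, 0], ij[:, 1]), expr)    # accumulate expression per pixel

smoothed = gaussian_filter(img, sigma=1.0)
print(img.shape, smoothed.max())
```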
- Now, we will create a TIFF image for the genes Calm2 and Mbp:
st-render -i slice1.h5ad -g 'Calm2,Mbp' -sf 0.5
You can now, for example, overlay both images into a two-channel image using `Image > Color > Merge Channels` and select Calm2 as magenta and Mbp as green. You could then convert this image to RGB (`Image > Type > RGB Color`) and save it as TIFF, JPEG or AVI (e.g. with JPEG compression), which can be added to your presentation or paper; check out our beautiful AVI here (you need to click download at the top right). You could render a bigger image by setting `-s 0.1`. Note: please check the documentation of ImageJ and Fiji for help on how to further process images.
- Make sure you followed the previous tutorial, so that you have already resaved the first slice of the Visium dataset as the AnnData file `slice1.h5ad`.
- To perform the alignment of the whole dataset (this would work identically for more than two slices), we need to create a container-dataset that contains the already resaved slice-dataset:
st-add-slice -c visium.n5 -i slice1.h5ad
This will create an N5 container `visium.n5` and link the first slice to it. If you don't want the slice to be linked but moved instead, you can use the `-m` flag. Also, custom storage locations for the locations, expression values, and annotations arrays within the slice can be given with `-l`, `-e`, and `-a`, respectively.
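If you want to peek into the container from Python, the `zarr` package can read N5 containers via its `N5Store`. This is only a hedged convenience sketch, assuming zarr 2.x (which still ships `N5Store`); STIM does not require it, and the group names inside the container are whatever `st-add-slice` wrote:

```python
# Hedged sketch: list what is inside the N5 container created above.
# Requires zarr 2.x (which provides N5Store); we only list the hierarchy here.
import zarr

root = zarr.open_group(store=zarr.N5Store("visium.n5"), mode="r")
print(root.tree())   # prints the hierarchy of groups and arrays
```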
- Now we resave the second slice of the data as an N5 slice-dataset. Assuming the data are in the downloaded `visium.zip` file in the same directory as the executables:
st-resave \
-i visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5 \
-c visium.n5
This will automatically load the `*.csv` files from within the zipped file and add the slice to the `visium.n5` container-dataset that already contains the first slice. The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change `visium.zip` to the respective folder name, most likely `visium`.
- Next, we can again take a look at the data, which now includes both slice-datasets. We can do this interactively or by rendering using one of the following commands:
st-explorer -i visium.n5 -c '0,110'
st-bdv-view -i visium.n5 -c '0,110' -g 'Calm2,Mbp' -sf 0.5
st-render -i visium.n5 -g 'Calm2,Mbp' -sf 0.5
Selecting genes and adjusting visualization options work exactly as in the first tutorial.
We can now again overlay both images into a two-channel image using `Image > Color > Merge Channels` and select Calm2 as magenta and Mbp as green. By flipping through the slices (slice1 and slice2) you will realize that they are not aligned.
- To remedy this, we will perform alignment of the two slices. We will use 15 automatically selected genes (`-n`), a maximum error of 100 (`--maxEpsilon`, in units of the sequenced locations), and require at least 30 inliers per gene (`--minNumInliersGene`); this dataset is more robust than the SlideSeq one. The alignment process takes around 1-2 minutes on a modern notebook. Note: at this point no transformations are stored within the container-dataset, only the list of corresponding points between all pairs of slices.
st-align-pairs -c visium.n5 -n 15 -sf 0.5 --maxEpsilon 100 --minNumInliersGene 30
For your dataset, the optimal choice of parameters may vary. A good baseline for the `--maxEpsilon` parameter is ten times the average distance between the sequenced points. If the `--maxEpsilon` option is not given, this value is computed and used automatically. For the number of selected genes `-n`, higher values yield better results but make the alignment slower. Increasing the minimal number of inliers per gene (`--minNumInliersGene`) can also increase alignment quality, but can cause the alignment to fail.
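If you want to estimate that baseline yourself, the following sketch computes the average distance between neighbouring sequenced locations from `slice1.h5ad`. It assumes the `anndata` and `scipy` packages are installed; remember that STIM computes its own default when `--maxEpsilon` is omitted:

```python
# Estimate the average distance between neighbouring sequenced locations and
# derive a rough --maxEpsilon baseline (ten times that distance), as suggested above.
import anndata
import numpy as np
from scipy.spatial import cKDTree

adata = anndata.read_h5ad("slice1.h5ad")
xy = np.asarray(adata.obsm["spatial"])

dist, _ = cKDTree(xy).query(xy, k=2)   # k=2: the nearest neighbour besides the point itself
avg_spacing = dist[:, 1].mean()
print("average spacing:", avg_spacing)
print("suggested --maxEpsilon baseline:", 10 * avg_spacing)
```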
- Now we will visualize this pair of slices before and after alignment. To this end, we create two independent images, one using `st-render` (see above) and one using `st-align-pairs-view` on the automatically selected gene mt-Nd4. `st-render` will display the slices unaligned, while `st-align-pairs-view` will show them aligned.
st-render -i visium.n5 -sf 0.5 -g mt-Nd4
st-align-pairs-view -c visium.n5 -sf 0.5 -g mt-Nd4
Note: to create the GIF shown here, we saved both images independently, opened them in Fiji, cropped and combined them, converted them to 8-bit color, set the frame rate to 1 fps, and saved the result as a single GIF.
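If you prefer to script such an animation instead of assembling it in Fiji, a small sketch using the `imageio` package could look like this (`before.tif` and `after.tif` are hypothetical file names for the two exported renderings):

```python
# Hedged sketch: combine two exported renderings (same size) into a slow GIF.
# "before.tif" and "after.tif" are hypothetical file names; adjust to your exports.
import imageio.v2 as imageio

frames = [imageio.imread("before.tif"), imageio.imread("after.tif")]
# duration is per frame; depending on your imageio version this is seconds or milliseconds
imageio.mimsave("alignment.gif", frames, duration=1.0)
```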
- Finally, we perform the global alignment. In this particular case, it is identical to the pairwise alignment process as we only have two slices. However, we still need to do it so the final transformations for the slices are stored in the slice-datasets. After that, `st-explorer`, `st-bdv-view` and `st-render` will take these transformations into account when displaying the data. This final processing step usually only takes a few seconds.
st-align-global -c visium.n5 --absoluteThreshold 100 -sf 0.5 --lambda 0.0 --skipICP
- The final dataset can, for example, be visualized and interactively explored using BigDataViewer. To do so, we specify three genes (`-g Calm2,Mbp,mt-Nd4`), a crisper rendering (`-sf 0.5`), and a relative z-spacing between the two planes that shows them close to each other (`-z 2`). Of course, the same data can be visualized using `st-explorer` and `st-render`, and visualization options such as color or contrast per gene can be adjusted manually.
st-bdv-view -i visium.n5 -g Calm2,Mbp,mt-Nd4 -c '0,150' -sf 0.5 -z 2
We encourage you to use this small two-slice dataset as a starting point for playing with and extending STIM. If you have any questions, feature requests or concerns, please open an issue here on GitHub. Thanks so much!