Tutorials
To get started, please follow the Installation instructions to install STIM either through Conda or by building it from source. There are two examples based on the storage layout: one with a single slice and one with multiple slices. Therefore, we first explain the basics of our storage layout.
For the tutorials, please download the example Visium data by clicking here and navigate to the folder where the data is stored. We assume you installed STIM using Conda and have the appropriate Conda environment active. If you compiled STIM from source, the executables may not be in your `$PATH`. In this case, call them with the full path (e.g., `./st-explorer` if you installed them in the current directory).
Note: your browser might automatically unzip the data; we cover both cases during the resaving step in the tutorials below.
A spatial transcriptomics dataset can consist of a single 2-dimensional (2d) slice, or a container that contains several 2d slices and thereby forms a 3d volume. Note that for any 3d volume (container-dataset), each 2d slice can also be addressed as an individual dataset (slice-dataset). Most commands support both types of datasets, while some require a container (e.g. alignment).
Slice-datasets can either be saved in an AnnData-conforming layout, where the expression values, locations and annotations are stored in `/X`, `/obsm/spatial` and `/obs`, respectively; or in a generic hierarchical layout, where the arrays are stored in `/expressionValues`, `/locations` and `/annotations`, respectively. The N5 API is used to read and write these layouts using the N5, Zarr, or HDF5 backend. If your slice(s) are stored in `.csv` files, you can use the `st-resave` command (see below) to resave your data into one of the supported formats by specifying the extension of the output as `.h5` (generic HDF5), `.n5` (generic N5), or `.zarr` (generic Zarr); an additional suffix `ad` is used to indicate the AnnData-conforming layout (e.g. `.h5ad` for HDF5-backed AnnData).
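If you want to check which of the two layouts a given slice-dataset uses, you can inspect the file directly. The following is a minimal sketch (not part of STIM) that assumes an HDF5-backed file such as the `slice1.h5ad` created in the first tutorial below and the `h5py` Python package:

```python
# Minimal sketch: inspect an HDF5-backed slice-dataset and report its layout.
# Assumes h5py is installed and a file such as slice1.h5ad exists (see below).
import h5py

with h5py.File("slice1.h5ad", "r") as f:
    # print the full hierarchy with dataset shapes
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))

    if "X" in f:                        # AnnData-conforming layout
        print("expression: /X, locations: /obsm/spatial, annotations: /obs")
    elif "expressionValues" in f:       # generic hierarchical layout
        print("expression: /expressionValues, locations: /locations, annotations: /annotations")
```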
For a slice-dataset, you can:
- interactively view it using `st-explorer` (explore all genes & annotations) or `st-bdv-view` (view multiple genes in parallel);
- render the dataset in ImageJ/Fiji and save the rendering, e.g., as TIFF, using `st-render`;
- normalize the dataset using `st-normalize`;
- add annotations (e.g., cell types) using `st-add-annotations`;
- create a container-dataset from one or more slice-datasets (see below).
To align several slices, they have to be grouped into an N5 container so that additional annotations can be stored. In addition to all commands listed above for slice-datasets, the following commands can be used for container-datasets:
- create a container-dataset containing one or more existing slice-datasets using `st-add-slice`;
- add a slice-dataset to a pre-existing container-dataset using `st-add-slice`;
- perform pairwise alignment of slices using `st-align-pairs` (pre-processing);
- visualize aligned pairs of slices using `st-align-pairs-view` (optional user verification);
- perform global alignment of all slices using `st-align-global` (yielding the actual transformation for each slice-dataset);
- visualize globally aligned data in BigDataViewer using `st-bdv-view`.
- First, we need to convert the data we just downloaded as CSV into one of the supported formats for efficient storage of and access to the dataset. We want the first slice of the data to be saved in an AnnData file called `slice1.h5ad`. Assuming the data are in the downloaded `visium.zip` file in the same directory as the executables, execute the following:
st-resave -i visium.zip/section1_locations.csv,visium.zip/section1_reads.csv,slice1.h5ad
This will automatically load the `*.csv` files from within the zipped file and create a `slice1.h5ad` file in the current directory (alternatively, you could extract the `*.csv` files and link them). The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change `visium.zip` to the respective folder name, most likely `visium`.
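To quickly verify the resaved file from Python, you can open it with the `anndata` package. This is just an optional sanity check under the assumption that `anndata` is installed; STIM itself does not need it:

```python
# Optional sanity check (not required by STIM): open the resaved slice with anndata.
import anndata

adata = anndata.read_h5ad("slice1.h5ad")
print(adata)                       # summary: number of locations x number of genes
print(adata.obsm["spatial"][:5])   # first few sequenced locations
print(list(adata.var_names[:5]))   # first few gene names
```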
- Next, we will simply take a look at the slice-dataset directly:
st-explorer -i slice1.h5ad -c '0,110'
First, type `calm2` into the 'search gene' box. Using `-c '0,110'` we already set the display range to more or less match this dataset. You can change it manually by clicking in the BigDataViewer window and pressing `s` to bring up the brightness dialog. Feel free to play with the Visualization Options in the explorer, e.g. move Gauss Rendering to 0.5 to get a sharper image and then play with the Median Filter radius to filter the data.
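To give an intuition for what the Gauss Rendering option does, here is a rough, hypothetical illustration of the idea in Python. This is not STIM's actual implementation; it assumes `slice1.h5ad` from step 1, the `anndata`, `numpy` and `scipy` packages, and that the gene is stored under the name `Calm2`:

```python
# Rough illustration of Gaussian rendering (not STIM's implementation):
# rasterize the sparse per-location expression of one gene onto a pixel grid,
# then smooth it; a smaller sigma corresponds to a sharper rendering.
import anndata
import numpy as np
from scipy.ndimage import gaussian_filter

adata = anndata.read_h5ad("slice1.h5ad")
xy = np.asarray(adata.obsm["spatial"])
x = adata[:, "Calm2"].X                      # assumes this gene name exists in the file
expr = np.asarray(x.todense()).ravel() if hasattr(x, "todense") else np.asarray(x).ravel()

scale = 0.1                                   # shrink coordinates to a manageable grid
ij = ((xy - xy.min(axis=0)) * scale).astype(int)
img = np.zeros(tuple(ij.max(axis=0) + 1))
np.add.at(img, (ij[:, 0], ij[:, 1]), expr)    # accumulate expression per pixel

smoothed = gaussian_filter(img, sigma=1.0)
print(img.shape, smoothed.max())
```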
- Now, we will create a TIFF image for the genes Calm2 and Mbp:
st-render -i slice1.h5ad -g 'Calm2,Mbp' -sf 0.5
You can now, for example, overlay both images into a two-channel image using `Image > Color > Merge Channels` and select Calm2 as magenta and Mbp as green. You could then convert this image to RGB (`Image > Type > RGB Color`) and save it as TIFF, JPEG or AVI (e.g. with JPEG compression), which can be added to your presentation or paper; check out our beautiful AVI here (you need to click download at the top right). You could render a bigger image by setting `-s 0.1`. Note: please check the documentation of ImageJ and Fiji for help on how to further process images.
- Make sure you followed the previous tutorial, so that you have already resaved the first slice of the Visium dataset as the AnnData file `slice1.h5ad`.
- To perform the alignment of the whole dataset (this would work identically for more than two slices), we need to create a container-dataset that contains the already resaved slice-dataset:
st-add-slice -c visium.n5 -i slice1.h5ad
This will create an N5 container `visium.n5` and link the first slice to it. If you don't want the slice to be linked but moved instead, you can use the `-m` flag. Also, custom storage locations for the locations, expression values, and annotations arrays within the slice can be given with `-l`, `-e`, and `-a`, respectively.
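If you want to peek into the container from Python, the `zarr` package can read N5 containers via its `N5Store`. This is only a hedged convenience sketch, assuming zarr 2.x (which still ships `N5Store`); STIM does not require it, and the group names inside the container are whatever `st-add-slice` wrote:

```python
# Hedged sketch: list what is inside the N5 container created above.
# Requires zarr 2.x (which provides N5Store); we only list the hierarchy here.
import zarr

root = zarr.open_group(store=zarr.N5Store("visium.n5"), mode="r")
print(root.tree())   # prints the hierarchy of groups and arrays
```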
- Now we resave the second slice of the data as an N5 slice-dataset. Assuming the data are in the downloaded `visium.zip` file in the same directory as the executables:
st-resave \
-i visium.zip/section2_locations.csv,visium.zip/section2_reads.csv,slice2.n5 \
-c visium.n5
This will automatically load the `*.csv` files from within the zipped file and add the slice to the `visium.n5` container-dataset that already contains the first slice. The entire resaving process should take about 10 seconds on a modern notebook with an SSD. Note: if your browser automatically unzipped the data, just change `visium.zip` to the respective folder name, most likely `visium`.
- Next, we can again take a look at the data, which now includes both slice-datasets. We can do this interactively or by rendering using one of the following commands:
st-explorer -i visium.n5 -c '0,110'
st-bdv-view -i visium.n5 -c '0,110' -g 'Calm2,Mbp' -sf 0.5
st-render -i visium.n5 -g 'Calm2,Mbp' -sf 0.5
Selecting genes and adjusting visualization options work exactly as in the first tutorial.
We can now again overlay both images into a two-channel image using `Image > Color > Merge Channels` and select Calm2 as magenta and Mbp as green. By flipping through the slices (slice1 and slice2) you will realize that they are not aligned.
- To remedy this, we will perform alignment of the two slices. We will use 15 automatically selected genes (`-n`), a maximum error of 100 (`--maxEpsilon`, in units of the sequenced locations), and require at least 30 inliers per gene (`--minNumInliersGene`); this dataset is more robust than the SlideSeq one. The alignment process takes around 1-2 minutes on a modern notebook. Note: at this point no transformations are stored within the container-dataset, only the list of corresponding points between all pairs of slices.
st-align-pairs -c visium.n5 -n 15 -sf 0.5 --maxEpsilon 100 --minNumInliersGene 30
For your dataset, the optimal choice of parameters may vary. A good baseline for the `--maxEpsilon` parameter is ten times the average distance between the sequenced points. If the `--maxEpsilon` option is not given, this value is computed and used automatically. For the number of selected genes `-n`, higher values yield better results but make the alignment slower. Increasing the minimal number of inliers per gene (`--minNumInliersGene`) can also increase alignment quality, but can cause the alignment to fail.
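If you want to estimate that baseline yourself, the following sketch computes the average distance between neighbouring sequenced locations from `slice1.h5ad`. It assumes the `anndata` and `scipy` packages are installed; remember that STIM computes its own default when `--maxEpsilon` is omitted:

```python
# Estimate the average distance between neighbouring sequenced locations and
# derive a rough --maxEpsilon baseline (ten times that distance), as suggested above.
import anndata
import numpy as np
from scipy.spatial import cKDTree

adata = anndata.read_h5ad("slice1.h5ad")
xy = np.asarray(adata.obsm["spatial"])

dist, _ = cKDTree(xy).query(xy, k=2)   # k=2: the nearest neighbour besides the point itself
avg_spacing = dist[:, 1].mean()
print("average spacing:", avg_spacing)
print("suggested --maxEpsilon baseline:", 10 * avg_spacing)
```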
- Now we will visualize this pair of slices before and after alignment. To this end, we create two independent images, one using `st-render` (see above) and one using `st-align-pairs-view` on the automatically selected gene mt-Nd4. `st-render` will display the slices unaligned, while `st-align-pairs-view` will show them aligned.
st-render -i visium.n5 -sf 0.5 -g mt-Nd4
st-align-pairs-view -c visium.n5 -sf 0.5 -g mt-Nd4
Note: to create the GIF shown here, we saved both images independently, opened them in Fiji, cropped and combined them, converted them to 8-bit color, set the frame rate to 1 fps, and saved the result as a single GIF.
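If you prefer to script such an animation instead of assembling it in Fiji, a small sketch using the `imageio` package could look like this (`before.tif` and `after.tif` are hypothetical file names for the two exported renderings):

```python
# Hedged sketch: combine two exported renderings (same size) into a slow GIF.
# "before.tif" and "after.tif" are hypothetical file names; adjust to your exports.
import imageio.v2 as imageio

frames = [imageio.imread("before.tif"), imageio.imread("after.tif")]
# duration is per frame; depending on your imageio version this is seconds or milliseconds
imageio.mimsave("alignment.gif", frames, duration=1.0)
```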
- Finally, we perform the global alignment. In this particular case, it is identical to the pairwise alignment process as we only have two slices. However, we still need to do it so the final transformations for the slices are stored in the slice-datasets. After that, `st-explorer`, `st-bdv-view` and `st-render` will take these transformations into account when displaying the data. This final processing step usually only takes a few seconds.
st-align-global -c visium.n5 --absoluteThreshold 100 -sf 0.5 --lambda 0.0 --skipICP
- The final dataset can, for example, be visualized and interactively explored using BigDataViewer. To do so, we specify three genes (`-g Calm2,Mbp,mt-Nd4`), a crisper rendering (`-sf 0.5`), and a relative z-spacing between the two planes that shows them close to each other (`-z 2`). Of course, the same data can be visualized using `st-explorer` and `st-render`, and visualization options such as color or contrast per gene can be adjusted manually.
st-bdv-view -i visium.n5 -g Calm2,Mbp,mt-Nd4 -c '0,150' -sf 0.5 -z 2
We encourage you to use this small two-slice dataset as a starting point for playing with and extending STIM. If you have any questions, feature requests or concerns, please open an issue here on GitHub. Thanks so much!