diff --git a/docs/cryoet_data_portal_docsite_data.md b/docs/cryoet_data_portal_docsite_data.md
index f0505199b..c48812488 100644
--- a/docs/cryoet_data_portal_docsite_data.md
+++ b/docs/cryoet_data_portal_docsite_data.md
@@ -25,7 +25,42 @@ Datasets are contributed sets of image files associated with imaging one sample
The Browse Datasets page shows a table of all datasets on the Portal. These datasets are not currently ordered. Instead, the left side filter panel provides options for filtering the table according to files included in the datasets, such as ground truth annotation files; the author or ID of the dataset; organism in the sample; hardware; metadata for the tilt series or reconstructed tomograms. In addition, the search bar filters based on keywords or phrases contained in the dataset titles. The dataset entries in the table have descriptive names, such as "S. pombe cryo-FIB lamellae acquired with defocus-only," which aim to summarize the experiment as well as a Dataset ID assigned by the Portal, the organism name, number of runs in the dataset, and list of annotated objects, such as membrane. Datasets on the Portal may be found in other image databases. On the Browse Datasets page, the datasets table shows the EMPIAR ID for datasets that are also found on the Electron Microscopy Public Image Archive.
-On a given Dataset Overview page, the View All Info panel contains metadata for the dataset.
+On a given Dataset Overview page, the View All Info panel contains metadata for the dataset. These metadata are defined in the tables below, including their mappings to attributes in the Portal API:
+
+**Dataset Metadata**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|---------------------|---------------------------------------|---------------------------------------------------------------------|
+| Deposition Date | Dataset.deposition_date | Date when a dataset is initially received by the Data Portal. |
+| Grant ID | DatasetFunding.grant_id | Grant identifier provided by the funding agency. |
+| Funding Agency | DatasetFunding.funding_agency_name | Name of the funding agency. |
+| Related Databases | Dataset.related_database_entries | Identifiers for this dataset in other databases, e.g. EMPIAR, that also contain it. |
+
+**Sample and Experiment Conditions**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|---------------------|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
+| Sample Type | Dataset.sample_type | Type of sample: cell, tissue, organism, intact organelle, in-vitro mixture, in-silico synthetic data, other. |
+| Organism Name | Dataset.organism_name | Name of the organism from which the biological sample is derived, e.g. Homo sapiens. |
+| Tissue Name | Dataset.tissue_name | Name of the tissue from which a biological sample used in a CryoET study is derived. |
+| Cell Name | Dataset.cell_name | Name of the cell from which a biological sample used in a CryoET study is derived, e.g. sperm. |
+| Cell Line or Strain Name | Dataset.cell_strain_name | Cell line or strain for the sample, e.g. C57BL. |
+| Cellular Component | Dataset.cell_component_name | Name of the cellular component, e.g. sperm flagellum. |
+| Sample Preparation | Dataset.sample_preparation | Description of how the sample was prepared. |
+| Grid Preparation | Dataset.grid_preparation | Description of how the CryoET grid was prepared. |
+| Other Setup | Dataset.other_setup | Description of other setup not covered by sample preparation or grid preparation that may make this dataset unique in the same publication. |
+
+**Tilt Series**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|-------------------------------------------|-------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| Acceleration Voltage | TiltSeries.acceleration_voltage | Electron microscope accelerating voltage, in volts. |
+| Spherical Aberration Constant | TiltSeries.spherical_aberration_constant | Spherical aberration constant of the objective lens, in millimeters. |
+| Microscope Manufacturer | TiltSeries.microscope_manufacturer | Name of the microscope manufacturer. |
+| Microscope Model | TiltSeries.microscope_model | Microscope model name. |
+| Energy Filter | TiltSeries.microscope_energy_filter | Energy filter setup used. |
+| Phase Plate | TiltSeries.microscope_phase_plate | Phase plate configuration. |
+| Image Corrector | TiltSeries.microscope_image_corrector | Image corrector setup. |
+| Additional microscope optical setup | TiltSeries.microscope_additional_info | Other microscope optical setup information, in addition to energy filter, phase plate and image corrector. |
+| Camera Manufacturer | TiltSeries.camera_manufacturer | Name of the camera manufacturer. |
+| Camera Model | TiltSeries.camera_model | Camera model name. |
### Dataset Overview Page
@@ -47,11 +82,51 @@ The tilt series quality score is assigned by the dataset authors to communicate
The `Download Dataset` button opens a dialog with instructions for downloading the dataset using [Amazon Web Services Command Line Interface](./cryoet_data_portal_docsite_aws.md) or the [Portal API](./python-api.rst). Datasets are downloaded as folders named the Dataset ID. The folder contains subfolders for each run named the author-chosen run name, a folder named Images which contains the key photos of the dataset displayed on the Portal, and a JSON file named `dataset_metadata.json` containing the dataset metadata. The run folders contain subfolders named Tomogram and TiltSeries, containing the tomogram and tilt series image files, and a JSON file named `run_metadata.json` containing the run metadata. More details on the run folder file structure is found in the documentation [below](#run-download-options).
+The metadata schema of any JSON file stored with the data on the data portal's S3 bucket is described in LinkML and can be found [here](https://github.com/chanzuckerberg/cryoet-data-portal-backend/tree/main/schema/v1.1.0).
+
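As a sketch of what these metadata files contain, the snippet below parses a miniature `dataset_metadata.json`-style document with Python's standard `json` module. The field names used here are illustrative stand-ins; the authoritative names and types are defined by the LinkML schema linked above.

```python
import json

# Illustrative stand-in for a dataset_metadata.json file; the authoritative
# field names and types are defined by the Portal's LinkML schema.
metadata_text = """
{
  "dataset_title": "S. pombe cells with defocus",
  "deposition_date": "2023-04-01",
  "organism": {"name": "Schizosaccharomyces pombe"}
}
"""

metadata = json.loads(metadata_text)
print(metadata["dataset_title"])     # S. pombe cells with defocus
print(metadata["organism"]["name"])  # Schizosaccharomyces pombe
```
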
## Runs
A tomography run is a collection of all data and annotations related to one physical location in a sample and is associated with a dataset that typically contains many other runs. On the Data Portal pages, runs are directly linked to their tomograms. However, in the [data schema](https://chanzuckerberg.github.io/cryoet-data-portal/python-api.html#data-model) used in the Portal API, runs are connected to tomograms through the `TomogramVoxelSpacing` class which specifies the sampling or voxel size of the tomogram. For a single run, multiple tomograms of different spacings can be available.
-An overview of all runs in a dataset is presented in the Dataset Overview page. Each run has its own Run Overview Page, where the View All Info panel contains metadata for the run.
+An overview of all runs in a dataset is presented on the Dataset Overview page. Each run has its own Run Overview Page, where the View All Info panel contains metadata for the run. These metadata are defined in the tables below, including their mappings to attributes in the Portal API:
+
+**Tilt Series**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|-------------------------------------------|-------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| Microscope Manufacturer | TiltSeries.microscope_manufacturer | Name of the microscope manufacturer. |
+| Microscope Model | TiltSeries.microscope_model | Microscope model name. |
+| Phase Plate | TiltSeries.microscope_phase_plate | Phase plate configuration. |
+| Image Corrector | TiltSeries.microscope_image_corrector | Image corrector setup. |
+| Additional microscope optical setup | TiltSeries.microscope_additional_info | Other microscope optical setup information, in addition to energy filter, phase plate and image corrector. |
+| Acceleration Voltage | TiltSeries.acceleration_voltage | Electron microscope accelerating voltage, in volts. |
+| Spherical Aberration Constant | TiltSeries.spherical_aberration_constant | Spherical aberration constant of the objective lens, in millimeters. |
+| Camera Manufacturer | TiltSeries.camera_manufacturer | Name of the camera manufacturer. |
+| Camera Model | TiltSeries.camera_model | Camera model name. |
+| Energy Filter | TiltSeries.microscope_energy_filter | Energy filter setup used. |
+| Data Acquisition Software | TiltSeries.data_acquisition_software | Software used to collect data. |
+| Pixel Spacing | TiltSeries.pixel_spacing | Pixel spacing for the tilt series. |
+| Tilt Axis | TiltSeries.tilt_axis | Rotation angle of the tilt axis, in degrees. |
+| Tilt Range | TiltSeries.tilt_range | Total tilt range in degrees. |
+| Tilt Step | TiltSeries.tilt_step | Tilt step in degrees. |
+| Tilting Scheme | TiltSeries.tilting_scheme | The order of stage tilting during acquisition of the data. |
+| Total Flux | TiltSeries.total_flux | Number of electrons reaching the specimen in a square Angstrom area for the entire tilt series. |
+| Binning from Frames | TiltSeries.binning_from_frames | Describes the binning factor from frames to tilt series file. |
+| Series is Aligned | No API field | True or false, indicating whether the tilt series images have been transformed to account for the tomographic alignment. |
+| Related EMPIAR Entry | TiltSeries.related_empiar_entry | EMPIAR dataset identifier, if a tilt series is deposited into EMPIAR. |
+
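As a quick illustration of how the tilt range and tilt step relate, the number of images in a tilt series can be estimated by dividing the total range by the step. The helper function below is ours, not part of the Portal API.

```python
def estimated_image_count(tilt_range_degrees: float, tilt_step_degrees: float) -> int:
    """Estimate the number of images in a tilt series from its total tilt
    range and tilt step, both in degrees (illustration, not an API call)."""
    return int(tilt_range_degrees / tilt_step_degrees) + 1

# A series covering -60 to +60 degrees (120 degrees total) at 3-degree steps:
print(estimated_image_count(120, 3))  # 41
```
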
+**Tomogram**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|-----------------------------------------|-----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
+| Reconstruction Software | Tomogram.reconstruction_software | Name of software used for reconstruction. |
+| Reconstruction Method | Tomogram.reconstruction_method | Reconstruction method, e.g. Weighted back-projection, SART, SIRT. |
+| Processing Software | Tomogram.processing_software | Processing software used to derive the tomogram. |
+| Available Processing | Tomogram.processing | Description of additional processing used to derive the tomogram, e.g. denoised. |
+| Smallest Available Voxel Spacing | `min_vs = min([vs.voxel_spacing for vs in Run.tomogram_voxel_spacings])` | Smallest voxel spacing of the available tomograms. |
+| Size (x, y, z) | `(Tomogram.size_x, Tomogram.size_y, Tomogram.size_z)` or `Tomogram.scale0_dimensions` | Comma separated x,y,z dimensions of the unscaled tomogram. |
+| Fiducial Alignment Status | Tomogram.fiducial_alignment_status | Fiducial Alignment status: True = aligned with fiducial, False = aligned without fiducial. |
+| Ctf Corrected | Tomogram.ctf_corrected | Whether this tomogram is contrast transfer function corrected. |
+| Affine Transformation Matrix | Tomogram.affine_transformation_matrix | The flip or rotation transformation. |
+
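The Smallest Available Voxel Spacing expression above can be illustrated with stand-in objects; in real use, the `TomogramVoxelSpacing` objects come from `Run.tomogram_voxel_spacings` in the Portal API.

```python
from types import SimpleNamespace

# Stand-ins for TomogramVoxelSpacing objects attached to a run; real objects
# are obtained from Run.tomogram_voxel_spacings via the Portal API.
tomogram_voxel_spacings = [
    SimpleNamespace(voxel_spacing=13.48),
    SimpleNamespace(voxel_spacing=6.74),
]

min_vs = min(vs.voxel_spacing for vs in tomogram_voxel_spacings)
print(min_vs)  # 6.74
```
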
### Run Overview Page
@@ -84,7 +159,38 @@ Annotations also have an optional precision field, which is the percentage of tr
Authors may also utilize the Ground Truth flag on entries in the annotation table. The Ground Truth flag indicates that this annotation is endorsed by the author for use as training or validation data for machine learning models.
-Each annotation has its own metadata, which can be viewed using the info icon on the entry in the annotations table.
+Each annotation has its own metadata, which can be viewed using the info icon on the entry in the annotations table. These metadata are defined in the tables below, including their mappings to attributes in the Portal API:
+
+**Annotation Overview**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|---------------------------|-------------------------------------|------------------------------------------------------------------------------------------------|
+| Annotation ID | Annotation.id | Numeric identifier assigned by the Portal. |
+| Annotation Authors | Annotation.authors | Authors of this annotation. |
+| Publication | Annotation.annotation_publication | DOIs for publications that describe the dataset. |
+| Deposition Date | Annotation.deposition_date | Date when an annotation set is initially received by the Data Portal. |
+| Release Date | Annotation.release_date | Date when annotation data is made public by the Data Portal. |
+| Last Modified Date | Annotation.last_modified_date | Date when an annotation was last modified in the Data Portal. |
+| Method Type | Annotation.method_type | The method type for generating the annotation (e.g., manual, hybrid, automated). |
+| Annotation Method | Annotation.annotation_method | Describes how the annotation is made, e.g., Manual, crYoLO, Positive Unlabeled Learning, template matching. |
+| Annotation Software | Annotation.annotation_software | Software used for generating this annotation. |
+
+**Annotation Object**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|---------------------------|------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| Object Name | Annotation.object_name | Name of the object being annotated, e.g., ribosome, nuclear pore complex, actin filament, membrane. |
+| GO ID | Annotation.object_id | Gene Ontology Cellular Component identifier for the annotation object. |
+| Object Count | Annotation.object_count | Number of objects identified. |
+| Object Shape Type | AnnotationFile.shape_type | Description of whether this is a Point, OrientedPoint, or SegmentationMask file. |
+| Object State | Annotation.object_state | Additional information about the annotated object not captured by the gene ontology (e.g., open or closed state for molecules). |
+| Object Description | Annotation.object_description | Description of the annotated object, including additional information not covered by the Annotation object name and state. |
+
+**Annotation Confidence**
+| **Portal Metadata** | **API Expression** | **Definition** |
+|---------------------------|--------------------------------------|----------------------------------------------------------------------------------------------------|
+| Ground Truth Status | Annotation.ground_truth_status | Whether an annotation is considered ground truth, as determined by the annotation author. |
+| Ground Truth Used | Annotation.ground_truth_used | Annotation filename used as ground truth for precision and recall. |
+| Precision | Annotation.confidence_precision | Percentage of annotated objects that are true positives. |
+| Recall | Annotation.confidence_recall | Percentage of true objects that were correctly annotated. |
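
As a worked example of the confidence metrics, here is a sketch that computes precision and recall from raw counts. The helper is ours, not a Portal API call.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Precision: percentage of annotated objects that are true positives.
    Recall: percentage of true objects that were annotated.
    (Illustrative helper, not part of the Portal API.)"""
    precision = 100 * true_positives / (true_positives + false_positives)
    recall = 100 * true_positives / (true_positives + false_negatives)
    return precision, recall

# 90 correct picks, 10 spurious picks, 30 missed objects:
print(precision_recall(90, 10, 30))  # (90.0, 75.0)
```
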
### Visualizing Annotations with Tomograms in Neuroglancer
diff --git a/docs/cryoet_data_portal_docsite_landing.md b/docs/cryoet_data_portal_docsite_landing.md
index c481f5c30..dbf99bc2a 100644
--- a/docs/cryoet_data_portal_docsite_landing.md
+++ b/docs/cryoet_data_portal_docsite_landing.md
@@ -13,7 +13,7 @@ We welcome feedback from the community on the data structure, design and functio
- [Installation](https://chanzuckerberg.github.io/cryoet-data-portal/cryoet_data_portal_docsite_quick_start.html)
- [Python Client API Reference](https://chanzuckerberg.github.io/cryoet-data-portal/python-api.html)
-- [Example Code Snippets for Common Tasks](https://chanzuckerberg.github.io/cryoet-data-portal/cryoet_data_portal_docsite_examples.html)
+- [Tutorials](./tutorials.md)
- [napari Plugin Documentation](https://chanzuckerberg.github.io/cryoet-data-portal/cryoet_data_portal_docsite_napari.html)
## Amazon Web Services S3 Bucket Info
diff --git a/docs/cryoet_data_portal_docsite_quick_start.md b/docs/cryoet_data_portal_docsite_quick_start.md
index 9116dbd1c..fe9003b4f 100644
--- a/docs/cryoet_data_portal_docsite_quick_start.md
+++ b/docs/cryoet_data_portal_docsite_quick_start.md
@@ -1,11 +1,12 @@
# Quick start
-This page provides details to start using the CryoET Data Portal.
+This page provides details to help you get started using the CryoET Data Portal Client API.
**Contents**
-1. [Installation](#installation).
-2. [Python quick start](#python-quick-start).
+1. [Installation](#installation)
+2. [API Methods Overview](#api-methods-overview)
+3. [Example Code Snippets](#examples)
## Installation
@@ -18,7 +19,7 @@ The CryoET Data Portal Client requires a Linux or MacOS system with:
- Recommended: >5 Mbps internet connection.
- Recommended: for increased performance, use the API through an AWS-EC2 instance from the region `us-west-2`. The CryoET Portal data are hosted in a AWS-S3 bucket in that region.
-### Python
+### Install in a Virtual Environment
(Optional) In your working directory, make and activate a virtual environment or conda environment. For example:
@@ -33,13 +34,56 @@ Install the latest `cryoet_data_portal` package via pip:
pip install -U cryoet-data-portal
```
-## Python quick start
+## API Methods Overview
-Below are 3 examples of common operations you can do with the client. Check out the [examples page](https://chanzuckerberg.github.io/cryoet-data-portal/cryoet_data_portal_docsite_examples.html) for more code snippets.
+The Portal API has methods for searching and downloading data. **Every class** has a `find` and `get_by_id` method for selecting data, and most classes have `download...` methods for downloading the data. Below is a table of the API classes' download methods.
-### Browse data in the portal
+| **Class** | **Download Methods** |
+|-------------------------|--------------------------------------------------------------------------------------------------------|
+| [Dataset](./python-api.rst#dataset)| `download_everything` |
+| [DatasetAuthor](./python-api.rst#datasetauthor)| Not applicable as this class doesn't contain data files|
+| [DatasetFunding](./python-api.rst#datasetfunding)| Not applicable as this class doesn't contain data files|
+| [Run](./python-api.rst#run)| `download_everything` |
+| [TomogramVoxelSpacing](./python-api.rst#tomogramvoxelspacing)| `download_everything` |
+| [Tomogram](./python-api.rst#tomogram)| `download_all_annotations`, `download_mrcfile`, `download_omezarr` |
+| [TomogramAuthor](./python-api.rst#tomogramauthor)| Not applicable as this class doesn't contain data files |
+| [Annotation](./python-api.rst#annotation)| `download` |
+| [AnnotationFile](./python-api.rst#annotationfile)| None, use the Annotation or Tomogram class to download annotations |
+| [AnnotationAuthor](./python-api.rst#annotationauthor)| Not applicable as this class doesn't contain data files |
+| [TiltSeries](./python-api.rst#tiltseries)| `download_alignment_file`, `download_angle_list`, `download_collection_metadata`, `download_mrcfile`, `download_omezarr` |
-The following iterates over all datasets in the portal, then all runs per dataset, then all tomograms per run
+The `find` method selects data based on user-chosen queries. These queries can combine the Python comparison operators `==`, `!=`, `>`, `>=`, `<`, `<=`; the method operators `like`, `ilike`, and `_in`; and string or numeric values. The method operators are defined in the table below:
+
+| **Method Operator** | **Definition** |
+|---------------------|----------------------------------------------------------------------------------------------|
+| like | partial match, with the `%` character being a wildcard |
+| ilike | case-insensitive partial match, with the `%` character being a wildcard |
+| _in | accepts a list of values that are acceptable matches |
+
+The general format of using the `find` method is as follows, where `ClassName` stands for one of the API classes above and `queries` is a list of query expressions:
+
+```python
+data_of_interest = ClassName.find(client, queries)
+```
+
+The `get_by_id` method allows you to select data using the ID found on the Portal. For example, to select the data for [Dataset 10005](https://cryoetdataportal.czscience.com/datasets/10005) on the Portal and download it into your current directory use this snippet:
+
+```python
+from cryoet_data_portal import Client, Dataset
+
+client = Client()
+data_10005 = Dataset.get_by_id(client, 10005)
+data_10005.download_everything()
+```
+
+## Examples
+
+Below are 3 examples of common operations you can do with the API. Check out the [examples page](./cryoet_data_portal_docsite_examples.md) for more code snippets or the [tutorials page](./tutorials.md) for longer examples.
+
+### Browse all data in the portal
+
+To illustrate the relationships among the classes in the Portal, the loop below iterates over all datasets in the Portal, then all runs per dataset, then all tomograms per run, and outputs the name of each object.
+
+:::{attention}
+This loop is impractical! It iterates over all data in the Portal. It is simply for demonstrative purposes and should not be included in efficient code.
+:::
```python
from cryoet_data_portal import Client, Dataset
@@ -59,7 +103,7 @@ for dataset in Dataset.find(client):
```
-And outputs the name of each object:
+The output would look something like:
```
Dataset: S. pombe cells with defocus
@@ -69,6 +113,22 @@ Dataset: S. pombe cells with defocus
...
```
+### Find all datasets containing membrane annotations
+
+The example below uses the `find` method with a longer API expression in the query to select datasets that have membrane annotations and print the IDs of those datasets.
+
+```python
+import cryoet_data_portal as portal
+
+# Instantiate a client, using the data portal GraphQL API by default
+client = portal.Client()
+
+# Use the find method to select datasets that contain membrane annotations
+datasets = portal.Dataset.find(client, [portal.Dataset.runs.tomogram_voxel_spacings.annotations.object_name.ilike("%membrane%")])
+for d in datasets:
+ print(d.id)
+```
+
### Find all tomograms for a certain organism and download preview-sized MRC files:
The following iterates over all tomograms related to a specific organism and downloads each tomogram in MRC format.
diff --git a/docs/figures/chimx_boundary.png b/docs/figures/chimx_boundary.png
new file mode 100644
index 000000000..06ef5d249
Binary files /dev/null and b/docs/figures/chimx_boundary.png differ
diff --git a/docs/figures/final.png b/docs/figures/final.png
new file mode 100644
index 000000000..111c8b50e
Binary files /dev/null and b/docs/figures/final.png differ
diff --git a/docs/figures/mesh_fit.png b/docs/figures/mesh_fit.png
new file mode 100644
index 000000000..36c6d8583
Binary files /dev/null and b/docs/figures/mesh_fit.png differ
diff --git a/docs/figures/prediction_fit.png b/docs/figures/prediction_fit.png
new file mode 100644
index 000000000..5ebc96e4d
Binary files /dev/null and b/docs/figures/prediction_fit.png differ
diff --git a/docs/figures/tomo_side_dark.png b/docs/figures/tomo_side_dark.png
new file mode 100644
index 000000000..1ffd7b2af
Binary files /dev/null and b/docs/figures/tomo_side_dark.png differ
diff --git a/docs/figures/tomo_side_light.png b/docs/figures/tomo_side_light.png
new file mode 100644
index 000000000..683632e3e
Binary files /dev/null and b/docs/figures/tomo_side_light.png differ
diff --git a/docs/figures/tomo_top_both.png b/docs/figures/tomo_top_both.png
new file mode 100644
index 000000000..2eea211fd
Binary files /dev/null and b/docs/figures/tomo_top_both.png differ
diff --git a/docs/figures/top_bottom_dark.png b/docs/figures/top_bottom_dark.png
new file mode 100644
index 000000000..c1a8b1f0f
Binary files /dev/null and b/docs/figures/top_bottom_dark.png differ
diff --git a/docs/figures/top_bottom_light.png b/docs/figures/top_bottom_light.png
new file mode 100644
index 000000000..d63d29b2c
Binary files /dev/null and b/docs/figures/top_bottom_light.png differ
diff --git a/docs/figures/valid_area_dark.png b/docs/figures/valid_area_dark.png
new file mode 100644
index 000000000..61b537428
Binary files /dev/null and b/docs/figures/valid_area_dark.png differ
diff --git a/docs/figures/valid_area_light.png b/docs/figures/valid_area_light.png
new file mode 100644
index 000000000..013318f58
Binary files /dev/null and b/docs/figures/valid_area_light.png differ
diff --git a/docs/index.rst b/docs/index.rst
index 18a43f42c..0a7ca82fa 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,13 +2,15 @@
:parser: myst_parser.sphinx_
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
:hidden:
cryoet_data_portal_docsite_quick_start.md
python-api
cryoet_data_portal_docsite_data.md
cryoet_data_portal_docsite_napari.md
- cryoet_data_portal_docsite_examples.md
+ tutorials.md
+ tutorial_sample_boundaries.md
+ cryoet_data_portal_docsite_examples.md
cryoet_data_portal_docsite_aws.md
cryoet_data_portal_docsite_faq.md
diff --git a/docs/tutorial_sample_boundaries.md b/docs/tutorial_sample_boundaries.md
new file mode 100644
index 000000000..f2ab98529
--- /dev/null
+++ b/docs/tutorial_sample_boundaries.md
@@ -0,0 +1,646 @@
+## Predicting sample boundaries
+
+![tutorial-goal](./figures/tomo_side_light.png)
+*Side view of a cryo-electron tomogram ([run 15094](https://cryoetdataportal.czscience.com/runs/15094)) without (left) and with (right) sample boundary annotation.*
+
+Biological samples acquired in a cryoET experiment are usually thin slabs of vitrified ice containing the biological specimen of interest. Unfortunately, it is difficult to determine the orientation and thickness of the sample ahead of reconstruction. For this reason, volumes reconstructed from cryoET tilt series are often larger than the actual sample and contain a significant amount of empty space (i.e. the vacuum inside the TEM column).
+
+There are several reasons why it can be useful to determine more accurate sample boundaries, for example:
+
+- statistical analysis of the sample preparation process
+- masking out the vacuum region to reduce the size of the volume
+- masking out the vacuum region during the training of a neural network
+- capping of membrane segmentations to define topological boundaries
+
+Below, we will show how to use [**copick**](https://github.com/copick/copick), an adapted version of [deepfinder](https://github.com/jtschwar/cryoet-deepfinder/tree/master) and [album](https://album.solutions/) to predict sample boundaries for datasets [10301](https://cryoetdataportal.czscience.com/datasets/10301) and [10302](https://cryoetdataportal.czscience.com/datasets/10302) from the [CZ cryoET Data Portal](https://cryoetdataportal.czscience.com). Copick is a cross-platform, storage-agnostic and serverless dataset API for cryoET datasets.
+
+![topview](./figures/tomo_top_both.png)
+
+*Top view of the same tomogram ([run 15094](https://cryoetdataportal.czscience.com/runs/15094)) from dataset [10302](https://cryoetdataportal.czscience.com/datasets/10302).*
+
+### Step 0: Environment and Pre-requisites
+
+For the purposes of this tutorial, we assume a machine with access to an NVIDIA GPU and a working `CUDA 12.3`/`CUDNN 8.9` installation. Before we can start, we need to install the necessary software. We will use the following tools:
+
+#### 1. ChimeraX and ChimeraX-copick (for visualization and annotation)
+
+Download and install ChimeraX from [here](https://www.cgl.ucsf.edu/chimerax/download.html). After installing ChimeraX, install the ChimeraX-copick extension by running the following command in ChimeraX:
+
+```
+toolshed install copick
+```
+
+#### 2. Album and copick-catalog (for processing steps)
+
+Comprehensive installation instructions for Album can be found on the [Album docs website](https://docs.album.solutions/en/latest/installation-instructions.html), but in brief, to install Album use:
+
+```bash
+conda create -n album album -c conda-forge
+conda activate album
+```
+
+Now, add copick's Album catalog ([copick-catalog](https://github.com/copick/copick-catalog)) to your Album installation and install the required solutions by running the following commands:
+
+```bash
+album add-catalog git@github.com:copick/copick-catalog.git
+album update && album upgrade
+album install copick:create_empty_picks:0.2.0
+album install copick:fit_sample:0.7.0
+album install copick:create_rec_limits:0.5.0
+album install copick:intersect_mesh:0.5.0
+album install copick:mesh_to_seg:0.7.0
+album install copick:sample_mesh:0.5.0
+album install copick:fit_sample_seg:0.9.0
+```
+
+#### 3. J-finder (for segmentation)
+
+Download and install a copick-compatible version of deepfinder:
+
+```bash
+conda create -n deepfinder python=3.10
+conda activate deepfinder
+git clone https://github.com/jtschwar/cryoet-deepfinder.git
+cd cryoet-deepfinder
+pip install .
+```
+
+### Step 1: Setup your copick projects
+
+We will create two copick projects that use datasets 10301 and 10302 from the CZ cryoET Data Portal. Both datasets stem
+from the same experiments and have the same characteristics, but the tomograms in dataset 10301 have protein annotations.
+We will use dataset 10301 as a training set and evaluate on dataset 10302.
+
+We will store new annotations in a local directory, called the "overlay", while the tomogram image data is obtained
+from the CZ cryoET Data Portal. In the following, we will create a configuration file `config_train.json` that describes
+the project. The configuration file is a JSON file that contains all information necessary to access the data and
+describes the objects that can be accessed and created using the copick API.
+
+The first part of the configuration file provides general information about the project, such as the project name,
+description, and copick-API version.
+
+
+**`config_train.json`**
+
+ ```json
+ {
+ "config_type": "cryoet_data_portal",
+ "name": "Sample Boundary Prediction - Training Set",
+ "description": "This project uses dataset 10301 from the CZ cryoET Data Portal as a training set for sample boundary prediction.",
+ "version": "0.5.4"
+ }
+ ```
+
+
+Next, we define the objects that can be accessed and created using the copick API. In this case, we will create 5 objects:
+
+- top-layer -- the top layer of the sample
+- bottom-layer -- the bottom layer of the sample
+- valid-area -- the valid area of the reconstructed tomogram
+- sample -- the sample itself
+- valid-sample -- the sample excluding the invalid reconstruction area
+
+
+**`config_train.json`**
+
+ ```json
+ {
+ "pickable_objects": [
+ {
+ "name": "top-layer",
+ "is_particle": true,
+ "label": 100,
+ "color": [ 255, 0, 0, 255],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "bottom-layer",
+ "is_particle": true,
+ "label": 101,
+ "color": [
+ 0,
+ 255,
+ 0,
+ 255
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "sample",
+ "is_particle": false,
+ "label": 102,
+ "color": [ 0, 0, 255, 128],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-area",
+ "is_particle": false,
+ "label": 103,
+ "color": [
+ 255,
+ 255,
+ 0,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-sample",
+ "is_particle": false,
+ "label": 2,
+ "color": [
+ 0,
+ 255,
+ 255,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ }
+ ]
+ }
+ ```
+
+
+Finally, we define where **copick** should look for the data and store any annotations (in this case, in Bob's home
+directory).
+
+
+**`config_train.json`**
+
+ ```json
+ {
+ "overlay_root": "local:///home/bob/copick_project_train/",
+ "overlay_fs_args": {
+ "auto_mkdir": true
+ },
+ "dataset_ids" : [10301]
+ }
+ ```
+
+
+We will repeat this process for a second project, `config_evaluate.json`, that includes both dataset 10301 and dataset
+10302 for evaluation. Both full examples are shown below:
+
+
+**`config_train.json`**
+
+ ```json
+ {
+ "config_type": "cryoet_data_portal",
+ "name": "Sample Boundary Prediction - Training Set",
+ "description": "This project uses dataset 10301 from the CZ cryoET Data Portal as a training set for sample boundary prediction.",
+ "version": "0.5.4",
+ "pickable_objects": [
+ {
+ "name": "top-layer",
+ "is_particle": true,
+ "label": 100,
+ "color": [ 255, 0, 0, 255],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "bottom-layer",
+ "is_particle": true,
+ "label": 101,
+ "color": [
+ 0,
+ 255,
+ 0,
+ 255
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "sample",
+ "is_particle": false,
+ "label": 102,
+ "color": [ 0, 0, 255, 128],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-area",
+ "is_particle": false,
+ "label": 103,
+ "color": [
+ 255,
+ 255,
+ 0,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-sample",
+ "is_particle": false,
+ "label": 2,
+ "color": [
+ 0,
+ 255,
+ 255,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ }
+ ],
+ "overlay_root": "local:///home/bob/copick_project_train/",
+ "overlay_fs_args": {
+ "auto_mkdir": true
+ },
+ "dataset_ids" : [10301]
+ }
+ ```
+
+
+
+ config_evaluate.json
+
+ ```json
+ {
+ "config_type": "cryoet_data_portal",
+ "name": "Sample Boundary Prediction - Evaluation Set",
+ "description": "This project uses datasets 10301 and 10302 from the CZ cryoET Data Portal for sample boundary prediction.",
+ "version": "0.5.4",
+ "pickable_objects": [
+ {
+ "name": "top-layer",
+ "is_particle": true,
+ "label": 100,
+ "color": [ 255, 0, 0, 255],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "bottom-layer",
+ "is_particle": true,
+ "label": 101,
+ "color": [
+ 0,
+ 255,
+ 0,
+ 255
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "sample",
+ "is_particle": false,
+ "label": 102,
+ "color": [ 0, 0, 255, 128],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-area",
+ "is_particle": false,
+ "label": 103,
+ "color": [
+ 255,
+ 255,
+ 0,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ },
+ {
+ "name": "valid-sample",
+ "is_particle": false,
+ "label": 2,
+ "color": [
+ 0,
+ 255,
+ 255,
+ 128
+ ],
+ "radius": 150,
+ "map_threshold": 0.037
+ }
+ ],
+ "overlay_root": "local:///home/bob/copick_project_evaluate/",
+ "overlay_fs_args": {
+ "auto_mkdir": true
+ },
+ "dataset_ids" : [10301, 10302]
+ }
+ ```
+
+
+### Step 2: Annotate the training set
+
+We will now use ChimeraX to annotate the top and bottom boundaries of the training set. First, we will create
+empty `CopickPicks` objects for the top and bottom layers in the training set. To do this we use the
+`create_empty_picks`-solution:
+
+```bash
+album run copick:create_empty_picks:0.2.0 \
+--copick_config_path config_train.json \
+--out_object top-layer \
+--out_user bob \
+--out_session 1
+
+album run copick:create_empty_picks:0.2.0 \
+--copick_config_path config_train.json \
+--out_object bottom-layer \
+--out_user bob \
+--out_session 1
+```
+
+Open ChimeraX and start the copick extension by running the following command in the ChimeraX command line:
+
+```
+copick start config_train.json
+```
+
+![chimerax-interface](./figures/chimx_boundary.png)
+*The ChimeraX-copick interface after loading run 14069.*
+
+This will open a new window with the copick interface. On the top left side you will see the available objects; on the
+bottom left you can find a list of runs in the dataset. On the right side you can find the interface of ArtiaX (the
+plugin that allows you to annotate objects in ChimeraX).
+
+Double-click a run's directory (e.g. `14069`) in the run list to show the available resolutions, then double-click the
+resolution's directory (`VS:7.840`) to display the available tomograms. Double-click a tomogram to load it.
+
+The tomogram will be displayed in the main viewport in the center. Available pickable objects are displayed in the
+list on the left side. Select a pickable object (e.g. `top-layer`) by double-clicking it and start annotating by
+placing points along the top boundary of the sample.
+
+You can switch the slicing direction in the `Tomogram`-tab on the right. You can move through the 2D slices of the
+tomogram using the slider on the right or `Shift + Mouse Wheel`. For more information on how to use the copick interface,
+see the info box below and refer to the [ChimeraX documentation](https://www.cgl.ucsf.edu/chimerax/docs/user/index.html).
+
+
+ Keyboard Shortcuts
+
+**Particles**
+
+ - `--` Remove Particle.
+ - `00` Set 0% transparency for active particle list.
+ - `55` Set 50% transparency for active particle list.
+ - `88` Set 80% transparency for active particle list.
+ - `aa` Previous Particle.
+ - `dd` Next Particle.
+ - `sa` Select all particles for active particle list.
+ - `ss` Select particles mode.
+ - `ww` Hide/Show ArtiaX particle lists.
+
+**Picking**
+
+ - `ap` Add on plane mode.
+ - `dp` Delete picked mode.
+ - `ds` Delete selected particles.
+
+**Visualization**
+
+ - `cc` Turn Clipping On/Off.
+ - `ee` Switch to orthoplanes.
+ - `ff` Move planes mouse mode.
+ - `qq` Switch to single plane.
+ - `rr` Rotate slab mouse mode.
+ - `xx` View XY orientation.
+ - `yy` View YZ orientation.
+ - `zz` View XZ orientation.
+
+**Info**
+
+ - `?` Show Shortcuts in Log.
+ - `il` Toggle Info Label.
+
+
+
+At the end of this step, you should have annotated the top and bottom layers of all 18 tomograms in the training set.
+
+![top-bottom](./figures/top_bottom_light.png)
+
+*Points clicked along the top and bottom boundary of the sample of a tomogram.*
+
+### Step 3: Create the training data
+
+#### Valid reconstruction area
+
+Next, we will create the training data for the sample boundary prediction. First, we will create bounding boxes that
+describe the valid reconstruction area in each tomogram. In most TEMs, the tilt axis is not exactly parallel to
+either of the detector axes, causing tomograms to have small regions of invalid reconstruction at the corners. Using
+the `create_rec_limits`-solution, we can compute 3D meshes that describe the valid reconstruction area in each
+tomogram.
+
+In this case, we will assume an in-plane rotation of -6 degrees.
+
+```bash
+album run copick:create_rec_limits:0.5.0 \
+--copick_config_path config_train.json \
+--voxel_spacing 7.84 \
+--tomo_type wbp \
+--angle -6 \
+--output_object valid-area \
+--output_user bob \
+--output_session 0
+```
+
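+To build intuition for why a rotated tilt axis produces invalid corners, the sketch below rotates the corners of a square field of view by -6 degrees and counts how many leave the original frame. This is our own illustration with a made-up image size, not the `create_rec_limits` implementation:
+
+```python
+import math
+
+# Rotate the corners of an nx-by-ny image in-plane about its center and
+# check which rotated corners leave the original field of view.
+nx, ny = 630, 630              # tomogram extent in voxels (illustrative)
+angle = math.radians(-6.0)     # in-plane rotation of the tilt axis
+
+cx, cy = nx / 2, ny / 2
+corners = [(0, 0), (nx, 0), (nx, ny), (0, ny)]
+
+rotated = []
+for x, y in corners:
+    dx, dy = x - cx, y - cy
+    rx = cx + dx * math.cos(angle) - dy * math.sin(angle)
+    ry = cy + dx * math.sin(angle) + dy * math.cos(angle)
+    rotated.append((rx, ry))
+
+# Corners landing outside [0, nx] x [0, ny] mark regions of invalid
+# reconstruction that the valid-area mesh excludes.
+outside = [(rx, ry) for rx, ry in rotated
+           if not (0 <= rx <= nx and 0 <= ry <= ny)]
+print(f"{len(outside)} of 4 corners fall outside the original frame")
+```
+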
+You can now visualize the created bounding boxes in ChimeraX by restarting the copick interface and selecting the
+`valid-area` object in the Mesh-tab on the left side.
+
+![valid-area](./figures/valid_area_light.png)
+
+*Top view of tomogram [run 14069](https://cryoetdataportal.czscience.com/runs/14069) without (left)
+and with (right) the valid reconstruction area mesh overlaid.*
+
+#### Sample
+
+Now, we will use the points created in [Step 2](#step-2-annotate-the-training-set) to create a second 3D mesh that
+describes the sample boundaries. We do this by fitting a surface defined by a cubic spline grid to the points using the
+[torch-cubic-spline-grids](https://github.com/teamtomo/torch-cubic-spline-grids) package in the `fit_sample`-solution.
+
+```bash
+album run copick:fit_sample:0.7.0 \
+--copick_config_path config_train.json \
+--top_object top-layer \
+--bottom_object bottom-layer \
+--input_user bob --input_session 1 \
+--voxel_spacing 7.84 \
+--tomo_type wbp \
+--output_object sample \
+--output_user bob \
+--output_session 0
+```
+
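+The `fit_sample`-solution fits a flexible spline surface to the clicked points. As a simplified stand-in, the sketch below fits a flat plane z = a·x + b·y + c to synthetic picks with a least-squares solve; this is our own illustration with made-up data, not the torch-cubic-spline-grids fit:
+
+```python
+import numpy as np
+
+# Simplified illustration: fit a flat plane z = a*x + b*y + c to points.
+# The real fit_sample solution fits a cubic spline grid instead.
+rng = np.random.default_rng(0)
+
+# Synthetic "top-layer" picks: roughly planar with some click jitter.
+x = rng.uniform(0, 630, 50)
+y = rng.uniform(0, 630, 50)
+z = 0.05 * x - 0.02 * y + 400 + rng.normal(0, 2, 50)
+
+# Least-squares solve for the plane coefficients (a, b, c).
+A = np.column_stack([x, y, np.ones_like(x)])
+(a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
+
+residual = z - (a * x + b * y + c)
+print(f"a={a:.3f} b={b:.3f} c={c:.1f} rms={residual.std():.2f}")
+```
+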
+#### Intersection
+
+Next, we will intersect the valid reconstruction area with the sample to create a new object that describes the valid
+sample area. We do this using the `intersect_mesh`-solution:
+
+```bash
+album run copick:intersect_mesh:0.5.0 \
+--copick_config_path config_train.json \
+--object_a valid-area \
+--user_a bob \
+--session_a 0 \
+--object_b sample \
+--user_b bob \
+--session_b 0 \
+--output_object valid-sample \
+--output_user bob \
+--output_session 0
+```
+
+You can now visualize the final 3D mesh for training in ChimeraX by restarting the copick interface and selecting the `valid-sample` object in the `Mesh`-tab on the left side.
+
+![mesh](./figures/mesh_fit.png)
+
+*Side view of the tomogram with points and intersected, valid sample area.*
+
+#### Training data
+
+Finally, we will create the training data for the sample boundary prediction. We will use the `mesh_to_seg`-solution to
+convert the 3D meshes into a dense segmentation of the same size as the tomogram.
+
+```bash
+album run copick:mesh_to_seg:0.7.0 \
+--copick_config_path config_train.json \
+--input_object valid-sample \
+--input_user bob \
+--input_session 0 \
+--voxel_spacing 7.84 \
+--tomo_type wbp
+```
+
+We also need to determine where sub-volumes for training should be cropped. This allows us to ensure the correct ratio
+of positive and negative samples in the training data. We will use the `sample_mesh`-solution to create a set of points
+sampled using Poisson disk and rejection sampling. The solution lets you specify the number of points inside the mesh,
+on its surface, and outside the mesh.
+
+```bash
+album run copick:sample_mesh:0.5.0 \
+--copick_config_path config_train.json \
+--input_object valid-sample \
+--input_user bob \
+--input_session 0 \
+--voxel_spacing 7.84 \
+--tomo_type wbp \
+--num_surf 300 \
+--num_internal 300 \
+--num_random 100 \
+--min_dist 200 \
+--output_user bob
+```
+
+Here, `--num_surf`, `--num_internal`, and `--num_random` set the number of points sampled on the surface of the mesh, inside the mesh, and outside the mesh, respectively; `--min_dist` sets the minimum distance between points in Ångstrom.
+
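+The rejection step can be illustrated in a few lines: draw random candidates and keep only those at least `min_dist` away from every accepted point. This is our own sketch with made-up box size and counts, not the `sample_mesh` implementation, which additionally distinguishes surface, interior, and exterior points:
+
+```python
+import random
+
+# Distance-based rejection sampling: accept a candidate only if it keeps
+# a minimum spacing to every previously accepted point.
+random.seed(42)
+
+min_dist = 200.0   # minimum spacing in Angstrom
+box = 3000.0       # sample inside a cubic box of this side length
+
+accepted = []
+for _ in range(500):                      # number of candidate draws
+    p = tuple(random.uniform(0, box) for _ in range(3))
+    if all(sum((a - b) ** 2 for a, b in zip(p, q)) >= min_dist ** 2
+           for q in accepted):
+        accepted.append(p)
+
+print(f"kept {len(accepted)} of 500 candidates")
+```
+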
+The resulting segmentations will have the same name, user and session ID as the input object. You can now visualize the
+segmentations in ChimeraX by restarting the copick interface and selecting the `valid-sample` object in the
+`Segmentation`-tab on the top left part of the interface. You can also visualize the sampled points from the
+`Points`-tab on the left side.
+
+### Step 4: Train the model
+
+#### Create the multilabel segmentation
+In the next step, we will create a second dense segmentation volume that contains the sample segmentation from the
+[previous step](#step-3-create-the-training-data). This is redundant here, but it is required for training the
+J-finder model when there are multiple segmentation targets. While the segmentation created previously is a binary mask,
+the segmentation volume created here contains an integer label for each voxel, corresponding to the "label" field in the
+`pickable_objects`-list in the configuration file.
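+
+The relationship between the binary mask and the multilabel volume can be sketched in a few lines. The shapes and values below are made up for illustration; this is not the J-finder code. `valid-sample` carries label 2 in our configuration:
+
+```python
+import numpy as np
+
+# Toy binary mask standing in for the valid-sample segmentation.
+binary_mask = np.zeros((4, 8, 8), dtype=bool)
+binary_mask[1:3, 2:6, 2:6] = True
+
+# Write the object's "label" value from pickable_objects into each
+# foreground voxel; background stays 0.
+LABEL_VALID_SAMPLE = 2
+multilabel = np.where(binary_mask, LABEL_VALID_SAMPLE, 0).astype(np.uint8)
+
+print("labels present:", np.unique(multilabel).tolist())
+```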
+
+In order to do this, we will run step 1 of the J-finder pipeline:
+
+```bash
+step1 create \
+--config config_train.json \
+--target valid-sample bob 0 0 \
+--seg-target valid-sample bob 0 \
+--voxel-size 7.84 \
+--tomogram-algorithm wbp \
+--out-name sampletargets
+```
+
+Here, `--target` takes the input picks name, user ID, session ID, and radius; `--seg-target` takes the input segmentation name, user ID, and session ID.
+
+This should create a new segmentation volume with name `sampletargets`, user `train-deepfinder` and session `0`.
+
+#### Train the model
+
+Next, we will train the J-finder model using the training data created in the previous steps. We will use the
+`train`-command of the J-finder pipeline:
+
+```bash
+mkdir outputs
+
+step2 train \
+--path-train config_train.json \
+--train-voxel-size 7.84 \
+--train-tomo-type wbp \
+--output-path outputs/ \
+--n-class 3 --dim-in 64 \
+--valid-tomo-ids 14069,14070,14071 \
+--train-tomo-ids 14072,14073,14074,14075,14076,14077,14078,14079,14080,14081,14082,14083,14084,14085,14086 \
+--sample-size 10 \
+--label-name sampletargets \
+--target valid-sample bob 0
+```
+
+In this case, runs `14069`, `14070`, and `14071` will be used for validation, while the remaining runs will be used for
+training. The model will be trained for 10 epochs with a sample size of 10. The model will be saved in the
+`outputs`-directory.
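+
+Since the run IDs cover the contiguous range 14069 to 14086, the two comma-separated ID lists can also be generated instead of typed by hand (a small convenience sketch, not part of the J-finder pipeline):
+
+```python
+# Build the validation/training split for the 18 runs 14069-14086.
+runs = [str(r) for r in range(14069, 14087)]
+valid_ids, train_ids = runs[:3], runs[3:]
+
+print("--valid-tomo-ids", ",".join(valid_ids))
+print("--train-tomo-ids", ",".join(train_ids))
+```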
+
+### Step 5: Evaluate the model
+
+Now, we will evaluate the model on the evaluation set. For demonstration purposes we will only evaluate on four
+tomograms. We will use the `segment`-command of the J-finder pipeline:
+
+```bash
+step3 segment \
+--predict-config config_evaluate.json \
+--path-weights outputs/net_weights_FINAL.h5 \
+--n-class 3 --patch-size 196 \
+--voxel-size 7.84 \
+--tomogram-algorithm wbp \
+--segmentation-name segmentation \
+--user-id output \
+--session-id 0 \
+--tomo-ids 14114,14132,14137,14163
+```
+
+This will create a new segmentation volume with name `segmentation`, user `output` and session `0` for the tomograms
+`14114`, `14132`, `14137`, and `14163`. You can now visualize the segmentations in ChimeraX by restarting the copick interface
+and selecting the `segmentation` object in the `Segmentation`-tab on the top left part of the interface.
+
+![prediction-fit](./figures/prediction_fit.png)
+
+*Segmentation generated by the model and box fit to the segmentation.*
+
+### Step 6: Post-processing
+
+Finally, we will post-process the segmentations to create the final sample boundaries. The segmentations can contain
+small isolated regions that are not part of the sample. We will use the `fit_sample_seg`-solution to fit a box with
+parallel sides to the segmentation.
+
+```bash
+album run copick:fit_sample_seg:0.9.0 \
+--copick_config_path config_evaluate.json \
+--top_object top-layer \
+--bottom_object bottom-layer \
+--input_user output \
+--input_session 0 \
+--seg_name segmentation \
+--voxel_spacing 7.84 \
+--tomo_type wbp \
+--run_names 14114,14132,14137,14163 \
+--output_object valid-sample \
+--output_user output \
+--output_session 0
+```
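+
+For intuition, here is a toy version of the idea: take the z-extent of slices containing enough segmented voxels as a parallel-sided box, which also discards small isolated regions. This is our own sketch with made-up sizes and threshold, not the `fit_sample_seg` implementation:
+
+```python
+import numpy as np
+
+# Toy predicted segmentation: a slab plus one isolated false positive.
+seg = np.zeros((32, 16, 16), dtype=np.uint8)
+seg[10:22, 4:12, 4:12] = 1      # stand-in sample slab
+seg[2, 1, 1] = 1                # small isolated false positive
+
+# Count segmented voxels per z-slice and require a minimum count, so
+# tiny isolated regions do not stretch the fitted box.
+voxels_per_slice = seg.sum(axis=(1, 2))
+sample_slices = np.flatnonzero(voxels_per_slice >= 10)
+
+z_bottom, z_top = int(sample_slices.min()), int(sample_slices.max())
+print(f"sample spans z = {z_bottom}..{z_top}")
+```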
+
+You can now visualize the final 3D mesh for evaluation in ChimeraX by restarting the copick interface and selecting the
+`valid-sample` object in the `Mesh`-tab on the left side. Below you can see the final result for the four tomograms
+`14114`, `14132`, `14137`, and `14163`.
+
+![final-fit](./figures/final.png)
+
+*Clipped boundaries predicted for [run 14114](https://cryoetdataportal.czscience.com/runs/14114), [run 14132](https://cryoetdataportal.czscience.com/runs/14132), [run 14137](https://cryoetdataportal.czscience.com/runs/14137), and
+[run 14163](https://cryoetdataportal.czscience.com/runs/14163) (left to right, top to bottom).*
diff --git a/docs/tutorials.md b/docs/tutorials.md
new file mode 100644
index 000000000..e915b4d7a
--- /dev/null
+++ b/docs/tutorials.md
@@ -0,0 +1,8 @@
+# Tutorials
+
+These tutorials will help you learn about using the Portal API.
+
+- [Example code snippets](./cryoet_data_portal_docsite_examples.md) are short examples of using the API.
+- [Predicting sample boundaries](./tutorial_sample_boundaries.md) in tomograms is an end-to-end example of using AWS and the Portal API.
+
+Please start a [discussion on Github](https://github.com/chanzuckerberg/cryoet-data-portal/discussions/new/choose) if you'd like to request a tutorial.