Deploying to gh-pages from @ 7a432eb 🚀
ivirshup committed Jan 17, 2025
1 parent aa68968 commit 7553972
Showing 23 changed files with 583 additions and 146 deletions.
Original file line number Diff line number Diff line change
@@ -269,7 +269,7 @@ <h1>cellxgene_census.experimental.pp.get_highly_variable_genes<a class="headerli
<span class="go"> )</span>
</pre></div>
</div>
<p>Fetch an <a class="reference external" href="https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html#anndata.AnnData" title="(in anndata v0.11.1.dev38+gfaec0f8)"><code class="xref py py-class docutils literal notranslate"><span class="pre">anndata.AnnData</span></code></a> with top 500 genes:</p>
<p>Fetch an <a class="reference external" href="https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.html#anndata.AnnData" title="(in anndata v0.12.0.dev40+g68bb5b4)"><code class="xref py py-class docutils literal notranslate"><span class="pre">anndata.AnnData</span></code></a> with top 500 genes:</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="k">with</span> <span class="n">cellxgene_census</span><span class="o">.</span><span class="n">open_soma</span><span class="p">(</span><span class="n">census_version</span><span class="o">=</span><span class="s2">&quot;stable&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">census</span><span class="p">:</span>
<span class="go"> organism = &quot;mus_musculus&quot;</span>
<span class="go"> obs_value_filter = &quot;is_primary_data == True and tissue_general == &#39;lung&#39;&quot;</span>
8 changes: 4 additions & 4 deletions _autosummary/cellxgene_census.get_anndata.html

Large diffs are not rendered by default.

@@ -96,3 +96,6 @@ EFO:0700004,BD Rhapsody Targeted mRNA
EFO:0700010,TruDrop
EFO:0700011,GEXSCOPE technology
EFO:0700016,Smart-seq v4
EFO:0010961, Visium Spatial Gene Expression
EFO:0009920, Slide-seq
EFO:0030062, Slide-seqV2
84 changes: 0 additions & 84 deletions _images/cellxgene_census_docsite_model.svg

This file was deleted.

9 changes: 9 additions & 0 deletions _sources/articles.rst.txt
@@ -18,3 +18,12 @@ What's new?
:maxdepth: 1

articles/2024/*

2025
----------

.. toctree::
:glob:
:maxdepth: 1

articles/2025/*
53 changes: 53 additions & 0 deletions _sources/articles/2025/20250117-spatial.md.txt
@@ -0,0 +1,53 @@
# Beta Release of Spatial Data on Census!

**Published:** Jan 14, 2025
**By:** Cathy Stolitzka, Isaac Virshup, Maximilian Lombardo

The Census team is pleased to announce the release of Spatial data on Census!

This has been a large joint effort between the Census team and TileDB to create an easy-to-use, backwards-compatible spatial schema that supports analyzing spatial and non-spatial data together.

This first release is a **beta release** with the ability to export all spatial data (10x Visium and Slide-seq) from an `obs`/`var` query as `SpatialData`. Exporting `SpatialData` with spatial-based filters, transforms, etc., is not yet supported and will be implemented in a future release.

---

## SOMA Spatial Data Model

![Updated CELLxGENE Census Schema with spatial data](/census-spatial-schema.svg)

### Building Blocks

#### **SOMAExperiment**

A collection encapsulating data from one or more single-cell datasets, with reserved attributes:

| Field Name | Field Description |
|-------------------------|-----------------------------------------------------------------------------------|
| `obs` | A DataFrame for observation metadata |
| `ms` | A collection (`SOMAMeasurement`), with cell-by-gene data matrices and a gene metadata DataFrame |
| **[NEW] `spatial`** | A collection of `Scene` objects (see below) |
| **[NEW] `obs_spatial_presence`** | A DataFrame to map observations to `Scene` objects |
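
To make the role of `obs_spatial_presence` concrete, here is a minimal sketch in plain Python. It does not use the SOMA or Census APIs. The row layout and the `obs_id` and `scene_id` keys are assumptions made for illustration only; the real DataFrame columns may differ.

```python
from collections import defaultdict

# Hypothetical rows of an obs_spatial_presence-style table: each row says
# that one observation is present in one Scene. Key names are assumptions.
presence_rows = [
    {"obs_id": 0, "scene_id": "scene_a"},
    {"obs_id": 1, "scene_id": "scene_a"},
    {"obs_id": 2, "scene_id": "scene_b"},
]


def observations_by_scene(rows):
    """Group observation ids by the Scene they are mapped to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["scene_id"]].append(row["obs_id"])
    return dict(grouped)


scene_to_obs = observations_by_scene(presence_rows)
```

With a mapping like this, an `obs` query can be narrowed to the `Scene` objects that actually contain the selected observations.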

---

#### **[NEW] SOMAScene**

A collection of spatial assets. All assets in one `Scene` should correspond to the same physical coordinate system. The collection provides operations for getting, setting, and transforming between coordinate systems, with reserved attributes:

| Field Name | Field Description |
|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| `obsl` | A collection of spatial arrays and collections. The rows in these arrays correspond to observations and may correspond to `obs` of a `SOMAExperiment`. |
| `varl` | A collection of collections for spatial arrays on the `SOMAMeasurements`. The top-level collection is indexed by measurement name. The rows in the contained arrays correspond to features and may correspond to `var` of the associated `SOMAMeasurement`. |
| `img` | A `SOMAImageCollection` of images (single and multi-resolution). |
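
The relationship between `obsl` rows and `obs` can be sketched with plain Python containers. This is an illustration under stated assumptions, not the SOMA API: the `loc` array name, the `obs_id` join key, and the dictionary layout are all hypothetical.

```python
# Hypothetical observation metadata, keyed by an observation id.
obs = {
    0: {"cell_type": "T cell"},
    1: {"cell_type": "B cell"},
    2: {"cell_type": "fibroblast"},
}

# Hypothetical Scene layout mirroring the reserved attributes in the table.
scene = {
    "obsl": {
        "loc": [
            {"obs_id": 0, "x": 10.0, "y": 5.0},
            {"obs_id": 2, "x": 12.5, "y": 7.0},
        ]
    },
    "varl": {"RNA": {}},  # top level is indexed by measurement name
    "img": {},
}


def annotate_locations(scene, obs):
    """Attach obs metadata to each spatial row that has a matching obs id."""
    return [
        {**row, **obs[row["obs_id"]]}
        for row in scene["obsl"]["loc"]
        if row["obs_id"] in obs
    ]


annotated = annotate_locations(scene, obs)
```

Rows in `obsl` that have no counterpart in `obs` are skipped here, which reflects the "may correspond" wording above.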

---

#### **[NEW] SOMAImageCollection**

A group of multi-resolution images that can be accessed by level. Every `SOMAImageNDArray` in the collection must be mappable to the same physical space by translation and scaling only. Below are some sample operations on this collection type.

| Operation | Description |
|-------------------|-------------------------------------------------------------------------|
| `levels` | Sequence of level numbers in the slide |
| `dimensions` | A `(width, height)` tuple for level 0 of the slide |
| `level_dimensions`| A sequence of down-sample factors for each level of the slide |
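
The "translation and scaling only" constraint means that a pixel position at any level can be mapped back to level 0 with a per-axis scale factor. The sketch below shows that arithmetic in plain Python. The `level_shapes` values and both helper functions are hypothetical and are not part of the SOMA API; translation is assumed to be zero for simplicity.

```python
# Hypothetical (width, height) of each level, level 0 being full resolution.
level_shapes = [(4000, 3000), (2000, 1500), (1000, 750)]


def downsample_factors(shapes):
    """Per-axis scale of each level relative to level 0."""
    base_w, base_h = shapes[0]
    return [(base_w / w, base_h / h) for (w, h) in shapes]


def to_level0(x, y, level, shapes):
    """Map a pixel coordinate at `level` to level-0 coordinates (scale only)."""
    sx, sy = downsample_factors(shapes)[level]
    return (x * sx, y * sy)


factors = downsample_factors(level_shapes)
point = to_level0(10, 20, 2, level_shapes)
```

Because every level shares the same physical space, coordinates from `obsl` can in principle be overlaid on whichever resolution is cheapest to load.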
2 changes: 1 addition & 1 deletion _sources/cellxgene_census_docsite_landing.md.txt
@@ -1,4 +1,4 @@
<span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 Now in testing: Spatial! From Jan 16th, latest builds will include data from Slide-seq and Visium assays. ⚠️ Opening these builds requires `tiledbsoma>=1.15.3`. ⚠️
<span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 Now in testing: Spatial! From Jan 16th, latest builds will include data from Slide-seq and Visium assays. ⚠️ Opening these builds requires `tiledbsoma>=1.15.3` ⚠️. [Learn more](articles/2025/20250117-spatial.md)!

<span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: Train PyTorch models directly with Census data with our efficient and easy-to-use PyTorch loaders. [Learn more](https://chanzuckerberg.github.io/cellxgene-census/articles/2024/20240709-pytorch.html)!

53 changes: 31 additions & 22 deletions _sources/cellxgene_census_docsite_schema.md.txt
@@ -12,56 +12,65 @@ This page provides a user-friendly overview of the Census contents and its schem

The Census is a collection of a variety of **[SOMA objects](#soma-objects)** organized with the following hierarchy.

![image](cellxgene_census_docsite_model.svg)
![image](census-spatial-schema.svg)

As you can see, the Census data is a `SOMACollection` with two high-level items:

1. `"census_info"` for the census summary info.
2. `"census_data"` for the single-cell data and metadata.

### Census summary info `"census_info"`
### Census Summary Info `"census_info"`

A `SOMACollection` with tables providing information about the Census as a whole. It has the following items:

- `"summary"`: high-level information of this Census, e.g. build date, total cell count, etc.
- `"summary"`: High-level information of this Census, e.g., build date, total cell count, etc.
- `"datasets"`: A table with all datasets from CELLxGENE Discover used to create the Census.
- `"summary_cell_counts"`: Cell counts stratified by relevant cell metadata.

### Census single-cell data `"census_data"`
---

Data for each organism is stored in independent `SOMAExperiment` objects which are a specialized form of a `SOMACollection`. Each of these store a data matrix (cell by genes), cell metadata, gene metadata, and feature presence matrix:
### Census Single-Cell Data `"census_data"`

Data for each organism is stored in independent `SOMAExperiment` objects, which are a specialized form of a `SOMACollection`. Each of these stores a data matrix (cell by genes), cell metadata, gene metadata, and feature presence matrix.

This is how the data is organized for one organism – *Homo sapiens*:

- `["homo_sapiens"].obs`: Cell metadata.
- `["homo_sapiens"].ms["RNA"].X`: Data matrices: raw counts in `X["raw"]`, and library-size normalized counts in `X["normalized"]` (only avialble in Census schema V1.1.0 and above).
- `["homo_sapiens"].ms["RNA"].var`: Gene Metadata.
- `["homo_sapiens"].ms["RNA"]["feature_dataset_presence_matrix"]`: a sparse boolean array indicating which genes were measured in each dataset.
- `["homo_sapiens"].ms["RNA"].X`: Data matrices: raw counts in `X["raw"]`, and library-size normalized counts in `X["normalized"]` (only available in Census schema V1.1.0 and above).
- `["homo_sapiens"].ms["RNA"].var`: Gene metadata.
- `["homo_sapiens"].ms["RNA"]["feature_dataset_presence_matrix"]`: A sparse boolean array indicating which genes were measured in each dataset.

---

## Data included in the Census
### Data Included in the Census

All data from [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) that adheres to the following criteria is included in the Census:

- Cells from human or mouse.
- Non-spatial RNA data, see full list of sequencing technologies included [here](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#assays).
- **Spatial and non-spatial RNA data**, see the full list of sequencing technologies included [here](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#assays).
- Raw counts.
- Only standardized cell and gene metadata as described in the CELLxGENE Discover dataset [schema](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md).

⚠️ Note that the data includes:

- **Full-gene sequencing read counts** (e.g. Smart-Seq2) and **molecule counts** (e.g. 10X).
- **Duplicate cells** present across multiple datasets, these can be filtered in or out using the cell metadata variable `is_primary_data`.
- **Full-gene sequencing read counts** (e.g., Smart-Seq2) and **molecule counts** (e.g., 10X).
- **Duplicate cells** present across multiple datasets. These can be filtered in or out using the cell metadata variable `is_primary_data`.
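
As a minimal sketch of what filtering on `is_primary_data` does, the example below uses plain Python records instead of a Census query. The cell and dataset names are made up for illustration. In a real query, the same condition is expressed in a value filter string such as `"is_primary_data == True"`.

```python
# Hypothetical cell metadata: cell "c1" appears in two datasets, and only
# one of those copies is flagged as the primary record.
cells = [
    {"cell": "c1", "dataset": "d1", "is_primary_data": True},
    {"cell": "c1", "dataset": "d2", "is_primary_data": False},
    {"cell": "c2", "dataset": "d2", "is_primary_data": True},
]

# Keep only primary records so each cell is counted once.
primary_cells = [c for c in cells if c["is_primary_data"]]

primary_names = [c["cell"] for c in primary_cells]
```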

## SOMA objects
---

You can find the full SOMA specification [here](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md#foundational-types).
### SOMA Objects

The following is short description of the main SOMA objects used by the Census:
You can find the full SOMA specification [here](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md#foundational-types).

- `DenseNDArray` is a dense, N-dimensional array, with offset (zero-based) integer indexing on each dimension.
- `SparseNDArray` is the same as `DenseNDArray` but sparse, and supports point indexing (disjoint index access).
- `DataFrame` is a multi-column table with a user-defined columns names and value types, with support for point indexing.
- `Collection` is a persistent container of named SOMA objects.
- `Experiment` is a class that represents a single-cell experiment. It always contains two objects:
- `obs`: a `DataFrame` with primary annotations on the observation axis.
- `ms`: a `Collection` of measurements, each composed of `X` matrices and axis annotation matrices or data frames (e.g. `var`, `varm`, `obsm`, etc).
The following is a short description of the main SOMA objects used by the Census:

- **`DenseNDArray`**: A dense, N-dimensional array, with offset (zero-based) integer indexing on each dimension.
- **`SparseNDArray`**: The same as `DenseNDArray` but sparse, and supports point indexing (disjoint index access).
- **`DataFrame`**: A multi-column table with user-defined column names and value types, with support for point indexing.
- **`Collection`**: A persistent container of named SOMA objects.
- **`Experiment`**: A class that represents a single-cell experiment. It always contains two objects:
- `obs`: A `DataFrame` with primary annotations on the observation axis.
- `ms`: A `Collection` of measurements, each composed of `X` matrices and axis annotation matrices or data frames (e.g., `var`, `varm`, `obsm`, etc.).
- **`SOMAScene`**: A `Collection` of `obsl`, `varl`, and `img`.
- **`Spatial`**: A collection of `Scene` objects.
- **`obs_spatial_presence`**: A `DataFrame` to map observations to `Scene` objects.
12 changes: 12 additions & 0 deletions articles.html
@@ -160,6 +160,10 @@
<li class="toctree-l3"><a class="reference internal" href="articles/2024/20240710_embedding_metrics_dec_2023_lts.html">Benchmarks of single-cell Census models</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#id3">2025</a><ul>
<li class="toctree-l3"><a class="reference internal" href="articles/2025/20250117-spatial.html">Beta Release of Spatial Data on Census!</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
@@ -229,6 +233,14 @@ <h2>2024<a class="headerlink" href="#id2" title="Permalink to this heading"><
</ul>
</div>
</section>
<section id="id3">
<h2>2025<a class="headerlink" href="#id3" title="Permalink to this heading"></a></h2>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="articles/2025/20250117-spatial.html">Beta Release of Spatial Data on Census!</a></li>
</ul>
</div>
</section>
</section>


1 change: 1 addition & 0 deletions articles/2023/20230808-r_api_release.html
@@ -160,6 +160,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id2">2024</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
1 change: 1 addition & 0 deletions articles/2023/20230919-out_of_core_methods.html
@@ -159,6 +159,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id2">2024</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
1 change: 1 addition & 0 deletions articles/2023/20231012-normalized_layer_precalc_stats.html
@@ -160,6 +160,7 @@
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id2">2024</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
1 change: 1 addition & 0 deletions articles/2024/20240404-categoricals.html
@@ -159,6 +159,7 @@
<li class="toctree-l3"><a class="reference internal" href="20240710_embedding_metrics_dec_2023_lts.html">Benchmarks of single-cell Census models</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
1 change: 1 addition & 0 deletions articles/2024/20240709-pytorch.html
@@ -159,6 +159,7 @@
<li class="toctree-l3"><a class="reference internal" href="20240710_embedding_metrics_dec_2023_lts.html">Benchmarks of single-cell Census models</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
5 changes: 3 additions & 2 deletions articles/2024/20240710_embedding_metrics_dec_2023_lts.html
@@ -48,7 +48,7 @@

<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="CZ CELLxGENE Discover Census in AWS" href="../../cellxgene_census_aws_open_data.html" />
<link rel="next" title="Beta Release of Spatial Data on Census!" href="../2025/20250117-spatial.html" />
<link rel="prev" title="First stable iteration of Census (SOMA) PyTorch loaders" href="20240709-pytorch.html" />
</head>

@@ -160,6 +160,7 @@
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../articles.html#id3">2025</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../cellxgene_census_aws_open_data.html">Census in AWS ☁️</a></li>
@@ -525,7 +526,7 @@ <h4>Spinal cord<a class="headerlink" href="#spinal-cord" title="Permalink to thi
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="20240709-pytorch.html" class="btn btn-neutral float-left" title="First stable iteration of Census (SOMA) PyTorch loaders" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../../cellxgene_census_aws_open_data.html" class="btn btn-neutral float-right" title="CZ CELLxGENE Discover Census in AWS" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="../2025/20250117-spatial.html" class="btn btn-neutral float-right" title="Beta Release of Spatial Data on Census!" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

<hr/>