diff --git a/README.md b/README.md
index a404eac..50b817f 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,14 @@
 # pyroe
 
-## About `pyroe`
+The main purpose of `pyroe` is to provide a Python interface for loading the quantification results of single-cell sequencing data generated by [`alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) and [`simpleaf`](https://github.com/COMBINE-lab/simpleaf).
+- The main function of pyroe is [`load_fry`](https://pyroe.readthedocs.io/en/latest/processing_fry_quants.html#load-fry-full-usage), which loads the quantification results into an [`anndata`](https://anndata.readthedocs.io/en/latest/) object for downstream analysis with [`scanpy`](https://scanpy.readthedocs.io/en/stable/). It provides many options for constructing the final `anndata` object by combining the count matrices of the different splicing statuses in different ways.
+- Moreover, `pyroe` provides an interface to the [`quantaf`](https://combine-lab.github.io/quantaf/) project, a database of quantification results for many publicly available datasets.
+
+### Background
 
 [`Alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) is a fast, accurate, and memory frugal quantification tool for preprocessing single-cell RNA-sequencing data. Detailed information can be found in the alevin-fry [pre-print](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2), and [paper](https://www.nature.com/articles/s41592-022-01408-3).
 
-The `pyroe` package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. The documentation for `pyroe` has its own dedicated website. Please visit the [ReadTheDocs pyroe website here](https://pyroe.readthedocs.io).
+[`simpleaf`](https://github.com/COMBINE-lab/simpleaf) provides a simple, easy-to-use interface for running `alevin-fry`, as well as more advanced features such as designing and executing custom workflows for single-cell data analysis. ([Paper](https://doi.org/10.1093/bioinformatics/btad614) and [Documentation](https://simpleaf.readthedocs.io/en/latest/))
+
+## Major Updates
+Since pyroe v0.10.0, the functionality for creating augmented transcriptome references and generating the gene ID to gene name file has moved to the [`roers`](https://github.com/COMBINE-lab/roers) package, which is installed automatically together with [`simpleaf`](https://github.com/COMBINE-lab/simpleaf). We recommend that all users process their single-cell sequencing data with the simplified command line interface provided by [`simpleaf`](https://simpleaf.readthedocs.io/en/latest/). The [`simpleaf index`](https://simpleaf.readthedocs.io/en/latest/index-command.html) command automatically generates the augmented transcriptome reference (including the gene ID to gene name file) and indexes it for you.
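The README text in this diff says that `load_fry` builds the final `anndata` object by combining the count matrices of the different splicing statuses in different ways. The sketch below illustrates that idea with plain NumPy rather than pyroe itself. It assumes, for illustration, that a USA-mode count matrix has `3 * n_genes` columns laid out as a spliced (S) block, then an unspliced (U) block, then an ambiguous (A) block, with the same gene order in each. The `FORMATS` table and the `split_usa`/`combine_usa` helpers are hypothetical and are not pyroe's API; for real data, use `pyroe.load_fry` and its documented output formats.

```python
import numpy as np

# Hypothetical table: output layer name -> splicing statuses summed into it.
# Illustrative only; this is NOT pyroe's actual output-format table.
FORMATS = {
    "S+A": {"X": ["S", "A"]},
    "U+S+A": {"X": ["U", "S", "A"]},
    "separate": {"spliced": ["S"], "unspliced": ["U"], "ambiguous": ["A"]},
}


def split_usa(counts: np.ndarray) -> dict:
    """Split a cells x (3 * n_genes) count matrix into S, U and A blocks.

    Assumes the column order spliced | unspliced | ambiguous.
    """
    n_cols = counts.shape[1]
    if n_cols % 3 != 0:
        raise ValueError(f"expected a multiple of 3 columns, got {n_cols}")
    g = n_cols // 3
    return {"S": counts[:, :g], "U": counts[:, g:2 * g], "A": counts[:, 2 * g:]}


def combine_usa(counts: np.ndarray, fmt: str) -> dict:
    """Build each output layer by summing the requested status blocks."""
    blocks = split_usa(counts)
    return {
        layer: np.sum([blocks[s] for s in statuses], axis=0)
        for layer, statuses in FORMATS[fmt].items()
    }


if __name__ == "__main__":
    # Two cells x two genes; columns are S_g1, S_g2, U_g1, U_g2, A_g1, A_g2.
    m = np.array([[1, 0, 2, 3, 0, 1],
                  [4, 5, 0, 0, 1, 1]])
    print(combine_usa(m, "U+S+A")["X"].tolist())  # [[3, 4], [5, 6]]
```

Keeping the layer definitions in a table, rather than hard-coding each combination, makes adding another combination a one-line change.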
diff --git a/bin/pyroe b/bin/pyroe
index d845d3d..db86bd4 100755
--- a/bin/pyroe
+++ b/bin/pyroe
@@ -1,13 +1,20 @@
 #!/usr/bin/env python
 import logging
-
 from pyroe import make_splici_txome, make_spliceu_txome
+from pyroe import id_to_name
+
+if make_spliceu_txome is None or make_splici_txome is None or id_to_name is None:
+    raise ImportError("To run the pyroe CLI, please install pyranges, biopython and bedtools.")
 from pyroe import fetch_processed_quant
 from pyroe import convert
-from pyroe import id_to_name
 from pyroe import output_formats
 
 
+# because of pyranges, we need to ignore FutureWarnings
+import warnings
+warnings.simplefilter(action='ignore', category=FutureWarning)
+
+
 if __name__ == "__main__":
     import argparse
     import sys
diff --git a/docs/source/building_splici_index.rst b/docs/source/building_splici_index.rst
index ee7d090..e3a698c 100644
--- a/docs/source/building_splici_index.rst
+++ b/docs/source/building_splici_index.rst
@@ -1,11 +1,11 @@
 #################################################################################
-Preparing an expanded transcriptome reference for quantification with alevin-fry
+(Deprecated since v0.10.0) Preparing an expanded transcriptome reference for quantification with alevin-fry
 #################################################################################
 
 The USA mode in alevin-fry requires an expanded index reference, in which sequences represent spliced and unspliced transcripts. Pyroe provides CLI programs and python functions to build the pre-defined expanded references, the spliced + intronic (*splici*) reference, which includes the spliced transcripts plus the (merged and collapsed) intronic sequences of each gene and the spliced + unspliced (*spliceu*) reference, which consists of the spliced transcripts plus the unspliced transcript (genes' entire genomic interval) of each gene. The ``make_splici_txome()`` and ``make_spliceu_txome()`` python functions are designed to make the *splici* and *spliceu* reference by taking a genome FASTA file and a gene annotation GTF file as the input. Furthermore, the
 
 Preparing a *spliced+intronic* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The *splici* index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences of each gene.
 
@@ -29,8 +29,8 @@ The `pyroe make-spliced+intronic` program writes three files to your specified o
 * A three-column transcript-name-to-gene-name file that stores the name of each reference sequence in the splici index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts.
 * A two-column TSV file that maps gene ids (used as the keys in eventual alevin-fry output) to gene names. This can later be used with the ``pyroe convert`` command line program to convert gene ids to gene names in the count matrix.
 
-Full usage
-^^^^^^^^^^
+**Full usage**
+
 
 .. code::
 
@@ -120,7 +120,7 @@ The ``pyroe make-spliced+intronic`` command line program calls the ``make_splici
 Nothing will be returned. The splici reference files will be written to disk.
 
 Preparing a *spliced+unspliced* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Recently, `He et al., 2023 `_ introduced the spliced + unspliced (*spliceu*) index in alevin-fry. This requires the spliced + unspliced transcriptome reference, where the unspliced transcripts of each gene represent the entire genomic interval of that gene. Details about the *spliceu* can be found in `the preprint `_. To make the spliceu reference using pyroe, one can call the ``make_spliceu_txome()`` python function or ``pyroe make-spliced+unspliced`` or its alias ``pyroe make-spliceu`` from the command line. The following example shows the shell command of building a spliceu reference from a given reference set in the directory ``spliceu_txome``.
 
@@ -132,8 +132,8 @@ Recently, `He et al., 2023 `_ command of ``pyroe`` allows you to specify an id to name mapping so that the converted output matrix will be labeled with gene names rather than identifiers. However, you must provide it with a 2-column tab-separated file mapping IDs to names. This command can help you with that task.
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8c03e7b..3db0761 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -4,18 +4,24 @@ Welcome to the documentation for pyroe
 What is pyroe?
 ===================
 
-The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers `_, instead of pyroe, became as the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` quantification results into Python as an `anndata `_ object, so as to be compatible with `scanpy `_. If you have trouble installing `pyroe`, you can also define the ``load_fry`` function in your own Python script, the definition of ``load_fry`` can be found at here: `load_fry `_.
+The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`.
+The main purpose of `pyroe` is to provide the function `load_fry`, which loads `alevin-fry` and `simpleaf` quantification results into Python as an `anndata `_ object for downstream analysis with `scanpy `_. Moreover, `pyroe` also provides functions to fetch pre-computed quantification results from the `quantaf `_ database.
+
+In previous versions (before v0.10.0), pyroe also provided functions to construct augmented transcriptome references. Since `simpleaf` version 0.14.0, `roers `_, instead of pyroe, has been the default augmented reference constructor for `alevin-fry` and `simpleaf`. If you would like to use the deprecated functions to construct augmented references, please install an older version of pyroe. Note that older versions of pyroe are only compatible with pandas versions below 2.0.0, so we suggest installing them in an isolated conda environment to avoid affecting other packages on your system.
+
+**Note:** although pyroe is available on bioconda and is easy to install, if you encounter any problems during installation, you can define the `load_fry` function locally in your Python script by copying its definition from `here `_. The only dependency of `load_fry` is `scanpy `_.
+
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
 
    installing
-   building_splici_index
    processing_fry_quants
+   converting_quants
    fetching_processed_quants
+   building_splici_index
    geneid_to_name
-   converting_quants
    LICENSE.rst
 
 Indices and tables
diff --git a/setup.cfg b/setup.cfg
index 859a875..6898d64 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -16,20 +16,14 @@ classifiers =
 packages = find:
 package_dir =
     = src
-scripts =
-    bin/pyroe
+# scripts =
+# bin/pyroe
 python_requires = >=3.7
 include_package_data = True
 install_requires =
-    pandas >= 1.3.0, < 2.2.0
-    pyranges == 0.0.129
-    biopython >= 1.77
     packaging >= 21.0
     scanpy >= 1.8.2
 
-# [options.extras_require]
-# scanpy =
-# scanpy >= 1.8.2
 
 [options.packages.find]
 where = src
diff --git a/src/pyroe/__init__.py b/src/pyroe/__init__.py
index 69ba268..ce93022 100644
--- a/src/pyroe/__init__.py
+++ b/src/pyroe/__init__.py
@@ -1,12 +1,19 @@
 __version__ = "0.10.0"
 
 from pyroe.load_fry import load_fry
-from pyroe.make_txome import make_splici_txome, make_spliceu_txome
 from pyroe.fetch_processed_quant import fetch_processed_quant
 from pyroe.load_processed_quant import load_processed_quant
 from pyroe.ProcessedQuant import ProcessedQuant
 from pyroe.convert import convert
-from pyroe.id_to_name import id_to_name
 from pyroe.pyroe_utils import output_formats
+
+# try:
+# from pyroe.make_txome import make_splici_txome, make_spliceu_txome
+# from pyroe.id_to_name import id_to_name
+# except ImportError:
+# make_splici_txome = None
+# make_spliceu_txome = None
+# id_to_name = None
+
 
 # flake8: noqa
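In this diff, the `try`/`except ImportError` block in `src/pyroe/__init__.py` is commented out, while `bin/pyroe` checks whether `make_splici_txome`, `make_spliceu_txome` and `id_to_name` are `None`. With the block commented out, `from pyroe import make_splici_txome` itself raises `ImportError`, so the `is None` check only becomes reachable once such a fallback is active. The sketch below shows the optional-dependency pattern that the commented-out block describes, using only the standard library. The helpers `optional_import` and `require_optional` and the module name `hypothetical_refbuilder` are hypothetical illustrations, not pyroe code.

```python
import importlib
from typing import Any, Callable, Optional


def optional_import(module: str, attr: str) -> Optional[Any]:
    """Return module.attr, or None if the module cannot be imported.

    ImportError (including ModuleNotFoundError) covers both a missing module
    and a missing third-party package that the module itself imports.
    """
    try:
        return getattr(importlib.import_module(module), attr)
    except ImportError:
        return None


def require_optional(*funcs: Optional[Callable]) -> None:
    """Fail early with an actionable message if any optional function is missing."""
    if any(f is None for f in funcs):
        raise ImportError(
            "To run the pyroe CLI, please install pyranges, biopython and bedtools."
        )


# Hypothetical module name standing in for a module whose dependencies are absent.
make_reference = optional_import("hypothetical_refbuilder", "make_reference")
# A standard-library function resolves normally.
dumps = optional_import("json", "dumps")

if __name__ == "__main__":
    print(make_reference is None)  # True unless such a module happens to be installed
    try:
        require_optional(make_reference, dumps)
    except ImportError as err:
        print(f"ImportError: {err}")
```

Binding the missing names to `None` keeps `import pyroe` working for users who only need `load_fry`, and defers the error to the point where the reference-building functions are actually requested.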