Merge pull request #44 from COMBINE-lab/optional_pyranges

Optional pyranges
COMBINE-lab · Dec 31, 2024 · c8c94ba
2 parents ad07985 + 3cdefdf

Showing 7 changed files with 48 additions and 26 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,14 @@
 # pyroe
 
-## About `pyroe`
+The main purpose of `pyroe` is to provide a Python interface for loading the quantification results of single-cell sequencing data generated by [`alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) and [`simpleaf`](https://github.com/COMBINE-lab/simpleaf).
+- The major function of pyroe is the [`load_fry`](https://pyroe.readthedocs.io/en/latest/processing_fry_quants.html#load-fry-full-usage) function, which loads the quantification results into an [`anndata`](https://anndata.readthedocs.io/en/latest/) object for downstream analysis with [`scanpy`](https://scanpy.readthedocs.io/en/stable/). It provides many options for constructing the final `anndata` object by combining the count matrices for the different splicing statuses in different ways.
+- Moreover, `pyroe` provides the interface for the [`quantaf`](https://combine-lab.github.io/quantaf/) project, which is a database containing the quantification results of many publicly available datasets.
 
+
+### Background
 [`Alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) is a fast, accurate, and memory frugal quantification tool for preprocessing single-cell RNA-sequencing data. Detailed information can be found in the alevin-fry [pre-print](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2), and [paper](https://www.nature.com/articles/s41592-022-01408-3).
 
-The `pyroe` package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`.  The documentation for `pyroe` has its own dedicated website.  Please visit the [ReadTheDocs pyroe website here](https://pyroe.readthedocs.io).
+[`simpleaf`](https://github.com/COMBINE-lab/simpleaf) provides a simple and easy-to-use interface for running `alevin-fry`, and also more advanced features such as designing and executing custom workflows for single-cell data analysis. ([Paper](https://doi.org/10.1093/bioinformatics/btad614) and [Documentation](https://simpleaf.readthedocs.io/en/latest/))
+
+## Major Updates
+Since pyroe v0.10.0, the functionality for creating augmented transcriptome references and generating the gene ID to gene name file has been moved to the [`roers`](https://github.com/COMBINE-lab/roers) package, which is automatically installed together with [`simpleaf`](https://github.com/COMBINE-lab/simpleaf). For all our users, we recommend using the simplified command line interface provided in [`simpleaf`](https://simpleaf.readthedocs.io/en/latest/) to process your single-cell sequencing data. The [`simpleaf index`](https://simpleaf.readthedocs.io/en/latest/index-command.html) command will automatically generate the augmented transcriptome reference (including the gene ID to gene name file) and index the reference for you.
diff --git a/bin/pyroe b/bin/pyroe
@@ -1,13 +1,20 @@
 #!/usr/bin/env python
 
 import logging
-
 from pyroe import make_splici_txome, make_spliceu_txome
+from pyroe import id_to_name
+
+if make_spliceu_txome is None or make_splici_txome is None or id_to_name is None:
+    raise ImportError("To run the pyroe CLI, please install pyranges, biopython, and bedtools.")
 from pyroe import fetch_processed_quant
 from pyroe import convert
-from pyroe import id_to_name
 from pyroe import output_formats
 
+# because of pyranges, we need to ignore FutureWarnings
+import warnings
+warnings.simplefilter(action='ignore', category=FutureWarning)
+
+
 if __name__ == "__main__":
     import argparse
     import sys

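Review note: the module-level `warnings.simplefilter` call added above silences every `FutureWarning` for the whole process, not only those raised by pyranges. A narrower alternative is sketched below using only the standard library. The names `noisy_call` and `call_quietly` are hypothetical stand-ins for illustration and are not part of pyroe.

```python
import warnings


def noisy_call():
    """Hypothetical stand-in for a library call that emits a FutureWarning."""
    warnings.warn("this behavior will change", FutureWarning)
    return 42


def call_quietly(func, *args, **kwargs):
    """Run func with FutureWarnings suppressed only for the duration of the call."""
    with warnings.catch_warnings():
        # The filter is restored automatically when the with-block exits.
        warnings.simplefilter(action="ignore", category=FutureWarning)
        return func(*args, **kwargs)


result = call_quietly(noisy_call)
```

With this shape, warnings raised elsewhere in the program stay visible, at the cost of wrapping each call site that is known to be noisy.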
diff --git a/docs/source/building_splici_index.rst b/docs/source/building_splici_index.rst
@@ -1,11 +1,11 @@
 #################################################################################
-Preparing an expanded transcriptome reference for quantification with alevin-fry
+(Deprecated since v0.10.0) Preparing an expanded transcriptome reference for quantification with alevin-fry
 #################################################################################
 
 The USA mode in alevin-fry requires an expanded index reference, in which sequences represent spliced and unspliced transcripts. Pyroe provides CLI programs and python functions to build the pre-defined expanded references, the spliced + intronic (*splici*) reference, which includes the spliced transcripts plus the (merged and collapsed) intronic sequences of each gene and the spliced + unspliced (*spliceu*) reference, which consists of the spliced transcripts plus the unspliced transcript (genes' entire genomic interval) of each gene. The ``make_splici_txome()`` and ``make_spliceu_txome()`` python functions are designed to make the *splici* and *spliceu* reference by taking a genome FASTA file and a gene annotation GTF file as the input. Furthermore, the 
 
 Preparing a *spliced+intronic* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The *splici* index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences of each gene.
 
@@ -29,8 +29,8 @@ The `pyroe make-spliced+intronic` program writes three files to your specified o
 * A three-column transcript-name-to-gene-name file that stores the name of each reference sequence in the splici index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts.
 * A two-column TSV file that maps gene ids (used as the keys in eventual alevin-fry output) to gene names. This can later be used with the ``pyroe convert`` command line program to convert gene ids to gene names in the count matrix.
 
-Full usage
-^^^^^^^^^^
+**Full usage**
+
 
 .. code::
 
@@ -120,7 +120,7 @@ The ``pyroe make-spliced+intronic`` command line program calls the ``make_splici
   Nothing will be returned. The splici reference files will be written to disk.
 
 Preparing a *spliced+unspliced* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_ introduced the spliced + unspliced (*spliceu*) index in alevin-fry. This requires the spliced + unspliced transcriptome reference, where the unspliced transcripts of each gene represent the entire genomic interval of that gene. Details about the *spliceu* can be found in `the preprint <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_. To make the spliceu reference using pyroe, one can call the ``make_spliceu_txome()`` python function or ``pyroe make-spliced+unspliced`` or its alias ``pyroe make-spliceu`` from the command line. The following example shows the shell command of building a spliceu reference from a given reference set in the directory ``spliceu_txome``.
 
@@ -132,8 +132,8 @@ Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.5
   spliceu_txome \
   --filename-prefix spliceu
 
-Full usage
-^^^^^^^^^^
+**Full usage**
+
 
 .. code::
 
@@ -208,7 +208,8 @@ The ``pyroe make-spliced+unspliced`` command line program calls the ``make_splic
 
 
 Notes on the input gene annotation GTF files for building an expanded reference
-----------------------------------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Pyroe builds expanded transcriptome references, the spliced + intronic (*splici*) and the spliced + unspliced (*spliceu*) transcriptome reference, based on a genome build FASTA file and a gene annotation GTF file.
 
 The input GTF file will be processed before extracting unspliced sequences. If pyroe finds invalid records, a ``clean_gtf.gtf`` file will be generated in the specified output directory.  **Note** : The features extracted in the spliced + unspliced transcriptome will not necessarily be those present in the ``clean_gtf.gtf`` file — as this command will prefer the input in the user-provided file wherever possible. One can rerun pyroe using the ``clean_gtf.gtf`` file if needed. More specifically:

diff --git a/docs/source/geneid_to_name.rst b/docs/source/geneid_to_name.rst
@@ -1,4 +1,4 @@
-Generating a gene id to gene name mapping
+(Deprecated since v0.10.0) Generating a gene id to gene name mapping
 =========================================
 
 It is often useful to perform analyses with gene *names* rather than gene *identifiers*. The `convert <https://pyroe.readthedocs.io/en/latest/converting_quants.html>`_ command of ``pyroe`` allows you to specify an id to name mapping so that the converted output matrix will be labeled with gene names rather than identifiers.  However, you must provide it with a 2-column tab-separated file mapping IDs to names.  This command can help you with that task.

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -4,18 +4,24 @@ Welcome to the documentation for pyroe
 What is pyroe?
 ===================
 
-The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, became as the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object, so as to be compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_. If you have trouble installing `pyroe`, you can also define the ``load_fry`` function in your own Python script, the definition of ``load_fry`` can be found at here: `load_fry <https://github.com/COMBINE-lab/pyroe/blob/main/src/pyroe/load_fry.py>`_. 
+The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`.
+The main purpose of `pyroe` is to provide the function `load_fry`, which loads `alevin-fry` and `simpleaf` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object for downstream analysis with `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_. `pyroe` also provides functions to fetch pre-computed quantification results from the `quantaf <https://combine-lab.github.io/quantaf/>`_ database.
+
+In previous versions (before v0.10.0), pyroe also provided functions to construct the augmented transcriptome references. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, has been the default augmented reference constructor for `alevin-fry` and `simpleaf`. If you would like to use the deprecated functions to construct the augmented references, please install an older version of pyroe. Note that old versions of pyroe are only compatible with pandas versions below 2.0.0, so we suggest installing them in an isolated conda environment to avoid affecting the other packages on your system.
+
+**Note that** although pyroe is available on bioconda and can be easily installed, if you encounter any problem during installation, you can define the `load_fry` function locally in your Python script by copying the function definition from `here <https://github.com/COMBINE-lab/pyroe/blob/main/src/pyroe/load_fry.py>`_. The only dependency of `load_fry` is `scanpy <https://scanpy.readthedocs.io/en/stable/installation.html>`_.
+
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
 
    installing
-   building_splici_index
    processing_fry_quants
+   converting_quants
    fetching_processed_quants
+   building_splici_index
    geneid_to_name
-   converting_quants
    LICENSE.rst
 
 Indices and tables

diff --git a/setup.cfg b/setup.cfg
@@ -16,20 +16,14 @@ classifiers =
 packages = find:
 package_dir =
     = src
-scripts =
-    bin/pyroe
+# scripts =
+#     bin/pyroe
 python_requires = >=3.7
 include_package_data = True
 install_requires = 
-    pandas >= 1.3.0, < 2.2.0
-    pyranges == 0.0.129
-    biopython >= 1.77
     packaging >= 21.0
     scanpy >= 1.8.2
 
-# [options.extras_require]
-# scanpy = 
-#     scanpy >= 1.8.2
 
 [options.packages.find]
 where = src

diff --git a/src/pyroe/__init__.py b/src/pyroe/__init__.py
@@ -1,12 +1,19 @@
 __version__ = "0.10.0"
 
 from pyroe.load_fry import load_fry
-from pyroe.make_txome import make_splici_txome, make_spliceu_txome
 from pyroe.fetch_processed_quant import fetch_processed_quant
 from pyroe.load_processed_quant import load_processed_quant
 from pyroe.ProcessedQuant import ProcessedQuant
 from pyroe.convert import convert
-from pyroe.id_to_name import id_to_name
 from pyroe.pyroe_utils import output_formats
 
+
+# try:
+#     from pyroe.make_txome import make_splici_txome, make_spliceu_txome
+#     from pyroe.id_to_name import id_to_name
+# except ImportError:
+#     make_splici_txome = None
+#     make_spliceu_txome = None
+#     id_to_name = None
+
 # flake8: noqa
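
Review note: the commented-out block above outlines an optional-import pattern, where a name falls back to `None` when its dependency is missing and the caller checks for `None` before use. A self-contained sketch of that pattern follows. The helpers `optional_import` and `require` are hypothetical and not part of pyroe, and `not_a_real_module_xyz` is a placeholder for an absent dependency such as pyranges.

```python
import importlib


def optional_import(module_name, attr_names):
    """Return the requested attributes, or None for each one if the import fails."""
    try:
        module = importlib.import_module(module_name)
        return tuple(getattr(module, name) for name in attr_names)
    except (ImportError, AttributeError):
        return tuple(None for _ in attr_names)


def require(**named_objects):
    """Raise ImportError naming every optional object that resolved to None."""
    missing = sorted(name for name, obj in named_objects.items() if obj is None)
    if missing:
        raise ImportError("Missing optional features: " + ", ".join(missing))


# Present module: the attribute resolves normally.
(sqrt,) = optional_import("math", ["sqrt"])

# Absent module (placeholder name): the fallback is None instead of a crash.
(make_txome,) = optional_import("not_a_real_module_xyz", ["make_txome"])
```

A caller such as a CLI entry point would then run `require(make_txome=make_txome)` once at startup. As written in this diff, the `try` block in `__init__.py` is commented out, so the `None` check in `bin/pyroe` appears unreachable: the `from pyroe import make_splici_txome` line would likely fail first.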