Strain Prediction and Analysis using Representative SEquences (SPARSE)

SPARSE indexes >100,000 reference genomes from public databases into hierarchical clusters and uses them to predict the origins of metagenomic reads.

Installation

SPARSE runs on Unix and requires Python >= 2.7.

System modules (Ubuntu 16.04):

pip
gfortran
llvm
libncurses5-dev
cmake
xvfb-run (for malt, optional)

3rd-party software:

samtools (>=1.2)
mash (>=1.1.1)
bowtie2 (>=2.3.2)
malt (>=0.4.0) (optional)
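
Before installing, it can help to confirm that the required third-party tools are on the PATH and meet the minimum versions listed above. The following is a minimal sketch, not part of SPARSE: the helper names `missing_tools` and `version_at_least` are hypothetical, the executable names are assumed to match the package names, and `shutil.which` requires Python 3.3 or later.

```python
import shutil

# Minimum versions as listed above. The executable names are an assumption:
# they are taken to match the package names and may differ on your system.
REQUIRED = {"samtools": "1.2", "mash": "1.1.1", "bowtie2": "2.3.2"}


def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def version_at_least(found, minimum):
    """Compare dotted version strings numerically, e.g. '1.10' >= '1.2'."""
    a = [int(part) for part in found.split(".")]
    b = [int(part) for part in minimum.split(".")]
    # Pad the shorter version with zeros so that '1.2' equals '1.2.0'.
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print("not found on PATH:", name)
```

The version string of each tool still has to be read by hand (for example from the tool's own version output) and passed to `version_at_least`, since the output format differs between tools.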

See requirements.txt for Python module dependencies.

Installation via pip

pip install meta-sparse

Installation from source (Ubuntu)

sudo apt-get update
sudo apt-get install gfortran llvm libncurses5-dev cmake python-pip samtools bowtie2
git clone https://github.com/zheminzhou/SPARSE
cd SPARSE/EM && make
pip install -r requirements.txt

Updating SPARSE

To update SPARSE, change to the installation directory and pull the latest version:

cd SPARSE
git pull

Quick Start

See http://sparse.readthedocs.io/en/latest/ for full documentation.

Download reference database

We provide a pre-compiled database based on RefSeq (dated 14.10.2017) for download at http://enterobase.warwick.ac.uk/sparse/. Please download the complete folder refseq_20171014/ and do not change its internal folder structure. The database can be unpacked by running:

cd refseq_20171014 && sh untar.bash

This pre-compiled database contains four default mapping databases, which can be specified in the next step: representative, subpopulation, Virus, Eukaryota.
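
The mapping databases are passed as a single comma-separated value, and the names are mixed-case (`Virus`, `Eukaryota`), so a typo is easy to make. A minimal sketch of a check against the four default names, assuming the pre-compiled database is used; `map_db_argument` is a hypothetical helper, and a custom database would have its own names:

```python
# Names of the four mapping databases shipped with the pre-compiled database.
DEFAULT_MAP_DBS = ("representative", "subpopulation", "Virus", "Eukaryota")


def map_db_argument(names, known=DEFAULT_MAP_DBS):
    """Join names into a comma-separated string, rejecting unknown names."""
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError("unknown mapping database(s): " + ", ".join(unknown))
    return ",".join(names)
```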

To update the database or build a custom database, please refer to the full documentation.

Predict read origins

The following command maps and evaluates all reads in both FASTQ files against the specified mapping databases.

python SPARSE.py predict --dbname refseq_20171014 --mapDB representative,subpopulation,Virus,Eukaryota --r1 read1.fq.gz --r2 read2.fq.gz --workspace <workspace_name>

For single-end reads, only --r1 needs to be specified. All output files are stored in the respective workspace.
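
When running many samples, the predict call can be assembled programmatically. The sketch below only builds the argument list from the flags shown above and leaves out `--r2` for single-end reads; `predict_command` is a hypothetical helper, and the path to `SPARSE.py` is an assumption that depends on where SPARSE is installed.

```python
def predict_command(dbname, map_dbs, workspace, r1, r2=None, script="SPARSE.py"):
    """Build the argument list for 'SPARSE.py predict'.

    r2 is optional: for single-end reads only --r1 is passed.
    """
    cmd = [
        "python", script, "predict",
        "--dbname", dbname,
        "--mapDB", ",".join(map_dbs),
        "--r1", r1,
    ]
    if r2 is not None:
        cmd += ["--r2", r2]
    cmd += ["--workspace", workspace]
    return cmd


# Example usage (runs SPARSE, so it is left commented out here):
# import subprocess
# subprocess.check_call(predict_command(
#     "refseq_20171014", ["representative", "subpopulation"],
#     "sample1", "read1.fq.gz", "read2.fq.gz"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.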

Create a report

python SPARSE.py report <workspace_name>

The report will be stored in <workspace_name>/profile.txt

Extract reference specific reads

The following command extracts all reads specific to the provided reference ids, which can be found in the output of step 2.

python SPARSE.py extract --dbname refseq_20171014 --workspace <workspace_name> --ref_id <comma delimited indices>
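
The `--ref_id` value is a comma-delimited list, which is easy to build from a Python list of indices. A minimal sketch, using only the flags shown above; `extract_command` is a hypothetical helper and the indices in the usage comment are made up for illustration:

```python
def extract_command(dbname, workspace, ref_ids, script="SPARSE.py"):
    """Build the argument list for 'SPARSE.py extract'."""
    if not ref_ids:
        raise ValueError("at least one reference id is required")
    return [
        "python", script, "extract",
        "--dbname", dbname,
        "--workspace", workspace,
        # Indices may be ints or strings; they are joined with commas.
        "--ref_id", ",".join(str(ref_id) for ref_id in ref_ids),
    ]


# Example usage with illustrative indices (left commented out):
# import subprocess
# subprocess.check_call(extract_command("refseq_20171014", "sample1", [12, 345]))
```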

Citation

SPARSE has not been formally published yet. If you use SPARSE, please cite the preprint: https://www.biorxiv.org/content/early/2017/11/07/215707

Zhemin Zhou, Nina Luhmann, Nabil-Fareed Alikhan, Christopher Quince, Mark Achtman, 'Accurate Reconstruction of Microbial Strains Using Representative Reference Genomes' bioRxiv 215707; doi: https://doi.org/10.1101/215707
