althonos/pysylph

🕊️ Pysylph

PyO3 bindings and Python interface to sylph, an ultrafast method for containment ANI querying and taxonomic profiling.

🗺️ Overview

sylph[1] is a method developed by Jim Shaw and Yun William Yu for fast and robust ANI querying and profiling of metagenomic shotgun samples. It uses a statistical model based on Poisson coverage to compute coverage-adjusted ANI instead of naive ANI.

pysylph is a Python module, implemented using the PyO3 framework, that provides bindings to sylph. It directly links to the sylph code, which has the following advantages over CLI wrappers:

  • pre-built wheels: pysylph is distributed on PyPI and features pre-built wheels for common platforms, including x86-64 and Arm64.
  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pysylph as a dependency to your project, and stop worrying about the sylph binary being present on the end-user machine.
  • sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to pysylph without having to write them to a temporary file.

This library is still a work-in-progress, and in an experimental stage, with API breaks very likely between minor versions.

🔧 Installing

Pysylph can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 platforms, as well as the code required to compile from source with Rust and maturin:

$ pip install pysylph

🔖 Citation

Pysylph is scientific software, and builds on top of sylph. Please cite sylph if you are using it in an academic work, for instance as:

pysylph, a Python library binding to sylph (Shaw & Yu, 2024).

💡 Examples

🔨 Creating a database

A database is a collection of genomes sketched for fast querying.

Here is how to create a database in memory, using Biopython to load genomes:

import pathlib

import Bio.SeqIO
import pysylph

sketcher = pysylph.Sketcher()
sketches = []

# sketch every FASTA file of the current directory as one genome
for path in pathlib.Path(".").glob("*.fasta"):
    contigs = [ str(record.seq) for record in Bio.SeqIO.parse(path, "fasta") ]
    sketch = sketcher.sketch_genome(name=path.stem, contigs=contigs)
    sketches.append(sketch)

# collect all genome sketches into a single in-memory database
database = pysylph.Database(sketches)

Sketcher methods are re-entrant and can be used to sketch multiple genomes in parallel, for instance with a ThreadPool.
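As an illustration of the parallel sketching mentioned above, here is a minimal sketch built on the standard library's ThreadPool. The helpers read_fasta, sketch_file and sketch_parallel are hypothetical names introduced for this example and are not part of pysylph. The only pysylph call assumed is sketch_genome(name=..., contigs=...), as used in the previous example; the small FASTA reader stands in for Biopython to keep the block self-contained.

```python
import pathlib
from multiprocessing.pool import ThreadPool


def read_fasta(path):
    """Return the sequences of a FASTA file as a list of strings (stdlib only)."""
    contigs, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # a new header closes the previous record, if any
                if chunks:
                    contigs.append("".join(chunks))
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        contigs.append("".join(chunks))
    return contigs


def sketch_file(sketcher, path):
    """Sketch one FASTA file as one genome, named after the file stem."""
    path = pathlib.Path(path)
    return sketcher.sketch_genome(name=path.stem, contigs=read_fasta(path))


def sketch_parallel(sketcher, paths, threads=4):
    """Sketch several FASTA files concurrently, sharing a single sketcher."""
    with ThreadPool(threads) as pool:
        # pool.map preserves the order of `paths` in the returned list
        return pool.map(lambda path: sketch_file(sketcher, path), paths)
```

With pysylph installed, the list returned by sketch_parallel(pysylph.Sketcher(), sorted(pathlib.Path(".").glob("*.fasta"))) could then be passed to pysylph.Database as in the sequential example.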

📝 Saving a database

The database can be saved to the binary format used by the sylph binary as well:

database.dump("genomes.syldb")

🗒️ Loading a database

A database previously created with sylph can be loaded transparently in pysylph:

database = pysylph.Database.load("genomes.syldb")
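
Saving and loading can be combined to cache a database on disk. The following is only a sketch: load_or_build is a hypothetical helper, not part of pysylph, and it takes the loading and building steps as callables so that it assumes nothing beyond the dump and load calls shown above.

```python
import os


def load_or_build(path, load, build):
    """Load a database from `path`, or build it and save it there if missing.

    `load` is called with the path (e.g. pysylph.Database.load) and `build`
    is called without arguments; it must return an object with a `dump` method.
    """
    if os.path.exists(path):
        return load(path)
    database = build()
    # save the freshly built database so the next call can skip sketching
    database.dump(path)
    return database
```

For instance, load_or_build("genomes.syldb", pysylph.Database.load, lambda: pysylph.Database(sketches)) would only use the sketches when genomes.syldb does not exist yet.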

📊 Sketching a query

Samples must also be sketched before they can be used to query a database. Here is how to sketch a sample made of single-end reads stored in FASTQ format:

reads = [str(record.seq) for record in Bio.SeqIO.parse("sample.fastq", "fastq")]
sample = sketcher.sketch_single(name="sample", reads=reads)
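
Since the sketcher only needs the read sequences as Python strings, Biopython is optional. Below is a minimal sketch that reads a FASTQ file, gzip-compressed or not, with the standard library only. The helpers read_fastq_sequences and sketch_fastq are hypothetical names for this example, and the reader assumes the common layout of exactly four lines per record.

```python
import gzip


def read_fastq_sequences(path):
    """Return the read sequences of a FASTQ file, opening .gz files with gzip."""
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = []
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            # assumption: 4-line records, so the sequence is every second line of four
            if index % 4 == 1:
                reads.append(line.strip())
    return reads


def sketch_fastq(sketcher, path, name):
    """Sketch a single-end sample from a FASTQ file, using the call shown above."""
    return sketcher.sketch_single(name=name, reads=read_fastq_sequences(path))
```

Files with sequences wrapped over several lines would need a real parser such as Bio.SeqIO.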

🔬 Querying a database

Once a sample has been sketched, it can be used to query a database for ANI containment or taxonomic profiling:

profiler = pysylph.Profiler()
results = profiler.query(sample, database)   # ANI containment
results = profiler.profile(sample, database) # taxonomic profiling

Profiler methods are re-entrant and can be used to query a database with multiple samples in parallel, for instance with a ThreadPool.
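
The parallel querying described above could look like the following sketch. profile_samples is a hypothetical helper, not part of pysylph; it only assumes the profile(sample, database) call from the previous example and shares one profiler and one database across all threads.

```python
from multiprocessing.pool import ThreadPool


def profile_samples(profiler, samples, database, threads=4):
    """Profile several sketched samples against one database concurrently."""
    with ThreadPool(threads) as pool:
        # one task per sample; results come back in the same order as `samples`
        return pool.map(lambda sample: profiler.profile(sample, database), samples)
```

Replacing profiler.profile with profiler.query in the lambda would run ANI containment queries instead of taxonomic profiling.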

🔎 See Also

Computing ANI for closed genomes? You may also be interested in pyskani, a Python package for computing ANI binding to skani, which was developed by the same authors.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License. It contains some code included verbatim from the sylph source code, which was written by Jim Shaw and is distributed under the terms of the MIT License as well. Source distributions of pysylph vendor additional sources under their own terms using the cargo vendor command.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original sylph authors. It was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.

📚 References
