Protocols for building, analyzing, and using the trees and other resources in this project.
Genome retrieval: Download all bacterial and archaeal genomes available from NCBI GenBank and RefSeq, using RepoPhlAn.
Genome sampling: Select n genomes from a genome pool such that they maximize included biodiversity as measured by the k-mer signatures of genomes.
Marker identification: Identify and extract amino acid sequences of 400 global marker genes from genomes, using PhyloPhlAn.
Tree building: Build phylogenetic trees of genes and species using various approaches.
Tree manipulation: Manipulate phylogenetic trees using the Python scripts developed by our team.
Taxon subsampling: Select n taxa from a larger phylogenetic tree such that the selection maximizes representation of deep-branching, large clades.
Taxonomy curation: Evaluate, modify and extend existing taxonomic assignments based on a phylogenetic tree.
Tree comparison: Compare the phylogenetic relationships and distances indicated by individual species trees.
Tree comparison by depth: Compare the topologies of two trees with consideration of phylogenetic depth.
Major clade dimension: Calculate and compare the dimensions of major clades (e.g., Archaea vs. Bacteria), including distances between crown groups and distances between leaves.
Shared clades: Collapse two very large trees to a shared set of large clades to enable back-to-back comparison via tanglegram.
Gene tree discordance: Analyze evolutionary discrepancy reflected by individual gene trees.
Saturation test: Analyze potential amino acid substitution saturation and how it impacts estimated phylogenetic distances.
GTDB translation: Process GTDB taxonomy and trees to enable cross-translation with our work.
Tree rendering: Collapse tree at given rank(s) and generate files ready for iTOL and FigTree rendering.
Genome database: Build a reference genome database with phylogeny-curated taxonomy to improve an existing metagenomic sequence classification workflow.
Community ecology: Convert WGS sequence alignments into a "gOTU table" and perform microbial community ecology analyses with the reference phylogeny.
Tree profiling: Modify an existing metagenomic profiling workflow to allow sequences to be directly assigned to tips and internal nodes of the reference phylogeny.
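
To make the idea behind the genome sampling protocol more concrete, here is a minimal Python sketch of one possible way to pick n genomes that spread out as much as possible in k-mer space. It is an illustration under stated assumptions, not the procedure used in this project: the function names (kmer_profile, profile_distance, greedy_maxmin_sample), the Manhattan distance between k-mer frequency profiles, the greedy farthest-point heuristic, and the choice of the alphabetically first genome as seed are all hypothetical choices made for the example.

```python
def kmer_profile(seq, k=3):
    """Return relative frequencies of overlapping k-mers in a sequence."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def profile_distance(p, q):
    """Manhattan distance between two k-mer frequency profiles."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


def greedy_maxmin_sample(genomes, n, k=3):
    """Select n genome IDs by greedy farthest-point (max-min) sampling.

    genomes: dict mapping genome ID -> sequence string (hypothetical input).
    Assumption: the alphabetically first ID is used as the seed.
    """
    ids = sorted(genomes)
    if n <= 0:
        return []
    if n >= len(ids):
        return ids
    profiles = {g: kmer_profile(genomes[g], k) for g in ids}

    selected = [ids[0]]
    # For each unselected genome, track its distance to the nearest selected one.
    min_dist = {
        g: profile_distance(profiles[g], profiles[ids[0]]) for g in ids[1:]
    }
    while len(selected) < n:
        # Pick the genome that is farthest from everything selected so far.
        best = max(min_dist, key=min_dist.get)
        selected.append(best)
        del min_dist[best]
        for g in min_dist:
            d = profile_distance(profiles[g], profiles[best])
            if d < min_dist[g]:
                min_dist[g] = d
    return selected


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    pool = {
        "a": "AAAAAAAA",
        "b": "AAAAAAAT",
        "c": "CCCCCCCC",
        "d": "GGGGCCCC",
    }
    print(greedy_maxmin_sample(pool, n=3, k=2))
```

In this toy pool, genome "b" is nearly identical to the seed "a", so a max-min selection of three genomes leaves it out. A greedy heuristic like this does not guarantee a globally optimal subset; it is shown only because it is short and easy to follow.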