readcomb is a collection of command line and Python tools for fast detection
of recombination events in pooled high-throughput sequencing data. readcomb
searches for changes in parental haplotype phase across individual reads and classifies
recombination events based on various properties of the observed recombinant haplotypes.
readcomb was designed for use with the model alga Chlamydomonas reinhardtii and
currently supports only haploids. Although the method for detecting gene
conversion is tailored to C. reinhardtii, everything else in readcomb
generalizes to the detection of recombination events in any haploid species.
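To make the idea of a phase change concrete, here is a minimal toy sketch in plain Python. It does not use readcomb's API and is not its implementation: the function name `find_phase_changes` and the input layout (a list of `(position, parent)` tuples, one per variant site covered by a read) are assumptions made purely for illustration. The strain names are borrowed from the example output later in this document; the positions are made up.

```python
def find_phase_changes(calls):
    """Return one tuple per adjacent pair of variant calls whose parent differs.

    `calls` is a hypothetical list of (position, parent) tuples, sorted by
    position. Each returned tuple is
    (left_position, right_position, left_parent, right_parent).
    """
    changes = []
    # Walk over neighbouring variant sites and compare their parental origin.
    for (pos_a, parent_a), (pos_b, parent_b) in zip(calls, calls[1:]):
        if parent_a != parent_b:
            changes.append((pos_a, pos_b, parent_a, parent_b))
    return changes


# Toy data (made-up positions): the read starts on one parental haplotype,
# switches to the other, then switches back.
toy_calls = [
    (100, "CC2936"),
    (150, "CC2936"),
    (210, "CC2935"),
    (260, "CC2935"),
    (320, "CC2936"),
]
phase_changes = find_phase_changes(toy_calls)
for left, right, before, after in phase_changes:
    print(f"phase change between {left} and {right}: {before} -> {after}")
```

A read with no phase changes yields an empty list, so a check of this kind can be used to discard reads that carry no evidence of recombination.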
pip install readcomb
- cyvcf2 - Fast retrieval and filtering of VCF files and VCF objects written in C
- pysam - Interface for reading SAM and BAM files; provides SAM and BAM objects
- pandas - Support for data tables
- tqdm - Provides updating progress bars for command line programs
- samtools - Used for preprocessing of VCF and BAM files
Command line preprocessing script for BAM files. bamprep will prepare an
index file, filter out unusable reads, and output a BAM sorted by read name.
readcomb requires BAMs sorted by read name for fast parsing and filtering.
readcomb-bamprep --bam [bam_filepath] --out [outdir]
Optional parameters:
--samtools - Path to samtools binary
--threads [int] - Number of threads samtools should use (default 1)
--index_csi - Create CSI index instead of BAI
--no_progress - Disable index creation - this will speed up bamprep but will mean no progress bars when filtering
Command line preprocessing script for VCF files
readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]
Optional arguments:
--snps_only - Keep only SNPs
--indels_only - Keep only indels
--no_hets - Remove heterozygote calls
--min_GQ [int] - Minimum genotype quality at both sites (default 30)
Command line multiprocessing script for identifying BAM sequences with phase changes
readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]
Optional arguments:
-p, --processes [processes], Number of processes available for filter (default 4)
-m, --mode [phase_change|no_match], Filtering mode (default phase_change)
-l, --log [log_filepath], Filename for log metric output
-o, --out [output_filepath], File to write filtered output to (default recomb_diagnosis)
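The three command line tools can also be driven from a Python script. The sketch below only assembles argument lists from the flags documented above; the helper names (`bamprep_cmd`, `vcfprep_cmd`, `filter_cmd`) and all file paths are hypothetical placeholders, and the name of the BAM that bamprep writes into its output directory is not assumed here, so it has to be supplied by hand.

```python
import shlex


def bamprep_cmd(bam, outdir, threads=1, index_csi=False):
    """Build the argument list for readcomb-bamprep (hypothetical helper)."""
    cmd = ["readcomb-bamprep", "--bam", bam, "--out", outdir,
           "--threads", str(threads)]
    if index_csi:
        cmd.append("--index_csi")
    return cmd


def vcfprep_cmd(vcf, out, snps_only=False, no_hets=False, min_gq=30):
    """Build the argument list for readcomb-vcfprep (hypothetical helper)."""
    cmd = ["readcomb-vcfprep", "--vcf", vcf, "--out", out,
           "--min_GQ", str(min_gq)]
    if snps_only:
        cmd.append("--snps_only")
    if no_hets:
        cmd.append("--no_hets")
    return cmd


def filter_cmd(bam, vcf, out, processes=4, mode="phase_change", log=None):
    """Build the argument list for readcomb-filter (hypothetical helper)."""
    if mode not in ("phase_change", "no_match"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["readcomb-filter", "--bam", bam, "--vcf", vcf,
           "--processes", str(processes), "--mode", mode, "--out", out]
    if log is not None:
        cmd += ["--log", log]
    return cmd


# Placeholder paths, chosen only for illustration.
commands = [
    bamprep_cmd("reads.bam", "prepped", threads=4),
    vcfprep_cmd("parents.vcf.gz", "parents.prepped.vcf.gz", snps_only=True),
    filter_cmd("prepped_reads.bam", "parents.prepped.vcf.gz", "filtered_output"),
]
for cmd in commands:
    # Print the shell-quoted command; to execute it instead, pass `cmd`
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building each command as a list rather than a single string avoids shell quoting problems when paths contain spaces.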
Python module for detailed classification of sequences containing phase changes
>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF
>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath) # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath) # cyvcf2 file object
>>> print(pairs[0])
Record name: chromosome_1-199370
Read1: chromosome_1:499417-499667
Read2: chromosome_1:499766-500016
VCF: data/example_variants.vcf.gz
>>> pairs[0].classify(cyvcf_object) # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370
Read1: chromosome_1:499417-499667
Read2: chromosome_1:499766-500016
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]]
Call: gene_conversion
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]]
Call Masked: gene_conversion
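The `Condensed` line in the output above can be read as a list of haplotype segments. The sketch below works on a hand-copied version of that list and assumes each entry is `[haplotype, start, end]`; it does not call readcomb, does not reproduce its classification logic, and the helper name `summarize_segments` is hypothetical.

```python
# Copied by hand from the "Condensed" line of the example output above.
condensed = [
    ["CC2936", 499417, 499626],
    ["CC2935", 499626, 499736],
    ["CC2936", 499736, 500016],
]


def summarize_segments(segments):
    """Return (number of phase changes, [(haplotype, length), ...]).

    Assumes each segment is [haplotype, start, end] with end > start.
    """
    lengths = [(hap, end - start) for hap, start, end in segments]
    # A phase change is counted wherever neighbouring segments differ in haplotype.
    n_changes = sum(
        1 for left, right in zip(segments, segments[1:]) if left[0] != right[0]
    )
    return n_changes, lengths


n_changes, lengths = summarize_segments(condensed)
print(n_changes)  # 2
print(lengths)    # [('CC2936', 209), ('CC2935', 110), ('CC2936', 280)]
```

In this example a 110 bp CC2935 segment is flanked on both sides by CC2936 segments, giving two phase changes within a single read pair.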
GNU General Public License v3 (GPLv3+)
Currently in alpha