Fast detection of recombinant sequences from BAM files

ness-lab/readcomb

readcomb - fast detection of recombinant reads in BAMs

readcomb is a collection of command line and Python tools for fast detection of recombination events in pooled high-throughput sequencing data. readcomb searches for changes in parental haplotype phase across individual reads and classifies recombination events based on various properties of the observed recombinant haplotypes.
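As a rough illustration of what "a change in parental haplotype phase" means, the sketch below walks along the variants covered by one read pair and reports where the parental assignment switches. It is not readcomb's implementation: the function name `phase_changes`, the `(position, parent)` tuples, and the labels `parent_1`/`parent_2` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Call = Tuple[int, Optional[str]]  # (variant position, matching parent or None)


def phase_changes(calls: List[Call]) -> List[Tuple[int, int]]:
    """Return (left_pos, right_pos) for each adjacent pair of informative
    variants whose parental assignment differs."""
    # Variants matching neither parent carry no phase information here.
    informative = [(pos, parent) for pos, parent in calls if parent is not None]
    changes = []
    for (pos_a, par_a), (pos_b, par_b) in zip(informative, informative[1:]):
        if par_a != par_b:
            changes.append((pos_a, pos_b))
    return changes


# Hypothetical calls along one read pair.
calls = [
    (100, "parent_1"),
    (150, "parent_1"),
    (210, None),        # uninformative site, skipped
    (260, "parent_2"),
    (300, "parent_2"),
]
print(phase_changes(calls))  # [(150, 260)]
```

A read pair with no switches yields an empty list; one switch suggests a crossover-like event, and two nearby switches resemble the gene conversion pattern shown in the classification example.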

readcomb was designed for use with the model alga Chlamydomonas reinhardtii and currently supports only haploids. The gene conversion detection is tailored to C. reinhardtii, but everything else in readcomb generalizes to the detection of recombination events in any haploid species.

Installation

pip install readcomb

Dependencies

  • cyvcf2 - Fast retrieval and filtering of VCF files and VCF objects written in C
  • pysam - Interface for SAM and BAM files; provides SAM and BAM objects
  • pandas - Support for data tables
  • tqdm - Provides updating progress bars for command line programs
  • samtools - Used for preprocessing of VCF and BAM files

Usage:

bamprep

Command line preprocessing script for BAM files. bamprep will prepare an index file, filter out unusable reads, and output a BAM sorted by read name. readcomb requires BAMs sorted by read name for fast parsing and filtering.

readcomb-bamprep --bam [bam_filepath] --out [outdir]

Optional parameters:

  • --samtools - Path to samtools binary
  • --threads [int] - Number of threads samtools should use (default 1)
  • --index_csi - Create CSI index instead of BAI
  • --no_progress - Skip index creation. This speeds up bamprep, but no progress bars will be shown when filtering
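To show why sorting by read name helps, here is a minimal sketch of consuming mates in a single pass: in a name-sorted stream both reads of a pair sit next to each other, so they can be grouped without holding the whole file in memory. The records are plain hypothetical tuples rather than pysam objects, and `iter_pairs` is an illustrative name, not part of readcomb.

```python
from itertools import groupby
from operator import itemgetter


def iter_pairs(records):
    """Yield (name, mate1, mate2) from records already sorted by read name.

    Reads without exactly one partner are skipped in this sketch.
    """
    for name, group in groupby(records, key=itemgetter(0)):
        mates = list(group)
        if len(mates) == 2:
            yield name, mates[0], mates[1]


# Hypothetical (read_name, chromosome, start) records in name-sorted order.
records = [
    ("read_a", "chromosome_1", 100),
    ("read_a", "chromosome_1", 350),
    ("read_b", "chromosome_1", 900),   # mate missing, skipped
    ("read_c", "chromosome_2", 40),
    ("read_c", "chromosome_2", 310),
]

for name, mate1, mate2 in iter_pairs(records):
    print(name, mate1[2], mate2[2])
```

With a coordinate-sorted BAM the two mates can be far apart in the file, so pairing them would require buffering or random access instead.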

vcfprep

Command line preprocessing script for VCF files

readcomb-vcfprep --vcf [vcf_filepath] --out [output_filepath]

Optional arguments:

  • --snps_only - Keep only SNPs
  • --indels_only - Keep only indels
  • --no_hets - Remove heterozygote calls
  • --min_GQ [int] - Minimum genotype quality at both sites (default 30)
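The options above can be read as a set of per-variant tests. The sketch below expresses that reading on a small hypothetical `Variant` record; it does not use cyvcf2 and is not vcfprep's code. In particular, it assumes that the genotype quality threshold applies to each parental sample's call, and that a SNP is a variant whose reference and alternate alleles are both one base long.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Variant:
    ref: str
    alt: str
    genotypes: Tuple[Tuple[int, int], ...]  # allele indices, one tuple per sample
    gqs: Tuple[int, ...]                    # genotype quality, one per sample


def keep_variant(v, snps_only=False, indels_only=False, no_hets=False, min_gq=30):
    """Return True if the variant passes every requested filter."""
    is_snp = len(v.ref) == 1 and len(v.alt) == 1
    is_indel = len(v.ref) != len(v.alt)
    if snps_only and not is_snp:
        return False
    if indels_only and not is_indel:
        return False
    # A heterozygous call has two different allele indices.
    if no_hets and any(len(set(gt)) > 1 for gt in v.genotypes):
        return False
    # Assumption: every sample must meet the quality threshold.
    return all(gq >= min_gq for gq in v.gqs)


snp = Variant("A", "G", ((0, 0), (1, 1)), (40, 35))
het = Variant("A", "G", ((0, 1), (1, 1)), (40, 35))
low = Variant("A", "G", ((0, 0), (1, 1)), (40, 20))

print(keep_variant(snp))                # True
print(keep_variant(het, no_hets=True))  # False
print(keep_variant(low))                # False
```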

filter

Command line multiprocessing script for identification of BAM sequences with phase changes

readcomb-filter --bam [bam_filepath] --vcf [vcf_filepath]

Optional arguments:

  • -p, --processes [processes] - Number of processes available for filter (default 4)
  • -m, --mode [phase_change|no_match] - Filtering mode (default phase_change)
  • -l, --log [log_filepath] - Filename for log metric output
  • -o, --out [output_filepath] - File to write filtered output to (default recomb_diagnosis)
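When readcomb-filter is run from a pipeline script, the arguments listed above can be assembled programmatically. The helper below is a sketch only: `build_filter_command` and the file names are hypothetical, and it uses just the flags documented in this section.

```python
def build_filter_command(bam, vcf, processes=4, mode="phase_change",
                         log=None, out=None):
    """Build an argument list for readcomb-filter from the documented flags."""
    if mode not in ("phase_change", "no_match"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["readcomb-filter", "--bam", bam, "--vcf", vcf,
           "--processes", str(processes), "--mode", mode]
    if log is not None:
        cmd += ["--log", log]
    if out is not None:
        cmd += ["--out", out]
    return cmd


# Hypothetical input and output paths.
cmd = build_filter_command("prepped.bam", "prepped.vcf.gz",
                           processes=8, out="filtered.bam")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once readcomb is installed and the input files exist.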

classification

Python module for detailed classification of sequences containing phase changes

>>> import readcomb.classification as rc
>>> from cyvcf2 import VCF

>>> bam_filepath = 'data/example_sequences.bam'
>>> vcf_filepath = 'data/example_variants.vcf.gz'
>>> pairs = rc.pairs_creation(bam_filepath, vcf_filepath)     # generate list of Pair objects
>>> cyvcf_object = VCF(vcf_filepath)                          # cyvcf2 file object

>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz

>>> pairs[0].classify(cyvcf_object)                           # run classification algorithm
>>> print(pairs[0])
Record name: chromosome_1-199370 
Read1: chromosome_1:499417-499667 
Read2: chromosome_1:499766-500016 
VCF: data/example_variants.vcf.gz
Unmatched Variant(s): False 
Condensed: [['CC2936', 499417, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 500016]] 
Call: gene_conversion 
Condensed Masked: [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]] 
Call Masked: gene_conversion 
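In the output above, the "Condensed Masked" segments differ from the "Condensed" segments only at the outer ends of the pair, each of which moves inward by 70 bp. The sketch below reproduces that relationship under the assumption that masking clips a fixed number of bases from both ends. The 70 bp value is inferred from this one example rather than taken from readcomb's documentation, and `mask_segments` is a hypothetical helper, not a readcomb function.

```python
def mask_segments(segments, start, end, mask=70):
    """Clip [haplotype, seg_start, seg_end] segments to the interval
    [start + mask, end - mask], dropping segments that fall outside it."""
    lo, hi = start + mask, end - mask
    masked = []
    for hap, seg_start, seg_end in segments:
        new_start, new_end = max(seg_start, lo), min(seg_end, hi)
        if new_start < new_end:
            masked.append([hap, new_start, new_end])
    return masked


# Condensed segments copied from the example output above.
condensed = [
    ["CC2936", 499417, 499626],
    ["CC2935", 499626, 499736],
    ["CC2936", 499736, 500016],
]

print(mask_segments(condensed, start=499417, end=500016))
# [['CC2936', 499487, 499626], ['CC2935', 499626, 499736], ['CC2936', 499736, 499946]]
```

Here both calls agree (gene_conversion) because the middle CC2935 segment lies well inside the unmasked region; a phase change close to a read end could disappear after masking.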

License

GNU General Public License v3 (GPLv3+)

Development

Currently in alpha

Source code

Development repo
