Skip to content

script to convert bam2cram and reverse using find and samtools

License

Notifications You must be signed in to change notification settings

vidboda/BamCramConvert

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BamCramConvert

DOI

Script to convert bam2cram and reverse using find, samtools (requires UNIX find and samtools...), and optionally a slighlty modified forked version of bam2cram-check. BamCramConvert can also apply Crumble compression to your files!

Usage : bash bcc.sh

Mandatory arguments :

	* -d|--directory	<path to search dir>: root dir for find command    
	* -s|--size		<File size to search (man find -size)>: ex: +200000000k or +20G will search for files greater than 20Go; see man find -size argument    
	* -mt|--modif-time	<File last modif to search (man find -mtime): ex: +180 will search for files older than 6 months; see man find -mtime argument    
	* -f|--file-type	<bam|cram>: file type to find and convert from (bam will search for bam files and convert to cram)

Optional arguments :

	* -rm|--remove		:removes original file and index (in case of full conversion success implies bam2cram-check) - default: false
	* -st|--samtools	<path to samtools> - default: try to locate in PATH
	* -fa|--ref-fasta	<path to ref genome .fa>: path to a fasta file reference genome (the directory containing the fasta file must also contain samtools index) - default: /usr/local/share/refData/genome/hg19/hg19.fa
	* -drc|--disable-ref-check      :disable fasta reference file checking. Currently bcc can make a checking for hg19 and hg38 based on chr1 length. Disable for other assemblies.
	* -c|--check		: uses bam2cram-check (slightly modified) to check the conversion - implicitely included with -rm - if fails and -rm: rm canceled) - requires python >3.5 and samtools > 1.3
	* -p|--python3		<path to python3> - used in combination with -c or -rm: needed to run submodule bam2cram-check - default: /usr/bin/python3 - python version must be > 3.5
	* -uc|--use-crumble     : uses crumble to compress the converted BAM/CRAM file - Note: a file that already contains "crumble" in its name will not be converted again
	* -cp|--crumble-path    <path to crumble> - used in combination with -uc: needed to run crumble - default: try to locate in PATH
	* -v | --verbosity 	<integer> : decrease or increase verbosity level (ERROR : 1 | WARNING : 2 | INFO : 3 | COMMAND [default] : 4 | DEBUG : 5)

General arguments :

	* -sl|--slurm   : when running in SLURM environnment, generates srun commands - default: false
	* -th|--threads : number of threads to be used for samtools -@ option (0 => 1 total thread, 1 => 2 total threads...)
	* -h		: show this help message and exit
	* -t		: test mode (dont execute command just print them) - default: false

Installation:

see INSTALL.md

Use cases:

  • convert BAM files in directory dir (recursively) into CRAM, BAM size min 100Mo and older than 7 days, do not remove original BAM files, Reference genome is hg38 (must match with BAM files to be converted, non matching files based on chr1 length will be ignored - supports hg19 and hg38 only):
bash bcc.sh -d path/to/dir/ -mt +7 -s +100M -f bam -fa /path/to/hg38.fa
  • the same in SLURM environnment using 4 threads for samtools commands:
bash bcc.sh -d path/to/dir/ -mt +7 -s +100M -f bam -fa /path/to/hg38.fa -sl -th 3
  • the same in dry run mode and providing optional samtools path (otherwise is searched in PATH) - it is highly recommanded to use the dry run mode before launching a command, just to be sure of what will be done:
bash bcc.sh -d path/to/dir/ -mt +7 -s +100M -f bam -fa /path/to/hg38.fa -sl -th 3 -st /special/place/samtools -t
  • convert CRAM files in directory dir (recursively) into BAM, CRAM size min 5Go, from today, remove original CRAM files (implies bam2cram-check):
bash bcc.sh -d patho/to/dir/ -mt -1 -s +5G -f cram -fa /path/to/hg38.fa -rm
  • the same as above without removing the original files and now we explicitely apply bam2cram-check to the new files (and provide optional python3 (>3.5) path):
bash bcc.sh -d patho/to/dir/ -mt -1 -s +5G -f cram -fa /path/to/hg38.fa -c -p /usr/bin/python3
  • convert BAM files in directory dir (recursively) into CRAM, BAM size min 100Mo and older than 7 days, apply crumble to the new CRAM and provide optional crumble path (otherwise is searched in PATH), remove original file:
bash bcc.sh -d path/to/dir/ -mt +7 -s +100M -f bam -fa /path/to/hg38.fa -uc -cp /special/place/crumble -rm