Isaac Overcast edited this page Nov 21, 2015 · 6 revisions

Important files generated and specific config used by each step.

Step 1 - Demultiplex

In - Raw reads from 'raw_fastq_path', must be in fastq format (can be gzip compressed).

Out - Demultiplexed individuals to <work>/fastq/*.gz & info to <work>/fastq/s1_demultiplex_stats.txt
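
Since the raw reads may or may not be gzip compressed, a reader has to handle both cases. The following is a minimal sketch of one way to do that, not the pipeline's own code; the function names `open_fastq` and `count_reads` are hypothetical, and the read count assumes the common four-lines-per-record fastq layout.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a fastq file as text, transparently handling gzip compression."""
    path = Path(path)
    # Detect gzip by its two magic bytes rather than trusting the extension.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def count_reads(path):
    """Count records, assuming four lines per fastq record (an assumption)."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Checking the magic bytes instead of the file extension means a compressed file with a plain `.fastq` name is still read correctly.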

Step 2 - Filter based on Phred Q score

Out - <work>/edits/*.fasta & info to <work>/edits/s2_rawedit_stats.txt
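
To illustrate what filtering on Phred Q score involves, here is a rough sketch. It is not the pipeline's actual filter: the ASCII offset of 33, the thresholds `min_q` and `max_low`, and both function names are assumptions chosen for illustration, and the real step may trim or mask bases rather than discard reads.

```python
def phred_scores(quality_line, offset=33):
    """Convert a fastq quality string to a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def passes_filter(quality_line, min_q=20, max_low=4, offset=33):
    """Return True if the read has at most `max_low` bases scoring below `min_q`.

    All default values here are illustrative assumptions.
    """
    low = sum(1 for q in phred_scores(quality_line, offset) if q < min_q)
    return low <= max_low
```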

Step 3 - Cluster reads within individuals

Out

  • Dereplicated and sorted reads: <work>/edits/*.derep
  • <work>/clust_<tolerance>/
      • *.htemp - FASTA file of unmatched searches (vsearch)
      • *.utemp - User-defined output stats from vsearch
      • *.clust.gz - Unaligned clusters
      • *.clustS.gz - Aligned clusters (post-muscle)
      • *s3_cluster_stats.txt
  • Only if reference sequence mapping is used:
      • <work>/refmapping/
          • *.sam - Raw output of smalt mapping
          • *.<mapped/unmapped>.bam - bam files for mapped and unmapped reads
          • *.sorted-<mapped/unmapped>.bam - Sorted bam files for mapped and unmapped reads
      • <work>/edits/*.fastq - Updated fastq files in the edits dir, containing only unmapped reads
      • <work>/clust_<tolerance>/
          • *.clustsS.gz - Merged de novo clusters (post-muscle) and reference-sequence-aligned pileups
  • Info to <work>/edits/clust_<tolerance>/s3_cluster_stats.txt
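
The "dereplicated and sorted reads" in the first item can be pictured with a small sketch: identical sequences are collapsed into one entry with a count, and entries are ordered by abundance. The listing does not say which tool writes the *.derep files, so this only illustrates the idea; the function name `dereplicate` is hypothetical.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, count) pairs.

    Pairs are sorted by descending count, with ties broken alphabetically
    so the output order is deterministic.
    """
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Sorting by abundance first is the usual reason to dereplicate before clustering: the most common sequences are then considered first as candidate cluster seeds.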

Step 4 - Joint estimation of heterozygosity (H) and error rate (E)

Out

  • Info to <work>/edits/clust_<tolerance>/s4_Pi_E_estimate.txt

Step 5 - Consensus sequences and HDF5 database with coverages

Out

  • consensus reads: <work>/edits/clust_<tolerance>/consens_<outprefix>/
      • *.consens - FASTA file of consensus reads
      • *...hdf5... - database storage (maybe) of read depths

Step 6 - Cluster across samples

Out

  • ordered consensus reads: <work>/edits/clust_<tolerance>/consens_<outprefix>/cat...
  • vsearch matching output: <work>/edits/clust_<tolerance>/consens_<outprefix>/cat.utemp
  • database of all clusters containing depth data: ...
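
The locations listed for steps 4 to 6 share a common directory layout, which can be sketched as a small path builder. This is only an illustration of the layout as documented on this page: the function name and dictionary keys are hypothetical, and how the tolerance value is formatted inside the directory name is an assumption. The elided paths are deliberately left out.

```python
from pathlib import Path


def downstream_paths(work, tolerance, outprefix):
    """Build the step 4-6 locations from the patterns listed on this page."""
    # <work>/edits/clust_<tolerance>/
    clust_dir = Path(work) / "edits" / f"clust_{tolerance}"
    # <work>/edits/clust_<tolerance>/consens_<outprefix>/
    consens_dir = clust_dir / f"consens_{outprefix}"
    return {
        "s4_stats": clust_dir / "s4_Pi_E_estimate.txt",  # step 4 info
        "consens_dir": consens_dir,                      # step 5 output dir
        "cat_utemp": consens_dir / "cat.utemp",          # step 6 vsearch output
    }
```

For example, `downstream_paths("work", "0.85", "c85")["s4_stats"]` points at `work/edits/clust_0.85/s4_Pi_E_estimate.txt`, where `0.85` and `c85` are made-up values.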