This repository has been archived by the owner on Nov 3, 2021. It is now read-only.

WDL workflow for population variant calling using htsget, DeepVariant, and GLnexus

dnanexus-rnd/DeepVariant-GLnexus-WDL

DeepVariant+GLnexus workflows

These portable WDL workflows use DeepVariant to call variants from WGS read alignments, followed by GLnexus to merge the resulting Genome VCF (gVCF) files for several samples into a Project VCF (pVCF). The wdl/ directory has three nested workflows:

DeepVariant.wdl: Based on the DeepVariant docs, the sequential workflow that generates a gVCF from a given BAM file and genomic range.

             +----------------------------------------------------------------------------+
             |                                                                            |
             |  DeepVariant.wdl                                                           |
             |                                                                            |
             |  +-----------------+    +-----------------+    +------------------------+  |
sample.bam   |  |                 |    |                 |    |                        |  |
 genome.fa ----->  make_examples  |---->  call_variants  |---->  postprocess_variants  |-----> gVCF
     range   |  |                 |    |                 |    |                        |  |
             |  +-----------------+    +--------^--------+    +------------------------+  |
             |                                  |                                         |
             |                                  |                                         |
             +----------------------------------|-----------------------------------------+
                                                |
                                       DeepVariant Model

make_examples and call_variants internally parallelize across CPUs on the machine they run on. The tasks use the docker image published by the DeepVariant team.

htsget_DeepVariant.wdl: To further parallelize WGS calling across several machines, this workflow scatters DeepVariant.wdl across several genomic ranges (typically full-length chromosomes). For each range, it fetches a BAM slice using the GA4GH htsget client in samtools 1.7+, given an htsget server endpoint and sample ID. Finally, it concatenates the per-range gVCFs into the complete sample gVCF.

             +--------------------------------------------------------------------------------+
             |                                                                                |
             |  htsget_DeepVariant.wdl                                                        |
             |                                                                                |
             |       +-----------------+    +-------------------+                             |
             |       |                 |    |                   |  range gVCF                 |
             |   +--->  htsget client  |---->  DeepVariant.wdl  |---+                         |
             |   |   |  (samtools)     |    |                   |   |                         |
             |   |   |                 |    +-------------------+   |                         |
sample ID    |   |   +-----------------+                            |  +-------------------+  |
             |   |                                                  +-->                   |  |
   ranges -------+---> ...                  ...                 ... --->  bcftools concat  +-----> sample gVCF
    (e.g.    |   |                                                  +-->                   |  |
     chr1    |   |   +-----------------+                            |  +-------------------+  |
     chr2    |   |   |                 |    +-------------------+   |                         |
     ...)    |   +--->  htsget client  |    |                   |   |                         |
             |       |  (samtools)     |---->  DeepVariant.wdl  |---+                         |
             |       |                 |    |                   |  range gVCF                 |
             |       +------------^----+    +-------------------+                             |
             |            |       |                                                           |
             |            |       |                                                           |
             +------------|-------|-----------------------------------------------------------+
                          |       |
               sample ID  |       |
                   range  |       |  range BAM
                          |       |
                     +----v------------+
                     |                 |
                     |  htsget server  |
                     |                 |
                     +-----------------+

By using htsget, the workflow scatters across the ranges without first having to download and slice up a monolithic BAM file.
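
To make the per-range fetch concrete, here is a minimal Python sketch of the request URL that the htsget protocol defines for one sample and one range. It is an illustration only: the workflow itself relies on the htsget client in samtools, and the helper names `parse_range` and `htsget_url` are hypothetical. The sketch assumes ranges are written in samtools region style (1-based, inclusive) and converts them to the 0-based, half-open `start`/`end` parameters of htsget.

```python
from urllib.parse import urlencode


def parse_range(genomic_range):
    """Split 'chrom:start-end' (assumed 1-based, inclusive) into its parts."""
    chrom, _, span = genomic_range.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def htsget_url(endpoint, sample_id, genomic_range):
    """Build the htsget reads request URL for one sample and one range.

    htsget expects a 0-based start (inclusive) and end (exclusive), so only
    the start coordinate shifts by one.
    """
    chrom, start, end = parse_range(genomic_range)
    query = urlencode({
        "format": "BAM",
        "referenceName": chrom,
        "start": start - 1,
        "end": end,
    })
    return f"{endpoint.rstrip('/')}/{sample_id}?{query}"


if __name__ == "__main__":
    print(htsget_url(
        "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
        "NA12878",
        "12:112204691-112247789",
    ))
```

The server answers such a request with a JSON ticket listing the URLs of the data blocks to download, which is why each scattered task can fetch only its own slice.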

htsget_DeepVariant_GLnexus.wdl: Scatters htsget_DeepVariant.wdl across several samples to generate an array of gVCF files, then feeds these to GLnexus to merge them into a pVCF.

              +-----------------------------------------------------------+
              |                                                           |
              |  htsget_DeepVariant_GLnexus.wdl                           |
              |                                                           |
              |       +--------------------------+                        |
              |       |                          |   sample gVCF          |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |   |   |                          |    |                   |
              |   |   +--------------------------+    |    +-----------+  |
              |   |                                   +---->           |  |
sample IDs -------+---> ...                      ...  ----->  GLnexus  +----> project VCF
              |   |                                   +---->           |  |
              |   |   +--------------------------+    |    +-----------+  |
              |   |   |                          |    |                   |
              |   +--->  htsget_DeepVariant.wdl  |----+                   |
              |       |                          |   sample gVCF          |
              |       +--------------------------+                        |
              |                                                           |
              +-----------------------------------------------------------+
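
The nesting of the three workflows amounts to two levels of scatter-gather: over samples, and within each sample over ranges. The Python sketch below mirrors that shape with stub functions. It runs no real tools, and every function name in it is hypothetical. The stubs only return labels so the data flow is visible.

```python
from concurrent.futures import ThreadPoolExecutor


def call_range(sample_id, genomic_range):
    # Stand-in for: htsget slice -> DeepVariant.wdl -> range gVCF.
    return f"{sample_id}.{genomic_range}.g.vcf"


def call_sample(sample_id, ranges):
    # Scatter over ranges; map() returns results in input order, which is
    # what a concatenation step needs. Stand-in for htsget_DeepVariant.wdl.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda r: call_range(sample_id, r), ranges))
    return {"sample": sample_id, "range_gvcfs": parts}


def call_cohort(sample_ids, ranges):
    # Scatter over samples, then gather the sample gVCFs for the merge step.
    # Stand-in for htsget_DeepVariant_GLnexus.wdl.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: call_sample(s, ranges), sample_ids))


if __name__ == "__main__":
    for result in call_cohort(["NA12878", "NA12891"], ["12", "17"]):
        print(result)
```

In the real workflows the WDL runner schedules each scattered call on its own machine; the thread pools here only stand in for that scheduling.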

Here's an example inputs JSON providing everything required to launch this top-level workflow with dxWDL or Cromwell:

{
    "htsget_DeepVariant_GLnexus.accessions": ["NA12878","NA12891","NA12892"],
    "htsget_DeepVariant_GLnexus.htsget_endpoint": "https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
    "htsget_DeepVariant_GLnexus.ranges": ["12:112204691-112247789","17:41196312-41277500"],
    "htsget_DeepVariant_GLnexus.ref_fasta_gz": (REFERENCE GENOME FILE),
    "htsget_DeepVariant_GLnexus.model_tar": (DEEPVARIANT MODEL FILES),
    "htsget_DeepVariant_GLnexus.output_name": "b37_CEUtrio_ALDH2_BRCA1"
}
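
One way to avoid hand-editing mistakes is to generate the inputs file from a short script, sketched below in Python. The `build_inputs` helper is hypothetical, and the two file paths are placeholders. How a file input is actually written depends on the runner, so substitute whatever form dxWDL or Cromwell expects in your environment.

```python
import json

WORKFLOW = "htsget_DeepVariant_GLnexus"


def build_inputs(accessions, endpoint, ranges, ref_fasta_gz, model_tar, output_name):
    """Return the workflow inputs as a dict keyed by fully qualified names."""
    return {
        f"{WORKFLOW}.accessions": list(accessions),
        f"{WORKFLOW}.htsget_endpoint": endpoint,
        f"{WORKFLOW}.ranges": list(ranges),
        f"{WORKFLOW}.ref_fasta_gz": ref_fasta_gz,
        f"{WORKFLOW}.model_tar": model_tar,
        f"{WORKFLOW}.output_name": output_name,
    }


if __name__ == "__main__":
    inputs = build_inputs(
        accessions=["NA12878", "NA12891", "NA12892"],
        endpoint="https://htsnexus.rnd.dnanex.us/v1/reads/BroadHiSeqX_b37",
        ranges=["12:112204691-112247789", "17:41196312-41277500"],
        ref_fasta_gz="/path/to/reference.fa.gz",      # placeholder path
        model_tar="/path/to/deepvariant_model.tar",   # placeholder path
        output_name="b37_CEUtrio_ALDH2_BRCA1",
    )
    # json.dumps always emits strict JSON, so the file parses in any runner.
    print(json.dumps(inputs, indent=4))
```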
