📚 update GCP links
dmiller15 committed Jun 17, 2024
1 parent 12013d5 commit f08577f
Showing 5 changed files with 46 additions and 52 deletions.
@@ -6,7 +6,7 @@ calling metrics, and, if no contamination value is provided, the VerifyBAMID
output. Additionally, if the user sets the `run_sex_metrics` input to true, two
additional outputs from samtools idxstats will be provided.

This workflow is the current production workflow, equivalent to this [Cavatica public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-gatk-haplotypecaller-workflow).
This workflow is the current production workflow, equivalent to this [CAVATICA public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-gatk-haplotypecaller-workflow).

## Inputs

@@ -60,12 +60,14 @@ idxstats will be run on the input reads to generate the sex metrics outputs
1. For contamination input, either populate the `contamination` field or provide the three contamination
files: `contamination_sites_bed`, `contamination_sites_mu`, and `contamination_sites_ud`. Failure to
provide one of these groups will result in a failed run.
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
- contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
- contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
- contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
- dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
- reference_tar: Homo_sapiens_assembly38.tgz
- wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
- wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
1. The input for the reference_tar must be a tar file containing the reference fasta along with its indexes.
The required indexes are `[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa, picard, and samtools.
Additionally, an `.64.alt` index is recommended.
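
The two input rules in the README text above (contamination inputs and required reference indexes) can be sketched as a pre-flight check. This is a minimal illustration, not part of the workflow: the function names are hypothetical, and the index check only matches member-name suffixes inside the tar, on the assumption that the indexes sit next to the fasta under its name.

```python
import tarfile

# Index suffixes the README lists as required alongside the reference fasta.
REQUIRED_SUFFIXES = [".64.ann", ".64.amb", ".64.bwt", ".64.pac", ".64.sa", ".dict", ".fai"]
# Suffix the README lists as recommended, not required.
RECOMMENDED_SUFFIXES = [".64.alt"]


def missing_index_suffixes(member_names, suffixes=REQUIRED_SUFFIXES):
    """Return the suffixes that no archive member name ends with."""
    return [s for s in suffixes if not any(name.endswith(s) for name in member_names)]


def check_reference_tar(path):
    """Open a plain or compressed tar and report missing index suffixes."""
    with tarfile.open(path, "r:*") as tar:
        names = tar.getnames()
    return {
        "missing_required": missing_index_suffixes(names),
        "missing_recommended": missing_index_suffixes(names, RECOMMENDED_SUFFIXES),
    }


def contamination_inputs_ok(contamination=None, bed=None, mu=None, ud=None):
    """True if the precalculated value or all three sites files are provided."""
    return contamination is not None or all(f is not None for f in (bed, mu, ud))
```

Calling `check_reference_tar("Homo_sapiens_assembly38.tgz")` on a local copy of the archive would list any suffixes that are absent; a non-empty `missing_required` list suggests the tar does not meet the documented requirement.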
12 changes: 6 additions & 6 deletions docs/KFDRC_SENTIEON_ALIGNMENT_GVCF_WORKFLOW_README.md
@@ -27,8 +27,8 @@ out their website:
This workflow has a unique input `sentieon_license` that is not present in our
main alignment workflow. To run the Sentieon tool, users must provide the license
value to run any of the Sentieon tools. We have provided a default value that
works exclusively on Cavatica. Alternatively, if you wish to use this outside
of Cavatica, you will need to provide your own server license.
works exclusively on CAVATICA. Alternatively, if you wish to use this outside
of CAVATICA, you will need to provide your own server license.

Otherwise, this workflow uses identical inputs as our existing alignment workflow.
For more information see: https://github.com/kids-first/kf-alignment-workflow#inputs
@@ -90,11 +90,11 @@ Metrics collection and contamination estimation are unchanged.
## Basic Info
- [D3b dockerfiles](https://github.com/d3b-center/bixtools)
- Testing Tools:
- [Seven Bridges Cavatica Platform](https://cavatica.sbgenomics.com/)
- [Seven Bridges CAVATICA Platform](https://cavatica.sbgenomics.com/)
- [Common Workflow Language reference implementation (cwltool)](https://github.com/common-workflow-language/cwltool/)

## References
- KFDRC AWS s3 bucket: s3://kids-first-seq-data/broad-references/
- Cavatica: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
- KFDRC AWS S3 bucket: s3://kids-first-seq-data/broad-references/
- CAVATICA: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
- Sentieon: https://support.sentieon.com/manual/DNAseq_usage/dnaseq/
- Broad Institute Google Cloud: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
- Broad Institute Google Cloud: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0
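
The links changed in this commit are Cloud Storage console browser URLs, which embed the bucket name and object path after `/storage/browser/`. For command-line use, such a URL can be turned into a `gs://` URI. The helper below is a hypothetical sketch (the function name is not from the repository) and assumes the URL follows that console layout.

```python
from urllib.parse import urlparse

# Path prefix used by Cloud Storage console browser links.
CONSOLE_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(url):
    """Convert a console browser URL into the equivalent gs:// URI."""
    parsed = urlparse(url)
    if parsed.netloc != "console.cloud.google.com" or not parsed.path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not a Cloud Storage console browser URL: {url}")
    # Everything after the prefix is "<bucket>/<object path>"; drop any trailing slash.
    return "gs://" + parsed.path[len(CONSOLE_PREFIX):].rstrip("/")


new_link = "https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0"
print(console_url_to_gs_uri(new_link))
```

The resulting URI can then be passed to a tool such as `gsutil ls` to list the reference files.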
4 changes: 3 additions & 1 deletion docs/KFDRC_SENTIEON_GVCF_WORKFLOW_README.md
@@ -30,12 +30,14 @@ verifybamid_output: If not provided by the user, the workflow will output verify
provide the three contamination files: `contamination_sites_bed`,
`contamination_sites_mu`, and `contamination_sites_ud`. Failure to provide one
of these groups will result in a failed run.
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
- contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
- contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
- contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
- dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
- reference_tar: Homo_sapiens_assembly38.tgz
- wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
- wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
1. The input for the reference_tar must be a tar file containing the reference
fasta along with its indexes. The required indexes are
`[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa,
14 changes: 7 additions & 7 deletions workflows/kfdrc_sentieon_alignment_wf.cwl
@@ -32,8 +32,8 @@ doc: |
This workflow has a unique input `sentieon_license` that is not present in our
main alignment workflow. To run the Sentieon tool, users must provide the license
value to run any of the Sentieon tools. We have provided a default value that
works exclusively on Cavatica. Alternatively, if you wish to use this outside
of Cavatica, you will need to provide your own server license.
works exclusively on CAVATICA. Alternatively, if you wish to use this outside
of CAVATICA, you will need to provide your own server license.

Otherwise, this workflow uses identical inputs as our existing alignment workflow.
For more information see: https://github.com/kids-first/kf-alignment-workflow#inputs
@@ -95,14 +95,14 @@ doc: |
## Basic Info
- [D3b dockerfiles](https://github.com/d3b-center/bixtools)
- Testing Tools:
- [Seven Bridges Cavatica Platform](https://cavatica.sbgenomics.com/)
- [Seven Bridges CAVATICA Platform](https://cavatica.sbgenomics.com/)
- [Common Workflow Language reference implementation (cwltool)](https://github.com/common-workflow-language/cwltool/)

## References
- KFDRC AWS s3 bucket: s3://kids-first-seq-data/broad-references/
- Cavatica: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
- KFDRC AWS S3 bucket: s3://kids-first-seq-data/broad-references/
- CAVATICA: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
- Sentieon: https://support.sentieon.com/manual/DNAseq_usage/dnaseq/
- Broad Institute Google Cloud: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
- Broad Institute Google Cloud: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0
requirements:
- class: ScatterFeatureRequirement
- class: StepInputExpressionRequirement
@@ -178,7 +178,7 @@ inputs:
class: File, path: 6669ac8127374715fc3ba3c4, name: hla_v3.43.0_gencode_v39_dna_seq.fa}}
hla_dna_gene_coords: {type: 'File?', doc: "FASTA file containing the coordinates of the HLA genes for DNA.", "sbg:suggestedValue": {
class: File, path: 6669ac8127374715fc3ba3c2, name: hla_v3.43.0_gencode_v39_dna_coord.fa}}
t1k_abnormal_unmap_flag: { type: 'boolean?', doc: "Set if the flag in BAM for the unmapped read-pair is nonconcordant" }
t1k_abnormal_unmap_flag: {type: 'boolean?', doc: "Set if the flag in BAM for the unmapped read-pair is nonconcordant"}
outputs:
cram: {type: File, outputSource: sentieon_readwriter_bam_to_cram/output_reads, doc: "(Re)Aligned Reads File"}
gvcf: {type: 'File?', outputSource: generate_gvcf/gvcf, doc: "Genomic VCF generated from the realigned alignment file."}
62 changes: 26 additions & 36 deletions workflows/kfdrc_sentieon_gvcf_wf.cwl
@@ -6,7 +6,7 @@ doc: |
# Kids First Data Resource Center Sentieon gVCF Workflow

<p align="center">
<img src="https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png">
<img src="./kids_first_logo.svg" alt="Kids First repository logo" width="660px" />
</p>

This workflow takes a BAM/CRAM file, runs VerifyBamID, then runs Sentieon
@@ -35,12 +35,14 @@ doc: |
provide the three contamination files: `contamination_sites_bed`,
`contamination_sites_mu`, and `contamination_sites_ud`. Failure to provide one
of these groups will result in a failed run.
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
- contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
- contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
- contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
- dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
- reference_tar: Homo_sapiens_assembly38.tgz
- wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
- wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
1. The input for the reference_tar must be a tar file containing the reference
fasta along with its indexes. The required indexes are
`[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa,
@@ -57,46 +59,34 @@ requirements:
- class: MultipleInputFeatureRequirement
- class: SubworkflowFeatureRequirement
inputs:
sentieon_license: {type: 'string?', default: "10.5.64.221:8990", doc: "License server\
\ host and port"}
input_reads: {type: 'File', secondaryFiles: [{pattern: '.bai', required: false},
{pattern: '^.bai', required: false}, {pattern: '.crai', required: false}, {
pattern: '^.crai', required: false}], doc: "Input BAM/CRAM file"}
sentieon_license: {type: 'string?', default: "10.5.64.221:8990", doc: "License server host and port"}
input_reads: {type: 'File', secondaryFiles: [{pattern: '.bai', required: false}, {pattern: '^.bai', required: false}, {pattern: '.crai',
required: false}, {pattern: '^.crai', required: false}], doc: "Input BAM/CRAM file"}
recal_table: {type: 'File?', doc: "Recalibration table from BQSR"}
output_basename: {type: 'string', doc: "String to use as the base for output filenames"}
reference_tar: {type: 'File', doc: "Tar file containing a reference fasta and, optionally,\
\ its complete set of associated indexes (samtools, bwa, and picard)", "sbg:suggestedValue": {
class: File, path: 5f4ffff4e4b0370371c05153, name: Homo_sapiens_assembly38.tgz}}
dbsnp_vcf: {type: 'File', doc: "dbSNP vcf file", "sbg:suggestedValue": {class: File,
path: 6063901f357c3a53540ca84b, name: Homo_sapiens_assembly38.dbsnp138.vcf}}
dbsnp_idx: {type: 'File?', doc: "dbSNP vcf index file", "sbg:suggestedValue": {
class: File, path: 6063901e357c3a53540ca834, name: Homo_sapiens_assembly38.dbsnp138.vcf.idx}}
contamination: {type: 'float?', doc: "Precalculated contamination value. Providing\
\ the value here will skip the run of VerifyBAMID and use the provided value\
\ as ground truth."}
contamination_sites_bed: {type: 'File?', doc: ".Bed file for markers used in this\
\ analysis,format(chr\tpos-1\tpos\trefAllele\taltAllele)", "sbg:suggestedValue": {
class: File, path: 6063901e357c3a53540ca833, name: Homo_sapiens_assembly38.contam.bed}}
contamination_sites_mu: {type: 'File?', doc: ".mu matrix file of genotype matrix",
"sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7cd, name: Homo_sapiens_assembly38.contam.mu}}
contamination_sites_ud: {type: 'File?', doc: ".UD matrix file from SVD result of\
\ genotype matrix", "sbg:suggestedValue": {class: File, path: 6063901f357c3a53540ca84f,
name: Homo_sapiens_assembly38.contam.UD}}
wgs_evaluation_interval_list: {type: 'File', doc: "Target intervals to restrict\
\ gvcf metric analysis (for VariantCallingMetrics)", "sbg:suggestedValue": {class: File,
path: 60639017357c3a53540ca7d3, name: wgs_evaluation_regions.hg38.interval_list}}
conditional: {type: 'boolean?', doc: "Hook to enable/disable this workflow when\
\ nested in another workflow."}
run_sex_metrics: {type: 'boolean?', doc: "idxstats will be collected\
\ and X/Y ratios calculated"}
reference_tar: {type: 'File', doc: "Tar file containing a reference fasta and, optionally, its complete set of associated indexes
(samtools, bwa, and picard)", "sbg:suggestedValue": {class: File, path: 5f4ffff4e4b0370371c05153, name: Homo_sapiens_assembly38.tgz}}
dbsnp_vcf: {type: 'File', doc: "dbSNP vcf file", "sbg:suggestedValue": {class: File, path: 6063901f357c3a53540ca84b, name: Homo_sapiens_assembly38.dbsnp138.vcf}}
dbsnp_idx: {type: 'File?', doc: "dbSNP vcf index file", "sbg:suggestedValue": {class: File, path: 6063901e357c3a53540ca834, name: Homo_sapiens_assembly38.dbsnp138.vcf.idx}}
contamination: {type: 'float?', doc: "Precalculated contamination value. Providing the value here will skip the run of VerifyBAMID
and use the provided value as ground truth."}
contamination_sites_bed: {type: 'File?', doc: ".Bed file for markers used in this analysis,format(chr\tpos-1\tpos\trefAllele\taltAllele)",
"sbg:suggestedValue": {class: File, path: 6063901e357c3a53540ca833, name: Homo_sapiens_assembly38.contam.bed}}
contamination_sites_mu: {type: 'File?', doc: ".mu matrix file of genotype matrix", "sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7cd,
name: Homo_sapiens_assembly38.contam.mu}}
contamination_sites_ud: {type: 'File?', doc: ".UD matrix file from SVD result of genotype matrix", "sbg:suggestedValue": {class: File,
path: 6063901f357c3a53540ca84f, name: Homo_sapiens_assembly38.contam.UD}}
wgs_evaluation_interval_list: {type: 'File', doc: "Target intervals to restrict gvcf metric analysis (for VariantCallingMetrics)",
"sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7d3, name: wgs_evaluation_regions.hg38.interval_list}}
conditional: {type: 'boolean?', doc: "Hook to enable/disable this workflow when nested in another workflow."}
run_sex_metrics: {type: 'boolean?', doc: "idxstats will be collected and X/Y ratios calculated"}
outputs:
gvcf: {type: File, outputSource: sentieon_haplotyper/output}
gvcf_calling_metrics: {type: 'File[]', outputSource: picard_collectgvcfcallingmetrics/output}
verifybamid_output: {type: 'File?', outputSource: verifybamid_checkcontam_conditional/output}
idxstats: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/output, doc: "samtools\
\ idxstats of the realigned BAM file."}
xy_ratio: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/ratio, doc: "Text\
\ file containing X and Y reads statistics generated from idxstats."}
idxstats: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/output, doc: "samtools idxstats of the realigned BAM file."}
xy_ratio: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/ratio, doc: "Text file containing X and Y reads statistics generated
from idxstats."}
steps:
index_dbsnp:
run: ../tools/gatk_indexfeaturefile.cwl