📚 update GCP links

kids-first · Jun 17, 2024 · f08577f
1 parent 12013d5
commit f08577f
Showing 5 changed files with 46 additions and 52 deletions.
diff --git a/docs/KFDRC_GATK_HAPLOTYPECALLER_CRAM_TO_GVCF_WORKFLOW_README.md b/docs/KFDRC_GATK_HAPLOTYPECALLER_CRAM_TO_GVCF_WORKFLOW_README.md
@@ -6,7 +6,7 @@ calling metrics, and, if no contamination value is provided, the VerifyBAMID
 output. Additionally, if the user sets the `run_sex_metrics` input to true, two
 additional outputs from samtools idxstats will be provided.
 
-This workflow is the current production workflow, equivalent to this [Cavatica public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-gatk-haplotypecaller-workflow).
+This workflow is the current production workflow, equivalent to this [CAVATICA public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-gatk-haplotypecaller-workflow).
 
 ## Inputs
 
@@ -60,12 +60,14 @@ idxstats will be run on the input reads to generate the sex metrics outputs
 1. For contamination input, either populate the `contamination` field or provide the three contamination
    files: `contamination_sites_bed`, `contamination_sites_mu`, and `contamination_sites_ud`. Failure to
    provide one of these groups will result in a failed run.
-1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
+1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
     - contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
     - contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
     - contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
     - dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
     - reference_tar: Homo_sapiens_assembly38.tgz
+    - wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
+    - wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
 1. The input for the reference_tar must be a tar file containing the reference fasta along with its indexes.
    The required indexes are `[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa, picard, and samtools.
    Additionally, an `.64.alt` index is recommended.

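Reviewer note: the `reference_tar` requirement in the README hunk above (a tar file holding the reference fasta plus the `.64.ann`, `.64.amb`, `.64.bwt`, `.64.pac`, `.64.sa`, `.dict`, and `.fai` indexes) can be checked before a run is submitted. The following is a minimal sketch in Python, not part of this commit or of the workflow itself; the function names `missing_reference_indexes` and `check_reference_tar` are hypothetical, and the check assumes index files are identified purely by filename suffix.

```python
import tarfile

# Index suffixes the README lists as required; ".64.alt" is only recommended,
# so it is deliberately left out of this check.
REQUIRED_SUFFIXES = (".64.ann", ".64.amb", ".64.bwt", ".64.pac", ".64.sa", ".dict", ".fai")


def missing_reference_indexes(member_names):
    """Return the required suffixes that no archive member name ends with."""
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in member_names)
    ]


def check_reference_tar(path):
    """Open a plain or compressed tar file and report missing index suffixes."""
    # Mode "r:*" lets tarfile detect the compression (e.g. gzip for a .tgz).
    with tarfile.open(path, "r:*") as tar:
        return missing_reference_indexes(tar.getnames())
```

An empty list from `check_reference_tar("Homo_sapiens_assembly38.tgz")` would mean every required suffix is present; the check says nothing about whether the indexes were built from the fasta in the same archive.
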
diff --git a/docs/KFDRC_SENTIEON_ALIGNMENT_GVCF_WORKFLOW_README.md b/docs/KFDRC_SENTIEON_ALIGNMENT_GVCF_WORKFLOW_README.md
@@ -27,8 +27,8 @@ out their website:
 This workflow has a unique input `sentieon_license` that is not present in our
 main alignment workflow. To run the Sentieon tool, users must provide the license
 value to run any of the Sentieon tools. We have provided a default value that
-works exclusively on Cavatica. Alternatively, if you wish to use this outside
-of Cavatica, you will need to provide your own server license.
+works exclusively on CAVATICA. Alternatively, if you wish to use this outside
+of CAVATICA, you will need to provide your own server license.
 
 Otherwise, this workflow uses identical inputs as our existing alignment workflow.
 For more information see: https://github.com/kids-first/kf-alignment-workflow#inputs
@@ -90,11 +90,11 @@ Metrics collection and contamination estimation are unchanged.
 ## Basic Info
 - [D3b dockerfiles](https://github.com/d3b-center/bixtools)
 - Testing Tools:
-    - [Seven Bridges Cavatica Platform](https://cavatica.sbgenomics.com/)
+    - [Seven Bridges CAVATICA Platform](https://cavatica.sbgenomics.com/)
     - [Common Workflow Language reference implementation (cwltool)](https://github.com/common-workflow-language/cwltool/)
 
 ## References
-- KFDRC AWS s3 bucket: s3://kids-first-seq-data/broad-references/
-- Cavatica: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
+- KFDRC AWS S3 bucket: s3://kids-first-seq-data/broad-references/
+- CAVATICA: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
 - Sentieon: https://support.sentieon.com/manual/DNAseq_usage/dnaseq/
-- Broad Institute Goolge Cloud: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
+- Broad Institute Goolge Cloud: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0
diff --git a/docs/KFDRC_SENTIEON_GVCF_WORKFLOW_README.md b/docs/KFDRC_SENTIEON_GVCF_WORKFLOW_README.md
@@ -30,12 +30,14 @@ verifybamid_output: If not provided by the user, the workflow will output verify
    provide the three contamination files: `contamination_sites_bed`,
    `contamination_sites_mu`, and `contamination_sites_ud`. Failure to provide one
    of these groups will result in a failed run.
-1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
+1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
     - contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
     - contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
     - contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
     - dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
     - reference_tar: Homo_sapiens_assembly38.tgz
+    - wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
+    - wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
 1. The input for the reference_tar must be a tar file containing the reference
    fasta along with its indexes.  The required indexes are
    `[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa,

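Reviewer note: the contamination rule restated in the hunk above (either populate `contamination` or provide all three of `contamination_sites_bed`, `contamination_sites_mu`, and `contamination_sites_ud`, otherwise the run fails) lends itself to a pre-submission check. This is a hedged Python sketch, not code from the repository; `contamination_inputs_ok` is a hypothetical helper, and it assumes the workflow inputs have already been loaded into a plain dictionary keyed by input name.

```python
# The three file inputs that must be supplied together when no
# precalculated contamination value is given.
CONTAMINATION_FILE_INPUTS = (
    "contamination_sites_bed",
    "contamination_sites_mu",
    "contamination_sites_ud",
)


def contamination_inputs_ok(inputs):
    """Return True if a contamination value or all three site files are set."""
    # A value of 0.0 is a legitimate contamination estimate, so test
    # against None rather than relying on truthiness.
    if inputs.get("contamination") is not None:
        return True
    return all(inputs.get(key) is not None for key in CONTAMINATION_FILE_INPUTS)
```

Testing against `None` instead of truthiness is a deliberate choice so that a precalculated contamination of `0.0` still counts as provided.
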
diff --git a/workflows/kfdrc_sentieon_alignment_wf.cwl b/workflows/kfdrc_sentieon_alignment_wf.cwl
@@ -32,8 +32,8 @@ doc: |
   This workflow has a unique input `sentieon_license` that is not present in our
   main alignment workflow. To run the Sentieon tool, users must provide the license
   value to run any of the Sentieon tools. We have provided a default value that
-  works exclusively on Cavatica. Alternatively, if you wish to use this outside
-  of Cavatica, you will need to provide your own server license.
+  works exclusively on CAVATICA. Alternatively, if you wish to use this outside
+  of CAVATICA, you will need to provide your own server license.
 
   Otherwise, this workflow uses identical inputs as our existing alignment workflow.
   For more information see: https://github.com/kids-first/kf-alignment-workflow#inputs
@@ -95,14 +95,14 @@ doc: |
   ## Basic Info
   - [D3b dockerfiles](https://github.com/d3b-center/bixtools)
   - Testing Tools:
-      - [Seven Bridges Cavatica Platform](https://cavatica.sbgenomics.com/)
+      - [Seven Bridges CAVATICA Platform](https://cavatica.sbgenomics.com/)
       - [Common Workflow Language reference implementation (cwltool)](https://github.com/common-workflow-language/cwltool/)
 
   ## References
-  - KFDRC AWS s3 bucket: s3://kids-first-seq-data/broad-references/
-  - Cavatica: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
+  - KFDRC AWS S3 bucket: s3://kids-first-seq-data/broad-references/
+  - CAVATICA: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/kf-references/
   - Sentieon: https://support.sentieon.com/manual/DNAseq_usage/dnaseq/
-  - Broad Institute Goolge Cloud: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
+  - Broad Institute Goolge Cloud: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0
 requirements:
 - class: ScatterFeatureRequirement
 - class: StepInputExpressionRequirement
@@ -178,7 +178,7 @@ inputs:
       class: File, path: 6669ac8127374715fc3ba3c4, name: hla_v3.43.0_gencode_v39_dna_seq.fa}}
   hla_dna_gene_coords: {type: 'File?', doc: "FASTA file containing the coordinates of the HLA genes for DNA.", "sbg:suggestedValue": {
       class: File, path: 6669ac8127374715fc3ba3c2, name: hla_v3.43.0_gencode_v39_dna_coord.fa}}
-  t1k_abnormal_unmap_flag: { type: 'boolean?', doc: "Set if the flag in BAM for the unmapped read-pair is nonconcordant" }
+  t1k_abnormal_unmap_flag: {type: 'boolean?', doc: "Set if the flag in BAM for the unmapped read-pair is nonconcordant"}
 outputs:
   cram: {type: File, outputSource: sentieon_readwriter_bam_to_cram/output_reads, doc: "(Re)Aligned Reads File"}
   gvcf: {type: 'File?', outputSource: generate_gvcf/gvcf, doc: "Genomic VCF generated from the realigned alignment file."}

diff --git a/workflows/kfdrc_sentieon_gvcf_wf.cwl b/workflows/kfdrc_sentieon_gvcf_wf.cwl
@@ -6,7 +6,7 @@ doc: |
   # Kids First Data Resource Center Sentieon gVCF Workflow
 
   <p align="center">
-    <img src="https://github.com/d3b-center/d3b-research-workflows/raw/master/doc/kfdrc-logo-sm.png">
+    <img src="./kids_first_logo.svg" alt="Kids First repository logo" width="660px" />
   </p>
 
   This workflow takes a BAM/CRAM file, runs VerifyBamID, then runs Sentieon
@@ -35,12 +35,14 @@ doc: |
      provide the three contamination files: `contamination_sites_bed`,
      `contamination_sites_mu`, and `contamination_sites_ud`. Failure to provide one
      of these groups will result in a failed run.
-  1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0)):
+  1. Suggested reference inputs (available from the [Broad Resource Bundle](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0)):
       - contamination_sites_bed: Homo_sapiens_assembly38.contam.bed
       - contamination_sites_mu: Homo_sapiens_assembly38.contam.mu
       - contamination_sites_ud: Homo_sapiens_assembly38.contam.UD
       - dbsnp_vcf: Homo_sapiens_assembly38.dbsnp138.vcf
       - reference_tar: Homo_sapiens_assembly38.tgz
+      - wgs_calling_interval_list: wgs_coverage_regions.hg38.interval_list
+      - wgs_evaluation_interval_list: wgs_evaluation_regions.hg38.interval_list
   1. The input for the reference_tar must be a tar file containing the reference
      fasta along with its indexes.  The required indexes are
      `[.64.ann,.64.amb,.64.bwt,.64.pac,.64.sa,.dict,.fai]` and are generated by bwa,
@@ -57,46 +59,34 @@ requirements:
 - class: MultipleInputFeatureRequirement
 - class: SubworkflowFeatureRequirement
 inputs:
-  sentieon_license: {type: 'string?', default: "10.5.64.221:8990", doc: "License server\
-      \ host and port"}
-  input_reads: {type: 'File', secondaryFiles: [{pattern: '.bai', required: false},
-      {pattern: '^.bai', required: false}, {pattern: '.crai', required: false}, {
-        pattern: '^.crai', required: false}], doc: "Input BAM/CRAM file"}
+  sentieon_license: {type: 'string?', default: "10.5.64.221:8990", doc: "License server host and port"}
+  input_reads: {type: 'File', secondaryFiles: [{pattern: '.bai', required: false}, {pattern: '^.bai', required: false}, {pattern: '.crai',
+        required: false}, {pattern: '^.crai', required: false}], doc: "Input BAM/CRAM file"}
   recal_table: {type: 'File?', doc: "Recalibration table from BQSR"}
   output_basename: {type: 'string', doc: "String to use as the base for output filenames"}
-  reference_tar: {type: 'File', doc: "Tar file containing a reference fasta and, optionally,\
-      \ its complete set of associated indexes (samtools, bwa, and picard)", "sbg:suggestedValue": {
-      class: File, path: 5f4ffff4e4b0370371c05153, name: Homo_sapiens_assembly38.tgz}}
-  dbsnp_vcf: {type: 'File', doc: "dbSNP vcf file", "sbg:suggestedValue": {class: File,
-      path: 6063901f357c3a53540ca84b, name: Homo_sapiens_assembly38.dbsnp138.vcf}}
-  dbsnp_idx: {type: 'File?', doc: "dbSNP vcf index file", "sbg:suggestedValue": {
-      class: File, path: 6063901e357c3a53540ca834, name: Homo_sapiens_assembly38.dbsnp138.vcf.idx}}
-  contamination: {type: 'float?', doc: "Precalculated contamination value. Providing\
-      \ the value here will skip the run of VerifyBAMID and use the provided value\
-      \ as ground truth."}
-  contamination_sites_bed: {type: 'File?', doc: ".Bed file for markers used in this\
-      \ analysis,format(chr\tpos-1\tpos\trefAllele\taltAllele)", "sbg:suggestedValue": {
-      class: File, path: 6063901e357c3a53540ca833, name: Homo_sapiens_assembly38.contam.bed}}
-  contamination_sites_mu: {type: 'File?', doc: ".mu matrix file of genotype matrix",
-    "sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7cd, name: Homo_sapiens_assembly38.contam.mu}}
-  contamination_sites_ud: {type: 'File?', doc: ".UD matrix file from SVD result of\
-      \ genotype matrix", "sbg:suggestedValue": {class: File, path: 6063901f357c3a53540ca84f,
-      name: Homo_sapiens_assembly38.contam.UD}}
-  wgs_evaluation_interval_list: {type: 'File', doc: "Target intervals to restrict\
-      \ gvcf metric analysis (for VariantCallingMetrics)", "sbg:suggestedValue": {class: File,
-      path: 60639017357c3a53540ca7d3, name: wgs_evaluation_regions.hg38.interval_list}}
-  conditional: {type: 'boolean?', doc: "Hook to enable/disable this workflow when\
-      \ nested in another workflow."}
-  run_sex_metrics: {type: 'boolean?', doc: "idxstats will be collected\
-      \ and X/Y ratios calculated"}
+  reference_tar: {type: 'File', doc: "Tar file containing a reference fasta and, optionally, its complete set of associated indexes
+      (samtools, bwa, and picard)", "sbg:suggestedValue": {class: File, path: 5f4ffff4e4b0370371c05153, name: Homo_sapiens_assembly38.tgz}}
+  dbsnp_vcf: {type: 'File', doc: "dbSNP vcf file", "sbg:suggestedValue": {class: File, path: 6063901f357c3a53540ca84b, name: Homo_sapiens_assembly38.dbsnp138.vcf}}
+  dbsnp_idx: {type: 'File?', doc: "dbSNP vcf index file", "sbg:suggestedValue": {class: File, path: 6063901e357c3a53540ca834, name: Homo_sapiens_assembly38.dbsnp138.vcf.idx}}
+  contamination: {type: 'float?', doc: "Precalculated contamination value. Providing the value here will skip the run of VerifyBAMID
+      and use the provided value as ground truth."}
+  contamination_sites_bed: {type: 'File?', doc: ".Bed file for markers used in this analysis,format(chr\tpos-1\tpos\trefAllele\taltAllele)",
+    "sbg:suggestedValue": {class: File, path: 6063901e357c3a53540ca833, name: Homo_sapiens_assembly38.contam.bed}}
+  contamination_sites_mu: {type: 'File?', doc: ".mu matrix file of genotype matrix", "sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7cd,
+      name: Homo_sapiens_assembly38.contam.mu}}
+  contamination_sites_ud: {type: 'File?', doc: ".UD matrix file from SVD result of genotype matrix", "sbg:suggestedValue": {class: File,
+      path: 6063901f357c3a53540ca84f, name: Homo_sapiens_assembly38.contam.UD}}
+  wgs_evaluation_interval_list: {type: 'File', doc: "Target intervals to restrict gvcf metric analysis (for VariantCallingMetrics)",
+    "sbg:suggestedValue": {class: File, path: 60639017357c3a53540ca7d3, name: wgs_evaluation_regions.hg38.interval_list}}
+  conditional: {type: 'boolean?', doc: "Hook to enable/disable this workflow when nested in another workflow."}
+  run_sex_metrics: {type: 'boolean?', doc: "idxstats will be collected and X/Y ratios calculated"}
 outputs:
   gvcf: {type: File, outputSource: sentieon_haplotyper/output}
   gvcf_calling_metrics: {type: 'File[]', outputSource: picard_collectgvcfcallingmetrics/output}
   verifybamid_output: {type: 'File?', outputSource: verifybamid_checkcontam_conditional/output}
-  idxstats: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/output, doc: "samtools\
-      \ idxstats of the realigned BAM file."}
-  xy_ratio: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/ratio, doc: "Text\
-      \ file containing X and Y reads statistics generated from idxstats."}
+  idxstats: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/output, doc: "samtools idxstats of the realigned BAM file."}
+  xy_ratio: {type: 'File?', outputSource: samtools_idxstats_xy_ratio/ratio, doc: "Text file containing X and Y reads statistics generated
+      from idxstats."}
 steps:
   index_dbsnp:
     run: ../tools/gatk_indexfeaturefile.cwl