broadinstitute · VJalili · Sep 25, 2024 · May 13, 2024 · May 13, 2024 · May 29, 2024
diff --git a/website/docs/modules/evidence_qc.md b/website/docs/modules/evidence_qc.md
@@ -5,6 +5,8 @@ sidebar_position: 2
 slug: eqc
 ---
 
+import { Highlight, HighlightOptionalArg } from "../../src/components/highlight.js"
+
 Runs ploidy estimation, dosage scoring, and optionally VCF QC. 
 The results from this module can be used for QC and batching.
 
@@ -17,9 +19,35 @@ for further guidance on creating batches.
 We also recommend using sex assignments generated from the ploidy 
 estimates and incorporating them into the PED file, with sex = 0 for sex aneuploidies.
 
-### Prerequisites
+The upstream and downstream dependencies of the EvidenceQC workflow 
+are illustrated in the following diagram.
+
+<br/>
+
+```mermaid
+
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
+
+  gse: GatherSampleEvidence
+  gbe: GatherBatchEvidence
+  eqc: EvidenceQC
+  t: TrainGCNV
+  gse --> eqc
+  eqc --> t
+  eqc --> gbe
+  t --> gbe
+
+  class eqc thisModule
+  class gse inModules
+  class t, gbe outModules
+```
+
+<br/>
 
-- [Gather Sample Evidence](./gse)
 
 ### Inputs
 

diff --git a/website/docs/modules/gather_batch_evidence.md b/website/docs/modules/gather_batch_evidence.md
@@ -5,25 +5,205 @@ sidebar_position: 4
 slug: gbe
 ---
 
-Runs CNV callers (cnMOPs, GATK gCNV) and combines single-sample 
-raw evidence into a batch. See above for more information on batching.
+Runs CNV callers ([cn.MOPS](https://academic.oup.com/nar/article/40/9/e69/1136601), GATK gCNV) 
+and combines single-sample raw evidence into a batch.
 
-### Prerequisites
 
-- GatherSampleEvidence
-- (Recommended) EvidenceQC
-- gCNV training. 
+```mermaid
 
-### Inputs
-- PED file (updated with EvidenceQC sex assignments, including sex = 0 
-  for sex aneuploidies. Calls will not be made on sex chromosomes 
-  when sex = 0 in order to avoid generating many confusing calls 
-  or upsetting normalized copy numbers for the batch.)
-- Read count, BAF, PE, SD, and SR files (GatherSampleEvidence)
-- Caller VCFs (GatherSampleEvidence)
-- Contig ploidy model and gCNV model files (gCNV training)
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
 
-### Outputs
+  gse: GatherSampleEvidence
+  eqc: EvidenceQC
+  gcnv: TrainGCNV
+  gbe: GatherBatchEvidence
+  cbe: ClusterBatch
+  gse --> gbe
+  eqc --> gbe
+  gcnv --> gbe
+  gbe --> cbe
+
+  class gbe thisModule
+  class gse, eqc, gcnv inModules
+  class cbe outModules
+```
+
+## Inputs
+This workflow takes as input the read count, BAF, PE, SD, SR, and per-caller VCF files 
+produced by the GatherSampleEvidence workflow, and the contig ploidy and gCNV models produced by 
+the TrainGCNV workflow.
+The inputs of the GatherBatchEvidence workflow are listed below.
+
+
+#### `batch`
+An identifier for the batch.
+
+
+#### `samples`
+Sets the list of sample IDs. 
+
+
+#### `counts`
+Set to the [`GatherSampleEvidence.coverage_counts`](./gse#coverage-counts) output.
+
+
+#### Raw calls
+
+The following inputs set the per-caller raw SV calls and should be set 
+for each caller that was run in the [`GatherSampleEvidence`](./gse) workflow.
+You may set each input to the linked output of 
+the GatherSampleEvidence workflow.
+
+
+- `manta_vcfs`: [`GatherSampleEvidence.manta_vcf`](./gse#manta-vcf);
+- `melt_vcfs`: [`GatherSampleEvidence.melt_vcf`](./gse#melt-vcf);
+- `scramble_vcfs`: [`GatherSampleEvidence.scramble_vcf`](./gse#scramble-vcf);
+- `wham_vcfs`: [`GatherSampleEvidence.wham_vcf`](./gse#wham-vcf).
+
+#### `PE_files`
+Set to the [`GatherSampleEvidence.pesr_disc`](./gse#pesr-disc) output.
+
+#### `SR_files`
+Set to the [`GatherSampleEvidence.pesr_split`](./gse#pesr-split) output.
+
+
+#### `SD_files`
+Set to the [`GatherSampleEvidence.pesr_sd`](./gse#pesr-sd) output.
+
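+The following is a minimal sketch of how the evidence inputs above might be set for a hypothetical two-sample batch.
+Keys are shown without a workflow-name prefix, as in the other fragments on this page.
+All `gs://my-bucket/...` paths, file names, and sample IDs are hypothetical placeholders.
+Each array is assumed to list one file per sample, in the same order as `samples`.
+
+```json
+{
+  "batch": "batch_1",
+  "samples": ["sample_1", "sample_2"],
+  "counts": ["gs://my-bucket/sample_1.counts.tsv.gz", "gs://my-bucket/sample_2.counts.tsv.gz"],
+  "manta_vcfs": ["gs://my-bucket/sample_1.manta.vcf.gz", "gs://my-bucket/sample_2.manta.vcf.gz"],
+  "PE_files": ["gs://my-bucket/sample_1.pe.txt.gz", "gs://my-bucket/sample_2.pe.txt.gz"],
+  "SR_files": ["gs://my-bucket/sample_1.sr.txt.gz", "gs://my-bucket/sample_2.sr.txt.gz"],
+  "SD_files": ["gs://my-bucket/sample_1.sd.txt.gz", "gs://my-bucket/sample_2.sd.txt.gz"]
+}
+```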
+
+#### `matrix_qc_distance`
+You may set it to `1000000`.
+
+
+#### `min_svsize`
+Sets the minimum size of SVs to include. 
+You may set it to `50`. 
+
+
+#### `ped_file`
+A pedigree file describing the familial relationships between the samples in the cohort.
+The file must be in 
+[PED format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format)
+and should be updated with the [EvidenceQC](./eqc) sex assignments, including 
+`sex = 0` for sex aneuploidies. Calls will not be made on sex chromosomes 
+when `sex = 0` in order to avoid generating many confusing calls 
+or upsetting normalized copy numbers for the batch.
+
+
+#### `run_matrix_qc`
+Enables or disables running optional QC tasks. 
+
+
+#### `gcnv_qs_cutoff`
+You may set the value of this input to `30`.
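+
+The scalar and PED inputs described above could be set as in the following fragment.
+The numeric values are the ones suggested above. The `ped_file` path is a hypothetical placeholder, 
+and `run_matrix_qc` is assumed here to be a Boolean.
+
+```json
+{
+  "ped_file": "gs://my-bucket/cohort.ped",
+  "matrix_qc_distance": 1000000,
+  "min_svsize": 50,
+  "run_matrix_qc": true,
+  "gcnv_qs_cutoff": 30
+}
+```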
+
+#### cn.MOPS files
+The workflow needs the following cn.MOPS files.
+
+- `cnmops_chrom_file` and `cnmops_allo_file`: FASTA index files (`.fai`) for the non-sex chromosomes (autosomes) and for chromosomes X and Y (allosomes), respectively. 
+  For example, the contents of the allosome file may look like the following;
+  the format is explained [on this page](https://www.htslib.org/doc/faidx.html).
+
+  ```
+  chrX    156040895       2903754205      100     101
+  chrY    57227415        3061355656      100     101
+  ```
+
+  You may use the following files for these fields:
+
+  ```json
+  "cnmops_chrom_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/autosome.fai",
+  "cnmops_allo_file": "gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/allosome.fai"
+  ```
+
+- `cnmops_exclude_list`: You may use the following file for this field.
+  ```
+  gs://gcp-public-data--broad-references/hg38/v0/sv-resources/resources/v1/GRCh38_Nmask.bed
+  ```
+
+#### GATK-gCNV inputs
+
+The following inputs are configured based on the outputs generated in the [`TrainGCNV`](./gcnv) workflow.
+
+- `contig_ploidy_model_tar`: [`TrainGCNV.cohort_contig_ploidy_model_tar`](./gcnv#contig-ploidy-model-tarball)
+- `gcnv_model_tars`: [`TrainGCNV.cohort_gcnv_model_tars`](./gcnv#model-tarballs)
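+
+For example, if the TrainGCNV outputs were copied to a bucket, the two inputs might look like the following.
+The paths are hypothetical. `gcnv_model_tars` is an array, assumed here to hold one model tarball 
+per interval shard produced by TrainGCNV.
+
+```json
+{
+  "contig_ploidy_model_tar": "gs://my-bucket/cohort-contig-ploidy-model.tar.gz",
+  "gcnv_model_tars": [
+    "gs://my-bucket/cohort-gcnv-model-shard-0.tar.gz",
+    "gs://my-bucket/cohort-gcnv-model-shard-1.tar.gz"
+  ]
+}
+```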
+
+
+The workflow also exposes several optional gCNV arguments.
+Their default values are listed below. Each argument is documented on the GATK 
+[PostprocessGermlineCNVCalls](https://gatk.broadinstitute.org/hc/en-us/articles/360037593411-PostprocessGermlineCNVCalls)
+and
+[GermlineCNVCaller](https://gatk.broadinstitute.org/hc/en-us/articles/360047217671-GermlineCNVCaller)
+tool pages.
+
+```json
+"gcnv_caller_internal_admixing_rate": 0.5,
+"gcnv_caller_update_convergence_threshold": 0.000001,
+"gcnv_cnv_coherence_length": 1000,
+"gcnv_convergence_snr_averaging_window": 100,
+"gcnv_convergence_snr_countdown_window": 10,
+"gcnv_convergence_snr_trigger_threshold": 0.2,
+"gcnv_copy_number_posterior_expectation_mode": "EXACT",
+"gcnv_depth_correction_tau": 10000,
+"gcnv_learning_rate": 0.03,
+"gcnv_log_emission_sampling_median_rel_error": 0.001,
+"gcnv_log_emission_sampling_rounds": 20,
+"gcnv_max_advi_iter_first_epoch": 1000,
+"gcnv_max_advi_iter_subsequent_epochs": 200,
+"gcnv_max_training_epochs": 5,
+"gcnv_min_training_epochs": 1,
+"gcnv_num_thermal_advi_iters": 250,
+"gcnv_p_alt": 0.000001,
+"gcnv_sample_psi_scale": 0.000001,
+"ref_copy_number_autosomal_contigs": 2
+```
+
+
+#### Docker images
+
+The workflow needs the following Docker images. Links to their latest versions are listed in 
+[this file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/dockers.json).
+
+  - `cnmops_docker`;
+  - `condense_counts_docker`;
+  - `linux_docker`;
+  - `sv_base_docker`;
+  - `sv_base_mini_docker`;
+  - `sv_pipeline_docker`;
+  - `sv_pipeline_qc_docker`;
+  - `gcnv_gatk_docker`;
+  - `gatk_docker`.
+
+#### Static inputs
+
+You may refer to [this reference file](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/resources_hg38.json)
+for values of the following inputs.
+
+ - `primary_contigs_fai`;
+ - `cytoband`;
+ - `ref_dict`;
+ - `mei_bed`;
+ - `genome_file`;
+ - `sd_locs_vcf`.
+
+
+#### Optional inputs
+The following are a few optional inputs of the 
+workflow, with example values. 
+
+- `"allosomal_contigs": [["chrX", "chrY"]]`
+- `"ploidy_sample_psi_scale": 0.001`
+
+## Outputs
 
 - Combined read count matrix, SR, PE, and BAF files
 - Standardized call VCFs

diff --git a/website/docs/modules/gather_sample_evidence.md b/website/docs/modules/gather_sample_evidence.md
@@ -6,20 +6,78 @@ slug: gse
 ---
 
 Runs raw evidence collection on each sample with the following SV callers: 
-Manta, Wham, and/or MELT. For guidance on pre-filtering prior to GatherSampleEvidence, 
+Manta, Wham, Scramble, and/or MELT. For guidance on pre-filtering prior to GatherSampleEvidence, 
 refer to the Sample Exclusion section.
 
-Note: a list of sample IDs must be provided. Refer to the sample ID 
-requirements for specifications of allowable sample IDs. 
+The downstream dependencies of the GatherSampleEvidence workflow 
+are illustrated in the following diagram.
+
+```mermaid
+
+stateDiagram
+  direction LR
+
+  classDef inModules stroke-width:0px,fill:#00509d,color:#caf0f8
+  classDef thisModule font-weight:bold,stroke-width:0px,fill:#ff9900,color:white
+  classDef outModules stroke-width:0px,fill:#caf0f8,color:#00509d
+
+  gse: GatherSampleEvidence
+  eqc: EvidenceQC
+  gcnv: TrainGCNV
+  gbe: GatherBatchEvidence
+  gse --> eqc
+  gse --> gcnv
+  gse --> gbe
+
+  class gse thisModule
+  class eqc, gcnv, gbe outModules
+```
+
+
+## Inputs
+
+#### `bam_or_cram_file`
+A BAM or CRAM file aligned to hg38. An index file (`.bai`) must be provided if using a BAM file.
+
+#### `sample_id`
+Refer to the [sample ID requirements](/docs/gs/inputs#sampleids) for specifications of allowable sample IDs. 
 IDs that do not meet these requirements may cause errors.
 
-### Inputs
+#### `preprocessed_intervals`
+Picard interval list.
+
+#### `sd_locs_vcf`
+(`sd`: site depth) 
+A VCF file of common SNP loci in the genome, at which allele counts are collected for calculating BAF.  
+For the human genome, you may use [`dbSNP`](https://www.ncbi.nlm.nih.gov/snp/), 
+which catalogs common and clinical human single nucleotide variations, 
+microsatellites, and small-scale insertions and deletions. 
+You may find a link to the file in 
+[this reference](https://github.com/broadinstitute/gatk-sv/blob/main/inputs/values/resources_hg38.json).
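+
+The following is a minimal sketch of the four inputs described above for one hypothetical sample.
+All paths and the sample ID are placeholders; replace the `sd_locs_vcf` value with the file listed in the reference above.
+The workflow takes other inputs (for example, reference files and Docker images) that are not shown here.
+
+```json
+{
+  "sample_id": "sample_1",
+  "bam_or_cram_file": "gs://my-bucket/sample_1.cram",
+  "preprocessed_intervals": "gs://my-bucket/preprocessed_intervals.interval_list",
+  "sd_locs_vcf": "gs://my-bucket/common_snps.vcf.gz"
+}
+```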
 
-- Per-sample BAM or CRAM files aligned to hg38. Index files (.bai) must be provided if using BAMs.
 
-### Outputs
+## Outputs
 
-- Caller VCFs (Manta, MELT, and/or Wham)
 - Binned read counts file
 - Split reads (SR) file
 - Discordant read pairs (PE) file
+
+#### `manta_vcf` {#manta-vcf}
+A VCF file containing variants called by Manta. 
+
+#### `melt_vcf` {#melt-vcf}
+A VCF file containing variants called by MELT. 
+
+#### `scramble_vcf` {#scramble-vcf}
+A VCF file containing variants called by Scramble. 
+
+#### `wham_vcf` {#wham-vcf}
+A VCF file containing variants called by Wham. 
+
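+#### `coverage_counts` {#coverage-counts}
+Binned read counts file.
+
+#### `pesr_disc` {#pesr-disc}
+Discordant read pairs (PE) file.
+
+#### `pesr_split` {#pesr-split}
+Split reads (SR) file.
+
+#### `pesr_sd` {#pesr-sd}
+Site depth (SD) file.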