Update GatherSampleEvidence & TrainGCNV docs #681

VJalili · 2024-05-29T17:41:31Z

This PR extends the documentation of the GatherSampleEvidence and TrainGCNV workflows. Specifically, it implements the following changes.

- Extends the documentation on the inputs and outputs of the GatherSampleEvidence, GatherBatchEvidence, and TrainGCNV workflows.
- Formats inputs and outputs as sections instead of listing them. This enables referencing them individually, which is useful when an output of one workflow is passed as input to another. The header level is set to 4 (i.e., ####), so these sections are not rendered in the quick navigation section (top-right panel), which reduces clutter.
- Adds diagrams to better visualize the upstream and downstream dependencies of workflows, currently implemented for the GatherSampleEvidence, GatherBatchEvidence, EvidenceQC, and TrainGCNV workflows.
- Adds JS and CSS components to enable highlighting text, which this PR uses to tag optional outputs.


mwalker174

Thanks @VJalili for getting these started. I have a few comments below. In general, IMO the documentation can describe each input but should not provide actual values, and we should refer the reader to /inputs for values of specific parameters to cut down on redundancy and avoid inconsistencies.


mwalker174 · 2024-09-11T14:40:19Z

website/docs/modules/gather_sample_evidence.md

+You may download the file from the following link.

- Per-sample BAM or CRAM files aligned to hg38. Index files (.bai) must be provided if using BAMs.
+```shell
+gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
+```


This seems redundant since all resource files are available in /inputs/values and also feels inconsistent to list it for just this input.

I agree; this would also mean keeping references up to date in several places. I changed the link to refer to the reference file; would this be better?


mwalker174 · 2024-09-11T14:43:40Z

website/docs/modules/train_gcnv.md

+
+  gse: GatherSampleEvidence
+  eqc: EvidenceQC


While we do usually run EvidenceQC prior to gCNV for outlier detection and batching, it isn't strictly required. Do you want these diagrams to reflect our recommended process or strictly input/output relationships?

I was aiming for workflow dependency, loosely reflecting and replacing the Prerequisites list we have. I am happy to change it to visualize the recommended execution order if that makes more sense.

Just chiming in that if you decide you want to represent execution order, I think we can just use the overall pipeline diagram for that and additional workflow-specific diagrams aren't necessary. If you want to show dependencies, that would add new information, but I do think it could be confusing when those dependencies don't match the execution order...

This replicates the Prerequisites section we had, which apparently lists the workflows' dependencies. I will change it to show the recommended execution order, as module dependency is more code-related than useful to the end user. We can add a link to the overall pipeline diagram, and these per-module diagrams remain informative because they show what is upstream and downstream of each module, which helps in navigating the pipeline.
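For illustration, the recommended execution order discussed in this thread could be sketched as follows. This is a sketch, not the diagram merged in this PR: the `stateDiagram-v2` type and the `direction LR` layout are assumptions, the `gbe` label is inferred from the workflow name, and the remaining node IDs and labels come from the diff excerpts in this conversation.

```mermaid
%% Sketch only: diagram type and layout direction are assumptions.
stateDiagram-v2
  direction LR
  %% Node IDs follow the diff excerpts; the gbe label is inferred.
  gse: GatherSampleEvidence
  eqc: EvidenceQC
  t: TrainGCNV
  gbe: GatherBatchEvidence
  %% Recommended order: EvidenceQC is run before TrainGCNV for
  %% outlier detection and batching, although it is not strictly required.
  gse --> eqc
  eqc --> t
  t --> gbe
```

The arrows here express the order in which the workflows are usually run, not strict input/output dependencies, which is the distinction the reviewers raise above.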


VJalili · 2024-09-13T19:32:07Z

Thank you, @mwalker174, for the review. I like your idea of scoping documentation to descriptions of arguments and keeping references to file links or recommended/default values in separate places. I updated the PR accordingly.


epiercehoffman

I am really looking forward to having the website fully set up as a one-stop shop for all our documentation. I have some questions about how we want to approach documenting inputs/outputs and dependencies, as well as some more detailed comments, questions, and corrections.

epiercehoffman · 2024-09-12T21:30:01Z

website/docs/modules/evidence_qc.md

+  t: TrainGCNV
+  gse --> eqc
+  eqc --> t
+  eqc --> gbe


Suggested change

-  eqc --> gbe
+  t --> gbe

This is related to the existing thread on displaying dependencies. None of the outputs of EvidenceQC are used in GatherBatchEvidence (or, technically, in TrainGCNV). Using EvidenceQC is recommended for creating batches for TrainGCNV (and the following steps), but if that is what you wanted to represent, I would just exclude GatherBatchEvidence from this diagram since it follows TrainGCNV.

This is resolved in the updated diagrams; please recheck.


epiercehoffman · 2024-09-12T21:41:50Z

website/docs/modules/gather_batch_evidence.md

+The following is the list of the inputs the GatherBatchEvidence workflow takes.
+
+
+#### `batch`


Is the plan to have detailed documentation for every input like this? Is that necessary?

Maybe it could be collapsible so it's more approachable for users who do not need that level of detail? Most users will just use the pre-configured default inputs and will only need detailed documentation on the pipeline-level inputs and outputs, and I wouldn't want to make it more difficult for them to navigate the documentation.

One other thing to consider is there are places where we do want users to be able to edit inputs as necessary, and I wouldn't want those inputs to get lost among the others - a separate category that does not collapse maybe?

The plan is to document every required input of these modules.

We have discussed a few options for required inputs that do not have values set on Terra, or whose set values need to be adjusted or tweaked from cohort to cohort. One option is tagging/labeling such inputs (similar to labeling optional/conditional outputs), and we can think of other alternatives. However, that is beyond the scope of this PR: here we are just documenting all the required inputs (or at least leaving a placeholder for them), and we will revisit how to spotlight them later.


epiercehoffman · 2024-09-13T20:32:29Z

website/docs/modules/gather_sample_evidence.md

+#### `wham_vcf` {#wham-vcf}
+A VCF file containing variants called by Wham. 
+
+#### `coverage_counts` {#coverage-counts}


Are there supposed to be descriptions here? It feels inconsistent with the other sections.

I don't have a description of these. We discussed leaving them as placeholders to make sure we will populate them. If you have a description, feel free to suggest one.


VJalili · 2024-09-19T16:10:08Z

@mwalker174 this is ready for a re-review. Per our discussion, a few inputs/outputs have not been documented and are left as placeholders for your follow-up PR 🚀 Thanks for your feedback!!

mwalker174

Thanks, let's fix the warning message I sent offline and then get this in.

VJalili · 2024-09-25T19:52:24Z

Thank you, @mwalker174 and Emma, for the feedback! Looking forward to the further improvements coming next to the modules documentation.

VJalili added 5 commits May 13, 2024 08:54

Extend docs.

8b6931c

Merge remote-tracking branch 'upstream/main' into docs_gather_sample_…

cdbe237

…evidence

Add Scramble to GSE.

1fcfe12

Update TrainGCNV docs.

19a824a

Merge remote-tracking branch 'upstream/main' into docs_gather_sample_…

d5a3c98

…evidence

VJalili added the documentation Improvements or additions to documentation label May 29, 2024

VJalili added 5 commits June 6, 2024 12:00

Document the annotated_intervals output.

71de496

Add an option to highlight text.

6032dcf

Extend docs on inputs and outputs of workflows.

a0f0ced

Fix typo & add diagram for gather sample evidence.

a06c18b

Update header level to match inputs section.

00076ec

VJalili marked this pull request as ready for review August 2, 2024 17:18

VJalili requested a review from mwalker174 August 2, 2024 17:18

mwalker174 reviewed Sep 11, 2024


VJalili and others added 7 commits September 13, 2024 10:30

Update website/docs/modules/gather_sample_evidence.md

dfb0a72

Co-authored-by: Mark Walker <[email protected]>

Update website/docs/modules/gather_sample_evidence.md

c7f9795

Co-authored-by: Mark Walker <[email protected]>

Update website/docs/modules/train_gcnv.md

e1862b3

Co-authored-by: Mark Walker <[email protected]>

Replace direct link with a reference to the resources file.

fd129b3

Update website/docs/modules/train_gcnv.md

2926a89

Co-authored-by: Mark Walker <[email protected]>

Replace direct links with references to the resources file.

7d4b503

Separate gatk-sv input, & add additional external docs link.

2ab676e

VJalili added 2 commits September 13, 2024 15:34

Remove links.

fdb33ab

Add a single-line descript to avoid empty section. Needs to be extend…

b9dfb39

…ed in follow-up PRs.

epiercehoffman reviewed Sep 13, 2024


VJalili added 5 commits September 18, 2024 11:33

update diagrams to display recommended invocation order.

490c739

add a common inputs section & remove some values.

53df818

make plural

fb0bf8f

update.

d291c6c

update link

56dfaa2

clarify homogeneous

2e2a4dc

VJalili requested a review from mwalker174 September 19, 2024 16:10

mwalker174 approved these changes Sep 25, 2024


Fix a broken link.

b782169

VJalili merged commit 83e9464 into broadinstitute:main Sep 25, 2024
3 checks passed

VJalili deleted the docs_gather_sample_evidence branch September 25, 2024 19:52
