-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update GatherSampleEvidence & TrainGCNV docs #681
Update GatherSampleEvidence & TrainGCNV docs #681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @VJalili for getting these started. I have a few comments below. In general, IMO the documentation can describe each input but should not provide actual values, and we should refer the reader to /inputs
for values of specific parameters to cut down on redundancy and avoid inconsistencies.
You may download the file from the following link. | ||
|
||
- Per-sample BAM or CRAM files aligned to hg38. Index files (.bai) must be provided if using BAMs. | ||
```shell | ||
gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems redundant since all resource files are available in /inputs/values
and also feels inconsistent to list it for just this input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; this would also lead to having to keep updated references in various places. I changed the link to refer to the reference file, would this be better?
website/docs/modules/train_gcnv.md
Outdated
|
||
gse: GatherSampleEvidence | ||
eqc: EvidenceQC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we do usually run EvidenceQC prior to gCNV for outlier detection and batching, it isn't strictly required. Do you want these diagrams to reflect our recommended process or strictly input/output relationships?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was aiming for workflow dependency, loosely reflecting and replacing the Prerequisites
list we have. I am happy to change it to visualize the recommended execution order if that makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just chiming in that if you decide you want to represent execution order, I think we can just use the overall pipeline diagram for that and additional workflow-specific diagrams aren't necessary. If you want to show dependencies, that would add new information, but I do think it could be confusing when those dependencies don't match the execution order...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is replicating the Prerequisites
section we had, and apparently, it is their dependencies. I will change it for the recommended execution order, as module dependency is more code-related than useful to the end user. We can add a link to the diagram for the overall picture, and it is informative to have these diagrams to show what is at the upstream and downstream of each module to better navigate in the pipeline.
Co-authored-by: Mark Walker <[email protected]>
Co-authored-by: Mark Walker <[email protected]>
Co-authored-by: Mark Walker <[email protected]>
Co-authored-by: Mark Walker <[email protected]>
Thank you, @mwalker174, for the review. I like your idea of scoping documentation to descriptions of arguments and keeping references to file links or recommended/default values in separate places. I updated the PR accordingly. |
…ed in follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really looking forward to having the website fully set up as a one-stop-shop for all our documentation. I have some questions about how we want to approach documenting inputs/outputs and dependencies, as well as some more detailed comments/questions/corrections
website/docs/modules/evidence_qc.md
Outdated
t: TrainGCNV | ||
gse --> eqc | ||
eqc --> t | ||
eqc --> gbe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eqc --> gbe | |
t --> gbe |
This is related to the existing thread on displaying dependencies. None of the outputs of EvidenceQC are used in GatherBatchEvidence (or TrainGCNV technically). EvidenceQC is recommended to use to create batches for TrainGCNV (and the following steps) but if that's what you wanted to represent I would just exclude GatherBatchEvidence from this diagram since it follows TrainGCNV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved in the updated diagrams; please recheck.
The following is the list of the inputs the GatherBatchEvidence workflow takes. | ||
|
||
|
||
#### `batch` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to have detailed documentation for every input like this? Is that necessary?
Maybe it could be collapsible so it's more approachable for users who do not need that level of detail? Most users will just use the pre-configured default inputs and will only need detailed documentation on the pipeline-level inputs and outputs, and I wouldn't want to make it more difficult for them to navigate the documentation.
One other thing to consider is there are places where we do want users to be able to edit inputs as necessary, and I wouldn't want those inputs to get lost among the others - a separate category that does not collapse maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is to document every required input of these modules.
We have discussed a few options for those required inputs that do not have values set on Terra, or set values need to be adjusted, or set values need tweaking for cohort-to-cohort, etc. One of the options is tagging/labeling such inputs (similar to labeling optional/conditional outputs) and we can think of other alternatives. However, that is beyond the scope of this PR as here we are just documenting all the required (at least leaving a placeholder for them), and we will revisit their spotlighting later.
#### `wham_vcf` {#wham-vcf} | ||
A VCF file containing variants called by Wham. | ||
|
||
#### `coverage_counts` {#coverage-counts} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there supposed to be descriptions here? Feels inconsistent with the other sections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a description of these. We discussed leaving them as placeholders to make sure we will populate them. If you have a description, feel free to suggest one.
@mwalker174 this is ready for a re-review. Per our discussion, a few inputs/outputs have not been documented and are left as placeholders for your follow-up PR 🚀 Thanks for your feedback!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks let's fix the warning message I sent offline and then get this in.
Thank you, @mwalker174! and Emma! for the feedback. Looking forward to further improvements coming next on the modules documentation. |
This PR extends the documentation of the GatherSampleEvidence and TrainGCNV workflows. Specifically, it implements the following changes.
GatherSampleEvidence
,GatherBatchEvidence
, andTrainGCNV
workflows.####
); hence, these sections are not rendered in the quick navigation section (top-right panel) to reduce clutter.GatherSampleEvidence
,GatherBatchEvidence
,EvidenceQC
, andTrainGCNV
workflows.