From 41b1f1c02c323612b21f6883c39d8b4bb573f4c7 Mon Sep 17 00:00:00 2001
From: Vahid
Date: Wed, 12 Jun 2024 16:09:18 -0400
Subject: [PATCH] Update docs on building inputs (#685)

* Gather all the docs on building inputs under a common section.

* Update formatting.

* Fix mermaid version to avoid conflicts.

* Update website/docs/advanced/build_inputs.md

Co-authored-by: Mark Walker

* Update website/docs/advanced/build_inputs.md

Co-authored-by: Mark Walker

* Add a link to cromwell configuration file.

---------

Co-authored-by: Mark Walker
---
 website/docs/advanced/build_inputs.md | 139 ++++++++++++++++++++++++++
 website/docs/gs/input_files.md        |  38 -------
 website/docs/gs/quick_start.md        |  19 +---
 3 files changed, 142 insertions(+), 54 deletions(-)
 create mode 100644 website/docs/advanced/build_inputs.md

diff --git a/website/docs/advanced/build_inputs.md b/website/docs/advanced/build_inputs.md
new file mode 100644
index 000000000..b986919c0
--- /dev/null
+++ b/website/docs/advanced/build_inputs.md
@@ -0,0 +1,139 @@
+---
+title: Building inputs
+description: Building workflow input JSON files
+sidebar_position: 1
+slug: build_inputs
+---
+
+Each workflow of the GATK-SV pipeline takes a unique set of arguments as inputs.
+You have different options for configuring them depending on the platform you're
+using to run the pipeline.
+For instance, you may use Terra workspaces if you're running on Terra (user-friendly),
+or JSON files if you're running on Cromwell (for development and advanced use-cases).
+For each workflow, we provide example configurations that help both in setting up
+your own Terra workspace and in testing with sample data.
+You may run the following commands to get these example inputs.
+
+
+1. Clone GATK-SV (you may skip this step if you have already done so).
+
+   ```shell
+   git clone https://github.com/broadinstitute/gatk-sv && cd gatk-sv
+   ```
+
+2. Create a JSON file containing the Terra billing project (for use on Terra)
+   or the Google project ID (for use on Cromwell) that you will use to run
+   the workflows with the test input. You may create this file by running
+   the following command and replacing `"my-google-project-id"` and
+   `"my-terra-billing-project"` with your project and billing IDs.
+
+   ```shell
+   echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
+   ```
+
+3. Create test inputs.
+
+   ```shell
+   bash scripts/inputs/build_default_inputs.sh -d . -c google_cloud.my_project
+   ```
+
+   Running this command generates test inputs in `gatk-sv/inputs/build` with the following structure.
+
+   ```shell
+   inputs/build
+   ├── NA12878
+   │   ├── terra
+   │   └── test
+   ├── NA19240
+   │   └── test
+   ├── hgdp
+   │   └── test
+   └── ref_panel_1kg
+       ├── terra
+       └── test
+   ```
+
+## Building inputs for specific use-cases (Advanced)
+
+### Build for batched workflows
+
+```shell
+python scripts/inputs/build_inputs.py \
+    inputs/values \
+    inputs/templates/test/GATKSVPipelineSingleSample \
+    inputs/build/NA19240/test \
+    -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
+```
+
+
+### Generating a reference panel
+
+This section only applies to the single-sample mode.
+New reference panels can be generated from a single run of the
+`GATKSVPipelineBatch` workflow.
+If using a Cromwell server, we recommend copying the outputs to a
+permanent location by adding the following option to the
+[workflow configuration](https://cromwell.readthedocs.io/en/latest/wf_options/Overview/)
+file:
+
+```json
+"final_workflow_outputs_dir" : "gs://my-outputs-bucket",
+"use_relative_output_paths": false,
+```
+
+Here is an example of how to generate workflow input JSONs from `GATKSVPipelineBatch` workflow metadata:
+
+1. Get metadata from cromshell.
+
+   ```shell
+   cromshell -t60 metadata 38c65ca4-2a07-4805-86b6-214696075fef > metadata.json
+   ```
+
+2. Run the script.
+
+   ```shell
+   python scripts/inputs/create_test_batch.py \
+       --execution-bucket gs://my-exec-bucket \
+       --final-workflow-outputs-dir gs://my-outputs-bucket \
+       metadata.json \
+       > inputs/values/my_ref_panel.json
+   ```
+
+3. Define your Google project ID (for Cromwell inputs) and Terra billing project (for workspace inputs).
+
+   ```shell
+   echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
+   ```
+
+4. Build test files for batched workflows (Google Cloud project ID required).
+
+   ```shell
+   python scripts/inputs/build_inputs.py \
+       inputs/values \
+       inputs/templates/test \
+       inputs/build/my_ref_panel/test \
+       -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
+   ```
+
+5. Build test files for the single-sample workflow.
+
+   ```shell
+   python scripts/inputs/build_inputs.py \
+       inputs/values \
+       inputs/templates/test/GATKSVPipelineSingleSample \
+       inputs/build/NA19240/test_my_ref_panel \
+       -a '{ "single_sample" : "test_single_sample_NA19240", "ref_panel" : "my_ref_panel" }'
+   ```
+
+6. Build files for a Terra workspace.
+
+   ```shell
+   python scripts/inputs/build_inputs.py \
+       inputs/values \
+       inputs/templates/terra_workspaces/single_sample \
+       inputs/build/NA12878/terra_my_ref_panel \
+       -a '{ "single_sample" : "test_single_sample_NA12878", "ref_panel" : "my_ref_panel" }'
+   ```
+
+Note that the inputs to `GATKSVPipelineBatch` may be used as resources
+for the reference panel and therefore should also be in a permanent location.
diff --git a/website/docs/gs/input_files.md b/website/docs/gs/input_files.md
index 423be54ec..38e143869 100644
--- a/website/docs/gs/input_files.md
+++ b/website/docs/gs/input_files.md
@@ -60,41 +60,3 @@ The following inputs will need to be updated with the transformed sample IDs:
 - Sample ID list for [GatherSampleEvidence](/docs/modules/gse) or [GatherBatchEvidence](/docs/modules/gbe)
 - PED file
 
-
-
-### Generating a reference panel (single-sample mode only)
-New reference panels can be generated easily from a single run of the `GATKSVPipelineBatch` workflow. If using a Cromwell server, we recommend copying the outputs to a permanent location by adding the following option to the workflow configuration file:
-```
-  "final_workflow_outputs_dir" : "gs://my-outputs-bucket",
-  "use_relative_output_paths": false,
-```
-Here is an example of how to generate workflow input jsons from `GATKSVPipelineBatch` workflow metadata:
-```
-> cromshell -t60 metadata 38c65ca4-2a07-4805-86b6-214696075fef > metadata.json
-> python scripts/inputs/create_test_batch.py \
-    --execution-bucket gs://my-exec-bucket \
-    --final-workflow-outputs-dir gs://my-outputs-bucket \
-    metadata.json \
-    > inputs/values/my_ref_panel.json
-> # Define your google project id (for Cromwell inputs) and Terra billing project (for workspace inputs)
-> echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
-> # Build test files for batched workflows (google cloud project id required)
-> python scripts/inputs/build_inputs.py \
-    inputs/values \
-    inputs/templates/test \
-    inputs/build/my_ref_panel/test \
-    -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
-> # Build test files for the single-sample workflow
-> python scripts/inputs/build_inputs.py \
-    inputs/values \
-    inputs/templates/test/GATKSVPipelineSingleSample \
-    inputs/build/NA19240/test_my_ref_panel \
-    -a '{ "single_sample" : "test_single_sample_NA19240", "ref_panel" : "my_ref_panel" }'
-> # Build files for a Terra workspace
-> python scripts/inputs/build_inputs.py \
-    inputs/values \
-    inputs/templates/terra_workspaces/single_sample \
-    inputs/build/NA12878/terra_my_ref_panel \
-    -a '{ "single_sample" : "test_single_sample_NA12878", "ref_panel" : "my_ref_panel" }'
-```
-Note that the inputs to `GATKSVPipelineBatch` may be used as resources for the reference panel and therefore should also be in a permanent location.
diff --git a/website/docs/gs/quick_start.md b/website/docs/gs/quick_start.md
index 35b27cbcb..282b8e72e 100644
--- a/website/docs/gs/quick_start.md
+++ b/website/docs/gs/quick_start.md
@@ -21,22 +21,9 @@ demo data on a managed Cromwell server.
 
 ### Build Inputs
 
-- Example workflow inputs can be found in `/inputs`.
-  Build using `scripts/inputs/build_default_inputs.sh`,
-  which generates input jsons in `/inputs/build`.
-
-- Some workflows require a Google Cloud Project ID to be defined in
-  a cloud environment parameter group. Workspace builds require a
-  Terra billing project ID as well. An example is provided at
-  `/inputs/values/google_cloud.json` but should not be used,
-  as modifying this file will cause tracked changes in the repository.
-  Instead, create a copy in the same directory with the format
-  `google_cloud.my_project.json` and modify as necessary.
-
-  Note that these inputs are required only when certain data are
-  located in requester pays buckets. If this does not apply,
-  users may use placeholder values for the cloud configuration
-  and simply delete the inputs manually.
+We provide options for building example inputs that you may use as a reference
+to configure a Terra workspace or Cromwell submissions (advanced) with your own data.
+Please refer to [this page](/docs/advanced/build_inputs) for instructions on how to build these inputs.
 
 ### MELT
 Important: The example input files contain MELT inputs that are NOT public