Update docs with version number.
williamrowell committed Feb 11, 2025
1 parent f0bbecb commit 5fc0307
Showing 3 changed files with 158 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -24,18 +24,18 @@ Both workflows are designed to analyze human PacBio whole genome sequencing (WGS

This is an actively developed workflow with multiple versioned releases, and we make use of git submodules for common tasks that are shared by multiple workflows. There are two ways to ensure you are using a supported release of the workflow and ensure that the submodules are correctly initialized:

- 1) Download the release zips directly from a [supported release](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/tag/v2.1.0):
+ 1) Download the release zips directly from a [supported release](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/tag/v2.1.1):

```bash
- wget https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/download/v2.1.0/hifi-human-wgs-singleton.zip
- wget https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/download/v2.1.0/hifi-human-wgs-family.zip
+ wget https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/download/v2.1.1/hifi-human-wgs-singleton.zip
+ wget https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/releases/download/v2.1.1/hifi-human-wgs-family.zip
```

2) Clone the repository and initialize the submodules:

```bash
git clone \
- --depth 1 --branch v2.1.0 \
+ --depth 1 --branch v2.1.1 \
--recursive \
https://github.com/PacificBiosciences/HiFi-human-WGS-WDL.git
```
145 changes: 145 additions & 0 deletions docs/backend-dnanexus.md
@@ -0,0 +1,145 @@
# PacBio Human WGS Variant Pipeline on DNAnexus

The PacBio Human WGS Variant Pipeline is an analysis workflow for PacBio HiFi human whole genome sequencing data, with joint calling for related samples.

Templates and instructions for how to submit the `family` input on the DNAnexus platform are provided in the [Example JSON Documents](#example-json-documents) and [Submitting to DNAnexus](#submitting-to-dnanexus) sections below.

## Inputs

The workflow has the following inputs:

| Type | Name | Description | Notes |
| ---- | ---- | ----------- | ----- |
| Struct | family | Family struct describing samples, relationships, and unaligned BAM paths. | See below for more information |
| String | phenotypes | [Human Phenotype Ontology](https://hpo.jax.org/) (HPO) phenotypes associated with the affected proband. | For example, if the proband has seizures and hypotonia, then the `phenotypes` string might be `"HP:0001250,HP:0001252"`. |
| File | trgt_tandem_repeat_bed | BED file containing repeat coordinates and information about the repeat structure | The default file should be sufficient for most use cases |
| Integer | glnexus_mem_gb | Override GLnexus memory request (GB) | Optional. Should only be specified if GLnexus step fails. |
| Integer | pbsv_call_mem_gb | Override PBSV call memory request (GB) | Optional. Should only be specified if PBSV step fails. |
| Boolean | run_tertiary | Whether to run tertiary analysis for small variants and structural variants | Default: `true` |
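
The `phenotypes` string is a comma-separated list of HPO term IDs. As a minimal sketch, the helper below (the name `build_phenotypes` is hypothetical and not part of the workflow) joins a list of IDs and rejects anything that does not look like an HPO ID, assuming the standard `HP:` prefix followed by seven digits.

```python
import re

# HPO term identifiers are "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def build_phenotypes(terms):
    """Join HPO IDs into the comma-separated string used for `phenotypes`."""
    cleaned = [term.strip() for term in terms]
    bad = [term for term in cleaned if not HPO_ID.fullmatch(term)]
    if bad:
        raise ValueError(f"not valid HPO IDs: {bad}")
    return ",".join(cleaned)


# Seizures and hypotonia, as in the table above.
print(build_phenotypes(["HP:0001250", "HP:0001252"]))
```

The check is purely syntactic: it does not confirm that an ID exists in the ontology.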

## Family Struct Syntax

The `Family` input for the HiFi-human-WGS-WDL workflow is a JSON document that contains the samples for the family. The same struct is used for a single sample or trio, with the single sample case having only one entry in the `samples` array.

### Structs and Field Descriptions

#### Family Struct

The `Family` struct contains the samples for the family. The struct has the following fields:

| Type | Name | Description | Notes |
| ---- | ---- | ----------- | ----- |
| String | family_id | Unique identifier for the family | Alphanumeric characters, periods, dashes, and underscores are allowed. |
| Array\[[Sample](https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/blob/main/workflows/humanwgs_structs.wdl#L3)\] | samples | Array of Sample structs with sample-specific data and metadata. | See [below](#sample-struct) |

#### Sample Struct

A `Sample` struct contains sample specific data and metadata. The struct has the following fields:

| Type | Name | Description | Notes |
| ---- | ---- | ----------- | ----- |
| String | sample_id | Unique identifier for the sample | Alphanumeric characters, periods, dashes, and underscores are allowed. |
| String? | sex | Sample sex<br/>`["MALE", "FEMALE"]` | Optional field used by HiFiCNV and TRGT for genotyping. Allosome karyotype will default to XX unless sex is specified as `"MALE"`. Used for tertiary analysis X-linked inheritance filtering. |
| Boolean | affected | Affected status | If set to `true`, sample is described as being affected by all HPO terms in `phenotypes`.<br/>If set to `false`, sample is described as not being affected by any of the HPO terms in `phenotypes`. |
| Array\[File\] | hifi_reads | Array of [DNAnexus links](https://documentation.dnanexus.com/user/projects/path-resolution#dnanexus-links) for HiFi reads in unaligned BAM format. | |
| String? | father_id | sample_id of father (optional) | |
| String? | mother_id | sample_id of mother (optional) | |
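
The field rules in the two tables above can be checked before a job is submitted. The following is a rough sketch, not part of the workflow: `check_family` is a hypothetical helper, and the rule that `father_id` and `mother_id` must refer to a `sample_id` in the same family is an assumption rather than something the tables state.

```python
import re

# Alphanumeric characters, periods, dashes, and underscores.
ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")
ALLOWED_SEX = {"MALE", "FEMALE"}


def check_family(family):
    """Return a list of problems found in a Family dict (empty if none)."""
    problems = []
    if not ID_PATTERN.fullmatch(str(family.get("family_id") or "")):
        problems.append("family_id is missing or has disallowed characters")
    samples = family.get("samples") or []
    if not samples:
        problems.append("samples must contain at least one Sample")
    ids = [sample.get("sample_id") for sample in samples]
    if len(set(ids)) != len(ids):
        problems.append("sample_id values must be unique")
    for sample in samples:
        sid = sample.get("sample_id")
        if not ID_PATTERN.fullmatch(str(sid or "")):
            problems.append(f"sample_id {sid!r} is missing or has disallowed characters")
        if not isinstance(sample.get("affected"), bool):
            problems.append(f"{sid}: affected must be true or false")
        if not sample.get("hifi_reads"):
            problems.append(f"{sid}: hifi_reads must list at least one file")
        if "sex" in sample and sample["sex"] not in ALLOWED_SEX:
            problems.append(f"{sid}: sex must be MALE or FEMALE when given")
        # Assumption: parents named here are expected to be samples in this family.
        for key in ("father_id", "mother_id"):
            parent = sample.get(key)
            if parent is not None and parent not in ids:
                problems.append(f"{sid}: {key} {parent!r} is not a sample_id in this family")
    return problems


singleton = {
    "family_id": "EXAMPLE-singleton",
    "samples": [
        {
            "sample_id": "EXAMPLE",
            "hifi_reads": [{"$dnanexus_link": {"id": "file-xxxx", "project": "project-xxxx"}}],
            "affected": True,
        }
    ],
}
print(check_family(singleton))  # an empty list means no problems were found
```

Returning a list of messages rather than raising on the first problem lets all issues in a hand-written document be fixed in one pass.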

### Example JSON Documents

#### Specifying HiFi BAM inputs

When specifying DNAnexus `hifi_reads` files, format each one as `{"$dnanexus_link": {"id": "file-xxxx", "project": "project-xxxx"}}`, as in the examples below. One or more `hifi_reads` files can be specified for each sample. The single sample example has multiple files, and the trio example has one file per sample.
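
When a sample has many read files, writing each link by hand is error-prone. As an illustrative sketch, a small helper (`dnanexus_link` is a hypothetical name) can produce link objects in the format described above; the `file-xxxx` and `project-xxxx` values are placeholders for real IDs.

```python
import json


def dnanexus_link(file_id, project_id):
    """Build one hifi_reads entry in DNAnexus link format."""
    return {"$dnanexus_link": {"id": file_id, "project": project_id}}


# Placeholder IDs; substitute the real file and project IDs.
hifi_reads = [dnanexus_link("file-xxxx", "project-xxxx")]
print(json.dumps(hifi_reads, indent=2))
```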

#### Example Single Sample JSON

In this example, the optional `sex` field is not specified, so tools will default to XX for the allosome karyotype.

```json
{
  "family_id": "EXAMPLE-singleton",
  "samples": [
    {
      "sample_id": "EXAMPLE",
      "hifi_reads": [
        {
          "$dnanexus_link": {
            "id": "file-xxxx",
            "project": "project-xxxx"
          }
        },
        {
          "$dnanexus_link": {
            "id": "file-xxxx",
            "project": "project-xxxx"
          }
        }
      ],
      "affected": true
    }
  ]
}
```

#### Example Trio JSON

```json
{
  "family": {
    "family_id": "AJTRIO",
    "samples": [
      {
        "sample_id": "HG002",
        "hifi_reads": [
          {
            "$dnanexus_link": {
              "id": "file-xxxx",
              "project": "project-xxxx"
            }
          }
        ],
        "affected": true,
        "sex": "MALE",
        "father_id": "HG003",
        "mother_id": "HG004"
      },
      {
        "sample_id": "HG003",
        "hifi_reads": [
          {
            "$dnanexus_link": {
              "id": "file-xxxx",
              "project": "project-xxxx"
            }
          }
        ],
        "affected": false,
        "sex": "MALE"
      },
      {
        "sample_id": "HG004",
        "hifi_reads": [
          {
            "$dnanexus_link": {
              "id": "file-xxxx",
              "project": "project-xxxx"
            }
          }
        ],
        "affected": false,
        "sex": "FEMALE"
      }
    ]
  }
}
```

### Submitting to DNAnexus

When submitting a job to DNAnexus, start by manually writing the JSON document for your job. The examples above can serve as a starting point. You can validate the JSON with an online validator such as [JSONLint](https://jsonlint.com/) or [JSONChecker](https://jsonchecker.com/).
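
A local syntax check can stand in for an online validator. The sketch below uses only Python's standard `json` module; `load_family_json` is a hypothetical helper name, and it checks JSON syntax only, not the struct fields.

```python
import json


def load_family_json(text):
    """Parse a JSON document, reporting where the syntax error is if it fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err


document = '{"family_id": "EXAMPLE-singleton", "samples": []}'
print(load_family_json(document)["family_id"])
```

A trailing comma after the last item of an object or array is a common mistake in hand-written documents, and this check reports its position.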

If the job is being submitted via the DNAnexus CLI using an inputs JSON file ([documentation here](https://documentation.dnanexus.com/user/running-apps-and-workflows/running-apps-and-applets#from-the-cli)), the JSON document should be included as the value for the `family` input parameter.

If the job is being submitted via the DNAnexus web interface, there will be two input fields for the `family` input parameter: a file array entry and a text entry. Paste the JSON document into the `family` text input field. It is safe to paste it as multi-line text, exactly as in the examples above. In the file array entry, select all the read files that are listed in the `hifi_reads` arrays of all included samples. For the singleton example above, select the two read files for the sample. For the trio example above, select the three read files, one for each of the three samples.

If read files included in the `family` JSON document are not selected, the workflow will fail with the error message `keys (TreeSet(affected, hifi_reads, sample_id)) have members that do not appear in struct Family`.
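
To avoid missing a file in the web interface, it can help to list every file ID referenced in the document first. A minimal sketch, assuming the unwrapped `Family` document as input (`hifi_read_file_ids` is a hypothetical helper and the file IDs are placeholders):

```python
def hifi_read_file_ids(family):
    """Return (sample_id, file ID) pairs for every hifi_reads entry."""
    pairs = []
    for sample in family["samples"]:
        for read in sample["hifi_reads"]:
            pairs.append((sample["sample_id"], read["$dnanexus_link"]["id"]))
    return pairs


family = {
    "family_id": "AJTRIO",
    "samples": [
        {"sample_id": "HG002", "affected": True,
         "hifi_reads": [{"$dnanexus_link": {"id": "file-aaaa", "project": "project-xxxx"}}]},
        {"sample_id": "HG003", "affected": False,
         "hifi_reads": [{"$dnanexus_link": {"id": "file-bbbb", "project": "project-xxxx"}}]},
    ],
}
for sample_id, file_id in hifi_read_file_ids(family):
    print(sample_id, file_id)
```

Each printed file ID should correspond to one file selected in the file array entry.
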
9 changes: 9 additions & 0 deletions docs/backend-hpc.md
@@ -15,16 +15,25 @@ An [example miniwdl.cfg file](https://github.com/PacificBiosciences/HiFi-human-W

> [!IMPORTANT]
> In order to simplify workflow inputs, we make use of `map` files to specify the input data. This allows for a more concise input file, but requires changing a miniwdl configuration option to allow workflows to access files that are not expressly supplied with workflow inputs. To enable this, add the following line to your `miniwdl.cfg` file:
>
> ```ini
> [file_io]
> allow_any_input = true
> ```
>
> This option is already included in the example miniwdl.cfg file described in this section.

## Installing and configuring `Cromwell`

Cromwell supports a number of different HPC backends; see [Cromwell's documentation](https://cromwell.readthedocs.io/en/stable/backends/HPC/) for more information on configuring each of the backends. Cromwell can be used in a standalone "run" mode, or in "server" mode to allow for multiple users to submit workflows. Below, we provide example commands for running Cromwell in "run" mode.

> [!NOTE]
> If running Cromwell on an HPC cluster using NFS for storage, you may encounter issues with NFS latency, which can cause Cromwell to fail to read files from the filesystem. You can work around this by using `script-epilogue` to add a delay & sync to the end of each job. This option is added to your backend provider config.
>
> ```bash
> script-epilogue = "sleep 60 && sync"
> ```

## Running the workflow

### Filling out workflow inputs