Skip to content

Commit

Permalink
Add alt text
Browse files Browse the repository at this point in the history
  • Loading branch information
avahoffman committed Aug 29, 2023
1 parent a0a8863 commit 18b877a
Showing 1 changed file with 19 additions and 21 deletions.
40 changes: 19 additions & 21 deletions 02-workflow.Rmd
Original file line number Diff line number Diff line change
@@ -1,38 +1,36 @@
# Bringing the WDL Workflow to AnVIL

Our first step is locating the workflow we want to use. Our goal is to create a subset version of existing genomic data files.

## Find the Workflow on Dockstore

Navigate to the WORKFLOWS tab of your workspace. Select "Find a Workflow".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The WORKFLOWS tab of AnVIL workspace is selected, with the Find a Workflow button highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_0")
```

When the dialog box opens, select "Dockstore" on the bottom left. This will take you to the [Dockstore search page](https://dockstore.org/search?descriptorType=WDL&entryType=workflows&searchMode=files).

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The Suggested Workflows popup. The option to find additional workflows on Dockstore is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_5")
```

Enter `fastq_subsample` in the search box. Select the workflow "fhdsl/AnVIL_WDLs/fastq_subsample".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "On Dockstore, fastq_subsample has been entered in the search box. There is one result, for a WDL workflow called fastq_subsample.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_21")
```

## Send the Workflow to AnVIL

On the right side "Launch with" menu, select "AnVIL". This will open a new tab.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The AnVIL button on the Launch with menu is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_26")
```

Select the appropriate workspace where you want the workflow to appear. Select "IMPORT".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "On an AnVIL themed page, the user has selected the epigenetics-introduction workspace and the IMPORT button is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_31")
```

Expand All @@ -44,7 +42,7 @@ You should now be back on AnVIL!

Use the "SELECT DATA" button to select the samples (rows) you want to subset. You can select all or some samples.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The workflow setup page on AnVIL has the option to SELECT DATA highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_45")
```

Expand All @@ -60,7 +58,7 @@ For paired-end sequencing, the `fastqgz_file_read_1` input is the first of two r

In this example, the column with the fastq file link is called "read1". It will look like "this.read1" under "Attribute".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The workflow inputs tab shows 4 possible inputs. The first is called 'fastqgz_file_read_1' and should be of the type 'File'. The entry 'this.read' is highlighted from the Attribute dropdown menu.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_50")
```

Expand All @@ -72,7 +70,7 @@ Select additional inputs.

- *Optional*: Indicate how many reads you want in your subsample file. In this example, we wanted 20,000 reads. (Default: 10,000)

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The remaining 3 workflow inputs have been populated with Attributes as follows: sample_id is this.sample_id; fastqgz_file_read_2 is this.read2; and n is 20000.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_55")
```

Expand All @@ -82,13 +80,13 @@ Workflow outputs are written to a Google Bucket. Setting up the workflow outputs

Select the "OUTPUTS" tab. Select "Use defaults" to use the default output column naming schema.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "On the outputs tab of the workflow setup is highlighted. Use defaults option is highlighted. The two outputs, read1_subsample and read2_subsample are set to the default values under attributes. These are 'this.read1_subsample' and 'this.read2_subsample'.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_60")
```

Click "SAVE".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The SAVE button for workflow inputs and outputs is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_66")
```

Expand All @@ -98,23 +96,23 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIdit

Once you have saved the inputs and outputs, click on RUN ANALYSIS.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The RUN ANALYSIS button is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_76")
```

## Monitor the Run

Navigate to the JOB HISTORY tab. You can see your most recent submissions in table form. Click on the most recent submission. Notice how each sample gets its own job. This keeps the whole process speedy!

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The page reflects submission IDs and time submitted. Several runs have launched for each sample. The workflow status is 'Running' for all 8 samples.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_81")
```

## Inspect the Run Results

You should see the status change to "Succeeded" if everything completed correctly.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "Workflow status is now succeeded for all 8 samples, with a check mark.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_86")
```

Expand All @@ -126,39 +124,39 @@ After 24 hours, you can also see the costs associated with each run under "Run C

It can be helpful to look at intermediate files on Google Cloud Platform, especially if runs did not complete successfully. You can view these files by clicking on the folder icon for "Execution directory".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The folder icon indicating execution directory besides the workflow job for one sample is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_92")
```

For each run, you can see a number of associated files, including the output `.fq` files and log files.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "Files for the workflow on GCP are shown. There are two fastq files that were created as well as 3 log files that are relevant for debugging, including sample_file.log, stderr, and stdout.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_97")
```

Click on `stdout` and/or `stderr` and "DOWNLOAD" to view the terminal output.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "The DOWNLOAD button is highlighted.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_109")
```

Here is what your output in `stdout` might look like.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "Terminal monospace text shows the following: Creating a subsampled file with 20000 lines. Created SRR10152993_1_subsample.fq. Subsampling paired-end reads for SRR10152993. Created SRR10152993_2_subsample.fq.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_125")
```

## Confirming Results in the DATA tab

We can see that the subsample files have been linked in the DATA table "sample".

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "On the DATA tab of the workspace, a new column has been created called read1_subsample. There are 8 files corresponding to each original file.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_139")
```

In our example, we have a mixture of single-end and paired-end reads, so only the paired-end samples get a second file.

```{r, fig.align='center', echo = FALSE, fig.alt= "", out.width = '100%'}
```{r, fig.align='center', echo = FALSE, fig.alt= "On the DATA tab of the workspace, a new column has been created called read2_subsample. There are 4 files corresponding to each original file that had paired-end reads.", out.width = '100%'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1zUiupDXMZlZEIditsN5igggOim9qIrR9Ckf4N8fSAB8/edit#slide=id.g2794677b0ab_0_144")
```

Expand Down

0 comments on commit 18b877a

Please sign in to comment.