```{r, include = FALSE}
ottrpal::set_knitr_image_path()
```
# Using Shiny to Manage Workflows
Now that you've configured your first Cromwell server, let's submit some test workflows to it using the Fred Hutch Shiny app!
> Note: The first time you set up a Cromwell server, it will be busy for a few minutes setting up its database and doing other work behind the scenes for you. Once it's ready, it will start "listening" for instructions from the Shiny app (or other methods we'll discuss later in the course). The first time, it may take 2-3 minutes before you can follow the rest of these instructions. Later startups are much faster (closer to 1 minute).
You can find our Fred Hutch Shiny app here: https://cromwellapp.fredhutch.org
<img src="assets/cromwell/app-front.png" title="The App looks like this." alt="Cromwell app" style="display: block; margin: auto;" />
This Shiny app gives you a graphical interface for submitting and managing workflows you've written in WDL.
## Get Test Workflows
In the app, you'll see a series of sections that each let you do different things. In this guide, we'll use a number of example workflows from the `wdl-test-workflows` repository, which you can view and clone from [GitHub](https://github.com/FredHutch/wdl-test-workflows).
Each of these example workflows is in a folder containing a WDL file (specifying the workflow itself) and any input files that you'll need (in JSON format). The openWDL community is developing [documentation about the WDL specification](https://wdl-docs.readthedocs.io/en/1.0.0/). There is also useful, though very detailed, information in the [openWDL GitHub repo for the specification itself](https://github.com/openwdl/wdl/blob/main/versions/development/SPEC.md#introduction).
## Login
This Shiny app runs all the time, but for it to know where to look for your particular information, you'll need to "log in" by clicking the "Connect to Server" button on the left.
<img src="assets/cromwell/connect-to-server.png" title="The App looks like this." alt="Cromwell app" style="display: block; margin: auto;" />
When you click "Connect to Server", a box will appear where you will input the node:port combination you were assigned when you [started up your Cromwell server](https://hutchdatascience.org/FH_WDL101_Cromwell/getting-started-with-cromwell.html#kick-off-your-cromwell-server) (it will look something like this: `gizmob5:39071`).
<img src="assets/cromwell/login-box.png" title="Login box will pop up" alt="Cromwell app" style="display: block; margin: auto;" />
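If you want to check a node:port string before pasting it into the box, the format described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `parse_server_address` is hypothetical, and the check covers just the shape of the string, not whether a Cromwell server is actually listening there.

```python
def parse_server_address(address: str) -> tuple[str, int]:
    """Split a 'node:port' string into (node, port); raise ValueError if malformed."""
    node, sep, port_text = address.strip().partition(":")
    if not sep or not node or not port_text.isdecimal():
        raise ValueError(f"expected 'node:port', got {address!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return node, port


# The example address from the text above.
print(parse_server_address("gizmob5:39071"))
```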
If your server is not ready to listen for workflows, you may see this error:
<img src="assets/cromwell/invalid-server.png" title="The App can't talk to your server yet, try again in a minute." alt="Cromwell app" style="display: block; margin: auto;" />
If so, just wait 1-2 more minutes (if it's the first time you've set up a server, or less if it's a later instance) and try again. Once the Shiny app can talk to your server, you'll see this result screen:
<img src="assets/cromwell/valid-server.png" title="The App can talk to your server!" alt="Cromwell app" style="display: block; margin: auto;" />
## Submit Jobs Tab
Once you've connected your server to the Shiny app, you can start by using the "Submit Jobs" tab on the left.
<img src="assets/cromwell/submit-tab.png" title="The submit jobs tab." alt="Cromwell app" style="display: block; margin: auto;" />
### Validate a workflow
Validation checks your workflow files (WDL and JSON) to test:
- are they in a known format that Cromwell can interpret?
- are they formatted properly?
- are the tasks wired up correctly?
This is called a "dry run".
Note that this does NOT test whether your input files are actually available, partly because Cromwell can pull files from local filesystems, AWS S3, Google buckets, and Azure blobs. Input availability is only checked when you actually run the workflow. If some input files are missing, Cromwell will run the tasks whose inputs ARE available and skip the tasks whose inputs can't be found.
<img src="assets/cromwell/validate.png" title="Validate a workflow" alt="Cromwell app" style="display: block; margin: auto;" />
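Before uploading files for validation, you can catch the most basic formatting problems yourself. The sketch below is not what the app or Cromwell does; it is a rough local pre-check that only confirms the inputs file parses as JSON and that each input name starts with the workflow's name (WDL inputs are written as `workflow_name.input_name`). The function name `precheck` and the tiny `hello` workflow are hypothetical.

```python
import json
import re


def precheck(wdl_text: str, inputs_text: str) -> list[str]:
    """Return a list of problems found; an empty list means these basic checks passed."""
    # Find the workflow name from a line such as "workflow hello {".
    match = re.search(r"^\s*workflow\s+(\w+)", wdl_text, flags=re.MULTILINE)
    if match is None:
        return ["no 'workflow <name>' declaration found in the WDL"]
    workflow_name = match.group(1)
    try:
        inputs = json.loads(inputs_text)
    except json.JSONDecodeError as err:
        return [f"inputs file is not valid JSON: {err}"]
    if not isinstance(inputs, dict):
        return ["inputs JSON must be an object of name/value pairs"]
    prefix = workflow_name + "."
    return [f"input '{key}' is not prefixed with '{prefix}'"
            for key in inputs if not key.startswith(prefix)]


# Hypothetical workflow text used only for illustration.
wdl = """version 1.0
workflow hello {
  input { String name }
}
"""
print(precheck(wdl, '{"hello.name": "world"}'))
print(precheck(wdl, '{"name": "world"}'))
```

A check like this says nothing about whether tasks are wired up correctly; that part still needs the app's "Validate" step.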
### Submit a workflow
This lets you upload the file containing your workflow description (a WDL file) and up to two sets of inputs (in JSON format). You can run a workflow with no input JSON, one input JSON, or two input JSONs. With two, the inputs are combined, and if the same variable is declared in both, the value in the second file overwrites the value in the first. You can also upload a workflow options JSON and attach text labels of your choosing to the workflow.
<img src="assets/cromwell/submit.png" title="Submit a workflow" alt="Cromwell app" style="display: block; margin: auto;" />
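To make the two-input-JSON behavior concrete, here is a minimal sketch of the combination rule as described above: keys from both files are kept, and the second file wins when a variable appears in both. The input names and values are hypothetical, and the sketch assumes a simple top-level merge rather than reproducing Cromwell's own implementation.

```python
import json


def merge_inputs(first: dict, second: dict) -> dict:
    """Combine two input sets; values in `second` overwrite those in `first`."""
    merged = dict(first)
    merged.update(second)
    return merged


# Hypothetical contents of two input JSON files.
first = json.loads('{"hello.name": "world", "hello.repeats": 1}')
second = json.loads('{"hello.repeats": 3, "hello.greeting": "hi"}')

print(json.dumps(merge_inputs(first, second), indent=2))
```

One way to use this pattern is to keep shared settings in the first file and per-run settings, such as a sample list, in the second.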
When you click the "Submit Workflow" button, a new box appears confirming the submission with the workflow ID and status. These IDs are long strings that look something like this: `4e7e244a-d6b1-41db-a324-45229ff34b00`. They're useful if, for example, you want to abort a workflow or identify it in the "Track Jobs" tab. The workflow ID is unique to an individual workflow run, so if you run the same workflow a second time, you'll get a different string. This means you can use the ID to trace which data source file(s) were used to generate each set of results files, helping make your work reproducible.
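Workflow IDs follow the standard UUID layout, so a quick check can catch a mis-copied ID before you use it elsewhere. The helper name `is_workflow_id` is hypothetical, and this sketch only checks the format; it cannot tell you whether your server knows about that workflow.

```python
import uuid


def is_workflow_id(text: str) -> bool:
    """Return True if `text` is a UUID in the hyphenated form shown by the app."""
    try:
        # Round-tripping through uuid.UUID rejects truncated or mistyped strings.
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


print(is_workflow_id("4e7e244a-d6b1-41db-a324-45229ff34b00"))
print(is_workflow_id("4e7e244a-d6b1-41db-a324"))
```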
## Track Jobs Tab
Once you've submitted a workflow, you'll want to track how it's going in the Track Jobs tab.
### History of workflows
At the top, you can choose how many days of workflow history to display and filter the results by workflow name or by status, such as 'failed' or 'succeeded'. This helps if you have submitted a LOT of workflows and don't want to see them all, or if the Cromwell server is still busy working through all of your submissions and recording their status.
<img src="assets/cromwell/cromwell-app.png" title="Top of the Track Jobs tab" alt="Cromwell app" style="display: block; margin: auto;" />
Once you click "Update View", the relevant workflows will be returned and you'll see various information on those workflows.
First, there's a "Workflow Timing" plot showing how long each workflow ran and the status of each.
<img src="assets/cromwell/workflows-run.png" title="Workflow plot" alt="Cromwell app" style="display: block; margin: auto;" />
Underneath, you'll see a "Workflows Run" table showing metadata for each workflow. Click on the workflow you're interested in to populate the rest of the tables (below).
<img src="assets/cromwell/cromwell-overview.png" title="Workflow table" alt="Cromwell app" style="display: block; margin: auto;" />
### Diving into a Workflow
Once you've selected a workflow row, you'll see some summary information about that workflow.
<img src="assets/cromwell/workflow-overview.png" title="Workflow overview summary" alt="overview table for a workflow" style="display: block; margin: auto;" />
You can see a plot of the timing and outcomes of all the calls in that workflow.
<img src="assets/cromwell/workflow-calls.png" title="Workflow calls" alt="workflow calls table" style="display: block; margin: auto;" />
### Call Level Information
Then there is a table of each call containing useful information such as the directory where the job is working (callRoot), its SLURM job ID, what computing resources or software environment were used, and the job's status.
<img src="assets/cromwell/job-list.png" title="Job list" alt="job list" style="display: block; margin: auto;" />
Then you can use the Job Failures and Call Caching tables to retrieve information relevant to those processes by clicking the "Get/Refresh ... Metadata" buttons. For complex workflows, this metadata can be quite large, so it does not load until you ask for it.
<img src="assets/cromwell/failures.png" title="Job failures" alt="job failures" style="display: block; margin: auto;" />
<img src="assets/cromwell/caching.png" title="Job call caching" alt="call caching" style="display: block; margin: auto;" />
Finally, once a workflow's outputs have all been created successfully, Cromwell can tell you where to find them, and this Shiny app can help you download a table of their locations. Note that this is not every file created, only the ones you declare as results in the WDL file's workflow `output` block. This lets you find output files and interact with them, archive them, or copy them to longer-term storage.
<img src="assets/cromwell/workflow-outputs.png" title="Workflow outputs" alt="outputs" style="display: block; margin: auto;" />
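If you work with the outputs information outside the app, it helps to flatten it into one row per file. The sketch below assumes a response shaped like the one Cromwell's outputs endpoint returns (an `outputs` object mapping fully qualified output names to a path or a list of paths). The function name, output names, and paths are all hypothetical, and nested arrays are not handled.

```python
def list_output_files(outputs_response: dict) -> list[tuple[str, str]]:
    """Flatten an outputs response into (output name, path) rows, sorted by name."""
    rows = []
    for name, value in sorted(outputs_response.get("outputs", {}).items()):
        # An output may be a single value or a list of values (e.g. Array[File]).
        values = value if isinstance(value, list) else [value]
        rows.extend((name, str(item)) for item in values if item is not None)
    return rows


# Hypothetical example data, not taken from a real run.
example = {
    "id": "4e7e244a-d6b1-41db-a324-45229ff34b00",
    "outputs": {
        "hello.greeting_file": "/hypothetical/path/greeting.txt",
        "hello.logs": ["/hypothetical/path/a.log", "/hypothetical/path/b.log"],
    },
}
for name, path in list_output_files(example):
    print(f"{name}\t{path}")
```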
## Troubleshoot Tab
Finally, there is the Troubleshoot tab. Here you can abort running workflows or retrieve the complete metadata for an entire workflow, which you can parse yourself to work out what's happening.
<img src="assets/cromwell/abort-troubleshoot.png" title="The troubleshoot jobs tab." alt="Troubleshoot" style="display: block; margin: auto;" />
### Abort a workflow
Sometimes you realize you want to kill a workflow. Using the workflow submission ID, you can kill specific workflows using this box. Note that it will take Cromwell some time to coordinate SLURM job cancellations, particularly for complex workflows, but it will clean everything up for you.
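For reference, Cromwell's REST API also exposes an abort endpoint, and the request can be sketched as below. This is an illustration under assumptions, not a description of what the Shiny app does internally: it assumes your server answers plain HTTP at the node:port you were assigned, and the function name `build_abort_request` is hypothetical. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_abort_request(server: str, workflow_id: str) -> Request:
    """Build (but do not send) a POST request to Cromwell's abort endpoint."""
    url = f"http://{server}/api/workflows/v1/{workflow_id}/abort"
    return Request(url, method="POST")


request = build_abort_request("gizmob5:39071", "4e7e244a-d6b1-41db-a324-45229ff34b00")
print(request.get_method(), request.full_url)
# To actually send it, you would pass `request` to urllib.request.urlopen().
```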
### Troubleshoot a workflow
Especially in the beginning, if you have catastrophic workflow failures and can't figure out what's going on, you can come back to this Troubleshoot box to retrieve the entire, unformatted JSON output of all the metadata Cromwell has about your workflow. The "Track Jobs" tab is usually the better place to check how your workflow is going, but if nothing there helps, this box is where you'll want to go.
> Note: this output is not for the faint of heart, but it will give you hints once you get used to understanding what Cromwell is telling you.
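When reading the raw metadata, a good first step is to pull out just the failed calls. The sketch below assumes the usual layout of Cromwell metadata, where a `calls` object maps each call name to a list of attempts that carry fields such as `executionStatus`, `callRoot`, and `failures`. The function name and the sample data are hypothetical, and real metadata contains many more fields.

```python
def failed_calls(metadata: dict) -> list[dict]:
    """Collect one summary row per failed call attempt in workflow metadata."""
    rows = []
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            if attempt.get("executionStatus") != "Failed":
                continue
            rows.append({
                "call": call_name,
                "attempt": attempt.get("attempt"),
                "callRoot": attempt.get("callRoot"),
                "messages": [f.get("message", "") for f in attempt.get("failures", [])],
            })
    return rows


# Hypothetical, heavily trimmed metadata for illustration only.
sample = {
    "status": "Failed",
    "calls": {
        "hello.say_hello": [{"executionStatus": "Done", "attempt": 1}],
        "hello.count_words": [{
            "executionStatus": "Failed",
            "attempt": 1,
            "callRoot": "/hypothetical/call-count_words",
            "failures": [{"message": "Job exited with a non-zero return code"}],
        }],
    },
}
for row in failed_calls(sample):
    print(row["call"], row["callRoot"], row["messages"])
```

The `callRoot` directory of a failed call is usually the next place to look, since that is where the job's own files are written.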
## Run Test Workflows
Now that you know how to use the app, it's time to run a test workflow.
We have curated some basic workflows that you can use to test whether your Cromwell server is set up correctly and to let you experiment with Cromwell. Once your server is up, run through the examples in our [Test Workflow Repo](https://github.com/FredHutch/wdl-test-workflows).
> Note: For test workflows that use Docker containers, the first time you run them you may notice that jobs aren't being sent very quickly. That is because on our cluster, Docker containers must be converted to a format that Singularity can run. The conversion happens the first time a Docker container is used. After that, Cromwell will use the cached version of the container, and jobs will be submitted more quickly.