APPS-1271 Update improve manifest docs #354

Merged
merged 27 commits into from
Jul 25, 2022
Commits
0845cc4
APPS-1271 Update improve manifest docs
Gvaihir Jul 14, 2022
34af2d0
APPS-1271 Update improve manifest docs
Gvaihir Jul 15, 2022
2208d7c
Update doc/ExpertOptions.md
Gvaihir Jul 18, 2022
e0e3092
APPS-1271 Update improve manifest docs
Gvaihir Jul 18, 2022
57706e0
APPS-1271 Update improve manifest docs
Gvaihir Jul 18, 2022
deceb59
APPS-1271 Update improve manifest docs
Gvaihir Jul 18, 2022
8a6d1ed
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
51b1f24
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
9f7c951
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
0b7d5dd
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
202d759
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
01041f3
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
a9c259a
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
11a04da
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
d40849a
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
ee47654
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
11179e0
Update doc/ExpertOptions.md
Gvaihir Jul 21, 2022
f49fa1d
Update doc/ExpertOptions.md
emiloslavsky Jul 21, 2022
9992e57
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
0fbedde
Update doc/ExpertOptions.md
Gvaihir Jul 22, 2022
2c48a14
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
455de1d
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
18ee521
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
85e2ed0
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
96e9589
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
4402b17
APPS-1271 Update improve manifest docs
Gvaihir Jul 22, 2022
7006883
Update doc/ExpertOptions.md
Gvaihir Jul 25, 2022
103 changes: 65 additions & 38 deletions doc/ExpertOptions.md
@@ -1441,24 +1441,51 @@ For an in depth discussion, please see [Missing Call Arguments](MissingCallArgum

# Manifests

In extreme cases, running compiled workflows can fail due to DNAnexus platform limits on the total size of the input and output JSON documents of a job. An example is a task with many inputs/outputs that is called in scatter over a large collection. In such a case, you can enable manifest support at compile time with the `-useManifests` option. This option causes each generated applet or workflow to accept inputs as a manifest, and to produce outputs as a manifest.

A manifest is a JSON document that contains all the inputs/outputs that would otherwise be passed directly to/from the applet. A manifest can be specified in one of two ways: via a JSON input, or via a File input (where the file must exist on the platform).
In extreme cases, running compiled workflows can fail due to DNAnexus platform limits on the total size of the input and
output JSON documents of a job. An example is a task with many inputs/outputs that is called in scatter over a large collection.
In such a case, you can enable manifest support at compile time with the `-useManifests` option.
This option causes each generated applet or workflow to accept inputs as an array of manifests, and to produce outputs as a single manifest.
Comment on lines +1444 to +1447
Contributor

Suggested change
In extreme cases, running compiled workflows can fail due to DNAnexus platform limits on the total size of the input and
output JSON documents of a job. An example is a task with many inputs/outputs that is called in scatter over a large collection.
In such a case, you can enable manifest support at compile time with the `-useManifests` option.
This option causes each generated applet or workflow to accept inputs as an array of manifests, and to produce outputs as a single manifest.
In extreme cases, running compiled workflows can fail due to DNAnexus platform limits on the total size of the input and output JSON documents of a job. An example is a task with many inputs/outputs that is called in scatter over a large collection. In such a case, you can enable manifest support at compile time with the `-useManifests` option. This option causes each generated applet or workflow to accept inputs as an array of manifests, and to produce outputs as a single manifest.

It's a minor thing but it's easier to read (now and in the future) if there are no newlines introduced.

Contributor

Same here and below, wherever a line break was added.

Contributor Author

Actually, the added line breaks only look ugly in the diff view. In the raw source they look fine, and the rendered output is unchanged. Without them, each paragraph in the source is one long line that you have to scroll sideways to read. So I would really insist on having line breaks in the Markdown docs from now on.

Contributor Author

This is how it looks now:

In extreme cases, running compiled workflows can fail due to DNAnexus platform limits on the total size of the input and

This is how the rest of the doc looks without new lines:

When calling a workflow with `dx run`, jobs and analyses launched by this workflow will have their temporary workspaces to store resources and intermediate outputs. By default, when a job or an analysis has transitioned to a terminal state (done, failed, or terminated), its temporary workspace will be destroyed by the system.

To me, the second example is harder to read than my version.

Contributor Author

Please click the links to see how it actually looks in the raw .md.


A manifest is a JSON document that contains all the inputs/outputs that would otherwise be passed directly to/from the
workflow stage. A manifest can be specified in one of two ways:
1. A `.json` input file (see [Manifest JSON](#manifest-json)). This is the recommended way to provide inputs in the manifest format.
`java -jar dxCompiler.jar -inputs mymanifest.json` will produce `mymanifest.dx.json`, which can be passed to `dx run -f mymanifest.dx.json`.
2. A platform file (`file-xxx`) with the content described in the [Intermediate manifest file inputs and outputs](#intermediate-manifest-file-inputs-and-outputs)
section. It can be used to pass the manifest output of a stage of one workflow (including the `output` stage) directly as input to
another workflow. This can also be useful when debugging individual stages of a failing workflow.

## Manifest JSON

When manifest support is enabled, each applet has an `input_mainfest___` input field of type `hash`, which means that it accepts a JSON document as a string. For example, given the following workflow:
When manifest support is enabled, applet/workflow outputs which are passed from one stage to another (or to the final output
Contributor

We don't want to inform the user of what the input_manifest___ is?

Contributor Author

It's not as simple. It does have this field, as well as a bunch of others, which I could not find the documentation for. APPS-1309 ticket should address that.

Contributor

Gotcha. Thanks!

stage) exist in the form of intermediate manifests. The format of an intermediate manifest is described here for informational purposes only.
There is no need to use intermediate manifests as your workflow inputs, as the JSON manifest above is the recommended format.
For example, given the following workflow:

```wdl
version 1.1

task t1 {
  input {
    File f
  }
  command <<<
  echo "t1: " >>out
  cat "~{f}" >>out
  >>>
  output {
    File t1_out = "out"
  }
}

workflow test {
  input {
    String s
    File f
  }
  ...
  call t1 { input: f = f }
  output {
    Int i
    Pair[String, File] p
    File wf_out = t1.t1_out
  }
}
```
@@ -1470,62 +1497,62 @@ You would write the following manifest:
```json
{
"test.input_manifest___": {
"s": "hello",
"f": "dx://file-xxx"
"f": "dx://project-aaa:file-xxx"
}
}
```
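
A minimal sketch of how the input manifest shown above could be generated from a script instead of written by hand. The helper names `build_manifest` and `write_manifest` and the temporary output path are hypothetical and not part of dxCompiler; only the `<workflow>.input_manifest___` key layout is taken from the example.

```python
import json
import os
import tempfile


def build_manifest(workflow_name, inputs):
    """Wrap workflow inputs under the '<workflow>.input_manifest___' key."""
    return {f"{workflow_name}.input_manifest___": dict(inputs)}


def write_manifest(path, workflow_name, inputs):
    """Serialize the manifest to a JSON file and return the manifest dict."""
    manifest = build_manifest(workflow_name, inputs)
    with open(path, "w") as fh:
        json.dump(manifest, fh, indent=2)
    return manifest


if __name__ == "__main__":
    # Hypothetical output location; any writable path works.
    out_path = os.path.join(tempfile.gettempdir(), "mymanifest.json")
    write_manifest(out_path, "test", {"s": "hello", "f": "dx://project-aaa:file-xxx"})
    print(out_path)
```

The resulting file would then be passed to the compiler with `-inputs`, as described in the text.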

When you compile the workflow, provide the manifest using the `-inputs` option, and it will be translated to:
Compile the workflow `test` from above with the `-inputs mymanifest.json` option. A new file, `mymanifest.dx.json`, will be
created with the following content. **NOTE:** `mymanifest.dx.json` is created by the compiler; you do not need to
create or edit it manually.


`mymanifest.dx.json`
```json
{
"input_manifest___": {
"s": "hello",
"f": {
"$dnanexus_link": "file-xxx"
}
},
"input_manifest___files": [
{
"$dnanexus_link": "file-xxx"
"encoded": false,
"types": {
"f": "File",
"s": "String"
},
"values": {
"s": "hello",
"f": "dx://project-aaa:file-xxx"
}
]
}
}
```
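
The diff block above interleaves removed and added lines. As a hedged sketch, assuming the post-change file keeps only the `encoded`, `types`, and `values` fields under `input_manifest___`, a quick sanity check of a translated manifest might look like this. The function name `untyped_inputs` is hypothetical, and the embedded JSON is a reconstruction from the added lines, not verbatim compiler output.

```python
import json

# Assumed post-change layout of mymanifest.dx.json (reconstructed from the added lines).
TRANSLATED = """
{
  "input_manifest___": {
    "encoded": false,
    "types": {"f": "File", "s": "String"},
    "values": {"s": "hello", "f": "dx://project-aaa:file-xxx"}
  }
}
"""


def untyped_inputs(doc):
    """Return the sorted input names that have a value but no declared type."""
    manifest = doc["input_manifest___"]
    types = manifest.get("types", {})
    values = manifest.get("values", {})
    return sorted(name for name in values if name not in types)


if __name__ == "__main__":
    print(untyped_inputs(json.loads(TRANSLATED)))
```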

Finally, run your workflow using the translated input file:
The created `mymanifest.dx.json` should be used as an input file when running the workflow:
```commandline
dx run workflow-yyy -f mymanifest.dx.json
```

`dx run workflow-yyy -f mymanifest.dx.json`

## Manifest file
#### Intermediate manifest file inputs and outputs

Manifest files are less convenient to use as applet/workflow inputs because they must be uploaded to the platform. However, when manifest support is enabled, applet/workflow outputs are in the form of manifest files, so it is useful to understand the format.
When manifest support is enabled, applet/workflow outputs that are passed from one stage to another (or to the final output
stage) exist in the form of intermediate manifests. The format of an intermediate manifest is described here for informational purposes only.
There is no need to use intermediate manifests as your workflow inputs, as the JSON manifest above is the recommended format.

Given the above workflow, the manifest output would be:
Given the above workflow, the manifest output from the `common` stage to the following stages (not shown) would be:

```json
{
"id": "test",
"encoded": false,
"id": "stage-common",
"values": {
"i": 1,
"p": {
"left": "hello",
"right": {
"$dnanexus_link": "file-xxx"
}
}
"s": "hello",
"f": "dx://project-aaa:file-xxx"
}
}
```

The `id` field is optional but will always be populated in the output manfiests. The manifest may contain additional fields (`types` and `definitions`) that are only for internal use and can be ignored.

To specify a manifest file as input to an applet or workflow, first upload the file to the platform and then pass it as input to the `input_manifest_files___` parameter:

`dx run workflow-yyy -iinput_manifest_files___=file-zzz`

Note that while `input_manifest_files___` is an array, you may only pass a single manifest file as input.
The `id` field is the ID of the stage that created the manifest output. It is optional but will always be
populated in output manifests. The manifest may contain additional `types` and `definitions` fields that are only
for internal use and can be ignored. The workflow outputs appear in the `values` field of the output manifest
as a map whose keys are the names of the outputs declared in the `output` section of the WDL workflow.
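
As an illustration of the paragraph above, here is a minimal sketch of reading an intermediate manifest after it has been downloaded from the platform. The function name `read_manifest` is hypothetical, and the embedded JSON uses only the added lines of the example (`encoded`, `id`, `values`). The `types` and `definitions` fields, if present, are deliberately not read.

```python
import json

# Intermediate manifest reconstructed from the added lines of the example above.
INTERMEDIATE = """
{
  "encoded": false,
  "id": "stage-common",
  "values": {"s": "hello", "f": "dx://project-aaa:file-xxx"}
}
"""


def read_manifest(text):
    """Return (stage_id, values); stage_id is None when 'id' is absent."""
    doc = json.loads(text)
    # 'types' and 'definitions' are internal-use fields and are not consulted here.
    return doc.get("id"), doc.get("values", {})


if __name__ == "__main__":
    stage_id, values = read_manifest(INTERMEDIATE)
    for name in sorted(values):
        print(stage_id, name, values[name])
```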

## Analysis outputs
