Add Workflow Run RO-crate format #39

Open
wants to merge 53 commits into
base: master
88690fc
status from PR #33
famosab Nov 18, 2024
e541606
add encodingFormat for nextflow.config
famosab Nov 18, 2024
9889a69
add encodingFormat for main.nf
famosab Nov 18, 2024
f228acf
Merge pull request #10 from famosab/encode
famosab Dec 3, 2024
91fc7e2
feat: add wrroc to valid formats
famosab Dec 3, 2024
6dadcaa
Merge pull request #13 from famosab/obs
famosab Dec 3, 2024
1416833
fix: make getIntermediateOutputFiles work again (#18)
famosab Dec 6, 2024
416d920
Check in input and output if file or directory
fbartusch Dec 8, 2024
7560d4c
feat: add README to crate (#14)
famosab Dec 9, 2024
e6e3844
Set correct MIME types
fbartusch Dec 9, 2024
0d3fd2d
Add contactPoint for agent and organization (#21)
fbartusch Dec 13, 2024
816cf17
Fix #4 (#22)
fbartusch Dec 13, 2024
9aa6762
Merge branch 'dev-wrroc' into iss7-directoryType
fbartusch Dec 17, 2024
07f7ceb
Merge pull request #20 from famosab/iss7-directoryType
fbartusch Dec 17, 2024
7f0264f
start with metaYaml imports (#12)
famosab Dec 18, 2024
daf9725
add information to README
famosab Dec 18, 2024
f271d91
cleanup
bentsherman Jan 10, 2025
70a5f51
Set root crate dir to parent of wrroc path
bentsherman Jan 14, 2025
8f060ca
cleanup
bentsherman Jan 14, 2025
98d2cc4
Add helper functions
bentsherman Jan 14, 2025
4410cba
Don't copy intermediate files into crate, normalize inputs against pr…
bentsherman Jan 14, 2025
9a9a816
Fix resolved config
bentsherman Jan 14, 2025
4220097
Add CreateAction's for publishing outputs
bentsherman Jan 14, 2025
3e0b70f
minor fix
bentsherman Jan 14, 2025
d1ba473
Improve ids for modules, processes, tools, replace main script with r…
bentsherman Jan 14, 2025
b24737f
cleanup
bentsherman Jan 14, 2025
ce8235e
cleanup config parsing
bentsherman Jan 14, 2025
42da1dd
cleanup
bentsherman Jan 14, 2025
0ffea5d
Improve canonical ids
bentsherman Jan 14, 2025
fe5d4c0
Use parameter schema to populate formal parameters, copy input files
bentsherman Jan 14, 2025
90f9cbf
Exclude null property values
bentsherman Jan 16, 2025
34b4d3a
Include permalink to main script
bentsherman Jan 16, 2025
3ff0d27
Add cases for lists and maps for formal parameters
bentsherman Jan 16, 2025
0f51061
Don't download remote input files into crate
bentsherman Jan 16, 2025
7888bbc
Include main script to satisfy WRROC requirement
bentsherman Jan 16, 2025
0d48ce0
Replace "Directory" -> "Dataset"
bentsherman Jan 16, 2025
4b5161e
Exclude (and warn about) published files outside of crate directory
bentsherman Jan 17, 2025
1c9e9f8
Fix tool description, cleanup
bentsherman Jan 17, 2025
310b115
Add param input files to dataset parts
bentsherman Jan 17, 2025
9a4ff8a
Improve canonical ids for tasks and task outputs
bentsherman Jan 17, 2025
85c089e
Fix validation issues
bentsherman Jan 17, 2025
9972771
Separate staged input files from workflow inputs
bentsherman Jan 17, 2025
3c4aa37
Update docs, fix issues with wrroc config options
bentsherman Jan 17, 2025
5c2b3f1
Fix null reference error
bentsherman Jan 17, 2025
8bf3c9c
Don't copy directories specified by params into crate
bentsherman Jan 17, 2025
f6831dc
minor edits
bentsherman Jan 17, 2025
3c33bc9
Improve entity ids
bentsherman Jan 17, 2025
6f0ffed
Make intermediate outputs into contextual entities (CreativeWork)
bentsherman Jan 17, 2025
8b13a59
Exclude license entity if it is not specified
bentsherman Jan 22, 2025
98b3639
Fix null reference error when agent is not specified
bentsherman Jan 22, 2025
2478427
Add warning if pipeline repo URL can't be determined
bentsherman Jan 23, 2025
f303941
Use heuristic to identify original definition of task processor
bentsherman Jan 27, 2025
9ee072b
Encode missing input files as absolute URIs
bentsherman Jan 27, 2025
28 changes: 16 additions & 12 deletions README.md
@@ -26,7 +26,7 @@ prov {
}
```

Finally, run your Nextflow pipeline. You do not need to modify your pipeline script in order to use the `nf-prov` plugin. The plugin will automatically generate a JSON file with provenance information.
Finally, run your Nextflow pipeline. You do not need to modify your pipeline script in order to use the `nf-prov` plugin. The plugin will automatically produce the specified provenance reports at the end of the workflow run.

## Configuration

@@ -44,14 +44,16 @@ Create the provenance report (default: `true` if plugin is loaded).

Configuration scope for the desired output formats. The following formats are available:

- `bco`: Render a [BioCompute Object](https://biocomputeobject.org/). Supports the `file` and `overwrite` options.

*New in version 1.3.0*: additional "pass-through" options are available for BCO fields that can't be inferred from the pipeline. See [BCO.md](./BCO.md) for more information.
- `bco`: Render a [BioCompute Object](https://biocomputeobject.org/). Supports the `file` and `overwrite` options. See [BCO.md](./BCO.md) for more information about the additional config options for BCO.

- `dag`: Render the task graph as a Mermaid diagram embedded in an HTML document. Supports the `file` and `overwrite` options.

- `legacy`: Render the legacy format originally defined in this plugin (default). Supports the `file` and `overwrite` options.

*New in version 1.4.0*

- `wrroc`: Render a [Workflow Run RO-Crate](https://www.researchobject.org/workflow-run-crate/). Includes all three profiles (Process, Workflow, and Provenance). See [WRROC.md](./WRROC.md) for more information about the additional config options for WRROC.

Any number of formats can be specified, for example:

```groovy
@@ -69,6 +71,8 @@ prov {
}
```

See [nextflow.config](./nextflow.config) for a full example of each provenance format.

`prov.patterns`

List of file patterns to include in the provenance report, from the set of published files. By default, all published files are included.
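
As an illustration of `prov.patterns`, the fragment below limits the report to two kinds of published files. The pattern syntax (glob-style) and the file extensions are assumptions chosen for the example; they are not taken from this PR.

```groovy
prov {
    // Hypothetical patterns: report only published BAM and compressed VCF files
    patterns = ['*.bam', '*.vcf.gz']
}
```

Leaving `patterns` unset keeps the default described above, where every published file is included.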
@@ -114,16 +118,16 @@ Following these step to package, upload and publish the plugin:

2. Update the `Plugin-Version` field in the following file with the release version:

```bash
plugins/nf-prov/src/resources/META-INF/MANIFEST.MF
```
```bash
plugins/nf-prov/src/resources/META-INF/MANIFEST.MF
```

3. Run the following command to package and upload the plugin in the GitHub project releases page:

```bash
./gradlew :plugins:nf-prov:upload
```
```bash
./gradlew :plugins:nf-prov:upload
```

4. Create a pull request against the [nextflow-io/plugins](https://github.com/nextflow-io/plugins/blob/main/plugins.json)
project to make the plugin public accessible to Nextflow app.
4. Create a pull request against the [nextflow-io/plugins](https://github.com/nextflow-io/plugins/blob/main/plugins.json)
project to make the plugin public accessible to Nextflow app.

45 changes: 45 additions & 0 deletions WRROC.md
@@ -0,0 +1,45 @@
# Additional WRROC configuration

*New in version 1.4.0*

The `wrroc` format supports additional options to configure certain aspects of the Workflow Run RO-Crate. These fields cannot be inferred automatically from the pipeline or the run, and so must be entered through the config.

The following config options are supported:

- `prov.formats.wrroc.agent.contactType`
- `prov.formats.wrroc.agent.email`
- `prov.formats.wrroc.agent.name`
- `prov.formats.wrroc.agent.orcid`
- `prov.formats.wrroc.agent.phone`
- `prov.formats.wrroc.agent.ror`
- `prov.formats.wrroc.organization.contactType`
- `prov.formats.wrroc.organization.email`
- `prov.formats.wrroc.organization.name`
- `prov.formats.wrroc.organization.phone`
- `prov.formats.wrroc.organization.ror`
- `prov.formats.wrroc.publisher`

Refer to the [WRROC User Guide](https://www.researchobject.org/workflow-run-crate/) for more information about the associated RO-Crate entities.

Here is an example config:

```groovy
prov {
    formats {
        wrroc {
            agent {
                name = "John Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
                email = "[email protected]"
                phone = "(0)89-99998 000"
                contactType = "Researcher"
            }
            organization {
                name = "University of XYZ"
                ror = "https://ror.org/000000000"
            }
            publisher = "https://ror.org/000000000"
        }
    }
}
```
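
To make the link between these options and the RO-Crate entities more concrete, here is a minimal sketch of how the `agent` options could map onto a schema.org `Person` with a linked `ContactPoint`. The helper name `agentEntities`, the `#contact-...` identifier scheme, and the exact property layout are assumptions for illustration; the renderer in this PR may structure the entities differently.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper (not the plugin's code): turn the `agent` options
// into a Person entity plus a ContactPoint entity that it references.
List<Map> agentEntities(Map agent) {
    def contactId = '#contact-' + agent.name.toLowerCase().replaceAll(/\s+/, '-')
    def person = [
        '@id'         : agent.orcid,
        '@type'       : 'Person',
        'name'        : agent.name,
        'contactPoint': ['@id': contactId]
    ]
    def contact = [
        '@id'        : contactId,
        '@type'      : 'ContactPoint',
        'contactType': agent.contactType,
        'email'      : agent.email,
        'telephone'  : agent.phone
    ]
    // Drop options that were not set so the JSON carries no null properties
    return [person, contact].collect { e -> e.findAll { k, v -> v != null } }
}

def entities = agentEntities(
    name: 'John Doe',
    orcid: 'https://orcid.org/0000-0000-0000-0000',
    contactType: 'Researcher'
)
println JsonOutput.prettyPrint(JsonOutput.toJson(entities))
```

In this sketch the ORCID doubles as the `@id` of the person, and unset options such as `email` simply do not appear in the output.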
8 changes: 8 additions & 0 deletions nextflow.config
@@ -20,5 +20,13 @@ prov {
file = "${params.outdir}/manifest.json"
overwrite = true
}
wrroc {
file = "${params.outdir}/ro-crate-metadata.json"
overwrite = true
}
}
}

manifest {
license = "https://spdx.org/licenses/Apache-2.0"
}
14 changes: 5 additions & 9 deletions plugins/nf-prov/src/main/nextflow/prov/PathNormalizer.groovy
@@ -32,8 +32,6 @@ class PathNormalizer {

private String commitId

private String launchDir

private String projectDir

private String workDir
@@ -42,14 +40,12 @@ class PathNormalizer {
repository = metadata.repository ? new URL(metadata.repository) : null
commitId = metadata.commitId
projectDir = metadata.projectDir.toUriString()
launchDir = metadata.launchDir.toUriString()
workDir = metadata.workDir.toUriString()
}

/**
* Normalize paths so that local absolute paths become
* relative paths, and local paths derived from remote URLs
* become the URLs.
* Normalize paths against the original remote URL, or
* work directory, where appropriate.
*
* @param path
*/
@@ -66,9 +62,9 @@ class PathNormalizer {
if( repository && path.startsWith(projectDir) )
return getProjectSourceUrl(path)

// replace launch directory with relative path
if( path.startsWith(launchDir) )
return path.replace(launchDir + '/', '')
// encode local absolute paths as file URLs
if( path.startsWith('/') )
return 'file://' + path

return path
}
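
The effect of the new final rule in `normalizePath` can be sketched in isolation. The function name `encodeLocalPath` and the sample paths are hypothetical; the real method also handles the work directory and project sources before reaching this step.

```groovy
// Hypothetical stand-alone version of the last rule in the hunk above.
String encodeLocalPath(String path) {
    // Local absolute paths become file URLs
    if( path.startsWith('/') )
        return 'file://' + path
    // Anything else (relative paths, remote URLs) is returned unchanged
    return path
}

assert encodeLocalPath('/data/reads/sample1.fastq.gz') == 'file:///data/reads/sample1.fastq.gz'
assert encodeLocalPath('s3://bucket/sample1.fastq.gz') == 's3://bucket/sample1.fastq.gz'
assert encodeLocalPath('results/report.html') == 'results/report.html'
```

Note that plain string concatenation does not percent-encode characters such as spaces, so a path containing them would not yield a strictly valid URI; `java.nio.file.Path.toUri()` would be one alternative if that matters.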
20 changes: 20 additions & 0 deletions plugins/nf-prov/src/main/nextflow/prov/ProvHelper.groovy
@@ -19,6 +19,7 @@ package nextflow.prov
import java.nio.file.Path

import groovy.transform.CompileStatic
import nextflow.Session
import nextflow.exception.AbortOperationException
import nextflow.file.FileHelper
import nextflow.processor.TaskRun
@@ -49,6 +50,25 @@ class ProvHelper {
}
}

/**
* Get the remote file staging directory for a workflow run.
*
* @param session
*/
static Path getStageDir(Session session) {
return session.workDir.resolve("stage-${session.uniqueId}")
}

/**
* Determine whether a task input file was staged into the work directory.
*
* @param source
* @param session
*/
static boolean isStagedInput(Path source, Session session) {
return source.startsWith(getStageDir(session))
}

/**
* Get the list of output files for a task.
*
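
The staged-input check relies on `Path.startsWith`, which compares whole path components rather than string prefixes. The sketch below shows that behaviour without a Nextflow `Session`; the `/work` directory and the run id `1234` are hypothetical stand-ins for `session.workDir` and `session.uniqueId`, and the paths assume a Unix-like file system.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Same test as isStagedInput, with the stage directory passed in directly
boolean isStaged(Path source, Path stageDir) {
    return source.startsWith(stageDir)
}

// Hypothetical stand-ins for session.workDir and session.uniqueId
Path stageDir = Paths.get('/work').resolve('stage-' + '1234')

// A file below the stage directory counts as staged
assert isStaged(Paths.get('/work/stage-1234/ab/cdef/genome.fa'), stageDir)

// A regular task directory does not
assert !isStaged(Paths.get('/work/ab/cdef/out.txt'), stageDir)

// Component-wise comparison: 'stage-12345' is not inside 'stage-1234'
assert !isStaged(Paths.get('/work/stage-12345/genome.fa'), stageDir)
```

The last assertion is the reason to prefer `Path.startsWith` over comparing the string forms of the two paths.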
5 changes: 4 additions & 1 deletion plugins/nf-prov/src/main/nextflow/prov/ProvObserver.groovy
@@ -40,7 +40,7 @@ import nextflow.trace.TraceRecord
@CompileStatic
class ProvObserver implements TraceObserver {

public static final List<String> VALID_FORMATS = ['bco', 'dag', 'legacy']
public static final List<String> VALID_FORMATS = ['bco', 'dag', 'legacy', 'wrroc']

private Session session

@@ -71,6 +71,9 @@ class ProvObserver implements TraceObserver {
if( name == 'legacy' )
return new LegacyRenderer(opts)

if( name == 'wrroc' )
return new WrrocRenderer(opts)

throw new IllegalArgumentException("Invalid provenance format -- valid formats are ${VALID_FORMATS.join(', ')}")
}
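
The format lookup can be exercised on its own with a simplified stand-in. In this sketch the renderers are replaced by plain strings and `createRenderer` is a closure rather than the class method, so it only illustrates the validation behaviour, not the plugin's actual renderer classes.

```groovy
// Hypothetical stand-in: the list mirrors VALID_FORMATS after this change
final List<String> VALID_FORMATS = ['bco', 'dag', 'legacy', 'wrroc']

def createRenderer = { String name ->
    // Reject unknown names with the same message as the diff above
    if( !(name in VALID_FORMATS) )
        throw new IllegalArgumentException("Invalid provenance format -- valid formats are ${VALID_FORMATS.join(', ')}")
    // A real implementation would construct the matching renderer here
    return name + '-renderer'
}

assert createRenderer('wrroc') == 'wrroc-renderer'

def message = null
try {
    createRenderer('cwl')
}
catch( IllegalArgumentException e ) {
    message = e.message
}
assert message.endsWith('bco, dag, legacy, wrroc')
```

Because the error message is built from `VALID_FORMATS`, adding `wrroc` to the list is what makes it show up in the hint given to users who mistype a format name.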
