Add Workflow Run RO-Crate format #39
base: master
add encodingFormat for nextflow.config
feat: add wrroc to valid formats

* fix: make getIntermediateOutputFiles work again
* Fix bugs, fixes #16, fixes #17

Co-authored-by: fbartusch

* fix #7

Signed-off-by: fbartusch

* feat: add README to crate
* feat: ignore vscode
* fix: make getIntermediateOutputFiles work again (#18) (#19)
* feat: add README to json
* feat: check first if readme exists
* Add readme to hasPart

Signed-off-by: fbartusch
Co-authored-by: fbartusch

* Add getEncodingFormat function that returns the encoding format for a file
* handle YAML files manually

Signed-off-by: fbartusch

* implements #1

Signed-off-by: fbartusch

Iss7 directory type

* start with metaYaml imports
* merge dev-wrroc into metaYaml (#23), including:
  * add encodingFormat for nextflow.config
  * add encodingFormat for main.nf
  * feat: add wrroc to valid formats
  * fix: make getIntermediateOutputFiles work again (#18)
  * feat: add README to crate (#14)
* WIP
* only add from meta if meta exists
* remove usage from ext args
* add module name to id

Signed-off-by: fbartusch
Co-authored-by: fbartusch
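The getEncodingFormat commit returns an encoding format for a file and handles YAML files manually. The plugin itself is written in Groovy; the Python sketch below only illustrates that idea. The name `get_encoding_format` and the YAML media type `application/yaml` (registered in RFC 9512) are assumptions, not values taken from the plugin.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Assumption: the media type used for YAML files (registered in RFC 9512).
YAML_SUFFIXES = {".yml", ".yaml"}
YAML_MEDIA_TYPE = "application/yaml"


def get_encoding_format(path: str) -> Optional[str]:
    """Return a media type for `path`, or None if none can be guessed."""
    suffix = Path(path).suffix.lower()
    # Handle YAML explicitly: mimetypes may not know .yml/.yaml,
    # depending on the Python version and the system's mime.types files.
    if suffix in YAML_SUFFIXES:
        return YAML_MEDIA_TYPE
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type
```

Files whose type cannot be guessed (for example a Nextflow script) come back as `None`, so a caller can decide whether to omit `encodingFormat` or set it explicitly.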
Found some new problems with copying input files to the crate:
Signed-off-by: Ben Sherman
@fbartusch thanks for the comments, I have fixed those first two issues.

@simleo is there another entity type that might be appropriate for these intermediate outputs? I tried a few based on what is allowed for
I think
Agree.
I ran Famke's pipeline again with the latest version of the plugin and I could not find these items anywhere in the RO-Crate metadata.
@bentsherman I spoke to @stain (co-lead of RO-Crate) about this. First, he clarified that entities of type …

Second, the Five Safes Crate profile has some guidance on representing files that are referenced but not actually included in the crate (focused on sensitive-data use cases). It suggests using the type DigitalDocument. I think I overlooked this recommendation when making the crate in my previous comment.
@simleo did you download the pipeline manually and run it from the local path? I think their README suggests this, but the best practice here is to run directly from the canonical repo: `nextflow run famosa/wrrocmetatest # ...`. See my comments here. This is the only way for Nextflow to know the repo URL and commit hash and add them to the crate.
@elichad thanks for the suggestion. Since CreativeWork is valid and DigitalDocument seems to be recommended mainly for sensitive data, I'm inclined to leave it as CreativeWork for now and perhaps go back to File and Dataset once the validator bug is fixed.
@famosab @fbartusch I think we are just about ready to merge. I have tested with your test pipeline; if you'd like to give it one more round of testing with other pipelines and everything looks good from your side, I think we can merge in the next few days.
@bentsherman thanks for the tip: I ran the workflow as … The resulting metadata contains entities whose `@id` is null:

```json
{
  "@id": "./",
  "@type": "Dataset",
  "hasPart": [
    ...
    {
      "@id": null
    },
    ...
  ],
  ...
},
...
{
  "@id": null,
  "@type": "CreativeWork"
}
```

In such cases, no entity at all should be added to the crate instead.
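Skipping null values (no data entity and no `hasPart` reference) can be sketched in Python as follows. The plugin itself is Groovy; `add_inputs`, the flat list of candidate values, and the `CreativeWork` type are assumptions for illustration, not the plugin's code.

```python
from typing import Any, Dict, List, Optional


def add_inputs(graph: List[Dict[str, Any]], values: List[Optional[str]]) -> None:
    """Add one entity per non-null value and reference it from the root.

    Hypothetical helper: `graph` is the crate's @graph list and `values`
    are candidate file references, possibly None (a null parameter).
    """
    root = next(e for e in graph if e.get("@id") == "./")
    has_part = root.setdefault("hasPart", [])
    for value in values:
        # A null value yields no entity at all, rather than an entity
        # (and a hasPart reference) with "@id": null.
        if value is None:
            continue
        graph.append({"@id": value, "@type": "CreativeWork"})
        has_part.append({"@id": value})
```

For example, calling `add_inputs(graph, ["https://example.org/a.fastq", None])` adds a single entity and a single `hasPart` reference.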
I'm currently running tests against nf-core pipelines with some scripts I wrote for testing plugins.
I've tried again (with the current version of the plugin) to run Famke's pipeline locally:
with local files in
The resulting crate is not even readable by ro-crate-py because it has absolute ids in it.
I see three possible ways to fix this:
However, with options 2 and 3 the crate consumer has no way to reconstruct the two input files.
Most of the nf-core pipelines are currently failing with the plugin :(
I'm now checking why they fail, using the nf-core bamtofastq pipeline, since it is the fastest (and simplest?) pipeline that fails. That should make debugging easier.
@simleo this is why I recommend using the original HTTP URLs:
But some users will inevitably use local input files, and we'll need to handle that gracefully. As a first iteration, I'm inclined to warn about such input files and perhaps make them CreativeWork if they aren't included in the crate. I will try a few things.
@simleo I ended up taking the absolute URI approach, which made the resulting crate valid. We can encourage the use of remote URIs as a best practice.

In summary, only input files that are (1) specified directly by a param, (2) local, and (3) not a directory will be copied into the crate. These restrictions are designed to prevent unbounded data transfers from directories, remote data, and file globs.

@fbartusch I ran bamtofastq with the test profile and it succeeded. Let me know how the rest of your tests go with the latest revision.
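The copy rule and the absolute URI approach described in this comment can be sketched in Python (the plugin itself is written in Groovy). Assumptions, not the plugin's code: the helper names, POSIX-style paths, treating `*`, `?` and `[` as glob markers, naming copied files by their basename, and the entity types (`File` for copied inputs, `CreativeWork` for inputs referenced but not included, as discussed earlier in the thread). Inputs that stay outside the crate get an absolute URI as `@id`: a `file:` URI for local paths and the original URI for remote data.

```python
from pathlib import Path
from typing import Dict
from urllib.parse import urlparse

GLOB_CHARS = set("*?[")  # assumption: characters that mark a glob pattern


def local_path(value: str) -> Path:
    """Turn a plain path or a file: URI into a Path (POSIX paths assumed)."""
    return Path(urlparse(value).path) if value.startswith("file:") else Path(value)


def is_local(value: str) -> bool:
    """Plain paths and file: URIs are local; https:, s3:, etc. are remote."""
    return urlparse(value).scheme in ("", "file")


def should_copy(value: str) -> bool:
    """Copy rule for a value taken directly from a param.

    Copy only if the value is local, is not a glob pattern, and points
    to an existing regular file (so directories are excluded).
    """
    if not is_local(value) or GLOB_CHARS & set(value):
        return False
    return local_path(value).is_file()


def input_entity(value: str) -> Dict[str, str]:
    """Build a (hypothetical) data entity for one param value."""
    if should_copy(value):
        # Copied into the crate: crate-relative id (basename is an assumption).
        return {"@id": local_path(value).name, "@type": "File"}
    if is_local(value):
        # Local but not copied (directory or glob): absolute file: URI
        # instead of a bare absolute filesystem path.
        return {"@id": local_path(value).resolve().as_uri(), "@type": "CreativeWork"}
    # Remote data is referenced by its original URI.
    return {"@id": value, "@type": "CreativeWork"}
```

Using `Path.is_file()` covers two restrictions at once: it is false both for directories and for glob patterns that do not name an existing file.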
@bentsherman bamtofastq does indeed look good and the validator is happy. I'm now running the tests for the other nf-core pipelines.
@bentsherman Only one pipeline out of 42 I ran fails because of the plugin:
However, none of the others passed the validator (I used the latest commit fa8c6c7, not the PyPI release). I think this is the validator version with the fewest remaining bugs, right @simleo? Although the list looks very long at first glance, these seem to be just corner cases.
All of these messages relate to files in the temporary directory.
Example: Thanks to the saved effective
Also an edge case in handling null parameter values?
Example: It looks like this:
The effective configuration during runtime is:

One last thing regarding the license.
That's the current development version, so good choice 👍
Workflow RO-Crate says:
where the first appearance of "Crate" here means the root data entity. See also Licensing, Access control and copyright.
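Reading "Crate" as the root data entity means a license check belongs on that entity. A minimal Python sketch of such a check on a parsed `ro-crate-metadata.json`: it locates the root data entity through the metadata descriptor's `about` property, as RO-Crate 1.1 defines, and only tests that a `license` property is present, not its value. The function names are hypothetical; this is not the validator's implementation.

```python
from typing import Any, Dict


def root_data_entity(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Locate the root data entity through the metadata descriptor's "about"."""
    graph = metadata["@graph"]
    descriptor = next(e for e in graph if e.get("@id") == "ro-crate-metadata.json")
    root_id = descriptor["about"]["@id"]
    return next(e for e in graph if e.get("@id") == root_id)


def root_has_license(metadata: Dict[str, Any]) -> bool:
    """True if the root data entity carries a license property."""
    return "license" in root_data_entity(metadata)
```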
We worked on a first version of the plugin that can render valid RO-Crates for any workflow run.
Happy to receive feedback to get this finished up :)
Continues #19 and #33.