Add Workflow Run RO-crate format #39

Merged Feb 6, 2025 (58 commits)

Commits
88690fc  status from PR #33  (famosab, Nov 18, 2024)
e541606  add encodingFormat for nextflow.config  (famosab, Nov 18, 2024)
9889a69  add encodingFormat for main.nf  (famosab, Nov 18, 2024)
f228acf  Merge pull request #10 from famosab/encode  (famosab, Dec 3, 2024)
91fc7e2  feat: add wrroc to valid formats  (famosab, Dec 3, 2024)
6dadcaa  Merge pull request #13 from famosab/obs  (famosab, Dec 3, 2024)
1416833  fix: make getIntermediateOutputFiles work again (#18)  (famosab, Dec 6, 2024)
416d920  Check in input and output if file or directory  (fbartusch, Dec 8, 2024)
7560d4c  feat: add README to crate (#14)  (famosab, Dec 9, 2024)
e6e3844  Set correct MIME types  (fbartusch, Dec 9, 2024)
0d3fd2d  Add contactPoint for agent and organization (#21)  (fbartusch, Dec 13, 2024)
816cf17  Fix #4 (#22)  (fbartusch, Dec 13, 2024)
9aa6762  Merge branch 'dev-wrroc' into iss7-directoryType  (fbartusch, Dec 17, 2024)
07f7ceb  Merge pull request #20 from famosab/iss7-directoryType  (fbartusch, Dec 17, 2024)
7f0264f  start with metaYaml imports (#12)  (famosab, Dec 18, 2024)
daf9725  add information to README  (famosab, Dec 18, 2024)
f271d91  cleanup  (bentsherman, Jan 10, 2025)
70a5f51  Set root crate dir to parent of wrroc path  (bentsherman, Jan 14, 2025)
8f060ca  cleanup  (bentsherman, Jan 14, 2025)
98d2cc4  Add helper functions  (bentsherman, Jan 14, 2025)
4410cba  Don't copy intermediate files into crate, normalize inputs against pr…  (bentsherman, Jan 14, 2025)
9a9a816  Fix resolved config  (bentsherman, Jan 14, 2025)
4220097  Add CreateAction's for publishing outputs  (bentsherman, Jan 14, 2025)
3e0b70f  minor fix  (bentsherman, Jan 14, 2025)
d1ba473  Improve ids for modules, processes, tools, replace main script with r…  (bentsherman, Jan 14, 2025)
b24737f  cleanup  (bentsherman, Jan 14, 2025)
ce8235e  cleanup config parsing  (bentsherman, Jan 14, 2025)
42da1dd  cleanup  (bentsherman, Jan 14, 2025)
0ffea5d  Improve canonical ids  (bentsherman, Jan 14, 2025)
fe5d4c0  Use parameter schema to populate formal parameters, copy input files  (bentsherman, Jan 14, 2025)
90f9cbf  Exclude null property values  (bentsherman, Jan 16, 2025)
34b4d3a  Include permalink to main script  (bentsherman, Jan 16, 2025)
3ff0d27  Add cases for lists and maps for formal parameters  (bentsherman, Jan 16, 2025)
0f51061  Don't download remote input files into crate  (bentsherman, Jan 16, 2025)
7888bbc  Include main script to satisfy WRROC requirement  (bentsherman, Jan 16, 2025)
0d48ce0  Replace "Directory" -> "Dataset"  (bentsherman, Jan 16, 2025)
4b5161e  Exclude (and warn about) published files outside of crate directory  (bentsherman, Jan 17, 2025)
1c9e9f8  Fix tool description, cleanup  (bentsherman, Jan 17, 2025)
310b115  Add param input files to dataset parts  (bentsherman, Jan 17, 2025)
9a4ff8a  Improve canonical ids for tasks and task outputs  (bentsherman, Jan 17, 2025)
85c089e  Fix validation issues  (bentsherman, Jan 17, 2025)
9972771  Separate staged input files from workflow inputs  (bentsherman, Jan 17, 2025)
3c4aa37  Update docs, fix issues with wrroc config options  (bentsherman, Jan 17, 2025)
5c2b3f1  Fix null reference error  (bentsherman, Jan 17, 2025)
8bf3c9c  Don't copy directories specified by params into crate  (bentsherman, Jan 17, 2025)
f6831dc  minor edits  (bentsherman, Jan 17, 2025)
3c33bc9  Improve entity ids  (bentsherman, Jan 17, 2025)
6f0ffed  Make intermediate outputs into contextual entities (CreativeWork)  (bentsherman, Jan 17, 2025)
8b13a59  Exclude license entity if it is not specified  (bentsherman, Jan 22, 2025)
98b3639  Fix null reference error when agent is not specified  (bentsherman, Jan 22, 2025)
2478427  Add warning if pipeline repo URL can't be determined  (bentsherman, Jan 23, 2025)
f303941  Use heuristic to identify original definition of task processor  (bentsherman, Jan 27, 2025)
9ee072b  Encode missing input files as absolute URIs  (bentsherman, Jan 27, 2025)
ee7ee8c  Separate ro-crate license from pipeline license  (bentsherman, Feb 3, 2025)
0394e92  Normalize durations and memory units as raw numbers  (bentsherman, Feb 3, 2025)
86c74cc  Update warning about unknown parameter type  (bentsherman, Feb 4, 2025)
dd3f0e1  Handle task inputs from work/tmp/  (bentsherman, Feb 4, 2025)
e11e50f  Exclude parameters set to null  (bentsherman, Feb 4, 2025)

Improve entity ids
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
bentsherman committed Jan 17, 2025
commit 3c33bc92fcad9acc9bdc166682d11f70e9625cb7
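
Reading the diff as a whole, this commit replaces project-prefixed ids such as `${projectName}#param#${name}` and `task#${hash}` with `#`-prefixed, slash-separated ids. The snippet below is a standalone sketch of the resulting scheme, not the plugin code: plain strings stand in for `ProcessDef`, `TaskProcessor`, and `TaskRun`, the function names are shortened, and the sample values (`FASTQC`, `a1b2c3`) are hypothetical.

```groovy
// Sketch of the id scheme after this commit.
// Assumption: plain strings replace the Nextflow objects used by the real helpers.
String formalParameterId(String name)             { "#param/${name}" }
String moduleId(String baseName)                  { "#module/${baseName}" }
String toolId(String moduleName, String toolName) { "#module/${moduleName}/${toolName}" }
String processControlId(String processName)       { "#process-control/${processName}" }
String processStepId(String processName)          { "#process-step/${processName}" }
String taskId(String hash)                        { "#task/${hash}" }
String taskOutputId(String hash, String name)     { "#task/${hash}/${name}" }

// Hypothetical sample values, for illustration only
assert formalParameterId('input') == '#param/input'
assert toolId('fastqc', 'fastqc') == '#module/fastqc/fastqc'
assert taskOutputId('a1b2c3', 'out.txt') == '#task/a1b2c3/out.txt'
```

Every id now starts with `#`, which appears to match the RO-Crate recommendation for contextual entities that have no URL of their own; treat that reading of the spec as an assumption rather than something the commit message states.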
88 changes: 47 additions & 41 deletions plugins/nf-prov/src/main/nextflow/prov/WrrocRenderer.groovy
@@ -94,8 +94,6 @@ class WrrocRenderer implements Renderer {
  agent["affiliation"] = ["@id": organization["@id"]]

  // create manifest
- final softwareApplicationId = metadata.projectName + '#sa'
- final organizeActionId = metadata.projectName + '#organize'
  final datasetParts = []

  // -- license
@@ -125,6 +123,8 @@ class WrrocRenderer implements Renderer {

  // -- main script
  final mainScriptId = metadata.scriptFile.name
+ final softwareApplicationId = "${mainScriptId}#software-application"
+ final organizeActionId = "${mainScriptId}#organize"
  metadata.scriptFile.copyTo(crateDir)

  // -- parameter schema
@@ -170,7 +170,7 @@ class WrrocRenderer implements Renderer {
  log.warn "Could not determine type of parameter `${name}` for Workflow Run RO-Crate"

  return withoutNulls([
- "@id" : getFormalParameterId(metadata.projectName, name),
+ "@id" : getFormalParameterId(name),
  "@type" : "FormalParameter",
  "additionalType": type,
  "conformsTo" : ["@id": "https://bioschemas.org/profiles/FormalParameter/1.0-RELEASE"],
@@ -184,15 +184,16 @@ class WrrocRenderer implements Renderer {
  final propertyValues = params
  .findAll { name, value -> value != null }
  .collect { name, value ->
+ final paramId = getFormalParameterId(name)
  final normalized =
  (value instanceof List || value instanceof Map) ? JsonOutput.toJson(value)
  : value instanceof CharSequence ? normalizePath(value.toString())
  : value

  return [
- "@id" : "#${name}",
+ "@id" : "${paramId}/value",
  "@type" : "PropertyValue",
- "exampleOfWork": ["@id": getFormalParameterId(metadata.projectName, name)],
+ "exampleOfWork": ["@id": paramId],
  "name" : name,
  "value" : normalized
  ]
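
The `normalized` expression in the hunk above chains two ternaries: lists and maps are serialized to JSON, strings go through `normalizePath`, and everything else passes through unchanged. A minimal sketch of that behaviour follows. `normalizePathStub` is a hypothetical identity stand-in for the plugin's `normalizePath`, whose body is not shown in this diff, and `normalizeValue` is a name made up for the example.

```groovy
import groovy.json.JsonOutput

// Hypothetical stand-in: the real normalizePath is not part of this diff,
// so this stub returns its argument unchanged.
String normalizePathStub(String path) {
    return path
}

// Same nested ternary as in the diff, lifted into a function for illustration
Object normalizeValue(Object value) {
    return (value instanceof List || value instanceof Map) ? JsonOutput.toJson(value)
        : value instanceof CharSequence ? normalizePathStub(value.toString())
        : value
}

assert normalizeValue([1, 2]) == '[1,2]'
assert normalizeValue([a: 1]) == '{"a":1}'
assert normalizeValue('data/in.csv') == 'data/in.csv'
assert normalizeValue(42) == 42
```

Serializing collections to a JSON string keeps `value` a scalar in the resulting `PropertyValue`, which seems to be the intent here.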
@@ -262,27 +263,37 @@ class WrrocRenderer implements Renderer {
  .collect { process -> ScriptMeta.get(process.getOwnerScript()) }
  .unique()
  .collectMany { meta ->
- meta.getDefinitions().findAll { defn -> defn instanceof ProcessDef }
- } as List<ProcessDef>
+ meta.getDefinitions().findAll { defn -> defn instanceof ProcessDef } as List<ProcessDef>
+ }

+ final processLookup = taskProcessors
+ .inject([:] as Map<TaskProcessor,ProcessDef>) { acc, processor ->
+ final simpleName = processor.name.split(':').last()
+ acc[processor] = ScriptMeta.get(processor.getOwnerScript()).getProcess(simpleName)
+ acc
+ }
+
  final moduleSoftwareApplications = processDefs
  .collect() { process ->
  final result = [
  "@id" : getModuleId(process),
  "@type" : "SoftwareApplication",
- "name" : process.getName(),
+ "name" : process.baseName,
+ "url" : getModuleUrl(process),
  ]

  final metaYaml = getModuleSchema(process)
  if( metaYaml ) {
- final moduleName = metaYaml.name as String
+ final name = metaYaml.name as String
  final tools = metaYaml.getOrDefault('tools', []) as List
  final parts = tools.collect { tool ->
  final entry = (tool as Map).entrySet().first()
  final toolName = entry.key as String
- ["@id": getToolId(moduleName, toolName)]
+ ["@id": getToolId(process.baseName, toolName)]
  }

+ if( name )
+ result.name = name
  if( parts )
  result.hasPart = parts
  }
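
The new `processLookup` map resolves each task processor to its process definition by taking the last `:`-separated segment of the processor's fully qualified name. The sketch below shows only that name-splitting step with `inject`. The process names are hypothetical, and plain strings replace `TaskProcessor` and `ProcessDef`.

```groovy
// Hypothetical fully qualified process names; the workflow scopes are made up
final names = ['NFCORE_DEMO:DEMO:FASTQC', 'MULTIQC']

// Build a lookup from full name to simple name, mirroring the inject() pattern in the diff
final lookup = names.inject([:] as Map<String,String>) { acc, name ->
    acc[name] = name.split(':').last()
    acc
}

assert lookup['NFCORE_DEMO:DEMO:FASTQC'] == 'FASTQC'
assert lookup['MULTIQC'] == 'MULTIQC'
```

A name with no `:` is returned as-is, so processes defined in the entry workflow resolve the same way as processes nested in subworkflows.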
@@ -296,16 +307,14 @@ class WrrocRenderer implements Renderer {
  if( !metaYaml )
  return []

- final moduleName = metaYaml.name as String
  final tools = metaYaml.getOrDefault('tools', []) as List
-
  return tools
  .collect { tool ->
  final entry = (tool as Map).entrySet().first()
  final toolName = entry.key as String
  final toolDescription = (entry.value as Map)?.get('description') as String
  return [
- "@id" : getToolId(moduleName, toolName),
+ "@id" : getToolId(process.baseName, toolName),
  "@type" : "SoftwareApplication",
  "name" : toolName,
  "description" : toolDescription
@@ -316,9 +325,9 @@ class WrrocRenderer implements Renderer {
  final howToSteps = taskProcessors
  .collect() { process ->
  [
- "@id" : getProcessStepId(metadata.projectName, process),
+ "@id" : getProcessStepId(process),
  "@type" : "HowToStep",
- "workExample": ["@id": getModuleId(process)],
+ "workExample": ["@id": getModuleId(processLookup[process])],
  "position" : process.getId()
  ]
  }
@@ -330,10 +339,10 @@ class WrrocRenderer implements Renderer {
  .collect { task -> ["@id": getTaskId(task)] }

  return [
- "@id" : getProcessControlId(metadata.projectName, process),
+ "@id" : getProcessControlId(process),
  "@type" : "ControlAction",
- "instrument": ["@id": getProcessStepId(metadata.projectName, process)],
- "name" : "Orchestrate process " + process.getName(),
+ "instrument": ["@id": getProcessStepId(process)],
+ "name" : "Orchestrate process ${process.name}",
  "object" : taskIds
  ]
  }
@@ -345,7 +354,7 @@ class WrrocRenderer implements Renderer {
  final name = getStagedInputName(source, session)

  withoutNulls([
- "@id" : "stage#${name}",
+ "@id" : "#stage/${name}",
  "@type" : getType(source),
  "name" : name,
  "encodingFormat": getEncodingFormat(source),
@@ -357,7 +366,7 @@ class WrrocRenderer implements Renderer {
  final inputs = task.getInputFilesMap().collect { name, source ->
  final id =
  source in taskLookup ? getTaskOutputId(taskLookup[source], source)
- : ProvHelper.isStagedInput(source, session) ? "stage#${getStagedInputName(source, session)}"
+ : ProvHelper.isStagedInput(source, session) ? "#stage/${getStagedInputName(source, session)}"
  : normalizePath(source)
  ["@id": id]
  }
@@ -367,8 +376,8 @@ class WrrocRenderer implements Renderer {
  final result = [
  "@id" : getTaskId(task),
  "@type" : "CreateAction",
- "name" : task.getName(),
- "instrument" : ["@id": getModuleId(task.processor)],
+ "name" : task.name,
+ "instrument" : ["@id": getModuleId(processLookup[task.processor])],
  "agent" : ["@id": agent["@id"]],
  "object" : inputs,
  "result" : outputs,
@@ -398,7 +407,7 @@ class WrrocRenderer implements Renderer {
  final sourceName = getTaskOutputName(task, source)

  return [
- "@id" : "publish#${task.hash}/${sourceName}",
+ "@id" : "#publish/${task.hash}/${sourceName}",
  "@type" : "CreateAction",
  "name" : "publish",
  "instrument" : ["@id": softwareApplicationId],
@@ -432,7 +441,7 @@ class WrrocRenderer implements Renderer {
  ["@id": "https://w3id.org/ro/wfrun/provenance/0.1"],
  ["@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0"]
  ],
- "name" : "Workflow run of " + manifest.name ?: metadata.projectName,
+ "name" : "Workflow run of ${manifest.name ?: metadata.projectName}",
  "description": manifest.description ?: null,
  "hasPart" : withoutNulls([
  ["@id": mainScriptId],
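
The `"name"` change in the hunk above is more than cosmetic. In Groovy, `+` binds tighter than `?:`, so the old expression parses as `("Workflow run of " + manifest.name) ?: metadata.projectName`, and the fallback never fires because the concatenated string is always non-empty. A small sketch, using hypothetical local variables in place of `manifest.name` and `metadata.projectName`:

```groovy
// Hypothetical stand-ins for manifest.name and metadata.projectName
final String manifestName = null
final projectName = 'my-pipeline'

// Old form: concatenation happens first, so the Elvis fallback is unreachable
final before = "Workflow run of " + manifestName ?: projectName

// New form: the Elvis operator is evaluated inside the interpolation
final after = "Workflow run of ${manifestName ?: projectName}"

assert before == 'Workflow run of null'
assert after == 'Workflow run of my-pipeline'
```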
@@ -748,11 +757,10 @@ class WrrocRenderer implements Renderer {
  /**
  * Get the canonical id of a module script.
  *
- * @param projectName
  * @param name
  */
- private String getFormalParameterId(String projectName, String name) {
- return "${projectName}#param#${name}"
+ private String getFormalParameterId(String name) {
+ return "#param/${name}"
  }

  /**
@@ -761,17 +769,16 @@ class WrrocRenderer implements Renderer {
  * @param process
  */
  private String getModuleId(ProcessDef process) {
- final scriptPath = ScriptMeta.get(process.getOwner()).getScriptPath().normalize()
- return normalizePath(scriptPath)
+ return "#module/${process.baseName}"
  }

  /**
- * Get the canonical id of a module script.
+ * Get the canonical url of a module script.
  *
  * @param process
  */
- private String getModuleId(TaskProcessor process) {
- final scriptPath = ScriptMeta.get(process.getOwnerScript()).getScriptPath().normalize()
+ private String getModuleUrl(ProcessDef process) {
+ final scriptPath = ScriptMeta.get(process.getOwner()).getScriptPath().normalize()
  return normalizePath(scriptPath)
  }

@@ -782,21 +789,20 @@ class WrrocRenderer implements Renderer {
  * @param toolName
  */
  private static String getToolId(String moduleName, String toolName) {
- return "${moduleName}#${toolName}"
+ return "#module/${moduleName}/${toolName}"
  }

  /**
  * Get the canonical id of a process in the workflow DAG.
  *
- * @param projectName
  * @param process
  */
- private static String getProcessControlId(String projectName, TaskProcessor process) {
- return "${projectName}#control#${process.getName()}"
+ private static String getProcessControlId(TaskProcessor process) {
+ return "#process-control/${process.name}"
  }

- private static String getProcessStepId(String projectName, TaskProcessor process) {
- return "${projectName}#step#${process.getName()}"
+ private static String getProcessStepId(TaskProcessor process) {
+ return "#process-step/${process.name}"
  }

  /**
@@ -816,7 +822,7 @@ class WrrocRenderer implements Renderer {
  * @param task
  */
  private static String getTaskId(TaskRun task) {
- return 'task#' + task.hash.toString()
+ return "#task/${task.hash}"
  }

  /**
@@ -837,11 +843,11 @@ class WrrocRenderer implements Renderer {
  * @param name
  */
  private static String getTaskOutputId(TaskRun task, String name) {
- return "task#${task.hash}/${name}"
+ return "#task/${task.hash}/${name}"
  }

  private static String getTaskOutputId(TaskRun task, Path target) {
- return "task#${task.hash}/${getTaskOutputName(task, target)}"
+ return "#task/${task.hash}/${getTaskOutputName(task, target)}"
  }

  /**