Add Workflow Run RO-crate format #39

Merged on Feb 6, 2025 (58 commits)
Commits
88690fc
status from PR #33
famosab Nov 18, 2024
e541606
add encodingFormat for nextflow.config
famosab Nov 18, 2024
9889a69
add encodingFormat for main.nf
famosab Nov 18, 2024
f228acf
Merge pull request #10 from famosab/encode
famosab Dec 3, 2024
91fc7e2
feat: add wrroc to valid formats
famosab Dec 3, 2024
6dadcaa
Merge pull request #13 from famosab/obs
famosab Dec 3, 2024
1416833
fix: make getIntermediateOutputFiles work again (#18)
famosab Dec 6, 2024
416d920
Check in input and output if file or directory
fbartusch Dec 8, 2024
7560d4c
feat: add README to crate (#14)
famosab Dec 9, 2024
e6e3844
Set correct MIME types
fbartusch Dec 9, 2024
0d3fd2d
Add contactPoint for agent and organization (#21)
fbartusch Dec 13, 2024
816cf17
Fix #4 (#22)
fbartusch Dec 13, 2024
9aa6762
Merge branch 'dev-wrroc' into iss7-directoryType
fbartusch Dec 17, 2024
07f7ceb
Merge pull request #20 from famosab/iss7-directoryType
fbartusch Dec 17, 2024
7f0264f
start with metaYaml imports (#12)
famosab Dec 18, 2024
daf9725
add information to README
famosab Dec 18, 2024
f271d91
cleanup
bentsherman Jan 10, 2025
70a5f51
Set root crate dir to parent of wrroc path
bentsherman Jan 14, 2025
8f060ca
cleanup
bentsherman Jan 14, 2025
98d2cc4
Add helper functions
bentsherman Jan 14, 2025
4410cba
Don't copy intermediate files into crate, normalize inputs against pr…
bentsherman Jan 14, 2025
9a9a816
Fix resolved config
bentsherman Jan 14, 2025
4220097
Add CreateAction's for publishing outputs
bentsherman Jan 14, 2025
3e0b70f
minor fix
bentsherman Jan 14, 2025
d1ba473
Improve ids for modules, processes, tools, replace main script with r…
bentsherman Jan 14, 2025
b24737f
cleanup
bentsherman Jan 14, 2025
ce8235e
cleanup config parsing
bentsherman Jan 14, 2025
42da1dd
cleanup
bentsherman Jan 14, 2025
0ffea5d
Improve canonical ids
bentsherman Jan 14, 2025
fe5d4c0
Use parameter schema to populate formal parameters, copy input files
bentsherman Jan 14, 2025
90f9cbf
Exclude null property values
bentsherman Jan 16, 2025
34b4d3a
Include permalink to main script
bentsherman Jan 16, 2025
3ff0d27
Add cases for lists and maps for formal parameters
bentsherman Jan 16, 2025
0f51061
Don't download remote input files into crate
bentsherman Jan 16, 2025
7888bbc
Include main script to satisfy WRROC requirement
bentsherman Jan 16, 2025
0d48ce0
Replace "Directory" -> "Dataset"
bentsherman Jan 16, 2025
4b5161e
Exclude (and warn about) published files outside of crate directory
bentsherman Jan 17, 2025
1c9e9f8
Fix tool description, cleanup
bentsherman Jan 17, 2025
310b115
Add param input files to dataset parts
bentsherman Jan 17, 2025
9a4ff8a
Improve canonical ids for tasks and task outputs
bentsherman Jan 17, 2025
85c089e
Fix validation issues
bentsherman Jan 17, 2025
9972771
Separate staged input files from workflow inputs
bentsherman Jan 17, 2025
3c4aa37
Update docs, fix issues with wrroc config options
bentsherman Jan 17, 2025
5c2b3f1
Fix null reference error
bentsherman Jan 17, 2025
8bf3c9c
Don't copy directories specified by params into crate
bentsherman Jan 17, 2025
f6831dc
minor edits
bentsherman Jan 17, 2025
3c33bc9
Improve entity ids
bentsherman Jan 17, 2025
6f0ffed
Make intermediate outputs into contextual entities (CreativeWork)
bentsherman Jan 17, 2025
8b13a59
Exclude license entity if it is not specified
bentsherman Jan 22, 2025
98b3639
Fix null reference error when agent is not specified
bentsherman Jan 22, 2025
2478427
Add warning if pipeline repo URL can't be determined
bentsherman Jan 23, 2025
f303941
Use heuristic to identify original definition of task processor
bentsherman Jan 27, 2025
9ee072b
Encode missing input files as absolute URIs
bentsherman Jan 27, 2025
ee7ee8c
Separate ro-crate license from pipeline license
bentsherman Feb 3, 2025
0394e92
Normalize durations and memory units as raw numbers
bentsherman Feb 3, 2025
86c74cc
Update warning about unknown parameter type
bentsherman Feb 4, 2025
dd3f0e1
Handle task inputs from work/tmp/
bentsherman Feb 4, 2025
e11e50f
Exclude parameters set to null
bentsherman Feb 4, 2025
fix: make getIntermediateOutputFiles work again (#18)
* fx: make getIntermediateOutputFiles work again

* Fix bugs

fixes #16
fixes #17

---------

Co-authored-by: fbartusch <felix.bartusch@uni-tuebingen.de>
famosab and fbartusch authored Dec 6, 2024
commit 141683307f3a951e8992199c3a679b990f7886f0
79 changes: 54 additions & 25 deletions plugins/nf-prov/src/main/nextflow/prov/WrrocRenderer.groovy
@@ -41,8 +41,11 @@ import nextflow.processor.TaskRun
class WrrocRenderer implements Renderer {

private Path path
+// The final RO-Crate directory
 private Path crateRootDir
+// Nextflow work directory
 private Path workdir
+// Nextflow pipeline directory (contains main.nf, assets, etc.)
private Path projectDir

private LinkedHashMap agent
@@ -110,7 +113,7 @@ class WrrocRenderer implements Renderer {
if (!Files.exists(dest))
Files.copy(source, dest)
} catch (Exception e) {
-println "Failed to copy $source to $dest: ${e.message}"
+println "workflowInput: Failed to copy $source to $dest: ${e.message}"
}
}
}
@@ -141,7 +144,7 @@ class WrrocRenderer implements Renderer {
Files.createDirectories(dest.getParent())
Files.copy(source, dest, StandardCopyOption.REPLACE_EXISTING)
} catch (Exception e) {
-println "Failed to copy $source to $dest: ${e.message}"
+println "workflowOutput Failed to copy $source to $dest: ${e.message}"
}
}
}
@@ -284,18 +287,29 @@ class WrrocRenderer implements Renderer {

def createActions = tasks
.collect { task ->

List<String> resultFileIDs = []

// Collect output files of the path
List<Path> outputFileList = []
for (taskOutputParam in task.getOutputsByType(FileOutParam)) {

if (taskOutputParam.getValue() instanceof Path) {
outputFileList.add(taskOutputParam.getValue() as Path)
continue
}

for (taskOutputFile in taskOutputParam.getValue()) {
// Path to file in workdir
Path taskOutputFilePath = Path.of(taskOutputFile.toString())
outputFileList.add(Path.of(taskOutputFile.toString()))
}
}

if (workflowOutputs.containsKey(taskOutputFilePath)) {
resultFileIDs.add(crateRootDir.relativize(workflowOutputs.get(taskOutputFilePath)).toString())
} else {
System.out.println("taskOutput not contained in workflowOutputs list: " + taskOutputFilePath)
}
// Check if the output files have a mapping in workflowOutputs
for (outputFile in outputFileList) {
if (workflowOutputs.containsKey(outputFile)) {
resultFileIDs.add(crateRootDir.relativize(workflowOutputs.get(outputFile)).toString())
} else {
System.out.println("taskOutput not contained in workflowOutputs list: " + outputFile)
}
}

@@ -553,29 +567,41 @@ class WrrocRenderer implements Renderer {
}

def Map<Path, Path> getIntermediateOutputFiles(Set<TaskRun> tasks, Map<Path, Path> workflowOutputs) {
Map<Path, Path> intermediateInputFiles = [:]

tasks.collect { task ->
List<Path> intermediateOutputFilesList = []
Map<Path, Path> intermediateOutputFilesMap = [:]

tasks.each { task ->
for (taskOutputParam in task.getOutputsByType(FileOutParam)) {

// If the param is a Path, just add it to the intermediate list
if (taskOutputParam.getValue() instanceof Path) {
intermediateOutputFilesList.add(taskOutputParam.getValue() as Path)
continue
}

for (taskOutputFile in taskOutputParam.getValue()) {
// Path to file in workdir
Path taskOutputFilePath = Path.of(taskOutputFile.toString())
intermediateOutputFilesList.add(taskOutputFile as Path)
}
}
}

if (! workflowOutputs.containsKey(taskOutputFilePath)) {
// Iterate over the file list and create the mapping
for (outputFile in intermediateOutputFilesList) {
if (!workflowOutputs.containsKey(outputFile)) {

// Find the relative path from workdir
Path relativePath = workdir.relativize(taskOutputFilePath)
// Find the relative path from workdir
Path relativePath = workdir.relativize(outputFile)

// Build the new path by combining crateRootDir and the relative part
Path outputFileInCrate = crateRootDir.resolve(workdir.fileName).resolve(relativePath)
// Build the new path by combining crateRootDir and the relative part
Path outputFileInCrate = crateRootDir.resolve(workdir.fileName).resolve(relativePath)

intermediateInputFiles.put(taskOutputFilePath, outputFileInCrate)
}
}
Files.createDirectories(outputFileInCrate.parent)
intermediateOutputFilesMap.put(outputFile, outputFileInCrate)
}
}

return intermediateInputFiles
return intermediateOutputFilesMap
}

/**
@@ -606,15 +632,18 @@ class WrrocRenderer implements Renderer {
parentDir = assetDir
else if (inputPath.startsWith(pipelineInfoDir))
parentDir = pipelineInfoDir
else {
System.out.println("Unknown parentDir: " + inputPath.toString())
}


// Ignore file with unknown (e.g. null) parentDir
if(parentDir) {
Path relativePath = parentDir.relativize(inputPath)
Path outputFileInCrate = crateRootDir.resolve(parentDir.fileName).resolve(relativePath)
workflowInputMapping.put(inputPath, outputFileInCrate)
} else {
// All other files are simply copied into the crate, keeping their absolute path below the crate root
Path relativePath = Path.of(inputPath.toString().substring(1))
Path outputFileInCrate = crateRootDir.resolve(relativePath)
workflowInputMapping.put(inputPath, outputFileInCrate)
}
}
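
The two path mappings this commit touches can be sketched in isolation. The block below is a minimal illustration in plain Java, using the same `java.nio.file.Path` calls (`relativize`, `resolve`) as the Groovy code above; it is not the plugin's implementation. The class name `CratePaths` and both method names are hypothetical, and the example paths are made up. As an assumption that differs from the diff, the fallback re-roots an absolute path with `getRoot().relativize(...)` instead of `toString().substring(1)`, which avoids depending on the path starting with a single `/`.

```java
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of nf-prov.
final class CratePaths {
    private CratePaths() {}

    /**
     * Maps a file below workdir to
     * <crateRootDir>/<name of workdir>/<path relative to workdir>,
     * mirroring the mapping in getIntermediateOutputFiles.
     */
    static Path mapIntoCrate(Path crateRootDir, Path workdir, Path file) {
        if (!file.startsWith(workdir)) {
            // relativize would otherwise produce "../" segments that escape the crate
            throw new IllegalArgumentException(file + " is not below " + workdir);
        }
        Path relativePath = workdir.relativize(file);
        return crateRootDir.resolve(workdir.getFileName()).resolve(relativePath);
    }

    /**
     * Fallback for inputs with no known parent directory: keeps the full
     * absolute path, re-rooted below the crate root.
     */
    static Path mapAbsoluteIntoCrate(Path crateRootDir, Path inputPath) {
        Path absolute = inputPath.toAbsolutePath().normalize();
        // An absolute path always has a root component, so getRoot() is non-null here
        Path withoutRoot = absolute.getRoot().relativize(absolute);
        return crateRootDir.resolve(withoutRoot);
    }

    public static void main(String[] args) {
        Path crate = Path.of("/data/crate");
        System.out.println(mapIntoCrate(crate, Path.of("/data/work"),
                Path.of("/data/work/ab/123/out.txt")));
        System.out.println(mapAbsoluteIntoCrate(crate, Path.of("/home/user/samples.csv")));
    }
}
```

The `startsWith` guard is a design choice of this sketch: the diff calls `workdir.relativize(...)` without one, so a file outside the work directory would map to a location outside the crate.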