From 32ace57876d5f8aab2451413eb1d733bb4e86a68 Mon Sep 17 00:00:00 2001
From: Kevin Schneider
Date: Thu, 27 Feb 2025 08:05:25 +0100
Subject: [PATCH 1/7] fix some typos

---
 ARC specification.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 07a0517..1e217f3 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -220,7 +220,7 @@ Notes:
 
 ## Run Description
 
-**Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.
+**Runs** in an ARC represent all artifacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.
 
 Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.
 
@@ -291,7 +291,7 @@ Files and directories referenced in ARC metadata files MUST follow one of the tw
   - Data nodes in `isa.assay.xlsx` files: The path MAY be specified relative to the `dataset` sub-folder of the assay.
   - Data nodes in `isa.study.xlsx` files: The path MAY be specified relative to the `resources` sub-folder of the study.
 
-- `Public constraint`: The located resource MUST be publicly accessible and retreivable using a standardised communications protocol (see [FAIR A1.1](https://www.go-fair.org/fair-principles/a1-1-protocol-open-free-universally-implementable/)).
+- `Public constraint`: The located resource MUST be publicly accessible and retrievable using a standardized communications protocol (see [FAIR A1.1](https://www.go-fair.org/fair-principles/a1-1-protocol-open-free-universally-implementable/)).
 
 - The `URL pattern`, which is universally applicable and MUST be used to specify path locations anywhere on the web, given the following formatting constraints:
   - The string MUST follow the [URI Syntax Specification (rfc3986)](https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3).
@@ -357,7 +357,7 @@ ARCs can generally be shared in any state. In order for ARCs to be stored in a p
 
 - A *publishable* ARC MUST NOT be *empty*: it MUST contain minimally a single assay or a single workflow.
 
-- All [Data Path Annotations](#data-path-annotation) made in the ARC MUST resolve to an existing, accessible data ressource.
+- All [Data Path Annotations](#data-path-annotation) made in the ARC MUST resolve to an existing, accessible data resource.
 
 - A *publishable* ARC MUST be [reproducible](#reproducible-arcs)
 
From c8c35804bc8a8743fc7a3908b228ff5dbca48b68 Mon Sep 17 00:00:00 2001
From: Kevin Schneider
Date: Thu, 27 Feb 2025 08:41:42 +0100
Subject: [PATCH 2/7] first draft for ARC licensing

---
 ARC specification.md | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 1e217f3..909e4a7 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -63,9 +63,11 @@ The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RE
 
 ## Extensions
 
-The ARC specification can be extended in a backwards compatible way and will evolve over time. This is accomplished through a community-driven ARC discussion forum and pull request mechanisms.
+The ARC specification can be extended and will evolve over time. This is accomplished through a community-driven ARC discussion forum and pull request mechanisms.
 
-All changes that are not backwards compatible with the current ARC specification will be implemented in ARC specification v2.0.
+Backwards compatible changes, e.g. the addition of new optional metadata fields, will be implemented in _minor version increases_.
+All changes that are not backwards compatible with the current ARC specification will be implemented in _major version increases_.
+By this, we try to adopt [semantic versioning](https://semver.org/).
 
 # ARC Structure and Content
 
@@ -340,6 +342,26 @@ Use of `folder specific pattern` relative paths from `Assay1` and `Assay2` `Data
 
 Note, that to reference `Data` which is part of `Assays1` in `Assay2`, the `general pattern` is necessary either way. Therefore it is considered the more broadly applicable and recommended pattern.
 
+## Licensing
+
+_Disclaimer: This is not legal advice, when in doubt consult a legal professional in your jurisdiction_
+
+Per default, digital artifacts such as the ARC fall under exclusive copyright of the creator(s).
+In other words, the creator has the exclusive right to decide how the ARC is used, distributed, or modified.
+To allow others to use, distribute, or modify the ARC, the creator(s) can choose a more permissive license. Examples of permissive licenses include the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/) and the [MIT License](https://opensource.org/licenses/MIT).
+
+Obviously, more permissive licenses are more in line with the F**A**I**R** principles, as they allow others to **A**ccess and **R**euse ARCs.
+However, **the choice of license is up to the creator(s) of the ARC**.
+It may not be possible to openly share ARCs due to legal or ethical constraints, for example when sensitive clinical data is involved.
+Furthermore, the license of an ARC can change during its lifetime: it may be shared under a more restrictive license during the research phase and under a more permissive license after publication.
+
+**Choosing a license for ARCs is optional.**
+**When no license information is provided, all rights are reserved to the creators/owners.**
+However, we strongly encourage the usage of permissive licenses to increase the FAIRness of your ARC where possible.
+
+If a license of choice shall apply to an ARC, the license information MUST be included in the root of the ARC in a plaintext file named `LICENSE`.
+Text file extensions `.txt` or `.md` are allowed.
+
 # Shareable and Publishable ARCs
 
 ARCs can generally be shared in any state. In order for ARCs to be stored in a public archive for the purpose of minting a DOI, they MUST be *publishable*. They are considered *publishable* when fulfilling the following conditions:

From 3ebf3023e8c4939c4735c206b6a41321dc42598a Mon Sep 17 00:00:00 2001
From: Kevin Schneider
Date: Thu, 27 Feb 2025 14:00:47 +0100
Subject: [PATCH 3/7] improve licensing specs

---
 ARC specification.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 909e4a7..889185d 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -346,18 +346,20 @@ Note, that to reference `Data` which is part of `Assays1` in `Assay2`, the `gene
 
 _Disclaimer: This is not legal advice, when in doubt consult a legal professional in your jurisdiction_
 
+**ARCs SHOULD be shared under a license of the authors’/owners’/creators’ choice that clearly states the data usage terms.**
+
 Per default, digital artifacts such as the ARC fall under exclusive copyright of the creator(s).
 In other words, the creator has the exclusive right to decide how the ARC is used, distributed, or modified.
-To allow others to use, distribute, or modify the ARC, the creator(s) can choose a more permissive license. Examples of permissive licenses include the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/) and the [MIT License](https://opensource.org/licenses/MIT).
+To allow others to use, distribute, or modify the ARC, the creator(s) can choose a more permissive license. Examples of very permissive licenses include the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/) and the [MIT License](https://opensource.org/licenses/MIT).
 
-Obviously, more permissive licenses are more in line with the F**A**I**R** principles, as they allow others to **A**ccess and **R**euse ARCs.
+_Adding any license information to an ARC already makes it more FAI**R**_, as it clarifies the conditions under which it may be (re)used by others.
 However, **the choice of license is up to the creator(s) of the ARC**.
-It may not be possible to openly share ARCs due to legal or ethical constraints, for example when sensitive clinical data is involved.
-Furthermore, the license of an ARC can change during its lifetime: it may be shared under a more restrictive license during the research phase and under a more permissive license after publication.
+It may not be possible to share ARCs without restrictions due to legal (e.g. patenting or IPR) or ethical constraints, for example when sensitive clinical data is involved.
+Furthermore, the license of an ARC can change during its lifetime: it may be shared under a more restrictive license during the research and/or review phase and under a more permissive license after publication.
 
 **Choosing a license for ARCs is optional.**
 **When no license information is provided, all rights are reserved to the creators/owners.**
-However, we strongly encourage the usage of permissive licenses to increase the FAIRness of your ARC where possible.
+However, it is strongly encouraged to use licenses to increase the FAIRness of your ARC where possible.
 
 If a license of choice shall apply to an ARC, the license information MUST be included in the root of the ARC in a plaintext file named `LICENSE`.
 Text file extensions `.txt` or `.md` are allowed.

From af017c887a5a5023ed0ccbea787ffc8a67b6c515 Mon Sep 17 00:00:00 2001
From: Kevin Schneider
Date: Thu, 27 Feb 2025 16:34:38 +0100
Subject: [PATCH 4/7] #136: improve i/o column specs

---
 ISA-XLSX.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/ISA-XLSX.md b/ISA-XLSX.md
index 2e13472..11b658e 100644
--- a/ISA-XLSX.md
+++ b/ISA-XLSX.md
@@ -639,11 +639,19 @@ In the `Annotation Table sheets`, column headers MUST have the first letter of e
 
 The content of the annotation table MUST be placed in an `xlsx table` whose name starts with `annotationTable`. Each sheet MUST contain at most one such annotation table. Only cells inside this table are considered as part of the formatted metadata.
 
-`Annotation Table sheets` are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a `Process` node.
+`Annotation Table sheets` are structured with fields organized on a per-row basis. The first row MUST be used for column headers. Each body row is an implementation of a `Process`.
 
 ## Inputs and Outputs
 
-Each annotation table sheet MUST contain at most one `Input` and at most one `Output` column, which denote the Input and Output node of the `Process` node respectively. They MUST be formatted in the pattern `Input [<NodeType>]` and `Output [<NodeType>]`.
+Each annotation table sheet MAY contain at most one `Input` column.
+Their header MUST follow the pattern `Input []` (See below for possible values of ``). +If the `Input` column is present, it MUST NOT contain empty cell values. + +Each annotation table sheet MAY contain at most one `Output` column. +Their header MUST follow the pattern `Output []` (See below for possible values of ``). +If the `Output` column is present, it MUST NOT contain empty cell values. + +Cell values of the `Input` and `Output` column represent nodes of the `Process` respectively. `NodeTypes` MUST be one of the following: From 3d1c8ce36ad59b733207e6cb9d6d45f65a73a723 Mon Sep 17 00:00:00 2001 From: Kevin Schneider Date: Thu, 27 Feb 2025 17:02:13 +0100 Subject: [PATCH 5/7] #138 mandatory investigation and study metadata --- ISA-XLSX.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ISA-XLSX.md b/ISA-XLSX.md index 11b658e..f63f188 100644 --- a/ISA-XLSX.md +++ b/ISA-XLSX.md @@ -214,9 +214,9 @@ This section MUST contain the following labels, with the specified datatypes for | Label | Datatype | Description | |-----------------------------------|---------------------------------------------|----------------------------------------------------------------------------------------------| -| Investigation Identifier | String | A identifier or an accession number provided by a repository. This SHOULD be locally unique. | -| Investigation Title | String | A concise name given to the investigation. | -| Investigation Description | String | A textual description of the investigation. | +| Investigation Identifier | String | A mandatory identifier or an accession number provided by a repository. This SHOULD be locally unique. A value MUST be given for this label. | +| Investigation Title | String | A mandatory concise name given to the investigation. A value MUST be given for this label. | +| Investigation Description | String | A mandatory textual description of the investigation. A value MUST be given for this label. 
| | Investigation Submission Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was reported to the repository. | | Investigation Public Release Date | String formatted as ISO8601 date YYYY-MM-DD | The date on which the investigation was released publicly. | @@ -326,8 +326,8 @@ This section MUST contain the following labels, with the specified datatypes for | Label | Datatype | Description | |---------------------------|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Study Identifier | String | A unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification. | -| Study Title | String | A concise phrase used to encapsulate the purpose and goal of the study. | +| Study Identifier | String | A mandatory unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification. A value MUST be given for this label. | +| Study Title | String | A mandatory concise phrase used to encapsulate the purpose and goal of the study. A value MUST be given for this label. | | Study Description | String | A textual description of the study, with components such as objective or goals. | | Study Submission Date | String formatted as ISO8601 date | The date on which the study is submitted to an archive. | | Study Public Release Date | String formatted as ISO8601 date | The date on which the study SHOULD be released publicly. 
| From 2c6cb01484aecea720f1bb30fb22e55d4d122be8 Mon Sep 17 00:00:00 2001 From: Kevin Schneider Date: Fri, 28 Feb 2025 11:14:24 +0100 Subject: [PATCH 6/7] #86: Add identifier, title, and description to ASSAY metadata section --- ISA-XLSX.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/ISA-XLSX.md b/ISA-XLSX.md index f63f188..c9ad3e7 100644 --- a/ISA-XLSX.md +++ b/ISA-XLSX.md @@ -565,6 +565,9 @@ This section MUST contain the following labels, with the specified datatypes for | Label | Datatype | Description | |----------------------------------------------------|------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Assay Identifier | String | A mandatory unique identifier, either a temporary identifier supplied by users or one generated by a repository or other database. For example, it could be an identifier complying with the LSID specification. A value MUST be given for this label. | +| Assay Title | String | A concise phrase used to encapsulate the purpose and goal of the assay. | +| Assay Description | String | A textual description of the assay, with components such as objective or goals. | | Assay Measurement Type | String | A term to qualify the endpoint, or what is being measured (e.g. gene expression profiling or protein identification). The term can be free text or from, for example, a controlled vocabulary or an ontology. If the latter source is used the Term Accession Number and Term Source REF fields below are required. | | Assay Measurement Type Term Accession Number | String | The accession number from the Term Source associated with the selected term. 
| | Assay Measurement Type Term Source REF | String | The Source REF has to match one of the Term Source Name declared in the Ontology Source Reference section. | From 56f4a5eec1c48cd9439adf82168f0b4b8b7ee82b Mon Sep 17 00:00:00 2001 From: Heinrich Lukas Weil Date: Fri, 28 Feb 2025 13:05:25 +0100 Subject: [PATCH 7/7] update TOC --- ARC specification.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ARC specification.md b/ARC specification.md index 889185d..42ab847 100644 --- a/ARC specification.md +++ b/ARC specification.md @@ -29,6 +29,8 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use - [Data Path Annotation](#data-path-annotation) - [Examples](#examples) - [General Pattern](#general-pattern) + - [Folder Specific pattern](#folder-specific-pattern) + - [Licensing](#licensing) - [Shareable and Publishable ARCs](#shareable-and-publishable-arcs) - [Reproducible ARCs](#reproducible-arcs) - [Mechanisms for ARC Quality Control](#mechanisms-for-arc-quality-control)