Harmonize with RO-crate profile(s) #140

Open. Wants to merge 7 commits into base: dev. Changes from 2 commits.
ARC specification.md: 32 changes (27 additions, 5 deletions)
@@ -63,9 +63,11 @@ The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RE

## Extensions

The ARC specification can be extended in a backwards compatible way and will evolve over time. This is accomplished through a community-driven ARC discussion forum and pull request mechanisms.
The ARC specification can be extended and will evolve over time. This is accomplished through a community-driven ARC discussion forum and pull request mechanisms.

Comment (Member): 👍


All changes that are not backwards compatible with the current ARC specification will be implemented in ARC specification v2.0.
Backwards compatible changes, e.g. the addition of new optional metadata fields, will be implemented in _minor version increases_.
All changes that are not backwards compatible with the current ARC specification will be implemented in _major version increases_.

Comment (Member): 👍 👍 👍

With this, we aim to follow [semantic versioning](https://semver.org/).
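The versioning rule can be illustrated with a minimal sketch, assuming versions are written as `MAJOR.MINOR.PATCH` strings as in semantic versioning. The helper names `next_spec_version` and `can_read` are hypothetical, and `can_read` encodes one possible reading of "backwards compatible": a tool built for a given minor version can read ARCs of the same major version and an equal or lower minor version.

```python
def next_spec_version(current: str, backwards_compatible: bool) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a change (hypothetical helper)."""
    major, minor, _patch = (int(part) for part in current.split("."))
    if backwards_compatible:
        # e.g. a new optional metadata field: minor increase, patch resets to 0
        return f"{major}.{minor + 1}.0"
    # not backwards compatible: major increase, minor and patch reset to 0
    return f"{major + 1}.0.0"


def can_read(reader_version: str, arc_version: str) -> bool:
    """Assumed semver reading: same major, and the ARC's minor is not newer than the reader's."""
    r_major, r_minor, _ = (int(p) for p in reader_version.split("."))
    a_major, a_minor, _ = (int(p) for p in arc_version.split("."))
    return r_major == a_major and a_minor <= r_minor


print(next_spec_version("1.1.0", backwards_compatible=True))   # 1.2.0
print(next_spec_version("1.2.3", backwards_compatible=False))  # 2.0.0
print(can_read("1.2.0", "1.1.0"))  # True
print(can_read("1.2.0", "2.0.0"))  # False
```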

# ARC Structure and Content

@@ -220,7 +222,7 @@ Notes:

## Run Description

**Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.
**Runs** in an ARC represent all artifacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.

Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained within the run are derived from assay or external data, or from other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.
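The layout rule for runs could be checked roughly as follows. This is a minimal sketch, assuming that every direct subdirectory of `runs` is one run; the function name `check_runs` is hypothetical, and the sketch only checks that `run.cwl` exists, not that it is valid CWL v1.2 or higher.

```python
import tempfile
from pathlib import Path


def check_runs(arc_root: Path) -> list[str]:
    """List problems in the top-level `runs` directory (hypothetical checker)."""
    problems = []
    runs_dir = arc_root / "runs"
    if not runs_dir.is_dir():
        return problems  # an ARC without runs has nothing to check here
    for run_dir in sorted(p for p in runs_dir.iterdir() if p.is_dir()):
        # each run subdirectory MUST contain its CWL workflow description
        if not (run_dir / "run.cwl").is_file():
            problems.append(f"{run_dir.name}: missing run.cwl")
        # run.yml is optional (MAY), so its absence is not reported
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "runs" / "plot").mkdir(parents=True)
    (root / "runs" / "plot" / "run.cwl").write_text("cwlVersion: v1.2\n")
    (root / "runs" / "table").mkdir()
    print(check_runs(root))  # ['table: missing run.cwl']
```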

@@ -291,7 +293,7 @@ Files and directories referenced in ARC metadata files MUST follow one of the tw
- Data nodes in `isa.assay.xlsx` files: The path MAY be specified relative to the `dataset` sub-folder of the assay.
- Data nodes in `isa.study.xlsx` files: The path MAY be specified relative to the `resources` sub-folder of the study.

- `Public constraint`: The located resource MUST be publicly accessible and retreivable using a standardised communications protocol (see [FAIR A1.1](https://www.go-fair.org/fair-principles/a1-1-protocol-open-free-universally-implementable/)).
- `Public constraint`: The located resource MUST be publicly accessible and retrievable using a standardized communications protocol (see [FAIR A1.1](https://www.go-fair.org/fair-principles/a1-1-protocol-open-free-universally-implementable/)).

- The `URL pattern`, which is universally applicable and MUST be used to specify path locations anywhere on the web, given the following formatting constraints:
- The string MUST follow the [URI Syntax Specification (rfc3986)](https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3).
@@ -340,6 +342,26 @@ Use of `folder specific pattern` relative paths from `Assay1` and `Assay2` `Data

Note that to reference `Data` that is part of `Assay1` from `Assay2`, the `general pattern` is necessary either way. It is therefore considered the more broadly applicable and recommended pattern.
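How a tool might tell the path patterns apart for assay data nodes is sketched below. Several assumptions apply: paths are POSIX-style, a URL is recognized by the RFC 3986 scheme syntax (`scheme ":"`), and a path beginning with `assays/` or `studies/` is taken to be a `general pattern` path relative to the ARC root. The function `resolve_assay_data_path` is hypothetical and does not cover study data nodes (relative to `resources`).

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def resolve_assay_data_path(path: str, assay: Optional[str] = None) -> str:
    """Map an assay data node path to a URL or an ARC-root-relative path (sketch)."""
    if URL_SCHEME.match(path):
        return path  # URL pattern: left untouched
    if path.startswith(("assays/", "studies/")):
        return path  # assumed general pattern: already relative to the ARC root
    if assay is None:
        raise ValueError("the folder specific pattern needs the assay name")
    # folder specific pattern: relative to the assay's `dataset` sub-folder
    return str(PurePosixPath("assays", assay, "dataset", path))


print(resolve_assay_data_path("https://example.org/sample1.mzML"))
print(resolve_assay_data_path("sample1.mzML", assay="Assay1"))  # assays/Assay1/dataset/sample1.mzML
```

Because a folder specific path from `Assay2` can only reach its own `dataset` folder, referencing `Assay1` data from `Assay2` has to go through the root-relative form, which is why the sketch accepts it unchanged.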

## Licensing

Comment (Member): Add to TOC


_Disclaimer: This is not legal advice; when in doubt, consult a legal professional in your jurisdiction._

Comment (Member): Section reads great!


By default, digital artifacts such as an ARC fall under the exclusive copyright of their creator(s).
In other words, the creator(s) have the exclusive right to decide how the ARC is used, distributed, or modified.
To allow others to use, distribute, or modify the ARC, the creator(s) can choose a more permissive license. Examples of permissive licenses include the [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/) and the [MIT License](https://opensource.org/licenses/MIT).

Obviously, more permissive licenses are more in line with the F**A**I**R** principles, as they allow others to **A**ccess and **R**euse ARCs.

Comment (Member): This sentence is a bit ambiguous. Does it mean less permissive, i.e. CC-BY = FAIR, CC-BY-NC-SA = less permissive = less FAIR?

Reply (Member Author): I think you can make a point that any copyright makes something less accessible in the broadest sense, as there are more than zero restrictions on accessibility.

However, from the FAIR definition site itself:

"The ‘A’ in FAIR does not necessarily mean ‘open’ or ‘free’. Rather, it implies that one should provide the exact conditions under which the data are accessible"

So adding any license makes the ARC more FAIR, as it clarifies (some) conditions of access. I would, however, argue that general public sentiment leans towards ‘open’ or ‘free’ being more FAIR than a dataset that applies the FAIR principles but is not openly accessible. You can, for example, also have licenses that allow full access but prevent the R in FAIR by prohibiting any reuse.

So, long story short, I am ambiguous on purpose here, but open to suggestions.

Reply (Member Author): On further investigation, licenses are more on the side of the R in FAIR. Access cannot be handled by them directly, since once you can read the license you already have access to the ARC. Will clarify.

However, **the choice of license is up to the creator(s) of the ARC**.
It may not be possible to openly share ARCs due to legal or ethical constraints, for example when sensitive clinical data is involved.

Comment (Member): legal (e.g. patenting or IPR)

Furthermore, the license of an ARC can change during its lifetime: it may be shared under a more restrictive license during the research phase and under a more permissive license after publication.

**Choosing a license for ARCs is optional.**
**When no license information is provided, all rights are reserved to the creators/owners.**
However, where possible, we strongly encourage the use of permissive licenses to increase the FAIRness of your ARC.

If a license is to apply to an ARC, the license information MUST be included in a plaintext file named `LICENSE` in the root of the ARC.
The text file extensions `.txt` and `.md` are also allowed (`LICENSE.txt`, `LICENSE.md`).
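A tool could look up an ARC's license roughly as sketched here, assuming the three file names `LICENSE`, `LICENSE.txt`, and `LICENSE.md` are the only accepted forms. The function `find_license` is hypothetical; it checks only that the file exists, not which license it contains.

```python
import tempfile
from pathlib import Path
from typing import Optional

# `LICENSE` in the ARC root, optionally with a .txt or .md extension
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md")


def find_license(arc_root: Path) -> Optional[Path]:
    """Return the ARC's license file, or None if no license is provided."""
    for name in LICENSE_NAMES:
        candidate = arc_root / name
        if candidate.is_file():
            return candidate
    return None  # no license file: all rights are reserved to the creators/owners


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(find_license(root))  # None
    (root / "LICENSE.md").write_text("CC BY 4.0\n")
    print(find_license(root).name)  # LICENSE.md
```

Returning `None` rather than raising an error mirrors the rule that choosing a license is optional.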

# Shareable and Publishable ARCs

ARCs can generally be shared in any state. In order for ARCs to be stored in a public archive for the purpose of minting a DOI, they MUST be *publishable*. They are considered *publishable* when fulfilling the following conditions:
@@ -357,7 +379,7 @@ ARCs can generally be shared in any state. In order for ARCs to be stored in a p

- A *publishable* ARC MUST NOT be *empty*: it MUST contain at least one assay or one workflow.

- All [Data Path Annotations](#data-path-annotation) made in the ARC MUST resolve to an existing, accessible data ressource.
- All [Data Path Annotations](#data-path-annotation) made in the ARC MUST resolve to an existing, accessible data resource.

- A *publishable* ARC MUST be [reproducible](#reproducible-arcs).
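Two of the publishability conditions lend themselves to an automated check, sketched below under stated assumptions: assays and workflows live as subdirectories of top-level `assays` and `workflows` directories, and the annotated data paths have already been collected and made relative to the ARC root. The helper names are hypothetical; URLs and the reproducibility requirement are not checked here.

```python
import tempfile
from pathlib import Path


def has_subdirectory(directory: Path) -> bool:
    """True if `directory` exists and contains at least one sub-directory."""
    return directory.is_dir() and any(p.is_dir() for p in directory.iterdir())


def is_empty_arc(arc_root: Path) -> bool:
    """Treat an ARC as empty when it has neither an assay nor a workflow."""
    return not (has_subdirectory(arc_root / "assays") or has_subdirectory(arc_root / "workflows"))


def unresolved_paths(arc_root: Path, data_paths: list[str]) -> list[str]:
    """Return ARC-root-relative data paths that do not resolve to an existing resource."""
    return [p for p in data_paths if not (arc_root / p).exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(is_empty_arc(root))  # True
    (root / "assays" / "Assay1" / "dataset").mkdir(parents=True)
    (root / "assays" / "Assay1" / "dataset" / "a.csv").write_text("x\n")
    print(is_empty_arc(root))  # False
    print(unresolved_paths(root, ["assays/Assay1/dataset/a.csv", "assays/Assay1/dataset/b.csv"]))
    # ['assays/Assay1/dataset/b.csv']
```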
