Add data use ontology module and field to reference it to project schema #1342

Closed
mshadbolt opened this issue Feb 23, 2021 · 14 comments
@mshadbolt
Contributor

mshadbolt commented Feb 23, 2021

As we move toward supporting managed access data we would like to create a module and field to capture a data use ontology term at the project level.

What should the change/update be?

Create data_use_ontology module

A standard ontology module with text, ontology and ontology_label fields, with the graph restriction set to children of either data use limitation (DUO:0000001) or data use requirements (DUO:0000017). Tagging @lauraclarke in case she has any input here.
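
A minimal sketch of what such a module could look like, written as a Python dict that mirrors JSON Schema. The key names inside `graph_restriction` (`ontologies`, `classes`, `relations`, `direct`, `include_self`), the `obo:duo` prefix, and making `text` required are all assumptions for illustration, not a confirmed definition.

```python
import json

# Hypothetical data_use_ontology module expressed as a Python dict.
# The graph_restriction layout and the "obo:duo" prefix are assumptions.
DATA_USE_ONTOLOGY = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "name": "data_use_ontology",
    "type": "object",
    "required": ["text"],  # assumption: free text is the only mandatory part
    "properties": {
        "text": {"type": "string"},
        "ontology": {
            "type": "string",
            "graph_restriction": {
                "ontologies": ["obo:duo"],
                # The two parent terms proposed in this issue.
                "classes": ["DUO:0000001", "DUO:0000017"],
                "relations": ["rdfs:subClassOf"],
                "direct": False,
                "include_self": False,
            },
        },
        "ontology_label": {"type": "string"},
    },
}


def to_json(schema: dict) -> str:
    """Serialise the sketch so it can be compared with a real schema file."""
    return json.dumps(schema, indent=2)
```

With `include_self` set to false, only descendants of the two parent terms would be accepted, which matches the "child of either" wording above.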

Add data_use field to project.json

  • Field name: data_use_restriction (?)
  • Field description: A brief description of how the data may legally be used
  • Field type: array of ontology objects (or should there be just one?)
  • Required: no ?
  • CV or enum: reference to data_use_ontology module (see above)
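
The field proposal above can be sketched roughly as follows, again as a Python dict mirroring JSON Schema. The `$ref` path and the helper names are hypothetical, and the `required` flag is left as a parameter because that question is still open.

```python
def data_use_field(required: bool = False) -> dict:
    """Return a hypothetical project.json fragment for the data use field."""
    fragment = {
        "properties": {
            "data_use_restriction": {
                "description": "A brief description of how the data may legally be used",
                "type": "array",
                # Hypothetical path to the module described above.
                "items": {"$ref": "module/ontology/data_use_ontology.json"},
            }
        },
        "required": [],
    }
    if required:
        fragment["required"].append("data_use_restriction")
    return fragment


def missing_required(project: dict, fragment: dict) -> list:
    """List required field names that are absent from a project document."""
    return [name for name in fragment["required"] if name not in project]
```

An existing project without the field passes while `required` is false and fails once it is true, which is why flipping the flag breaks backwards compatibility.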

If we decide this needs to be a required field moving forward, it would constitute a major schema change

We will also need to request that the Data Use Ontology be added to the HCAO.

@lauraclarke
Member

Open access projects can also be described by DUO
e.g. https://www.ebi.ac.uk/ols/ontologies/duo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FDUO_0000004&viewMode=PreferredRoots&siblings=false
Is there any reason for this not to be a standard project field?

@ESapenaVentura
Collaborator

By standard, do you mean required?

@mshadbolt
Contributor Author

+1, I am confused by what you mean by 'a standard project field'.

@lauraclarke
Member

So ultimately, I think this should be a required field. As that would be a non-backwards-compatible change, we may wish to stage it: add it as an optional field in the project in the first instance and, once everything has been updated to contain it, perhaps switch it to required.

I guess I want to make sure this isn't only a consideration for managed access projects and is also filled out for projects without restriction.

@mshadbolt
Contributor Author

OK yep, gotcha. Do you have a feel for whether we should be using just one code per project? Also, which branch of the hierarchy, i.e. the 'limitation' branch or the 'restriction' branch, or could it be either? Or do we not really know yet?

[Screenshot, 2021-02-23 at 18:53:00]

@lauraclarke
Member

From what I have learnt from Giselle, I think we will need both limitation and requirements for EGA projects. I don't know if you can see what DUO codes are associated with EGA projects now; that would give us a good indication.

I am hoping that the default will be "no restriction" for unrestricted data, and "no restriction" plus "user-specific restriction" for managed access, but I don't know whether any will end up with general research use. That is something we might want to chat with Melanie and Giselle about. I know we will need to be careful about taking anything with more restrictive DUO codes, as any integrated data collection has to follow the most restrictive code for how it is used.
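
To illustrate the "most restrictive code wins" point for an integrated collection, here is a rough sketch. The numeric ranking is purely hypothetical; DUO itself does not define a linear order, so a real policy would need to be agreed separately.

```python
# Hypothetical ranking: a higher number means more restrictive.
# This order is an illustration only and is not defined by DUO.
RESTRICTIVENESS = {
    "DUO:0000004": 0,  # no restriction
    "DUO:0000042": 1,  # general research use
    "DUO:0000006": 2,  # health or medical or biomedical research
    "DUO:0000007": 3,  # disease specific research
}


def most_restrictive(codes):
    """Return the code that an integrated collection would have to follow."""
    unknown = [code for code in codes if code not in RESTRICTIVENESS]
    if unknown:
        raise ValueError(f"no rank defined for: {unknown}")
    return max(codes, key=RESTRICTIVENESS.__getitem__)
```

Under this assumed ranking, combining one unrestricted project with one general research use project would leave the whole collection under general research use.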

@lauraclarke
Member

Found an example in EGA: https://ega-archive.org/datasets/EGAD00001000100

@mshadbolt
Contributor Author

Cool, that is useful, thanks. Based on this, we should allow multiple ontology terms in this field at the project level.

I have also requested that DUO be added to our ontology.

@lauraclarke
Member

@mshadbolt
Contributor Author

Also human readable docs here: https://ega-archive.org/data-use-conditions

@mshadbolt
Contributor Author

I have created a branch for this here: https://github.com/HumanCellAtlas/metadata-schema/tree/ms-add-duo

Once DUO is released in HCAO, we can make a PR and merge.

@mshadbolt mshadbolt self-assigned this Mar 17, 2021
@mshadbolt
Contributor Author

I am actually a bit unsure what the best way is to implement 'modifier' terms the way EGA does.

For example, if a contributor puts 'disease specific research DUO:0000007', we need to link it to a MONDO term to indicate which disease. I am wondering whether, to allow this, we need a data_use_restriction module so that the ontologies from two fields can be paired.

If we put two fields into the project schema, it won't necessarily be clear which are paired with each other.
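
A rough sketch of the pairing idea, using Python dataclasses as a stand-in for the schema. The class names, the `modifier` field, and the MONDO ID are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OntologyTerm:
    text: str
    ontology: str


@dataclass
class DataUseRestriction:
    """Hypothetical module pairing one DUO term with an optional modifier."""
    data_use: OntologyTerm
    modifier: Optional[OntologyTerm] = None


# Each entry carries its own modifier, so the pairing stays unambiguous
# even when a project lists several DUO codes.
restrictions = [
    DataUseRestriction(
        data_use=OntologyTerm("disease specific research", "DUO:0000007"),
        # Placeholder ID for illustration, not a looked-up disease term.
        modifier=OntologyTerm("example disease", "MONDO:0000000"),
    ),
    DataUseRestriction(
        data_use=OntologyTerm("general research use", "DUO:0000042"),
    ),
]
```

With two separate arrays in project.json, the same information would depend on list order to tell which disease belongs to which DUO code.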

There are other examples where we might want to pair a term or allow more info to be given such as

  • geographical restriction

Do any @HumanCellAtlas/wranglers have an opinion on this?

@ESapenaVentura
Collaborator

Reading the EGA docs, given that you could give more than one code per project, I agree that pairings won't be clear.

I like the module idea better, although it still leaves room for human error (e.g. disease specific research paired with a geographical term). I guess we should be able to validate against that with the allOf keyword (not sure if the graph_restriction being an extension would cause problems, though...).
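
As a plain-Python stand-in for that kind of schema-level rule, the check might look something like this. The rule table and function name are hypothetical, and only the disease-specific pairing comes from this thread; whether allOf plus graph_restriction can express the same thing is still the open question.

```python
# Hypothetical rule table: the ontology prefix a modifier must use for a
# given DUO term. Only the disease-specific rule comes from this discussion.
REQUIRED_MODIFIER_PREFIX = {
    "DUO:0000007": "MONDO:",  # disease specific research -> MONDO disease term
}


def pairing_errors(entry: dict) -> list:
    """Return a list of problems with one {data_use, modifier} pair."""
    errors = []
    duo = entry["data_use"]
    modifier = entry.get("modifier")
    prefix = REQUIRED_MODIFIER_PREFIX.get(duo)
    if prefix is None:
        # No pairing rule for this DUO term, so nothing to check.
        return errors
    if modifier is None:
        errors.append(f"{duo} needs a modifier term")
    elif not modifier.startswith(prefix):
        expected = prefix.rstrip(":")
        errors.append(f"{duo} expects a {expected} term, got {modifier}")
    return errors
```

A disease specific research entry paired with a non-MONDO term would be reported here, which is the human-error case mentioned above.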

Sorry just divagating here, tl;dr: I prefer the idea of a module to capture both fields and pair them if necessary

@mshadbolt
Contributor Author

Some metadata team members had a further discussion about this. Given the goals and aims of the HCA, we are unlikely to consider data that has a disease specific restriction suitable to be a part of the atlas. The other use case for a more complex model is the 'geographical restriction', but this term has been flagged for deprecation (EBISPOT/DUO#96).

At this stage we will stick with the simplest solution: an ontology module, and a single field in project.json that references that module and allows one or more DUO codes to be specified.

If we feel the need to further explain or modify the given ontology terms with restrictions around disease/geography, we will most likely need to create a dedicated 'data use' module.

willrockout added a commit that referenced this issue Mar 22, 2021
Enable ability to assign ontologised Data Use Restriction at the project level. Fixes #1342