Add data use ontology module and field to reference it to project schema #1342
Comments
Open access projects can also be described by DUO.
By standard, do you mean required?
+1, I am confused by what you mean by 'a standard project field'.
So ultimately, I think this should be a required field. As that would be a non-backwards-compatible change, we may wish to stage it: add it as an optional field in the project in the first instance and, once everything has been updated to contain it, switch it to required. I want to make sure this isn't only a consideration for managed access projects and is also filled out for projects without restriction.
From what I have learnt from Giselle, I think we will need both limitation and requirements for EGA projects. I don't know if you can see which DUO codes are associated with EGA projects now; that would give us a good indication. I am hoping that the default will be "no restriction" for unrestricted data, and "no restriction" plus "user-specific restriction" for managed access, but I don't know whether any will end up with general research use. That is something we might want to chat with Melanie and Giselle about. I know we will need to be careful about taking anything with more restrictive DUO codes, as any integrated data collection has to follow the most restrictive code for how it is used.
Found an example in EGA: https://ega-archive.org/datasets/EGAD00001000100
Cool, that is useful, thanks. Based on this, we should allow multiple ontology terms in this field at the project level. I have also requested that DUO be added to our ontology.
https://github.com/enasequence/schema/blob/master/src/main/resources/uk/ac/ebi/ena/sra/schema/EGA.policy.xsd is how EGA structure this info.
Also, human readable docs here: https://ega-archive.org/data-use-conditions
I have created a branch for this (https://github.com/HumanCellAtlas/metadata-schema/tree/ms-add-duo). Once DUO is released in HCAO we can make a PR and merge.
I am actually a bit unsure what the best way is to implement the 'modifier' terms the way EGA does. For example, if a contributor puts 'disease specific research DUO:0000007', we need to link it to a MONDO term to indicate which disease. I am wondering whether, to allow this, we need a data_use_restriction module so that the ontologies from two fields can be paired. If we put two fields into the project schema, it won't necessarily be clear which are paired with each other. There are other examples where we might want to pair a term or allow more info to be given such as
Do any @HumanCellAtlas/wranglers have an opinion on this?
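To make the pairing problem concrete, here is a minimal Python sketch. It is only an illustration: the `DataUseRestriction` class, its `modifier` attribute, the `data_use_modifier` key and the `MONDO:0000000` identifier are hypothetical placeholders, not part of the HCA schema. Only the DUO codes themselves come from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataUseRestriction:
    """Hypothetical module: one DUO term plus an optional paired modifier."""
    duo_term: str                   # e.g. "DUO:0000007" (disease specific research)
    modifier: Optional[str] = None  # e.g. a MONDO term naming the disease


def describe(restrictions: List[DataUseRestriction]) -> List[str]:
    """Render each restriction together with the modifier it is paired with, if any."""
    return [
        f"{r.duo_term} -> {r.modifier}" if r.modifier else r.duo_term
        for r in restrictions
    ]


# Two parallel fields: nothing records which DUO term the modifier belongs to.
flat_project = {
    "data_use": ["DUO:0000017", "DUO:0000007"],  # DUO:0000017 stands in for any second code
    "data_use_modifier": ["MONDO:0000000"],      # placeholder id, not a real disease term
}

# A module keeps each modifier next to the term it qualifies.
module_project = [
    DataUseRestriction("DUO:0000017"),
    DataUseRestriction("DUO:0000007", modifier="MONDO:0000000"),
]

print(describe(module_project))
```

In the flat layout the two lists have different lengths, so the pairing can only be guessed; in the module layout it is explicit per entry.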
Reading the EGA docs, given that you could give more than one code per project, I agree that pairings won't be clear. I like the module idea better, although that still makes human error possible (e.g. Sorry, just divagating here. tl;dr: I prefer the idea of a module to capture both fields and pair them if necessary.
Some metadata team members had a further discussion about this. Given the goals and aims of the HCA, we are unlikely to consider data that has a disease specific restriction suitable to be a part of the atlas. The other use case for a more complex model is the 'geographical restriction', but this term has been flagged for deprecation (EBISPOT/DUO#96). At this stage we will stick with the simplest solution, an ontology module, and a single field in the
If we feel the need to further explain or modify the given ontology terms with restrictions around disease/geography, we will most likely need to create a dedicated 'data use' module.
Enable ability to assign ontologised Data Use Restriction at the project level. Fixes #1342
As we move toward supporting managed access data we would like to create a module and field to capture a data use ontology term at the project level.
What should the change/update be?
Create data_use_ontology module
A standard ontology module with text, ontology and ontology_label fields, with the graph restriction being a child of either data use limitation (DUO:0000001) or data use requirements (DUO:0000017) (tagging @lauraclarke in case she has any input here)
Add data_use field to project.json
If we decide this needs to be a required field moving forward, it would constitute a major schema change.
We will also need to request that the Data Use Ontology be added to the HCAO.
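As a rough sketch of what entries in the proposed `data_use` field could look like, the Python below checks a list of module instances. The field names `text`, `ontology` and `ontology_label` come from this issue; treating `text` as the only required field, the CURIE pattern, and the `validate_data_use` helper are assumptions made for illustration. The graph restriction (descendant of DUO:0000001 or DUO:0000017) needs an ontology lookup and is deliberately not checked here.

```python
import re
from typing import Dict, List

# Assumed identifier shape: "DUO:" followed by seven digits, as in DUO:0000007.
DUO_CURIE = re.compile(r"^DUO:\d{7}$")


def validate_data_use(entries: List[Dict[str, str]]) -> List[str]:
    """Return error messages for a project's data_use entries (empty list if valid)."""
    errors = []
    for i, entry in enumerate(entries):
        text = entry.get("text")
        # Assumption: 'text' is the one required field of the module.
        if not isinstance(text, str) or not text:
            errors.append(f"data_use[{i}]: 'text' is required")
        ontology = entry.get("ontology")
        # Only the identifier format is checked, not membership in DUO.
        if ontology is not None and not DUO_CURIE.match(ontology):
            errors.append(f"data_use[{i}]: '{ontology}' is not a DUO identifier")
    return errors


# The field holds a list so that a project can carry several DUO terms.
project_data_use = [
    {
        "text": "disease specific research",
        "ontology": "DUO:0000007",
        "ontology_label": "disease specific research",
    },
]

print(validate_data_use(project_data_use))
```

Keeping the field as a list matches the earlier conclusion in the thread that multiple ontology terms should be allowed at the project level.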