fix(portal): data model changes #4659

Merged
merged 6 commits into from
Feb 10, 2025

@willemijn-oudijk willemijn-oudijk commented Jan 31, 2025

What are the main changes you did

  • Biosamples: added a column 'tissue type' which links to the TissueType ontology.
  • Biosamples: changed refback for protocol used from Protocol activity's identifier to Protocol activity's input samples.
  • Protocol activity: added included in datasets column.
  • Files: changed path columnType from hyperlink_array to string_array.
  • TestLoader: fixed test18PortalLoader, which should now expect one more table because of the added TissueType ontology.
  • Files: changed included of datasets to included in datasets so this naming is consistent with the other tables in the model.
  • Files: changed columnType of generated by protocol from ref to ref_array. Each protocol now appears twice because a prefix (sample_ or seq_) was added to the experiment IDs, so a file can be generated by multiple protocols.
  • Individuals: changed the refback for files to point to individuals instead of name (in table Files).
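
The move from ref to ref_array for generated by protocol can be illustrated with a minimal Python sketch. It assumes a made-up experiment ID and file name; `protocol_ids` and `file_record` are hypothetical helpers for illustration only and are not part of the portal model or any MOLGENIS API.

```python
# Hypothetical illustration: every name and value below is made up.
PREFIXES = ("sample_", "seq_")  # the two prefixes named in the PR description


def protocol_ids(experiment_id: str) -> list[str]:
    """Return the protocol activity IDs derived from one experiment ID."""
    return [f"{prefix}{experiment_id}" for prefix in PREFIXES]


# With columnType `ref`, a file could point to a single protocol activity.
# With `ref_array`, it can list both prefixed entries for the same experiment.
file_record = {
    "name": "example.fastq.gz",
    "generated by protocol": protocol_ids("EXP001"),
}

print(file_record["generated by protocol"])
```

Because one experiment now yields two protocol activity rows, a single-valued reference would have to drop one of them, which is why an array type is the more natural fit here.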

How to test

  • explain here what to do to test this (or point to unit tests)

Checklist

  • updated docs in case of new feature
  • added/updated tests
  • added/updated testplan to include a test for this fix, including ref to bug using # notation

@willemijn-oudijk willemijn-oudijk self-assigned this Jan 31, 2025
@davidruvolo51 davidruvolo51 changed the title Fix(portal): data model changes fix(portal): data model changes Feb 3, 2025
sonarqubecloud bot commented Feb 5, 2025

@willemijn-oudijk willemijn-oudijk marked this pull request as ready for review February 6, 2025 15:40
Contributor

@davidruvolo51 davidruvolo51 left a comment

Looks great! I don't have any comments for this PR.

I have a few comments for future discussion.

  • What should we do with tissue type? Should it be merged with material type? Or is there another way to model this?
  • When creating a schema using the RD3_V2 template, it takes a while to import the MedDRA.csv file and the ontology is not loaded in the form. It is also unclear what the difference is between this ontology and the ones we use in patient registries (e.g., OMIM, ORDO, HPO, etc.). In future versions, we might want to determine the level of catalogue integration and remove tables that aren't needed or duplicates of the patient registries.

@@ -3,6 +3,7 @@ Biosamples,,,,,,,,,,,http://purl.obolibrary.org/obo/NCIT_C43376,"A natural subst
Biosamples,,id,string,1,TRUE,,,,,,http://purl.obolibrary.org/obo/NCIT_C93400,A unique proper name or character sequence that identifies this particular material,"Beacon v2, FAIR Genomes, DCAT examples, RD3"
Biosamples,,alternate ids,string_array,,,,,,,,http://purl.obolibrary.org/obo/NCIT_C90353,A backup sequence of characters used to identify an entity,RD3
Biosamples,,material type,ontology,,,,BiospecimenType,,,,http://purl.obolibrary.org/obo/NCIT_C70713,"The type of material taken from a biological entity for testing, diagnostic, propagation, treatment or research purposes","FAIR Genomes, RD3"
+Biosamples,,tissue type,ontology,,,,TissueType,,,,,,
Contributor

(Flagging this for future discussion)

Further input is needed on how to handle material type and tissue type. We might need to revisit this in the future.

Member

indeed, important.

Contributor Author

I discussed this with Joeri. He agreed to add tissue type for now, but said it might need to be merged with material type at a later point.
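
For readers unfamiliar with the model CSV, here is a small Python sketch of how the new row can be read. It assumes only the leading header fields that are visible in the diff; the rest of the header is truncated in this view, so the trailing cells are left unnamed. `KNOWN_FIELDS` and `describe_column` are hypothetical names used for illustration.

```python
import csv
import io

# Leading header fields as visible in the diff. The remainder of the header
# is truncated in this view, so the later columns are deliberately not named.
KNOWN_FIELDS = [
    "tableName", "tableExtends", "columnName", "columnType",
    "key", "required", "refSchema", "refTable",
]

# The row added to Biosamples in this PR, copied from the diff.
ROW = "Biosamples,,tissue type,ontology,,,,TissueType,,,,,,"


def describe_column(line: str) -> dict[str, str]:
    """Map the leading cells of one model row onto the known header fields."""
    cells = next(csv.reader(io.StringIO(line)))
    return dict(zip(KNOWN_FIELDS, cells))


column = describe_column(ROW)
print(column["columnName"], column["columnType"], column["refTable"])
```

Read this way, the row defines an ontology column on Biosamples that refers to the TissueType table. Its trailing cells are empty, unlike the material type row directly above it in the diff.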

@davidruvolo51
Contributor

Linked with issue molgenis/GCC#843

@@ -2,11 +2,11 @@ tableName,tableExtends,columnName,columnType,key,required,refSchema,refTable,ref
Files,,,,,,,,,,,"http://purl.obolibrary.org/obo/STATO_0000002,http://purl.obolibrary.org/obo/NCIT_C42883","An electronic file is an information content entity which conforms to a specification or format and which is meant to hold data and information in digital form, accessible to software agents.","RD3, DCAT files add-on"
Files,,name,string,1,TRUE,,,,,,http://purl.obolibrary.org/obo/NCIT_C171191,The name of a file that identifies an electronic file,"RD3, DCAT files add-on"
Files,,alternate ids,string_array,,,,,,,,http://purl.obolibrary.org/obo/NCIT_C90353,A backup sequence of characters used to identify an entity,RD3
-Files,,path,hyperlink_array,,TRUE,,,,,,http://purl.allotrope.org/ontologies/result#AFR_0001927,"A file path is an identifier for a file in a file system or network (e.g., a scp compatible <host>:<path>)","RD3, DCAT files add-on"
+Files,,path,string_array,,TRUE,,,,,,http://purl.allotrope.org/ontologies/result#AFR_0001927,"A file path is an identifier for a file in a file system or network (e.g., a scp compatible <host>:<path>)","RD3, DCAT files add-on"
Member

So we allow for non-URLs. That makes sense.
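
A rough Python sketch of why a hyperlink type is too strict for this column, given that the description allows scp-style `<host>:<path>` values. The check below is an assumption made for illustration and is not how the platform validates hyperlink columns; `is_http_url` and the example paths are hypothetical.

```python
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """True only for absolute http(s) URLs that include a host part."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical path values; none are taken from real data.
paths = [
    "https://example.org/data/sample.bam",  # a hyperlink
    "server01:/data/sample.bam",            # scp-style <host>:<path>
    "/mnt/archive/sample.bam",              # plain file system path
]

for path in paths:
    print(path, is_http_url(path))
```

Only the first value passes a URL-style check, while all three are plausible file paths. Storing them as plain strings keeps the scp-style and local paths valid.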

@willemijn-oudijk willemijn-oudijk merged commit dd8af51 into master Feb 10, 2025
7 checks passed
@willemijn-oudijk willemijn-oudijk deleted the fix/model-changes branch February 10, 2025 12:12