Add project, data collection metadata and associated CV tables #22

Open
wants to merge 21 commits into base: dev

pbishwakarma
Contributor

Adding back in project and data collection for review according to issue process.

Closes #21

modality,[enum],Y,"Defines modality of the data in this collection. See modality controlled vocabulary"
technique,[enum],Y,"Technique used to acquire data in this collection. See technique controlled vocabulary"
license,[enum],Y,"License applied to the data in this collection. See license controlled vocabulary"
webResource,[String],N,"Links to relevant tools/pages to data collection"

This is just a link; should we explicitly model the data repository and identifier for data repository? For other tools for visualization or code, should we model these more explicitly?

patrick-lloyd-ray self-assigned this Sep 27, 2023
patrick-lloyd-ray added the documentation, enhancement, and metadata schema labels Sep 27, 2023
@patrick-lloyd-ray
Member

@pbishwakarma and @carolth: it's in standard MOWG form now but could use a content check.

@djarecka
Contributor

Hi, I was using this PR to create a Google Sheet and a LinkML model.

A couple things that I noticed:

  • I think sub domain name could be simply "Project", "Person", etc. instead of "Project Metadata", "Person Metadata", etc.
  • in the file with DataCollection the sub domain name is "Project Metadata"
  • for all metadata there are two columns: definition and short description. I'm not completely sure what the meaning of short description is. Often it is almost the same as the definition; sometimes it is different enough that I wouldn't guess that they describe the same field
  • I'm assuming that if the data type has "[]", it is multivalued
  • I'm not completely sure how I should treat "Person.name | Organization.name" in the data type field; name is just a string
  • lines 8, 9, 10 in the DataCollection file are about accessControl; two of them have "enum" in the data type, but without specifying the name of the enum. I have only one ValueSet, called "accessControl", with values only, no description, no extra information, so I'm not sure how you would like to map this
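
The data type notation discussed in the list above can be made concrete with a small parser. This is only a sketch under the assumptions stated in the thread: square brackets mark a multivalued field, and "|" separates alternative target types. The names `ParsedType` and `parse_data_type` are hypothetical and are not part of the PR or of LinkML.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedType:
    """Result of reading one cell of the 'data type' column (hypothetical helper)."""

    alternatives: tuple[str, ...]  # e.g. ("Person.name", "Organization.name")
    multivalued: bool              # True when the cell was wrapped in "[...]"


def parse_data_type(cell: str) -> ParsedType:
    """Split a data type cell into its alternatives and a multivalued flag.

    Assumption: "[X]" means "list of X" and "A | B" means "either A or B".
    """
    text = cell.strip()
    multivalued = text.startswith("[") and text.endswith("]")
    if multivalued:
        # Drop the surrounding brackets before looking for alternatives.
        text = text[1:-1].strip()
    alternatives = tuple(part.strip() for part in text.split("|") if part.strip())
    return ParsedType(alternatives=alternatives, multivalued=multivalued)
```

With this reading, `parse_data_type("[String]")` gives a multivalued field with the single alternative `String`, while `parse_data_type("Person.name | Organization.name")` gives a single-valued field with two alternatives.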

@pbishwakarma
Contributor Author

Hi Dorota,

Thanks for the comments, this had totally fallen off my radar. I will update this doc based on your comments and some review.

  1. Yes I agree subdomain can just be the entities
  2. I will correct this
  3. From the donor-to-alignment schema, it looks like the short description is meant to provide some more information than the outright definition, maybe some examples. I will update
  4. Yes [] should be multivalued
  5. That is a polymorphic type - so in the case of a contact for a project, that can be either a Person or an Organization. The value that serves as the join key is the name of the Person or Organization entity.
  6. accessControl right now is a simple enum (open/controlled access). What we'd like to do is evolve accessControl to be composed of three things: a code to categorize the accessControl, a label for that code, and a description to allow people to specify reasons why a subset is open/controlled. We haven't made this update yet, so if this model should be representative of the current state, I think this should be a single enum field.
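
Points 5 and 6 above can be illustrated with a minimal sketch. It assumes that a contact is stored as a name and resolved against both Person and Organization records, and that the planned accessControl structure carries a code, a label, and a description. Every class, function, and enum value below is hypothetical; none of them is taken from the schema files in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Organization:
    name: str


# Polymorphic contact: either entity may fill the field.
Contact = Union[Person, Organization]


def resolve_contact(
    name: str,
    people: dict[str, Person],
    organizations: dict[str, Organization],
) -> Optional[Contact]:
    """Use the name as the join key; people are checked before organizations."""
    if name in people:
        return people[name]
    return organizations.get(name)


class AccessControl(Enum):
    """Current state: a single enum field (value spellings are assumed)."""

    OPEN = "open"
    CONTROLLED = "controlled"


@dataclass(frozen=True)
class AccessControlDetail:
    """Possible future shape: code, label, and a free-text reason."""

    code: str
    label: str
    description: str = ""
```

One consequence of joining on the name alone is that a Person and an Organization with the same name are ambiguous; in this sketch the Person wins, which is an arbitrary choice.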

@djarecka
Contributor

djarecka commented Aug 2, 2024

@pbishwakarma - btw, I was also trying to identify the elements that I could find in the Data Catalog (or the nemo website that the DC points to). In the Google doc that I used to create the LinkML model, I created a tab, "Checks: Slots", to make some notes (see columns K and L): https://docs.google.com/spreadsheets/d/1dCFxzpLBReauJYIinCs_xZlTckpJWFx5nxSTFAe-vvE/edit?usp=sharing

Happy to meet next week if you're planning to work on it and want to discuss. I was also told to look at the Dandi model and see what things could be also taken from there if useful.

@pbishwakarma
Contributor Author

> @pbishwakarma - btw, I was also trying to identify the elements that I could find in the Data Catalog (or the nemo website that the DC points to). In the Google doc that I used to create the LinkML model, I created a tab, "Checks: Slots", to make some notes (see columns K and L): https://docs.google.com/spreadsheets/d/1dCFxzpLBReauJYIinCs_xZlTckpJWFx5nxSTFAe-vvE/edit?usp=sharing

Got it, there are definitely some parts of the data model that are not being used currently (either they contain no data or just aren't being displayed in the UI) - I will update this PR before the working meeting on Wednesday if you want to discuss specifics about those nuances then.

> Happy to meet next week if you're planning to work on it and want to discuss. I was also told to look at the Dandi model and see what things could be also taken from there if useful.

I think we took a stab at a similar exercise in the BICCN repo a while back, at least for mapping fields from our model to the archives' models, might be useful: https://github.com/BICCN/BCDC-Metadata/blob/83b00c19d21a7ef73196936a7b8e66b8637e59ee/design/schema/mappings.csv

@djarecka
Contributor

djarecka commented Aug 2, 2024

> Got it, there are definitely some parts of the data model that are not being used currently (either they contain no data or just aren't being displayed in the UI) - I will update this PR before the working meeting on Wednesday if you want to discuss specifics about those nuances then.

Great, thank you!

> I think we took a stab at a similar exercise in the BICCN repo a while back, at least for mapping fields from our model to the archives' models, might be useful: https://github.com/BICCN/BCDC-Metadata/blob/83b00c19d21a7ef73196936a7b8e66b8637e59ee/design/schema/mappings.csv

yes, I think I even created the column for Dandi :) it was just before I knew anything about BICCN or BICAN

@djarecka
Contributor

djarecka commented Aug 2, 2024

btw, perhaps you could make your updates in the Google doc, in the Slots tab, since I already fixed some of the issues there: https://docs.google.com/spreadsheets/d/1dCFxzpLBReauJYIinCs_xZlTckpJWFx5nxSTFAe-vvE/edit?usp=sharing

Also, for the model, I would need more information about the Classes.

Labels
documentation, enhancement, metadata schema
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[New Metadata Schema Request]: Projects and Data Collections metadata
4 participants