Request: A system to suggest metadata changes #1942

Open
bendichter opened this issue May 19, 2024 · 2 comments
Labels
enhancement New feature or request metadata Issues of dandiset/asset metadata handling

Comments

@bendichter
Member

bendichter commented May 19, 2024

Here is an example of a great Dandiset: https://dandiarchive.org/dandiset/000957

The metadata is mostly right, but there are a few opportunities for improvement:

  • The keywords are not separated as they should be
  • The Funding information lists "Award Number: NIH BRAIN Initiative (U01NS113252)" when it should just be the grant code, without the "NIH BRAIN Initiative" label
  • The paper related resource is annotated as "Resource type: dcite:Dataset" when it should clearly be "Resource type: dcite:Preprint".
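The three problems listed above are regular enough that a curator-side lint could flag them automatically. Here is a minimal sketch, not an existing DANDI tool. The field names (`keywords`, `contributor`, `awardNumber`, `relatedResource`, `resourceType`, `url`) are assumed to match the Dandiset metadata, and the sample keyword and URL are made-up placeholders, not values taken from Dandiset 000957.

```python
import re


def lint_metadata(meta: dict) -> list[str]:
    """Return human-readable warnings for common curation problems.

    Heuristics only; field names are assumptions about the metadata layout.
    """
    problems = []

    # A single keyword containing a comma or semicolon was probably
    # meant to be several separate keywords.
    for keyword in meta.get("keywords", []):
        if re.search(r"[,;]", keyword):
            problems.append(f"keyword looks unsplit: {keyword!r}")

    # An award number should be a bare grant code, so whitespace or
    # parentheses suggest a funder label was mixed in.
    for contributor in meta.get("contributor", []):
        award = contributor.get("awardNumber")
        if award and re.search(r"[\s()]", award):
            problems.append(f"awardNumber has extra text: {award!r}")

    # A bioRxiv link that is not typed as a preprint is suspicious.
    for resource in meta.get("relatedResource", []):
        url = resource.get("url", "")
        if "biorxiv.org" in url and resource.get("resourceType") != "dcite:Preprint":
            problems.append(f"resourceType may be wrong for {url}")

    return problems


# Placeholder metadata that reproduces the three kinds of mistakes.
example = {
    "keywords": ["electrophysiology, mouse", "hippocampus"],
    "contributor": [{"awardNumber": "NIH BRAIN Initiative (U01NS113252)"}],
    "relatedResource": [
        {
            "url": "https://www.biorxiv.org/content/placeholder",
            "resourceType": "dcite:Dataset",
        }
    ],
}

problems = lint_metadata(example)
for line in problems:
    print(line)
```

A check like this would only produce candidate fixes. A human curator would still decide what to suggest to the owner.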

The DANDI Archive is filled with these sorts of small mistakes that could be easily fixed by a curation process.

I would like a system that would allow a curator to suggest changes to the metadata of a Dandiset that they do not own.

My preferred approach would be that the curator could create a pull request on the metadata.yml file within the DataLad GitHub repo for that Dandiset (e.g. here). We could have an automatic email that goes to the maintainer (which is required for all Dandisets) alerting them to this curation PR and requesting approval. This could be done entirely manually at first (validating against the metadata schema) and we could gradually replace steps with API and LLM-based automations that a human curator oversees. I am not sure what machinery would be needed to update the archive metadata based on an approved PR, but I would hope that wouldn't be too difficult to do via a GitHub Action mechanism.

@bendichter bendichter changed the title A system to suggest metadata changes Request: A system to suggest metadata changes May 19, 2024
@waxlamp waxlamp added enhancement New feature or request metadata Issues of dandiset/asset metadata handling labels Aug 22, 2024
@waxlamp
Member

waxlamp commented Aug 22, 2024

@bendichter, how would you feel about the curator using a version of the meditor to create suggested edits that are then sent to the owners? Or even an in-archive JSON editor to perform "bulk" edits on the whole metadata (in lieu of doing a PR)?

I am wary of a DataLad PR approach because (1) I want to avoid a "two-way binding" between the Archive and the DataLadified version of the Dandisets and (2) that sounds complicated (I am not sure I want to deal with a Git diff and try to apply it to a JSON-represented metadata in a system that doesn't really store that as a JSON file).

@bendichter
Member Author

bendichter commented Aug 22, 2024

@waxlamp I understand your desire to not rely on GitHub/DataLad for this. I'd like a solution where an API could be integrated and I'm alright with a clunky UI to start with. What if we just start with an API endpoint where I can upload a new suggested dandiset.yaml? Or better yet, a subspec of the dandiset.yaml that only contains editable fields, leaving out fields like assetSummary, which is derived from the uploaded data, and Citation, which is derived from other metadata? Then I could make suggestions for fields like relatedResources, contributors, publishedBy, etc.? Then on the server side it would validate this dandiset.yaml and, if it is valid, would somehow submit this as a suggestion to the owner?
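To make the "subspec" idea concrete, here is a rough sketch of what the server side could do with an uploaded suggestion: drop the derived fields, then keep only the fields whose values differ from the current metadata. Everything here is hypothetical. `DERIVED_FIELDS`, `editable_subset`, and `build_suggestion` are illustrative names, not part of the DANDI API, and the lowercase `citation` key is an assumption about how the field is stored.

```python
# Fields the thread describes as derived rather than owner-editable.
# The exact set (and the lowercase "citation" key) is an assumption.
DERIVED_FIELDS = frozenset({"assetSummary", "citation"})


def editable_subset(metadata: dict) -> dict:
    """Return a copy of the metadata without the derived fields."""
    return {key: value for key, value in metadata.items() if key not in DERIVED_FIELDS}


def build_suggestion(current: dict, proposed: dict) -> dict:
    """Map each changed editable field to its old and new value.

    Fields missing from `proposed` are treated as unchanged, not deleted.
    """
    old = editable_subset(current)
    new = editable_subset(proposed)
    return {
        key: {"old": old.get(key), "new": value}
        for key, value in new.items()
        if old.get(key) != value
    }


# Placeholder values for illustration only.
current = {
    "name": "Example Dandiset",
    "keywords": ["electrophysiology, mouse"],
    "assetSummary": {"numberOfFiles": 3},
}
proposed = {
    "name": "Example Dandiset",
    "keywords": ["electrophysiology", "mouse"],
    "assetSummary": {"numberOfFiles": 99},  # ignored: derived field
}

suggestion = build_suggestion(current, proposed)
print(suggestion)
```

Schema validation and notifying the owner would sit on either side of this step. The sketch covers only the filtering and comparison.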


2 participants