Request: A system to suggest metadata changes #1942

bendichter · 2024-05-19T00:18:29Z

Here is an example of a great Dandiset: https://dandiarchive.org/dandiset/000957

The metadata is mostly right, but there are a few opportunities for improvement:

The keywords are not separated as they should be.
The Funding information lists "Award Number: NIH BRAIN Initiative (U01NS113252)" when it should just be the grant code, without the "NIH BRAIN Initiative" label.
The paper related resource is annotated as "Resource type: dcite:Dataset" when it should clearly be "Resource type: dcite:Preprint".
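
The first two fixes are mechanical enough that a curation script could propose them. Here is a minimal sketch, assuming the unseparated keywords are joined with a semicolon and that the grant code is the trailing parenthesized token; both are assumptions, and `split_keywords` and `grant_code` are hypothetical helper names, not part of any DANDI tool.

```python
import re


def split_keywords(keywords, sep=";"):
    """Split entries that pack several keywords into one string.

    The separator is an assumption; adjust it to whatever the Dandiset uses.
    """
    out = []
    for entry in keywords:
        out.extend(part.strip() for part in entry.split(sep) if part.strip())
    return out


def grant_code(award_number):
    """Return the trailing parenthesized token, or the input if there is none."""
    match = re.search(r"\(([^()]+)\)\s*$", award_number)
    return match.group(1) if match else award_number.strip()


print(grant_code("NIH BRAIN Initiative (U01NS113252)"))
```

A human curator would still review each proposed value before it is sent to the owner.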

The DANDI Archive is filled with these sorts of small mistakes that could be easily fixed by a curation process.

I would like a system that would allow a curator to suggest changes to the metadata of a Dandiset that they do not own.

My preferred approach would be that the curator could create a pull request on the metadata.yml file within the DataLad GitHub repo for that Dandiset (e.g. here). We could have an automatic email that goes to the maintainer (which is required for all Dandisets) alerting them to this curation PR and requesting approval. This could be done entirely manually at first (validating against the metadata schema), and we could gradually replace steps with API and LLM-based automations that a human curator oversees. I am not sure what machinery would be needed to update the archive metadata based on an approved PR, but I would hope that it wouldn't be too difficult to do via a GitHub Action.

waxlamp · 2024-08-22T00:45:08Z

@bendichter, how would you feel about the curator using a version of the meditor to create suggested edits that are then sent to the owners? Or even an in-archive JSON editor to perform "bulk" edits on the whole metadata (in lieu of doing a PR)?

I am wary of a DataLad PR approach because (1) I want to avoid a "two-way binding" between the Archive and the DataLadified version of the Dandisets and (2) that sounds complicated (I am not sure I want to deal with a Git diff and try to apply it to a JSON-represented metadata in a system that doesn't really store that as a JSON file).

bendichter · 2024-08-22T01:23:40Z

@waxlamp I understand your desire to not rely on GitHub/DataLad for this. I'd like a solution where an API could be integrated, and I'm alright with a clunky UI to start with. What if we just start with an API endpoint where I can upload a new suggested dandiset.yaml? Or better yet, a subspec of the dandiset.yaml that only contains editable fields, leaving out fields like assetSummary, which is derived from the uploaded data, and Citation, which is derived from other metadata? Then I could make suggestions for fields like relatedResources, contributors, publishedBy, etc. The server would then validate this dandiset.yaml and, if it is valid, somehow submit it as a suggestion to the owner?
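
To make the "editable subspec" idea concrete, here is a rough sketch of the server-side check. It is not an existing DANDI API: `DERIVED_FIELDS`, `extract_suggestion`, and the lowercase `citation` key are assumptions for illustration, and real validation against the metadata schema would still have to run on the result.

```python
# Fields derived by the archive and therefore closed to suggestions.
# assetSummary and citation come from the comment above; the exact key
# names are assumptions.
DERIVED_FIELDS = {"assetSummary", "citation"}


def extract_suggestion(current: dict, proposed: dict) -> dict:
    """Return only the proposed fields whose values differ from the current metadata.

    Raises ValueError if the proposal touches a derived field.
    """
    rejected = DERIVED_FIELDS & proposed.keys()
    if rejected:
        raise ValueError(f"derived fields cannot be suggested: {sorted(rejected)}")
    return {key: value for key, value in proposed.items() if current.get(key) != value}


current = {"name": "Example", "keywords": ["a; b"], "assetSummary": {"numberOfFiles": 3}}
proposed = {"name": "Example", "keywords": ["a", "b"]}
print(extract_suggestion(current, proposed))
```

Returning only the changed fields keeps the suggestion small, so the owner sees exactly what the curator wants to change.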

bendichter assigned waxlamp May 19, 2024

bendichter changed the title ~~A system to suggest metadata changes~~ Request: A system to suggest metadata changes May 19, 2024

waxlamp added the enhancement (New feature or request) and metadata (Issues of dandiset/asset metadata handling) labels Aug 22, 2024
