revise the curation notebooks set #32

Open
laureD19 opened this issue May 29, 2024 · 2 comments
@laureD19

Initially, a comprehensive approach was designed to support the variety of curation tasks needed to keep MP data up to date and as accurate as possible. See here - https://marketplace.sshopencloud.eu/contribute/moderator-guidelines#continuously-improving-metadata-quality - and especially here for the envisioned set of notebooks - https://marketplace.sshopencloud.eu/contribute/moderator-guidelines#python-notebooks.
After almost two years of practice, it is time to revise the approach, adapt the set of notebooks, and update the main branch we share publicly (cf. #25).

A few generic comments are covered in this issue; individual issues will need to be created per notebook or theme.

1. Ingest review notebook (to be developed)

In the meantime, ingest notebooks are in the making. The review notebook should at least check that values are present in the fields the mapping was supposed to cover, and offer a view of a random sample of values. Klaus worked on something earlier in 2023, but I could not find his script anymore.
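To make the two checks above concrete, here is a minimal sketch, not the lost 2023 script. It assumes the ingested items are already loaded as a list of Python dicts; the function name `review_ingest` and the field names used in the example are hypothetical.

```python
import random


def review_ingest(items, mapped_fields, sample_size=3, seed=0):
    """Per mapped field: count filled/missing values and draw a random sample.

    `items` is assumed to be a list of dicts (one per ingested item);
    `mapped_fields` lists the fields the mapping was supposed to cover.
    """
    rng = random.Random(seed)  # fixed seed so the review is reproducible
    report = {}
    for field in mapped_fields:
        values = [item.get(field) for item in items]
        # Treat None, empty strings and empty containers as "no value".
        present = [v for v in values if v not in (None, "", [], {})]
        report[field] = {
            "filled": len(present),
            "missing": len(values) - len(present),
            "sample": rng.sample(present, min(sample_size, len(present))),
        }
    return report
```

Calling `review_ingest(items, ["label", "description"])` would return one entry per field, which can be printed or turned into a table in the notebook.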

2. Data analysis notebook, including some item provenance analysis, a non-redundancy overview, some completeness checks (overview of null values; description length), and a check of the detailed coverage of given fields.

This one is fine and we can keep it. It may just need a light review.
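For reference during the review, the completeness checks named above (null values, description length) could look roughly like the sketch below. This is an illustration, not the existing notebook code: the function names, the `persistentId` and `description` keys, and the 50-character threshold are all assumptions.

```python
def null_counts(items, fields):
    """Count, per field, how many items have no usable value."""
    return {
        field: sum(1 for item in items if item.get(field) in (None, "", []))
        for field in fields
    }


def short_descriptions(items, min_length=50):
    """Return the ids of items whose description is shorter than min_length.

    min_length=50 is an arbitrary placeholder threshold, not a project rule.
    """
    return [
        item.get("persistentId")
        for item in items
        if len((item.get("description") or "").strip()) < min_length
    ]
```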

3. Automatic checks notebooks = curation-flags.

3.1 curation-flag-URL
3.2 curation-flag-description
3.3 curation-flag-relations (to be completed)
3.4 curation-flag-coverage

Open question: do we want to continue using the curation flags? In practice, they have not really been used in recent months.

4. Mass analysis and/or corrections notebooks

4.1 Duplicates & merging
4.2 Actors curation
4.3 Vocabulary management (to be developed)
@laureD19

A preliminary step was suggested by @mkrzmr and agreed on during our last curation TF meeting: start by listing curation requirements in the following document: https://docs.google.com/document/d/13lQhPNyWcb0RWqL0F3x376nVK67fAuvGSoB2HQg8sNg/edit

@cesareconcordia @mkrzmr @KlausIllmayer @carikan @kreetrapper @vronk - please contribute to this list, which will be used as a basis for one of our next discussions.

@mkrzmr

mkrzmr commented Nov 22, 2024

Added to 4.2: finding actors with wrong externalIds (a full URL instead of the identifier).
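A possible shape for such a check, as a sketch only: it assumes each actor is a dict with an `id` and an `externalIds` list whose entries carry an `identifier` string. That structure and the function names are assumptions for illustration, not the actual code in the 4.2 notebook.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """True if the value parses as an http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def actors_with_url_external_ids(actors):
    """Return (actor id, offending identifiers) for actors whose
    externalIds hold a full URL where a bare identifier is expected."""
    flagged = []
    for actor in actors:
        bad = [
            ext.get("identifier")
            for ext in actor.get("externalIds", [])
            if looks_like_url(str(ext.get("identifier") or ""))
        ]
        if bad:
            flagged.append((actor.get("id"), bad))
    return flagged
```

The flagged list can then be reviewed by hand before any mass correction, since stripping the URL prefix safely depends on the identifier service.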
