From df51f5f948b6588e23df1eb3c1666b368e392982 Mon Sep 17 00:00:00 2001
From: Hilmar Lapp
Date: Wed, 10 Jan 2018 18:21:49 -0800
Subject: [PATCH] Consolidates and focuses episode on metadata quality

Expands the focus of the metadata section to metadata at all levels,
while simultaneously narrowing the focus to metadata directly linked
to research reproducibility.

See #45 for some context, and closes #46.
---
 _episodes/01-documentation.md         | 33 ++++++++++++-
 _episodes/03-record-level-metadata.md | 67 ---------------------------
 2 files changed, 31 insertions(+), 69 deletions(-)
 delete mode 100644 _episodes/03-record-level-metadata.md

diff --git a/_episodes/01-documentation.md b/_episodes/01-documentation.md
index 64f9b98..0e9b91e 100644
--- a/_episodes/01-documentation.md
+++ b/_episodes/01-documentation.md
@@ -9,11 +9,13 @@ questions:
 objectives:
 - Describe how documentation is useful to yourself and to others
 - Evaluate and rank the quality of comments in published notebooks
+- Evaluate and rank the quality of existing metadata records.
+- Describe types of metadata directly relevant for research reproducibility.
 keypoints:
 - Your code tells *what* you did. Your documentation tells *why* you did it and why it is important.
 - Documentation is the key to communicating your workflow and findings with your future self, collaborators, peers, and the general public.
 - Jupyter Notebooks are powerful because it allows documenting the what (the code) and the why (the motivation and/or intepretation) interspersed with each other.
-
+- Good - better - best: Some metadata are already much better than none, more metadata make better metadata.
 ---
 
 ## Overview
@@ -31,7 +33,8 @@ In this lesson, we will discuss the types and styles for documentation, their ut
 - Describe how documentation is useful to yourself and to others
 - Evaluate and rank the quality of comments in published notebooks
 - Evaluate and rank the quality of existing metadata records.
-- Describe the types of and importance of record level metadata.
+- Describe the types of and importance of record level metadata.
+- Describe types of metadata directly relevant for research reproducibility.
 
 ## Documentation best practices
 
@@ -71,7 +74,33 @@ Compare and contrast different research product archives for the quality and val
 * Solange Duruz. (2016). Simulated breed for GENMON [Data set]. Zenodo. http://doi.org/10.5281/zenodo.220887
 * Zichen Wang, Avi Ma'ayan. Zika-RNAseq-Pipeline v0.1. Zenodo; 2016. http://doi.org/10.5281/zenodo.56311
 
+## Metadata quality: Good - Better - Best
+
+> Metadata is the contextual information required to interpret data ([Fig 1](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097#pcbi-1005097-g001)) and should be clearly defined and tightly integrated with data. The importance of metadata for context, reusability, and discovery has been written about at length in guides for data management best practices. _Hart _et al._ [Ten Simple Rules for Digital Data Storage.](http://dx.doi.org/10.1371/journal.pcbi.1005097) PLoS Comput Biol. 2016;12: e1005097_
+
+Metadata include information about data points, observations (rows, columns), samples, etc. There are also record-level metadata (metadata of research inputs and products as records), including typically the following:
+* Title
+* Authors
+* Description
+* Keywords
+
+Good metadata are important for reproducible research, because they describe the data at various levels, including measurement protocols, observations, versions of software and other tools, and thus provide the **context for interpreting the data, analysis, and results.**
+
+Metadata also aid discovery.
+
+### Exercise 2 (7 minutes)
+
+This is a continuation of Exercise 1. Rank the following Zenodo records from 1 (most helpful/informative) to 3 (least helpful/informative) for metadata quality.
+
+* MS Salmanpour. (2016). Data set [Data set]. Zenodo. http://doi.org/10.5281/zenodo.193025
+* Solange Duruz. (2016). Simulated breed for GENMON [Data set]. Zenodo. http://doi.org/10.5281/zenodo.220887
+* Zichen Wang, Avi Ma'ayan. Zika-RNAseq-Pipeline v0.1. Zenodo; 2016. http://doi.org/10.5281/zenodo.56311
 
+Discuss the following questions:
+* What were the criteria that you used to rank?
+* What was missing?
+* What was the most helpful?
+* What was the most critical piece of information?
 
 ## Examples for learning what's possible
 
diff --git a/_episodes/03-record-level-metadata.md b/_episodes/03-record-level-metadata.md
deleted file mode 100644
index f9275c3..0000000
--- a/_episodes/03-record-level-metadata.md
+++ /dev/null
@@ -1,67 +0,0 @@
----
-title: Record-level Metadata
-teaching: 15
-exercises: 20
-objectives:
-- Evaluate and rank the quality of existing metadata records.
-- Describe the types of and importance of record level metadata.
-keypoints:
-- _TODO_
----
-
-# Creating Record Level Metadata
-
-## Learning objectives:
-
-- Evaluate and rank the quality of existing metadata records.
-- Describe the types of and importance of record level metadata.
-
-## Metadata quality: Good - Better - Best
-
-### Exercise 1 - rank these Zenodo entries in terms metadata quality (7 minutes)
-
-This is a continuation of the Exercise 3 in the [Documentation](documentation.md) section. Rank these from from 1 (most helpful/informative) to 3 (least helpful/informative):
-
-* MS Salmanpour. (2016). Data set [Data set]. Zenodo. http://doi.org/10.5281/zenodo.193025
-* Solange Duruz. (2016). Simulated breed for GENMON [Data set]. Zenodo. http://doi.org/10.5281/zenodo.220887
-* Zichen Wang, Avi Ma'ayan. Zika-RNAseq-Pipeline v0.1. Zenodo; 2016. http://doi.org/10.5281/zenodo.56311
-
-Discuss the results. Specifically, answer and discuss the following questions:
-
-* What were the criteria that you used to rank?
-* What was missing?
-* What was the most helpful?
-* What was the most critical piece of information?
-
-## The metadata in your life
-
-You're used to metadata within your research. You've got metadata about specific data points, observations, samples, etc. But there are many more parts of metadata.
-
-The information that you were looking at in the Zenodo records is metadata. Metadata about the dataset (record) on that page. Let's take a look at the pieces of these pages.
-
-Point out where these pieces are:
-
-* Title
-* Authors
-* Description
-* Keywords
-
-This information is important because:
-
-* People need to find your stuff
-* People need to know what your stuff is
-
-Good metadata are important for reproducible research, because they describe the data, and thus provide the context for interpreting the data, analysis, and results.
-
-Let's think about the workflow of discovery, the user...
-
-1. Searches for something
-2. Reviews the results - is this the kind I was looking for, and if so, is it worth studying further?
-3. Might add some filters to reduce and refine the results
-4. Selects a record to review and goes to that record's page
-5. Reviews the new information on this page, including the fuller description, keywords, and other readme/documentation files.
-6. Downloads and digs in to the data files
-
-This person would continue to move through these steps so long as the information continues to look sufficiently interesting.
-
-Metadata also aid discovery. Metadata should be clearly defined and tightly integrated with the data and project ([Hart et al. 2016](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005097#sec008)).