Term definitions #13

Open
laurianvm opened this issue Nov 29, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation help wanted Extra attention is needed

@laurianvm
Contributor

Dear all,
A preliminary list of terms and definitions is available here (and listed below).
Please have a look and provide feedback, either as a comment on this issue or by creating new issues.

| **Term** | **Definition** |
|----------|-----------------|
| **Observatory** | A collection location for one of water column, soft sediment, or ARMS (env_package), together with its associated data.|
| **Partner** | One partner may have more than one observatory; one observatory can be operated by more than one partner.|
| **Sampling event** | An action conducted at a specific location, associated with a particular Observatory, at a given time, which results in the collection of one or more samples.|
| **EMO BON record** | A digital representation of a sampling event, capturing the relevant data and metadata associated with it. |
| **Sample** | A material sample collected during events. Can also be used to refer to (the origin of) the sequence that metagoflow was run on.|
| **Catalogue asset** | The smallest unit of data that goes into the catalogue |
| **Repository** | A repository is a storage location for files and their version history, managed using Git version control. It allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time. |
| **RO-Crate** | A collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms. |
| **ro-crate-metadata.json** | A JSON-LD file that describes the contents of an RO-Crate, mapping the relationships between files and their metadata to streamline the tracing of provenance, context, and purpose within research workflows. |

@laurianvm laurianvm added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Nov 29, 2024
@kmexter
Contributor

kmexter commented Nov 29, 2024

it tells me I must be on a branch to make changes to the file, and as I find that a bit of a faff, I will add any suggestions here

One thing I want @cymon to add is what a "stub" is (came up on the s3store-rocrate discussion)

@kmexter
Contributor

kmexter commented Dec 2, 2024

| **Term** | **Definition** |
|----------|-----------------|
| **Observatory** | An EMO BON organisational unit: EMO BON stations have one observatory per sample type (water column, soft sediment). Each sample type is collected at one location, and that location is fixed to that observatory. Strictly speaking, the observatory is fixed to the sample type (so one observatory can be water or sediment), but when talking casually we do not make this distinction, since the base name of the observatory (obs_id) is the same. |
| **Partner** | One EMO BON partner, usually but not limited to one institute. Stations can have more than one observatory; one observatory can be operated by more than one partner. |
| **Station** | Synonymous with partner. |
| **Sampling event** | An action conducted at a specific location, associated with a particular Observatory, at a given time, which results in the collection of one or more samples. |
| **Sample / Material Sample** | A material sample collected during an event. Is also used to refer to (the origin of) the sequence that the bioinformatics was run on, which at that point is no longer a physical sample, but does have the unique sample ID of that physical sample. |
| **EMO BON record** | A digital representation of a sampling event, capturing the relevant data and metadata associated with it. There is no fixed idea of what is included in an EMO BON record, as that depends on the system in which these records are held. |
| **Catalogue asset** | The smallest unit of "EMO BON dataset" that goes into a dataset metadata catalogue. Can be a single data file or a set of files. |
| **Repository** | A repository is a storage location for files and their version history, managed using Git version control. It allows users to track changes, collaborate with others, and maintain a complete record of the project's development over time. EMO BON "repos" are on GitHub. |
| **RO-Crate** | A collection of data files, metadata, and contextual information that organizes research data in a structured format, enabling easy sharing, reuse, and understanding in both machine-readable and human-readable forms. When using this word, you are referring to the concept rather than any instantiation of it. |
| **EMO BON RO-Crates** | The RO-Crates that we have organised for the data from EMO BON. In most cases this is a single repo (plus any folders), but that is not always the case, and one repo can hold multiple RO-Crates in the form of multiple ro-crate-metadata.json files. |
| **ro-crate-metadata.json** | A JSON-LD file that describes the contents of an instantiation of an RO-Crate, mapping the relationships between files and their metadata to streamline the tracing of provenance, context, and purpose within research workflows. |
| **EMO BON data** | Here we mean (1) the content of the logsheets, which are filled in by the observatories to describe their collected samples, (2) sequences in ENA, (3) outputs from bioinformatics. |
| **EMO BON metadata** | Here we mean the data that is used specifically to describe EMO BON data, and that performs the function of allowing discovery, understanding, organising, cataloguing, etc. Metadata are recorded in the ro-crate-metadata.json files, added to ENA (sample and experiment accessions), and held in files in the EMO BON repos governance-data, sequencing-data, observatory-profile, among others. |
| **logsheet** | The spreadsheets in which the observatories write their sample and event data. The source spreadsheets are on the EMO BON Google Drive and are harvested as CSV into EMO BON's GitHub space. The "transformed" logsheets are those that have been subjected to a date-range selection and a QC. |
| **sequence** | A DNA string. Specifically, we mean (raw) sequences as produced from the samples by Genoscope and held on their cloud drive or as placed in ENA. |
| **processed sequences / OTUs / ASVs** | These are sequences that have been processed by bioinformatics code to a stage where they can be, or have been, compared to taxonomic reference libraries. |

Some may want to check my definitions, especially for the omics terms. Also, please add anything you feel is missing.
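
To make the metadata-file definition above more concrete, here is a minimal sketch of what such a JSON-LD document could look like, following the general layout of the RO-Crate 1.1 specification (where the file is named `ro-crate-metadata.json`). The function name, the crate name, the file paths, the date and the licence are all hypothetical and only illustrate the shape; they do not describe any actual EMO BON crate.

```python
import json


def build_minimal_crate_metadata(name, description, date_published, license_url, file_paths):
    """Return a dict shaped like a small RO-Crate 1.1 metadata document.

    All argument values are supplied by the caller; nothing here is taken
    from a real EMO BON repository.
    """
    # One "File" data entity per path contained in the crate.
    file_entities = [{"@id": path, "@type": "File"} for path in file_paths]

    graph = [
        # The metadata descriptor: points at the root dataset ("./").
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # The root dataset: describes the crate as a whole and lists its parts.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": entity["@id"]} for entity in file_entities],
        },
    ]
    graph.extend(file_entities)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example values, for illustration only.
metadata = build_minimal_crate_metadata(
    name="Example crate",
    description="Illustrative crate holding two hypothetical logsheet files",
    date_published="2024-12-02",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    file_paths=["logsheets/water_column.csv", "logsheets/soft_sediment.csv"],
)
print(json.dumps(metadata, indent=2))
```

Since one repo can hold several RO-Crates, each crate would carry its own metadata file of this shape in its own folder.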

@laurianvm
Contributor Author

Term definitions have been updated: https://github.com/emo-bon/emo-bon.github.io/blob/main/docs/terms.md

mpo-vliz added a commit that referenced this issue Dec 6, 2024
some extra suggestions in light of #13
@mpo-vliz

mpo-vliz commented Dec 6, 2024

sorry to chime in late -- but I think we can make this even more useful with two extra columns (that might be empty)

  1. example --> makes things really clear, and helps picture
  2. technical representation / modelling --> in case this thing is represented somewhere in the data-management flow or even the RDF publication, then why not also picture nicely how these come to life in the digital world?
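
One possible way to picture the two optional columns is sketched below: each glossary entry carries an `example` and a `representation` field that default to empty, and a small helper renders the entries as a four-column markdown table. The class name, the helper name and the sample cell values are assumptions made for illustration; only the terms and the `obs_id` mention come from this thread.

```python
from dataclasses import dataclass


@dataclass
class GlossaryEntry:
    term: str
    definition: str
    example: str = ""          # optional column, may stay empty
    representation: str = ""   # optional column, may stay empty


def render_markdown_table(entries):
    """Render glossary entries as a markdown table with the two extra columns."""
    lines = [
        "| **Term** | **Definition** | **Example** | **Technical representation** |",
        "|----------|----------------|-------------|------------------------------|",
    ]
    for entry in entries:
        lines.append(
            f"| **{entry.term}** | {entry.definition} | {entry.example} | {entry.representation} |"
        )
    return "\n".join(lines)


# Illustrative entries; the example and representation cells are placeholders.
entries = [
    GlossaryEntry(
        term="Observatory",
        definition="An EMO BON organisational unit, one per sample type.",
        example="water column or soft sediment",
        representation="obs_id",
    ),
    GlossaryEntry(
        term="ro-crate-metadata.json",
        definition="A JSON-LD file that describes the contents of an RO-Crate.",
    ),
]
table = render_markdown_table(entries)
print(table)
```

Entries without an example or representation simply render empty cells, which matches the idea that the extra columns may be left blank.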

Also, in general - using negatives, i.e. sentences that state what something is clearly NOT, might also be useful to further avoid confusion (no real example at the moment)

Additionally, on the current text I made these suggestions --> #15

Main remarks:

  • tempted to, but didn't add the "these are the actual tubes being shipped" to the material-sample definition. But those are them, right? Or yet more definitions needed?
  • adding that stations have a geolocation, right? or are there multiple locations for the various observations?
  • adding the sampling event also captures environment measurements
  • dropped the "smallest" unit for catalog asset -- I think it limits needlessly, catalog owners can decide to record different things from different levels in the EMO BON
  • fixed some * bullets that ended up creating extra columns?
  • Added "DVC Stub file" explanation
  • Left untouched, but uncomfortable with the "EMO BON record" -- it seems to be something that can be something else? It kind of defies our attempt at clarity in this exercise?

@cedricdcc or @laurianvm should we make this kind of docs easily navigable from the landing page?

@kmexter
Contributor

kmexter commented Dec 6, 2024

well, other than they are not tubes but filters (I believe - would need to check with christina) but yes.
observatories have geolocations, stations have addresses
you can remove emo bon record if you like
