MDM Schema

Fabio Cumbo edited this page Jun 7, 2021 · 1 revision

MDM stands for MetaRefSGB Data Model. It plays a central role in organising all data processed by MetaRefSGB and is designed to guarantee the consistency, integrity, and availability of both data and metadata.

It is a set of JSON schemas used to validate the MAG, genome, and metadata definition files that make up the MetaRefSGB resources.

Before submitting an update request, you need to know how to structure the files that define the set of new genomes to be processed by MetaRefSGB and then clustered into SGBs.

MDM Components

The MDM Components represent an abstraction of two main entities (plus a supplementary one) that compose every single update request submitted to MetaRefSGB:

  1. genome: abstraction level that represents a Reference Genome entity;
  2. MAG: abstraction level for the Metagenome-assembled Genome entity;
  3. metadata: additional information about samples.

Data Validation

You can run the following command in your terminal to automatically serialize and validate your data:

MetaRefSGB --mags=~/MAGs.txt \
           --genomes=~/genomes.txt \
           --metadata=~/metadata.txt \
           --validate-input

You can validate all your files in a single command, as above, or one at a time. To validate a single file, e.g. ~/metadata.txt, run the following command:

MetaRefSGB --metadata=~/metadata.txt \
           --validate-input

This converts every line of the input file into a JSON object and validates each object against the corresponding JSON schema.
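The line-to-object conversion and per-object check described above can be pictured with the following minimal Python sketch. It is not the MetaRefSGB implementation. It assumes a tab-separated input file whose first line is a header, and it uses a hypothetical, simplified schema (the property names `sample_id`, `body_site`, and `age` are made up for illustration). It also replaces full JSON Schema validation with a basic required-property and type check so that it runs with the standard library only.

```python
import json

# Hypothetical, simplified schema: the real MDM schemas are not reproduced here.
SCHEMA = {
    "required": ["sample_id", "body_site"],
    "properties": {
        "sample_id": {"type": "string"},
        "body_site": {"type": "string"},
        "age": {"type": "integer"},
    },
}

# Mapping from schema type names to Python types (only the two used above).
TYPES = {"string": str, "integer": int}


def line_to_object(header, line, schema=SCHEMA):
    """Turn one tab-separated line into a dict, skipping empty values."""
    values = line.rstrip("\n").split("\t")
    obj = {}
    for key, value in zip(header, values):
        if value == "":
            continue
        # Cast to int when the (hypothetical) schema asks for an integer.
        if schema["properties"].get(key, {}).get("type") == "integer":
            try:
                value = int(value)
            except ValueError:
                pass  # keep the string so that validation reports it
        obj[key] = value
    return obj


def validate(obj, schema=SCHEMA):
    """Return a list of error messages; an empty list means the object passed."""
    errors = []
    for key in schema["required"]:
        if key not in obj:
            errors.append(f"missing required property: {key}")
    for key, value in obj.items():
        expected = schema["properties"].get(key, {}).get("type")
        if expected and not isinstance(value, TYPES[expected]):
            errors.append(f"{key}: expected {expected}")
    return errors


def validate_lines(lines, schema=SCHEMA):
    """Validate every data line; the first line is assumed to be the header."""
    header = lines[0].rstrip("\n").split("\t")
    report = []
    for number, line in enumerate(lines[1:], start=2):
        obj = line_to_object(header, line, schema)
        report.append((number, obj, validate(obj, schema)))
    return report


if __name__ == "__main__":
    # Made-up example data, not a real MetaRefSGB metadata file.
    example = ["sample_id\tbody_site\tage\n", "S1\tstool\t34\n", "S2\t\tabc\n"]
    for number, obj, errors in validate_lines(example):
        print(number, json.dumps(obj), errors or "OK")
```

In this sketch errors are collected per line rather than raised on the first failure, so a single run can report every problematic line in a file.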

To propose a change to the structure of the MDM schemas (adding new properties or modifying existing ones), please open an issue or a pull request. The MetaRefSGB team will reply once it has evaluated your proposal.