VRS schema definitions mismatch #196

Open
v-rocheleau opened this issue Oct 26, 2023 · 3 comments

Comments

@v-rocheleau

Hi!

While playing with the phenopacket-tools examples, I noticed that the VRS schemas in this repo differ from the official VRS spec, as stated in the schema description itself in vrs-variation-adapter.json.

Is there a historical reason for this divergence from the spec? If so, is it still needed?

If not, I believe a better approach would be to take vrs.json from the VRS repo and use it to replace vrs-variation-adapter.json with the official spec. This could be done with a Git submodule, for instance.

@ielis
Collaborator

ielis commented Oct 27, 2023

Hi @v-rocheleau,
The reason for the divergence is that the protobuf files are the single source of truth for Phenopacket Schema. For the VRS part, if you follow the links starting from the Phenopacket Schema repo, you land at this VRS proto file.

These VRS proto files are part of Phenopacket Schema v2.0.0 and, consequently, that is what phenopacket-tools is designed to validate.

As far as I know, the VRS specs are encoded as JSON schema. However, not all JSON schema concepts are translatable to the Protocol Buffers language, so it is unlikely that using Git submodules would solve this issue.

@v-rocheleau
Author

Thanks for the quick response @ielis

I understand better now, given the protobuf/json-schema concept differences.

The vrs.json json-schema file from the VRS repo is imported in the official vrs-python repo as a submodule, so I was under the impression that this schema file could be used as the VRS JSON-schema source of truth.

Given that phenopacket-tools supports YML, protobuf and JSON-schema formats, do you think it would make sense for it to use the official schema file depending on the format?

Some background on why I am asking this:

  • I have been using some of the example Phenopacket V2 JSON files in this repo to help me implement the V1-to-V2 update in a Phenopackets API service: Katsu (still in progress)
  • Since our API only handles JSON data, I have been using the aforementioned vrs.json file to validate VariantDescriptors that contain VRS data.
  • I opened this issue once I realized that the JSON examples fail validation against the vrs.json schemas.
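
For anyone trying to reproduce this, here is a minimal sketch of how the VRS sub-trees could be pulled out of a Phenopacket V2 JSON document before handing them to a validator. The function name `iter_variations` is hypothetical, and the path (interpretations > diagnosis > genomicInterpretations > variantInterpretation > variationDescriptor > variation) assumes the camelCase names that protobuf's JSON mapping produces; it is worth checking against the example files.

```python
from typing import Any, Dict, Iterator


def iter_variations(phenopacket: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every VariationDescriptor.variation sub-tree found in a phenopacket dict.

    The key names below are assumptions based on the protobuf JSON mapping
    (camelCase field names); missing levels are skipped silently.
    """
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for genomic in diagnosis.get("genomicInterpretations", []):
            descriptor = genomic.get("variantInterpretation", {}).get(
                "variationDescriptor", {}
            )
            variation = descriptor.get("variation")
            if variation is not None:
                yield variation
```

Each yielded dict can then be passed to whichever JSON-schema validator is in use, which makes it easier to see exactly which sub-tree fails against vrs.json.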

@ielis
Collaborator

ielis commented Nov 15, 2023

Hi @v-rocheleau

> Given that phenopacket-tools supports YML, protobuf and JSON-schema formats, do you think it would make sense for it to use the official schema file depending on the format?

I do not think this is how Phenopacket Schema is defined. The latest version (v2.0.2) includes a specific protobuf file that should cover the VRS elements. However, there is no 1:1 mapping between the protobuf files and the VRS JSON schema. Therefore, a JSON document that contains a sub-tree from the "official" VRS schema, e.g. in the VariationDescriptor > Variation field, will not validate as a v2.0.2 phenopacket even if everything else is OK.
So, coming back to your question above, I don't think this is the right thing to do.
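
To illustrate the mismatch, here is a rough sketch of how a service could tell the two encodings apart before choosing a validator. It rests on two assumptions that should be verified against the actual files: that the protobuf JSON form wraps the variation in a single `oneof` key (`allele`, `haplotype`, `copyNumber`, `text`, `variationSet`), and that objects in the VRS JSON schema carry a `type` discriminator such as "Allele". The function name and return labels are hypothetical.

```python
from typing import Any, Dict

# Assumed wrapper keys produced by protobuf's JSON mapping for the Variation oneof.
PROTOBUF_ONEOF_KEYS = {"allele", "haplotype", "copyNumber", "text", "variationSet"}


def guess_variation_encoding(variation: Dict[str, Any]) -> str:
    """Return 'vrs-json', 'protobuf-json' or 'unknown' for a variation sub-tree."""
    # Assumption: VRS JSON objects identify themselves with a string "type" property.
    if isinstance(variation.get("type"), str):
        return "vrs-json"
    # Assumption: the protobuf form has exactly one key naming the chosen oneof member.
    if len(variation) == 1 and next(iter(variation)) in PROTOBUF_ONEOF_KEYS:
        return "protobuf-json"
    return "unknown"
```

A sub-tree reported as "protobuf-json" would be expected to fail against vrs.json, and one reported as "vrs-json" would be expected to fail phenopacket validation, which matches the behaviour described in this thread.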

phenopacket-tools can convert V1 to V2, and the conversion is available through both the CLI and the Java API (Javadocs here). However, it only works for the "VRS" items as defined in the protobuf version (not vrs.json, vrs-python, etc.), which may not be what you need.
