
v0.7 data snapshot

Brian edited this page Dec 1, 2017 · 3 revisions

VERSION 0.7

location

(Contact for bucket location)

contents

23604192 all.json
    6624 cgi.json
   36912 jax.json
   10088 jax_trials.json
   61808 molecularmatch.json
23306360 molecularmatch_trials.json
   17104 oncokb.json
    3864 pmkb.json
     200 sage.json

Each file contains the evidence documents harvested from the respective source. all.json contains the aggregation of all sources.

source                   count
molecularmatch_trials  199,069
jax                      5,754
brca                     5,717
oncokb                   4,048
civic                    3,497
molecularmatch           2,079
cgi                      1,432
jax_trials               1,173
pmkb                       609
sage                        69

evidence_label    count
C               167,453
B                33,767
D                16,703
A                 1,723
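
Counts like the ones above could be recomputed from a snapshot file with a short script. The following is a minimal sketch, not the project's own tooling. It assumes each file holds one JSON evidence document per line, and that the label is stored at `association.evidence_label`; neither is confirmed by this page. The `tally` helper and the sample documents are hypothetical.

```python
import json
from collections import Counter


def tally(lines):
    """Count evidence documents per source and per evidence label."""
    by_source = Counter()
    by_label = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        doc = json.loads(line)
        by_source[doc.get("source", "unknown")] += 1
        # Assumption: the label lives under association.evidence_label.
        label = (doc.get("association") or {}).get("evidence_label")
        by_label[label if label is not None else "unlabeled"] += 1
    return by_source, by_label


# Hypothetical sample documents, not taken from the snapshot.
sample = [
    '{"source": "jax", "association": {"evidence_label": "C"}}',
    '{"source": "jax", "association": {"evidence_label": "B"}}',
    '{"source": "sage", "association": {"evidence_label": "C"}}',
]
by_source, by_label = tally(sample)
for name, count in by_source.most_common():
    print(name, count)
```

Because `tally` accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a multi-gigabyte file such as all.json into memory at once.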

structure

syntax = "proto3";

import "google/protobuf/struct.proto";

// An association between a phenotype ('disease'), environment ('drug'),
// and genome ('feature'), harvested from a trusted knowledge base ('source').
// For organization, the Entrez gene names ('genes') are included separately.
// For traceability, the document from the original source is included.
message Evidence {
  string source = 1;
  repeated string genes = 2;

  // "ga4gh/sequence_annotations.proto"
  repeated google.protobuf.Struct features = 3;

  // "ga4gh/genotype_phenotype.proto"
  google.protobuf.Struct association = 4;

  // opaque source documents
  oneof opaque_source {
    google.protobuf.Struct cgi = 5;
    google.protobuf.Struct jax = 6;
    google.protobuf.Struct civic = 7;
    google.protobuf.Struct oncokb = 8;
    google.protobuf.Struct molecularmatch = 9;
    google.protobuf.Struct molecularmatch_trials = 10;
    google.protobuf.Struct jax_trials = 11;
    google.protobuf.Struct sage = 12;
  }  
}
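
To illustrate the shape this message implies, here is a minimal Python sketch that checks a decoded JSON document against it. It assumes documents use the proto field names as JSON keys, which this page does not confirm. The `check_evidence` helper and the example document are hypothetical. The only constraint taken from the schema itself is the `oneof`, which allows at most one opaque source document per evidence record.

```python
# Field names of the opaque_source oneof in the Evidence message.
OPAQUE_SOURCES = (
    "cgi", "jax", "civic", "oncokb", "molecularmatch",
    "molecularmatch_trials", "jax_trials", "sage",
)


def check_evidence(doc):
    """Return a list of problems found in an Evidence-shaped dict."""
    problems = []
    if not isinstance(doc.get("source"), str):
        problems.append("source must be a string")
    genes = doc.get("genes", [])
    if not isinstance(genes, list) or not all(isinstance(g, str) for g in genes):
        problems.append("genes must be a list of strings")
    features = doc.get("features", [])
    if not isinstance(features, list) or not all(isinstance(f, dict) for f in features):
        problems.append("features must be a list of objects")
    if not isinstance(doc.get("association", {}), dict):
        problems.append("association must be an object")
    # A oneof permits at most one of its member fields to be set.
    present = [name for name in OPAQUE_SOURCES if name in doc]
    if len(present) > 1:
        problems.append("multiple opaque sources set: " + ", ".join(present))
    return problems


# Hypothetical example document; the values are placeholders.
example = {
    "source": "civic",
    "genes": ["BRAF"],
    "features": [{"name": "placeholder feature"}],
    "association": {"evidence_label": "A"},
    "civic": {"id": 0},
}
print(check_evidence(example))
```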

Note: the features and association fields are based on ga4gh.feature and ga4gh.FeaturePhenotypeAssociation, but have evolved since. A future phase will submit a PR to the appropriate GA4GH repository.