googlegenomics · deflaux · Jan 30, 2017 · Jan 26, 2017 · Jan 27, 2017 · Jan 28, 2017
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -186,6 +186,7 @@
 .. _VariantSet: https://cloud.google.com/genomics/reference/rest/v1/variantsets
 .. _Load Genomic Variants: https://cloud.google.com/genomics/v1/load-variants
 .. _Understanding the BigQuery Variants Table Schema: https://cloud.google.com/genomics/v1/bigquery-variants-schema
+.. _Verily DeepVariant: https://cloud.google.com/genomics/v1alpha2/deepvariant
 
 .. _Using Google Cloud Storage with Big Data: https://cloud.google.com/storage/docs/working-with-big-data
 .. _gsutil: https://cloud.google.com/storage/docs/gsutil
@@ -252,7 +253,7 @@
 
 .. GLOBAL SUBSTITUTIONS CAN GO HERE
 
-.. |sparkADC| replace:: If the `Application Default Credentials`_ are not sufficient, use ``--secretsFile=PATH/TO/YOUR/client_secrets.json``.  If you do not already have this file, see the `authentication instructions`_ to obtain it.
+.. |sparkADC| replace:: If the `Application Default Credentials`_ are not sufficient, use ``--client-secrets=PATH/TO/YOUR/client_secrets.json``.  If you do not already have this file, see the `authentication instructions`_ to obtain it.
 .. |dataflowADC| replace:: If the `Application Default Credentials`_ are not sufficient, use ``--client-secrets PATH/TO/YOUR/client_secrets.json``.  If you do not already have this file, see the `authentication instructions`_ to obtain it.
 .. |dataflowSomeRefs| replace:: Use a comma-separated list to run over multiple disjoint regions.  For example to run over `BRCA1`_ and `BRCA2`_ ``--references=chr13:32889610:32973808,chr17:41196311:41277499``.
 .. |dataflowAllRefs| replace:: To run this pipeline over the entire genome, use ``--allReferences`` instead of ``--references=chr17:41196311:41277499``.

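The `|dataflowSomeRefs|` substitution above describes `--references` as a comma-separated list of disjoint regions. The Python sketch below shows what such a value contains. It makes two assumptions:

* Each entry has the form `NAME:START:END`, as in the BRCA1/BRCA2 example.
* END is exclusive when checking that regions are disjoint. The substitution text does not state a coordinate convention.

`Region`, `parse_references` and `are_disjoint` are hypothetical names and are not part of the Google Genomics pipelines.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """One region from a --references value (hypothetical helper type)."""
    reference_name: str
    start: int
    end: int


def parse_references(value: str) -> List[Region]:
    """Split e.g. "chr13:32889610:32973808,chr17:41196311:41277499" into
    regions, assuming each comma-separated entry is NAME:START:END."""
    regions = []
    for entry in value.split(","):
        # Unpacking raises ValueError if an entry lacks exactly two colons.
        name, start, end = entry.strip().split(":")
        region = Region(name, int(start), int(end))
        if region.start >= region.end:
            raise ValueError(f"start must be less than end in {entry!r}")
        regions.append(region)
    return regions


def are_disjoint(regions: List[Region]) -> bool:
    """Return True if no two regions on the same reference overlap.

    Sorting by (name, start) groups each reference together, so checking
    neighbouring pairs is enough to find any overlap.  END is treated as
    exclusive here, which is an assumption.
    """
    ordered = sorted(regions)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.reference_name == cur.reference_name and cur.start < prev.end:
            return False
    return True
```

For example, `parse_references("chr13:32889610:32973808,chr17:41196311:41277499")` returns two regions that `are_disjoint` accepts. Two overlapping chr17 entries would be rejected.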
diff --git a/docs/source/includes/spark_setup.rst b/docs/source/includes/spark_setup.rst
@@ -43,6 +43,6 @@
 
         cd spark-examples
         sbt assembly
-        cp target/scala-2.10/googlegenomics-spark-examples-assembly-*.jar ~/
+        cp target/scala-2.*/googlegenomics-spark-examples-assembly-*.jar ~/
         cd ~/
 
diff --git a/docs/source/use_cases/discover_public_data/genomic_data_toc.rst b/docs/source/use_cases/discover_public_data/genomic_data_toc.rst
@@ -23,6 +23,7 @@ __ RenderedVersion_
   1000_genomes
   platinum_genomes
   platinum_genomes_deepvariant
+  precision_fda
   reference_genomes
   mssng_data
   isb_cgc_data

diff --git a/docs/source/use_cases/discover_public_data/platinum_genomes_deepvariant.rst b/docs/source/use_cases/discover_public_data/platinum_genomes_deepvariant.rst
@@ -11,13 +11,13 @@ Platinum Genomes DeepVariant
    | **If you are reading this on github, you should instead click** `here`__.         |
    +-----------------------------------------------------------------------------------+
 
-.. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/platinum_genomes.html
+.. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/platinum_genomes_deepvariant.html
 
 __ RenderedVersion_
 
 .. comment: end: goto-read-the-docs
 
-This dataset comprises the `6 member CEPH pedigree 1463 <http://www.ebi.ac.uk/ena/data/view/PRJEB3381>`_ called using the DeepVariant toolchain and reference genome GRCh38.  See the `DeepVariant preprint <http://biorxiv.org/content/early/2016/12/14/092890>`_ for full details:
+This dataset comprises the `6-member CEPH pedigree 1463 <http://www.ebi.ac.uk/ena/data/view/PRJEB3381>`_, aligned to the :ref:`vgrch38` reference genome and called using the alpha version of the `Verily DeepVariant`_ toolchain.  See the `DeepVariant preprint <http://biorxiv.org/content/early/2016/12/14/092890>`_ for full details:
 
 |  `Creating a universal SNP and small indel variant caller with deep neural networks <http://biorxiv.org/content/early/2016/12/14/092890>`_
 |  Ryan Poplin, Dan Newburger, Jojo Dijamco, Nam Nguyen, Dion Loy, Sam Gross, Cory Y. McLean, Mark A. DePristo

diff --git a/docs/source/use_cases/discover_public_data/precision_fda.rst b/docs/source/use_cases/discover_public_data/precision_fda.rst
@@ -0,0 +1,38 @@
+PrecisionFDA Truth Challenge
+============================
+
+.. comment: begin: goto-read-the-docs
+
+.. container:: visible-only-on-github
+
+   +-----------------------------------------------------------------------------------+
+   | **The properly rendered version of this document can be found at Read The Docs.** |
+   |                                                                                   |
+   | **If you are reading this on github, you should instead click** `here`__.         |
+   +-----------------------------------------------------------------------------------+
+
+.. _RenderedVersion: http://googlegenomics.readthedocs.org/en/latest/use_cases/discover_public_data/precision_fda.html
+
+__ RenderedVersion_
+
+.. comment: end: goto-read-the-docs
+
+This dataset includes both:
+
+* the input for the `PrecisionFDA Truth Challenge <https://precision.fda.gov/challenges/truth>`_, comprising whole-genome sequences for HG001 (NA12878) and HG002 (NA24385)
+* the output from the alpha version of the `Verily DeepVariant`_ toolchain, aligned to the :ref:`vgrch38` reference genome.  See the `DeepVariant preprint <http://biorxiv.org/content/early/2016/12/14/092890>`_ for full details:
+
+  |  `Creating a universal SNP and small indel variant caller with deep neural networks <http://biorxiv.org/content/early/2016/12/14/092890>`_
+  |  Ryan Poplin, Dan Newburger, Jojo Dijamco, Nam Nguyen, Dion Loy, Sam Gross, Cory Y. McLean, Mark A. DePristo
+  |  DOI: https://doi.org/10.1101/092890
+  |
+
+Google Cloud Platform data locations
+------------------------------------
+
+* Google Cloud Storage folder `gs://genomics-public-data/precision-fda <https://console.cloud.google.com/storage/genomics-public-data/precision-fda/>`_
+
+Provenance
+----------
+
+* The FASTQ files in `gs://genomics-public-data/precision-fda/input <https://console.cloud.google.com/storage/genomics-public-data/precision-fda/input>`_ were run through the `Verily DeepVariant`_ alpha toolchain to produce the corresponding files in `gs://genomics-public-data/precision-fda/output/deepvariant-alpha <https://console.cloud.google.com/storage/genomics-public-data/precision-fda/output/deepvariant-alpha>`_.
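The `Verily's GRCh38` section in the `reference_genomes.rst` diff that follows lists two modifications Verily applied to the base assembly:

* Sequence names get a "chr" prefix.
* Extended IUPAC codes are converted to their alphabetically first base, per VCF 4.3.

As a rough illustration only, here is a minimal Python sketch of both steps on uncompressed FASTA lines. It is not Verily's actual tooling and makes these assumptions:

* N is left unchanged, since VCF allows it.
* Lowercase (soft-masked) bases keep their case.
* Names that already start with "chr" are left alone.

The function names are hypothetical.

```python
# Ambiguity codes beyond A, C, G, T and N, with the bases each stands for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}

# Each code maps to its alphabetically first base, e.g. R (A or G) -> A.
_UPPER = {code: min(bases) for code, bases in IUPAC_AMBIGUITY.items()}
# Assumption: lowercase (soft-masked) codes are resolved the same way.
_LOWER = {code.lower(): base.lower() for code, base in _UPPER.items()}
TRANSLATION = str.maketrans({**_UPPER, **_LOWER})


def rename_header(line: str) -> str:
    """Prefix the sequence name in a FASTA header with "chr" if missing."""
    name = line[1:]
    return line if name.startswith("chr") else ">chr" + name


def normalize_fasta(lines):
    """Yield FASTA lines with chr-prefixed names and ambiguity codes resolved."""
    for line in lines:
        if line.startswith(">"):
            yield rename_header(line)
        else:
            # Plain A/C/G/T/N characters are not in the table and pass through.
            yield line.translate(TRANSLATION)
```

Using a single `str.translate` table keeps each sequence line to one pass.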
diff --git a/docs/source/use_cases/discover_public_data/reference_genomes.rst b/docs/source/use_cases/discover_public_data/reference_genomes.rst
@@ -58,6 +58,36 @@ Genome Reference Consortium Human Build 38 includes data from 39 gzipped fasta f
 
 More information on this source data can be found in this `NCBI article <http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/>`__ and in the `FTP README <ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/README_ASSEMBLIES>`__.
 
+
+.. _vgrch38:
+
+Verily's GRCh38
+^^^^^^^^^^^^^^^
+
+Verily's GRCh38 reference genome is fully compatible with any b38 genome across the autosomes.
+
+Verily's GRCh38:
+
+* excludes all patch sequences
+* omits alternate haplotype chromosomes
+* includes decoy sequences
+* masks out duplicate copies of centromeric regions
+
+The base assembly is `GRCh38_no_alt_plus_hs38d1 <ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz>`_. This assembly version was created specifically for analysis, with its rationale and exact genome modifications thoroughly documented in its `README <ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/README_analysis_sets.txt>`_ file.
+
+Verily applied the following modifications to the base assembly:
+
+* Reference segment names are prefixed with "chr".
+
+   +--------------------------------------------------------------+
+   | Many of the additional data files we use are provided        |
+   | by GENCODE, which uses the "chr" naming convention.          |
+   +--------------------------------------------------------------+
+
+* All 74 extended IUPAC codes are converted to the alphabetically first base they can represent (for example, R, meaning A or G, becomes A), as recommended in the VCF 4.3 specification.
+
+* This release of the genome reference is named ``GRCh38_Verily_v1``.
+
 hg19
 ^^^^