Initial commit version 1.11.0

PacificBiosciences · Aug 17, 2018 · commit de471a6

Showing 28 changed files with 715 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,34 @@
+Copyright (c) 2016-2018, Pacific Biosciences of California, Inc.
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted (subject to the limitations in the
+disclaimer below) provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above
+   copyright notice, this list of conditions and the following
+   disclaimer in the documentation and/or other materials provided
+   with the distribution.
+
+ * Neither the name of Pacific Biosciences nor the names of its
+   contributors may be used to endorse or promote products derived
+   from this software without specific prior written permission.
+
+NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE
+GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY PACIFIC
+BIOSCIENCES AND ITS CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL PACIFIC BIOSCIENCES OR ITS
+CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGE.
diff --git a/README.md b/README.md
@@ -0,0 +1,51 @@
+<p align="center">
+  <img src="doc/img/minorseq.png" alt="minorseq logos" width="400px"/>
+</p>
+<h1 align="center">MinorSeq</h1>
+<p align="center">Minor Variant Calling and Phasing Tools</p>
+
+***
+## Availability
+The latest pre-release, developer-only Linux binaries can be installed via [bioconda](https://bioconda.github.io/):
+
+    conda install minorseq
+
+These binaries are not ISO compliant.
+For research only.
+Not for use in diagnostic procedures.
+
+Official support is provided only for official, stable [SMRT Analysis builds](http://www.pacb.com/products-and-services/analytical-software/)
+provided by PacBio.
+
+Unofficial support for binary pre-releases is provided via GitHub issues,
+not via email to developers.
+
+## Quick Tools Overview
+
+### [End-to-end workflow](doc/INTRODUCTION.md)
+
+Overview of how to run your sample.
+
+### [Minor variant caller](doc/JULIET.md)
+
+`juliet` identifies minor variants from aligned CCS reads.
+
+### [Reduce alignment](doc/FUSE.md)
+
+`fuse` reduces an alignment into its closest representative sequence.
+
+### [Swap BAM reference](doc/CLERIC.md)
+
+`cleric` swaps the reference of an alignment by transitive alignment.
+
+### [Minor variant pipeline](doc/JULIETFLOW.md)
+
+`julietflow` automates the minor variant pipeline.
+
+### [Mix Data _In-Silico_](doc/MIXDATA.md)
+
+`mixdata` helps to mix clonal strains _in-silico_ for benchmarking studies.
+
+## Disclaimer
+
+THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.
diff --git a/doc/CLERIC.md b/doc/CLERIC.md
@@ -0,0 +1,44 @@
+<h1 align="center">
+    Cleric - Swap BAM alignment reference
+</h1>
+
+<p align="center">
+  <img src="img/cleric.png" alt="Logo of Cleric" width="200px"/>
+</p>
+
+## Install
+Install the minorseq suite using bioconda; see [here](../README.md) for more information.
+One of the binaries is called `cleric`.
+
+## Input data
+*Cleric* operates on aligned records in BAM format, together with the original
+reference and the target reference in FASTA format.
+BAM files have to be PacBio-compliant, meaning the CIGAR `M` operation is forbidden.
+The two reference sequences can be provided either in separate files or combined in one.
+The FASTA header of the original reference must match the reference name in the BAM.
+
+## Scope
+The current scope of *Cleric* is converting a given alignment to a different
+reference. To do so, it aligns the original and target reference sequences to each other.
+A transitive alignment (read to original reference, original to target reference) then yields the new alignment.
+
+## Output
+*Cleric* writes a BAM file to the path given as the last argument.
+
+## Example
+Simple example:
+```
+cleric m530526.align.bam reference.fasta new_ref.fasta cleric_output.bam
+```
+
+Or, with both references combined in a single FASTA file:
+```
+cat reference.fasta new_ref.fasta > combined.fasta
+cleric m530526.align.bam combined.fasta cleric_output.bam
+```
+
+## FAQ
+### Cleric does not finish.
+Runtime is linear in the number of reads provided. The reference alignment step runs
+Needleman-Wunsch, whose runtime is N×M in the two reference lengths. Do not provide references
+as long as human chromosomes; restrict them to your actual amplicon target.
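The transitive alignment described under *Scope*, and the N×M cost mentioned in the FAQ, can be illustrated with a minimal Python sketch. It is not Cleric's implementation: the function names, the unit scoring scheme (match +1, mismatch and gap -1), and representing a read's alignment as a list of 0-based reference positions are assumptions made for illustration only. Cleric itself works on BAM records and CIGAR strings.

```python
from typing import Dict, List, Optional, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed unit scoring, not Cleric's parameters


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Globally align a and b and return the two gapped strings.

    The DP matrix has (len(a) + 1) * (len(b) + 1) cells, so time and memory
    grow as N*M, which is why very long references make this step slow.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def sub(i: int, j: int) -> int:
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)

    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def position_map(gapped_orig: str, gapped_target: str) -> Dict[int, Optional[int]]:
    """Map each 0-based original-reference position to its target position,
    or None where the target has a gap (no counterpart)."""
    mapping: Dict[int, Optional[int]] = {}
    i = j = 0
    for x, y in zip(gapped_orig, gapped_target):
        if x != "-":
            mapping[i] = j if y != "-" else None
            i += 1
        if y != "-":
            j += 1
    return mapping


def lift_positions(read_ref_positions: List[Optional[int]],
                   mapping: Dict[int, Optional[int]]) -> List[Optional[int]]:
    """Transitive step: read -> original reference -> target reference.
    None in the input (a read insertion) stays None."""
    return [None if p is None else mapping.get(p) for p in read_ref_positions]
```

For example, aligning an original reference `ACGT` to a target `AGT` gives `A-GT`, so original positions 0, 2 and 3 map to target positions 0, 1 and 2, while position 1 (`C`) has no counterpart; a read base aligned there loses its reference position after the swap.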
diff --git a/doc/FUSE.md b/doc/FUSE.md
@@ -0,0 +1,32 @@
+<h1 align="center">
+    fuse - Reduce alignment into its representative sequence
+</h1>
+
+<p align="center">
+  <img src="img/fuse.png" alt="Logo of Fuse" width="100px"/>
+</p>
+
+## Install
+Install the minorseq suite using bioconda; see [here](../README.md) for more information.
+One of the binaries is called `fuse`.
+
+## Input data
+*Fuse* operates on aligned records in the BAM format.
+BAM files have to be PacBio-compliant, meaning the CIGAR `M` operation is forbidden.
+
+## Scope
+The current scope of *Fuse* is creating a high-quality consensus sequence.
+*Fuse* includes in-frame insertions that are at least a certain distance apart from each other.
+Major deletions are removed.
+
+## Output
+*Fuse* writes one FASTA file per input. The output file path is given by the second
+argument.
+
+## Example
+Simple example:
+```
+fuse m530526.align.bam m530526.fasta
+```
+
+Output: `m530526.fasta`
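A column-wise majority consensus gives a rough idea of what reducing an alignment to a representative sequence involves. The Python sketch below is an assumption, not Fuse's algorithm: it treats each reference position as a column of observed bases, drops columns where a deletion is the majority call, and keeps an insertion only if most reads carry it, its length is a multiple of three (in frame), and it lies at least `min_distance` positions after the previously accepted insertion. `fuse_consensus`, its input layout, and the `min_distance` default are hypothetical; this document does not state the distance Fuse actually uses.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def fuse_consensus(
    columns: Sequence[str],
    insertions: Optional[Dict[int, List[str]]] = None,
    min_distance: int = 10,  # hypothetical spacing, not taken from Fuse
) -> str:
    """Collapse a pileup into one representative sequence (illustrative only).

    columns[i] holds the base observed in every read at reference position i,
    with '-' marking a read that has a deletion there.
    insertions[i] lists, per read, the bases inserted after position i
    ('' for reads without an insertion).
    """
    insertions = insertions or {}
    out: List[str] = []
    last_insertion: Optional[int] = None
    for pos, column in enumerate(columns):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # a majority deletion drops the column entirely
            out.append(base)
        observed = insertions.get(pos)
        if not observed:
            continue
        ins, count = Counter(observed).most_common(1)[0]
        in_frame = len(ins) % 3 == 0
        far_enough = last_insertion is None or pos - last_insertion >= min_distance
        # Keep the insertion only if a strict majority of reads carries it.
        if ins and in_frame and far_enough and 2 * count > len(observed):
            out.append(ins)
            last_insertion = pos
    return "".join(out)
```

With three reads and columns `["AAA", "CC-", "G--", "TTT"]`, the third column is a majority deletion and is dropped, giving `ACT`; an in-frame `GGG` insertion after position 0 carried by two of the three reads would be kept, giving `AGGGCT`.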
diff --git a/doc/INTRODUCTION.md b/doc/INTRODUCTION.md
@@ -0,0 +1,59 @@
+## How to run your sample 101
+
+### Step 1
+A simple bioconda installation:
+
+```sh
+conda install minorseq
+```
+
+--------
+### Step 2
+Create CCS2 reads from your Sequel chip
+
+> Juliet currently uses PacBio CCS reads as input. The use of CCS rich QVs allows sensitive minor variant calling.
+
+```
+ccs --richQVs m54000_170101_050702_3545456.subreadset.xml yourdata.ccs.bam
+```
+
+--------
+### Step 3
+Filter CCS2 reads as described here: [JULIETFLOW.md#filtering](JULIETFLOW.md#filtering)
+
+> To ensure a uniform noise profile, we filter to reads with at least 99% predicted
+>  accuracy. Barcode demultiplexing can optionally be performed as well.
+
+--------
+### Step 4
+Download the reference sequence of interest as `ref.fasta`
+
+> Juliet currently calls amino acid variants relative to a given reference
+>  sequence so they can easily be related to known variants.
+
+--------
+### Step 5
+Create a target-config for your gene as described here: [JULIET.md#target-configuration](JULIET.md#target-configuration)
+
+> The target-config specifies the open reading frames and how specific amino
+>  acids should be labeled in the output (e.g., drug resistance mutation
+>  variants).
+
+
+--------
+### Step 6
+Run *julietflow*
+
+> The invocation is simple: it takes the sequencing reads, the
+>  reference, and the reference annotation config.
+
+```
+julietflow -i yourdata.filtered.ccs.bam -r ref.fasta -c targetconfig.json
+```
+
+--------
+### Step 7
+Interpret results in `yourdata.json` or `yourdata.html`
+
+> `yourdata.html` can be viewed in a web browser and reflects the
+>  underlying results stored in `yourdata.json`.
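Step 3's filter keeps only reads with at least 99% predicted accuracy. A minimal Python sketch of that rule follows, assuming each CCS record exposes its predicted accuracy as the `rq` tag (as PacBio CCS BAM files do) and, for self-containment, modeling records as plain dictionaries rather than objects from a BAM library. `filter_by_predicted_accuracy` and the record layout are hypothetical; the filtering procedure linked in Step 3 remains the reference.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]  # hypothetical stand-in for a CCS BAM record


def filter_by_predicted_accuracy(
    records: Iterable[Record], min_rq: float = 0.99
) -> Iterator[Record]:
    """Yield records whose predicted accuracy (the `rq` tag) is at least min_rq.

    Records without an `rq` tag are dropped, since their accuracy is unknown.
    """
    for record in records:
        rq = record.get("tags", {}).get("rq")
        if rq is not None and rq >= min_rq:
            yield record
```

With a BAM reader such as pysam, the same comparison would apply to each record's `rq` tag value; the threshold is inclusive, so a read with `rq` exactly 0.99 passes.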