Commit

Merge branch 'devel'
nservant committed May 15, 2020
2 parents 1618c17 + fb8cce1 commit c410dea
Showing 10 changed files with 344 additions and 258 deletions.
14 changes: 10 additions & 4 deletions CHANGELOG
@@ -1,3 +1,13 @@
*************************************
CHANGES IN VERSION 1.2.0

SIGNIFICANT USER CHANGES

o Add parameter `--minAltDepth`
o Rework filters to handle multiallelic Variants
o Switch filter names from --minVAF & --minMAF to --vaf & --maf
o Test if the vcf contains only one sample

*************************************
CHANGES IN VERSION 1.1.0

@@ -17,7 +27,3 @@ RELEASE VERSION 1.0.0
o Tested on the DRAGON data
o Support for Annovar and snpEff databases
o Support for Varscan2 and Mutect2




69 changes: 47 additions & 22 deletions README.md
@@ -7,7 +7,7 @@

This tool was designed to calculate a **Tumor Mutational Burden (TMB)** score from a VCF file.

The TMB is usually defined as the total number of non-synonymous mutations per coding area of a tumor genome. This metric is mainly used as a biomarker in clinical practice to determine whether to use or not immunomodulatory anticancer drugs (immune checkpoint inhibitors such as Nivolumab). Whole Exome Sequencing (WES) allows comprehensive measurement of TMB and is considered the gold standard. In practice, as TMB is mainly used in the routine of the clinic, and due to the high cost of WES, TMB calculation based on gene panels is preferred.
The TMB is usually defined as the total number of non-synonymous mutations per coding area of a tumor genome. This metric is mainly used as a biomarker in clinical practice to determine whether to use or not immunomodulatory anticancer drugs (immune checkpoint inhibitors such as Nivolumab). Whole Exome Sequencing (WES) allows comprehensive measurement of TMB and is considered the gold standard. In practice, as TMB is mainly used in the routine of the clinic, and due to the high cost of WES, TMB calculation based on gene panels is preferred.

Currently, the main limitation of TMB calculation is the lack of standard for its calculation. Therefore, we decided to propose a very **versatile** tool allowing the user to define exactly which type of variants to use or filter.
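As an illustration of the definition above, here is a minimal sketch of the score itself. It assumes the TMB is reported as the number of retained variants per megabase of effective genome size; the function name `tmb_score` is hypothetical and is not taken from the pyTMB code.

```python
def tmb_score(n_variants: int, eff_genome_size: int) -> float:
    """Return the number of retained variants per megabase.

    n_variants: variants left after all filters (assumed to be counted upstream).
    eff_genome_size: effective genome size in bases.
    """
    if eff_genome_size <= 0:
        raise ValueError("eff_genome_size must be a positive number of bases")
    return n_variants / (eff_genome_size / 1e6)


# Hypothetical example: 16 variants kept on a 1.59 Mb gene panel.
print(round(tmb_score(16, 1_590_000), 2))  # 10.06
```

Which variants end up in `n_variants` depends entirely on the filters selected by the user, which is why the same VCF can give different scores.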

@@ -34,9 +34,9 @@ The TMB is defined as the number of variants over the size of the genomic region

```bash
python bin/pyTMB.py --help
usage: pyTMB.py [-h] [-i VCF] [--dbConfig DBCONFIG] [--varConfig VARCONFIG]
usage: pyTMB.py [-h] [-i VCF] [--dbConfig DBCONFIG] [--varConfig VARCONFIG]
[--effGenomeSize EFFGENOMESIZE] [--bed BED]
[--minVAF MINVAF] [--minMAF MINMAF] [--minDepth MINDEPTH]
[--vaf MINVAF] [--maf MAXMAF] [--minDepth MINDEPTH] [--minAltDepth MINALTDEPTH]
[--filterLowQual] [--filterIndels] [--filterCoding]
[--filterSplice] [--filterNonCoding] [--filterSyn]
[--filterNonSyn] [--filterCancerHotspot] [--filterPolym]
@@ -52,9 +52,10 @@ optional arguments:
--effGenomeSize EFFGENOMESIZE Effective genome size
--bed BED Capture design to use if effGenomeSize is not defined (BED file)

--minVAF MINVAF Filter variants with Allelic Ratio < minVAF
--minMAF MINMAF Filter variants with MAF < minMAF
--vaf MINVAF Filter variants with Allelic Ratio < minVAF
--maf MAXMAF Filter variants with MAF < maf
--minDepth MINDEPTH Filter variants with depth < minDepth
--minAltDepth MINALTDEPTH Filter alternative allele with depth < minAltDepth
--filterLowQual Filter low quality (i.e not PASS) variant
--filterIndels Filter insertions/deletions
--filterCoding Filter Coding variants
@@ -65,23 +66,23 @@ optional arguments:
--filterCancerHotspot Filter variants annotated as cancer hotspots
--filterPolym Filter polymorphism variants in genome databases. See --maf
--filterRecurrence Filter on recurrence values

--polymDb POLYMDB Databases used for polymorphisms detection (comma separated)
--cancerDb CANCERDB Databases used for cancer hotspot annotation (comma separated)

--verbose
--debug
--export
--version

```
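When `--effGenomeSize` is not given, the help above says the capture design passed with `--bed` is used instead. The sketch below shows one way such a size could be derived from a BED file. It assumes the size is the total number of bases covered by the intervals, with overlapping intervals merged; `bed_size` is a hypothetical helper, not the function used by pyTMB.

```python
import io


def bed_size(lines):
    """Sum the lengths of BED intervals after merging overlaps per chromosome."""
    intervals = {}
    for line in lines:
        line = line.strip()
        # Skip empty lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current one.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical three-interval design; the two chr1 intervals overlap.
example = io.StringIO("chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n")
print(bed_size(example))  # 250
```

BED coordinates are 0-based and half-open, so each interval contributes `end - start` bases.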
## Configs
Working with vcf files is usually not straightforward, and mainly depends on the variant caller and annotation tools/databases used.
In order to make this tool as flexible as possible, we decided to set up **two configuration files** to define which fields have to be checked and in which case.
The `--dbConfig` file describes all details about annotation. As an example, we provide some configurations for the **Annovar** (*conf/annovar.yml*)
The `--dbConfig` file describes all details about annotation. As an example, we provide some configurations for the **Annovar** (*conf/annovar.yml*)
and **snpEff** (*conf/snpeff.yaml*) tools.
These files can be customized by the user.
@@ -120,17 +121,20 @@ The same is true for the `--cancerDb` parameter.
### Filters
#### `--minVAF MINVAF`
#### `--vaf MINVAF`
Filter variants with Allelic Ratio < minVAF. Note the field used to get the Allelic Ratio field is defined in the *conf/caller.yml* file.
In this case, the program will first look for this information in the **FORMAT** field, and then in the **INFO** field.
#### `--minMAF MINMAF`
Filter variants with MAF < minMAF. Note the databases used to check the Min Allele Frequency are set using the `--polymDb`
#### `--maf MAXMAF`
Filter variants with MAF < maf. Note the databases used to check the Minor Allele Frequency are set using the `--polymDb`
parameter and the *conf/databases.yml* file.
#### `--minDepth MINDEPTH`
Filter variants with depth < minDepth. Note the field used to get the depth is defined in the *conf/caller.yml* file.
In this case, the programm will first look for this information in the **FORMAT** field, and then in the **INFO** field.
Filter variants with depth < minDepth. Note the field used to get the depth is defined in the *conf/caller.yml* file.
In this case, the program will first look for this information in the **FORMAT** field, and then in the **INFO** field.
#### `--minAltDepth MINALTDEPTH`
Filter alternative allele with depth < minAltDepth. The program will look in the **FORMAT** field exclusively.
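The three threshold filters above can be sketched as follows. This is only an illustration under stated assumptions: the total depth and the per-allele alternative depths are taken as already extracted from the VCF, and the allelic ratio is computed as alternative depth over total depth, whereas the tool reads it from the field named in *conf/caller.yml*. The function `failed_filters` and its default thresholds are hypothetical.

```python
def failed_filters(depth, alt_depth, vaf=0.05, min_depth=20, min_alt_depth=3):
    """Return the names of the thresholds that one alternative allele fails."""
    reasons = []
    if depth < min_depth:
        reasons.append("minDepth")
    if alt_depth < min_alt_depth:
        reasons.append("minAltDepth")
    # Assumption: allelic ratio = alternative depth / total depth.
    ratio = alt_depth / depth if depth > 0 else 0.0
    if ratio < vaf:
        reasons.append("vaf")
    return reasons


# Hypothetical multiallelic site: total depth 100, two alternative alleles.
for alt_depth in (40, 2):
    print(alt_depth, failed_filters(100, alt_depth))
# 40 []
# 2 ['minAltDepth', 'vaf']
```

Evaluating each alternative allele separately is one way to handle multiallelic variants: an allele is kept only if it fails none of the thresholds.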
#### `--filterLowQual`
Filter variants for which the **FILTER** field is not **PASS** or for which the **QUAL** value is not null.
Expand Down Expand Up @@ -162,7 +166,7 @@ Filter polymorphism variants from genome databases. The databases to considered
The fields to scan for each database are defined in the *conf/databases.yml* file and the population frequency is compared with the `--maf` field.
#### `--filterRecurrence`
Filter on recurrence values (for instance, intra-run occurence). In this case, the vcf file must contains the recurrence information
Filter on recurrence values (for instance, intra-run occurrence). In this case, the vcf file must contain the recurrence information
which can be defined in the *conf/databases.yml* file.
## Outputs
@@ -178,14 +182,17 @@ This option allows to export a vcf file which only contains the variants used fo
This option exports a vcf file with the tag **TMB_FILTERS** in the **INFO** field. This tag contains the reason for which a variant would be filtered.
## Examples
## Usage and recommendations
Let's calculated the TMB on a gene panel vcf file (coding size = 1.9Mb, caller = varscan, annotation = Annovar) as the following variants :
- PASS
Here is a list of recommended parameters for different use cases.
### Gene Panel
Let's calculate the TMB on a gene panel vcf file (coding size = 1.59Mb, caller = varscan, annotation = Annovar) with the following criteria:
- minDepth at 100X
- non-synonymous
- coding
- non polymorphism variants using 1K, gnomAD databases and a minMAF of 0.001
- coding and splice
- non polymorphism variants using 1K, gnomAD databases and a MAF of 0.001
In this case, a typical usage would be :
@@ -198,16 +205,34 @@ python pyTMB.py -i ${VCF} \
--filterNonCoding \
--filterSplice \
--filterSyn \
--filterPolym --minMAF 0.001 --polymDb 1k,gnomad \
--filterPolym --maf 0.001 --polymDb 1k,gnomad \
--effGenomeSize 1590000 \
--export > TMB_results.log
```
### Exome / Whole Genome Sequencing
For WES, we recommend filtering low-quality, non-coding, synonymous and polymorphic variants. Here, indels and splicing variants are kept. An effective genome size of 33Mb is used, but a size tailored to the variants and regions considered is preferred.
In the case of a WES variant calling using Mutect2 as variant caller and snpEff as annotation tool, a typical usage would be:
```
python pyTMB.py -i ${VCF} --effGenomeSize 33280000 \
--dbConfig conf/snpeff.yml \
--varConfig conf/mutect2.yml \
--vaf 0.05 --maf 0.001 --minDepth 20 --minAltDepth 3 \
--filterLowQual \
--filterNonCoding \
--filterSyn \
--filterPolym \
--polymDb 1k,gnomad > TMB_results.log
```
### Credits
This pipeline has been written by the bioinformatics core facility in close collaboration with the Clinical Bioinformatics and the Genetics Service of the Institut Curie.
### Contacts
For any question, bug or suggestion, please use the issues system or contact the bioinformatics core facility.