Can DISCOVER be used with data from targeted sequencing? #26

Open
jud-b opened this issue Mar 6, 2024 · 1 comment

jud-b commented Mar 6, 2024

Hi,

I have successfully used DISCOVER with whole-exome sequencing data. I am wondering whether I can also use it with targeted sequencing data generated by MSK-IMPACT or DFCI OncoPanel. Are there enough mutation events in only 300-500 genes to estimate the background mutation rate? Are there specific assumptions that are not met when using targeted sequencing data?
Your help would be much appreciated.

Thanks.

scanisius (Member) commented Mar 13, 2024

Using DISCOVER with gene panels of a few hundred genes works very well. In the DISCOVER paper, we used whole-exome data, assuming that the estimation of the background model benefits from having mutation data for as many genes as possible. Since then, we have also applied DISCOVER to gene panel data and observed that, for panels of a few hundred genes, the results are very similar to those obtained with whole-exome data. You should probably be more careful with very small gene panels, though.
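The question's concern about having enough mutation events can be looked at directly before running DISCOVER. The sketch below uses a hypothetical helper, panel_event_summary (not part of the discover package), that counts how many events remain once a genes-by-tumours mutation matrix is restricted to a panel. It assumes a binary matrix with gene symbols as row names, which is how BRCA.mut is indexed in the example further down. The simulated matrix is for illustration only, and these are purely descriptive counts: how many events are "enough" for the background model remains a judgment call.

```r
# Hypothetical helper (not part of the discover package): summarise how many
# mutation events remain when a binary genes x tumours matrix is restricted
# to the genes of a panel.
panel_event_summary <- function(mut, panel_genes) {
  genes <- intersect(rownames(mut), panel_genes)   # panel genes present in the data
  panel <- mut[genes, , drop = FALSE]
  per_tumour <- colSums(panel != 0)                # events per tumour (column)
  list(
    n_genes = length(genes),
    total_events = sum(panel != 0),
    median_events_per_tumour = median(per_tumour),
    tumours_without_events = sum(per_tumour == 0)
  )
}

# Simulated data for illustration only: 2,000 genes x 100 tumours with a
# uniform 1% mutation probability (real mutation data are far less uniform).
set.seed(1)
sim <- matrix(rbinom(2000 * 100, 1, 0.01), nrow = 2000,
              dimnames = list(paste0("G", 1:2000), paste0("T", 1:100)))

str(panel_event_summary(sim, rownames(sim)))       # all simulated genes
str(panel_event_summary(sim, paste0("G", 1:300)))  # hypothetical 300-gene panel
```

With real data, calling panel_event_summary(BRCA.mut, rownames(BRCA.mut)) and panel_event_summary(BRCA.mut, msk_impact_genes) would show how much of each tumour's mutation data a panel retains.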

To illustrate the concept, have a look at the R code below. It restricts the breast cancer mutation data included in the package (BRCA.mut) to the MSK-IMPACT panel genes, fits the background model once on all genes and once on the panel genes only, and compares the resulting P values.

```r
library(discover)

data(BRCA.mut)

# Download the MSK-IMPACT panel definition from cBioPortal and extract its genes
panel_info <- readLines(url("https://media.githubusercontent.com/media/cBioPortal/datahub/master/reference_data/gene_panels/data_gene_panel_impact505.txt"))
gene_line <- grep("^gene_list:", panel_info, value = TRUE)
msk_impact_genes <- strsplit(sub("^gene_list:\\s*", "", gene_line), "\t")[[1]]

# Keep only panel genes that are present in the breast cancer data
msk_impact_genes <- intersect(rownames(BRCA.mut), msk_impact_genes)

# Background model fitted on all genes, then restricted to the panel genes
events_all_genes <- discover.matrix(BRCA.mut)
events_all_genes <- events_all_genes[msk_impact_genes, ]

# Background model fitted on the panel genes only
events_msk_impact <- discover.matrix(BRCA.mut[msk_impact_genes, ])

# Perform the DISCOVER test for genes with more than 25 mutations
frequent <- rowSums(events_msk_impact$events) > 25

result_all_genes <- pairwise.discover.test(events_all_genes[frequent, ])
result_msk_impact <- pairwise.discover.test(events_msk_impact[frequent, ])

# Compare the resulting P values, taking each gene pair once from the lower triangle
mask <- lower.tri(result_all_genes$p.values)
p_all_genes <- result_all_genes$p.values[mask]
p_msk_impact <- result_msk_impact$p.values[mask]

plot(-log10(p_all_genes), -log10(p_msk_impact),
     xlab = "-log10 P (background fitted on all genes)",
     ylab = "-log10 P (background fitted on MSK-IMPACT genes)")
abline(0, 1, lty = 2)  # identity line: points on it agree exactly
```

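[Plot: -log10 P values with the background model fitted on all genes (x axis) versus fitted on the MSK-IMPACT genes only (y axis)]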
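To put a number on how similar the two sets of P values are, rather than judging the scatter plot by eye, one could compute a rank correlation and the overlap of gene pairs below a P value threshold. compare_p_values is a hypothetical helper written for this note, not part of the discover package. It uses raw P values and an arbitrary alpha of 0.05, whereas DISCOVER's own reporting of significant pairs is based on FDR-adjusted q values, so the overlap should be read as a rough check only.

```r
# Hypothetical helper: summarise agreement between two vectors of P values
# for the same gene pairs, given in the same order.
compare_p_values <- function(p_a, p_b, alpha = 0.05) {
  stopifnot(length(p_a) == length(p_b))
  ok <- !is.na(p_a) & !is.na(p_b)   # drop pairs missing in either run
  p_a <- p_a[ok]
  p_b <- p_b[ok]
  sig_a <- p_a < alpha
  sig_b <- p_b < alpha
  n_both <- sum(sig_a & sig_b)
  n_union <- sum(sig_a | sig_b)
  list(
    # Rank agreement; identical whether computed on P or on -log10 P
    spearman = cor(p_a, p_b, method = "spearman"),
    n_sig_a = sum(sig_a),
    n_sig_b = sum(sig_b),
    n_sig_both = n_both,
    # Jaccard index of the two sets of pairs below alpha (NA if both are empty)
    jaccard = if (n_union > 0) n_both / n_union else NA_real_
  )
}

# Toy input for illustration only
str(compare_p_values(c(0.01, 0.2, 0.03, 0.5), c(0.02, 0.3, 0.07, 0.6)))
```

After running the comparison code above, compare_p_values(p_all_genes, p_msk_impact) would summarise the agreement between the whole-exome and panel background models.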
