Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up match_quantiles function for ComBat_seq functon #48

Open
wants to merge 135 commits into
base: RELEASE_3_9
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
135 commits
Select commit Hold shift + click to select a range
cfb92b0
adds sva to the repos.
Oct 15, 2011
a317a90
adds sva to the repos.
Oct 15, 2011
7c494aa
(1) Added ComBat to sva, (2) updated vignette with ComBat section, (3…
Oct 26, 2011
da6747b
(1) Added ComBat to sva, (2) updated vignette with ComBat section, (3…
jtleek Oct 26, 2011
f2f4695
added ComBat.Rd
Oct 27, 2011
20643fb
added ComBat.Rd
jtleek Oct 27, 2011
e5c4290
removed svadat.Rd
Oct 27, 2011
e7e3376
removed svadat.Rd
jtleek Oct 27, 2011
b89d11a
bumped package version numbers prior to creating 2.9 branch
Oct 31, 2011
1f3ee88
bumped package version numbers prior to creating 2.9 branch
dtenenba Oct 31, 2011
3738a21
version bump for start of 2.10 development cycle
Oct 31, 2011
593a122
version bump for start of 2.10 development cycle
dtenenba Oct 31, 2011
eaedd7f
Set default method to be for num svs in sva package. Gave users optio…
Nov 30, 2011
f8293bb
Set default method to be for num svs in sva package. Gave users optio…
jtleek Nov 30, 2011
1e2fb24
Fixed num.sv documentation to put method be first, updated vignette t…
Dec 5, 2011
50eb0a4
Fixed num.sv documentation to put method be first, updated vignette t…
jtleek Dec 5, 2011
1106d60
Fixed Combat bug so that numerical covariates can appear anywhere in …
Mar 10, 2012
1289e85
Fixed Combat bug so that numerical covariates can appear anywhere in …
jtleek Mar 10, 2012
af42e2c
first version bump prior to creating 2.10 branch
Mar 30, 2012
ee27ffe
first version bump prior to creating 2.10 branch
dtenenba Mar 30, 2012
cec4dfa
bumped version numbers after creating 2.10 branch
Mar 30, 2012
8437d97
bumped version numbers after creating 2.10 branch
dtenenba Mar 30, 2012
9e855b7
fixed bug in the sweave document that was preventing windows binary f…
Apr 25, 2012
725b3c0
fixed bug in the sweave document that was preventing windows binary f…
jtleek Apr 25, 2012
8b52703
update Imports:
Sep 24, 2012
3d8f2a0
update Imports:
Sep 24, 2012
302e04a
Bump all packages before creating 2.11 branch.
Oct 1, 2012
f048950
Bump all packages before creating 2.11 branch.
dtenenba Oct 1, 2012
3c51864
bump package versions after creating 2.11 branch
Oct 1, 2012
ff3a47c
bump package versions after creating 2.11 branch
dtenenba Oct 1, 2012
fe6af2c
bump y of x.y.z version number for BioC 2.12 release
Apr 3, 2013
21bbbb0
bump y of x.y.z version number for BioC 2.12 release
dtenenba Apr 3, 2013
84d1bf0
Bump y in version x.y.z to odd number in devel
Apr 3, 2013
4e37d93
Bump y in version x.y.z to odd number in devel
dtenenba Apr 3, 2013
0af35d5
bump versions prior to creating BioC 2.13 branch
Oct 14, 2013
4e61fe3
bump versions prior to creating BioC 2.13 branch
dtenenba Oct 14, 2013
11a9688
bump version after release branch has been created
Oct 14, 2013
1eff3a1
bump version after release branch has been created
dtenenba Oct 14, 2013
f6702a1
rename inst/doc to vignettes and bump version
Nov 1, 2013
1566571
rename inst/doc to vignettes and bump version
dtenenba Nov 1, 2013
6c7cda3
modify to new biocViews to DESCRIPTION file
Mar 4, 2014
f39623e
modify to new biocViews to DESCRIPTION file
sonali-bioc Mar 4, 2014
805114e
First version bump prior to creating 2.14 branch.
Apr 11, 2014
f6d3c1a
First version bump prior to creating 2.14 branch.
dtenenba Apr 11, 2014
4eca95f
Second version bump after creating 2.14 release branch.
Apr 11, 2014
bd62822
Second version bump after creating 2.14 release branch.
dtenenba Apr 11, 2014
976f602
setting up git-svn bridge
Jun 5, 2014
77d507b
setting up git-svn bridge
jtleek Jun 5, 2014
de9a4b1
Commit made by the Bioconductor Git-SVN bridge.
Jun 5, 2014
9fe1799
Commit made by the Bioconductor Git-SVN bridge.
jtleek Jun 5, 2014
8674880
Commit made by the Bioconductor Git-SVN bridge.
Jun 13, 2014
c273c0f
Commit made by the Bioconductor Git-SVN bridge.
jtleek Jun 13, 2014
a551fd9
Commit made by the Bioconductor Git-SVN bridge.
Jun 23, 2014
6deef13
Commit made by the Bioconductor Git-SVN bridge.
jtleek Jun 23, 2014
f67fc35
Commit made by the Bioconductor Git-SVN bridge.
Jun 24, 2014
ead41cd
Commit made by the Bioconductor Git-SVN bridge.
jtleek Jun 24, 2014
74cbbe4
Commit made by the Bioconductor Git-SVN bridge.
Sep 4, 2014
359cd67
Commit made by the Bioconductor Git-SVN bridge.
jtleek Sep 4, 2014
bf30f1c
Commit made by the Bioconductor Git-SVN bridge.
Oct 2, 2014
16ca471
Commit made by the Bioconductor Git-SVN bridge.
jtleek Oct 2, 2014
bd8fd80
Commit made by the Bioconductor Git-SVN bridge.
Oct 2, 2014
265405d
Commit made by the Bioconductor Git-SVN bridge.
jtleek Oct 2, 2014
daf3736
Bump package versions prior to creating the 3.0 branch.
Oct 13, 2014
403057e
Bump package versions prior to creating the 3.0 branch.
dtenenba Oct 13, 2014
40b9afe
Bumping versions after creating 3.0 release branch.
Oct 13, 2014
ad21068
Bumping versions after creating 3.0 release branch.
dtenenba Oct 13, 2014
f3e37a9
Commit made by the Bioconductor Git-SVN bridge.
Nov 25, 2014
0a3521f
Commit made by the Bioconductor Git-SVN bridge.
jtleek Nov 25, 2014
4893fa7
Commit made by the Bioconductor Git-SVN bridge.
Dec 4, 2014
aa24427
Commit made by the Bioconductor Git-SVN bridge.
jtleek Dec 4, 2014
a393633
Commit made by the Bioconductor Git-SVN bridge.
Feb 25, 2015
7b1c746
Commit made by the Bioconductor Git-SVN bridge.
jtleek Feb 25, 2015
5358098
Commit made by the Bioconductor Git-SVN bridge.
Feb 25, 2015
e461739
Commit made by the Bioconductor Git-SVN bridge.
jtleek Feb 25, 2015
3076e0c
Commit made by the Bioconductor Git-SVN bridge.
Mar 3, 2015
dbc332f
Commit made by the Bioconductor Git-SVN bridge.
jtleek Mar 3, 2015
49661f3
Bump versions prior to creating 3.1 branch.
Apr 16, 2015
7395600
Bump versions prior to creating 3.1 branch.
dtenenba Apr 16, 2015
dbed96e
Bump versions in trunk after creation of 3.1 branch.
Apr 16, 2015
81e8fdb
Bump versions in trunk after creation of 3.1 branch.
dtenenba Apr 16, 2015
4fe816a
Commit made by the Bioconductor Git-SVN bridge.
Aug 4, 2015
cdf0b55
Commit made by the Bioconductor Git-SVN bridge.
jtleek Aug 4, 2015
29bdf22
Bumped versions of all packages prior to creating 3.2 branch.
Oct 13, 2015
7fc9c88
Bumped versions of all packages prior to creating 3.2 branch.
dtenenba Oct 13, 2015
f48f461
Bumped version number of all packages after creation of 3.2 branch.
Oct 13, 2015
26d18d2
Bumped version number of all packages after creation of 3.2 branch.
dtenenba Oct 13, 2015
b3456c2
Bump versions prior to creation of 3.3 branch
May 3, 2016
ed5602e
Bump versions prior to creation of 3.3 branch
dtenenba May 3, 2016
f279aab
bump version after creating 3.3 branch
May 3, 2016
d010057
bump version after creating 3.3 branch
dtenenba May 3, 2016
f242a6c
bump x.y.z versions to even 'y' prior to creation of 3_4 branch
Oct 17, 2016
358f837
bump x.y.z versions to even 'y' prior to creation of 3_4 branch
Oct 17, 2016
35e7e0b
bump x.y.z versions to odd 'y' after creation of 3_4 branch
Oct 17, 2016
68669cf
bump x.y.z versions to odd 'y' after creation of 3_4 branch
Oct 17, 2016
27b8bd2
bump x.y.z versions to even y prior to creation of 3_5 branch
Apr 24, 2017
1af467f
bump x.y.z versions to even y prior to creation of 3_5 branch
Apr 24, 2017
a3406ad
bump x.y.z versions to odd y after creation of 3_5 branch
Apr 24, 2017
812b539
bump x.y.z versions to odd y after creation of 3_5 branch
Apr 24, 2017
7e4aa44
Merge pull request #26 from cafletezbrant/master
jtleek Apr 28, 2017
d1ed1ca
Resolved branch conflicts
Jun 5, 2017
21b0a7f
Resolved branch conflicts
lcolladotor Jun 5, 2017
5cbe819
v3.25.1 -- merged https://github.com/jtleek/sva-devel with svn's history
lcolladotor Jun 5, 2017
4622dd0
v3.25.1 -- merged https://github.com/jtleek/sva-devel with svn's history
lcolladotor Jun 5, 2017
a46c42f
Fix my email
lcolladotor Jun 5, 2017
d749131
Fix my email
lcolladotor Jun 5, 2017
acc1784
v3.25.2 -- resolve some notes/errors/warnings from R CMD check
lcolladotor Jun 13, 2017
c5444ca
v3.25.2 -- resolve some notes/errors/warnings from R CMD check
lcolladotor Jun 13, 2017
411e91b
v3.25.3 -- fixed another error and a note
lcolladotor Jun 15, 2017
076e768
v3.25.3 -- fixed another error and a note
lcolladotor Jun 15, 2017
9d75a30
v3.25.4 -- fixed a namespace warning
lcolladotor Jun 19, 2017
a7fd565
v3.25.4 -- fixed a namespace warning
lcolladotor Jun 19, 2017
9ef4627
Merge remote-tracking branch 'upstream/master'
lcolladotor Aug 21, 2017
842412c
Update ComBat.R
kasperdanielhansen Oct 6, 2017
ce5bb6b
Merge pull request #32 from kasperdanielhansen/patch-2
jtleek Oct 6, 2017
eeaf915
When plotting ComBat prior.plots, save the original state of the grap…
dfjenkins3 Oct 27, 2017
8510f76
Use on.exit() to restore old par settings in case of error, suggested…
dfjenkins3 May 14, 2018
cd45237
Merge branch 'master' into master
dfjenkins3 Aug 27, 2018
c20f666
coerce dat into matrix & address zero variance genes within any singl…
zhangyuqing Dec 6, 2018
1916fa5
bump x.y.z versions to odd y after creation of RELEASE_3_9 branch
nturaga May 2, 2019
a1b932d
v3.33.1 -- change email
lcolladotor May 22, 2019
814ff26
bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
nturaga Oct 29, 2019
74acc9d
bump x.y.z version to odd y after creation of RELEASE_3_10 branch
nturaga Oct 29, 2019
6230c1d
Merge changes from jtleek/sva-devel with my master branch
zhangyuqing Dec 20, 2019
8cfff1f
add combat-seq to sva
zhangyuqing Dec 20, 2019
ec39175
add combat-seq example in vignettes
zhangyuqing Dec 29, 2019
2ed6d34
add back genes with all 0s in any batch
zhangyuqing Jan 23, 2020
9ef2e45
update documentation, rows are samples and columns are genes in the d…
princyparsana Mar 17, 2020
6dddeb1
Merge pull request #41 from princyparsana/master
lcolladotor Mar 17, 2020
131f9cb
v3.35.1 -- Merged https://github.com/jtleek/sva-devel/pull/41 by @pri…
lcolladotor Mar 17, 2020
e3082aa
use seq_len() instead of 1:xx instead
lcolladotor Mar 17, 2020
5d96ca8
fix a parenthesis
lcolladotor Mar 17, 2020
cbfaa32
Merge pull request #33 from dfjenkins3/master
zhangyuqing Mar 22, 2020
df7ca8b
Merge branch 'master' into master
zhangyuqing Mar 22, 2020
3f50493
Merge pull request #39 from zhangyuqing/master
zhangyuqing Mar 22, 2020
123be9b
Delete an uncessary file that was merged into the repo
lcolladotor Mar 22, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 7 additions & 6 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
Package: sva
Title: Surrogate Variable Analysis
Version: 3.32.0
Version: 3.35.2
Author: Jeffrey T. Leek <[email protected]>, W. Evan Johnson <[email protected]>,
Hilary S. Parker <[email protected]>, Elana J. Fertig <[email protected]>,
Andrew E. Jaffe <[email protected]>, John D. Storey <[email protected]>,
Yuqing Zhang <[email protected]>,
Leonardo Collado Torres <[email protected]>
Andrew E. Jaffe <[email protected]>, Yuqing Zhang <[email protected]>,
John D. Storey <[email protected]>,
Leonardo Collado Torres <[email protected]>
Description: The sva package contains functions for removing batch
effects and other unwanted variation in high-throughput
experiment. Specifically, the sva package contains functions
Expand All @@ -32,13 +32,14 @@ Depends:
R (>= 3.2),
mgcv,
genefilter,
BiocParallel
BiocParallel
Imports:
matrixStats,
stats,
graphics,
utils,
limma,
edgeR
Suggests:
pamr,
bladderbatch,
Expand All @@ -48,4 +49,4 @@ Suggests:
License: Artistic-2.0
biocViews: ImmunoOncology, Microarray, StatisticalMethod, Preprocessing,
MultipleComparison, Sequencing, RNASeq, BatchEffect, Normalization
RoxygenNote: 6.0.1
RoxygenNote: 7.0.2
12 changes: 12 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(ComBat)
export(ComBat_seq)
export(empirical.controls)
export(f.pvalue)
export(fstats)
Expand All @@ -21,23 +22,34 @@ import(genefilter)
import(matrixStats, except = c(rowSds, rowVars))

import(mgcv)
importFrom(edgeR,DGEList)
importFrom(edgeR,estimateGLMCommonDisp)
importFrom(edgeR,estimateGLMTagwiseDisp)
importFrom(edgeR,getOffset)
importFrom(edgeR,glmFit)
importFrom(edgeR,glmFit.default)
importFrom(graphics,lines)
importFrom(graphics,par)
importFrom(limma,lmFit)
importFrom(stats,cor)
importFrom(stats,density)
importFrom(stats,dnbinom)
importFrom(stats,dnorm)
importFrom(stats,lm)
importFrom(stats,model.matrix)
importFrom(stats,pf)
importFrom(stats,pnbinom)
importFrom(stats,ppoints)
importFrom(stats,prcomp)
importFrom(stats,predict)
importFrom(stats,qgamma)
importFrom(stats,qnbinom)
importFrom(stats,qnorm)
importFrom(stats,qqline)
importFrom(stats,qqnorm)
importFrom(stats,qqplot)
importFrom(stats,smooth.spline)
importFrom(stats,var)
importFrom(utils,capture.output)
importFrom(utils,read.delim)
useDynLib(sva, .registration = TRUE)
51 changes: 40 additions & 11 deletions R/ComBat.R
Original file line number Diff line number Diff line change
Expand Up @@ -43,23 +43,47 @@
#' @export
#'

ComBat <- function (dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALSE,
mean.only = FALSE, ref.batch = NULL, BPPARAM = bpparam("SerialParam")) {
ComBat <- function(dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALSE,
mean.only = FALSE, ref.batch = NULL, BPPARAM = bpparam("SerialParam")) {
if(length(dim(batch))>1){
stop("This version of ComBat only allows one batch variable")
} ## to be updated soon!

## coerce dat into a matrix
dat <- as.matrix(dat)

## find genes with zero variance in any of the batches
batch <- as.factor(batch)
zero.rows.lst <- lapply(levels(batch), function(batch_level){
if(sum(batch==batch_level)>1){
return(which(apply(dat[, batch==batch_level], 1, function(x){var(x)==0})))
}else{
return(which(rep(1,3)==2))
}
})
zero.rows <- Reduce(union, zero.rows.lst)
keep.rows <- setdiff(1:nrow(dat), zero.rows)

if (length(zero.rows) > 0) {
cat(sprintf("Found %d genes with uniform expression within a single batch (all zeros); these will not be adjusted for batch.\n", length(zero.rows)))
# keep a copy of the original data matrix and remove zero var rows
dat.orig <- dat
dat <- dat[keep.rows, ]
}

## make batch a factor and make a set of indicators for batch
if(any(table(batch)==1)){mean.only=TRUE}
if(mean.only==TRUE){
message("Using the 'mean only' version of ComBat")
}
if(length(dim(batch))>1){
stop("This version of ComBat only allows one batch variable")
} ## to be updated soon!
batch <- as.factor(batch)

batchmod <- model.matrix(~-1+batch)
if (!is.null(ref.batch)){
## check for reference batch, check value, and make appropriate changes
if (!(ref.batch%in%levels(batch))) {
stop("reference level ref.batch is not one of the levels of the batch variable")
}
cat("Using batch =",ref.batch, "as a reference batch (this batch won't change)\n")
message("Using batch =",ref.batch, "as a reference batch (this batch won't change)")
ref <- which(levels(as.factor(batch))==ref.batch) # find the reference
batchmod[,ref] <- 1
} else {
Expand Down Expand Up @@ -114,7 +138,7 @@ ComBat <- function (dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALS
## print(dat[1:2,])

##Standardize Data across genes
cat('Standardizing Data across genes\n')
message('Standardizing Data across genes')
if (!NAs){
B.hat <- solve(crossprod(design), tcrossprod(t(design), as.matrix(dat)))
} else {
Expand Down Expand Up @@ -180,6 +204,8 @@ ComBat <- function (dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALS
## Plot empirical and parametric priors

if (prior.plots && par.prior) {
old_pars <- par(no.readonly = TRUE)
on.exit(par(old_pars))
par(mfrow=c(2,2))

## Top left
Expand Down Expand Up @@ -265,13 +291,16 @@ ComBat <- function (dat, batch, mod = NULL, par.prior = TRUE, prior.plots = FALS

bayesdata <- (bayesdata*(sqrt(var.pooled)%*%t(rep(1,n.array))))+stand.mean # FIXME

## tiny change still exist when tested on bladder data
## total sum of change within each batch around 1e-15
## (could be computational system error).
## Do not change ref batch at all in reference version
if(!is.null(ref.batch)){
bayesdata[, batches[[ref]]] <- dat[, batches[[ref]]]
}

## put genes with 0 variance in any batch back in data
if (length(zero.rows) > 0) {
dat.orig[keep.rows, ] <- bayesdata
bayesdata <- dat.orig
}

return(bayesdata)
}
216 changes: 216 additions & 0 deletions R/ComBat_seq.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,216 @@
#' Adjust for batch effects using an empirical Bayes framework in RNA-seq raw counts
#'
#' ComBat_seq is an improved model from ComBat using negative binomial regression,
#' which specifically targets RNA-Seq count data.
#'
#' @param counts Raw count matrix from genomic studies (dimensions gene x sample)
#' @param batch Vector / factor for batch
#' @param group Vector / factor for biological condition of interest
#' @param covar_mod Model matrix for multiple covariates to include in linear model (signals from these variables are kept in data after adjustment)
#' @param full_mod Boolean, if TRUE include condition of interest in model
#' @param shrink Boolean, whether to apply shrinkage on parameter estimation
#' @param shrink.disp Boolean, whether to apply shrinkage on dispersion
#' @param gene.subset.n Number of genes to use in empirical Bayes estimation, only useful when shrink = TRUE
#'
#' @return data A gene x sample count matrix, adjusted for batch effects.
#'
#' @importFrom edgeR DGEList estimateGLMCommonDisp estimateGLMTagwiseDisp glmFit glmFit.default getOffset
#' @importFrom stats dnbinom lm pnbinom qnbinom
#' @importFrom utils capture.output
#'
#' @examples
#'
#' count_matrix <- matrix(rnbinom(400, size=10, prob=0.1), nrow=50, ncol=8)
#' batch <- c(rep(1, 4), rep(2, 4))
#' group <- rep(c(0,1), 4)
#'
#' # include condition (group variable)
#' adjusted_counts <- ComBat_seq(count_matrix, batch=batch, group=group, full_mod=TRUE)
#'
#' # do not include condition
#' adjusted_counts <- ComBat_seq(count_matrix, batch=batch, group=NULL, full_mod=FALSE)
#'
#' @export
#'

ComBat_seq <- function(counts, batch, group=NULL, covar_mod=NULL, full_mod=TRUE,
shrink=FALSE, shrink.disp=FALSE, gene.subset.n=NULL){
######## Preparation ########
## Does not support 1 sample per batch yet
batch <- as.factor(batch)
if(any(table(batch)<=1)){
stop("ComBat-seq doesn't support 1 sample per batch yet")
}

## Remove genes with only 0 counts in any batch
keep_lst <- lapply(levels(batch), function(b){
which(apply(counts[, batch==b], 1, function(x){!all(x==0)}))
})
keep <- Reduce(intersect, keep_lst)
rm <- setdiff(1:nrow(counts), keep)
countsOri <- counts
counts <- counts[keep, ]

# require bioconductor 3.7, edgeR 3.22.1
dge_obj <- DGEList(counts=counts)

## Prepare characteristics on batches
n_batch <- nlevels(batch) # number of batches
batches_ind <- lapply(1:n_batch, function(i){which(batch==levels(batch)[i])}) # list of samples in each batch
n_batches <- sapply(batches_ind, length)
#if(any(n_batches==1)){mean_only=TRUE; cat("Note: one batch has only one sample, setting mean.only=TRUE\n")}
n_sample <- sum(n_batches)
cat("Found",n_batch,'batches\n')

## Make design matrix
# batch
batchmod <- model.matrix(~-1+batch) # colnames: levels(batch)
# covariate
group <- as.factor(group)
if(full_mod & nlevels(group)>1){
cat("Using full model in ComBat-seq.\n")
mod <- model.matrix(~group)
}else{
cat("Using null model in ComBat-seq.\n")
mod <- model.matrix(~1, data=as.data.frame(t(counts)))
}
# drop intercept in covariate model
if(!is.null(covar_mod)){
if(is.data.frame(covar_mod)){
covar_mod <- do.call(cbind, lapply(1:ncol(covar_mod), function(i){model.matrix(~covar_mod[,i])}))
}
covar_mod <- covar_mod[, !apply(covar_mod, 2, function(x){all(x==1)})]
}
# bind with biological condition of interest
mod <- cbind(mod, covar_mod)
# combine
design <- cbind(batchmod, mod)

## Check for intercept in covariates, and drop if present
check <- apply(design, 2, function(x) all(x == 1))
#if(!is.null(ref)){check[ref]=FALSE} ## except don't throw away the reference batch indicator
design <- as.matrix(design[,!check])
cat("Adjusting for",ncol(design)-ncol(batchmod),'covariate(s) or covariate level(s)\n')

## Check if the design is confounded
if(qr(design)$rank<ncol(design)){
#if(ncol(design)<=(n_batch)){stop("Batch variables are redundant! Remove one or more of the batch variables so they are no longer confounded")}
if(ncol(design)==(n_batch+1)){stop("The covariate is confounded with batch! Remove the covariate and rerun ComBat-Seq")}
if(ncol(design)>(n_batch+1)){
if((qr(design[,-c(1:n_batch)])$rank<ncol(design[,-c(1:n_batch)]))){stop('The covariates are confounded! Please remove one or more of the covariates so the design is not confounded')
}else{stop("At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat-Seq")}}
}

## Check for missing values in count matrix
NAs = any(is.na(counts))
if(NAs){cat(c('Found',sum(is.na(counts)),'Missing Data Values\n'),sep=' ')}


######## Estimate gene-wise dispersions within each batch ########
cat("Estimating dispersions\n")
## Estimate common dispersion within each batch as an initial value
disp_common <- sapply(1:n_batch, function(i){
if((n_batches[i] <= ncol(design)-ncol(batchmod)+1) | qr(mod[batches_ind[[i]], ])$rank < ncol(mod)){
# not enough residual degree of freedom
return(estimateGLMCommonDisp(counts[, batches_ind[[i]]], design=NULL, subset=nrow(counts)))
}else{
return(estimateGLMCommonDisp(counts[, batches_ind[[i]]], design=mod[batches_ind[[i]], ], subset=nrow(counts)))
}
})

## Estimate gene-wise dispersion within each batch
genewise_disp_lst <- lapply(1:n_batch, function(j){
if((n_batches[j] <= ncol(design)-ncol(batchmod)+1) | qr(mod[batches_ind[[j]], ])$rank < ncol(mod)){
# not enough residual degrees of freedom - use the common dispersion
return(rep(disp_common[j], nrow(counts)))
}else{
return(estimateGLMTagwiseDisp(counts[, batches_ind[[j]]], design=mod[batches_ind[[j]], ],
dispersion=disp_common[j], prior.df=0))
}
})
names(genewise_disp_lst) <- paste0('batch', levels(batch))

## construct dispersion matrix
phi_matrix <- matrix(NA, nrow=nrow(counts), ncol=ncol(counts))
for(k in 1:n_batch){
phi_matrix[, batches_ind[[k]]] <- vec2mat(genewise_disp_lst[[k]], n_batches[k])
}


######## Estimate parameters from NB GLM ########
cat("Fitting the GLM model\n")
glm_f <- glmFit(dge_obj, design=design, dispersion=phi_matrix, prior.count=1e-4) #no intercept - nonEstimable; compute offset (library sizes) within function
alpha_g <- glm_f$coefficients[, 1:n_batch] %*% as.matrix(n_batches/n_sample) #compute intercept as batch-size-weighted average from batches
new_offset <- t(vec2mat(getOffset(dge_obj), nrow(counts))) + # original offset - sample (library) size
vec2mat(alpha_g, ncol(counts)) # new offset - gene background expression # getOffset(dge_obj) is the same as log(dge_obj$samples$lib.size)
glm_f2 <- glmFit.default(dge_obj$counts, design=design, dispersion=phi_matrix, offset=new_offset, prior.count=1e-4)

gamma_hat <- glm_f2$coefficients[, 1:n_batch]
mu_hat <- glm_f2$fitted.values
phi_hat <- do.call(cbind, genewise_disp_lst)


######## In each batch, compute posterior estimation through Monte-Carlo integration ########
if(shrink){
cat("Apply shrinkage - computing posterior estimates for parameters\n")
mcint_fun <- monte_carlo_int_NB
monte_carlo_res <- lapply(1:n_batch, function(ii){
if(ii==1){
mcres <- mcint_fun(dat=counts[, batches_ind[[ii]]], mu=mu_hat[, batches_ind[[ii]]],
gamma=gamma_hat[, ii], phi=phi_hat[, ii], gene.subset.n=gene.subset.n)
}else{
invisible(capture.output(mcres <- mcint_fun(dat=counts[, batches_ind[[ii]]], mu=mu_hat[, batches_ind[[ii]]],
gamma=gamma_hat[, ii], phi=phi_hat[, ii], gene.subset.n=gene.subset.n)))
}
return(mcres)
})
names(monte_carlo_res) <- paste0('batch', levels(batch))

gamma_star_mat <- lapply(monte_carlo_res, function(res){res$gamma_star})
gamma_star_mat <- do.call(cbind, gamma_star_mat)
phi_star_mat <- lapply(monte_carlo_res, function(res){res$phi_star})
phi_star_mat <- do.call(cbind, phi_star_mat)

if(!shrink.disp){
cat("Apply shrinkage to mean only\n")
phi_star_mat <- phi_hat
}
}else{
cat("Shrinkage off - using GLM estimates for parameters\n")
gamma_star_mat <- gamma_hat
phi_star_mat <- phi_hat
}


######## Obtain adjusted batch-free distribution ########
mu_star <- matrix(NA, nrow=nrow(counts), ncol=ncol(counts))
for(jj in 1:n_batch){
mu_star[, batches_ind[[jj]]] <- exp(log(mu_hat[, batches_ind[[jj]]])-vec2mat(gamma_star_mat[, jj], n_batches[jj]))
}
phi_star <- rowMeans(phi_star_mat)


######## Adjust the data ########
cat("Adjusting the data\n")
adjust_counts <- matrix(NA, nrow=nrow(counts), ncol=ncol(counts))
for(kk in 1:n_batch){
counts_sub <- counts[, batches_ind[[kk]]]
old_mu <- mu_hat[, batches_ind[[kk]]]
old_phi <- phi_hat[, kk]
new_mu <- mu_star[, batches_ind[[kk]]]
new_phi <- phi_star
adjust_counts[, batches_ind[[kk]]] <- match_quantiles(counts_sub=counts_sub,
old_mu=old_mu, old_phi=old_phi,
new_mu=new_mu, new_phi=new_phi)
}

#dimnames(adjust_counts) <- dimnames(counts)
#return(adjust_counts)

## Add back genes with only 0 counts in any batch (so that dimensions won't change)
adjust_counts_whole <- matrix(NA, nrow=nrow(countsOri), ncol=ncol(countsOri))
dimnames(adjust_counts_whole) <- dimnames(countsOri)
adjust_counts_whole[keep, ] <- adjust_counts
adjust_counts_whole[rm, ] <- countsOri[rm, ]
return(adjust_counts_whole)
}
Loading