Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ComBat silently fails when a row has non-zero values for only a single batch and mean.only=TRUE #14

Open
khughitt opened this issue May 30, 2016 · 19 comments

Comments

@khughitt
Copy link

Overview

An edge case causes ComBat to silently fail when mean.only=TRUE and return a matrix of all NA/NaN values when there is at least one row in the data for which only a single batch has non-zero values.

To reproduce

Based on the vignette example:

#!/usr/bin/env Rscript
library(sva)
library(bladderbatch)
data(bladderdata)

pheno = pData(bladderEset)
edata = exprs(bladderEset)
batch = pheno$batch

modcombat = model.matrix(~1, data=pheno)

# modify a row so that only one batch is non-zero
edata[1,batch==3] <- 1
edata[1,batch!=3] <- 0

# if mean.only is not set to True, ComBat produces an error
#Error in while (change > conv) { : missing value where TRUE/FALSE needed
#Calls: ComBat -> it.sol
#Execution halted
#ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

# if mean.only=TRUE, ComBat does not error out, but produces a result with all
# NaNs
res <- ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, mean.only=TRUE)
res[1:3,1:3]

print("Number of NaNs")
sum(is.nan(res))

print("Number of NAs")
sum(is.na(res))

print("Number of non-NAs")
sum(!is.na(res))

Result

Loading required package: mgcv
...

Using the 'mean only' version of ComBat
Found 5 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data
          GSM71019.CEL GSM71020.CEL GSM71021.CEL
1007_s_at          NaN          NaN          NaN
1053_at             NA           NA           NA
117_at              NA           NA           NA
[1] "Number of NaNs"
[1] 57
[1] "Number of NAs"
[1] 1270131
[1] "Number of non-NAs"
[1] 0

Note that when mean.only=FALSE, ComBat still fails, but this time is produces an error.

Expected result

Since is unlikely to be a common scenario and should only affect a few rows at most, an ideal solution might be to take @wevanjohnson's suggested approach for a separate variance-related issue (#13):

  1. Note the issue to the user
  2. Exclude the problematic rows from batch-adjustment
  3. Add the rows back in at the end before returning the result

System Info

Using latest sva from Github.

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  methods   stats     graphics  grDevices utils     datasets 
[8] base     

other attached packages:
[1] bladderbatch_1.10.0 Biobase_2.32.0      BiocGenerics_0.18.0
[4] sva_3.16.1          genefilter_1.54.2   mgcv_1.8-12        
[7] nlme_3.1-127       

loaded via a namespace (and not attached):
 [1] lattice_0.20-33      IRanges_2.6.0        XML_3.98-1.4        
 [4] grid_3.3.0           xtable_1.8-2         DBI_0.4-1           
 [7] stats4_3.3.0         RSQLite_1.0.0        annotate_1.50.0     
[10] S4Vectors_0.10.0     Matrix_1.2-6         splines_3.3.0       
[13] tools_3.3.0          survival_2.39-2      AnnotationDbi_1.34.2
@khughitt
Copy link
Author

Come to think of it, I could just update #13 to check for this scenario and handle it along-side of looking for zero-variances rows. Just let me know what you prefer.

@khughitt
Copy link
Author

khughitt commented Jun 3, 2016

@wevanjohnson Thoughts?

@RuiNascimento
Copy link

@khughitt any idea on how to find the problematic rows? I'm having this exact issue with one of my datasets. Error with mean.only=FALSE or a NAN dataset as result.
Even though I removed NA's from dataset and 0 variance rows

@wevanjohnson
Copy link
Contributor

wevanjohnson commented Jun 5, 2019 via email

@wevanjohnson
Copy link
Contributor

wevanjohnson commented Jun 5, 2019 via email

@RuiNascimento
Copy link

I did some more tests on this issue and discovered that the problem is not related to zero and non-zero values per se.
This happens if the variance in the "different" batch is also 0. Try changing in the example code (provided by @khughitt) the following lines to any value, the error still occurs:

# modify a row so that only one batch is non-zero
edata[1,batch==3] <- 1    #This one is the "different" batch, change this to any integer
edata[1,batch!=3] <- 0    #The other batches, change this to any integer

Maybe this can help pinpoint the issue.
Thx

@wevanjohnson
Copy link
Contributor

wevanjohnson commented Jun 5, 2019 via email

@zhangyuqing
Copy link
Collaborator

Hi everyone,
@wevanjohnson @khughitt @RuiNascimento I made modifications to the ComBat script in an earlier pull request (#35), which does not create NAN values in the above example using the bladderbatch dataset. The updated script is also available on my GitHub: https://github.com/zhangyuqing/sva-devel. @RuiNascimento, would you try this updated version and see if it still results in NAN data? Please let me know if the problem is not resolved. @jtleek could you please merge pull request #35 to address the issues raised with regard to ComBat?
Thanks,
Yuqing

@wevanjohnson
Copy link
Contributor

wevanjohnson commented Jun 5, 2019 via email

@RuiNascimento
Copy link

@zhangyuqing @wevanjohnson Thank you for the quick response!
Can confirm that the new sva version resolves this issue on both the example code provided above, as well as in my dataset.
Much appreciated,
Rui Nascimento

@Manninm
Copy link

Manninm commented Jun 27, 2019

@zhangyuqing I was having problems with zero variance genes and NAN data when calling with mod=NULL and par.prior=FALSE. After downloading the new package off your using github (install_github('zhangyuqing/sva-devel'), ComBat throws the "Error in apply(delta.hat, 1, aprior) : object 'aprior' not found". Since RuiNasscimento got the function to work, I am wondering if there were any changes since.

@zhangyuqing
Copy link
Collaborator

Hi @Manninm,
The aprior function (as well as all the other helper functions for ComBat) is in helper.R. Try just using the two scripts for ComBat:
source("ComBat.R")
source("helper.R")
This should work as a temporary solution for now, while we are cleaning up the package for the next release. If you see any further errors, please let me know.
Thanks,
Yuqing

@Manninm
Copy link

Manninm commented Jun 27, 2019

@zhangyuqing
Thanks so much for your quick replay. The suggestion works thus far. I haven't got output yet. It will need to run overnight. Thanks again!

@modarzi
Copy link

modarzi commented Dec 1, 2019

Dear @zhangyuqing , After running comBat() in my datExpr, I have NaN problem too. So, based on your comment I run below code:
install_github('zhangyuqing/sva-devel')
I got below Error:

Installing package into ‘C:/Users/C1/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
ERROR: failed to lock directory 'C:/Users/C1/Documents/R/win-library/3.6' for modifying
Try removing 'C:/Users/C1/Documents/R/win-library/3.6/00LOCK-sva'
Error: Failed to install 'sva' from GitHub:
  (converted from warning) installation of package ‘C:/Users/C1/AppData/Local/Temp/RtmpMj3Rr0/filec6824375718/sva_3.31.0.tar.gz’ had non-zero exit status

Then I download 2 files: "ComBat.R" and "helper.R" and run below code:
`source("ComBat.R")
but again I got below error:

`Error in source("ComBat.R") : ComBat.R:7:1: unexpected '<'
6: 
7: <
   ^

Now, really I am confused and need your guide and comment. I appreciate if you share your comment with me.
Best,
@wevanjohnson

@zhangyuqing
Copy link
Collaborator

Hi @modarzi, I don't see the error you ran into after downloading the files and sourcing the R scripts. Did you download the files from https://github.com/zhangyuqing/sva-devel?

@RuiNascimento
Copy link

Hello @modarzi did you address this error?

Installing package into ‘C:/Users/C1/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
ERROR: failed to lock directory 'C:/Users/C1/Documents/R/win-library/3.6' for modifying
Try removing 'C:/Users/C1/Documents/R/win-library/3.6/00LOCK-sva'
Error: Failed to install 'sva' from GitHub:
  (converted from warning) installation of package ‘C:/Users/C1/AppData/Local/Temp/RtmpMj3Rr0/filec6824375718/sva_3.31.0.tar.gz’ had non-zero exit status

Looks like a simple fix. Just delete the file:

> Try removing 'C:/Users/C1/Documents/R/win-library/3.6/00LOCK-sva'

Or try running R as an administrator (I assume you are using Windows because of the directory structure).
Then try to install combat again with:

Install_github('zhangyuqing/sva-devel')

@modarzi
Copy link

modarzi commented Dec 1, 2019

Thank @RuiNascimento and @zhangyuqing . I removed "00LOCK-sva" and by below command
install_github('zhangyuqing/sva-devel')

I have installed "sva" package. Now, I would like to run below code for my datExpr:
datExpr_Co <- ComBat(as.matrix(datExpr) , batch = batchId, par.prior = TRUE)
but I got below error:

Error in apply(dat[, batch == batch_level], 1, function(x) { : 
  dim(X) must have a positive length

batchId is as below:

> batchId
 [1] 396 396 396 412 412 412 412 412 412 412 412 307 307 310 310 310 310 310 310 310 310 310 310 310
[25] 310 310 310 310 310 336 310 336 375 375 375 412 383 218 307 286 229 263 263 263 263 307 307 218
[49] 263 342 336 307 375 310 263 290 286 290 290 286 286 375 336 336 383 383 383 383 383 363 363 363
[73] 375 396
Levels: 218 229 263 286 290 307 310 336 342 363 375 383 396 412

I apprecite if you share your comment with me.
Best

@pcm32
Copy link

pcm32 commented Jul 2, 2020

@zhangyuqing I'm also getting this behaviour (all NaNs in the Combat result), even after installing through install_github('zhangyuqing/sva-devel'). You can get the dataset from here. The command I run is:

library(sva)
readRDS("demo_experiment_summary.rds")->experimentSummary
model <- ~organism_part
model.data<-model.frame(model, experimentSummary@colData[all.vars(model)])
ComBat = ComBat(experimentSummary@assays$data[[1]], experimentSummary$batch, mod = model.matrix(model, data = model.data))

The output I see is:

Found7batches
Note: one batch has only one sample, setting mean.only=TRUE
Adjusting for1covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data

I did remove rows of genes that were zero on all samples, but even with the fix in zhangyuqing/sva-devel this still seems to generate all NaNs in the combat result. The counts have been library size normalised (quantile norm), so they are not integers (could that be problematic?). My session info says sva_3.35.2 (also fails with sva_3.30.1 that I get installed from Conda).

Are there any sanity checks that one should apply to the data before running ComBat? I realise now that my problem doesn't fit with the "single batch" description of the above. I can open a separate issue if needed.

Thanks!

@pedroesoares
Copy link

@pcm32 I am facing the same problem here,

First installed the package through install_github('zhangyuqing/sva-devel'), and them runed the Combat function, which returned with all values "NAN"
My hope with this comment is that you solved your problem and could help me,

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants