ComBat silently fails when a row has non-zero values for only a single batch and mean.only=TRUE #14
Come to think of it, I could just update #13 to check for this scenario and handle it alongside the check for zero-variance rows. Just let me know what you prefer. |
@wevanjohnson Thoughts? |
@khughitt any idea on how to find the problematic rows? I'm having this exact issue with one of my datasets: an error with mean.only=FALSE, or a dataset of NaNs as the result. |
Yuqing, could you take a look at this?
On May 30, 2016, at 10:04 AM, Keith Hughitt wrote:
Overview
An edge case causes ComBat to silently fail when mean.only=TRUE and return a matrix of all NA/NaN values when there is at least one row in the data for which only a single batch has non-zero values.
To reproduce
Based on the vignette example:
#!/usr/bin/env Rscript
library(sva)
library(bladderbatch)
data(bladderdata)
pheno = pData(bladderEset)
edata = exprs(bladderEset)
batch = pheno$batch
modcombat = model.matrix(~1, data=pheno)
# modify a row so that only one batch is non-zero
edata[1,batch==3] <- 1
edata[1,batch!=3] <- 0
# if mean.only is not set to True, ComBat produces an error
#Error in while (change > conv) { : missing value where TRUE/FALSE needed
#Calls: ComBat -> it.sol
#Execution halted
#ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
# if mean.only=TRUE, ComBat does not error out, but produces a result with all
# NaNs
res <- ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, mean.only=TRUE)
res[1:3,1:3]
print("Number of NaNs")
sum(is.nan(res))
print("Number of NAs")
sum(is.na(res))
print("Number of non-NAs")
sum(!is.na(res))
Result
Loading required package: mgcv
...
Using the 'mean only' version of ComBat
Found 5 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data
GSM71019.CEL GSM71020.CEL GSM71021.CEL
1007_s_at NaN NaN NaN
1053_at NA NA NA
117_at NA NA NA
[1] "Number of NaNs"
[1] 57
[1] "Number of NAs"
[1] 1270131
[1] "Number of non-NAs"
[1] 0
Note that when mean.only=FALSE, ComBat still fails, but this time it produces an error.
Expected result
Since this is unlikely to be a common scenario and should only affect a few rows at most, an ideal solution might be to take @wevanjohnson's suggested approach for a separate variance-related issue (#13):
1) Note the issue to the user
2) Exclude the problematic rows from batch-adjustment
3) Add the rows back in at the end before returning the result
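The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix adopted in sva: `is_problem_row` and `adjust_excluding_rows` are hypothetical names, the filter (rows that are constant within at least one batch) is a guess at which rows trigger the failure, `dat` is assumed to contain no missing values, and `adjust_fn` stands in for a call such as `function(d, b) ComBat(dat = d, batch = b, mean.only = TRUE)`.

```r
# Hypothetical helper: TRUE for rows that are constant within at least one batch.
# Assumes `dat` is a numeric matrix (rows = genes, columns = samples) without NAs.
is_problem_row <- function(dat, batch) {
  batch <- as.factor(batch)
  flagged <- rep(FALSE, nrow(dat))
  for (b in levels(batch)) {
    sub <- dat[, batch == b, drop = FALSE]
    # A row is constant in this batch when its minimum equals its maximum
    constant <- apply(sub, 1, function(x) max(x) == min(x))
    flagged <- flagged | constant
  }
  flagged
}

# Hypothetical wrapper around any batch-adjustment function `adjust_fn(dat, batch)`.
adjust_excluding_rows <- function(dat, batch, adjust_fn) {
  skip <- is_problem_row(dat, batch)
  if (any(skip)) {
    # Step 1: note the issue to the user
    warning(sum(skip), " row(s) are constant within at least one batch ",
            "and were left unadjusted")
  }
  out <- dat
  if (any(!skip)) {
    # Step 2: adjust only the remaining rows
    out[!skip, ] <- adjust_fn(dat[!skip, , drop = FALSE], batch)
  }
  # Step 3: the skipped rows are still in `out` with their original values
  out
}
```

Note that a batch containing a single sample makes every row look constant in that batch, so this filter would skip everything in that case.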
System Info
Using latest sva from Github.
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel methods stats graphics grDevices utils datasets
[8] base
other attached packages:
[1] bladderbatch_1.10.0 Biobase_2.32.0 BiocGenerics_0.18.0
[4] sva_3.16.1 genefilter_1.54.2 mgcv_1.8-12
[7] nlme_3.1-127
loaded via a namespace (and not attached):
[1] lattice_0.20-33 IRanges_2.6.0 XML_3.98-1.4
[4] grid_3.3.0 xtable_1.8-2 DBI_0.4-1
[7] stats4_3.3.0 RSQLite_1.0.0 annotate_1.50.0
[10] S4Vectors_0.10.0 Matrix_1.2-6 splines_3.3.0
[13] tools_3.3.0 survival_2.39-2 AnnotationDbi_1.34.2
|
Yuqing
On Jun 5, 2019, at 7:41 AM, RuiNascimento wrote:
@khughitt any idea on how to find the problematic rows? I'm having this exact issue with one of my datasets: an error with mean.only=FALSE, or a dataset of NaNs as the result.
This happens even though I removed NAs and zero-variance rows from the dataset.
|
I did some more tests on this issue and discovered that the problem is not related to zero and non-zero values per se.
Maybe this can help pinpoint the issue. |
Thank you. I have asked my student, Yuqing Zhang to take a look.
On Jun 5, 2019, at 11:56 AM, RuiNascimento wrote:
I did some more tests on this issue and discovered that the problem is not related to zero and non-zero values per se.
This happens if the variance in the "different" batch is also 0. In the example code provided by @khughitt, try changing the following lines to use any other values; the error still occurs:
# modify a row so that only one batch is non-zero
edata[1,batch==3] <- 1 #This one is the "different" batch, change this to any integer
edata[1,batch!=3] <- 0 #The other batches, change this to any integer
Maybe this can help pinpoint the issue.
Thx
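To help locate such rows in a real dataset, one possible approach is to compute the variance of every row inside each batch and flag rows where it is zero. The sketch below is illustrative only: `within_batch_variance` is a hypothetical helper (not part of sva), and the toy matrix merely mimics the modified row from the example.

```r
# Hypothetical helper: per-row variance inside each batch.
# Returns a matrix with one row per gene and one column per batch.
within_batch_variance <- function(dat, batch) {
  batch <- as.factor(batch)
  per_batch <- lapply(levels(batch), function(b) {
    apply(dat[, batch == b, drop = FALSE], 1, var)
  })
  v <- do.call(cbind, per_batch)
  colnames(v) <- levels(batch)
  v
}

# Toy data: "g1" is constant inside every batch (like the modified row),
# "g2" varies in both batches.
dat <- rbind(g1 = c(0, 0, 0, 1, 1, 1),
             g2 = c(1.2, 0.7, 2.1, 3.3, 2.9, 4.0))
batch <- c(1, 1, 1, 2, 2, 2)

v <- within_batch_variance(dat, batch)
# Rows with zero variance in at least one batch.
# NA entries (single-sample batches) are ignored here.
suspect <- which(apply(v, 1, function(x) any(x == 0, na.rm = TRUE)))
print(suspect)
```

Rows listed in `suspect` are candidates to inspect or set aside before calling ComBat. An exact comparison with zero is used for simplicity; a small tolerance may be more appropriate for non-integer data.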
|
Hi everyone, |
Thanks Yuqing!
On Jun 5, 2019, at 11:59 AM, Yuqing Zhang wrote:
Hi everyone,
@wevanjohnson @khughitt @RuiNascimento I made modifications to the ComBat script in an earlier pull request (#35), which does not create NaN values in the above example using the bladderbatch dataset. The updated script is also available on my GitHub: https://github.com/zhangyuqing/sva-devel. @RuiNascimento, would you try this updated version and see if it still results in NaN data? Please let me know if the problem is not resolved. @jtleek could you please merge pull request #35 to address the issues raised with regard to ComBat?
Thanks,
Yuqing
|
@zhangyuqing @wevanjohnson Thank you for the quick response! |
@zhangyuqing I was having problems with zero-variance genes and NaN data when calling with mod=NULL and par.prior=FALSE. After installing the new package from your GitHub (install_github('zhangyuqing/sva-devel')), ComBat throws the error "Error in apply(delta.hat, 1, aprior) : object 'aprior' not found". Since RuiNascimento got the function to work, I am wondering if there have been any changes since. |
Hi @Manninm, |
@zhangyuqing |
Dear @zhangyuqing, after running ComBat() on my datExpr, I have the NaN problem too. So, based on your comment, I ran the code below:
Then I downloaded two files, "ComBat.R" and "helper.R", and ran the code below:
Now I am really confused and need your guidance. I would appreciate it if you could share your comments. |
Hi @modarzi, I don't see the error you ran into after downloading the files and sourcing the R scripts. Did you download the files from https://github.com/zhangyuqing/sva-devel? |
Hello @modarzi did you address this error?
Looks like a simple fix. Just delete the file:
Or try running R as an administrator (I assume you are using Windows because of the directory structure).
|
Thanks @RuiNascimento and @zhangyuqing. I removed "00LOCK-sva" and installed the "sva" package with the command below. Now I would like to run the code below on my datExpr:
batchId is as follows:
I would appreciate it if you could share your comments. |
@zhangyuqing I'm also getting this behaviour (all NaNs in the Combat result), even after installing through
The output I see is:
I did remove rows of genes that were zero on all samples, but even with the fix in Are there any sanity checks that one should apply to the data before running ComBat? I realise now that my problem doesn't fit with the "single batch" description of the above. I can open a separate issue if needed. Thanks! |
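As a rough illustration of the kind of pre-run checks mentioned in this thread (missing values, all-zero rows, zero-variance rows), a small report function might look like the sketch below. `combat_preflight` is a hypothetical name, not an sva function, and the check for batches with fewer than two samples is an added assumption about what can destabilise the variance estimates rather than something confirmed here.

```r
# Hypothetical pre-flight report for a genes x samples matrix and a batch vector.
# Returns a named list of findings; it does not modify the data.
combat_preflight <- function(dat, batch) {
  batch <- as.factor(batch)
  stopifnot(is.matrix(dat), ncol(dat) == length(batch))
  list(
    # Rows containing NA, NaN or Inf
    non_finite_rows = which(rowSums(!is.finite(dat)) > 0),
    # Rows that are zero in every sample (NAs ignored)
    all_zero_rows = which(rowSums(dat != 0, na.rm = TRUE) == 0),
    # Rows with zero variance across all samples (NAs ignored)
    zero_variance_rows = which(apply(dat, 1, var, na.rm = TRUE) == 0),
    # Batches with fewer than two samples (assumed to be worth flagging)
    small_batches = names(which(table(batch) < 2))
  )
}
```

A call such as `combat_preflight(edata, batch)` returns indices that can be inspected before deciding which rows to drop. It does not look at variance within individual batches, which the earlier comments identify as the trigger for this particular issue.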
@pcm32 I am facing the same problem here, First installed the package through Thanks! |