Data-fetching from multiple processes causes error #2

Open
zwdzwd opened this issue May 9, 2018 · 6 comments

Comments

zwdzwd commented May 9, 2018

Hi,

When data is fetched (from S3) for the first time by several workers at once using mclapply, an error occurs (something like "scheduled cores 21, 2, 19 encountered errors in user code, all values of the jobs will be affected"). If the parallel processing is run after the data has been pre-cached, no error occurs. Could some kind of lock be implemented to prevent multiple instances of ExperimentHub from caching at the same time?

Thank you!

zwdzwd commented May 9, 2018

The error does go away if I add 'localHub=TRUE' when initiating ExperimentHub() after pre-caching! In fact, as long as "localHub=TRUE" is absent, the error persists regardless of whether a local copy of the data exists. I checked that my environment has a good internet connection.
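For anyone following along, the workflow described above could be sketched roughly as below. This is only an illustration, not a confirmed fix: it assumes the resource is EH167 (the alpineData record that appears later in this thread), and that fetching it once in the parent session is enough to populate the cache before the workers start.

```r
library(ExperimentHub)
library(parallel)

## Step 1 (serial): fetch once in the parent process so the file is cached.
## "EH167" is an assumed example resource; substitute your own ID.
invisible(ExperimentHub()[["EH167"]])

## Step 2 (parallel): each worker opens its own hub with localHub = TRUE,
## so it reads the local copy instead of contacting the remote hub.
res <- mclapply(1:4, function(i) {
    hub <- ExperimentHub(localHub = TRUE)
    hub[["EH167"]]
}, mc.cores = 4)
```

The resource is requested by ID rather than by position because the set of records visible to a local hub may differ from the full hub.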

mtmorgan commented May 9, 2018

I think the work-around will be to open the hub in the worker:

mclapply(1:5, function(i) {
    hub = ExperimentHub()
    ...
})

A more sophisticated approach would use a lock to ensure that only a single process accesses the hub at a time:

id = BiocParallel::ipcid()
mclapply(1:5, function(i, id) {
    ...
    BiocParallel::ipclock(id)
    ## ExperimentHub activities
    BiocParallel::ipcunlock(id)
    ...
}, id = id)

zwdzwd commented May 24, 2018

@mtmorgan Thank you. The ipclock approach is promising. I will test it out by having ipcid() set at package load. Thanks again for the suggestion!

zwdzwd commented May 24, 2018

@mtmorgan I tested a few code snippets and still couldn't get ipclock to work. Here is what I tested, using alpineData as an example.

The following works with pre-caching but fails without it; R gets stuck at "retrieving 1 resource".

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

a <- mclapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id, mc.cores=4)

The following serial version worked with and without caching.

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

a <- lapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id)

It seems the ipclock method still requires the data to be pre-cached? I hope I am not missing something obvious here. Thank you!

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grDevices datasets  stats     graphics  utils     parallel  methods
[8] base

other attached packages:
[1] BiocParallel_1.15.3  ExperimentHub_1.7.0  AnnotationHub_2.13.1
[4] BiocGenerics_0.27.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16                  AnnotationDbi_1.43.1
 [3] magrittr_1.5                  IRanges_2.15.7
 [5] bit_1.1-12                    xtable_1.8-2
 [7] R6_2.2.2                      blob_1.1.1
 [9] httr_1.3.1                    Biobase_2.41.0
[11] DBI_1.0.0                     htmltools_0.3.6
[13] yaml_2.1.19                   bit64_0.9-7
[15] digest_0.6.15                 interactiveDisplayBase_1.19.0
[17] shiny_1.0.5                   later_0.7.2
[19] S4Vectors_0.19.5              promises_1.0.1
[21] memoise_1.1.0                 RSQLite_2.1.1
[23] mime_0.5                      compiler_3.5.0
[25] BiocInstaller_1.31.1          stats4_3.5.0
[27] httpuv_1.4.2

mtmorgan commented

This

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

unlink(fileName(ExperimentHub()[3]))

a <- mclapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id, mc.cores=4)

'works for me'.

snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'

Can you try to debug more on your end, e.g., does the ipclock() work (replace data <- ... with Sys.sleep(1), and the four processes should execute sequentially, taking about 4 seconds)? Does the download work when only one core does the download (if (x == 1) data <- ...)?
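A minimal sketch of the lock check described above, under the assumption that BiocParallel's ipcid(), ipclock(), ipcunlock() and ipcremove() behave as documented. The on.exit() call and the timing wrapper are additions for illustration and are not part of the original snippet.

```r
library(BiocParallel)
library(parallel)

id <- ipcid()

## Time four forked workers that each hold the lock for one second.
elapsed <- system.time({
    res <- mclapply(1:4, function(x, id) {
        ipclock(id)
        on.exit(ipcunlock(id))  # release the lock even if the body fails
        Sys.sleep(1)            # stand-in for the ExperimentHub call
        Sys.getpid()
    }, id = id, mc.cores = 4)
})[["elapsed"]]

ipcremove(id)  # clean up the inter-process mutex

## If the lock serialises the workers, this is about 4 seconds;
## a value near 1 second would mean the lock is not taking effect.
elapsed
```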

You might also try to catch the error using BiocParallel, as

param <- register(bpstart(MulticoreParam(4)))
res <- bptry(bplapply(1:4, function(i) stop("oops: ", i)))
res
conditionMessage(res[[1]])
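To see which elements failed rather than inspecting them one at a time, bpok() can be combined with bptry(). The sketch below is illustrative only: the toy function and the stop.on.error = FALSE setting (used so that every element is evaluated) are assumptions, not something taken from the reported problem.

```r
library(BiocParallel)

## stop.on.error = FALSE lets all elements run even after a failure.
param <- MulticoreParam(2, stop.on.error = FALSE)

res <- bptry(bplapply(1:4, function(i) {
    if (i %% 2 == 0) stop("oops: ", i)  # even elements fail on purpose
    i
}, BPPARAM = param))

ok <- bpok(res)  # logical vector, FALSE where an element raised an error
msgs <- vapply(res[!ok], conditionMessage, character(1))
msgs
```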

zwdzwd commented May 25, 2018

@mtmorgan Thanks for the suggestions. I tried to debug and it seems ipclock() works OK: it takes about 4 seconds with Sys.sleep(1).

The download doesn't work even when only one core does the download. Strangely, bptry cannot catch the error. I inserted browser() in the following code:

param <- register(bpstart(MulticoreParam(4)))
res <- bplapply(1:4, function(i, id) {
    BiocParallel::ipclock(id)
    if(i==1) {
        browser()
        data <- ExperimentHub()[[3]]
    } else {
        data <- 1
    }
    BiocParallel::ipcunlock(id)
    data
}, id=id)

This returns the error:

Error: BiocParallel errors
  element index: 1
  first error: failed to load resource
  name: EH167
  title: ERR188088
  reason: 1 resources failed to download
In addition: Warning message:
stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

Under mclapply, R simply hangs when I step through the code.

Thank you,

Edit: The download with lapply in place of bplapply does finish without issue.
