Data-fetching from multiple processes causes error #2

zwdzwd · 2018-05-09T18:39:48Z

Hi,

When data is fetched (from S3) for the first time by several workers at once using mclapply, an error occurs (something like "scheduled cores 21, 2, 19 encountered errors in user code, all values of the jobs will be affected"). If the parallel processing is run after the data has been pre-cached, no error occurs. Is there any chance that some kind of lock could be implemented to prevent simultaneous caching from multiple instances of ExperimentHub?

Thank you!

zwdzwd · 2018-05-09T20:53:34Z

The error does go away if I add 'localHub=TRUE' when creating ExperimentHub() after pre-caching! In fact, as long as 'localHub=TRUE' is absent, the error persists regardless of whether a local copy of the data exists. I have checked that my environment has a good internet connection.
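
A minimal sketch of this workaround, assuming the resource is the alpineData record EH167 named in the error output further down this thread, and assuming four forked workers on a Unix-like system (both are illustrative choices, not requirements): download once in the parent process, then let each worker open a hub restricted to the local cache.

```r
library(ExperimentHub)
library(parallel)

# Step 1: fetch the resource once, serially, so that it lands in the local
# cache. "EH167" is an illustrative choice of record.
eh <- ExperimentHub()
invisible(eh[["EH167"]])

# Step 2: each forked worker opens its own hub with localHub = TRUE, so it
# reads from the cache instead of attempting a download.
res <- mclapply(1:4, function(i) {
    hub <- ExperimentHub(localHub = TRUE)
    data <- hub[["EH167"]]
    class(data)[1]
}, mc.cores = 4)
```

Whether this avoids the error on a given system is untested here; it only mirrors the behaviour reported in this comment.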

mtmorgan · 2018-05-09T21:08:06Z

I think the work-around will be to open the hub in the worker:

mclapply(1:5, function(i) {
    hub = ExperimentHub()
    ...
})

A more sophisticated approach would use a lock to ensure that only a single process accesses the hub at a time:

id = BiocParallel::ipcid()
mclapply(1:5, function(i, id) {
    ...
    BiocParallel::ipclock(id)
    ## ExperimentHub activities
    BiocParallel::ipcunlock(id)
    ...
}, id = id)
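
One caveat with the lock pattern above: if the code between ipclock() and ipcunlock() fails, the lock is never released and any other worker waiting on it blocks. A possible way to guard against that is sketched below. The helper `with_ipc_lock()` is hypothetical (it is not part of BiocParallel), and the failure in worker 2 is simulated purely for illustration.

```r
library(BiocParallel)
library(parallel)

# Hypothetical helper (not a BiocParallel function): evaluate `expr` while
# holding the inter-process lock `id`, and release the lock even if `expr`
# signals an error.
with_ipc_lock <- function(id, expr) {
    ipclock(id)
    on.exit(ipcunlock(id), add = TRUE)
    force(expr)   # `expr` is evaluated lazily, i.e. only after the lock is held
}

id <- ipcid()

res <- mclapply(1:4, function(i, id) {
    tryCatch(
        with_ipc_lock(id, {
            # Stand-in for the ExperimentHub activities
            if (i == 2) stop("simulated failure in worker ", i)
            i * 10
        }),
        # Return the message so that one failure does not mask other results
        error = function(e) conditionMessage(e)
    )
}, id = id, mc.cores = 2)

ipcremove(id)   # clean up the system resource behind the lock
```

Because the unlock is registered with on.exit(), the job that runs after the failed one can still acquire the lock.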

zwdzwd · 2018-05-24T06:05:56Z

@mtmorgan Thank you. The ipclock approach is promising; I will test it out by having ipcid() set at package load. Thanks again for the suggestion!

zwdzwd · 2018-05-24T09:52:45Z

@mtmorgan I tested a few snippets and still couldn't get ipclock to work. Here is what I tested, using alpineData as an example.

The following works with pre-caching but fails without it; R gets stuck at "retrieving 1 resource":

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

a <- mclapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id, mc.cores=4)

The following serial version works both with and without caching:

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

a <- lapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id)

It seems the ipclock method still requires the data to be pre-cached? I hope I am not missing something obvious here. Thank you!

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grDevices datasets  stats     graphics  utils     parallel  methods
[8] base

other attached packages:
[1] BiocParallel_1.15.3  ExperimentHub_1.7.0  AnnotationHub_2.13.1
[4] BiocGenerics_0.27.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16                  AnnotationDbi_1.43.1
 [3] magrittr_1.5                  IRanges_2.15.7
 [5] bit_1.1-12                    xtable_1.8-2
 [7] R6_2.2.2                      blob_1.1.1
 [9] httr_1.3.1                    Biobase_2.41.0
[11] DBI_1.0.0                     htmltools_0.3.6
[13] yaml_2.1.19                   bit64_0.9-7
[15] digest_0.6.15                 interactiveDisplayBase_1.19.0
[17] shiny_1.0.5                   later_0.7.2
[19] S4Vectors_0.19.5              promises_1.0.1
[21] memoise_1.1.0                 RSQLite_2.1.1
[23] mime_0.5                      compiler_3.5.0
[25] BiocInstaller_1.31.1          stats4_3.5.0
[27] httpuv_1.4.2

mtmorgan · 2018-05-24T10:43:10Z

This

library(ExperimentHub)
library(BiocParallel)
library(parallel)
id <- BiocParallel::ipcid()

unlink(fileName(ExperimentHub()[3]))

a <- mclapply(1:4, function(x, id) {
    BiocParallel::ipclock(id)
    data <- ExperimentHub()[[3]]
    BiocParallel::ipcunlock(id)
    data
},id=id, mc.cores=4)

'works for me'.

snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'
snapshotDate(): 2018-05-08
see ?alpineData and browseVignettes('alpineData') for documentation
downloading 0 resources
loading from cache 
    '/home/mtmorgan//.ExperimentHub/167'

Can you try to debug more on your end, e.g., does the ipclock() work (replace data <- ... with Sys.sleep(1), and the four processes should execute sequentially, taking about 4 seconds)? Does the download work when only one core does the download (if (x == 1) data <- ...)?
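
The first check suggested above could be sketched as follows. It assumes four forked workers via mclapply on a Unix-like system; the timing thresholds in the comments are rough expectations, not guarantees.

```r
library(BiocParallel)
library(parallel)

id <- ipcid()

# Each worker holds the lock while it sleeps for one second. If the lock is
# effective the sleeps cannot overlap, so the total should be about 4 seconds;
# a total near 1 second would mean the workers ran concurrently.
elapsed <- system.time(
    res <- mclapply(1:4, function(x, id) {
        ipclock(id)
        Sys.sleep(1)
        ipcunlock(id)
        x
    }, id = id, mc.cores = 4)
)[["elapsed"]]

ipcremove(id)   # clean up the system resource behind the lock
elapsed
```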

You might also try to catch the error using BiocParallel, as

param <- register(bpstart(MulticoreParam(4)))
res <- bptry(bplapply(1:4, function(i) stop("oops: ", i)))
res
conditionMessage(res[[1]])

zwdzwd · 2018-05-25T02:39:20Z

@mtmorgan Thanks for the suggestions. I tried to debug, and ipclock() seems to work OK: it takes about 4 seconds with Sys.sleep(1).

The download doesn't work when only one core does the download. Strangely, bptry cannot catch the error. I inserted browser() in the following code:

param <- register(bpstart(MulticoreParam(4)))
res <- bplapply(1:4, function(i, id) {
    BiocParallel::ipclock(id)
    if(i==1) {
        browser()
        data <- ExperimentHub()[[3]]
    } else {
        data <- 1
    }
    BiocParallel::ipcunlock(id)
    data
}, id=id)

which returns this error:

Error: BiocParallel errors
  element index: 1
  first error: failed to load resource
  name: EH167
  title: ERR188088
  reason: 1 resources failed to download
In addition: Warning message:
stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

Under mclapply, R simply hangs when stepping through the code.

Thank you,

Edit: Download with lapply in place of bplapply does finish without issue.

zwdzwd mentioned this issue May 24, 2018

SeSAMe Bioconductor/Contributions#716
