Data-fetching from multiple processes causes error #2
The error does go away if I add `localHub=TRUE` when initiating `ExperimentHub()` after pre-caching! In fact, the error persisted regardless of whether a local copy of the data existed, as long as `localHub=TRUE` was absent. I checked that my environment has a good internet connection.

---
I think the work-around will be to open the hub in the worker. A more sophisticated approach would use a lock to ensure that there is a single process accessing the hub.

---
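The two suggestions above (open the hub inside the worker; serialize hub access with a lock) might be sketched as follows. This is a hypothetical illustration, not code from this thread: the resource id `"EH166"` and the top-level `lock` variable are placeholder assumptions.

```r
library(BiocParallel)
library(ExperimentHub)

## A single lock id shared by all workers (a package might create this
## at load time; here it is simply a top-level variable).
lock <- ipcid()

fetch_one <- function(ehid, lock) {
    ## Serialize hub access so only one process populates the cache
    ipclock(lock)
    on.exit(ipcunlock(lock))
    hub <- ExperimentHub()   # open the hub inside the worker
    hub[[ehid]]
}

## "EH166" is a placeholder resource id
res <- bplapply(rep("EH166", 4), fetch_one, lock = lock,
                BPPARAM = MulticoreParam(4))
ipcremove(lock)
```

With this pattern the first worker to acquire the lock downloads and caches the resource; the others then read from the populated cache.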
@mtmorgan Thank you. The `ipclock` approach is promising; I will test it out by having `ipcid()` set at package load. Thanks again for the suggestion!

---
@mtmorgan I tested with some code and still couldn't get the `ipclock` to work. Here is what I tested, using alpineData as an example. The following works with pre-caching, but failed without pre-caching.

The following serial version worked both with and without caching.

It seems the `ipclock` method still requires the data to be pre-cached? I hope I am not missing something obvious here. Thank you!

---
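The original snippets in this comment were not captured in the page text. A hypothetical reconstruction of the serial version being contrasted here (with `"EH166"` as a placeholder resource id) might look like:

```r
library(ExperimentHub)

## Serial version: one process opens the hub and performs all fetches,
## so there is no race on the cache regardless of pre-caching.
hub <- ExperimentHub()
res <- lapply(rep("EH166", 4), function(ehid) hub[[ehid]])
```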
This 'works for me'. Can you try to debug more on your end, e.g., does the `ipclock()` work? You might also try to catch the error using BiocParallel.

---
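One way to catch and inspect per-job errors with BiocParallel is `bptry()` together with `bpok()`; a minimal sketch (the failing body is a placeholder for the actual hub fetch being debugged):

```r
library(BiocParallel)

param <- MulticoreParam(4, stop.on.error = FALSE)
res <- bptry(bplapply(1:4, function(i) {
    ## placeholder: substitute the hub fetch under investigation
    stop("simulated download failure in worker ", i)
}, BPPARAM = param))

bpok(res)          # logical vector: which jobs succeeded
res[!bpok(res)]    # the captured error conditions for inspection
```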
@mtmorgan Thanks for the suggestions. I tried to debug, and it seems `ipclock()` works OK; it takes about 4 sec. The download doesn't work when only one core does the download. Strangely, it returns an error. Thank you!

---
Hi,
When some data is fetched (from S3) for the first time from each of multiple workers using `mclapply`, an error occurs (something like "scheduled cores 21, 2, 19 encountered errors in user code, all values of the jobs will be affected"). But if parallel processing is run after the data is pre-cached, then no error occurs. Any chance that some kind of lock could be implemented to prevent caching from multiple instances of ExperimentHub?
Thank you!
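A minimal sketch of the failing pattern described above, and of the pre-caching workaround mentioned later in the thread; the resource id `"EH166"` is a placeholder, not taken from the report:

```r
library(parallel)
library(ExperimentHub)

## Failing pattern: each worker opens its own hub, and on the first
## fetch the workers race to populate the shared cache.
res <- mclapply(rep("EH166", 4), function(ehid) {
    hub <- ExperimentHub()
    hub[[ehid]]
}, mc.cores = 4)

## Workaround reported in this thread: pre-cache once in the main
## process, then have workers read the local copy via localHub = TRUE.
hub <- ExperimentHub()
invisible(hub[["EH166"]])
res <- mclapply(rep("EH166", 4), function(ehid) {
    hub <- ExperimentHub(localHub = TRUE)
    hub[[ehid]]
}, mc.cores = 4)
```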