drmaa errors- resubmit/retry #116

cchng · 2020-02-11T16:59:58Z

Hi ruffus team,

I'm using the drmaa wrapper to submit and run jobs on an SGE cluster, and I keep running into communication exceptions that I've been working to resolve (related issue: aws/aws-parallelcluster#1592). Has the ruffus team encountered this error? If not, is there a resubmit/retry feature that is ready to use? Although it isn't explicitly documented, the run_job function appears to take a resubmit parameter.

[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/shared/amgenesis/helpers.py", line 126, in run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] cmdline.run (options, logger=logger_proxy, multithread = options.jobs, exceptions_terminate_immediately = True)
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/ruffus/cmdline.py", line 834, in run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] **appropriate_options)
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/ruffus/task.py", line 5424, in pipeline_run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] raise job_errors
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] ruffus.ruffus_exceptions.RethrownJobError:
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] Original exception:
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] Exception #1
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] 'drmaa.errors.DrmCommunicationException(code 2: failed receiving gdi request response for mid=65535 (can't send response for this message id - protocol error).)' raised in ...
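Until a built-in option is confirmed, one possible stopgap is to wrap each submission in a retry loop with exponential backoff. This is a minimal sketch, not part of ruffus or drmaa: `retry_call` is a hypothetical helper, and it assumes `DrmCommunicationException` is transient and safe to retry. Blind retries may also submit a job twice if the first request reached the scheduler before the error was raised. The sketch does not guard against that.

```python
import logging
import time

log = logging.getLogger(__name__)


def retry_call(func, *, retry_on=(Exception,), max_attempts=3,
               initial_delay=1.0, backoff=2.0):
    """Call the zero-argument callable func, retrying on retry_on exceptions.

    Hypothetical helper (not a ruffus/drmaa API). Sleeps initial_delay seconds
    after the first failure, multiplies the delay by backoff after each further
    failure, and re-raises the last exception once max_attempts calls fail.
    Exceptions not listed in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            log.warning("attempt %d/%d failed (%r); retrying in %.1fs",
                        attempt, max_attempts, exc, delay)
            time.sleep(delay)
            delay *= backoff
```

Usage might look like `retry_call(functools.partial(run_job, cmd_str, drmaa_session=session), retry_on=(drmaa.errors.DrmCommunicationException,))`, where `cmd_str` and `session` stand for whatever the pipeline already passes to `run_job`. Binding the arguments with `functools.partial` keeps the helper's own options separate from `run_job`'s keyword arguments (such as `logger`), so the two cannot clash.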
