Symptoms
I'm observing jobs randomly being run sooner than scheduled. As far as I can tell, this occurs with multiple workers and retried jobs.
Investigation
I believe I see a race condition in the code.
Disclaimer: I haven't verified this particular path, as it's extremely difficult to reproduce. I did, however, verify that scaling down to 1 worker makes the issue go away.
Consider this scenario:
1. arq/arq/worker.py, line 386 in 3914e48
2. arq/arq/worker.py, line 435 in 3914e48
3. Retry. It increments the job score: arq/arq/worker.py, line 701 in 3914e48
4. arq/arq/worker.py, line 449 in 3914e48
And now the job's score is in the future, yet worker B continues normally 💥 As far as I can tell, steps 3 and 4 are not protected by the synchronization primitives. Does this sound plausible?
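To make the suspected interleaving concrete, here is a toy, single-threaded replay of it. This is only a sketch under assumptions, not arq's code: a plain dict stands in for the queue's job-to-score mapping, `due_jobs` and `replay` are hypothetical names, and the step where worker B reads the job as due before worker A's retry lands is my assumption about the ordering.

```python
# Toy replay of the suspected race. A dict stands in for the queue's
# job -> score mapping; nothing here is arq's actual implementation.

def due_jobs(queue, now_ms):
    """Return ids of jobs whose score is at or before now_ms."""
    return [job_id for job_id, score in queue.items() if score <= now_ms]


def replay():
    now_ms = 1_000
    queue = {"job-x": 900}  # job-x is currently due
    executed = []

    # Assumed step: worker B polls and sees job-x as due.
    picked_by_b = due_jobs(queue, now_ms)

    # Step 3: worker A's attempt ends in a retry, which moves the
    # job's score into the future.
    queue["job-x"] = now_ms + 5_000

    # Step 4: worker B carries on with its stale view and never
    # looks at the score again.
    for job_id in picked_by_b:
        executed.append(job_id)

    return executed, queue["job-x"] > now_ms


executed, score_in_future = replay()
print(executed, score_in_future)  # ['job-x'] True
```

The job ends up executed even though its score already points to a later time, which matches the symptom of retries running sooner than scheduled.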
Possible fix
I haven't studied the code well enough, but it looks like an additional check, such as score > timestamp_ms(), could be added around here to prevent the execution of a future retry: arq/arq/worker.py, line 450 in 3914e48
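A minimal sketch of what such a guard might look like, written as a standalone function so it can be reasoned about in isolation. The name `is_future_retry` and its signature are hypothetical, and `timestamp_ms` here is a local stand-in for the helper mentioned above, not necessarily identical to the one in arq. How the worker would obtain the current score at that point is left open.

```python
import time
from typing import Optional


def timestamp_ms() -> int:
    """Current Unix time in milliseconds (local stand-in helper)."""
    return int(time.time() * 1000)


def is_future_retry(score: Optional[int], now_ms: Optional[int] = None) -> bool:
    """Return True when the job's score says it is not due yet.

    A missing score is treated as "not a future retry" so that the
    guard never blocks a job it knows nothing about (an assumption).
    """
    if score is None:
        return False
    if now_ms is None:
        now_ms = timestamp_ms()
    return score > now_ms
```

A worker that re-reads the score just before starting the job could skip execution whenever this returns True and leave the job for a later poll.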