Optimizing WorkflowExecutor.enqueueReadyTasks #1752
WorkflowExecutor.enqueueReadyTasks runs in the following steps:
1. Get a list of ready tasks, up to executor.enqueue_fetch_size number of tasks
2. For each task:
2-1. Lock the task if the task is still ready
2-2. Enqueue the task
Step 2-1 checks task state again ("recheck") because another thread may
enqueue the task during the operation (notice that step 1 doesn't
lock the tasks).
"recheck" needs to run a SELECT statement on the database to get the
state and lock the task atomically. If "recheck" doesn't pass, the task
is ignored (step 2-2 doesn't run).
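
The fetch-then-recheck flow described above can be sketched with an in-memory stand-in for the task table. This is only an illustration under stated assumptions: the class `EnqueueSketch`, its methods, and the `State` enum are hypothetical names, not digdag's actual API, and a `ConcurrentHashMap` compare-and-replace stands in for the locking SELECT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the task table (not digdag's real store).
final class EnqueueSketch {
    enum State { READY, QUEUED }

    private final Map<Long, State> tasks = new ConcurrentHashMap<>();
    int recheckFailures = 0; // rechecks that found the task no longer READY

    void addReadyTask(long id) {
        tasks.put(id, State.READY);
    }

    // Step 1: list ready task ids up to fetchSize WITHOUT locking them.
    List<Long> findReadyTaskIds(int fetchSize) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, State> e : tasks.entrySet()) {
            if (ids.size() >= fetchSize) {
                break;
            }
            if (e.getValue() == State.READY) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Step 2: for each fetched id, "recheck" and lock atomically (2-1),
    // and enqueue only if the recheck passed (2-2).
    int lockAndEnqueueAll(List<Long> fetchedIds) {
        int enqueued = 0;
        for (long id : fetchedIds) {
            // Atomic compare-and-replace: succeeds only if still READY.
            if (tasks.replace(id, State.READY, State.QUEUED)) {
                enqueued++;
            } else {
                recheckFailures++; // wasted round trip in the real system
            }
        }
        return enqueued;
    }

    public static void main(String[] args) {
        EnqueueSketch store = new EnqueueSketch();
        for (long i = 1; i <= 3; i++) {
            store.addReadyTask(i);
        }
        // Two executors both finish step 1 before either starts step 2.
        List<Long> fetchedByA = store.findReadyTaskIds(100);
        List<Long> fetchedByB = store.findReadyTaskIds(100);
        System.out.println("A enqueued: " + store.lockAndEnqueueAll(fetchedByA));
        System.out.println("B enqueued: " + store.lockAndEnqueueAll(fetchedByB));
        System.out.println("failed rechecks: " + store.recheckFailures);
    }
}
```

In this sketch both executors see the same three ready tasks in step 1, so every recheck made by the second executor fails. Each of those failures corresponds to one SELECT that does no useful work in the real system.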
"recheck" may fail very frequently when the following conditions are met:
- executor.enqueue_fetch_size is large (the default, 100, is already large)
- ... operations are large, the database is overloaded temporarily, or the
digdag server is overloaded temporarily)
Frequent "recheck" failures mean that many SELECT operations waste
database capacity. They also waste the digdag server's thread time.
This change optimizes steps 1 & 2 as follows:
On PostgreSQL, step 1 can be done using one SELECT statement. This
solves the potential problem above.
On the H2 database, step 2 needs two SELECT statements, so this commit
won't improve performance there. But notice that the problem above won't
happen on H2, because an H2 database won't be shared by multiple servers.
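
To illustrate why combining the fetch and the lock removes the wasted rechecks, here is a minimal in-memory sketch. It rests on an assumption: that the optimized step selects and locks ready tasks in a single atomic operation. The actual SQL statement is not shown in this description, and `LockOnFetchSketch`, `fetchAndLockReadyTasks`, and the `State` enum are hypothetical names, not digdag's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model; a compare-and-replace stands in for the
// single locking SELECT assumed in the lead-in.
final class LockOnFetchSketch {
    enum State { READY, LOCKED, QUEUED }

    private final Map<Long, State> tasks = new ConcurrentHashMap<>();

    void addReadyTask(long id) {
        tasks.put(id, State.READY);
    }

    // Combined fetch + lock: a task is returned only if this caller
    // atomically moved it from READY to LOCKED, so no later recheck is needed.
    List<Long> fetchAndLockReadyTasks(int fetchSize) {
        List<Long> locked = new ArrayList<>();
        for (Long id : tasks.keySet()) {
            if (locked.size() >= fetchSize) {
                break;
            }
            if (tasks.replace(id, State.READY, State.LOCKED)) {
                locked.add(id);
            }
        }
        return locked;
    }

    // Every task handed to this loop is already owned by the caller.
    int enqueueReadyTasks(int fetchSize) {
        int enqueued = 0;
        for (long id : fetchAndLockReadyTasks(fetchSize)) {
            tasks.put(id, State.QUEUED);
            enqueued++;
        }
        return enqueued;
    }

    public static void main(String[] args) {
        LockOnFetchSketch store = new LockOnFetchSketch();
        for (long i = 1; i <= 5; i++) {
            store.addReadyTask(i);
        }
        List<Long> lockedByA = store.fetchAndLockReadyTasks(3);
        List<Long> lockedByB = store.fetchAndLockReadyTasks(100);
        System.out.println("A locked " + lockedByA.size() + " tasks");
        System.out.println("B locked " + lockedByB.size() + " tasks");
    }
}
```

In this model two callers can never receive the same task, so the per-task recheck and its failure path disappear. Whether the real implementation claims several tasks per statement or one at a time is not stated here, so the batch behavior above is part of the assumption.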