As the RADPS Technical Lead
I want to know the scheduler overhead as a function of the number of tasks in a workflow for Prefect and Airflow
So that we can make a data-driven decision about using multi-scheduler ADR.
Scenario 1:
Given workflows with 1000, 2000, 4000, 8000, 20000, 40000, 60000, 80000 ? parallel tasks
And each task is a sleep
And the sleep time is random (0.1s-1.0s, distribution ?)
When each of the workflows is run sequentially using at least n_parallelism = max available? (choice of resource manager left to the developers)
Then record the sum of all the sleep times using a gather from the output of each task (T_sum_task_times)
Then record the total run time (T_workflow)
Then plot both T_workflow and T_sum_task_times/n_parallelism as a function of the number of tasks in the workflow
Definition of Done
Plot in Wiki
Paragraph/Report of any relevant findings in Wiki (compare Prefect and Airflow)
Add ADR
? discuss with developers
Experimenting with https://dannorth.net/whats-in-a-story/ format
Narrative:
As the RADPS Technical Lead
I want to know the scheduler overhead as a function of the number of tasks in a workflow for Prefect and Airflow
So that we can make a data-driven decision about using multi-scheduler ADR.
Scenario 1:
Given workflows with 1000, 2000, 4000, 8000, 20000, 40000, 60000, 80000 ? parallel tasks
And each task is a sleep
And the sleep time is random (0.1s-1.0s, distribution ?)
When each of the workflows is run sequentially using at least n_parallelism = max available? (choice of resource manager left to the developers)
Then record the sum of all the sleep times using a gather from the output of each task (T_sum_task_times)
Then record total run time (T_workflow)
Then plot both T_workflow and T_sum_task_times/n_parallelism as a function of the number of tasks in the workflow
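A minimal sketch of the measurement in Scenario 1, to make the quantities concrete. It does not use Prefect or Airflow: a standard-library `ThreadPoolExecutor` stands in for the scheduler, so the numbers it produces say nothing about either tool. The names `sleep_task` and `run_workflow`, the uniform distribution, and the `overhead` definition (T_workflow minus T_sum_task_times/n_parallelism) are assumptions for illustration; the distribution and parallelism are still open questions above.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def sleep_task(duration: float) -> float:
    """Sleep for `duration` seconds and return it so the caller can gather it."""
    time.sleep(duration)
    return duration


def run_workflow(n_tasks, n_parallelism, low=0.1, high=1.0, seed=0):
    """Run n_tasks independent sleep tasks and record the scenario's timings.

    Assumption: sleep times are drawn from a uniform distribution on [low, high].
    """
    rng = random.Random(seed)
    durations = [rng.uniform(low, high) for _ in range(n_tasks)]

    start = time.perf_counter()
    # Stand-in for the scheduler under test (hypothetical substitute for
    # a Prefect flow or an Airflow DAG run).
    with ThreadPoolExecutor(max_workers=n_parallelism) as pool:
        gathered = list(pool.map(sleep_task, durations))
    t_workflow = time.perf_counter() - start

    t_sum_task_times = sum(gathered)
    ideal = t_sum_task_times / n_parallelism
    return {
        "n_tasks": n_tasks,
        "T_workflow": t_workflow,
        "T_sum_task_times": t_sum_task_times,
        "ideal": ideal,
        # Assumed overhead measure: wall time beyond the ideal parallel time.
        "overhead": t_workflow - ideal,
    }


if __name__ == "__main__":
    # Scaled-down task counts and sleep times so the sketch finishes quickly.
    for n in (10, 20, 40):
        r = run_workflow(n, n_parallelism=4, low=0.001, high=0.005)
        print(r["n_tasks"], round(r["T_workflow"], 4), round(r["ideal"], 4))
```

Plotting `T_workflow` and `ideal` against `n_tasks` for each tool would give the two curves the scenario asks for; the gap between them is the quantity of interest.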
Definition of Done
? discuss with developers