Determine Scaling Overhead of Scheduler #34

Jan-Willem · 2025-02-07T13:53:03Z

Experimenting with https://dannorth.net/whats-in-a-story/ format

Narrative:

As the RADPS Technical Lead
I want to know the scheduler overhead as a function of the number of tasks in a workflow for Prefect and Airflow
So that we can make a data-driven decision about using the multi-scheduler ADR.

Scenario 1:

Given workflows with 1000, 2000, 4000, 8000, 20000, 40000, 60000, 80000 ? parallel tasks
And each task is a sleep
And the sleep time is random (0.1s-1.0s, distribution ?)
When each of the workflows is run sequentially using at least n_parallelism = max available ? (the choice of resource manager is left to the developers)
Then record the sum of all the sleep times using a gather from the output of each task (T_sum_task_times)
And record the total run time (T_workflow)
And plot both T_workflow and T_sum_task_times/n_parallelism as a function of the number of tasks in the workflow
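
To make the measurement in Scenario 1 concrete, here is a minimal scheduler-agnostic sketch using only the Python standard library. It is not Prefect or Airflow code; a thread pool stands in for the scheduler so the bookkeeping can be checked first. The names `sleep_task` and `run_workflow`, the uniform distribution, the fixed seed, and the `overhead` field (T_workflow minus T_sum_task_times/n_parallelism) are all assumptions for illustration and still need to be agreed with the developers.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def sleep_task(duration: float) -> float:
    """Sleep for `duration` seconds and return it so it can be gathered."""
    time.sleep(duration)
    return duration


def run_workflow(n_tasks, n_parallelism, low=0.1, high=1.0, seed=0):
    """Run n_tasks sleep tasks on n_parallelism workers and time the run.

    Assumption: sleep times are drawn uniformly from [low, high]; the issue
    leaves the distribution open.
    """
    rng = random.Random(seed)
    durations = [rng.uniform(low, high) for _ in range(n_tasks)]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_parallelism) as pool:
        # Gather the sleep time reported by every task.
        gathered = list(pool.map(sleep_task, durations))
    t_workflow = time.perf_counter() - start

    t_sum_task_times = sum(gathered)
    ideal = t_sum_task_times / n_parallelism
    return {
        "n_tasks": n_tasks,
        "T_workflow": t_workflow,
        "T_sum_task_times": t_sum_task_times,
        "T_sum_task_times/n_parallelism": ideal,
        # Hypothetical overhead metric: measured run time minus ideal time.
        "overhead": t_workflow - ideal,
    }


if __name__ == "__main__":
    # Small sizes so the sketch finishes quickly; the issue proposes 1000+.
    for n in (8, 16, 32):
        print(run_workflow(n, n_parallelism=8, low=0.01, high=0.05))
```

Swapping the thread pool for a Prefect flow or an Airflow DAG while keeping the same returned fields would let both schedulers be plotted on the same axes.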

Definition of Done

Plot in Wiki
Paragraph/Report of any relevant findings in Wiki (compare Prefect and Airflow)
Add ADR

? = to be discussed with developers

Jan-Willem added the Type: Epic label Feb 7, 2025

Jan-Willem added this to the Item 02: Walking Skeleton milestone Feb 7, 2025

Jan-Willem assigned amcnicho, krlberry and taktsutsumi Feb 7, 2025
