Determine Scaling Overhead of Scheduler #34

Open
3 tasks
Jan-Willem opened this issue Feb 7, 2025 · 0 comments

Jan-Willem commented Feb 7, 2025

Experimenting with https://dannorth.net/whats-in-a-story/ format

Narrative:

As the RADPS Technical Lead
I want to know the scheduler overhead as a function of the number of tasks in a workflow for Prefect and Airflow
So that we can make a data-driven decision about the multi-scheduler ADR.

Scenario 1:

Given workflows with 1000, 2000, 4000, 8000, 20000, 40000, 60000, and 80000 (?) parallel tasks
And each task is a sleep
And the sleep time is random (0.1 s to 1.0 s, distribution ?)
When each of the workflows is run sequentially using at least n_parallelism = max available (?) (choice of resource manager is left to the developers)
Then record the sum of all the sleep times, gathered from the output of each task (T_sum_task_times)
And record the total run time (T_workflow)
And plot both T_workflow and T_sum_task_times/n_parallelism as a function of the number of tasks in the workflow
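
The measurement in Scenario 1 could be sketched roughly as follows. This is not Prefect or Airflow code: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in scheduler so that the two recorded quantities are unambiguous. The function names `sleep_task` and `run_workflow`, the returned dictionary keys, and the uniform sleep distribution are all assumptions made for illustration (the distribution is still an open question above).

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def sleep_task(duration):
    """One task: sleep for `duration` seconds and return that duration."""
    time.sleep(duration)
    return duration


def run_workflow(n_tasks, n_parallelism, low=0.1, high=1.0, seed=0):
    """Run n_tasks sleep tasks on a thread pool and record the timings.

    Assumption: sleep times are drawn from a uniform distribution on
    [low, high]; the issue leaves the distribution undecided.
    """
    rng = random.Random(seed)
    durations = [rng.uniform(low, high) for _ in range(n_tasks)]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_parallelism) as pool:
        # Gather the sleep time reported by every task.
        reported = list(pool.map(sleep_task, durations))
    t_workflow = time.perf_counter() - start

    t_sum_task_times = sum(reported)
    return {
        "n_tasks": n_tasks,
        "T_workflow": t_workflow,
        "T_sum_task_times": t_sum_task_times,
        # Ideal run time if the tasks were spread perfectly over all workers.
        "T_ideal": t_sum_task_times / n_parallelism,
    }
```

Calling `run_workflow` once per task count, one workflow after another, gives the two series to plot: `T_workflow` and `T_ideal` against `n_tasks`. The gap between them is one possible reading of "scheduler overhead", though it also includes load imbalance when the last tasks finish at different times. For Prefect or Airflow, the thread pool would be replaced by the scheduler's own task submission while the timing and gather logic stays the same.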

Definition of Done

  • Plot in Wiki
  • Paragraph/Report of any relevant findings in Wiki (compare Prefect and Airflow)
  • Add ADR

? = to be discussed with developers
