reduce max workers to not surpass AWS vCPU limits #8

Draft
wants to merge 1 commit into base: main

Conversation


@amsnyder amsnyder commented May 10, 2023

Rich messaged me to let me know there’s a quota on how many vCPUs we can spin up at one time in the AWS account that nebari-workshop runs on. I did a little experimentation to estimate resource usage for a class of 30, and we would be well over our allocation limit (assuming all the students run dask calculations at the same time, which they probably will). I bumped the max number of workers down to 15 in all of your notebooks to bring the estimated usage within what we have allocated. I tested your notebooks, and the longest dask calculations (in the standard suite and dscore notebooks) take ~3 mins to run with 15 workers.
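For reference, the worst-case arithmetic behind an estimate like this can be sketched as below. This is only an illustration, not the calculation actually used: the class size of 30 and the worker caps of 15 and 30 come from this thread, while the vCPUs per worker, the quota value, and the function names are hypothetical placeholders to replace with the real numbers for the account.

```python
import math


def peak_vcpus(students: int, max_workers: int, vcpus_per_worker: float) -> float:
    """Worst-case vCPU demand if every student scales to max_workers at once."""
    return students * max_workers * vcpus_per_worker


def max_workers_within_quota(
    quota_vcpus: float,
    students: int,
    vcpus_per_worker: float,
    reserved_vcpus: float = 0,
) -> int:
    """Largest per-student worker cap that keeps the whole class under the quota.

    reserved_vcpus is headroom for things other than dask workers
    (for example user notebook servers); its value is an assumption.
    """
    usable = quota_vcpus - reserved_vcpus
    if usable <= 0:
        return 0
    return math.floor(usable / (students * vcpus_per_worker))


if __name__ == "__main__":
    # Hypothetical inputs: 1 vCPU per worker and a 500 vCPU quota.
    print(peak_vcpus(students=30, max_workers=30, vcpus_per_worker=1))
    print(peak_vcpus(students=30, max_workers=15, vcpus_per_worker=1))
    print(max_workers_within_quota(quota_vcpus=500, students=30, vcpus_per_worker=1))
```

Halving the worker cap halves the worst-case demand, which is why dropping from 30 to 15 workers is the simplest lever here.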

Rich put in a request for a higher vCPU limit, but it is unlikely to be granted before the workshop. We can leave this as a draft in case the request is granted in time and we don't need to merge. Most likely, though, you will want to merge this PR right before your workshop. Let's be in touch when it gets close to showtime.

To avoid any issues in the class, I would recommend that you really emphasize that students shut down their clusters each time you get to that point in a notebook. I would also recommend that instructors who are watching others present (and not presenting themselves) consider not running the notebooks, just to reduce resource usage. (Based on my estimate, the instructors shouldn't push us over the limit; I just want to be on the safe side if possible.)
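A minimal sketch of what a shutdown step could look like is below. It assumes the notebooks hold a dask `client` and a `cluster` object; the helper name `shutdown_dask` is hypothetical and is not taken from the notebooks. dask-gateway clusters expose `shutdown()`, while `distributed.LocalCluster` exposes `close()`, so the helper tries both.

```python
def shutdown_dask(client, cluster):
    """Close the client, then stop the cluster so its workers release their vCPUs."""
    try:
        client.close()
    finally:
        # Prefer shutdown() (dask-gateway style); fall back to close()
        # (LocalCluster style). Runs even if closing the client fails.
        stop = getattr(cluster, "shutdown", None) or getattr(cluster, "close")
        stop()
```

Calling something like this in the last cell of each notebook would make the shutdown hard to forget, since workers that stay up keep counting against the quota.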

@amsnyder
Author

I just checked, and the request to increase the number of vCPUs was granted, so you should actually be OK using your current workflows. This is something we will need to watch for in future trainings, though.

@amsnyder amsnyder closed this May 10, 2023
@amsnyder amsnyder reopened this May 10, 2023
@amsnyder
Author

Re-opening - we may want to consider running notebooks 00/01 with 10-15 workers and keeping 03/05 at 30 workers. We can monitor resource usage while running the first two notebooks and make sure things look OK. If there are any concerns, we could ask students to reduce the number of workers in 03/05 during the class.
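One way to keep these per-notebook caps in a single place is sketched below. The mapping and the `worker_cap` helper are hypothetical; 15 is picked as the upper end of the 10-15 range, and the optional override models asking students to lower the cap mid-class.

```python
# Hypothetical per-notebook worker caps, keyed by notebook number prefix.
MAX_WORKERS = {"00": 15, "01": 15, "03": 30, "05": 30}


def worker_cap(notebook_prefix, class_override=None):
    """Return the worker cap for a notebook, optionally lowered for the class."""
    cap = MAX_WORKERS[notebook_prefix]
    if class_override is not None:
        # An override can only lower the cap, never raise it.
        cap = min(cap, class_override)
    return cap
```

For example, `worker_cap("03", class_override=15)` would drop notebook 03 to 15 workers if monitoring during 00/01 shows usage running high.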
