-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cubed example using Coiled on AWS #271
Comments
Thanks @rsignell. Looking at the traceback, I think this is probably on our end. I'll take closer look |
We have some code collection logic that doesn't handle In the meantime, I'm not sure where the partial is coming from inside |
Is this a workaround I could implement? I'm afraid I don't understand this! 😕 |
Ah, sorry, I meant if you're playing around with a development version of |
We'll likely have a new release out sometime later today or tomorrow, but if you want to try a dev |
Thanks @jrbourbeau! Both tests ran successfully with Here's the results of the larger
|
@tomwhite or @TomNicholas does the Coiled performance here appear (from the screenshots above) to be comparable to other executors? If not, I'm hoping to work with Coiled engineers to see if they have ideas about how to improve performance. |
Thanks for looking at this @rsignell, and for the bugfix @jrbourbeau. I tried the Coiled executor a while back and had a similar experience with the number of instances not scaling as much as it could have done (see cubed-dev/cubed#309). It would be great if Coiled could ramp up the number of instances faster (is it possible to tell it to do that?). Other executors like Lithops or Modal can scale up to 100s or 1000s of instances in seconds. |
We don't yet have good benchmarks so it's hard to know what the total time should be (and a lot has changed since I last ran anything at scale), but here's a way to get an idea of what the performance should be. If I attempt to run this example locally1 (just for demonstration purposes) then you see a progress bar for each step in the plan (i.e. each intermediate array that needs to be written to zarr). Ignoring the "create arrays", there are 3 steps, each of which consists of 100 embarrassingly parallel tasks (each is a chunk to be saved to an intermediate zarr store). You can see them laid out in the plan: If you don't have 100 chunks being computed in parallel on Coiled between each step then it could go faster.2 If you were to run this on Lambda/GCF then that's what you would see. Assuming your coiled cluster is computing one chunk per worker, it looks like you're only computing 10 chunks at once, so it could be at least 10x faster (before any other graph optimizations). The bottleneck here is the dynamic scaling up of the Coiled Cluster to match the very wide task graphs that Cubed is passing it. Footnotes
|
Coiled uses raw VMs. This means that things are super-cheap, but also that there's a minute-long cold startup time one has to deal with (systems like Modal get around this by having VMs up all the time, but this is expensive and assumes that you're in their region/account. This gets rolled into the 10x cost difference). Because things have minute-long startup times, the adaptive settings are configured to scale up to something large enough so that it completes in ~5 minutes. If we're going to spend one minute spinning up a VM, we want that VM to be active for at least a couple more minutes before spinning it down. You can tweak those configurations if you like, but you'll never get sub-minute with Coiled. There's a cost-speed tradeoff here. |
https://docs.coiled.io/user_guide/clusters/scale.html#adaptive-scaling adaptive scaling docs if you want to tweak some settings. I suspect Cubed can probably make better decisions about when to scale in/out though, and just achieve that through manual |
Yes, we could pre-scale to the maximum width of the graph. |
I was planning on using this for rechunking a large Zarr dataset (reading 1TB Zarr from S3, writing a new rechunked 1 TB Zarr to S3). For this use case I don't care about super fast autoscaling, do I? |
Cost-wise this might be bad, especially if the tasks run quickly. If you spend a minute spinning up VMs that then just run for a second, then you're spending the vast majority of VM-rental-time doing nothing other than setting up.
Not at all. This is a case where Coiled's approach of trading slow spin-up time for dirt cheap spot VMs is ideal. You're spending $0.02 per CPU hour vs $0.20 per CPU hour, as long as you're fine waiting a few minutes for everything to complete. |
I tried following the instructions to run a Cubed demo using Coiled on AWS following these instructions, and the environment creation went smoothly:
but when I tried running the example:
I got:
I'm raising the issue here as I'm guessing something on the Coiled side changed?
But of course I don't really know so I'm prepared to be told to post it over at cubed... :)
The text was updated successfully, but these errors were encountered: