I often have workflows that consist mostly of thousands of reads from S3 for chunks of the same size. I can see some of the variability on the Dask dashboard, but I was wondering:
Is there a way to get the task metrics programmatically, as a dataframe or similar, so I could look at the distribution, the tails of the distribution, etc.?
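As a starting point (not Coiled-specific), a minimal sketch of pulling per-task timings from a plain Dask client is the `get_task_stream` context manager, which records the same events the dashboard's task-stream plot shows. The record fields used below (`key`, `worker`, `startstops`, `action`, `start`, `stop`) follow the current `distributed` format, and `my_computation` is a hypothetical workload standing in for the S3-reading graph:

```python
import pandas as pd
from distributed import Client, get_task_stream

client = Client()  # or the client attached to your Coiled cluster

# Record per-task events while the workload runs.
with get_task_stream(plot=False) as ts:
    result = my_computation.compute()  # hypothetical S3-reading workload

records = []
for task in ts.data:
    # Each task carries one or more timed phases ("compute", "transfer", ...).
    for phase in task["startstops"]:
        records.append(
            {
                "key": task["key"],
                "worker": task["worker"],
                "action": phase["action"],
                "duration_s": phase["stop"] - phase["start"],
            }
        )

df = pd.DataFrame(records)

# Distribution and tails of compute time per task:
compute = df[df["action"] == "compute"]["duration_s"]
print(compute.describe(percentiles=[0.5, 0.9, 0.99]))
```

This only captures whole-task phases, so the S3 read time is folded into the "compute" duration rather than broken out separately.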
rsignell changed the title from "Would there be a way to get a histogram of s3 access times from the information that Coiled collects?" to "Would there be a way to programmatically get task metrics from the information that Coiled collects?" on Jul 24, 2024.
Hi, @rsignell! I'm curious about the intent behind your request. What problem(s) would you like to solve using task metrics? Since there's a plethora of possible metrics, which ones would you be interested in?
I would like to look at the variability of the time it takes to retrieve many identically-sized chunks of data from S3.
Does that mean you'd be interested in the distribution of task durations for the tasks that read your chunks, or something else? Is there a specific problem that understanding the variability would help you solve?
Yes. I'm trying to figure out how many chunks I should request per task, and it would be good to know the distribution of S3 access times within each task.
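One way to get per-read timings within a task, without relying on what Coiled or the scheduler collects, is to time the S3 reads inside the task function itself and return the durations alongside the data. This is only a sketch under assumptions: `timed_read`, `url`, and `ranges` are hypothetical names, and it assumes `fsspec` with `s3fs` installed for `s3://` URLs:

```python
import time
import fsspec

def timed_read(url, ranges):
    """Hypothetical helper: read several byte ranges from one S3 object
    and return the chunks together with per-read wall-clock durations."""
    chunks, durations = [], []
    # anon=True is an assumption for a public bucket; pass real credentials otherwise.
    with fsspec.open(url, mode="rb", anon=True) as f:
        for start, length in ranges:
            t0 = time.perf_counter()
            f.seek(start)
            chunks.append(f.read(length))
            durations.append(time.perf_counter() - t0)
    return chunks, durations
```

Mapping this over the workload (e.g. `futures = client.map(timed_read, urls, range_lists)`) and gathering the duration lists client-side gives a flat array of S3 access times per chunk, which can then be histogrammed to compare different chunks-per-task settings.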