Support async key retrieval #7
Comments
@TomAugspurger and I are looking into this issue. We changed […]

Thanks @andersy005, much appreciated! I've been thinking it may be worth constructing a new test case for this. The test case we have now has very small chunks, and I'm not sure we're effectively exercising the server-side cluster to an extent that would fully illuminate the issue here.
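A heavier test case could be as simple as publishing a dataset with much larger chunks. A minimal sketch, where the sizes are arbitrary assumptions chosen so each chunk takes noticeable time to compute and transfer:

```python
# Sketch of a heavier test dataset: the shapes and chunk sizes here are
# arbitrary assumptions, chosen so each chunk is ~100 MB rather than tiny.
import dask.array as da
import xarray as xr

air = da.random.random((365, 720, 1440), chunks=(30, 720, 1440)).astype("f4")
ds = xr.Dataset({"air": (("time", "lat", "lon"), air)})
print(ds.air.data.chunksize)  # (30, 720, 1440) -> ~124 MB per chunk
```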
One question I had: what level should the parallelism be happening at? Will it just be at the request level, or will each request also be getting its results in parallel? IIUC, with a typical async webserver you're serving many requests simultaneously on a single thread, since most of the time you're just waiting around on IO. In this case, a user filters to some subset which is eventually mapped to some set of keys, e.g.:

```python
import requests
import dask

# fire 100 GETs at a single chunk key via dask.delayed
responses = [
    dask.delayed(requests.get, pure=False)("http://localhost:9000/air/0.0.0")
    for i in range(100)
]
dask.compute(responses)
```

Anderson can correct me if I'm wrong, but I think we didn't observe any parallelism there.
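One way to separate client-side effects from server-side ones is to drive the same load with a plain thread pool instead of dask, so any serialization that remains has to be happening on the server. A sketch, assuming the same `localhost:9000/air/0.0.0` endpoint as above:

```python
# Sketch: issue many concurrent GETs from a plain thread pool, so any lack of
# parallelism observed is attributable to the server rather than to the
# client-side dask scheduler. Assumes the xpublish server from the snippet above.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:9000/air/0.0.0"


def fetch(_):
    return requests.get(URL).status_code


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    statuses = list(pool.map(fetch, range(100)))
print(f"100 requests in {time.perf_counter() - start:.2f}s, statuses: {set(statuses)}")
```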
@TomAugspurger - I think we're on the same page here. Dask question: can two threads call […]? I was trying to avoid adding any dask-distributed functionality in xpublish, but it seems we may need to support something like […]
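If the blocking per-chunk compute is the problem, one option that avoids dask-distributed entirely is to keep the endpoint `async def` but push the blocking work onto a thread pool. A sketch of that standard asyncio pattern, where `get_data_chunk` is a placeholder standing in for xpublish's real helper (not its actual signature):

```python
# Sketch only: offload the blocking per-chunk compute to a thread pool so the
# event loop can keep serving other requests. `get_data_chunk` is a placeholder
# for xpublish's real helper, not its actual code or signature.
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI, Response

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=8)


def get_data_chunk(var: str, chunk: str) -> bytes:
    # Placeholder for the real (blocking) chunk lookup + .compute() + serialization.
    return b"\x00" * 1024


@app.get("/{var}/{chunk}")
async def get_chunk(var: str, chunk: str) -> Response:
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(executor, get_data_chunk, var, chunk)
    return Response(content=data, media_type="application/octet-stream")
```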
I'm not sure either, but I'd also guess yes and yes. Async clients seem like they might help. One thing is still confusing me, though: why do we ever get a Dask array back in […]?
While looking at the dashboard for the server, I am noticing that only one worker is active. It is likely that #12 didn't fully address the asynchronous fetching.
Not sure if this is a sign that we still need an async client on the server side.
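For what it's worth, here is a minimal sketch of what an async client on the server side could look like, using dask.distributed's documented asynchronous mode against a throwaway in-process cluster. This is not what xpublish currently does:

```python
# Sketch: dask.distributed's asynchronous mode lets the server *await* chunk
# computations instead of blocking on .compute(). Uses a throwaway in-process
# cluster; not what xpublish currently does.
import asyncio

import dask.array as da
from dask.distributed import Client


async def main():
    client = await Client(asynchronous=True, processes=False)
    try:
        # Stand-in for a single chunk of a published variable.
        chunk = da.ones((100, 100), chunks=(100, 100))
        # client.compute returns a future right away; awaiting it yields control
        # back to the event loop while the cluster does the work.
        result = await client.compute(chunk)
        print(result.shape)
    finally:
        await client.close()


asyncio.run(main())
```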
I believe we are passing an entire variable array to […]
We are currently using FastAPI as xpublish's web framework / api engine. FastAPI supports async out of the box (details here: https://fastapi.tiangolo.com/async/).
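For context, a minimal illustration (not xpublish code) of FastAPI's documented behavior: plain `def` path operations are run in an external threadpool, while `async def` ones run directly on the event loop, so a blocking call such as `.compute()` inside an `async def` endpoint stalls every other request until it finishes.

```python
# Minimal illustration (not xpublish code) of how FastAPI treats the two styles.
import time

from fastapi import FastAPI

app = FastAPI()


@app.get("/blocking-async")
async def blocking_async():
    # Runs on the event loop: this sleep blocks *all* concurrent requests.
    time.sleep(1)
    return {"ok": True}


@app.get("/blocking-def")
def blocking_def():
    # Plain `def` endpoints are run in a threadpool by FastAPI/Starlette,
    # so other requests can still be served while this one sleeps.
    time.sleep(1)
    return {"ok": True}
```

Hitting the two endpoints concurrently (for example with the thread-pool snippet earlier in the thread) makes the difference easy to see.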
Some initial experimentation indicates that we're not getting the expected async behavior. I suspect this is somehow related to how we're using dask to fetch chunks of data. Here's what we have now:
GET chunk endpoint:
https://github.com/jhamman/xpublish/blob/044e9b7b07202f57302a9368b16c2db4cddaa79a/xpublish/rest.py#L140-L143
Which calls the `get_key` method:
https://github.com/jhamman/xpublish/blob/044e9b7b07202f57302a9368b16c2db4cddaa79a/xpublish/rest.py#L93-L105

Within `get_data_chunk`, we call `compute()` on individual chunks:
https://github.com/jhamman/xpublish/blob/044e9b7b07202f57302a9368b16c2db4cddaa79a/xpublish/rest.py#L254-L255
Is this the best way to do this? Do we need to modify how dask's scheduler sees these tasks, or configure it to use async?
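For reference, these are the standard scheduler-level knobs that exist today; an illustrative sketch only, not a recommendation for which one xpublish should use:

```python
# Standard dask scheduler options that may be relevant here (illustrative only;
# which one fits xpublish best is exactly the open question).
import dask
import dask.array as da

chunked = da.ones((1000, 1000), chunks=(100, 100))
one_chunk = chunked.blocks[0, 0]

# 1. Default threaded scheduler: .compute() blocks the calling thread until done.
result = one_chunk.compute(scheduler="threads")

# 2. Synchronous scheduler: useful for debugging, no parallelism at all.
with dask.config.set(scheduler="synchronous"):
    result = one_chunk.compute()

# 3. A distributed Client (sync, or asynchronous=True as sketched above) hands
#    work to a cluster, which is where an awaitable, non-blocking compute comes from.
```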