Custom CapacityLimiter
#1724
I think that having a separate capacity limiter makes a lot of sense - if only so that starlette can handle without having its capacity being swallowed by tasks from other applications. Should we also have a thought about whether the usage of |
starlette/starlette/datastructures.py Line 460 in a3b43f0
I think it may be hard to partition out all uses of thread pools into individual Aside from making individual So users can choose:
# Proposed API sketch: the `limiter=` parameters below do not exist in Starlette today.
from anyio import CapacityLimiter
from starlette.responses import Response
from starlette.routing import Route


def sync_route(req):
    return Response()


# individual limiters
routes = [
    Route("/foo", sync_route, limiter=CapacityLimiter(1000)),
    Route("/bar", sync_route, limiter=CapacityLimiter(5)),
]

# shared limiter
limiter = CapacityLimiter(100)


async def async_route(req):
    files = await req.form(limiter=CapacityLimiter(10))  # used for writing to the file
    await files["file"].read(limiter=limiter)  # override the above limiter for reading
    return Response()


routes = [
    Route("/foo", async_route),
    Route("/bar", endpoint, limiter=limiter),
]
|
I don't think the user should have control over this. I think Starlette should be able to figure out the optimal way internally. |
I think it will be hard to find an "optimal way", this seems super workload / business logic dependent. |
Pushing such a depth of choice to the users of starlette is not a real solution either - it would make the API a lot more fiddly, for what I suspect is a minority of users. Here is my attempt at categorizing all the uses of
Out of these, I would note that only the cases in bold run user-defined code.
I think that, as a first step, it does not seem necessary to define a custom capacity limiter for the features that do not run user-defined code. Why? The starlette built-in usages of synchronous calls are always made as a "dip", which should multiplex as well as humanly possible with any other tasks that usually use
As a second step, I would suggest choosing a single number (e.g. 40), and defining a separate capacity limiter for each of the lines in bold. Why?
|
The fiddly API would only apply to the small minority of users that want to use the feature; the vast majority of users would only see an extra keyword-only parameter, which in my opinion is not a big issue. But that's the last I'll say on that, I don't mean to force a solution.
I do think you make an excellent point w.r.t. BackgroundTasks: the last thing we want is BackgroundTasks blocking requests. So yes, I think a good first step would be to give BackgroundTasks their own capacity limiter. I don't think we can use a module level |
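The point about background work blocking requests can be illustrated with a minimal sketch. It uses only the standard library, and an `asyncio.Semaphore` stands in for anyio's `CapacityLimiter`; the helper names (`run_in_thread`, `request_latency`) and the token counts are assumptions made up for illustration, not Starlette or anyio APIs.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def run_in_thread(limiter: asyncio.Semaphore, pool: ThreadPoolExecutor, func, *args):
    """Run a blocking function in `pool` while holding one token of `limiter`."""
    async with limiter:
        return await asyncio.get_running_loop().run_in_executor(pool, func, *args)


def slow_background_job(seconds: float) -> None:
    time.sleep(seconds)  # stands in for blocking user code in a background task


def sync_endpoint() -> str:
    return "response"  # stands in for a fast sync endpoint


async def request_latency(shared: bool) -> float:
    """Time one request while 4 slow background jobs are already queued."""
    # More workers than tokens, so only the limiters throttle the work.
    pool = ThreadPoolExecutor(max_workers=8)
    request_limiter = asyncio.Semaphore(2)
    background_limiter = request_limiter if shared else asyncio.Semaphore(2)

    background = [
        asyncio.create_task(run_in_thread(background_limiter, pool, slow_background_job, 0.2))
        for _ in range(4)
    ]
    await asyncio.sleep(0.05)  # let the background jobs take their tokens first

    start = time.monotonic()
    await run_in_thread(request_limiter, pool, sync_endpoint)
    latency = time.monotonic() - start

    await asyncio.gather(*background)
    pool.shutdown()
    return latency


shared_latency = asyncio.run(request_latency(shared=True))
separate_latency = asyncio.run(request_latency(shared=False))
print(f"shared limiter:    {shared_latency:.2f}s")
print(f"separate limiters: {separate_latency:.2f}s")
```

With a shared limiter the request has to wait for background jobs to release a token; with separate limiters it starts right away.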
Here's an example of why I think we should let users set this value: harsh8398/fastapi-dependency-issue#2. The specific issue is with FastAPI's dependency injection system, but the same thing would apply to slow sync endpoints. I don't use sync dependencies like this but if I were tasked with fixing this in a real app, I would just pump up the capacity limit and hardware as needed until it's no longer the bottleneck. The easiest way to do that would be to let users set that value by passing a |
@Kludex Was this discussed again in gitter? Any links to it? |
I think this was mentioned on FastAPI issues. I don't recall an old discussion on gitter about this. |
I'm going to write down here how to change the default
Right now, you can modify the number of
import anyio
from time import sleep

from starlette.applications import Starlette
from starlette.background import BackgroundTasks
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route

num = 0


def count_sleep():
    global num
    num += 1
    print(f"Running number {num}.")
    sleep(10)


def endpoint(request: Request) -> JSONResponse:
    tasks = BackgroundTasks()
    tasks.add_task(count_sleep)
    return JSONResponse({"message": "Hello, world!"}, background=tasks)


# THIS IS THE ADDITION
async def startup():
    # Raise the default thread limiter from 40 to 100 tokens.
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 100


app = Starlette(routes=[Route("/", endpoint)], on_startup=[startup])
You can perform the same query as mentioned:
This time, you are NOT going to have the same behavior as mentioned on:
The behavior now is:
No waiting time. |
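The effect of raising the token count can be reproduced in miniature. This is a sketch under stated assumptions: it uses only the standard library, an `asyncio.Semaphore` stands in for the anyio thread limiter, and `drain_time` with its job count and sleep length is a hypothetical helper, not part of Starlette or anyio.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def drain_time(total_tokens: int, jobs: int = 10, job_seconds: float = 0.1) -> float:
    """Return how long `jobs` blocking calls take when at most `total_tokens` run at once."""

    async def main() -> float:
        limiter = asyncio.Semaphore(total_tokens)  # stand-in for the thread limiter
        loop = asyncio.get_running_loop()
        # One worker per job, so the limiter is the only bottleneck.
        with ThreadPoolExecutor(max_workers=jobs) as pool:

            async def one_job() -> None:
                async with limiter:
                    await loop.run_in_executor(pool, time.sleep, job_seconds)

            start = time.monotonic()
            await asyncio.gather(*(one_job() for _ in range(jobs)))
            return time.monotonic() - start

    return asyncio.run(main())


small = drain_time(total_tokens=4)   # 10 jobs run in 3 waves
large = drain_time(total_tokens=10)  # all 10 jobs run in a single wave
print(f"4 tokens:  {small:.2f}s")
print(f"10 tokens: {large:.2f}s")
```

Once the limiter has at least as many tokens as there are pending jobs, nothing waits for a free token, which is the "no waiting time" behavior described above.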
That is indeed helpful information! I imagine for a lot of users currently experiencing issues, that's a viable short term solution 🎉! That said, I think there are still conversations to be had, because there are very real situations where you might want one thing (endpoint, FastAPI dependency, background task, etc.) to have a limited number of threads to avoid exhausting memory and another thing to have a lot more threads |
This is actually how the vast majority of Flask applications are tuned, with multiple different kinds of workers and thread settings. Gunicorn allows each of the workers to have multiple threads.
This is very application dependent and something that we have tuned over time. Having control over this is very important. It's OK if you are not picking it up from the Gunicorn args, but this control is definitely important. |
We had a similar situation in Gradio, and resolved it via this kind of approach. Wanted to share to support the issue. |
First step here would be to add some documentation. |
PR welcome to document this behavior, and how to overcome it. (I don't know where would be most suitable...) EDIT:
Either a new page, or suggest something. |
Let's only document this. I've already talked to @agronholm some weeks ago about this, and he was cool about just documenting it. |
Any ideas where's the best place to document this?
import anyio
from starlette.applications import Starlette
from starlette.background import BackgroundTasks
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route

limiter = anyio.to_thread.current_default_thread_limiter()
limiter.total_tokens = 100


def do_something():
    ...


def endpoint(request: Request) -> JSONResponse:
    tasks = BackgroundTasks()
    tasks.add_task(do_something)
    return JSONResponse({"message": "Hello, world!"}, background=tasks)


app = Starlette(routes=[Route("/", endpoint)]) |
No. I think you can suggest with a PR, and we can see if it fits. |
And this limit doesn't change even if I change the number of workers. With workers=1 and workers=10, it still serves just 40 sync requests. I would hope that with 10 uvicorn workers it could handle 400 requests. Also, comparing the same to Flask, I was even able to run 1000 parallel sync functions easily without any limitation. One more thing: these 40 should be a dynamic number, like number of CPUs * 10 or something. |
It's not like additional CPUs will make it go any faster with the GIL blocking parallelism. |
@agronholm We should be good on that from Python 3.13. Since it's a long-standing issue with no solution, that seems like a good route to take. |
Really? Based on what? If you're referring to the free-threaded build, that thing still has ways to go before it's actually usable. |
@agronholm Oh, I see. I just looked it up. My apologies, was under the impression it was stable already. |
Starlette is a web framework that supports both async and sync functions. The sync code runs in a thread pool.
The thread pool has a maximum of 40 threads:
https://github.com/agronholm/anyio/blob/4f3a8056a8b14dbe43c95039a0d731ede1083cb7/src/anyio/_backends/_asyncio.py#L2071-L2077
The concern in this issue is that those threads are shared between the code that handles the endpoints and the background tasks.
Assume we have a simple application:
Running it with uvicorn:
And performing some requests (using httpie):
We can observe that:
Running number 40.
Running number 80.
Running number 100.
I'm just bringing this up, so people are aware.
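The pauses at 40, 80 and 100 are consistent with simple arithmetic on the limiter size. The sketch below assumes 100 requests were sent (inferred from the last printed number) and that each background job holds a thread for its whole duration; `wave_sizes` is a hypothetical helper written for illustration.

```python
import math


def wave_sizes(jobs: int, total_tokens: int) -> list[int]:
    """Cumulative number of jobs started after each wave of `total_tokens` threads."""
    waves = math.ceil(jobs / total_tokens)
    return [min((i + 1) * total_tokens, jobs) for i in range(waves)]


print(wave_sizes(100, 40))   # [40, 80, 100]
print(wave_sizes(100, 100))  # [100]
```

With the default of 40 tokens the jobs start in three waves, each wave waiting for the previous one to finish; with 100 tokens they all start in a single wave.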
@agronholm proposed on Gitter that we create a separate CapacityLimiter dedicated only to handling the application (i.e. request_response()). This means that n threads (depending on the number of tokens we choose) would be dedicated to request_response().