How to control request size within a batch? #2030
Replies: 2 comments
-
I'm also curious about this, and also about the relationship between `--max-batch-size` and `--max-concurrent-requests`.
-
The `--max-batch-size` flag limits how many requests are grouped together, but TGI adjusts the batch dynamically under heavy load (e.g. due to long sequences or backpressure). This can cause the batch size to exceed your set limit. How to fix it: reduce `--max-total-tokens` to cap the total number of tokens in a batch.
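As an illustration of combining these flags, a launch along these lines could be used; the model id and the numeric limits are placeholders (not recommendations), while the flag names themselves come from `text-generation-launcher`:

```shell
# Hedged sketch of a text-generation-launcher invocation.
# <your-model> and the numeric values are illustrative placeholders.
text-generation-launcher \
  --model-id <your-model> \
  --max-batch-size 10 \
  --max-concurrent-requests 128 \
  --max-total-tokens 4096
```

Lowering `--max-total-tokens` tightens the token budget per request, which in turn constrains how large a batch can grow even when the scheduler packs it opportunistically.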
-
Hi all,
I've tried the `--max-batch-size` option, but it doesn't work as I expected: I thought it was supposed to limit `tgi_batch_current_size`. I'd like to control the queue size and how many requests go into each inference batch. Can someone please clarify? These are my launch commands and version:

My situation: with `--max-batch-size 10`, `tgi_batch_current_size` goes over 10 after a short while.

Expected situation: with `--max-batch-size 10`, `tgi_batch_current_size` stays at 10, and the other 30 requests stay in `tgi_queue_size` until the batch is done.

TLDR: `tgi_batch_current_size` doesn't align with `--max-batch-size`. Thanks in advance.