I’m currently facing an issue with requests whose batch size is larger than the max_batch_size of the model hosted in Triton. The PyTriton chunking guide suggests this can be addressed, but I’m not sure how to implement it using the Triton client.
Related Open Issues
Installing pytriton also pulls in the Triton binaries, which I don’t need for client-side use. I found this issue where others have mentioned the lack of a lightweight pytriton.client package. Any updates on this?
There’s an ongoing discussion in Triton server issue #4547 about handling large requests, but there haven’t been updates there either.
Questions
How can I handle requests where the batch size exceeds the model’s max_batch_size? Specifically, I’d like to know how to split these large requests efficiently and send them to Triton in smaller batches.
Could you provide a minimal working example using TritonClient?
I’ve seen the PyTriton example, which includes asynchronous support, but I’m looking for something similar with TritonClient.
If possible, an example using concurrent.futures or async functionality would be very helpful (I’ve included a rough sketch of what I have in mind after these questions).
Is there a plan to release a standalone pytriton.client package to avoid installing the full pytriton? Alternatively, is there a plan to include this batch splitting logic in Triton server itself?
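To make the question concrete, here is the rough pattern I’ve been experimenting with on my side, using the standard tritonclient.http package rather than pytriton. The model name, input/output names, dtype, and max_batch_size below are placeholders for my setup, and I don’t know whether this is the recommended approach, which is exactly what I’m asking about:

```python
# Rough sketch: split an oversized batch into chunks of at most max_batch_size
# and send the chunks concurrently via the standard tritonclient HTTP client.
# MODEL_NAME, input/output names, dtype, and MAX_BATCH_SIZE are placeholders.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tritonclient.http as httpclient

URL = "localhost:8000"
MODEL_NAME = "my_model"   # placeholder
MAX_BATCH_SIZE = 8        # placeholder, must match the model's config


def infer_chunk(chunk: np.ndarray) -> np.ndarray:
    # One synchronous request per chunk; a separate client instance per call
    # since I'm not sure the HTTP client is safe to share across threads.
    client = httpclient.InferenceServerClient(url=URL)
    infer_input = httpclient.InferInput("INPUT__0", list(chunk.shape), "FP32")
    infer_input.set_data_from_numpy(chunk)
    result = client.infer(
        model_name=MODEL_NAME,
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
    )
    return result.as_numpy("OUTPUT__0")


def infer_large_batch(data: np.ndarray) -> np.ndarray:
    # Split along the batch dimension and fan the chunks out to a thread pool;
    # pool.map preserves order, so the outputs can simply be concatenated.
    chunks = [data[i:i + MAX_BATCH_SIZE] for i in range(0, len(data), MAX_BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        outputs = list(pool.map(infer_chunk, chunks))
    return np.concatenate(outputs, axis=0)


if __name__ == "__main__":
    big_batch = np.random.rand(50, 3).astype(np.float32)  # 50 > MAX_BATCH_SIZE
    print(infer_large_batch(big_batch).shape)
```

Is something along these lines what the chunking guide has in mind, or is there a built-in way to do the splitting on the client side?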
Thanks in advance!