[Question] Help with Client-Side Batching for Large Requests in Triton #818

Open
harsh-boloai opened this issue Jan 17, 2025 · 0 comments

I’m currently facing an issue with handling requests whose batch size is greater than the max_batch_size of the model hosted in Triton. The chunking guide for PyTriton suggests this can be addressed, but I’m not sure how to implement it using the Triton client.
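For reference, here is the kind of sequential chunking I have in mind. This is only a sketch: the model name (`my_model`), tensor names (`INPUT_0`/`OUTPUT_0`), and the `MAX_BATCH_SIZE` value are placeholders for my actual setup, and I’m not sure it’s the idiomatic way to do this with `tritonclient`.

```python
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

MAX_BATCH_SIZE = 32  # placeholder: the model's max_batch_size from config.pbtxt


def infer_chunk(client, chunk):
    # Build and send one request for a slice that fits within max_batch_size.
    inp = httpclient.InferInput("INPUT_0", list(chunk.shape), np_to_triton_dtype(chunk.dtype))
    inp.set_data_from_numpy(chunk)
    out = httpclient.InferRequestedOutput("OUTPUT_0")
    result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
    return result.as_numpy("OUTPUT_0")


def infer_large_batch(client, data):
    # Split the oversized batch along axis 0 and send each slice as its own request.
    chunks = [data[i:i + MAX_BATCH_SIZE] for i in range(0, len(data), MAX_BATCH_SIZE)]
    return np.concatenate([infer_chunk(client, c) for c in chunks], axis=0)


client = httpclient.InferenceServerClient(url="localhost:8000")
outputs = infer_large_batch(client, np.random.rand(100, 16).astype(np.float32))
```

This works, but the chunks are sent one after another, so latency grows linearly with the number of chunks, which is what leads to the questions below.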

Related Open Issues

  • Installing pytriton includes Triton binaries, which I don’t need for client-side operations. I found this issue where others have mentioned the lack of a lightweight pytriton.client package. Any updates on this?

  • There’s an ongoing discussion in Triton server issue #4547 about handling large requests, but there haven’t been updates there either.

Questions

  1. How can I handle requests where the batch size exceeds the model’s max_batch_size? Specifically, I’d like to know how to split these large requests efficiently and send them to Triton in smaller batches.

  2. Could you provide a minimal working example using TritonClient?

    • I’ve seen the PyTriton example, which includes asynchronous support, but I’m looking for something similar with TritonClient.
    • If possible, an example using concurrent.futures or async functionality would be very helpful; a rough sketch of what I’m imagining is included after this list.
  3. Is there a plan to release a standalone pytriton.client package to avoid installing the full pytriton? Alternatively, is there a plan to include this batch splitting logic in Triton server itself?
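To make question 2 concrete, this is roughly the concurrent variant I’m imagining. It reuses the placeholder names from the sketch above, and it relies on my (possibly wrong) understanding that the HTTP client’s `async_infer()` together with the `concurrency` option lets several chunked requests be in flight at once.

```python
import numpy as np
import tritonclient.http as httpclient

MAX_BATCH_SIZE = 32  # placeholder, as above

# `concurrency` sets the size of the connection pool the client uses for async_infer.
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=8)


def submit_chunk(chunk):
    # Same request construction as infer_chunk above, but non-blocking:
    # async_infer returns immediately and get_result() blocks until completion.
    inp = httpclient.InferInput("INPUT_0", list(chunk.shape), "FP32")
    inp.set_data_from_numpy(chunk)
    out = httpclient.InferRequestedOutput("OUTPUT_0")
    return client.async_infer(model_name="my_model", inputs=[inp], outputs=[out])


data = np.random.rand(100, 16).astype(np.float32)
chunks = [data[i:i + MAX_BATCH_SIZE] for i in range(0, len(data), MAX_BATCH_SIZE)]
pending = [submit_chunk(c) for c in chunks]  # all chunks submitted up front
# Results are collected in submission order, so the concatenation lines up with the input.
outputs = np.concatenate([p.get_result().as_numpy("OUTPUT_0") for p in pending], axis=0)
```

If concurrent.futures (e.g. a ThreadPoolExecutor around blocking infer calls) is the recommended route instead, an example of that would be just as useful.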

Thanks in advance!
