
Request for Supporting minShapes/optShapes/maxShapes for TensorRT #232

Open

teith opened this issue Jan 15, 2024 · 1 comment

teith commented Jan 15, 2024

Is your feature request related to a problem? Please describe.
The ONNX Runtime backend in Triton Inference Server has no direct support for minShapes, optShapes, and maxShapes in the model configuration when TensorRT optimization is enabled. ONNX Runtime itself supports these parameters for its TensorRT execution provider, but their absence from Triton's ONNX Runtime backend limits efficient handling of models with dynamic input shapes.

Describe the solution you'd like
I propose adding support for the following parameters directly in Triton's ONNX Runtime backend configuration:
- `trt_profile_min_shapes`
- `trt_profile_opt_shapes`
- `trt_profile_max_shapes`
This addition would enable optimized handling of dynamic input sizes within Triton, improving the performance and flexibility of models utilizing TensorRT.
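
For illustration, the proposed keys might be set in `config.pbtxt` the same way the backend's existing TensorRT accelerator options are. A minimal sketch, assuming a model with one dynamic input named `input_ids` (the input name and shape values are placeholders; the shape syntax follows the ONNX Runtime TensorRT execution provider's `<input>:<dims>` convention):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ {
      name : "tensorrt"
      # Proposed keys from this request, not currently supported options.
      # Hypothetical values for a dynamic input "input_ids" (batch x sequence).
      parameters { key: "trt_profile_min_shapes" value: "input_ids:1x1" }
      parameters { key: "trt_profile_opt_shapes" value: "input_ids:8x128" }
      parameters { key: "trt_profile_max_shapes" value: "input_ids:32x512" }
    } ]
  }
}
```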

Describe alternatives you've considered
Manually compiling the TensorRT engine with these shape ranges before loading it into Triton. However, this approach is less integrated and less flexible than direct support in the Triton configuration.
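
For reference, a sketch of that workaround using `trtexec` (the input name and shapes are placeholders), with the resulting engine then served through Triton's TensorRT backend instead of the ONNX Runtime backend:

```shell
trtexec --onnx=model.onnx \
        --minShapes=input_ids:1x1 \
        --optShapes=input_ids:8x128 \
        --maxShapes=input_ids:32x512 \
        --saveEngine=model.plan
```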

@gedoensmax
Contributor

#217 would solve this.
