Is your feature request related to a problem? Please describe.
The ONNX Runtime backend in Triton Inference Server lacks direct support for minShapes, optShapes, and maxShapes in the model configuration when TensorRT optimization is enabled. While ONNX Runtime itself supports these parameters through its TensorRT Execution Provider options, their absence from Triton's ONNX Runtime backend limits efficient handling of models with dynamic input shapes.
Describe the solution you'd like
I propose adding support for the following parameters directly in Triton's ONNX Runtime backend configuration:
trt_profile_min_shapes
trt_profile_opt_shapes
trt_profile_max_shapes
This addition would enable optimized handling of dynamic input shapes within Triton, improving the performance and flexibility of models that use TensorRT (see the sketch below).
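A minimal sketch of how this could look in config.pbtxt, assuming the new keys would be exposed through the same key/value parameter mechanism the backend already uses for TensorRT accelerator options such as precision_mode. The input name and dimensions are placeholders, and the shape strings follow ONNX Runtime's input_name:dim1xdim2x... convention:

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        # Proposed keys (hypothetical, not currently supported):
        parameters { key: "trt_profile_min_shapes" value: "input:1x3x224x224" }
        parameters { key: "trt_profile_opt_shapes" value: "input:8x3x224x224" }
        parameters { key: "trt_profile_max_shapes" value: "input:16x3x224x224" }
      }
    ]
  }
}
```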
Describe alternatives you've considered
Manually compiling the TensorRT engine with these shape ranges before loading it into Triton (as shown below). However, this approach is less integrated and less flexible than having direct support in the Triton configuration.
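For reference, a sketch of that workaround using trtexec; the input name and dimensions are placeholders for a model with a dynamic batch dimension:

```
# Build an engine with an explicit optimization profile, then serve the
# resulting model.plan with Triton's TensorRT backend instead of the
# ONNX Runtime backend.
trtexec --onnx=model.onnx \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:16x3x224x224 \
        --saveEngine=model.plan
```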