Is your feature request related to a problem? Please describe.
The ONNX Runtime backend in Triton Inference Server lacks direct support for minShapes, optShapes, and maxShapes in the model configuration when TensorRT optimization is enabled. While ONNX Runtime itself supports these parameters through its TensorRT Execution Provider options, their absence from Triton's ONNX Runtime backend limits efficient handling of models with dynamic input shapes.
Describe the solution you'd like
I propose adding support for the following parameters directly in Triton's ONNX Runtime backend configuration:
trt_profile_min_shapes
trt_profile_opt_shapes
trt_profile_max_shapes
This addition would enable optimized handling of dynamic input shapes within Triton, improving the performance and flexibility of models that use TensorRT (see the sketch below).
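A minimal sketch of how this could look in config.pbtxt, assuming the new keys would be exposed through the same key/value parameter mechanism the backend already uses for TensorRT accelerator options such as precision_mode. The input name and dimensions are placeholders, and the shape strings follow ONNX Runtime's input_name:dim1xdim2x... convention:

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        # Proposed keys (hypothetical, not currently supported):
        parameters { key: "trt_profile_min_shapes" value: "input:1x3x224x224" }
        parameters { key: "trt_profile_opt_shapes" value: "input:8x3x224x224" }
        parameters { key: "trt_profile_max_shapes" value: "input:16x3x224x224" }
      }
    ]
  }
}
```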
Describe alternatives you've considered
Manually compiling the TensorRT engine with these shape ranges before loading it into Triton (as shown below). However, this approach is less integrated and less flexible than having direct support in the Triton configuration.
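For reference, a sketch of that workaround using trtexec; the input name and dimensions are placeholders for a model with a dynamic batch dimension:

```
# Build an engine with an explicit optimization profile, then serve the
# resulting model.plan with Triton's TensorRT backend instead of the
# ONNX Runtime backend.
trtexec --onnx=model.onnx \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:16x3x224x224 \
        --saveEngine=model.plan
```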