trtexec dynamic batch size #976
Comments
(1) (2) When you load the engine, you need to pass in the actual shape you would like to run the inference with by adding the `--shapes` flag.
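For reference, a minimal sketch of that step; the input name `input` and its dimensions are placeholders, not taken from the original model:

```
# Load a dynamic-shape engine and pass the actual shape to benchmark.
trtexec --loadEngine=model_dynamic.engine --shapes=input:32x128
```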
Thanks @nvpohanh, but the inference time is more than 50x that of the TRT model with a fixed batch size, judging from the GPU latency that trtexec reports for the dynamic-batch model when running inference with the shapes params.
trtexec returns the runtime per inference, where an "inference" is a query of batch_size=N which you specified.
It is possible, although in this case I would be surprised by this difference. Could you share the commands you use to build the "default model" and the "dynamic batch (batch=1) model"? The default should be identical (or equivalent) to building with a static batch size of 1.
Maybe I did not describe it clearly. Build commands:
dynamic batch model: built with min/opt/max shapes covering batch sizes 1 to 32 (opt = 8)
Inference test:
dynamic batch model: inference with batch sizes of 1 and 32
(Typical forms of these commands are sketched below.)
I see. Since "optShapes" is batch_size=8, TRT will use batch_size=8 to choose the fastest tactics, which may not be optimal for batch_size=1 and batch_size=32. Could you try changing optShapes from 8 to 1 or 32 and see if that helps improve the inference time?
After changing optShapes from 8 to 1 and 32, I built another two models, but it seems that they are still much slower than the model built with a fixed batch size.
I see. That probably means there are some faster tactics that take advantage of static shapes but won't work with dynamic shapes. Could you try setting max=opt=min=inference=32 and see if that helps the latency for the batch_size=32 case?
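A sketch of that suggestion, again with placeholder input name and dimensions:

```
# Build with min = opt = max = 32 so the profile is pinned to batch 32 ...
trtexec --onnx=model.onnx \
        --minShapes=input:32x128 \
        --optShapes=input:32x128 \
        --maxShapes=input:32x128 \
        --saveEngine=model_b32.engine

# ... and benchmark at the same batch size.
trtexec --loadEngine=model_b32.engine --shapes=input:32x128
```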
I tried that, but trtexec always returns an error.
@zirui Could you try the following?
It works! And when testing with the new model built with that setting, the latency looks much better.
I see. I think the mystery now is: does a model built with min=opt=max=1 perform the same as the default model?
I rebuilt the models with different min-opt-max shapes and ran the performance test again. From the table, we can see that even the min=opt=max=1 build does not match the default model. (The two builds being compared are sketched below.)
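For reference, the two builds being compared would look roughly like this (placeholder input name and dimensions):

```
# "Default" model: no shape flags; trtexec fixes the dynamic dims to 1.
trtexec --onnx=model.onnx --saveEngine=model_default.engine

# 1-1-1 model: an explicit optimization profile pinned to batch 1.
trtexec --onnx=model.onnx \
        --minShapes=input:1x128 \
        --optShapes=input:1x128 \
        --maxShapes=input:1x128 \
        --saveEngine=model_111.engine
```

In principle the two should be equivalent, which is why the gap in the table is the surprising part.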
@zirui Would it be possible for you to provide the ONNX file so that we can debug this gap? Thanks.
I am unable to provide the model file due to the restrictions of my private network environment.
Thanks. I think the main focus would be: why is 1-1-1 different from the default model? |
Hi @nvpohanh,
@zirui I see that your onnx file's input shapes have two dynamic axes.
I saw the warning logs when building, but I found that the model could still run inference when I specify the shapes. So does this mean the default model runs inference with a fixed shape of 1x1?
This is because the engine has been built with a fixed shape of 1x1, so the input shape is no longer dynamic. That also means trtexec won't try to set the input shape again (it's already static, anyway). I do think we should improve trtexec so that it throws errors (or at least warnings) if user-provided shapes are not used. |
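Concretely, with placeholder names:

```
# The default engine was built with its dynamic dims fixed to 1x1,
# so this shape request is not applied; the engine is already static.
trtexec --loadEngine=model_default.engine --shapes=input:32x128
```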
@nvpohanh
Hey, you said that ONNX "only supports explicit batch dimension." What does that mean in practice?
Starting from TRT 7 (I think), ONNX is always parsed with an explicit batch dimension, so the `--explicitBatch` flag is effectively implied.
My input shape is 1x2x2x20, and the output shape is (1,). I can build and run the engine with this fixed shape, but how can I make the batch size dynamic? Thanks.
@KDr2 When you export the ONNX file, could you specify the batch dimension to be a dynamic dimension? If you are using PyTorch, then use the `dynamic_axes` argument of `torch.onnx.export`.
Hi @nvpohanh, my ONNX model is converted from a TensorFlow saved model. Above is the tensor info of the input, and then I used tf2onnx to convert it. Thanks.
I am not very familiar with tf2onnx, but it seems that you can override the input shape during the conversion so that the batch dimension becomes dynamic.
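A sketch of what that could look like with the tf2onnx CLI; the tensor name `input:0` is a placeholder, and the exact flags may differ between tf2onnx versions:

```
# Re-convert the saved model, overriding the batch dimension with -1
# so that it is exported as a dynamic dimension.
python -m tf2onnx.convert \
    --saved-model ./saved_model \
    --output model_dynamic.onnx \
    --inputs "input:0[-1,2,2,20]"
```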
Excuse me, are you comparing inference time between the torch model and the TRT model? Usually the inference time of a torch model is higher than that of the TRT model.
Description
I tried to convert my ONNX model to a TensorRT model with trtexec, and I want the batch size to be dynamic, but I failed with two problems:
1. Converting with the `--maxBatch` param failed.
2. Converting with the shape params and then loading the model with trtexec did not work as expected (see Steps To Reproduce below).
Environment
TensorRT Version: 7.2.1
GPU Type: Tesla P40
Nvidia Driver Version: 418.40.04
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.6
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.11-py3
Relevant Files
Steps To Reproduce
1. Convert the ONNX model with the `--maxBatch` param (see the error logs returned by trtexec).
2. (1) Convert to a TensorRT model by specifying the shape params. (2) Load the model with trtexec.
(Typical commands for both steps are sketched below.)
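For reference, sketches of the two attempts; the input name `input` and its dimensions are placeholders, not the original commands:

```
# Attempt 1: the implicit-batch --maxBatch flag; this fails for ONNX models,
# which require an explicit batch dimension in TRT 7+.
trtexec --onnx=model.onnx --maxBatch=32 --saveEngine=model.engine

# Attempt 2: an explicit optimization profile for the dynamic batch,
# then loading the engine and selecting the runtime shape.
trtexec --onnx=model.onnx \
        --minShapes=input:1x128 \
        --optShapes=input:8x128 \
        --maxShapes=input:32x128 \
        --saveEngine=model_dynamic.engine
trtexec --loadEngine=model_dynamic.engine --shapes=input:32x128
```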