Refittable failure of TensorRT 8.6.1.6 when running model.onnx on GPU A100 #4299

Open
lizexu123 opened this issue Dec 24, 2024 · 9 comments

lizexu123 commented Dec 24, 2024

Description

[E] Error[3]: [runtime.cpp::~Runtime::346] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::346, condition: mEngineCounter.use_count() == 1. Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.
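
The message above describes a teardown-order rule: every engine deserialized from a runtime should be destroyed before the runtime itself. The sketch below is a toy model of that rule in plain Python and does not use TensorRT. `ToyRuntime` and `ToyEngine` are hypothetical classes invented for illustration, and the counter is only a guess at what a check like `mEngineCounter.use_count() == 1` guards against, not TensorRT's actual implementation.

```python
class ToyRuntime:
    """Hypothetical stand-in for a runtime that counts its live engines."""

    def __init__(self):
        self.live_engines = 0
        self.closed = False

    def deserialize(self):
        # Each engine created by this runtime bumps the counter.
        self.live_engines += 1
        return ToyEngine(self)

    def close(self):
        # Refuse to shut down while an engine created here is still alive.
        if self.live_engines != 0:
            raise RuntimeError("runtime destroyed before its engines")
        self.closed = True


class ToyEngine:
    """Hypothetical stand-in for an engine tied to the runtime that made it."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.closed = False

    def close(self):
        # Closing twice is harmless; the counter drops only once.
        if not self.closed:
            self.closed = True
            self.runtime.live_engines -= 1


def teardown_in_order():
    """Engine first, then runtime: the order the error message asks for."""
    runtime = ToyRuntime()
    engine = runtime.deserialize()
    engine.close()
    runtime.close()
    return runtime.closed


def teardown_out_of_order():
    """Runtime first while an engine is alive: returns the error text."""
    runtime = ToyRuntime()
    runtime.deserialize()
    try:
        runtime.close()
    except RuntimeError as exc:
        return str(exc)
    return None
```

In this model `teardown_in_order()` completes cleanly, while `teardown_out_of_order()` hits the check, which is the shape of the failure reported here.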

Environment

TensorRT Version:
TensorRT-8.6.1.6
NVIDIA GPU:
A100
NVIDIA Driver Version:

CUDA Version:
cuda-12.0
CUDNN Version:
cudnn v8.9.1

Operating System:

Python Version (if applicable):
Python 3.9
Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

./trtexec --onnx=resnet50-v2-7.onnx --saveEngine=resnet50_refit.plan --refit --timeRefit
./trtexec --onnx=yolov4.onnx --saveEngine=yolov4_refit.plan --refit --timeRefit
./trtexec --onnx=mobilenetv2-7.onnx --saveEngine=mobilenetv2-7_refit.plan --refit --timeRefit

Model link:
https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx
https://github.com/onnx/models/raw/refs/heads/main/validated/vision/object_detection_segmentation/yolov4/model/yolov4.onnx
https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/mobilenet/model/mobilenetv2-7.onnx
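
The three commands above differ only in the ONNX file and the engine name, so they can be generated from a small table. This is a minimal sketch: `refit_command` and `MODELS` are hypothetical names, and the only flags used are the ones already shown in this issue. It only builds the argument lists and does not invoke trtexec.

```python
import shlex

# ONNX file -> engine file, copied from the commands in this issue.
MODELS = {
    "resnet50-v2-7.onnx": "resnet50_refit.plan",
    "yolov4.onnx": "yolov4_refit.plan",
    "mobilenetv2-7.onnx": "mobilenetv2-7_refit.plan",
}


def refit_command(onnx_file, engine_file, trtexec="./trtexec"):
    """Return the trtexec argument list for one refit timing run."""
    return [
        trtexec,
        f"--onnx={onnx_file}",
        f"--saveEngine={engine_file}",
        "--refit",
        "--timeRefit",
    ]


if __name__ == "__main__":
    # Print each command so it can be pasted into a shell.
    for onnx_file, engine_file in MODELS.items():
        print(shlex.join(refit_command(onnx_file, engine_file)))
```

Each returned list can be passed to `subprocess.run` if you want to run the three cases back to back.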

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@anderson101866

Dear TRT experts,
A customer (lizexu123) reports a potential issue in trtexec from TRT 8.6.1.
I found that it can be reproduced with the NGC PyTorch container 24.04 (which is TRT 8.6.3).

The trtexec command under Reproduce triggers our assertion, like this:

[E] Error[3]: [runtime.cpp::~Runtime::346] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::346, condition: mEngineCounter.use_count() == 1. Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.
)

Is there an existing issue tracking this?

Reproduce

wget https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx
trtexec --onnx=resnet50-v2-7.onnx --saveEngine=resnet50_refit.plan --refit --timeRefit
The full log is attached here:
&&&& RUNNING TensorRT.trtexec [TensorRT v8603] # trtexec --onnx=resnet50-v2-7.onnx --saveEngine=resnet50_refit.plan --refit --timeRefit
[12/25/2024-03:34:21] [I] === Model Options ===
[12/25/2024-03:34:21] [I] Format: ONNX
[12/25/2024-03:34:21] [I] Model: resnet50-v2-7.onnx
[12/25/2024-03:34:21] [I] Output:
[12/25/2024-03:34:21] [I] === Build Options ===
[12/25/2024-03:34:21] [I] Max batch: explicit batch
[12/25/2024-03:34:21] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[12/25/2024-03:34:21] [I] minTiming: 1
[12/25/2024-03:34:21] [I] avgTiming: 8
[12/25/2024-03:34:21] [I] Precision: FP32
[12/25/2024-03:34:21] [I] LayerPrecisions: 
[12/25/2024-03:34:21] [I] Layer Device Types: 
[12/25/2024-03:34:21] [I] Calibration: 
[12/25/2024-03:34:21] [I] Refit: Enabled
[12/25/2024-03:34:21] [I] Version Compatible: Disabled
[12/25/2024-03:34:21] [I] ONNX Native InstanceNorm: Disabled
[12/25/2024-03:34:21] [I] TensorRT runtime: full
[12/25/2024-03:34:21] [I] Lean DLL Path: 
[12/25/2024-03:34:21] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[12/25/2024-03:34:21] [I] Exclude Lean Runtime: Disabled
[12/25/2024-03:34:21] [I] Sparsity: Disabled
[12/25/2024-03:34:21] [I] Safe mode: Disabled
[12/25/2024-03:34:21] [I] Build DLA standalone loadable: Disabled
[12/25/2024-03:34:21] [I] Allow GPU fallback for DLA: Disabled
[12/25/2024-03:34:21] [I] DirectIO mode: Disabled
[12/25/2024-03:34:21] [I] Restricted mode: Disabled
[12/25/2024-03:34:21] [I] Skip inference: Disabled
[12/25/2024-03:34:21] [I] Save engine: resnet50_refit.plan
[12/25/2024-03:34:21] [I] Load engine: 
[12/25/2024-03:34:21] [I] Profiling verbosity: 0
[12/25/2024-03:34:21] [I] Tactic sources: Using default tactic sources
[12/25/2024-03:34:21] [I] timingCacheMode: local
[12/25/2024-03:34:21] [I] timingCacheFile: 
[12/25/2024-03:34:21] [I] Heuristic: Disabled
[12/25/2024-03:34:21] [I] Preview Features: Use default preview flags.
[12/25/2024-03:34:21] [I] MaxAuxStreams: -1
[12/25/2024-03:34:21] [I] BuilderOptimizationLevel: -1
[12/25/2024-03:34:21] [I] Input(s)s format: fp32:CHW
[12/25/2024-03:34:21] [I] Output(s)s format: fp32:CHW
[12/25/2024-03:34:21] [I] Input build shapes: model
[12/25/2024-03:34:21] [I] Input calibration shapes: model
[12/25/2024-03:34:21] [I] === System Options ===
[12/25/2024-03:34:21] [I] Device: 0
[12/25/2024-03:34:21] [I] DLACore: 
[12/25/2024-03:34:21] [I] Plugins:
[12/25/2024-03:34:21] [I] setPluginsToSerialize:
[12/25/2024-03:34:21] [I] dynamicPlugins:
[12/25/2024-03:34:21] [I] ignoreParsedPluginLibs: 0
[12/25/2024-03:34:21] [I] 
[12/25/2024-03:34:21] [I] === Inference Options ===
[12/25/2024-03:34:21] [I] Batch: Explicit
[12/25/2024-03:34:21] [I] Input inference shapes: model
[12/25/2024-03:34:21] [I] Iterations: 10
[12/25/2024-03:34:21] [I] Duration: 3s (+ 200ms warm up)
[12/25/2024-03:34:21] [I] Sleep time: 0ms
[12/25/2024-03:34:21] [I] Idle time: 0ms
[12/25/2024-03:34:21] [I] Inference Streams: 1
[12/25/2024-03:34:21] [I] ExposeDMA: Disabled
[12/25/2024-03:34:21] [I] Data transfers: Enabled
[12/25/2024-03:34:21] [I] Spin-wait: Disabled
[12/25/2024-03:34:21] [I] Multithreading: Disabled
[12/25/2024-03:34:21] [I] CUDA Graph: Disabled
[12/25/2024-03:34:21] [I] Separate profiling: Disabled
[12/25/2024-03:34:21] [I] Time Deserialize: Disabled
[12/25/2024-03:34:21] [I] Time Refit: Enabled
[12/25/2024-03:34:21] [I] NVTX verbosity: 0
[12/25/2024-03:34:21] [I] Persistent Cache Ratio: 0
[12/25/2024-03:34:21] [I] Inputs:
[12/25/2024-03:34:21] [I] === Reporting Options ===
[12/25/2024-03:34:21] [I] Verbose: Disabled
[12/25/2024-03:34:21] [I] Averages: 10 inferences
[12/25/2024-03:34:21] [I] Percentiles: 90,95,99
[12/25/2024-03:34:21] [I] Dump refittable layers:Disabled
[12/25/2024-03:34:21] [I] Dump output: Disabled
[12/25/2024-03:34:21] [I] Profile: Disabled
[12/25/2024-03:34:21] [I] Export timing to JSON file: 
[12/25/2024-03:34:21] [I] Export output to JSON file: 
[12/25/2024-03:34:21] [I] Export profile to JSON file: 
[12/25/2024-03:34:21] [I] 
[12/25/2024-03:34:21] [I] === Device Information ===
[12/25/2024-03:34:21] [I] Selected Device: NVIDIA A100-SXM4-80GB
[12/25/2024-03:34:21] [I] Compute Capability: 8.0
[12/25/2024-03:34:21] [I] SMs: 108
[12/25/2024-03:34:21] [I] Device Global Memory: 81037 MiB
[12/25/2024-03:34:21] [I] Shared Memory per SM: 164 KiB
[12/25/2024-03:34:21] [I] Memory Bus Width: 5120 bits (ECC enabled)
[12/25/2024-03:34:21] [I] Application Compute Clock Rate: 1.41 GHz
[12/25/2024-03:34:21] [I] Application Memory Clock Rate: 1.593 GHz
[12/25/2024-03:34:21] [I] 
[12/25/2024-03:34:21] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[12/25/2024-03:34:21] [I] 
[12/25/2024-03:34:21] [I] TensorRT version: 8.6.3
[12/25/2024-03:34:21] [I] Loading standard plugins
[12/25/2024-03:34:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 23, GPU 424 (MiB)
[12/25/2024-03:34:27] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1667, GPU +310, now: CPU 1767, GPU 734 (MiB)
[12/25/2024-03:34:27] [I] Start parsing network model.
[12/25/2024-03:34:27] [I] [TRT] ----------------------------------------------------------------
[12/25/2024-03:34:27] [I] [TRT] Input filename:   resnet50-v2-7.onnx
[12/25/2024-03:34:27] [I] [TRT] ONNX IR version:  0.0.3
[12/25/2024-03:34:27] [I] [TRT] Opset version:    7
[12/25/2024-03:34:27] [I] [TRT] Producer name:    
[12/25/2024-03:34:27] [I] [TRT] Producer version: 
[12/25/2024-03:34:27] [I] [TRT] Domain:           
[12/25/2024-03:34:27] [I] [TRT] Model version:    0
[12/25/2024-03:34:27] [I] [TRT] Doc string:       
[12/25/2024-03:34:27] [I] [TRT] ----------------------------------------------------------------
[12/25/2024-03:34:27] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/25/2024-03:34:27] [I] Finished parsing network model. Parse time: 0.327449
[12/25/2024-03:34:27] [W] Dynamic dimensions required for input: data, but no shapes were provided. Automatically overriding shape to: 1x3x224x224
[12/25/2024-03:34:29] [I] [TRT] Graph optimization time: 1.86826 seconds.
[12/25/2024-03:34:29] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[12/25/2024-03:34:52] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/25/2024-03:34:54] [I] [TRT] Total Host Persistent Memory: 320720
[12/25/2024-03:34:54] [I] [TRT] Total Device Persistent Memory: 0
[12/25/2024-03:34:54] [I] [TRT] Total Scratch Memory: 263168
[12/25/2024-03:34:54] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 68 MiB, GPU 98 MiB
[12/25/2024-03:34:54] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 96 steps to complete.
[12/25/2024-03:34:54] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 1.18346ms to assign 4 blocks to 96 nodes requiring 9734144 bytes.
[12/25/2024-03:34:54] [I] [TRT] Total Activation Memory: 9734144
[12/25/2024-03:34:54] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +59, GPU +98, now: CPU 59, GPU 98 (MiB)
[12/25/2024-03:34:54] [I] Engine built in 32.9268 sec.
[12/25/2024-03:34:54] [I] [TRT] Loaded engine size: 99 MiB
[12/25/2024-03:34:54] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +97, now: CPU 0, GPU 97 (MiB)
[12/25/2024-03:34:54] [I] Engine deserialized in 0.350898 sec.
[12/25/2024-03:34:54] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2210, GPU 908 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 59, GPU 97 (MiB)
[12/25/2024-03:34:56] [I] Engine refitted in 300.885 ms.
[12/25/2024-03:34:56] [E] Error[3]: [runtime.cpp::~Runtime::346] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::346, condition: mEngineCounter.use_count() == 1. Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.
)
[12/25/2024-03:34:56] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +9, now: CPU 60, GPU 106 (MiB)
[12/25/2024-03:34:56] [I] Setting persistentCacheLimit to 0 bytes.
[12/25/2024-03:34:56] [I] Using random values for input data
[12/25/2024-03:34:56] [I] Input binding for data with dimensions 1x3x224x224 is created.
[12/25/2024-03:34:56] [I] Output binding for resnetv24_dense0_fwd with dimensions 1x1000 is created.
[12/25/2024-03:34:56] [I] Starting inference
[12/25/2024-03:35:00] [I] Warmup completed 204 queries over 200 ms
[12/25/2024-03:35:00] [I] Timing trace has 3066 queries over 3.00339 s
[12/25/2024-03:35:00] [I] 
[12/25/2024-03:35:00] [I] === Trace details ===
[12/25/2024-03:35:00] [I] Trace averages of 10 runs:
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975772 ms - Host latency: 1.02461 ms (enqueue 0.50674 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975258 ms - Host latency: 1.02413 ms (enqueue 0.507251 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975362 ms - Host latency: 1.02448 ms (enqueue 0.506453 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974638 ms - Host latency: 1.02379 ms (enqueue 0.50273 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.976079 ms - Host latency: 1.02496 ms (enqueue 0.50186 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975461 ms - Host latency: 1.02445 ms (enqueue 0.501781 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975674 ms - Host latency: 1.03113 ms (enqueue 0.710577 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974948 ms - Host latency: 1.02444 ms (enqueue 0.394061 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02414 ms (enqueue 0.509116 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974951 ms - Host latency: 1.02384 ms (enqueue 0.507416 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975565 ms - Host latency: 1.02483 ms (enqueue 0.50477 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975568 ms - Host latency: 1.02448 ms (enqueue 0.507593 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97485 ms - Host latency: 1.02368 ms (enqueue 0.508276 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975162 ms - Host latency: 1.02381 ms (enqueue 0.507382 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.0236 ms (enqueue 0.505246 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975772 ms - Host latency: 1.02469 ms (enqueue 0.504633 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975357 ms - Host latency: 1.02596 ms (enqueue 0.506036 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02455 ms (enqueue 0.507132 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975772 ms - Host latency: 1.02486 ms (enqueue 0.507053 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974847 ms - Host latency: 1.02352 ms (enqueue 0.506598 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975668 ms - Host latency: 1.0263 ms (enqueue 0.505295 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97485 ms - Host latency: 1.02362 ms (enqueue 0.506424 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97598 ms - Host latency: 1.02501 ms (enqueue 0.506869 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975668 ms - Host latency: 1.02469 ms (enqueue 0.50658 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975052 ms - Host latency: 1.02381 ms (enqueue 0.504706 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975674 ms - Host latency: 1.02479 ms (enqueue 0.506091 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.972906 ms - Host latency: 1.02166 ms (enqueue 0.513477 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.96655 ms - Host latency: 1.01529 ms (enqueue 0.506876 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.965945 ms - Host latency: 1.01463 ms (enqueue 0.505887 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.968399 ms - Host latency: 1.01702 ms (enqueue 0.508035 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975565 ms - Host latency: 1.02454 ms (enqueue 0.506741 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.973825 ms - Host latency: 1.02256 ms (enqueue 0.509625 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974945 ms - Host latency: 1.02458 ms (enqueue 0.506647 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97619 ms - Host latency: 1.02492 ms (enqueue 0.506335 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975262 ms - Host latency: 1.02407 ms (enqueue 0.507739 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97475 ms - Host latency: 1.02366 ms (enqueue 0.507635 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974847 ms - Host latency: 1.02358 ms (enqueue 0.502686 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02408 ms (enqueue 0.50127 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02422 ms (enqueue 0.502325 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02386 ms (enqueue 0.503674 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97547 ms - Host latency: 1.02475 ms (enqueue 0.504071 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975781 ms - Host latency: 1.02474 ms (enqueue 0.502124 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02452 ms (enqueue 0.503339 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975568 ms - Host latency: 1.02426 ms (enqueue 0.501794 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975989 ms - Host latency: 1.02463 ms (enqueue 0.503088 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974738 ms - Host latency: 1.02353 ms (enqueue 0.503644 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975055 ms - Host latency: 1.02682 ms (enqueue 0.503699 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02413 ms (enqueue 0.50155 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975665 ms - Host latency: 1.02487 ms (enqueue 0.502924 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975677 ms - Host latency: 1.02432 ms (enqueue 0.50434 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974945 ms - Host latency: 1.02402 ms (enqueue 0.50343 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974847 ms - Host latency: 1.02358 ms (enqueue 0.504077 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02414 ms (enqueue 0.500336 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975165 ms - Host latency: 1.02401 ms (enqueue 0.501819 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975568 ms - Host latency: 1.02419 ms (enqueue 0.503223 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975269 ms - Host latency: 1.02391 ms (enqueue 0.503058 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02404 ms (enqueue 0.502777 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975372 ms - Host latency: 1.02452 ms (enqueue 0.503027 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974957 ms - Host latency: 1.0233 ms (enqueue 0.502722 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974237 ms - Host latency: 1.02315 ms (enqueue 0.507599 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975769 ms - Host latency: 1.02458 ms (enqueue 0.505664 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02382 ms (enqueue 0.501257 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975262 ms - Host latency: 1.02405 ms (enqueue 0.502356 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975055 ms - Host latency: 1.02338 ms (enqueue 0.50238 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02516 ms (enqueue 0.504785 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975476 ms - Host latency: 1.0237 ms (enqueue 0.503766 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975372 ms - Host latency: 1.02384 ms (enqueue 0.503784 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974945 ms - Host latency: 1.02365 ms (enqueue 0.502289 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975153 ms - Host latency: 1.02365 ms (enqueue 0.503186 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.0238 ms (enqueue 0.514526 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975275 ms - Host latency: 1.02388 ms (enqueue 0.507507 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975873 ms - Host latency: 1.02474 ms (enqueue 0.502631 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974225 ms - Host latency: 1.02286 ms (enqueue 0.507581 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975262 ms - Host latency: 1.02387 ms (enqueue 0.513861 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02369 ms (enqueue 0.513855 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975226 ms - Host latency: 1.02391 ms (enqueue 0.503503 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02403 ms (enqueue 0.504602 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975561 ms - Host latency: 1.02442 ms (enqueue 0.508997 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975238 ms - Host latency: 1.02372 ms (enqueue 0.509625 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975165 ms - Host latency: 1.02383 ms (enqueue 0.509625 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975568 ms - Host latency: 1.02398 ms (enqueue 0.504919 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975159 ms - Host latency: 1.02359 ms (enqueue 0.504718 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974432 ms - Host latency: 1.0254 ms (enqueue 0.505096 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.02412 ms (enqueue 0.502698 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97536 ms - Host latency: 1.0238 ms (enqueue 0.503284 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975476 ms - Host latency: 1.0243 ms (enqueue 0.501184 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974634 ms - Host latency: 1.02487 ms (enqueue 0.501489 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975281 ms - Host latency: 1.02395 ms (enqueue 0.502258 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975659 ms - Host latency: 1.02429 ms (enqueue 0.501868 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974951 ms - Host latency: 1.02351 ms (enqueue 0.505896 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975183 ms - Host latency: 1.02438 ms (enqueue 0.500037 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975061 ms - Host latency: 1.02461 ms (enqueue 0.501611 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02399 ms (enqueue 0.501953 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974939 ms - Host latency: 1.02336 ms (enqueue 0.503235 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02354 ms (enqueue 0.502808 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975171 ms - Host latency: 1.0234 ms (enqueue 0.502649 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975085 ms - Host latency: 1.02399 ms (enqueue 0.503589 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974133 ms - Host latency: 1.02247 ms (enqueue 0.502417 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.02354 ms (enqueue 0.507629 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975354 ms - Host latency: 1.02451 ms (enqueue 0.507678 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.02355 ms (enqueue 0.504102 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975732 ms - Host latency: 1.02433 ms (enqueue 0.509583 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02386 ms (enqueue 0.509766 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975134 ms - Host latency: 1.02375 ms (enqueue 0.503064 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02539 ms (enqueue 0.503723 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975354 ms - Host latency: 1.02413 ms (enqueue 0.507922 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975464 ms - Host latency: 1.0238 ms (enqueue 0.502197 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974548 ms - Host latency: 1.02269 ms (enqueue 0.503674 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974731 ms - Host latency: 1.02522 ms (enqueue 0.502344 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975061 ms - Host latency: 1.02347 ms (enqueue 0.502332 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975977 ms - Host latency: 1.02428 ms (enqueue 0.502234 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02338 ms (enqueue 0.500403 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02363 ms (enqueue 0.502197 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974866 ms - Host latency: 1.02316 ms (enqueue 0.502979 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02327 ms (enqueue 0.501904 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975659 ms - Host latency: 1.02385 ms (enqueue 0.500391 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02373 ms (enqueue 0.500537 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975464 ms - Host latency: 1.02478 ms (enqueue 0.502112 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02344 ms (enqueue 0.50354 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974622 ms - Host latency: 1.02305 ms (enqueue 0.501746 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974634 ms - Host latency: 1.0233 ms (enqueue 0.500598 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974548 ms - Host latency: 1.02269 ms (enqueue 0.501062 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.02495 ms (enqueue 0.505225 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02305 ms (enqueue 0.503259 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974756 ms - Host latency: 1.02299 ms (enqueue 0.500537 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974866 ms - Host latency: 1.02344 ms (enqueue 0.500671 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.02285 ms (enqueue 0.503015 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975354 ms - Host latency: 1.02365 ms (enqueue 0.503332 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02324 ms (enqueue 0.502332 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02345 ms (enqueue 0.501123 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975342 ms - Host latency: 1.02352 ms (enqueue 0.500501 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975439 ms - Host latency: 1.02358 ms (enqueue 0.502393 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974451 ms - Host latency: 1.02273 ms (enqueue 0.503394 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.973914 ms - Host latency: 1.02203 ms (enqueue 0.502698 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02371 ms (enqueue 0.502246 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02693 ms (enqueue 0.503894 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97489 ms - Host latency: 1.02301 ms (enqueue 0.503503 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975537 ms - Host latency: 1.0239 ms (enqueue 0.503491 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975476 ms - Host latency: 1.02368 ms (enqueue 0.501941 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974719 ms - Host latency: 1.02639 ms (enqueue 0.503552 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02429 ms (enqueue 0.504712 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975745 ms - Host latency: 1.02385 ms (enqueue 0.506091 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974756 ms - Host latency: 1.02296 ms (enqueue 0.503406 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02378 ms (enqueue 0.501892 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975574 ms - Host latency: 1.02432 ms (enqueue 0.502649 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974756 ms - Host latency: 1.02322 ms (enqueue 0.502637 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974426 ms - Host latency: 1.02261 ms (enqueue 0.503638 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974866 ms - Host latency: 1.02275 ms (enqueue 0.504712 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974768 ms - Host latency: 1.02295 ms (enqueue 0.502173 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975159 ms - Host latency: 1.02308 ms (enqueue 0.502649 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02325 ms (enqueue 0.502661 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975232 ms - Host latency: 1.02351 ms (enqueue 0.5021 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02316 ms (enqueue 0.503162 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974438 ms - Host latency: 1.0226 ms (enqueue 0.503101 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975061 ms - Host latency: 1.02306 ms (enqueue 0.503076 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974219 ms - Host latency: 1.02249 ms (enqueue 0.501807 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974219 ms - Host latency: 1.022 ms (enqueue 0.502356 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975293 ms - Host latency: 1.02295 ms (enqueue 0.502026 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975903 ms - Host latency: 1.02344 ms (enqueue 0.50155 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975696 ms - Host latency: 1.02367 ms (enqueue 0.502625 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975256 ms - Host latency: 1.02305 ms (enqueue 0.503271 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974841 ms - Host latency: 1.02253 ms (enqueue 0.503894 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975159 ms - Host latency: 1.02284 ms (enqueue 0.503235 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974646 ms - Host latency: 1.02292 ms (enqueue 0.504346 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975269 ms - Host latency: 1.02317 ms (enqueue 0.50199 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974817 ms - Host latency: 1.02213 ms (enqueue 0.504651 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974231 ms - Host latency: 1.02208 ms (enqueue 0.502808 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974622 ms - Host latency: 1.02234 ms (enqueue 0.502759 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974841 ms - Host latency: 1.02263 ms (enqueue 0.500464 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02279 ms (enqueue 0.502197 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02238 ms (enqueue 0.503821 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974219 ms - Host latency: 1.02222 ms (enqueue 0.50498 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.976086 ms - Host latency: 1.02386 ms (enqueue 0.506104 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.0226 ms (enqueue 0.503796 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974976 ms - Host latency: 1.02263 ms (enqueue 0.510242 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974658 ms - Host latency: 1.02278 ms (enqueue 0.513928 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974939 ms - Host latency: 1.02256 ms (enqueue 0.508765 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975476 ms - Host latency: 1.02334 ms (enqueue 0.502356 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974841 ms - Host latency: 1.0224 ms (enqueue 0.503955 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974939 ms - Host latency: 1.02274 ms (enqueue 0.506897 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975684 ms - Host latency: 1.02356 ms (enqueue 0.504004 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975671 ms - Host latency: 1.02339 ms (enqueue 0.504272 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974927 ms - Host latency: 1.02305 ms (enqueue 0.50166 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975452 ms - Host latency: 1.02449 ms (enqueue 0.504846 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02258 ms (enqueue 0.503882 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974939 ms - Host latency: 1.02239 ms (enqueue 0.504187 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975134 ms - Host latency: 1.02261 ms (enqueue 0.502417 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974744 ms - Host latency: 1.02231 ms (enqueue 0.501379 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.02245 ms (enqueue 0.509009 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975171 ms - Host latency: 1.0228 ms (enqueue 0.506836 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975098 ms - Host latency: 1.02258 ms (enqueue 0.504419 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974634 ms - Host latency: 1.02239 ms (enqueue 0.501978 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974243 ms - Host latency: 1.02219 ms (enqueue 0.506714 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.023 ms (enqueue 0.514502 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974585 ms - Host latency: 1.02273 ms (enqueue 0.51731 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974634 ms - Host latency: 1.02263 ms (enqueue 0.505981 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.0229 ms (enqueue 0.501514 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974951 ms - Host latency: 1.02263 ms (enqueue 0.502808 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975806 ms - Host latency: 1.02424 ms (enqueue 0.507983 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02324 ms (enqueue 0.509595 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975708 ms - Host latency: 1.02354 ms (enqueue 0.502563 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02336 ms (enqueue 0.506445 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.986255 ms - Host latency: 1.0385 ms (enqueue 0.519312 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97522 ms - Host latency: 1.02993 ms (enqueue 0.51665 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975732 ms - Host latency: 1.03042 ms (enqueue 0.506787 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975415 ms - Host latency: 1.03035 ms (enqueue 0.502661 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.976343 ms - Host latency: 1.03079 ms (enqueue 0.501855 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.03037 ms (enqueue 0.499634 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975928 ms - Host latency: 1.03027 ms (enqueue 0.50354 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975513 ms - Host latency: 1.02993 ms (enqueue 0.499805 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975293 ms - Host latency: 1.03005 ms (enqueue 0.500562 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.976196 ms - Host latency: 1.03127 ms (enqueue 0.500708 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.9854 ms - Host latency: 1.03782 ms (enqueue 0.513184 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02266 ms (enqueue 0.501978 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975439 ms - Host latency: 1.02312 ms (enqueue 0.503516 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02239 ms (enqueue 0.502295 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02256 ms (enqueue 0.501514 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974585 ms - Host latency: 1.02178 ms (enqueue 0.501221 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02258 ms (enqueue 0.50022 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975317 ms - Host latency: 1.02263 ms (enqueue 0.502319 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.976074 ms - Host latency: 1.02498 ms (enqueue 0.506836 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02234 ms (enqueue 0.503125 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974268 ms - Host latency: 1.02192 ms (enqueue 0.500952 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97561 ms - Host latency: 1.02366 ms (enqueue 0.500098 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02344 ms (enqueue 0.502319 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02275 ms (enqueue 0.504687 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974097 ms - Host latency: 1.02166 ms (enqueue 0.502002 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974365 ms - Host latency: 1.02214 ms (enqueue 0.500171 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974341 ms - Host latency: 1.0219 ms (enqueue 0.500269 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02288 ms (enqueue 0.507666 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975 ms - Host latency: 1.02234 ms (enqueue 0.504004 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975122 ms - Host latency: 1.02244 ms (enqueue 0.501929 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.02188 ms (enqueue 0.500293 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02214 ms (enqueue 0.501514 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974829 ms - Host latency: 1.022 ms (enqueue 0.503931 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975049 ms - Host latency: 1.02271 ms (enqueue 0.50332 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975561 ms - Host latency: 1.02297 ms (enqueue 0.500586 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974927 ms - Host latency: 1.02266 ms (enqueue 0.500879 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974756 ms - Host latency: 1.02239 ms (enqueue 0.502026 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975928 ms - Host latency: 1.02329 ms (enqueue 0.502856 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97522 ms - Host latency: 1.02268 ms (enqueue 0.501172 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974341 ms - Host latency: 1.02161 ms (enqueue 0.501953 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02241 ms (enqueue 0.505127 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975317 ms - Host latency: 1.02268 ms (enqueue 0.500317 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974927 ms - Host latency: 1.02251 ms (enqueue 0.501123 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974731 ms - Host latency: 1.02197 ms (enqueue 0.504004 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.02234 ms (enqueue 0.500488 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974292 ms - Host latency: 1.02209 ms (enqueue 0.501685 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975122 ms - Host latency: 1.02251 ms (enqueue 0.502393 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974561 ms - Host latency: 1.02212 ms (enqueue 0.502905 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975732 ms - Host latency: 1.02302 ms (enqueue 0.500781 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974658 ms - Host latency: 1.02393 ms (enqueue 0.503857 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02256 ms (enqueue 0.503198 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974316 ms - Host latency: 1.02156 ms (enqueue 0.502734 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974243 ms - Host latency: 1.0217 ms (enqueue 0.503931 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974829 ms - Host latency: 1.02268 ms (enqueue 0.501294 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975415 ms - Host latency: 1.02258 ms (enqueue 0.501392 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975 ms - Host latency: 1.02305 ms (enqueue 0.504248 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97478 ms - Host latency: 1.02227 ms (enqueue 0.503784 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02217 ms (enqueue 0.502661 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974292 ms - Host latency: 1.02166 ms (enqueue 0.501221 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975024 ms - Host latency: 1.02239 ms (enqueue 0.5021 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974658 ms - Host latency: 1.02207 ms (enqueue 0.50188 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974609 ms - Host latency: 1.0219 ms (enqueue 0.503638 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975464 ms - Host latency: 1.02314 ms (enqueue 0.502148 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974976 ms - Host latency: 1.02266 ms (enqueue 0.510767 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974902 ms - Host latency: 1.02256 ms (enqueue 0.509448 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975195 ms - Host latency: 1.02246 ms (enqueue 0.50564 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.022 ms (enqueue 0.506104 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974927 ms - Host latency: 1.02244 ms (enqueue 0.505151 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974316 ms - Host latency: 1.02178 ms (enqueue 0.502979 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02261 ms (enqueue 0.503027 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974756 ms - Host latency: 1.02246 ms (enqueue 0.503784 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02285 ms (enqueue 0.503564 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02283 ms (enqueue 0.503857 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974878 ms - Host latency: 1.02236 ms (enqueue 0.505371 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.973926 ms - Host latency: 1.02156 ms (enqueue 0.506665 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975488 ms - Host latency: 1.02307 ms (enqueue 0.509863 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975073 ms - Host latency: 1.02256 ms (enqueue 0.506763 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975171 ms - Host latency: 1.02261 ms (enqueue 0.505518 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975195 ms - Host latency: 1.02246 ms (enqueue 0.501489 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974805 ms - Host latency: 1.02231 ms (enqueue 0.508105 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975366 ms - Host latency: 1.02463 ms (enqueue 0.509814 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974316 ms - Host latency: 1.02158 ms (enqueue 0.505273 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975024 ms - Host latency: 1.02251 ms (enqueue 0.499854 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975269 ms - Host latency: 1.02268 ms (enqueue 0.500977 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975 ms - Host latency: 1.02261 ms (enqueue 0.502173 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975024 ms - Host latency: 1.02229 ms (enqueue 0.50481 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.02161 ms (enqueue 0.504541 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974854 ms - Host latency: 1.02244 ms (enqueue 0.500122 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975122 ms - Host latency: 1.02236 ms (enqueue 0.50061 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975537 ms - Host latency: 1.02319 ms (enqueue 0.502637 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975146 ms - Host latency: 1.02239 ms (enqueue 0.505151 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975122 ms - Host latency: 1.02302 ms (enqueue 0.500635 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974976 ms - Host latency: 1.02234 ms (enqueue 0.500977 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975391 ms - Host latency: 1.02263 ms (enqueue 0.500732 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975635 ms - Host latency: 1.02302 ms (enqueue 0.503345 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975269 ms - Host latency: 1.02268 ms (enqueue 0.50415 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974829 ms - Host latency: 1.02209 ms (enqueue 0.501025 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975488 ms - Host latency: 1.02278 ms (enqueue 0.500488 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975171 ms - Host latency: 1.02258 ms (enqueue 0.501514 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974927 ms - Host latency: 1.02241 ms (enqueue 0.502759 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974536 ms - Host latency: 1.02166 ms (enqueue 0.500098 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.97373 ms - Host latency: 1.02139 ms (enqueue 0.501636 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.975244 ms - Host latency: 1.02432 ms (enqueue 0.511816 ms)
[12/25/2024-03:35:00] [I] Average on 10 runs - GPU latency: 0.974683 ms - Host latency: 1.02207 ms (enqueue 0.515063 ms)
[12/25/2024-03:35:00] [I] 
[12/25/2024-03:35:00] [I] === Performance summary ===
[12/25/2024-03:35:00] [I] Throughput: 1020.85 qps
[12/25/2024-03:35:00] [I] Latency: min = 1.01053 ms, max = 1.13281 ms, mean = 1.02356 ms, median = 1.02319 ms, percentile(90%) = 1.02637 ms, percentile(95%) = 1.02814 ms, percentile(99%) = 1.03284 ms
[12/25/2024-03:35:00] [I] Enqueue Time: min = 0.336914 ms, max = 0.744446 ms, mean = 0.504433 ms, median = 0.501892 ms, percentile(90%) = 0.510803 ms, percentile(95%) = 0.518555 ms, percentile(99%) = 0.534668 ms
[12/25/2024-03:35:00] [I] H2D Latency: min = 0.0397949 ms, max = 0.0620422 ms, mean = 0.0413101 ms, median = 0.0410156 ms, percentile(90%) = 0.0417786 ms, percentile(95%) = 0.0426025 ms, percentile(99%) = 0.048584 ms
[12/25/2024-03:35:00] [I] GPU Compute Time: min = 0.962585 ms, max = 1.08447 ms, mean = 0.975089 ms, median = 0.974854 ms, percentile(90%) = 0.977051 ms, percentile(95%) = 0.977905 ms, percentile(99%) = 0.978943 ms
[12/25/2024-03:35:00] [I] D2H Latency: min = 0.00585938 ms, max = 0.00952148 ms, mean = 0.00716775 ms, median = 0.00708008 ms, percentile(90%) = 0.00802612 ms, percentile(95%) = 0.00823975 ms, percentile(99%) = 0.0088501 ms
[12/25/2024-03:35:00] [I] Total Host Walltime: 3.00339 s
[12/25/2024-03:35:00] [I] Total GPU Compute Time: 2.98962 s
[12/25/2024-03:35:00] [I] Explanations of the performance metrics are printed in the verbose logs.
[12/25/2024-03:35:00] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8603] # trtexec --onnx=resnet50-v2-7.onnx --saveEngine=resnet50_refit.plan --refit --timeRefit
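
To compare a refittable build against a normal one, the per-window lines in a log like the one above can be averaged with a short script. This is only a sketch: the regular expression assumes the exact "Average on N runs" line format shown in this log, and `summarize_trtexec_log` and `relative_change` are hypothetical helper names, not part of trtexec.

```python
import re
import statistics

# Matches per-window lines as they appear in the log above, e.g.
# "Average on 10 runs - GPU latency: 0.974768 ms - Host latency: 1.02295 ms (enqueue 0.502173 ms)"
WINDOW_RE = re.compile(
    r"Average on (\d+) runs - GPU latency: ([\d.]+) ms"
    r" - Host latency: ([\d.]+) ms \(enqueue ([\d.]+) ms\)"
)


def summarize_trtexec_log(text):
    """Return the mean GPU, host and enqueue latency (ms) over all window lines."""
    gpu, host, enqueue = [], [], []
    for match in WINDOW_RE.finditer(text):
        gpu.append(float(match.group(2)))
        host.append(float(match.group(3)))
        enqueue.append(float(match.group(4)))
    if not gpu:
        raise ValueError("no 'Average on N runs' lines found")
    return {
        "windows": len(gpu),
        "gpu_ms": statistics.mean(gpu),
        "host_ms": statistics.mean(host),
        "enqueue_ms": statistics.mean(enqueue),
    }


def relative_change(baseline, refit, key="gpu_ms"):
    """Fractional change of `key` in the refit run relative to the baseline run."""
    return (refit[key] - baseline[key]) / baseline[key]
```

Running the same trtexec command once with `--refit` and once without, saving each log to a file, and passing both summaries to `relative_change` gives a rough figure for the cost of a refittable engine on a given model.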

@lizexu123
Author

lizexu123 commented Dec 25, 2024

I reran your sample from https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/engine_refit_onnx_bidaf and it works fine. However, I want to use other models, such as the resnet50-v2-7.onnx mentioned earlier. My goal is to test several models and measure whether TensorRT's refit functionality affects inference performance. What I don't understand is why weights such as tmp_weight_2, tmp_weight_7, tmp_weight_6, and tmp_weight_3 are generated, and why I have to set them explicitly with refitter.set_named_weights(name, weight); if I don't, an error is raised. In my build_and_refit_engine.py, how can I find out the shapes of these generated weights (tmp_weight_2, tmp_weight_7, etc.)?

1. First, download the ResNet50 model into the engine_refit_onnx_bidaf directory with the following command:
wget https://github.com/onnx/models/raw/refs/heads/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx

2. Modify TensorRT/samples/python/engine_refit_onnx_bidaf/data_processing.py:

# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import numpy as np
import nltk
from nltk import word_tokenize
import json
import tensorrt as trt


def preprocess(text):
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")
    tokens = word_tokenize(text)
    # split into lower-case word tokens, in numpy array with shape of (seq, 1)
    words = np.asarray([w.lower() for w in tokens]).reshape(-1, 1)
    # split words into chars, in numpy array with shape of (seq, 1, 1, 16)
    chars = [[c for c in t][:16] for t in tokens]
    chars = [cs + [""] * (16 - len(cs)) for cs in chars]
    chars = np.asarray(chars).reshape(-1, 1, 1, 16)
    return words, chars


def get_map_func(filepath):
    # Use a context manager so the JSON file is closed after loading.
    with open(filepath) as file:
        category_map = json.load(file)
    category_mapper = dict(zip(category_map["cats_strings"], category_map["cats_int64s"]))
    default_int64 = category_map["default_int64"]
    func = lambda s: category_mapper.get(s, default_int64)
    return np.vectorize(func)


# def get_inputs(context, query):
#     cw, cc = preprocess(context)
#     qw, qc = preprocess(query)

#     context_word_func = get_map_func("CategoryMapper_4.json")
#     context_char_func = get_map_func("CategoryMapper_5.json")
#     query_word_func = get_map_func("CategoryMapper_6.json")
#     query_char_func = get_map_func("CategoryMapper_7.json")

#     cw_input = context_word_func(cw).astype(trt.nptype(trt.int32)).ravel()
#     cc_input = context_char_func(cc).astype(trt.nptype(trt.int32)).ravel()
#     qw_input = query_word_func(qw).astype(trt.nptype(trt.int32)).ravel()
#     qc_input = query_char_func(qc).astype(trt.nptype(trt.int32)).ravel()
#     return cw_input, cc_input, qw_input, qc_input

def get_inputs():
    batch_size = 1
    channels = 3
    height = 224
    width = 224
    random_input = np.random.rand(batch_size, channels, height, width).astype(np.float32)
    # Normalize using the ImageNet mean and standard deviation
    mean = np.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
    std = np.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)
    random_input = (random_input - mean) / std
    return (random_input,)

3. Modify TensorRT/samples/python/engine_refit_onnx_bidaf/prepare_model.py:

#
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import onnx_graphsurgeon as gs
import onnx
import numpy as np
import json

import sys, os

sys.path.insert(1, os.path.join(sys.path[0], ".."))
from downloader import getFilePath


def drop_category_mapper_nodes(graph):
    new_inputs = []
    for org_input in graph.inputs:
        # head node, simply disconnect it with others
        assert len(org_input.outputs) == 1
        category_mapper_node = org_input.outputs[0]
        assert category_mapper_node.op == "CategoryMapper"
        assert len(category_mapper_node.outputs) == 1
        new_inputs.append(category_mapper_node.outputs[0])
        category_mapper_node.inputs.clear()
        category_mapper_node.outputs.clear()

        # Save mapping info to preprocess inputs.
        with open(category_mapper_node.name + ".json", "w") as fp:
            json.dump(category_mapper_node.attrs, fp)

    graph.inputs = new_inputs


def replace_unsupported_ops(graph):
    # replace hardmax with ArgMax
    hardmaxes = [node for node in graph.nodes if node.op == "Hardmax"]
    assert len(hardmaxes) == 1
    hardmax = hardmaxes[0]
    hardmax.op = "ArgMax"
    hardmax.name = "ArgMax(org:" + hardmax.name + ")"
    hardmax.attrs["axis"] = 1
    hardmax.attrs["keepdims"] = 0

    cast = hardmax.o()
    reshape = cast.o()

    hardmax.outputs = reshape.outputs
    assert len(hardmax.outputs) == 1
    hardmax.outputs[0].dtype = np.int64
    hardmax.outputs[0].shape = [1]

    compress = reshape.o()
    compress.op = "Gather"
    compress.name = "Gather(org:" + compress.name + ")"
    compress.attrs["axis"] = 1

    cast.outputs.clear()
    reshape.outputs.clear()
    # Remove the node from the graph completely
    graph.cleanup().toposort()


def save_weights_for_refitting(graph, output_dir="refit_weights"):
    # Create the output directory if it does not exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"Created output directory: {output_dir}")

    tmap = graph.tensors()
    print("Tensor names in the model:", list(tmap.keys()))

    # Iterate over all tensors and save the constant tensors (i.e., the weights)
    for name, tensor in tmap.items():
        print(f"Processing tensor: {name}, type: {type(tensor)}")
        # Check whether the tensor has a 'values' attribute
        if hasattr(tensor, 'values') and tensor.values is not None:
            # Build the file path
            file_path = os.path.join(output_dir, f"{name}.npy")
            # Save the weights
            np.save(file_path, tensor.values)
            print(f"Saved weights: {file_path}")
        else:
            print(f"Skipping tensor: {name} (not a constant tensor)")

def main():
    org_model_file_path = "resnet50-v2-7.onnx"

    if not os.path.exists(org_model_file_path):
        print(f"Error: model file {org_model_file_path} does not exist. Make sure the model file has been downloaded.")
        return

    print("Modifying the ONNX model ...")
    original_model = onnx.load(org_model_file_path)
    graph = gs.import_onnx(original_model)

    save_weights_for_refitting(graph)

    new_model = gs.export_onnx(graph)

    modified_model_name = "resnet50-modified.onnx"
    onnx.checker.check_model(new_model)
    onnx.save(new_model, modified_model_name)
    print(f"Modified ONNX model saved as {modified_model_name}")
    print("Done.")

if __name__ == "__main__":
    main()

Then run python prepare_model.py.

4. Modify TensorRT/samples/python/engine_refit_onnx_bidaf/build_and_refit_engine.py:

#!/usr/bin/env python3
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# ...

import os
import sys
import numpy as np
import tensorrt as trt
from data_processing import get_inputs, preprocess
import onnx  
import time 

sys.path.insert(1, os.path.join(sys.path[0], ".."))
import common

TRT_LOGGER = trt.Logger()

def list_onnx_weights(onnx_file_path):
    model = onnx.load(onnx_file_path)
    weights = {}
    for tensor in model.graph.initializer:
        weights[tensor.name] = tensor
    return weights.keys()

def get_engine(onnx_file_path, engine_file_path):

    def build_engine():
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(common.EXPLICIT_BATCH)
        parser = trt.OnnxParser(network, TRT_LOGGER)
        runtime = trt.Runtime(TRT_LOGGER)

        print("Loading ONNX file from path {}...".format(onnx_file_path))
        with open(onnx_file_path, "rb") as model:
            print("Beginning ONNX file parsing")
            if not parser.parse(model.read()):
                print("ERROR: Failed to parse the ONNX file.")
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        print("Completed parsing of ONNX file")

        # Print the network inputs
        print("Network inputs:")
        for i in range(network.num_inputs):
            tensor = network.get_input(i)
            print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

        config = builder.create_builder_config()
        config.builder_optimization_level = 3
        config.set_flag(trt.BuilderFlag.REFIT)
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

        # Set up the optimization profile
        profile = builder.create_optimization_profile()
        profile.set_shape(
            network.get_input(0).name,
            min=(1, 3, 224, 224),
            opt=(1, 3, 224, 224),
            max=(1, 3, 224, 224)
        )
        config.add_optimization_profile(profile)

        print("Building an engine from file {}; this may take a while...".format(onnx_file_path))
        plan = builder.build_serialized_network(network, config)
        engine = runtime.deserialize_cuda_engine(plan)
        print("Completed creating Engine")

        with open(engine_file_path, "wb") as f:
            f.write(plan)
        return engine
    
    if os.path.exists(engine_file_path):
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f:
            runtime = trt.Runtime(TRT_LOGGER)
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()

def main():
    onnx_file_path = "resnet50-modified.onnx"  
    engine_file_path = "resnet50.trt"

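    # List the weights (initializers) in the ONNX model
    onnx_weights = list_onnx_weights(onnx_file_path)
    print("Weights in ONNX model:")
    for weight_name in onnx_weights:
        print(weight_name)

    cw_input, = get_inputs()
    # astype returns a new array, so the result has to be assigned back
    cw_input = cw_input.astype(np.float32)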

    # Names of the model weights to refit
    model_weights_names = [
        "resnetv24_conv0_weight",
        "resnetv24_stage1_conv0_weight",
        "resnetv24_batchnorm1_gamma",
        "resnetv24_batchnorm0_gamma",
        # Add other model weight names as needed
    ]

    # Directory containing the weight files
    weights_dir = "./refit_weights"

    # Check that the .npy file for each model weight exists
    for name in model_weights_names:
        file_path = os.path.join(weights_dir, f"{name}.npy")  # build the full path
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Weight file {file_path} not found; make sure it exists.")
        else:
            print(f"Found weight file: {file_path}")

    refit_model_weights_dict = {}
    # Load the model weights
    for name in model_weights_names:
        file_path = os.path.join(weights_dir, f"{name}.npy")
        refit_model_weights_dict[name] = np.load(file_path)
    # refit_model_weights_dict = {name: np.load(f"{name}.npy") for name in model_weights_names}

    # Load the engine
    engine = get_engine(onnx_file_path, engine_file_path)
    refitter = trt.Refitter(engine, TRT_LOGGER)

    # Initialize the internal weights
    # Adjust the shapes to match your model
    refit_internal_weights_dict = {
        "tmp_weight_2": np.ones(shape=64, dtype=np.float32),
        "tmp_weight_3": np.ones(shape=64, dtype=np.float32),
        "tmp_weight_6": np.ones(shape=64, dtype=np.float32),
        "tmp_weight_7": np.ones(shape=64, dtype=np.float32),
    }
    weight_names = ["tmp_weight_2", "tmp_weight_3", "tmp_weight_6", "tmp_weight_7"]
    for name in weight_names:
        weights = refitter.get_weights_prototype(name)
        if weights.size == -1:
            print(f"Weight '{name}' cannot be refitted or does not exist.")
        else:
            print(f"Weight name: {name}")
            print(f"Data type: {trt.nptype(weights.dtype)}")
            print(f"Element count: {weights.size}")
    
    # 合并所有权重
    refit_weights_dict = {**refit_model_weights_dict, **refit_internal_weights_dict}
    # 设置权重
    print("开始设置所有权重...")
    for name, weight in refit_weights_dict.items():
        try:
            refitter.set_named_weights(name, weight)
            print(f"设置权重: {name}, 形状: {weight.shape}")
        except Exception as e:
            print(f"设置权重 {name} 时出错: {e}")

    all_weights=refitter.get_all_weights()
    print("all_weights",all_weights)
   
    missing_weights = refitter.get_missing_weights()
    print("missing_weights",missing_weights)
    if len(missing_weights) != 0:
        print("Missing weights:", missing_weights)
        raise AssertionError("Some weights are missing; make sure every weight has been set.")

    # Run the refit
    if not refitter.refit_cuda_engine():
        raise AssertionError("Refit failed.")
    print("Refit succeeded.")

    # Continue with inference
    for profile_idx in range(engine.num_optimization_profiles):
        print("Running inference...")

        inputs, outputs, bindings, stream = common.allocate_buffers(engine, profile_idx)

        inputs[0].host = cw_input.astype(np.float32)

        execution_context = engine.create_execution_context()
        execution_context.set_binding_shape(0, cw_input.shape)

        start_time = time.perf_counter()
        trt_outputs = common.do_inference_v2(
            execution_context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream
        )
        end_time = time.perf_counter()

        inference_time_ms = (end_time - start_time) * 1000
        print(f"Inference time: {inference_time_ms:.2f} ms")

        # Process the output
        output_probs = trt_outputs[0]
        predicted_class = np.argmax(output_probs)
        print(f"Predicted class: {predicted_class}")

        # Free the buffers
        common.free_buffers(inputs, outputs, stream)

        print("Inference passed")

if __name__ == "__main__":
    main()

Run build_and_refit_engine.py.

@lizexu123
Author

This code can run successfully, but I have two questions:

How can I determine which model weights need to be refitted? Can any constant weights saved during the model conversion process be refitted?

For the generated tmp_weight (e.g., tmp_weight_2, etc.), it seems mandatory to set them using refitter.set_named_weights(name, weight). However, the number of tmp_weights generated appears inconsistent depending on the content of model_weights_names. This seems quite complex. Can you help clarify this?

@lix19937

How can I determine which model weights need to be refitted? Can any constant weights saved during the model conversion process be refitted?

Weights newWeights = ...;
refitter->setWeights("MyLayer",WeightsRole::kKERNEL,
                    newWeights);

The new weights should have the same count as the original weights used to build the engine. setNamedWeights returns false if something goes wrong, such as a wrong weights name or a change in the weights count.

You can use INetworkDefinition::setWeightsName() to name weights at build time - the ONNX parser uses this API to associate the weights with the names used in the ONNX model. Otherwise, TensorRT will name the weights internally based on the related layer names and weight roles.

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#refitting-engine-c
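
One way to approach the first question in practice is to compare the weight names you plan to supply against the names the refitter itself reports. The sketch below assumes the Python trt.Refitter object from the script above and relies only on get_all_weights(), which that script already calls. The helper name partition_weight_names is hypothetical and not part of TensorRT.

```python
def partition_weight_names(refitter, candidate_names):
    """Split candidate names by whether the engine reports them as refittable.

    `refitter` is assumed to be a trt.Refitter; any object exposing
    get_all_weights() works. `candidate_names` are the names you intend to set.
    This helper is illustrative only, not a TensorRT API.
    """
    engine_names = list(refitter.get_all_weights())
    engine_set = set(engine_names)
    candidates = list(candidate_names)
    candidate_set = set(candidates)

    # Names you have that the engine also exposes for refitting.
    refittable = [n for n in candidates if n in engine_set]
    # Names you have that the engine does not expose.
    unknown = [n for n in candidates if n not in engine_set]
    # Names the engine exposes that you did not supply.
    uncovered = [n for n in engine_names if n not in candidate_set]
    return refittable, unknown, uncovered
```

Names that land in uncovered are the ones the engine exposes but you did not supply; parser-generated entries such as tmp_weight_2 would be expected to appear there.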

@anderson101866

anderson101866 commented Dec 30, 2024

Dear @lizexu123 ,

There are 2 parts to focus on.

1. (As @lix19937 mentioned) Skip weights that are not refittable

I've rewritten the set_named_weights call from

try:
  ...
  refitter.set_named_weights(name, weight)
except ...

to

assert refitter.set_named_weights(name, weight), f'fail to set "{name}", {weight=}' #NOTE: need to check return value instead of try-catch

to capture which weights are actually non-refittable.

According to the documentation for get_weights_prototype, the prototype size for such weights is -1.
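
A more forgiving variant of the same check collects every rejected name instead of stopping at the first one. This is only a sketch: set_weights_collect_failures is a hypothetical helper, and it assumes set_named_weights signals failure through a False return value, as described above.

```python
def set_weights_collect_failures(refitter, weights_by_name):
    """Try to set every weight and return the names the refitter rejected.

    `refitter` is assumed to be a trt.Refitter (any object with a
    set_named_weights(name, weight) method returning a bool works).
    Illustrative helper only, not a TensorRT API.
    """
    rejected = []
    for name, weight in weights_by_name.items():
        # Check the return value; a False result means the name or the
        # weight count was not accepted.
        if not refitter.set_named_weights(name, weight):
            rejected.append(name)
    return rejected
```

An empty result means every name was accepted; anything else is a list you can print once and then decide whether to drop those entries or fix their names.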

2. What does 'tmp_weight_X' mean?

You can refer to this answer: these weights are inserted by trt.OnnxParser.


NOTE: @lizexu123 still doesn't know how to set new weights for these "tmp_weight_X" entries. Could anyone guide us?

@lix19937

lix19937 commented Jan 1, 2025

@anderson101866 @lizexu123 Yes, TensorRT ops do not correspond exactly to ONNX ops, and the onnx-tensorrt parser adjusts for those cases. tmp_weight_X may correspond to an attribute of one op. If you want the detailed correspondence, you can step through the ONNX subgraphs one at a time.

@lizexu123
Author

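TensorRT 8.6 does not have the get_weights_prototype function. Is there a similar method? Also, if I use a Paddle model and skip trt.OnnxParser, these temporary weights would not be generated, right?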

@lix19937

lix19937 commented Jan 6, 2025

Is your Paddle model in ONNX format? @lizexu123

@anderson101866

anderson101866 commented Jan 9, 2025

Hi @lizexu123, if you rely on trt.OnnxParser, you can use trt.OnnxParserRefitter to skip these tmp weights.

Just rewrite your code as

engine = get_engine(onnx_file_path, engine_file_path) #which build TRT engine via `trt.OnnxParser`
refitter = trt.Refitter(engine, TRT_LOGGER)
parser_refitter = trt.OnnxParserRefitter(refitter, TRT_LOGGER) #NOTE: add this
assert parser_refitter.refit_from_file(onnx_file_path) #NOTE: add this
...
print("Setting all weights...")
for name, weight in refit_weights_dict.items():
  assert refitter.set_named_weights(name, weight), f'fail to set "{name}", {weight=}'

Then temporary weights like tmp_batch_norm_weight_1 will no longer confuse the refitter object.

(Update)
It seems 8.6 does not support the new trt.OnnxParserRefitter interface either. You need to upgrade to TensorRT 10.
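
Since the interface is only present in newer releases, a script that must run against both versions could probe for it before choosing a path. The sketch below is an assumption-laden illustration: pick_refit_path and the two string labels are hypothetical, and the only thing it checks is whether the imported module exposes an attribute named OnnxParserRefitter.

```python
def pick_refit_path(trt_module):
    """Report which refit flow the given TensorRT bindings can support.

    `trt_module` is assumed to be the imported `tensorrt` module. The returned
    labels are hypothetical names used only in this sketch.
    """
    if hasattr(trt_module, "OnnxParserRefitter"):
        # Newer bindings: the parser refitter can handle parser-generated weights.
        return "onnx_parser_refitter"
    # Older bindings: fall back to setting each named weight by hand.
    return "manual_named_weights"
```

After `import tensorrt as trt`, calling pick_refit_path(trt) would let the script branch between the OnnxParserRefitter flow shown above and the manual set_named_weights loop.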

If I use a Paddle model and skip trt.OnnxParser, these temporary weights would not be generated, right?

Yes, those temporary weights are generated while OnnxParser parses ONNX attributes.
