Internal Error (Could not find any implementation for node {...Tensordot/Reshape}) failure of TensorRT 8.6 when running generate INT8 engine on GPU RTX3080 #4291

yjiangling opened this issue Dec 19, 2024 · 12 comments

@yjiangling

Hi,
I'm using TensorRT 8.6 to run INT8 calibration and generate a TensorRT engine with the polygraphy tool like this:

polygraphy convert onnx_model/model.onnx --trt-min-shapes xs:[1,1120] xlen:[1] --trt-opt-shapes xs:[1,160000] xlen:[1] --trt-max-shapes xs:[1,480000] xlen:[1] --int8 --data-loader-script data_loader.py --calibration-cache trt86_minmax_calib.cache --calib-base-cls IInt8MinMaxCalibrator --output trt_model/trt86_minmax_int8.plan

but it always gives the following error:

[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I]     Configuring with profiles: [Profile().add('xs', min=[1, 1120], opt=[1, 160000], max=[1, 480000]).add('xlen', min=[1], opt=[1], max=[1])]
[W] TensorRT does not currently support using dynamic shapes during calibration. The `OPT` shapes from the calibration profile will be used for tensors with dynamic shapes. Calibration data is expected to conform to those shapes. 
[I] Building engine with configuration:
    Flags                  | [INT8]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 10002.44 MiB, TACTIC_DRAM: 10002.44 MiB]
    Tactic Sources         | [CUBLAS, CUBLAS_LT, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
    Calibrator             | Calibrator(<generator object load_data at 0x7fd95cd5c430>, cache='trt86_minmax_calib.cache', BaseClass=<class 'tensorrt.tensorrt.IInt8MinMaxCalibrator'>)
[I] Saving calibration cache to trt86_minmax_calib.cache
[W] Missing scale and zero-point for tensor xlen, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 1) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor sub:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 9) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor floordiv:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor stft/frame/range_1:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 59) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor stft/frame/mul_1:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor stft/frame/Reshape_2:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 70) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor stft/frame/add_2:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 82) [Shuffle]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor add:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor SequenceMask/ExpandDims:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor SequenceMask/Cast:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 107) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor (Unnamed Layer* 112) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor SequenceMask/Range:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[W] Missing scale and zero-point for tensor SequenceMask/Less:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
formats.cpp:2379: DCHECK(desired_so.size() == t->dim_count()) failed. 
[E] 10: Could not find any implementation for node {ForeignNode[(Unnamed Layer* 82) [Shuffle]_output[Constant]...Tensordot/Reshape]}.
[E] 10: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 82) [Shuffle]_output[Constant]...Tensordot/Reshape]}.)
[!] Invalid Engine. Please ensure the engine was built correctly

The converted ONNX model was generated from a SavedModel, which was produced with the code below:

import librosa
import tensorflow as tf

# TF1-style graph: waveform -> STFT power spectrogram -> mel features,
# with padded frames masked out using the per-utterance length.
xs = tf.placeholder(tf.float32, shape=[None, None], name='xs')   # batch of waveforms
xlen = tf.placeholder(tf.int32, shape=[None], name='xlen')       # valid sample count per waveform
spectrogram = tf.square(tf.abs(tf.signal.stft(xs, frame_length=400, frame_step=160, fft_length=512)))
weight = librosa.filters.mel(sr=16000, n_fft=512, n_mels=80).T   # (257, 80) mel filterbank
weight = tf.convert_to_tensor(weight, tf.float32)
mel = tf.tensordot(spectrogram, weight, axes=1)
mel_length = (xlen - 400) // 160 + 1                             # number of valid frames
mask = tf.sequence_mask(mel_length, maxlen=tf.shape(mel)[1], dtype=tf.float32)
mel = mel * tf.expand_dims(mask, axis=-1)                        # zero out padded frames
ys = tf.identity(mel, name='ys')

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())

	tf.saved_model.simple_save(sess,
		'./saved_model',
		inputs={"xs": xs, "xlen": xlen},
		outputs={"ys": ys})

Then the SavedModel is converted to an ONNX model with:

python3 -m tf2onnx.convert --opset 13 --saved-model ./saved_model/ --output ./onnx_model/model.onnx

What's wrong with it?
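A quick structural check of the exported model (a minimal sketch, assuming the onnx Python package is installed; the expected input/output names follow the polygraphy command above) can be run like this:

import onnx

model = onnx.load("onnx_model/model.onnx")
onnx.checker.check_model(model)
print([i.name for i in model.graph.input])   # expect 'xs' and 'xlen' per the commands above
print([o.name for o in model.graph.output])  # expect 'ys'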

By the way, the file data_loader.py is used for INT8 calibration; you can reproduce it like this:

import numpy as np

def load_data(calib_num=100):
	# Yield calib_num feed dicts matching the model inputs: one random
	# waveform of length n and its length in samples.
	for _ in range(calib_num):
		n = np.random.randint(1000, 5000)
		x = np.random.rand(1, n).astype(np.float32)
		x_len = np.array([n], dtype=np.int32)
		yield {"xs": x, "xlen": x_len}

Can anyone give some help? Thanks a lot!

@yjiangling

model.zip

Here is the ONNX model file; you can use it directly or generate a new one with the code above.

@yuanyao-nv

@yjiangling I see the TRT version you used is quite old. Would you like to try the latest 10.7 to check if the error still occurs?

@yjiangling

yjiangling commented Dec 20, 2024

@yjiangling I see the TRT version you used is quite old. Would you like to try the latest 10.7 to check if the error still occurs?

@yuanyao-nv Yes, I'll install the latest Docker image and try it again, but for some reason we must run our models on TensorRT 8.6 for now, so could you please help investigate what's wrong with it? In my experiments, I found that if I remove the line mel = mel * tf.expand_dims(mask, axis=-1), the TensorRT engine is generated successfully without any error. It seems there is something subtle in TensorRT 8.6... I would greatly appreciate it if you could help figure it out.

@yjiangling

@asfiyab-nvidia Could you please help? Many thanks. The person assigned last week may be on vacation or too busy; I'm hoping to receive some suggestions from your team. Thanks again.

  1. By the way, due to hardware and software requirements in our project, we must use TensorRT 8.6 for now. What's more, if we generate an FP32 engine directly, skipping INT8 calibration and quantization, everything is OK, so we suspect the problem lies in the calibration and quantization stages.

  2. Even if we remove the line mel = mel * tf.expand_dims(mask, axis=-1) and use polygraphy convert to generate the INT8 quantized model, the engine does not produce correct results, and when we used trtexec with the calibration cache file to run calibration and quantization, it failed again:

trtexec --onnx=onnx_model/model.onnx --minShapes=xs:1x1120,xlen:1 --optShapes=xs:1x160000,xlen:1 --maxShapes=xs:1x480000,xlen:1 --minShapesCalib=xs:1x1120,xlen:1 --optShapesCalib=xs:1x160000,xlen:1 --maxShapesCalib=xs:1x480000,xlen:1 --workspace=10240 --int8 --calib=trt86_minmax_calib.cache --saveEngine=trt_model/trt86_minmax_int8.plan --verbose --buildOnly > trt_model/result-INT8.txt

So it seems there is something wrong with the calibration cache file generated by polygraphy convert earlier?

@yjiangling

@lix19937 May I have your help? Many thanks.

@lix19937

lix19937 commented Jan 1, 2025

Do a test with the following command:

trtexec --onnx=onnx_model/model.onnx \
--minShapes=xs:1x1120,xlen:1 \
--optShapes=xs:1x160000,xlen:1 \
--maxShapes=xs:1x480000,xlen:1 \
--minShapesCalib=xs:1x1120,xlen:1 \
--optShapesCalib=xs:1x160000,xlen:1 \
--maxShapesCalib=xs:1x480000,xlen:1 \
--workspace=10240 \
--int8  \
--saveEngine=trt_model/trt86_minmax_int8.plan \
--verbose \
--buildOnly > trt_model/result-INT8.txt

Second, polygraphy depends on TRT; you should make sure its version matches your trtexec (nvinfer). You can also use C++ to do PTQ on your ONNX model to get a calib table.
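The same PTQ calibration can also be written directly against the TensorRT Python API. Below is a minimal sketch, not a tested implementation: it assumes pycuda is installed, reuses the load_data generator from data_loader.py above, and uses batch size 1.

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

from data_loader import load_data

class MinMaxCalibrator(trt.IInt8MinMaxCalibrator):
	def __init__(self, cache_file="trt86_minmax_calib.cache"):
		super().__init__()
		self.cache_file = cache_file
		self.batches = load_data()
		self.device_buffers = {}  # keep device allocations alive

	def get_batch_size(self):
		return 1

	def get_batch(self, names):
		try:
			feed = next(self.batches)
		except StopIteration:
			return None  # no more calibration data
		ptrs = []
		for name in names:
			arr = np.ascontiguousarray(feed[name])
			buf = cuda.mem_alloc(arr.nbytes)
			cuda.memcpy_htod(buf, arr)
			self.device_buffers[name] = buf
			ptrs.append(int(buf))
		return ptrs

	def read_calibration_cache(self):
		try:
			with open(self.cache_file, "rb") as f:
				return f.read()
		except FileNotFoundError:
			return None

	def write_calibration_cache(self, cache):
		with open(self.cache_file, "wb") as f:
			f.write(cache)

The calibrator instance would then be assigned to config.int8_calibrator when building the engine with the builder API.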

@yjiangling

Do a test with the following command:

trtexec --onnx=onnx_model/model.onnx \
--minShapes=xs:1x1120,xlen:1 \
--optShapes=xs:1x160000,xlen:1 \
--maxShapes=xs:1x480000,xlen:1 \
--minShapesCalib=xs:1x1120,xlen:1 \
--optShapesCalib=xs:1x160000,xlen:1 \
--maxShapesCalib=xs:1x480000,xlen:1 \
--workspace=10240 \
--int8 \
--saveEngine=trt_model/trt86_minmax_int8.plan \
--verbose \
--buildOnly > trt_model/result-INT8.txt
Second, polygraphy depends on TRT; you should make sure its version matches your trtexec (nvinfer). You can also use C++ to do PTQ on your ONNX model to get a calib table.

@lix19937 Thanks a lot. Yes, the test above without the calibration cache file works, and the version of polygraphy is 0.47.1; does it match the trtexec? How can I check whether the version matches trtexec (nvinfer)? That is to say, the calibration cache file generated by the data_loader.py script in Python is not fine, right? Should I use C++ to do the INT8 calibration?

@lix19937

lix19937 commented Jan 6, 2025

polygraphy uses the TensorRT Python backend, so you need to check your trt(py) version.
In Python, you can use print(trt.__version__)
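For example (a minimal check; the version values in the comments are just what this thread reports):

import tensorrt as trt
import polygraphy

print(trt.__version__)         # e.g. 8.6.1
print(polygraphy.__version__)  # e.g. 0.47.1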

If you want to use your current trt(cpp), you can ref https://github.com/lix19937/trt-samples-for-hackathon-cn/tree/master/cookbook/03-BuildEngineByTensorRTAPI/MNISTExample-pyTorch/C%2B%2B

@yjiangling

polygraphy uses the TensorRT Python backend, so you need to check your trt(py) version. In Python, you can use print(trt.__version__)

If you want to use your current trt(cpp), you can ref https://github.com/lix19937/trt-samples-for-hackathon-cn/tree/master/cookbook/03-BuildEngineByTensorRTAPI/MNISTExample-pyTorch/C%2B%2B

@lix19937 Thank you so much, and sorry for the misunderstanding; in fact I always use TensorRT in Python instead of C++. The trt(py) version I use is 8.6.1; is it compatible with polygraphy version 0.47.1? What version of polygraphy should I use for TensorRT 8.6.1?

@lix19937

lix19937 commented Jan 6, 2025

You can run the following to see the nvinfer.so version:

find / -name libtensorrt.so 

ldd -r ${YOUR_libtensorrt.so_location} |grep libnvinfer.so  

@lix19937

lix19937 commented Jan 6, 2025

model.zip

Here is the ONNX model file; you can use it directly or generate a new one with the code above.

Is this the ONNX model you are currently using?

@yjiangling

yjiangling commented Jan 7, 2025

model.zip
Here is the ONNX model file; you can use it directly or generate a new one with the code above.

Is this the ONNX model you are currently using?

@lix19937 Thanks again for the help; I'm sorry for the late reply.

  1. I used the container created from the NVIDIA Docker image, and it seems there is no libtensorrt.so file (I think only the Python package of TensorRT is in the environment; the C++ library of TensorRT is not installed), but a libnvinfer.so file is found in the folder "/usr/lib/x86_64-linux-gnu/".

  2. Yes, the ONNX model is the preprocessing module of the full model I use; it gives the same error message "Could not find any implementation for node......Tensordot/Reshape" as the full ONNX model, so I guess the problem may lie in this part? The full ONNX model is too large to attach here directly, so I put it at https://gh.ddlc.top/https://github.com/yjiangling/Test/releases/download/1.0/full_model.zip; you can use either the simplified ONNX model or the full one.
