TensorRT 8.6.2 MatrixMultiply Operator Quantization #4322
Comments
Which website?
This one: https://docs.nvidia.com/deeplearning/tensorrt/operators/docs/MatrixMultiply.html
Can you upload the logs for both cases, generated with the following command?
trtexec --verbose \
  --best \
  --separateProfileRun \
  --onnx=spec \
  --dumpProfile \
  --dumpLayerInfo --profilingVerbosity=detailed \
  --exportLayerInfo=li.json 2>&1 | tee out.log
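For reference, here is a rough sketch of how the exported layer information could be inspected after running that command, to check which precision each MatrixMultiply layer actually ran in. The JSON field names ("Layers", "Name", "LayerType", "Outputs", "Format/Datatype") are assumptions about the trtexec --exportLayerInfo layout and may differ between TensorRT versions.

# Rough sketch: print each MatMul/MatrixMultiply layer and the datatypes of its
# outputs, so the FP16 vs INT8 precision can be checked quickly.
# NOTE: the JSON field names below are assumptions and may vary by TensorRT version.
import json

with open("li.json") as f:
    info = json.load(f)

for layer in info.get("Layers", []):
    if not isinstance(layer, dict):
        # Some versions export layers as plain strings.
        print(layer)
        continue
    name = layer.get("Name", "<unnamed>")
    outputs = layer.get("Outputs", [])
    dtypes = [o.get("Format/Datatype", "?") for o in outputs if isinstance(o, dict)]
    if "MatMul" in name or "MatrixMultiply" in layer.get("LayerType", ""):
        print(f"{name}: outputs {dtypes}")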
I already posted the logs earlier. The first one is the log without QDQ nodes inserted around MatrixMultiply; the second one is the log after manually inserting QDQ nodes around MatrixMultiply.
I am doing QAT on the HRNet OCR model and using TensorRT 8.6.2 to build an engine from the resulting ONNX model with QDQ nodes. After conversion, I found that the MatrixMultiply operator was not quantized to INT8, as shown in the figure below.
Then I manually inserted QDQ operators in front of the two matrices being multiplied, and after conversion the MatrixMultiply operator was quantized to INT8 (a sketch of this insertion is shown below).
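For illustration only, here is a minimal sketch of that manual insertion using onnx-graphsurgeon. The model file names and the 0.1 scale are placeholders, not the values from my QAT run; in a real QAT model the scales come from the learned amax values.

# Minimal sketch: insert QuantizeLinear/DequantizeLinear (QDQ) pairs in front
# of both inputs of every MatMul node in an ONNX graph.
# Assumptions: file names and the 0.1 per-tensor scale are placeholders.
import numpy as np
import onnx
import onnx_graphsurgeon as gs

def insert_qdq(graph, node, input_index, scale_value):
    """Rewire one MatMul input through a new Q -> DQ pair."""
    tensor = node.inputs[input_index]
    prefix = f"{node.name or node.op}_in{input_index}"
    scale = gs.Constant(f"{prefix}_scale", np.array(scale_value, dtype=np.float32))
    zero_point = gs.Constant(f"{prefix}_zp", np.array(0, dtype=np.int8))
    q_out = gs.Variable(f"{prefix}_q_out", dtype=np.int8)
    dq_out = gs.Variable(f"{prefix}_dq_out", dtype=np.float32)
    graph.nodes.append(gs.Node("QuantizeLinear", name=f"{prefix}_q",
                               inputs=[tensor, scale, zero_point], outputs=[q_out]))
    graph.nodes.append(gs.Node("DequantizeLinear", name=f"{prefix}_dq",
                               inputs=[q_out, scale, zero_point], outputs=[dq_out]))
    node.inputs[input_index] = dq_out  # MatMul now reads the dequantized tensor

graph = gs.import_onnx(onnx.load("hrnet_ocr_qat.onnx"))  # placeholder file name
for node in graph.nodes:
    if node.op == "MatMul":
        for i in range(len(node.inputs)):
            insert_qdq(graph, node, i, scale_value=0.1)  # placeholder scale
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "hrnet_ocr_qat_qdq.onnx")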
However, there is a problem: the INT8 version of MatrixMultiply takes more time than the original FP16 version. As shown in the figure below, the first bar is the FP16 execution time and the second bar is the INT8 execution time.
Why is this the case?
Moreover, the official documentation says that MatrixMultiply does not support INT8. Why can it still be quantized to INT8 after I manually insert the QDQ nodes?