
TensorRT 8.6.2 MatrixMultiply Operator Quantization #4322

Open
dingliangxiansheng opened this issue Jan 15, 2025 · 4 comments
@dingliangxiansheng

I am performing QAT (quantization-aware training) on the HRNet-OCR model and using TensorRT 8.6.2 to build the exported ONNX model, which contains Q/DQ operations, into a quantized engine. After conversion, I found that the MatrixMultiply operator was not quantized to INT8, as shown in the figure below.

[Image: layer graph showing MatrixMultiply not quantized to INT8]
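For reference, a minimal sketch of the usual TensorRT 8.x Python build flow for a QDQ ONNX model (the file names here are placeholders, not the actual script used):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the QAT-exported ONNX model containing explicit Q/DQ nodes.
with open("hrnet_ocr_qdq.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
# With explicit Q/DQ nodes in the graph, enabling INT8 lets TensorRT
# run the quantized regions in INT8; FP16 is allowed as a fallback.
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)

engine = builder.build_serialized_network(network, config)
with open("hrnet_ocr_qdq.engine", "wb") as f:
    f.write(engine)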

Then I manually inserted Q/DQ operators on both inputs of the matrix multiplication, and after conversion the MatrixMultiply operator was successfully quantized to INT8.

[Image: layer graph showing MatrixMultiply quantized to INT8 after Q/DQ insertion]
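A minimal sketch of that kind of Q/DQ insertion with the onnx Python API (the node and tensor names and the scale value are hypothetical placeholders; in a QAT workflow the scales come from the learned ranges):

import onnx
from onnx import helper, TensorProto

model = onnx.load("hrnet_ocr_qdq.onnx")
graph = model.graph

# Locate the un-quantized MatMul node (placeholder selection logic).
mm_idx, matmul = next((i, n) for i, n in enumerate(graph.node)
                      if n.op_type == "MatMul")

new_nodes = []
for idx, inp in enumerate(list(matmul.input)):
    # Placeholder scale/zero-point; a QAT workflow would use the
    # learned activation ranges instead.
    scale = helper.make_tensor(f"{inp}_scale", TensorProto.FLOAT, [], [0.05])
    zp = helper.make_tensor(f"{inp}_zp", TensorProto.INT8, [], [0])
    graph.initializer.extend([scale, zp])

    q = helper.make_node(
        "QuantizeLinear", [inp, f"{inp}_scale", f"{inp}_zp"], [f"{inp}_q"])
    dq = helper.make_node(
        "DequantizeLinear", [f"{inp}_q", f"{inp}_scale", f"{inp}_zp"],
        [f"{inp}_dq"])
    new_nodes.extend([q, dq])
    matmul.input[idx] = f"{inp}_dq"

# Insert the new nodes before the MatMul to keep topological order.
for n in reversed(new_nodes):
    graph.node.insert(mm_idx, n)

onnx.checker.check_model(model)
onnx.save(model, "hrnet_ocr_qdq_matmul.onnx")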

However, this created a new problem: the INT8 version of MatrixMultiply takes more time than the original FP16 version. As shown in the figures below, the first bar shows the FP16 execution time and the second bar the INT8 execution time.

[Image: per-layer timing, FP16 engine]

[Image: per-layer timing, INT8 engine]

"Why is this the case?"
"Moreover, I found on the official website that MatrixMultiply does not support INT8. Why is it that after I manually inserted the QDQ nodes, it can be quantized to INT8?"

[Image: screenshot of the MatrixMultiply operator documentation page]

@lix19937

"Moreover, I found on the official website that MatrixMultiply does not support INT8. Why is it that after I manually inserted the QDQ nodes, it can be quantized to INT8?"

which website ?

@dingliangxiansheng
Author

"Moreover, I found on the official website that MatrixMultiply does not support INT8. Why is it that after I manually inserted the QDQ nodes, it can be quantized to INT8?"

which website ?

This one: https://docs.nvidia.com/deeplearning/tensorrt/operators/docs/MatrixMultiply.html

@lix19937

Can you upload the logs for the two cases, generated with the following command?

trtexec --verbose \
--best \
--separateProfileRun \
--onnx=spec \
--dumpProfile \
--dumpLayerInfo --profilingVerbosity=detailed \
--exportLayerInfo=li.json  2>&1 | tee out.log
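(For context: --best lets trtexec pick the fastest precision per layer among FP32/FP16/INT8; --dumpProfile prints per-layer timings; --exportLayerInfo writes each layer's metadata, including its chosen precision, to li.json, which shows whether MatrixMultiply actually ran in INT8.)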

@dingliangxiansheng
Author

"Can you upload the logs for the two cases, generated with the following command?"

I already generated the logs. The first is from the model without QDQ nodes around MatrixMultiply; the second is from the model after manually inserting the QDQ nodes.

HRNetOCR_MatrixMultiply_without_QDQ.log

HRNetOCR_MatrixMultiply_with_QDQ.log
