
[Performance] 40% slowdown in ONNX Resize Operator on CPU #23391

Open
SuhwanSong opened this issue Jan 16, 2025 · 2 comments
Labels: performance (issues related to performance regressions)

Comments

@SuhwanSong

Describe the issue

We observed a significant performance regression (~40% slowdown) in the Resize operator when using Float32 and Int64 data types on the CPU.
This slowdown impacts workloads that rely heavily on the Resize operator, particularly in image processing tasks.

After bisecting, we found that commit 6cc06ad introduces the slowdown.

range: 5fa4505..6cc06ad

model:

[model graph screenshot attached to the original issue]

analysis:

[name: model_loading_uri Op: Unknown]: 622 / 662 : 93.95770392749245%
[name: session_initialization Op: Unknown]: 3737 / 4004 : 93.33166833166833%
[name: /Resize_fence_before Op: Resize]: 48 / 1 : 4800.0%
[name: /Resize_kernel_time Op: Resize]: 84187 / 84921 : 99.13566726722483%
[name: /Resize_fence_after Op: Resize]: 0 / 12 : 0.0%
[name: /Resize_1_fence_before Op: Resize]: 4 / 22 : 18.181818181818183%
[name: /Resize_1_kernel_time Op: Resize]: 634602 / 444980 : 142.61360061126342%
[name: /Resize_1_fence_after Op: Resize]: 0 / 0 : 0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 4 : 0.0%
[name: /Ceil_kernel_time Op: Ceil]: 259949 / 262544 : 99.01159424705955%
[name: /Ceil_fence_after Op: Ceil]: 10 / 0 : 0%
[name: SequentialExecutor::Execute Op: Unknown]: 1118971 / 924960 : 120.97506919218128%
[name: model_run Op: Unknown]: 1141098 / 950941 : 119.99671903935155%
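
The per-node numbers above appear to come from onnxruntime's built-in profiler (each line seems to compare the two builds, with the last field being the ratio in percent). For anyone who wants to regenerate such a trace, here is a minimal sketch using the same model.onnx and input.npy as in the repro below; the profiler API is standard onnxruntime, and the run count of 100 is arbitrary:

import numpy as np
import onnxruntime

# Enable the built-in profiler; it writes a JSON trace (chrome tracing format)
# containing per-node entries such as /Resize_1_kernel_time.
sess_options = onnxruntime.SessionOptions()
sess_options.enable_profiling = True

session = onnxruntime.InferenceSession(
    "model.onnx", sess_options, providers=["CPUExecutionProvider"]
)

input_data = np.load("input.npy", allow_pickle=True).item()
for _ in range(100):  # arbitrary number of profiled runs
    session.run(None, input_data)

# end_profiling() stops the profiler and returns the path of the JSON trace;
# the Resize kernel times from two builds can then be compared.
print(session.end_profiling())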

To reproduce

  1. Download and unzip "model.zip".
  2. Run the following script.
import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations

input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6

print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
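
Running this script once per build (for example, with onnxruntime 1.19.0 and 1.20.1 installed in separate virtual environments, or with wheels built from commits 5fa4505 and 6cc06ad) and comparing the printed averages should show the ~40% gap on an affected machine.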

Urgency

No response

Platform

Linux

OS Version

6.8.0

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

model.zip

Is this a quantized model?

Yes

@SuhwanSong added the performance label on Jan 16, 2025
@tianleiwu
Contributor

I could not reproduce the issue on my machine.
The latency outputs for ORT 1.18.0 and ORT 1.20.1 are very close on my machine.

@SuhwanSong
Author

The latency outputs for ORT 1.18.0 and 1.20.1 are also very close on my machine.
However, the latencies differ significantly when comparing ORT 1.19.0 and 1.20.1.
This variation is due to a problematic commit introduced between ORT 1.19.0 and 1.20.1.

Could you test with ORT 1.19.0 and 1.20.1? You can also use these two adjacent commits:
5fa4505..6cc06ad

ORT 1.18 vs 1.20 (1.20 is faster than 1.18)

[name: model_loading_uri Op: Unknown]: 648 / 621 : 104.34782608695652%
[name: session_initialization Op: Unknown]: 3572 / 3671 : 97.30318714246799%
[name: /Resize_fence_before Op: Resize]: 0 / 0 : 0%
[name: /Resize_kernel_time Op: Resize]: 83168 / 74997 : 110.89510247076551%
[name: /Resize_fence_after Op: Resize]: 16 / 12 : 133.33333333333331%
[name: /Resize_1_fence_before Op: Resize]: 3 / 16 : 18.75%
[name: /Resize_1_kernel_time Op: Resize]: 775730 / 869052 : 89.26163221533349%
[name: /Resize_1_fence_after Op: Resize]: 0 / 1 : 0.0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 0 : 0%
[name: /Ceil_kernel_time Op: Ceil]: 256553 / 239340 : 107.19186095094844%
[name: /Ceil_fence_after Op: Ceil]: 0 / 0 : 0%
[name: SequentialExecutor::Execute Op: Unknown]: 1279805 / 1329464 : 96.26473526172954%
[name: model_run Op: Unknown]: 1299886 / 1350643 : 96.24201213792246%

ORT 1.19 vs 1.20 (1.19 is faster than 1.20)

[name: model_loading_uri Op: Unknown]: 666 / 499 : 133.46693386773546%
[name: session_initialization Op: Unknown]: 3596 / 2560 : 140.46875%
[name: /Resize_fence_before Op: Resize]: 20 / 0 : 0%
[name: /Resize_kernel_time Op: Resize]: 83151 / 82814 : 100.40693602530972%
[name: /Resize_fence_after Op: Resize]: 18 / 0 : 0%
[name: /Resize_1_fence_before Op: Resize]: 0 / 0 : 0%
[name: /Resize_1_kernel_time Op: Resize]: 781757 / 645172 : 121.17032357262869%
[name: /Resize_1_fence_after Op: Resize]: 21 / 12 : 175.0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 0 : 0%
[name: /Ceil_kernel_time Op: Ceil]: 261431 / 256024 : 102.11191138330781%
[name: /Ceil_fence_after Op: Ceil]: 0 / 0 : 0%
[name: SequentialExecutor::Execute Op: Unknown]: 1253278 / 1110786 : 112.82803348259701%
[name: model_run Op: Unknown]: 1273372 / 1130871 : 112.60099516213609%

ORT 5fa4505 vs 6cc06ad (5fa4505 is faster)

[name: model_loading_uri Op: Unknown]: 673 / 613 : 109.7879282218597%
[name: session_initialization Op: Unknown]: 4013 / 3739 : 107.3281626103236%
[name: /Resize_fence_before Op: Resize]: 0 / 47 : 0.0%
[name: /Resize_kernel_time Op: Resize]: 83368 / 84050 : 99.18857822724569%
[name: /Resize_fence_after Op: Resize]: 12 / 0 : 0%
[name: /Resize_1_fence_before Op: Resize]: 0 / 0 : 0%
[name: /Resize_1_kernel_time Op: Resize]: 635616 / 449415 : 141.4318614198458%
[name: /Resize_1_fence_after Op: Resize]: 0 / 0 : 0%
[name: /Ceil_fence_before Op: Ceil]: 0 / 0 : 0%
[name: /Ceil_kernel_time Op: Ceil]: 261413 / 261593 : 99.9311908193262%
[name: /Ceil_fence_after Op: Ceil]: 3 / 5 : 60.0%
[name: SequentialExecutor::Execute Op: Unknown]: 1110583 / 927192 : 119.77918273669314%
[name: model_run Op: Unknown]: 1132598 / 949394 : 119.2969409960459%
