
Model with ScatterND layer gives a different result every time with the same input #23396

Open
beinggourav opened this issue Jan 16, 2025 · 3 comments
Labels
performance issues related to performance regressions

Comments

@beinggourav

Describe the issue

The model has a ScatterND layer with add reduction. I use all zeros for the data input and random values for the updates input, with a fixed seed so that np.random produces the same input every time.
Still, I get a different result on every run with the same input.

To reproduce

The model is attached; run the code below to get the output. Run it multiple times to see the difference between runs.

import onnxruntime
import numpy as np

def run(sess):
    input_details = sess.get_inputs()
    input_dict = {}

    for detail in input_details:
        np.random.seed(0)
        if detail.type == 'tensor(float)':
            input_data = np.random.randn(*detail.shape).astype(np.float32)

        if detail.name == "bev_feat":
            input_data = np.zeros(detail.shape).astype(np.float32)
        elif detail.name == "/Unsqueeze_2_output_0":
            input_data = np.random.randint(100, size=detail.shape).astype(np.int64)

        input_dict[detail.name] = input_data
        print(detail.name, input_data[0][:10])

    output = list(sess.run(None, input_dict))
    return output

EP_list = ['CPUExecutionProvider']
sess = onnxruntime.InferenceSession("<model_path_here>", providers=EP_list)
output = run(sess)
print("\nOutput:", output[0][0][:10])

Urgency

Yes, I am stuck at this point. I need a reference of actual ONNX Runtime output to validate my own ScatterND implementation.

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.0

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

testing_model.zip

Is this a quantized model?

No

@beinggourav beinggourav added the performance issues related to performance regressions label Jan 16, 2025
tianleiwu (Contributor) commented Jan 16, 2025

I tried building from source from the latest main branch.

The output changes slightly on Windows:

Output: [-10.843787     3.829667     3.3664234    5.7173443    0.12671056
  11.390273   -12.988312     5.8242483    4.2814593    4.648715  ]

Output: [-10.843787     3.829666     3.3664227    5.7173443    0.12671053
  11.390273   -12.988313     5.8242483    4.2814593    4.648716  ]

Output: [-10.843787     3.829665     3.3664227    5.717345     0.12671053
  11.390273   -12.988313     5.8242474    4.2814593    4.6487164 ]

If I set the intra-op thread count to 1, the result becomes stable:

session_options = onnxruntime.SessionOptions()
session_options.intra_op_num_threads = 1
sess = onnxruntime.InferenceSession("scatterND_test.onnx", sess_options=session_options, providers=EP_list)
Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

I think the cause is a different order of additions under multi-threading. Each float32 + float32 addition is "approximated" by another float32 value, so a + b + c might not equal c + b + a.

If you need a more stable result, consider using double instead of float, or using a single thread.
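The reordering effect can be reproduced directly. A minimal sketch (the chunking scheme below only mimics per-thread partial sums; it is not ONNX Runtime's actual work partitioning):

```python
import numpy as np

# Summing the same float32 values as one sequential reduction vs. as
# per-"thread" partial sums can yield slightly different results,
# because float32 addition is not associative.
np.random.seed(0)
vals = np.random.randn(1_000_000).astype(np.float32)

# One "thread": a single reduction over the whole array.
seq = vals.sum(dtype=np.float32)

# Eight "threads": partial sums over chunks, then combined.
par = np.float32(0.0)
for chunk in np.array_split(vals, 8):
    par += chunk.sum(dtype=np.float32)

print(seq, par, seq - par)  # the difference is usually tiny but nonzero
```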

@beinggourav (Author)

Hi @tianleiwu
Thanks for the quick reply.
I tried setting session_options.intra_op_num_threads = 1, and the result is stable now.
But without this option I am seeing major differences (absolute difference > 1) between runs:

Output: [ -9.821614    4.3951974   3.9827185   7.968871    2.0045333  13.071521
 -11.583351    2.5462167   5.0686603   6.0265574]

Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

Output: [-10.191404    3.0742202   3.0707393   4.5980406  -1.4840544  12.902637
 -12.532749    5.0352764   3.3857875   5.9804463]

I would not expect the float32 + float32 approximation to deviate that much from the actual output. Can you explain this behavior a little more?

@tianleiwu (Contributor)

Consider the following sequence of numbers: 1.0e+10, 1.0, -1.0e+10

Addition order 1:
1.0e+10 + 1.0 = 1.0e+10 (adding 1.0 to 1.0e+10 has no effect, because the magnitude gap exceeds float32 precision)
1.0e+10 + (-1.0e+10) = 0.0

Addition order 2:
1.0e+10 + (-1.0e+10) = 0.0
0.0 + 1.0 = 1.0

float32 carries about 7 significant decimal digits of precision. When I see a difference at the level of 1e-6, that is normally acceptable variance.
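This example can be checked in float32 with NumPy:

```python
import numpy as np

# float32 addition is order dependent: the same three numbers summed in
# two different orders give 0.0 and 1.0.
a, b, c = np.float32(1.0e10), np.float32(1.0), np.float32(-1.0e10)

order1 = (a + b) + c  # 1.0 is absorbed into 1.0e10, then cancelled
order2 = (a + c) + b  # exact cancellation first, then add 1.0

print(order1, order2)  # 0.0 1.0
```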

@yuslepukhin, @liqunfu, since you have reviewed or updated the CPU implementation of ScatterND, could you take a look at this issue and see whether a deterministic implementation is feasible (for example, given the same index, accumulating in a fixed order)?
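Such an order-preserving accumulation could be sketched as follows; scatter_nd_add is a hypothetical NumPy reference helper, not ONNX Runtime's implementation:

```python
import numpy as np

def scatter_nd_add(data, indices, updates):
    # Hypothetical deterministic reference for ScatterND with reduction="add":
    # updates are applied in a fixed sequential order, so repeated indices
    # always accumulate identically across runs.
    out = data.copy()
    idx = indices.reshape(-1, indices.shape[-1])  # (N, k) index tuples
    upd = updates.reshape(len(idx), *updates.shape[indices.ndim - 1:])
    for i, u in zip(idx, upd):                    # fixed accumulation order
        out[tuple(i)] += u
    return out

data = np.zeros((4, 3), dtype=np.float32)
indices = np.array([[0], [2], [0]])              # index 0 appears twice
updates = np.ones((3, 3), dtype=np.float32)
print(scatter_nd_add(data, indices, updates))
# row 0 accumulates two updates -> 2.0, row 2 -> 1.0, others stay 0.0
```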
