
Model with ScatterND layer gives a different result every time with the same input #23396

Open
beinggourav opened this issue Jan 16, 2025 · 3 comments
Labels
performance issues related to performance regressions

Comments

@beinggourav

Describe the issue

The model has a ScatterND layer with add reduction. I use all zeros for the data input and random values for the updates input, with a fixed seed so that np.random produces the same input every time.
Still, I get a different result on every run with the same input.

To reproduce

The model is attached; run the code below to get the output. Run it multiple times to see the difference between runs.

import onnxruntime
import numpy as np

def run(sess):
    input_details = sess.get_inputs()
    input_dict = {}

    for detail in input_details:
        np.random.seed(0)
        if detail.type == 'tensor(float)':
            input_data = np.random.randn(*detail.shape).astype(np.float32)

        if detail.name == "bev_feat":
            input_data = np.zeros(detail.shape).astype(np.float32)
        elif detail.name == "/Unsqueeze_2_output_0":
            input_data = np.random.randint(100, size=detail.shape).astype(np.int64)

        input_dict[detail.name] = input_data
        print(detail.name, input_data[0][:10])

    output = list(sess.run(None, input_dict))
    return output

EP_list = ['CPUExecutionProvider']
sess = onnxruntime.InferenceSession("<model_path_here>", providers=EP_list)
output = run(sess)
print("\nOutput:", output[0][0][:10])

Urgency

Yes, I am stuck at this point. I need a reference of actual ONNX Runtime output to validate my own ScatterND implementation.

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.0

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

testing_model.zip

Is this a quantized model?

No

@beinggourav beinggourav added the performance issues related to performance regressions label Jan 16, 2025
tianleiwu (Contributor) commented Jan 16, 2025

I tried building from source from the latest main branch.

The output changes slightly on Windows:

Output: [-10.843787     3.829667     3.3664234    5.7173443    0.12671056
  11.390273   -12.988312     5.8242483    4.2814593    4.648715  ]

Output: [-10.843787     3.829666     3.3664227    5.7173443    0.12671053
  11.390273   -12.988313     5.8242483    4.2814593    4.648716  ]

Output: [-10.843787     3.829665     3.3664227    5.717345     0.12671053
  11.390273   -12.988313     5.8242474    4.2814593    4.6487164 ]

If I set the intra-op thread count to 1, the result becomes stable:

session_options = onnxruntime.SessionOptions()
session_options.intra_op_num_threads = 1
sess = onnxruntime.InferenceSession("scatterND_test.onnx", sess_options=session_options, providers=EP_list)
Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

I think the cause is a different order of additions under multi-threading. Each float32 + float32 addition is "approximated" by another float32 value, so a + b + c might not equal c + b + a.

If you need a more stable result, consider using double instead of float, or using a single thread.
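The reordering effect can be reproduced directly. A minimal sketch (the chunking scheme below only mimics per-thread partial sums; it is not ONNX Runtime's actual work partitioning):

```python
import numpy as np

# Summing the same float32 values as one sequential reduction vs. as
# per-"thread" partial sums can yield slightly different results,
# because float32 addition is not associative.
np.random.seed(0)
vals = np.random.randn(1_000_000).astype(np.float32)

# One "thread": a single reduction over the whole array.
seq = vals.sum(dtype=np.float32)

# Eight "threads": partial sums over chunks, then combined.
par = np.float32(0.0)
for chunk in np.array_split(vals, 8):
    par += chunk.sum(dtype=np.float32)

print(seq, par, seq - par)  # the difference is usually tiny but nonzero
```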

@beinggourav (Author)

Hi @tianleiwu
Thanks for the quick reply.
I tried setting session_options.intra_op_num_threads = 1, and the result is stable now.
But without this option I am seeing major differences (absolute difference > 1) between runs:

Output: [ -9.821614    4.3951974   3.9827185   7.968871    2.0045333  13.071521
 -11.583351    2.5462167   5.0686603   6.0265574]

Output: [-10.843784     3.8296654    3.3664236    5.717345     0.12671077
  11.390275   -12.988313     5.8242493    4.2814593    4.6487164 ]

Output: [-10.191404    3.0742202   3.0707393   4.5980406  -1.4840544  12.902637
 -12.532749    5.0352764   3.3857875   5.9804463]

I would not expect the float32 + float32 approximation to deviate that much from the actual output. Can you explain this behavior a little more?

@tianleiwu (Contributor)

Consider the following sequence of numbers: 1.0e+10, 1.0, -1.0e+10

Addition order 1:
1.0e+10 + 1.0 = 1.0e+10 (adding 1.0 to 1.0e+10 has no effect, because the magnitude gap exceeds float32 precision)
1.0e+10 + (-1.0e+10) = 0.0

Addition order 2:
1.0e+10 + (-1.0e+10) = 0.0
0.0 + 1.0 = 1.0

float32 carries about 7 significant decimal digits of precision. When I see a difference at the level of 1e-6, that is normally acceptable variance.
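This example can be checked in float32 with NumPy:

```python
import numpy as np

# float32 addition is order dependent: the same three numbers summed in
# two different orders give 0.0 and 1.0.
a, b, c = np.float32(1.0e10), np.float32(1.0), np.float32(-1.0e10)

order1 = (a + b) + c  # 1.0 is absorbed into 1.0e10, then cancelled
order2 = (a + c) + b  # exact cancellation first, then add 1.0

print(order1, order2)  # 0.0 1.0
```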

@yuslepukhin, @liqunfu, since you have reviewed or updated the CPU implementation of ScatterND, could you take a look at this issue and see whether a deterministic implementation is feasible (for example, given the same index, accumulating in a fixed order)?
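Such an order-preserving accumulation could be sketched as follows; scatter_nd_add is a hypothetical NumPy reference helper, not ONNX Runtime's implementation:

```python
import numpy as np

def scatter_nd_add(data, indices, updates):
    # Hypothetical deterministic reference for ScatterND with reduction="add":
    # updates are applied in a fixed sequential order, so repeated indices
    # always accumulate identically across runs.
    out = data.copy()
    idx = indices.reshape(-1, indices.shape[-1])  # (N, k) index tuples
    upd = updates.reshape(len(idx), *updates.shape[indices.ndim - 1:])
    for i, u in zip(idx, upd):                    # fixed accumulation order
        out[tuple(i)] += u
    return out

data = np.zeros((4, 3), dtype=np.float32)
indices = np.array([[0], [2], [0]])              # index 0 appears twice
updates = np.ones((3, 3), dtype=np.float32)
print(scatter_nd_add(data, indices, updates))
# row 0 accumulates two updates -> 2.0, row 2 -> 1.0, others stay 0.0
```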
