Update win-ort-main to tip main 250116 #23398
Conversation
### Description 1. Add support for throwing an error when the hardware is not supported for VitisAI. 2. Add support for unloading the VitisAI EP. 3. Add API for Win25. ### Motivation and Context This is a requirement for Win25.
### Description Introduces a new optional input (encoder_input_ids) in the decoder graph of the T5 implementation for BeamSearch. This allows the use of pointer-generator networks in the decoder graph. ### Motivation and Context - Fixes #23123
### Description Fixes a crash in QNN dlls when an ETW callback tries to change the QNN log level. This is caused by a function that does not lock a mutex before modifying the QNN log level. ### Motivation and Context An ETW callback into QNN EP leads to a crash within QNN SDK dlls. It happens in approximately 1 out of 3 full QNN unit test runs. The cause is a multithreading synchronization bug in QNN EP: we're not always locking a mutex when ETW calls QNN EP to notify it of an ETW config change. There are two branches in the QNN EP callback function that try to update the QNN log handle. One branch correctly locks a mutex, but the other does not lock it at all. This causes crashes within QNN dlls. - Does not lock mutex: [onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at main · microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L426) - Locks mutex: [onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at main · microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L442) The fix is to lock the mutex in both paths.
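The shape of the fix can be sketched as follows (class and member names here are illustrative, not the actual QNN EP symbols): both branches that touch the shared QNN log state must hold the same mutex.

```cpp
#include <mutex>

// Minimal sketch of the fix, not the actual QNN EP code: an ETW callback can
// race with inference threads, so every branch that updates the shared QNN
// log level/handle must take the same lock.
class QnnLogConfig {
 public:
  void OnEtwConfigChange(bool etw_enabled, int etw_level) {
    if (etw_enabled) {
      std::lock_guard<std::mutex> lock(mutex_);  // branch that already locked
      log_level_ = etw_level;
    } else {
      std::lock_guard<std::mutex> lock(mutex_);  // branch that previously did not lock
      log_level_ = default_level_;
    }
  }

 private:
  std::mutex mutex_;
  int log_level_ = 0;
  int default_level_ = 2;
};
```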
### Description For the ORT 1.21.0 release. Created the following related issues to track skipped tests due to updated ONNX operators in the ONNX 1.17.0 release: #23162 #23164 #23163 #23161 --------- Signed-off-by: Liqun Fu <[email protected]> Signed-off-by: Liqun Fu <[email protected]> Co-authored-by: Guenther Schmuelling <[email protected]> Co-authored-by: Yifan Li <[email protected]> Co-authored-by: yf711 <[email protected]>
The algorithm of `SkipSimplifiedLayerNormalization` is quite similar to that of `SimplifiedLayerNormalization`; the only difference is that `SkipSimplifiedLayerNormalization` provides an additional output used for returning the sum of the input, skip and bias (if it exists). This PR also fixes a bug in `SimplifiedLayerNormalization`: the bias, if it exists, is now added.
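As a rough per-row sketch of the relationship (illustrative code, not the actual kernel): the Skip variant first forms input + skip + bias, exposes that sum as its extra output, and then applies the same RMS-style normalization as `SimplifiedLayerNormalization`.

```cpp
#include <cmath>
#include <vector>

// Illustrative single-row computation; eps and gamma are the usual
// SimplifiedLayerNormalization (RMSNorm) parameters.
void SkipSimplifiedLayerNormRow(const std::vector<float>& input,
                                const std::vector<float>& skip,
                                const float* bias,  // may be nullptr
                                const std::vector<float>& gamma,
                                float eps,
                                std::vector<float>& output,
                                std::vector<float>& input_skip_bias_sum) {
  const size_t n = input.size();
  output.resize(n);
  input_skip_bias_sum.resize(n);
  double mean_square = 0.0;
  for (size_t i = 0; i < n; ++i) {
    const float sum = input[i] + skip[i] + (bias ? bias[i] : 0.0f);  // bias added before normalization
    input_skip_bias_sum[i] = sum;  // the extra output of the Skip variant
    mean_square += static_cast<double>(sum) * sum;
  }
  const float inv_rms = 1.0f / std::sqrt(static_cast<float>(mean_square / n) + eps);
  for (size_t i = 0; i < n; ++i) {
    output[i] = input_skip_bias_sum[i] * inv_rms * gamma[i];
  }
}
```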
### Description Refactor compute plan profiling. Support caching the CoreML model to speed up session initialization. This is only supported via a user-provided entry, and the user is responsible for managing the cache. With the cache, session initialization time can be reduced by 50% or more:

| model | before | after |
| -- | -- | -- |
| yolo11.onnx | 0.6s | 0.1s |
| yolo11-fp16.onnx | 1.8s | 0.1s |

--------- Co-authored-by: wejoncy <[email protected]> Co-authored-by: Scott McKay <[email protected]>
### Description Enable the delay-load hook for Python packages
SAL2 macros are not always available on non-MSVC compilers. ### Description Make SAL2 macros only available on MSVC. ### Motivation and Context #1175
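A minimal sketch of the idea (the real header may guard a different or longer list of annotations): use real SAL2 annotations under MSVC and define them to nothing elsewhere so the same declarations compile with GCC/Clang.

```cpp
#include <cstddef>

#ifdef _MSC_VER
#include <sal.h>
#else
// Non-MSVC compilers do not ship <sal.h>; define the annotations away.
#ifndef _In_
#define _In_
#endif
#ifndef _Out_
#define _Out_
#endif
#ifndef _Inout_
#define _Inout_
#endif
#endif

// Example declaration that now builds on all compilers.
void CopyBuffer(_In_ const void* src, _Out_ void* dst, size_t len);
```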
Remove PostBuildCleanup tasks since they are deprecated. This addresses a warning in our pipelines: "Task 'Post Build Cleanup' version 3 (PostBuildCleanup@3) is dependent on a Node version (6) that is end-of-life. Contact the extension owner for an updated version of the task. Task maintainers should review Node upgrade guidance: https://aka.ms/node-runner-guidance" Now the cleanup is controlled in another place: https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/workspace?view=azure-pipelines The code change was generated by the following Linux command:
```bash
find . -name \*.yml -exec sed -i '/PostBuildCleanup/,+2d' {} \;
```
### Description Make arrays with cubin data const. ### Motivation and Context Non-const arrays are put into the .data section which might cause excessive memory usage in some scenarios. Making cubin arrays const allows them to be put into the .rodata section.
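The change amounts to adding `const` to the embedded kernel blobs, for example (array name and bytes below are made up for illustration):

```cpp
// Illustrative example only.
//
// Before: a non-const array lives in the writable .data section, so each
// process carries its own private copy of the bytes.
// unsigned char sm80_gemm_cubin[] = {0x7f, 0x45, 0x4c, 0x46 /* ... */};

// After: a const array is placed in the read-only .rodata section, which
// avoids the extra writable-memory cost.
const unsigned char sm80_gemm_cubin[] = {0x7f, 0x45, 0x4c, 0x46 /* ... */};
const unsigned int sm80_gemm_cubin_len = sizeof(sm80_gemm_cubin);
```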
### Description For legacy Jetson users on JetPack 5.x, the latest TRT version is 8.5. Add version checks around newer TRT features to fix the build on JetPack 5.x (CUDA 11.8 + GCC 11 are required).
### Description Changed all supported tensor types from IR 9 to IR 10. ### Motivation and Context - See issue #23205 Co-authored-by: Yueqing Zhang <[email protected]>
### Description The Web CI pipeline uses three different Windows machine pools: 1. onnxruntime-Win2022-webgpu-A10 2. onnxruntime-Win2022-VS2022-webgpu-A10 3. onnxruntime-Win-CPU-2022-web This PR merges them together to reduce ongoing maintenance cost.
### Description Use `https.get` instead of `fetch` in ORT Nodejs binding package install script. ### Motivation and Context According to discussions in #23232, the package `global-agent` cannot work with `fetch` API. To make it work with the proxy agent, this PR replaces the `fetch` API with `https.get` in the install script.
### Description This PR makes it convenient to do post-processing on the generated JSON file when profiling is enabled. The kernel type can be used to aggregate the overall time of kernels of the same type.
Move the Linux GPU CI pipeline to A10 machines, which are more advanced. Retire the onnxruntime-Linux-GPU-T4 machine pool. Disable the run_lean_attention test because the new machines do not have enough shared memory.
```
skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory
[E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument
```
…#23232) ### Description Add proxy agent to fetch request ### Motivation and Context Fixes #23231 --------- Signed-off-by: Junze Wu <[email protected]> Co-authored-by: Yulong Wang <[email protected]>
### Description Update `mocha` to v11.0.1 and `fs-extra` to v11.2.0
```
# npm audit report
nanoid <3.3.8
Severity: moderate
Predictable results in nanoid generation when given non-integer values - GHSA-mwcw-c2x4-8c55
fix available via `npm audit fix`
node_modules/nanoid
mocha 8.2.0 - 10.2.0
Depends on vulnerable versions of nanoid
node_modules/mocha
2 moderate severity vulnerabilities
```
### Description 1. Currently the Python-Cuda-Publishing-Pipeline only publishes Linux wheels, not Windows wheels, because we recently refactored the upstream pipeline ("Python-CUDA-Packaging-Pipeline") to use 1ES PT. This PR fixes the issue. 2. tools/ci_build/github/azure-pipelines/stages/py-win-gpu-stage.yml no longer includes component-governance-component-detection-steps.yml, because 1ES PT already inserts such a step. 3. Delete tools/ci_build/github/windows/eager/requirements.txt because it is no longer used. ### Motivation and Context The "Python-CUDA-Packaging-Pipeline" is for CUDA 12. The "Python CUDA ALT Packaging Pipeline" is for CUDA 11. The two pipelines are very similar, except the CUDA versions are different. Each of them has three parts: build, test, publish. "Python-CUDA-Packaging-Pipeline" is the first part: build. "Python CUDA12 Package Test Pipeline" is the second part. "Python-Cuda-Publishing-Pipeline" is the third part that publishes the packages to an internal ADO feed.
### Description Separating result processor out from profiler.py without changing the behaviors of current profile.py ### Motivation and Context Less dependency and smaller code for processing profile from other scenarios. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
The input should have the skip and bias (if it exists) added to it first.
### Description This PR 1) uses the override shape instead of the tensor's original shape in the shader key to reduce the number of shader variants; 2) adds the indices shape rank to the shader key to avoid some potential errors.
### Description Fusing Pad & AveragePool requires AveragePool to use `count_include_pad=1`. If the AveragePool already set some padding and `count_include_pad=0`, fusion can't happen. This PR adds a condition to perform fusion depending on those attributes. If fusion occurs, `count_include_pad` is always set to `1`. ### Motivation and Context Fix #22177 (mislabelled as a performance issue but there's an actual bug in the implementation) Bug introduced in #21556
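The added condition can be sketched as follows (illustrative, not the exact optimizer code): folding a preceding Pad into AveragePool is only valid when it cannot change the averaging denominator.

```cpp
#include <cstdint>

// Returns true when the Pad node may be absorbed into the AveragePool.
bool CanFusePadIntoAveragePool(bool pool_already_has_padding, int64_t count_include_pad) {
  // If the pool already pads and excludes padded elements from the average
  // (count_include_pad == 0), absorbing the extra Pad would change the result.
  if (pool_already_has_padding && count_include_pad == 0) {
    return false;
  }
  // Otherwise fusion is safe; the fused AveragePool always sets count_include_pad = 1.
  return true;
}
```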
Mitigates #23183 while we investigate a final solution.
### Description Fix comparison of narrow type with wide type in loop condition. ### Motivation and Context Comparison between types of different widths in a loop condition can cause the loop to fail to terminate.
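The class of bug being fixed looks like this (illustrative, not the exact loop from the code):

```cpp
#include <cstddef>
#include <cstdint>

// Buggy: 'i' is narrower than 'count'. For count > 255 the uint8_t index
// wraps around before reaching the bound, so the loop never terminates.
void ProcessBuggy(const float* data, size_t count) {
  for (uint8_t i = 0; i < count; ++i) {
    (void)data[i];
  }
}

// Fixed: the index type is at least as wide as the bound.
void ProcessFixed(const float* data, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    (void)data[i];
  }
}
```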
Some quantized models have QDQ around Conv/Gemm but the weight and/or bias are not quantized. This PR adds a WeightBiasQuantization optimizer to quantize the float weight and/or bias to INT8 and INT32 tensors respectively. We only do this for weight and/or bias initializers so that ConstantFolding will fold the sub-graph to real quantized initializers during the next round of graph optimization.
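The PR description does not spell out the scales; a common QDQ convention, assumed in this rough sketch, is a symmetric per-tensor INT8 scale for the weight and bias_scale = input_scale * weight_scale for the INT32 bias.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize a float weight initializer to symmetric INT8 with a per-tensor scale.
std::vector<int8_t> QuantizeWeight(const std::vector<float>& w, float& scale_out) {
  float max_abs = 0.0f;
  for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
  scale_out = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  std::vector<int8_t> q(w.size());
  for (size_t i = 0; i < w.size(); ++i) {
    const long r = std::lround(w[i] / scale_out);
    q[i] = static_cast<int8_t>(std::clamp<long>(r, -127, 127));
  }
  return q;
}

// Quantize a float bias initializer to INT32 using the product of the input
// and weight scales (zero point 0), so the quantized Conv/Gemm can accumulate
// the bias in integer arithmetic.
std::vector<int32_t> QuantizeBias(const std::vector<float>& bias,
                                  float input_scale, float weight_scale) {
  const float bias_scale = input_scale * weight_scale;
  std::vector<int32_t> q(bias.size());
  for (size_t i = 0; i < bias.size(); ++i) {
    q[i] = static_cast<int32_t>(std::lround(bias[i] / bias_scale));
  }
  return q;
}
```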
ONNX's MatMul is the same as numpy.matmul, which supports input tensors with rank >= 1, but QNN's MatMul can only support input tensors with rank >= 2. This PR adds a MatMulOpBuilder for QNN EP to build a QNN graph that supports all possible cases of ONNX's MatMul, by adding Reshape nodes where necessary, e.g., reshaping a 1D input to 2D if present, and reshaping the output to the expected shape at the end. This PR also tries to use the FullyConnected op for MatMul if the 2nd input is a 2D initializer or a 1D tensor, because FullyConnected is faster than MatMul on QNN EP. If the 2nd input is a 2D tensor, we require it to be an initializer because FullyConnected requires the 2nd input in [n, k] shape; we can transpose it at graph-build time if it's an initializer (we don't want to add an extra Transpose node). Using the swin_base model as an example, which contains several MatMul nodes whose 2nd input is a 2D initializer (not followed by Add), running on a Gen3 mobile device: before the change it takes 34.8876 ms, after this change it takes 27.0639 ms.
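A small sketch of the shape and op selection logic described above (names and the exact decision rules are illustrative assumptions, not the actual MatMulOpBuilder code):

```cpp
#include <cstdint>
#include <vector>

// ONNX MatMul accepts rank >= 1, QNN MatMul needs rank >= 2, so 1-D inputs
// are reshaped before the op and the output is reshaped back afterwards.
struct MatMulPlan {
  bool reshape_a_to_2d = false;  // [K] -> [1, K]
  bool reshape_b_to_2d = false;  // [K] -> [K, 1]
  bool use_fully_connected = false;
};

MatMulPlan PlanQnnMatMul(const std::vector<int64_t>& a_shape,
                         const std::vector<int64_t>& b_shape,
                         bool b_is_initializer) {
  MatMulPlan plan;
  plan.reshape_a_to_2d = a_shape.size() == 1;
  plan.reshape_b_to_2d = b_shape.size() == 1;
  // Prefer FullyConnected when B is a 1-D tensor or a 2-D initializer
  // (an initializer can be transposed to [N, K] at graph-build time).
  plan.use_fully_connected =
      b_shape.size() == 1 || (b_shape.size() == 2 && b_is_initializer);
  return plan;
}
```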
### Description
It has a dependency on the following PRs:
- #23297

Optimize the ONNX pipeline for Stable Diffusion 3.x and Flux 1.0 models (fp32 or fp16).
- [x] Update optimize_pipeline script
- [x] Update benchmark script
- [x] Update document about Stable Diffusion 3.x and Flux 1.0 models
- [x] Add graph optimizations for MMDiT model
  - [x] FastGelu fusion
  - [x] RMSNorm fusion
  - [x] MultiHeadAttention fusion
- [x] Add graph optimizations for Flux transformer models
  - [x] MultiHeadAttention fusion
- [x] Update graph optimizations for t5
- [x] Add tests

Optimize the ONNX pipeline for Stable Diffusion 3.x and Flux 1.0 models:
```
python optimize_pipeline.py -i ./flux1_schnell_onnx/fp32 -o ./flux1_schnell_onnx/fp16 --float16
Optimize flux1_schnell_onnx/fp32/transformer/model.onnx ...
Fused LayerNormalization: 115
Fused SimplifiedLayerNormalization: 152
Fused FastGelu: 76
Fused MultiHeadAttention: 57
```

### H100 Benchmark Results
* GPU: NVIDIA H100 80GB HBM3
* Image Size: 1024x1024
* Batch Size: 1

| Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB) |
| -- | -- | -- | -- | -- | -- |
| Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 8.198 | 37,603 |
| Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 10.762 | 41,469 |
| Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 10.891 | 43,545 |
| Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 12.339 | 36,651 |
| Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 0.775 | 37,857 |
| Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 0.931 | 41,433 |
| Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 0.939 | 43,809 |
| Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 1.120 | 36,629 |
| SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 7.466 | 32,217 |
| SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 10.275 | 36,609 |
| SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 10.283 | 36,729 |
| SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 11.615 | 31,517 |
| SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 3.240 | 21,143 |
| SD 3.5 Medium | 50 | FP16+BF16 | Optimum (ORT) | 4.799 | 25,097 |
| SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 4.838 | 25,109 |
| SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 5.582 | 20,489 |

### A100 Benchmark Results
* GPU: A100-SXM4-80GB
* Image Size: 1024x1024
* Batch Size: 1

| Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB) |
| -- | -- | -- | -- | -- | -- |
| Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 17.593 | 37,723 |
| Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 21.918 | 41,348 |
| Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 22.060 | 44,860 |
| Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 24.267 | 36,847 |
| Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 1.627 | 37,881 |
| Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 1.884 | 41,537 |
| Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 1.902 | 44,858 |
| Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 2.162 | 36,831 |
| SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 15.881 | 32,307 |
| SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 19.837 | 36,451 |
| SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 19.964 | 36,461 |
| SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 22.477 | 31,513 |
| SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 6.476 | 21,341 |
| SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 8.775 | 25,183 |
| SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 10.057 | 20,433 |

### Future Works
* Triton kernel for matrix multiplication and auto tuning.
* FP8/Int8 quantization

### Motivation and Context
SD 3.5 Architecture: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/mmdit-x.png
### Description Updating react-native to 0.70.15 ### Motivation and Context To address the issue with the failed checksum after Boost switched its download URL from JFrog.
### Description
* Remove deprecated GPU archs to control nuget/python package size (latest TRT supports sm75 Turing and newer archs)
* Add 90 to support the Blackwell series in the next release (86;89 are not considered, as adding them will rapidly increase package size)

| arch_range | Python-cuda12 | Nuget-cuda12 |
| -------------- | ------------------------------------------------------------ | ---------------------------------- |
| 60;61;70;75;80 | Linux: 279MB Win: 267MB | Linux: 247MB Win: 235MB |
| 75;80 | Linux: 174MB Win: 162MB | Linux: 168MB Win: 156MB |
| **75;80;90** | **Linux: 299MB Win: 277MB** | **Linux: 294MB Win: 271MB** |
| 75;80;86;89 | [Linux: MB Win: 390MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=647457&view=results) | Linux: 416MB Win: 383MB |
| 75;80;86;89;90 | [Linux: MB Win: 505MB](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=646536&view=results) | Linux: 541MB Win: 498MB |

### Motivation and Context
Callout: While adding sm90 support, the cuda11.8+cudnn8 build will be dropped in the coming ORT release, as that build has issues with Blackwell (mentioned in comments) and demand for CUDA 11 is minor, according to the internal ort-cuda11 repo.
Increases operator coverage for the WebGPU native EP.
WebNN doesn't provide a dedicated op for RotaryEmbedding. Instead, we implement it by using a combination of WebNN ops. The decomposed graph is referenced from DML EP at: onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp
### Description This PR adds unit tests for [fusing the vision components](#20721) of Phi-3 vision and Phi-3.5 vision. ### Motivation and Context Many multi-modal models use a CLIP encoder or a variant of CLIP as part of their encoders. These fusion unit tests will ensure that the vision components of Phi-3 vision and Phi-3.5 vision can still be fused when existing fusions are modified to support more models.
### Description Fix a bug in a previous change where a failure during `SetupBackend` causes `ReleaseResources` to be called to clean up, but it does nothing because `backend_setup_completed_` is false. `backend_setup_completed_` _seems_ to now be redundant, so removing it fixes the problem. ### Motivation and Context We are seeing crashes due to the log callback failing to be de-registered.
The QNN HTP backend for MatMul is not stable on different versions and platforms. Disable the UT to avoid random failure.
### Description Update xnnpack to remove the dependency on the psimd and fp16 libraries. However, coremltools still depends on them, which will be addressed later. Also, update CPUINFO because the latest xnnpack requires CPUINFO's avx10 support. ### Motivation and Context The fewer dependencies the better.
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.4 to 0.9.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/releases">ruff's releases</a>.</em></p> <blockquote> <h2>0.9.1</h2> <h2>Release Notes</h2> <h3>Preview features</h3> <ul> <li>[<code>pycodestyle</code>] Run <code>too-many-newlines-at-end-of-file</code> on each cell in notebooks (<code>W391</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li> <li>[<code>ruff</code>] Omit diagnostic for shadowed private function parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-bugbear</code>] Improve <code>assert-raises-exception</code> message (<code>B017</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Preserve trailing end-of line comments for the last string literal in implicitly concatenated strings (<a href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li> </ul> <h3>Server</h3> <ul> <li>Fix a bug where the server and client notebooks were out of sync after reordering cells (<a href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li> <li>[<code>pyupgrade</code>] Handle comments and multiline expressions correctly (<code>UP037</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li> </ul> <h2>Contributors</h2> <ul> <li><a href="https://github.com/AntoineD"><code>@AntoineD</code></a></li> <li><a href="https://github.com/InSyncWithFoo"><code>@InSyncWithFoo</code></a></li> <li><a href="https://github.com/MichaReiser"><code>@MichaReiser</code></a></li> <li><a href="https://github.com/calumy"><code>@calumy</code></a></li> <li><a href="https://github.com/dcreager"><code>@dcreager</code></a></li> <li><a href="https://github.com/dhruvmanila"><code>@dhruvmanila</code></a></li> <li><a href="https://github.com/dylwil3"><code>@dylwil3</code></a></li> <li><a href="https://github.com/sharkdp"><code>@sharkdp</code></a></li> <li><a href="https://github.com/tjkuson"><code>@tjkuson</code></a></li> </ul> <h2>Install ruff 0.9.1</h2> <h3>Install prebuilt binaries via shell script</h3> <pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.sh | sh </code></pre> <h3>Install prebuilt binaries via powershell script</h3> <pre lang="sh"><code>powershell -ExecutionPolicy ByPass -c "irm https://github.com/astral-sh/ruff/releases/download/0.9.1/ruff-installer.ps1 | iex" </code></pre> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's changelog</a>.</em></p> <blockquote> <h2>0.9.1</h2> <h3>Preview features</h3> <ul> <li>[<code>pycodestyle</code>] Run <code>too-many-newlines-at-end-of-file</code> on each cell in notebooks (<code>W391</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15308">#15308</a>)</li> <li>[<code>ruff</code>] Omit diagnostic for shadowed private function parameters in <code>used-dummy-variable</code> (<code>RUF052</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15376">#15376</a>)</li> </ul> <h3>Rule changes</h3> <ul> <li>[<code>flake8-bugbear</code>] Improve <code>assert-raises-exception</code> message (<code>B017</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15389">#15389</a>)</li> </ul> <h3>Formatter</h3> <ul> <li>Preserve trailing end-of line comments for the last string literal in implicitly concatenated strings (<a href="https://redirect.github.com/astral-sh/ruff/pull/15378">#15378</a>)</li> </ul> <h3>Server</h3> <ul> <li>Fix a bug where the server and client notebooks were out of sync after reordering cells (<a href="https://redirect.github.com/astral-sh/ruff/pull/15398">#15398</a>)</li> </ul> <h3>Bug fixes</h3> <ul> <li>[<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15394">#15394</a>)</li> <li>[<code>pyupgrade</code>] Handle comments and multiline expressions correctly (<code>UP037</code>) (<a href="https://redirect.github.com/astral-sh/ruff/pull/15337">#15337</a>)</li> </ul> <h2>0.9.0</h2> <p>Check out the <a href="https://astral.sh/blog/ruff-v0.9.0">blog post</a> for a migration guide and overview of the changes!</p> <h3>Breaking changes</h3> <p>Ruff now formats your code according to the 2025 style guide. As a result, your code might now get formatted differently. See the formatter section for a detailed list of changes.</p> <p>This release doesn’t remove or remap any existing stable rules.</p> <h3>Stabilization</h3> <p>The following rules have been stabilized and are no longer in preview:</p> <ul> <li><a href="https://docs.astral.sh/ruff/rules/stdlib-module-shadowing/"><code>stdlib-module-shadowing</code></a> (<code>A005</code>). 
This rule has also been renamed: previously, it was called <code>builtin-module-shadowing</code>.</li> <li><a href="https://docs.astral.sh/ruff/rules/builtin-lambda-argument-shadowing/"><code>builtin-lambda-argument-shadowing</code></a> (<code>A006</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/slice-to-remove-prefix-or-suffix/"><code>slice-to-remove-prefix-or-suffix</code></a> (<code>FURB188</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/boolean-chained-comparison/"><code>boolean-chained-comparison</code></a> (<code>PLR1716</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/decimal-from-float-literal/"><code>decimal-from-float-literal</code></a> (<code>RUF032</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/post-init-default/"><code>post-init-default</code></a> (<code>RUF033</code>)</li> <li><a href="https://docs.astral.sh/ruff/rules/useless-if-else/"><code>useless-if-else</code></a> (<code>RUF034</code>)</li> </ul> <p>The following behaviors have been stabilized:</p> <ul> <li><a href="https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/"><code>pytest-parametrize-names-wrong-type</code></a> (<code>PT006</code>): Detect <a href="https://docs.pytest.org/en/7.1.x/how-to/parametrize.html#parametrize"><code>pytest.parametrize</code></a> calls outside decorators and calls with keyword arguments.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/astral-sh/ruff/commit/12f86f39a4691e44b62c11dd4bc376a16e358f43"><code>12f86f3</code></a> Ruff 0.9.1 (<a href="https://redirect.github.com/astral-sh/ruff/issues/15407">#15407</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/2b28d566a4a891339a43a35c818f5b155c0b9edd"><code>2b28d56</code></a> Associate a trailing end-of-line comment in a parenthesized implicit concaten...</li> <li><a href="https://github.com/astral-sh/ruff/commit/adca7bd95cf315ca14e34ab3eac6deb73e154f1d"><code>adca7bd</code></a> Remove pygments pin (<a href="https://redirect.github.com/astral-sh/ruff/issues/15404">#15404</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/6b98a26452ec1bde8b445c82c097d03c78213c1d"><code>6b98a26</code></a> [red-knot] Support <code>assert_type</code> (<a href="https://redirect.github.com/astral-sh/ruff/issues/15194">#15194</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/c87463842a6e19976b6f3401137b6932e4a7bb71"><code>c874638</code></a> [red-knot] Move tuple-containing-Never tests to Markdown (<a href="https://redirect.github.com/astral-sh/ruff/issues/15402">#15402</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/c364b586f9177a22f4556f86e434f21dfaf82c38"><code>c364b58</code></a> [<code>flake8-pie</code>] Correctly remove wrapping parentheses (<code>PIE800</code>) (<a href="https://redirect.github.com/astral-sh/ruff/issues/15394">#15394</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/73d424ee5e6963d577e196d71c3b19c82e84e612"><code>73d424e</code></a> Fix outdated doc for handling the default file types with the pre-commit hook...</li> <li><a href="https://github.com/astral-sh/ruff/commit/6e9ff445fd8559972b423370de20563a9c2db8d4"><code>6e9ff44</code></a> Insert the cells from the <code>start</code> position (<a href="https://redirect.github.com/astral-sh/ruff/issues/15398">#15398</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/f2c3ddc5eaa2ce107a200e134be82fc36afce06b"><code>f2c3ddc</code></a> [red-knot] 
Move intersection type tests to Markdown (<a href="https://redirect.github.com/astral-sh/ruff/issues/15396">#15396</a>)</li> <li><a href="https://github.com/astral-sh/ruff/commit/b861551b6ac928c25136d76151162f6fefc9cf71"><code>b861551</code></a> Remove unnecessary backticks (<a href="https://redirect.github.com/astral-sh/ruff/issues/15393">#15393</a>)</li> <li>Additional commits viewable in <a href="https://github.com/astral-sh/ruff/compare/0.5.4...0.9.1">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.5.4&new-version=0.9.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Justin Chu <[email protected]> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…es (#23249) Add support to mainline ONNX Runtime for changes from the ROCm team. ### Motivation and Context Various bugfixes and changes added between ROCm 6.2 and 6.3 that haven't been upstreamed yet to mainline. --------- Co-authored-by: Yueqing Zhang <[email protected]> Co-authored-by: Yueqing Zhang <[email protected]> Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Artur Wojcik <[email protected]> Co-authored-by: Ted Themistokleous <[email protected]> Co-authored-by: Xinya Zhang <[email protected]> Co-authored-by: ikalinic <[email protected]> Co-authored-by: sstamenk <[email protected]>
### Description This undoes the changes from #23281
### Description - Implemented the DepthToSpace uint8_t kernel. - Enabled DropQDQNodesRules for DepthToSpace. - Added unit tests for the DepthToSpace uint8_t kernel. ### Motivation and Context This commit aims to enhance the performance of the Image Super-Resolution INT8 Model (RFDN). Specifically, it improves the Inference Per Second (IPS) by 25%, providing a significant boost in efficiency and speed.
### Description [webgpu] fix Split operator implementation when input is 1D
Update android_min_sdk_version to 24 and android_target_sdk_version to 34. Previously Jian already updated the values for some pipelines; this PR updates the other occurrences to make things consistent. Why android_min_sdk_version is set to 24: because React Native requires it: react-native-community/discussions-and-proposals#802 Why android_target_sdk_version is set to 34: because according to Google Play's policy, new apps and app updates must target Android 14 (API level 34) to be submitted to Google Play. https://support.google.com/googleplay/android-developer/answer/11926878?hl=en
…3330) ### Description Set the power config id and the default power mode from the provider options (if provided) for the main thread; otherwise the power mode gets messed up if the user just creates a session without running it. The issue fixed by this PR: process 1 just creates the session without running it. Then process 2 starts, creates a session, and runs it with power-saver mode. The result is that it runs with burst power mode.
### Description Docker's buildx has four different drivers: 1. default 2. docker-container 3. kubernetes 4. remote Now we are using "docker-container". This PR changes it to the default driver, because the docker-container driver needs to fetch an image from Docker Hub, which is no longer free and has a rate limit.
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.
@microsoft-github-policy-service agree company="Microsoft"
From: microsoft-github-policy-service[bot] ***@***.***>
Sent: Thursday, January 16, 2025 10:26 AM
To: microsoft/onnxruntime ***@***.***>
Cc: Ashrit Shetty ***@***.***>; Mention ***@***.***>
Subject: Re: [microsoft/onnxruntime] Update win-ort-main to tip main 250116 (PR #23398)
@ashrit-ms<https://github.com/ashrit-ms> please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]
Options:
* (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
* (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement
Contribution License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
and conveys certain license rights to Microsoft Corporation and its affiliates (“Microsoft”) for Your
contributions to Microsoft open source projects. This Agreement is effective as of the latest signature
date below.
1. Definitions.
“Code” means the computer software code, whether in human-readable or machine-executable form,
that is delivered by You to Microsoft under this Agreement.
“Project” means any of the projects owned or managed by Microsoft and offered under a license
approved by the Open Source Initiative (www.opensource.org).
“Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any
Project, including but not limited to communication on electronic mailing lists, source code control
systems, and issue tracking systems that are managed by, or on behalf of, the Project for the purpose of
discussing and improving that Project, but excluding communication that is conspicuously marked or
otherwise designated in writing by You as “Not a Submission.”
“Submission” means the Code and any other copyrightable material Submitted by You, including any
associated comments and documentation.
2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any
Project. This Agreement covers any and all Submissions that You, now or in the future (except as
described in Section 4 below), Submit to any Project.
3. Originality of Work. You represent that each of Your Submissions is entirely Your original work.
Should You wish to Submit materials that are not Your original work, You may Submit them separately
to the Project if You (a) retain all copyright and license information that was in the materials as You
received them, (b) in the description accompanying Your Submission, include the phrase “Submission
containing materials of a third party:” followed by the names of the third party and any licenses or other
restrictions of which You are aware, and (c) follow any other instructions in the Project’s written
guidelines concerning Submissions.
4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else
for whom You are acting in making Your Submission, e.g. as a contractor, vendor, or agent. If Your
Submission is made in the course of Your work for an employer or Your employer has intellectual
property rights in Your Submission by contract or applicable law, You must secure permission from Your
employer to make the Submission before signing this Agreement. In that case, the term “You” in this
Agreement will refer to You and the employer collectively. If You change employers in the future and
desire to Submit additional Submissions for the new employer, then You agree to sign a new Agreement
and secure permission from the new employer before Submitting those Submissions.
5. Licenses.
* Copyright License. You grant Microsoft, and those who receive the Submission directly or
indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license in the
Submission to reproduce, prepare derivative works of, publicly display, publicly perform, and distribute
the Submission and such derivative works, and to sublicense any or all of the foregoing rights to third
parties.
* Patent License. You grant Microsoft, and those who receive the Submission directly or
indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license under
Your patent claims that are necessarily infringed by the Submission or the combination of the
Submission with the Project to which it was Submitted to make, have made, use, offer to sell, sell and
import or otherwise dispose of the Submission alone or with the Project.
* Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement.
No additional licenses or rights whatsoever (including, without limitation, any implied licenses) are
granted by implication, exhaustion, estoppel or otherwise.
6. Representations and Warranties. You represent that You are legally entitled to grant the above
licenses. You represent that each of Your Submissions is entirely Your original work (except as You may
have disclosed under Section 3). You represent that You have secured permission from Your employer to
make the Submission in cases where Your Submission is made in the course of Your work for Your
employer or Your employer has intellectual property rights in Your Submission by contract or applicable
law. If You are signing this Agreement on behalf of Your employer, You represent and warrant that You
have the necessary authority to bind the listed employer to the obligations contained in this Agreement.
You are not expected to provide support for Your Submission, unless You choose to do so. UNLESS
REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, AND EXCEPT FOR THE WARRANTIES
EXPRESSLY STATED IN SECTIONS 3, 4, AND 6, THE SUBMISSION PROVIDED UNDER THIS AGREEMENT IS
PROVIDED WITHOUT WARRANTY OF ANY KIND, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF
NONINFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
7. Notice to Microsoft. You agree to notify Microsoft in writing of any facts or circumstances of which
You later become aware that would make Your representations in this Agreement inaccurate in any
respect.
8. Information about Submissions. You agree that contributions to Projects and information about
contributions may be maintained indefinitely and disclosed publicly, including Your name and other
information that You submit with Your Submission.
9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and
the parties consent to exclusive jurisdiction and venue in the federal courts sitting in King County,
Washington, unless no federal subject matter jurisdiction exists, in which case the parties consent to
exclusive jurisdiction and venue in the Superior Court of King County, Washington. The parties waive all
defenses of lack of personal jurisdiction and forum non-conveniens.
10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and
supersedes any and all prior agreements, understandings or communications, written or oral, between
the parties relating to the subject matter hereof. This Agreement may be assigned by Microsoft.
### Description This PR allows the WebGPU EP to be built with Emscripten for WebAssembly, including:
- cmake build files update to support correct setup for Emscripten.
- code changes to fix build breaks for wasm.
- change in Web CI pipeline to add a build-only target for wasm with `--use_webgpu`.
Use ruff as the code formatter in place of black and isort since it is much faster, and projects like PyTorch and ONNX have adopted ruff format as well. This PR includes only auto-fixed formatting changes.
```python
from ..quant_utils import (
    TENSOR_NAME_QUANT_SUFFIX,
    QuantizedValue,
    QuantizedValueType,
    attribute_to_kwarg,
    find_by_name,  # noqa: F401
    get_mul_node,  # noqa: F401
    ms_domain,
)
```
Check notice (Code scanning / CodeQL): Unused import. Import of 'get_mul_node' is not used.
```python
from op_test_utils import (
    TestDataFeeds,  # noqa: F401
    check_model_correctness,
    check_op_type_count,
    check_op_type_order,  # noqa: F401
    check_qtype_by_node_type,
)
```
Check notice (Code scanning / CodeQL): Unused import (test). Import of 'check_op_type_order' is not used.
```python
from op_test_utils import (
    TestDataFeeds,
    check_model_correctness,
    check_op_nodes,  # noqa: F401
    check_op_type_count,
    check_qtype_by_node_type,
)
```
Check notice (Code scanning / CodeQL): Unused import (test).
### Description Follw up #21897 To be compatible with onnx 17.0, Registering opset 22 is required in terms of the [updated operators (bfloat16)](https://github.com/onnx/onnx/releases/tag/v1.17.0) ### Motivation and Context Fix #23162 Fix #23161 Fix #23164 (Xnnpack) ### Remaining issue #23163 (QNN) See [the file](https://github.com/microsoft/onnxruntime/pull/23344/files#diff-04f5d6db0a6873f7299ed06ff1ec45a49e69f0865cb32f4397cd56db0cd0a784) ### Result of `find_optimizer_opset_version_updates_required.py (cpu only)` ``` [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.IsInf. Latest:20 Optimizer support ends at 10. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_act_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.BatchNormalization. Latest:15 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Upsample. Latest:10 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Resize. Latest:19 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.GlobalMaxPool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.GlobalAveragePool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Shape. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pre_shape_node_elimination.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_bn_fusion.cc [ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc Line:49 [ERROR] - Failed to find version information for "ai.onnx.ml".LabelEncoder. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_activation_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/dropout_elimination.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc [ERROR] - Symbolic name of 'ignorable_nodes[index].first' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc [ERROR] - Symbolic name of 'dest.first' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Pad. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc [ERROR] - Failed to find version information for kMSDomain.BitmaskDropout. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Clip. Latest:13 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/relu_clip_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Reshape. Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc [ERROR] - Failed to find version information for kMSDomain.ConcatTraining. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_mul_fusion.cc [ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc Line:170 [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_propagation.cc [ERROR] - Symbolic name of 'current_node.OpType(' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_transformer_base.cc [WARNING] - Newer opset found for kOnnxDomain.Reshape. Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_reshape.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/attention_fusion_helper.h ```
Add a tool to generate node_block_list used in [float16 conversion tool](https://github.com/microsoft/onnxruntime/blob/04030f64be10e020d3ac9aa5ba7d0f2917cbd14e/onnxruntime/python/tools/transformers/float16.py#L175). Previously, we have a feature to dump statistics data (like min, max) of each node input/output. However, it is time consuming to generate a list of nodes that need to be kept in float32 when model is large. This could help speed up the process by outputting a list of nodes that have potential overflow in float-to-half conversion. Usage is to build onnxruntime from source with ` --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1`, then set some environment variables before running float32 optimized onnx model like: ``` export ORT_DEBUG_NODE_IO_DUMP_HALF_CONVERSION_OVERFLOW=1 export ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD=50000 python benchmark.py -e optimum --height 1024 --width 1024 --steps 3 -b 1 -v Flux.1D -p flux1_dev_onnx/fp32_opt --skip_warmup ``` The threshold `ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD` shall be <= 65504. The default value is 50000 if the environment variable is not set. It is better to leave some margin if number of samples are not large enough in the test. As a demo, we add an option --skip_warmup to benchmark.py for Flux, so that we can reduce the time on dumping warm-up runs. Example snippet of stdout (each inference session has such a summary when session ended): ``` Total counter in node dumping: 141 Found 2 nodes cannot be converted to half precision due to potential input/output overflow. Operator frequencies for these nodes: Softmax : 1 MatMul : 1 # ------- # Example python script for float16 conversion # For details, search `node_block_list` in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py # ------- from onnxruntime.transformers.onnx_model import OnnxModel m = OnnxModel(onnx.load('flux1_dev_onnx/fp32_opt/vae_decoder/model.onnx')) node_block_list = [ '/decoder/mid_block/attentions.0/Softmax', '/decoder/mid_block/attentions.0/MatMul', ] m.convert_float_to_float16(keep_io_types=False, node_block_list=node_block_list) m.save_model_to_file('fp16/optimized.onnx', use_external_data_format=False) ``` Then you can use the python script to convert corresponding model to float16. ### Motivation and Context It is a tool used to generate node_block_list used in float16 conversion of stable diffusion 3.x and flux models in #22986. In stable diffusion or Flux pipeline, there are multiple models and there could be multiple session runs for each model. Without a proper tool, it is time consuming to get node_block_list for each model.
### Description The old `GetCapability` function of the WebNN EP is just a very simple search for groups of nodes that can be handled. This doesn't work well in the following example graph, where A and D could be handled by the EP but B sits between them in the topological order, so you get two single-node capabilities. However, it may also be advantageous if C and E could be handled by the EP, since they would be combined with D even though they are not connected.
```
A  B  C
|  /  |
D     E
|     |
```
Therefore, we improve partitioning results by reusing `utils::CreateSupportedPartitions`, which walks the edges for each node that the EP can handle as they are iterated in topological order. This guarantees that all connected nodes that can be handled are grouped together. Correspondingly, we modify the `webnn::GetSupportedNodes` function to return the supported nodes instead of the group of supported partitions. Co-authored-by: Dwayne Robinson <[email protected]>
```cpp
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Selu);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, float, Sin);
class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, double, Sin);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 22, Softsign);
```
Check warning (Code scanning / CodeQL): Poorly documented large function.
```cpp
    DeepCpuGruOp);

ONNX_CPU_OPERATOR_KERNEL(
    GRU,
```
Check notice (Code scanning / CodeQL): Commented-out code.
### Description
This PR is to update the win-ort-main branch to the tip of the main branch as of 2025-01-16.
### Motivation and Context
This update includes the OpenVINO fix for debug builds.