
[Performance] kokoro onnx performance issues #23384

Open
MithrilMan opened this issue Jan 15, 2025 · 4 comments
Labels
api:CSharp (issues related to the C# API), .NET (pull requests that update .net code), performance (issues related to performance regressions)

Comments


MithrilMan commented Jan 15, 2025

Describe the issue

Hello.
I'm trying to use a Kokoro ONNX model and I see a large performance difference between PyTorch on CPU and ONNX Runtime on CPU (no special execution provider specified).

I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skill to understand what the problem is.
I used the CPU without any other SessionOptions specified, in C# with ONNX Runtime.

onnxruntime_profile__2025-01-15_15-22-14.zip

Using the query `select name, (dur/1000000) as ms, ts from slice where parent_id=3 AND category = 'Node' order by dur desc` (where 3 is the slice id of the SequentialExecutor slice of a single inference run, which I used to filter the info for that specific run; I don't know if there was a better way to get it), I was able to sort the node executions by time spent. But I'm not able to go further, because I don't know how to compare these timings against expected ones.
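The same per-node breakdown can be extracted directly from the ONNX Runtime profile JSON, without loading it into Perfetto. A minimal sketch, assuming the file is in Chrome trace format with `dur` in microseconds and node-level events tagged with `cat == "Node"` (which is what the ORT profiler emits, as far as I know); the function name and file path are hypothetical:

```python
import json
from collections import defaultdict

def node_times_ms(profile_path):
    """Sum per-node execution time from an ONNX Runtime profile JSON.

    Assumes Chrome trace format: a list of events where node-level
    events have cat == "Node" and a 'dur' field in microseconds.
    Returns (node name, total ms) pairs sorted by time spent, descending.
    """
    with open(profile_path) as f:
        events = json.load(f)
    totals = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "Node" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"] / 1000.0  # us -> ms
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

This mirrors the SQL query above (group by name, sum duration, sort descending) but runs anywhere Python does.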

I'd like someone to point me to a resource on how to detect bottlenecks in a model, or someone who has the skills to help with the issue.

To reproduce

thanks @thewh1teagle

"""
pip install kokoro-onnx soundfile
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
ONNX_PROVIDER=CoreMLExecutionProvider LOG_LEVEL=DEBUG uv run main.py
ONNX_PROVIDER=CPUExecutionProvider LOG_LEVEL=DEBUG uv run main.py
"""

import soundfile as sf
from kokoro_onnx import Kokoro

kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")
samples, sample_rate = kokoro.create(
    "Hello. This audio generated by kokoro!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")

Urgency

Could you please suggest how to properly understand what the performance problems of a model are?
It's hard for me to find documentation that would guide me to a deep understanding of how things work; any links to docs/tutorials are appreciated (keep in mind I'm mainly a C# guy).

Thanks

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

nuget package 1.20.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@MithrilMan MithrilMan added the performance issues related to performance regressions label Jan 15, 2025
@github-actions github-actions bot added .NET Pull requests that update .net code api:CSharp issues related to the C# API labels Jan 15, 2025
@MithrilMan MithrilMan changed the title [Performance] [Performance] kokoro onnx performance issues Jan 15, 2025
@thewh1teagle

I experience the same issue.

Reproduce: same script as in the issue description above.

@skottmckay
Contributor

I don't know what Kokoro does around onnxruntime, but typically you should create the inference session and re-use it. Session creation is expensive, as is the first inference. Any performance measurement should be done from the 2nd inference on to be meaningful.
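A minimal way to follow that advice in Python (a hypothetical helper, not part of kokoro-onnx): run the workload once or twice as warm-up, then time only the subsequent runs.

```python
import time

def time_runs(fn, runs=5, warmup=1):
    """Time fn() over several runs, discarding warm-up iterations.

    The first call typically pays one-off costs (lazy initialization,
    memory allocation, kernel setup), so it is excluded from the
    reported numbers. Returns (best, average) in seconds.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)
```

For example, `time_runs(lambda: kokoro.create(...), runs=5, warmup=1)` would measure steady-state latency while the session is created only once outside the timed loop.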

@thewh1teagle

> I don't know what Kokoro does around onnxruntime, but typically you should create the inference session and re-use it. Session creation is expensive, as is the first inference. Any performance measurement should be done from the 2nd inference on to be meaningful.

I create the session only once. There's the option to add a while loop around the line with kokoro.create.

@skottmckay
Contributor

There's a lot happening inside create besides the ORT run call. Have you measured the time for just the call to ORT run and compared it to the pytorch time for only that work?
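One way to isolate the ORT run time from everything else `create` does is to wrap the session's `run` method and log each call's duration. A sketch under the assumption that you can reach the underlying `onnxruntime.InferenceSession` object (how kokoro-onnx exposes it, if at all, is not confirmed here); `instrument_run` is a hypothetical name:

```python
import time

def instrument_run(session):
    """Wrap session.run so each call logs how long ONNX Runtime itself
    took, separating it from the pre/post-processing done around it
    (e.g. inside kokoro.create). `session` is assumed to expose a
    .run(...) method like onnxruntime.InferenceSession does.
    """
    original = session.run

    def timed_run(*args, **kwargs):
        start = time.perf_counter()
        result = original(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"session.run took {elapsed_ms:.1f} ms")
        return result

    session.run = timed_run
    return session
```

Comparing these numbers against the equivalent PyTorch forward-pass time answers whether the gap is in ONNX Runtime itself or in the surrounding Python/C# code.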
