
[Performance] kokoro onnx performance issues #23384

Open
MithrilMan opened this issue Jan 15, 2025 · 4 comments
Labels
api:CSharp (issues related to the C# API), .NET (pull requests that update .net code), performance (issues related to performance regressions)

Comments


MithrilMan commented Jan 15, 2025

Describe the issue

Hello.
I'm trying to use a Kokoro ONNX model and I see a large performance difference between PyTorch on CPU and ONNX Runtime on CPU (no special execution provider specified).

I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skill to understand what the problem is.
I used the CPU without any other SessionOptions specified, in C# with ONNX Runtime.

onnxruntime_profile__2025-01-15_15-22-14.zip

Using the query `select name, (dur/1000000) as ms, ts from slice where parent_id=3 AND category = 'Node' order by dur desc` (where 3 is the slice id of the SequentialExecutor slice of a single inference run, which I used to filter the info for that specific run; I don't know if there was a better way to get it), I was able to sort the node executions by time spent. But I'm not able to go further, because I don't know how to compare these timings against expected ones.
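The same per-node breakdown can be extracted directly from the ONNX Runtime profile JSON, without loading it into Perfetto. A minimal sketch, assuming the file is in Chrome trace format with `dur` in microseconds and node-level events tagged with `cat == "Node"` (which is what the ORT profiler emits, as far as I know); the function name and file path are hypothetical:

```python
import json
from collections import defaultdict

def node_times_ms(profile_path):
    """Sum per-node execution time from an ONNX Runtime profile JSON.

    Assumes Chrome trace format: a list of events where node-level
    events have cat == "Node" and a 'dur' field in microseconds.
    Returns (node name, total ms) pairs sorted by time spent, descending.
    """
    with open(profile_path) as f:
        events = json.load(f)
    totals = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "Node" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"] / 1000.0  # us -> ms
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

This mirrors the SQL query above (group by name, sum duration, sort descending) but runs anywhere Python does.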

I'd like someone to point me to a resource on how to detect bottlenecks in a model, or someone who has the skills to help with the issue.

To reproduce

thanks @thewh1teagle

"""
pip install kokoro-onnx soundfile
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
ONNX_PROVIDER=CoreMLExecutionProvider LOG_LEVEL=DEBUG uv run main.py
ONNX_PROVIDER=CPUExecutionProvider LOG_LEVEL=DEBUG uv run main.py
"""

import soundfile as sf
from kokoro_onnx import Kokoro

kokoro = Kokoro("kokoro-v0_19.onnx", "voices.json")
samples, sample_rate = kokoro.create(
    "Hello. This audio generated by kokoro!", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")

Urgency

Could you please suggest how to properly understand what the performance problems of a model are?
It's hard for me to find documentation that would guide me to a deep understanding of how things work; any links to docs/tutorials are appreciated (keep in mind I'm mainly a C# guy).

Thanks

Platform

Windows

OS Version

Windows 11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

nuget package 1.20.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@MithrilMan MithrilMan added the performance issues related to performance regressions label Jan 15, 2025
@github-actions github-actions bot added .NET Pull requests that update .net code api:CSharp issues related to the C# API labels Jan 15, 2025
@MithrilMan MithrilMan changed the title [Performance] [Performance] kokoro onnx performance issues Jan 15, 2025
@thewh1teagle

I experience the same issue.

Reproduce: same script as in the issue description above.

@skottmckay
Contributor

I don't know what Kokoro does around onnxruntime, but typically you should create the inference session and re-use it. Session creation is expensive, as is the first inference. Any performance measurement should be done from the 2nd inference on to be meaningful.
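A minimal way to follow that advice in Python (a hypothetical helper, not part of kokoro-onnx): run the workload once or twice as warm-up, then time only the subsequent runs.

```python
import time

def time_runs(fn, runs=5, warmup=1):
    """Time fn() over several runs, discarding warm-up iterations.

    The first call typically pays one-off costs (lazy initialization,
    memory allocation, kernel setup), so it is excluded from the
    reported numbers. Returns (best, average) in seconds.
    """
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)
```

For example, `time_runs(lambda: kokoro.create(...), runs=5, warmup=1)` would measure steady-state latency while the session is created only once outside the timed loop.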

@thewh1teagle

> I don't know what Kokoro does around onnxruntime, but typically you should create the inference session and re-use it. Session creation is expensive, as is the first inference. Any performance measurement should be done from the 2nd inference on to be meaningful.

I create the session only once. There's the option to add a while loop around the line with kokoro.create.

@skottmckay
Contributor

There's a lot happening inside create besides the ORT run call. Have you measured the time for just the call to ORT run and compared it to the pytorch time for only that work?
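One way to isolate the ORT run time from everything else `create` does is to wrap the session's `run` method and log each call's duration. A sketch under the assumption that you can reach the underlying `onnxruntime.InferenceSession` object (how kokoro-onnx exposes it, if at all, is not confirmed here); `instrument_run` is a hypothetical name:

```python
import time

def instrument_run(session):
    """Wrap session.run so each call logs how long ONNX Runtime itself
    took, separating it from the pre/post-processing done around it
    (e.g. inside kokoro.create). `session` is assumed to expose a
    .run(...) method like onnxruntime.InferenceSession does.
    """
    original = session.run

    def timed_run(*args, **kwargs):
        start = time.perf_counter()
        result = original(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"session.run took {elapsed_ms:.1f} ms")
        return result

    session.run = timed_run
    return session
```

Comparing these numbers against the equivalent PyTorch forward-pass time answers whether the gap is in ONNX Runtime itself or in the surrounding Python/C# code.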
