
Export kokoro to sherpa-onnx #1713

Merged: 4 commits into k2-fsa:master on Jan 15, 2025
Conversation

csukuangfj (Collaborator)

CC @thewh1teagle @giannik

Even though Kokoro TTS claims to have only about 82M parameters, it is by far the LARGEST model we have in sherpa-onnx.

It is also the SLOWEST TTS model on CPU in sherpa-onnx.


For the model kokoro-v0_19.onnx, onnxruntime 1.16.3 throws the following error:

  File "/home/runner/work/sherpa-onnx/sherpa-onnx/scripts/kokoro/./test.py", line 60, in show
    sess = ort.InferenceSession(filename, session_opts)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from kokoro-v0_19.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 20 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 19

For the model kokoro-quant.onnx, onnxruntime 1.16.3 throws the following error:

    sess = ort.InferenceSession(filename, session_opts)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from kokoro-quant.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const std::string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 4.
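Both failures come from the opset stamps in the exported models: kokoro-v0_19.onnx is stamped with ai.onnx opset 20 while onnxruntime 1.16.3 only supports up to 19, and kokoro-quant.onnx carries ai.onnx.ml opset 5 while only up to 4 is supported. As a quick check, here is a minimal sketch (using the onnx Python package, not part of the export scripts in this PR) to print what a model is stamped with:

    # Sketch: print the opset imports a model is stamped with.
    # Requires the `onnx` package; file names are the models above.
    import onnx

    for filename in ["kokoro-v0_19.onnx", "kokoro-quant.onnx"]:
        model = onnx.load(filename)
        print(filename)
        for opset in model.opset_import:
            # An empty domain string means the default ai.onnx domain.
            domain = opset.domain or "ai.onnx"
            print(f"  {domain}: opset {opset.version}")

Upgrading onnxruntime, or re-exporting the model with a lower opset, are the usual ways around this kind of mismatch.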

Test results on my MacBook Pro (CPU, 1 thread):

(py38) fangjuns-MacBook-Pro:kokoro fangjun$ python3 ./test.py  --model ./kokoro-v0_19.onnx --voices-bin ./voices.bin  --tokens ./tokens.txt
{'model': './kokoro-v0_19.onnx', 'voices_bin': './voices.bin', 'tokens': './tokens.txt'}
NodeArg(name='tokens', type='tensor(int64)', shape=[1, 'tokens1'])
NodeArg(name='style', type='tensor(float)', shape=[1, 256])
NodeArg(name='speed', type='tensor(float)', shape=[1])
-----
NodeArg(name='audio', type='tensor(float)', shape=['audio0'])
{'voice': 'en-us', 'sample_rate': '24000', 'see_also': 'https://huggingface.co/spaces/hexgrad/Kokoro-TTS', 'language': 'English',
 'model_url': 'https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files', 'has_espeak': '1', 'maintainer': 'k2-fsa',
 'version': '1', 'n_speakers': '11', 'model_type': 'kokoro', 'style_dim': '511,1,256',
 'speaker_names': 'af,af_bella,af_nicole,af_sarah,af_sky,am_adam,am_michael,bf_emma,bf_isabella,bm_george,bm_lewis',
 'see_also_2': 'https://huggingface.co/hexgrad/Kokoro-82M'}
embedding.shape (11, 511, 1, 256)
['af', 'af_bella', 'af_nicole', 'af_sarah', 'af_sky', 'am_adam', 'am_michael', 'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis']
Testing 1/11 - af/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af.wav
 Elapsed seconds: 12.287
 Audio duration in seconds: 14.250
 RTF: 12.287/14.250 = 0.862
Testing 2/11 - af_bella/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_bella.wav
 Elapsed seconds: 12.018
 Audio duration in seconds: 14.100
 RTF: 12.018/14.100 = 0.852
Testing 3/11 - af_nicole/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_nicole.wav
 Elapsed seconds: 18.522
 Audio duration in seconds: 22.150
 RTF: 18.522/22.150 = 0.836
Testing 4/11 - af_sarah/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_sarah.wav
 Elapsed seconds: 12.553
 Audio duration in seconds: 14.475
 RTF: 12.553/14.475 = 0.867
Testing 5/11 - af_sky/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_sky.wav
 Elapsed seconds: 11.016
 Audio duration in seconds: 13.100
 RTF: 11.016/13.100 = 0.841
Testing 6/11 - am_adam/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-am_adam.wav
 Elapsed seconds: 11.008
 Audio duration in seconds: 13.150
 RTF: 11.008/13.150 = 0.837
Testing 7/11 - am_michael/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-am_michael.wav
 Elapsed seconds: 13.117
 Audio duration in seconds: 15.800
 RTF: 13.117/15.800 = 0.830
Testing 8/11 - bf_emma/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bf_emma.wav
 Elapsed seconds: 11.194
 Audio duration in seconds: 13.325
 RTF: 11.194/13.325 = 0.840
Testing 9/11 - bf_isabella/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bf_isabella.wav
 Elapsed seconds: 10.854
 Audio duration in seconds: 12.475
 RTF: 10.854/12.475 = 0.870
Testing 10/11 - bm_george/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bm_george.wav
 Elapsed seconds: 12.616
 Audio duration in seconds: 14.900
 RTF: 12.616/14.900 = 0.847
Testing 11/11 - bm_lewis/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bm_lewis.wav
 Elapsed seconds: 12.666
 Audio duration in seconds: 14.775
 RTF: 12.666/14.775 = 0.857
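
For reference, here is a minimal sketch of a single run against this model's interface, matching the NodeArg signature printed above (tokens, style, speed in; audio out) and the RTF computation in the log. The token IDs and style vector are placeholders, not a real phoneme sequence:

    # Sketch: one inference call matching the printed NodeArg signature.
    # Token IDs below are placeholders; a real run maps phonemes to IDs
    # via tokens.txt and picks a style row from the (11, 511, 1, 256)
    # speaker embedding.
    import time

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("kokoro-v0_19.onnx")

    tokens = np.array([[0, 50, 83, 54, 156, 57, 0]], dtype=np.int64)  # (1, num_tokens)
    style = np.zeros((1, 256), dtype=np.float32)  # placeholder style vector
    speed = np.ones((1,), dtype=np.float32)

    start = time.time()
    (audio,) = sess.run(["audio"], {"tokens": tokens, "style": style, "speed": speed})
    elapsed = time.time() - start

    duration = len(audio) / 24000  # sample rate from the model metadata
    print(f"RTF: {elapsed:.3f}/{duration:.3f} = {elapsed / duration:.3f}")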

My Mac info: (screenshot from 2025-01-15 attached)

csukuangfj merged commit 9efe26a into k2-fsa:master on Jan 15, 2025
csukuangfj deleted the export-kokoro branch on January 15, 2025 at 08:49
thewh1teagle (Contributor)

@csukuangfj

Thanks!
I'm seeing the same speed issue; the model usually runs faster in PyTorch.
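
Since the benchmark above was run with a single CPU thread, the thread configuration is worth checking before comparing against PyTorch (which may use more threads by default). A minimal sketch with standard onnxruntime session options, not something prescribed by this PR:

    # Sketch: explicit CPU threading options for onnxruntime.
    # Whether this narrows the gap to PyTorch depends on model and machine.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 4  # threads within a single operator
    opts.inter_op_num_threads = 1  # threads across independent operators

    sess = ort.InferenceSession("kokoro-v0_19.onnx", opts)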

MithrilMan commented Jan 15, 2025

I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skill to work out what the problem is.
I used the CPU provider without any other SessionOptions specified, in C# with onnxruntime.

onnxruntime_profile__2025-01-15_15-22-14.zip

Using the query

    select name, (dur/1000000) as ms, ts from slice where parent_id=3 AND category = 'Node' order by dur desc

where 3 is the slice id of the SequentialExecutor slice of a single inference run (which I used to filter the events of that specific run; I don't know if there was a better way to get it), I was able to sort the node executions by time spent. I'm not able to go further, though, because I don't know how to judge these timings against expected ones.
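
The profile file that onnxruntime writes (via SessionOptions.enable_profiling) is plain Chrome-trace JSON, so the same ranking can also be done without Perfetto. A rough sketch, assuming the file name extracted from the attached zip and the event fields ('cat', 'dur' in microseconds, 'args'/'op_name') that onnxruntime emits:

    # Sketch: aggregate onnxruntime profiler output by operator type.
    # Assumes the JSON extracted from the attached zip; the profile is a
    # Chrome-trace list of events.
    import json
    from collections import defaultdict

    with open("onnxruntime_profile__2025-01-15_15-22-14.json") as f:
        events = json.load(f)

    total_us = defaultdict(int)
    for ev in events:
        if ev.get("cat") == "Node":
            op = ev.get("args", {}).get("op_name", ev["name"])
            total_us[op] += ev.get("dur", 0)  # duration in microseconds

    for op, us in sorted(total_us.items(), key=lambda kv: -kv[1]):
        print(f"{op:24s} {us / 1000:10.3f} ms")

Aggregating by op type (rather than per node) usually makes it easier to see whether, for example, Conv or MatMul kernels dominate the runtime.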

I'd appreciate a pointer to a resource on how to detect bottlenecks in a model, or help from someone who has the skills to dig into this issue.

