
Export kokoro to sherpa-onnx #1713

Merged: 4 commits into k2-fsa:master on Jan 15, 2025
Conversation

csukuangfj (Collaborator)

CC @thewh1teagle @giannik

Even though Kokoro TTS claims to have only about 82M parameters, it is by far the LARGEST model we have in sherpa-onnx.

It is also the SLOWEST TTS model on CPU in sherpa-onnx.


For the model kokoro-v0_19.onnx, onnxruntime 1.16.3 throws the following error:

  File "/home/runner/work/sherpa-onnx/sherpa-onnx/scripts/kokoro/./test.py", line 60, in show
    sess = ort.InferenceSession(filename, session_opts)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from kokoro-v0_19.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 20 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 19

For the model kokoro-quant.onnx, onnxruntime 1.16.3 throws the following error:

    sess = ort.InferenceSession(filename, session_opts)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from kokoro-quant.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const std::string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 4.
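Both failures come from the opset stamps in the exported models: kokoro-v0_19.onnx is stamped with ai.onnx opset 20 while onnxruntime 1.16.3 only supports up to 19, and kokoro-quant.onnx carries ai.onnx.ml opset 5 while only up to 4 is supported. As a quick check, here is a minimal sketch (using the onnx Python package, not part of the export scripts in this PR) to print what a model is stamped with:

    # Sketch: print the opset imports a model is stamped with.
    # Requires the `onnx` package; file names are the models above.
    import onnx

    for filename in ["kokoro-v0_19.onnx", "kokoro-quant.onnx"]:
        model = onnx.load(filename)
        print(filename)
        for opset in model.opset_import:
            # An empty domain string means the default ai.onnx domain.
            domain = opset.domain or "ai.onnx"
            print(f"  {domain}: opset {opset.version}")

Upgrading onnxruntime, or re-exporting the model with a lower opset, are the usual ways around this kind of mismatch.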

Test results on my MacBook Pro (CPU, 1 thread):

(py38) fangjuns-MacBook-Pro:kokoro fangjun$ python3 ./test.py  --model ./kokoro-v0_19.onnx --voices-bin ./voices.bin  --tokens ./tokens.txt
{'model': './kokoro-v0_19.onnx', 'voices_bin': './voices.bin', 'tokens': './tokens.txt'}
NodeArg(name='tokens', type='tensor(int64)', shape=[1, 'tokens1'])
NodeArg(name='style', type='tensor(float)', shape=[1, 256])
NodeArg(name='speed', type='tensor(float)', shape=[1])
-----
NodeArg(name='audio', type='tensor(float)', shape=['audio0'])
{'voice': 'en-us', 'sample_rate': '24000', 'see_also': 'https://huggingface.co/spaces/hexgrad/Kokoro-TTS', 'language': 'English',
 'model_url': 'https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files', 'has_espeak': '1', 'maintainer': 'k2-fsa',
 'version': '1', 'n_speakers': '11', 'model_type': 'kokoro', 'style_dim': '511,1,256',
 'speaker_names': 'af,af_bella,af_nicole,af_sarah,af_sky,am_adam,am_michael,bf_emma,bf_isabella,bm_george,bm_lewis',
 'see_also_2': 'https://huggingface.co/hexgrad/Kokoro-82M'}
embedding.shape (11, 511, 1, 256)
['af', 'af_bella', 'af_nicole', 'af_sarah', 'af_sky', 'am_adam', 'am_michael', 'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis']
Testing 1/11 - af/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af.wav
 Elapsed seconds: 12.287
 Audio duration in seconds: 14.250
 RTF: 12.287/14.250 = 0.862
Testing 2/11 - af_bella/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_bella.wav
 Elapsed seconds: 12.018
 Audio duration in seconds: 14.100
 RTF: 12.018/14.100 = 0.852
Testing 3/11 - af_nicole/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_nicole.wav
 Elapsed seconds: 18.522
 Audio duration in seconds: 22.150
 RTF: 18.522/22.150 = 0.836
Testing 4/11 - af_sarah/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_sarah.wav
 Elapsed seconds: 12.553
 Audio duration in seconds: 14.475
 RTF: 12.553/14.475 = 0.867
Testing 5/11 - af_sky/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-af_sky.wav
 Elapsed seconds: 11.016
 Audio duration in seconds: 13.100
 RTF: 11.016/13.100 = 0.841
Testing 6/11 - am_adam/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-am_adam.wav
 Elapsed seconds: 11.008
 Audio duration in seconds: 13.150
 RTF: 11.008/13.150 = 0.837
Testing 7/11 - am_michael/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-am_michael.wav
 Elapsed seconds: 13.117
 Audio duration in seconds: 15.800
 RTF: 13.117/15.800 = 0.830
Testing 8/11 - bf_emma/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bf_emma.wav
 Elapsed seconds: 11.194
 Audio duration in seconds: 13.325
 RTF: 11.194/13.325 = 0.840
Testing 9/11 - bf_isabella/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bf_isabella.wav
 Elapsed seconds: 10.854
 Audio duration in seconds: 12.475
 RTF: 10.854/12.475 = 0.870
Testing 10/11 - bm_george/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bm_george.wav
 Elapsed seconds: 12.616
 Audio duration in seconds: 14.900
 RTF: 12.616/14.900 = 0.847
Testing 11/11 - bm_lewis/./kokoro-v0_19.onnx
 Saved to kokoro-v0_19-bm_lewis.wav
 Elapsed seconds: 12.666
 Audio duration in seconds: 14.775
 RTF: 12.666/14.775 = 0.857
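
For reference, here is a minimal sketch of a single run against this model's interface, matching the NodeArg signature printed above (tokens, style, speed in; audio out) and the RTF computation in the log. The token IDs and style vector are placeholders, not a real phoneme sequence:

    # Sketch: one inference call matching the printed NodeArg signature.
    # Token IDs below are placeholders; a real run maps phonemes to IDs
    # via tokens.txt and picks a style row from the (11, 511, 1, 256)
    # speaker embedding.
    import time

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession("kokoro-v0_19.onnx")

    tokens = np.array([[0, 50, 83, 54, 156, 57, 0]], dtype=np.int64)  # (1, num_tokens)
    style = np.zeros((1, 256), dtype=np.float32)  # placeholder style vector
    speed = np.ones((1,), dtype=np.float32)

    start = time.time()
    (audio,) = sess.run(["audio"], {"tokens": tokens, "style": style, "speed": speed})
    elapsed = time.time() - start

    duration = len(audio) / 24000  # sample rate from the model metadata
    print(f"RTF: {elapsed:.3f}/{duration:.3f} = {elapsed / duration:.3f}")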

My Mac info: (screenshot from 2025-01-15 attached)

csukuangfj merged commit 9efe26a into k2-fsa:master on Jan 15, 2025
csukuangfj deleted the export-kokoro branch on January 15, 2025 at 08:49
thewh1teagle (Contributor)

@csukuangfj

Thanks!
I'm seeing the same speed issue; the model usually runs faster in PyTorch.
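
Since the benchmark above was run with a single CPU thread, the thread configuration is worth checking before comparing against PyTorch (which may use more threads by default). A minimal sketch with standard onnxruntime session options, not something prescribed by this PR:

    # Sketch: explicit CPU threading options for onnxruntime.
    # Whether this narrows the gap to PyTorch depends on model and machine.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 4  # threads within a single operator
    opts.inter_op_num_threads = 1  # threads across independent operators

    sess = ort.InferenceSession("kokoro-v0_19.onnx", opts)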

MithrilMan commented Jan 15, 2025

I've seen that https://www.ui.perfetto.dev/ is useful for investigating performance issues.
I'm attaching a trace of 3 inferences in a row; I don't have the skill to work out what the problem is.
I used the CPU provider without any other SessionOptions specified, in C# with onnxruntime.

onnxruntime_profile__2025-01-15_15-22-14.zip

Using the query

    select name, (dur/1000000) as ms, ts from slice where parent_id=3 AND category = 'Node' order by dur desc

where 3 is the slice id of the SequentialExecutor slice of a single inference run (which I used to filter the events of that specific run; I don't know if there was a better way to get it), I was able to sort the node executions by time spent. I'm not able to go further, though, because I don't know how to judge these timings against expected ones.
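
The profile file that onnxruntime writes (via SessionOptions.enable_profiling) is plain Chrome-trace JSON, so the same ranking can also be done without Perfetto. A rough sketch, assuming the file name extracted from the attached zip and the event fields ('cat', 'dur' in microseconds, 'args'/'op_name') that onnxruntime emits:

    # Sketch: aggregate onnxruntime profiler output by operator type.
    # Assumes the JSON extracted from the attached zip; the profile is a
    # Chrome-trace list of events.
    import json
    from collections import defaultdict

    with open("onnxruntime_profile__2025-01-15_15-22-14.json") as f:
        events = json.load(f)

    total_us = defaultdict(int)
    for ev in events:
        if ev.get("cat") == "Node":
            op = ev.get("args", {}).get("op_name", ev["name"])
            total_us[op] += ev.get("dur", 0)  # duration in microseconds

    for op, us in sorted(total_us.items(), key=lambda kv: -kv[1]):
        print(f"{op:24s} {us / 1000:10.3f} ms")

Aggregating by op type (rather than per node) usually makes it easier to see whether, for example, Conv or MatMul kernels dominate the runtime.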

I'd appreciate a pointer to a resource on how to detect bottlenecks in a model, or help from someone who has the skills to dig into this issue.

