This WIP repository is intended to be an inference library for https://hf.co/hexgrad/Kokoro-82M
It is under construction and likely will not be useful until the next base model release.
The goal is to be able to pip install kokoro and to meet some of the design goals and offer some of the functionality laid out below.
Misaki is a G2P engine with language-specific solutions:
pip install misaki[en] # installs English
pip install misaki[ja] # installs Japanese
Users who don't peek under the hood may not care, since import kokoro will simply import misaki and life goes on. This is likely the proper separation of responsibilities, and not all users will want or need all languages.
Kokoro models have a 512-token context window, which usually amounts to about 30 seconds of audio. Finding natural stopping points at which to split your text is key to smooth long-form (LF) generation, and this should be much easier with token-level traces in misaki[en] (hopefully with other languages to follow).
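As a rough illustration of chunking at natural stopping points, here is a minimal sketch. It is not part of kokoro or misaki. The function name chunk_text, the default budget of 510, and the use of character count as a stand-in for token count are all assumptions made for the example; a real implementation would count tokens with the actual G2P output.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int = 510,  # assumed budget, leaving headroom under the 512-token window
    count_tokens: Callable[[str], int] = len,  # placeholder: characters, not real tokens
) -> List[str]:
    """Greedily pack whole sentences into chunks that fit the token budget."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if count_tokens(candidate) <= max_tokens:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # The sentence does not fit alongside the current chunk. Rebuild it word
        # by word so that a sentence longer than the budget is split on whitespace.
        # A single word longer than the budget is emitted as-is.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if current and count_tokens(candidate) > max_tokens:
                chunks.append(current)
                current = word
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the audio concatenated. Sentence boundaries are only one possible choice of stopping point; clause boundaries or pauses derived from token-level traces may give smoother results.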
The modeling code could benefit from a touch-up and, as a side effect, become ONNX-exportable and hopefully slightly faster.
Today, voicepacks are essentially (510, 256)-shaped tensors, compiled as average styles per utterance length, with 510 possible lengths. Since most style vectors are computed on synthetic data, each style is essentially a "mean of means", which may explain why the voices are somewhat flat-sounding. It also implies that for any given utterance, currently the only features being used to choose how the voice sounds are (1) the user-selected voice name, like af, and (2) the length of the utterance. Features like the punctuation texture (.?!) or the text sentiment are not yet being used. Potential solutions could be neural or even classical, e.g. using vector DBs. This, among other things, is still an area of research.
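To make the length-based lookup concrete, here is a minimal sketch of selecting a style vector from a (510, 256) voicepack. It uses plain Python lists and random dummy values rather than a real voicepack. The helper names make_dummy_voicepack and select_style, and the convention that row i holds the style for an utterance of i + 1 tokens, are assumptions for illustration and may not match the actual layout.

```python
import random
from typing import List

NUM_LENGTHS = 510  # number of utterance lengths covered by a voicepack
STYLE_DIM = 256    # size of each style vector


def make_dummy_voicepack(seed: int = 0) -> List[List[float]]:
    """Build a stand-in (510, 256) table filled with random values."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(STYLE_DIM)]
        for _ in range(NUM_LENGTHS)
    ]


def select_style(voicepack: List[List[float]], num_tokens: int) -> List[float]:
    """Return the style vector for an utterance of num_tokens tokens.

    Assumed convention: row 0 corresponds to length 1, row 509 to length 510.
    """
    if not 1 <= num_tokens <= len(voicepack):
        raise ValueError(f"num_tokens must be in 1..{len(voicepack)}, got {num_tokens}")
    return voicepack[num_tokens - 1]


voicepack = make_dummy_voicepack()
style = select_style(voicepack, num_tokens=42)
```

The lookup depends only on which voicepack was chosen and on the utterance length, which is why two very different sentences of the same length receive the same style.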
Within a couple of weeks of Kokoro's Christmas 2024 release, talented people had already built great things. If you want to build something, go for it! Kokoro is permissively licensed (Apache) software. If you also want to add or improve something here (or in misaki), hopefully Kokoro can earn your commit, and feel free to open a PR if so.