hexgrad/kokoro

kokoro

This work-in-progress (WIP) repository is intended to be an inference library for https://hf.co/hexgrad/Kokoro-82M.

It is under construction and likely will not be useful until the next base model release.

The goal is to be able to pip install kokoro and offer some of the design goals and functionalities laid out below.

G2P will be imported from Misaki

Misaki is a G2P engine with language-specific solutions:

pip install misaki[en] # installs English
pip install misaki[ja] # installs Japanese

Users who don't peek under the hood may not care, since import kokoro will simply import misaki and life goes on. This is likely the proper separation of responsibilities, and not all users will want or need all languages.

Smarter LF chunking

Kokoro models have a 512-token context window, which usually amounts to about 30 seconds of audio. Finding natural stopping points at which to split your text is key to smooth long-form (LF) generation. This should be much easier with token-level traces in misaki[en] (hopefully other languages to follow).
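As a rough illustration of the idea, here is a minimal sketch of budget-aware chunking that prefers sentence boundaries. It is not this library's implementation. The function name `chunk_text`, the `count_tokens` callable, and the default budget of 510 are all assumptions made for the example. Character count stands in for real phoneme-token counts, which in practice would come from the G2P step.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int = 510,  # assumed budget, not a documented constant
    count_tokens: Callable[[str], int] = len,  # stand-in: characters, not real tokens
) -> List[str]:
    """Greedily pack whole sentences into chunks that fit the token budget."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if count_tokens(candidate) <= max_tokens:
            current = candidate
            continue
        # The sentence does not fit: close the current chunk first.
        if current:
            chunks.append(current)
            current = ""
        if count_tokens(sentence) <= max_tokens:
            current = sentence
            continue
        # Last resort: a single over-long sentence is split on word boundaries.
        # A single word that exceeds the budget is kept whole.
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if count_tokens(candidate) <= max_tokens or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two! Three? Four.", max_tokens=10):
        print(repr(chunk))
```

Swapping `count_tokens` for a function that counts actual phoneme tokens would make the budget meaningful. The greedy strategy is only one option, and token-level traces could allow splitting at softer boundaries such as commas.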

Cleaner modeling code

The modeling code could benefit from a touch-up, which as a side effect should make it ONNX-exportable and hopefully slightly faster.

Experimental features (TBD)

Today, voicepacks are essentially (510, 256)-shaped tensors, compiled as average styles per utterance length, with 510 possible lengths. Since most style vectors are computed on synthetic data, each style is essentially a "mean of means", which may explain why the voices are somewhat flat-sounding. It also implies that, for any given utterance, the only features currently used to choose how the voice sounds are (1) the user-selected voice name, such as af, and (2) the length of the utterance. Features like the punctuation texture .?! or the text sentiment are not yet being used. Potential solutions could be neural or even classical, e.g. using vector DBs. This, among other things, is still an area of research.
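To make the lookup concrete, here is a minimal sketch of choosing a style vector from only a voice name and an utterance length. It uses plain Python lists in place of real tensors so that it runs without dependencies. The helper names `make_dummy_voicepack` and `select_style`, the placeholder values, and the mapping from length to row index (length 1 maps to row 0, longer inputs clamp to the last row) are assumptions for illustration, not the library's confirmed behavior.

```python
from typing import Dict, List

NUM_LENGTHS = 510  # number of utterance lengths, per the shape described above
STYLE_DIM = 256    # size of each style vector, per the shape described above

Voicepack = List[List[float]]


def make_dummy_voicepack() -> Voicepack:
    """Build a placeholder (510, 256) table; row i is filled with float(i)."""
    return [[float(i)] * STYLE_DIM for i in range(NUM_LENGTHS)]


def select_style(voicepack: Voicepack, num_tokens: int) -> List[float]:
    """Pick the style row for an utterance of num_tokens tokens.

    Assumed mapping: length n uses row n - 1, clamped to the last row.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    index = min(num_tokens, len(voicepack)) - 1
    return voicepack[index]


if __name__ == "__main__":
    # Hypothetical registry keyed by voice name; "af" is the name used above.
    voices: Dict[str, Voicepack] = {"af": make_dummy_voicepack()}
    style = select_style(voices["af"], num_tokens=42)
    print(len(style), style[0])
```

Nothing about the text itself, such as punctuation or sentiment, enters this lookup, which is the limitation the paragraph above describes.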

Community contributions welcome

Within a couple of weeks of Kokoro's Christmas 2024 release, talented people had already built great things. If you want to build something, go for it! Kokoro is permissive Apache-licensed software. If you also want to add or improve something here (or in misaki), hopefully Kokoro can earn your commit, and feel free to open a PR if so.