egorsmkv/speech-recognition-uk

🇺🇦 Speech Recognition & Synthesis for Ukrainian

Overview

This repository collects links to models, datasets, and tools for Ukrainian Speech-to-Text and Text-to-Speech.

Community

🎤 Speech-to-Text

📦 Implementations

wav2vec2-bert

wav2vec2

Demos are available here: https://github.com/egorsmkv/wav2vec2-uk-demo

HuBERT

data2vec

Citrinet

ContextNet

FastConformer

Squeezeformer

Conformer-CTC

VOSK

Models: https://huggingface.co/Yehor/vosk-uk

DeepSpeech

M-CTC-T

whisper

Flashlight

📊 Benchmarks

This benchmark uses the test split of Common Voice 10.

  • WER: Word Error Rate
  • CER: Character Error Rate
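As a rough illustration of the two metrics, the sketch below computes WER and CER from the Levenshtein (edit) distance between a reference and a hypothesis transcript. It is not the script used to produce the numbers in this README; the function names are hypothetical, and it assumes a non-empty reference with no text normalization applied. The "Accuracy (words)" column in the tables is consistent with 100% minus WER, which `word_accuracy` mirrors.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def word_accuracy(reference, hypothesis):
    """Word accuracy, assumed here to be 1 - WER."""
    return 1.0 - wer(reference, hypothesis)


if __name__ == "__main__":
    ref = "центр міської громади"
    hyp = "центр міська громади"
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
    print(f"Accuracy (words): {word_accuracy(ref, hyp):.2%}")
```

In practice, transcripts are usually lowercased and stripped of punctuation before scoring, so results depend on the normalization chosen.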

wav2vec2-bert

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-bert-uk (F16) | 6.6% | 1.34% | 93.4% |
| Yehor/w2v-bert-uk-v2.1 (F16) | 17.34% | 3.33% | 82.66% |

wav2vec2

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/w2v-xls-r-uk | 20.24% | 3.64% | 79.76% |
| robinhad/wav2vec2-xls-r-300m-uk | 27.36% | 5.37% | 72.64% |
| arampacha/wav2vec2-xls-r-1b-uk | 16.52% | 2.93% | 83.48% |

HuBERT

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Yehor/hubert-uk (F16) | 37.07% | 6.87% | 62.93% |

Citrinet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| nvidia/stt_uk_citrinet_1024_gamma_0_25 | 4.32% | 0.94% | 95.68% |
| neongeckocom/stt_uk_citrinet_512_gamma_0_25 | 7.46% | 1.6% | 92.54% |

ContextNet

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_contextnet_512 | 6.69% | 1.45% | 93.31% |

FastConformer P&C

This model supports punctuation and capitalization (P&C) in its output text.

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_ua_fastconformer_hybrid_large_pc | 4% | 1.02% | 96% |

Squeezeformer

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| theodotus/stt_uk_squeezeformer_ctc_xs | 10.78% | 2.29% | 89.22% |
| theodotus/stt_uk_squeezeformer_ctc_sm | 8.2% | 1.75% | 91.8% |
| theodotus/stt_uk_squeezeformer_ctc_ml | 5.91% | 1.26% | 94.09% |

Conformer-CTC

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| taras-sereda/uk-pods-conformer | 6.75% | 1.41% | 93.25% |

Flashlight

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| Flashlight Conformer | 19.15% | 2.44% | 80.85% |

data2vec

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| robinhad/data2vec-large-uk | 31.17% | 7.31% | 68.83% |

VOSK

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v3 | 53.25% | 38.78% | 46.75% |

M-CTC-T

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| speechbrain/m-ctc-t-large | 57% | 10.94% | 43% |

whisper

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| tiny | 63.08% | 18.59% | 36.92% |
| base | 52.1% | 14.08% | 47.9% |
| small | 30.57% | 7.64% | 69.43% |
| medium | 18.73% | 4.4% | 81.27% |
| large (v1) | 16.42% | 3.93% | 83.58% |
| large (v2) | 13.72% | 3.18% | 86.28% |
| large (v3) | 20.53% | 5.28% | 79.47% |
| turbo | 22.83% | 7.05% | 77.17% |

Fine-tuned versions for Ukrainian:

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| small | 27.04% | 5.65% | 72.96% |
| large | 24.82% | 5.5% | 75.18% |

To fine-tune a Whisper model on your own data, use this repository: https://github.com/egorsmkv/whisper-ukrainian

DeepSpeech

| Model | WER | CER | Accuracy (words) |
|---|---|---|---|
| v0.5 | 70.25% | 20.09% | 29.75% |

📖 Development

📚 Datasets

Compiled dataset from different open sources + Companies + Community = 188.31 GB / ~1200 hours 💪

Voice of America (398 hours)

FLEURS

Ukrainian broadcast

YODAS2

Companies

Ukrainian podcasts

Cleaned Common Voice 10 (test set)

Noised Common Voice 10

Community

Other

⭐ Related works

Language models

Inverse Text Normalization

Text Enhancement

Aligners

📢 Text-to-Speech

Test sentence with stresses:

К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.

Without stresses:

Кам'янець-Подільський - місто в Хмельницькій області України, центр Кам'янець-Подільської міської об'єднаної територіальної громади і Кам'янець-Подільського району.
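The two test sentences above differ only in the `+` marks. A minimal sketch of converting between them, assuming (as the example suggests) that `+` is placed immediately before the stressed vowel; the helper names are hypothetical and not part of any listed TTS project:

```python
STRESS_MARK = "+"
# Ukrainian vowels, lowercase and uppercase.
VOWELS = set("аеєиіїоуюяАЕЄИІЇОУЮЯ")


def strip_stress(text):
    """Remove all stress marks, producing the plain (unstressed) sentence."""
    return text.replace(STRESS_MARK, "")


def marks_precede_vowels(text):
    """Check that every stress mark is immediately followed by a vowel."""
    for i, ch in enumerate(text):
        if ch == STRESS_MARK:
            if i + 1 >= len(text) or text[i + 1] not in VOWELS:
                return False
    return True


if __name__ == "__main__":
    stressed = "м+істо в Хмельн+ицькій +області Укра+їни"
    print(strip_stress(stressed))
    print(marks_precede_vowels(stressed))
```

A check like `marks_precede_vowels` can help catch misplaced marks before text is passed to a model that expects stressed input.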

📦 Implementations

StyleTTS2

P-Flow TTS

RAD-TTS

Coqui TTS

Neon TTS

FastPitch

Balacoon TTS

📚 Datasets

⭐ Related works

Accentors

Misc