Repeated Word Hallucination in Transcription Output #134
added logging in stt.py and got repeated words in `transcribe_file`:

```python
segments, transcription_info = whisper.transcribe(
    file.file,
    task=Task.TRANSCRIBE,
    language=language,
    initial_prompt=prompt,
    word_timestamps="word" in timestamp_granularities,
    temperature=temperature,
    vad_filter=vad_filter,
    hotwords=hotwords,
)
```

@fedirz any thoughts or updates on this issue?
Same issue here: I tried to transcribe an hour-long lecture and got a repetition of about four words a few hundred times, until the end of the file.
This is likely an […]

Does setting […]?
@fedirz I don't think it's faster-whisper, because I could not reproduce this issue with the faster-whisper framework alone. But while debugging faster-whisper-server I found that the issue comes from stt.py. So the issue may come from faster-whisper, but it's not a model problem; maybe the file is somehow broken or split incorrectly.
Is there some kind of silence of more than 30 s around the location of the repeated words?

No, I've tested it with different files.
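Whether a file contains a long stretch of near-silence is easy to check directly from the decoded samples. A minimal sketch (the function name, amplitude threshold, and 30 s default are illustrative assumptions, not anything from faster-whisper):

```python
def find_long_silences(samples, sample_rate, threshold=0.01, min_duration_s=30.0):
    """Return (start_s, end_s) spans where |amplitude| stays below `threshold`
    for at least `min_duration_s` seconds. `samples` are floats in [-1, 1]."""
    silences = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / sample_rate >= min_duration_s:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a silent run that extends to the end of the file.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_duration_s:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences
```

Running this near the timestamp where the repetition starts would confirm or rule out the long-silence hypothesis.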
The same issue occurs with a few languages, and it happens on any model. I tried Russian and English. It is most often seen on files of 5 minutes or longer.
What temperature have you used? Have you tried lowering it? |
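For context, temperature in Whisper-style decoding is usually a fallback schedule rather than a single value: decoding is retried at progressively higher temperatures only when quality checks on the current attempt fail. A rough sketch of that logic, assuming the default faster-whisper thresholds (`compression_ratio_threshold=2.4`, `log_prob_threshold=-1.0`); the `decode` callable here is a hypothetical stand-in for the real decoder:

```python
def decode_with_fallback(decode,
                         temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         compression_ratio_threshold=2.4,
                         log_prob_threshold=-1.0):
    """Try decoding at increasing temperatures; accept the first result whose
    compression ratio (a looping/repetition signal) and average log-probability
    pass the quality checks."""
    result = None
    for t in temperatures:
        result = decode(t)
        passes = (result["compression_ratio"] <= compression_ratio_threshold
                  and result["avg_logprob"] >= log_prob_threshold)
        if passes:
            return result
    return result  # every temperature failed; return the last attempt
```

A very high compression ratio on a segment is exactly what repeated-word loops produce, which is why lowering the threshold (or keeping the fallback temperatures) can matter as much as the starting temperature itself.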
When running the Whisper model using the faster-whisper-server Docker container, I encounter a transcription issue where the output begins to “hallucinate” after a certain word. The model continuously repeats this word until the end of the transcription output, as shown below:
"Если не вакоеска, то паралляма сейчас обучаем, потому что, ну, это надо прямочень хорошо качать, чтобы шмуф, ну, как бы сейчас вот будет, если люди много нету, то, ну, как бы, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ..."
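Transcripts like the one above can be flagged automatically by scanning for long runs of a single repeated token. A small illustrative helper (not part of faster-whisper-server; the name and punctuation handling are my own):

```python
def longest_repeated_run(text):
    """Return (token, run_length) for the longest run of one token repeated
    consecutively, ignoring trailing punctuation and case."""
    best_token, best_len = None, 0
    prev, run = None, 0
    for token in text.split():
        token = token.strip(",.!?").lower()
        if token == prev:
            run += 1
        else:
            prev, run = token, 1
        if run > best_len:
            best_token, best_len = token, run
    return best_token, best_len
```

Flagging outputs where the longest run exceeds, say, 10 repetitions is a cheap way to detect this failure mode in batch jobs.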
This problem appears across all models, but its onset depends on model size and audio file length. For example, the hallucination begins with files of around 1–2 MB when using the medium model, but only with larger files when using the small model. Tested with non-English audio.
This issue does not occur when I run the model directly via the faster-whisper Python library. Below are the details of how I am running the server and using the model in both contexts.
Start the server:

Run the client against the server:

Direct run with the faster-whisper framework on the same VM and GPU:
I guess the problem might be in how large files are split into pieces before being fed into the model, if the splitting tool is not the one from the framework.
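If splitting is the culprit, it can be tested in isolation. Below is a sketch of fixed-size windowing over raw samples with optional overlap; the parameters are illustrative, and neither faster-whisper nor faster-whisper-server necessarily splits audio this way:

```python
def split_into_windows(samples, sample_rate, window_s=30.0, overlap_s=0.0):
    """Split a sample array into windows of `window_s` seconds, with
    consecutive windows overlapping by `overlap_s` seconds."""
    window = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than window_s")
    return [samples[i:i + window] for i in range(0, len(samples), step)]
```

Cutting exactly at window boundaries can bisect a word; a small overlap plus deduplication of the overlapping text is a common mitigation, and comparing results with and without overlap would show whether the boundaries trigger the repetition.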
Thank you!