Repeated Word Hallucination in Transcription Output #134
added logging in stt.py and got repeated words in `transcribe_file`:

```python
segments, transcription_info = whisper.transcribe(
    file.file,
    task=Task.TRANSCRIBE,
    language=language,
    initial_prompt=prompt,
    word_timestamps="word" in timestamp_granularities,
    temperature=temperature,
    vad_filter=vad_filter,
    hotwords=hotwords,
)
```

@fedirz any thoughts or updates on this issue?
Same issue here: I tried to transcribe an hour-long lecture and got a repetition of about four words a few hundred times, until the end of the file.
This is likely an […]

Does setting […]?
@fedirz I don't think it's faster-whisper, because I could not reproduce this issue with the faster-whisper framework alone. But while debugging faster-whisper-server I found that the issue comes from stt.py. So the issue may come from faster-whisper, but it's not a model problem; maybe the file is somehow broken or split incorrectly.
Is there some kind of silence of more than 30 s around the location of the repeated words?

No, I've tested it with different files.
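Whether a file contains a long stretch of near-silence is easy to check directly from the decoded samples. A minimal sketch (the function name, amplitude threshold, and 30 s default are illustrative assumptions, not anything from faster-whisper):

```python
def find_long_silences(samples, sample_rate, threshold=0.01, min_duration_s=30.0):
    """Return (start_s, end_s) spans where |amplitude| stays below `threshold`
    for at least `min_duration_s` seconds. `samples` are floats in [-1, 1]."""
    silences = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / sample_rate >= min_duration_s:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a silent run that extends to the end of the file.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_duration_s:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences
```

Running this near the timestamp where the repetition starts would confirm or rule out the long-silence hypothesis.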
The same issue occurs with a few languages, and it happens on any model. I tried Russian and English. It is most often seen on files of 5 minutes or longer.
What temperature have you used? Have you tried lowering it? |
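For context, temperature in Whisper-style decoding is usually a fallback schedule rather than a single value: decoding is retried at progressively higher temperatures only when quality checks on the current attempt fail. A rough sketch of that logic, assuming the default faster-whisper thresholds (`compression_ratio_threshold=2.4`, `log_prob_threshold=-1.0`); the `decode` callable here is a hypothetical stand-in for the real decoder:

```python
def decode_with_fallback(decode,
                         temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                         compression_ratio_threshold=2.4,
                         log_prob_threshold=-1.0):
    """Try decoding at increasing temperatures; accept the first result whose
    compression ratio (a looping/repetition signal) and average log-probability
    pass the quality checks."""
    result = None
    for t in temperatures:
        result = decode(t)
        passes = (result["compression_ratio"] <= compression_ratio_threshold
                  and result["avg_logprob"] >= log_prob_threshold)
        if passes:
            return result
    return result  # every temperature failed; return the last attempt
```

A very high compression ratio on a segment is exactly what repeated-word loops produce, which is why lowering the threshold (or keeping the fallback temperatures) can matter as much as the starting temperature itself.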
When running the Whisper model using the faster-whisper-server Docker container, I encounter a transcription issue where the output begins to “hallucinate” after a certain word. The model continuously repeats this word until the end of the transcription output, as shown below:
"Если не вакоеска, то паралляма сейчас обучаем, потому что, ну, это надо прямочень хорошо качать, чтобы шмуф, ну, как бы сейчас вот будет, если люди много нету, то, ну, как бы, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ..."
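Transcripts like the one above can be flagged automatically by scanning for long runs of a single repeated token. A small illustrative helper (not part of faster-whisper-server; the name and punctuation handling are my own):

```python
def longest_repeated_run(text):
    """Return (token, run_length) for the longest run of one token repeated
    consecutively, ignoring trailing punctuation and case."""
    best_token, best_len = None, 0
    prev, run = None, 0
    for token in text.split():
        token = token.strip(",.!?").lower()
        if token == prev:
            run += 1
        else:
            prev, run = token, 1
        if run > best_len:
            best_token, best_len = token, run
    return best_token, best_len
```

Flagging outputs where the longest run exceeds, say, 10 repetitions is a cheap way to detect this failure mode in batch jobs.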
This problem appears across all models, but its onset depends on model size and audio file length. For example, the hallucination begins with files of around 1–2 MB when using the medium model, but only with larger files when using the small model. Tested with non-English audio.
This issue does not occur when I run the model directly via the faster-whisper Python library. Below are the details of how I am running the server and using the model in both contexts.
Start the server:

Run the client against the server:

Direct run with the faster-whisper framework on the same VM and GPU:
I guess the problem might be in how large files are split into pieces before being fed into the model, if the splitting tool is not the one from the framework.
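If splitting is the culprit, it can be tested in isolation. Below is a sketch of fixed-size windowing over raw samples with optional overlap; the parameters are illustrative, and neither faster-whisper nor faster-whisper-server necessarily splits audio this way:

```python
def split_into_windows(samples, sample_rate, window_s=30.0, overlap_s=0.0):
    """Split a sample array into windows of `window_s` seconds, with
    consecutive windows overlapping by `overlap_s` seconds."""
    window = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than window_s")
    return [samples[i:i + window] for i in range(0, len(samples), step)]
```

Cutting exactly at window boundaries can bisect a word; a small overlap plus deduplication of the overlapping text is a common mitigation, and comparing results with and without overlap would show whether the boundaries trigger the repetition.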
Thank you!