
Repeated Word Hallucination in Transcription Output #134

Open
dailydaniel opened this issue Oct 29, 2024 · 8 comments

Comments

@dailydaniel

When running the Whisper model using the faster-whisper-server Docker container, I encounter a transcription issue where the output begins to “hallucinate” after a certain word. The model continuously repeats this word until the end of the transcription output, as shown below:

"Если не вакоеска, то паралляма сейчас обучаем, потому что, ну, это надо прямочень хорошо качать, чтобы шмуф, ну, как бы сейчас вот будет, если люди много нету, то, ну, как бы, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ну, ..."
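A repetition run like the one above is easy to detect mechanically. As a client-side stopgap (not a fix), one can cut the transcript off once a single word repeats too many times in a row; a minimal sketch, with an arbitrary cutoff of 10 consecutive repeats:

```python
def trim_repetition(text: str, max_repeats: int = 10) -> str:
    """Truncate a transcript once one word repeats more than
    max_repeats times consecutively (a common Whisper failure mode)."""
    words = text.split()
    out = []
    run = 0
    for i, w in enumerate(words):
        run = run + 1 if i > 0 and w == words[i - 1] else 1
        if run > max_repeats:
            break  # runaway hallucination detected; drop the rest
        out.append(w)
    return " ".join(out)
```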

This problem appears across all models, but its severity depends on model size and audio length: for example, the hallucination starts with files of around 1-2 MB when using the medium model, but only with larger files when using the small model. Tested on non-English (Russian) audio.

The issue does not occur when I run the model directly via the faster-whisper Python library. Below are the details of how I run the server and use the model in both setups.

Start the server:

docker run -it -d --gpus "device=0" \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 -p 3004:8000 \
 --name faster-whisper \
 --restart unless-stopped \
 fedirz/faster-whisper-server:latest-cuda

Client run against the server:

from openai import OpenAI

# assumed client setup; base URL follows the Docker port mapping above
client = OpenAI(base_url="http://localhost:3004/v1", api_key="does-not-matter")

model = "Systran/faster-whisper-small"
with open("audio.mp3", "rb") as audio_file:  # placeholder filename
    transcript = client.audio.transcriptions.create(
        model=model, file=audio_file
    )

Direct run with the faster-whisper framework on the same VM and GPU:

import os

from faster_whisper import WhisperModel

os.environ["CUDA_VISIBLE_DEVICES"] = device_id  # e.g. "0"
model = WhisperModel(model_size, device="cuda")  # e.g. "small"
segments, info = model.transcribe(input_path)

I suspect the problem is in how large files are split into chunks before being fed to the model, if the chunking is not done by the framework itself.

Thank you!

@dailydaniel
Author

I added logging in stt.py, and the words are already repeated in the output of transcribe_file:

segments, transcription_info = whisper.transcribe(
    file.file,
    task=Task.TRANSCRIBE,
    language=language,
    initial_prompt=prompt,
    word_timestamps="word" in timestamp_granularities,
    temperature=temperature,
    vad_filter=vad_filter,
    hotwords=hotwords,
)

@fedirz any thoughts or updates about this issue?

@D3alWyth1T

Same issue here -- tried to transcribe an hour-long lecture and got the repetition of about 4 words a few hundred times until the end of the file.

@fedirz
Owner

fedirz commented Nov 14, 2024

This is likely an issue in faster-whisper / the models themselves. I can look further into this if someone provides an English audio sample which a medium or large model hallucinates on. openai/whisper#679

Does setting ?vad_filter=true help?
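For anyone unsure how to pass that flag: it goes in the request URL as a query parameter. A sketch of the request (port assumed from the Docker command above; file and model names are placeholders):

```shell
curl "http://localhost:3004/v1/audio/transcriptions?vad_filter=true" \
  -F "file=@audio.mp3" \
  -F "model=Systran/faster-whisper-small"
```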

@dailydaniel
Author

@fedirz I don’t think it’s faster-whisper, because I could not reproduce the issue with the faster-whisper framework alone. But while debugging faster-whisper-server I found that the issue originates in the whisper.transcribe(...) call in stt.py.

So the issue may come from faster-whisper after all, but it does not look like a model problem; maybe the file is somehow corrupted or split incorrectly.

@thiswillbeyourgithub
Contributor

Is there some kind of silence of more than 30s around the location of the repeated words?
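For anyone who wants to check this on their own files: a minimal sketch that scans a 16-bit mono PCM WAV for long low-amplitude spans. The amplitude threshold of 500 is an arbitrary guess, and real recordings may need windowed RMS rather than per-sample values:

```python
import struct
import wave

def long_silences(path: str, threshold: int = 500, min_len_s: float = 30.0):
    """Return (start_s, end_s) spans where 16-bit mono PCM audio stays
    below `threshold` absolute amplitude for at least `min_len_s` seconds."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2 and w.getnchannels() == 1
        rate = w.getframerate()
        n = w.getnframes()
        samples = struct.unpack("<%dh" % n, w.readframes(n))
    spans, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i  # silence begins
        elif start is not None:
            if i - start >= min_len_s * rate:
                spans.append((start / rate, i / rate))
            start = None
    if start is not None and n - start >= min_len_s * rate:
        spans.append((start / rate, n / rate))  # trailing silence
    return spans
```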

@dailydaniel
Author

No, I’ve tested it with different files

@mkaskov

mkaskov commented Nov 14, 2024

The same issue with several languages; it happens on any model. I tried Russian and English. It is most often seen on files of 5 minutes and more.

@thiswillbeyourgithub
Contributor

What temperature have you used? Have you tried lowering it?
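Some background on that knob: Whisper-style decoders treat temperature as a fallback schedule rather than a single value. faster-whisper's default is [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], retrying at the next temperature when decoding looks degenerate, so passing temperature=0.0 in the request keeps decoding greedy and can reduce repetition loops. The schedule's shape, as an illustration (not the library's actual code):

```python
def fallback_schedule(start: float = 0.0, stop: float = 1.0, step: float = 0.2):
    """Build a Whisper-style temperature fallback schedule: decoding
    starts at `start` and retries at increasing temperatures up to `stop`."""
    temps = []
    t = start
    while t <= stop + 1e-9:  # tolerate float drift
        temps.append(round(t, 2))
        t += step
    return temps
```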
