Hello.
After creating a Docker container by following the tutorial video and the README, I tried Live Transcription of microphone input using ffmpeg, but it did not work properly.
After checking the Docker container logs and running the faster-whisper-server source locally with some modifications, I confirmed that a TimeoutError exception was being raised in audio_receiver in stt.py.
In addition, the websocket connection appears to be forcibly closed by the server at several points in stt.py, including where it executes logger.info(f"Not enough speech in the last {config.inactivity_window_seconds} seconds.").
Transcription of local wav files, i.e. not Live Transcription over websocket, worked normally, and sending wav files (rather than microphone input) to "/v1/audio/translations" over websocket also worked normally.
I'm using Windows 10 Pro, so I ran faster-whisper-server under WSL + Docker Desktop and sent the microphone input stream to /v1/audio/translations using ffmpeg for Windows.
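For reference, the server container is the one described in the README; assuming the image names published there (fedirz/faster-whisper-server, latest-cuda / latest-cpu tags) and the default port 8000, the GPU variant is started from WSL roughly like this:

```
docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cuda
```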
Here's how to reproduce:
1. Download ffmpeg and websocat for Windows.
2. Copy the websocat executable into the ffmpeg\bin folder and rename it to websocat.exe.
3. Run Live Transcription with a command of the following form:
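As an illustration, the command has this shape; the microphone device name is a placeholder for the actual DirectShow device, and the port (8000) and raw audio format (16 kHz, mono, s16le) are assumptions based on the server's defaults:

```
ffmpeg -loglevel quiet -f dshow -i audio="Microphone (Your Device Name)" -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/translations
```

websocat then prints the recognition results returned by the server to stdout.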
The client side then prints an error log (the websocket connection is closed from the server side), and the faster-whisper-server side logs the TimeoutError in audio_receiver described above.
A command of the following kind works normally, which shows that my computer's settings and the faster-whisper-server settings themselves are correct:
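The wav file name here is a placeholder, and the port and audio format are the same assumptions as above; the point is that a local file, rather than live microphone input, is piped to the same websocket endpoint:

```
ffmpeg -loglevel quiet -i test.wav -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/translations
```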
I set up a local Python development environment for faster-whisper-server and ran it from VS Code after changing the following setting values in config.py, which made Live Transcription work for a while:
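The exact values are not reproduced here; as an illustration, the edit is of this kind. min_duration is the setting discussed below (raised to about 5 seconds), and inactivity_window_seconds is shown only because it appears in the log message quoted above; the numbers are examples, not the exact values I used:

```python
# config.py -- illustrative values only, not the exact numbers from my setup
min_duration: float = 5.0                # accumulate roughly 5 s of audio before transcribing,
                                         # instead of ~1-second chunks
inactivity_window_seconds: float = 15.0  # the window behind "Not enough speech in the
                                         # last ... seconds"; a larger window is more tolerant
```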
The min_duration setting was the most important one; without changing it, Live Transcription always failed.
After various tests, I came to the following conclusions:
- Transcription of audio files works normally with any method.
- Live Transcription of microphone input does not work properly, because the websocket connection is forcibly closed on the server side while it validates the data received over the WebSocket.
- Simply increasing some of the setting values does not make Live Transcription work properly.
- I tested with both CPU and GPU, and the issue was the same.
- I think running Live Transcription on multi-core CPUs may be feasible, because the faster whisper-large-v3-turbo model is available.
Here are some of my opinions:
- Setting min_duration to about 5 seconds seems to improve both performance and recognition accuracy. Accumulating and recognizing audio buffers in 1-second units is too short.
- For non-English languages such as Korean, Japanese, and Chinese, recognizing audio buffers that are too short causes the accumulated recognition text to keep changing, which is not good.
- You can simulate a native speaker by opening YouTube on a smartphone, playing any Korean or Japanese news video, and placing the phone next to the microphone.
- When I got Live Transcription to work for a while by changing the settings, the text recognized at the start of the audio input kept accumulating for a long time (this can also be seen in your demo mp4). I suspect that with long microphone input, e.g. more than 30 minutes, the returned recognition result will become extremely long.
- It would be better if the recognition results were returned separated into sentences as much as possible, like the transcription results for audio files; a rough sketch of what I mean follows this list.
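As a rough illustration of the sentence-level segmentation meant in the last point, here is a minimal client-side sketch (a hypothetical helper, not part of faster-whisper-server) that splits the accumulated transcript on Latin and CJK sentence-ending punctuation:

```python
import re

# Split the accumulated transcript returned by the live endpoint into finished
# sentences plus a trailing, still-unfinished fragment.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")

def split_sentences(accumulated_text: str) -> tuple[list[str], str]:
    """Return (complete_sentences, unfinished_trailing_fragment)."""
    parts = [p for p in _SENTENCE_END.split(accumulated_text) if p]
    if parts and not parts[-1].endswith((".", "!", "?", "。", "！", "？")):
        return parts[:-1], parts[-1]
    return parts, ""

sentences, fragment = split_sentences("안녕하세요. 오늘의 뉴스입니다. 다음 소식")
print(sentences)  # ['안녕하세요.', '오늘의 뉴스입니다.']
print(fragment)   # '다음 소식'
```

A client could then display only the finished sentences and re-render just the trailing fragment, instead of the entire accumulated string.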
Best regards.