In my setup, a user can sometimes hit Ollama hard enough to consume most of the VRAM, which makes Whisper unavailable for the duration of Ollama's keep-alive.
Losing the audio in that case is very annoying. I think it would be better to always return a transcription, even if it's slower, so I'm suggesting an env variable that enables the CPU backend as a fallback when the CUDA one hits an OOM error.
What do you think?
Edit: alternatively, maybe allow specifying a list of fallback models? E.g. if loading large-v3 fails, try base instead.
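For illustration, here is a minimal sketch of what such a fallback chain could look like on top of faster-whisper. This is not the project's actual code: the env variable names (`WHISPER_FALLBACK_CPU`, `WHISPER_FALLBACK_MODELS`) are hypothetical, and the exact exception raised on CUDA OOM depends on the CTranslate2 build (it generally surfaces as a `RuntimeError`).

```python
import os

from faster_whisper import WhisperModel


def load_model_with_fallback(primary: str = "large-v3") -> WhisperModel:
    """Try each (model, device) candidate in order until one loads.

    The fallback chain is driven by two hypothetical env variables:
      WHISPER_FALLBACK_CPU=1         -> retry the same model on CPU
      WHISPER_FALLBACK_MODELS=base   -> comma-separated smaller models to try on CUDA
    """
    candidates = [(primary, "cuda")]
    if os.environ.get("WHISPER_FALLBACK_CPU", "").lower() in ("1", "true"):
        candidates.append((primary, "cpu"))
    for name in os.environ.get("WHISPER_FALLBACK_MODELS", "").split(","):
        if name.strip():
            candidates.append((name.strip(), "cuda"))

    last_err = None
    for model_name, device in candidates:
        try:
            # int8 on CPU keeps the fallback path reasonably fast
            compute_type = "float16" if device == "cuda" else "int8"
            return WhisperModel(model_name, device=device, compute_type=compute_type)
        except RuntimeError as err:  # assumption: CTranslate2 raises RuntimeError on CUDA OOM
            last_err = err
    raise RuntimeError("all Whisper fallback candidates failed to load") from last_err
```

The same loop could also wrap the transcribe call, since the OOM caused by Ollama's VRAM usage can just as easily happen mid-request rather than at load time.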