In my setup, a user can sometimes hit Ollama hard enough to consume most of the VRAM, which makes Whisper unavailable for the duration of Ollama's keep-alive.
Losing the audio in that case is very annoying. I think it would be better to always return a transcription, even if it's slower, so I'm suggesting an env variable that enables the CPU backend as a fallback when the CUDA one hits an OOM error.
What do you think?
Edit: alternatively, maybe allow specifying a list of fallback models? E.g. if loading large-v3 fails, try base instead.
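For illustration, here is a minimal sketch of what such a fallback chain could look like on top of faster-whisper. This is not the project's actual code: the env variable names (`WHISPER_FALLBACK_CPU`, `WHISPER_FALLBACK_MODELS`) are hypothetical, and the exact exception raised on CUDA OOM depends on the CTranslate2 build (it generally surfaces as a `RuntimeError`).

```python
import os

from faster_whisper import WhisperModel


def load_model_with_fallback(primary: str = "large-v3") -> WhisperModel:
    """Try each (model, device) candidate in order until one loads.

    The fallback chain is driven by two hypothetical env variables:
      WHISPER_FALLBACK_CPU=1         -> retry the same model on CPU
      WHISPER_FALLBACK_MODELS=base   -> comma-separated smaller models to try on CUDA
    """
    candidates = [(primary, "cuda")]
    if os.environ.get("WHISPER_FALLBACK_CPU", "").lower() in ("1", "true"):
        candidates.append((primary, "cpu"))
    for name in os.environ.get("WHISPER_FALLBACK_MODELS", "").split(","):
        if name.strip():
            candidates.append((name.strip(), "cuda"))

    last_err = None
    for model_name, device in candidates:
        try:
            # int8 on CPU keeps the fallback path reasonably fast
            compute_type = "float16" if device == "cuda" else "int8"
            return WhisperModel(model_name, device=device, compute_type=compute_type)
        except RuntimeError as err:  # assumption: CTranslate2 raises RuntimeError on CUDA OOM
            last_err = err
    raise RuntimeError("all Whisper fallback candidates failed to load") from last_err
```

The same loop could also wrap the transcribe call, since the OOM caused by Ollama's VRAM usage can just as easily happen mid-request rather than at load time.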