[Bug]: OOM occurs when the NPU is used with Deepseek-R1 and INT4 quantization #29095
Open
3 tasks done
Labels
bug (Something isn't working)
category: NPU (OpenVINO NPU plugin)
category: NPUW (NPUW plugin)
support_request
OpenVINO Version
2025.0
Operating System
Other (Please specify in description)
Device used for inference
NPU
Framework
None
Model used
DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights
Issue description
I'm trying to run the following example from the OpenVINO notebook using an NPU:
https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/deepseek-r1
When I load the DeepSeek-R1-Distill-Qwen-14B model with INT4-NPU weight compression on the NPU, I get an out-of-memory (OOM) error.
However, the same model runs correctly on the CPU, which suggests that the available system memory should be sufficient.
My configuration: Ubuntu 24.04.2 LTS, Intel® Core™ Ultra 7 265KF, 64 GB RAM.
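For context, here is a rough back-of-the-envelope estimate of the weight size. It is only a sketch: the parameter count is an assumption taken from the "14B" in the model name, and the figure ignores quantization scales and zero-points, any layers kept at higher precision, the KV cache, and whatever temporary buffers are needed while compiling the model for a device.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: ~14e9 parameters, read off the "14B" in the model name.
NUM_PARAMS = 14e9

int4_gib = weight_footprint_gib(NUM_PARAMS, 4)   # 4 bits per weight
fp16_gib = weight_footprint_gib(NUM_PARAMS, 16)  # 16 bits per weight

print(f"INT4 weights: ~{int4_gib:.1f} GiB")
print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
```

By this estimate the INT4 weights alone are well under the 64 GB installed, which is why I would not expect the load to run out of memory.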
Step-by-step reproduction
Run the DeepSeek-R1 example from the notebook, select DeepSeek-R1-Distill-Qwen-14B as the model and NPU as the device, and leave all other settings unchanged.
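Outside the notebook, the same CPU versus NPU comparison could be sketched as below. This is a minimal sketch, not the notebook's code: it assumes the `openvino-genai` package is installed and that the model has already been exported to a local directory. `try_load` and `MODEL_DIR` are hypothetical names used for illustration, and the path is simply the value from the "Model used" field.

```python
from typing import Any, Callable, Optional


def try_load(model_dir: str, device: str,
             loader: Optional[Callable[[str, str], Any]] = None) -> tuple[bool, str]:
    """Try to create a pipeline on `device`; return (success, detail)."""
    try:
        if loader is None:
            # Imported lazily so the helper can be exercised without OpenVINO.
            import openvino_genai
            loader = openvino_genai.LLMPipeline
        loader(model_dir, device)
    except Exception as exc:  # a missing package is reported here too
        return False, f"{type(exc).__name__}: {exc}"
    return True, "loaded"


if __name__ == "__main__":
    # Assumption: local path to the exported model (hypothetical location).
    MODEL_DIR = "DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights"
    for device in ("CPU", "NPU"):
        ok, detail = try_load(MODEL_DIR, device)
        print(f"{device}: ok={ok} ({detail})")
```

If the OOM is raised as a Python exception, the NPU line shows its message. If the process is instead killed by the operating system, nothing is printed for the NPU and the kernel log would need to be checked.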
Relevant log output
Issue submission checklist