
[Bug]: OOM occurs when the NPU is used with Deepseek-R1 and INT4 quantization #29095

Open
KDev-emb opened this issue Feb 20, 2025 · 0 comments
Labels: bug (Something isn't working), category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), support_request

OpenVINO Version

2025.0

Operating System

Other (Please specify in description)

Device used for inference

NPU

Framework

None

Model used

DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights

Issue description

I'm trying to run the following example from the OpenVINO notebook using an NPU:
https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/deepseek-r1
When attempting to load the DeepSeek-R1-Distill-Qwen-14B model with INT4-NPU compressed weights on the NPU, I encounter an OOM (Out of Memory) error.
However, the same model runs correctly on the CPU, which suggests that the available system memory should be sufficient.
My configuration: Ubuntu 24.04.2 LTS, Intel® Core™ Ultra 7 265KF, 64 GB RAM.

Step-by-step reproduction

Run the DeepSeek-R1 example from the notebook, selecting the DeepSeek-R1-Distill-Qwen-14B model and NPU as the device, and keep the rest of the settings unchanged.
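For reference, a minimal sketch of the failing load step, plus a possible mitigation: OpenVINO's GenAI-on-NPU documentation describes pipeline properties such as `MAX_PROMPT_LEN` and `MIN_RESPONSE_LEN` that bound the static shapes (and thus the KV-cache buffers) the NPU plugin pre-allocates at load time. Whether this avoids the OOM for a 14B model is an assumption; the values below are illustrative, not tuned, and this fragment cannot run without the exported model directory and NPU hardware.

```python
import openvino_genai as ov_genai

# model_dir is the exported DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights
# directory produced by the notebook (placeholder here).
model_dir = "DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights"

# Smaller static prompt/response bounds shrink the buffers the NPU plugin
# reserves up front. These properties are documented for the NPU LLM pipeline;
# the specific values are illustrative assumptions.
pipeline_config = {"MAX_PROMPT_LEN": 1024, "MIN_RESPONSE_LEN": 256}

pipe = ov_genai.LLMPipeline(model_dir, "NPU", **pipeline_config)
print(pipe.generate("Hello", max_new_tokens=16))
```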

Relevant log output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 6
      2 import sys
      4 print(f"Loading model from {model_dir}\n")
----> 6 pipe = ov_genai.LLMPipeline(str(model_dir), device.value)
      7 if "genai_chat_template" in model_configuration:
      8     pipe.get_tokenizer().set_chat_template(model_configuration["genai_chat_template"])

RuntimeError: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/backend/src/zero_remote_tensor.cpp:101:
L0 zeMemAllocHost result: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY, code 0x70000003 - insufficient device memory to satisfy call

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
KDev-emb added the bug (Something isn't working) and support_request labels on Feb 20, 2025.
ilya-lavrenov added the category: NPU (OpenVINO NPU plugin) and category: NPUW (NPUW plugin) labels on Feb 20, 2025.
4 participants