
[Bug]: OOM occurs when the NPU is used with Deepseek-R1 and INT4 quantization #29095

Open
KDev-emb opened this issue Feb 20, 2025 · 0 comments
Labels: bug (Something isn't working), category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), support_request

OpenVINO Version

2025.0

Operating System

Other (Please specify in description)

Device used for inference

NPU

Framework

None

Model used

DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights

Issue description

I'm trying to run the following example from the OpenVINO notebook using an NPU:
https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/deepseek-r1
When attempting to load the DeepSeek-R1-Distill-Qwen-14B model with INT4-NPU compressed weights on the NPU, I encounter an OOM (Out of Memory) error.
However, the same model runs correctly on the CPU, which suggests that the available system memory should be sufficient.
My configuration: Ubuntu 24.04.2 LTS, Intel® Core™ Ultra 7 265KF, 64 GB RAM.

Step-by-step reproduction

Run the DeepSeek-R1 example from the notebook, selecting the DeepSeek-R1-Distill-Qwen-14B model and NPU as the device, and keep the rest of the settings unchanged.
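For reference, a minimal sketch of the failing load step, plus a possible mitigation: OpenVINO's GenAI-on-NPU documentation describes pipeline properties such as `MAX_PROMPT_LEN` and `MIN_RESPONSE_LEN` that bound the static shapes (and thus the KV-cache buffers) the NPU plugin pre-allocates at load time. Whether this avoids the OOM for a 14B model is an assumption; the values below are illustrative, not tuned, and this fragment cannot run without the exported model directory and NPU hardware.

```python
import openvino_genai as ov_genai

# model_dir is the exported DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights
# directory produced by the notebook (placeholder here).
model_dir = "DeepSeek-R1-Distill-Qwen-14B/INT4-NPU_compressed_weights"

# Smaller static prompt/response bounds shrink the buffers the NPU plugin
# reserves up front. These properties are documented for the NPU LLM pipeline;
# the specific values are illustrative assumptions.
pipeline_config = {"MAX_PROMPT_LEN": 1024, "MIN_RESPONSE_LEN": 256}

pipe = ov_genai.LLMPipeline(model_dir, "NPU", **pipeline_config)
print(pipe.generate("Hello", max_new_tokens=16))
```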

Relevant log output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 6
      2 import sys
      4 print(f"Loading model from {model_dir}\n")
----> 6 pipe = ov_genai.LLMPipeline(str(model_dir), device.value)
      7 if "genai_chat_template" in model_configuration:
      8     pipe.get_tokenizer().set_chat_template(model_configuration["genai_chat_template"])

RuntimeError: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/intel_npu/src/backend/src/zero_remote_tensor.cpp:101:
L0 zeMemAllocHost result: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY, code 0x70000003 - insufficient device memory to satisfy call

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
KDev-emb added the bug (Something isn't working) and support_request labels on Feb 20, 2025.
ilya-lavrenov added the category: NPU (OpenVINO NPU plugin) and category: NPUW (NPUW plugin) labels on Feb 20, 2025.
4 participants