I'm trying to prepare GPU worker nodes and enable GPU support on Kubernetes. I followed the steps in the README file (link), but the pods always remain Pending and never run. I tried the CUDA 10 sample image as in the tutorial and also switched to CUDA 12, but neither works.
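For context, the Helm-based install path in that README boils down to the following. This is a sketch assuming the NVIDIA k8s-device-plugin README's Helm instructions; the chart version shown is illustrative:

```sh
# Add the NVIDIA device plugin chart repo and install the plugin.
# The plugin runs as a DaemonSet and registers GPUs with each kubelet.
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --version 0.16.1
```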
1. Quick Debug Information
OS/Version: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
CUDA version: 12.2 (NVIDIA-SMI 535.183.01, Driver Version 535.183.01)
Server type: NVIDIA L40S (link)
kubectl version:
  Client Version: v1.30.3
  Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  Server Version: v1.30.0
minikube version: v1.33.1
Helm version: v3.15.3
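One environment detail worth flagging: a minikube cluster only sees the host GPU if it was started with GPU passthrough. A minimal sketch, assuming the docker driver and a host Docker daemon already configured with the NVIDIA Container Toolkit (the `--gpus` flag landed in minikube v1.32, so v1.33.1 should have it):

```sh
# Recreate the cluster with the host GPUs passed through to the
# minikube node; without this, the node has no GPU to advertise.
minikube delete
minikube start --driver=docker --container-runtime=docker --gpus=all
```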
2. Issue or feature description
```
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ---                  ----               -------
  Warning  FailedScheduling  26m (x150 over 12h)  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
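The `Insufficient nvidia.com/gpu` message means the scheduler sees a GPU capacity of 0 on the node, so the first check is whether the node advertises the resource at all. A quick sketch using standard kubectl (the backslash escaping in the jsonpath is required because the resource name contains dots):

```sh
# If the device plugin registered correctly, allocatable should show
# nvidia.com/gpu: 1 (or more); an empty result means it never registered.
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
kubectl describe node | grep -A 10 -i allocatable
```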
3. Information to attach
```
$ kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
gpu-demo-vectoradd   0/1     Pending   0          12h
gpu-operator-test    0/1     Pending   0          13h
gpu-operator-test1   0/1     Pending   0          13h
gpu-pod              0/1     Pending   0          13h
```
```
$ kubectl describe pod gpu-pod
Name:             gpu-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           <none>
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Containers:
  cuda-container:
    Image:      nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    Port:       <none>
    Host Port:  <none>
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ww9jw (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-ww9jw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                             nvidia.com/gpu:NoSchedule op=Exists
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ---                  ----               -------
  Warning  FailedScheduling  26m (x150 over 12h)  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
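Given the capacity of 0 above, the next thing to inspect is the device plugin (or GPU operator) pods themselves. A sketch, where the namespace and label selector assume the README's default Helm install:

```sh
# The plugin DaemonSet must be Running on the GPU node, and its logs
# should list the GPUs it found; CrashLoopBackOff or "no devices found"
# here explains why the node advertises zero GPUs.
kubectl get pods -A | grep -i nvidia
kubectl logs -n nvidia-device-plugin \
  -l app.kubernetes.io/name=nvidia-device-plugin --tail=50
```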
Did you deploy the nvidia-device-plugin via Helm? If so, which Helm chart version are you using? I am currently facing the same problem after upgrading from 0.14.0 to 0.16.1.
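For reference, a quick way to see which chart version is actually deployed, and to pin back to the known-good one while debugging (the release and repo names are the common defaults from the README; adjust to your install):

```sh
# Show the deployed release and its chart version.
helm list -A | grep -i nvidia-device-plugin
# Roll back to the chart version that worked before the upgrade.
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --version 0.14.0
```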
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.