
[BUG]: azure-pipelines-agent run inside AKS ignores cgroups & memory limits #5121

Open
philipp-durrer-jarowa opened this issue Feb 17, 2025 · 2 comments

@philipp-durrer-jarowa

What happened?

We're running a Docker@2 task (doing buildAndPush, which disables the arguments option) in an Azure DevOps pipeline that builds a React.js application using >8 GB of RAM during the build step. The Azure DevOps build agent, however, has a memory request/limit of 6 GB and therefore gets OOMKilled.

Inside the Azure DevOps build agent we use Podman, which handles the Dockerfile build for the Docker@2 task from the Azure DevOps pipeline.

We've followed https://github.com/microsoft/azure-pipelines-agent/blob/master/docs/start/resourceconfig.md without success, as k8s overwrites the cgroups for all running processes.

Are there any other suggestions or ideas for how to enforce the memory limit with the Azure Pipelines agent?
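For context, the 6 GB cap described above would come from the container resources in the pod spec. A hedged sketch of the relevant fragment (container name and image are illustrative, shown as it would sit under a KEDA ScaledJob's jobTargetRef.template.spec):

```yaml
# Illustrative fragment only; names and image are assumptions.
containers:
  - name: azp-agent
    image: myregistry/azp-agent:latest
    resources:
      requests:
        memory: "6Gi"
      limits:
        memory: "6Gi"   # exceeding this gets the container OOMKilled on the node
```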

Versions

Azure Pipelines agent: v4.251.0

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

AKS k8s 1.30.6

Version control system

Azure DevOps

Relevant log output

top - 12:16:43 up 3 days, 18:52,  0 users,  load average: 7.75, 6.91, 4.32
Tasks:  40 total,   2 running,  38 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.2 us, 10.6 sy,  0.0 ni, 31.3 id,  1.6 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :  32093.8 total,   1600.0 free,   7915.3 used,  22578.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  23368.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                      
   8286 azurede+  20   0   36.5g   4.6g   0.0g S 310.0  14.6   7:15.45 node


The pod casually blows through the 6 GB memory limit and gets OOMKilled on the node.
@znedw

znedw commented Feb 18, 2025

the documentation is for cgroups v1 where AKS would be cgroup v2, you probably need to do something like echo "6G" > /sys/fs/cgroup/system.slice/vsts.agent.*.*.agent*.service/memory.peak
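One caveat to the suggestion above: in cgroup v2, memory.peak is a read-only file reporting the high-water mark; the writable hard limit is memory.max. A hedged sketch of applying it to an agent started via start.sh rather than as a service (the process name Agent.Listener and the 6G value are assumptions):

```shell
# Sketch only: requires root and a cgroup v2 unified hierarchy at /sys/fs/cgroup.
# memory.max is the writable hard limit; memory.peak just reports peak usage.
AGENT_PID=$(pgrep -f Agent.Listener | head -n1)
if [ -n "$AGENT_PID" ]; then
  # /proc/<pid>/cgroup has the form "0::/<cgroup-path>" on a v2-only host
  CG_PATH=$(awk -F'::' '/^0::/ {print $2}' "/proc/$AGENT_PID/cgroup")
  LIMIT_FILE="/sys/fs/cgroup${CG_PATH}/memory.max"
  if [ -w "$LIMIT_FILE" ]; then
    echo 6G > "$LIMIT_FILE"
  fi
fi
```

Note that on AKS this may still not stick, since the kubelet owns and rewrites the pod's cgroup hierarchy (as described later in this thread).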

@philipp-durrer-jarowa
Author

We run the agent with the start.sh script, not as a service, and we use KEDA and scaled jobs.

Therefore there's no vsts.*.service in the cgroup:

azuredevops@buildagents-azure-devops-agent-scaledjobs-7rtjh-8vx6q:/sys/fs/cgroup/system.slice$ ls
 boot-efi.mount           hugetlb.1GB.events         memory.swap.events
 cgroup.controllers       hugetlb.1GB.events.local   memory.swap.high
 cgroup.events            hugetlb.1GB.max            memory.swap.max
 cgroup.freeze            hugetlb.1GB.rsvd.current   misc.current
 cgroup.kill              hugetlb.1GB.rsvd.max       misc.max
 cgroup.max.depth         hugetlb.2MB.current        multipathd.service
 cgroup.max.descendants   hugetlb.2MB.events         networkd-dispatcher.service
 cgroup.procs             hugetlb.2MB.events.local   node-exporter.service
 cgroup.stat              hugetlb.2MB.max            node-problem-detector.service
 cgroup.subtree_control   hugetlb.2MB.rsvd.current   pids.current
 cgroup.threads           hugetlb.2MB.rsvd.max       pids.events
 cgroup.type              hv-kvp-daemon.service      pids.max
 chrony.service           io.max                     rdma.current
 containerd.service       io.pressure                rdma.max
 cpu.idle                 io.prio.class              rsyslog.service
 cpu.max                  io.stat                    run-rpc_pipefs.mount
 cpu.max.burst            io.weight                  ssh.service
 cpu.pressure             irqbalance.service         sync-container-logs.service
 cpu.stat                 kubelet.service            system-getty.slice
 cpu.uclamp.max           memory.current             system-modprobe.slice
 cpu.uclamp.min           memory.events             'system-serial\x2dgetty.slice'
 cpu.weight               memory.events.local       'system-systemd\x2dfsck.slice'
 cpu.weight.nice          memory.high                system-walinuxagent.extensions.slice
 cpuset.cpus              memory.low                 systemd-journald.service
 cpuset.cpus.effective    memory.max                 systemd-logind.service
 cpuset.cpus.partition    memory.min                 systemd-networkd.service
 cpuset.mems              memory.numa_stat           systemd-resolved.service
 cpuset.mems.effective    memory.oom.group           systemd-udevd.service
 cron.service             memory.pressure            unattended-upgrades.service
 dbus.service             memory.stat                walinuxagent.service
 hugetlb.1GB.current      memory.swap.current
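Since the kubelet owns the pod's cgroup, one way to confirm which limit is actually in force is to read it from inside the container. A sketch, assuming a cgroup v2-only host (as on recent AKS node images):

```shell
# Verify the limit Kubernetes actually enforces, from inside the pod.
# On cgroup v2 the container's cgroup root exposes memory.max directly.
if [ -r /proc/self/cgroup ]; then
  cat /proc/self/cgroup              # "0::/..." on a cgroup v2-only system
fi
if [ -r /sys/fs/cgroup/memory.max ]; then
  cat /sys/fs/cgroup/memory.max      # e.g. 6442450944 for limits.memory=6Gi
fi
```

If that file already shows the 6 GiB limit, the pod-level cap is working as configured, and the OOM kill happens because the Node build inside the container simply needs more memory than the pod is allowed.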
