kaniko build using too much memory #909
Comments
/cc @priyawadhwa Can we provide users anything to measure the memory usage? @jyipks Can you tell us whether you have set resource limits in the kaniko pod spec?
I had no resource limits on the kaniko pods. This was on a 3-node cluster, 4 cores and 16GB each. From Grafana, I believe the pod attempted to use more than 15GB.
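For reference, a minimal sketch of what resource requests and limits on a kaniko build pod might look like. The pod name, build context, and destination below are illustrative placeholders (not taken from this thread), registry credentials are omitted, and the numbers would need tuning per build:

```yaml
# Illustrative kaniko build pod with explicit resource requests/limits.
# All names, the context, and the destination are placeholders; a real
# pod would also mount registry credentials under /kaniko/.docker.
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - "--context=git://github.com/example/repo.git"
        - "--dockerfile=Dockerfile"
        - "--destination=registry.example.com/example/app:latest"
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi   # the build is OOM-killed if it exceeds this
```

With a limit in place, actual usage can be watched with `kubectl top pod` (metrics-server) or the Grafana dashboards mentioned above.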
Does kaniko keep everything in memory as it's building the image, or does it write to a temp directory? If it uses a temp directory, can you please point to it? Thanks.
No, I've never used that flag before.
This also happens when trying to do an npm install. I have also never used that flag before.
Same problem.
Same problem on a GitLab runner: latest Debian with latest Docker. Building a 12MB Docker image uses 15GB to 35GB of memory.
We're facing the same issue in a GitLab CI custom runner.
Similar issue on GitLab CI on GKE. We're building a Python image based on the official Python base image, and it consumes about 12GB of RAM.
We're seeing similar issues with Gradle builds as well.
Would also like to learn more about this. Kaniko doesn't have a feature equivalent to …
We're seeing similar issues too. For example, this job failed with OOM: https://gitlab.com/gitlab-com/gl-infra/tamland/-/jobs/1405946307 The job includes some stacktrace information, which may help in diagnosing the problem. The parameters that we were using, including …
I'm having the same problem as well. In my case, it's a Java-based build and the Maven cache repo is included as an ignore-path. The changes that should occur outside of that are fairly minimal, yet I'm easily seeing 5+ GB of RAM being used where the build before that used at most 1.2GB. We'd love to be able to use smaller instances for our builds.
I rolled back to 1.3.0 from 1.6.0 and now it seems to work again.
This should be closed in the 1.7.0 release as of #1722.
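For readers landing here from a search: a hedged sketch of how that fix, the `--compressed-caching=false` flag (executor >= 1.7.0), can be passed in a GitLab CI job, since most reports above come from GitLab runners. The job name is arbitrary and the `$CI_*` variables are the standard GitLab CI ones:

```yaml
# Sketch of a GitLab CI job that disables compressed caching to reduce
# the executor's in-memory footprint while snapshotting layers.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:v1.7.0-debug
    entrypoint: [""]
  script:
    - >
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
      --compressed-caching=false
```

Later comments in this thread also mention trying `--use-new-run` alongside it, with mixed results.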
1.7 has a gcloud credentials problem; rolling back to 1.3.0 worked.
Do you know when the tag …
If the k8s node where the MLFlow builder step is running doesn't have a lot of memory, the builder step will fail if it has to build larger images. For example, building the trainer image for the keras CIFAR10 codeset example resulted in an OOM failure on a node where only 8GB of memory were available.

This is a known kaniko issue [1] and there's a fix available [2] with more recent (>=1.7.0) kaniko versions: disabling the compressed caching via the `--compressed-caching` command line argument.

This commit models a workflow input parameter mapped to this new command line argument. To avoid OOM errors with bigger images, the user may set it in the workflow like so:

```
- name: builder
  image: ghcr.io/stefannica/mlflow-builder:latest
  inputs:
    - name: mlflow-codeset
      codeset:
        name: '{{ inputs.mlflow-codeset }}'
        path: /project
    - name: compressed_caching
      # Disable compressed caching to avoid running into OOM errors on cluster nodes with lower memory
      value: false
```

[1] GoogleContainerTools/kaniko#909
[2] GoogleContainerTools/kaniko#1722
I was also experiencing memory issues in the last part of the image build with …
I tried all kinds of combinations with …
So I guess you should put that into your toolbox while banging your head against the wall :)
Also got this issue when building with …
Reverted back to …
I am using 1.9.0 and it seems to eat quite a lot of memory. With or without --compressed-caching=false and --use-new-run, the same issue occurs sporadically: 7GB to build a simple image? The memory consumption is ridiculous. Why does the same build with standard Docker just work with 40x less memory requested?
I've got the same issue. In my case, I'm using a git context, and cloning it alone takes 10Gi+ and gets killed before initiating the build on the latest versions. I tried a node with more than 16Gi and it worked 1 out of 3 times.
Kaniko feels dead; I propose switching to podman.
We have the same problem; we get this in GitLab CI: …
FYI, this worked for me, for anyone else running into this.
I have the same problem in …
Same here. My build process takes around 1.5-1.8GB of memory, but when I run the Dockerfile via kaniko it needs 5GB, which is absurd! Is there any solution here?
I encourage you to use podman.
@aaron-prindle Any ideas?
I can confirm using …
Same here with kaniko version v1.23.0. Snapshotting itself works, but when sending results to the registry (GitLab), the executor gets killed.
Results on the registry are around 6GB; the extracted filesystem takes around 14GB. The last words are …
RSS immediately after this output is around 700MB and then increases quickly.
I am running into this issue as well. kaniko is unusable for GCP Cloud Build for my use case due to OOM. A 5-year-old issue for this is crazy. For context, I used E2_HIGHCPU_32, which has 32GB of memory, and STILL get OOM. Granted, my image is large (12GB), but I currently don't have control of the image size.
Although I don't reach an OOM, I do notice an increase in memory usage when building multiple images in the same pod. For context, I have a repo with 18 images that sometimes change at the same time. They are built with a for loop, one image after the other. The more I build, the more memory the pod uses. If I spawn out to a new pod, I don't have access to the cache anymore, so the previous jobs' content is gone and the build fails. Is there a way to clean up the pod after a build? And yes, I'm using the …
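A hedged suggestion for the loop described above: the executor has a `--cleanup` flag that wipes its extracted filesystem at the end of a build, which might stop leftovers from one image inflating the next build in the same pod; whether it helps with the memory growth seen here is untested. The directory layout, destination, and GitLab-style variables below are made-up placeholders:

```yaml
# Sketch: build several images one after another in a single job/pod,
# cleaning kaniko's working filesystem between builds via --cleanup.
build-all-images:
  image:
    name: gcr.io/kaniko-project/executor:v1.9.2-debug
    entrypoint: [""]
  script:
    - |
      for dir in images/*/ ; do
        /kaniko/executor \
          --context "$CI_PROJECT_DIR/$dir" \
          --dockerfile "$CI_PROJECT_DIR/${dir}Dockerfile" \
          --destination "$CI_REGISTRY_IMAGE/$(basename "$dir"):latest" \
          --cache=true \
          --compressed-caching=false \
          --cleanup
      done
```

Since `--cache=true` stores cached layers in a registry repo rather than on the pod's filesystem, the layer cache should survive `--cleanup` between iterations.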
We are running into the same problem. Actually, we are running into multiple problems: our builds take significantly more time, CPU, and RAM. Starting with kaniko-1.9.1-debug, we run into OOM-kill issues with our builds. Another problem for us is that the images take significantly longer, and we run into timeout issues with our GitLab runners. For now, kaniko-1.9.0-debug seems to work for us. But it feels bad relying on an old version to build production images.
For the record: there is no solution to this issue. OpenShift has image-building mechanisms that don't spend such resources. For Kubernetes, using kaniko, we need to accept the extra resource usage and the cluster cost increase.
Just to make this clear: my issue is that memory usage increased from 0.9GB to >10GB (OOM kill at 10), and the build time increased from less than 1 minute to up to 10 minutes. I understand that additional resources will be required. The question is whether the change from kaniko-1.9.0 to kaniko-1.9.1 is worth this increase, because it seems to be widespread and, to me, somewhat excessive.
@th-lange the memory issue is already ~2 years old. I would propose using buildah.
I know, but it baffles me. Our solution is to use kaniko 1.9.0, but we are discussing other options too. Edit: I am grateful for the work done by the team; I don't want to sound ungrateful! But the decisions baffle me, that's all.
buildah does not work in containers.
@ensc buildah does work in containers; here's a quick build example: `podman run --rm --volume ./:/root/build:ro --workdir /root/build -it quay.io/buildah/stable buildah build --arch amd64 --layers -f Dockerfile --storage-driver=vfs`
I am building a rather large Docker image; the end size is ~8GB. It builds fine in DinD; however, we would like to use kaniko. The kaniko pod running the Dockerfile balloons in memory usage and gets killed by Kubernetes. How can I make kaniko work for me, or am I stuck with DinD?
Please help, thank you