kaniko build using too much memory #909
Comments
/cc @priyawadhwa Can we provide users anything to measure the memory usage? @jyipks Can you tell us whether you have set resource limits in the kaniko pod spec?
I had no resource limits on the kaniko pods. This was on a 3-node cluster, 4 cores and 16GB each. From Grafana, I believe the pod attempted to use more than 15GB.
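For reference, a minimal sketch of what resource requests and limits on a kaniko build pod might look like. The pod name, build context, and destination below are illustrative placeholders (not taken from this thread), registry credentials are omitted, and the numbers would need tuning per build:

```yaml
# Illustrative kaniko build pod with explicit resource requests/limits.
# All names, the context, and the destination are placeholders; a real
# pod would also mount registry credentials under /kaniko/.docker.
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - "--context=git://github.com/example/repo.git"
        - "--dockerfile=Dockerfile"
        - "--destination=registry.example.com/example/app:latest"
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi   # the build is OOM-killed if it exceeds this
```

With a limit in place, actual usage can be watched with `kubectl top pod` (metrics-server) or the Grafana dashboards mentioned above.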
Does kaniko keep everything in memory as it's building the image, or does it write to a temp directory? If it uses a temp directory, can you please point to it? Thanks.
No, I've never used that flag before.
This also happens when trying to do an npm install. I have also never used that flag before.
Same problem.
Same problem on a GitLab runner: latest Debian with latest Docker. Building a 12MB Docker image uses 15GB to 35GB of memory.
We're facing the same issue in a GitLab CI custom runner.
Similar issue on GitLab CI on GKE. We're building a Python image based on the official Python base image, and it consumes about 12GB of RAM.
We're seeing similar issues with Gradle builds as well.
Would also like to learn more about this. Kaniko doesn't have a feature equivalent to …
We're seeing similar issues too. For example, this job failed with OOM: https://gitlab.com/gitlab-com/gl-infra/tamland/-/jobs/1405946307 The job includes some stacktrace information, which may help in diagnosing the problem. The parameters that we were using, including …
I'm having the same problem as well. In my case, it's a Java-based build and the Maven cache repo is included as an ignore-path. The changes that should occur outside of that are fairly minimal, yet I'm easily seeing 5+ GB of RAM being used where the build before that used at most 1.2GB. We'd love to be able to use smaller instances for our builds.
I rolled back to 1.3.0 from 1.6.0 and now it seems to work again.
This should be closed in the 1.7.0 release as of #1722.
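For readers landing here from a search: a hedged sketch of how that fix, the `--compressed-caching=false` flag (executor >= 1.7.0), can be passed in a GitLab CI job, since most reports above come from GitLab runners. The job name is arbitrary and the `$CI_*` variables are the standard GitLab CI ones:

```yaml
# Sketch of a GitLab CI job that disables compressed caching to reduce
# the executor's in-memory footprint while snapshotting layers.
build-image:
  image:
    name: gcr.io/kaniko-project/executor:v1.7.0-debug
    entrypoint: [""]
  script:
    - >
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
      --compressed-caching=false
```

Later comments in this thread also mention trying `--use-new-run` alongside it, with mixed results.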
1.7 has a gcloud credentials problem; rolling back to 1.3.0 worked.
Do you know when the tag …
If the k8s node where the MLFlow builder step is running doesn't have a lot of memory, the builder step will fail if it has to build larger images. For example, building the trainer image for the keras CIFAR10 codeset example resulted in an OOM failure on a node where only 8GB of memory were available.

This is a known kaniko issue [1] and there's a fix available [2] with more recent (>=1.7.0) kaniko versions: disabling the compressed caching via the `--compressed-caching` command line argument.

This commit models a workflow input parameter mapped to this new command line argument. To avoid OOM errors with bigger images, the user may set it in the workflow like so:

```
- name: builder
  image: ghcr.io/stefannica/mlflow-builder:latest
  inputs:
    - name: mlflow-codeset
      codeset:
        name: '{{ inputs.mlflow-codeset }}'
        path: /project
    - name: compressed_caching
      # Disable compressed caching to avoid running into OOM errors on cluster nodes with lower memory
      value: false
```

[1] GoogleContainerTools/kaniko#909
[2] GoogleContainerTools/kaniko#1722
I was also experiencing memory issues in the last part of the image build with …
I tried all kinds of combinations with …
So I guess you should put that into your toolbox while banging your head against the wall :)
Also got this issue when building with …
Reverted back to …
I am using 1.9.0 and it seems to eat quite a lot of memory. With or without --compressed-caching=false and --use-new-run, the same issue occurs sporadically: 7GB to build a simple image? The memory consumption is ridiculous. Why does the same build with standard Docker just work with 40x less memory requested?
I've got the same issue. In my case, I'm using a git context, and cloning it alone takes 10Gi+ and gets killed before initiating the build on the latest versions. I tried a node with more than 16Gi and it worked 1 out of 3 times.
Kaniko feels dead; I propose switching to podman.
We have the same problem; we get this in GitLab CI: …
FYI, this worked for me, for anyone else running into this.
I have the same problem in …
Same here. My build process takes around 1.5-1.8GB of memory, but when I run the Dockerfile via kaniko it needs 5GB, which is absurd! Is there any solution here?
I encourage you to use podman.
@aaron-prindle Any ideas?
I can confirm using …
Same here with kaniko version v1.23.0. Snapshotting itself works, but when sending results to the registry (GitLab), the executor gets killed.
Results on the registry are around 6GB; the extracted filesystem takes around 14GB. The last words are …
RSS immediately after this output is around 700MB and then increases quickly.
I am running into this issue as well. kaniko is unusable for GCP Cloud Build for my use case due to OOM. A 5-year-old issue for this is crazy. For context, I used E2_HIGHCPU_32, which has 32GB of memory, and STILL get OOM. Granted, my image is large (12GB), but I currently don't have control of the image size.
Although I don't reach an OOM, I do notice an increase in memory usage when building multiple images in the same pod. For context, I have a repo with 18 images that sometimes change at the same time. They are built with a for loop, one image after the other. The more I build, the more memory the pod uses. If I spawn out to a new pod, I don't have access to the cache anymore, so the previous jobs' content is gone and the build fails. Is there a way to clean up the pod after a build? And yes, I'm using the …
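A hedged suggestion for the loop described above: the executor has a `--cleanup` flag that wipes its extracted filesystem at the end of a build, which might stop leftovers from one image inflating the next build in the same pod; whether it helps with the memory growth seen here is untested. The directory layout, destination, and GitLab-style variables below are made-up placeholders:

```yaml
# Sketch: build several images one after another in a single job/pod,
# cleaning kaniko's working filesystem between builds via --cleanup.
build-all-images:
  image:
    name: gcr.io/kaniko-project/executor:v1.9.2-debug
    entrypoint: [""]
  script:
    - |
      for dir in images/*/ ; do
        /kaniko/executor \
          --context "$CI_PROJECT_DIR/$dir" \
          --dockerfile "$CI_PROJECT_DIR/${dir}Dockerfile" \
          --destination "$CI_REGISTRY_IMAGE/$(basename "$dir"):latest" \
          --cache=true \
          --compressed-caching=false \
          --cleanup
      done
```

Since `--cache=true` stores cached layers in a registry repo rather than on the pod's filesystem, the layer cache should survive `--cleanup` between iterations.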
We are running into the same problem. Actually, we are running into multiple problems: our builds take significantly more time, CPU, and RAM. Starting with kaniko-1.9.1-debug, we run into OOM-kill issues with our builds. Another problem for us is that the images take significantly longer, and we run into timeout issues with our GitLab runners. For now, kaniko-1.9.0-debug seems to work for us. But it feels bad relying on an old version to build production images.
For the record: there is no solution to this issue. OpenShift has image-building mechanisms that don't spend such resources. For Kubernetes, using kaniko, we need to accept the extra resource usage and the cluster cost increase.
Just to make this clear: my issue is that memory usage increased from 0.9GB to >10GB (OOM kill at 10), and the build time increased from less than 1 minute to up to 10 minutes. I understand that additional resources will be required. The question is whether the change from kaniko-1.9.0 to kaniko-1.9.1 is worth this increase, because it seems to be widespread and, to me, somewhat excessive.
@th-lange the memory issue is already ~2 years old. I would propose using buildah.
I know, but it baffles me. Our solution is to use kaniko 1.9.0, but we are discussing other options too. Edit: I am grateful for the work done by the team; I don't want to sound ungrateful! But the decisions baffle me, that's all.
buildah does not work in containers.
@ensc buildah does work in containers; here's a quick build example: `podman run --rm --volume ./:/root/build:ro --workdir /root/build -it quay.io/buildah/stable buildah build --arch amd64 --layers -f Dockerfile --storage-driver=vfs`
I am building a rather large Docker image; the end size is ~8GB. It builds fine in DinD; however, we would like to use kaniko. The kaniko pod running the Dockerfile balloons in memory usage and gets killed by Kubernetes. How can I make kaniko work for me, or am I stuck with DinD?
Please help, thank you