Generated image is missing files generated via RUN #3123
Comments
FYI, I looked at other tickets with a similar problem (e.g., #2336), but either the root cause described in those tickets is different or the proposed workaround did not work for me. I have tried many different workarounds, and none worked for me (aside from touching every file in the filesystem, but this is not an option for me). |
Another observation: if I change this to a multi-stage build AND I do more than just use
Build command (note that I'm not pushing to a remote registry just to save round-trip time; pushing to a remote registry has the same outcome):
Test command:
In some cases we get this result (installation worked):
In other cases we get this result (installed files were not committed to the snapshot/image):
As said, it's random, with roughly a 50% chance of either outcome. Even more weirdly, it seems to alternate between working and failing, as if a cache were corrupting and then un-corrupting itself (note that in these experiments the cache is off). I have captured the stdout (build command output) + stderr (kaniko debug-verbosity logs) from a successful and a failing build. |
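A sketch of the kind of build/test loop described above, using plain docker (paths, image names, and the python3.11 check are assumptions rather than the reporter's exact commands):

```bash
# Build with kaniko without pushing to a remote registry, writing the image to a local tarball.
docker run --rm -v "$PWD":/workspace gcr.io/kaniko-project/executor:latest \
  --dockerfile /workspace/Dockerfile \
  --context dir:///workspace/ \
  --no-push \
  --tar-path /workspace/image.tar

# Load the tarball and check whether a file installed in a RUN step actually exists.
# (unset-repo/unset-image-name:latest is the name kaniko uses when no --destination is set.)
docker load -i image.tar
docker run --rm unset-repo/unset-image-name:latest python3.11 --version
```

On a "good" build the version prints; on a "bad" build the binary is missing and the last command fails.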
Hi @clemenskol, did you find any workaround? |
unfortunately no. We had to move away from kaniko - it was the only "solution" that worked |
Same issue here, and I'm pretty sure that we are not the only ones having it... @anoop142, is anybody able to reproduce on your side? |
For me the basic case that fails is:

grep: /home/foo: No such file or directory

While this works:

Seems like kaniko is skipping layers when a heredoc (EOF) is used for RUN. |
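For reference, a minimal heredoc case of this kind might look like the following (a sketch; the paths and file contents are assumptions, not the commenter's exact Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu

# Failing case as described: files created inside a heredoc (EOF) RUN
# are reportedly missing in a later RUN when built with kaniko.
RUN <<EOF
mkdir -p /home/foo
echo hello > /home/foo/bar
EOF
RUN grep -r hello /home/foo   # under kaniko: grep: /home/foo: No such file or directory

# Working variant: the same commands chained in a single plain RUN.
# RUN mkdir -p /home/foo && echo hello > /home/foo/bar
# RUN grep -r hello /home/foo
```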
Ok. At least that seems not to be the case for @clemenskol. As far as I'm concerned, I do not use EOF either, but in case it could be a problem, I'm running this inside a GitLab CI job. Best, |
@jrevillard you are right, the skipped EOF command is indeed a different issue, #1713. |
I seem to be encountering the same problem, only in my case one out of dozens of images is broken. A couple of files from the base image are not available in the final image. It looks as if the last layer is not properly snapshotted (size 2 MB instead of 150 MB; worth mentioning it is also a RUN layer). All images use the same base image and are built on different machines.
Upgraded to the newest kaniko 1.23.2-debug and will observe the results. I can't share my Dockerfile and base image, but it's not multi-stage. This is very difficult to debug, as it happens quite rarely. |
Unfortunately, files are still going missing with the latest version of kaniko. I will try to add an image test as the next stage. |
The biggest issue I see is the lost trust in kaniko. If there is no guarantee that the filesystem is identical to the one produced by buildx or buildah (at least semantically), I simply can't use kaniko. In production, it is almost impossible to check whether all needed files are there or not. |
I tried to use the |
I'm trying to reproduce this issue; does anybody have a very simple reproduction setup? The fewer files are written, the easier it will be to diagnose the root cause. The reported one, which installs Python, changes/adds so many files that it is harder to diagnose. |
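Along those lines, a reproduction that writes as few files as possible might look something like this (a sketch of the kind of setup being asked for, not a confirmed failing case):

```dockerfile
# Smallest write-in-RUN case: a single file created by RUN.
FROM alpine
RUN echo "canary" > /canary.txt
# After building and loading the image, check whether the file survived:
#   docker run --rm <built-image> cat /canary.txt
```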
I have changed the pipeline so that I now build the image to ‘candidate’; the next stage (GitLab stage) opens this image and the |
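As an illustration of such a check, the verification stage might run something roughly like this (the registry variable and the checked path are assumptions, not what this particular pipeline actually does):

```bash
# Pull the freshly built 'candidate' image and fail the pipeline if an
# expected file is missing, before the image gets promoted any further.
docker pull "$CI_REGISTRY_IMAGE:candidate"
docker run --rm "$CI_REGISTRY_IMAGE:candidate" ls -l /usr/bin/python3.11 \
  || { echo "candidate image is missing expected files" >&2; exit 1; }
```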
Since I cannot reproduce this issue, I am providing the tooling I use to try to reproduce it. It might help you, but you will probably need to adapt it for your system. I am using NetBSD mtree to get a "snapshot" of the built rootfs from "inside" the Kaniko build. This is the Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1
FROM amd64/ubuntu as test
RUN \
  apt update && \
  apt install \
    --no-install-recommends \
    --assume-yes \
    python3.12 \
    mtree-netbsd && \
  ls -l `which python3.12` && \
  python3.12 --version && \
  md5sum `which python3.12` > python3.12.md5
COPY ./mtree-excludes /tmp/
RUN \
  mtree \
    -c \
    -x \
    -K md5 > rootfs.built.mtree
```

This is a script to build the image using Kaniko and then instantiate a container using the built image. It then tries to find out if some files have "disappeared":

```bash
#!/usr/bin/env bash
set -eu
TOOL="finch"
IMG="reproduce-kaniko-3123.tar"
CONT_IMG="/workspace/${IMG}"
LOCAL_IMG="${IMG}"
echo ; echo "*********************"
echo "Building the image..." ; echo
"${TOOL}" run \
-v $PWD:/workspace \
gcr.io/kaniko-project/executor:latest \
--dockerfile /workspace/Dockerfile \
--no-push \
--context dir:///workspace/ \
--tar-path "${CONT_IMG}"
echo ; echo "********************"
echo "Loading the image..." ; echo
"${TOOL}" load \
-i "${LOCAL_IMG}"
echo ; echo "***********************"
echo "Comparing the rootfs..." ; echo
echo "> python3.12 binary checksum as reported by md5sum from Kaniko"
"${TOOL}" run \
--rm \
unset-repo/unset-image-name:latest \
cat /python3.12.md5
"${TOOL}" run \
--rm \
unset-repo/unset-image-name:latest \
cat /rootfs.built.mtree > rootfs.built.mtree
echo ; echo "> python3.12 binary checksum as reported by mtree from Kaniko"
grep -A 1 "^ python3.12 " rootfs.built.mtree | head -n 2
"${TOOL}" run \
--rm \
unset-repo/unset-image-name:latest \
mtree -f /rootfs.built.mtree > rootfs-changes.mtree \
|| true
echo ; echo "****************"
echo "Missing files..." ; echo
grep "^missing: " rootfs-changes.mtree It runs Let's see if with this help we can get someone to provide some more insights on what is going on... |
Assuming we ever get to properly diagnose this issue, find the root cause, and even write a patch to fix it... will we ever see that fix getting integrated into Kaniko? |
Actual behavior

Files generated via a RUN command should be included in the final image (e.g., regardless of file generation timestamp). This seems not to be the case. I have generated a minimal Dockerfile to demonstrate this:

When building the image, the python3.11 command is not properly installed in the generated image, although it's clearly present while building.

My build command:
The output of the final 2 commands can be seen in the build output:
When the generated image is then run, the file is not found (python3.11 simply does not exist).

To test if this has to do with file timestamps, I have done the following modification:

In this case, the python3.11 binary is in the generated image, but since it's not just the binary itself that is missing (but essentially most files installed via apt), the image is completely non-functional:

Note that I have tried various alternatives using or not using --cache and using different --snapshot-mode settings.
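A sketch of what the minimal Dockerfile described above plausibly looked like (base image and apt invocation are assumptions, not the reporter's exact file):

```dockerfile
FROM debian:bookworm
# Install python3.11 and inspect it from within the build; the output of the
# final two commands is what shows up in the build log mentioned above.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.11 && \
    ls -l "$(which python3.11)" && \
    python3.11 --version
```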
Expected behavior
All files are stored in the generated image.
If I build the image using the Dockerfile above via docker buildx build, the image works as expected:
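For comparison, a BuildKit build of the same Dockerfile might look like this (a sketch; the tag name is arbitrary):

```bash
# Build with BuildKit and load the result into the local image store.
docker buildx build --tag repro-3123:buildx --load .
# Here the binary installed in the RUN step is present, as described above.
docker run --rm repro-3123:buildx python3.11 --version
```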
To Reproduce

Steps to reproduce the behavior:
Build the Dockerfile above

Additional Information
--cache flag