Add --zero-file-timestamps flag #2477
base: main
Conversation
This change adds a new flag to zero timestamps in layer tarballs without making a fully reproducible image.

My use case for this is maintaining a large image with build tooling. I have a multi-stage Dockerfile that generates an image containing several toolchains for cross-compilation, with each toolchain being prepared in a separate stage before being COPY'd into the final image. This is a very large image, and while it's incredibly convenient for development, making a change as simple as adding one new tool tends to invalidate caches and force the devs to download another 10+ GB image. If timestamps were removed from each layer, these images would be mostly unchanged with each minor update, greatly reducing disk space needed for keeping old versions around and time spent downloading updated images.

I wanted to use Kaniko's --reproducible flag to help with this, but ran into issues with memory consumption (GoogleContainerTools#862) and build time (GoogleContainerTools#1960). Additionally, I didn't really care about reproducibility - I mainly cared about the layers having identical contents so Docker could skip pulling and storing redundant layers from a registry.

This solution works around these problems by stripping out timestamps as the layer tarballs are built. It removes the need for a separate postprocessing step, and preserves image metadata so we can still see when the image itself was built.

An alternative solution would be to use mutate.Time much like Kaniko currently uses mutate.Canonical to implement --reproducible, but that would not be a satisfactory solution for me until issue 1168 (google/go-containerregistry#1168) is addressed by go-containerregistry. Given my lack of Go experience, I don't feel comfortable tackling that myself, and this seems like a simple and useful workaround in the meantime.
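For readers following along, here is a minimal sketch of the technique the flag relies on: zeroing tar header timestamps as a layer tarball is written, using only Go's standard library. This is an illustration rather than the PR's actual diff, and zeroTimestamps is a hypothetical helper name.

```go
package reprotar

import (
	"archive/tar"
	"io"
	"time"
)

// zeroTimestamps copies a tar stream, pinning every header's mtime to
// the Unix epoch and dropping atime/ctime, so that identical file
// contents always produce byte-identical tarballs regardless of when
// the build ran.
func zeroTimestamps(w io.Writer, r io.Reader) error {
	tr := tar.NewReader(r)
	tw := tar.NewWriter(w)

	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		hdr.ModTime = time.Unix(0, 0) // fixed epoch mtime
		hdr.AccessTime = time.Time{}  // unset; not all tar formats carry atime
		hdr.ChangeTime = time.Time{}  // unset; not all tar formats carry ctime
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := io.Copy(tw, tr); err != nil {
			return err
		}
	}
	return tw.Close()
}
```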
This allows the behavior of --zero-file-timestamps to better match the behavior of --reproducible. Layers generated using both methods should now have the same digest.
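For comparison, the mutate.Time alternative mentioned in the description would operate on the finished image as a separate pass. A rough sketch of assumed usage with go-containerregistry follows; the image references are placeholders.

```go
package main

import (
	"log"
	"time"

	"github.com/google/go-containerregistry/pkg/crane"
	"github.com/google/go-containerregistry/pkg/v1/mutate"
)

func main() {
	// Placeholder reference; substitute a real image.
	img, err := crane.Pull("registry.example.com/app:latest")
	if err != nil {
		log.Fatal(err)
	}
	// mutate.Time rewrites every layer with a fixed timestamp. This is
	// the kind of postprocessing step that --zero-file-timestamps avoids
	// by stripping timestamps while the tarballs are first written.
	fixed, err := mutate.Time(img, time.Unix(0, 0))
	if err != nil {
		log.Fatal(err)
	}
	if err := crane.Push(fixed, "registry.example.com/app:normalized"); err != nil {
		log.Fatal(err)
	}
}
```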
Dear maintainers, @aaron-prindle @chuangw6: It's been a few months and I still would really like this feature; is there anything I can do to help the chances of this getting merged? This change solves an immediate problem that I (and several others) have, and it seems like a useful feature to me in general (I think that having reproducible layer contents but preserving the image metadata itself would be valuable).
Isn't this exactly what "reproducible" means? But I, too, would prefer this implementation over the existing one.
@zx96, has this been tested? I'm attempting to use your branch and am still getting cache misses in use-cases where --reproducible works.
@derari In my head, "reproducible" generally means you get the exact same thing out and can verify it with the hashes; this flag doesn't give you the exact same image, just identical layer contents (which is good enough for Docker to avoid storing the same thing multiple times). @jkalez I did test it back before I initially opened the PR, but my testing was mostly just ensuring I got the same layer hashes out reliably and that they matched the layer hashes I got with --reproducible.
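One way to spot-check the "same layer hashes" claim without a full CI run is to pull both images and compare per-layer digests. Here is a small sketch using go-containerregistry, again with placeholder image references.

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

// printLayerDigests pulls an image and prints the digest of each layer,
// so two builds can be diffed by eye or with a script.
func printLayerDigests(ref string) {
	img, err := crane.Pull(ref)
	if err != nil {
		log.Fatal(err)
	}
	layers, err := img.Layers()
	if err != nil {
		log.Fatal(err)
	}
	for _, layer := range layers {
		digest, err := layer.Digest()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(ref, digest)
	}
}

func main() {
	printLayerDigests("registry.example.com/app:zero-file-timestamps")
	printLayerDigests("registry.example.com/app:reproducible")
}
```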
I'm using the commit from your branch. My kaniko arguments look as follows:

```sh
/kaniko/executor
  --cache=true
  --cache-copy-layers=true
  --cache-run-layers=true
  --cache-ttl=24h
  --cache-repo="$CI_REGISTRY_IMAGE/container-cache"
  --compressed-caching=false
  --zero-file-timestamps
  --snapshot-mode=redo
  --context="$CI_PROJECT_DIR/images/$IMAGE_NAME"
  --build-arg BASE_IMAGE
  --destination "$CI_REGISTRY_IMAGE/$IMAGE_NAME:$CI_COMMIT_REF_SLUG"
  --registry-certificate "$CI_REGISTRY=$CI_SERVER_TLS_CA_FILE"
```

All the environment variables are appropriately populated. For this test, I'm just using a local container registry like this: https://docs.docker.com/registry/deploying/

The context directory "$CI_PROJECT_DIR/images/$IMAGE_NAME" contains a Dockerfile like this:

```dockerfile
ARG BASE_IMAGE
FROM $BASE_IMAGE
RUN apt update
RUN apt install -y xvfb libxcb1-dev
COPY script.sh .
ENTRYPOINT ["./script.sh"]
```

If I run kaniko like this twice with a cold cache, I expect the first run to push layers to the cache and the second run to pull them from it.
However, the 2nd run never actually pulls anything from the cache; the layer hashes in the 2nd run's logs do not match the hashes the 1st run pushed. Note that if I run the exact same test with --reproducible instead, the hashes from the first and second runs match and the cache is used. Let me know if there's anything else I can provide to you to help you repro this!
@zx96 any luck reproducing what I saw in your branch?
@jkalez I've been busy and haven't had a chance to look into it yet... 😅 I might get a chance to look at it this weekend; I'll update here if I figure something out.
@zx96 could this perhaps be merged with a note that setting this flag can have some issues with layer caching? That would at least enable the feature, which likely provides a larger benefit to speed than the cache does in the first place.
The reproducible-builds community seems to be settling on the environment variable SOURCE_DATE_EPOCH as something that can be set to a value that is then used for all date-specific operations, so that the dates stay the same across identical builds. BuildKit uses it. Could Kaniko use this convention?
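If Kaniko adopted the convention, the lookup itself would be small. Here is a sketch of resolving the timestamp to stamp into layers; buildTimestamp is a hypothetical helper, not existing Kaniko code.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"time"
)

// buildTimestamp returns the time to write into layer tar headers:
// SOURCE_DATE_EPOCH (seconds since the Unix epoch) if set, otherwise
// the epoch itself, matching what zeroed timestamps would give today.
func buildTimestamp() (time.Time, error) {
	v, ok := os.LookupEnv("SOURCE_DATE_EPOCH")
	if !ok {
		return time.Unix(0, 0), nil
	}
	secs, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid SOURCE_DATE_EPOCH %q: %w", v, err)
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	t, err := buildTimestamp()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("stamping layers with", t)
}
```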
Hey, I'd still very much like this feature to be in kaniko. We have builds taking 3 seconds to initialize, 5 seconds to pull the image, 10 seconds to unpack the rootfs, 2 seconds to run the Dockerfile commands, and then 38 (!) seconds before it starts to push the image, which takes about 10 seconds again. This means a much faster implementation of --reproducible would halve our build time. It seems @zx96 has abandoned this PR, perhaps someone can take over?