
Build with kaniko is randomly crashing after "Taking snapshot of full filesystem" #2275

Open
slamer59 opened this issue Oct 3, 2022 · 4 comments
Labels
area/multi-stage builds, area/performance, categorized, differs-from-docker, enhancement, issue/build-fails, issue/crash, issue/hang, issue/oom, kind/bug, possible-dupe, priority/p1, works-with-docker

Comments


slamer59 commented Oct 3, 2022

Actual behavior
The kaniko build silently crashes after taking the full filesystem snapshot with no useful error. Works fine with dind. Disabling the Kaniko cache doesn't help.

Might be related to

Expected behavior

The build should complete. Instead, running the GitLab CI job with kaniko leads to this error:

INFO[0195] Taking snapshot of full filesystem...        
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: pod "runner-75bjfbsg-project-35867263-concurrent-1nv49n" status is "Failed"

I don't know how to keep the failed Job around in k8s, so I cannot see what happened (I will look into how to keep failing pods).
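A minimal sketch for inspecting the failed pod with kubectl, assuming you can reach the runner's namespace before the runner deletes the pod (the gitlab-runner namespace and the build container name are assumptions; adjust to your setup):

# List recent runner pods (namespace is an assumption)
kubectl get pods -n gitlab-runner --sort-by=.metadata.creationTimestamp
# An OOM kill shows up under "Last State: Terminated, Reason: OOMKilled"
kubectl describe pod runner-75bjfbsg-project-35867263-concurrent-1nv49n -n gitlab-runner
# Grab the job container's logs before the pod disappears
kubectl logs runner-75bjfbsg-project-35867263-concurrent-1nv49n -c build -n gitlab-runner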

To Reproduce
Steps to reproduce the behavior:

  1. Run this CI job in GitLab:



docker-build:
  stage: docker-build
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $TRIGGER_JOB == null
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  parallel:
      matrix: 
        - ENVIRONMENT: ["stage", "production"]
  environment:
    name: $ENVIRONMENT
  script:
    - mkdir -p /kaniko/.docker
    - cat "${DOCKER_AUTH_CONFIG}" > /kaniko/.docker/config.json
    - >-
      /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --build-arg BUILDKIT_INLINE_CACHE=1 
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:latest"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA-$CI_PIPELINE_ID"
      --cache

The job leads to a failure.

I can run the same configuration again and it sometimes works. Here is an example on docker-build [production]:
[screenshot of the pipeline retries]

Additional Information

  • Dockerfile
    NB: I removed ENV/ARG lines for readability.
# Install dependencies only when needed
FROM node:16-alpine AS deps
# Check https://github.com/nodejs/docker-node/tree/b4117f9333da4138b03a546ec926ef50a31506c3#nodealpine to understand why libc6-compat might be needed.
RUN apk add --no-cache libc6-compat chromium
WORKDIR /app
# COPY package.json yarn.lock ./
# RUN yarn install --frozen-lockfile

# If using npm with a `package-lock.json` comment out above and use below instead
COPY package.json package-lock.json ./ 

RUN npm ci --legacy-peer-deps

# Rebuild the source code only when needed
FROM node:16-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .

RUN npm run build

# Production image, copy all the files and run next
FROM node:16-alpine AS runner
WORKDIR /app

ENV NODE_ENV production
# Uncomment the following line in case you want to disable telemetry during runtime.
# ENV NEXT_TELEMETRY_DISABLED 1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

# You only need to copy next.config.js if you are NOT using the default configuration
COPY --from=builder /app/next.config.js ./
COPY --chown=nextjs:nodejs --from=builder /app/public ./public
COPY --from=builder /app/package.json ./package.json

# Automatically leverage output traces to reduce image size 
# https://nextjs.org/docs/advanced-features/output-file-tracing
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 3000

ENV PORT 3000

CMD ["node", "server.js"]

  • Build Context
    Please provide or clearly describe any files needed to build the Dockerfile (ADD/COPY commands)
  • Kaniko Image (fully qualified with digest)

Triage Notes for the Maintainers

Description | Yes/No
Please check if this is a new feature you are proposing |
Please check if the build works in docker but not in kaniko |
Please check if this error is seen when you use the --cache flag |
Please check if your dockerfile is a multistage dockerfile |

Wells-Li commented Nov 8, 2022

Maybe the memory limit is the main problem. I got this before too.
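If memory is the limit being hit, one option on the Kubernetes executor is to raise the build pod's resources per job. A minimal sketch, assuming the runner administrator has enabled resource overwrites in the runner's config.toml (the 2Gi/4Gi values are illustrative):

docker-build:
  variables:
    # Only honored if memory_request_overwrite_max_allowed /
    # memory_limit_overwrite_max_allowed are set on the runner
    KUBERNETES_MEMORY_REQUEST: "2Gi"
    KUBERNETES_MEMORY_LIMIT: "4Gi"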


Verhaeg commented Dec 7, 2022

I'm having the same issue; the pipeline is constantly failing. I'm not sure, but I think the snapshot is taken in memory, which doesn't seem like the best idea on Kubernetes, where memory is a limited resource, especially in this context.

In my case, I need to install some packages that amount to ~1.1GB on top of the already existing data (Alpine-based image). But considering several node applications, 1GB unfortunately is not a lot for a final image.

Running on GKE with Autopilot and the GitLab Helm chart; memory is capped at 2Gi right now, which should be plenty for everything except this pipeline. Is it possible to disable the snapshot, or to make it use disk instead?

This seems related to: #909
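On the question of disabling the snapshot: a few kaniko flags documented in the project README reduce snapshotting work and memory pressure. A sketch of the reporter's invocation with them added (treat this as something to try, not a confirmed fix):

/kaniko/executor
  --context "${CI_PROJECT_DIR}"
  --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
  --destination "$CI_REGISTRY_IMAGE:latest"
  --cache
  --snapshot-mode=redo   # cheaper change detection than the default "full"
  --single-snapshot      # take a single snapshot at the end of the build
  --use-new-run          # experimental run detection that avoids full filesystem snapshots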


gaatjeniksaan commented Mar 23, 2023

We're having this issue as well with 1.9.1-debug. The final image should be ~9GB, but the kaniko build (on GKE) fails due to the memory limit. See the attached image to share in my agony.
[screenshot of the failing build]


pimperator commented Oct 4, 2023

I am having the same issues repeatedly (running GitLab CI pipelines on EKS, with memory limits in place). The thing is, no matter how much memory I give the job, it uses everything it gets.

Here are some screenshots on the same job with different reservations/limits:

[three screenshots of the same job with different reservations/limits]

At least the last one did not fail, but the other two failed while taking snapshots. The resulting container size is approximately 280MB.

After I ran across some further issues, I noticed this flag: https://github.com/GoogleContainerTools/kaniko#flag---compressed-caching
It needs to be set to 'false' on OOM errors. Setting that flag on my side has resulted in no OOM termination (yet), but memory usage still scratches the maximum allocatable.
[screenshot of memory usage after setting the flag]
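Applied to the job from the original report, that suggestion would look roughly like this (a sketch; only the --compressed-caching flag is new relative to the reporter's script):

  script:
    - >-
      /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "$CI_REGISTRY_IMAGE:latest"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA-$CI_PIPELINE_ID"
      --cache
      --compressed-caching=false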
