feat: multi-arch #58

Merged: 2 commits, Oct 9, 2024

Contributor

@darkweaver87 darkweaver87 commented Mar 25, 2024

Change Summary

This PR uses Docker's multi-stage build feature and a multi-arch build.
It also updates to Python 3.11 (the version shipped in Debian) and uses Chromium instead of Chrome, which makes multi-arch support easier.
Fixes #58.
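
For illustration, a multi-arch build could be invoked roughly as sketched below. This is a minimal sketch, not the project's actual release command: the image reference is a hypothetical placeholder, the platform list is an assumption, and the Dockerfile path is the one used in the build commands later in this thread. The function only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Print (not run) a multi-platform buildx command so it can be reviewed first.
# The image reference passed in is a hypothetical placeholder.
# The default platform list (amd64 + arm64) is an assumption.
multiarch_build_cmd() {
    image_ref="$1"
    platforms="${2:-linux/amd64,linux/arm64}"
    printf 'docker buildx build --platform=%s -t %s -f scraper/dev/docker/Dockerfile --push .\n' \
        "$platforms" "$image_ref"
}

multiarch_build_cmd "registry.example.com/docsearch-scraper:dev"
```

The sketch uses `--push` rather than `--load` because, depending on the Docker setup, a multi-platform result may not be loadable into the local image store.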

PR Checklist

Member

@jasonbosco jasonbosco left a comment

Could you still keep the base image separate from the image that places the latest code inside the base image?

The base image takes a while to build and doesn't need to be rebuilt each time the scraper code changes, which is why these are two separate images.

@darkweaver87
Contributor Author

Sorry for the late answer, @jasonbosco. I can do it, but I don't actually get your point :-)
The Dockerfile is a multi-stage one, so each stage can be built separately if needed, and a change in the code won't trigger a full rebuild of the base image. You can even specify which target you want to build. Example, building the test image:

$ docker buildx build -t typesense-docsearch-scraper:latest --platform=linux/amd64 --load . -f scraper/dev/docker/Dockerfile --target test
[+] Building 215.9s (31/31) FINISHED                                                                                                                                                 docker-container:nifty_pascal
 => [internal] booting buildkit                                                                                                                                                                              13.5s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                                                                                           12.4s
 => => creating container buildx_buildkit_nifty_pascal0                                                                                                                                                       1.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 1.79kB                                                                                                                                                                        0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                                                                   1.2s
 => [auth] docker/dockerfile:pull token for registry-1.docker.io                                                                                                                                              0.0s
 => docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                    1.2s
 => => resolve docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                        0.0s
 => => sha256:1328b32c40fca9bcf9d70d8eccb72eb873d1124d72dadce04db8badbe7b08546 9.94MB / 9.94MB                                                                                                                1.0s
 => => extracting sha256:1328b32c40fca9bcf9d70d8eccb72eb873d1124d72dadce04db8badbe7b08546                                                                                                                     0.1s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 267B                                                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:12-slim                                                                                                                                             1.4s
 => [auth] library/debian:pull token for registry-1.docker.io                                                                                                                                                 0.0s
 => [internal] load build context                                                                                                                                                                             0.1s
 => => transferring context: 1.86MB                                                                                                                                                                           0.0s
 => [base  1/17] FROM docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                3.3s
 => => resolve docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                       0.0s
 => => sha256:f11c1adaa26e078479ccdd45312ea3b88476441b91be0ec898a7e07bfd05badc 29.13MB / 29.13MB                                                                                                              2.7s
 => => extracting sha256:f11c1adaa26e078479ccdd45312ea3b88476441b91be0ec898a7e07bfd05badc                                                                                                                     0.5s
 => [base  2/17] RUN useradd -d /home/seleuser -m seleuser                                                                                                                                                    0.2s
 => [base  3/17] RUN chown -R seleuser /home/seleuser                                                                                                                                                         0.1s
 => [base  4/17] RUN chgrp -R seleuser /home/seleuser                                                                                                                                                         0.1s
 => [base  5/17] WORKDIR /home/seleuser                                                                                                                                                                       0.0s
 => [base  6/17] RUN apt-get update -y && apt-get install -yq     software-properties-common    python3                                                                                                      23.2s
 => [base  7/17] RUN apt-get update -y && apt-get install -yq     curl     wget     sudo     gnupg     && curl -sL https://deb.nodesource.com/setup_18.x | sudo bash -                                        8.1s 
 => [base  8/17] RUN apt-get update -y && apt-get install -y     nodejs                                                                                                                                       5.8s 
 => [base  9/17] RUN apt-get update -y && apt-get install -yq   unzip   xvfb   libxi6   libgconf-2-4   default-jdk                                                                                           38.3s 
 => [base 10/17] RUN apt-get update -y && apt-get install -yq   chromium-driver                                                                                                                              29.7s 
 => [base 11/17] RUN wget -q https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.4.0/selenium-server-4.4.0.jar                                                                                3.0s 
 => [base 12/17] RUN wget -q https://repo1.maven.org/maven2/org/testng/testng/7.6.1/testng-7.6.1.jar                                                                                                          0.4s 
 => [base 13/17] COPY Pipfile .                                                                                                                                                                               0.1s 
 => [base 14/17] COPY Pipfile.lock .                                                                                                                                                                          0.1s 
 => [base 15/17] RUN apt-get update -y && apt-get install -yq     python3-pip                                                                                                                                20.9s 
 => [base 16/17] RUN pip3 install pipenv --break-system-packages                                                                                                                                              4.8s 
 => [base 17/17] RUN pipenv sync --python 3.11                                                                                                                                                               17.5s 
 => [test 1/3] WORKDIR /home/seleuser                                                                                                                                                                         0.1s 
 => [test 2/3] COPY . .                                                                                                                                                                                       0.1s 
 => [test 3/3] RUN touch .env                                                                                                                                                                                 0.1s 
 => exporting to docker image format                                                                                                                                                                         42.4s 
 => => exporting layers                                                                                                                                                                                      22.1s 
 => => exporting manifest sha256:47f5650797ce0c30a35d82381a955e8328bed8ca12a0b25974cf4bc151ed91e4                                                                                                             0.0s 
 => => exporting config sha256:e345418620cb7a94a7ecd1fb6e7c1e908e3626ef85b070d58ed6406410b4eae4                                                                                                               0.0s
 => => sending tarball                                                                                                                                                                                       20.3s
 => importing to docker                                                                                                                                                                                      15.9s
 => => loading layer 32148f9f6c5a 294.91kB / 29.13MB                                                                                                                                                         15.9s
 => => loading layer 408aeabd8ec4 3.31kB / 3.31kB                                                                                                                                                            15.1s
 => => loading layer 5f70bf18a086 32B / 32B                                                                                                                                                                  15.1s
 => => loading layer b3f64d5dd689 557.06kB / 68.94MB                                                                                                                                                         14.9s
 => => loading layer 0fa4a649ac15 131.07kB / 11.15MB                                                                                                                                                         13.6s
 => => loading layer 6a94ac0622d7 458.75kB / 45.35MB                                                                                                                                                         13.5s
 => => loading layer ea92554603cd 191.07MB / 238.09MB                                                                                                                                                        12.3s
 => => loading layer 1c59b50d0aed 155.42MB / 174.49MB                                                                                                                                                         8.5s
 => => loading layer c5a49a44b5a9 229.38kB / 21.94MB                                                                                                                                                          5.1s
 => => loading layer d7632a142344 32.77kB / 921.11kB                                                                                                                                                          4.8s
 => => loading layer 50d7f986270c 450B / 450B                                                                                                                                                                 4.7s
 => => loading layer d0a6277ae75e 20.11kB / 20.11kB                                                                                                                                                           4.7s
 => => loading layer 093373cbd025 557.06kB / 98.49MB                                                                                                                                                          4.6s
 => => loading layer 56623155e221 294.91kB / 26.72MB                                                                                                                                                          2.7s
 => => loading layer a26fa1e42ce8 557.06kB / 85.80MB                                                                                                                                                          2.1s
 => => loading layer 193ccc1a9401 32.77kB / 1.19MB                                                                                                                                                            0.3s
 => => loading layer 6360e74fdf2a 154B / 154B     

Now I change the code and rebuild:

$ echo '# test' >> scraper/__init__.py
$ docker buildx build -t typesense-docsearch-scraper:latest --platform=linux/amd64 --load . -f scraper/dev/docker/Dockerfile --target test
[+] Building 4.9s (28/28) FINISHED                                                                                                                                                   docker-container:nifty_pascal
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 1.79kB                                                                                                                                                                        0.0s
 => resolve image config for docker-image://docker.io/docker/dockerfile:1.4                                                                                                                                   0.6s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                             0.0s
 => => resolve docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 267B                                                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:12-slim                                                                                                                                             0.2s
 => [internal] load build context                                                                                                                                                                             0.0s
 => => transferring context: 22.46kB                                                                                                                                                                          0.0s
 => [base  1/17] FROM docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                0.0s
 => => resolve docker.io/library/debian:12-slim@sha256:f528891ab1aa484bf7233dbcc84f3c806c3e427571d75510a9d74bb5ec535b33                                                                                       0.0s
 => CACHED [base  2/17] RUN useradd -d /home/seleuser -m seleuser                                                                                                                                             0.0s
 => CACHED [base  3/17] RUN chown -R seleuser /home/seleuser                                                                                                                                                  0.0s
 => CACHED [base  4/17] RUN chgrp -R seleuser /home/seleuser                                                                                                                                                  0.0s
 => CACHED [base  5/17] WORKDIR /home/seleuser                                                                                                                                                                0.0s
 => CACHED [base  6/17] RUN apt-get update -y && apt-get install -yq     software-properties-common    python3                                                                                                0.0s
 => CACHED [base  7/17] RUN apt-get update -y && apt-get install -yq     curl     wget     sudo     gnupg     && curl -sL https://deb.nodesource.com/setup_18.x | sudo bash -                                 0.0s
 => CACHED [base  8/17] RUN apt-get update -y && apt-get install -y     nodejs                                                                                                                                0.0s
 => CACHED [base  9/17] RUN apt-get update -y && apt-get install -yq   unzip   xvfb   libxi6   libgconf-2-4   default-jdk                                                                                     0.0s
 => CACHED [base 10/17] RUN apt-get update -y && apt-get install -yq   chromium-driver                                                                                                                        0.0s
 => CACHED [base 11/17] RUN wget -q https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.4.0/selenium-server-4.4.0.jar                                                                         0.0s
 => CACHED [base 12/17] RUN wget -q https://repo1.maven.org/maven2/org/testng/testng/7.6.1/testng-7.6.1.jar                                                                                                   0.0s
 => CACHED [base 13/17] COPY Pipfile .                                                                                                                                                                        0.0s
 => CACHED [base 14/17] COPY Pipfile.lock .                                                                                                                                                                   0.0s
 => CACHED [base 15/17] RUN apt-get update -y && apt-get install -yq     python3-pip                                                                                                                          0.0s
 => CACHED [base 16/17] RUN pip3 install pipenv --break-system-packages                                                                                                                                       0.0s
 => CACHED [base 17/17] RUN pipenv sync --python 3.11                                                                                                                                                         0.0s
 => CACHED [test 1/3] WORKDIR /home/seleuser                                                                                                                                                                  0.0s
 => [test 2/3] COPY . .                                                                                                                                                                                       0.0s
 => [test 3/3] RUN touch .env                                                                                                                                                                                 0.1s
 => exporting to docker image format                                                                                                                                                                          3.8s
 => => exporting layers                                                                                                                                                                                       0.1s
 => => exporting manifest sha256:73f7dc3b3c1d8a32737692ed785e7b5c18fdc6f671f9a46549f4f420d089d125                                                                                                             0.0s
 => => exporting config sha256:3bc3081e931ed24936f3920a68700fdd27a4f558421d120c20ca6498d20e85fa                                                                                                               0.0s
 => => sending tarball                                                                                                                                                                                        3.6s
 => importing to docker                                                                                                                                                                                       0.2s
 => => loading layer f4a7588e8c11 32.77kB / 1.19MB                                                                                                                                                            0.2s
 => => loading layer 577e67c906ab 154B / 154B                                                                           

Look at the CACHED lines and the build time (4.9s instead of 215.9s) :-)
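
To make the stage layout concrete, here is a much-reduced sketch of a two-stage Dockerfile, written out by a small shell script that also lists the stage names `--target` can select. The stage names (`base`, `test`) and the instructions are taken from the log above. The real Dockerfile has many more steps in the base stage, so treat this as an illustration only.

```shell
#!/bin/sh
# Write a reduced two-stage Dockerfile sketch to the given path.
# Only a few of the 17 base steps from the log are reproduced here.
write_sketch() {
    cat > "$1" <<'EOF'
FROM debian:12-slim AS base
WORKDIR /home/seleuser
COPY Pipfile Pipfile.lock ./
RUN apt-get update -y && apt-get install -yq python3 python3-pip

FROM base AS test
WORKDIR /home/seleuser
COPY . .
RUN touch .env
EOF
}

# Print the name of every stage declared as "FROM ... AS <name>".
list_stages() {
    sed -n 's/^FROM .* AS \([A-Za-z0-9_-]*\)$/\1/p' "$1"
}

sketch="$(mktemp)"
write_sketch "$sketch"
list_stages "$sketch"
rm -f "$sketch"
```

Because `COPY . .` appears only in the `test` stage, a code change invalidates the cache from that step onward, which matches the CACHED lines in the second run.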

@jasonbosco
Member

@darkweaver87 Docker's build cache only works when consecutive builds run on the same machine. If we need to run this in CI without any cache (at least with GitHub Actions, saving any form of cache usually takes a long time), we'd end up rebuilding all layers on every CI run.

It would be nice to keep CI fast, and also to keep the base images separate.

@darkweaver87
Contributor Author

You can still use the --cache-from option (I'm using it in my own CI to build the scraper on ARM), but if it's more convenient for you, I will rebase and switch back to multiple files.
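
As a rough sketch of what that could look like with a registry-backed cache: the cache image reference below is a hypothetical placeholder, and pairing `--cache-from` with `--cache-to` is my assumption about a typical setup rather than a description of any particular CI. The function only prints the command.

```shell
#!/bin/sh
# Print a buildx command that imports and exports the layer cache via a registry.
# The cache reference passed in is hypothetical; it names an image used only
# to store build cache. The default target "test" matches the stage built above.
cached_build_cmd() {
    cache_ref="$1"
    target="${2:-test}"
    printf 'docker buildx build --target %s' "$target"
    printf ' --cache-from type=registry,ref=%s' "$cache_ref"
    printf ' --cache-to type=registry,ref=%s,mode=max' "$cache_ref"
    printf ' -f scraper/dev/docker/Dockerfile .\n'
}

cached_build_cmd "registry.example.com/docsearch-scraper:buildcache"
```

`mode=max` asks BuildKit to export the layers of intermediate stages too, so the `base` stage could be reused on a fresh CI runner.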

@jasonbosco
Member

Yeah, multiple files would be ideal.

@darkweaver87
Contributor Author

Done :-)
FYI, I ran into an issue because Chromium has been upgraded in Debian since my last PR.

@jasonbosco jasonbosco merged commit 617a374 into typesense:master Oct 9, 2024
1 check passed
@jasonbosco
Member

@darkweaver87 Thank you again for the PR. I've published the changes in this Docker Image: typesense/docsearch-scraper:0.11.0.rc1.

Could you give it a shot and let me know how it goes?

@darkweaver87
Contributor Author

@jasonbosco I just tested typesense/docsearch-scraper:0.11.0.rc1 on my infrastructure (which runs on aws/t4g) and it works perfectly. Many thanks.

@jasonbosco
Member

Awesome, thank you for confirming!
