Changes to reduce size of built docker images; replace miniconda with micromamba #29

Open
wants to merge 5 commits into
base: main

tomkinsc
Member

@tomkinsc tomkinsc commented Apr 12, 2024

Docker builds of viral-baseimage have been creeping up in size. Inspection of the images via dive revealed that much of the space was being consumed by google-cloud-sdk and redundant installs of python.

This PR includes a number of changes to reduce the size of the docker images built from this repo.

This installs google-cloud-sdk from Google's distribution tarball rather than from the apt package, which saves some space (the tarball is smaller). It also removes the BigQuery component, which is installed by default with google-cloud-sdk (it can be restored later if needed), and manually removes the redundant python3 that comes bundled with google-cloud-sdk, since the system python is already available in /bin.
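A minimal sketch of the bundled-python removal step, for illustration only. `strip_bundled_python` is a hypothetical helper, and the default sub-path `platform/bundledpythonunix` is an assumption about where the tarball keeps its own interpreter; it should be checked against the unpacked SDK before relying on it.

```shell
#!/bin/sh
# strip_bundled_python SDK_DIR [REL_PATH]
# Hypothetical helper: deletes the Python copy shipped inside an unpacked
# google-cloud-sdk directory. REL_PATH defaults to an ASSUMED location.
strip_bundled_python() {
    sdk_dir="$1"
    rel_path="${2:-platform/bundledpythonunix}"

    # Refuse to run against a missing directory so an empty or wrong
    # argument can never turn into an rm -rf on an unintended path.
    if [ -z "$sdk_dir" ] || [ ! -d "$sdk_dir" ]; then
        echo "strip_bundled_python: not a directory: $sdk_dir" >&2
        return 1
    fi

    rm -rf "$sdk_dir/$rel_path"
}
```

In an image build this would presumably run right after unpacking the tarball, next to something like `gcloud components remove bq`, with `CLOUDSDK_PYTHON` pointing gcloud at the interpreter that remains.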

This also replaces miniconda (+mamba) with micromamba as the manager for conda packages. Note: this means there is no install of python3 in the base conda environment (further reducing the size of this baseimage). Downstream python usage relying on this image will require an explicit conda python install, either in this image or in downstream images built on top of it (viral-core). This also adds "conda" and "mamba" symlinks that point to micromamba, allowing most commands to use micromamba transparently, though there are some breaking differences in the command-line interface (e.g. "conda config --add" vs "micromamba config append").
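The symlink arrangement could look roughly like the sketch below. `link_conda_aliases` is a hypothetical helper name, and the paths in the usage note are assumptions rather than the ones used in this PR.

```shell
#!/bin/sh
# link_conda_aliases TARGET BIN_DIR
# Hypothetical helper: creates BIN_DIR/conda and BIN_DIR/mamba as symlinks
# to the micromamba executable at TARGET, replacing any existing links.
link_conda_aliases() {
    target="$1"
    bin_dir="$2"
    for name in conda mamba; do
        ln -sf "$target" "$bin_dir/$name"
    done
}
```

For example, `link_conda_aliases /usr/local/bin/micromamba /usr/local/bin` (assumed paths). The links only redirect the command name: a script that calls `conda config --add` still needs to be changed to `micromamba config append`.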

Altogether these changes shave ~800MB, or 45%, off the size of the docker image.

Closes #31

…kage; replace miniconda with micromamba

This includes a number of changes to reduce the size of docker images built from this repo.

This installs google-cloud-sdk from google's distribution tarball rather than the apt package for it, in order to save some space. It removes the BigQuery component which is installed by default with google-cloud-sdk (it can be restored later if needed), and also manually removes the install of python3 that comes bundled with google-cloud-sdk since we already have an install at the system level in /bin.

This also replaces miniconda with micromamba as the manager for conda packages. Note that this means we do not have an install of python3 in the base conda environment (further reducing the size of the baseimage), so downstream python usage will require an explicit conda python install in either this image or in downstream images built on top of this one (viral-core). This also adds symlinks to micromamba from "conda" and "mamba", allowing most commands to make use of micromamba transparently, though there are some breaking differences in API (e.g. "conda config --add" vs "micromamba config append").
@tomkinsc tomkinsc requested a review from dpark01 April 12, 2024 21:16
…w) conda package; multi-stage build; progress toward using micromamba

bump dx-toolkit version (and install from newly-updated conda package to avoid second python install); bump google cloud sdk (and remove more cruft from its install); bump qsv version; bump udocker version; employ multi-stage build for squashing layers; remove python from apt packages; remove a few non-python apt packages; continue progress toward switching to micromamba
@tomkinsc
Member Author

WIP, but with the changes above, the viral-baseimage size is reduced on Quay.io, from 560MB for v0.2.4 to 261MB for the Docker build as of commit 7d59555, and that's with a python 3.10 installation included.
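As a quick check of what those two Quay.io figures imply in relative terms, the reduction can be computed with integer shell arithmetic. `percent_reduction` is a hypothetical helper name, not something in this repo.

```shell
#!/bin/sh
# percent_reduction BEFORE AFTER
# Prints the size reduction as a whole-number percentage (truncated).
percent_reduction() {
    before="$1"
    after="$2"
    echo $(( ($before - $after) * 100 / $before ))
}

# 560MB (v0.2.4) down to 261MB saves 299MB.
percent_reduction 560 261   # prints 53
```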

The main changes are that dx-toolkit, google-cloud-cli, and the OS are now using a conda-installed Python rather than each bringing their own.

The image build is now multi-stage, allowing for layer squashing with deduplicating copy-on-overlay during the second stage (depending on the docker engine/backend: containerd and a few others support it; not all do).
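For readers unfamiliar with the pattern, here is a rough sketch of a two-stage build that squashes layers. It is not the Dockerfile from this PR: the base image, the package list, and the output file name are all assumptions, and the script merely writes an example Dockerfile to disk so the structure is visible.

```shell
#!/bin/sh
# Sketch only: writes a minimal two-stage Dockerfile illustrating the
# squashing pattern. Everything inside the heredoc is an assumed example.
write_squash_dockerfile() {
    out="$1"
    cat > "$out" <<'EOF'
# Stage 1: do all installs and cleanup here; intermediate layers stay behind.
FROM ubuntu:22.04 AS build
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*

# Stage 2: copy the finished filesystem into an empty image as one layer.
FROM scratch
COPY --from=build / /
# Image metadata is not carried across stages, so ENV/CMD must be restated.
ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CMD ["/bin/bash"]
EOF
}

write_squash_dockerfile Dockerfile.squash-sketch
```

Because the second stage starts from an empty image, files deleted in later layers of the first stage no longer occupy space in the result. The cost is that `ENV`, `CMD`, and similar settings from the first stage have to be declared again.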

Additional unnecessary files have also been removed from the google-cloud-cli installation (beyond its extra copy of Python).

The switch to micromamba will also make downstream builds faster, though care will need to be taken for a smooth transition from miniconda+mamba. We'll want to double-check the ENV and which environment(s) the PATH includes (i.e. which conda env is active), and make sure duplicate package installations do not occur.
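One way to spot-check the PATH concern is to count how many directories on the PATH provide a given executable. This is only a sketch: `count_on_path` is a hypothetical helper, and it assumes PATH entries contain no glob characters.

```shell
#!/bin/sh
# count_on_path NAME [PATH_STRING]
# Hypothetical helper: prints how many directories in PATH_STRING (default:
# the current $PATH) contain an executable file called NAME.
count_on_path() {
    name="$1"
    search_path="${2:-$PATH}"
    count=0
    old_ifs="$IFS"
    IFS=:
    for dir in $search_path; do
        # Skip empty entries; count only executable non-directories.
        if [ -n "$dir" ] && [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            count=$((count + 1))
        fi
    done
    IFS="$old_ifs"
    echo "$count"
}
```

Running `count_on_path python3` inside a built image and getting a result greater than 1 would suggest that more than one Python is reachable, which is worth investigating before downstream images add their own.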

Development

Successfully merging this pull request may close these issues.

Transition from miniconda to (micro)mamba