Experiment with running rails/rails builds on Buildkite Hosted Agents #141

yob · 2025-01-29T07:00:41Z

I've started exploring what changes are required to get rails/rails builds running on Buildkite Hosted Agents, and whether there are performance gains to be had. What's here works and runs a green build, but I'm not very familiar with the rails core conventions and preferences, so this is an early preview for feedback.

The required changes are all in the first commit. The second is a debugging tweak that prints the contents of the docker image store at the start of each job. It's helpful for understanding how the caching is working, but I assume we'd drop it before merging.

The high level changes are:

* Use the agent-local OCI registry rather than ECR
  * We still push the compiled images to the registry at the start of the build, and pull them at the start of each subsequent job. However, most layers of the image don't change between builds, and in most cases only the changed layers are fetched (== speedy)
  * In many cases the images for mysql/postgres/rabbitmq/etc will be cached on the agents from previous runs and won't need to be pulled
* Update the docker-compose plugin to the 5.x series

In my testing I've found the builds complete in 5-8 minutes when run on agents with 2 vCPU and 4 GB RAM, depending on cache warmth and hit rate.

An experiment in changing the rails CI pipeline from "self-hosted" agents to "hosted" agents, a recently released Buildkite feature [1].

The hosted agents' Linux environment is superficially quite similar to the Elastic Stack for AWS, so the required changes are fairly minimal. Roughly half the changes take advantage of performance optimisations available on hosted agents (like cache volumes, and remote buildkit builders with caches that last across builds).

The essential changes:

* Read the OCI registry from the environment rather than hard coding an ECR registry. The current self-hosted agents run in AWS and can access ECR, but the hosted agent environment has access to its own registry specifically for use cases like this: building an image at the start of the build and then reusing it in later jobs
* Change the queue from `default` or `builder` to `hosted`

Optimisations:

* There's no need to use the docker-compose plugin's cache_from and image_name shenanigans. The images built at the start of each build use a remote buildkit builder with a cache that is shared between builds. The cache is typically warm, and when it is the image build time drops from ~2 mins to ~18s
* Use plain buildkit to build the images, without the docker-compose plugin. This avoids the image being exported from buildkit to docker, and when the buildkit cache is warm the jobs complete in as little as 18s. This bypasses docker-compose's built-in support for separating building and running, but the docker-compose.yml already kinda bypasses that by hard coding the image used in the run jobs (using the IMAGE_NAME env var)
* Create a cache volume for ruby gems that are installed in docker during the initial step. This shaves ~30s off the build time

[1] https://buildkite.com/docs/pipelines/hosted-agents/overview

yob · 2025-01-29T07:11:56Z

lib/buildkite/config/rake_command.rb

@@ -48,12 +48,14 @@ def install_plugins(service = "default", env = nil, dir = ".")
          ],
          compressed: ".buildkite.tgz"
        }
+        plugin :metahook, {
+          "pre-command": "echo \"+++ inspect docker image store\"\ndocker image ls"
+        }


This is for debugging only; we can remove it before merging.

yob · 2025-01-29T07:12:15Z

lib/buildkite/config/rake_command.rb


        plugin :docker_compose, {
          "env" => env,
          "run" => service,
-          "pull" => service,
-          "pull-retries" => 3,
+          "tty" => "true",


The default for `tty` changed to false in v5 of the plugin, so it's now set explicitly.
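
For reference, a minimal sketch of the resulting options hash, assuming the keys shown in the diff above are the only ones that matter here. `docker_compose_run_options` is a hypothetical helper name used for illustration; it is not part of rake_command.rb.

```ruby
# Hypothetical helper, for illustration only.
# Mirrors the options passed to the docker_compose plugin in the diff above.
def docker_compose_run_options(service = "default", env = nil)
  {
    "env" => env,
    "run" => service,
    # tty is requested explicitly because v5 of the plugin no longer
    # enables it by default. The string "true" mirrors the diff.
    "tty" => "true",
  }
end

options = docker_compose_run_options("default")
puts options.inspect
```

The `pull` and `pull-retries` keys from the previous version are intentionally absent, matching the removed lines in the diff.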

yob · 2025-01-29T07:13:16Z

pipelines/rails-ci/initial.yml

+    cache:
+      paths:
+        - "cache/bundler"
+      name: "rails-initial-bundler-cache"


Create a cache volume for the gems used in this initial step. In most cases the cache will be warm, and it shaves tens of seconds off the initial step.

yob · 2025-01-29T07:13:55Z

lib/buildkite/config/build_context.rb

@@ -190,7 +190,7 @@ def min_ruby
      end

      def remote_image_base
-        "973266071021.dkr.ecr.us-east-1.amazonaws.com/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds"
+        ENV.fetch("REGISTRY") + "/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds"


We now fetch the registry hostname from the environment dynamically in the initial job
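
A stand-alone sketch of the same lookup, to show how the image base is derived and what happens when the variable is missing. The `STANDARD_QUEUES` list, the keyword arguments, and the `registry.example.test` hostname are assumptions made so the snippet runs on its own; the real method reads `build_queue` and `standard_queues` from the build context.

```ruby
# Assumed queue names, for illustration only; the real list lives in
# build_context.rb and may differ.
STANDARD_QUEUES = [nil, "default", "builder", "hosted"].freeze

# Hypothetical stand-alone version of remote_image_base.
def remote_image_base(build_queue, env: ENV, standard_queues: STANDARD_QUEUES)
  # fetch raises KeyError when REGISTRY is unset, so a misconfigured
  # job fails immediately rather than pushing to a malformed image name.
  registry = env.fetch("REGISTRY")
  # Non-standard queues get their own "<queue>-builds" repository.
  prefix = standard_queues.include?(build_queue) ? "" : "#{build_queue}-"
  "#{registry}/#{prefix}builds"
end

env = { "REGISTRY" => "registry.example.test" }
puts remote_image_base("hosted", env: env) # => registry.example.test/builds
puts remote_image_base("custom", env: env) # => registry.example.test/custom-builds
```

Using `fetch` rather than `ENV["REGISTRY"]` means a missing variable raises an error instead of silently producing an image name that starts with `/`.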

yob added 2 commits January 29, 2025 17:50

inspect the docker image store at the start of each job

da0ac85
