Experiment with running rails/rails builds on Buildkite Hosted Agents #141

Draft
wants to merge 2 commits into
base: main


@yob yob commented Jan 29, 2025

I've started exploring what changes are required to get rails/rails builds running on Buildkite Hosted Agents, and whether there are performance gains to be had. What's here works and runs a green build, but I'm not very familiar with the rails core conventions and preferences so this is an early preview for feedback.

The required changes are all in the first commit. The second is a debugging tweak that prints the contents of the docker image store at the start of each job. It's helpful for understanding how the caching is working, but I assume we'd drop it before merging.

The high level changes are:

  1. Use the agent-local OCI registry rather than ECR
  2. We still push the compiled image to the registry at the start of the build, and pull it at the start of each subsequent job. However, most layers of the image don't change between builds, and in most cases only the changed layers are fetched (== speedy)
  3. In many cases the images for mysql/postgres/rabbitmq/etc will be cached on the agents from previous runs and won't need to be pulled
  4. Update docker-compose plugin to the 5.x series

In my testing I've found the builds complete in 5-8 minutes when run on agents with 2 vCPU and 4 GB RAM, depending on cache warmth and hit rate.

yob added 2 commits January 29, 2025 17:50
An experiment in changing the rails CI pipeline from "self-hosted"
agents to "hosted" agents, a recently released Buildkite feature [1].

The hosted agents linux environment is superficially quite similar to
the Elastic Stack for AWS, so the required changes are fairly minimal.
Roughly half the changes are to take advantage of some performance
optimisations available on hosted agents (like cache volumes, and
remote buildkit builders with a cache that lasts across builds).

The essential changes:

* Read the OCI registry from the environment rather than hard code an
  ECR registry. The current self-hosted agents run in AWS and can access
  ECR, but the hosted agent environment has access to its own registry
  specifically for use cases like this - building an image at the start
  of the build and then reusing it in later jobs
* Changing the queue from `default` or `builder`, to `hosted`

Optimisations:

* There's no need to use the docker-compose plugin's cache_from and
  image_name shenanigans. The images built at the start of each build
  use a remote buildkit builder with a cache that is shared between
  builds. The cache is typically warm, and when it is the image build
  time drops from ~2 mins to ~18sec
* Use plain buildkit to build the images, without the docker compose
  plugin. This avoids the image being exported from buildkit to docker,
  and when the buildkit cache is warm the jobs complete in as little as
  18s. This bypasses the docker-compose built-in support for separating
  building and running, but the docker-compose.yml already kinda
  bypasses that by hard coding the image used in the run jobs (using the
  IMAGE_NAME env var)
* Create a cache volume for ruby gems that are installed in docker
  during the initial step. This shaves ~30s off the build time

[1] https://buildkite.com/docs/pipelines/hosted-agents/overview
@@ -48,12 +48,14 @@ def install_plugins(service = "default", env = nil, dir = ".")
],
compressed: ".buildkite.tgz"
}
plugin :metahook, {
"pre-command": "echo \"+++ inspect docker image store\"\ndocker image ls"
}
Author

This is for debugging only, we can remove it before considering merging


plugin :docker_compose, {
"env" => env,
"run" => service,
"pull" => service,
"pull-retries" => 3,
"tty" => "true",
Author

tty changed to default:false in v5 of the plugin
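
As a sketch of what this could look like on the generator side, the options hash can leave `"tty"` out entirely and rely on the plugin default, only sending it when a caller asks for it. The helper name `docker_compose_options` and its keyword arguments are hypothetical, and unlike the code in the diff this version also drops a nil `"env"` entry.

```ruby
# Hypothetical helper mirroring the options passed to the docker-compose
# plugin in the hunk above.
def docker_compose_options(service, env: nil, tty: nil)
  options = {
    "env" => env,
    "run" => service,
    "pull" => service,
    "pull-retries" => 3,
  }
  # Only send "tty" when explicitly requested; otherwise the plugin's own
  # default applies (false from v5, per the comment above).
  options["tty"] = tty.to_s unless tty.nil?
  options.compact # drop "env" when no environment was given
end

p docker_compose_options("default")
p docker_compose_options("default", tty: true)
```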

cache:
paths:
- "cache/bundler"
name: "rails-initial-bundler-cache"
Author

Create a cache volume for the gems used in this initial step. In most cases the cache will be warm, and it shaves tens of seconds off the initial step
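
For anyone who would rather generate this from Ruby than hand-write the YAML, a minimal sketch follows. It assumes the step is represented as a plain hash that is later dumped to YAML; the helper name `cache_volume` and the step's `label` and `command` values are hypothetical, while the `cache`, `paths`, and `name` keys are the ones shown in the hunk above.

```ruby
require "yaml"

# Hypothetical helper: builds the cache stanza shown in the hunk above.
def cache_volume(name:, paths:)
  { "cache" => { "paths" => paths, "name" => name } }
end

# Placeholder step; the real initial step has its own label and command.
step = { "label" => "initial step", "command" => "echo placeholder" }
step = step.merge(cache_volume(name: "rails-initial-bundler-cache", paths: ["cache/bundler"]))

puts YAML.dump(step)
```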

@@ -190,7 +190,7 @@ def min_ruby
end

def remote_image_base
"973266071021.dkr.ecr.us-east-1.amazonaws.com/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds"
ENV.fetch("REGISTRY") + "/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds"
Author

We now fetch the registry hostname from the environment dynamically in the initial job
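
A standalone sketch of the same idea, for reading outside the generator: the registry host comes from `REGISTRY`, and a missing variable fails loudly rather than silently producing a broken image name. The `STANDARD_QUEUES` list here is a hypothetical stand-in for the generator's `standard_queues`, whose real contents are not shown in this diff, and the `env:` keyword exists only to make the method easy to exercise.

```ruby
# Hypothetical stand-in for the generator's standard_queues helper.
STANDARD_QUEUES = [nil, "default", "builder", "hosted"].freeze

def remote_image_base(build_queue, env: ENV)
  # fetch raises KeyError when REGISTRY is unset, so a misconfigured job
  # fails at pipeline generation time.
  registry = env.fetch("REGISTRY")
  prefix = STANDARD_QUEUES.include?(build_queue) ? "" : "#{build_queue}-"
  "#{registry}/#{prefix}builds"
end

fake_env = { "REGISTRY" => "registry.example.test" } # illustrative host only
puts remote_image_base("hosted", env: fake_env)
puts remote_image_base("experimental", env: fake_env)
```

Non-standard queues get their own `<queue>-builds` repository, matching the string interpolation in the diff.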
