Experiment with running rails/rails builds on Buildkite Hosted Agents #141
An experiment in changing the rails CI pipeline from "self-hosted" agents to "hosted" agents, a recently released Buildkite feature [1].

The hosted agents Linux environment is superficially quite similar to the Elastic Stack for AWS, so the required changes are fairly minimal. Roughly half the changes take advantage of performance optimisations available on hosted agents (like cache volumes, and remote buildkit builders with a cache that lasts across builds).

The essential changes:

* Read the OCI registry from the environment rather than hard-coding an ECR registry. The current self-hosted agents run in AWS and can access ECR, but the hosted agent environment has access to its own registry specifically for use cases like this: building an image at the start of the build and then reusing it in later jobs
* Change the queue from `default` or `builder` to `hosted`

Optimisations:

* There's no need to use the docker-compose plugin's cache_from and image_name shenanigans. The images built at the start of each build use a remote buildkit builder with a cache that is shared between builds. The cache is typically warm, and when it is, the image build time drops from ~2 mins to ~18s
* Use plain buildkit to build the images, without the docker-compose plugin. This avoids the image being exported from buildkit to docker, and when the buildkit cache is warm the jobs complete in as little as 18s. This bypasses docker-compose's built-in support for separating building and running, but the docker-compose.yml already kinda bypasses that by hard-coding the image used in the run jobs (using the IMAGE_NAME env var)
* Create a cache volume for ruby gems that are installed in docker during the initial step. This shaves ~30s off the build time

[1] https://buildkite.com/docs/pipelines/hosted-agents/overview
@@ -48,12 +48,14 @@ def install_plugins(service = "default", env = nil, dir = ".") | |||
], | |||
compressed: ".buildkite.tgz" | |||
} | |||
plugin :metahook, { | |||
"pre-command": "echo \"+++ inspect docker image store\"\ndocker image ls" | |||
} |
This is for debugging only; we can remove it before considering merging.
|
||
plugin :docker_compose, { | ||
"env" => env, | ||
"run" => service, | ||
"pull" => service, | ||
"pull-retries" => 3, | ||
"tty" => "true", |
`tty` changed to default: false in v5 of the plugin
cache: | ||
paths: | ||
- "cache/bundler" | ||
name: "rails-initial-bundler-cache" |
Create a cache volume for the gems used in this initial step. In most cases the cache will be warm, and it shaves tens of seconds off the initial step.
@@ -190,7 +190,7 @@ def min_ruby | |||
end | |||
|
|||
def remote_image_base | |||
"973266071021.dkr.ecr.us-east-1.amazonaws.com/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds" | |||
ENV.fetch("REGISTRY") + "/#{"#{build_queue}-" unless standard_queues.include?(build_queue)}builds" |
We now fetch the registry hostname from the environment dynamically in the initial job
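For anyone wanting to try the lookup outside the pipeline generator, here is a minimal standalone sketch of the same idea. It is not the code in this PR: the `env:` and `standard_queues:` keyword arguments, the `STANDARD_QUEUES` contents, and the error message are assumptions added for illustration, and `registry.example.test` is a placeholder hostname.

```ruby
# Hypothetical standalone variant of remote_image_base, for illustration only.
# ASSUMPTION: the real list of standard queues lives elsewhere in the generator;
# these values are placeholders.
STANDARD_QUEUES = [nil, "default", "builder", "hosted"].freeze

# Builds the image repository base from a registry hostname found in the
# environment. `env` can be ENV or any Hash, which makes the helper easy to
# exercise without touching the real process environment.
def remote_image_base(build_queue, env: ENV, standard_queues: STANDARD_QUEUES)
  # fetch with a block lets us raise a more descriptive error than the
  # default KeyError message when REGISTRY is missing.
  registry = env.fetch("REGISTRY") do
    raise KeyError, "REGISTRY is not set; expected the initial job to export it"
  end

  # Non-standard queues get their own repository prefix, as in the original.
  prefix = standard_queues.include?(build_queue) ? "" : "#{build_queue}-"
  "#{registry}/#{prefix}builds"
end

# Example usage with an explicit Hash instead of the process environment:
example_env = { "REGISTRY" => "registry.example.test" }
puts remote_image_base("hosted", env: example_env)
puts remote_image_base("experimental", env: example_env)
```

Failing fast when `REGISTRY` is unset keeps a misconfigured job from silently pushing to a malformed image name such as `/builds`.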
I've started exploring what changes are required to get rails/rails builds running on Buildkite Hosted Agents, and whether there are performance gains to be had. What's here works and runs a green build, but I'm not very familiar with the rails core conventions and preferences so this is an early preview for feedback.
The required changes are all in the first commit. The second is a debugging tweak that prints the contents of the docker image store at the start of the job. It's helpful for understanding how the caching is working, but I assume we'd drop it before merging.
The high level changes are:
In my testing I've found the builds complete in 5-8 minutes when run on agents with 2 vCPUs and 4 GB RAM, depending on cache warmth and hit rate.