
Simplifying developer onboarding with a few clicks

Migrating from a monolithic architecture to a microservices approach required tooling that enabled both flexibility and consistency.

Artwork: Tim Peacock


Art Chaidarun // Senior Staff Software Engineer, Duolingo

The ReadME Project amplifies the voices of the open source community: the maintainers, developers, and teams whose contributions move the world forward every day.

Getting your computer set up to write code at Duolingo was once a rite of passage that took several hours—or even days—of installing, configuring, and troubleshooting. Now that process has been reduced to a few clicks and mere minutes!

As we’ve migrated our codebase from a monolith to a microservice architecture with hundreds of different environments, it’s become even more important to offer our software developers a fast and easy onboarding experience. Our latest approach is to move our development process into the cloud with GitHub Codespaces.

We want to share our methods and results in the hope that other organizations can learn from how we’ve used Codespaces to streamline our developer experience. 


In this Guide, you will learn:

  • The benefits of a cloud-based IDE

  • How to plan your organization’s transition to remote development

  • How to maintain Codespace config files and scripts across many repositories

  • How to securely grant access to private resources


Why code in the cloud?

Developers traditionally install and run tools like integrated development environments (IDEs), libraries, and language runtimes on their local workstations. This approach makes sense at first, but has some downsides as an organization grows:

  • Everyone must carefully follow long setup instructions every time they work on a new project or get a new device.

  • Laptop processing power limits build speeds. Docker is fast on Linux but much (much!) slower on Mac, where it runs inside a virtual machine.

  • Managing various versions of Python, Node.js, and others on the same laptop gets pretty hairy, requiring tools like nodenv.

  • Making local builds accessible to coworkers for testing takes significant time and effort.

  • Some required tools may only work on certain laptops; e.g., Docker for Mac has some problems unique to Apple's M1 processors.

Moving the development process into the cloud solves all these problems. 

Why we chose Codespaces

We first began looking into remote development in 2020, and there weren’t many mature options. Rather than investing in a homegrown solution to what must be a common problem, we decided to wait for an off-the-shelf product from an established player. Such a product arrived the following year from GitHub.

The most popular general-purpose editor at Duolingo has long been VS Code, and we’ve always hosted our code on GitHub. We figured that neither of those two Microsoft products was ever likely to integrate as seamlessly with alternatives like AWS Cloud9 and Gitpod as with Codespaces.

Planning the transition

Instead of opening up Codespaces to all of our developers without guidance, we found it helpful first to have a single developer try Codespaces in a few representative repositories. This first developer would work out any kinks, document findings for others, and serve as an internal expert who could answer other developers’ questions in our #help-codespaces Slack channel.

That initial set of repositories included our backend monolith (our oldest and largest repo), our main web repo, and one microservice repo for each of our two supported backend stacks: Python and JVM. Once we had Codespaces running smoothly in these four test repos, we were confident that our approach would scale well across the entire organization.


Keeping things DRY (don’t repeat yourself)

Code duplication is the root of many evils in software development, so we’ve configured and scripted Codespaces to avoid it as much as possible.

We started by baking common tools like pre-commit and the AWS CLI into a Docker image that we host on GitHub’s Container registry and use as the base environment for each repository’s codespace. We maintain a few separate image tags corresponding to different Python versions. Apart from that, we try to keep this base image completely repo-agnostic.

This image is used only when a codespace is created. What if we want to add a new tool or behavior to all existing codespaces without requiring that they be destroyed and recreated? We use a multi-layered system of hook scripts. In each repo’s codespace config file, we specify an executable Bash script included in our base image as the postStartCommand, which runs whenever the codespace starts. That script calls a corresponding postStart script bundled in our self-updating duo CLI, which performs actions such as starting Tailscale and calling a repo-specific .devcontainer/postStart script if one exists. Because the duo CLI updates itself, changes to its postStart script reach existing codespaces the next time they start.
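The last layer of that chain, "run the repo-specific hook only if the repo has one," is simple enough to sketch. The article's hooks are Bash scripts and the duo CLI's internals are not described, so the Python below is only an illustration under stated assumptions: the function name run_repo_hook is hypothetical, and it assumes a repo opts in by committing an executable .devcontainer/postStart file.

```python
import os
import subprocess
from pathlib import Path


def run_repo_hook(repo_root: Path, hook_name: str = "postStart") -> bool:
    """Run repo_root/.devcontainer/<hook_name> if it exists and is executable.

    Hypothetical helper for illustration. Returns False when the repo has no
    such hook and True when the hook ran successfully; raises
    subprocess.CalledProcessError if the hook exits with a nonzero status.
    """
    hook = repo_root / ".devcontainer" / hook_name
    if not (hook.is_file() and os.access(hook, os.X_OK)):
        return False  # This repo has no repo-specific hook: nothing to run.
    # Use an absolute path for the executable and run from the repo root,
    # so the hook can rely on paths relative to the repository.
    subprocess.run([str(hook.resolve())], cwd=repo_root, check=True)
    return True


if __name__ == "__main__":
    # Shared, repo-agnostic steps (e.g., starting Tailscale) would run first;
    # this sketch only shows the final dispatch to the repo-specific layer.
    run_repo_hook(Path.cwd())
```

Keeping the repo-specific step optional is what lets the shared layers stay identical everywhere: a repo that needs nothing extra simply omits the file.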

Thanks to these careful arrangements, we’re able to use Pulldozer, our tool for concurrently editing hundreds of repos, to declare the same lean and consistent codespace config file across the vast majority of our repos:

{
  "features": {"docker-in-docker": "latest"},
  "image": "ghcr.io/duolingo/codespaces:python3.9",
  "onCreateCommand": "/onCreate",
  "postAttachCommand": "/postAttach",
  "postCreateCommand": "/postCreate",
  "postStartCommand": "/postStart",
  "remoteUser": "vscode",
  "settings": {"git.autofetch": true},
  "updateContentCommand": "/updateContent"
}
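Because the file above differs between repos only in its image tag, it can be generated rather than hand-edited. The sketch below is not Pulldozer (its interface isn't described here); it is a hypothetical render_config helper, and it assumes that tags for other Python versions follow the same pythonX.Y pattern as python3.9.

```python
import json

# Hypothetical generator for the shared config shown above. Only the base
# image tag varies per repo, chosen by the repo's Python version.
BASE_IMAGE = "ghcr.io/duolingo/codespaces"
HOOKS = ["onCreate", "updateContent", "postCreate", "postStart", "postAttach"]


def render_config(python_version: str = "3.9") -> str:
    config = {
        "features": {"docker-in-docker": "latest"},
        "image": f"{BASE_IMAGE}:python{python_version}",
        "remoteUser": "vscode",
        "settings": {"git.autofetch": True},
    }
    for hook in HOOKS:
        # Each lifecycle command points at the matching script in the base image.
        config[f"{hook}Command"] = f"/{hook}"
    # sort_keys makes the output deterministic, so re-rendering an unchanged
    # config produces no diff when applied across many repos.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"
```

For example, render_config("3.9") reproduces the config above with its keys in the same alphabetical order; writing that string to each repo's devcontainer config file is left to whatever bulk-editing tool you use.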

Accessing private resources with Tailscale

Many of our AWS resources are normally only accessible to developers via our office VPN. To make those resources available to Codespaces, we host a Tailscale relay node in our VPC and run the Tailscale client inside each codespace.

As mentioned previously, our postStart scripts automatically start the Tailscale client upon codespace launch. To connect, our developers simply run duo vpn (our wrapper around tailscale up) and sign in with Google SSO. This connection persists across codespace launches and will be ready to go again as soon as you start work the next day, a feature that even our regular VPN doesn’t have.
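A wrapper like duo vpn can stay very thin. The sketch below is a hypothetical stand-in, not the duo CLI's actual code: it assumes the relay node in the VPC advertises subnet routes, which Linux Tailscale clients only use when started with --accept-routes. When the node isn't yet authenticated, tailscale up prints a login URL, and the developer completes sign-in (SSO, in Duolingo's case) in the browser.

```python
import shutil
import subprocess
import sys


def tailscale_up_command(accept_routes: bool = True) -> list[str]:
    """Build the `tailscale up` invocation (hypothetical wrapper logic)."""
    cmd = ["tailscale", "up"]
    if accept_routes:
        # Assumption: private AWS resources are reached through subnet routes
        # advertised by a relay node in the VPC.
        cmd.append("--accept-routes")
    return cmd


def vpn() -> int:
    if shutil.which("tailscale") is None:
        print("tailscale CLI not found; is the client installed?", file=sys.stderr)
        return 1
    # Pass through tailscale's own output, including any login URL it prints.
    return subprocess.run(tailscale_up_command()).returncode


if __name__ == "__main__":
    sys.exit(vpn())
```

Because the postStart hooks already start the Tailscale client, the wrapper only has to bring the connection up; it doesn't manage the daemon itself.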

Summary

Codespaces have improved some workflows while enabling others that would have been impossible without such an integration. Locally building our original monolith repository had become such an intractable problem that most developers simply gave up and deployed straight to staging servers instead. Now it takes just one minute to spin up a codespace for the monolith that’s practically equivalent to a local environment, with a feedback loop on the order of seconds rather than minutes.

Our Codespace config and scripts have thankfully required very few changes since their introduction; we expect this stability to continue going forward. Our comprehensive internal documentation, dedicated help channel on Slack, and flexible ways to customize functionality have kept our Codespace workflows simple and easy to maintain.

About The ReadME Project

Coding is usually seen as a solitary activity, but it’s actually the world’s largest community effort led by open source maintainers, contributors, and teams. These unsung heroes put in long hours to build software, fix issues, field questions, and manage communities.

The ReadME Project is part of GitHub’s ongoing effort to amplify the voices of the developer community. It’s an evolving space to engage with the community and explore the stories, challenges, technology, and culture that surround the world of open source.
