Tobias Koppers doesn’t work for Instagram. He never has. But since 2014 he’s been responsible for maintaining an important component of the web version of Instagram.
Koppers is the creator of webpack, an open source code bundling tool for JavaScript. Basically, webpack enables code splitting, allowing developers to optimize their JavaScript code and split it up into smaller chunks so that when you visit a web application, the most important pieces of code download first. This makes web applications faster because you don’t have to download the entire application before you can start using it.
Webpack began as part of an academic project in 2012. Like so many other open source projects, it started out as a way to scratch an itch. “I was working on a web application for my master’s thesis in computer science and I was looking for a code optimization tool,” Koppers explains in a recent episode of The ReadME Podcast. Unhappy with the way other tools handled code splitting, he wrote his own and released it on GitHub. “I had contributed to open source before, but I had never open sourced my own project before, so I thought it would be fun,” he says.
Less than three years later, former Instagram engineering manager Pete Hunt explained at OSCON 2014 that the company relied on webpack, putting Koppers in the strange position of maintaining an important, yet practically invisible part of one of the most popular apps in the world.
Webpack’s quiet, under-the-hood role at Instagram is a great example of how there’s much more open source in our software than the average user may realize. Proprietary apps and web services, even unlikely ones, make extensive use of open source, though it’s not always readily apparent. For instance, countless iOS apps, including Hulu, Poshmark, and TikTok, rely on the open source networking library Alamofire or its predecessor AFNetworking to download images and data even though users never interact with those libraries directly.
Open source software often draws comparisons to physical infrastructure. Writer and researcher Nadia Eghbal describes open source as the “roads and bridges” of the digital world. But while roads and bridges are highly visible forms of infrastructure, open source often goes unseen by its end users.
Open source libraries like cURL are hidden power sources that many take for granted. And, like their physical counterparts, they need to be maintained to stay operational. cURL is included in most Linux distributions and, much like a web browser, needs regular updates to support the latest web standards. “Many people don’t realize that cURL is an application and not just an operating system feature,” says cURL creator Daniel Stenberg. “And even those who know about it often wonder why I need to do maintenance.”
The near-invisibility of many open source projects creates a number of challenges for the open source ecosystem, ranging from funding projects and securing their code to ensuring compatibility between their components.
Trials and tribulations
Security is the most pressing reason to care about open source infrastructure. Whether it’s open source or proprietary, code that isn’t audited and maintained is a security risk for every piece of software that uses it. But many open source projects aren’t created and maintained by companies offering support contracts and warranties. Instead, they’re built by developers, often working in their spare time. These developers might not have anyone actively patching security vulnerabilities, let alone paying for audits.
Ideally, critical open source projects will have full-time developers not only fixing vulnerabilities but “shifting left” on security to reduce vulnerabilities in the first place. Today, developers at larger organizations can scan their code for vulnerabilities at check-in and fix issues before deploying to production, while infosec teams work with software architects to ensure best practices from the beginning of a project. But the same resources aren’t always available for smaller companies or on individual projects.
Full-time developers require funding. While financial models for open source have been a major topic of interest in recent years, no single model works best. Many developers found startups to monetize their work. Others rely on corporate sponsorships to fund full-time development. Some charge for extra consulting and support to fund their projects. After years of working on cURL in his spare time, Stenberg went to work for wolfSSL, providing commercial support for cURL.
Convincing an employer to pay for open source work as part of a developer’s responsibilities is another popular funding model. For developers like Koppers, it’s an ideal situation. Koppers quit his job to work on webpack thanks to corporate sponsorships, but now works full-time on the project for a company called Vercel. Some companies recognize that their business depends on the reliability, security, and integrity of open source libraries and, in response, hire the maintainers to work on their creations full- or part-time, like any other critical piece of infrastructure. Carolyn Van Slyck started work on the open source deployment automation system Porter on her own time. Now its maintenance is part of her responsibilities at Microsoft. But she points out that these sorts of arrangements aren’t necessarily permanent. Companies may not fund development of a project indefinitely, especially if they stop using the software.
Some software fares better as its own startup but other projects don’t lend themselves to monetization. “A thousand open source projects is like a thousand startups with different business models,” Evan You, maintainer of the popular JavaScript framework Vue, said at GitHub Universe 2020. “I’m mostly doing crowdfunding because Vue is a pretty high-exposure project, but it might not work for certain types of projects.”
Samuel Colvin, the maintainer of the popular Python library pydantic, says he doesn’t see a clear way to monetize his work. pydantic is a data validation library. It helps developers make sure that the data entering an application is of the correct type. For example, it can check to make sure that when an application is expecting an integer, a user doesn’t enter a string of letters instead.
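To make the idea concrete, here is a minimal sketch of type-driven validation that uses only the Python standard library. It is not pydantic's API or implementation: the `validate` helper and the `SignupForm` class are hypothetical names invented for illustration, and the sketch assumes every field is declared with a plain class such as `int` or `str`.

```python
from dataclasses import dataclass, fields


@dataclass
class SignupForm:
    # Hypothetical model: each field declares the type the application expects.
    username: str
    age: int


def validate(model_cls, raw: dict):
    """Coerce raw input to the declared field types, or raise ValueError."""
    cleaned = {}
    for field in fields(model_cls):
        if field.name not in raw:
            raise ValueError(f"{field.name}: field is required")
        value = raw[field.name]
        try:
            # Assumption: field.type is a plain class (int, str, ...),
            # so calling it attempts the conversion.
            cleaned[field.name] = field.type(value)
        except (TypeError, ValueError):
            raise ValueError(
                f"{field.name}: expected {field.type.__name__}, got {value!r}"
            ) from None
    return model_cls(**cleaned)
```

With this sketch, `validate(SignupForm, {"username": "ada", "age": "42"})` returns a form whose `age` is the integer 42, while an `age` of `"forty-two"` raises a `ValueError` that names the offending field. A real validation library covers far more ground (nested models, optional fields, custom constraints), which is part of why so many packages depend on one rather than writing their own.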
On one hand, pydantic is a widely used piece of software, core to other popular Python packages like the natural language processing library spaCy and FastAPI, the third most popular Python framework according to a survey of more than 28,000 Python developers conducted in October 2020. “It’s essential to what we do at spaCy,” says maintainer Ines Montani. “There’s a whole ecosystem built up around tools like pydantic.” On the other, Colvin says it’s not yet big enough to garner the volume of sponsorships or donations that would help fund a full-time developer for the project, nor is there an obvious way to build a startup around it.
Besides, funding won’t solve all the problems that modern open source faces. The open source ecosystem is growing more complex, and with that complexity comes new burdens for open source maintainers to keep their projects compatible with the other projects, code, and applications they were designed to work with.
For example, in 2017 the Python core development team proposed a change to the programming language that would have partially broken pydantic and, by extension, everything that depended on it, including FastAPI. Colvin didn’t realize the change would be implemented until April of 2021 when he got an email from a concerned Python maintainer. “He told me he had only just heard about FastAPI and pydantic and realized the proposal might affect them,” Colvin explains.
Colvin launched an online campaign against the proposal, and, ultimately, the Python team chose to delay the change while they worked on a solution. “I probably could have handled the situation better. I probably banged the drum louder than I needed to,” Colvin says. “I was really impressed with the Python steering committee, they understood our concerns and we’ll be working with them more in the future.”
The incident illuminates the challenges of maintaining software. “Following the Python developer mailing lists would be a full-time job in and of itself,” Colvin says. Even if he didn’t have a full-time job outside of pydantic, he wouldn’t be able to keep up with everything going on in the Python ecosystem that could affect his project. Likewise, it’s unreasonable to expect the core Python maintainers to understand the inner workings of every Python library out there, let alone the edge cases.
“Even if you have funding, there’s always more work than you can possibly do,” Van Slyck says. “And the bar is always being raised. It used to be enough to just put some code online. Now you’re expected to foster a community. You have a lot more hats to wear.”
Happiness in modularity
Some developers have found serenity in open source by focusing on making smaller, more manageable packages of code—and deliberately wearing fewer hats. “I don’t like to design things that require a lot of maintenance,” says James Halliday, the creator of hundreds of Node.js modules. “I try to keep my modules minimal. I think a lot of tools would be better if people stopped working on them, stopped changing things for no reason.” More features mean more complexity and every change introduces the possibility of new bugs.
Halliday—better known by his handle substack—takes an uncommonly hands-off approach to his work. “I have all my GitHub notification emails turned off,” he says. If someone finds a problem with his code or wants a feature added to a module he’s no longer working on, they’re free to fork his code. That is, after all, the open source way. But he doesn’t put effort into issues or pull requests for packages he considers finished. “It’s not my job to keep tabs on every little thing I wrote years or decades ago very often,” he says. “I am always busy with new projects and if I was always looking back at old projects I wouldn’t have enough time to move forward.”
There’s no perfect development process that works for all open source developers. Plenty of them are happy to maintain their projects or work with the community on improvements. A less extreme version of the “less is more” mentality is becoming an increasingly popular alternative to maintaining large projects and communities. As Eghbal pointed out in her book Working in Public: The Making and Maintenance of Open Source Software, adding more contributors to a project can actually create more work for maintainers, as maintainers can quickly find themselves on the hook to not only review other people’s code but also maintain new features if those contributors end up leaving the project. Eghbal identified a trend towards more modular design and development in open source, with maintainers creating smaller packages that require fewer maintainers and less work to maintain. Even larger projects, like Ruby on Rails, are adopting a more modular approach.
Meanwhile, GitHub’s 2020 State of the Octoverse report found that open source developers are increasingly focusing on making smaller, more incremental changes that create less overhead for maintainers. “The number one best practice identified was keeping the scope of pull requests small, because it makes reviews easier, leads to better reviews, and makes it easier to revert if there are issues,” the report says. “It also streamlines feedback, creating momentum and contributing to the team’s productivity.”
While this more modular approach might make life easier for open source maintainers, it can create new challenges for enterprises because it increases the total number of different packages they need to vet and keep updated. “The surface area of potential problems is bigger, since there are more dependencies, each one managed by a different developer,” Eghbal wrote.
In some ways, that’s the cost of using free software. “Remember that open source maintainers are generally doing this work for a greater good, often with no compensation, and don’t have much control over how their software is being used or modified by other people,” says GitHub CSO Mike Hanley. “If you’re depending on third-party and open source software, there is a responsibility to do some due diligence on what you’re incorporating into your project and how you’re using it. Consider sharing any improvements, interesting use cases, or bug fixes you find back with the project.”
Beyond that, there are plenty of opportunities for companies and individuals to become more involved in the open source communities they rely on—from funding and maintenance to fostering communities where none yet exist. It all starts with taking a closer look at the software behind your code and seeing the previously unseen infrastructure that powers it.