
The case for using Rust in MLOps

Level up your Rust skills and push MLOps forward with GitHub Copilot.

Artwork: Susan Haejin Lee


Noah Gift // Executive in Residence, Duke University and Founder, Pragmatic AI Labs

The ReadME Project amplifies the voices of the open source community: the maintainers, developers, and teams whose contributions move the world forward every day.

Operational efficiency must be at the core of any technology system. MLOps builds upon DevOps, which in turn builds on the concept of kaizen, the Japanese word for continuous improvement. Without continuous improvement, you wouldn’t have DevOps or, by extension, MLOps.


In this Guide, you will learn how to:

  1. Apply best practices for sustainability and energy efficiency by using the Rust language.

  2. Level up to using a more robust language, Rust, with GitHub Copilot.

  3. Think differently about the false appearance of progress in data science and MLOps projects.


At the heart of continuously improving operations is a simple question: “Can we improve operational performance—from training and inference to packaging and delivery—by ten times or more?” If the answer is yes, as it will be with many organizations using Python for data science, the next question should be: "Why are we not doing it?" 

For decades, organizations had few options besides pure C/C++ and Python for machine learning solutions. C++ may provide more efficiency in terms of performance, but Python is generally easier to learn, implement, and maintain, which is why Python has taken off in data science. The hard choice between the performant but complex C++ and the easy-to-learn but comparatively slow Python ultimately results in many companies choosing Python.

But there is another way. Rust consistently ranks among the most performant and energy-efficient languages. It also happens to be among the most loved languages in Stack Overflow's annual developer survey. Though some Python libraries widely used in data science are written in C and can provide some of the performance benefits of running a compiled language, Rust provides a more direct route to bare metal while using a single language.

Rust is also far easier to learn and use than C or C++, which makes it a realistic solution for those who want the performance of a compiled language. That’s especially the case when using GitHub Copilot, an AI-powered pair programmer that uses the OpenAI Codex to suggest code and entire functions in real time to developers while they code. Let's discuss this strategy next.


The case for Rust for MLOps

GitHub Copilot represents a revolutionary change in the way developers work. It and tools like it are game changers because they minimize the impact of syntax on productivity. With Rust, you spend more time getting code to compile, which is an investment in future returns, much like saving for the future in a retirement account. Rust offers great performance and safety, but its syntax can be challenging. With GitHub Copilot, syntax becomes less of an issue, since the suggestions eliminate many of the difficulties in programming. Additionally, because the Rust toolchain for linting, formatting, and compiling is so robust, any errors or false starts from GitHub Copilot are caught by these tools, making the combination of Rust and GitHub Copilot an emerging front-runner in AI-assisted coding.

There are several reasons to consider Rust other than performance. Rust is a modern language that first appeared in 2010. It lacks the baggage that older languages carry, but it’s established enough that we can rest assured it isn’t going anywhere anytime soon. Further, other trends are supporting a hard look at Rust.

Rust was designed from the ground up to support modern computing capabilities, like multi-core threads, that are often “bolted on” to older languages like Python. By designing the language to support these features from the start, Rust can avoid the awkwardness found in many other languages. A great example of how simple multi-core threads are in Rust is the following snippet from the Rust rayon library:

use rayon::prelude::*;
fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter()
         .map(|i| i * i)
         .sum()
}

There are no gimmicks or hacks to the code; the threads “just work” across all the machine cores, and the code is just as readable as Python. 

Likewise, Rust was built with static typing from the start, so the entire toolchain, from the linter to the editor to the compiler, can leverage type information. Rust also makes packaging a breeze: Cargo provides a Python-esque "one obvious way" to install packages.
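
For example, adding a dependency is a single command that records it in Cargo.toml for you (cargo add ships with modern Cargo; the crate and version shown are illustrative):

$ cargo add rayon
# Cargo.toml now includes:
[dependencies]
rayon = "1.10"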

Of course, there are still areas where Python excels. It's fantastic for API documentation and readability in general, and if you need to try out an idea, it's hard to beat an interactive prompt like IPython for exploring a concept. But MLOps is more sensitive to performance requirements than other data science fields, and it depends heavily on software engineering best practices that are better implemented in Rust. A new superset of Python called Mojo might solve many performance and deployment issues in the near future, but it's still in development, while Rust is available in the here and now.

One common objection to the use of Rust is that it doesn't have as large and established an ecosystem as Python does for working with data. But keep in mind that this ecosystem isn't necessarily tuned to the needs of MLOps. In particular, the stack I call #jcpennys (Jupyter, Conda, Pandas, NumPy, Sklearn) comes straight from academia: it is heavyweight and optimized for small data. In academics, there is much to be said for a "God environment" with everything in one spot, but in real-world production MLOps, you don't want extra packages or brittle, hard-to-test tools like notebooks. Meanwhile, the Rust ecosystem is growing. For example, Polars is a performant data frame library.
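
As a taste of the Rust data ecosystem, here is a minimal Polars sketch, assuming the polars crate with its lazy feature enabled (the column names and data are hypothetical):

use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Build a small in-memory DataFrame with the df! macro.
    let df = df!(
        "model"    => &["a", "b", "c"],
        "accuracy" => &[0.91f64, 0.87, 0.95]
    )?;

    // Lazy query: keep only rows with accuracy above 0.9, then materialize.
    let best = df
        .lazy()
        .filter(col("accuracy").gt(lit(0.9)))
        .collect()?;

    println!("{best}");
    Ok(())
}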

Leveling up with Rust, GitHub Copilot, and Codespaces

Let's look at how you can use the GitHub ecosystem to level up to a more robust language in Rust.

All Rust projects can follow this pattern:

  1. Create a new repo using Rust New Project Template.

  2. Create a new Codespace and use it.

  3. Use main.rs to handle the CLI and lib.rs to handle the logic, and import clap in Cargo.toml as shown in this project.

  4. Use cargo init --name 'hello' or whatever you want to call your project.

  5. Put your "ideas" in as comments in Rust to seed GitHub Copilot, i.e., // build an add function

  6. Run make format, i.e., cargo fmt (see the Makefile sketch after this list)

  7. Run make lint, i.e., cargo clippy --quiet

  8. Run the project: cargo run -- --help

  9. Push your changes to allow GitHub Actions to format check, lint check, and other actions like binary deploy.
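
Steps 6 and 7 assume a small Makefile that wraps the standard Cargo commands; a minimal sketch (the target names are illustrative and mirror the steps above):

format:
	cargo fmt

lint:
	cargo clippy --quiet

run:
	cargo run

all: format lint run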

This is an emerging pattern that is ideal for systems programming in Rust, as certain combinations lead to new advances. For example, steel is an alloy of iron and carbon, stronger and harder than iron alone. Similarly, GitHub Copilot's suggestions, combined with a next-generation compiled language like Rust and its ecosystem of formatting, linting, and packaging tools, lead to the computer science equivalent of an alloy: a new, stronger solution to computational problems.

[Flow chart: modern programming with prompt engineering for Rust MLOps and cloud computing]

Here’s an example repository.

A good starting point for a new Rust project is the following pattern:

To run: cargo run -- marco --name "Marco"

Be careful to use the package NAME from Cargo.toml when calling lib.rs, as in:

[package]
name = "hello"

For example, note how the package name `hello` is used to invoke marco_polo, which lives in lib.rs.

lib.rs code:

/* A Marco Polo game. */
/* Accepts a string with a name.
If the name is "Marco", returns "Polo".
If the name is "any other value", it returns "Marco".
*/
pub fn marco_polo(name: &str) -> String {
    if name == "Marco" {
        "Polo".to_string()
    } else {
        "Marco".to_string()
    }
}
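
A natural companion, though not part of the snippet above, is a unit test at the bottom of lib.rs so that cargo test can verify both branches; a minimal sketch:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn marco_returns_polo() {
        assert_eq!(marco_polo("Marco"), "Polo");
    }

    #[test]
    fn other_names_return_marco() {
        assert_eq!(marco_polo("Polo"), "Marco");
    }
}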

main.rs code:

fn main() {
    let args = Cli::parse();
    match args.command {
        Some(Commands::Marco { name }) => {
            println!("{}", hello::marco_polo(&name));
        }
        None => println!("No command was used"),
    }
}
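
The snippet assumes Cli and Commands types derived with clap. A minimal sketch of what those definitions might look like at the top of main.rs (the exact attributes in the author's repo may differ):

use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Option<Commands>,
}

#[derive(Subcommand)]
enum Commands {
    /// Responds "Polo" when given the name "Marco".
    Marco {
        #[arg(long)]
        name: String,
    },
}

With these definitions in place, cargo run -- marco --name "Marco" prints Polo, matching the lib.rs logic above.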

Retrofitting a VW Bug from the 1970s with modern EV technology is a suboptimal financial strategy. Similarly, bolting more and more non-native components onto Python is a suboptimal strategy when you could instead choose a new language where appropriate. Additionally, the old paradigm of mixing C with Python is called into question when a developer can use one language, Rust, to replace both.

In distributed computing, performance does matter, as do cybersecurity, energy usage, and the binary distribution of software. Rust has a lot of compelling use cases for MLOps, and additional examples are in the Rust MLOps repo, as well as a tutorial that includes notes from the Duke cloud computing course teaching Rust with GitHub Copilot.


We shouldn't treat software languages like sports teams we "root for." The pragmatic practitioner looks for tools that efficiently solve problems. Languages like Go and Rust have emerged as solutions for high-performance computing, and Rust, in particular, shines at safety and security, a weakness of languages like C and Python. The slight increase in complexity will pay off for organizations in the form of fewer bugs, more secure code, less toil for developers managing packages and dependencies, and lower compute costs.

As you look around your organization, you're bound to find numerous areas that can benefit from Rust's improved cost profile. Embedding ML models inside command-line tools is a great place to start, and could open up a new world of possibilities for sophisticated, binary-distributed tools. Microsoft has also adopted Rust bindings for the ONNX Runtime, which should increase the likelihood of new embedded solutions emerging in binary command-line tools. Likewise, edge and embedded ML are ideal targets for Rust, since it is an excellent solution for low-memory, lower-energy workloads. Even if you start small, you're bound to find some big wins.

Noah Gift is a distinguished technologist, entrepreneur, educator, and author with expertise in cloud computing, machine learning, and artificial intelligence. Renowned for developing innovative software solutions, Noah has founded multiple technology startups and taught at prestigious institutions, including Duke University, Northwestern University, and the University of California, Berkeley. A prolific author, Noah has written and co-authored books on topics such as cloud-native applications and machine learning workflows, empowering readers with the skills to excel in the evolving tech landscape. His contributions to the field have earned him recognition and respect within the technology community.
