
An awesome collection about micrograd & friends - the tiny, built-from-scratch (about 100 lines of code) autograd engine w/ a PyTorch-like neural net library on top





These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.

This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leafs (usually the input data and the neural network parameters), build up a computational graph with operations like + and * that mix them, and the graph ends with a single value at the very end (the loss). You then go backwards through the graph applying chain rule at each node to calculate the gradients. The gradients tell you how to nudge your parameters to decrease the loss (and hence improve your network).

Sometimes when things get too complicated, I come back to this code and just breathe a little. But ok ok you also do have to know what the computational graph should be (e.g. MLP -> Transformer), what the loss function should be (e.g. autoregressive/diffusion), how to best use the gradients for a parameter update (e.g. SGD -> AdamW) etc etc. But it is the core of what is mostly happening.

The 1986 paper from Rumelhart, Hinton, Williams that popularized and used this algorithm (backpropagation) for training neural nets, micrograd on Github and my (now somewhat old) YouTube video where I very slowly build and explain.

-- Andrej Karpathy, June 2024

Awesome Micrograd & Friends

Yes, you can! Build your own auto(matic) grad(ient) engine using reverse-mode auto(matic) diff(erentiation) from scratch. Bonus - add a PyTorch-like neural network library on top.

Official Micrograd Versions by Andrej Karpathy

Genesis @

A tiny Autograd engine (with a bite! :)). Implements backpropagation (reverse-mode autodiff) over a dynamically built DAG and a small neural networks library on top of it with a PyTorch-like API. Both are tiny, with about 100 and 50 lines of code respectively. The DAG only operates over scalar values, so e.g. we chop up each neuron into all of its individual tiny adds and multiplies. However, this is enough to build up entire deep neural nets doing binary classification, as the demo notebook shows. Potentially useful for educational purposes.
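To make the idea of backpropagation over a dynamically built DAG of scalars concrete, here is a minimal sketch in Python. It is not micrograd's actual source; the class name `Value` and the attributes `_parents` and `_backward` are assumptions chosen for illustration, and only `+` and `*` are supported.

```python
class Value:
    """A scalar that remembers how it was computed (illustrative sketch)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # leaves have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other and d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule
        # from the output back to the leaves.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
loss = a * b + c + a   # a is used twice, so its gradient accumulates
loss.backward()
print(loss.data, a.grad, b.grad, c.grad)
```

Gradients are accumulated with `+=` so that a value used in several places (like `a` here) receives the sum of the contributions from every path.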

EurekaLabs follow-up started in 2024 (part of LLMs 101) @

Neural Networks: Zero to Hero

Lecture 1: Building micrograd - the spelled-out intro to neural networks and backpropagation

Micrograd Extensions

Micrograd++ / Micrograd Plus Plus (by Parsiad Azimzadeh)

incl. tensor support (via numpy.ndarrays) and GPU support and more


Micrograd CUDA (by Matthieu Le Cauchois)

incl. Micrograd extension with basic 2D tensors and naïve matrix multiplication for MLP, batching, CUDA kernel for matrix multiplication, and more
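As a rough illustration of what moving from scalars to 2D tensors involves, the sketch below implements a naive matrix multiplication and its backward pass for one batched linear layer in plain Python. It is not taken from the project above; the helper names `matmul`, `transpose`, and `matmul_backward` are hypothetical, and the loss is assumed to be the sum of all outputs so that the upstream gradient is all ones.

```python
def matmul(a, b):
    """Naive (n x k) @ (k x m) product on nested lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul_backward(a, b, grad_out):
    """Given d(loss)/d(a @ b), return d(loss)/da and d(loss)/db."""
    grad_a = matmul(grad_out, transpose(b))   # shape of a
    grad_b = matmul(transpose(a), grad_out)   # shape of b
    return grad_a, grad_b


# A batch of 3 inputs with 2 features each, and a 2 -> 3 weight matrix.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]

out = matmul(x, w)                            # forward pass for the whole batch
grad_out = [[1.0] * 3 for _ in range(3)]      # loss = sum of all outputs
grad_x, grad_w = matmul_backward(x, w, grad_out)

print(out)
print(grad_x)
print(grad_w)
```

One matrix product replaces the many individual scalar multiplies and adds of the scalar engine, which is also the step that a GPU kernel can parallelize.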


Tinygrad & Teenygrad (by George Hotz et al)

Tinygrad is a PyTorch-like library with autograd in about 10000 lines of code, incl. GPU support beyond CUDA and much more. It is venture capital backed via tinycorp (with a million dollar investment).

Teenygrad is Tinygrad in about 1000 lines of code.

see and

Micrograd in Language X


Micrograd TS by Oleksii Trekhleb, see - about 200 lines of TypeScript code

Show/Hide Sample
// Inputs x1, x2
const x1 = v(2, { label: 'x1' })
const x2 = v(0, { label: 'x2' })

// Weights w1, w2
const w1 = v(-3, { label: 'w1' })
const w2 = v(1, { label: 'w2' })

// bias of the neuron
const b = v(6.8813735870195432, { label: 'b' })

// x1w1 + x2w2 + b
const x1w1 = x1.mul(w1)
x1w1.label = 'x1w1'

const x2w2 = x2.mul(w2)
x2w2.label = 'x2w2'

const x1w1x2w2 = x1w1.add(x2w2)
x1w1x2w2.label = 'x1w1x2w2'

const n = x1w1x2w2.add(b)
n.label = 'n'

const o = n.tanh()
o.label = 'o'
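The sample above only builds the forward graph of a single tanh neuron. As a hedged sketch of what a backward pass over that graph would compute, the same neuron can be worked through by hand in Python with plain floats and the chain rule; the dictionary `grads` is an illustrative name, not part of any library.

```python
import math

# Same inputs, weights, and bias as in the sample above.
x1, x2 = 2.0, 0.0
w1, w2 = -3.0, 1.0
b = 6.8813735870195432

# Forward pass: n = x1*w1 + x2*w2 + b, o = tanh(n)
n = x1 * w1 + x2 * w2 + b
o = math.tanh(n)

# Backward pass by hand.
# d(tanh(n))/dn = 1 - tanh(n)^2, and n is a plain sum of products,
# so each input's gradient is its partner factor times do/dn.
do_dn = 1.0 - o ** 2
grads = {
    "x1": w1 * do_dn,
    "w1": x1 * do_dn,
    "x2": w2 * do_dn,
    "w2": x2 * do_dn,
    "b": do_dn,
}

print(f"o = {o:.4f}")
for name, grad in grads.items():
    print(f"do/d{name} = {grad:.4f}")
```

The bias is chosen so that the output is about 0.7071 and `do_dn` is about 0.5, which keeps the resulting gradients easy to check by eye.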


C Lang

micrograd.c by Jaward Sesay, see

Show/Hide Sample
Value* a = value_new(-4.0);
Value* b = value_new(2.0);
Value* c = value_add(a, b);
Value* d = value_add(value_mul(a, b), value_pow(b, 3));
c = value_add(c, value_add(c, value_new(1)));
c = value_add(c, value_add(value_add(value_new(1), c), value_neg(a)));
d = value_add(d, value_add(value_mul(d, value_new(2)), value_relu(value_add(b, a))));
d = value_add(d, value_add(value_mul(value_new(3), d), value_relu(value_sub(b, a))));
Value* e = value_sub(c, d);
Value* f = value_pow(e, 2);
Value* g = value_div(f, value_new(2.0));
g = value_add(g, value_div(value_new(10.0), f));

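double tol = 1e-4; // tolerance; unused in this excerpt
printf("g->data: %.6f\n", g->data);

// The backward pass over g that fills in a->grad and b->grad is elided in this sample.
printf("a->grad: %.6f\n", a->grad);
printf("b->grad: %.6f\n", b->grad);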

Go Lang

go-micrograd by Nathan Bary, see

Show/Hide Sample
	x := New(2)
	w := New(0.4) // pretend random init
	y := New(4)

	for k := 0; k < 6; k++ {

		// forward pass
		ypred := Mul(w, x)
		loss := Pow(Sub(ypred, y), New(2))

		// backward pass
		w.Grad = 0 // zero previous gradients
		// (the call that backpropagates from loss and fills w.Grad is elided in this sample)

		// update weights
		w.Data += -0.1 * w.Grad

		fmt.Printf("Iter: %2v, Loss: %.4v, w: %.4v\n",
			k, loss.Data, w.Data)
	}
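For comparison, the same one-parameter training loop can be sketched in Python without any autograd engine. This is an illustration under one assumption: the gradient of the squared error is written out analytically as `2 * (ypred - y) * x` instead of being produced by a backward pass.

```python
x = 2.0
w = 0.4   # pretend random init
y = 4.0
learning_rate = 0.1

for k in range(6):
    # forward pass
    ypred = w * x
    loss = (ypred - y) ** 2

    # backward pass: d(loss)/dw for loss = (w*x - y)^2, derived by hand
    w_grad = 2.0 * (ypred - y) * x

    # update weights (plain gradient descent step)
    w += -learning_rate * w_grad

    print(f"Iter: {k:2d}, Loss: {loss:.4g}, w: {w:.4g}")
```

With these numbers each step shrinks the distance between `w` and the exact solution `w = 2` by a factor of five, so six iterations are enough to get very close.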

Crystal by nogginly, see

Show/Hide Sample
require "micrograd"

alias NNFloat = Float32
alias NNValue = MicroGrad::Value(NNFloat)

a = NNValue[-4]
b = NNValue[2]
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu
d += 3 * d + (b - a).relu
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f

puts "g: #{g}" # prints 24.7041, the outcome of this forward pass
puts "a: #{a}" # prints 138.8338, i.e. the numerical value of dg/da
puts "b: #{b}" # prints 645.5773, i.e. the numerical value of dg/db
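The values quoted in the comments above can be sanity-checked without an autograd engine. The sketch below re-implements the same expression with plain Python floats and estimates both derivatives by central finite differences; the function name `g_of` and the step size `h` are choices made for this illustration.

```python
def relu(x):
    return x if x > 0 else 0.0


def g_of(a, b):
    """Forward pass of the sample expression on plain floats."""
    c = a + b
    d = a * b + b ** 3
    c += c + 1
    c += 1 + c + (-a)
    d += d * 2 + relu(b + a)
    d += 3 * d + relu(b - a)
    e = c - d
    f = e ** 2
    g = f / 2.0
    g += 10.0 / f
    return g


a0, b0 = -4.0, 2.0
h = 1e-5  # finite-difference step

g0 = g_of(a0, b0)
# Central differences approximate dg/da and dg/db at (a0, b0).
dg_da = (g_of(a0 + h, b0) - g_of(a0 - h, b0)) / (2 * h)
dg_db = (g_of(a0, b0 + h) - g_of(a0, b0 - h)) / (2 * h)

print(f"g:     {g0:.4f}")
print(f"dg/da: {dg_da:.4f}")
print(f"dg/db: {dg_db:.4f}")
```

A numerical check like this is a common way to test a new autograd implementation: the gradients from the backward pass should agree with the finite-difference estimates to several decimal places.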


micrograd by Nithin Bekal, see, (rdocs)

Show/Hide Sample
include Micrograd

a =
b =
c =
e = a * b
d = e + c
f =

l = d * f

# Walk through all the values and calculate gradients for them.

backprop by Rick Hull, see, (rdocs)


MicroGrad.jl by Lior Sinai, see

Micrograd Articles

History Corner

Grokking Deep Learning by Andrew W. Trask builds a micrograd-like engine w/ a PyTorch-like library on top in 2019 (!), see

Chapter 13 - Introducing automatic optimization: let's build a deep learning framework.

[...] Introduction to automatic gradient computation (autograd) Previously, you performed backpropagation by hand. Let's make it automatic! [...]

The (code) notebook is free online, see Chapter13 - Intro to Automatic Differentiation - Let's Build A Deep Learning Framework.ipynb.







