v0.16.0

barronalex released this 11 Jul 18:44

Highlights

@mx.custom_function for custom vjp/jvp/vmap transforms
Up to 2x faster Metal GEMV and fast masked GEMV
- benchmarks
Fast hadamard_transform
- benchmarks
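
For readers unfamiliar with the transform behind hadamard_transform, here is a minimal pure-Python reference sketch of a Walsh-Hadamard transform. It is not MLX's Metal kernel and makes no claim about how the library implements it. The function name hadamard_reference is hypothetical, and the power-of-two length restriction, the natural (Sylvester) ordering, and the 1/sqrt(n) normalization are assumptions made for illustration.

```python
import math


def hadamard_reference(values):
    """Unoptimized Walsh-Hadamard transform of a power-of-two-length list.

    Hypothetical helper for illustration only; not part of MLX.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = [float(v) for v in values]
    h = 1
    while h < n:
        # Butterfly step: combine elements that sit h apart in each block.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2

    # Assumed normalization: 1/sqrt(n) makes the transform orthonormal,
    # so applying it twice returns the original input.
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = hadamard_reference(x)
    print(y)
    print(hadamard_reference(y))
```

The butterfly loop does O(n log n) work instead of the O(n^2) cost of multiplying by the full Hadamard matrix, which is the general idea a fast implementation builds on.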

Core

Metal 3.2 support
Reduced CPU binary size
Added quantized GPU ops to JIT
Faster GPU compilation
Added grads for bitwise ops + indexing

Bug Fixes

1D scatter bug
Strided sort bug
Reshape copy bug
Seg fault in mx.compile
Donation condition in compilation
Compilation of Accelerate on iOS
