Releases · ml-explore/mlx

Highlights

Speed improvements:
- Up to 2x faster I/O (benchmarked).
- Faster transposed copies, unary, and binary ops (benchmarked on both CPU and GPU).
Transposed convolutions
Improvements to mx.distributed (send/recv/average_gradients)
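
To make the transposed convolution highlight concrete, here is a minimal pure-Python sketch of a single-channel 1D transposed convolution. It is an illustration only: the helper name `conv_transpose1d_sketch` is hypothetical, it ignores batch and channel dimensions, and it does not claim to match the weight layout or implementation of `mx.conv_transpose1d`.

```python
def conv_transpose1d_sketch(x, w, stride=1, padding=0):
    """Naive single-channel 1D transposed convolution on Python lists.

    Hypothetical helper for illustration; not MLX's implementation.
    """
    k = len(w)
    # Each input sample scatters a scaled copy of the kernel into the output,
    # so the unpadded output has (len(x) - 1) * stride + k samples.
    full_len = (len(x) - 1) * stride + k
    out = [0.0] * full_len
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    # Padding trims `padding` samples from each end of the full output.
    return out[padding:full_len - padding]


x = [1.0, 2.0, 3.0]
w = [1.0, 1.0]
y = conv_transpose1d_sketch(x, w, stride=2)
print(len(y), y)  # 6 [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With `stride=2` the three input samples are upsampled to six output samples, which is the typical use of a transposed convolution as a learnable upsampling step.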

Core

New features:
- mx.conv_transpose{1,2,3}d
- Allow mx.take to work with integer index
- Add std as method on mx.array
- mx.put_along_axis
- mx.cross_product
- int() and float() work on scalar mx.array
- Add optional headers to mx.fast.metal_kernel
- mx.distributed.send and mx.distributed.recv
- mx.linalg.pinv
Performance:
- Up to 2x faster I/O
- Much faster CPU convolutions
- Faster general n-dimensional copies, unary, and binary ops for both CPU and GPU
- Put reduction ops in default stream with async for faster comms
- Overhead reductions in mx.fast.metal_kernel
- Improve donation heuristics to reduce memory use
Misc:
- Support Xcode 16.0
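
As a rough illustration of what a `put_along_axis`-style operation does, the sketch below writes values into a 2D list along the last axis, following the NumPy semantics of the same name. The helper `put_along_last_axis` is hypothetical and works on plain Python lists; it is assumed, not confirmed by these notes, that `mx.put_along_axis` behaves analogously on `mx.array` inputs.

```python
def put_along_last_axis(a, indices, values):
    """Return a copy of 2D list `a` where, for each row i,
    values[i][j] is written at column indices[i][j].

    Hypothetical helper for illustration; not part of MLX.
    """
    out = [row[:] for row in a]  # copy so the input is left untouched
    for i, (idx_row, val_row) in enumerate(zip(indices, values)):
        for col, v in zip(idx_row, val_row):
            out[i][col] = v
    return out


a = [[0, 0, 0], [0, 0, 0]]
result = put_along_last_axis(a, indices=[[2], [0]], values=[[7], [9]])
print(result)  # [[0, 0, 7], [9, 0, 0]]
```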

NN

Faster RNN layers
nn.ConvTranspose{1,2,3}d
mlx.nn.average_gradients data parallel helper for distributed training
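
The idea behind a data-parallel gradient-averaging helper can be sketched without any distributed backend: every worker computes gradients on its own shard of the batch, the gradients are summed across workers (an all-reduce), and the sum is divided by the number of workers. The function `simulate_average_gradients` below is a hypothetical single-process simulation and does not reflect how `mlx.nn.average_gradients` is implemented or called.

```python
def simulate_average_gradients(per_worker_grads):
    """Average matching gradient entries across simulated workers.

    `per_worker_grads` is a list with one dict per worker, mapping a
    parameter name to a flat list of gradient values.
    Hypothetical helper for illustration; not MLX's API.
    """
    world_size = len(per_worker_grads)
    averaged = {}
    for name in per_worker_grads[0]:
        # Sum element-wise over workers (the all-reduce step) ...
        totals = [sum(vals) for vals in zip(*(g[name] for g in per_worker_grads))]
        # ... then divide by the number of workers.
        averaged[name] = [t / world_size for t in totals]
    return averaged


grads = [
    {"weight": [1.0, 2.0], "bias": [0.5]},  # worker 0
    {"weight": [3.0, 6.0], "bias": [1.5]},  # worker 1
]
avg = simulate_average_gradients(grads)
print(avg)  # {'weight': [2.0, 4.0], 'bias': [1.0]}
```

After averaging, every worker applies the same update, which keeps the model replicas in sync.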

Bug Fixes

Fix boolean all reduce bug
Fix extension metal library finding
Fix ternary for large arrays
Make eval just wait if all arrays are scheduled
Fix CPU softmax by removing redundant coefficient in neon_fast_exp
Fix JIT reductions
Fix overflow in quantize/dequantize
Fix compile with byte sized constants
Fix copy in the sort primitive
Fix reduce edge case
Fix slice data size
Throw for certain cases of non-captured inputs in compile
Fix copying scalars by adding fill_gpu
Fix bug in module attribute set, reset, set
Ensure io/comm streams are active before eval
Fix mx.clip
Override class function in Repr so mx.array is not confused with array.array
Avoid using find_library to make install truly portable
Remove fmt dependencies from MLX install
Fix for partition VJP
Avoid command buffer timeout for IO on large arrays

Highlights

mx.einsum
Big speedups in reductions (benchmarked)
2x faster model loading
mx.fast.metal_kernel for custom GPU kernels

Core

Faster program exits
Laplace sampling
mx.nan_to_num
nn.tanh gelu approximation
Fused GPU quantization ops
Faster group norm
bf16 winograd conv
vmap support for mx.scatter
mx.pad "edge" padding
More numerically stable mx.var
mx.linalg.cholesky_inv/mx.linalg.tri_inv
mx.isfinite
Complex mx.sign now mirrors NumPy 2.0 behaviour
More flexible mx.fast.rope
Update to nanobind 2.1
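
For readers unfamiliar with "edge" padding, the sketch below shows the behaviour on a 1D Python list: the border values are repeated outward, as in NumPy's `mode="edge"`. The helper `pad_edge_1d` is hypothetical; it is assumed that `mx.pad` with edge mode produces the analogous result on arrays.

```python
def pad_edge_1d(values, before, after):
    """Pad a non-empty list by repeating its first and last elements.

    Hypothetical helper for illustration; not part of MLX.
    """
    if not values:
        raise ValueError("edge padding needs at least one element")
    return [values[0]] * before + list(values) + [values[-1]] * after


padded = pad_edge_1d([1, 2, 3], before=2, after=1)
print(padded)  # [1, 1, 1, 2, 3, 3]
```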

Bug Fixes

gguf zero initialization
expm1f overflow handling
bfloat16 hadamard
large arrays for various ops
rope fix
bf16 array creation
preserve dtype in nn.Dropout
nn.TransformerEncoder with norm_first=False
excess copies from contiguity bug

Releases: ml-explore/mlx

v0.20.0
v0.19.3
v0.19.2
v0.19.1
v0.19.0
v0.18.1
v0.18.0
v0.17.3
v0.17.1
v0.17.0