Ziggy thing: run Zig code in a Livebook #11

Open
ndrean opened this issue Oct 31, 2024 · 24 comments

Comments


ndrean commented Oct 31, 2024

Challenge: Create a Mandelbrot set image with Zig

Introduction

WTF is a Mandelbrot set? 🤯

https://en.wikipedia.org/wiki/Mandelbrot_set

Nice, but how? 😳

https://en.wikipedia.org/wiki/Plotting_algorithms_for_the_Mandelbrot_set

Of course, some explanations follow on how to translate this into real code.

But wait, why? 🧐

  1. Well, the algorithm is "easy" to understand, once you understand what a Mandelbrot set is 😁.

  2. What is interesting is that the computation lends itself extremely well to parallel processing. So Go of course, and maybe Elixir (using Nx?) are candidates. Let's do it in Zig first, then maybe Elixir.

  3. Since we want to produce an image, we also learn to set up and use a Zig library, precisely to produce a PNG file from the data. It turns out that this is the easiest part (but it also happened to be a blocking step).

  4. Can I put this all together and render an image? Let's Zigler it and let Livebook run and display it!


ndrean commented Oct 31, 2024

Preliminaries

Install a Zig version manager:

  • Why not brew?

Because you will use a library that depends on a specific Zig version.
brew gives you "0.13.0" at the time of writing, and the library requires another version.

  1. You can use ZVM to get the latest version on master. Link below:

zvm: https://github.com/tristanisham/zvm
Screenshot 2024-10-31 at 13 34 49

  2. Or you can use zigup. Link below:
    Screenshot 2024-10-31 at 13 47 15

Once installed, you can use zigup to fetch the Zig version required by the library. Link below for "2024100-mach", as requested at the time of writing.

Screenshot 2024-10-31 at 13 48 26

  1. Instantiate a Zig project:
zig init
  2. Pin the dependency to a commit: pick the latest commit of the zigimg repo and use its identifier in the URL below:
zig fetch --save "https://github.com/zigimg/zigimg/archive/5b5d718159c6ec223a54c9bb960690576e5df9c2.tar.gz"
  3. Add the code below to "build.zig":
    const zigimg_dependency = b.dependency("zigimg", .{
        .target = target,
        .optimize = optimize,
    });

    exe.root_module.addImport("zigimg", zigimg_dependency.module("zigimg"));

and you should be good to go.


ndrean commented Oct 31, 2024

Show me some code

The algorithm:

  • instantiate a slice pixels of length, say, 1_000 x 1_000 x 3 = 3_000_000 bytes (u8)
  • loop i over 0..999 (all the rows)
    • loop j over 0..999 (all the columns)
      • compute the coordinate c in the complex plane corresponding to (i,j)
      • compute the number of iterations n (the length of the orbit of c)
      • compute the RGB colours for this n: it is a length 3 array col = [ R(n), G(n), B(n) ]
      • write this array col into pixels at position ( i * 1_000 + j ) * 3
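
A minimal sketch of the indexing step, assuming the buffer is stored row by row (row-major) with 3 bytes per pixel. The helper names pixelOffset and putPixel are made up for illustration; they are not part of the code below.

```zig
const std = @import("std");

/// Byte offset of the RGB triplet of pixel (row, col) in a row-major buffer.
fn pixelOffset(row: usize, col: usize, width: usize) usize {
    return (row * width + col) * 3;
}

/// Write one RGB colour at (row, col).
fn putPixel(pixels: []u8, row: usize, col: usize, width: usize, rgb: [3]u8) void {
    const offset = pixelOffset(row, col, width);
    pixels[offset + 0] = rgb[0]; // R
    pixels[offset + 1] = rgb[1]; // G
    pixels[offset + 2] = rgb[2]; // B
}
```

With a width of 1_000, the pixel at row 1 and column 1 starts at byte (1 * 1_000 + 1) * 3 = 3_003.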

Quick remarks:

  • no advanced Zig has been used below, except maybe the unwrapping of an optional with iter.?.
  • comptime usage [TO BE REVIEWED] (and understood!).
  • you can write inline tests, really TDD style. In fact, it is very short to write: test "something" {...}.
  • Style-guide. [TO BE REVIEWED]
  • you can write documentation using /// and doctests.
  • you can generate documentation directly from the code, provided that it is documented using ///.

All these points above are [WORK IN PROGRESS]

First Zig code
//! Compute Mandelbrot Set in Zig.

const std = @import("std");
const Cx = std.math.Complex(f64);
const zigimg = @import("zigimg");

const print = std.debug.print;

const IMAX: usize = 200;
const RESOLUTION = [_]u32{ 4096, 4096 };

test "complex" {
    const c1 = Cx.init(1.0, 0.0);
    const c2 = Cx.init(0.0, 1.0);
    const c3 = Cx.add(c1, c2);
    try std.testing.expectApproxEqRel(std.math.sqrt2, Cx.magnitude(c3), 1e-4);
    const c4 = Cx.mul(c1, c2);
    try std.testing.expectEqual(c4, c2);
}
/// Compute the square of the norm of a complex number to avoid the square root
fn sqnorm(z: Cx) f64 {
    return z.re * z.re + z.im * z.im;
}

test "sqnorm" {
    const z = Cx{ .re = 2.0, .im = 2.0 };
    try std.testing.expectApproxEqRel(sqnorm(z), Cx.magnitude(z) * Cx.magnitude(z), 1e-4);
}


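/// Computes the number of iterations needed for `z(n+1) = z(n)^2 + c` to escape.
/// The orbit escapes when its squared norm exceeds 4: return the iteration index `j`.
/// If it stays bounded for IMAX iterations, return `null`.
/// Shortcut: points outside the box [-2.1, 0.6] x [-1.2, 1.2] are skipped and also return `null`.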
fn getIter(c: Cx) ?usize {
    if (c.re > 0.6 or c.re < -2.1) return null;
    if (c.im > 1.2 or c.im < -1.2) return null;

    var z = Cx{ .re = 0.0, .im = 0.0 };

    for (0..IMAX) |j| {
        if (sqnorm(z) > 4) return j;
        z = Cx.mul(z, z).add(c);
    }
    return null;
}

test "iter when captured" {
    const c = Cx{ .re = 0.0, .im = 0.0 };
    const iter = getIter(c);
    try std.testing.expect(iter == null);
}
test "iter if escapes" {
    // 0.5 + 0.5i lies inside the bounding box checked by getIter and escapes after a few iterations
    const c = Cx{ .re = 0.5, .im = 0.5 };
    const iter = getIter(c);
    try std.testing.expect(iter != null);
}

/// Creates an RGB array of u8 colours based on the number of iterations.
///
/// The colour is black when the point is captured.
///
/// The brighter the colour, the faster it escapes.
fn createRgb(iter: ?usize) [3]u8 {
    // If it didn't escape, return black
    if (iter == null) return [_]u8{ 0, 0, 0 };

    // Normalize time to [0,1] now that we know it escaped
    const normalized = @as(f32, @floatFromInt(iter.?)) / @as(f32, @floatFromInt(IMAX));

    if (normalized < 0.5) {
        const scaled = normalized * 2;
        return [_]u8{ @as(u8, @intFromFloat(255.0 * (1.0 - scaled))), @as(u8, @intFromFloat(255.0 * (1.0 - scaled / 2))), @as(u8, @intFromFloat(127 + 128 * scaled)) };
    } else {
        const scaled = (normalized - 0.5) * 2.0;
        return [_]u8{ 0, @as(u8, @intFromFloat(127 * (1 - scaled))), @as(u8, @intFromFloat(255.0 * (1.0 - scaled))) };
    }
}

test "createRgb" {
    const iter1 = 0;
    const expected1 = [_]u8{ 255, 255, 127 };
    var result = createRgb(iter1);
    try std.testing.expectEqualSlices(u8, &expected1, &result);

    const iter2 = IMAX / 2;
    const expected2 = [_]u8{ 0, 127, 255 };
    result = createRgb(iter2);
    try std.testing.expectEqualSlices(u8, &expected2, &result);

    const iter3 = IMAX;
    const expected3 = [_]u8{ 0, 0, 0 };
    result = createRgb(iter3);
    try std.testing.expectEqualSlices(u8, &expected3, &result);
}


const Context = struct {
    resolution: [2]u32,
    topLeft: Cx,
    bottomRight: Cx,
};

/// Given an image of size `ctx.resolution`,
/// a complex plane defined by the topLeft and bottomRight,
/// the pixel coordinate in the output image is translated to a complex number
///
/// Example: With an image of size 100x200 and the plane [-1, 1] x [-1, 1],
/// the pixel at (75, 150) should map to 0.5 - 0.5i
fn pixelToComplex(pix: [2]u32, ctx: Context) Cx {
    const w = ctx.bottomRight.re - ctx.topLeft.re;
    const h = ctx.topLeft.im - ctx.bottomRight.im;
    const re = @as(f64, @floatFromInt(pix[0])) / @as(f64, @floatFromInt(ctx.resolution[0])) * w;
    const im = @as(f64, @floatFromInt(pix[1])) / @as(f64, @floatFromInt(ctx.resolution[1])) * h;
    // `re` and `im` are already scaled to the plane: only offset them from the top-left corner
    return Cx{ .re = ctx.topLeft.re + re, .im = ctx.topLeft.im - im };
}

test "pixelToComplex" {
    const topLeft = Cx{ .re = -1, .im = 1 };
    const bottomRight = Cx{ .re = 1, .im = -1 };
    const ctx = Context{ .resolution = .{ 100, 200 }, .topLeft = topLeft, .bottomRight = bottomRight };
    const pix = .{ 75, 150 };
    const expected = Cx{ .re = 0.5, .im = -0.5 };
    const result = pixelToComplex(pix, ctx);
    try std.testing.expect(expected.re == result.re and expected.im == result.im);
}

fn createSlice(ctx: Context, allocator: std.mem.Allocator) ![]u8 {
    const pixels = try allocator.alloc(u8, ctx.resolution[0] * ctx.resolution[1] * 3);
    for (0..ctx.resolution[1]) |y| {
        for (0..ctx.resolution[0]) |x| {
            const c = pixelToComplex(.{ @as(u32, @intCast(x)), @as(u32, @intCast(y)) }, ctx);
            const iter = getIter(c);
            const colour = createRgb(iter);
            const pixel_index = (y * ctx.resolution[0] + x) * 3;
            // copy RGB values to consecutive memory locations
            pixels[pixel_index + 0] = colour[0]; //R
            pixels[pixel_index + 1] = colour[1]; //G
            pixels[pixel_index + 2] = colour[2]; //B

        }
    }
    return pixels;
}

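test "createSlice" {
    // TODO: a meaningful test
    const ctx = Context{ .resolution = .{ 100, 200 }, .topLeft = Cx{ .re = -1, .im = 1 }, .bottomRight = Cx{ .re = 1, .im = -1 } };
    const pixels = try createSlice(ctx, std.testing.allocator);
    defer std.testing.allocator.free(pixels);
    try std.testing.expectEqual(@as(usize, 100 * 200 * 3), pixels.len);
}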

fn writeToPNG(path: []const u8, pixels: []u8, resolution: [2]u32, allocator: std.mem.Allocator) !void {
    const w = resolution[0];
    const h = resolution[1];

    var image = try zigimg.Image.fromRawPixels(allocator, w, h, pixels, .rgb24);
    defer image.deinit();

    try image.writeToFilePath(path, .{ .png = .{} });
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    const topLeft = Cx{ .re = -1, .im = 1 };
    const bottomRight = Cx{ .re = 1, .im = -1 };
    const ctx = Context{ .resolution = RESOLUTION, .topLeft = topLeft, .bottomRight = bottomRight };

    const pixels = try createSlice(ctx, allocator);
    defer allocator.free(pixels);
    try writeToPNG("mandelbrot.png", pixels, RESOLUTION, allocator);
}


ndrean commented Oct 31, 2024

Running zig build run miraculously gives the result below!

A first not so bad result, to be adjusted! 🎉

junk


ndrean commented Oct 31, 2024

Speed this up

Given the symmetry of this figure (if the orbit of the complex number c is bounded, then so is the orbit of its conjugate), we can spare half of the computation. A bit hairy to do...
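
One way to picture the symmetry trick, as a sketch only: compute the top half of the rows, then copy each row onto its mirror row. This assumes a row-major RGB buffer and a viewing window that is symmetric about the real axis (otherwise the mirror rows do not correspond to conjugate points). The name mirrorTopHalf is hypothetical.

```zig
const std = @import("std");

/// Copy each row of the top half onto its mirror row in the bottom half.
/// `pixels` holds `rows * cols * 3` bytes in row-major order.
fn mirrorTopHalf(pixels: []u8, rows: usize, cols: usize) void {
    const row_bytes = cols * 3;
    for (0..rows / 2) |row| {
        const mirror = rows - 1 - row;
        const src = pixels[row * row_bytes ..][0..row_bytes];
        const dst = pixels[mirror * row_bytes ..][0..row_bytes];
        // row < rows / 2 guarantees that src and dst never overlap
        @memcpy(dst, src);
    }
}
```

When the number of rows is odd, the middle row is its own mirror and is left untouched.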

Then we can leverage parallelism since the points are independent.

Two options are evaluated:

  • spawn an OS thread per row. This spawns as many threads as there are rows, far more than the number of available CPUs.
  • divide the number of rows by the number of CPUs to obtain bands. Then spawn an OS thread per band. This means we spawn cpu_count threads only.

It turns out that the first idea is a bad one. The overhead of spawning OS threads is such that you gain nothing at all.

The second option is the more efficient.
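
A sketch of how the rows could be split into bands, one per CPU. It uses a ceiling division plus clamping so that every row lands in exactly one band and no band runs past the last row. The Band struct and bandRange are illustrative names, not taken from the code in this thread.

```zig
const std = @import("std");

const Band = struct { start: usize, end: usize };

/// Rows [start, end) handled by band `index` when `rows` rows are split into `n_bands` bands.
fn bandRange(index: usize, n_bands: usize, rows: usize) Band {
    const per_band = (rows + n_bands - 1) / n_bands; // ceiling division
    const start = @min(index * per_band, rows);
    const end = @min(start + per_band, rows);
    return .{ .start = start, .end = end };
}
```

For 10 rows and 4 bands this gives 0..3, 3..6, 6..9 and 9..10. When there are more bands than rows, the extra bands are simply empty.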

  • Image: 30_000 x 30_000 pixels PNG RGB = 16MB
  • max_iter evaluated at 100 (coarse) and 500 (finer)
| algo | max_iter 100 | max_iter 500 | print PNG |
| --- | --- | --- | --- |
| unthreaded | 9s | 43s | + 100s |
| threaded | 5.5s | 28s | + 100s |

We see that we spend more time producing a PNG file from the data than producing the data itself.


ndrean commented Nov 2, 2024

Long-running NIFs

Erlang doc on NIF: warning

"As mentioned in the warning text at the beginning of this manual page, it is of vital importance that a native function returns relatively fast. It is difficult to give an exact maximum amount of time that a native function is allowed to work, but usually a well-behaving native function is to return to its caller within 1 millisecond. This can be achieved using different approaches. If you have full control over the code to execute in the native function, the best approach is to divide the work into multiple chunks of work and call the native function multiple times."

It seems that 1 million pixels (1_000 x 1_000 pixels) is the limit for the Zig code to return on completion under 1ms.

Recall the algorithm:

  • instantiate a slice pixels of length 1_000 x 1_000 x 3 = 3_000_000 bytes (u8)
  • loop i over 0..499 (we do half of the rows because of the symmetry)
    • loop j over 0..999 (all the columns)
      • compute the coordinate c in the complex plane corresponding to (i,j)
      • compute the number of iterations n (the length of the orbit of c)
      • compute the RGB colours for this n: it is a length 3 array col = [ R(n), G(n), B(n) ]
      • write this array col into pixels at position ( i * 1_000 + j ) * 3
      • write the same array col into pixels at the mirrored position ( (999 - i) * 1_000 + j ) * 3

Zig directly mutates a memory area by writing the computed 3-byte array at the location of (i,j) (and at its "symmetric" location).

What can be done?

  • if we keep only the Zig computation (the 3-byte array) and pass it to the BEAM 1 million times, then the BEAM will probably make 1 million copies of the growing binary we are building along the way, and we cannot use the symmetry.

  • can we consider running the Zig code as a packaged library via the PORT mechanism? This is no longer Zigler, though.

  • use Zigler's concurrency modes!


ndrean commented Nov 2, 2024

Zigler concurrent mode 🎉

This Zig file runs the algorithm above, modified to use OS threads to parallelise the computations (the total number of rows is divided by the number of available CPUs, and each "band" is run in an OS thread).

It exports a function generate_mandelbrot/2.

In this inline code, add: /// nif: generate_mandelbrot/2 Threaded and declare:

use Zig, otp_app: :zigler,
    nifs: [..., generate_mandelbrot: [:threaded]],
    release_mode: :fast

The parameters are the image size in pixels (h x w).

Check Zigler Guides 04-Nifs options on how to set the concurrency mode

Zig threads

```zig
// draw.zig
const beam = @import("beam");
const std = @import("std");
const Cx = std.math.Complex(f64);

const IMAX = 300;
const topLeft = Cx{ .re = -2.0, .im = 1.2 };
const bottomRight = Cx{ .re = 0.6, .im = -1.2 };
const w = bottomRight.re - topLeft.re;
const h = topLeft.im - bottomRight.im;

/// nif: generate_mandelbrot/2 Threaded
pub fn generate_mandelbrot(res_x: usize, res_y: usize) !beam.term {
    const pixels = try beam.allocator.alloc(u8, res_x * res_y * 3);
    defer beam.allocator.free(pixels);

    // no OS thread
    // const res = try createUnthreadedSlice(pixels, res_x, res_y);

    // OS threads
    const res = try createBands(pixels, res_x, res_y);

    return beam.make(res, .{ .as = .binary });
    // return beam.make(res, .{});
}

// with OS THREADS-----------------------------------
fn createBands(pixels: []u8, res_x: usize, res_y: usize) ![]u8 {
    const cpus = try std.Thread.getCpuCount();
    const threads = try beam.allocator.alloc(std.Thread, cpus);
    defer beam.allocator.free(threads);

    // half of the total rows (the other half is filled by symmetry)
    const rows_to_process = res_x / 2 + res_x % 2;
    // one band per CPU; round up so that every row belongs to a band
    const nb_rows_per_band = (rows_to_process + cpus - 1) / cpus;
    std.debug.print("nb_rows_per_band: {}\n", .{nb_rows_per_band});

    for (0..cpus) |cpu_count| {
        // clamp so that the last bands never run past the rows to process
        const start_row = @min(cpu_count * nb_rows_per_band, rows_to_process);
        const end_row = @min(start_row + nb_rows_per_band, rows_to_process);
        const args = .{ res_x, res_y, pixels, start_row, end_row };
        threads[cpu_count] = try std.Thread.spawn(.{}, processRows, args);
    }
    for (threads[0..cpus]) |thread| {
        thread.join();
    }

    return pixels;
}

fn processRow(res_x: usize, res_y: usize, pixels: []u8, row_id: usize) void {
    // Calculate the symmetric row
    const sym_row_id = res_x - 1 - row_id;

    if (row_id <= sym_row_id) {
        // loop over columns
        for (0..res_y) |col_id| {
            const c = pixelToComplex(.{ col_id, row_id }, res_x, res_y);
            const iter = getIter(c);
            const colour = createRgb(iter);

            const p_idx = (row_id * res_y + col_id) * 3;
            pixels[p_idx + 0] = colour[0];
            pixels[p_idx + 1] = colour[1];
            pixels[p_idx + 2] = colour[2];

            // Process the symmetric row (if it's different from current row)
            if (row_id != sym_row_id) {
                const sym_p_idx = (sym_row_id * res_y + col_id) * 3;
                pixels[sym_p_idx + 0] = colour[0];
                pixels[sym_p_idx + 1] = colour[1];
                pixels[sym_p_idx + 2] = colour[2];
            }
        }
    }
}

fn processRows(res_x: usize, res_y: usize, pixels: []u8, start_row: usize, end_row: usize) void {
    for (start_row..end_row) |current_row| {
        processRow(res_x, res_y, pixels, current_row);
    }
}

// shared functions --
// `pix` is { column, row }; res_x is the number of rows and res_y the number of columns
fn pixelToComplex(pix: [2]usize, res_x: usize, res_y: usize) Cx {
    const re = @as(f64, @floatFromInt(pix[0])) / @as(f64, @floatFromInt(res_y)) * w;
    const im = @as(f64, @floatFromInt(pix[1])) / @as(f64, @floatFromInt(res_x)) * h;
    // `re` and `im` are already scaled to the plane: only offset them from the top-left corner
    return Cx{ .re = topLeft.re + re, .im = topLeft.im - im };
}

fn getIter(c: Cx) ?usize {
    if (c.re > 0.6 or c.re < -2.1) return null;
    if (c.im > 1.2 or c.im < -1.2) return null;

    var z = Cx{ .re = 0.0, .im = 0.0 };

    for (0..IMAX) |j| {
        if (sqnorm(z) > 4) return j;
        z = Cx.mul(z, z).add(c);
    }
    return null;
}

fn sqnorm(z: Cx) f64 {
    return z.re * z.re + z.im * z.im;
}

fn createRgb(iter: ?usize) [3]u8 {
    // If it didn't escape, return black
    if (iter == null) return [_]u8{ 0, 0, 0 };

    // Normalize time to [0,1] now that we know it escaped
    const normalized = @as(f64, @floatFromInt(iter.?)) / @as(f64, @floatFromInt(IMAX));

    if (normalized < 0.5) {
        const scaled = normalized * 2;
        return [_]u8{ @as(u8, @intFromFloat(255.0 * (1.0 - scaled))), @as(u8, @intFromFloat(255.0 * (1.0 - scaled / 2))), @as(u8, @intFromFloat(127 + 128 * scaled)) };
    } else {
        const scaled = (normalized - 0.5) * 2.0;
        return [_]u8{ 0, @as(u8, @intFromFloat(127 * (1 - scaled))), @as(u8, @intFromFloat(255.0 * (1.0 - scaled))) };
    }
}
// shared functions---

// No OS THREAD--------------------------------------
fn createUnthreadedSlice(pixels: []u8, res_x: usize, res_y: usize) ![]u8 {
    const rows_to_process = res_x / 2 + res_x % 2;
    for (0..rows_to_process) |current_row| {
        for (0..res_y) |current_col| {
            const c = pixelToComplex(.{ current_col, current_row }, res_x, res_y);
            const iter = getIter(c);
            const colour = createRgb(iter);
            const pixel_index = (current_row * res_y + current_col) * 3;
            // copy RGB values to consecutive memory locations
            pixels[pixel_index + 0] = colour[0]; //R
            pixels[pixel_index + 1] = colour[1]; //G
            pixels[pixel_index + 2] = colour[2]; //B

            const mirror_y = res_x - 1 - current_row;
            if (mirror_y != current_row) {
                const mirror_pixel_index = (mirror_y * res_y + current_col) * 3;
                pixels[mirror_pixel_index + 0] = colour[0]; //R
                pixels[mirror_pixel_index + 1] = colour[1]; //G
                pixels[mirror_pixel_index + 2] = colour[2]; //B
            }
        }
    }
    return pixels;
}
```


Elixir runs this code:

```elixir
defmodule Zigit do
  use Zig,
    otp_app: :zigit,
    nifs: [..., generate_mandelbrot: [:threaded]],
    zig_code_path: "draw.zig"

  def draw do
    t = :os.system_time(:millisecond)
    res_x = 40_000
    res_y = 40_000
    img = generate_mandelbrot(res_x, res_y)
    dbg(byte_size(img))
    dbg(:os.system_time(:millisecond) - t)
    {:ok, vimg} = Vix.Vips.Image.new_from_binary(img, res_x, res_y, 3, :VIPS_FORMAT_UCHAR)
    Vix.Vips.Operation.pngsave(vimg, "priv/zigex.png", compression: 2)
    # Vix.Vips.Image.write_to_file(vimg, "priv/zigex.jpg[Q=90]")
  end
end
```

With: 40_000 x 40_000 pixels and max_iter = 300, I have a processing time of:

  • 258s, and writing time of 2s when the computations do not use OS threads (no band)
  • 123s with similar writing time with OS threads (with band).


ndrean commented Nov 2, 2024

Conclusion

Zig is super fast!


ndrean commented Nov 2, 2024

For the pleasure, at least mine 😬, a zoom into an area 🎉
Black is "stable", yellow unstable fast, and blue unstable slow...

Screenshot 2024-11-02 at 15 18 44


ndrean commented Nov 2, 2024

Livebook this

Program

Run in a Livebook to:

  • click on a frame and visualize the orbit of a point (Zig will compute them and Livebook/Kino/Vega will display them)
  • trigger the computation of this Mandelbrot set (with low resolution)
  • be able to select a zone by clicking on the frame to zoom further in (and launch further computations)

Livebook desktop config...

System.cmd("which", ["zig"])

# {..., 1}

🤔

Livebook desktop path

Let's check Zig:

> which zig
 ~/.zvm/bin/zig

and I check that the Zig path is added to PATH for Livebook to read it.

> cat ~/.livebookdesktop.sh
export PATH="$HOME/.zvm/bin/zig:$PATH" <- ok

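However, Livebook desktop does not find Zig. 🤨 (A possible cause, not verified here: PATH entries are directories, so the entry may need to be "$HOME/.zvm/bin" rather than the path of the binary itself.)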

So I installed Livebook via Escript

mix do local.rebar --force, local.hex --force
mix escript.install hex livebook

Add export PATH="$HOME/.mix/escripts:$PATH" to "~/.zshrc",

then source ~/.zshrc;

and run:

livebook server via the CLI.

This seems better as via the terminal my path is known to the livebook command:

System.cmd("which", ["zig"])

# {"~/.zvm/bin/zig\n", 0}

After these niceties, it seems there is a bug in Livebook/Zigler... #498


ndrean commented Nov 3, 2024

Orbits of Mandelbrot points with Livebook

Run in Livebook

Screenshot 2024-11-14 at 19 22 38


ndrean commented Nov 4, 2024

Pure Elixir version 🎉 in a Livebook (Zigler not ready yet)

I had a discussion with Paulo Valente from Dockyard about some points on Nx. Very interesting.
I will share some code.

Here is an image obtained using only Elixir with Nx and the EXLA backend (much better than the previous version, which was not based on Nx).

This is a 600x600 image with max_iter=30 obtained in approx 47s (on a not so performant machine)
🎉 🚀

image

It is 3 orders of magnitude slower than Zig though (8ms), but the ease of coding and rendering in a Livebook is amazing. Like a Jupyter Notebook.


ndrean commented Nov 7, 2024

Conclusion. 🚀🎉

Run in Livebook

nelsonic (Member) commented Nov 8, 2024

@ndrean Awesome! 😍


ndrean commented Nov 8, 2024

@nelsonic quick question: did you open and try the Livebook? If not, could you do it quickly? I still have doubts whether Zigler works. The "it works for me ™️" syndrome.....

nelsonic (Member) commented Nov 8, 2024

@ndrean ran it on Huggingface 🤗 ✅


ndrean commented Nov 8, 2024

@nelsonic Huggingface ?🤔 Did not know!

❗ Livebook image does not ship with Zig, it does not work (the Zig part).

nelsonic (Member) commented Nov 8, 2024

You can install whatever you want on hf. 🧑‍💻
And sadly, right now I'm on my work-work mac which is locked down ... 🏦 ⏳
Will be picking Zig up in the future on my personal mac and running all your superb examples. ❤️


ndrean commented Nov 11, 2024

My last creation .... 😀. or I should say a Misiurewicz point to be precise. And I stop this!.

Screenshot 2024-11-10 at 19 01 14


ndrean commented Nov 16, 2024

I added a Kino.JS.Live. The data flows between the browser and the Livebook server as binary data, faster for rendering.
I added a zoom in, both with Elixir and with Zig. Fractal images are fascinating I think!

Screenshot 2024-11-16 at 23 56 42

nelsonic (Member) commented

What an awesome rabbit hole! 🐇 🕳️ 😍

Unrelated K8s question: dwyl/learn-devops#99 (comment) ❓ 🤔 💭


ndrean commented Nov 24, 2024

Still drilling the rabbit hole .... ♾️, and turning this into a personal blog 😄

WebAssembly, Zig and LiveView 🚀

My last (?) step was to produce a WASM from Zig and run it in the browser.

Not saying that I know Zig, nor WebAssembly, for sure not, but I managed to get something working.

LiveView rendering WebAssembly

I will also set up a GitHub Pages site to render a standalone WASM: https://ndrean.github.io/zig-assembly-test/

Ok, took me a few hours yesterday skimming through WebAssembly, but learnt a few things. How to compile Zig to WASM? How to adapt the Zig for WebAssembly? How to get a WASM file from Phoenix into WebAssembly? How to run WebAssembly code?

And I understood a little bit about the build.zig setup file: it is the written manifest, run with zig build, of the arguments you would otherwise pass on the command line with zig build-lib -target ... -O ... (like the docker incantations vs a docker compose file).

One point: I am not sure that the WebAssembly code is faster than using "embedded Zig", simply because I used OS threads in the embedded Zig whilst WebAssembly in the browser does not support this (perhaps using Web Workers, maybe one day...). However, you don't need a backend: it runs in the browser (once you loaded it).

A few words about WebAssembly (and Zig)

I found this a bit late, but this repo about WebAssembly and Zig is interesting:
https://github.com/sleibrock/zigtoys/tree/main

The most "surprising" facts are:

  • you can pass only numbers from Javascript to WebAssembly: integers and floats. Nothing else.
  • you can't return anything other than numbers from a WebAssembly function,
  • you can't use try in the exported Zig functions: this means you catch unreachable, so you must be pretty sure that no division by zero can happen, and you must calculate memory usage beforehand.

So how do I return an array? Even before returning an array, the element type in Zig should correspond to the type declared in Javascript. Then, you send the address of its first element, and the size, and let Javascript build its Uint8Array from this. You are really running WebAssembly functions that happen to be produced by Zig.
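
As a rough sketch of the "address plus size" idea, assuming a wasm32-freestanding build and a fixed-size global buffer. The names fillBuffer and bufferLen are invented for this example and are not the ones used in the repo.

```zig
const std = @import("std");

// Assumption: a small fixed buffer living in the module's linear memory.
var buffer: [16]u8 = undefined;

/// Fill the buffer with `value` and return the address of its first element.
export fn fillBuffer(value: u8) [*]u8 {
    @memset(&buffer, value);
    return &buffer;
}

/// Number of bytes available at the address returned by `fillBuffer`.
export fn bufferLen() usize {
    return buffer.len;
}
```

On the Javascript side, the two returned numbers are enough to build a view over the module's memory, along the lines of new Uint8Array(memory.buffer, ptr, len), where memory is the exported WebAssembly.Memory.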

By the way, you don't pub fn ... but you export fn ... in your Zig code, not really documented... In fact, the Zig documentation is rather poor, in short, still in an unstable state.

And there is WASI, with runtimes such as wasmtime.

Screenshot 2024-11-24 at 15 27 11

or this short video if you prefer: WebAssembly outside the web.

Screenshot 2024-11-24 at 15 28 12

A bit more on WebAssembly arcana and running WebAssembly server-side (Node makes it really easy).

Screenshot 2024-11-24 at 15 57 15


ndrean commented Nov 25, 2024

The GitHub pages running Zig compiled into WebAssembly: https://ndrean.github.io/zig-assembly-test/


ndrean commented Nov 26, 2024

You saw me coming....

You write "elixirish webassemblish" code, and use Wasmex (in fact wasmtime via a Rust binding) to run this.

Screenshot 2024-11-26 at 15 15 06

Write WebAssembly with the power of Elixir as your compiler:

  • Use Elixir’s module system to break problems down and then compose them together.
  • Chain function calls together with the pipe |> operator.
  • Publish reusable code with the Hex package manager.
  • Write unit tests using Elixir’s built-in ExUnit.
  • Reduce boilerplate with Elixir’s powerful macro system.
  • Run dynamic Elixir code at compile time e.g. talk to the rest of your Elixir application, call out to an Elixir library, or make network requests.
  • Compile modules on-the-fly e.g. use feature flags to conditionally compile code paths or enable particular WebAssembly instructions, creating a custom “tree shaken” WebAssembly module per user.


ndrean commented Dec 2, 2024

Running WebAssembly in Elixir

  • Orb compiles code written in Elixir syntax into WAT (the WebAssembly text format). You then need a wasmtime-based runtime to execute this file.
  • wasmex can execute WASM from Elixir via wasmtime (included). Check the doc example
  • extism can execute WASM from Elixir via wasmtime too.

To use WebAssembly in Elixir, you would typically:

  • Add :wasmex or :extism to the dependencies.
  • Load a WebAssembly module (.wasm or .wat file) in your Elixir code.
  • Start a WebAssembly instance.
  • Call functions from the WebAssembly module. The code that compiled to WebAssembly exported functions.

WebAssembly and NIF

NIFs are generally faster, with lower latency and faster loading, but WebAssembly is safer because it is sandboxed.

  • NIFs have direct access to Erlang's memory and run directly within the BEAM process.
  • Wasm modules operate within their own memory space, in a sandboxed environment isolated from the BEAM, hence the safety.

NIFs require compilation for specific target platforms, while Wasm modules are portable.

WebAssembly can therefore run edge functions safely.

Example: passing strings to/from Elixir to WebAssembly

A WebAssembly function accepts only numbers, integers or floats.

When calling a WebAssembly function and passing a string, you typically need to pass two parameters:

  • the memory address (pointer) of the string (as an integer)
  • its length (as an integer).

This is because WebAssembly does not have direct access to the host's memory.

You also need to provide a way to allocate memory inside the WebAssembly module (compiled from an allocator in Zig or malloc in C), so that the host can copy the string into it.
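
A sketch of what such exports could look like on the Zig side. Assumptions: std.heap.page_allocator is used for simplicity, and allocBytes, freeBytes and countByte are hypothetical names chosen for this example, not the API of wasmex, extism or the linked repo.

```zig
const std = @import("std");

// Assumption: page_allocator for simplicity; a real module may prefer another allocator.
const allocator = std.heap.page_allocator;

/// Reserve `len` bytes inside the module's memory; the host copies the string there.
/// Returns null when the allocation fails.
export fn allocBytes(len: usize) ?[*]u8 {
    const slice = allocator.alloc(u8, len) catch return null;
    return slice.ptr;
}

/// Release a region previously returned by `allocBytes`.
export fn freeBytes(ptr: [*]u8, len: usize) void {
    allocator.free(ptr[0..len]);
}

/// Example consumer: count the bytes equal to `needle` in the string at (ptr, len).
export fn countByte(ptr: [*]const u8, len: usize, needle: u8) usize {
    var count: usize = 0;
    for (ptr[0..len]) |byte| {
        if (byte == needle) count += 1;
    }
    return count;
}
```

The host would call allocBytes with the string length, write the bytes at the returned address in the module's memory, call countByte with that address and length, then call freeBytes.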

  • Repo: Run Javascript compiled to WASM and Zig compiled to WASM server-side this time.
