Long running NIF, a florilege #17

ndrean · 2024-12-12T11:15:42Z

Winter is almost here. More things to watch

You may prefer to read the ERL NIF documentation first.

Long-running NIFs
As mentioned in the warning text at the beginning of this manual page, it is of vital importance that a native function returns relatively fast. It is difficult to give an exact maximum amount of time that a native function is allowed to work, but usually a well-behaving native function is to return to its caller within 1 millisecond. This can be achieved using different approaches. If you have full control over the code to execute in the native function, the best approach is to divide the work into multiple chunks of work and call the native function multiple times. This is, however, not always possible, for example when calling third-party libraries.

The enif_consume_timeslice() function can be used to inform the runtime system about the length of the NIF call. It is typically always to be used unless the NIF executes very fast.

If the NIF call is too lengthy, this must be handled in one of the following ways to avoid degraded responsiveness, scheduler load balancing problems, and other strange behaviors:

Yielding NIF - If the functionality of a long-running NIF can be split so that its work can be achieved through a series of shorter NIF calls, the application has two options:

Make that series of NIF calls from the Erlang level.
Call a NIF that first performs a chunk of the work, then invokes the enif_schedule_nif function to schedule another NIF call to perform the next chunk. The final call scheduled in this manner can then return the overall result.
Breaking up a long-running function in this manner enables the VM to regain control between calls to the NIFs.

This approach is always preferred over the other alternatives described below. This both from a performance perspective and a system characteristics perspective.
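The chunking idea behind a yielding NIF can be sketched in plain C, outside the BEAM. This is only an illustration under stated assumptions: `sum_job`, `sum_job_step` and `sum_in_chunks` are hypothetical names, and the loop in `sum_in_chunks` stands in for the runtime. In a real NIF each step would be a separate call scheduled with enif_schedule_nif, with the job state carried in the NIF arguments or in a resource.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job state: sum the integers in [next, end). */
typedef struct {
    uint64_t next; /* first value not yet processed */
    uint64_t end;  /* one past the last value to process */
    uint64_t acc;  /* partial result accumulated so far */
} sum_job;

/* Process at most `budget` items, then return so the caller regains control.
 * Returns 1 when the job is finished, 0 when more chunks remain. */
static int sum_job_step(sum_job *job, uint64_t budget)
{
    assert(budget > 0);
    uint64_t stop = job->next + budget;
    if (stop > job->end || stop < job->next) {
        stop = job->end; /* clamp to the end of the job (also guards overflow) */
    }
    for (; job->next < stop; job->next++) {
        job->acc += job->next;
    }
    return job->next == job->end;
}

/* Stand-in for the scheduler: re-invokes the step until the job is done and
 * counts how many short calls were needed. In a NIF, each iteration would be
 * one call scheduled through enif_schedule_nif instead of a local loop. */
static uint64_t sum_in_chunks(uint64_t n, uint64_t budget, size_t *calls)
{
    sum_job job = { 0, n, 0 };
    size_t count = 0;
    do {
        count++;
    } while (!sum_job_step(&job, budget));
    if (calls != NULL) {
        *calls = count;
    }
    return job.acc;
}
```

With `n = 10` and a budget of 4 items per call, the work completes in three short calls rather than one long one. The budget plays the role that enif_consume_timeslice plays in a real NIF: it decides when to stop and hand control back.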

Threaded NIF - This is accomplished by dispatching the work to another thread managed by the NIF library, return from the NIF, and wait for the result. The thread can send the result back to the Erlang process using enif_send. Information about thread primitives is provided below.

From the documentation:

Dirty NIF
A NIF that cannot be split and cannot execute in a millisecond or less is called a "dirty NIF", as it performs work that the ordinary schedulers of the Erlang runtime system cannot handle cleanly. Applications that make use of such functions must indicate to the runtime that the functions are dirty so they can be handled specially. This is handled by executing dirty jobs on a separate set of schedulers called dirty schedulers. A dirty NIF executing on a dirty scheduler does not have the same duration restriction as a normal NIF.

It is important to classify the dirty job correctly. An I/O bound job should be classified as such, and a CPU bound job should be classified as such. If you should classify CPU bound jobs as I/O bound jobs, dirty I/O schedulers might starve ordinary schedulers. I/O bound jobs are expected to either block waiting for I/O, and/or spend a limited amount of time moving data.

To schedule a dirty NIF for execution, the application has two options:

Set the appropriate flags value for the dirty NIF in its ErlNifFunc entry.
Call enif_schedule_nif, pass to it a pointer to the dirty NIF to be executed, and indicate with argument flags whether it expects the operation to be CPU-bound or I/O-bound.
A job that alternates between I/O bound and CPU bound can be reclassified and rescheduled using enif_schedule_nif so that it executes on the correct type of dirty scheduler at all times. For more information see the documentation of the erl command line arguments +SDcpu, and +SDio.
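The two options above might look like the following. This is a minimal sketch, not a reference implementation: the module name `dirty_demo`, the function names and the naive Fibonacci workload are all hypothetical. It needs `erl_nif.h` from an Erlang/OTP installation and is built as a shared library loaded by the BEAM, so it is not a standalone program.

```c
#include <erl_nif.h>

/* Hypothetical CPU-heavy workload standing in for real work. */
static unsigned long fib(unsigned int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* The dirty NIF itself: it may run for much longer than a millisecond. */
static ERL_NIF_TERM slow_fib(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    unsigned int n;
    if (argc != 1 || !enif_get_uint(env, argv[0], &n)) {
        return enif_make_badarg(env);
    }
    return enif_make_ulong(env, fib(n));
}

/* Option 2: a regular NIF that starts on a normal scheduler and hands the
 * work over to a dirty CPU scheduler through enif_schedule_nif. */
static ERL_NIF_TERM slow_fib_dispatch(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    return enif_schedule_nif(env, "slow_fib", ERL_NIF_DIRTY_JOB_CPU_BOUND,
                             slow_fib, argc, argv);
}

static ErlNifFunc nif_funcs[] = {
    /* Option 1: the flags field of the entry marks the NIF as dirty CPU-bound. */
    {"slow_fib", 1, slow_fib, ERL_NIF_DIRTY_JOB_CPU_BOUND},
    /* Regular entry (flags = 0) that reschedules itself as a dirty job. */
    {"slow_fib_dispatch", 1, slow_fib_dispatch, 0}
};

ERL_NIF_INIT(dirty_demo, nif_funcs, NULL, NULL, NULL, NULL)
```

For a job that mostly waits on I/O, the flag would be `ERL_NIF_DIRTY_JOB_IO_BOUND` instead.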

While a process executes a dirty NIF, some operations that communicate with it can take a very long time to complete. Suspend or garbage collection of a process executing a dirty NIF cannot be done until the dirty NIF has returned. Thus, other processes waiting for such operations to complete might have to wait for a very long time. Blocking multi-scheduling, that is, calling erlang:system_flag(multi_scheduling, block), can also take a very long time to complete. This is because all ongoing dirty operations on all dirty schedulers must complete before the block operation can complete.

Many operations communicating with a process executing a dirty NIF can, however, complete while it executes the dirty NIF. For example, retrieving information about it through process_info, setting its group leader, register/unregister its name, and so on.

Termination of a process executing a dirty NIF can only be completed up to a certain point while it executes the dirty NIF. All Erlang resources, such as its registered name and its ETS tables, are released. All links and monitors are triggered. The execution of the NIF is, however, not stopped. The NIF can safely continue execution, allocate heap memory, and so on, but it is of course better to stop executing as soon as possible. The NIF can check whether a current process is alive using enif_is_current_process_alive. Communication using enif_send and enif_port_command is also dropped when the sending process is not alive. Deallocation of certain internal resources, such as process heap and process control block, is delayed until the dirty NIF has completed.
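Since the NIF keeps running after its process has been terminated, a long loop can poll for liveness and bail out early. The sketch below shows that pattern in plain C so it compiles without the BEAM. The names `alive_fn`, `sum_while_alive` and `alive_for_n_polls` are hypothetical, and the function pointer is an assumption standing in for a call to enif_is_current_process_alive inside a real dirty NIF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical liveness probe; a dirty NIF would call
 * enif_is_current_process_alive(env) at this point instead. */
typedef int (*alive_fn)(void *ctx);

/* Sum `data`, polling the probe once every `check_every` items and stopping
 * early when the owner is gone. Stores the partial sum in `*out` and returns
 * the number of items actually processed. */
static size_t sum_while_alive(const int *data, size_t len, size_t check_every,
                              alive_fn is_alive, void *ctx, long *out)
{
    assert(check_every > 0);
    long acc = 0;
    size_t i = 0;
    for (; i < len; i++) {
        if (i % check_every == 0 && !is_alive(ctx)) {
            break; /* nobody is waiting for the result: stop as soon as possible */
        }
        acc += data[i];
    }
    *out = acc;
    return i;
}

/* Test double: reports "alive" for a fixed number of polls, then "dead". */
static int alive_for_n_polls(void *ctx)
{
    int *remaining = ctx;
    if (*remaining <= 0) {
        return 0;
    }
    (*remaining)--;
    return 1;
}
```

Polling every few items rather than on every iteration keeps the check cheap. How often to poll is a trade-off between wasted work after termination and overhead in the hot loop.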

Tackling Dirty Jobs with Erlang's Schedulers

"A key feature of the Erlang virtual machine (VM) is its sophisticated and scalable schedulers, which are responsible for ensuring application processes and tasks take advantage of all system cores. In this talk, Steve will explain some of the inner workings of Erlang's schedulers, focusing on how large-scale applications such as the Riak database augment the VM with native C/C++ code for performance and for application integration. Steve will also present some brand new Erlang features that help "dirty" code cooperate seamlessly with Erlang's schedulers.

Talk objectives:

To explain some inner workings and new features of the Erlang VM, and show how native C/C++ code can be used to safely enhance Erlang applications."

Lukas Larsson - Understanding the Erlang Scheduler

"In Erlang there are different types of concurrent entities, processes, ports etc., each of which can have millions of instances, that have to be mapped out to make optimal usage of the hardware. The Erlang scheduler is a master piece in software engineering, but how does it actually go about scheduling the processes you create in your programs?"

Saša Jurić - How does the scheduler work

Rustler point-of-view

Dirty CPU, Rescheduling, Threads, Dirty Scheduler

Deep dive into the Scheduler

If you are more into reading

ndrean · 2024-12-19T12:05:41Z

Nice post, shedding some light on the concept of a resource and its Zigler translation: an object that represents a block of memory (pointer and length) and that the BEAM marks as garbage collected.
