Use loky for multi-platform multiprocessing #72
Merged
Python's multi-process parallelization tools rely on the pickle protocol for inter-process communication (IPC). Some objects, such as nested functions, anonymous functions (lambdas), or functions defined in a Jupyter notebook, cannot be pickled, which limits what can be parallelized.
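The limitation is easy to reproduce with the standard library alone. `pickle` stores a function *by reference* (its module and qualified name), so an object without an importable name cannot be serialized. Here is a minimal sketch. `is_picklable` and `make_scaler` are hypothetical helpers for illustration, and the exact exception type raised varies between Python versions:

```python
import math
import pickle


def is_picklable(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version, a function without an
        # importable name fails with PicklingError or AttributeError.
        return False
    return True


def make_scaler(factor):
    """Return a nested function (a closure over `factor`)."""
    def scale(x):
        return factor * x
    return scale


# A module-level function is pickled by reference: unpickling simply
# looks up `math.sqrt` again and returns the very same object.
assert pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt

print(is_picklable(math.sqrt))        # True
print(is_picklable(lambda x: 2 * x))  # False: no importable name
print(is_picklable(make_scaler(2)))   # False: 'make_scaler.<locals>.scale'
```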
The problem is exacerbated on platforms that use "spawn" instead of "fork" to create subprocesses. This is the case on Windows and, since Python 3.8, also on macOS. On Linux, "fork" starts subprocesses that inherit the full state of the parent process, so only "output" variables have to be transferred via IPC. In contrast, "spawned" processes start with a clean slate, and all input variables need to be transferred via IPC. Of course, this increases the parallelization overhead significantly. (There is not much we can do about that; use Linux for any serious work.)
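You can query the standard library's `multiprocessing` module to see which start method a given platform uses, without starting any subprocesses. The small sketch below does this. The printed default depends on the operating system and Python version, and `describe_start_methods` is a hypothetical helper:

```python
import multiprocessing as mp


def describe_start_methods():
    """Return the default start method and all methods available here."""
    # Note: without allow_none=True, get_start_method() also fixes the
    # start method to the default (a later set_start_method() call then
    # needs force=True).
    return mp.get_start_method(), mp.get_all_start_methods()


default, available = describe_start_methods()
print("default start method:", default)
print("available start methods:", available)

# "spawn" is available on every platform, so the Windows/macOS behavior
# can be reproduced elsewhere by requesting a "spawn" context explicitly.
spawn_ctx = mp.get_context("spawn")
print(spawn_ctx.get_start_method())  # spawn
```

A pool created from such a context (e.g. `spawn_ctx.Pool()`) uses "spawn" regardless of the platform default. This makes it a convenient way to surface spawn-related failures on a Linux machine.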
This affects `krotov` and the example notebooks in a number of ways:

* The example notebooks use `qutip.parallel.parallel_map`, but QuTiP's implementation suffers from the above limitations on systems using "spawn". The existing example notebooks replace `parallel_map` with `serial_map` on Windows, but this is a very inelegant solution, and since Python 3.8 this would need to be done on macOS as well.
* `krotov.parallelization.parallel_map_fw_prop_step` may choke depending on the platform and the kind of objects in the `objectives`.

Both of these cause either crashes in scripts or indefinite freezes in Jupyter notebooks.
A possible solution to these problems, at least in terms of stability, is to use the loky library as a replacement for the standard library's `multiprocessing`. loky replaces the `pickle` protocol with `cloudpickle` for IPC. This enables the transfer of all the "unpickleable" functions that the standard library refuses to handle (at the cost of additional IPC overhead).

For `qutip.parallel.parallel_map`, I had hoped to get a pull request merged that would use `loky`/`cloudpickle` for IPC (QuTiP itself has also been running into these issues, qutip/qutip-notebooks#100, qutip/qutip#1202). Unfortunately, my new implementation interacts badly with some of the low-level Cython parts of QuTiP, and it's not clear how soon these problems can be overcome.

Therefore, in this PR, I'm adding my `parallel_map` implementation using `loky` to the `krotov` package, so that it's available for use in the example notebooks. Also, `parallel_map_fw_prop_step` is extended with a `loky`-based alternative implementation. This should make the `krotov` examples stable on all platforms. Unfortunately, on anything but Linux, the increased IPC overhead will likely outweigh any benefit of parallelization; but it's better to be slow and correct than to crash or freeze the program.
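This is a minimal sketch of what `cloudpickle` changes compared to the standard library. It serializes lambdas and closures *by value* (their code and captured variables), and you can load the result with the ordinary `pickle.loads`. The sketch assumes that the third-party `cloudpickle` package is installed. It calls `cloudpickle` directly rather than going through `loky`'s executor, and `make_scaler` is a hypothetical helper:

```python
import pickle

import cloudpickle  # third-party package (pip install cloudpickle)


def make_scaler(factor):
    """Return a nested function (a closure over `factor`)."""
    def scale(x):
        return factor * x
    return scale


# Both objects below are rejected by the standard-library pickle.
closure_payload = cloudpickle.dumps(make_scaler(3))
lambda_payload = cloudpickle.dumps(lambda x: x * x)

# The payloads contain the functions' code and captured values, so the
# receiving side (e.g. a worker process) only needs `cloudpickle` to be
# importable, not the module in which the functions were defined.
triple = pickle.loads(closure_payload)
square = pickle.loads(lambda_payload)

print(triple(14))  # 42
print(square(5))   # 25
```

`loky` applies this kind of serialization to the tasks it sends to its worker processes, which lets lambdas and notebook-defined functions run in parallel. The larger by-value payloads are part of the additional IPC overhead.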