Use loky for multi-platform multiprocessing #72

goerz · 2020-03-22T17:08:55Z

Python's multi-process parallelization tools rely on the pickle protocol for inter-process communication (IPC). Some objects, such as nested functions, anonymous functions (lambdas), or functions defined in a Jupyter notebook, cannot be pickled, which limits the use of parallelization.
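A minimal sketch of the pickling limitation, using only the standard library. The helper names `can_pickle` and `make_nested` are made up for illustration and are not part of krotov or QuTiP.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_nested():
    """Return a function defined inside another function."""
    def nested(x):
        return x + 1
    return nested


# Importable functions are pickled by reference (module name + function name).
print(can_pickle(math.sqrt))
# Lambdas and nested functions cannot be looked up by name, so pickling fails.
print(can_pickle(lambda x: x * x))
print(can_pickle(make_nested()))
```

The first check succeeds and the other two fail, which is exactly what prevents such functions from being sent to a worker process.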

The problem is exacerbated on platforms that use "spawn" instead of "fork" to create subprocesses. This is the case on Windows and, since Python 3.8, also on macOS. On Linux, "fork" starts subprocesses such that they inherit the full state of the parent process, so only "output" variables have to be transferred via IPC. In contrast, "spawned" processes start with a clean slate, and all input variables need to be transferred via IPC. This increases the parallelization overhead significantly. There is not much we can do about that, other than to use Linux for any serious work.
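To check which start method a given platform uses by default, something like the following should work. It relies only on the standard library; the two helper functions are hypothetical names chosen for this sketch.

```python
import multiprocessing


def default_start_method():
    """Return the platform's default start method without fixing it globally."""
    # get_all_start_methods() lists the supported methods, default first.
    return multiprocessing.get_all_start_methods()[0]


def inherits_parent_state(method):
    """Only "fork" gives the child a copy of the parent's full state."""
    return method == "fork"


method = default_start_method()
print(method, inherits_parent_state(method))
```

With any method other than "fork", every input (including the function to run) has to pass through IPC and therefore has to be picklable.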

This affects krotov and the example notebooks in a number of ways:

- The documentation and examples advise using qutip.parallel.parallel_map, but QuTiP's implementation suffers from the above limitations on systems using "spawn". The existing example notebooks replace parallel_map with serial_map on Windows, but this is a very inelegant solution, and since Python 3.8 it would have to be done on macOS as well.
- krotov.parallelization.parallel_map_fw_prop_step may choke, depending on the platform and the kind of objects in the objectives.

Both of these will cause either crashes in scripts or indefinite freezes in Jupyter notebooks.

A possible solution to these problems, at least in terms of stability, is to replace the standard library's multiprocessing with the loky library, which uses cloudpickle instead of the pickle protocol for IPC. This enables the transfer of all the "unpickleable" functions that the standard library refuses to handle (at the cost of additional IPC overhead).
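As a rough illustration of the idea (not the implementation added in this PR), a loky executor can map a lambda over some inputs. `get_reusable_executor` is loky's documented entry point; the wrapper name `square_all_with_loky` is made up for this sketch, and loky has to be installed separately.

```python
def square_all_with_loky(values, max_workers=2):
    """Square each value in a loky worker process, using a lambda."""
    # loky is a third-party package (pip install loky); it is imported
    # lazily so that this sketch can be loaded without it.
    from loky import get_reusable_executor

    executor = get_reusable_executor(max_workers=max_workers)
    # loky serializes the lambda with cloudpickle, which pickles it by value.
    return list(executor.map(lambda x: x * x, values))
```

Calling `square_all_with_loky(range(5))` should return the squares as a list, whereas the same lambda passed to a standard-library process pool would fail to pickle.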

For qutip.parallel.parallel_map, I had hoped to get a pull request merged that would be able to use loky/cloudpickle for IPC (QuTiP itself has also been running into these issues, qutip/qutip-notebooks#100, qutip/qutip#1202). Unfortunately, my new implementation interacts badly with some of the low-level Cython parts of QuTiP, and it's not clear how soon these problems can be overcome.

Therefore, in this PR, I'm adding my parallel_map implementation using loky to the krotov package, so that it's available for use in the example notebooks. Also, parallel_map_fw_prop_step is extended with a loky-based alternative implementation. This should make the krotov examples stable on all platforms. Unfortunately, on anything but Linux, the increased IPC overhead will likely outweigh any benefit of parallelization; but it's better to be slow and correct than to crash or freeze the program.

review-notebook-app · 2020-03-22T17:09:02Z

Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes.

codecov · 2020-03-22T17:33:13Z

Codecov Report

Merging #72 into master will increase coverage by 0.25%.
The diff coverage is 97.10%.

@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
+ Coverage   95.85%   96.11%   +0.25%     
==========================================
  Files          13       13              
  Lines        1545     1671     +126     
==========================================
+ Hits         1481     1606     +125     
- Misses         64       65       +1

Impacted Files	Coverage Δ
src/krotov/parallelization.py	`97.59% <96.22%> (-2.41%)`	⬇️
src/krotov/info_hooks.py	`94.65% <100.00%> (+0.17%)`	⬆️
src/krotov/objectives.py	`97.28% <100.00%> (+0.73%)`	⬆️
src/krotov/optimize.py	`97.64% <100.00%> (+0.09%)`	⬆️
src/krotov/propagators.py	`85.45% <100.00%> (+0.26%)`	⬆️
src/krotov/result.py	`93.93% <100.00%> (+0.25%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1b6945f...f44b4c4.

goerz · 2020-03-23T21:26:33Z

Benchmarking on Linux showed that krotov is substantially slowed down by unwanted multi-threading within numpy (inside the expm propagator). As part of this PR, the threadpoolctl library is used in krotov.optimize_pulses, krotov.propagators.expm, and throughout the krotov.parallelization module to (optionally) try to eliminate any low-level multithreading.
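A minimal sketch of how threadpoolctl can restrict low-level threading around a single call. `threadpool_limits` is threadpoolctl's documented context manager; the wrapper `call_single_threaded` is a hypothetical helper for illustration and does not reflect how krotov wires this in internally.

```python
def call_single_threaded(func, *args, **kwargs):
    """Call func with native thread pools (BLAS, OpenMP) limited to one thread."""
    # threadpoolctl is a third-party package (pip install threadpoolctl);
    # it is imported lazily so that this sketch can be loaded without it.
    from threadpoolctl import threadpool_limits

    # The limit applies inside the with-block and is restored on exit.
    with threadpool_limits(limits=1):
        return func(*args, **kwargs)
```

A numerically heavy routine, such as a matrix exponential, could then be run as `call_single_threaded(routine, matrix)`, so that several worker processes do not each spawn a full set of numerical threads.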

The documentation explains the problems extensively, and also recommends that users generally set

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1

in their shell to catch any kind of situation that threadpoolctl might miss.
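Where changing the shell environment is not practical (for instance, in a notebook), the same variables can be set from Python. This is a sketch under the assumption that it runs before numpy or any library built on it is imported, since the numerical libraries typically read these variables when they are loaded. The function name `limit_threads_via_env` is made up for illustration.

```python
import os

# The same variables as in the shell snippet above.
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def limit_threads_via_env(num_threads=1):
    """Set the thread-count variables for this process and its children."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(num_threads)


limit_threads_via_env()
# Import numpy (and anything built on it) only after this point.
```

Subprocesses started afterwards inherit the modified environment, so the limit also applies to parallel workers.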

Add parallel_map

f1236a2

goerz force-pushed the loky branch from 7b1fb9e to 707a7a9 Compare March 22, 2020 17:33

goerz force-pushed the loky branch from 707a7a9 to 79c1e01 Compare March 22, 2020 18:09

Allow to use loky for all parallelization

e6c8f8b

goerz force-pushed the loky branch 4 times, most recently from 1f99e50 to 94dbfcb Compare March 23, 2020 21:16

Use threadpoolctl

f44b4c4

goerz force-pushed the loky branch from 94dbfcb to f44b4c4 Compare March 24, 2020 03:07

goerz merged commit 5ff0fc1 into master Mar 24, 2020
