Use loky for multi-platform multiprocessing #72

Merged
merged 3 commits into from
Mar 24, 2020

goerz
Member

@goerz goerz commented Mar 22, 2020

Python's multi-process parallelization tools rely on the pickle protocol for inter-process communication (IPC). Some objects, such as nested functions, anonymous functions (lambdas), or functions defined in a Jupyter notebook, cannot be pickled, which limits the use of parallelization.
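To make the limitation concrete, here is a minimal sketch using only the standard library. The helper names `make_adder` and `can_pickle` are made up for this illustration and are not part of krotov or QuTiP.

```python
import pickle


def make_adder(n):
    """Return a nested function (a closure over n)."""
    def add(x):
        return x + n
    return add


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type depends on the object and Python version.
        return False


print("built-in len:   ", can_pickle(len))
print("lambda:         ", can_pickle(lambda x: x))
print("nested function:", can_pickle(make_adder(1)))
```

The built-in is pickled by reference to its importable name, whereas the lambda and the nested function have no importable name and are rejected.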

The problem is exacerbated on platforms that use "spawn" to create subprocesses, instead of "fork". This is the case on Windows and, since Python 3.8, also on macOS. On Linux, "fork" starts subprocesses such that they inherit the full state of the parent process. Only "output" variables have to be transferred via IPC. In contrast, "spawned" processes start with a clean slate, and all input variables need to be transferred via IPC. Of course, this increases the parallelization overhead significantly (there is not much we can do about that; use Linux for any serious work).
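A quick way to see which situation applies on a given machine is to ask the standard library which start method it uses. This is only a sketch; the helper `inputs_need_ipc` is a hypothetical name introduced here, and it simply encodes the rule described above (anything other than "fork" has to transfer its inputs via IPC).

```python
import multiprocessing


def inputs_need_ipc(start_method):
    """Return True if input variables must be pickled and sent to workers.

    With "fork", workers inherit the parent's state; with any other
    start method they begin from a clean slate.
    """
    return start_method != "fork"


if __name__ == "__main__":
    method = multiprocessing.get_start_method()
    print("available start methods:", multiprocessing.get_all_start_methods())
    print("start method in use:    ", method)
    print("inputs go through IPC:  ", inputs_need_ipc(method))
```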

This affects krotov and the example notebooks in a number of ways:

  • The documentation and examples advise using qutip.parallel.parallel_map, but QuTiP's implementation suffers from the above limitations on systems using "spawn". The existing example notebooks swap parallel_map for serial_map on Windows, but this is a very inelegant solution, and since Python 3.8 it would need to be done on macOS as well.
  • The krotov.parallelization.parallel_map_fw_prop_step may choke depending on the platform and the kind of objects in the objectives.

Both of these will cause either crashes in scripts or indefinite freezes in Jupyter notebooks.

A possible solution to these problems, at least in terms of stability, is to replace the standard library's multiprocessing with the loky library, which uses cloudpickle instead of the standard pickle module for IPC. This enables the transfer of all the "unpickleable" functions that the standard library refuses to handle (at the cost of additional IPC overhead).
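As a rough illustration of the idea (not the parallel_map implementation added in this PR), the sketch below maps a lambda over a few integers with loky's reusable executor. It assumes loky is installed; `squares_with_loky` is a hypothetical helper name.

```python
def squares_with_loky(n, max_workers=2):
    """Square the integers 0..n-1 in worker processes, using a lambda."""
    # Imported lazily so that this sketch only requires loky when called.
    from loky import get_reusable_executor

    executor = get_reusable_executor(max_workers=max_workers)
    # The lambda is serialized with cloudpickle; the standard library's
    # multiprocessing.Pool.map would fail to pickle it.
    return list(executor.map(lambda x: x * x, range(n)))


if __name__ == "__main__":
    print(squares_with_loky(5))
```

The same call pattern works regardless of whether the platform forks or spawns its workers, which is the stability benefit described above.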

For qutip.parallel.parallel_map, I had hoped to get a pull request merged that would be able to use loky/cloudpickle for IPC (QuTiP itself has also been running into these issues, qutip/qutip-notebooks#100, qutip/qutip#1202). Unfortunately, my new implementation interacts badly with some of the low-level Cython parts of QuTiP, and it's not clear how soon these problems can be overcome.

Therefore, in this PR, I'm adding my loky-based parallel_map implementation to the krotov package, so that it's available for use in the example notebooks. Also, parallel_map_fw_prop_step is extended with a loky-based alternative implementation. This should make the krotov examples stable on all platforms. Unfortunately, on anything but Linux, the increased IPC overhead will likely outweigh any benefit of parallelization; but it's better to be slow and correct than to crash or freeze the program.

@codecov

codecov bot commented Mar 22, 2020

Codecov Report

Merging #72 into master will increase coverage by 0.25%.
The diff coverage is 97.10%.

@@            Coverage Diff             @@
##           master      #72      +/-   ##
==========================================
+ Coverage   95.85%   96.11%   +0.25%     
==========================================
  Files          13       13              
  Lines        1545     1671     +126     
==========================================
+ Hits         1481     1606     +125     
- Misses         64       65       +1     

Impacted Files                  Coverage Δ
src/krotov/parallelization.py   97.59% <96.22%> (-2.41%) ⬇️
src/krotov/info_hooks.py        94.65% <100.00%> (+0.17%) ⬆️
src/krotov/objectives.py        97.28% <100.00%> (+0.73%) ⬆️
src/krotov/optimize.py          97.64% <100.00%> (+0.09%) ⬆️
src/krotov/propagators.py       85.45% <100.00%> (+0.26%) ⬆️
src/krotov/result.py            93.93% <100.00%> (+0.25%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1b6945f...f44b4c4.

@goerz force-pushed the loky branch 4 times, most recently from 1f99e50 to 94dbfcb (March 23, 2020 21:16)
@goerz
Member Author

goerz commented Mar 23, 2020

Benchmarking on Linux showed that krotov is substantially slowed down by unwanted multithreading within numpy (inside the expm propagator). As part of this PR, the threadpoolctl library is used in krotov.optimize_pulses, krotov.propagators.expm, and throughout the krotov.parallelization module to (optionally) try to eliminate any low-level multithreading.

The documentation explains the problems extensively, and also recommends that users generally set

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1

in their shell to catch any kind of situation that threadpoolctl might miss.
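For scripts or notebooks where editing the shell configuration is inconvenient, the same three variables can be set from Python. This is a sketch under the assumption that the numerical libraries read these variables when they are first loaded, so the call has to happen before numpy (or anything that imports it) is imported; `limit_threads_via_env` is a hypothetical helper name, not part of krotov.

```python
import os

# The same variables as in the shell snippet above.
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def limit_threads_via_env(num_threads=1):
    """Set the thread-count environment variables for this process.

    Child processes started afterwards inherit these values.
    Returns the resulting settings as a dict.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(num_threads)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Call this before importing numpy, scipy, qutip, or krotov.
settings = limit_threads_via_env()
print(settings)
```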

@goerz goerz merged commit 5ff0fc1 into master Mar 24, 2020