Helper class for readable parallel mapping.
Parameters: | n_jobs: int, default: 1 :
backend: str or None, default: ‘multiprocessing’ :
verbose: int, optional :
pre_dispatch: {‘all’, integer, or expression, as in ‘3*n_jobs’} :
batch_size: int or ‘auto’, default: ‘auto’ :
temp_folder: str, optional :
max_nbytes int, str, or None, optional, 1M by default :
|
---|
Notes
This object uses the multiprocessing module to compute in parallel the application of a function to many different arguments. The main functionality it brings in addition to using the raw multiprocessing API are (see examples for details):
More readable code, in particular since it avoids constructing list of arguments.
- Easier debugging:
- informative tracebacks even when the error happens on the client side
- using ‘n_jobs=1’ enables to turn off parallel computing for debugging without changing the codepath
- early capture of pickling errors
An optional progress meter.
Interruption of multiprocesses jobs with ‘Ctrl-C’
Flexible pickling control for the communication to and from the worker processes.
Ability to use shared memory efficiently with worker processes for large numpy-based datastructures.
Examples
A simple example:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
Reshaping the output when the function has several return values:
>>> from math import modf
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10))
>>> res, i = zip(*r)
>>> res
(0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5)
>>> i
(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
The progress meter: the higher the value of verbose, the more messages:
>>> from time import sleep
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=2, verbose=5)(delayed(sleep)(.1) for _ in range(10))
[Parallel(n_jobs=2)]: Done 1 out of 10 | elapsed: 0.1s remaining: 0.9s
[Parallel(n_jobs=2)]: Done 3 out of 10 | elapsed: 0.2s remaining: 0.5s
[Parallel(n_jobs=2)]: Done 6 out of 10 | elapsed: 0.3s remaining: 0.2s
[Parallel(n_jobs=2)]: Done 9 out of 10 | elapsed: 0.5s remaining: 0.1s
[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 0.5s finished
Traceback example, note how the line of the error is indicated as well as the values of the parameter passed to the function that triggered the exception, even though the traceback happens in the child process:
>>> from heapq import nlargest
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3))
#...
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
TypeError Mon Nov 12 11:37:46 2012
PID: 12934 Python 2.7.3: /usr/bin/python
...........................................................................
/usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None)
419 if n >= size:
420 return sorted(iterable, key=key, reverse=True)[:n]
421
422 # When key is none, use simpler decoration
423 if key is None:
--> 424 it = izip(iterable, count(0,-1)) # decorate
425 result = _nlargest(n, it)
426 return map(itemgetter(0), result) # undecorate
427
428 # General case, slowest method
TypeError: izip argument #1 must support iteration
___________________________________________________________________________
Using pre_dispatch in a producer/consumer situation, where the data is generated on the fly. Note how the producer is first called a 3 times before the parallel loop is initiated, and then called to generate new data on the fly. In this case the total number of iterations cannot be reported in the progress messages:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> def producer():
... for i in range(6):
... print('Produced %s' % i)
... yield i
>>> out = Parallel(n_jobs=2, verbose=100, pre_dispatch='1.5*n_jobs')(
... delayed(sqrt)(i) for i in producer())
Produced 0
Produced 1
Produced 2
[Parallel(n_jobs=2)]: Done 1 jobs | elapsed: 0.0s
Produced 3
[Parallel(n_jobs=2)]: Done 2 jobs | elapsed: 0.0s
Produced 4
[Parallel(n_jobs=2)]: Done 3 jobs | elapsed: 0.0s
Produced 5
[Parallel(n_jobs=2)]: Done 4 jobs | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 5 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
Methods
__init__([n_jobs, backend, verbose, ...]) | |
debug(msg) | |
Parallel.dispatch | |
dispatch_next() | Dispatch more data for parallel processing |
format(obj[, indent]) | Return the formated representation of the object. |
print_progress() | Display the process of the parallel execution only a fraction of time, controlled by self.verbose. |
retrieve() | |
warn(msg) |