Parallelizing and memoizing Python programs with Joblib

A Library for Many Jobs

Memory Mapping

Under normal circumstances, mmap_mode='r+' is recommended for enabling memory mapping. In this mode, Memory opens the existing file for both reading and writing. The other modes either open the file read-only (r) or create it, overwriting any existing data (w+). The c (copy-on-write) mode treats the file on disk as read-only, as with r, but keeps new assignments in memory without writing them back. Memory mapping applies to the NumPy arrays that Memory loads from its cache.
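A minimal sketch of memory mapping with Memory, assuming NumPy is installed and that the cache directory is passed as the first positional argument to Memory. The function name squares and the temporary cache directory are hypothetical. The second call loads the cached array as a NumPy memmap backed by the cache file instead of reading it fully into RAM; mode r is used here so the mapped array is read-only.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical throwaway cache directory for this sketch
cache_dir = tempfile.mkdtemp()

# Cached NumPy results are loaded from disk as memory-mapped arrays
memory = Memory(cache_dir, mmap_mode='r', verbose=0)


@memory.cache
def squares(n):
    # Stand-in for an expensive computation that returns a large array
    return np.arange(n, dtype=np.float64) ** 2


first = squares(100_000)   # computed and written to the cache
second = squares(100_000)  # read back from the cache file via mmap

print(type(second).__name__)   # memmap
print(second.flags.writeable)  # False, because mode 'r' is read-only
```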

If disk space matters more to you than time, you can initialize the Memory object with the argument compress=True. This option tells Memory to compress results before saving them to disk; however, compressed results cannot be memory mapped.
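The following sketch, which assumes NumPy and a throwaway temporary cache directory, enables compression and checks that a second call is served from the compressed cache. The function zeros and the helper cache_size_bytes are illustrative names, not part of Joblib.

```python
import os
import tempfile

import numpy as np
from joblib import Memory

cache_dir = tempfile.mkdtemp()  # hypothetical throwaway cache location

# compress=True stores results compressed; because compressed results
# cannot be memory mapped, mmap_mode stays at its default (None)
memory = Memory(cache_dir, compress=True, verbose=0)

calls = []  # records every real execution of the function body


@memory.cache
def zeros(n):
    calls.append(n)
    return np.zeros(n)  # about 8 MB for n = 1,000,000, but very compressible


a = zeros(1_000_000)  # computed, then compressed into the cache
b = zeros(1_000_000)  # served from the compressed cache, body not rerun


def cache_size_bytes(path):
    # Sum of the sizes of all files below the cache directory
    return sum(os.path.getsize(os.path.join(root, name))
               for root, _, files in os.walk(path) for name in files)


print(len(calls), cache_size_bytes(cache_dir))
```

Because an array of zeros compresses extremely well, the cache directory ends up far smaller than the array itself.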

The Memory class can also print status messages. Its verbose constructor argument defaults to 1, which means that a cached function prints a status message every time it has to compute its result from scratch. Setting verbose=0 suppresses these potentially very numerous messages. Higher values tell Memory to report on every call of the function, stating whether the result was loaded from a file or recomputed.
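A small sketch of silencing Memory, assuming only Joblib and the standard library: with verbose=0, nothing is written to standard output on either the first (computed) or the second (cached) call. The function double and the temporary cache directory are hypothetical.

```python
import contextlib
import io
import tempfile

from joblib import Memory

cache_dir = tempfile.mkdtemp()  # hypothetical throwaway cache location

quiet = Memory(cache_dir, verbose=0)  # suppress all status messages


@quiet.cache
def double(x):
    return 2 * x


# Capture anything written to stdout while the cached function runs
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    first = double(21)   # computed from scratch
    second = double(21)  # loaded from the cache

print(repr(buffer.getvalue()), first, second)
```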

Finally, cache() accepts an ignore parameter: a list of function arguments that it disregards during memoization. This is useful if some arguments only affect the screen output but not the function's result. Listing 4 shows the f(x) function with an additional verbose argument whose value has no effect on the return value.

Listing 4

Ignoring Individual Arguments

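from joblib import Memory

# Hypothetical cache directory; without a location, Memory caches nothing
memory = Memory('./joblib_cache')

@memory.cache(ignore=['verbose'])
def f(x, verbose=0):
    # verbose only controls screen output, so it is left out of the cache key
    if verbose > 0:
        print('Running f(x).')
    return x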
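To see the effect of ignore, this self-contained sketch counts how often the body of f actually runs. It uses a temporary cache directory and a hypothetical runs list as the counter; calls that differ only in verbose share one cache entry.

```python
import tempfile

from joblib import Memory

memory = Memory(tempfile.mkdtemp(), verbose=0)  # throwaway cache location

runs = []  # records each real execution of the function body


@memory.cache(ignore=['verbose'])
def f(x, verbose=0):
    runs.append(x)
    if verbose > 0:
        print('Running f(x).')
    return x


f(3, verbose=0)  # computed and cached
f(3, verbose=2)  # same cache entry: verbose is not part of the key
f(4)             # new value of x, so the body runs again
print(runs)      # [3, 4]
```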

On Disk

Joblib also provides two functions for saving and loading Python objects: joblib.dump() and joblib.load(). The Memory class uses them internally, but they also work on their own as a replacement for the serialization mechanisms of Python's pickle module, and they are often more efficient. In particular, Joblib stores large NumPy arrays quickly and compactly.

The joblib.dump() function takes any picklable Python object and a file name as arguments. Without further parameters, it writes the object to the specified file. Calling joblib.load() with the same file name then restores the object:

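import joblib

# Any picklable object works; a small dictionary serves as an example
x = {'values': [1, 2, 3], 'label': 'example'}
joblib.dump(x, 'file')
# ... later, possibly in another program ...
x = joblib.load('file')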
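A slightly fuller round trip, assuming NumPy is installed; the file name model.joblib and the dictionary contents are hypothetical. dump() returns the list of files it wrote, which in current Joblib releases contains just the one file name.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical object: ordinary Python values plus a large NumPy array
model = {
    'name': 'demo',
    'weights': np.arange(250_000, dtype=np.float64).reshape(500, 500),
}

# Hypothetical file name inside a throwaway directory
path = os.path.join(tempfile.mkdtemp(), 'model.joblib')

written = joblib.dump(model, path)  # returns the list of files written
restored = joblib.load(path)

print(written)
print(restored['name'], np.array_equal(restored['weights'], model['weights']))
```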

Like Memory, dump() supports the optional compress parameter. Its value is a number from 0 to 9 that sets the compression level: 0 means no compression at all; 9 uses the least disk space but also takes the most time. In combination with compress, the cache_size argument determines how much memory Joblib uses to compress data quickly before writing it to disk. The value is given in megabytes, but it is merely an estimate that Joblib exceeds if needed, such as when handling very large NumPy arrays. Note that in Joblib 0.10 and later, cache_size is deprecated and has no effect.
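This sketch compares file sizes for several compression levels, assuming NumPy and hypothetical file names in a temporary directory. It leaves out cache_size and relies on load() detecting the compression from the file itself.

```python
import os
import tempfile

import joblib
import numpy as np

data = np.zeros((1000, 1000))  # about 8 MB of highly compressible zeros
tmp = tempfile.mkdtemp()       # throwaway directory for the sketch

sizes = {}
for level in (0, 3, 9):
    # Hypothetical file names; the .joblib extension implies no compressor
    path = os.path.join(tmp, f'data_{level}.joblib')
    joblib.dump(data, path, compress=level)  # 0 disables compression
    sizes[level] = os.path.getsize(path)
    # load() recognizes compressed files without any extra argument
    assert np.array_equal(joblib.load(path), data)

print(sizes)  # level 0 is about the raw array size; 3 and 9 are far smaller
```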

Its counterpart, load(), can also use memory mapping, just like Memory. The mmap_mode argument accepts the same values as in Memory: r+ (read and write), r (read-only), w+ (create or overwrite), and c (copy-on-write: the file stays read-only, and changes are kept in memory). Memory mapping applies to the NumPy arrays stored in the file and has no effect on compressed files.

Prestigious Helper

The value of the Joblib library is hard to overstate. Its intuitive interface solves several common tasks in a flash. Simple parallelization, memoization, and saving and loading objects are problems programmers often encounter in practice, and Joblib offers a convenient solution that leaves you more time to devote to the genuine problems.

Joblib is included in most distributions and can otherwise easily be installed with the Python package management tools Easy Install and Pip, using easy_install joblib or pip install joblib. Installation is quick, because Joblib does not require any packages besides Python itself.

