Applying OpenMP techniques to Python code.

Pymp – OpenMP-like Python Programming

Ever since Python was created, users have been looking for ways to achieve true parallelism with threads, which the Python global interpreter lock (GIL) prevents. One common approach to getting around the GIL is to run computationally intensive code outside of Python with tools such as Cython and ctypes. You can even use F2PY with compiled Fortran or C functions.

All of the previously mentioned tools “bypass” Python and rely on a compiled language to provide threaded parallelism behind an interface to Python. What is really needed is a way to perform threaded processing, or some other form of multiprocessing, in Python itself. A very interesting tool for this purpose is Pymp, a Python-based method of providing OpenMP-like functionality.

OpenMP

OpenMP employs a few principles in its programming model. The first is that everything takes place in threads. The second is the fork-join model, which comprises parallel regions in which one or more threads can be used (Figure 1).

Figure 1: Illustration of the fork-join model for OpenMP.

Only a single thread (the master thread) exists before the parallel region of the OpenMP program. When the parallel region is encountered, the master thread creates a team of parallel threads. The code in this parallel region is executed in parallel among the various team threads.

When the threads complete their code in the parallel region, they synchronize and terminate, leaving only the master thread. Inside the parallel region, threads typically share data, and all of the threads can access this shared data at the same time.

The process of forking threads in a parallel region, joining the data back to the master thread, and terminating the other threads can occur many times in a single program, although you don’t want to do it too often because of the overhead of creating and destroying threads.
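In Python terms, the fork-join pattern can be sketched with the standard multiprocessing module. This is a rough analogue only: OpenMP forks lightweight threads, whereas this sketch forks processes, which is closer to what Pymp itself does.

```python
import multiprocessing as mp

def work(thread_num, queue):
    # Each member of the "team" executes the parallel region's code.
    queue.put(thread_num * thread_num)

# Fork: the master creates a team of four workers for the parallel region.
ctx = mp.get_context('fork')   # Unix-only start method; forked children
queue = ctx.Queue()            # inherit the parent's memory, so no
team = [ctx.Process(target=work, args=(i, queue))  # __main__ guard is needed
        for i in range(4)]
for member in team:
    member.start()
results = sorted(queue.get() for _ in team)
# Join: the workers synchronize and terminate; only the master remains.
for member in team:
    member.join()
print(results)   # [0, 1, 4, 9]
```

Drain the queue before joining; with large payloads, joining a child that is still blocked on a queue write can deadlock.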

Pymp

Because the goal of Pymp is to bring OpenMP-like functionality to Python, Pymp and OpenMP naturally share some concepts: a single master forks into multiple workers, which share data and then synchronize (join) before the extra workers are destroyed.

As with OpenMP applications, when Pymp Python code hits a parallel region, processes – termed child processes – are forked and are in a state that is nearly the same as the “master process.” Note that these are forked processes and not threads, as is typical with OpenMP applications. As for the shared memory, according to the Pymp website, “… the memory is not copied, but referenced. Only when a process writes into a part of the memory [does] it get its own copy of the corresponding memory region. This keeps the processing overhead low (but of course not as low as for OpenMP threads).”
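This copy-on-write behavior is a property of forked Unix processes in general, so it can be demonstrated without Pymp at all, with just the standard multiprocessing module (a sketch of the underlying mechanism, not Pymp's API):

```python
import multiprocessing as mp

data = [1, 2, 3]              # created in the master before the fork

def worker(queue):
    data.append(99)           # the write triggers a private copy in the child
    queue.put(list(data))

ctx = mp.get_context('fork')  # Unix-only; the child references parent memory
queue = ctx.Queue()
child = ctx.Process(target=worker, args=(queue,))
child.start()
child_view = queue.get()
child.join()

print(child_view)             # [1, 2, 3, 99] -- child saw, then modified
print(data)                   # [1, 2, 3]     -- master's copy is untouched
```

The child reads the master's list without any copying; only its write produces a private copy, so the master's data is unchanged after the join.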

As the parallel region ends (the “join” phase), all of the child processes exit, and only the master process continues. Any data structures from the child processes are synchronized with the master by either shared memory or a manager process and the pickle protocol via the multiprocessing module. This module has an API similar to that of the threading module and supports spawning processes.
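The manager-process path can likewise be sketched with the standard multiprocessing module (again an analogue of the mechanism, not Pymp's own code): a separate manager process owns the data structure, and the children's updates travel to it over the pickle protocol.

```python
import multiprocessing as mp

def worker(idx, shared_list):
    # A proxied call: the appended value is pickled and sent to the
    # manager process, which holds the real list.
    shared_list.append(idx)

ctx = mp.get_context('fork')
with ctx.Manager() as manager:     # the manager process owns the list
    shared_list = manager.list()
    team = [ctx.Process(target=worker, args=(i, shared_list))
            for i in range(4)]
    for member in team:
        member.start()
    for member in team:
        member.join()
    result = sorted(shared_list)   # copy out before the manager shuts down
print(result)                      # [0, 1, 2, 3]
```

Every proxied operation costs a pickle round trip to the manager, which is why shared memory is the cheaper of the two synchronization paths.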

As with OpenMP, Pymp numbers the child processes with the thread_num variable. The master process has thread_num 0.

With OpenMP, you define a parallel region explicitly; in Fortran and C, the regions are delimited by the directives in Listing 1. In Pymp, the with pymp.Parallel() statement plays this role, but there is no dedicated construct for marking individual sections within the region. The Pymp website recommends you use a pymp.range or pymp.xrange loop combined with an if-else statement, which achieves the same expected behavior.

Listing 1: Defining a Parallel Region

Fortran:

!$omp parallel
...
!$omp end parallel

C:

#pragma omp parallel
{
   ...
}

From the website, example code might look like:

import pymp

with pymp.Parallel(4) as p:
  for sec_idx in p.xrange(4):
    if sec_idx == 0:
      p.print('Section 0')
    elif sec_idx == 1:
      p.print('Section 1')
    ...

The with pymp.Parallel(4) as p: statement defines the parallel construct, here with a team of four processes.

As with OpenMP code, you can control various aspects of Pymp code with environment variables. The OpenMP variables begin with OMP; the Pymp versions begin with PYMP. The mapping is straightforward:

  • PYMP_NESTED/OMP_NESTED
  • PYMP_THREAD_LIMIT/OMP_THREAD_LIMIT
  • PYMP_NUM_THREADS/OMP_NUM_THREADS

PYMP_NESTED is a binary: TRUE or FALSE. PYMP_THREAD_LIMIT sets a limit on the number of threads. PYMP_NUM_THREADS is a comma-separated list of the number of threads per nesting level; if only one value is specified, it is used for all levels.
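To illustrate the PYMP_NUM_THREADS semantics, the comma-separated list could be parsed as follows (threads_for_level is a hypothetical helper, not part of Pymp, and falling back to the last value for deeper nesting levels is an assumption):

```python
def threads_for_level(value, level):
    # value: a PYMP_NUM_THREADS-style string, e.g. "4" or "4,2";
    # level: the nesting level, starting at 0 for the outermost region.
    counts = [int(v) for v in value.split(',')]
    if len(counts) == 1:
        return counts[0]            # a single value applies to all levels
    return counts[min(level, len(counts) - 1)]  # assumption: clamp deep levels

print(threads_for_level('4', 2))    # 4 -- one value, used for every level
print(threads_for_level('4,2', 0))  # 4 -- outermost parallel region
print(threads_for_level('4,2', 1))  # 2 -- first nested level
```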

Other aspects to Pymp are explained on the website, including:

  • Schedules
  • Variable scopes
  • Nested loops
  • Exceptions
  • Conditional parallelism
  • Reductions
  • Iterables

This article is too short to cover these topics, but if you are interested, the GitHub site briefly explains them, and you can create some simple examples for further exploration.