High-Performance Python – Compiled Code and C Interface

Cython

Cython is an optimizing static compiler for Python (Python 2 and Python 3) and an extended programming language based on Pyrex. (Note: You had better be moving away from Python 2 and on to Python 3 pretty quickly because, as the Python wiki says, “Python 2.x is legacy, Python 3.x is the present and future of the language.”) The Pyrex language is used to create Python modules. The Cython language is a superset of Python that also supports calling C functions and declaring C types on variables and class attributes. As a result, with some work, very efficient C code can be generated.

Unlike Numba, which is a JIT compiler, Cython translates the Python code to C ahead of time and compiles it into an extension module that Python can import. In general, the C code compiles with almost any C/C++ compiler, which makes Cython a good tool for compiling Python code that is frequently used but doesn’t change too much.

Cython Examples

Cython can accept almost any valid Python source file to produce C code. Compiling the C code is fairly simple. The first step in using Cython is the easiest: Select the code you want and put it into a separate file. You can have more than one function per file if you like.

The second step is to create the setup.py file, which is like a makefile for Python. It defines which source file you want to compile into a shared library and is where you put any options (e.g., compile options) you want to use. After compiling, be sure to test the code.

Here, I use two examples from a Cython tutorial. The first is a simple Hello World example, and the second is a summation example that uses a loop.

Hello World

The Python code to be compiled in the helloworld.pyx file is

print("Hello World")

which is just about the simplest one-line Python script you can have.

As previously mentioned, you need to create a setup.py file that is really a Python makefile:

from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("helloworld.pyx")
)

The first two lines are fairly standard for a Python setup.py file. After that, the setup() call builds the binary (shared object); in this case, it is told to cythonize the helloworld.pyx file. To make life easier, be sure to put setup.py in the same directory as the code.

The system I used had Ubuntu 18.04 (with updates) and the Anaconda Python distribution. To build the binary, enter

$ python3 setup.py build_ext --inplace

The output is shown in Listing 3.

Listing 3: Binary Build

$ python3 setup.py build_ext --inplace
Compiling helloworld.pyx because it changed.
[1/1] Cythonizing helloworld.pyx
/home/laytonjb/anaconda3/lib/python3.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/laytonjb/HPC-PYTHON-1/helloworld.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'helloworld' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/laytonjb/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/laytonjb/anaconda3/include/python3.7m -c helloworld.c -o build/temp.linux-x86_64-3.7/helloworld.o
gcc -pthread -shared -B /home/laytonjb/anaconda3/compiler_compat -L/home/laytonjb/anaconda3/lib -Wl,-rpath=/home/laytonjb/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/helloworld.o -o /home/laytonjb/HPC-PYTHON-1/helloworld.cpython-37m-x86_64-linux-gnu.so

Note that the command line uses setup.py as the “configuration” for building the binary (shared object). In the output, you will see paths that correspond to the system I used. Don’t worry about this because setup.py takes care of the paths.
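The FutureWarning in Listing 3 appears because no language_level directive is set, so Cython falls back to Python 2 semantics. One way to make the choice explicit, shown here as a minimal sketch, is a directive comment at the very top of helloworld.pyx; the rest of the file is unchanged:

```python
# cython: language_level=3
# The line above is a Cython compiler directive comment. It has to appear
# before any code in the file. Plain Python treats it as an ordinary comment.
print("Hello World")
```

With the directive in place, the source is compiled with Python 3 semantics and the warning should no longer be printed.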

Now I can test the compiled Cython shared object:

>>> import helloworld
Hello World

It worked! These are the basic steps for creating a compiled binary (shared object) of Python code.

Summing

To begin, I’ll take some simple Python code from the Numba example and compute the sum of a one-dimensional array. Although I’m sure better code is out there for computing a sum, this example shows how to handle a more mathematical case.

For this example, a simple function in the sum.pyx file computes the sum:

def sum(x):
  total = 0
  for i in range(x.shape[0]):
    total += x[i]
  return total

The code is compiled the same way as the Hello World code; the only change in setup.py is to cythonize sum.pyx instead of helloworld.pyx. The code in Listing 4 tests the module in a Jupyter notebook.

Listing 4: Summation Test

import sum
import numpy
x = numpy.arange(10_000_000)
%time sum.sum(x)
 
CPU times: user 1.37 s, sys: 0 ns, total: 1.37 s
Wall time: 1.37 s

Notice that sum is the module and sum.sum is the function within it, which means you can put more than one function in your Cython source file. Also notice that the run time is about the same as for the pure Python code itself, which is to be expected: Without C type declarations, the generated C code still works on Python objects. Although you can optimize Cython code by, for example, employing OpenMP, I won’t discuss that here.
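For a baseline to compare against, the same loop can be timed in plain Python. This is a minimal sketch, not the benchmark used for the numbers above: The name py_sum is hypothetical, and it uses a list of 1,000,000 integers rather than a 10,000,000-element NumPy array so that it runs quickly and needs no extra packages. Timings will vary from machine to machine.

```python
import time


def py_sum(x):
    """Pure-Python version of the summation loop (hypothetical name)."""
    total = 0
    for i in range(len(x)):
        total += x[i]
    return total


# Assumption: a plain list stands in for the NumPy array used in Listing 4
data = list(range(1_000_000))

start = time.perf_counter()
result = py_sum(data)
elapsed = time.perf_counter() - start

print(f"py_sum returned {result} in {elapsed:.3f} s")
```

Running the compiled module and this function on the same input gives a like-for-like comparison.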

Ctypes

Cython takes Python code, converts it to C, and compiles it for you, but what if you have existing C code that you want to use in Python like a Python module? This is where ctypes can help.

The ctypes foreign function library provides C-compatible data types and lets you call functions in dynamic link libraries (DLLs) or shared libraries from within Python. In essence, it “wraps” these libraries so they can be called from Python. Because ctypes is part of the Python standard library, you will find it in virtually any Python distribution.
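As a small illustration of the C-compatible data types, the following sketch creates a C int, a fixed-size C array, and a pointer to that array without loading any library. The variable names are made up for this example.

```python
import ctypes

# A single C int; .value converts it back to a Python int
n = ctypes.c_int(42)
print(n.value)

# Multiplying a ctypes type by a length creates a fixed-size C array type
IntArray4 = ctypes.c_int * 4
arr = IntArray4(1, 2, 3, 4)
print(list(arr))

# The array occupies exactly four C ints in memory
print(ctypes.sizeof(arr) == 4 * ctypes.sizeof(ctypes.c_int))

# A pointer to the first element, which is what a C function
# declared with an int * parameter would receive
ptr = ctypes.cast(arr, ctypes.POINTER(ctypes.c_int))
print(ptr[2])
```

These are the same building blocks a wrapper uses when it passes Python data to a C function.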

To use ctypes, you typically start with your C/C++ code and build the shared object as usual. However, be sure to use the position-independent code (PIC) flag and the shared flag (you’ll be building a library). For example, with gcc, you use the -fPIC and -shared options:

$ gcc -fPIC -shared -o libsource.so source.c

It is up to you to compile the C code and create a library using any method you like, as long as you use the -fPIC and -shared options.

Ctypes Example – sum

In the previous example, most of the work is done in the summation, so now I’ll rewrite that routine in C to get better performance. Listing 5 shows an example in C, taken from an online tutorial, that computes the sum and can be built into a library.

Listing 5: C Summation

int sum_function(int num_numbers, int *numbers) {
    int i;
    int sum = 0;
    for (i = 0; i < num_numbers; i++) {
        sum += numbers[i];
    }
    return sum;
}

The function is named sum_function and the source file sum.c. This code can be compiled into a shared object (library) with gcc:

$ gcc -fPIC -shared -o libsum.so sum.c

The compiler creates the shared object libsum.so, a library.

To use the library in Python, a few specific ctypes functions and variables are needed within Python. Because these details can make the calling code a bit complex, I write a “wrapper function” for the library in Python (Listing 6).

Listing 6: Wrapper Function

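import ctypes

# Load the shared library. The leading ./ assumes libsum.so is in the current
# working directory; a bare name is searched only in the system library paths.
_sum = ctypes.CDLL('./libsum.so')
# Declare the C signature: int sum_function(int, int *)
_sum.sum_function.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_int))
_sum.sum_function.restype = ctypes.c_int

def sum_function(numbers):
    num_numbers = len(numbers)
    # Copy the Python sequence, element by element, into a C array of ints
    array_type = ctypes.c_int * num_numbers
    result = _sum.sum_function(ctypes.c_int(num_numbers), array_type(*numbers))
    return int(result)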

Notice that the specific function sum_function is defined. If you have a library with more than one function, you will have to create the interface for each function in this file.
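To illustrate what one interface per function looks like, the sketch below declares two functions from the standard C library instead of libsum.so, so it can be run without compiling anything. It assumes a Unix-like system on which ctypes can locate the C library; the wrapper names c_strlen and c_abs are hypothetical.

```python
import ctypes
import ctypes.util

# Locate and load the standard C library (assumes a Unix-like system)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# One interface declaration per C function
# size_t strlen(const char *s)
libc.strlen.argtypes = (ctypes.c_char_p,)
libc.strlen.restype = ctypes.c_size_t

# int abs(int j)
libc.abs.argtypes = (ctypes.c_int,)
libc.abs.restype = ctypes.c_int


def c_strlen(text):
    """Length in bytes of a Python str, computed by C strlen."""
    return int(libc.strlen(text.encode("utf-8")))


def c_abs(value):
    """Absolute value of an int, computed by C abs."""
    return int(libc.abs(value))


print(c_strlen("hello"), c_abs(-7))
```

Declaring argtypes and restype for every function lets ctypes check and convert arguments instead of guessing.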

Now to test the C function in Python (the import assumes the wrapper from Listing 6 is saved as sum.py):

import sum
import numpy
 
x = numpy.arange(10000000)
%time sum.sum_function(x)
 
CPU times: user 2.15 s, sys: 68.4 ms, total: 2.22 s
Wall time: 2.22 s

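The eagle has landed! It works! However, you might notice that it is slower than even the pure Python code. Most likely the arithmetic intensity is not great enough to show an improvement; the wrapper also converts all 10 million elements into a C array one at a time before the C loop starts, which probably accounts for much of the measured time. Keep in mind, too, that the C function accumulates into an int, so a total this large (49,999,995,000,000) overflows where int is 32 bits, and the run should be read as a timing test rather than a correct sum. Do not let this deter you. It’s always worth trying ctypes if you want or need more performance.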
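One way to see how much time is spent before the C function even starts is to time the conversion step from the wrapper in Listing 6 on its own. The sketch below does only that; it assumes a list of 1,000,000 integers rather than the 10,000,000-element NumPy array, and it does not load libsum.so.

```python
import ctypes
import time

# Assumption: a smaller input than in the test above, to keep the run short
numbers = list(range(1_000_000))
array_type = ctypes.c_int * len(numbers)

start = time.perf_counter()
# Same per-element conversion that the wrapper performs on every call
c_array = array_type(*numbers)
elapsed = time.perf_counter() - start

print(f"converted {len(c_array)} elements in {elapsed:.3f} s")
```

If this step takes a large share of the total time on your system, the cost lies in moving data into C rather than in the C loop itself.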