Profiling Python Code

To improve the performance of your applications, you need to conduct some kind of dynamic (program, software, code) analysis, also called profiling, to measure metrics of interest. One key metric for developers is time – where is the code spending most of its time? – because you can the focus on areas, or hotspots, that can be made to run faster.

It might seem obvious, but it is worth noting that if you don’t profile for code optimization, you could flounder all around the code improving sections you think might be bottlenecks. I have watched people spend hours and hours working a particular part of their code when a simple profile showed that portion of the code contributed very little to the overall run time. I admit that I have done this before, focusing on loops that I thought would take the most time, without profiling; however, once I profiled, I found that I had wasted my time and I should focus elsewhere.

Different kinds of profiling (e.g., event-based, statistical, instrumented, simulation), are used in different situations. In this article, I focus on two types: deterministic and statistical. Deterministic profiling captures every computation of the code and produces very accurate profiles, but it can greatly slow down code performance. Although you achieve very good accuracy with the profile, run times are greatly increased, and you have to wonder whether the profiling didn’t adversely affect how the code ran; for example, did the profiling cause the computation bottlenecks to move to a different place in the code?

On the other hand, statistical profiling takes periodic “samples” of the code computations and uses them as representations of the profile of the code. This method usually has very little effect on code performance, so you can get a profile that is very close to the real execution of the code. One has to wonder about the correct time interval to get an accurate profile of the application while not affecting the run time. Usually this means setting the time intervals to smaller and smaller values to capture the profile accurately. If the interval becomes too small, then it almost becomes deterministic profiling, and run time is greatly increased.

On the other hand, if your code takes a long time to execute (e.g., hours or days), deterministic profiling might be impossible because the increase in run time is unacceptable. In this case, statistical profiling is appropriate because of the longer periods of time available to sample performance.

In this article, I focus on profiling Python code, primarily because of a current lack of Python profiling but also because I think profiling Python code, creating functions, and using Numba to then compile these functions for CPUs or GPUs is a good process that improves performance.

To help illustrate some tools you can use to profile Python code, I will use an example of an idealized molecular dynamics (MD) application. I’ll work through some profiling tools and modify the code in a reasonable manner for better profiling. The first, and probably most used and flexible, method I want to mention is “manual” profiling.

Manual Profiling

The manual profiling approach is fairly simple but involves inserting timing points into your code. Timing points surround a section of code and collect the total elapsed time(s) for the section, as well as how many times the section is executed. From this information you can calculate an average elapsed time. The timing points can be spread throughput the code, so you get an idea of how much time each section of the code takes. The elapsed times are printed at the end of execution, to give you an idea of where you should focus your efforts to improve performance.

A key advantage of this approach is its generally low overhead. Additionally, you can control what portions of the code are timed (you don’t have to profile the entire code). A downside is that you have to instrument your code by inserting timing points throughout. However, inserting these points is not difficult.

One easy way to accomplish this uses the Python time module. Simple code from an article on the Better Programming website (example 16) is shown in Listing 1. The code simply calls the current time before and after a section of code of interest. The difference is elapsed time, or the amount of time needed to execute that section of code.

Listing 1: Time to Execute

import time
 
start_time = time.time()
# Code to check follows
a, b = 1,2
c = a + b
# Code to check ends
end_time = time.time()
time_taken = (end_time- start_time)
 
print(" Time taken in seconds: {0} s").format(time_taken_in_micro)

If a section of code is called repeatedly, just sum the elapsed times for the section and sum the number of times that section is used; then, you can compute the average time through the code section. If the number of calls is large enough, you can do some quick descriptive statistics and compute the mean, median, variance, min, max, and deviations.

cProfile

cProfile is a deterministic profiler for Python and is recommended “… for most users.” In general terms, it creates a set of statistics that lists the total time spent in certain parts of the code, as well as how often the portion of the code was called.

cProfile, as the name hints, is written in C as a Python extension and comes in the standard Python 3, which keeps the overhead low, so the profiler doesn’t affect the amount of time much.

cProfile outputs a few stats about the test code:

  • ncalls – number of calls to the portion of code
  • tottime– total time spent in the given function (excludes time made in calls to subfunctions)
  • percall – tottime divided by ncalls
  • cumtime – cumulative time spent in the specific function, including all subfunctions
  • percallcumtime divided by ncalls

cProfile also outputs the file name of the code, in case multiple file are involved, as well as the line number of the function (lineno).

Running cProfile is fairly simple:

$ python -m cProfile -s cumtime script.py

The first part of the command tells Python to use the cProfile module. The output from cProfile is sorted (-s) by cumtime (cumulative time). The last option on the command line is the Python code of interest. cProfile also has an option (-o) to send the stats to an output file instead of stdout. Listing 2 is a sample of the first few lines from cProfile on a variation of the MD code.

Listing 2: cProfile Output

Thu Nov  7 08:09:57 2019
         12791143 function calls (12788375 primitive calls) in 156.745 seconds
 
   Ordered by: cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    148/1    0.001    0.000  156.745  156.745 {built-in method builtins.exec}
        1  149.964  149.964  156.745  156.745 md_002.py:3()
 12724903    3.878    0.000    3.878    0.000 {built-in method builtins.min}
        1    2.649    2.649    2.727    2.727 md_002.py:10(init)
       50    0.168    0.003    0.168    0.003 md_002.py:127(update)
    175/2    0.001    0.000    0.084    0.042 :978(_find_and_load)
...