Profiling Python code

In Profile

Line-by-Line Function Profiling

The useful pprofile analyzes your entire code line by line. It can also do deterministic and statistical profiling. If you want to focus on a specific function within your code, line_profiler and kernprof can help. The line_profiler module performs line-by-line profiling of functions, and the kernprof script allows you to run either line_profiler or standard Python profilers such as cProfile.

To have kernprof run line_profiler, enter,

$ kernprof -l script_to_profile.py

which will produce a binary file, script_to_profile.py.lprof. To "decode" the data, you can enter the command:

$ python3 -m line_profiler   script_to_profile.py.lprof > results.txt

and look at the results.txt file.

To get line_profiler to profile only certain functions, put an @profile decorator before the function declaration. The output is the elapsed time for the routine. The percentage of time, which is something I tend to check first, is relative to the total time for the function (be sure to remember that). The example in Listing 4 is output for some example code discussed in the next section.

Listing 4

Profiling a Function

Total time: 0.365088 s
File: ./md_002.py
Function: update at line 126
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   126                                           @profile
   127                                           def update(d_num, p_num, rmass, dt, pos, vel, acc, force):
   128
   129                                               # Update
   130
   131                                               #  Update positions
   132       200        196.0      1.0      0.1      for i in range(0, d_num):
   133     75150      29671.0      0.4      8.1          for j in range(0, p_num):
   134     75000     117663.0      1.6     32.2              pos[i,j] = pos[i,j] + vel[i,j]*dt + 0.5 * acc[i,j]*dt*dt
   135                                                   # end for
   136                                               # end for
   137
   138                                               #  Update velocities
   139       200         99.0      0.5      0.0      for i in range(0, d_num):
   140     75150      29909.0      0.4      8.2          for j in range(0, p_num):
   141     75000     100783.0      1.3     27.6              vel[i,j] = vel[i,j] + 0.5*dt*( force[i,j] * rmass + acc[i,j] )
   142                                                   # end for
   143                                               # end for
   144
   145                                               #  Update accelerations.
   146       200         95.0      0.5      0.0      for i in range(0, d_num):
   147     75150      29236.0      0.4      8.0          for j in range(0, p_num):
   148     75000      57404.0      0.8     15.7              acc[i,j] = force[i,j]*rmass
   149                                                   # end for
   150                                               # end for
   151
   152        50         32.0      0.6      0.0      return pos, vel, acc

Example Code

To better illustrate the process of using a profiler, I chose some MD Python code with a fair amount of arithmetic intensity that could easily be put into functions. Because I'm not a computational chemist, let me quote from the website: "The computation involves following the paths of particles which exert a distance-dependent force on each other. The particles are not constrained by any walls; if particles meet, they simply pass through each other. The problem is treated as a coupled set of differential equations. The system of differential equation is discretized by choosing a discrete time step. Given the position and velocity of each particle at one time step, the algorithm estimates these values at the next time step. To compute the next position of each particle requires the evaluation of the right hand side of its corresponding differential equation."

Serial Code and Profiling

When you download the Python version of the code, it already has several functions. To better illustrate profiling the code, I converted it to simple serial code and called it md_001.py (Listing 5). Then, I profiled the code with cProfile:

$ python3 -m cProfile -s cumtime md_001.py

Listing 5

md001.py

## MD is the main program for the molecular dynamics simulation.
#
#  Discussion:
#    MD implements a simple molecular dynamics simulation.
#
#    The velocity Verlet time integration scheme is used.
#    The particles interact with a central pair potential.
#
#  Licensing:
#    This code is distributed under the GNU LGPL license.
#  Modified:
#    26 December 2014
#
#  Author:
#    John Burkardt
#
#  Parameters:
#    Input, integer D_NUM, the spatial dimension.
#    A value of 2 or 3 is usual.
#    The default value is 3.
#
#    Input, integer P_NUM, the number of particles.
#    A value of 1000 or 2000 is small but "reasonable".
#    The default value is 500.
#
#    Input, integer STEP_NUM, the number of time steps.
#    A value of 500 is a small but reasonable value.
#    The default value is 500.
#
#    Input, real DT, the time step.
#    A value of 0.1 is large; the system will begin to move quickly but the
#    results will be less accurate.
#    A value of 0.0001 is small, but the results will be more accurate.
#    The default value is 0.1.
#
import platform
from time import clock
import numpy as np
from sys import exit
import time
def timestamp ( ):
    t = time.time ( )
    print ( time.ctime ( t ) )
    return None
# end def
# ===================
# Main section of code
# ====================
timestamp()
print('')
print('MD_TEST')
print('  Python version: %s' % (platform.python_version( ) ))
print('  Test the MD molecular dynamics program.')
# Initialize variables
d_num = 3
p_num = 500
step_num = 50
dt = 0.1
mass = 1.0
rmass = 1.0 / mass
wtime1 = clock( )
#  output:
print('' )
print('MD' )
print('  Python version: %s' % (platform.python_version( ) ) )
print('  A molecular dynamics program.' )
print('' )
print('  D_NUM, the spatial dimension, is %d' % (d_num) )
print('  P_NUM, the number of particles in the simulation is %d.' % (p_num) )
print('  STEP_NUM, the number of time steps, is %d.' % (step_num) )
print('  DT, the time step size, is %g seconds.' % (dt) )
print('' )
print('  At each step, we report the potential and kinetic energies.' )
print('  The sum of these energies should be a constant.' )
print('  As an accuracy check, we also print the relative error' )
print('  in the total energy.' )
print('' )
print('      Step      Potential       Kinetic        (P+K-E0)/E0' )
print('                Energy P        Energy K       Relative Energy Error')
print('')
step_print_index = 0
step_print_num = 10
step_print = 0
for step in range(0, step_num+1):
    if (step == 0):
        # Initialize
        #  Positions
        seed = 123456789
        a = 0.0
        b = 10.0
        i4_huge = 2147483647
        if (seed < 0):
            seed = seed + i4_huge
        # end if
        if (seed == 0):
             print('' )
             print( 'R8MAT_UNIFORM_AB - Fatal error!')
             print('  Input SEED = 0!' )
             sys.ext('R8MAT_UNIFORM_AB - Fatal error!')
        # end if
        pos = np.zeros( (d_num, p_num) )
        for j in range(0, p_num):
            for i in range(0, d_num):
                k = (seed // 127773)
                seed = 16807 * (seed - k * 127773) - k * 2836
                seed = (seed % i4_huge)
                if (seed < 0):
                       seed = seed + i4_huge
                # end if
                pos[i,j] = a + (b - a) * seed * 4.656612875E-10
            # end for
        # end for
        #  Velocities
        vel = np.zeros([ d_num, p_num ])
        #  Accelerations
        acc = np.zeros([ d_num, p_num ])
    else:
        # Update
        #  Update positions
        for i in range(0, d_num):
             for j in range(0, p_num):
                pos[i,j] = pos[i,j] + vel[i,j] * dt + 0.5 * acc[i,j] * dt * dt
            # end for
        # end for
        #  Update velocities
        for i in range(0, d_num):
            for j in range(0, p_num):
                 vel[i,j] = vel[i,j] + 0.5 * dt * ( force[i,j] * rmass + acc[i,j] )
            # end for
        # end for
         #  Update accelerations.
        for i in range(0, d_num):
             for j in range(0, p_num):
                acc[i,j] = force[i,j] * rmass
            # end for
        # end for
    # endif
    # compute force, potential, kinetic
    force = np.zeros([ d_num, p_num ])
    rij = np.zeros(d_num)
    potential = 0.0
    for i in range(0, p_num):
        #  Compute the potential energy and forces.
        for j in range(0, p_num):
            if (i != j):
                #  Compute RIJ, the displacement vector.
                for k in range(0, d_num):
                    rij[k] = pos[k,i] - pos[k,j]
                # end for
                #  Compute D and D2, a distance and a truncated distance.
                d = 0.0
                for k in range(0, d_num):
                    d = d + rij[k] ** 2
                # end for
                d = np.sqrt(d)
                d2 = min(d, np.pi / 2.0)
                #  Attribute half of the total potential energy to particle J.
                potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
                #  Add particle J's contribution to the force on particle I.
                for k in range(0, d_num):
                    force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
                # end for
            # end if
        # end for
    # end for
    #  Compute the kinetic energy
    kinetic = 0.0
    for k in range(0, d_num):
        for j in range(0, p_num):
            kinetic = kinetic + vel[k,j] ** 2
        # end for
     # end for
    kinetic = 0.5 * mass * kinetic
    if (step == 0):
         e0 = potential + kinetic
    # endif
    if (step == step_print):
        rel = (potential + kinetic - e0) / e0
        print('  %8d  %14f  %14f  %14g' % (step, potential, kinetic, rel) )
        step_print_index = step_print_index + 1
        step_print = (step_print_index * step_num) // step_print_num
    #end if
# end step
wtime2 = clock( )
print('')
print('    Elapsed wall clock time = %g seconds.' % (wtime2 - wtime1) )
#  Terminate
print('')
print('MD_TEST')
print('  Normal end of execution.')
timestamp ( )
# end if

Listing 6 is the top of the profile output ordered by cumulative time (cumtime). Notice that the profile output only lists the code itself. Because it doesn't profile the code line by line, it's impossible to learn anything about the code.

Listing 6

cProfile Output

Sat Oct 26 09:43:21 2019
         12791090 function calls (12788322 primitive calls) in 163.299 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    148/1    0.001    0.000  163.299  163.299 {built-in method builtins.exec}
        1  159.297  159.297  163.299  163.299 md_001.py:3()
 12724903    3.918    0.000    3.918    0.000 {built-in method builtins.min}
    175/2    0.001    0.000    0.083    0.042 :978(_find_and_load)
    175/2    0.001    0.000    0.083    0.042 :948(_find_and_load_unlocked)
    165/2    0.001    0.000    0.083    0.041 :663(_load_unlocked)
...

I also used pprofile:

$ pprofile md_001.py

The default options cause the code to run much slower because it is tracking all computations (i.e., it is not sampling), but the code lines relative to the run time still impart some good information (Listing 7). Note that the code ran slower by about a factor of 10. Only the parts of the code with some fairly large percentages of time are shown.

Listing 7

pprofile Output

Command line: ./md_001.py
Total duration: 1510.79s
File: ./md_001.py
File duration: 1510.04s (99.95%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
...
   141|     25551|    0.0946999|  3.70631e-06|  0.01%|    for i in range(0, p_num):
   142|         0|            0|            0|  0.00%|
   143|         0|            0|            0|  0.00%|        #  Compute the potential energy and forces.
   144|  12775500|      47.1989|  3.69449e-06|  3.12%|        for j in range(0, p_num):
   145|  12750000|      47.4793|  3.72387e-06|  3.14%|            if (i != j):
   146|         0|            0|            0|  0.00%|
   147|         0|            0|            0|  0.00%|                #  Compute RIJ, the displacement vector.
   148|  50898000|      194.963|  3.83046e-06| 12.90%|                for k in range(0, d_num):
   149|  38173500|      166.983|  4.37432e-06| 11.05%|                    rij[k] = pos[k,i] - pos[k,j]
   150|         0|            0|            0|  0.00%|                # end for
   151|         0|            0|            0|  0.00%|
   152|         0|            0|            0|  0.00%|                #  Compute D and D2, a distance and a truncated distance.
   153|         0|            0|            0|  0.00%|
   154|  12724500|      46.7333|   3.6727e-06|  3.09%|                d = 0.0
   155|  50898000|      195.426|  3.83956e-06| 12.94%|                for k in range(0, d_num):
   156|  38173500|      165.494|  4.33531e-06| 10.95%|                    d = d + rij[k] ** 2
   157|         0|            0|            0|  0.00%|                # end for
   158|  12724500|      72.0723|  5.66406e-06|  4.77%|                d = np.sqrt(d)
   159|  12724500|      59.0492|  4.64059e-06|  3.91%|                d2 = min(d, np.pi / 2.0)
   160|         0|            0|            0|  0.00%|
   161|         0|            0|            0|  0.00%|                #  Attribute half of the total potential energy to particle J.
   162|  12724500|      76.7099|  6.02852e-06|  5.08%|                potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
   163|         0|            0|            0|  0.00%|
   164|         0|            0|            0|  0.00%|                #  Add particle J's contribution to the force on particle I.
   165|  50898000|      207.158|  4.07005e-06| 13.71%|                for k in range(0, d_num):
   166|  38173500|      228.123|  5.97595e-06| 15.10%|                    force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
   167|         0|            0|            0|  0.00%|                # end for
   168|         0|            0|            0|  0.00%|            # end if
   169|         0|            0|            0|  0.00%|
   170|         0|            0|            0|  0.00%|        # end for
   171|         0|            0|            0|  0.00%|    # end for
   172|         0|            0|            0|  0.00%|
...

The output from pprofile provides an indication of where the code uses the most time:

* The loop computing <C>rij[k]<C>.
* The loop summing <C>d<C> (collective operation).
* Computing the square root of <C>d<C>.
* Computing <C>d2<C>.
* Computing the <C>potential<C> energy.
* The loop computing the <C>force<C> array.

Another option is to put timing points throughout the code, focusing primarily on the section of the code computing potential energy and forces. This code produced the output shown in Listing 8. Notice that the time to compute the potential and force update values is 181.9 seconds with a total time of 189.5 seconds. Obviously, this is where you would need to focus your efforts to improve code performance.

Listing 8

md_001b.py Output

Elapsed wall clock time = 189.526 seconds.
    Total Time for position update =       0.089102 seconds
      Avg Time for position update =       0.001782 seconds
    Total Time for velocity update =       0.073946 seconds
      Avg Time for velocity update =       0.001479 seconds
    Total Time for acceleration update =       0.031308 seconds
      Avg Time for acceleration update =       0.000626 seconds
    Total Time for potential update =     103.215999 seconds
      Avg Time for potential update =       0.000008 seconds
    Total Time for force update =     181.927011 seconds
      Avg Time for force update =       0.000014 seconds
    Total Time for force loop =      75.269342 seconds
      Avg Time for force loop =       0.000006 seconds
    Total Time for rij loop =      25.444300 seconds
      Avg Time for rij loop =       0.000002 seconds
MD_TEST
  Normal end of execution.

« Previous 1 2 3 4 Next »