# Profiling Python Code

#### Example Code

To better illustrate the process of using a profiler, I chose some MD Python code with a fair amount of arithmetic intensity that could easily be put into functions. Because I’m not a computational chemist, let me quote from the website: “The computation involves following the paths of particles which exert a distance-dependent force on each other. The particles are not constrained by any walls; if particles meet, they simply pass through each other. The problem is treated as a coupled set of differential equations. The system of differential equation is discretized by choosing a discrete time step. Given the position and velocity of each particle at one time step, the algorithm estimates these values at the next time step. To compute the next position of each particle requires the evaluation of the right hand side of its corresponding differential equation.”

##### Serial Code and Profiling

When you download the Python version of the code, it already has several functions. To better illustrate profiling the code, I converted it to simple serial code and called it md_001.py (Listing 5). Then, I profiled the code with cProfile:

`\$ python3 -m cProfile -s cumtime md_001.py`

Listing 5: md001.py

```## MD is the main program for the molecular dynamics simulation.
#
#  Discussion:
#    MD implements a simple molecular dynamics simulation.
#
#    The velocity Verlet time integration scheme is used.
#    The particles interact with a central pair potential.
#
#  Licensing:
#    This code is distributed under the GNU LGPL license.
#  Modified:
#    26 December 2014
#
#  Author:
#    John Burkardt
#
#  Parameters:
#    Input, integer D_NUM, the spatial dimension.
#    A value of 2 or 3 is usual.
#    The default value is 3.
#
#    Input, integer P_NUM, the number of particles.
#    A value of 1000 or 2000 is small but "reasonable".
#    The default value is 500.
#
#    Input, integer STEP_NUM, the number of time steps.
#    A value of 500 is a small but reasonable value.
#    The default value is 500.
#
#    Input, real DT, the time step.
#    A value of 0.1 is large; the system will begin to move quickly but the
#    results will be less accurate.
#    A value of 0.0001 is small, but the results will be more accurate.
#    The default value is 0.1.
#

import platform
from time import clock
import numpy as np
from sys import exit
import time

def timestamp ( ):
t = time.time ( )
print ( time.ctime ( t ) )

return None
# end def

# ===================
# Main section of code
# ====================
timestamp()
print('')
print('MD_TEST')
print('  Python version: %s' % (platform.python_version( ) ))
print('  Test the MD molecular dynamics program.')

# Initialize variables
d_num = 3
p_num = 500
step_num = 50
dt = 0.1
mass = 1.0
rmass = 1.0 / mass

wtime1 = clock( )

#  output:
print('' )
print('MD' )
print('  Python version: %s' % (platform.python_version( ) ) )
print('  A molecular dynamics program.' )
print('' )
print('  D_NUM, the spatial dimension, is %d' % (d_num) )
print('  P_NUM, the number of particles in the simulation is %d.' % (p_num) )
print('  STEP_NUM, the number of time steps, is %d.' % (step_num) )
print('  DT, the time step size, is %g seconds.' % (dt) )

print('' )
print('  At each step, we report the potential and kinetic energies.' )
print('  The sum of these energies should be a constant.' )
print('  As an accuracy check, we also print the relative error' )
print('  in the total energy.' )
print('' )
print('      Step      Potential       Kinetic        (P+K-E0)/E0' )
print('                Energy P        Energy K       Relative Energy Error')
print('')

step_print_index = 0
step_print_num = 10
step_print = 0

for step in range(0, step_num+1):
if (step == 0):
# Initialize

#  Positions
seed = 123456789

a = 0.0
b = 10.0

i4_huge = 2147483647

if (seed < 0):
seed = seed + i4_huge
# end if

if (seed == 0):
print('' )
print( 'R8MAT_UNIFORM_AB - Fatal error!')
print('  Input SEED = 0!' )
sys.ext('R8MAT_UNIFORM_AB - Fatal error!')
# end if

pos = np.zeros( (d_num, p_num) )

for j in range(0, p_num):
for i in range(0, d_num):
k = (seed // 127773)

seed = 16807 * (seed - k * 127773) - k * 2836

seed = (seed % i4_huge)

if (seed < 0):
seed = seed + i4_huge
# end if

pos[i,j] = a + (b - a) * seed * 4.656612875E-10
# end for
# end for

#  Velocities
vel = np.zeros([ d_num, p_num ])

#  Accelerations
acc = np.zeros([ d_num, p_num ])
else:
# Update

#  Update positions
for i in range(0, d_num):
for j in range(0, p_num):
pos[i,j] = pos[i,j] + vel[i,j] * dt + 0.5 * acc[i,j] * dt * dt
# end for
# end for

#  Update velocities
for i in range(0, d_num):
for j in range(0, p_num):
vel[i,j] = vel[i,j] + 0.5 * dt * ( force[i,j] * rmass + acc[i,j] )
# end for
# end for

#  Update accelerations.
for i in range(0, d_num):
for j in range(0, p_num):
acc[i,j] = force[i,j] * rmass
# end for
# end for
# endif

# compute force, potential, kinetic
force = np.zeros([ d_num, p_num ])
rij = np.zeros(d_num)

potential = 0.0

for i in range(0, p_num):

#  Compute the potential energy and forces.
for j in range(0, p_num):
if (i != j):

#  Compute RIJ, the displacement vector.
for k in range(0, d_num):
rij[k] = pos[k,i] - pos[k,j]
# end for

#  Compute D and D2, a distance and a truncated distance.

d = 0.0
for k in range(0, d_num):
d = d + rij[k] ** 2
# end for
d = np.sqrt(d)
d2 = min(d, np.pi / 2.0)

#  Attribute half of the total potential energy to particle J.
potential = potential + 0.5 * np.sin(d2) * np.sin(d2)

#  Add particle J's contribution to the force on particle I.
for k in range(0, d_num):
force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
# end for
# end if

# end for
# end for

#  Compute the kinetic energy
kinetic = 0.0
for k in range(0, d_num):
for j in range(0, p_num):
kinetic = kinetic + vel[k,j] ** 2
# end for
# end for

kinetic = 0.5 * mass * kinetic

if (step == 0):
e0 = potential + kinetic
# endif

if (step == step_print):
rel = (potential + kinetic - e0) / e0
print('  %8d  %14f  %14f  %14g' % (step, potential, kinetic, rel) )
step_print_index = step_print_index + 1
step_print = (step_print_index * step_num) // step_print_num
#end if

# end step

wtime2 = clock( )

print('')
print('    Elapsed wall clock time = %g seconds.' % (wtime2 - wtime1) )

#  Terminate

print('')
print('MD_TEST')
print('  Normal end of execution.')

timestamp ( )

# end if```

Listing 6 is the top of the profile output ordered by cumulative time (cumtime). Notice that the profile output only lists the code itself. Because it doesn’t profile the code line-by-line, it’s impossible to learn anything about the code.

Listing 6: cProfile Output

```Sat Oct 26 09:43:21 2019
12791090 function calls (12788322 primitive calls) in 163.299 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
148/1    0.001    0.000  163.299  163.299 {built-in method builtins.exec}
1  159.297  159.297  163.299  163.299 md_001.py:3()
12724903    3.918    0.000    3.918    0.000 {built-in method builtins.min}
175/2    0.001    0.000    0.083    0.042 :978(_find_and_load)
175/2    0.001    0.000    0.083    0.042 :948(_find_and_load_unlocked)
165/2    0.001    0.000    0.083    0.041 :663(_load_unlocked)
...```

I also used pprofile:

`\$ pprofile md_001.py`

The default options cause the code to run much slower because it is tracking all computations (i.e., it is not sampling), but the code lines relative to the run time still impart some good information (Listing 7). Note that the code ran slower by about a factor of 10. Only the parts of the code with some fairly large percentages of time are shown.

Listing 7: pprofile Output

```Command line: ./md_001.py
Total duration: 1510.79s
File: ./md_001.py
File duration: 1510.04s (99.95%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
...
141|     25551|    0.0946999|  3.70631e-06|  0.01%|    for i in range(0, p_num):
142|         0|            0|            0|  0.00%|
143|         0|            0|            0|  0.00%|        #  Compute the potential energy and forces.
144|  12775500|      47.1989|  3.69449e-06|  3.12%|        for j in range(0, p_num):
145|  12750000|      47.4793|  3.72387e-06|  3.14%|            if (i != j):
146|         0|            0|            0|  0.00%|
147|         0|            0|            0|  0.00%|                #  Compute RIJ, the displacement vector.
148|  50898000|      194.963|  3.83046e-06| 12.90%|                for k in range(0, d_num):
149|  38173500|      166.983|  4.37432e-06| 11.05%|                    rij[k] = pos[k,i] - pos[k,j]
150|         0|            0|            0|  0.00%|                # end for
151|         0|            0|            0|  0.00%|
152|         0|            0|            0|  0.00%|                #  Compute D and D2, a distance and a truncated distance.
153|         0|            0|            0|  0.00%|
154|  12724500|      46.7333|   3.6727e-06|  3.09%|                d = 0.0
155|  50898000|      195.426|  3.83956e-06| 12.94%|                for k in range(0, d_num):
156|  38173500|      165.494|  4.33531e-06| 10.95%|                    d = d + rij[k] ** 2
157|         0|            0|            0|  0.00%|                # end for
158|  12724500|      72.0723|  5.66406e-06|  4.77%|                d = np.sqrt(d)
159|  12724500|      59.0492|  4.64059e-06|  3.91%|                d2 = min(d, np.pi / 2.0)
160|         0|            0|            0|  0.00%|
161|         0|            0|            0|  0.00%|                #  Attribute half of the total potential energy to particle J.
162|  12724500|      76.7099|  6.02852e-06|  5.08%|                potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
163|         0|            0|            0|  0.00%|
164|         0|            0|            0|  0.00%|                #  Add particle J's contribution to the force on particle I.
165|  50898000|      207.158|  4.07005e-06| 13.71%|                for k in range(0, d_num):
166|  38173500|      228.123|  5.97595e-06| 15.10%|                    force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
167|         0|            0|            0|  0.00%|                # end for
168|         0|            0|            0|  0.00%|            # end if
169|         0|            0|            0|  0.00%|
170|         0|            0|            0|  0.00%|        # end for
171|         0|            0|            0|  0.00%|    # end for
172|         0|            0|            0|  0.00%|
...```

The output from pprofile provides some indication of where the code uses the most time:

• The loop computing rij[k]
• The loop summing d (collective operation)
• Computing the square root of d
• Computing d2
• Computing the potential energy
• The loop computing the force array

Another option is to put timing points throughout the code, focusing primarily on the section of the code computing potential energy and forces. This code produced the output shown in Listing 8. Notice that the time to compute the potential and forces is 181.9 seconds with a total time of 189.5 seconds. Obviously, this is where you would need to focus your efforts to improve code performance.

Listing 8: md_001b.py Output

```Elapsed wall clock time = 189.526 seconds.

Total Time for position update =       0.089102 seconds
Avg Time for position update =       0.001782 seconds
Total Time for velocity update =       0.073946 seconds
Avg Time for velocity update =       0.001479 seconds
Total Time for acceleration update =       0.031308 seconds
Avg Time for acceleration update =       0.000626 seconds

Total Time for potential update =     103.215999 seconds
Avg Time for potential update =       0.000008 seconds

Total Time for force update =     181.927011 seconds
Avg Time for force update =       0.000014 seconds

Total Time for force loop =      75.269342 seconds
Avg Time for force loop =       0.000006 seconds

Total Time for rij loop =      25.444300 seconds
Avg Time for rij loop =       0.000002 seconds

MD_TEST
Normal end of execution.```
##### First Function Creation

The potential energy and force computations are the dominant part of the run time, so to better profile them, it is best to isolate that code in a function. Perhaps a bit counterintuitively, I created a function that initializes the algorithm and a second function for the update loops, and called the resulting code md_002.py. (My modified code is available online.) Because the potential energy and force computations change very little, I won’t be profiling this version of the code. All I did was make sure I was getting the same answers as in the previous version. However, feel free to practice profiling it.

##### Final Version

The final version of the code moves the section of code computing the potential energy and forces into a function for better profiling. The code, md_003.py, has a properties function that computes the potential energy and forces.

The cProfile results don’t show anything useful, so I will skip that output. On the other hand, the pprofile output has some useful information (Listing 9). The excerpts mostly focus on the function that computes potential energy and forces. Notice that the overall time to run the code is still about 10 times longer.

Listing 9: md_003.py Output Excerpts

```Command line: md_003.py
Total duration: 1459.49s
File: md_003.py
File duration: 1458.73s (99.95%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
1|         0|            0|            0|  0.00%|# md test code
2|         0|            0|            0|  0.00%|
3|         2|  3.40939e-05|  1.70469e-05|  0.00%|import platform
4|         1|  2.28882e-05|  2.28882e-05|  0.00%|from time import clock
(call)|         1|  2.71797e-05|  2.71797e-05|  0.00%|# :1009 _handle_fromlist
5|         1|  2.69413e-05|  2.69413e-05|  0.00%|import numpy as np
(call)|         1|     0.751632|     0.751632|  0.05%|# :978 _find_and_load
6|         1|  2.47955e-05|  2.47955e-05|  0.00%|from sys import exit
(call)|         1|  1.78814e-05|  1.78814e-05|  0.00%|# :1009 _handle_fromlist
7|         1|  8.34465e-06|  8.34465e-06|  0.00%|import time
...
160|        51|  0.000295162|  5.78749e-06|  0.00%|def properties(p_num, d_num, pos, vel, mass):
161|         0|            0|            0|  0.00%|
162|        50|  0.000397682|  7.95364e-06|  0.00%|    import numpy as np
163|         0|            0|            0|  0.00%|
164|         0|            0|            0|  0.00%|
165|         0|            0|            0|  0.00%|    # compute force, potential, kinetic
166|        50|  0.000529528|  1.05906e-05|  0.00%|    force = np.zeros([ d_num, p_num ])
167|        50|  0.000378609|  7.57217e-06|  0.00%|    rij = np.zeros(d_num)
168|         0|            0|            0|  0.00%|
169|        50|  0.000226259|  4.52518e-06|  0.00%|    potential = 0.0
170|         0|            0|            0|  0.00%|
171|     25050|    0.0909703|  3.63155e-06|  0.01%|    for i in range(0, p_num):
172|         0|            0|            0|  0.00%|
173|         0|            0|            0|  0.00%|        #  Compute the potential energy and forces
174|  12525000|      44.7758|  3.57492e-06|  3.07%|        for j in range(0, p_num):
175|  12500000|      45.4399|  3.63519e-06|  3.11%|            if (i != j):
176|         0|            0|            0|  0.00%|                #  Compute RIJ, the displacement vector
177|  49900000|      183.525|  3.67786e-06| 12.57%|                for k in range(0, d_num):
178|  37425000|      155.539|  4.15603e-06| 10.66%|                    rij[k] = pos[k,i] - pos[k,j]
179|         0|            0|            0|  0.00%|                # end for
180|         0|            0|            0|  0.00%|
181|         0|            0|            0|  0.00%|                #  Compute D and D2, a distance and a truncated distance
182|  12475000|      44.5996|  3.57512e-06|  3.06%|                d = 0.0
183|  49900000|      184.464|  3.69668e-06| 12.64%|                for k in range(0, d_num):
184|  37425000|      155.339|  4.15067e-06| 10.64%|                    d = d + rij[k] ** 2
185|         0|            0|            0|  0.00%|                # end for
186|  12475000|      68.8519|  5.51919e-06|  4.72%|                d = np.sqrt(d)
187|  12475000|      56.0835|  4.49567e-06|  3.84%|                d2 = min(d, np.pi / 2.0)
188|         0|            0|            0|  0.00%|
189|         0|            0|            0|  0.00%|                #  Attribute half of the total potential energy to particle J
190|  12475000|      74.6307|  5.98242e-06|  5.11%|                potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
191|         0|            0|            0|  0.00%|
192|         0|            0|            0|  0.00%|                #  Add particle J's contribution to the force on particle I.
193|  49900000|      199.233|  3.99264e-06| 13.65%|                for k in range(0, d_num):
194|  37425000|      212.352|  5.67406e-06| 14.55%|                    force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
195|         0|            0|            0|  0.00%|                # end for
196|         0|            0|            0|  0.00%|            # end if
197|         0|            0|            0|  0.00%|
198|         0|            0|            0|  0.00%|        # end for
199|         0|            0|            0|  0.00%|    # end for
200|         0|            0|            0|  0.00%|
201|         0|            0|            0|  0.00%|    #  Compute the kinetic energy
202|        50|  0.000184059|  3.68118e-06|  0.00%|    kinetic = 0.0
203|       200|  0.000753641|  3.76821e-06|  0.00%|    for k in range(0, d_num):
204|     75150|     0.265535|   3.5334e-06|  0.02%|        for j in range(0, p_num):
205|     75000|     0.298971|  3.98628e-06|  0.02%|            kinetic = kinetic + vel[k,j] ** 2
206|         0|            0|            0|  0.00%|        # end for
207|         0|            0|            0|  0.00%|     # end for
208|         0|            0|            0|  0.00%|
209|        50|  0.000200987|  4.01974e-06|  0.00%|    kinetic = 0.5 * mass * kinetic
210|         0|            0|            0|  0.00%|
211|        50|  0.000211716|  4.23431e-06|  0.00%|    return force, kinetic, potential
212|         0|            0|            0|  0.00%|
213|         0|            0|            0|  0.00%|# end def
...```

Again, most of the time in the code is spent in the properties function, which computes the potential energy and forces. The first few loops take up most of the time.

Out of curiosity, I looked at the output from line_profiler (Listing 10). Remember that all time percentages are relative to the time for that particular routine (not the entire code). The first couple of loops used a fair percentage of the run time. The last loop that computes the force array,

```for k in range(0, d_num):
force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d```

used about 25 percent of the run time for this function.

Listing 10: md_003.py line_profiler Output

```Total time: 358.778 s
File: ./md_003.py
Function: properties at line 159

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
159                                           @profile
160                                           def properties(p_num, d_num, pos, vel, mass):
161
162        50        164.0      3.3      0.0      import numpy as np
163
164
165                                               # compute force, potential, kinetic
166        50        351.0      7.0      0.0      force = np.zeros([ d_num, p_num ])
167        50        153.0      3.1      0.0      rij = np.zeros(d_num)
168
169        50         30.0      0.6      0.0      potential = 0.0
170
171     25050      14447.0      0.6      0.0      for i in range(0, p_num):
172
173                                                   #  Compute the potential energy and forces
174  12525000    7036459.0      0.6      2.0          for j in range(0, p_num):
175  12500000    7730669.0      0.6      2.2              if (i != j):
176                                                           #  Compute RIJ, the displacement vector
177  49900000   33530834.0      0.7      9.3                  for k in range(0, d_num):
178  37425000   39827594.0      1.1     11.1                      rij[k] = pos[k,i] - pos[k,j]
179                                                           # end for
180
181                                                           #  Compute D and D2, a distance and a truncated distance
182  12475000    7182783.0      0.6      2.0                  d = 0.0
183  49900000   33037923.0      0.7      9.2                  for k in range(0, d_num):
184  37425000   39131501.0      1.0     10.9                      d = d + rij[k] ** 2
185                                                           # end for
186  12475000   25236413.0      2.0      7.0                  d = np.sqrt(d)
187  12475000   13375864.0      1.1      3.7                  d2 = min(d, np.pi / 2.0)
188
189                                                           #  Attribute half of the total potential energy to particle J
190  12475000   31104186.0      2.5      8.7                  potential = potential + 0.5 * np.sin(d2) * np.sin(d2)
191
192                                                           #  Add particle J's contribution to the force on particle I.
193  49900000   34782251.0      0.7      9.7                  for k in range(0, d_num):
194  37425000   86664317.0      2.3     24.2                      force[k,i] = force[k,i] - rij[k] * np.sin(2.0 * d2) / d
195                                                           # end for
196                                                       # end if
197
198                                                   # end for
199                                               # end for
200
201                                               #  Compute the kinetic energy
202        50         33.0      0.7      0.0      kinetic = 0.0
203       200        147.0      0.7      0.0      for k in range(0, d_num):
204     75150      44048.0      0.6      0.0          for j in range(0, p_num):
205     75000      78144.0      1.0      0.0              kinetic = kinetic + vel[k,j] ** 2
206                                                   # end for
207                                                # end for
208
209        50         47.0      0.9      0.0      kinetic = 0.5 * mass * kinetic
210
211        50         37.0      0.7      0.0      return force, kinetic, potential```

If you look at this routine in the md_003.py file, I’m sure you can find some optimizations that would improve performance.

#### Using Numba

As part of the profiling, I moved parts of the code to functions. With Python, this allowed me to perform deterministic profiling without resulting in a long run time. Personally, I like deterministic profiling better than statistical because I don’t have to find the time interval that results in a good profile.

Putting parts of the code in functions provides a good starting point for using Numba. I described the use of Numba in a previous high-performance Python article. I made a few more changes to the last version of the code (md_003.py), and because the properties routine took a majority of the run time, I targeted this routine with Numba and simply used jit to compile the code.

The original code, before using Numba, was around 140 seconds on my laptop. After using Numba and running on all eight cores of my laptop (four "real" cores and four Hyper Threading (HT) cores), it ran in about 3.6 seconds. I would call that a success.