Why Good Applications Don’t Scale


Writing a useful parallel application is a great accomplishment. At some point you will want to try running the application with larger core counts to improve performance, which is one of the reasons you wrote a parallel application in the first place. However, when at some point increasing the number of processors used in the application stops decreasing and starts increasing the wall clock time of the application, it’s time to start searching for reasons why this is happening.

The first explanation is from Amdahl’s Law, which illustrates the theoretical speedup of an application when running with more processes and the limitation of serial code on wall clock time. Although it can be annoying that parallel processing is limited by serial performance and not something directly involving parallel processing, it does explain that at some point, your application will not run appreciable faster unless your application has a very small serial portion.

The second explanation for decreased performance is serial bottlenecks in an application caused by I/O. Examining the amount of time an application spends on I/O is an important step in understanding the serial portion of your application. Luckily you can find tools and libraries to parallelize the application I/O; however, at some fundamental level, the application has to do some I/O to read input and write output, limiting the scalability of the application (scalability limit).

The goal is to push this scalability limit to the largest number of processes possible, so get out there and improve the serial portion of your applications!