HPC Compilers

Consequences of a System Change

A bid process that reveals the need to switch to a new system architecture often has considerable consequences. The relevant standards for HPC languages, programming models, and libraries ensure that the code base can be ported to a new platform. However, achieving the same or better performance is by no means guaranteed. With every system change in the past, scientists therefore had to check their program code for performance weaknesses and, possibly with the support of the data center, optimize it again.

For years, each new architecture has made the performance portability problem worse. Regardless of whether you switch to many-core systems (Intel Xeon Phi) or to systems that use GPU acceleration, you will often lose more than an order of magnitude in performance, rather than a fraction as in the past, if the memory access patterns or memory requirements of the application are not precisely tailored to the sweet spot of the new architecture.

For example, array elements that are not stored contiguously in memory cannot be vectorized efficiently on current Intel processors, which can cost up to a factor of 32 in performance on current SIMD (single instruction, multiple data) units. Similarly, users whose working data sets do not fit completely into the relatively small local memory of an accelerator card can suffer painful performance losses, because data must then be transferred repeatedly between the host and the accelerator during offloading. Programmers are then forced to change previously effective data layouts, which can be very time consuming for large applications.
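The data layout issue described above can be illustrated with a minimal C sketch. The type and function names (ParticleAoS, ParticlesSoA, shift_x_aos, shift_x_soa) are hypothetical and chosen only for illustration. In the first layout, the x components of neighboring elements are separated by the y and z fields, so a loop over x touches memory with a stride of three floats. In the second layout, each component is stored contiguously, which is typically what a vectorizing compiler needs to generate efficient SIMD code. Whether and how well a given compiler vectorizes either loop depends on the compiler, its options, and the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structures (AoS): x, y, z of one particle are adjacent,
   so the x values of consecutive particles are NOT contiguous. */
typedef struct {
    float x, y, z;
} ParticleAoS;

/* Structure of arrays (SoA): all x values are contiguous,
   likewise all y and all z values. */
typedef struct {
    float *x, *y, *z;
} ParticlesSoA;

/* Strided access: each iteration skips over the y and z fields. */
void shift_x_aos(ParticleAoS *p, size_t n, float dx)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        p[i].x += dx;
}

/* Unit-stride access: consecutive iterations touch consecutive floats. */
void shift_x_soa(ParticlesSoA *p, size_t n, float dx)
{
    assert(p != NULL && (p->x != NULL || n == 0));
    float *x = p->x;
    for (size_t i = 0; i < n; ++i)
        x[i] += dx;
}
```

Both functions compute the same result; only the memory access pattern differs. Converting a large application from the first layout to the second is the kind of time-consuming change referred to above.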

Additionally, programmers typically have to use newer language features to create a GPU-enabled application or one that vectorizes on many-core processors, for example, the directives for asynchronous offloading of data from the host processor to an accelerator card or the SIMD directives for vectorization defined in OpenMP 4.5. An efficient implementation of these OpenMP concepts (or of alternative models such as OpenACC) in the selected compiler suite is a prerequisite for successful optimization.
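As a rough illustration of the two kinds of directives mentioned above, the following C sketch applies them to a simple vector update. The function names saxpy_simd and saxpy_offload are hypothetical. The sketch assumes a compiler with OpenMP 4.5 support enabled (e.g., through an option such as -fopenmp); without it, the directives are ignored and both loops run serially on the host. If no accelerator is available, an OpenMP implementation normally executes the target region on the host instead.

```c
#include <assert.h>
#include <stddef.h>

/* y = a*x + y, with a request to vectorize the loop. */
void saxpy_simd(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The same update, offloaded asynchronously to an accelerator. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* map(to:) copies x to the device; map(tofrom:) copies y to the
       device and back. nowait lets the host continue immediately. */
    #pragma omp target nowait map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];

    /* Independent host work could be placed here to overlap with
       the data transfer and the device computation. */

    /* Wait for the offloaded region before y is used on the host. */
    #pragma omp taskwait
}
```

How efficiently these directives are translated into vector instructions or device code differs between compiler suites, which is why the quality of the OpenMP implementation matters for the optimization result.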

Given the high complexity of the programming models, it may well be necessary, depending on the application profile, to consider an alternative compiler. Depending on the platform, the LRZ offers one or two such alternatives on its HPC systems.

The Authors

Dr. Carla Guillen works as a research assistant in the application support group at the LRZ and mainly deals with performance monitoring and energy optimization of high-performance computing applications. In this context, she develops system-wide tools in C++ to monitor the most highly scalable computers at the LRZ.

Dr. Reinhold Bader completed his studies in physics and mathematics at the Ludwig Maximilian University in Munich in 1998 with a dissertation on theoretical solid-state physics. Since 1999, he has been a research assistant at the LRZ, working in HPC user support, HPC systems procurement, prototype benchmarking, and configuration and change management. At present, he is group leader for the HPC services at the LRZ. As a German Institute for Standardization (DIN) delegate to the ISO/IEC JTC1/SC22/WG5 standards committee, he is involved in the further development of the international standard for the Fortran programming language.