Exploring AMD’s Ambitious ROCm Initiative

Other Tools

The ROCm ecosystem is envisioned as a complete open development environment with a comprehensive set of developer utilities. ROCm comes with a collection of debugging tools, including a HIP debugger and ROCm-GDB, a version of the GDB debugger modified for the ROCm platform. The ROC Profiler and ROC Tracer utilities provide performance analysis for programs written in C/C++, Python, and Fortran. ROCm also supports the TAU performance system, a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, and Python. Beyond the TAU tools, which are available now, AMD continues to work on support for other tools and performance profilers for large systems, such as PAPI and HPCToolkit, which will be openly available in the future.

A system management interface (ROCm-SMI) supports a number of functions related to clock speed and temperature settings. Clock speed is an important consideration in GPU environments, and AMD GPUs can operate at a variety of clock levels to balance speed against energy usage. As in all high-performance environments, clock speed affects energy use, which in turn affects the temperature of the system. ROCm-SMI has options for measuring temperature, controlling voltage, and managing the fan speed. You can integrate the commands of the system management interface into programs and scripts to build speed and temperature controls directly into the programming environment. See the ROCm documentation for more information on ROCm management and development tools.
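
As a rough illustration of scripting against the system management interface, the following Python sketch calls the rocm-smi command-line tool with its --showtemp option and reports any GPU whose temperature exceeds a limit. The regular expression, the sample output lines, the helper names, and the 85-degree limit are all assumptions made for illustration; the actual output format varies between ROCm releases, so check the pattern against your own installation.

```python
import re
import shutil
import subprocess

# Assumed line shape, e.g. "GPU[0] : Temperature (Sensor edge) (C): 34.0".
# Real rocm-smi output differs between releases; adjust the pattern as needed.
TEMP_LINE = re.compile(r"GPU\[(\d+)\].*?Temperature.*?:\s*(\d+(?:\.\d+)?)")

# Hypothetical sample, used only when rocm-smi is not installed.
SAMPLE_OUTPUT = """\
GPU[0] : Temperature (Sensor edge) (C): 34.0
GPU[1] : Temperature (Sensor edge) (C): 91.0
"""


def parse_temperatures(text):
    """Map each GPU index to the list of temperatures reported for it."""
    temps = {}
    for line in text.splitlines():
        match = TEMP_LINE.search(line)
        if match:
            temps.setdefault(int(match.group(1)), []).append(float(match.group(2)))
    return temps


def overheated(temps, limit_c=85.0):
    """Return the sorted GPU indices whose hottest reading exceeds limit_c."""
    return sorted(gpu for gpu, readings in temps.items() if max(readings) > limit_c)


def read_temperatures():
    """Run `rocm-smi --showtemp` if available, else fall back to the sample."""
    if shutil.which("rocm-smi") is None:
        return parse_temperatures(SAMPLE_OUTPUT)
    result = subprocess.run(
        ["rocm-smi", "--showtemp"], capture_output=True, text=True, check=False
    )
    return parse_temperatures(result.stdout)


if __name__ == "__main__":
    readings = read_temperatures()
    for gpu in overheated(readings):
        print(f"GPU {gpu} is above the limit: {max(readings[gpu])} C")
```

A check like this could run from a cron job or a batch-job prologue and hand off to the fan or clock options of the same tool when a GPU runs hot.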

New Generation

Free software happens in communities, and the free ROCm platform has already unleashed a flurry of community development. GitHub is home to a number of open source projects that extend and expand the ROCm ecosystem for HPC, including the NWChem computational chemistry toolkit, as well as the LAMMPS, NAMD, and GROMACS molecular dynamics simulators.

The latest ROCm update occurs as AMD continues to build on its new generation of hardware for machine learning and HPC. The new Vega 7nm technology-based product line includes the Radeon Instinct™ MI50 16GB and 32GB GPUs [9], which operate at 26.8 (FP16), 13.3 (FP32), and 6.6 (FP64) TFLOPS peak performance, have up to 1 TB/s memory bandwidth, and are, according to AMD, the world’s first PCIe® Gen 4 accelerators. The new Infinity Fabric™ link technology delivers up to 184 GB/s peer-to-peer bandwidth – up to 5.75 times faster than PCIe 3.0 alone. These hardware innovations let you group GPUs together into “hives,” which could further boost performance in some configurations.

The latest ROCm release is designed to exploit AMD’s advanced GPU-based HPC products, with built-in switches and optimizations intended to bring this next-generation GPU hardware to its full potential. ROCm 3.0 supports the new 2nd Gen AMD EPYC™ processor series [10], which comes with up to 64 cores and 128 threads and, according to AMD, has set 100 performance world records [11]. The new ROCm release also includes support for the bfloat16 floating-point math format.