AMD’s ROCm platform brings new freedom and portability to the GPU space.

Exploring AMD’s Ambitious ROCm Initiative

This article was updated on December 6, 2018.

Two years ago, AMD released ROCm, an innovative hardware-accelerated parallel computing environment, and the company has continued to refine its bold vision for an open-source, multi-platform, high-performance computing environment ever since. In that time, ROCm developers have contributed many new features and components to the ROCm open software platform. Now, the much-anticipated release of the Vega 7nm GPU environment adds another important ingredient to the mix, empowering a second generation of high-performance applications that will benefit from ROCm’s acceleration features and “write it once” programming paradigm.

ROCm is a universal platform for GPU-accelerated computing. A modular design lets any hardware vendor build drivers that support the ROCm stack. ROCm also integrates multiple programming languages and makes it easy to add support for other languages. ROCm even provides tools for porting vendor-specific CUDA code into a vendor-neutral ROCm format, which makes the massive body of source code written for CUDA available to AMD hardware and other hardware environments.

What is ROCm, and why is it poised to shake up the whole HPC industry? The best way to get familiar is to look inside.

Big Picture

The ROCm developers wanted a platform that supports a number of different programming languages and is flexible enough to interface with different GPU-based hardware environments (Figure 1). As you will learn later in this article, ROCm provides direct support for OpenCL, Python, and several common C++ variants. One of the most innovative features of the platform is the Heterogeneous-Compute Interface for Portability (HIP) tool, which offers a vendor-neutral dialect of C++ that is ready to compile for either the AMD or CUDA/NVIDIA GPU environment.

  

Figure 1: ROCm is designed as a universal platform, supporting multiple languages and GPU technologies.

  

Lower in the stack, ROCm provides the Heterogeneous Computing Platform, a Linux driver, and a runtime stack optimized for “HPC and ultra-scale class computing.” ROCm’s modular design means the programming stack is easily ported to other environments.

HCC

At the heart of the ROCm platform is the Heterogeneous Compute Compiler (HCC). The open source HCC is based on the LLVM compiler infrastructure with the Clang C++ front end. HCC supports several versions of standard C++, including C++11, C++14, and some C++17 features. HCC also supports GPU-based acceleration and other parallel programming features, giving programmers access to the advanced capabilities of AMD GPUs in the same way that the proprietary NVCC CUDA compiler provides access to NVIDIA hardware. AMD says it invested heavily in HCC because integrating GPU acceleration features directly into the compiler represents a chance to “approach computation holistically, on a system level, rather than as a discrete GPU artifact.”

C++ was not created for GPU-based parallel computing, and the standard forms of the language do not have the features necessary to capitalize on all the benefits of AMD’s GPU environment. A programmer who wants to engage the full range of parallel programming options needs to use some form of C++ language extension. In addition to its support for standards-based C++, HCC supports a pair of important parallel programming extensions:

  • C++ AMP (Accelerated Massive Parallelism) – Microsoft’s extension for HPC programming and GPU support.
  • HC (Heterogeneous Computing) – AMD’s own GPU-ready API.

Support for C++ AMP provides an easy transition for programmers who are accustomed to the Microsoft Visual Studio programming environment. Code written for C++ AMP can compile on HCC without modification.

According to AMD, the native HC API is “inspired” by AMP; however, “HC has some important differences from C++ AMP, including removing the ‘restrict’ keyword, supporting additional data types in kernels, providing more control over synchronization and data movement, and providing pointer-based memory allocation.”
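To make the programming model behind these extensions more concrete, the following is a minimal host-only sketch in standard C++, not HC or C++ AMP code. It assumes the general pattern such extensions build on: the programmer writes a small kernel that handles one element, and the runtime applies it across an index space. The names for_each_index and saxpy are hypothetical and exist only for illustration; the loop runs sequentially on the CPU, where a GPU compiler such as HCC would instead distribute the kernel calls across hardware threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HCC, HC, or C++ AMP): calls `kernel`
// once for every index in [0, n). A GPU toolchain would spread these calls
// across hardware threads; this host-only stand-in runs them one by one.
template <typename Kernel>
void for_each_index(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// SAXPY (y = a * x + y) expressed as a per-element kernel. Each call reads
// and writes only element i, so no call depends on any other call.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         std::vector<float> y) {
    assert(x.size() == y.size());
    const float* xs = x.data();  // raw pointers, captured by value below
    float* ys = y.data();
    for_each_index(x.size(), [=](std::size_t i) {
        ys[i] = a * xs[i] + ys[i];
    });
    return y;  // y was passed by value, so the caller's vector is untouched
}
```

Because the kernel body touches only its own element, the order in which the indices are visited does not matter, which is the property that lets a parallel runtime execute the calls concurrently. The use of raw pointers inside the kernel is a design choice for this sketch; it loosely echoes the pointer-based memory model AMD describes for HC, but it does not reproduce that API.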

HCC with the HC and AMP extensions provides a complete solution for GPU-accelerated programming in AMD’s native hardware environment, but one piece of the puzzle remains. The goal of ROCm is to build a language- and vendor-neutral programming environment, and to do so, the ROCm developers knew they would need to build a bridge to the CUDA environment and the hundreds of programs and frameworks designed to work with CUDA. What they really wanted was a way to write code once and then compile it for either the CUDA environment or the HCC/AMD environment. The solution to this problem is perhaps the most innovative part of the ROCm stack: the Heterogeneous-Compute Interface for Portability, also known as HIP.