Linux device mapper writecache

Kicking write I/O operations into overdrive with the Linux device mapper writecache.

Porting CUDA to HIP

Give your proprietary CUDA code new life with an open platform for HPC.

High-Performance Python – Distributed Python

Scale Python GPU code to distributed systems and your laptop.

High-Performance Python – GPUs

Python finally has interoperable tools for programming the GPU – with or without CUDA.

High-Performance Python – Compiled Code and Fortran Interface

Fortran functions called from Python bring complex computations to a scriptabllanguage.

High-Performance Python – Compiled Code and C Interface

Although Python is a popular language, in the high-performance world, it is not known for being fast. A number of tactics have been employed to make Python faster. We look at three: Numba, Cython, and ctypes.

OpenMP – Coding Habits and GPUs

In this third and last article on OpenMP, we look at good OpenMP coding habits and present a short introduction to employing OpenMP with GPUs.

In the Loop

Diving deeper into OpenMP loop directives for parallel code.


The powerful OpenMP parallel do directive creates parallel code for your loops.

Porting Code to OpenACC

OpenACC directives can improve performance if you know how to find where parallel code will make the greatest difference.