Sometimes your Python programs need a little more speed. The pyamgx library can help by handing the heavy numerical work in your code to a GPU.


pyamgx – Accelerated Python Library

Python is quickly becoming one of the most popular languages for scientific computing and is already the most popular language for Deep Learning. Python code can interact with other applications and libraries, letting you quickly determine whether an approach solves your problem. However, because Python is an interpreted language, it is almost always slower than compiled languages such as C/C++ or Fortran. When you want your applications to run as fast as possible, you can use extensions or libraries that run faster than native Python.

A large number of Python extensions range from simple numerical interfaces such as NumPy to Deep Learning frameworks such as TensorFlow or Keras. Many libraries, some of which have Python interfaces, use GPUs to get even more performance.

In this article, I present a new Python interface to an accelerated library as an example of a way to speed up your code. Specifically, I look at pyamgx, a Python interface to AmgX, an Nvidia library that runs algebraic multigrid (AMG) methods on a GPU. It can be very useful for solving the partial differential equations (PDEs) that arise in fluid flow, physics, and astrophysics.

Multigrid Methods

Multigrid methods are algorithms for solving differential equations. They use multiple levels of grid resolution to solve the equations, hence the name “multigrid.” Without multigrid, a solver typically relies on some sort of iterative method, generally in the class of relaxation methods, that rapidly eliminates short-wavelength errors, those on the scale of the grid resolution. Although very good at eliminating short-wavelength errors, such a solver spends quite a bit of time on longer wavelength errors, for which relaxation methods are not as effective.
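To see why relaxation alone struggles, consider a minimal NumPy sketch. The 1D Poisson model problem, grid size, and weighted-Jacobi smoother below are illustrative choices of my own, not anything specific to AmgX; because the right-hand side is zero, the iterate itself is the error:

import numpy as np

# 1D Poisson model problem: A u = 0, so the exact solution is zero and
# the current iterate *is* the error.
n = 64
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

x = np.linspace(h, 1.0 - h, n)            # interior grid points
short_wave = np.sin(32 * np.pi * x)       # short-wavelength (oscillatory) error
long_wave = np.sin(np.pi * x)             # long-wavelength (smooth) error

def weighted_jacobi(A, u, b, omega=2.0/3.0, sweeps=10):
    """Apply a few weighted-Jacobi relaxation sweeps to A u = b."""
    D = np.diag(A)
    for _ in range(sweeps):
        u = u + omega * (b - A @ u) / D
    return u

b = np.zeros(n)
print(np.linalg.norm(weighted_jacobi(A, short_wave, b)) / np.linalg.norm(short_wave))
print(np.linalg.norm(weighted_jacobi(A, long_wave, b)) / np.linalg.norm(long_wave))
# The oscillatory error shrinks rapidly; the smooth error barely changes.

Running the sketch shows the oscillatory error shrinking by several orders of magnitude after 10 sweeps, whereas the smooth error is almost untouched, which is exactly the gap the coarser grids are meant to fill.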

Multigrid methods create one or more coarser grids on which the iterative relaxation solver produces a global correction, which it then applies to the solution on the fine grid to help reduce the longer wavelength errors and, with them, the overall time to solution.

Multigrid methods take the solution on the fine grid and restrict (“inject”) it to the next coarser grid level. The solver might run a few iterations on this level and then restrict the result to the next coarser level, if present. The correction computed on a coarser level is then interpolated back to the next finer level, where solver iterations can again be run. Ultimately, the correction is applied on the finest grid level.
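One common formulation of this idea, the correction scheme, works with the residual of the current solution rather than the solution itself: restrict the residual, solve a smaller problem, and interpolate the correction back. The sketch below shows a single two-grid cycle in NumPy; the function signature, the exact coarse solve, and the Galerkin coarse operator are textbook choices used here for illustration, and a real multigrid method would recurse on the coarse problem instead:

import numpy as np

def two_grid_cycle(A, b, u, smooth, R, P):
    """One two-grid correction cycle: pre-smooth, coarse-grid correction,
    post-smooth.

    A, b, u : fine-grid matrix, right-hand side, and current iterate
    smooth  : relaxation routine, e.g., a few weighted-Jacobi sweeps
    R, P    : restriction (fine-to-coarse) and prolongation (coarse-to-fine)
    """
    u = smooth(A, u, b)              # pre-smoothing on the fine grid
    r = b - A @ u                    # fine-grid residual
    r_c = R @ r                      # restrict the residual to the coarse grid
    A_c = R @ A @ P                  # Galerkin coarse-grid operator
    e_c = np.linalg.solve(A_c, r_c)  # solve the small coarse problem
    u = u + P @ e_c                  # interpolate the correction to the fine grid
    u = smooth(A, u, b)              # post-smoothing
    return u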

Multigrid methods can also be used not just as solvers but as preconditioners, typically for an external iterative solver (i.e., one that is not part of the multigrid method itself). One common use of multigrid preconditioners is in solving eigenvalue problems.

Algebraic multigrid (AMG) methods construct the coarser grids directly from the system matrix; the “grid levels” are merely subsets of the unknowns, with no ties to any geometric grid. AMG methods are used often because you do not have to code a true (geometric) multigrid method yourself.
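In practice, that means the only input you have to assemble is the sparse system matrix (plus a right-hand side). As an illustration, the SciPy sketch below builds a 2D Poisson matrix in CSR format, the kind of matrix an AMG library consumes; the grid size and five-point stencil are arbitrary choices for the example, and the matrix A and vector b are reused in the pyamgx sketch later in this section:

import numpy as np
import scipy.sparse as sp

# Assemble a 2D Poisson (5-point stencil) matrix on an n x n interior grid.
# AMG needs nothing else: no mesh coordinates and no grid hierarchy.
n = 128
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
I = sp.identity(n)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsr()   # CSR is the usual input format

b = np.ones(A.shape[0])   # a simple right-hand side for the example
print(A.shape, A.nnz)     # (16384, 16384) with roughly 81,000 nonzeros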

AmgX

AmgX (algebraic multigrid accelerated) was developed by Nvidia to accelerate core linear solver algorithms on GPUs. The focus is on linear solvers commonly used in computational fluid dynamics (CFD), physics, astrophysics, energy, and nuclear codes. AmgX provides a flexible solver composition system that allows you to create complex nested solvers and preconditioners.
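Solver composition is expressed through a configuration, which the pyamgx interface can take as a JSON-style Python dictionary. The hedged sketch below nests an AMG preconditioner inside a preconditioned conjugate gradient (PCG) outer solver; the option names follow the sample configurations distributed with AmgX, but treat the specific values as assumptions and consult the AmgX reference manual for the authoritative list:

# A sample AmgX configuration expressed as a Python dictionary.
config = {
    "config_version": 2,
    "solver": {
        "solver": "PCG",                 # outer Krylov solver
        "preconditioner": {              # nested solver used as the preconditioner
            "solver": "AMG",
            "algorithm": "CLASSICAL",
            "cycle": "V",
            "smoother": "BLOCK_JACOBI",
            "max_levels": 10,
        },
        "max_iters": 100,
        "tolerance": 1e-8,
        "monitor_residual": 1,
        "print_solve_stats": 1,
    },
}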

By nature, GPUs are massively parallel, and AmgX focuses on exploiting as much of that parallelism as possible. It can run on a single GPU or multiple GPUs in a single node, and it can also run across multiple GPU-equipped nodes with MPI (driven by the user’s code). OpenMP can be used for parallelism on a single node, on CPUs as well as GPUs, or mixed with MPI. AmgX itself exposes a C-based API.
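The pyamgx wrapper follows the same create/setup/solve/destroy pattern as the C API. The single-GPU sketch below assumes the interface shown in the pyamgx documentation (initialize, Config, Resources, Matrix, Vector, Solver) and reuses the matrix A, right-hand side b, and config dictionary from the earlier sketches to show the full round trip from host data to a GPU solve and back:

import numpy as np
import pyamgx

pyamgx.initialize()                              # start the AmgX runtime

cfg = pyamgx.Config().create_from_dict(config)   # config dictionary from above
rsc = pyamgx.Resources().create_simple(cfg)      # single-GPU resources

# Device-side objects mirroring the C API handles
A_dev = pyamgx.Matrix().create(rsc)
b_dev = pyamgx.Vector().create(rsc)
x_dev = pyamgx.Vector().create(rsc)
solver = pyamgx.Solver().create(rsc, cfg)

A_dev.upload_CSR(A)                  # SciPy CSR matrix from the earlier sketch
b_dev.upload(b)
x = np.zeros_like(b)                 # initial guess
x_dev.upload(x)

solver.setup(A_dev)                  # build the AMG hierarchy
solver.solve(b_dev, x_dev)           # run the preconditioned solve on the GPU
x_dev.download(x)                    # copy the solution back to the host

# Tear down in reverse order, then shut down AmgX
for obj in (solver, x_dev, b_dev, A_dev, rsc, cfg):
    obj.destroy()
pyamgx.finalize()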

The specific algorithms in AmgX are listed on the developer blog, along with a list of target applications (e.g., CFD, oil and gas, physics and astrophysics, and nuclear-focused codes).

A few good introductory articles on AmgX give a quick overview of AMG and then show how Fluent 15.0 used the library to accelerate CFD runs by 2 to 2.7 times on Nvidia K40X GPUs (in 2014) when solving a 111-million-cell problem (440 million unknowns).