Getting Started

With a single directive, omp parallel do, you can exploit the parallelism in your loops and possibly improve the performance of your code. Before jumping in, though, you should be prepared to take several steps.

First, create one or more “baseline” cases with your code before adding OpenMP directives. The baseline cases help you check that your code still gets the correct answers once you start porting. One way to do this is to embed a test in the code that checks the answers; for example, HPL (High-Performance Linpack), which is used for the TOP500 benchmark, reports at the end whether the answers are correct. Another approach I tend to use is to write the code's results to a file for the baseline case and then compare them with the results once you start porting, as I explained in the last OpenACC article. The same approach can be used with OpenMP.

Second, put timer functions throughout the code that capture the wall clock time when called. At a minimum, use one at the beginning of the code and one at the end; the difference is generally called the wall clock time, or run time, of the application. If you want to understand how the directives affect the execution time of specific parts of your code, put timers around those parts as well. Do as much as you want, but don’t go too crazy.

Third, optionally use a profiler to measure how much total time is spent in each routine of the code. Several profilers are available, starting with the ones that come with the compiler you are using.

Fourth, run the serial code with the timers, or use the profiler, to get a stack rank of the routines that consume the most time. These are the routines to target first. I discussed this process in the last OpenACC article, and the same procedure works with OpenMP.

I invite you to go forth and create parallel code with omp parallel do. If you want to get really crazy, you can mix OpenMP and OpenACC in the same code. (I double dog dare you.)