In the Loop

Loop Schedule

OpenMP will try to divide the iterations of a work-sharing loop (omp do in Fortran, omp for in C) equally among the threads of the team. You can control how the iterations are divided with a schedule clause on the loop directive of the form

!$omp do schedule(kind[, <chunksize>])

The integer expression chunksize is a positive value, and the values for kind are:

  • static
  • dynamic
  • guided
  • auto
  • runtime

The static schedule breaks the iteration space into chunks of size chunksize, which are then assigned to the threads in round-robin order (chunk 0 goes to thread 0, chunk 1 goes to thread 1, etc.). If chunksize is not specified, the iteration space is divided into chunks that are as nearly equal in size as possible, and at most one chunk is assigned to each thread in the team.

The dynamic schedule divides the iteration space into chunks of size chunksize and assigns them to threads on a first come, first served basis; that is, when a thread finishes a chunk, it is assigned the next unclaimed chunk. When no chunksize is specified, the default is 1. Dynamic scheduling lets you create many more chunks than there are threads, which helps balance the load when iterations take different amounts of time.

The guided schedule is somewhat similar to dynamic, but the chunks start off large and get exponentially smaller (again, with a default chunksize of 1). The size of the next chunk is proportional to the number of remaining iterations divided by the number of threads. If you specify chunksize with this schedule, it becomes the minimum size of the chunks.

The auto schedule lets the run time decide the assignment of iterations to threads on its own. For example, if the parallel loop is executed many times, the run time can evolve a schedule with some load balancing characteristics and low overheads.

The last schedule, runtime, defers the scheduling decision until run time. The schedule is taken from an environment variable (OMP_SCHEDULE), which allows you to vary the schedule simply by changing OMP_SCHEDULE. You cannot specify a chunksize for this clause.

Fortran and C scheduling clause examples that use the dynamic scheduler with a chunk size of 4 are shown in Listing 11. Which schedule is best really depends on your code, the compiler, the data, and the system. To make things easier when you port to OpenMP, I would recommend leaving the schedule at the default (which is implementation dependent) or changing it to runtime and then experimenting with the OMP_SCHEDULE environment variable; the auto option can also deliver good performance by letting the implementation choose.

Listing 11: schedule() Clause

Fortran:
!$omp do schedule(dynamic, 4)

C:
#pragma omp for schedule(dynamic, 4)

Summary

In this article I expanded on the previous article about OpenMP directives with some additional directives and some clauses to these directives. Remember, the goal of these articles is to present simple directives that you can use to get started porting your code to OpenMP. In this article, I covered the following topics:

  • Data and control parallelism
  • Teams and loops revisited
  • omp parallel options, including the default, firstprivate, lastprivate, nowait, and reduction clauses, and loop scheduling

In the next article, I will present some best practices in general use by me and others. I’ll also take a stab at discussing how to use the OpenMP accelerator directives to run code on GPUs.