Modern Fortran – Part 3
Fortran 90 took Fortran 77 from the dark ages by giving it new features that developers had wanted for many years and by deprecating old features – but this was only the start. Fortran 95 added new features, including some adopted from High-Performance Fortran (HPF), such as the FORALL construct and PURE and ELEMENTAL procedures, and improved its object-oriented capabilities. Fortran 2003 then extended the object-oriented features started by Fortran 90 and 95; improved C and Fortran integration, standardizing it and making it portable; and added a new range of I/O capabilities. However, Fortran wasn't done evolving. Developers still had features and capabilities on their wish lists, which led to Fortran 2008.
Fortran 2003 to Fortran 2008 is much like Fortran 90 to Fortran 95. The revision added some corrections and clarifications to Fortran 2003, while introducing new capabilities. Probably the biggest added feature was the idea of concurrent computing via Coarray Fortran. Concurrent computing executes several computations during overlapping periods of time, so several sections of code can be in progress at once. In contrast, sequential computations occur one after the other and have to wait for the previous computations to finish.
As a point of clarification, concurrent computing is related to, but different from, parallel programming, which is so prevalent in HPC. In parallel computing, various processes run at the same time. Concurrent computing, on the other hand, has processes that overlap in terms of duration, but their execution doesn't have to happen at the same instant. Table 1 illustrates the differences between serial, sequential, concurrent, and parallel execution, assuming you have two tasks, T1 and T2.
|Table 1: Serial, Sequential, Concurrent*|
|Order of Execution|Model|
|T1 executes and finishes before T2|Serial and sequential|
|T2 executes and finishes before T1|Serial and sequential|
|T1 and T2 execute alternately|Serial and concurrent|
|T1 and T2 execute simultaneously|Parallel and concurrent|
|* From Wikipedia|
“Simultaneous” means executed in the same physical instant, and “sequential” is an antonym of “concurrent” and “parallel.” In this case, sequential typically means something is performed or used in sequence.
Coarray Fortran (CAF) is a set of extensions for Fortran 95/2003 that were developed outside of the Fortran standard so experimentation could take place quickly. They were adopted in Fortran 2008, with the syntax varying a little relative to the original CAF definition. Coarray Fortran is an example of a Partitioned Global Address Space (PGAS) language, which assumes a global address space that is logically partitioned, with a portion of it local to each process, thread, or processing element. Because each portion of the address space belongs to one process, data has an affinity for the process that holds it.
CAF uses a parallel execution model to improve code performance. The basics of CAF are as follows:
“A Fortran program containing coarrays is interpreted as if it were replicated a fixed number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is called an image. The array syntax of Fortran is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of access to data on other images.”
Data references without square brackets refer to local data (local to the image). If the square brackets are included, the data might need to be communicated between images. CAF uses a Single-Program, Multiple-Data (SPMD) model for this.
When a coarray-enabled application starts, a copy of the application is executed on each processor, and the images (the copies of the application) execute asynchronously. The images are distinguished from one another by an index between 1 and n, the number of images. Notice that the index starts at 1, not 0, which is perhaps influenced by Fortran's roots.
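A minimal sketch of image numbering, assuming a coarray-enabled build (the caf wrapper shown later, or gfortran -fcoarray=single for a quick one-image check). The module image_info and its function valid_image are hypothetical names for illustration; this_image() and num_images() are the standard intrinsics.

```fortran
module image_info
  implicit none
contains
  ! Hypothetical helper: image numbering starts at 1, so a valid index lies in 1..n
  pure logical function valid_image(idx, n)
    integer, intent(in) :: idx, n
    valid_image = idx >= 1 .and. idx <= n
  end function valid_image
end module image_info

program whoami
  use image_info
  implicit none
  integer :: me, n

  me = this_image()    ! index of this image, between 1 and num_images()
  n  = num_images()    ! number of images chosen at launch time
  if (.not. valid_image(me, n)) error stop 'unexpected image index'
  print '(a,i0,a,i0)', 'image ', me, ' of ', n
end program whoami
```

Each image prints its own line, and because the images run asynchronously, the lines can appear in any order.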
In the application code, you define a coarray with a trailing codimension in square brackets, such as [*]. The coarray then exists in each image, with the same name and the same size. Coarrays can be scalars or arrays, static or allocatable (dynamic), and of intrinsic or derived type. Synchronization statements, such as SYNC ALL, can be used to maintain program correctness. Listing 1 shows several ways you can define a coarray.
Listing 1: Defining Coarrays
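program coarray_declarations
  implicit none
  integer, parameter :: m = 4, n = 8     ! example sizes for illustration
  type :: mytype                          ! hypothetical derived type for illustration
    integer :: id
    real    :: val
  end type mytype
  integer :: x[*]                          ! scalar coarray
  real, dimension(n) :: a[*]               ! array coarray
  real, dimension(n), codimension[*] :: b  ! array coarray (alternative syntax)
  integer :: cx[10,10,*]                   ! scalar coarray with corank of 3
  real :: c(m,n)[0:10,10,*]                ! array coarray with corank of 3 and different cobounds
  real, allocatable :: mat(:,:)[:]         ! allocatable coarray
  type(mytype) :: xc[*]                    ! derived-type scalar coarray
  allocate(mat(m,n)[*])                    ! every image allocates with the same bounds
end program coarray_declarations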
Notice that coarrays can be scalars, fixed-size arrays, allocatable arrays, and derived types, and they can have different coranks. However, a coarray cannot be a named constant (PARAMETER) or a pointer.
What is a "corank"? Variables in Fortran code can have rank, bounds, extent, size, and shape. These are defined in the parentheses of an array declaration, as specified in Fortran 90 onward. For coarrays, the corank, cobounds, and coextents are given by the data in the square brackets. The cosize of a coarray is always equal to the number of images specified when the application is executed. A coarray has a final coextent, a final upper cobound, and a coshape that depend on the number of images.
A simple example declaration is:
integer :: a(4)[*]
This declaration means that each image (each “process”) has an integer array of four elements. You can assemble a 2D (rank = 2) coarray data structure from these 1D (rank = 1) arrays. If using four images with the previous declaration, the coarray looks like Figure 1. The arrays are stacked on top of each other because the coarray specification [*] has a corank of 1.
If you specify the coarray in a different manner, but still use four images, a 2D coarray can be declared as follows:
integer :: a(4)[2,*]
Each image again holds a 1D integer array with length (extent) 4. However, the coarray specification assembles the 2D array differently (Figure 2). Remember that each image has its own array, identical in shape to the others, and coarray notation lets you combine them.
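To make the 2D arrangement concrete, the sketch below computes by hand which cosubscripts each image owns for integer :: a(4)[2,*] with four images, assuming lower cobounds of 1. The module coindex_layout and its functions are hypothetical; in real coarray code the standard intrinsics image_index(a, sub) and this_image(a) perform these mappings. As with array elements, the first cosubscript varies fastest.

```fortran
module coindex_layout
  implicit none
contains
  ! Hypothetical helper: image index for cosubscripts [c1,c2] of a coarray
  ! declared with codimension [ext1,*] and lower cobounds of 1
  pure integer function image_of(c1, c2, ext1)
    integer, intent(in) :: c1, c2, ext1
    image_of = c1 + (c2 - 1) * ext1     ! first cosubscript varies fastest
  end function image_of

  ! Inverse mapping: cosubscripts held by image img
  pure function cosubscripts_of(img, ext1) result(c)
    integer, intent(in) :: img, ext1
    integer :: c(2)
    c(1) = mod(img - 1, ext1) + 1
    c(2) = (img - 1) / ext1 + 1
  end function cosubscripts_of
end module coindex_layout

program show_layout
  use coindex_layout
  implicit none
  integer :: img, c(2)

  ! Layout of integer :: a(4)[2,*] when run with four images
  do img = 1, 4
    c = cosubscripts_of(img, 2)
    print '(a,i0,a,i0,a,i0,a)', 'image ', img, ' -> a(:)[', c(1), ',', c(2), ']'
  end do
end program show_layout
```

This sketch needs no coarray runtime. Image 3, for example, holds a(:)[1,2].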
The use of coarrays can be thought of as the opposite of the way distributed arrays are used in MPI. With MPI applications, the starting point is a global array that is decomposed so that each rank or process holds a local array; the application then has to map each local array back to its place in the larger global array.
Coarrays are the opposite. You can take local arrays and combine them into a global array using a coarray. You can access the local array (local to the image) using "usual" array notation. You can also access data on another image almost in the same way, but you have to use the coarray notation.
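The local-to-global direction can be sketched as follows, assuming a coarray-enabled build: each image fills its own a(4)[*], and image 1 pulls every piece with co-indexed reads to assemble one global array. The module local_global, the function global_index, and the fill values are hypothetical; SYNC ALL makes sure every image has written its piece before image 1 reads it.

```fortran
module local_global
  implicit none
contains
  ! Hypothetical helper: position of local element i of image img in the global array
  pure integer function global_index(img, i, n)
    integer, intent(in) :: img, i, n
    global_index = (img - 1) * n + i
  end function global_index
end module local_global

program assemble_global
  use local_global
  implicit none
  integer, parameter :: n = 4
  integer :: a(n)[*]                 ! local piece, one per image
  integer, allocatable :: global(:)
  integer :: img, i

  a = [(10 * this_image() + i, i = 1, n)]   ! image-specific local data
  sync all                                 ! every piece is written before it is read

  if (this_image() == 1) then
    allocate(global(n * num_images()))
    do img = 1, num_images()
      ! The co-indexed section a(:)[img] pulls the whole piece from image img
      global(global_index(img, 1, n):global_index(img, n, n)) = a(:)[img]
    end do
    print '(*(i0,1x))', global
  end if
end program assemble_global
```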
In another simple example, each image has a 2D array, and they can be combined into a coarray to create a larger 2D array. Again, assume four images with the following coarray declaration:
integer :: a(4,4)[2,*]
Each image has a 2D integer array a. With the coarray definition given by the square bracket notation, the “local” arrays are combined into a coarray (globally accessible) that looks like Figure 3.
If the array is local to the image, you access the array as you normally would. For example, for image 3 to access element (2,2) from the array, the statement would be something like:
b = a(2,2)
You can always use coarray notation if needed, but in this case, you know the data is local, so you can access it using local notation. If another image wanted to access that element, then the statement would have to be:
b = a(2,2)[1,2]
Images 1, 2, and 4 would access the data element in this fashion (global access). You still have to pay attention to what image holds what data, but writing the statements to use them is fairly easy.
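A sketch of these access patterns for integer :: a(4,4)[2,*], assuming a coarray-enabled build: each image writes only its own block, then every image other than the owner of cosubscripts [1,2] reads a(2,2)[1,2]. The module fill_rule and function fill_value are hypothetical. The standard intrinsic image_index returns 0 when the cosubscripts do not name an existing image, which guards the remote read when fewer than three images are running.

```fortran
module fill_rule
  implicit none
contains
  ! Hypothetical fill rule: every element of image img's block holds 10*img
  pure integer function fill_value(img)
    integer, intent(in) :: img
    fill_value = 10 * img
  end function fill_value
end module fill_rule

program local_remote
  use fill_rule
  implicit none
  integer :: a(4,4)[2,*]          ! each image owns a 4x4 block
  integer :: b, owner

  a = fill_value(this_image())    ! write only the local block
  sync all                        ! all blocks are written before any remote read

  b = a(2,2)                      ! local access: plain array syntax
  owner = image_index(a, [1,2])   ! image holding cosubscripts [1,2]; 0 if none
  if (owner > 0 .and. this_image() /= owner) then
    b = a(2,2)[1,2]               ! remote access: co-indexed syntax
    if (b /= fill_value(owner)) error stop 'unexpected remote value'
  end if
  print '(a,i0,a,i0)', 'image ', this_image(), ' has b = ', b
end program local_remote
```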
The key is the coarray subscripts. The following declaration is an example of a simple local variable,
integer, dimension(10,4) :: a
with rank 2 (two indices). The lower bounds are 1 and 1. The upper bounds are 10 and 4. The shape is [10,4].
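The inquiry intrinsics lbound, ubound, shape, and size report exactly these properties. In this short sketch, the module demo_arrays is a hypothetical wrapper around the declaration from the text, so the array can also be checked from other code.

```fortran
module demo_arrays
  implicit none
  integer, dimension(10,4) :: a     ! same declaration as in the text
end module demo_arrays

program array_inquiry
  use demo_arrays
  implicit none

  ! Inquiry functions only look at the declaration, not the (undefined) values
  print '(a,i0)',       'rank         = ', size(shape(a))   ! number of dimensions
  print '(a,2(i0,1x))', 'lower bounds = ', lbound(a)
  print '(a,2(i0,1x))', 'upper bounds = ', ubound(a)
  print '(a,2(i0,1x))', 'shape        = ', shape(a)
  print '(a,i0)',       'size         = ', size(a)
end program array_inquiry
```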
The following declaration would convert the array to a coarray:
integer :: a(10,4)[3,*]
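This adds a corank of 2 (two indices). The lower cobounds are 1 and 1. The upper cobounds are 3 and m, so the coshape is [3,m], where m is the smallest integer for which 3*m is at least the number of images (the conversion to real matters, because num_images()/3 on its own is an integer division that truncates):
m = ceiling(real(num_images())/3)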
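The coshape arithmetic can be checked without running any images. The sketch below computes the final upper cobound m for codimension [3,*] with integer ceiling division, (n + 3 - 1)/3. The module cobound_math and function final_ucobound are hypothetical names; in real coarray code, the intrinsic ucobound(a, dim=2) returns this value directly. Note that num_images()/3 alone would give 1 instead of 2 for four images.

```fortran
module cobound_math
  implicit none
contains
  ! Final upper cobound m of a coarray declared with codimension [lead,*]
  ! (lower cobounds of 1): the smallest m with lead*m >= nimages
  pure integer function final_ucobound(nimages, lead)
    integer, intent(in) :: nimages, lead
    final_ucobound = (nimages + lead - 1) / lead   ! integer ceiling division
  end function final_ucobound
end module cobound_math

program cobound_table
  use cobound_math
  implicit none
  integer :: n

  do n = 1, 8
    print '(a,i0,a,i0,a)', 'num_images() = ', n, ' -> coshape [3,', final_ucobound(n, 3), ']'
  end do
end program cobound_table
```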
Using coarrays, you no longer have to call MPI_SEND() and MPI_RECV() or their nonblocking equivalents to send data from one process to another. You just access the remote data with coarray syntax and let the underlying coarray library do the work, which can make multiprocess coding easier.
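A minimal sketch of that one-sided style, assuming a coarray-enabled build: every image sets its own val and, after SYNC ALL, reads its right-hand neighbor's value on a ring. The module ring and function right_neighbor are hypothetical. In MPI, the same exchange would need a matching send and receive for every pair of ranks.

```fortran
module ring
  implicit none
contains
  ! Hypothetical helper: right-hand neighbor on a ring of n images
  pure integer function right_neighbor(me, n)
    integer, intent(in) :: me, n
    right_neighbor = mod(me, n) + 1      ! image n wraps around to image 1
  end function right_neighbor
end module ring

program ring_get
  use ring
  implicit none
  integer :: val[*]                      ! one integer per image
  integer :: me, got

  me  = this_image()
  val = 100 + me                         ! local write, no communication
  sync all                               ! every image has written val

  ! One-sided "get" from the neighbor replaces a matched MPI_Send/MPI_Recv pair
  got = val[right_neighbor(me, num_images())]
  print '(a,i0,a,i0)', 'image ', me, ' read ', got
end program ring_get
```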
CAF for Fortran 2008 can be implemented in different ways. A common approach, used by gfortran, is to build it on MPI. Except for a small number of features, gfortran provides Fortran 2008 coarray functionality starting with version 5.1, using the coarray library from OpenCoarrays. The compiler translates coarray syntax into library calls that in turn use MPI functions underneath.
I decided to build and test the latest gfortran and OpenCoarrays on CentOS 7.2, which comes with an older gfortran, so the first step was to install the latest and greatest version, GCC 6.2.0. Believe it or not, building and installing a new GCC is not as difficult as it might seem, especially if you install as many dependencies as possible from the distribution's packages. In general, I follow the directions in the GCC wiki and the GNU GCC installation page.
I installed GCC 6.2.0 into my home directory, /home/laytonjb/bin/gcc-6.2.0, then modified the environment variables $PATH and $LD_LIBRARY_PATH so the new compilers were used instead of the older ones. GCC was configured from a separate build directory with the following command, then built and installed with make and make install:
$PWD/../gcc-6.2.0/configure --prefix=$HOME/bin/gcc-6.2.0 --enable-languages=c,c++,fortran,go --disable-multilib
The next step was to build and install an MPI library with the GCC 6.2.0 compilers. The OpenCoarrays website recommended MPICH first, so I built and installed MPICH 3.2 in /home/laytonjb/bin/mpich-3.2. Again, I modified $PATH and $LD_LIBRARY_PATH in my .bashrc file to point to the MPICH binaries and libraries.
After the MPI library, the next step was to build and install OpenCoarrays (OCA). The latest stable version at the time of writing was 1.7.2, and the following command built and installed it:
./install.sh -i /home/laytonjb/bin/opencoarray-1.7.2
The OCA build didn't take too long and was much shorter than building GCC. At this point, I'm ready to start compiling and executing some CAF code!
The first example is very simple (Listing 2). This proverbial “hello world” program for Fortran coarrays was taken from the Wikipedia page for coarrays.
Listing 2: Hello World with Coarrays
program Hello_World
  implicit none
  character(len=20) :: name[*] ! scalar coarray, one "name" for each image.
  ! Note: "name" is the local variable while "name[<index>]" accesses the
  ! variable in a specific image; "name[this_image()]" is the same as "name".
  ! Interact with the user on Image 1; execution for all others pass by.
  if (this_image() == 1) then
    write(*,'(a)',advance='no') 'Enter your name: '
    read(*,'(a)') name
  end if
  ! Distribute information to all images
  call co_broadcast(name,source_image=1)
  ! I/O from all images, executing in any order, but each record written is intact.
  write(*,'(3a,i0)') 'Hello ',trim(name),' from image ', this_image()
end program Hello_World
With gfortran from GCC, you have two ways to build and execute this code. The first is to use the compile and run scripts provided by OCA. If the name of the source file is hello.f90, compile and execute it with:
$ caf hello.f90 -o hello
$ cafrun -np 4 ./hello
Enter your name: Jeff
Hello Jeff from image 1
Hello Jeff from image 4
Hello Jeff from image 3
Hello Jeff from image 2
The first command compiles the code and the second executes it. The -np 4 option to cafrun runs the application with four images (i.e., the number of processors is 4).
The second way of building and executing coarray code is to build them with the mpif90 script and execute them using the mpirun script,
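$ mpif90 -fcoarray=lib yourcoarray.f90 -L/path/to/lib -lcaf_mpi -o yourcoarray
$ mpirun -np # ./yourcoarray
where # is the number of images and /path/to/lib is the directory that contains libcaf_mpi.a. The all-important -fcoarray=lib flag tells gfortran to translate coarray syntax into calls to an external coarray library, and -lcaf_mpi links the OpenCoarrays MPI-based implementation of that library.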
Other, more complicated, coarray Fortran code examples are floating around the web, although not as many as for F90 or Fortran 2003. Because Fortran 2008 is so new, it looks like coarray examples haven't quite caught up yet.
A great place to find out which aspects of Fortran 2008 your compiler does and does not support is the Fortran 2008 wiki. In addition to CAF, Fortran 2008 introduced some other features, including these highlights:
- Submodules – additional structuring facilities for modules; supersedes ISO/IEC TR 19767:2005
- Coarray Fortran – a parallel execution model
- The DO CONCURRENT construct – for loop iterations with no interdependencies
- The CONTIGUOUS attribute – to specify storage layout restrictions
- The BLOCK construct – which can contain declarations of objects with construct scope
- Recursive allocatable components – as an alternative to recursive pointers in derived types
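A small sketch exercising three of these features (DO CONCURRENT, CONTIGUOUS, and BLOCK) in ordinary single-image Fortran 2008; the module f2008_features and function scaled are hypothetical names. DO CONCURRENT only asserts that the iterations are independent; whether the compiler actually runs them in parallel is up to the implementation.

```fortran
module f2008_features
  implicit none
contains
  ! CONTIGUOUS: the dummy argument is guaranteed to occupy contiguous storage
  pure function scaled(x, factor) result(y)
    real, contiguous, intent(in) :: x(:)
    real, intent(in) :: factor
    real :: y(size(x))
    integer :: i
    ! DO CONCURRENT: the iterations are independent and may run in any order
    do concurrent (i = 1:size(x))
      y(i) = factor * x(i)
    end do
  end function scaled
end module f2008_features

program f2008_demo
  use f2008_features
  implicit none
  real :: v(5) = [1.0, 2.0, 3.0, 4.0, 5.0]

  v = scaled(v, 2.0)
  block
    ! BLOCK: total is declared here and exists only inside this construct
    real :: total
    total = sum(v)
    print '(a,f6.1)', 'sum = ', total
  end block
end program f2008_demo
```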
The next evolution of Fortran, even though many compilers have yet to implement all of Fortran 2008, is Fortran 2015. The standard has been under discussion for several years and is still undergoing work, with a goal for a 2018 release.
In general, Fortran 2015 is intended to include minor revisions, some clarifications, and corrections to inconsistencies. However, some new features should be included as well. These fall into two main "thrusts": improved C and Fortran integration, and enhanced coarrays.
Many of the Fortran/C interoperability changes for Fortran 2015 were aimed at helping MPI 3.0. This article from “Dr. Fortran” discusses the proposed Fortran/C changes and even has a couple of examples.
A second feature is the addition of new parts to coarrays. The first part is called "teams," which are collections of images in coarray applications that work together on a particular task. An application can have several teams working on separate tasks that then communicate their results to their parent team.
From teams, you can also create subteams that can be dissolved and re-formed dynamically. With subteams, if one image fails, you can capture the problem, create a new team, and proceed with your computations. Contrast this with MPI, where a failed rank typically causes the entire application to hang or crash.
The definition of “a failed image” reveals a problem of much discussion (or argument). For example, an image really might not be failed but just very slow for some reason. The complication is how to define “slow” (or stalled) and how to have Fortran detect and recover from these (slow or stalled) images. Initially, the Fortran team decided to add some synchronization constructs to detect that an image has failed. Although the final determination is not yet settled, the discussion is proceeding in the right direction.
The second part of the coarray additions is called "events." This addition is very much as it sounds; that is, events allow an image to notify another image that it has completed a task and that it can proceed.
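A sketch of events, assuming the syntax as finally standardized (the revision was published as Fortran 2018) and a compiler and coarray library recent enough to support it. EVENT POST, EVENT WAIT, and event_type from iso_fortran_env are standard; the module event_work and function do_work are hypothetical stand-ins for real work. Image 2 produces a value and posts an event to image 1, which waits for it before reading the value.

```fortran
module event_work
  implicit none
contains
  ! Hypothetical stand-in for a real computation done by the producer image
  pure real function do_work(x)
    real, intent(in) :: x
    do_work = x * x
  end function do_work
end module event_work

program event_demo
  use, intrinsic :: iso_fortran_env, only: event_type
  use event_work
  implicit none
  type(event_type) :: ready[*]     ! event variables must be coarrays
  real :: answer[*]

  if (num_images() < 2) then
    print '(a)', 'run with at least two images'
    stop
  end if

  if (this_image() == 2) then
    answer = do_work(3.0)          ! local write on image 2
    event post (ready[1])          ! tell image 1 the result is available
  else if (this_image() == 1) then
    event wait (ready)             ! block until image 2 has posted
    print '(a,f5.1)', 'image 1 received ', answer[2]
  end if
end program event_demo
```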
The third part of the coarray additions is called "atomics." Fortran 2008 introduced "atomic" objects, which allow indivisible operations (through the ATOMIC_DEFINE and ATOMIC_REF subroutines), but they have had limited support. Fortran 2015 adds the following atomic operations: ADD, AND, CAS (compare and swap), OR, and XOR.
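A sketch of atomics under the same assumptions about syntax and compiler support: each image atomically adds its own index to a counter held on image 1, so no update is lost even though all images write the same location. atomic_define and atomic_ref come from Fortran 2008, and atomic_add comes from the later revision; the module atomic_check and function triangular are hypothetical helpers used to check the result.

```fortran
module atomic_check
  implicit none
contains
  ! Closed form of 1 + 2 + ... + n, used to check the atomic counter
  pure integer function triangular(n)
    integer, intent(in) :: n
    triangular = n * (n + 1) / 2
  end function triangular
end module atomic_check

program atomic_counter
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  use atomic_check
  implicit none
  integer(atomic_int_kind) :: counter[*]   ! atomic variables must be coarrays
  integer(atomic_int_kind) :: total

  if (this_image() == 1) call atomic_define(counter, 0_atomic_int_kind)
  sync all                                 ! counter on image 1 is initialized

  ! Indivisible update of image 1's counter: concurrent adds cannot be lost
  call atomic_add(counter[1], int(this_image(), atomic_int_kind))
  sync all                                 ! all additions are complete

  if (this_image() == 1) then
    call atomic_ref(total, counter)
    if (total /= triangular(num_images())) error stop 'lost update'
    print '(a,i0)', 'counter = ', total
  end if
end program atomic_counter
```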
The fourth and last part of the coarray additions is called "collectives," which are intrinsic procedures for performing operations across all images of the current team. The new routines that have been defined but are subject to change are: CO_MAX, CO_MIN, CO_SUM, CO_BROADCAST, and CO_REDUCE.
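A sketch of two of these collectives under the same assumptions: co_sum and co_max combine one value from every image, and the result is available on all images. The module collective_check and function sum_of_squares are hypothetical helpers used to verify the sum.

```fortran
module collective_check
  implicit none
contains
  ! Closed form of 1**2 + 2**2 + ... + n**2, used to check the collective result
  pure integer function sum_of_squares(n)
    integer, intent(in) :: n
    sum_of_squares = n * (n + 1) * (2 * n + 1) / 6
  end function sum_of_squares
end module collective_check

program collectives
  use collective_check
  implicit none
  integer :: x, total, biggest

  x = this_image()**2              ! each image contributes its own value
  total = x
  biggest = x
  call co_sum(total)               ! total = sum over all images, on every image
  call co_max(biggest)             ! biggest = maximum over all images
  if (total /= sum_of_squares(num_images())) error stop 'co_sum mismatch'
  if (this_image() == 1) print '(a,i0,a,i0)', 'sum = ', total, ', max = ', biggest
end program collectives
```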
As with previous versions of Fortran, some features are targeted for deprecation in Fortran 2015:
- Labeled DO loops (DO 10 I=…) (which will be a sad day for us old Fortran types.)
- COMMON and BLOCK DATA (also a sad day)
The features that are finally deleted from Fortran are:
- Arithmetic IF
- The non-block DO construct, where the DO range doesn't end in a CONTINUE or END DO
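For code that still uses these constructs, the sketch below shows common modern replacements: module variables (with initializers) in place of COMMON and BLOCK DATA, a block DO ... END DO loop in place of a labeled DO, and an IF/ELSE IF chain in place of an arithmetic IF. The names modern_style, n, x, and sign_class are hypothetical.

```fortran
module modern_style
  implicit none
  ! Module variables replace a COMMON block such as COMMON /STATE/ N, X,
  ! and their initializers replace a BLOCK DATA unit (hypothetical names)
  integer :: n = 0
  real    :: x = 0.0
contains
  ! Replaces an arithmetic IF such as  IF (D) 10, 20, 30  (negative, zero, positive)
  pure integer function sign_class(d)
    real, intent(in) :: d
    if (d < 0.0) then
      sign_class = -1
    else if (d == 0.0) then
      sign_class = 0
    else
      sign_class = 1
    end if
  end function sign_class
end module modern_style

program modernized
  use modern_style
  implicit none
  integer :: i

  ! Replaces a labeled loop such as  DO 10 I = 1, 5 ... 10 CONTINUE
  do i = 1, 5
    n = n + i
  end do
  x = real(n) - 15.0
  print '(a,i0)', 'sign class of x: ', sign_class(x)
end program modernized
```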