Introduction to Programming with OpenMP
---------------------------------------

Practical Exercise 5 (Synchronisation)
--------------------------------------

Use similar commands to compile and run the program as in practical
exercise 2.

Question 1
----------

1.1 Take a copy of the program you wrote in question 2.2 of practical
exercise 3 that was called 'Britomart', and change the program to split
the directives into separate parallel and DO/for ones, putting the
former outside both of the outermost loops in procedures Cholesky and
Solve.  You will need to use a synchronisation feature to do this
correctly; start by using the single directive.

Note that it doesn't actually improve the times, which is a common
problem with such tuning attempts.

1.2 Change the program to use the master directive, remembering to
synchronise properly.

Question 2
----------

2.1 Write a program that consists of a parallel region outside a loop
that reads positive integers in from the terminal, one by one, and
exits at end of file.  Use a master directive to synchronise the read.
After reading an integer N, use it to calculate a private or
threadprivate integer variable using a serial loop in each thread like
the following, and print out the final value:

    var = N
    DO i = 1,omp_get_thread_num()+5000*5000
        var = MOD(5*var+1,1024)
    END DO

or:

    var = N;
    for (i = 0; i < omp_get_thread_num()+5000*5000; ++i)
        var = (5*var+1)%1024;

Then test your program by typing the integers 1, 7 and 17.  One
possible shape for this is sketched in C after question 2.2.

2.2 Change the program to use the single directive, read the integer
directly into var, and make it copyprivate.  Then test it again.  See
the second sketch below.
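The following is a minimal C sketch of one possible shape for question
2.1, assuming the master directive with explicit barriers; it is not
the specimen answer, and the details of the real program may differ:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int n = 0, done = 0;       /* shared: the value read and an EOF flag */
        #pragma omp parallel default(none) shared(n, done)
        {
            int var, i;            /* private to each thread */
            while (1) {
                #pragma omp master
                done = (scanf("%d", &n) != 1);   /* only the master reads */
                /* master has no implied barrier, so synchronise explicitly
                   before any thread uses n or done ... */
                #pragma omp barrier
                if (done) break;
                var = n;
                for (i = 0; i < omp_get_thread_num()+5000*5000; ++i)
                    var = (5*var+1)%1024;
                printf("Thread %d: var = %d\n", omp_get_thread_num(), var);
                /* ... and again, so that the master cannot loop round and
                   overwrite n while other threads are still using it */
                #pragma omp barrier
            }
        }
        return 0;
    }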
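And a matching sketch for question 2.2; the single construct's implied
barrier, together with the copyprivate broadcast, replaces both of the
explicit barriers above:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int var, i, done;      /* private, as copyprivate requires */
            while (1) {
                /* one thread reads; copyprivate broadcasts var and done
                   to every thread's private copies at the implied barrier
                   at the end of the construct */
                #pragma omp single copyprivate(var, done)
                done = (scanf("%d", &var) != 1);
                if (done) break;
                for (i = 0; i < omp_get_thread_num()+5000*5000; ++i)
                    var = (5*var+1)%1024;
                printf("Thread %d: var = %d\n", omp_get_thread_num(), var);
            }
        }
        return 0;
    }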
Question 3
----------

This exercise is at about the same level of difficulty as converting
some actual scientific codes to use OpenMP.  It is a realistic example
of how to manage synchronisation for things like finite element
analysis and PDEs, and most of the coding and tuning techniques are
similar, though of course the actual calculation is trivial.  It is a
lot harder than it looks, and the difficulty is not in using the OpenMP
facilities; handling shared, updatable data correctly and efficiently
is the hardest part of shared memory parallelism.  Do not underestimate
the difficulty of this exercise: it is much harder than all of the
others, and it is very much harder to be sure that you have it right.

The code is written in a gratuitously inefficient way to ensure that
the race conditions are at least potentially visible.  Unfortunately,
in practice, they don't show up as wrong answers in reasonably sized
tests.  That doesn't mean that they won't cause trouble in larger and
longer runs of the program.  This is a common problem with debugging
OpenMP.

3.1 Starting with Programs/Life.f90 or Programs/Life.c, put a parallel
directive outside the iteration loop in the main program (note that
this is trickier in C and C++ than in Fortran), and put both the
clearing to zero in that loop and the entirety of the executable
statements in procedure Iterate into the blocks of single directives.
There are several small datasets in the directory Programs that you may
prefer to use for testing, but you should finally test and time it with
the following input, which will match the specimen answer:

    Programs/bigrandom.life
    10000

Note that running the program on one core in OpenMP mode merely slows
it down, unsurprisingly.

3.2 Using a DO/for directive and a sections directive, parallelise the
shifting (the second loop) in procedure Iterate and the zeroing in the
main iteration in the main program.  You will have to ensure that the
sections that are zeroed do not overlap, even though failing to do so
will probably make no difference to the results; a generic sketch of
the shape follows question 3.3.  Note that this improves the time
somewhat, but not a lot.

3.3 Now parallelise the main loop (the first one) in procedure Iterate.
It is tempting just to add a DO/for directive around the main iteration
loop, but that introduces race conditions.  You can probably see this
by using schedule(static,1) and testing.  One simple approach is to
divide the outer loop into twice as many chunks as there are threads,
and execute the odd and even ones on successive passes, separated by a
barrier; a sketch follows at the end of this sheet.  However, there are
many other approaches.

Note that we have now completed the task and the program is running
fairly efficiently!
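For question 3.2, the zeroing might be split with a sections directive
along the following lines.  This is only a hedged sketch of the shape:
the array name, its bounds, and the split into two halves are made up,
not the actual layout of Life.c.

    #define SIZE 1000              /* stands in for the real board size */
    static int count[SIZE+2][SIZE+2];

    /* Call this from inside the parallel region; i and j are function
       locals, so each thread gets its own copies. */
    void clear(void) {
        int i, j;
        #pragma omp sections
        {
            #pragma omp section
            for (i = 0; i < (SIZE+2)/2; ++i)         /* top half */
                for (j = 0; j < SIZE+2; ++j)
                    count[i][j] = 0;
            #pragma omp section
            for (i = (SIZE+2)/2; i < SIZE+2; ++i)    /* bottom half */
                for (j = 0; j < SIZE+2; ++j)
                    count[i][j] = 0;
        }
    }

The two halves meet at (SIZE+2)/2 without overlapping; writing the same
zeros twice would probably be harmless here, but overlapping updates of
shared data are nevertheless a race condition.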
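For question 3.3, here is a hedged sketch of the odd/even chunk
technique, using a made-up scattering loop in the spirit of the real
one in Iterate (the actual code in Life.c differs).  Each row updates
its neighbouring rows, so adjacent chunks must never run at the same
time:

    #include <omp.h>

    #define SIZE 1000
    static int board[SIZE+2][SIZE+2], count[SIZE+2][SIZE+2];

    /* Called from inside the parallel region by all threads; assumes
       far more rows than chunks, so no chunk is empty. */
    void iterate_part1(void) {
        int nchunks = 2*omp_get_num_threads(), pass, c, i, j;

        /* Pass 0 runs the even-numbered chunks, pass 1 the odd ones.
           Chunks running in the same pass are never adjacent, so no two
           threads update the same rows of count; the implied barrier at
           the end of the for construct separates the two passes. */
        for (pass = 0; pass < 2; ++pass) {
            #pragma omp for
            for (c = pass; c < nchunks; c += 2) {
                int lo = 1 + c*SIZE/nchunks, hi = (c+1)*SIZE/nchunks;
                for (i = lo; i <= hi; ++i)
                    for (j = 1; j <= SIZE; ++j)
                        if (board[i][j]) {
                            /* scatter into the 8 neighbours, including
                               rows i-1 and i+1, which may belong to an
                               adjacent chunk */
                            ++count[i-1][j-1]; ++count[i-1][j]; ++count[i-1][j+1];
                            ++count[i][j-1];                    ++count[i][j+1];
                            ++count[i+1][j-1]; ++count[i+1][j]; ++count[i+1][j+1];
                        }
            }
        }
    }

Using twice as many chunks as threads keeps every thread busy on both
passes; as the question says, there are many other workable
decompositions.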