Introduction to Programming with OpenMP
---------------------------------------

Practical Exercise 3 (More Syntax and SIMD)
-------------------------------------------

Compile and run the programs using commands similar to those in practical exercise 2. You should specify static scheduling explicitly, and continue to declare all variables in the OpenMP directives.

Question 1
----------

In doing this question, declare every variable and array that is used in any of the loops inside the DO/for directive as either shared or private. Declare all loop indices and temporary variables as private, and the main arrays as shared. More details on this are given in the next lecture.

1.1 Starting with Programs/Stats.f90 or Programs/Stats.c, add directives to parallelise both loops in procedure Stats, using a combined parallel and DO/for directive and reductions. Use exactly the same arguments and files as in practical exercise 2.

Note that the example output is for Fortran, and includes times for the intrinsic functions as well as for your code; the C example output does not. Fortran users may like to experiment with the tuned libraries to see whether they improve the times for the intrinsics. Also note that the data are not normally distributed (Gaussian), so do not expect the skewness and kurtosis to be exactly 0 and 3.

1.2 Change the program to split the directives into separate parallel and DO/for ones, putting the parallel directive around both DO/for loops. Note that doing this correctly is a lot trickier than in practical exercise 2. You should notice no difference in the values or the times.

Question 2
----------

2.1 Starting with Programs/Cholesky.f90 or Programs/Cholesky.c, add directives to parallelise the innermost loops of both loop nests in procedure Cholesky and of both in procedure Solve, using a combined parallel and DO/for directive and reductions. You may need to define extra temporary variables. Use exactly the same arguments and files as in practical exercise 2.
Note that the CPU time goes up but the wall-clock time goes down for procedure Cholesky; this is common when things are going well. Note also that the wall-clock time is a lot less than the CPU time for procedure Solve, but both times are ridiculously high; this is common when things are going badly. In some implementations, the time will be so high that the program appears to hang; if that happens, just interrupt it (for example, with Ctrl-C). We need to try a different approach for this.

2.2 Change the program to parallelise the outermost independent loops in procedure Solve; that is, the middle loop of the first set of loops and the outermost loop of the second. Study the code to see why these are the right loops to choose, and why parallelising the middle loop of the second set will not work. Note that the wall-clock times for both Cholesky and Solve are now much less than for the serial code, even though their CPU times are larger.

Make a safe copy of this program and call it Britomart.
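The general idea behind question 2.2, parallelising an outer loop whose iterations are independent, can be sketched with a solver for several right-hand sides. This assumes (hypothetically) that the right-hand sides are the columns of an n x m row-major array b and that l holds the lower-triangular factor; it is not necessarily how Programs/Cholesky.c organises Solve. The columns can be processed independently, whereas the row loop of each substitution cannot be parallelised directly, because row i uses rows computed earlier.

```c
/* Hypothetical sketch: solve (L L^T) X = B in place for m right-hand sides.
   l is the n x n row-major Cholesky factor (lower triangle used), and b is
   an n x m row-major array whose columns are the right-hand sides; on exit
   b holds the solutions. */
void solve_columns(const double *l, double *b, int n, int m)
{
    int c, i, k;
    double sum;

    /* Columns are independent, so the column loop is parallelised. The row
       loops inside carry a dependence (row i needs earlier rows), so they
       stay serial within each column. */
#pragma omp parallel for default(none) shared(l, b, n, m) \
        private(c, i, k, sum) schedule(static)
    for (c = 0; c < m; ++c) {
        /* Forward substitution: L y = b. */
        for (i = 0; i < n; ++i) {
            sum = b[i * m + c];
            for (k = 0; k < i; ++k)
                sum -= l[i * n + k] * b[k * m + c];
            b[i * m + c] = sum / l[i * n + i];
        }
        /* Back substitution: L^T x = y. */
        for (i = n - 1; i >= 0; --i) {
            sum = b[i * m + c];
            for (k = i + 1; k < n; ++k)
                sum -= l[k * n + i] * b[k * m + c];
            b[i * m + c] = sum / l[i * n + i];
        }
    }
}
```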