Introduction to Programming with MPI
------------------------------------

Practical Exercise 06 (Error Handling)
--------------------------------------

For general instructions, see the introduction to the collective practicals.


Question 1
----------

Take a copy of the program you wrote in question 4.2 in practical exercise 3,
which was called 'Queequeg'.

1.1 Change the MPI_SUM in the reduction call to MPI_LAND; this is an invalid
operation on double precision floating-point numbers, and is one of the
errors detected by this implementation.  Run the program and see how it
fails.

1.2 Change the program in the following way.  Write a diagnostic procedure
that prints output of the form

    Processor failed with code <code> : <reason>

where the code is an MPI error code, and the reason is the text returned
from MPI_Error_string.  The procedure should take the error code as its
argument and should finish by calling MPI_Abort.

Set the error handling to MPI_ERRORS_RETURN immediately before the reduction
call, call the diagnostic procedure if the reduction fails, and reset the
error handling back to MPI_ERRORS_ARE_FATAL if it does not.

This shows how to take control of error handling to diagnose a failure in
one specific MPI call.

1.3 Change the program to set the error handling to MPI_ERRORS_RETURN
immediately after initialising MPI, and test the error code after every MPI
call, except possibly MPI_Abort.  Do not reset the error handling to
MPI_ERRORS_ARE_FATAL.

This shows how to take control of the error handling for every call.

If you use MPI_Abort as recommended in this course (i.e. always with
MPI_COMM_WORLD), then any failure of MPI_Abort itself indicates such a
serious problem with MPI that there is little point in worrying about what
will happen.  A really paranoid programmer might follow the MPI_Abort with a
(language-dependent) program termination.

1.4 Change either or both of the programs you wrote in 1.2 and 1.3 to
restore MPI_SUM (i.e. replace the MPI_LAND by MPI_SUM), and see that they
start working again.