Introduction to Programming with MPI
------------------------------------

Practical Exercise 04 (Point-to-Point Transfers)
------------------------------------------------

You should do these practicals in the same way that you did the ones for
the collectives. For general instructions, see the introduction to the
collective practicals.

Note that these exercises use synchronous sends or buffered sends, to
expose or hide certain race conditions that can cause deadlock; this is
NOT what you should do in practical programs. As the course teaches, you
should normally use ordinary sends, and ensure that the program works
whether those are implemented as synchronous or buffered sends. For
similar reasons, many mistakes will cause the program to hang; if that
happens, break in using control-C.

Question 1
----------

Take a copy of the program you wrote in question 1.1 in practical
exercise 3, that was called 'Ahab'.

1.1 Change the program to send the data from the root process to all the
others using point-to-point transfers and synchronous sends, rather than
using broadcast. Use a fixed value for the tag, such as 123.

1.2 Change the program to check the status on the receives: check that
the source, tag and element count are all what would be expected. You do
NOT need to change the calls to use wild cards. Display "Oops!" and abort
if anything does not match. Note that the checking is done by the receive
anyway if there are no wild cards, so this exercise is rather pointless
in practice.

Question 2
----------

Take a copy of the program you wrote in question 2.4 in practical
exercise 3, that was called 'Ishmael'.

2.1 Write a function using synchronous sends to perform a collective with
an identical syntax to MPI_Alltoall that rotates the data - i.e. the data
in process N is sent to process N+1, except for that in the last process,
which is sent back to process 0. Note that you will need to serialise the
sends when doing this. Use another fixed value for the tag, such as 456.
Do not bother about the error handling. Fortran programmers should
declare the arrays REAL(KIND=DP), as that is how they will be used.

The program starts by scattering data from the root to all processes;
change it to create another array and call the rotate function to rotate
the first array around the processes into the second array. Display the
second array following the rotate.

2.2 Change the program you wrote in 2.1 above to initialise the second
array to the values of the first (i.e. duplicate the array received from
the scatter), and rotate just the middle element of the first array to
the middle element of the second array. Then display the second array.
You should not need to make any changes to the rotate function. Make a
safe copy of this program, and call it 'Flask'.

2.3 Change the rotate function (i.e. JUST the function) in the program
you wrote in 2.2 above to handle error checking properly. In C and
Fortran, it should check the error code after every MPI call and return
it if it is not MPI_SUCCESS. In all three languages, check the received
counts and return MPI_ERR_COUNT if they are not what the parameters say
they should be.

Note that this is tedious, but not very difficult. It is also needed only
when you change the error handling from the default, which is covered in
a later lecture. However, you should always do it when writing your own
primitives, as you may want to change the error handling later, and you
don't want your own code to fall apart.
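To show the shape that questions 2.1-2.3 are aiming at, here is a minimal
C sketch of such a rotate function. It is not a specimen answer: the name
Rotate, the exact argument handling and the assumption that a
single-process run never happens are all illustrative choices, but the
technique is the one described above - synchronous sends, serialised so
that the ring cannot deadlock, every error code checked, and the received
count compared against the parameters.

    #include <mpi.h>

    /* Rotate: send this process's data to the next process and receive
       from the previous one, using synchronous sends with tag 456 (2.1).
       Rank 0 sends first and everyone else receives first, which
       serialises the transfers around the ring so that the synchronous
       sends cannot deadlock.  (A single-process run is not handled.) */
    int Rotate(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               MPI_Comm comm)
    {
        int error, nprocs, myrank, count;
        MPI_Status status;

        error = MPI_Comm_size(comm, &nprocs);
        if (error != MPI_SUCCESS) return error;
        error = MPI_Comm_rank(comm, &myrank);
        if (error != MPI_SUCCESS) return error;

        int dest = (myrank + 1) % nprocs;            /* next process     */
        int source = (myrank + nprocs - 1) % nprocs; /* previous process */

        if (myrank == 0) {
            error = MPI_Ssend(sendbuf, sendcount, sendtype, dest, 456, comm);
            if (error != MPI_SUCCESS) return error;
            error = MPI_Recv(recvbuf, recvcount, recvtype, source, 456,
                             comm, &status);
            if (error != MPI_SUCCESS) return error;
        } else {
            error = MPI_Recv(recvbuf, recvcount, recvtype, source, 456,
                             comm, &status);
            if (error != MPI_SUCCESS) return error;
            error = MPI_Ssend(sendbuf, sendcount, sendtype, dest, 456, comm);
            if (error != MPI_SUCCESS) return error;
        }

        /* 2.3: check that the number of elements received is what the
           parameters say it should be. */
        error = MPI_Get_count(&status, recvtype, &count);
        if (error != MPI_SUCCESS) return error;
        if (count != recvcount) return MPI_ERR_COUNT;

        return MPI_SUCCESS;
    }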
2.4 Change the rotate function in the program you wrote in 2.3 above to
use wild cards, and check the source and tag on receipt. Return
MPI_ERR_OTHER if there is a mismatch.

Question 3
----------

3.1 Take a copy of the program you wrote in 2.2 that was called 'Flask',
and simplify it so that all processes do their send first, using buffered
sends, and then receive what they are sent. Remember to allocate a
suitable buffer (buffer attachment is sketched at the end of this sheet).
Make a safe copy of this program, and call it 'Daggoo'.

3.2 Change the buffered sends to synchronous sends, and notice that it
hangs. Break in using control-C after you get bored. There are no
specimen answers to this question.

3.3 Now change the sends and receives to send-receives. This is the
simplest efficient way of writing rotate and similar collective
communication; a sketch of a send-receive rotate appears at the end of
this sheet.

Question 4
----------

Take a fresh copy of the program you wrote in question 2.4 in practical
exercise 3, that was called 'Ishmael'.

4.1 Change the file name from 'reals.input' to 'mixed.input', and read in
4 integers that give the number of reals that should be scattered to each
process. Allocate an array of exactly that size and read in that number
of elements (Fortran programmers should use a single READ statement).
Change the original scatter to use buffered sends, point-to-point. In the
target processes, use probing to find out how much data has been sent and
allocate receiving arrays of exactly that size (see the probing sketch at
the end of this sheet). Remove the display of the initial data on the
root process, and display the results as received on all processes.

Note that there is a simpler way to do this, which is covered in the
lecture "More on Collectives", so this is not often needed in practice.

Question 5
----------

WARNING: this exercise may be too tricky for some people, as it needs
slightly more advanced programming language skills than the others.

Take a copy of the program you wrote in question 5.1 in practical
exercise 3, that was called 'Stubb'.

5.1 Write a replacement for MPI_Alltoall, which first transfers the data
using buffered sends and then receives them using MPI_ANY_SOURCE; use
probing to decide where to put the result. Change the program to use
this. The same techniques that you used in 2.1 and 3.1/4.1 above will
help. C and C++ programmers will need to assume that the datatype is
MPI_INT, and should use sizeof(int) to get the size. Fortran programmers
will need to use INTEGER in the declarations. It is possible to solve
that problem generically, but we have not covered the facilities yet.

5.2 Change the function to maintain a sequence number of calls (with the
first call being 1), and to use a tag with the following value (made up
from the sequence number and the source and target process numbers):

    C and C++:  (169*sequence+13*(target+1)+(source+1))%32768
    Fortran:    MOD(169*sequence+13*(target+1)+(source+1),32768)

Also change the function to use MPI_ANY_TAG and to check the tag value
manually. This is how to use the tag as an error-detection mechanism; a
sketch of the check appears at the end of this sheet.
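For question 3.1, a buffered send needs an attached buffer that is large
enough. A minimal C sketch, assuming the data are doubles; the helper name
attach_bsend_buffer is illustrative and error checking is omitted:

    #include <stdlib.h>
    #include <mpi.h>

    /* Attach a buffer big enough for one buffered send of 'count' doubles.
       MPI_Pack_size says how much space the data need; MPI_BSEND_OVERHEAD
       must be added for every buffered send that may be outstanding at
       once. */
    static void attach_bsend_buffer(int count)
    {
        int size;
        MPI_Pack_size(count, MPI_DOUBLE, MPI_COMM_WORLD, &size);
        size += MPI_BSEND_OVERHEAD;
        MPI_Buffer_attach(malloc(size), size);
    }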
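For question 3.3, the same rotate can be written with a combined
send-receive. A minimal C sketch, using the same illustrative interface
and tag as the earlier Rotate sketch:

    #include <mpi.h>

    /* Rotate, rewritten with a combined send-receive: the send and the
       receive are handed to MPI together, so no serialising of the
       processes is needed and the ring cannot deadlock. */
    int Rotate(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               MPI_Comm comm)
    {
        int error, nprocs, myrank, count;
        MPI_Status status;

        error = MPI_Comm_size(comm, &nprocs);
        if (error != MPI_SUCCESS) return error;
        error = MPI_Comm_rank(comm, &myrank);
        if (error != MPI_SUCCESS) return error;

        error = MPI_Sendrecv(sendbuf, sendcount, sendtype,
                             (myrank + 1) % nprocs, 456,
                             recvbuf, recvcount, recvtype,
                             (myrank + nprocs - 1) % nprocs, 456,
                             comm, &status);
        if (error != MPI_SUCCESS) return error;

        error = MPI_Get_count(&status, recvtype, &count);
        if (error != MPI_SUCCESS) return error;
        if (count != recvcount) return MPI_ERR_COUNT;

        return MPI_SUCCESS;
    }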
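For question 4.1, a receiving process can probe for the message before
allocating space for it. A minimal C sketch, assuming doubles sent from
rank 0 with tag 123 (both assumptions - use whatever your 'Ishmael'
program uses); the helper name receive_scattered is illustrative and
error checking is omitted:

    #include <stdlib.h>
    #include <mpi.h>

    /* Receive a message of reals from the root without knowing its length
       in advance: probe first, size the buffer from the probe, then do
       the matching receive. */
    double *receive_scattered(int *count)
    {
        MPI_Status status;
        MPI_Probe(0, 123, MPI_COMM_WORLD, &status);  /* wait for the message   */
        MPI_Get_count(&status, MPI_DOUBLE, count);   /* how many elements?     */
        double *data = malloc(*count * sizeof(double));
        MPI_Recv(data, *count, MPI_DOUBLE, 0, 123, MPI_COMM_WORLD, &status);
        return data;
    }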
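Finally, for question 5.2, both ends can compute the tag from the formula,
so the receiver can use MPI_ANY_TAG and still verify the value. A minimal
C sketch of just that part, assuming MPI_INT data as in 5.1; the helper
names make_tag and checked_receive are illustrative, and returning
MPI_ERR_OTHER on a mismatch follows the convention of question 2.4:

    #include <mpi.h>

    /* The tag encodes the call's sequence number and the two process
       numbers, so both ends can compute the value they expect. */
    static int make_tag(int sequence, int source, int target)
    {
        return (169*sequence + 13*(target+1) + (source+1)) % 32768;
    }

    /* Receiving side inside the all-to-all replacement: accept any source
       and any tag, then check that the tag is the one the sender should
       have used, as an error-detection mechanism. */
    static int checked_receive(int *data, int count, int sequence,
                               int myrank, MPI_Comm comm)
    {
        MPI_Status status;
        int error = MPI_Recv(data, count, MPI_INT, MPI_ANY_SOURCE,
                             MPI_ANY_TAG, comm, &status);
        if (error != MPI_SUCCESS) return error;
        if (status.MPI_TAG != make_tag(sequence, status.MPI_SOURCE, myrank))
            return MPI_ERR_OTHER;
        return MPI_SUCCESS;
    }

The sending side would use make_tag(sequence, myrank, target) as the tag
of its buffered send, with the sequence counter incremented once per call
so that the first call uses 1.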