## Assignment 2:

Parallel Matrix Multiply

In the lecture I merely said write a parallel matrix multiplication program: C = A * B. Here I have added detail to make it easier for you.

The program should be able to accept any size square matrix and utilize any number of processors, although I will accept a fixed number of processors.

Some comments on implementation and submission:

1. Only one process (e.g., a process with rank = 0 acting as the master) should read the data/matrix (or assign values to matrix A and B) and then it should automatically share this among the worker processes.

2.  You can divide the work among processes in a number of ways. Some suggested ways are: (a) treat computations of all elements of each row of the resulting matrix C as one work unit, (b) treat computations of all elements of each column of the resulting matrix C as one work unit, (c) computations of each element of  the resulting matrix C as one work unit, (d) create work units based on the number of processes and the size of matrix and then assign them different worker process.

3. Try to do assign the work units to worker processes such that the load/work distribution among them is balanced. The master process should collect the results from worker processes and strores in matrix C.

4. Testing and Evaluation:

(a) Take the size of square matrix as an input to the program by reading from the keyboard at runtime.
(b) Test your program to make sure that it works in parallel!
(c) When the size of matrix (N) is larger than 5, it is NOT be feasible for you to input elements of matrix A and B from the keyboard. In such cases, simply assign some random values to elements matrix A and  B.
(d)  Run your program for three different problem sizes: matrix size: N = 100, 200, 300, 400, 500 on different combinations of processes: 4, 8, 10, and 12 processes.
(e) Note the processing time by making use of appropriate MPI calls and draw a graph for varying values of matrix size with "Number of CPUs/nodes on X column" and "the time taken " on Y column.

That is, use randomly generated  matricies in a range of sizes and produce a report on the time taken for a range of different sized matricies in a parallel programmin environment using MPI.
.
4. Submission:

• Write a report containing (a) details on your design and implementation and the way you did load balancing, (b) performance graphs and (c) program code