Previous: Array Initialization Up: Examples

Scientific Mesh Computation

This is a solution to the cellular automaton grid computation problem. A gradient function is evaluated iteratively over a two dimensional grid of points. The initial boundary conditions are given and each interior point computes its new value as a weighted average of its old value and its neighbors' old values. This process is repeated until convergence (or, in this case, until a fixed number of iterations have been processed).


#include <iostream.h>
const int N = 10;

float Mesh[N][N];

float compute_cell (int r, int c) { return (4*Mesh[r][c] + Mesh[r+1][c] + Mesh[r-1][c] + Mesh[r][c+1] + Mesh[r][c-1])/8.0; }

void calculate (void) { float New_Mesh[N][N]; for (int iterate=0; iterate<300; iterate++) { parfor (int row=1; row<N-1; row++) //compute new mesh parfor (int col=1; col<N-1; col++) New_Mesh[row][col] = compute_cell(row,col); parfor (int newrow=1; newrow<N-1; newrow++) //update old mesh parfor (int newcol=1; newcol<N-1; newcol++) Mesh[newrow][newcol] = New_Mesh[newrow][newcol]; } }

int main() { for (int i=1; i<N-1; i++) { //initialize boundary of Mesh Mesh[0][i] = i; Mesh[i][0] = i; Mesh[N-1][i] = N-1+i; Mesh[i][N-1] = N-1+i; } for (i=1; i<N-1; i++) //initialize interior of Mesh for (int j=1; j<N-1; j++) Mesh[i][j] = 0;

calculate();

for (i=0; i<N; i++) { //display outcome for (int j=0; j<N; j++) cout << Mesh[i][j] << "\t"; cout << endl; } return 0; }

The computation is done in parallel for all points. Note that a particular point's old value may be used in as many as 4 computations (i.e. for each of its neighbors). Because each computation requiring this old value performs only a read operation, this is not an instance of dangerous sharing. The value that is computed is written to a new mesh. At the end of the first two parfor statements, therefore, we know that this new mesh of values has been completely filled in with the new values. The second group of parfor statements can then safely copy these values to the original mesh. It is important to understand why no mutable variable is being both written and read by parallel threads of control.

Clearly this is not a very efficient solution to this problem. The cost of copying the mesh of values at every iteration is high. One way to avoid this cost is to maintain two arrays, M1 and M2. On even iterations, the M1 stores the old value and the new values are written into M2, and vice versa on the odd iterations. Also, the number of parallel threads of control is excessive (as discussed in the ``Pitfalls'' section, ), considering the small amount of work to be performed by each one. It is reasonable to expect a program in which each thread of control computes the values for a collection of cells to be more efficient. The following code incorporates this optimization.


#include <iostream.h>

//N...........size of grid (N by N) //T...........number of concurrent processes, each working // on an N by ((N-2)/T+2) slice (T must divide N-2) // and (N-2)/T $>$= 2 //HORIZON.....the event horizon for terminating iteration

#define N 22 #define T 5 #define HORIZON 300

float Grid[2][T][N][(N-2)/T+2];

void initialize (void) { int i,j,k; for (i=0; i<T; i++) //initialize interior of Grids for (j=1; j<N-1; j++) for (k=0; k<(N-2)/T-1; k++) Grid[0][i][j][k] = 0; for (i=0; i<T; i++) //initialize boundary of Grids for (k=0; k<(N-2)/T+2; k++) { Grid[0][i][0][k] = i*(N-2)/T+k; Grid[0][i][N-1][k] = i*(N-2)/T+k+N-1+N-1; } for (i=0; i<N; i++) { Grid[0][0][i][0] = i; Grid[0][T-1][i][(N-2)/T+1] = i+N-1; } }

float compute (int l, int s, int r, int c) { return (4*Grid[l][s][r][c] + Grid[l][s][r+1][c] + Grid[l][s][r-1][c] + Grid[l][s][r][c+1] + Grid[l][s][r][c-1])/8.0; }

void exchange_boundaries (int l, int s) { int i; if (s<T-1) for (i=0; i<N; i++) Grid[l][s+1][i][0] = Grid[l][s][i][(N-2)/T]; if (s>0) for (i=0; i<N; i++) Grid[l][s-1][i][(N-2)/T+1] = Grid[l][s][i][1]; }

int main() { initialize();

for (int iterate=0; iterate<HORIZON; iterate++) { parfor (int slice=0; slice<T; slice++) { //for each slice for (int row=1; row<N-1; row++) //compute new Grid for (int col=1; col<(N-2)/T+1; col++) Grid[(iterate+1)%2][slice][row][col] = compute(iterate%2,slice,row,col); } parfor (int slice2=0; slice2<T; slice2++) { //exchange boundaries exchange_boundaries((iterate+1)%2,slice2); //between neighbours } cout << "exchange " << iterate << endl; }

for (int i=0; i<T; i++) { //display outcome cout << "Slice " << i << endl; cout << "=========" << endl; for (int j=0; j<N; j++) { for (int k=0; k<(N-2)/T+2; k++) cout << Grid[HORIZON%2][i][j][k] << "\t"; cout << endl; } cout << endl; }

return 0; }

Also, the synchronization at the end of each iteration is excessive. The value of a particular cell in the mesh can affect the next values of only its neighbors. Thus, there is no need for a cell to synchronize with any cells apart from its immediate neighbors. We will discuss how such synchronization schemes can be constructed in Chapter .

paolo@cs.caltech.edu