Previous: Array Initialization Up: Examples
This is a solution to the cellular automaton grid computation problem. A gradient function is evaluated iteratively over a two dimensional grid of points. The initial boundary conditions are given and each interior point computes its new value as a weighted average of its old value and its neighbors' old values. This process is repeated until convergence (or, in this case, until a fixed number of iterations have been processed).
#include <iostream.h> const int N = 10;float Mesh[N][N];
float compute_cell (int r, int c) { return (4*Mesh[r][c] + Mesh[r+1][c] + Mesh[r-1][c] + Mesh[r][c+1] + Mesh[r][c-1])/8.0; }
void calculate (void) { float New_Mesh[N][N]; for (int iterate=0; iterate<300; iterate++) { parfor (int row=1; row<N-1; row++) //compute new mesh parfor (int col=1; col<N-1; col++) New_Mesh[row][col] = compute_cell(row,col); parfor (int newrow=1; newrow<N-1; newrow++) //update old mesh parfor (int newcol=1; newcol<N-1; newcol++) Mesh[newrow][newcol] = New_Mesh[newrow][newcol]; } }
int main() { for (int i=1; i<N-1; i++) { //initialize boundary of Mesh Mesh[0][i] = i; Mesh[i][0] = i; Mesh[N-1][i] = N-1+i; Mesh[i][N-1] = N-1+i; } for (i=1; i<N-1; i++) //initialize interior of Mesh for (int j=1; j<N-1; j++) Mesh[i][j] = 0;
calculate();
for (i=0; i<N; i++) { //display outcome for (int j=0; j<N; j++) cout << Mesh[i][j] << "\t"; cout << endl; } return 0; }
The computation is done in
parallel for all points. Note that a particular
point's old value may be used in as many as 4 computations
(i.e. for each of its neighbors). Because each computation
requiring this old value performs only a read operation, this
is not an instance of dangerous sharing. The value that
is computed is written to a new mesh. At the end of the
first two parfor statements, therefore, we know that
this new mesh of values has been completely filled in
with the new values. The second group of parfor statements
can then safely copy these values to the original mesh.
It is important to understand why no mutable variable is
being both written and read by parallel threads of control.
Clearly this is not a very efficient solution to this
problem. The cost of copying the mesh of values at every
iteration is high. One way to avoid this cost is to
maintain two arrays, M1 and M2. On even iterations, the
M1 stores the old value and the new values are written
into M2, and vice versa on the odd iterations. Also,
the number of parallel threads of control is excessive
(as discussed in the ``Pitfalls'' section, ),
considering the small amount of work to be performed by
each one. It is reasonable to expect a program in which
each thread of control computes the values for a collection
of cells to be more efficient. The following code incorporates
this optimization.
#include <iostream.h>//N...........size of grid (N by N) //T...........number of concurrent processes, each working // on an N by ((N-2)/T+2) slice (T must divide N-2) // and (N-2)/T $>$= 2 //HORIZON.....the event horizon for terminating iteration
#define N 22 #define T 5 #define HORIZON 300
float Grid[2][T][N][(N-2)/T+2];
void initialize (void) { int i,j,k; for (i=0; i<T; i++) //initialize interior of Grids for (j=1; j<N-1; j++) for (k=0; k<(N-2)/T-1; k++) Grid[0][i][j][k] = 0; for (i=0; i<T; i++) //initialize boundary of Grids for (k=0; k<(N-2)/T+2; k++) { Grid[0][i][0][k] = i*(N-2)/T+k; Grid[0][i][N-1][k] = i*(N-2)/T+k+N-1+N-1; } for (i=0; i<N; i++) { Grid[0][0][i][0] = i; Grid[0][T-1][i][(N-2)/T+1] = i+N-1; } }
float compute (int l, int s, int r, int c) { return (4*Grid[l][s][r][c] + Grid[l][s][r+1][c] + Grid[l][s][r-1][c] + Grid[l][s][r][c+1] + Grid[l][s][r][c-1])/8.0; }
void exchange_boundaries (int l, int s) { int i; if (s<T-1) for (i=0; i<N; i++) Grid[l][s+1][i][0] = Grid[l][s][i][(N-2)/T]; if (s>0) for (i=0; i<N; i++) Grid[l][s-1][i][(N-2)/T+1] = Grid[l][s][i][1]; }
int main() { initialize();
for (int iterate=0; iterate<HORIZON; iterate++) { parfor (int slice=0; slice<T; slice++) { //for each slice for (int row=1; row<N-1; row++) //compute new Grid for (int col=1; col<(N-2)/T+1; col++) Grid[(iterate+1)%2][slice][row][col] = compute(iterate%2,slice,row,col); } parfor (int slice2=0; slice2<T; slice2++) { //exchange boundaries exchange_boundaries((iterate+1)%2,slice2); //between neighbours } cout << "exchange " << iterate << endl; }
for (int i=0; i<T; i++) { //display outcome cout << "Slice " << i << endl; cout << "=========" << endl; for (int j=0; j<N; j++) { for (int k=0; k<(N-2)/T+2; k++) cout << Grid[HORIZON%2][i][j][k] << "\t"; cout << endl; } cout << endl; }
return 0; }
Also, the synchronization at the end of each iteration
is excessive. The value of a particular cell in the mesh
can affect the next values of only its neighbors. Thus,
there is no need for a cell to synchronize with any cells
apart from its immediate neighbors. We will discuss how
such synchronization schemes can be constructed in
Chapter .