
Introduction to Distributed Computing

An address space is the set of memory locations that a thread of control can access. So far, all the concurrency we have created using par, parfor, and spawn has been inside a single address space. Modulo the scoping boundaries imposed by the C++ language, each thread of control has access to the same memory. Communicating data from one thread to another therefore requires only agreeing on the memory location in which to place the information. However, simultaneous access to shared data by multiple threads is a source of nondeterminism; we introduced atomic and sync to control it.

We now turn to distributing a computation over several address spaces. Threads in separate address spaces no longer have access to the same memory, so for two such threads to share data, the data must be communicated from one address space to the other. This communication is often quite time-consuming. In return, we need be concerned only with nondeterminism caused by interaction among threads in the same address space, rather than among all threads in the entire computation.

Because communication is now more expensive, deciding which pieces of data to place in which address space becomes important. Each thread would like the data it uses frequently to be inexpensive to access, i.e., located in its own address space. We would therefore like to distribute the computation over the available address spaces so that each piece can inexpensively access most of the data on which it depends.

In C++, an object groups together data and the pieces of computation related to that data (member functions). CC++ extends this idea with processor objects. Each processor object is a separate address space; we group related pieces of data, and the parts of the computation that go with them, into one processor object.
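As a preview of the notation (the class name and members here are hypothetical, and this is CC++ rather than standard C++), a processor object type is declared by prefixing a class declaration with the keyword global; each instance then occupies its own address space:

```cpp
// CC++ sketch, not compilable as standard C++:
// `global` marks Worker as a processor object type.
global class Worker {
public:
    void put(double x);   // called from other address spaces to send data in
    double get();         // computation that runs inside this address space
private:
    double value;         // data grouped with the computation that uses it
};
```

Creating an instance of such a class creates a new address space; the details appear in the chapter on processor objects.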

Naturally, we cannot always break the computation up so that each piece can inexpensively access all the data on which it depends. CC++ therefore distinguishes data that is expensive to access from data that is inexpensive to access: pointers that reference expensive data (i.e., in another address space, and hence another processor object) are global pointers, while pointers that reference inexpensively accessible data (i.e., in the same processor object) are local pointers.
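Sketched in CC++ notation (again, not compilable as standard C++), a global pointer is declared by qualifying the pointer with the keyword global; an unqualified pointer is local:

```cpp
// CC++ sketch: pointer declarations distinguishing the two cases.
double *global gp;   // global pointer: may reference data in another
                     // processor object (another address space)
double *lp;          // ordinary pointer: local, same address space

// Once gp has been made to point at remote data,
//   double v = *gp;
// fetches the value from the processor object that owns it.
```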

Dereferencing a global pointer triggers communication with another processor object to fetch the referenced value. The details of how values are packed and unpacked for this communication are controlled through the CC++ construct of data transfer functions.
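For a user-defined type, a data transfer function is written as a pair of stream-style operators on the CC++ type CCVoid, telling the runtime how to pack and unpack the type across address spaces. A sketch with a hypothetical Point class (CC++, not standard C++):

```cpp
// CC++ sketch: a simple structure we want to move between address spaces.
class Point {
public:
    double x, y;
};

// Data transfer functions: how a Point crosses address spaces.
CCVoid& operator<<(CCVoid& v, const Point& p)   // pack for sending
{
    v << p.x << p.y;
    return v;
}

CCVoid& operator>>(CCVoid& v, Point& p)         // unpack on arrival
{
    v >> p.x >> p.y;
    return v;
}
```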

We will examine processor objects, global pointers, and data transfer functions in detail in the next three chapters. First, we present a simple example of their use.

paolo@cs.caltech.edu