Pacheco chapter 2 Solutions
#--------------------------------------------------------------------------------
2.10.
Suppose a program must execute 10^12 instructions in order to solve a particular
problem. Suppose further that a single processor system can solve the problem
in 10^6 seconds (about 11.6 days). So, on average, the single processor system
executes 10^6 or a million instructions per second. Now suppose that the
program has been parallelized for execution on a distributed-memory system.
Suppose also that if the parallel program uses p processors, each processor will
execute 10^12 /p instructions and each processor must send 10^9 ( p − 1) messages.
Finally, suppose that there is no additional overhead in executing the
parallel program. That is, the program will complete after each processor has
executed all of its instructions and sent all of its messages, and there won’t be
any delays due to things such as waiting for messages.
a. Suppose it takes 10^−9 seconds to send a message. How long will it take
the program to run with 1000 processors, if each processor is as fast as the
single processor on which the serial program was run?
b. Suppose it takes 10^−3 seconds to send a message. How long will it take the
program to run with 1000 processors?
#--------
(a) Since the number of processors is 1000, the number of instructions (other than
sending messages) that each processor must execute is 10^12 /1000 = 10^9 .
Since each processor executes 10^6 instructions per second, the total time
each processor will spend executing instructions is 10^9 /10^6 = 1000 seconds.
Since each processor must send 10^9 × 999 messages and it takes 10^−9 seconds to
send a single message, the total time each processor will spend sending messages
is 999 seconds.
So the total time each processor will spend is about 1999 seconds or about 33 minutes.
(b) The amount of time the processors spend executing instructions is unchanged:
1000 seconds.
However, now the total time to send the 10^9 × 999 messages is 10^6 × 999 seconds.
So the total time to run the program will be about 10^6 × 999 + 1000 seconds ≈ 32 years!
#--------------------------------------------------------------------------------
2.12
a. A planar mesh is just like a toroidal mesh, except that it doesn’t have
the “wraparound” links.
What is the bisection width of a square planar mesh?
b. A three-dimensional mesh is similar to a planar mesh, except that it also
has depth. What is the bisection width of a three-dimensional mesh?
#--------
(a) Suppose the mesh has p = q × q nodes, where q is even. Then we can bisect the
mesh by simply removing the “middle” horizontal links (as in Figure 2.10).
Since there are q of these links, the bisection width is q = √p.
(b) Suppose the mesh has p = q^3 nodes, where, once again, q is even. Then we can
imagine the three-dimensional mesh as being built by stacking q two-dimensional meshes.
If we lean the stack of two-dimensional meshes against a wall, each of the
two-dimensional meshes is more or less vertical, and we can bisect the
three-dimensional mesh by removing by removing the middle horizontal links
from each of our two-dimensional meshes.
Each of the two-dimensional bisections will remove q links, and we’ll have
removed a total of q^2 links.
So the bisection width of a three-dimensional mesh is q^2 = p^(2/3).
#--------------------------------------------------------------------------------
2.16
a. Suppose the run-time of a serial program is given by Tserial = n2 , where
the units of the run-time are in microseconds. Suppose that a parallelization
of this program has run-time Tparallel = n2 /p + log2 (p).
Write a program that finds the speedups and efficiencies of this program for
various values of n and p.
Run your program with n = 10, 20, 40, . . . , 320, and p = 1, 2, 4, . . . , 128.
What happens to the speedups and efficiencies as p is increased and n is held fixed?
What happens when p is fixed and n is increased?
b. Suppose that Tparallel = Tserial /p + Toverhead. Also suppose that we fix p and
increase the problem size.
- Show that if Toverhead grows more slowly than Tserial , the parallel
efficiency will increase as we increase the problem size.
- Show that if, on the other hand, Toverhead grows faster than Tserial , the
parallel efficiency will decrease as we increase the problem size.
#--------
(a) See ex2.16a spdup.c.
If we hold n fixed and increase p, the behavior of the speedups and
efficiencies depends on the size of n. For n = 10, the speedups
increase at nearly the rate of increase of p when p is small.
But as p increases, the increase in the speedups slows until it actually
decreases in going from 64 to 128.
The behavior of the efficiencies reflect this: For small p, the efficiencies are
close to 1, but as p gets larger, they drop precipitously.
For n = 320, speedups increase by nearly a factor of 2 when we double p —
regardless of how large p is — and, as a consequence efficiencies are
near 1 for all values of p.
If we hold p fixed but increase n, then for small p the speedups and efficiencies
remain nearly constant. For example, for p = 2, the efficiencies are very close
to 1 for all values of n. For large p, the speedups and efficiencies
increase as n is increased, but the rate of increase starts to diminish as n
approaches its maximum of 320.
(b) Since Tparallel = Tserial /p + Toverhead , we can write the efficiency as
E = T_serial / (p T_parallel) = T_serial / ( T_serial + p T_overhead)
If we multiply both numerator and denominator by 1/Tserial , we’ll have
E = 1 / (1 + (p T_parallel/T_serial)
If T_overhead grows more slowly than T_serial , then the fraction T_overhead / T_serial
will get smaller as n increases. So the denominator in our formula for E will
decrease, and E will increase as n increases.
On the other hand, if T_overhead grows faster than T_serial , then the fraction
T_overhead/T_serial will get larger as n increases. This will cause the denominator
in our formula for E to increase, and E will decrease as n increases.
#--------------------------------------------------------------------------------