Chapter 1 solutions - 1.3, 1.4, 1.6, 1.7
#----------------------------------------------------------------
1.3.
divisor = 2;
core_difference = 1;
sum = my_value;
while ( divisor <= number of cores ) {
if ( my_rank % divisor == 0 ) {
partner = my_rank + core_difference;
receive value from partner core;
sum += received value;
} else {
partner = my_rank - core_difference;
send my sum to partner core;
}
divisor *= 2;
core_difference *= 2;
}
#----------------------------------------------------------------
1.4.
bitmask = 1;
divisor = 2;
sum = my_value;
while ( bitmask < number of cores ) {
partner = my_rank ^ bitmask;
if ( my_rank % divisor == 0 ) {
receive value from partner core;
sum += received value;
} else {
send my_sum to partner core;
}
bitmask <<= 1;
divisor *= 2;
}
#----------------------------------------------------------------
1.6.
(a) The number of receives is p − 1, and the number of additions is p − 1.
(b) The number of receives is log2(p), and the number of additions is log2(p).
(c)
p Original Tree-Structured
2 1 1
4 3 2
8 7 3
16 15 4
32 31 5
64 63 6
128 127 7
256 255 8
512 511 9
1024 1023 10
#----------------------------------------------------------------
1.7.
The example is a combination of task- and data- parallelism.
In each phase of the tree-structured global sum, the cores are computing partial sums.
This can be seen as data-parallelism.
Also, in each phase, there are two types of tasks.
Some cores are sending their sums and some are receiving another cores partial sum.
This can be seen as task-parallelism.
#----------------------------------------------------------------