Name:

Write programs to solve each of these problems. Include comments in each program with your name, "CompSci 491 Exam 2", and the date.

Upload each program separately, to the course website under "Exam 2".

1. [25 pts] Write a program that uses OpenMP to run as many threads as possible. Each thread must print out the letter 'A', the total number of threads, and its own thread number; then use the `sleep()` function (from `<unistd.h>`) to sleep for three seconds; then print out a blank line, the letter 'B', the total number of threads, and its own thread number again.

Here is an example run of the instructor's solution:

```
$ ./sleeper-omp

$
```

(Notice that the order of the threads changes from their first outputs to their second.)

2. [25 pts] Write a program that uses Pthreads to run five threads. Each thread must print out the letter 'A' and the value of its single parameter, cast to a text string. Then use the `sleep()` function (from `<unistd.h>`) to sleep for two seconds; then print out a blank line, and the letter 'B' and its parameter again.

Here is an example run of the instructor's solution:

```
A: alpha
A: gamma
A: beta
A: delta
A: epsilon

B: gamma

B: epsilon

B: alpha

B: delta

B: beta
$
```

(Notice that the order of the threads changes from their first outputs to their second.)

3. [25 pts] Here is the main body of a program that creates a large array of random values, then calls a function to sum it up.

Copy it as-is:

Write the parallelized function `double sumup(float *array, long unsigned size)` to sum up the values (the easy part). Your function must use OpenMP.

• DO NOT parallelize the loop that generates the random data; you should make no changes to `main()`. Just write a parallelized function to do the summing.
• Hint: You will need a simple critical section.
• You can put your function in a separate file, or modify the main file if you prefer.
• You don't have to write a serial version, but it might be helpful to you. (It's trivial, just a for loop.)
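Following the hint above, one workable shape for the function is sketched here: each thread accumulates a private partial sum over its share of the iterations, then adds that partial into the shared total inside a critical section (one add per thread, so contention stays negligible). This is a sketch against the `main()` described but not shown here, not the instructor's solution:

```c
// Your Name -- "CompSci 491 Exam 2" -- date
// Sketch for problem 3: OpenMP sumup() with a simple critical section.
#include <omp.h>

double sumup(float *array, long unsigned size)
{
    double total = 0.0;

    #pragma omp parallel
    {
        double partial = 0.0;            // private to this thread

        #pragma omp for
        for (long unsigned i = 0; i < size; i++)
            partial += array[i];

        #pragma omp critical             // one guarded add per thread
        total += partial;
    }
    return total;
}
```

Adding every element to `total` inside the critical section would also be correct, but would serialize the whole loop; the per-thread partial keeps the critical section out of the hot path.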

Here are timings for the instructor's solution on a laptop, for reference:

#### 256M values

The program's single argument is a power of 2, used to determine the number of values. An argument of "28" should run fairly quickly, for testing purposes:

```
$ gcc -Wall  -o sumrands-serial  sumrands-main.c  sumrands-serial.c
$ gcc -Wall  -o sumrands-omp  sumrands-main.c  sumrands-omp.c  -fopenmp
$
\$ time ./sumrands-serial 28
268435456 values / 2147483648 bytes (seed 7)
2.667056 total compute seconds: 1.987923 filling seconds, 0.679133 sumup seconds.
Sum of 268435456 values is 72057918657108261
Average of 268435456 values is 268436665.300683

real	0m2.729s
user	0m2.288s
sys	0m0.440s
$
\$ time ./sumrands-omp 28
268435456 values / 2147483648 bytes (seed 7)
2.852838 total compute seconds: 1.966147 filling seconds, 0.886691 sumup seconds.
Sum of 268435456 values is 72057918657108261
Average of 268435456 values is 268436665.300683

real	0m2.144s
user	0m2.428s
sys	0m0.500s
$
```

The OpenMP version uses slightly more clocks to sum up, but slightly less real time. (About two-thirds of the user time is spent in the serial array-filling loop at the beginning.)

#### 2G values

An argument of "31" produces more interesting times for comparison. (An argument of "32" or larger makes even more interesting times, but don't try this until you're finished with the problem!)

```
\$ time ./sumrands-serial 31
2147483648 values / 17179869184 bytes (seed 7)
38.613407 total compute seconds: 16.428843 filling seconds, 22.184566 sumup seconds.
Sum of 2147483648 values is 576446126628202761
Average of 2147483648 values is 268428645.389249

real	3m0.697s
user	0m24.348s
sys	0m15.076s
$
\$ time ./sumrands-omp 31
2147483648 values / 17179869184 bytes (seed 7)
27.672680 total compute seconds: 18.896782 filling seconds, 8.775898 sumup seconds.
Sum of 2147483648 values is 576446126628202761
Average of 2147483648 values is 268428645.389249

real	1m6.100s
user	0m22.192s
sys	0m6.052s
$
```

The serial version shows more sumup time, contributing to a higher total compute time than the OpenMP version; its real time is also anomalously long (three minutes against only 24 seconds of user time), most likely because the 16 GiB array pushed the laptop into paging.

4. [25 pts] Make another copy of the "sumrands-main.c" program above, and name it "sumrands-pthreads.c". Modify the program to use 16 pthreads.
• The timing code and the array-creating code in `main()` should be left alone.
• Replace the single call to the `sumup()` function with the code needed to start the pthreads.
• Add a struct that contains a pointer to the data array, the size of the data array, and anything else the thread may need.
• You can put your function in a separate file or not, as you prefer.
• You can use a critical section and a mutex to communicate partial sums back to the parent thread, but you can also include the partial sum in the struct that holds the function arguments and just let the parent collect all the partial sums from those structs.
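A sketch of the second (mutex-free) approach from the last bullet is below. The struct and function names (`sum_args`, `sum_slice`, `sumup_pthreads`) are mine, not the instructor's; in your actual submission this code would replace the `sumup()` call inside `main()` of "sumrands-pthreads.c":

```c
// Your Name -- "CompSci 491 Exam 2" -- date
// Sketch for problem 4: 16 pthreads, partial sums returned through the structs.
#include <pthread.h>

#define NTHREADS 16

// One of these per thread: its slice of the array, plus a slot for the result.
struct sum_args {
    float *array;
    long unsigned start;
    long unsigned end;      // one past the last index for this thread
    double partial;         // filled in by the thread; no mutex needed
};

static void *sum_slice(void *arg)
{
    struct sum_args *a = arg;
    double partial = 0.0;
    for (long unsigned i = a->start; i < a->end; i++)
        partial += a->array[i];
    a->partial = partial;   // each thread writes only its own struct
    return NULL;
}

double sumup_pthreads(float *array, long unsigned size)
{
    pthread_t threads[NTHREADS];
    struct sum_args args[NTHREADS];
    long unsigned chunk = size / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {
        args[t].array = array;
        args[t].start = t * chunk;
        args[t].end = (t == NTHREADS - 1) ? size : (t + 1) * chunk;  // last thread takes the remainder
        args[t].partial = 0.0;
        pthread_create(&threads[t], NULL, sum_slice, &args[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);   // join, then read the thread's partial
        total += args[t].partial;
    }
    return total;
}
```

Because each thread writes only its own `args[t].partial`, and the parent reads it only after `pthread_join()`, there is no shared mutable state and therefore nothing to lock.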

Here are timings for the instructor's solution on a laptop, for reference:

#### Test runs: 256M and 2G values

These times are comparable to the "OpenMP" times shown in the previous problem.

```
$ gcc -Wall  -o sumrands-pthreads  sumrands-pthreads.c  -lpthread
$
$ time ./sumrands-pthreads 28
268435456 values / 2147483648 bytes (seed 7)
5.075000 total compute seconds: 2.047424 filling seconds, 3.027576 sumup seconds.
Sum of 268435456 values is 17726168783105492820
Average of 268435456 values is 66035124596.601326

real	0m2.526s
user	0m4.708s
sys	0m0.428s
$
$ time ./sumrands-pthreads 31
2147483648 values / 17179869184 bytes (seed 7)
44.982281 total compute seconds: 20.212025 filling seconds, 24.770256 sumup seconds.