Name:

1. [5 pts] Match these eight ideas from computer architecture with the following ideas from other fields:
A "Design for Moore's Law" "Performance via Pipelining" "Use Abstraction to Simplify Design" "Performance via Prediction" "Make the Common Case Fast" "Hierarchy of Memories" "Performance via Parallelism" "Dependability via Redundancy"

2. [10 pts] (a) Put these in order from top (fastest) to bottom (slowest) of the memory hierarchy. (b) Briefly describe the relative capacities, and what each is used for:
DRAM registers SSD drives SRAM Magnetic tape
Registers are at the top: fastest, but lowest total capacity. SRAM forms cache memory at the next level. DRAM forms main memory below that. SSD drives form secondary memory. Magnetic tape has the highest overall capacity and provides long-term archival storage at the bottom of the hierarchy.
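3. Clock cycles per instruction:

| | type A | type B | type C | type D |
|---|---|---|---|---|
| CPI | 3 | 1 | 2 | 6 |

Instruction distributions:

| | type A | type B | type C | type D |
|---|---|---|---|---|
| prog1 | 3.0e6 | 6.0e7 | 9.0e6 | 3.0e5 |
| prog2 | 1.0e7 | 2.0e7 | 1.1e7 | 4.0e5 |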

A hypothetical processor executes four types of instructions — type A, type B, type C, and type D — at the clock-cycles/instruction (CPI) rates shown on the right. The processor has a clock rate of 3.0GHz.

Two programs prog1 and prog2 have dynamic instruction counts, also shown at right.

[10 pts] Calculate the average cycles per instruction (CPI) for each program, and the total execution time for each program.

```
P1 time:    3 * 3.0e6  + 1 * 6.0e7  +  2 * 9.0e6  + 6 * 3.0e5
= 88.8e6 cycles
88.8e6 / 3.0e9 = .0296 seconds

P2 time:    3 * 1.0e7  + 1 * 2.0e7  +  2 * 1.1e7  + 6 * 4.0e5
= 74.4e6 cycles
74.4e6 / 3.0e9 = .0248 seconds
#-----------------

        type A     type B     type C     type D     clock rate
CPI     3          1          2          6          3.00E+09

        type A     type B     type C     type D     instructions  cycles     mean CPI   time (s)
prog1   3.00E+06   6.00E+07   9.00E+06   3.00E+05   7.23E+07      8.88E+07   1.23       0.0296
prog2   1.00E+07   2.00E+07   1.10E+07   4.00E+05   4.14E+07      7.44E+07   1.80       0.0248

# spreadsheet formulas for cycles, mean CPI, and time:
'=C$3*C6+D$3*D6+E$3*E6+F$3*F6
'=H6/G6 '=H6/H$3
#-----------------
```
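The arithmetic above can be reproduced with a short script. This is a minimal sketch: the CPI values, clock rate, and instruction counts come from the problem statement, while the names `CPI`, `COUNTS`, and `summarize` are illustrative choices and not part of the exam.

```python
# CPI per instruction type and clock rate, from the problem statement.
CPI = {"A": 3, "B": 1, "C": 2, "D": 6}
CLOCK_HZ = 3.0e9

# Dynamic instruction counts per program, from the problem statement.
COUNTS = {
    "prog1": {"A": 3.0e6, "B": 6.0e7, "C": 9.0e6, "D": 3.0e5},
    "prog2": {"A": 1.0e7, "B": 2.0e7, "C": 1.1e7, "D": 4.0e5},
}

def summarize(counts, cpi, clock_hz):
    """Return (total cycles, mean CPI, execution time in seconds)."""
    instructions = sum(counts.values())
    cycles = sum(cpi[t] * n for t, n in counts.items())
    return cycles, cycles / instructions, cycles / clock_hz

for name, counts in COUNTS.items():
    cycles, mean_cpi, seconds = summarize(counts, CPI, CLOCK_HZ)
    print(f"{name}: {cycles:.3e} cycles, mean CPI {mean_cpi:.2f}, {seconds:.4f} s")
```

Mean CPI is total cycles divided by total instructions, so it is a weighted average of the per-type CPI values rather than a simple average.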

1. [5 pts] Which program will benefit more if the CPI for "Type D" instructions is decreased to 4 cycles?

2. [5 pts] If the CPI for "type D" instructions is decreased to 4, but the CPI for "type B" instructions increases to 2, while the clock frequency is increased to 4.5GHz, does each program take more time or less time to execute?

```
Baseline (CPI 3/1/2/6 at 3.0 GHz), for comparison:

P1 time:    3 * 3.0e6  + 1 * 6.0e7  +  2 * 9.0e6  + 6 * 3.0e5
= 88.8e6 cycles
88.8e6 / 3.0e9 = .0296 seconds

P2 time:    3 * 1.0e7  + 1 * 2.0e7  +  2 * 1.1e7  + 6 * 4.0e5
= 74.4e6 cycles
74.4e6 / 3.0e9 = .0248 seconds

#-----------------

Original design:
        type A     type B     type C     type D     clock rate
CPI     3          1          2          6          3.00E+09

        type A     type B     type C     type D     cycles     mean CPI   time (s)
prog1   3.00E+06   6.00E+07   9.00E+06   3.00E+05   8.88E+07   1.23       0.0296
prog2   1.00E+07   2.00E+07   1.10E+07   4.00E+05   7.44E+07   1.80       0.0248

Revised design:
        type A     type B     type C     type D     clock rate
CPI     3          2          2          4          4.50E+09

        type A     type B     type C     type D     cycles     mean CPI   time (s)
prog1   3.00E+06   6.00E+07   9.00E+06   3.00E+05   1.48E+08   2.05       0.0329
prog2   1.00E+07   2.00E+07   1.10E+07   4.00E+05   9.36E+07   2.26       0.0208

prog1 takes more time (0.0329 s vs. 0.0296 s); prog2 takes less time (0.0208 s vs. 0.0248 s).

#-----------------
```
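Both what-if questions can be checked with the same cycle-count formula. The sketch below assumes nothing beyond the numbers in the problem statement; the names `exec_time`, `FAST_D_CPI`, `REVISED_CPI`, and `results` are illustrative. With these inputs, lowering the type D CPI to 4 saves about 1.1% of prog2's execution time but only about 0.7% of prog1's, so prog2 benefits more in relative terms.

```python
# Dynamic instruction counts from the problem statement.
COUNTS = {
    "prog1": {"A": 3.0e6, "B": 6.0e7, "C": 9.0e6, "D": 3.0e5},
    "prog2": {"A": 1.0e7, "B": 2.0e7, "C": 1.1e7, "D": 4.0e5},
}

BASE_CPI = {"A": 3, "B": 1, "C": 2, "D": 6}     # original design, 3.0 GHz
FAST_D_CPI = {"A": 3, "B": 1, "C": 2, "D": 4}   # type D lowered to 4, 3.0 GHz
REVISED_CPI = {"A": 3, "B": 2, "C": 2, "D": 4}  # D lowered, B raised, 4.5 GHz

def exec_time(counts, cpi, clock_hz):
    """Execution time in seconds: total cycles divided by clock rate."""
    return sum(cpi[t] * n for t, n in counts.items()) / clock_hz

results = {}
for name, counts in COUNTS.items():
    base = exec_time(counts, BASE_CPI, 3.0e9)
    fast_d = exec_time(counts, FAST_D_CPI, 3.0e9)
    revised = exec_time(counts, REVISED_CPI, 4.5e9)
    results[name] = {
        "base": base,
        "fast_d_saving": (base - fast_d) / base,  # fraction of time saved
        "revised": revised,
    }
    print(f"{name}: saving {results[name]['fast_d_saving']:.2%}, "
          f"base {base:.4f} s, revised {revised:.4f} s")
```

prog1 slows down under the revised design because type B instructions dominate its mix, and doubling their CPI outweighs the 1.5x faster clock.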

4. This graph shows transistor size (downward trend) and CPU frequency (upward trend) for Intel processors, from the 4004 in 1971 to a Core processor in 2012. (The vertical scales are logarithmic.)

Power consumption is governed by the equations below.

• Capacitive load ∝ transistor count × transistor size
• Leakage current ∝ 1/(transistor size) × e^Voltage
• Dynamic power = k × Capacitive load × Voltage² × Frequency / 2
• Static power = Voltage × leakage current
1. [5 pts] What "law" describes the steady downward trend in transistor size? State what the law says, as precisely as you can.
Moore's Law: the number of transistors that fit on an integrated circuit doubles roughly every 18 to 24 months, which goes hand in hand with steadily shrinking transistor size.
2. [5 pts] The capacitive load per CPU stayed approximately constant over the period from 1971 to 2012. What must have been happening to the transistor count over that period?
Transistor count increased exponentially, as the inverse of transistor size.
3. [5 pts] The operating voltage also decreased over the latter part of this time period. Why did the static power consumption increase over time, instead of decreasing?
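Transistor size decreased. Leakage current is inversely proportional to transistor size, so leakage (and with it static power) grew faster than the lower voltage could offset.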
4. [5 pts] What term in the power equations is adversely affected by the transistor-size trend? What can be done to counteract the effect of the transistor size?
Leakage current increases. Counteract by lowering voltage.
5. [5 pts] Why did the frequency curve begin to flatten out around 2002?
Increasing frequency led to more power consumption and heat generation.
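The scaling behavior in the answers above can be explored numerically. The sketch below is only a toy model of the power equations given in the question: it assumes every proportionality constant is 1 and works in arbitrary units, so only ratios between results are meaningful. The function names are illustrative.

```python
import math

# Toy model in arbitrary units; all proportionality constants are assumed to be 1.

def capacitive_load(transistor_count, transistor_size):
    return transistor_count * transistor_size

def leakage_current(transistor_size, voltage):
    return math.exp(voltage) / transistor_size

def dynamic_power(load, voltage, frequency, k=1.0):
    return k * load * voltage ** 2 * frequency / 2

def static_power(transistor_size, voltage):
    return voltage * leakage_current(transistor_size, voltage)

# Halving transistor size at constant voltage doubles static power.
size_ratio = static_power(0.5, 1.0) / static_power(1.0, 1.0)

# Halving voltage at constant load and frequency quarters dynamic power.
voltage_ratio = dynamic_power(1.0, 0.5, 1.0) / dynamic_power(1.0, 1.0, 1.0)

# Doubling the count while halving the size keeps capacitive load constant.
load_ratio = capacitive_load(2.0, 0.5) / capacitive_load(1.0, 1.0)

print(size_ratio, voltage_ratio, load_ratio)
```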
5. [10 pts]

Here is a LEGv8 Assembly instruction: `movk x27, 0x00f0, lsl 0`

Identify the instruction format, and show the binary contents of each field (including the register fields) of the assembled machine instruction.

Also show the full machine instruction in hexadecimal.

IM / IW format
```
   0x794         0x00f0       27
11110010100.0000000011110000.11011
___opcode__.____Immediate___.__Rd_

1111 0010 1000 0000  0001 1110 0001 1011  =  f2 80 1e 1b
- ----  ----
```
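The field packing can be double-checked in a few lines. This sketch assumes the IM-format layout from the LEGv8 reference card: a 9-bit opcode, a 2-bit shift selector, a 16-bit immediate, and a 5-bit Rd. The 9-bit opcode followed by shift bits `00` gives the 11-bit value `0x794` shown above. The helper name `encode_im` is illustrative.

```python
def encode_im(opcode9, shift2, imm16, rd):
    """Pack IM-format fields: opcode[31:23] | shift[22:21] | imm[20:5] | Rd[4:0]."""
    for value, bits in ((opcode9, 9), (shift2, 2), (imm16, 16), (rd, 5)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (opcode9 << 23) | (shift2 << 21) | (imm16 << 5) | rd

MOVK_OPCODE = 0b111100101  # 9-bit MOVK opcode; with shift bits 00 it reads as 0x794

# movk x27, 0x00f0, lsl 0
word = encode_im(MOVK_OPCODE, shift2=0, imm16=0x00F0, rd=27)
print(f"{word:032b}")
print(f"0x{word:08x}")
```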
6. [10 pts]

Here is a LEGv8 machine instruction: `0x7800f020`

Identify the instruction format, and show each of the instruction format's fields, in binary.

Also write the Assembly-language version of the instruction.

D format
```
0111 1000 0000 0000  1111 0000 0010 0000

opcode      address   op Rn    Rt
01111000000 000001111 00 00001 00000
 3   c   0      /\
   upper 16 bits  lower 16 bits
```
STURH w0, [x1, 15]
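The decoding can be verified by shifting and masking the word. This is a small sketch assuming the D-format layout from the LEGv8 reference card (11-bit opcode, 9-bit address, 2-bit op2, 5-bit Rn, 5-bit Rt); the helper name `decode_d` is illustrative. The address field is treated as unsigned here, which is sufficient for this instruction because its top bit is 0.

```python
def decode_d(word):
    """Split a 32-bit D-format instruction into its fields."""
    return {
        "opcode": (word >> 21) & 0x7FF,   # bits 31..21
        "address": (word >> 12) & 0x1FF,  # bits 20..12
        "op2": (word >> 10) & 0x3,        # bits 11..10
        "rn": (word >> 5) & 0x1F,         # bits 9..5
        "rt": word & 0x1F,                # bits 4..0
    }

fields = decode_d(0x7800F020)
for name, value in fields.items():
    print(f"{name:8s} {value:#x} ({value})")
```

Opcode `0x3c0` is STURH, so the fields read as a halfword store of register 0 to the address in register 1 plus 15.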
7. Here is an arm64 implementation of a strncpy() function:

1. [10 pts] List all instructions that involve or affect the stack frame or the stack pointer. Also state the instruction format from the "Green card", for the instruction or a similar instruction.

STP, STR - D format
LDP, LDR - D format
(MOV --> ORR - R format)
2. [10 pts] List all instructions in the program that use pre-increment addressing. Briefly describe the effect of this addressing mode. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

STP fp, lr, [sp, -0x20] ! --- add -0x20 to sp, then use result as mem addr
3. [10 pts] List all instructions in the program that use post-increment addressing. Briefly describe the effect of this addressing mode. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

LDP fp, lr, [sp], 0x20 --- use sp as mem addr, then add 0x20 to it
LDRB w9, [x1], 1
STRB w9, [x0], 1
4. [10 pts] List all instructions that potentially change the value of the program counter PC — that is, all flow-control instructions. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

B.EQ, B.LT, CBZ --- CB-format; B --- B-format; RET --- R-format (as for BR)
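The difference between the two addressing modes in parts 2 and 3 comes down to when the base register is updated. The sketch below models only the address arithmetic, not memory itself; the function names and the starting stack pointer value `0x1000` are hypothetical, chosen for illustration.

```python
def pre_index(base, offset):
    """[base, offset]! : update the base first, then access the new address."""
    base += offset
    return base, base          # (address accessed, updated base)

def post_index(base, offset):
    """[base], offset : access the old address, then update the base."""
    return base, base + offset  # (address accessed, updated base)

sp = 0x1000  # hypothetical stack pointer on entry

# STP fp, lr, [sp, -0x20]!  -> stores at sp - 0x20 and leaves sp there
addr, sp = pre_index(sp, -0x20)
print(hex(addr), hex(sp))

# LDP fp, lr, [sp], 0x20    -> loads from sp, then restores sp
addr, sp = post_index(sp, 0x20)
print(hex(addr), hex(sp))
```

The same post-index pattern with an offset of 1 is what lets `LDRB` and `STRB` walk through the source and destination strings one byte at a time.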

8. [10 pts]

The figure at right shows a standard block diagram for a 4-bit ripple-carry adder.

Suppose that this circuit can produce its output bits (4 sum bits and a carry-out bit) in 24 nanoseconds. If the design is extended to 32 bits, what will happen to the time needed to produce all output bits (32 sum bits and a carry-out)? Be as specific as possible.

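The carry must ripple through 8 times as many full adders, so the delay grows linearly with the width: 32/4 * 24 ns = 192 ns.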
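The linear scaling follows from the carry chain: each bit position needs the carry from the position below it. The sketch below simulates that chain and estimates the delay, assuming every full-adder stage contributes the same delay (24 ns / 4 = 6 ns per stage). The function names are illustrative.

```python
def ripple_carry_add(a, b, width):
    """Add two unsigned integers one bit at a time, passing the carry along."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i            # sum bit of this full adder
        carry = (x & y) | (carry & (x ^ y))      # carry into the next stage
    return total, carry

def ripple_delay_ns(width, ref_width=4, ref_delay_ns=24.0):
    """Worst-case delay, assuming equal delay per full-adder stage."""
    return ref_delay_ns * width / ref_width

print(ripple_carry_add(0xFFFFFFFF, 1, 32))  # carry ripples through all 32 stages
print(ripple_delay_ns(32))
```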

9. [10 pts]

The figure at right shows a 32-bit Wallace tree circuit that can multiply by doing 32 1-bit multiplications in parallel, rather than needing one clock cycle for each bit.

Briefly explain why it is not possible to create a comparable circuit to do division.

Each subtraction is conditional on the outcome of the preceding subtraction, so they cannot be done in parallel.
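The dependency is visible in a simple shift-and-subtract divider. This is a sketch of one common textbook scheme for unsigned operands, not necessarily the one used in the course; the function name is illustrative. Each iteration's comparison uses the remainder left by the previous iteration, so the loop body cannot be evaluated for all bit positions at once.

```python
def shift_subtract_divide(dividend, divisor, width=32):
    """Unsigned division, producing one quotient bit per iteration."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # The subtraction happens only if the running remainder is large
        # enough, and that remainder depends on every earlier step.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder

print(shift_subtract_divide(100, 7))
```

In multiplication, by contrast, each partial product depends only on one multiplier bit and the multiplicand, which is what lets a Wallace tree form them all at once.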