Name:
Answer Key

Length ???
  1. [5 pts] Match these eight ideas from computer architecture with the following ideas from other fields:
    A "Design for Moore's Law"  E "Performance via Pipelining"
    B "Use Abstraction to Simplify Design"  F "Performance via Prediction"
    C "Make the Common Case Fast"  G "Hierarchy of Memories"
    D "Performance via Parallelism"  H "Dependability via Redundancy"
     
  2. [10 pts] (a) Put these in order from top (fastest) to bottom (slowest) of the memory hierarchy. (b) Briefly describe the relative capacities, and what each is used for:
    DRAM, registers, SSD drives, SRAM, magnetic tape
    Registers are at the top: fastest, but lowest total capacity. SRAM forms cache memory at the next level. DRAM forms main memory below that. SSD drives form secondary storage. Magnetic tape has the highest overall capacity and provides long-term archival storage, at the bottom of the hierarchy.
  3. clock cycles per instruction
              type A   type B   type C   type D
     CPI        3        1        2        6

    instruction distributions
              type A   type B   type C   type D
     prog1    3.0e6    6.0e7    9.0e6    3.0e5
     prog2    1.0e7    2.0e7    1.1e7    4.0e5

    A hypothetical processor executes four types of instructions — type A, type B, type C, and type D — at the clock-cycles/instruction (CPI) rates shown on the right. The processor has a clock rate of 3.0GHz.

    Two programs prog1 and prog2 have dynamic instruction counts, also shown at right.

    [10 pts] Calculate the average cycles per instruction (CPI) for each program, and the total execution time for each program.

    P1 time:    3 * 3.0e6  + 1 * 6.0e7  +  2 * 9.0e6  + 6 * 3.0e5
                = 88.8e6 cycles
                88.8e6 / 3.0e9 = 0.0296 seconds
    
    P2 time:    3 * 1.0e7  + 1 * 2.0e7  +  2 * 1.1e7  + 6 * 4.0e5
                = 74.4e6 cycles
                74.4e6 / 3.0e9 = 0.0248 seconds
    #-----------------
    
             type A    type B    type C    type D     clock rate 3.00E+09
    CPI        3         1         2         6
                                                      instructions  cycles    mean CPI  time (s)
    prog1    3.00E+06  6.00E+07  9.00E+06  3.00E+05   7.23E+07      8.88E+07  1.23      0.0296
    prog2    1.00E+07  2.00E+07  1.10E+07  4.00E+05   4.14E+07      7.44E+07  1.80      0.0248
    
                                                        '=C$3*C6+D$3*D6+E$3*E6+F$3*F6    
                                                                   '=H6/G6 '=H6/H$3
    #-----------------
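    The arithmetic above can be reproduced with a short Python sketch. The CPI values, clock rate, and instruction counts are taken from the problem statement; the names `CPI`, `COUNTS`, and `summarize` are illustrative only.

```python
# CPI per instruction type and clock rate, from the problem statement.
CPI = {"A": 3, "B": 1, "C": 2, "D": 6}
CLOCK_HZ = 3.0e9

# Dynamic instruction counts per program, from the problem statement.
COUNTS = {
    "prog1": {"A": 3.0e6, "B": 6.0e7, "C": 9.0e6, "D": 3.0e5},
    "prog2": {"A": 1.0e7, "B": 2.0e7, "C": 1.1e7, "D": 4.0e5},
}


def summarize(counts, cpi, clock_hz):
    """Return (instructions, cycles, mean CPI, seconds) for one program."""
    instructions = sum(counts.values())
    # Total cycles = sum over types of (CPI for the type * count of the type).
    cycles = sum(cpi[t] * n for t, n in counts.items())
    return instructions, cycles, cycles / instructions, cycles / clock_hz


for name, counts in COUNTS.items():
    instr, cycles, mean_cpi, seconds = summarize(counts, CPI, CLOCK_HZ)
    print(f"{name}: {instr:.3g} instructions, {cycles:.3g} cycles, "
          f"mean CPI {mean_cpi:.2f}, {seconds:.4f} s")
```

    Note that the mean CPI is weighted by the instruction mix, which is why prog2 has the higher CPI but the shorter run time.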
    
     
    1. [5 pts] Which program will benefit more if the CPI for "Type D" instructions is decreased to 4 cycles?

    2. [5 pts] If the CPI for "type D" instructions is decreased to 4, but the CPI for "type B" instructions increases to 2, while the clock frequency is increased to 4.5GHz, does each program take more time or less time to execute?

      #-----------------
      
      Baseline:
              type A     type B     type C     type D     clock rate
      CPI        3          1          2          6       3.00E+09
                                                          cycles     mean CPI   time (s)
      prog1   3.00E+06   6.00E+07   9.00E+06   3.00E+05   8.88E+07   1.23       0.0296
      prog2   1.00E+07   2.00E+07   1.10E+07   4.00E+05   7.44E+07   1.80       0.0248

      Modified:
              type A     type B     type C     type D     clock rate
      CPI        3          2          2          4       4.50E+09
                                                          cycles     mean CPI   time (s)
      prog1   3.00E+06   6.00E+07   9.00E+06   3.00E+05   1.48E+08   2.05       0.0329
      prog2   1.00E+07   2.00E+07   1.10E+07   4.00E+05   9.36E+07   2.26       0.0208

      prog1 takes more time (0.0329 s vs. 0.0296 s); prog2 takes less time (0.0208 s vs. 0.0248 s).
      
      #-----------------
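    Both what-if questions can be checked with a small sketch like the one below. The scenario labels and the helper names `exec_time` and `times` are illustrative assumptions; the numbers come from the problem. With these inputs, lowering only the type D CPI saves 0.6e6 of 88.8e6 cycles for prog1 and 0.8e6 of 74.4e6 cycles for prog2, so prog2 benefits more.

```python
# Dynamic instruction counts per program, from the problem statement.
COUNTS = {
    "prog1": {"A": 3.0e6, "B": 6.0e7, "C": 9.0e6, "D": 3.0e5},
    "prog2": {"A": 1.0e7, "B": 2.0e7, "C": 1.1e7, "D": 4.0e5},
}

# (CPI table, clock rate in Hz) for each scenario in the problem.
SCENARIOS = {
    "baseline": ({"A": 3, "B": 1, "C": 2, "D": 6}, 3.0e9),
    "D=4 only": ({"A": 3, "B": 1, "C": 2, "D": 4}, 3.0e9),
    "D=4, B=2, 4.5 GHz": ({"A": 3, "B": 2, "C": 2, "D": 4}, 4.5e9),
}


def exec_time(counts, cpi, clock_hz):
    """Execution time in seconds: total cycles divided by clock rate."""
    cycles = sum(cpi[t] * n for t, n in counts.items())
    return cycles / clock_hz


def times(program):
    """Map each scenario label to the program's execution time."""
    return {label: exec_time(COUNTS[program], cpi, hz)
            for label, (cpi, hz) in SCENARIOS.items()}


for program in COUNTS:
    result = times(program)
    base = result["baseline"]
    for label, seconds in result.items():
        change = 100.0 * (seconds - base) / base
        print(f"{program} [{label}]: {seconds:.4f} s ({change:+.2f}% vs baseline)")
```

    The second scenario shows why prog1 slows down despite the faster clock: it is dominated by type B instructions, whose CPI doubles.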
      
       
  4. Intel processor history. This graph shows transistor size (downward trend) and CPU frequency (upward trend) for Intel processors, from the 4004 in 1971 to a Core processor in 2012. (The vertical scales are logarithmic.)

    Power consumption is governed by the equations below.

    • Capacitive load ∝ transistor count × transistor size
    • Leakage current ∝ 1/(transistor size) × e^Voltage
    • Dynamic power = k × Capacitive load × Voltage² × Frequency / 2
    • Static power = Voltage × leakage current
    1. [5 pts] What "law" describes the steady downward trend in transistor size? State what the law says, as precisely as you can.
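      Moore's Law: the number of transistors that can be placed on an integrated circuit doubles roughly every 18 to 24 months. That corresponds to the area per transistor halving over the same interval, with performance historically improving along with it.  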
    2. [5 pts] The capacitive load per CPU stayed approximately constant over the period from 1971 to 2012. What must have been happening to the transistor count over that period?
      Transistor count increased exponentially, as the inverse of transistor size.  
    3. [5 pts] The operating voltage also decreased over the latter part of this time period. Why did the static power consumption increase over time, instead of decreasing?
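      Transistor size decreased. Leakage current is proportional to 1/(transistor size), so it grew faster than the lower voltage could reduce it, and static power (Voltage × leakage current) rose.  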
    4. [5 pts] What term in the power equations is adversely affected by the transistor-size trend? What can be done to counteract the effect of the transistor size?
      Leakage current increases. Counteract by lowering voltage.  
    5. [5 pts] Why did the frequency curve begin to flatten out around 2002?
      Increasing frequency led to more power consumption and heat generation.  
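    The interaction between shrinking transistors and falling voltage can be illustrated numerically. This is only a sketch: it takes the two proportionalities for leakage current and static power at face value, treats Voltage as a plain number in volts, and uses hypothetical values (size halves, voltage drops from 1.2 to 1.0) that are not read from the graph. The function name `relative_static_power` is illustrative.

```python
import math


def relative_static_power(size_ratio, v_old, v_new):
    """Return new/old static power under the stated proportionalities.

    leakage current  is proportional to  (1 / transistor size) * e^Voltage
    static power     =  Voltage * leakage current
    The unknown proportionality constant cancels in the ratio.
    """
    leakage_ratio = (1.0 / size_ratio) * math.exp(v_new - v_old)
    return (v_new / v_old) * leakage_ratio


# Hypothetical example: transistor size halves, voltage drops 1.2 V -> 1.0 V.
ratio = relative_static_power(size_ratio=0.5, v_old=1.2, v_new=1.0)
print(f"static power changes by a factor of {ratio:.2f}")
```

    Under these assumed numbers the factor is greater than 1, so static power rises even though the voltage fell.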
  5. [10 pts]

    Here is a LEGv8 Assembly instruction: movk x27, 0x00f0, lsl 0

    Identify the instruction format, and show the binary contents of each field of the assembled machine instruction.

    Also show the full machine instruction in hexadecimal.

    IM / IW format
           0x794       0x00f0          27
        11110010100.0000000011110000.11011
        ___opcode__.____Immediate___.__Rd_
    
        1111 0010 1000 0000  0001 1110 0001 1011  =  f2 80 1e 1b
                     - ----  ---- 
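    The field packing can be double-checked with a few lines of Python. This sketch assumes the layout shown above (an 11-bit opcode that already includes the shift bits, a 16-bit immediate, and a 5-bit Rd); the function name `encode_im` is hypothetical.

```python
def encode_im(opcode11, imm16, rd):
    """Pack an IM-format word: opcode[31:21] | immediate[20:5] | Rd[4:0]."""
    if not 0 <= opcode11 < (1 << 11):
        raise ValueError("opcode must fit in 11 bits")
    if not 0 <= imm16 < (1 << 16):
        raise ValueError("immediate must fit in 16 bits")
    if not 0 <= rd < (1 << 5):
        raise ValueError("register number must fit in 5 bits")
    return (opcode11 << 21) | (imm16 << 5) | rd


MOVK_LSL0 = 0x794  # opcode value used in the answer above (lsl 0)

word = encode_im(MOVK_LSL0, 0x00F0, 27)
print(f"{word:032b}")
print(f"0x{word:08x}")
```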
    
  6. [10 pts]

    Here is a LEGv8 machine instruction: 0x38226820 0x7800f020

    Identify the instruction format, and show each of the instruction format's fields, in binary.

    Also write the Assembly-language version of the instruction.

    D format
        0111 1000 0000 0000  1111 0000 0010 0000
    
           ___opcode__.__DTaddr_.__.__Rn_.__Rt_
           01111000000 000001111 00 00001 00000
            3   c   0      /\
              upper 16 bits  lower 16 bits
    
    STURH w0, [x1, 15]
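    The decoding can be verified by extracting the fields with shifts and masks. The sketch below assumes the D-format layout shown above (11-bit opcode, 9-bit DT address, 2-bit op, 5-bit Rn, 5-bit Rt) and treats the address field as unsigned, which is sufficient for this positive offset. `decode_d` is an illustrative name.

```python
def decode_d(word):
    """Split a 32-bit D-format instruction word into its fields."""
    return {
        "opcode": (word >> 21) & 0x7FF,      # bits 31..21
        "dt_address": (word >> 12) & 0x1FF,  # bits 20..12
        "op": (word >> 10) & 0x3,            # bits 11..10
        "rn": (word >> 5) & 0x1F,            # bits 9..5
        "rt": word & 0x1F,                   # bits 4..0
    }


fields = decode_d(0x7800F020)
print(f"opcode     = {fields['opcode']:011b} (0x{fields['opcode']:03x})")
print(f"dt_address = {fields['dt_address']:09b} ({fields['dt_address']})")
print(f"op         = {fields['op']:02b}")
print(f"rn         = {fields['rn']:05b} ({fields['rn']})")
print(f"rt         = {fields['rt']:05b} ({fields['rt']})")
```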
  7. Here is an arm64 implementation of a strncpy() function:

    // assembly strncpy()
    // 2021-03-19
        .global strncpy
        .text
    // expects -
    //   x0: pointer to dest
    //   x1: pointer to src
    //   x2: max length of dest
    // returns -
    //   x0: copied length, in bytes
    strncpy:
        stp  fp, lr, [sp, -0x20] !
        mov  fp, sp
        str  x0, [fp, 0x10] // save dest ptr
        str  x1, [fp, 0x18] // save src ptr
    strncpy_test:
        sub  x2, x2, 1
        cmp  x2, xzr
        b.eq strncpy_end
        b.lt strncpy_done
    strncpy_each:
        ldrb w9, [x1], 1
        cbz  w9, strncpy_end
        strb w9, [x0], 1
        b    strncpy_test
    strncpy_end:
        strb wzr, [x0]
    strncpy_done:
        ldr  x1, [fp, 0x10]
        sub  x0, x0, x1
        ldp  fp, lr, [sp], 0x20
        ret
    
    1. [10 pts] List all instructions that involve or affect the stack frame or the stack pointer. Also state the instruction format from the "Green card", for the instruction or a similar instruction.

      STP, STR - D format
      LDP, LDR - D format
      (MOV --> OR - R format)
    2. [10 pts] List all instructions in the program that use pre-increment addressing. Briefly describe the effect of this addressing mode. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

      STP fp, lr, [sp, -0x20] ! --- add -0x20 to sp, then use result as mem addr
    3. [10 pts] List all instructions in the program that use post-increment addressing. Briefly describe the effect of this addressing mode. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

      LDP fp, lr, [sp], 0x20 --- use sp as mem addr, then add 0x20 to it
      LDRB w9, [x1], 1 --- use x1 as mem addr, then add 1 to it
      STRB w9, [x0], 1 --- use x0 as mem addr, then add 1 to it
    4. [10 pts] List all instructions that potentially change the value of the program counter PC — that is, all flow-control instructions. Also state the instruction format(s) from the "Green card", for the instruction or a similar instruction.

      B --- B-format; B.EQ, B.LT, CBZ --- CB-format; RET --- R-format (like BR)
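    To make the control flow and the post-increment accesses easier to follow, here is a Python model of the routine. It is a sketch, not a translation tool: `dest_index` and `src_index` stand in for the pointers in x0 and x1, `remaining` stands in for x2, and the source is assumed to be NUL-terminated.

```python
def strncpy_model(dest: bytearray, src: bytes, max_len: int) -> int:
    """Mirror the assembly loop; return the number of bytes copied."""
    dest_index = 0        # models x0 (offset from the saved dest pointer)
    src_index = 0         # models x1
    remaining = max_len   # models x2
    while True:
        remaining -= 1                 # sub  x2, x2, 1
        if remaining == 0:             # b.eq strncpy_end
            dest[dest_index] = 0       # strb wzr, [x0]
            break
        if remaining < 0:              # b.lt strncpy_done (no terminator)
            break
        byte = src[src_index]          # ldrb w9, [x1], 1  (use address,
        src_index += 1                 #                    then add 1)
        if byte == 0:                  # cbz  w9, strncpy_end
            dest[dest_index] = 0       # strb wzr, [x0]
            break
        dest[dest_index] = byte        # strb w9, [x0], 1  (use address,
        dest_index += 1                #                    then add 1)
    return dest_index                  # sub  x0, x0, x1


buffer = bytearray(8)
copied = strncpy_model(buffer, b"hello\0", 4)
print(copied, bytes(buffer[:4]))
```

    With a 4-byte destination, the model copies three characters and writes the terminator in the fourth byte.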
    
        
  8. [10 pts] 4-bit ripple-carry adder

    The figure at right shows a standard block diagram for a 4-bit ripple-carry adder.

    Suppose that this circuit can produce its output bits (4 sum bits and a carry-out bit) in 24 nanoseconds. If the design is extended to 32 bits, what will happen to the time needed to produce all output bits (32 sum bits and a carry-out)? Be as specific as possible.

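    The carry must ripple through every stage in turn, so the delay grows linearly with the number of bits: 32/4 * 24 ns = 192 ns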
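    A small behavioural model shows why the delay scales this way: each full adder needs the carry from the stage before it. The sketch assumes every stage contributes the same delay (24 ns / 4 = 6 ns per bit), which is a simplification; the function names are illustrative.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: return (sum bit, carry out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_add(x, y, bits):
    """Add two unsigned values bit by bit; each stage waits for the prior carry."""
    carry = 0
    total = 0
    for i in range(bits):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= sum_bit << i
    return total, carry


def ripple_delay_ns(bits, ref_bits=4, ref_delay_ns=24.0):
    """Worst-case delay, assuming equal delay per stage (an assumption)."""
    return bits * (ref_delay_ns / ref_bits)


print(ripple_add(0b1111, 0b0001, 4))  # worst case: carry ripples through all stages
print(ripple_delay_ns(32), "ns")
```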
    
        
  9. [10 pts] Wallace tree

    The figure at right shows a 32-bit Wallace tree circuit that can multiply by doing 32 1-bit multiplications in parallel, rather than needing one clock cycle for each bit.

    Briefly explain why it is not possible to create a comparable circuit to do division.

    Each subtraction is conditional on the outcome of the preceding subtraction, so they cannot be done in parallel.
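    The dependency is visible in a textbook-style restoring division loop. This is a generic sketch for unsigned operands, not a description of any particular hardware divider; `restoring_divide` is an illustrative name.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned restoring division; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: whether it is kept depends on its own result,
        # and the next iteration needs the remainder chosen here.
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial
            quotient |= 1 << i
    return quotient, remainder


print(restoring_divide(100, 7))
```

    Each iteration reads the `remainder` produced by the previous one, so the 32 steps form a chain rather than independent partial results that could be summed in a tree.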