Lectures for 2nd Edition - Oregon State University

Lectures for 2nd Edition - Oregon State University

Lecture 6 1 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Notes Finish up concepts of multi-cycle computer today Move to Pipelining next Tuesday Homework #2 due next Tuesday Pre-Lab #1 due next Tuesday 3 paper readings, and short summarization Homework #3: Likely a programming assignment C code to design an interconnection network 2 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipelining Improve performance by increasing instruction throughput Program execution Time order (in Morgan Kaufmann Publishersinstructions) 200 lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers100($0) Instruction fetch Reg lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers Morgan Kaufmann Publishers200($0) 400 600 Data access ALU

800 1000 1200 1400 ALU Data access 1600 1800 Reg Instruction Reg fetch 800 Morgan Kaufmann Publishersps lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers Morgan Kaufmann Publishers300($0) Reg Instruction fetch 800 Morgan Kaufmann Publishersps Note: timing assumptions changed for this example 800 Morgan Kaufmann Publishersps Program execution Time order (in Morgan Kaufmann Publishersinstructions) lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers100($0) 200 Instruction fetch

400 Reg lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers Morgan Kaufmann Publishers200($0) 200 Morgan Kaufmann Publishersps Instruction fetch lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers Morgan Kaufmann Publishers300($0) 200 Morgan Kaufmann Publishersps 600 ALU Reg Instruction fetch 800 Data access ALU Reg 1000 1200 1400 Reg Data access ALU Reg Data access Reg 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps Ideal speedup is number of stages in the pipeline. Do we achieve this? 3

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Simple Implementation Include the functional units we need for each instruction Instruction address MemWrite Instruction Add Sum PC Address Read data 16 Instruction memory a. Morgan Kaufmann PublishersInstruction Morgan Kaufmann Publishersmemory b. Morgan Kaufmann PublishersProgram Morgan Kaufmann Publisherscounter c. Morgan Kaufmann PublishersAdder Write data Data memory Sign extend 32 MemRead a. Morgan Kaufmann PublishersData Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersunit Register numbers 5

Read register Morgan Kaufmann Publishers1 5 Read register Morgan Kaufmann Publishers2 5 Data Write register 4 Read data Morgan Kaufmann Publishers1 Data Registers ALU Morgan Kaufmann Publishersoperation Zero ALU ALU result Read data Morgan Kaufmann Publishers2 Write Data b. Morgan Kaufmann PublishersSign-extension Morgan Kaufmann Publishersunit Why do we need this stuff? RegWrite a. Morgan Kaufmann PublishersRegisters b. Morgan Kaufmann PublishersALU 4 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers

Building the Datapath Use multiplexors to stitch them together PCSrc M u x Add Add 4 ALU result Shift left Morgan Kaufmann Publishers2 PC Read address Instruction Instruction memory Read register Morgan Kaufmann Publishers1 ALUSrc Read data Morgan Kaufmann Publishers1 ALU Morgan Kaufmann Publishersoperation MemWrite Read register Morgan Kaufmann Publishers2 Registers Read Write data Morgan Kaufmann Publishers2 register MemtoReg

Zero M u x Write data ALU ALU result Address Write data RegWrite 16 4 Sign extend 32 Read data M u x Data memory MemRead 5 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Control Selecting the operations to perform (ALU, read/write, etc.)

Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format: 000000 10001 10010 op rs rt 01000 00000 100000 rd shamt funct ALU's operation based on instruction type and function code 6 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Control e.g., what should the ALU do with this instruction Example: lw $1, 100($2) 35 2 1

op rs rt 16 bit offset ALU control input 0000 0001 0010 0110 0111 1100 100 AND OR add subtract set-on-less-than NOR Why is the code for subtract 0110 and not 0011? 7 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Control Must describe hardware to compute 4-bit ALU control input given instruction type 00 = lw, sw ALUOp 01 = beq, computed from instruction type 10 = arithmetic function code for arithmetic Describe it using a truth table (can turn into gates):

8 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers 0 M u x Add ALU Add result 4 Instruction Morgan Kaufmann Publishers[3126] PC Read address Instruction [310] Instruction memory Instruction Morgan Kaufmann Publishers[2521] Read register Morgan Kaufmann Publishers1 Instruction Morgan Kaufmann Publishers[2016] Read register Morgan Kaufmann Publishers2 0 M u Instruction Morgan Kaufmann Publishers[1511] x 1 Instruction Morgan Kaufmann Publishers[150] Shift left Morgan Kaufmann Publishers2

RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite Control Write register Write data 16 1 Read data Morgan Kaufmann Publishers1 Zero Read data Morgan Kaufmann Publishers2 Registers Sign extend ALU ALU result 0 M u x 1 Address Read data 1 M u

x 0 Data Write memory data 32 ALU control Instruction Morgan Kaufmann Publishers[50] Memto- Reg Mem Mem Instruction RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUp0 R-format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq

X 0 X 0 0 0 1 0 1 Control Simple combinational logic (truth tables) Inputs Op5 Op4 Op3 Op2 ALUOp Op1 ALU Morgan Kaufmann Publisherscontrol Morgan Kaufmann Publishersb lock Op0 ALUOp0 ALUOp1 Outputs F3 F Morgan Kaufmann Publishers(5 0) F2 F1 F0 Operation2 Operation1 R-format Operation Iw sw

beq RegDst ALUSrc MemtoReg Operation0 RegWrite MemRead MemWrite Branch ALUOp1 ALUOpO 10 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thing to be done ALU might not produce right answer right away we use write signals along with clock to determine when to write Cycle time determined by length of the longest path State element 1 Combinational Morgan Kaufmann Publisherslogic State element 2 Clock Morgan Kaufmann Publisherscycle We are ignoring some details like setup and hold times

11 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (200ps), ALU and adders (100ps), register file access (50ps) PCSrc M u x Add Add 4 ALU result Shift left Morgan Kaufmann Publishers2 PC Read address Instruction Instruction memory Read register Morgan Kaufmann Publishers1 ALUSrc Read data Morgan Kaufmann Publishers1 ALU Morgan Kaufmann Publishersoperation MemWrite Read register Morgan Kaufmann Publishers2 Registers Read

Write data Morgan Kaufmann Publishers2 register MemtoReg Zero M u x Write data ALU ALU result Address Write data RegWrite 16 4 Sign extend 32 Read data M u x Data memory MemRead 12 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers

Where we are headed Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area One Solution: use a smaller cycle time have different instructions take different numbers of cycles a multicycle datapath: PC Address Instruction register Instruction or Morgan Kaufmann Publishersdata Memory Data Memory data register Data A Register Morgan Kaufmann Publishers# Registers Register Morgan Kaufmann Publishers# ALU ALUOut B Register Morgan Kaufmann Publishers# 13

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Multicycle Approach We will be reusing functional units ALU used to compute address and to increment PC Memory used for instruction and data Our control signals will not be determined directly by instruction e.g., what should the ALU do for a subtract instruction? Well use a finite state machine for control 14 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Multicycle Approach Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers PC 0 M u x 1 Address Memory MemData Write data Instruction [2016] Instruction

[150] Instruction register Instruction [150] Memory data register 0 M u x 1 Read register Morgan Kaufmann Publishers1 Instruction [2521] 0 M Instruction u x [1511] 1 0 M u x 1 16 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Write Read register data Morgan Kaufmann Publishers2 A ALUOut

0 B Write data Sign extend Zero ALU ALU result 4 1M u 2 x 3 32 Shift left Morgan Kaufmann Publishers2 15 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Instructions from ISA perspective Consider each instruction from perspective of ISA. Example: The add instruction changes a register. Register specified by bits 15:11 of instruction. Instruction specified by the PC. New value is the sum (op) of two registers. Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming) 16

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Breaking down an instruction ISA definition of arithmetic: Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] Reg[Memory[PC][20:16]] Could break down to: IR <= Memory[PC] A <= Reg[IR[25:21]] B <= Reg[IR[20:16]] ALUOut <= A op B Reg[IR[20:16]] <= ALUOut We forgot an important part of the definition of arithmetic! PC <= PC + 4 op 17 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Idea behind multicycle approach We define each instruction from the ISA perspective (do this!) Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible

(minimizes control, helps to simplify solution) Result: Our books multicycle Implementation! 18 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Five Execution Steps Instruction Fetch Instruction Decode and Register Fetch Execution, Memory Address Computation, or Branch Completion Memory Access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! 19 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Step 1: Instruction Fetch Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register-Transfer Language" IR <= Memory[PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now?

20 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Step 2: Instruction Decode and Register Fetch Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) 21 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Step 3 (instruction dependent) ALU is performing one of three functions, based on instruction type Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut; 22

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Step 4 (R-type or memory-access) Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; R-type instructions finish Reg[IR[15:11]] <= ALUOut; The write actually takes place at the end of the cycle on the edge 23 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Write-back step Reg[IR[20:16]] <= MDR; Which instruction needs this? PC 0 M u x 1 Address Memory MemData Write data Instruction [2016] Instruction [150] Instruction register Instruction [150] Memory data

register 0 M u x 1 Read register Morgan Kaufmann Publishers1 Instruction [2521] 0 M Instruction u x [1511] 1 0 M u x 1 16 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Write Read register data Morgan Kaufmann Publishers2 A 32 ALUOut 0 B 4

Write data Sign extend Zero ALU ALU result 1M u 2 x 3 Shift left Morgan Kaufmann Publishers2 24 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Summary: 25 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Simple Questions How many cycles will it take to execute this code? Label: lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label add $t5, $t2, $t3 sw $t5, 8($t3) ... #assume not What is going on during the 8th cycle of execution?

In what cycle does the actual addition of $t2 and $t3 takes place? 26 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers PCSource PCWriteCond PCWrite ALUOp Outputs IorD MemRead ALUSrcB Control ALUSrcA MemWrite MemtoReg Op [50] RegWrite IRWrite 0 RegDst 26 Instruction Morgan Kaufmann Publishers[25-0] PC 0 M u x 1

Instruction [3126] Address Memory MemData Write data Instruction [2016] Instruction [150] Instruction register Instruction [150] Memory data register 0 M u x 1 Read register Morgan Kaufmann Publishers1 Instruction [2521] Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Write Read register data Morgan Kaufmann Publishers2 0 M Instruction u x [1511]

1 A 16 Sign extend 32 Instruction Morgan Kaufmann Publishers[50] 28 PC Morgan Kaufmann Publishers[3128] Zero ALU ALU result B 4 Write data 0 M u x 1 Shift left Morgan Kaufmann Publishers2 Shift left Morgan Kaufmann Publishers2 Jump address [310] 0 1M u 2 x 3

ALU control ALUOut M 1 u x 2 Review: finite state machines Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Next state Current Morgan Kaufmann Publishersstate Next-state function Clock Inputs Output function Outputs Well use a Moore machine (output based only on current state) 28 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Implementing the Control Value of control signals is dependent upon: what instruction is being executed which step is being performed

Use the information weve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification 29 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Graphical Specification of FSM Instruction Morgan Kaufmann Publishersfetch MemRead ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 IorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 IRWrite ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 PCWrite PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 0 Start Note: Instruction Morgan Kaufmann Publishersdecode/ register Morgan Kaufmann Publishersfetch 1 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers11 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 dont care if not mentioned asserted if name only otherwise exact value Memory Morgan Kaufmann Publishersaddress computation How many state bits will we need?

2 6 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 8 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 Memory access 3 Memory access 5 MemRead IorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 Branch completion Execution Jump completion 9 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 PCWriteCond PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 PCWrite PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 R-type Morgan Kaufmann Publisherscompletion 7 MemWrite IorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1

RegDst Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 RegWrite MemtoReg Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 Memory Morgan Kaufmann Publishersread completon Morgan Kaufmann Publishersstep 4 RegDst Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 RegWrite MemtoReg Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 30 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Finite State Machine for Control Implementation: PCWrite PCWriteCond IorD MemRead MemWrite IRWrite Control Morgan Kaufmann Publisherslogic MemtoReg PCSource ALUOp Outputs ALUSrcB ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0 Instruction Morgan Kaufmann Publishersregister opcode Morgan Kaufmann Publishersfield S0 S1 S2

S3 Op0 Op1 Op2 Op3 Op4 Inputs Op5 State Morgan Kaufmann Publishersregister 31 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers PLA Implementation If I picked a horizontal or vertical line could you explain it? Op5 Op4 Op3 Op2 Op1 Op0 S3 S2 S1 S0 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource1 PCSource0

ALUOp1 ALUOp0 ALUSrcB1 ALUSrcB0 ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0 32 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers ROM Implementation ROM = "Read Only Memory" values of memory locations are fixed ahead of time A ROM can be used to implement a truth table if the address is m-bits, we can address 2m entries in the ROM. our outputs are the bits of data that the address points to. m n 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1

0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 m is the "height", and n is the "width" 33

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers ROM Implementation How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 210 = 1024 different addresses) How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs ROM is 210 x 20 = 20K bits Rather wasteful, since for lots of the entries, the outputs are the same i.e., opcode is often ignored (and a rather unusual size) 34 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Another Implementation Style Complex instructions: the "next state" is often current state + 1 Control Morgan Kaufmann Publishersunit PLA Morgan Kaufmann Publishersor Morgan Kaufmann PublishersROM Outputs Input PCWrite PCWriteCond IorD MemRead MemWrite IRWrite BWrite MemtoReg

PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst AddrCtl 1 State Adder Address Morgan Kaufmann Publishersselect Morgan Kaufmann Publisherslogic Op[5 0] Instruction Morgan Kaufmann Publishersregister opcode Morgan Kaufmann Publishersfield 35 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Details Op 000000 000010 000100 100011 101011 Dispatch ROM 1 Opcode name R-format jmp beq lw sw Value 0110 1001 1000 0010 0010 Op 100011

101011 Dispatch ROM 2 Opcode name lw sw Value 0011 0101 PLA Morgan Kaufmann Publishersor Morgan Kaufmann PublishersROM 1 State Adder 3 Mux 2 1 AddrCtl 0 0 Dispatch Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers2 Dispatch Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers1 Address Morgan Kaufmann Publishersselect Morgan Kaufmann Publisherslogic Instruction Morgan Kaufmann Publishersregister opcode Morgan Kaufmann Publishersfield State number 0 1 2 3 4 5 6 7 8 9 Address-control action Use Morgan Kaufmann Publishersincremented Morgan Kaufmann Publishersstate Use Morgan Kaufmann Publishersdispatch Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers1 Use Morgan Kaufmann Publishersdispatch Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers2

Use Morgan Kaufmann Publishersincremented Morgan Kaufmann Publishersstate Replace Morgan Kaufmann Publishersstate Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersby Morgan Kaufmann Publishers0 Replace Morgan Kaufmann Publishersstate Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersby Morgan Kaufmann Publishers0 Use Morgan Kaufmann Publishersincremented Morgan Kaufmann Publishersstate Replace Morgan Kaufmann Publishersstate Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersby Morgan Kaufmann Publishers0 Replace Morgan Kaufmann Publishersstate Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersby Morgan Kaufmann Publishers0 Replace Morgan Kaufmann Publishersstate Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersby Morgan Kaufmann Publishers0 Value of AddrCtl 3 1 2 3 0 0 3 0 0 0 36 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Microprogramming Control Morgan Kaufmann Publishersunit Microcode Morgan Kaufmann Publishersmemory Outputs Input PCWrite PCWriteCond IorD MemRead MemWrite IRWrite BWrite MemtoReg PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst AddrCtl

Datapath 1 Microprogram Morgan Kaufmann Publisherscounter Adder Address Morgan Kaufmann Publishersselect Morgan Kaufmann Publisherslogic Instruction Morgan Kaufmann Publishersregister opcode Morgan Kaufmann Publishersfield What are the microinstructions ? 37 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Microprogramming A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions Label Fetch Mem1 LW2 ALU control Add Add Add SRC1 PC PC A Register control SRC2 4 Extshft Read Extend

PCWrite Memory control Read Morgan Kaufmann PublishersPC ALU Read Morgan Kaufmann PublishersALU Write Morgan Kaufmann PublishersMDR SW2 Rformat1 Func Morgan Kaufmann Publisherscode A Write Morgan Kaufmann PublishersALU B Write Morgan Kaufmann PublishersALU BEQ1 JUMP1 Subt A B ALUOut-cond Jump Morgan Kaufmann Publishersaddress Sequencing Seq Dispatch Morgan Kaufmann Publishers1 Dispatch Morgan Kaufmann Publishers2 Seq Fetch Fetch Seq Fetch Fetch Fetch Will two implementations of the same architecture have the same microcode? What would a microassembler do? 38

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Microinstruction format Field name ALU Morgan Kaufmann Publisherscontrol SRC1 SRC2 Value Add Subt Func Morgan Kaufmann Publisherscode PC A B 4 Extend Extshft Read ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 ALUSrcA Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 ALUSrcB Morgan Kaufmann Publishers= Morgan Kaufmann Publishers11 Write Morgan Kaufmann PublishersALU RegWrite, RegDst Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1, MemtoReg Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 RegWrite, RegDst Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0, MemtoReg Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 MemRead, lorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers0 MemRead, lorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 MemWrite, lorD Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1 PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 PCWrite PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01, PCWriteCond

PCSource Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10, PCWrite AddrCtl Morgan Kaufmann Publishers= Morgan Kaufmann Publishers11 AddrCtl Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 AddrCtl Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 AddrCtl Morgan Kaufmann Publishers= Morgan Kaufmann Publishers10 Register control Write Morgan Kaufmann PublishersMDR Read Morgan Kaufmann PublishersPC Memory Read Morgan Kaufmann PublishersALU Write Morgan Kaufmann PublishersALU ALU PC Morgan Kaufmann Publisherswrite Morgan Kaufmann Publisherscontrol ALUOut-cond jump Morgan Kaufmann Publishersaddress Sequencing Signals active ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers00 ALUOp Morgan Kaufmann Publishers= Morgan Kaufmann Publishers01 Seq Fetch Dispatch Morgan Kaufmann Publishers1 Dispatch Morgan Kaufmann Publishers2 Comment Cause Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersto Morgan Kaufmann Publishersadd. Cause Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersto Morgan Kaufmann Publisherssubtract; Morgan Kaufmann Publishersthis Morgan Kaufmann Publishersimplements Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherscompare Morgan Kaufmann Publishersfor branches. Use Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersinstruction's Morgan Kaufmann Publishersfunction Morgan Kaufmann Publisherscode Morgan Kaufmann Publishersto Morgan Kaufmann Publishersdetermine Morgan Kaufmann PublishersALU Morgan Kaufmann Publisherscontrol. Use Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersPC Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersfirst Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Register Morgan Kaufmann PublishersA Morgan Kaufmann Publishersis Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersfirst Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Register Morgan Kaufmann PublishersB Morgan Kaufmann Publishersis Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherssecond Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Use Morgan Kaufmann Publishers4 Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherssecond Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Use Morgan Kaufmann Publishersoutput Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherssign Morgan Kaufmann Publishersextension Morgan Kaufmann Publishersunit Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherssecond Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Use Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersoutput Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersshift-by-two Morgan Kaufmann Publishersunit Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherssecond Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinput. Read Morgan Kaufmann Publisherstwo Morgan Kaufmann Publishersregisters Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersrs Morgan Kaufmann Publishersand Morgan Kaufmann Publishersrt Morgan Kaufmann Publishersfields Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersIR Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersregister numbers Morgan Kaufmann Publishersand Morgan Kaufmann Publishersputting Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersdata Morgan Kaufmann Publishersinto Morgan Kaufmann Publishersregisters Morgan Kaufmann PublishersA Morgan Kaufmann Publishersand Morgan Kaufmann PublishersB. Write Morgan Kaufmann Publishersa Morgan Kaufmann Publishersregister Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersrd Morgan Kaufmann Publishersfield Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersIR Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersregister Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersand

the Morgan Kaufmann Publisherscontents Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALUOut Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersdata. Write Morgan Kaufmann Publishersa Morgan Kaufmann Publishersregister Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersrt Morgan Kaufmann Publishersfield Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersIR Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersregister Morgan Kaufmann Publishersnumber Morgan Kaufmann Publishersand the Morgan Kaufmann Publisherscontents Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersMDR Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersdata. Read Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersPC Morgan Kaufmann Publishersas Morgan Kaufmann Publishersaddress; Morgan Kaufmann Publisherswrite Morgan Kaufmann Publishersresult Morgan Kaufmann Publishersinto Morgan Kaufmann PublishersIR Morgan Kaufmann Publishers(and Morgan Kaufmann Publishers the Morgan Kaufmann PublishersMDR). Read Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALUOut Morgan Kaufmann Publishersas Morgan Kaufmann Publishersaddress; Morgan Kaufmann Publisherswrite Morgan Kaufmann Publishersresult Morgan Kaufmann Publishersinto Morgan Kaufmann PublishersMDR. Write Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALUOut Morgan Kaufmann Publishersas Morgan Kaufmann Publishersaddress, Morgan Kaufmann Publisherscontents Morgan Kaufmann Publishersof Morgan Kaufmann PublishersB Morgan Kaufmann Publishersas Morgan Kaufmann Publishersthe data. Write Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersoutput Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersinto Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersPC. If Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersZero Morgan Kaufmann Publishersoutput Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersALU Morgan Kaufmann Publishersis Morgan Kaufmann Publishersactive, Morgan Kaufmann Publisherswrite Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersPC Morgan Kaufmann Publisherswith Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherscontents of Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersregister Morgan Kaufmann PublishersALUOut. Write Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersPC Morgan Kaufmann Publisherswith Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersjump Morgan Kaufmann Publishersaddress Morgan Kaufmann Publishersfrom Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersinstruction. Choose Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersnext Morgan Kaufmann Publishersmicroinstruction Morgan Kaufmann Publisherssequentially. Go Morgan Kaufmann Publishersto Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersfirst Morgan Kaufmann Publishersmicroinstruction Morgan Kaufmann Publishersto Morgan Kaufmann Publishersbegin Morgan Kaufmann Publishersa Morgan Kaufmann Publishersnew Morgan Kaufmann Publishersinstruction. Dispatch Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers1. Dispatch Morgan Kaufmann Publishersusing Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersROM Morgan Kaufmann Publishers2. 39 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Maximally vs. Minimally Encoded No encoding: 1 bit for each datapath operation faster, requires more memory (logic) used for Vax 780 an astonishing 400K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode Its easy to add new instructions 40 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Microcode: Trade-offs

Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: Control is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes 41 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Historical Perspective In the 60s and 70s microprogramming was very important for implementing machines This led to more sophisticated ISAs and the VAX In the 80s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use: hardwired control for simpler instructions (few cycles, FSM control implemented using PLA or random logic) microcoded control for more complex instructions (large numbers of cycles, central control store)

The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store 42 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pentium 4 Pipelining is important (last IA-32 without it was 80386 in 1985) Control Control I/O interface Chapter 7 Instruction Morgan Kaufmann Publisherscache Data cache Enhanced floating Morgan Kaufmann Publisherspoint and Morgan Kaufmann Publishersmultimedia Integer datapath Control Advanced Morgan Kaufmann Publisherspipelining hyperthreading Morgan Kaufmann Publisherssupport Secondary cache and memory interface Chapter 6 Control Pipelining is used for the simple instructions favored by compilers Simply put, a high performance implementation needs to ensure that the simple

instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions 43 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pentium 4 Somewhere in all that control we must handle complex instructions Control Control I/O interface Instruction Morgan Kaufmann Publisherscache Data cache Enhanced floating Morgan Kaufmann Publisherspoint and Morgan Kaufmann Publishersmultimedia Integer datapath Control Advanced Morgan Kaufmann Publisherspipelining hyperthreading Morgan Kaufmann Publisherssupport Secondary cache and memory interface Control Processor executes simple microinstructions, 70 bits wide (hardwired) 120 control lines for integer datapath (400 for floating point) If an instruction requires more than 4 microinstructions to implement,

control from microcode ROM (8000 microinstructions) Its complicated! 44 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Chapter 5 Summary If we understand the instructions We can build a simple processor! If instructions take different amounts of time, multi-cycle is better Datapath implemented using: Combinational logic for arithmetic State holding elements to remember bits Control implemented using: Combinational logic for single-cycle implementation Finite state machine for multi-cycle implementation 45 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers DONE 46 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Chapter Six -- Pipelining 47 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipelining

Improve performance by increasing instruction throughput Program execution Time order (in Morgan Kaufmann Publishersinstructions) 200 lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers100($0) Instruction fetch Reg lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers Morgan Kaufmann Publishers200($0) 400 600 Data access ALU 800 1000 1200 1400 ALU Data access 1600 1800 Reg Instruction Reg fetch 800 Morgan Kaufmann Publishersps lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers Morgan Kaufmann Publishers300($0) Reg

Instruction fetch 800 Morgan Kaufmann Publishersps Note: timing assumptions changed for this example 800 Morgan Kaufmann Publishersps Program execution Time order (in Morgan Kaufmann Publishersinstructions) lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers100($0) 200 Instruction fetch 400 Reg lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers Morgan Kaufmann Publishers200($0) 200 Morgan Kaufmann Publishersps Instruction fetch lw Morgan Kaufmann Publishers Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers Morgan Kaufmann Publishers300($0) 200 Morgan Kaufmann Publishersps 600 ALU Reg Instruction fetch 800 Data access ALU Reg 1000

1200 1400 Reg Data access ALU Reg Data access Reg 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps 200 Morgan Kaufmann Publishersps Ideal speedup is number of stages in the pipeline. Do we achieve this? 48 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipelining What makes it easy all instructions are the same length just a few instruction formats memory operands appear only in loads and stores What makes it hard? structural hazards: suppose we had only one memory control hazards: need to worry about branch instructions data hazards: an instruction depends on a previous instruction Well build a simple pipeline and look at these issues Well talk about modern processors and what really makes it hard: exception handling trying to improve performance with out-of-order execution, etc.

49 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Basic Idea IF: Morgan Kaufmann PublishersInstruction Morgan Kaufmann Publishersfetch ID: Morgan Kaufmann PublishersInstruction Morgan Kaufmann Publishersdecode/ register Morgan Kaufmann Publishersfile Morgan Kaufmann Publishersread EX: Morgan Kaufmann PublishersExecute/ address Morgan Kaufmann Publisherscalculation MEM: Morgan Kaufmann PublishersMemory Morgan Kaufmann Publishersaccess WB: Morgan Kaufmann PublishersWrite Morgan Kaufmann Publishersback Add 4 Shift left Morgan Kaufmann Publishers2 P C Address Instruction Instruction memory Read Read register Morgan Kaufmann Publishers1 data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Write Read register data Morgan Kaufmann Publishers2 Write data 16 ADD Add result

Zero ALU ALU result Address Read data Data Memory Write data Sign 32 extend What do we need to add to actually split the datapath into stages? 50 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipelined Datapath IF/ID ID/EX EX/MEM MEM/WB Add 4 Shift left Morgan Kaufmann Publishers2 PC Address Instruction memory Add Add result

Read register Morgan Kaufmann Publishers1 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Read Write data Morgan Kaufmann Publishers2 register Zero ALU ALU result Read data Address Data memory Write data Write data 16 Sign extend 32 Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem? 51 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Corrected Datapath IF/ID ID/EX

EX/MEM MEM/WB Add 4 Shift left Morgan Kaufmann Publishers2 PC Address Instruction memory Add Add result Read register Morgan Kaufmann Publishers1 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Read Write data Morgan Kaufmann Publishers2 register Zero ALU ALU result Read data Address Data memory Write data Write data 16

Sign extend 32 52 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Graphically Representing Pipelines Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) Program execution order (in Morgan Kaufmann Publishersinstructions) lw Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers100($0) lw Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers200($0) lw Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers300($0) CC Morgan Kaufmann Publishers1 CC Morgan Kaufmann Publishers2 IM Reg IM CC Morgan Kaufmann Publishers3 ALU Reg IM CC Morgan Kaufmann Publishers4 CC Morgan Kaufmann Publishers5 DM

Reg ALU DM Reg ALU DM Reg CC Morgan Kaufmann Publishers6 CC7 Reg Can help with answering questions like: how many cycles does it take to execute this code? what is the ALU doing during cycle 4? use this representation to help understand datapaths 53 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipeline Control PCSrc IF/ID ID/EX EX/MEM MEM/WB Add Add Add result 4 Shift left Morgan Kaufmann Publishers2

Branch RegWrite PC Address Instruction memory Read register Morgan Kaufmann Publishers1 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Read Write data Morgan Kaufmann Publishers2 register MemWrite ALUSrc Zero Add ALU result MemtoReg Read data Address Data memory Write data Write data Instruction (150) Instruction (2016)

16 Sign extend 32 6 ALU control MemRead ALUOp Instruction (1511) RegDst 54 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipeline control We have 5 stages. What needs to be controlled in each stage? Instruction Fetch and PC Increment Instruction Decode / Register Fetch Execution Memory Stage Write Back How would control be handled in an automobile plant? a fancy control center telling everyone what to do? should we use a finite state machine? 55 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pipeline Control Pass control signals along just like the data

Instruction R-format lw sw beq Execution/Address Calculation Memory access stage stage control lines control lines Reg ALU ALU ALU Mem Mem Dst Op1 Op0 Src Branch Read Write 1 1 0 0 0 0 0 0 0 0 1 0 1 0 X 0 0 1 0 0 1 X 0 1 0 1 0 0

Write-back stage control lines Reg Mem to write Reg 1 0 1 1 0 X 0 X WB Instruction IF/ID Control M WB EX M WB ID/EX EX/MEM MEM/WB 56 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Datapath with Control PCSrc ID/EX

Control IF/ID WB EX/MEM M WB EX M MEM/WB WB Add 4 Shift left Morgan Kaufmann Publishers2 PC Address Instruction memory Add Add result Branch ALUSrc Read register Morgan Kaufmann Publishers1 Read data Morgan Kaufmann Publishers1 Read register Morgan Kaufmann Publishers2 Registers Read Write

data Morgan Kaufmann Publishers2 register Zero ALU ALU result Read data Address Data memory Write data Write data Instruction [150] Instruction [2016] 16 Sign extend 32 6 ALU control MemRead ALUOp Instruction [1511] RegDst 57 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Dependencies

Problem with starting next instruction before first is finished dependencies that go backward in time are data hazards Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) CC Morgan Kaufmann Publishers1 CC Morgan Kaufmann Publishers2 CC Morgan Kaufmann Publishers3 10 10 10 IM Reg Value Morgan Kaufmann Publishersof register Morgan Kaufmann Publishers$2: CC Morgan Kaufmann Publishers4 CC Morgan Kaufmann Publishers5 CC Morgan Kaufmann Publishers6 CC Morgan Kaufmann Publishers7 CC Morgan Kaufmann Publishers8 CC Morgan Kaufmann Publishers9 10 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers10/20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Program execution order (in Morgan Kaufmann Publishersinstructions) sub $2, Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers$3 and Morgan Kaufmann Publishers$12, $2, Morgan Kaufmann Publishers$5 or Morgan Kaufmann Publishers$13, Morgan Kaufmann Publishers$6, $2 add Morgan Kaufmann Publishers$14, $2, $2

sw Morgan Kaufmann Publishers$15, Morgan Kaufmann Publishers100($2) IM DM Reg IM Reg DM Reg IM Reg DM Reg IM Reg DM Reg Reg DM Reg 58 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Software Solution Have compiler guarantee no hazards

Where do we insert the nops ? sub and or add sw $2, $1, $3 $12, $2, $5 $13, $6, $2 $14, $2, $2 $15, 100($2) Problem: this really slows us down! 59 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Forwarding Use temporary results, dont wait for them to be written register file forwarding to handle read/write to same register ALU forwarding Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) CC Morgan Kaufmann Publishers1 CC Morgan Kaufmann Publishers2 Value Morgan Kaufmann Publishersof Morgan Kaufmann Publishersregister Morgan Kaufmann Publishers$2: 10 10 Value Morgan Kaufmann Publishersof Morgan Kaufmann PublishersEX/MEM: Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Value Morgan Kaufmann Publishersof Morgan Kaufmann PublishersMEM/WB: Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX CC Morgan Kaufmann Publishers3 10 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX CC Morgan Kaufmann Publishers4 CC Morgan Kaufmann Publishers5 CC Morgan Kaufmann Publishers6

CC Morgan Kaufmann Publishers7 CC Morgan Kaufmann Publishers8 CC Morgan Kaufmann Publishers9 10 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers10/20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers20 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann PublishersX Program execution order (in Morgan Kaufmann Publishersinstructions) sub $2, Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers$3 and Morgan Kaufmann Publishers$12, $2, Morgan Kaufmann Publishers$5 or Morgan Kaufmann Publishers$13, Morgan Kaufmann Publishers$6, $2 add Morgan Kaufmann Publishers$14,$2 , $2 sw Morgan Kaufmann Publishers$15, Morgan Kaufmann Publishers100($2) what if this $2 was $13? IM Reg IM DM Reg IM Reg

DM Reg IM Reg DM Reg IM Reg DM Reg Reg DM Reg 60 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Forwarding The main idea (some details not shown) ID/EX EX/MEM MEM/WB M u x ForwardA Registers

ALU M u x Data memory M u x ForwardB Rs Rt Rt Rd EX/MEM.RegisterRd M u x Forwarding unit MEM/WB.RegisterRd 61 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Can't always forward Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register. Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) CC Morgan Kaufmann Publishers1 CC Morgan Kaufmann Publishers2 CC Morgan Kaufmann Publishers3 CC Morgan Kaufmann Publishers4

CC Morgan Kaufmann Publishers5 DM Reg CC Morgan Kaufmann Publishers6 CC Morgan Kaufmann Publishers7 CC Morgan Kaufmann Publishers8 CC Morgan Kaufmann Publishers9 Program execution order (in Morgan Kaufmann Publishersinstructions) lw $2, Morgan Kaufmann Publishers20($1) and $4, $2, Morgan Kaufmann Publishers$5 or Morgan Kaufmann Publishers$8, $2, Morgan Kaufmann Publishers$6 add Morgan Kaufmann Publishers$9, $4, $2 IM Reg IM Reg IM DM Reg IM Reg DM

Reg Reg DM Reg Thus, we need a hazard detection unit to stall the load instruction slt Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers$6, Morgan Kaufmann Publishers$7 IM Reg DM Reg 62 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Stalling We can stall the pipeline by keeping an instruction in the same stage Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) CC Morgan Kaufmann Publishers1 CC Morgan Kaufmann Publishers2 CC Morgan Kaufmann Publishers3 CC Morgan Kaufmann Publishers4 CC Morgan Kaufmann Publishers5 Reg DM Reg CC Morgan Kaufmann Publishers6 CC Morgan Kaufmann Publishers7 CC Morgan Kaufmann Publishers8

CC Morgan Kaufmann Publishers9 CC Morgan Kaufmann Publishers10 Program execution order (in Morgan Kaufmann Publishersinstructions) lw $2, Morgan Kaufmann Publishers20($1) IM bubble and Morgan Kaufmann Publishersbecomes Morgan Kaufmann Publishersnop add Morgan Kaufmann Publishers$4, Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers$5 or Morgan Kaufmann Publishers$8, Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers$6 add Morgan Kaufmann Publishers$9, Morgan Kaufmann Publishers$4, Morgan Kaufmann Publishers$2 IM Reg IM DM Reg IM Reg DM DM Reg IM Reg Reg Reg

DM Reg 63 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Hazard Detection Unit Stall by letting an instruction that wont write anything go forward Hazard detection unit ID/EX.MemRead ID/EX WB M u x Control 0 IF/ID EX/MEM M WB EX M MEM/WB WB M u x Registers ALU

PC Instruction memory M u x Data memory M u x IF/ID.RegisterRs IF/ID.RegisterRt IF/ID.RegisterRt Rt IF/ID.RegisterRd Rd M u x ID/EX.RegisterRt Rs Rt Forwarding unit 64 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Branch Hazards When we decide to branch, other instructions are in the pipeline! Time Morgan Kaufmann Publishers(in Morgan Kaufmann Publishersclock Morgan Kaufmann Publisherscycles) CC Morgan Kaufmann Publishers1

CC Morgan Kaufmann Publishers2 CC Morgan Kaufmann Publishers3 CC Morgan Kaufmann Publishers4 CC Morgan Kaufmann Publishers5 DM Reg CC Morgan Kaufmann Publishers6 CC Morgan Kaufmann Publishers7 CC Morgan Kaufmann Publishers8 CC Morgan Kaufmann Publishers9 Program execution order (in Morgan Kaufmann Publishersinstructions) 40 Morgan Kaufmann Publishersbeq Morgan Kaufmann Publishers$1, Morgan Kaufmann Publishers$3, Morgan Kaufmann Publishers28 44 Morgan Kaufmann Publishersand Morgan Kaufmann Publishers$12, Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers$5 48 Morgan Kaufmann Publishersor Morgan Kaufmann Publishers$13, Morgan Kaufmann Publishers$6, Morgan Kaufmann Publishers$2 52 Morgan Kaufmann Publishersadd Morgan Kaufmann Publishers$14, Morgan Kaufmann Publishers$2, Morgan Kaufmann Publishers$2 72 Morgan Kaufmann Publisherslw Morgan Kaufmann Publishers$4, Morgan Kaufmann Publishers50($7) IM Reg IM Reg IM DM

Reg IM Reg DM Reg IM Reg DM Reg Reg DM Reg We are predicting branch not taken need to add hardware for flushing instructions if we are wrong 65 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Flushing Instructions IF.Flush Hazard detection unit ID/EX WB Control 0 IF/ID M u x +

EX/MEM M WB EX/MEM EX M WB + 4 M u x PC Shift left Morgan Kaufmann Publishers2 Registers Instruction memory = M u x ALU M u x Data memory M u x

Sign extend M u x Fowarding unit Note: weve also moved branch decision to ID stage 66 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Branches If the branch is taken, we have a penalty of one cycle For our simple design, this is reasonable With deeper pipelines, penalty increases and static branch prediction drastically hurts performance Solution: dynamic branch prediction Taken Not Morgan Kaufmann Publisherstaken Predict Morgan Kaufmann Publisherstaken Predict Morgan Kaufmann Publisherstaken Taken Not Morgan Kaufmann Publisherstaken Taken Not Morgan Kaufmann Publisherstaken Predict Morgan Kaufmann Publishersnot Morgan Kaufmann Publisherstaken Predict Morgan Kaufmann Publishersnot Morgan Kaufmann Publisherstaken Taken Not Morgan Kaufmann Publisherstaken A 2-bit prediction scheme 67 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Branch Prediction

Sophisticated Techniques: A branch target buffer to help us look up the destination Correlating predictors that base prediction on global behavior and recently executed branches (e.g., prediction for a specific branch instruction based on what happened in previous branches) Tournament predictors that use different types of prediction strategies and keep track of which one is performing best. A branch delay slot which the compiler tries to fill with a useful instruction (make the one cycle delay part of the ISA) Branch prediction is especially important because it enables other more advanced pipelining techniques to be effective! Modern processors predict correctly 95% of the time! 68 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Improving Performance Try and avoid stalls! E.g., reorder these instructions: lw lw sw sw $t0, $t2, $t2, $t0, 0($t1) 4($t1) 0($t1) 4($t1) Dynamic Pipeline Scheduling Hardware chooses which instructions to execute next Will execute instructions out of order (e.g., doesnt wait for a

dependency to be resolved, but rather keeps going!) Speculates on branches and keeps the pipeline full (may need to rollback if prediction incorrect) Trying to exploit instruction-level parallelism 69 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Advanced Pipelining Increase the depth of the pipeline Start more than one instruction each cycle (multiple issue) Loop unrolling to expose more ILP (better scheduling) Superscalar processors DEC Alpha 21264: 9 stage pipeline, 6 instruction issue All modern processors are superscalar and issue multiple instructions usually with some limitations (e.g., different pipes) VLIW: very long instruction word, static multiple issue (relies more on compiler technology) This class has given you the background you need to learn more! 70 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Chapter 6 Summary Pipelining does not improve latency, but does improve throughput Deeply

pipelined Multicycle (Section Morgan Kaufmann Publishers5.5) Pipelined Multiple Morgan Kaufmann Publishersissue with Morgan Kaufmann Publishersdeep Morgan Kaufmann Publisherspipeline (Section Morgan Kaufmann Publishers6.10) Multiple Morgan Kaufmann Publishersissue with Morgan Kaufmann Publishersdeep Morgan Kaufmann Publisherspipeline (Section Morgan Kaufmann Publishers6.10) Multiple-issue pipelined (Section Morgan Kaufmann Publishers6.9) Multiple-issue pipelined (Section Morgan Kaufmann Publishers6.9) Single-cycle (Section Morgan Kaufmann Publishers5.4) Deeply pipelined Multicycle (Section Morgan Kaufmann Publishers5.5) Single-cycle (Section Morgan Kaufmann Publishers5.4) Slower Pipelined Faster Instructions Morgan Kaufmann Publishersper Morgan Kaufmann Publishersclock Morgan Kaufmann Publishers(IPC Morgan Kaufmann Publishers= Morgan Kaufmann Publishers1/CPI) 1 Several Use Morgan Kaufmann Publisherslatency Morgan Kaufmann Publishersin Morgan Kaufmann Publishersinstructions 71

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Chapter Seven 72 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be refreshed) very small but slower than SRAM (factor of 5 to 10) Word Morgan Kaufmann Publishersline A A B B Pass Morgan Kaufmann Publisherstransistor Capacitor Bit Morgan Kaufmann Publishersline 73 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Exploiting Memory Hierarchy Users want large and fast memories! SRAM access times are .5 5ns at cost of $4000 to $10,000 per GB. DRAM access times are 50-70ns at cost of $100 to $200 per GB. Disk access times are 5 to 20 million ns at cost of $.50 to $2 per GB.

2004 Try and give it to them anyway build a memory hierarchy CPU Level Morgan Kaufmann Publishers1 Levels Morgan Kaufmann Publishersin Morgan Kaufmann Publishersthe memory Morgan Kaufmann Publishershierarchy Increasing Morgan Kaufmann Publishersdistance from Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersCPU Morgan Kaufmann Publishersin access Morgan Kaufmann Publisherstime Level Morgan Kaufmann Publishers2 Level Morgan Kaufmann Publishersn Size Morgan Kaufmann Publishersof Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersat Morgan Kaufmann Publisherseach Morgan Kaufmann Publisherslevel 74 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality: it will tend to be referenced again soon spatial locality: nearby items will tend to be referenced soon. Why does code have locality? Our initial focus: two levels (upper, lower) block: minimum unit of data hit: data requested is in the upper level miss: data requested is not in the upper level 75 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers

Cache Two issues: How do we know if a data item is in the cache? If it is, how do we find it? Our first example: block size is one word of data "direct mapped" For each item of data at the lower level, there is exactly one location in the cache where it might be. e.g., lots of items at the lower level share locations in the upper level 76 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Direct Mapped Cache Mapping: address is modulo the number of blocks in the cache Cache 000 001 010 011 100 101 110 111 00001 00101 01001 01101 10001 10101 11001

11101 Memory 77 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Direct Mapped Cache For MIPS: Address Morgan Kaufmann Publishers(showing Morgan Kaufmann Publishersbit Morgan Kaufmann Publisherspositions) 31 Morgan Kaufmann Publishers30 Hit Tag 13 Morgan Kaufmann Publishers12 Morgan Kaufmann Publishers11 20 2 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers1 Morgan Kaufmann Publishers0 Byte offset 10 Data Index Index 0 1 2 Valid Tag Data 1021 1022 1023 20 32 =

What kind of locality are we taking advantage of? 78 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Direct Mapped Cache Taking advantage of spatial locality: Address Morgan Kaufmann Publishers(showing Morgan Kaufmann Publishersbit Morgan Kaufmann Publisherspositions) 31 Hit 14 Morgan Kaufmann Publishers13 18 Tag 6 Morgan Kaufmann Publishers5 8 2 Morgan Kaufmann Publishers1 Morgan Kaufmann Publishers0 4 Byte offset Data Block Morgan Kaufmann Publishersoffset Index 18 Morgan Kaufmann Publishersbits V 512 Morgan Kaufmann Publishersbits Tag Data 256 entries

16 32 32 32 = Mux 32 79 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Hits vs. Misses Read hits this is what we want! Read misses stall the CPU, fetch block from memory, deliver to cache, restart Write hits: can replace data in cache and memory (write-through) write the data only into the cache (write-back the cache later) Write misses: read the entire block into the cache, then write the word 80 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Hardware Issues Make reading multiple words easier by using banks of memory CPU

CPU CPU Multiplexor Cache Cache Cache Bus Bus Memory b. Morgan Kaufmann PublishersWide Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersorganization Bus Memory bank Morgan Kaufmann Publishers0 Memory bank Morgan Kaufmann Publishers1 Memory bank Morgan Kaufmann Publishers2 Memory bank Morgan Kaufmann Publishers3 c. Morgan Kaufmann PublishersInterleaved Morgan Kaufmann Publishersmemory Morgan Kaufmann Publishersorganization Memory a. One-word-wide memory Morgan Kaufmann Publishersorganization It can get a lot more complicated... 81 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Performance

Increasing the block size tends to decrease miss rate: 40% 35% Miss Morgan Kaufmann Publishersrate 30% 25% 20% 15% 10% 5% 0% 4 16 64 Block Morgan Kaufmann Publisherssize Morgan Kaufmann Publishers(bytes) 256 1 Morgan Kaufmann PublishersKB 8 Morgan Kaufmann PublishersKB 16 Morgan Kaufmann PublishersKB 64 Morgan Kaufmann PublishersKB 256 Morgan Kaufmann PublishersKB Use split caches because there is more spatial locality in code: Program gcc spice Block size in words 1 4 1 4 Instruction miss rate 6.1% 2.0%

1.2% 0.3% Data miss rate 2.1% 1.7% 1.3% 0.6% Effective combined miss rate 5.4% 1.9% 1.2% 0.4% 82 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Performance Simplified model: execution time = (execution cycles + stall cycles) cycle time stall cycles = # of instructions miss ratio miss penalty Two ways of improving performance: decreasing the miss ratio decreasing the miss penalty What happens if we increase block size? 83 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Decreasing miss ratio with associativity One-way Morgan Kaufmann Publishersset Morgan Kaufmann Publishersassociative (direct Morgan Kaufmann Publishersmapped) Block Tag Data 0

1 2 3 4 5 6 Two-way Morgan Kaufmann Publishersset Morgan Kaufmann Publishersassociative Set Tag Data Tag Data 0 1 2 3 7 Four-way Morgan Kaufmann Publishersset Morgan Kaufmann Publishersassociative Set Tag Data Tag Data Tag Data Tag Data 0 1 Eight-way Morgan Kaufmann Publishersset Morgan Kaufmann Publishersassociative Morgan Kaufmann Publishers(fully Morgan Kaufmann Publishersassociative) Compared to direct mapped, give a series of references that: Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data Tag Data results in a lower miss ratio using a 2-way set associative cache results in a higher miss ratio using a 2-way set associative cache assuming we use the least recently used replacement strategy 84 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers An implementation Address 31 30 12 11 10 9 8 8 22 Index

0 1 2 V Tag Data V 3210 Tag Data V Tag Data V Tag Data 253 254 255 22 32 4-to-1 Morgan Kaufmann Publishersmultiplexor Hit Data 85 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Performance

15% 1 Morgan Kaufmann PublishersKB 12% 2 Morgan Kaufmann PublishersKB 9% 4 Morgan Kaufmann PublishersKB 6% 8 Morgan Kaufmann PublishersKB 16 Morgan Kaufmann PublishersKB 32 Morgan Kaufmann PublishersKB 3% 64 Morgan Kaufmann PublishersKB 128 Morgan Kaufmann PublishersKB 0 One-way Two-way Four-way Eight-way Associativity 86 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Decreasing miss penalty with multilevel caches Add a second level cache: often primary cache is on the same chip as the processor use SRAMs to add another cache above primary memory (DRAM) miss penalty goes down if data is in 2nd level cache Example:

CPI of 1.0 on a 5 Ghz machine with a 5% miss rate, 100ns DRAM access Adding 2nd level cache with 5ns access time decreases miss rate to .5% Using multilevel caches: try and optimize the hit time on the 1st level cache try and optimize the miss rate on the 2nd level cache 87 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Cache Complexities Not always easy to understand implications of caches: 1200 2000 Radix Morgan Kaufmann Publisherssort 1000 Radix Morgan Kaufmann Publisherssort 1600 800 1200 600 800 400 200 Quicksort 400 0 Quicksort 0 4

8 16 32 64 128 256 512 1024 2048 4096 Size Morgan Kaufmann Publishers(K Morgan Kaufmann Publishersitems Morgan Kaufmann Publishersto Morgan Kaufmann Publisherssort) Theoretical behavior of Radix sort vs. Quicksort 4 8 16 32 64 128 256 512 1024 2048 4096 Size Morgan Kaufmann Publishers(K Morgan Kaufmann Publishersitems Morgan Kaufmann Publishersto Morgan Kaufmann Publisherssort) Observed behavior of Radix sort vs. Quicksort 88 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Cache Complexities Here is why:

5 Radix Morgan Kaufmann Publisherssort 4 3 2 1 Quicksort 0 4 8 16 32 64 128 256 512 1024 2048 4096 Size Morgan Kaufmann Publishers(K Morgan Kaufmann Publishersitems Morgan Kaufmann Publishersto Morgan Kaufmann Publisherssort) Memory system performance is often critical factor multilevel caches, pipelined processors, make it harder to predict outcomes Compiler optimizations to increase locality sometimes hurt ILP Difficult to predict best algorithm: need experimental data 89 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Virtual Memory Main memory can act as a cache for the secondary storage (disk) Virtual Morgan Kaufmann Publishersaddresses Physical Morgan Kaufmann Publishersaddresses Address Morgan Kaufmann Publisherstranslation

Disk Morgan Kaufmann Publishersaddresses Advantages: illusion of having more physical memory program relocation protection 90 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pages: virtual memory blocks Page faults: the data is not in memory, retrieve it from disk huge miss penalty, thus pages should be fairly large (e.g., 4KB) reducing page faults is important (LRU is worth the price) can handle the faults in software instead of hardware using write-through is too expensive so we use writeback Virtual Morgan Kaufmann Publishersaddress 31 Morgan Kaufmann Publishers30 Morgan Kaufmann Publishers29 Morgan Kaufmann Publishers28 Morgan Kaufmann Publishers27 15 Morgan Kaufmann Publishers14 Morgan Kaufmann Publishers13 Morgan Kaufmann Publishers12 Morgan Kaufmann Publishers11 Morgan Kaufmann Publishers10 Morgan Kaufmann Publishers9 Morgan Kaufmann Publishers8 3 Morgan Kaufmann Publishers2 Morgan Kaufmann Publishers1 Morgan Kaufmann Publishers0 Page Morgan Kaufmann Publishersoffset Virtual Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber Translation 29 Morgan Kaufmann Publishers28 Morgan Kaufmann Publishers27 15 Morgan Kaufmann Publishers14 Morgan Kaufmann Publishers13 Morgan Kaufmann Publishers12 Morgan Kaufmann Publishers11 Morgan Kaufmann Publishers10 Morgan Kaufmann Publishers9 Morgan Kaufmann Publishers8 Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber 3 Morgan Kaufmann Publishers2 Morgan Kaufmann Publishers1 Morgan Kaufmann Publishers0 Page Morgan Kaufmann Publishersoffset Physical Morgan Kaufmann Publishersaddress 91

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Page Tables Virtual Morgan Kaufmann Publisherspage number Page Morgan Kaufmann Publisherstable Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersor Valid disk Morgan Kaufmann Publishersaddress 1 1 1 1 0 1 1 0 1 1 0 1 Physical Morgan Kaufmann Publishersmemory Disk Morgan Kaufmann Publishersstorage 92 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Page Tables Page Morgan Kaufmann Publisherstable Morgan Kaufmann Publishersregister Virtual Morgan Kaufmann Publishersaddress 3 1 Morgan Kaufmann Publishers 3 0 Morgan Kaufmann Publishers 2 9 Morgan Kaufmann Publishers 2 8 Morgan Kaufmann Publishers 2 7 1 5 Morgan Kaufmann Publishers 1 4 Morgan Kaufmann Publishers 1 3 Morgan Kaufmann Publishers 1 2 Morgan Kaufmann Publishers 1 1 Morgan Kaufmann Publishers 1 0 Morgan Kaufmann Publishers 9 Morgan Kaufmann Publishers 8 Virtual Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber Page Morgan Kaufmann Publishersoffset 12 20 Valid 3 Morgan Kaufmann Publishers 2 Morgan Kaufmann Publishers 1 Morgan Kaufmann Publishers 0 Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber

Page Morgan Kaufmann Publisherstable 18 If Morgan Kaufmann Publishers0 Morgan Kaufmann Publishersthen Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersis Morgan Kaufmann Publishersnot present Morgan Kaufmann Publishersin Morgan Kaufmann Publishersmemory 2 9 Morgan Kaufmann Publishers 2 8 Morgan Kaufmann Publishers 2 7 1 5 Morgan Kaufmann Publishers 1 4 Morgan Kaufmann Publishers 1 3 Morgan Kaufmann Publishers 1 2 Morgan Kaufmann Publishers 1 1 Morgan Kaufmann Publishers 1 0 Morgan Kaufmann Publishers 9 Morgan Kaufmann Publishers 8 Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber 3 Morgan Kaufmann Publishers 2 Morgan Kaufmann Publishers 1 Morgan Kaufmann Publishers 0 Page Morgan Kaufmann Publishersoffset Physical Morgan Kaufmann Publishersaddress 93 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Making Address Translation Fast A cache for address translations: translation lookaside buffer TLB Virtual Morgan Kaufmann Publisherspage number Valid Dirty Ref 1 1 1 1 0 1 0 1 1 0 0 0 Tag Physical Morgan Kaufmann Publisherspage address 1 1

1 1 0 1 Physical Morgan Kaufmann Publishersmemory Page Morgan Kaufmann Publisherstable Physical Morgan Kaufmann Publisherspage Valid Dirty Ref or Morgan Kaufmann Publishersdisk Morgan Kaufmann Publishersaddress 1 1 1 1 0 1 1 0 1 1 0 1 Typical values: 1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1

0 1 Disk Morgan Kaufmann Publishersstorage 16-512 entries, miss-rate: .01% - 1% miss-penalty: 10 100 cycles 94 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers TLBs and caches Virtual Morgan Kaufmann Publishersaddress TLB Morgan Kaufmann Publishersaccess TLB Morgan Kaufmann Publishersmiss exception No Yes TLB Morgan Kaufmann Publishershit? Physical Morgan Kaufmann Publishersaddress No Try Morgan Kaufmann Publishersto Morgan Kaufmann Publishersread Morgan Kaufmann Publishersdata from Morgan Kaufmann Publisherscache Cache Morgan Kaufmann Publishersmiss Morgan Kaufmann Publishersstall while Morgan Kaufmann Publishersread Morgan Kaufmann Publishersblock No Cache Morgan Kaufmann Publishershit? Yes Write? No Yes

Write Morgan Kaufmann Publishersaccess bit Morgan Kaufmann Publisherson? Write Morgan Kaufmann Publishersprotection exception Yes Try Morgan Kaufmann Publishersto Morgan Kaufmann Publisherswrite Morgan Kaufmann Publishersdata to Morgan Kaufmann Publisherscache Deliver Morgan Kaufmann Publishersdata to Morgan Kaufmann Publishersthe Morgan Kaufmann PublishersCPU Cache Morgan Kaufmann Publishersmiss Morgan Kaufmann Publishersstall while Morgan Kaufmann Publishersread Morgan Kaufmann Publishersblock No Cache Morgan Kaufmann Publishershit? Yes Write Morgan Kaufmann Publishersdata Morgan Kaufmann Publishersinto Morgan Kaufmann Publisherscache, update Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersdirty Morgan Kaufmann Publishersbit, Morgan Kaufmann Publishersand put Morgan Kaufmann Publishersthe Morgan Kaufmann Publishersdata Morgan Kaufmann Publishersand Morgan Kaufmann Publishersthe address Morgan Kaufmann Publishersinto Morgan Kaufmann Publishersthe Morgan Kaufmann Publisherswrite Morgan Kaufmann Publishersbuffer 95 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers TLBs and Caches Virtual Morgan Kaufmann Publishersaddress 31 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers30 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers29 14 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers13 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers12 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers11 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers10 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers9 Virtual Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber 3 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers2 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers1 Morgan Kaufmann Publishers Morgan Kaufmann Publishers Morgan Kaufmann Publishers0 Page Morgan Kaufmann Publishersoffset 12 20 Valid Dirty Tag

Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber = = = = = = TLB TLB Morgan Kaufmann Publishershit 20 Page Morgan Kaufmann Publishersoffset Physical Morgan Kaufmann Publisherspage Morgan Kaufmann Publishersnumber Physical Morgan Kaufmann Publishersaddress Block Cache Morgan Kaufmann Publishersindex Physical Morgan Kaufmann Publishersaddress Morgan Kaufmann Publisherstag offset 18 8 4 Byte offset 2 8 12 Valid Data Tag Cache = Cache Morgan Kaufmann Publishershit 32 Data 96

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Modern Systems 97 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Modern Systems Things are getting complicated! 98 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Some Issues Processor speeds continue to increase very fast much faster than either DRAM or disk access times 100,000 10,000 1,000 Performance CPU 100 10 Memory 1 Year Design challenge: dealing with this growing disparity Prefetching? 3rd level caches and more? Memory design? 99

2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Chapters 8 & 9 (partial coverage) 100 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Interfacing Processors and Peripherals I/O Design affected by many factors (expandability, resilience) Performance: access latency throughput connection between devices and the system the memory hierarchy the operating system A variety of different users (e.g., banks, supercomputers, engineers) Interrupts Processor Cache Memory- Morgan Kaufmann PublishersI/O Morgan Kaufmann Publishersbus Main memory I/O controller Disk Disk I/O controller I/O controller Graphics

output Network 101 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers I/O Important but neglected The difficulties in assessing and designing I/O systems have often relegated I/O to second class status courses in every aspect of computing, from programming to computer architecture often ignore I/O or give it scanty coverage textbooks leave the subject to near the end, making it easier for students and instructors to skip it! GUILTY! we wont be looking at I/O in much detail be sure and read Chapter 8 in its entirety. you should probably take a networking class! 102 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers I/O Devices Very diverse devices behavior (i.e., input vs. output) partner (who is at the other end?) data rate 103 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers I/O Example: Disk Drives Platters Tracks Platter

Sectors Track To access data: seek: position head over the proper track (3 to 14 ms. avg.) rotational latency: wait for desired sector (.5 / RPM) transfer: grab the data (one or more sectors) 30 to 80 MB/sec 104 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers I/O Example: Buses Shared communication link (one or more wires) Difficult design: may be bottleneck length of the bus number of devices tradeoffs (buffers for higher bandwidth increases latency) support for many different devices cost Types of buses: processor-memory (short high speed, custom design) backplane (high speed, often standardized, e.g., PCI) I/O (lengthy, different devices, e.g., USB, Firewire) Synchronous vs. Asynchronous use a clock and a synchronous protocol, fast and small but every device must operate at same rate and clock skew requires the bus to be short dont use a clock and instead use handshaking 105 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers I/O Bus Standards

Today we have two dominant bus standards: 106 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Other important issues Bus Arbitration: daisy chain arbitration (not very fair) centralized arbitration (requires an arbiter), e.g., PCI collision detection, e.g., Ethernet Operating system: polling interrupts direct memory access (DMA) Performance Analysis techniques: queuing theory simulation analysis, i.e., find the weakest link (see I/O System Design) Many new developments 107 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Pentium 4 I/O Options Pentium Morgan Kaufmann Publishers4 processor DDR Morgan Kaufmann Publishers400 (3.2 Morgan Kaufmann PublishersGB/sec) Main memory DIMMs

DDR Morgan Kaufmann Publishers400 (3.2 Morgan Kaufmann PublishersGB/sec) System Morgan Kaufmann Publishersbus Morgan Kaufmann Publishers(800 Morgan Kaufmann PublishersMHz, Morgan Kaufmann Publishers604 Morgan Kaufmann PublishersGB/sec) AGP Morgan Kaufmann Publishers8X Memory (2.1 Morgan Kaufmann PublishersGB/sec) Graphics controller output hub CSA (north Morgan Kaufmann Publishersbridge) (0.266 Morgan Kaufmann PublishersGB/sec) 1 Morgan Kaufmann PublishersGbit Morgan Kaufmann PublishersEthernet 82875P Serial Morgan Kaufmann PublishersATA (150 Morgan Kaufmann PublishersMB/sec) (266 Morgan Kaufmann PublishersMB/sec) Parallel Morgan Kaufmann PublishersATA (100 Morgan Kaufmann PublishersMB/sec) Serial Morgan Kaufmann PublishersATA (150 Morgan Kaufmann PublishersMB/sec) Parallel Morgan Kaufmann PublishersATA (100 Morgan Kaufmann PublishersMB/sec) Disk Disk Stereo (surroundsound) AC/97 (1 Morgan Kaufmann PublishersMB/sec) USB Morgan Kaufmann Publishers2.0 (60 Morgan Kaufmann PublishersMB/sec) . Morgan Kaufmann Publishers. Morgan Kaufmann Publishers. I/O controller hub (south Morgan Kaufmann Publishersbridge) 82801EB

(20 Morgan Kaufmann PublishersMB/sec) CD/DVD Tape 10/100 Morgan Kaufmann PublishersMbit Morgan Kaufmann PublishersEthernet PCI Morgan Kaufmann Publishersbus (132 Morgan Kaufmann PublishersMB/sec) 108 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Fallacies and Pitfalls Fallacy: the rated mean time to failure of disks is 1,200,000 hours, so disks practically never fail. Fallacy: magnetic disk storage is on its last legs, will be replaced. Fallacy: A 100 MB/sec bus can transfer 100 MB/sec. Pitfall: Moving functions from the CPU to the I/O processor, expecting to improve performance without analysis. 109 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Multiprocessors Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad news: its really hard to write good concurrent programs many commercial failures Processor

Processor Processor Cache Cache Cache Processor Processor Processor Cache Cache Cache Memory Memory Memory Single Morgan Kaufmann Publishersbus Memory I/O Network 110 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Questions How do parallel processors share data? single address space (SMP vs. NUMA) message passing

How do parallel processors coordinate? synchronization (locks, semaphores) built into send / receive primitives operating system protocols How are they implemented? connected by a single bus connected by a network 111 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Supercomputers Plot of top 500 supercomputer sites over a decade: Single Morgan Kaufmann PublishersInstruction Morgan Kaufmann Publishersmultiple Morgan Kaufmann Publishersdata Morgan Kaufmann Publishers(SIMD) 500 Cluster (network Morgan Kaufmann Publishersof workstations) 400 Cluster (network Morgan Kaufmann Publishersof SMPs) 300 Massively parallel processors (MPPs) 200 100 0 93 93 94 94 95 95 96 96 97 97 98 98 99 99 00 Sharedmemory multiprocessors (SMPs)

Uniprocessors 112 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Using multiple processors an old idea Some SIMD designs: Costs for the the Illiac IV escalated from $8 million in 1966 to $32 million in 1972 despite completion of only of the machine. It took three more years before it was operational! For better or worse, computer architects are not easily discouraged Lots of interesting designs and ideas, lots of failures, few successes 113 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Topologies P0 P1 P2 a. Morgan Kaufmann Publishers2-D Morgan Kaufmann Publishersgrid Morgan Kaufmann Publishersor Morgan Kaufmann Publishersmesh Morgan Kaufmann Publishersof Morgan Kaufmann Publishers16 Morgan Kaufmann Publishersnodes P3 P0 P4 P1 P5 P2 P6 P3

P7 P4 P5 P6 P7 b. Morgan Kaufmann PublishersOmega Morgan Kaufmann Publishersnetwork a. Morgan Kaufmann PublishersCrossbar b. Morgan Kaufmann Publishersn-cube Morgan Kaufmann Publisherstree Morgan Kaufmann Publishersof Morgan Kaufmann Publishers8 Morgan Kaufmann Publishersnodes Morgan Kaufmann Publishers(8 Morgan Kaufmann Publishers= Morgan Kaufmann Publishers23 Morgan Kaufmann Publishersso Morgan Kaufmann Publishersn Morgan Kaufmann Publishers= Morgan Kaufmann Publishers3) 114 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Clusters Constructed from whole computers Independent, scalable networks Strengths: Many applications amenable to loosely coupled machines Exploit local area networks Cost effective / Easy to expand Weaknesses: Administration costs not necessarily lower Connected using I/O bus Highly available due to separation of memories In theory, we should be able to do better 115 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Google

Serve an average of 1000 queries per second Google uses 6,000 processors and 12,000 disks Two sites in silicon valley, two in Virginia Each site connected to internet using OC48 (2488 Mbit/sec) Reliability: On an average day, 20 machines need rebooted (software error) 2% of the machines replaced each year In some sense, simple ideas well executed. Better (and cheaper) than other approaches involving increased complexity 116 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers Concluding Remarks Evolution vs. Revolution More often the expense of innovation comes from being too disruptive to computer users Acceptance of hardware ideas requires acceptance by software people; therefore hardware people should learn about software. And if software people want good machines, they must learn more about hardware to be able to communicate with and thereby influence hardware engineers. 117 2004 Morgan Kaufmann PublishersMorgan Morgan Kaufmann PublishersKaufmann Morgan Kaufmann PublishersPublishers

Recently Viewed Presentations

  • Cost Benefit Analysis of the Three Gorges Dam

    Cost Benefit Analysis of the Three Gorges Dam

    An Economic Cost-Benefit Analysis of GM Crop Cultivation: An Irish Case Study Marie-Louise Flannery, Fiona S. Thorne, B Paul W. Kelly and Ewen Mullins GM Crops in Ireland At the time of study, no GM crop cultivation in Ireland (EU...
  • HUMANS & CULTURAL STATE & ECONOMIC SOCIAL ENVIRONINTERACTION

    HUMANS & CULTURAL STATE & ECONOMIC SOCIAL ENVIRONINTERACTION

    (685-705) Return Who is Emperor Wu? 685-705 Return Social Structure * Humans & Environment A Middle Eastern group who moved from place to place tending and trading sheep, horses, and camels Return Answer Humans & Environment Middle Eastern beast of...
  • A2 Sociology - WordPress.com

    A2 Sociology - WordPress.com

    Provide definitions on your whiteboard Classic Marxism: Marx Classical Marxism was founded by Karl Marx Like Functionalism, Marxism is also a structural theory - society is a structure or system that shapes individuals' behaviour and ideas. It is also a...
  • COUGAR FIRST IMPRESSIONS VOLUNTEER TRAINING August 7, 2017

    COUGAR FIRST IMPRESSIONS VOLUNTEER TRAINING August 7, 2017

    RADIO ETIQUETTE. Push the button on the side when you want to speak and release when you are finished speaking. Example: Table 12 to Base, over. Base to Table 12, over. Table 12 needs water pouches, over. Base to Table...
  • Current Advanced Teleconferencing Solutions Dan Sandin EVL Feb

    Current Advanced Teleconferencing Solutions Dan Sandin EVL Feb

    Bandwidth 100 Mbit each way with sound DVCPRO HD Teleconferencing Development Need to test latency and quality Need to develop software decode I meet with Panasonic feb 8 06 HDSDI Teleconferencing No compression 4-2-2 Use widely deployed HD format 1920...
  • Alliteration - williston.k12.sc.us

    Alliteration - williston.k12.sc.us

    ABC Order super splendid terrific splendid super terrific heartache sadness gloomy gloomy heartache sadness sharecropping landowner share landowner share sharecropping character creative creature character creative creature treehouse theater terrified terrified theater treehouse gnat great golden gnat golden great citizen charisma...
  • ccartagena.weebly.com

    ccartagena.weebly.com

    clove of seasons. Red flower. Red = blood/death. Cloves are not ready to be picked until they are 5 years old, linking Doodle's "late blooming" for a child. symbol. the oriole nest in the elm was untenanted and rocked back...
  • E Portfolio: Quick Start Guide The e-portfolio is

    E Portfolio: Quick Start Guide The e-portfolio is

    The e-portfolio is an online site for you to submit your assignments & evidence as well as tracking your progress on your qualification.www.vqmanager.co.uk. E Portfolio: Quick Start Guide