CS152: Computer Architecture and Engineering


CS152 Computer Architecture and Engineering
Lecture 14: Pipelining Control Continued; Introduction to Advanced Pipelining
CS152 Lec14.1

Recap: Summary of Pipelining Basics
5 stages:
- Fetch: fetch the instruction from memory
- Decode: get register values and decode control information
- Execute: execute arithmetic operations / calculate addresses
- Memory: do memory ops (load or store)
- Writeback: write results back to registers (i.e., COMMIT)
Pipelines pass control information down the pipe just as data moves down the pipe.
Forwarding/stalls are handled by local control.
Balancing the length of instructions makes pipelining much smoother.
Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency.

Recap: Can pipelining get us into trouble?
Yes: pipeline hazards
- Structural hazards: attempt to use the same resource two different ways at the same time
  - E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
- Data hazards: attempt to use an item before it is ready
  - E.g., one sock of a pair is in the dryer and one is in the washer; you can't fold until the sock from the washer has gone through the dryer
  - An instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: attempt to make a decision before the condition is evaluated
  - E.g., washing football uniforms and needing the proper detergent level; you must see the result after the dryer before putting the next load in
  - Branch instructions
Can always resolve hazards by waiting:
- Pipeline control must detect the hazard
- Take action (or delay action) to resolve hazards
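The recap's point that pipelining improves instruction bandwidth but not latency can be checked with simple cycle counting. This is a minimal sketch that assumes an ideal k-stage pipeline with balanced stages, one cycle per stage, and no hazards unless stall cycles are passed in explicitly. The function names are illustrative and do not come from the lecture. The three-lw example in the next slide (finishing in cycle 7) falls out directly.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Each instruction runs all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int, stall_cycles: int = 0) -> int:
    """The first instruction needs k cycles to fill the pipe; each later one
    completes one cycle after its predecessor. Every stall cycle adds one cycle."""
    return k + (n - 1) + stall_cycles


def latency_cycles(k: int) -> int:
    """Fetch-to-writeback time of a single instruction, unchanged by pipelining."""
    return k


if __name__ == "__main__":
    # Three lw instructions in the 5-stage pipeline.
    print(pipelined_cycles(3, 5), unpipelined_cycles(3, 5), latency_cycles(5))  # -> 7 15 5
    # For long runs the speedup approaches the number of stages.
    print(unpipelined_cycles(1000, 5) / pipelined_cycles(1000, 5))  # close to 5
```

Throughput improves because a new instruction completes every cycle once the pipe is full. Each instruction still spends k cycles in flight.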

Pipelining the Load Instruction
[Figure: three lw instructions start in cycles 1, 2, and 3; each passes through Ifetch, Reg/Dec, Exec, Mem, Wr, so the third one finishes in cycle 7.]
The five independent functional units in the pipeline datapath are:
- Instruction Memory for the Ifetch stage
- Register File's read ports (busA and busB) for the Reg/Dec stage
- ALU for the Exec stage
- Data Memory for the Mem stage
- Register File's write port (busW) for the Wr stage

The Four Stages of R-type
[Figure: R-type occupies Ifetch, Reg/Dec, Exec, Wr in cycles 1 to 4.]
- Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: the ALU operates on the two register operands; update the PC
- Wr: write the ALU output back to the register file

Pipelining the R-type and Load Instruction
[Figure: cycles 1 to 9 with R-type, R-type, Load, R-type, R-type started one per cycle. The Load writes the register file in its 5th stage (cycle 7), and the R-type started right after it writes in its 4th stage, also in cycle 7. "Oops! We have a problem!"]
We have a structural hazard: two instructions try to write to the register file at the same time, and there is only one write port.

Important Observation
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses the Register File's write port during its 5th stage (Ifetch, Reg/Dec, Exec, Mem, Wr)
  - R-type uses the Register File's write port during its 4th stage (Ifetch, Reg/Dec, Exec, Wr)
There are 2 ways to solve this pipeline hazard.
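The write-port conflict can be found mechanically. If instruction i (counting from 0) is fetched in cycle i + 1, it reaches stage s in cycle i + s. The sketch below assumes one instruction is fetched per cycle with no stalls. It flags every cycle in which two instructions want the single write port. It also checks Solution 2, where R-type writes in stage 5. The dictionary and function names are hypothetical.

```python
from collections import Counter

# Stage (1-based) in which each instruction class uses the register-file write port.
WRITE_STAGE_ORIGINAL = {"rtype": 4, "load": 5}
WRITE_STAGE_DELAYED = {"rtype": 5, "load": 5}  # Solution 2: R-type also writes in stage 5


def write_cycles(seq, write_stage):
    """Instruction i (0-based) is fetched in cycle i + 1, so it reaches
    its write stage s in cycle i + s."""
    return [i + write_stage[op] for i, op in enumerate(seq)]


def port_conflicts(seq, write_stage):
    """Cycles in which more than one instruction needs the single write port."""
    counts = Counter(write_cycles(seq, write_stage))
    return sorted(cycle for cycle, users in counts.items() if users > 1)


if __name__ == "__main__":
    seq = ["rtype", "rtype", "load", "rtype", "rtype"]
    print(write_cycles(seq, WRITE_STAGE_ORIGINAL))    # -> [4, 5, 7, 7, 8]
    print(port_conflicts(seq, WRITE_STAGE_ORIGINAL))  # -> [7]
    print(port_conflicts(seq, WRITE_STAGE_DELAYED))   # -> []
```

Making every instruction use the write port in the same stage turns the write cycles into a strictly increasing sequence, so collisions cannot occur.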

Solution 1: Insert Bubble into the Pipeline
[Figure: cycles 1 to 9; a Load mixed with several R-type instructions, with a pipeline bubble inserted so that no two instructions write the register file in the same cycle.]
- Insert a bubble into the pipeline to prevent 2 writes in the same cycle
  - The control logic can be complex
  - We lose an instruction fetch and issue opportunity: no instruction is started in cycle 6!

Solution 2: Delay R-type's Write by One Cycle
- Delay R-type's register write by one cycle:
  - Now R-type instructions also use the Register File's write port at stage 5
  - The Mem stage is a NOOP stage: nothing is being done
[Figure: R-type now passes through Ifetch, Reg/Dec, Exec, Mem, Wr in cycles 1 to 5; with R-type and Load instructions started one per cycle over cycles 1 to 9, no two of them write the register file in the same cycle.]

Modified Control & Datapath
[Figure: pipelined datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Data Mem, M, D) annotated with the register transfers of each stage:]
- All instructions: IR <- Mem[PC]; PC <- PC + 4; then A <- R[rs]; B <- R[rt]
- R-type: S <- A + B; M <- S; R[rd] <- M
- ori: S <- A or ZX; M <- S; R[rt] <- M
- lw: S <- A + SX; M <- Mem[S]; R[rt] <- M
- sw: S <- A + SX; Mem[S] <- B
- beq: if Cond (Equal) PC <- PC + SX
(SX and ZX are the sign-extended and zero-extended immediate.)

The Four Stages of Store
[Figure: Store occupies Ifetch, Reg/Dec, Exec, Mem in cycles 1 to 4; its Wr stage does nothing.]
- Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: calculate the memory address
- Mem: write the data into the Data Memory

The Three Stages of Beq

[Figure: Beq occupies Ifetch, Reg/Dec, Exec in cycles 1 to 3; its Mem and Wr stages do nothing.]
- Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: compare the two register operands, select the correct branch target address, and latch it into the PC

Control Diagram
[Figure: the pipelined datapath with a control block attached to each stage, annotated with these register transfers:]
- Ifetch: IR <- Mem[PC]; PC <- PC + 4
- Reg/Dec: A <- R[rs]; B <- R[rt]
- Exec: S <- A + B; S <- A or ZX; S <- A + SX; if Cond (Equal) PC <- PC + SX
- Mem Access: M <- S; M <- Mem[S]; Mem[S] <- B
- Wr: R[rd] <- S; R[rt] <- S; R[rt] <- M

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Input, Output, Memory, and a Processor made of Control and Datapath.

Recall: Single cycle control
[Figure: single-cycle datapath. The PC addresses the Ideal Instruction Memory; the instruction's Rs, Rt, and Rd fields (5 bits each) drive Ra, Rb, and Rw of a register file of 32 32-bit registers; 32-bit buses A and B feed the ALU and the Ideal Data Memory (Data Address, Data In, Data Out); Control takes the instruction and the datapath Conditions and produces the Control Signals; Next Address logic updates the PC; all state elements are clocked by Clk.]

Data Stationary Control
- The Main Control generates the control signals during Reg/Dec
  - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
  - Control signals for Mem (MemWr, Branch) are used 2 cycles later
  - Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: the Main Control sits after the IF/ID register. The ID/Ex register latches ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr; the Ex/Mem register passes on MemWr, Branch, MemtoReg, RegWr; the Mem/Wr register passes on MemtoReg, RegWr.]

Datapath + Data Stationary Control
[Figure: the pipelined datapath with a control word carried in each pipeline register next to the data. Decode reads op and fun from the IR; the fields op, v, rw, wb, me, ex travel into Exec; v, rw, wb, me into Mem Access (Mem Ctrl); and v, rw, wb into Wr (WB Ctrl).]

Let's Try it Out
10 lw r1, r2(35)
14 addI r2, r2, 3
20 sub r3, r4, r5
24 beq r6, r7, 100
30 ori r8, r9, 17
34 add r10, r11, r12
100 and r13, r14, r15
These addresses are octal.
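The following sketch mimics data-stationary control for the straight-line part of this program. The control word is decoded once, in Reg/Dec. It then rides down the pipeline registers with its instruction, and each later stage reads only its own fields. The `Ctrl` fields and the `DECODE` table are hypothetical stand-ins for the real control signals. Branch handling is deliberately left out. The octal literals match the slide's addresses.

```python
from dataclasses import dataclass
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


@dataclass(frozen=True)
class Ctrl:
    """Control word produced once by the Main Control (hypothetical subset)."""
    alu_src_imm: bool   # Exec field: second ALU operand is the immediate
    mem_write: bool     # Mem field
    reg_write: bool     # Wr field
    mem_to_reg: bool    # Wr field


# Hypothetical decode table for the opcodes used below.
DECODE = {
    "lw":   Ctrl(alu_src_imm=True,  mem_write=False, reg_write=True,  mem_to_reg=True),
    "addI": Ctrl(alu_src_imm=True,  mem_write=False, reg_write=True,  mem_to_reg=False),
    "sub":  Ctrl(alu_src_imm=False, mem_write=False, reg_write=True,  mem_to_reg=False),
    "beq":  Ctrl(alu_src_imm=False, mem_write=False, reg_write=False, mem_to_reg=False),
    "ori":  Ctrl(alu_src_imm=True,  mem_write=False, reg_write=True,  mem_to_reg=False),
}

# Straight-line prefix of the test program (octal addresses, as on the slide).
PROGRAM = [(0o10, "lw"), (0o14, "addI"), (0o20, "sub"), (0o24, "beq"), (0o30, "ori")]


@dataclass
class Slot:
    pc: int
    op: str
    ctrl: Optional[Ctrl] = None


def run(program, cycles):
    pipe = [None] * len(STAGES)  # pipe[k] is the instruction in STAGES[k]
    history = []
    for c in range(cycles):
        fetched = Slot(*program[c]) if c < len(program) else None
        pipe = [fetched] + pipe[:-1]            # every pipeline register advances
        if pipe[1] is not None:
            pipe[1].ctrl = DECODE[pipe[1].op]   # Main Control runs only in Reg/Dec
        ex, mem, wb = pipe[2], pipe[3], pipe[4]
        history.append({
            "stages": {s: (i.pc if i else None) for s, i in zip(STAGES, pipe)},
            # Later stages consume only their own fields of the stored control word.
            "ex_alu_src_imm": ex.ctrl.alu_src_imm if ex else None,
            "mem_write": mem.ctrl.mem_write if mem else None,
            "wb_reg_write": wb.ctrl.reg_write if wb else None,
        })
    return history


if __name__ == "__main__":
    for n, row in enumerate(run(PROGRAM, 5), start=1):
        print(n, {s: (oct(pc) if pc is not None else None) for s, pc in row["stages"].items()})
```

In cycle 5 the pipeline holds 30, 24, 20, 14, 10 from IF to WB. This matches the "Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10" step of the walkthrough.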

Stepping through the program (each step shows the same datapath with data-stationary control; the stage labels give the octal address of the instruction in each stage):
- Start: Fetch 10 (lw).
- Fetch 14, Decode 10: lw reads r2.
- Fetch 20, Decode 14, Exec 10: lw computes r2 + 35; addI reads r2.
- Fetch 24, Decode 20, Exec 14, Mem 10: lw reads M[r2+35]; addI computes r2 + 3; sub reads r4 and r5.
- Fetch 30, Decode 24, Exec 20, Mem 14, WB 10: lw writes r1 = M[r2+35]; addI carries r2 + 3 through Mem; sub computes r4 - r5; beq reads r6 and r7. Note the delayed branch: ori is always executed after beq.
- Fetch 34, Decode 30, Exec 24, Mem 20, WB 14: addI writes r2 = r2 + 3; sub carries r4 - r5 through Mem; beq finds r6 - r7 = 0, so it takes the branch; ori reads r9 and the immediate 17.
- Fetch 100, Decode 34, Exec 30, Mem 24, WB 20: sub writes r3 = r4 - r5; ori computes r9 | 17; add r10, r11, r12 has been fetched and is being decoded. Do we have a problem here? Oops, we should have only one delayed instruction.
- Fetch 104, Decode 100, Exec 34, Mem 30, WB 24: and reads r14 and r15; add computes r11 + r12; ori carries r9 | 17 through Mem. Squash the extra instruction.
- Fetch 110, Decode 104, Exec 100, Mem 34, WB 30: ori writes r8 = r9 | 17; the squashed add reaches Mem; and computes r14 & r15.
- Fetch 114, Decode 110, Exec 104, Mem 100, WB 34: add reaches Wr with NO writeback and NO overflow exception. Squash the extra instruction in the branch shadow!

Pipelined Processor
[Figure: pipelined datapath with per-stage instruction registers (IR, IRex, IRmem, IRwb), a Valid bit, and a control block at each stage (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl); Stall signals flow back toward the PC and Inst. Mem, and Bubbles flow forward.]
- Separate control at each stage
- Stalls propagate backwards to freeze previous stages
- Bubbles in the pipeline are introduced by placing no-ops into the local stage and stalling previous stages

Pipeline Hazards Again
[Figure: timing diagrams illustrating a structural hazard, a control hazard (Jump), a RAW (read after write) data hazard, a WAW (write after write) data hazard, and a WAR (write after read) data hazard.]

Recap: Data Hazards
- Avoid some by design
  - Eliminate WAR by always fetching operands early (DCD) in the pipe
  - Eliminate WAW by doing all WBs in order (last stage, static)
- Detect and resolve the remaining ones
  - Stall or forward (if possible)
[Figure: RAW, WAW, and WAR data hazard timing diagrams.]

Hazard Detection

Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
[Figure: instructions enter as New Inst and move through the execution window; instruction i is entering while instruction j is still pending. Only pending instructions can cause exceptions.]
- A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j)
  - Keep a record of pending writes (for instructions in the pipe) and compare it with the operand registers of the current instruction
  - When an instruction issues, reserve its result register
  - When an operation completes, remove its write reservation
- A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j)
- A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j)

Record of Pending Writes In Pipeline Registers
[Figure: pipelined datapath (IAU/npc, I mem, Regs, A, B, im, alu, S, D mem, m) in which each pipeline register holds op and rw for its instruction; the current operand registers rs and rt are compared with those pending writes.]
hazard <= ((rs == rw_ex) & regW_ex) OR ((rs == rw_mem) & regW_mem) OR ((rs == rw_wb) & regW_wb) OR
          ((rt == rw_ex) & regW_ex) OR ((rt == rw_mem) & regW_mem) OR ((rt == rw_wb) & regW_wb)

Resolve RAW by forwarding (or bypassing)
[Figure: the same datapath with a valid bit v in each pipeline register and a forward mux in front of the A and B operand latches.]
- Detect the nearest valid write to an operand register and forward it into the op latches, bypassing the remainder of the pipe
- Increase muxes to add paths from the pipeline registers
- Data Forwarding = Data Bypassing

What about memory operations?
- If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations!
- What about data dependence on loads?
    R1 <- R4 + R5
    R2 <- Mem[R2 + I]
    R3 <- R2 + R1
  (Delayed Loads)
- Can recognize this in the decode stage and introduce a bubble while stalling the fetch stage
- Tricky situation:
    R1 <- Mem[R2 + I]
    Mem[R3 + 34] <- R1
  Handle with a bypass in the memory stage!
[Figure: op/Rd/Ra/Rb fields in successive stages feeding A, B, the ALU result R, data memory D, and T; results go to the reg file.]
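The set definitions, the six-comparator hazard equation, "nearest valid write" forwarding, and the load-use bubble can all be sketched together. This code assumes pipeline registers that record only rw, a write enable, and whether the instruction is a load. It also follows the MIPS convention that r0 is hardwired to zero, so r0 never creates a dependence. `Pending` and the function names are hypothetical. A forwarding source is named after the stage that holds the producing instruction.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Pending:
    """Pending-write record in a pipeline register (rw plus write enable)."""
    rw: int
    reg_write: bool
    is_load: bool = False


def hazard_kinds(reads_i: Set[int], writes_i: Set[int],
                 reads_j: Set[int], writes_j: Set[int]) -> Set[str]:
    """i is about to issue; j is an earlier instruction still in the pipe."""
    kinds = set()
    if reads_i & writes_j:
        kinds.add("RAW")
    if writes_i & writes_j:
        kinds.add("WAW")
    if writes_i & reads_j:
        kinds.add("WAR")
    return kinds


def _writes(stage: Optional[Pending], reg: int) -> bool:
    # r0 is hardwired to zero in MIPS, so a write to it is never a dependence.
    return stage is not None and stage.reg_write and reg != 0 and stage.rw == reg


def raw_hazard(rs: int, rt: int, ex, mem, wb) -> bool:
    """OR of the six comparators ((rs or rt) == rw_stage) & regW_stage."""
    return any(_writes(stage, r) for r in (rs, rt) for stage in (ex, mem, wb))


def forward_source(reg: int, ex, mem, wb) -> str:
    """Nearest (youngest) valid pending write wins; otherwise read the register file."""
    for name, stage in (("EX", ex), ("MEM", mem), ("WB", wb)):
        if _writes(stage, reg):
            return name
    return "REGFILE"


def load_use_stall(rs: int, rt: int, ex) -> bool:
    """A load still in EX has not read memory yet, so its value cannot be forwarded in time."""
    return ex is not None and ex.is_load and (_writes(ex, rs) or _writes(ex, rt))


if __name__ == "__main__":
    add = Pending(rw=1, reg_write=True)                 # R1 <- R4 + R5
    load = Pending(rw=2, reg_write=True, is_load=True)  # R2 <- Mem[R2 + I]
    # R3 <- R2 + R1 in decode, with the load in EX and the add in MEM:
    print(load_use_stall(2, 1, load), forward_source(1, load, add, None))  # -> True MEM
    # One bubble later: bubble in EX, load in MEM, add in WB.
    print(forward_source(2, None, load, add), forward_source(1, None, load, add))  # -> MEM WB
```

The first check reproduces the slide's "recognize this in the decode stage and introduce a bubble" case. After the bubble, both operands can be bypassed.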

Compiler Avoiding Load Stalls
[Figure: bar chart of the percentage of loads stalling the pipeline, scheduled vs. unscheduled code:]
- gcc: 31% scheduled, 54% unscheduled
- spice: 14% scheduled, 42% unscheduled
- tex: 25% scheduled, 65% unscheduled

What about Interrupts, Traps, Faults?
- External interrupts:
  - Allow the pipeline to drain; fill with NOPs
  - Load the PC with the interrupt address
- Faults (within an instruction, restartable):
  - Force a trap instruction into IF
  - Disable writes until the trap hits WB
  - Must save multiple PCs or PC + state
Recall: Precise Exceptions
The state of the machine is preserved as if the program executed up to the offending instruction:
- All previous instructions completed
- The offending instruction and all following instructions act as if they have not even started
- The same system code will work on different implementations

Exception/Interrupts: Implementation questions
5 instructions, executing in 5 different pipeline stages!
- Who caused the interrupt?
  Stage  Problem interrupts occurring
  IF     Page fault on instruction fetch; misaligned memory access; memory-protection violation
  ID     Undefined or illegal opcode
  EX     Arithmetic exception
  MEM    Page fault on data fetch; misaligned memory access; memory-protection violation; memory error
- How do we stop the pipeline? How do we restart it?
- Do we interrupt immediately or wait?
- How do we sort all of this out to maintain preciseness?

Exception Handling
[Figure: pipelined datapath running lw $2,20($5), with an exception check at each stage: detect bad instruction address (fetch), detect bad instruction (decode), detect overflow (ALU), detect bad data address (data memory). The Excp signals travel with the instruction to the point where the exception is allowed to take effect.]

Another look at the exception problem
[Figure: instructions overlapping in the pipeline over time (IFetch, Dcd, Exec, Mem, WB), with a data TLB fault, a bad instruction, an instruction TLB fault, and an overflow detected in different stages, and therefore out of program order in time.]
- Use the pipeline to sort this out!
  - Pass exception status along with the instruction
  - Keep track of PCs for every instruction in the pipeline
  - Don't act on an exception until it reaches the WB stage
- Handle interrupts through a "faulting no-op" in the IF stage
- When an instruction reaches the end of the MEM stage:
  - Save PC -> EPC, interrupt vector address -> PC
  - Turn all instructions in earlier stages into no-ops!
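The "pass exception status along with the instruction" rule can be sketched as a small simulation. This sketch assumes each hypothetical `Inst` is told in advance which stage (if any) it faults in. A fault is recorded when it occurs but handled only when its instruction reaches the end of MEM. At that point the faulting PC goes to EPC and the younger instructions in earlier stages are flushed. The instruction in WB is older and is allowed to complete. Memory-write suppression for faulting instructions is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM = STAGES.index("MEM")


@dataclass
class Inst:
    pc: int
    fault_stage: Optional[str] = None  # hypothetical: stage in which this instruction faults
    cause: Optional[str] = None
    exc: Optional[str] = None          # exception status carried down the pipe


def run(program: List[Inst]):
    """Return (EPC, cause, flushed PCs, detection log), or None if nothing faults."""
    pipe: List[Optional[Inst]] = [None] * len(STAGES)
    detected: List[Tuple[int, int, str]] = []  # (cycle, pc, cause) in time order
    for cycle in range(1, len(program) + len(STAGES) + 1):
        fetched = program[cycle - 1] if cycle <= len(program) else None
        pipe = [fetched] + pipe[:-1]
        # Record each fault where it occurs, but do not act on it yet.
        for stage, inst in zip(STAGES, pipe):
            if inst is not None and inst.exc is None and inst.fault_stage == stage:
                inst.exc = inst.cause
                detected.append((cycle, inst.pc, inst.cause))
        # Act only when a faulting instruction reaches the end of MEM.
        at_mem = pipe[MEM]
        if at_mem is not None and at_mem.exc is not None:
            flushed = [i.pc for i in pipe[:MEM] if i is not None]  # turned into no-ops
            return at_mem.pc, at_mem.exc, flushed, detected
    return None


if __name__ == "__main__":
    program = [
        Inst(0x00, "MEM", "data TLB miss"),
        Inst(0x04, "ID", "illegal opcode"),
        Inst(0x08, "IF", "instruction TLB miss"),
        Inst(0x0C, "EX", "overflow"),
    ]
    epc, cause, flushed, log = run(program)
    print(hex(epc), cause, [hex(pc) for pc in flushed])  # -> 0x0 data TLB miss ['0xc', '0x8', '0x4']
    print(log)
```

In this example the faults of 0x08 and 0x04 are detected first in time, in cycle 3. The data TLB miss of the oldest instruction is the one handled, which keeps the exception precise.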

Resolution: Freeze above & Bubble Below
[Figure: pipelined datapath (IAU/npc, I mem, Regs, A, B, im, alu, S, D mem, m) with op and rw in each pipeline register; the stages above the stall point (PC and fetch) are frozen, and a bubble is inserted in the stage below.]
- Flush accomplished by setting the invalid bit in the pipeline

FYI: MIPS R3000 clocking discipline
- 2-phase non-overlapping clocks (phi1, phi2)
- A pipeline stage is two (level-sensitive) latches, clocked by phi1 and phi2
[Figure: the phi1/phi2 latch pair shown next to an edge-triggered register.]
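The "freeze above, bubble below" rule and invalid-bit flushing can be sketched as one pipeline step over the PC and four pipeline registers. This is a minimal sketch under stated assumptions. The state tuple layout, `Slot`, `BUBBLE`, `step`, and `flush` are hypothetical names. The stall decision is supplied by the caller, for example from a load-use check in decode. Fetch is modeled as a lookup by PC.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    name: str
    valid: bool = True  # cleared to turn the slot into a bubble / flushed no-op


BUBBLE = Slot("nop", valid=False)

# (PC, IF/ID, ID/EX, EX/MEM, MEM/WB): each register holds the instruction entering the next stage.
State = Tuple[int, Optional[Slot], Optional[Slot], Optional[Slot], Optional[Slot]]


def step(state: State, stall: bool, fetch: Callable[[int], Slot]) -> State:
    pc, if_id, id_ex, ex_mem, _retiring = state  # MEM/WB content writes back this cycle
    if stall:
        # Freeze above: PC and IF/ID hold their values. Bubble below: ID/EX gets a no-op.
        return pc, if_id, BUBBLE, id_ex, ex_mem
    return pc + 4, fetch(pc), if_id, id_ex, ex_mem


def flush(slot: Optional[Slot]) -> Optional[Slot]:
    """Flush by clearing the valid bit; the slot keeps moving but has no effect."""
    return None if slot is None else replace(slot, valid=False)


if __name__ == "__main__":
    code = {0: Slot("R2 <- Mem[R2 + I]"), 4: Slot("R3 <- R2 + R1"), 8: Slot("next")}

    def fetch(pc: int) -> Slot:
        return code.get(pc, BUBBLE)

    s = (8, code[4], code[0], None, None)  # load about to execute, dependent add in decode
    s = step(s, stall=True, fetch=fetch)   # load moves on, add waits, bubble follows the load
    print(s)
    s = step(s, stall=False, fetch=fetch)  # pipeline resumes
    print(s)
```

Only the registers above the stalled instruction hold their values. Everything below keeps draining, which is why the bubble appears directly behind the load.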

MIPS R3000 Instruction Pipeline
Inst Fetch (TLB, I-Cache) | Decode, Reg. Read (RF) | ALU / E.A. | Memory Operation (E.A. TLB, D-Cache) | Write Reg (WB)
[Figure: resource usage over time, in half-cycle phases: TLB and I-cache, RF, ALU, TLB and D-cache, then the RF write (WB).]
Write in phase 1, read in phase 2 => eliminates bypass from WB

Recall: Data Hazard on r1
[Figure: add r1,r2,r3 followed by sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11 in instruction order, each passing through IF, ID/RF, EX, MEM, WB (Im, Reg, ALU, Dm, Reg); the later instructions read r1 before add has written it back.]
With the MIPS R3000 pipeline, there is no need to forward from the WB stage.

MIPS R3000 Multicycle Operations
Ex: Multiply, Divide, Cache Miss
- Use the control word of the local stage to step through the multicycle operation
- Stall all stages above the multicycle operation in the pipeline
- Drain (bubble) the stages below it
- Alternatively, launch multiply/divide to an autonomous unit, and only stall the pipe if an instruction attempts to get the result before it is ready
  - This means stalling mflo/mfhi in the decode stage if the multiply/divide is still executing
[Figure: mul Rd Ra Rb occupying the execute stage (registers A, B, R, T), with the result going to the reg file.]

Is CPI = 1 for our pipeline?
- Remember that CPI is an average number of cycles per instruction
[Figure: five instructions overlapped in IFetch, Dcd, Exec, Mem, WB, one starting every cycle.]
- CPI here is 1, since the average throughput is 1 instruction every cycle.
- What if there are stalls or multi-cycle execution?
- Usually CPI > 1. How close can we get to 1??

Recall: Compute CPI?
- Start with the base CPI
- Add stalls:
  CPI = CPI_base + CPI_stall
  CPI_stall = STALL_type1 × freq_type1 + STALL_type2 × freq_type2 + ...
- Suppose:
  - CPI_base = 1
  - freq_branch = 20%, freq_load = 30%
  - Suppose branches always cause a 1-cycle stall
  - Loads cause a 100-cycle stall 1% of the time
- Then: CPI = 1 + (1 × 0.20) + (100 × 0.30 × 0.01) = 1.5
- Multicycle? Could treat as: CPI_stall = (CYCLES - CPI_base) × freq_inst

Case Study: MIPS R4000 (200 MHz)
8-stage pipeline:
- IF: first half of fetching the instruction; PC selection happens here, as well as initiation of the instruction cache access
- IS: second half of the access to the instruction cache
- RF: instruction decode and register fetch, hazard checking, and also instruction cache hit detection
- EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation
- DF: data fetch, first half of the access to the data cache
- DS: second half of the access to the data cache
- TC: tag check, to determine whether the data cache access hit
- WB: write back for loads and register-register operations
8 stages: What is the impact on load delay? Branch delay? Why?

Case Study: MIPS R4000
[Figure: successive instructions staggered by one cycle through IF IS RF EX DF DS TC WB, drawn once for a branch and once for a load.]
- THREE-cycle branch latency (conditions evaluated during the EX phase)
  - Delay slot plus two stalls
  - Branch likely cancels the delay slot if not taken
- TWO-cycle load latency

MIPS R4000 Floating Point
- FP Adder, FP Multiplier, FP Divider
- The last step of the FP Multiplier/Divider uses the FP Adder HW
- 8 kinds of stages in the FP units:
  Stage  Functional unit  Description
  A      FP adder         Mantissa ADD stage
  D      FP divider       Divide pipeline stage
  E      FP multiplier    Exception test stage
  M      FP multiplier    First stage of multiplier
  N      FP multiplier    Second stage of multiplier
  R      FP adder         Rounding stage
  S      FP adder         Operand shift stage
  U                       Unpack FP numbers

MIPS FP Pipe Stages
  FP Instr        Stages in cycles 1, 2, 3, ...
  Add, Subtract   U, S+A, A+R, R+S
  Multiply        U, E+M, M, M, M, N, N+A, R
  Divide          U, A, R, D^28, D+A, D+R, D+R, D+A, D+R, A, R
  Square root     U, E, (A+R)^108, A, R
  Negate          U, S
  Absolute value  U, S
  FP compare      U, A, R
(D^28 means 28 consecutive D stages; (A+R)^108 means 108 consecutive A+R stages.)

R4000 Performance
- Not the ideal CPI of 1:
  - Load stalls (1 or 2 clock cycles)
  - Branch stalls (2 cycles + unfilled slots)
  - FP result stalls: RAW data hazard (latency)
  - FP structural stalls: not enough FP hardware (parallelism)
[Figure: stacked bar chart of CPI (axis 0 to 4.5) for eqntott, espresso, gcc, li, doduc, nasa7, ora, spice2g6, su2cor, and tomcatv, split into base, load stalls, branch stalls, FP result stalls, and FP structural stalls.]

Summary
- Hazards limit performance
  - Structural: need more HW resources
  - Data: need forwarding, compiler scheduling
  - Control: early evaluation & PC, delayed branch, prediction
- Data hazards must be handled carefully:
  - RAW data hazards are handled by forwarding
  - WAW and WAR hazards don't exist in the 5-stage pipeline
- The MIPS I instruction set architecture made the pipeline visible (delayed branch, delayed load)
- Exceptions in the 5-stage pipeline are recorded when they occur, but acted on only at the WB (end of MEM) stage
  - Must flush all instructions in earlier stages (those that follow in program order)
- More performance from deeper pipelines, parallelism
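The CPI bookkeeping from "Recall: Compute CPI?" can be written as a short function and checked against that slide's numbers. In this sketch each stall source is a hypothetical (stall cycles, frequency per instruction) pair. For loads, the frequency folds in the 1% miss rate (0.30 × 0.01). `multicycle_stall` encodes the slide's CPI_stall = (CYCLES - CPI_base) × freq_inst rule.

```python
def cpi(base: float, stalls) -> float:
    """CPI = CPI_base + sum(stall cycles x frequency per instruction)."""
    return base + sum(cycles * freq for cycles, freq in stalls)


def multicycle_stall(cycles: float, base: float, freq: float) -> float:
    """CPI_stall contribution of an instruction class that takes `cycles` cycles."""
    return (cycles - base) * freq


if __name__ == "__main__":
    # Branches: 20% of instructions, 1-cycle stall each.
    # Loads: 30% of instructions, 100-cycle stall 1% of the time.
    example = cpi(1.0, [(1, 0.20), (100, 0.30 * 0.01)])
    print(round(example, 3))  # -> 1.5
```

The rounding only hides floating-point noise. The exact value is 1 + 0.2 + 0.3 = 1.5, as on the slide.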
