CS 161Computer Architecture Introduction to Advanced Architecturs Lecture 13 Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan Adapted from notes by Dave Patterson (http.cs.berkeley.edu/~patterson) 1 1999 UCB Stages of Execution in Pipelined MIPS 5 stage instruction pipeline 1) I-fetch: Fetch Instruction, Increment PC 2) Decode: Instruction, Read Registers 3) Execute: Mem-reference: Calculate Address R-format: Perform ALU Operation 4) Memory: Load: Read Data from Data Memory Store: Write Data to Data Memory 5) Write Back: Write Data to Register
2 1999 UCB Pipelined Execution Representation Time IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB Program Flow To simplify pipeline, every instruction takes same number of steps, called stages 3 One clock cycle per stage 1999 UCB Review: Single-cycle Datapath for
MIPS PC Instruction Memory (Imem) Stage 1 Registers Stage 2 ALU Stage 3 Stage 5 Data Memory (Dmem) Stage 4 Use datapath figure to represent pipeline IFtch Dcd Exec Mem WB
4 Reg ALU IM DM Reg 1999 UCB Graphical Pipeline Representation Time (clock cycles) ALU I n IM DM Reg Reg
Load s IM DM Reg Reg t Add r. IM DM Reg Reg Store O IM DM Reg Reg Sub r IM DM Reg Reg d Or
e r (right half highlighted means read, left half write) ALU ALU ALU ALU 5 1999 UCB Required Changes to Datapath Introduce registers to separate 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath. Next PC value is computed in the 3rd step, but we need to bring in next instn in the next cycle Move PCSrc Mux to 1st stage Branch address is computed in 3rd stage. With
pipeline, the PC value has changed! Must carry the PC value along with instn. Width of IF/ID register = (IR)+(PC) = 64 bits. For lw instn, we need write register address at stage 5. But the IR is now occupied by another instn! So, we must carry the IR destination field as we move along the stages. See connection in fig. Length od ID/EX register = (Reg1)+(Reg2)+(offset)+(PC)+ destn = 133 bits 6 1999 UCB Pipelined Datapath (with Pipeline Regs) (6.2)Fetch Decode Execute Memory 0 M u x 1 IF/ID
EX/MEM ID/EX Write Back MEM/WB Add 4 Add Add result PC Address Ins tructio n Shift
left 2 Read register 1 Read data 1 Read register 2 Read data 2 Write register Imem Write data 0 M u x
1 Regs Zero ALU ALU result Address Write data 16 Sign extend 32 Read data 1 M u
x 0 Dmem 5 7 64 bits 133 bits 102 bits 69 bits 1999 UCB Pipelined Control (6.3) Start with single-cycle controller Group control lines by pipeline stage needed Extend pipeline registers with control bits WB
Instruction Control Mem WB EX Mem RegDst ALUop ALUSrc IF/ID 8 ID/EX WB Branch MemRead
MemWrite EX/MEM MemToReg RegWrite MEM/WB 1999 UCB Problems for Pipelining Hazards prevent next instruction from executing during its designated clock cycle, limiting speedup Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) Control hazards: conditional branches & other instructions may stall the pipeline delaying later instructions (must check detergent level before washing next load) Data hazards: Instruction depends on result of prior instruction still in the pipeline (matching socks in later load) 9
1999 UCB MIP S R40 00 pipe line 10 1999 UCB Advanced Architectural Concepts Can we achieve CPI < 1? (i.e., can we have IPC > 1?) State-of-the-Art Microprocessor Superscalar execution or Instruction Level Parallelism (ILP) Deeper Pipeline => Dynamic Branch Prediction => Speculation => Recovery Out-of-order Execution => Instruction Window and Prefetch => Reorder Buffers
VLIW Ex: Intel/HP Titanium 11 1999 UCB Instruction Level Parallelism (ILP) IPC > 1 Time IFtch Dcd Exec Mem WB IFetchDcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB Program Flow ILP = 2 EX: Pentium, SPARC, MIPS 10000, IBM Power PC 12 1999 UCB
HW Schemes: Instruction Parallelism Key idea: Allow instructions behind stall to proceed DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F12,F8,F14 Enables out-of-order execution => out-of-order completion ID stage checks for hazards. If no hazards, issue the instn for execution. Scoreboard dates to CDC 6600 in 1963 13 1999 UCB How ILP Works Issuing multiple instructions per cycle would require fetching multiple instructions from memory per cycle => called Superscalar degree or Issue width To find independent instructions, we must have a big pool of instructions to choose from, called instruction buffer (IB). As IB length increases, complexity of decoder (control) increases that
increases the datapath cycle time Prefetch instructions sequentially by an IFU that operates independently from datapath control. Fetch instruction (PC)+L, where L is the IB size or as directed by the branch predictor. 14 1999 UCB Microarchitecture of an ILP-based CPU (Power PC) 15 1999 UCB 16 1999 UCB Very Large Instruction Word (VLIW) IPC > 1 Time IFtch Dcd Exec Mem WB Exec
IFtch Dcd Exec Mem WB Exec IFtch Dcd Exec Mem WB Exec Program Flow 17 EX: Itanium 1999 UCB Tri Med ia 32-bit 64-bit TM3peripheral memory bus bus 2 Arc hite
data cache ctur 16KB e instruction cache 32 KB multi-port 128 words x 32 bits register file bypass network FU FU FU FU FU PC VLIW instruction decode and launch
Compressed code in the Instruction Cache 18 1999 UCB What is Multiprocessing Parallelism at the Instruction Level is limited because of data dependency => Speed up is limited!! Abundant availability of program level parallelism, like Do I = 1000, Loop Level Parallelism. How about employing multiple processors to execute the loops => Parallel processing or Multiprocessing With billion transistors on a chip, we can put a few CPUs in one chip => Chip multiprocessor 19 1999 UCB Hardware
Multithreading We need to develop a hardware multithreading technique because switching between threads in software is very time-consuming (Why?), so not suitable for main memory (instead of I/O) access, Ex: Multitasking Develop multiple PCs and register sets on the CPU so that thread switching can occur without having to store the register contents in main memory (stack, like it is done for context switching). Several threads reside in the CPU simultaneously, and execution switches between the threads on main memory access. How about both multiprocessors and multithreading on a chip? => Network Processor 20 1999 UCB Hardware Multithreading How can we guarantee no dependencies between instructions in a pipeline?
One way is to interleave execution of instructions from different program threads on same pipeline Interleave 4 threads, T1-T4, on non-bypassed 5-stage pipe T1: LW r1, 0(r2) T2: ADD r7, r1, r4 T3: XORI r5, r4, #12 T4: SW 0(r7), r5 T1: LW r5, 12(r1) 21 1999 UCB Architectural Comparisons (cont.) Fine-Grained Coarse-Grained Multiprocessing Time (processor cycle) Superscalar Simultaneous Multithreading
Este material nonecesariamente refleja los puntos de vista o políticas del Departamento del Trabajo de EU, ni la mención de marcas registradas, productos u organizaciones comerciales implican el respaldo del gobierno de EU. 02/2016:00 Timing. TRAINER NOTE: Communicate importance of...
The Value of Selling Cisco Technical Service Your Business Benefit. Cisco Services provides you with an innovative smart service portfolio, unique customer engagement strategy, and robust partner program to help you sell and deliver services, increase margins and build recurring...
22 INCREASES UNDER 12 PRESIDENTS . Minimum Wage. www.EconEdLink.org . Though it has been adjusted 22 times, the real value of the wage has been declining since 1968. The current $7.25 federal minimum is on par with its real value...
Education Law Defines Professional Misconduct Permitting, aiding or abetting an unlicensed person to provide professional services Practicing the profession while license is suspended, or willfully failing to register or notify department of any change in name or mailing address Committing...
Considerable interest has focused on syndromes of platelet dysfunction potentially unique to the HMII device, including an acquired von willebrand syndrome. Additional study of blood produce use, platelet aggregometry, and thromboelestography may prove useful in perioperative bleeding risk stratification.
Causes of Fatigue in Patients with Heart Failure Donna Mancini, MD Columbia University New York, NY Symptoms of CHF Fatigue Dyspnea Fatigue in HF Impaired Cardiac Output Response with Skeletal Muscle Hypoperfusion Abnormal Vasodilation/Altered Endothelial Fn Skeletal Muscle Dysfunction Malnutrition/Cachexia...