CS61C: Machine Structures

CS61C: Machine Structures

CS 161Computer Architecture Introduction to Advanced Architecturs Lecture 13 Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan Adapted from notes by Dave Patterson (http.cs.berkeley.edu/~patterson) 1 1999 UCB Stages of Execution in Pipelined MIPS 5 stage instruction pipeline 1) I-fetch: Fetch Instruction, Increment PC 2) Decode: Instruction, Read Registers 3) Execute: Mem-reference: Calculate Address R-format: Perform ALU Operation 4) Memory: Load: Read Data from Data Memory Store: Write Data to Data Memory 5) Write Back: Write Data to Register

2 1999 UCB Pipelined Execution Representation Time IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB Program Flow To simplify pipeline, every instruction takes same number of steps, called stages 3 One clock cycle per stage 1999 UCB Review: Single-cycle Datapath for

MIPS PC Instruction Memory (Imem) Stage 1 Registers Stage 2 ALU Stage 3 Stage 5 Data Memory (Dmem) Stage 4 Use datapath figure to represent pipeline IFtch Dcd Exec Mem WB

4 Reg ALU IM DM Reg 1999 UCB Graphical Pipeline Representation Time (clock cycles) ALU I n IM DM Reg Reg

Load s IM DM Reg Reg t Add r. IM DM Reg Reg Store O IM DM Reg Reg Sub r IM DM Reg Reg d Or

e r (right half highlighted means read, left half write) ALU ALU ALU ALU 5 1999 UCB Required Changes to Datapath Introduce registers to separate 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath. Next PC value is computed in the 3rd step, but we need to bring in next instn in the next cycle Move PCSrc Mux to 1st stage Branch address is computed in 3rd stage. With

pipeline, the PC value has changed! Must carry the PC value along with instn. Width of IF/ID register = (IR)+(PC) = 64 bits. For lw instn, we need write register address at stage 5. But the IR is now occupied by another instn! So, we must carry the IR destination field as we move along the stages. See connection in fig. Length od ID/EX register = (Reg1)+(Reg2)+(offset)+(PC)+ destn = 133 bits 6 1999 UCB Pipelined Datapath (with Pipeline Regs) (6.2)Fetch Decode Execute Memory 0 M u x 1 IF/ID

EX/MEM ID/EX Write Back MEM/WB Add 4 Add Add result PC Address Ins tructio n Shift

left 2 Read register 1 Read data 1 Read register 2 Read data 2 Write register Imem Write data 0 M u x

1 Regs Zero ALU ALU result Address Write data 16 Sign extend 32 Read data 1 M u

x 0 Dmem 5 7 64 bits 133 bits 102 bits 69 bits 1999 UCB Pipelined Control (6.3) Start with single-cycle controller Group control lines by pipeline stage needed Extend pipeline registers with control bits WB

Instruction Control Mem WB EX Mem RegDst ALUop ALUSrc IF/ID 8 ID/EX WB Branch MemRead

MemWrite EX/MEM MemToReg RegWrite MEM/WB 1999 UCB Problems for Pipelining Hazards prevent next instruction from executing during its designated clock cycle, limiting speedup Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) Control hazards: conditional branches & other instructions may stall the pipeline delaying later instructions (must check detergent level before washing next load) Data hazards: Instruction depends on result of prior instruction still in the pipeline (matching socks in later load) 9

1999 UCB MIP S R40 00 pipe line 10 1999 UCB Advanced Architectural Concepts Can we achieve CPI < 1? (i.e., can we have IPC > 1?) State-of-the-Art Microprocessor Superscalar execution or Instruction Level Parallelism (ILP) Deeper Pipeline => Dynamic Branch Prediction => Speculation => Recovery Out-of-order Execution => Instruction Window and Prefetch => Reorder Buffers

VLIW Ex: Intel/HP Titanium 11 1999 UCB Instruction Level Parallelism (ILP) IPC > 1 Time IFtch Dcd Exec Mem WB IFetchDcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB IFtch Dcd Exec Mem WB Program Flow ILP = 2 EX: Pentium, SPARC, MIPS 10000, IBM Power PC 12 1999 UCB

HW Schemes: Instruction Parallelism Key idea: Allow instructions behind stall to proceed DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F12,F8,F14 Enables out-of-order execution => out-of-order completion ID stage checks for hazards. If no hazards, issue the instn for execution. Scoreboard dates to CDC 6600 in 1963 13 1999 UCB How ILP Works Issuing multiple instructions per cycle would require fetching multiple instructions from memory per cycle => called Superscalar degree or Issue width To find independent instructions, we must have a big pool of instructions to choose from, called instruction buffer (IB). As IB length increases, complexity of decoder (control) increases that

increases the datapath cycle time Prefetch instructions sequentially by an IFU that operates independently from datapath control. Fetch instruction (PC)+L, where L is the IB size or as directed by the branch predictor. 14 1999 UCB Microarchitecture of an ILP-based CPU (Power PC) 15 1999 UCB 16 1999 UCB Very Large Instruction Word (VLIW) IPC > 1 Time IFtch Dcd Exec Mem WB Exec

IFtch Dcd Exec Mem WB Exec IFtch Dcd Exec Mem WB Exec Program Flow 17 EX: Itanium 1999 UCB Tri Med ia 32-bit 64-bit TM3peripheral memory bus bus 2 Arc hite

data cache ctur 16KB e instruction cache 32 KB multi-port 128 words x 32 bits register file bypass network FU FU FU FU FU PC VLIW instruction decode and launch

Compressed code in the Instruction Cache 18 1999 UCB What is Multiprocessing Parallelism at the Instruction Level is limited because of data dependency => Speed up is limited!! Abundant availability of program level parallelism, like Do I = 1000, Loop Level Parallelism. How about employing multiple processors to execute the loops => Parallel processing or Multiprocessing With billion transistors on a chip, we can put a few CPUs in one chip => Chip multiprocessor 19 1999 UCB Hardware

Multithreading We need to develop a hardware multithreading technique because switching between threads in software is very time-consuming (Why?), so not suitable for main memory (instead of I/O) access, Ex: Multitasking Develop multiple PCs and register sets on the CPU so that thread switching can occur without having to store the register contents in main memory (stack, like it is done for context switching). Several threads reside in the CPU simultaneously, and execution switches between the threads on main memory access. How about both multiprocessors and multithreading on a chip? => Network Processor 20 1999 UCB Hardware Multithreading How can we guarantee no dependencies between instructions in a pipeline?

One way is to interleave execution of instructions from different program threads on same pipeline Interleave 4 threads, T1-T4, on non-bypassed 5-stage pipe T1: LW r1, 0(r2) T2: ADD r7, r1, r4 T3: XORI r5, r4, #12 T4: SW 0(r7), r5 T1: LW r5, 12(r1) 21 1999 UCB Architectural Comparisons (cont.) Fine-Grained Coarse-Grained Multiprocessing Time (processor cycle) Superscalar Simultaneous Multithreading

22 Thread 1 Thread 3 Thread 5 Thread 2 Thread 4 Idle slot 1999 UCB Intel IXP2400 Network Processor 23 XScale core replaces StrongARM

1.4 GHz target in 0.13-micron Nearest neighbor routes added between microengines Hardware to accelerate CRC operations and Random number generation 16 entry CAM

1999 UCB IBM Cell Processor SPU: Synergetic Processor Unit 24 1999 UCB Chip Multiprocessors 25 1999 UCB

Recently Viewed Presentations

  • Title Page - Occupational Safety and Health Administration

    Title Page - Occupational Safety and Health Administration

    Este material nonecesariamente refleja los puntos de vista o políticas del Departamento del Trabajo de EU, ni la mención de marcas registradas, productos u organizaciones comerciales implican el respaldo del gobierno de EU. 02/2016:00 Timing. TRAINER NOTE: Communicate importance of...
  • Creating Service Quotes and Contracts

    Creating Service Quotes and Contracts

    The Value of Selling Cisco Technical Service Your Business Benefit. Cisco Services provides you with an innovative smart service portfolio, unique customer engagement strategy, and robust partner program to help you sell and deliver services, increase margins and build recurring...
  • Internet2

    Internet2

    Internet2 Business Meeting John Kennedy Vice President for Operations [email protected]
  • Title of the Presentation - EconEdLink

    Title of the Presentation - EconEdLink

    22 INCREASES UNDER 12 PRESIDENTS . Minimum Wage. www.EconEdLink.org . Though it has been adjusted 22 times, the real value of the wage has been declining since 1968. The current $7.25 federal minimum is on par with its real value...
  • New York State Social Work Licensure

    New York State Social Work Licensure

    Education Law Defines Professional Misconduct Permitting, aiding or abetting an unlicensed person to provide professional services Practicing the profession while license is suspended, or willfully failing to register or notify department of any change in name or mailing address Committing...
  • Complications of Continuous-Flow Left Ventricular Assist Devices

    Complications of Continuous-Flow Left Ventricular Assist Devices

    Considerable interest has focused on syndromes of platelet dysfunction potentially unique to the HMII device, including an acquired von willebrand syndrome. Additional study of blood produce use, platelet aggregometry, and thromboelestography may prove useful in perioperative bleeding risk stratification.
  • Nutritional Effects on Exercise Performance and the ...

    Nutritional Effects on Exercise Performance and the ...

    Causes of Fatigue in Patients with Heart Failure Donna Mancini, MD Columbia University New York, NY Symptoms of CHF Fatigue Dyspnea Fatigue in HF Impaired Cardiac Output Response with Skeletal Muscle Hypoperfusion Abnormal Vasodilation/Altered Endothelial Fn Skeletal Muscle Dysfunction Malnutrition/Cachexia...
  • Linking Health Outcomes to Other Community Planning Priorities

    Linking Health Outcomes to Other Community Planning Priorities

    Linking Health Outcomes to Other Community Planning Priorities Avril Blamey Senior Public Health Advisor Policy Evaluation and Appraisal team NHS HS