Control Hazard
Outline
  Pipeline, pipelined datapath
  Dependences, hazards
    Structural, data: stalling, forwarding
    Control hazards
  Branch prediction
Control Hazard
  Arises from the pipelining of branches and other instructions that change the PC.
  Also called a branch hazard.
Branch Evaluation
Control Hazard

    beq $1, $3, 28
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    ...
    lw  $4, 50($7)

Time (clock cycles)     1    2    3    4    5    6    7    8    9
beq $1, $3, 28          IF   ID   EX   MEM  WB
and $12, $2, $5              IF   ID   EX   MEM  WB
or  $13, $6, $2                   IF   ID   EX   MEM  WB
add $14, $2, $2                        IF   ID   EX   MEM  WB
Target: lw $4, 50($7)                       IF   ID   EX   MEM  WB

  Cycle 3 (beq in EX): branch taken.
  Cycle 4 (beq in MEM): NPC updates; NOPs are inserted for the instructions fetched after the beq.
  Cycle 5: the branch target enters the pipeline.
  Branch penalty = 3 cycles
Control Hazard
  Branch evaluation occurs in the EX stage.
  The PC updates in the MEM stage.
  The branch target is fetched in the next cycle.
  The delay in determining the proper instruction to fetch is called a control hazard or branch hazard.
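The relation between the stage that updates the PC and the branch penalty can be sketched in a few lines. This is only an illustration: the function name `branch_penalty` and the 1-based cycle numbering are assumptions made here, not part of the slides.

```python
# Five-stage pipeline; the branch is assumed to be fetched in cycle 1,
# so the stage at index k is occupied in cycle k + 1.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def branch_penalty(pc_update_stage: str) -> int:
    """Cycles lost on a taken branch when the PC is updated in the given stage.

    Assumes the target is fetched in the cycle after the PC update.
    """
    update_cycle = STAGES.index(pc_update_stage) + 1  # cycle in which the PC changes
    target_fetch_cycle = update_cycle + 1             # target IF happens one cycle later
    sequential_fetch_cycle = 2                        # cycle in which the next instruction would be fetched
    return target_fetch_cycle - sequential_fetch_cycle

print(branch_penalty("MEM"))  # 3
print(branch_penalty("ID"))   # 1
```

With the PC update in MEM the sketch gives the 3-cycle penalty; moving the update to ID leaves a single lost cycle.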
Reducing Pipeline Branch Penalties
  Freeze the pipeline

Time (clock cycles)     1    2    3    4    5    6    7    8    9
beq                     IF   ID   EX   MEM  WB
nop                          IF   ID   EX   MEM  WB
nop                               IF   ID   EX   MEM  WB
nop                                    IF   ID   EX   MEM  WB
lw                                          IF   ID   EX   MEM  WB
Reducing Branch Delay
  Most branches rely on simple tests (equality or sign).
  Move the branch decision up:
    Compute NPC earlier.
    Evaluate the branch as early as possible.
  [Figure; label: NPC]
  Branch delay: the delay between the fetch of a branch instruction and the fetch of the branch target of a taken branch.
Branch Delay

    ADD R2, R3, R4
    BEQ R2, R4, loop
    SUB R5, R5, R4
    ...

Time (clock cycles)     1    2    3    4    5    6    7    8    9    10   11
ADD R2, R3, R4          IF   ID   EX   MEM  WB
BEQ R2, R4, loop             IF   ID   EX   MEM  WB
SUB R5, R5, R4                    IF   NOP  NOP  NOP  NOP
Branch Target                          IF   ID   EX   MEM  WB

  Cycle 3 (BEQ in ID): PC = PC + Offset
Branch Delay Slot

Time (clock cycles)     1    2    3    4    5    6    7    8    9
Branch                  IF   ID   EX   MEM  WB
Instruction i + 1            IF   ID   EX   MEM  WB       (branch delay slot)
Branch Target                     IF   ID   EX   MEM  WB
Branch Target + 1                      IF   ID   EX   MEM  WB
Branch Hazards
  [Pipeline diagram, clock cycles 1 to 9: Branch, Instruction i + 1, Branch Target, Branch Target + 1; the fetch after the branch is repeated (IF IF ID EX MEM WB).]
  One stall cycle for every branch yields a performance loss of 10% to 30%!
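The 10% to 30% figure can be reproduced with a rough CPI estimate. The sketch below assumes an ideal CPI of 1 and that branches make up 10% to 30% of executed instructions; both numbers are assumptions for illustration, since the slide gives only the resulting loss. The name `branch_stall_slowdown` is hypothetical.

```python
def branch_stall_slowdown(branch_fraction: float, stall_cycles: float,
                          base_cpi: float = 1.0) -> float:
    """Fractional increase in execution time caused by branch stalls."""
    cpi = base_cpi + branch_fraction * stall_cycles  # every branch pays the stall
    return cpi / base_cpi - 1.0

# Assumed branch frequencies, one stall cycle per branch.
for fraction in (0.10, 0.20, 0.30):
    loss = branch_stall_slowdown(fraction, stall_cycles=1)
    print(f"{fraction:.0%} branches: {loss:.0%} slower")
```

The same function shows why a 3-cycle penalty is far more costly: with 30% branches it adds 0.9 cycles per instruction.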
Example Branch Taken
Branch Evaluation in ID
  Steps:
    Decode the instruction.
    Are the beq inputs ready? Decide whether to bypass/forward to the equality unit.
      The forwarding logic has to be modified.
    Complete the comparison.
    If the branch is taken, set the PC to the branch target address.
Branch Evaluation: Forwarding
  New forwarding logic is required.
  Source operands can come from the EX/MEM or MEM/WB pipeline registers.
  If the source operands are not ready, a data hazard occurs and a stall is needed.

    ...
    add $1, $2, $2
    beq $1, $3, 28
    ...
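How many stall cycles a branch resolved in ID needs can be estimated from the kind of instruction that produces its operand and how far ahead of the branch it sits. The sketch below assumes a classic five-stage pipeline in which an ALU result can be forwarded in the cycle after EX and a load result in the cycle after MEM; the function name and the cycle bookkeeping are illustrative assumptions.

```python
def branch_stalls(producer: str, distance: int) -> int:
    """Stall cycles before a branch resolved in ID can read its operand.

    producer: "alu" or "load", the kind of instruction that writes the operand.
    distance: 1 if the producer is immediately before the branch, 2 if one
              instruction sits in between, and so on.
    """
    # With the producer fetched in cycle 1, its result is first usable in
    # cycle 4 (ALU, after EX) or cycle 5 (load, after MEM).
    first_usable_cycle = {"alu": 4, "load": 5}[producer]
    branch_id_cycle = 2 + distance  # cycle of the branch's ID stage with no stalls
    return max(0, first_usable_cycle - branch_id_cycle)

# add $1, $2, $2 immediately followed by beq $1, $3, 28
print(branch_stalls("alu", 1))   # 1
print(branch_stalls("load", 1))  # 2
```

Under these assumptions the add/beq pair on the slide costs one stall cycle, and a load immediately before the branch would cost two.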
Reducing Pipeline Branch Penalties
  Freeze the pipeline
  Static prediction: predict untaken, predict taken
  Delayed branch: fill the branch delay slot
Predict Untaken Scheme

Time (clock cycles)     1    2    3    4    5    6    7    8    9
Untaken Branch          IF   ID   EX   MEM  WB
Instruction i + 1            IF   ID   EX   MEM  WB
Instruction i + 2                 IF   ID   EX   MEM  WB
Instruction i + 3                      IF   ID   EX   MEM  WB

Time (clock cycles)     1    2    3    4    5    6    7    8    9
Taken Branch            IF   ID   EX   MEM  WB
Instruction i + 1            IF   ID   EX   MEM  WB       (squashed; flows through as a NOP)
Branch Target                     IF   ID   EX   MEM  WB
Branch Target + 1                      IF   ID   EX   MEM  WB
Reducing Pipeline Branch Penalties
  Predict Taken
    Great if the prediction is correct.
    Penalty on misprediction only.

    beq $1, $3, 28
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    lw  $4, 50($7)

Time (clock cycles)     1    2    3    4    5    6    7    8    9
beq                     IF   ID   EX   MEM  WB
lw                           IF   ID   EX   MEM  WB
...                               IF   ID   EX   MEM  WB
...                                    IF   ID   EX   MEM  WB
and                                         IF   ID   EX   MEM  WB
Reducing Pipeline Branch Penalties
  Fill the branch delay slot

    sll ...
    srl $14, $2, $2
    beq $1, $3, 28
    and $12, $2, $5
    ...

Time (clock cycles)     1    2    3    4    5    6    7    8    9
beq                     IF   ID   EX   MEM  WB
srl                          IF   ID   EX   MEM  WB
branch target                     IF   ID   EX   MEM  WB
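Whether an instruction from before the branch may be moved into the delay slot depends, at minimum, on the branch not reading the register that instruction writes. The check below is a deliberately simplified sketch; the function name is hypothetical, and a real scheduler would also have to respect dependences with the other instructions it moves past.

```python
def can_fill_from_before(candidate_dest: str, branch_sources: tuple) -> bool:
    """True if the branch does not read the register the candidate writes.

    Only this one dependence is checked; this is not a complete scheduler.
    """
    return candidate_dest not in branch_sources

# srl writes $14; beq reads $1 and $3, so srl may execute after the beq.
print(can_fill_from_before("$14", ("$1", "$3")))  # True

# An instruction writing $1 could not be moved: the beq needs its result.
print(can_fill_from_before("$1", ("$1", "$3")))   # False
```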
Branch Delay Slot Predict taken Predict untaken
Static Branch Prediction
Dynamic Branch Prediction
  Branch Target Buffer
  Single-bit predictors (1-bit bimodal predictor)
    Change the prediction with branch behaviour.
    No. of wrong predictions?

    while (1) {
        ...
        for (i = 0; i < count; i++) {
            ...
        }                               BRANCH at 0x0100
    }

  Branch instruction behaviour:  T T T T N T T T T T T T T T T T T
  Wrong predictions: [marked in the original figure]

  BRANCH TARGET BUFFER
    PC        Prediction   Target
    0x0100    1            0x128
    0x0154    0            0x080
    0x0210    1            0x300
    ...       1
    (PC column: addresses of branches in the program)

  Can we do better?
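The number of wrong predictions for the 1-bit predictor can be counted with a short simulation. The sketch assumes the predictor starts out predicting taken (the buffer entry for 0x0100 holds 1); the function name is illustrative.

```python
def one_bit_mispredictions(outcomes: str, initial_prediction: str = "T") -> int:
    """Count wrong predictions of a 1-bit predictor on a T/N outcome string."""
    prediction = initial_prediction
    wrong = 0
    for actual in outcomes.split():
        if actual != prediction:
            wrong += 1
        prediction = actual  # a 1-bit predictor simply remembers the last outcome
    return wrong

behaviour = "T T T T N T T T T T T T T T T T T"
print(one_bit_mispredictions(behaviour))  # 2
```

A single not-taken outcome costs two mispredictions: one on the N itself and one on the following T, because the bit has flipped.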
Dynamic Branch Prediction
  2-bit predictors
  2-bit bimodal saturating counter
  [Figure: index 0010 into the table of branch prediction bits: 00 11 10 11 11 00 00 00]
Dynamic Branch Prediction
  2-bit predictors
  Branch behaviour:  ... T T T N T T T T T T T T ...
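A saturating counter can be stepped through the same kind of outcome sequence to see why two bits help. The sketch below assumes the counter starts in the strongly-taken state and that the upper half of the states predicts taken; the function name and the `bits` parameter are illustrative.

```python
def saturating_counter_mispredictions(outcomes: str, bits: int = 2, start=None) -> int:
    """Count wrong predictions of an n-bit saturating counter on a T/N string."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # states >= threshold predict taken
    state = max_state if start is None else start
    wrong = 0
    for actual in outcomes.split():
        taken = actual == "T"
        if (state >= threshold) != taken:
            wrong += 1
        # Move one step toward the actual outcome, saturating at both ends.
        state = min(max_state, state + 1) if taken else max(0, state - 1)
    return wrong

sequence = "T T T N T T T T T T T T"
print(saturating_counter_mispredictions(sequence, bits=2))  # 1
print(saturating_counter_mispredictions(sequence, bits=1))  # 2
```

With two bits, the single N only weakens the prediction, so the following T is still predicted correctly.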
Dynamic Branch Prediction n-bit saturating counters
Branch Predictors
  [Figure panels: Without Branch Predictor; With Branch Predictor]
Observations? Dynamic Branch Prediction
Correlating Branch Predictors
  eqntott code:

    if (aa == 2)
        aa = 0;
    if (bb == 2)
        bb = 0;
    if (aa != bb) {

  Two-level predictors
  (1,2) predictor
  [Figure: the outcome of the previous branch (X) together with index 0010 selects an entry in the branch prediction buffer: 00 11 10 11 11 00 00 01]
  Yeh and Patt, Alternative implementations of two-level adaptive branch prediction, ISCA, 1992.
Correlating Branch Predictors
  (2,2) predictor
  [Figure: the outcomes of the previous 2 branches (XX) together with index 0010 select an entry in the branch prediction buffer: 00 11 10 11 11 00 00 01]
  A 4096-bit (2,2) buffer supports how many branch instructions?
  (m,n) BPB bits = $2^m \times n \times$ (no. of prediction entries)
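The question on the slide can be answered by inverting the bit-count relation for an (m,n) predictor. The helper name below is an illustrative assumption.

```python
def prediction_entries(total_bits: int, m: int, n: int) -> int:
    """Entries selected by the branch address in an (m,n) predictor of a given size."""
    bits_per_entry = (2 ** m) * n  # 2^m counters of n bits each per entry
    return total_bits // bits_per_entry

print(prediction_entries(4096, m=2, n=2))  # 512
print(prediction_entries(4096, m=0, n=2))  # 2048 (plain 2-bit predictor, no history)
```

For the same 4096 bits, using two bits of global history therefore quarters the number of entries selected by the branch address compared with a plain 2-bit predictor.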
Correlating Branch Predictors Yeh and Patt, ISCA, 1992.
Local Predictors
  A history of branch behaviour is recorded: the previous outcomes of the same branch (XXXXXXXXXX).
  Branch Prediction Buffer, 1K entries: 00 11 ... 00 01
    One entry for each possible combination of outcomes for the last n occurrences of this branch.
Local Predictors: Example

    while (1) {
        ...
        for (i = 0; i < count; i++) {
            ...
        }                               BRANCH at 0x0100
    }

  Branch instruction behaviour:  T T N T T T T N T T T T N T T
                                 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1
  Local history (column layout lost in extraction):
    0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1
  Prediction:  NT NT T T T T T T T T NT NT
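A local predictor can be sketched as a per-branch history register that indexes a table of 2-bit counters. The simulation below is an illustration under assumed parameters (4 history bits, counters starting weakly not-taken, a single branch); it does not claim to reproduce the prediction row on the slide.

```python
def local_predictor_trace(outcomes, history_bits=4):
    """Return one True/False per outcome: was the prediction correct?"""
    mask = (1 << history_bits) - 1
    counters = [1] * (1 << history_bits)  # 2-bit counters, start weakly not-taken
    history = 0                           # recent outcomes of this branch, newest in bit 0
    correct = []
    for taken in outcomes:
        predict_taken = counters[history] >= 2
        correct.append(predict_taken == taken)
        # Train the counter selected by the current history.
        if taken:
            counters[history] = min(3, counters[history] + 1)
        else:
            counters[history] = max(0, counters[history] - 1)
        history = ((history << 1) | int(taken)) & mask
    return correct

# The slide's behaviour T T N T T repeats with period 5.
pattern = [True, True, False, True, True] * 6
trace = local_predictor_trace(pattern)
print(sum(not ok for ok in trace))  # 7
print(all(trace[9:]))               # True
```

After a short warm-up every 4-outcome history is followed by the same outcome each time, so the periodic not-taken branch is predicted correctly, which a single 2-bit counter cannot do.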
Tournament Predictors
  Use multiple predictors: global, local, or a mix.
  Combine them with a selector:
    A 2-bit saturating counter selects the right predictor for the branch (global vs. local).
  [Figure: the branch PC feeds a local predictor and a global predictor; a predictor selector chooses the prediction.]
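The selector can be sketched as a 2-bit saturating counter that moves toward whichever predictor was right when the two disagree. The state encoding (0 and 1 choose local, 2 and 3 choose global) and the function names are assumptions made for this illustration.

```python
def update_selector(selector: int, local_correct: bool, global_correct: bool) -> int:
    """2-bit saturating selector: 0-1 choose the local predictor, 2-3 the global one."""
    if global_correct and not local_correct:
        return min(3, selector + 1)  # move toward the global predictor
    if local_correct and not global_correct:
        return max(0, selector - 1)  # move toward the local predictor
    return selector                  # both right or both wrong: no change

def choose(selector: int, local_prediction: str, global_prediction: str) -> str:
    """Return the prediction of the currently selected predictor."""
    return global_prediction if selector >= 2 else local_prediction

selector = 1                                    # weakly prefers local
print(choose(selector, "N", "T"))               # N
selector = update_selector(selector, False, True)
print(choose(selector, "N", "T"))               # T
```

Updating only on disagreement keeps the selector stable when both predictors behave the same for a branch.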
Tournament Predictors