2015-10-13

MIPS, part 4: Branches and jumps

Conditional branches and unconditional jumps are the last piece of the puzzle. They allow programs to make decisions, skip over code, and call subroutines.

You'll want to read MIPS part 1: Registers and calling convention and MIPS part 2: ALU instructions first.

The branch delay slot (#delay)

An important feature of the MIPS architecture is that the instruction following branches and jumps is executed before the target instruction is. This is known as the branch delay slot, or sometimes delay slot for short.

For example, the following code calls a subroutine with parameter 1 set to 8000: JAL 0x1040CCD0 ; jump to 0x1040CCD0, set $31 to the return address ADDIU $4, $0, 8000 ; set parameter 1 to 8000 (See also the syntax for JAL.)

What happens is as follows:

  • Cycle 1: The first instruction is fetched and decoded as a JAL and its execution starts.
  • Cycle 2: The next instruction is fetched and decoded as an ADDIU and its execution starts. Elsewhere in the pipeline, the Program Counter is updated with 0x1040CCD0, and $31 is updated with the address of the JAL plus 2 instructions, which will be the next instruction to be executed after the subroutine returns.
  • Cycle 3: The instruction at 0x1040CCD0 is fetched and decoded and its execution starts.

For conditional branches, if the instruction in the delay slot modifies a register used by the branch, the branch doesn't care. For example: BNE $13, $0, 0x1037586C ; if ($13 != $0) goto 0x1037586C; LW $13, 0($4) ; loads $13 from memory (See also the syntax for BNE.)

Here, the value of $13 at the start of BNE is used to test for the branch regardless of the write to $13 in its delay slot.

The load delay slot and multiplier unit delay were mandatory in MIPS I through MIPS III, but were taken out in later revisions of the MIPS architecture (e.g. MIPS32). Programs written for MIPS32 processors can omit the NOPs required by earlier processors, but the processor can still run older programs, because running NOPs does not affect anything in the program.

On the other hand, the branch delay slot has been mandatory in all revisions of the MIPS architecture and cannot be removed, because if a later processor did not run them anymore, all older programs that did meaningful work in their branch delay slots would be broken.

Branch Likely instructions (#likely)

MIPS II defined these, so they apply to the Nintendo 64, but not to the Sony PlayStation, which had a MIPS I processor.

Branch Likely instructions, which are a hint to the processor that a branch is far more often (> 7/8) likely to be taken than to be untaken. Some processors don't take the hint, but the instructions still have a side-effect they must honor: if a likely branch is not taken, its delay slot is not executed.

That means a compiler can, among other things, place a memory load instruction in the delay slot of a likely branch and not overrun an array if a loop has ended but the next memory slot is out of bounds.

Conditional branches (#cond)

BEQ: Branch if Equal (#beq)

BEQ Rs, Rt, Imm16 ; if (Rs == Rt) goto PC + (Imm16 + 1) instructions;

If the values of registers Rs and Rt are equal: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

In programs, BEQ $0, $0, Imm16 ; goto PC + (Imm16 + 1) instructions; can be used as an unconditional relative branch.

BEQL is the Branch Likely version of this instruction. If BEQL is not taken, its delay slot is not executed.

BGEZ: Branch if Greater than or Equal to Zero (#bgez)

BGEZ Rs, Imm16 ; if ((int) Rs >= 0) goto PC + (Imm16 + 1) instructions;

If the sign bit of register Rs is unset: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

In programs, BGEZ $0, Imm16 ; goto PC + (Imm16 + 1) instructions; can be used as an unconditional relative branch.

BGEZL is the Branch Likely version of this instruction. If BGEZL is not taken, its delay slot is not executed.

BGEZAL: Branch if Greater than or Equal to Zero And Link (#bgezal)

BGEZAL Rs, Imm16

If the sign bit of register Rs is unset: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

Regardless of the outcome of the branch, $31 gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31 in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.

In programs, BGEZAL $0, Imm16 can be used as a relative subroutine call.

BGEZALL is the Branch Likely version of this instruction. If BGEZALL is not taken, its delay slot is not executed.

BGTZ: Branch if Greater Than Zero (#bgtz)

BGTZ Rs, Imm16 ; if ((int) Rs > 0) goto PC + (Imm16 + 1) instructions;

If the signed value of register Rs is greater than 0: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

BGTZL is the Branch Likely version of this instruction. If BGTZL is not taken, its delay slot is not executed.

BLEZ: Branch if Less than or Equal to Zero (#blez)

BLEZ Rs, Imm16 ; if ((int) Rs <= 0) goto PC + (Imm16 + 1) instructions;

If the signed value of register Rs is less than or equal to 0: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

BLEZL is the Branch Likely version of this instruction. If BLEZL is not taken, its delay slot is not executed.

BLTZ: Branch if Less Than Zero (#bltz)

BLTZ Rs, Imm16 ; if ((int) Rs < 0) goto PC + (Imm16 + 1) instructions;

If the sign bit of register Rs is unset: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

BLTZL is the Branch Likely version of this instruction. If BLTZL is not taken, its delay slot is not executed.

BLTZAL: Branch if Less Than Zero (#bltzal)

BLTZAL Rs, Imm16

If the sign bit of register Rs is unset: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

Regardless of the outcome of the branch, $31 gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31 in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.

In programs, BLTZAL $0, Imm16 ; $31 = PC + 8; NOP ADDIU $31, $31, -8 ; $31 = $31 - 8; can be used as a quick way to get the value of the Program Counter, if needed. It doesn't matter where BLTZAL goes, because it's never taken, but it's good practice to branch to +1.

BLTZALL is the Branch Likely version of this instruction. If BLTZALL is not taken, its delay slot is not executed.

BNE: Branch if Not Equal (#bne)

BNE Rs, Rt, Imm16 ; if (Rs != Rt) goto PC + (Imm16 + 1) instructions;

If the values of registers Rs and Rt are not equal: Imm16, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.

Otherwise, the branch falls through.

BNEL is the Branch Likely version of this instruction. If BNEL is not taken, its delay slot is not executed.

Unconditional jumps (#uncond)

J: Jump to address in 256 MiB segment (#j)

J Imm26

Jumps to an address relative to a 256 MiB segment. Due to instructions being 4-byte-aligned, the bits of Imm26 are shifted to the left by 2 to make the lower 28 bits of the target address. The bits at position 28 and up are taken from the value of the Program Counter as of the J instruction's delay slot. The processor initiates a jump to the target address. The delay slot is executed first.

If J is the last instruction of a 256 MiB segment, it will jump to the designated address in the next segment. Consider this code: 1FFFFFFC: J 0x3FFFFFA 20000000: NOP It will jump to address 0x2FFFFFE8, not 0x1FFFFFE8. This is encountered very infrequently, but must still be accounted for.

JAL: Jump And Link to address in 256 MiB segment (#jal)

JAL Imm26

As J. Additionally, $31 gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31 in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.

In programs running in a single 256 MiB segment, JAL is used for subroutine calls.

JR: Jump to Register (#jr)

JR Rs ; goto Rs;

Initiates a jump to the address contained in register Rs. The delay slot is executed first.

In programs, JR $31 is used to return from subroutine calls, due to both JAL and JALR putting the return address into $31.

Other registers may be used; in that case, JR is probably used to implemented computed or indirect jumps.

JALR: Jump And Link to Register (#jalr)

JALR Rs

As JR. Additionally, $31 gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31 in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.

In programs that don't fit in one 256 MiB segment, JALR is used for subroutine calls.


Next time, we start looking at old subroutines with our compiler knowledge of today, and we thank the compiler implementers for taking the time to implement all of the optimizations we now take for granted... I hope!

1 comment :

  1. Thanks again for the great post.. looking forward to the next post and analyzing the horrors of ancient MIPS compilers.

    ReplyDelete