Conditional branches and unconditional jumps are the last piece of the puzzle. They allow programs to make decisions, skip over code, and call subroutines.
You'll want to read MIPS part 1: Registers and calling convention and MIPS part 2: ALU instructions first.
The branch delay slot (#delay)
An important feature of the MIPS architecture is that the instruction following branches and jumps is executed before the target instruction is. This is known as the branch delay slot, or sometimes delay slot for short.
For example, the following code calls a subroutine with parameter 1 set to 8000:
JAL 0x1040CCD0 ; jump to 0x1040CCD0, set $31 to the return address
ADDIU $4, $0, 8000 ; set parameter 1 to 8000
(See also the syntax for JAL
.)
What happens is as follows:
- Cycle 1: The first instruction is fetched and decoded as a
JAL
and its execution starts. - Cycle 2: The next instruction is fetched and decoded as an
ADDIU
and its execution starts. Elsewhere in the pipeline, the Program Counter is updated with0x1040CCD0
, and$31
is updated with the address of theJAL
plus 2 instructions, which will be the next instruction to be executed after the subroutine returns. - Cycle 3: The instruction at
0x1040CCD0
is fetched and decoded and its execution starts.
For conditional branches, if the instruction in the delay slot modifies a register used by the branch, the branch doesn't care. For example:
BNE $13, $0, 0x1037586C ; if ($13 != $0) goto 0x1037586C;
LW $13, 0($4) ; loads $13 from memory
(See also the syntax for BNE
.)
Here, the value of $13
at the start of BNE
is used to test for the branch regardless of the write to $13
in its delay slot.
The load delay slot and multiplier unit delay were mandatory in MIPS I through MIPS III, but were taken out in later revisions of the MIPS architecture (e.g. MIPS32). Programs written for MIPS32 processors can omit the NOP
s required by earlier processors, but the processor can still run older programs, because running NOP
s does not affect anything in the program.
On the other hand, the branch delay slot has been mandatory in all revisions of the MIPS architecture and cannot be removed, because if a later processor did not run them anymore, all older programs that did meaningful work in their branch delay slots would be broken.
Branch Likely instructions (#likely)
MIPS II defined these, so they apply to the Nintendo 64, but not to the Sony PlayStation, which had a MIPS I processor.
Branch Likely instructions, which are a hint to the processor that a branch is far more often (> 7/8) likely to be taken than to be untaken. Some processors don't take the hint, but the instructions still have a side-effect they must honor: if a likely branch is not taken, its delay slot is not executed.
That means a compiler can, among other things, place a memory load instruction in the delay slot of a likely branch and not overrun an array if a loop has ended but the next memory slot is out of bounds.
Conditional branches (#cond)
BEQ
: Branch if Equal (#beq)
BEQ Rs, Rt, Imm16 ; if (Rs == Rt) goto PC + (Imm16 + 1) instructions;
If the values of registers Rs
and Rt
are equal: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
In programs,
BEQ $0, $0, Imm16 ; goto PC + (Imm16 + 1) instructions;
can be used as an unconditional relative branch.
BEQL
is the Branch Likely version of this instruction. If BEQL
is not taken, its delay slot is not executed.
BGEZ
: Branch if Greater than or Equal to Zero (#bgez)
BGEZ Rs, Imm16 ; if ((int) Rs >= 0) goto PC + (Imm16 + 1) instructions;
If the sign bit of register Rs
is unset: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
In programs,
BGEZ $0, Imm16 ; goto PC + (Imm16 + 1) instructions;
can be used as an unconditional relative branch.
BGEZL
is the Branch Likely version of this instruction. If BGEZL
is not taken, its delay slot is not executed.
BGEZAL
: Branch if Greater than or Equal to Zero And Link (#bgezal)
BGEZAL Rs, Imm16
If the sign bit of register Rs
is unset: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
Regardless of the outcome of the branch, $31
gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31
in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.
In programs,
BGEZAL $0, Imm16
can be used as a relative subroutine call.
BGEZALL
is the Branch Likely version of this instruction. If BGEZALL
is not taken, its delay slot is not executed.
BGTZ
: Branch if Greater Than Zero (#bgtz)
BGTZ Rs, Imm16 ; if ((int) Rs > 0) goto PC + (Imm16 + 1) instructions;
If the signed value of register Rs
is greater than 0: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
BGTZL
is the Branch Likely version of this instruction. If BGTZL
is not taken, its delay slot is not executed.
BLEZ
: Branch if Less than or Equal to Zero (#blez)
BLEZ Rs, Imm16 ; if ((int) Rs <= 0) goto PC + (Imm16 + 1) instructions;
If the signed value of register Rs
is less than or equal to 0: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
BLEZL
is the Branch Likely version of this instruction. If BLEZL
is not taken, its delay slot is not executed.
BLTZ
: Branch if Less Than Zero (#bltz)
BLTZ Rs, Imm16 ; if ((int) Rs < 0) goto PC + (Imm16 + 1) instructions;
If the sign bit of register Rs
is unset: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
BLTZL
is the Branch Likely version of this instruction. If BLTZL
is not taken, its delay slot is not executed.
BLTZAL
: Branch if Less Than Zero (#bltzal)
BLTZAL Rs, Imm16
If the sign bit of register Rs
is unset: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
Regardless of the outcome of the branch, $31
gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31
in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.
In programs,
BLTZAL $0, Imm16 ; $31 = PC + 8;
NOP
ADDIU $31, $31, -8 ; $31 = $31 - 8;
can be used as a quick way to get the value of the Program Counter, if needed. It doesn't matter where BLTZAL
goes, because it's never taken, but it's good practice to branch to +1.
BLTZALL
is the Branch Likely version of this instruction. If BLTZALL
is not taken, its delay slot is not executed.
BNE
: Branch if Not Equal (#bne)
BNE Rs, Rt, Imm16 ; if (Rs != Rt) goto PC + (Imm16 + 1) instructions;
If the values of registers Rs
and Rt
are not equal: Imm16
, interpreted as a signed offset, is a number of instructions added to the address of the delay slot to make the address of the branch, and the processor initiates a branch there. The delay slot is executed first.
Otherwise, the branch falls through.
BNEL
is the Branch Likely version of this instruction. If BNEL
is not taken, its delay slot is not executed.
Unconditional jumps (#uncond)
J
: Jump to address in 256 MiB segment (#j)
J Imm26
Jumps to an address relative to a 256 MiB segment. Due to instructions being 4-byte-aligned, the bits of Imm26 are shifted to the left by 2 to make the lower 28 bits of the target address. The bits at position 28 and up are taken from the value of the Program Counter as of the J
instruction's delay slot. The processor initiates a jump to the target address. The delay slot is executed first.
If J
is the last instruction of a 256 MiB segment, it will jump to the designated address in the next segment. Consider this code:
1FFFFFFC: J 0x3FFFFFA
20000000: NOP
It will jump to address 0x2FFFFFE8
, not 0x1FFFFFE8
. This is encountered very infrequently, but must still be accounted for.
JAL
: Jump And Link to address in 256 MiB segment (#jal)
JAL Imm26
As J
. Additionally, $31
gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31
in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.
In programs running in a single 256 MiB segment, JAL
is used for subroutine calls.
JR
: Jump to Register (#jr)
JR Rs ; goto Rs;
Initiates a jump to the address contained in register Rs
. The delay slot is executed first.
In programs,
JR $31
is used to return from subroutine calls, due to both JAL
and JALR
putting the return address into $31
.
Other registers may be used; in that case, JR
is probably used to implemented computed or indirect jumps.
JALR
: Jump And Link to Register (#jalr)
JALR Rs
As JR
. Additionally, $31
gets the address of the instruction after its delay slot. Because the branch may not have fully executed yet, reading $31
in the delay slot is undefined behavior: it may yield its previous value, or the value it will have after the target instruction executes, or an intermediate value.
In programs that don't fit in one 256 MiB segment, JALR
is used for subroutine calls.
Next time, we start looking at old subroutines with our compiler knowledge of today, and we thank the compiler implementers for taking the time to implement all of the optimizations we now take for granted... I hope!
Thanks again for the great post.. looking forward to the next post and analyzing the horrors of ancient MIPS compilers.
ReplyDelete