This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.
Super Mario 64 appears to have been compiled with one of the earliest compilers available for the Nintendo 64. It does not schedule instructions well; a modern compiler could do much better.
You should read the entries on the MIPS instruction set first.
The original subroutine
6CD8:
ADDIU $29, $29, -40 ; Grow the stack by 40 bytes
SW $31, 28($29) ; There are JALs below, so store the return address
SW $4, 40($29) ; Store argument 1 to the stack
SW $5, 44($29) ; Store argument 2 to the stack
SW $16, 24($29) ; Preserve the caller's value of $16
SW $0, 36($29) ; Initialize stack slot 36 to 0
LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian)
BEQ $16, $0, 6D10 ; If those bits are equal to 0, go to 6D10
NOP
6CFC:
ADDIU $1, $0, 1
BEQ $16, $1, 6D24 ; If those bits are equal to 1, go to 6D24
NOP
BEQ $0, $0, 6D38 ; Unconditional branch to 6D38
NOP
6D10:
JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5
NOP
SW $2, 36($29) ; Store the subroutine's return value into stack slot 36
BEQ $0, $0, 6D38 ; Unconditional branch to 6D38
NOP
6D24:
JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5
NOP
SW $2, 36($29) ; Store the subroutine's return value into stack slot 36
BEQ $0, $0, 6D38 ; Unconditional branch to 6D38
NOP
6D38:
BEQ $0, $0, 6D48 ; Unconditional branch to 6D48
LW $2, 36($29) ; Load our return value from stack slot 36
6D40:
BEQ $0, $0, 6D48 ; Unconditional branch to 6D48
NOP
6D48:
LW $31, 28($29) ; Restore our return address after JALs
LW $16, 24($29) ; Restore the caller's value of $16
ADDIU $29, $29, 40 ; Return the stack to its old size
JR $31 ; Return after the JAL that called us
NOP
Basic blocks
You may have noticed that the subroutine up there contains multiple branches. Compilers, like gcc and LLVM, view these branches as the terminating instructions of basic blocks.
Basic blocks are blocks of instructions having two properties:
- Only the first instruction of a basic block may be the target of a branch from elsewhere.
- The last instruction of a basic block must be a branch to elsewhere.
This is because some optimizations, notably instruction scheduling and combination, must know if multiple instructions are always executed or if an instruction in the middle may be the target of a branch.
Accordingly, the very first thing we must do is figure out how a compiler saw this function in terms of basic blocks:
function 6CD8(%arg1 signed 16-bit, %arg2) {
_start:
stack allocate %retVal
%retVal = 0
decide %arg1Lo == 0? yes: is0; no: test1
test1:
decide %arg1Lo == 1? yes: is1; no: others
others:
goto returnValue
is0:
%retVal = call 0x8024BA8C(%arg1, %arg2)
goto returnValue
is1:
%retVal = call 0x8024B9B8(%arg1, %arg2)
goto returnValue
returnValue:
set return value to %retVal
goto preEnd
preEnd:
goto _end
_end:
return
}
So the code does three things depending on argument 1's value:
- If it's 0, it calls a function at
0x8024BA8C
and arranges to eventually return the value returned by that function. - If it's 1, it calls a function at
0x8024B9B8
and arranges to eventually return the value returned by that function. - Otherwise, 0, the value with which the stack slot was initialized, is returned.
Between these basic blocks, three optimizations can be applied.
Basic block elimination (#bbe)
The compiler did not clean up branches to basic blocks whose sole purpose was to branch to another basic block. The basic block I labeled as preEnd
above, for example, merely branches to _end
, which destroys the stack frame and returns. Branches to preEnd
can be rewritten as branches to _end
. In the subroutine, it's the block at 6D40
.
If a basic block (B) is the target of only one unconditional branch done by another basic block (A), then basic blocks A and B in that order can become the new A. The compiler did not clean these up either; if it did, it would have avoided the branch at the end of the block I labeled returnValue
. Note that returnValue
itself is the target of 3 unconditional branches, in is0
, is1
and others
, so they can't be combined.
Dead code elimination (#dce)
Once all branches to a basic block are gone, that basic block can be eliminated. After the basic block elimination above, preEnd
could be removed.
Branch elimination (#be)
When it was time to generate code for those basic blocks, the compiler eliminated some branches when the terminating instruction of a basic block went to the very next instruction, but not all. This is something that modern compilers usually do consistently.
You can see these superfluous branches at the end of the blocks at 6D24
and 6D40
in the subroutine. If the block at 6D40
was eliminated, as a modern compiler would, then the block at 6D38
would have have been superfluous instead. In all of those cases, the branch goes to the very next instruction, so it can be removed.
The rewritten subroutine (#rewritten)
6CD8:
ADDIU $29, $29, -40 ; Grow the stack by 40 bytes
SW $31, 28($29) ; There are JALs below, so store the return address
SW $4, 40($29) ; Store argument 1 to the stack
SW $5, 44($29) ; Store argument 2 to the stack
SW $16, 24($29) ; Preserve the caller's value of $16
SW $0, 36($29) ; Initialize stack slot 36 to 0
LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian)
BEQ $16, $0, 6D10 ; If those bits are equal to 0, go to 6D10
NOP
6CFC:
ADDIU $1, $0, 1
BEQ $16, $1, 6D24 ; If those bits are equal to 1, go to 6D24
NOP
BEQ $0, $0, 6D30 ; Unconditional branch to 6D30
NOP
6D10:
JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5
NOP
SW $2, 36($29) ; Store the subroutine's return value into stack slot 36
BEQ $0, $0, 6D30 ; Unconditional branch to 6D30
NOP
6D24:
JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5
NOP
SW $2, 36($29) ; Store the subroutine's return value into stack slot 36
6D30:
LW $2, 36($29) ; Load our return value from stack slot 36
LW $31, 28($29) ; Restore our return address after JALs
LW $16, 24($29) ; Restore the caller's value of $16
ADDIU $29, $29, 40 ; Return the stack to its old size
JR $31 ; Return after the JAL that called us
NOP
A good first step, but we can do better!
No comments :
Post a Comment