Three very common optimization possibilities (i.e. many non-trivial subroutines can be optimized in these ways) in Super Mario 64 are exemplified by the following pieces of code. I had to extract the pieces of code in a different way for this entry, so this time, their offsets are Nintendo 64 memory addresses, not file offsets.
You should read the entries on the MIPS instruction set first.
8027BE14:
SW $24, 28($29)
LW $25, 28($29)
80286790:
LUI $10, 0x8034
LUI $8, 0x8034
LH $8, -14516($8)
LH $10, -14574($10)
LUI $1, 0x8034
80288CE4:
ANDI $4, $4, 0xFFFF
ANDI $5, $5, 0xFFFF
ANDI $6, $6, 0xFFFF
ANDI $14, $5, 0x000F
OR $5, $14, $0
ANDI $15, $5, 0xFFFF
Available expressions (#avail)
Available expression analysis can be performed on the first block:
8027BE14:
SW $24, 28($29)
LW $25, 28($29)
in order for the LW
to use the result stored by the SW
.
On 32-bit MIPS processors, the LW
is completely unneeded, and reads of $25
can simply be replaced with reads of $24
, the register that still contains the value stored to stack slot 28.
However, on 64-bit MIPS processors, which the Nintendo 64 contains, the LW
forces the value loaded from memory to be properly sign-extended from 32 to 64 bits. Because the arguments to subroutines may or may not be properly sign-extended, the LW
is actually useful here; it's just wasteful in terms of memory access. It can be replaced with ADDU
without any change in program meaning:
8027BE14:
SW $24, 28($29)
ADDU $25, $24, $0
This only works if there are no aliases involved and if the value stored to the stack slot is still in a register, hence requiring an available expression analysis.
Constant reuse as a post-pass optimization (#constreuse)
A post-pass optimizer could replace sequences of code that use the same constant but then overwrite their own content such that the constant becomes unavailable immediately:
80286790:
LUI $10, 0x8034
LUI $8, 0x8034
LH $8, -14516($8)
LH $10, -14574($10)
LUI $1, 0x8034
so that they load the constant once into the last written register and perform all operations based on that new register:
80286790:
LUI $1, 0x8034
LH $8, -14516($1)
LH $10, -14574($1)
As of the end of this code, the only three things that need to happen are as follows:
$1
contains0x80340000
;$8
contains the 32-bit value loaded from memory at0x8033C74C
;$10
contains the 32-bit value loaded from memory at0x8033C712
.
This is what the rewritten code does, in as many instructions.
Such a chain of instructions contains LUI
instructions, all loading the same constant value, followed by code that reads the value and overwrites it in the same register. LUI ADDIU
, LUI ORI
and LUI
followed by any memory load instruction all match this description. The code simplification needs to use LUI
on the last written register first, replace references to the duplicate registers with that register, and delete the superfluous LUI
instructions.
Value range analysis (#range)
Value range analysis, more specifically bitmask analysis (as described in the paper "Range and Bitmask Analysis for Hardware Optimization in High-Level Synthesis" by Marcel Gort and Jason H. Anderson, University of Toronto), allows a compiler to figure out some unneeded bitwise operations.
The compiler did not perform this analysis; let's follow the code in this snippet to figure out where and how:
80288CE4:
ANDI $4, $4, 0xFFFF ; Bits #63..16 of $4 now known to be 0
ANDI $5, $5, 0xFFFF ; Bits #63..16 of $5 now known to be 0
ANDI $6, $6, 0xFFFF ; Bits #63..16 of $6 now known to be 0
ANDI $14, $5, 0x000F ; Bits #63..4 of $14 now known to be 0
OR $5, $14, $0 ; Move $14 (bits #3..0 of $5) back into $5
ANDI $15, $5, 0xFFFF ; Ensure bits #63..16 of $5 are unset
First, as is usual in subroutines, this piece of code makes sure that its arguments, of type unsigned short
(uint16_t
), do not have any garbage bits set. Then, it computes argument 2 & 0xF
and stores this back to an unsigned short
(uint16_t
) variable. The compiler did not realize that bits #63..16 were already 0, though. The last ANDI
instruction is unneeded, so it can be deleted, and later reads of $15
replaced with reads of $5
.
No comments :
Post a Comment