This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.
Last time, we found that the programmer who wrote this C function didn't do so well, leading to the compiler determining that every use of a pointer in a global variable needed to reload its value from memory in order to store a new value through it.
Before writing this entry, I rewrote the subroutine as if the programmer had written a better version of the C function and fed it to the compiler.
Because some data probably exists around the global pointer variable, I chose to take a copy of the global pointer inside the function (struct state* local_state = global_state;
) instead of using a non-pointer global variable (struct state global_state
).
A better subroutine to work with
2B3C:
ADDIU $29, $29, -8
LUI $14, 0x8034
LUI $15, 0x8034
LW $15, -20364($15)
LW $14, -20372($14)
SUBU $24, $14, $15
SRA $25, $24, 3
SW $25, 4($29)
LUI $9, 0x8034
LW $9, -20376($9) ; Load the state pointer into $9, just once
LUI $8, 0x8034
ADDIU $8, $8, -20440
SW $8, 64($9)
ADDIU $10, $0, 2
SW $10, 68($9)
ADDIU $12, $0, 1
SW $12, 0($9)
LUI $14, 0x8033
ADDIU $14, $14, -19872
SW $14, 8($9)
LUI $24, 0x8033
LUI $25, 0x8033
ADDIU $25, $25, -19872
ADDIU $24, $24, -19664
SUBU $8, $24, $25
SW $8, 12($9)
SW $0, 4($9)
LUI $11, 0x8033
ADDIU $11, $11, -19664
SW $11, 16($9)
LUI $13, 0x8034
ADDIU $13, $13, -25920
SW $13, 24($9)
ADDIU $15, $0, 4096
SW $15, 20($9)
ADDIU $25, $0, 2048
SW $25, 28($9)
LUI $9, 0x8020
ADDIU $9, $9, 30976
SW $9, 32($9)
ADDIU $11, $0, 1024
SW $11, 36($9)
LUI $13, 0x8022
ADDIU $13, $13, 28672
SW $13, 40($9)
LUI $15, 0x8022
LUI $1, 0x0001
ORI $1, $1, 0xF000
ADDIU $15, $15, 28672
ADDU $24, $15, $1
SW $24, 44($9)
LUI $8, 0x8034
LW $8, -20364($8)
SW $8, 48($9)
LW $10, 4($29)
SLL $11, $10, 3
SW $11, 52($9)
LUI $13, 0x8020
ADDIU $13, $13, 28160
SW $13, 56($9)
ADDIU $15, $0, 2304
SW $15, 60($9)
JR $31
ADDIU $29, $29, 8
Which is somewhat shorter, but there can still be work done on it.
Constant propagation (#constprop)
Look at these two parts of the subroutine:
LUI $24, 0x8033 ; $24 = 0x80330000;
LUI $25, 0x8033 ; $25 = 0x80330000;
ADDIU $25, $25, -19872 ; $25 = 0x8032B260;
ADDIU $24, $24, -19664 ; $24 = 0x8032B330;
SUBU $8, $24, $25 ; $8 = 0x8032B330 - 0x8032B260;
LUI $15, 0x8022 ; $15 = 0x80220000;
LUI $1, 0x0001 ; $1 = 0x10000;
ORI $1, $1, 0xF000 ; $1 = 0x1F000;
ADDIU $15, $15, 28672 ; $15 = 0x80227000;
ADDU $24, $15, $1 ; $24 = 0x80227000 + 0x1F000;
Each of them ends with an operation that results in a constant. The operations can be replaced with loading the result into their result registers:
ADDIU $8, $0, 208 ; $8 = 0x8032B330 - 0x8032B260 = 208;
LUI $24, 0x8024
ORI $24, $24, 0x6000 ; $24 = 0x80227000 + 0x1F000 = 0x80246000;
As for the original constants, they may be removed, but only if they are not needed for further computations. More formally, a constant is needed in a register if an instruction following loading it into a register reads the register, or if the basic block containing the subroutine call has a branch, possibly through other basic blocks, to a basic block that reads the register. (Yes, that's similar to the algorithm for callee-saved registers.)
Additionally, a constant in a register is unneeded if its value is fully overwritten later but it has not been read in the meantime.
Fortunately for us, $24
and $25
, used by SUBU
, are both fully overwritten later. As for the ADDU
, $15
is fully overwritten later, and $1
is not read again. So we can remove all the constants that were used for the computations!
Constant reuse (#constreuse)
I used a modern version of GCC for MIPS (4.9.1) just before writing this section to make sure that it performs this optimization. It does, so it is indeed a case of Old Games, New Tech.
Before assigning a register to hold a constant, modern compilers record this fact into a table and attempt to reuse the constant to build other constants later on in the same basic block.
The compiler used for Super Mario 64 has not done this. It loads 0x80340000
5 times, 0x80330000
4 times, and 0x80200000
2 times in the same basic block. (Some of these constants are no longer needed because of the constant propagation step above having removed some operations. They're still counted here.)
In order to reuse a partial constant, we can assign a register to hold it for the entire duration for which it is needed. Let's assign $4
to hold 0x80340000
, $5
to hold 0x80330000
and $6
to hold 0x80200000
and rewrite the subroutine.
The rewritten subroutine (#rewritten)
2B3C:
ADDIU $29, $29, -8
LUI $4, 0x8034 ; $4 = 0x80340000, just once
LW $15, -20364($4) ; Load from 0x8033B074 (via $4)
LW $14, -20372($4) ; Load from 0x8033B06C (via $4)
SUBU $24, $14, $15
SRA $25, $24, 3
SW $25, 4($29)
LW $9, -20376($4) ; Load (via $4) into $9, just once
ADDIU $8, $4, -20440 ; Form the constant via $4
SW $8, 64($9)
ADDIU $10, $0, 2
SW $10, 68($9)
ADDIU $12, $0, 1
SW $12, 0($9)
LUI $5, 0x8033 ; $5 = 0x80330000, just once
ADDIU $14, $5, -19872 ; Form the constant via $5
SW $14, 8($9)
ADDIU $8, $0, 208 ; $8 = 0x8032B330 - 0x8032B260 = 208;
SW $8, 12($9)
SW $0, 4($9)
ADDIU $11, $5, -19664 ; Form the constant via $5
SW $11, 16($9)
ADDIU $13, $4, -25920 ; Form the constant via $4
SW $13, 24($9)
ADDIU $15, $0, 4096
SW $15, 20($9)
ADDIU $25, $0, 2048
SW $25, 28($9)
LUI $6, 0x8020 ; $6 = 0x80200000, just once
ADDIU $9, $6, 30976 ; Form the constant via $6
SW $9, 32($9)
ADDIU $11, $0, 1024
SW $11, 36($9)
LUI $13, 0x8022
ADDIU $13, $13, 28672
SW $13, 40($9)
LUI $24, 0x8024
ORI $24, $24, 0x6000 ; $24 = 0x80227000 + 0x1F000 = 0x80246000;
SW $24, 44($9)
LW $8, -20364($4) ; Load from 0x8033B074 (via $4)
SW $8, 48($9)
LW $10, 4($29)
SLL $11, $10, 3
SW $11, 52($9)
ADDIU $13, $6, 28160 ; Form the constant via $6
SW $13, 56($9)
ADDIU $15, $0, 2304
SW $15, 60($9)
JR $31
ADDIU $29, $29, 8
We can actually get rid of the stack adjustment and the store and reload of stack slot 4, but that's not something new; if you like, you can go to part 3 of subroutine 6CD8
, about the stack to remind yourself of what to do.
No comments :
Post a Comment