2015-10-20

Super Mario 64 subroutine 2B3C, part 2: Constants

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Last time, we found that the programmer who wrote this C function didn't do so well, leading to the compiler determining that every use of a pointer in a global variable needed to reload its value from memory in order to store a new value through it.

Before writing this entry, I rewrote the subroutine as if the programmer had written a better version of the C function and fed it to the compiler.

Because some data probably exists around the global pointer variable, I chose to take a copy of the global pointer inside the function (struct state* local_state = global_state;) instead of using a non-pointer global variable (struct state global_state).

A better subroutine to work with

2B3C: ADDIU $29, $29, -8 LUI $14, 0x8034 LUI $15, 0x8034 LW $15, -20364($15) LW $14, -20372($14) SUBU $24, $14, $15 SRA $25, $24, 3 SW $25, 4($29) LUI $9, 0x8034 LW $9, -20376($9) ; Load the state pointer into $9, just once LUI $8, 0x8034 ADDIU $8, $8, -20440 SW $8, 64($9) ADDIU $10, $0, 2 SW $10, 68($9) ADDIU $12, $0, 1 SW $12, 0($9) LUI $14, 0x8033 ADDIU $14, $14, -19872 SW $14, 8($9) LUI $24, 0x8033 LUI $25, 0x8033 ADDIU $25, $25, -19872 ADDIU $24, $24, -19664 SUBU $8, $24, $25 SW $8, 12($9) SW $0, 4($9) LUI $11, 0x8033 ADDIU $11, $11, -19664 SW $11, 16($9) LUI $13, 0x8034 ADDIU $13, $13, -25920 SW $13, 24($9) ADDIU $15, $0, 4096 SW $15, 20($9) ADDIU $25, $0, 2048 SW $25, 28($9) LUI $9, 0x8020 ADDIU $9, $9, 30976 SW $9, 32($9) ADDIU $11, $0, 1024 SW $11, 36($9) LUI $13, 0x8022 ADDIU $13, $13, 28672 SW $13, 40($9) LUI $15, 0x8022 LUI $1, 0x0001 ORI $1, $1, 0xF000 ADDIU $15, $15, 28672 ADDU $24, $15, $1 SW $24, 44($9) LUI $8, 0x8034 LW $8, -20364($8) SW $8, 48($9) LW $10, 4($29) SLL $11, $10, 3 SW $11, 52($9) LUI $13, 0x8020 ADDIU $13, $13, 28160 SW $13, 56($9) ADDIU $15, $0, 2304 SW $15, 60($9) JR $31 ADDIU $29, $29, 8

Which is somewhat shorter, but there can still be work done on it.

Constant propagation (#constprop)

Look at these two parts of the subroutine: LUI $24, 0x8033 ; $24 = 0x80330000; LUI $25, 0x8033 ; $25 = 0x80330000; ADDIU $25, $25, -19872 ; $25 = 0x8032B260; ADDIU $24, $24, -19664 ; $24 = 0x8032B330; SUBU $8, $24, $25 ; $8 = 0x8032B330 - 0x8032B260; LUI $15, 0x8022 ; $15 = 0x80220000; LUI $1, 0x0001 ; $1 = 0x10000; ORI $1, $1, 0xF000 ; $1 = 0x1F000; ADDIU $15, $15, 28672 ; $15 = 0x80227000; ADDU $24, $15, $1 ; $24 = 0x80227000 + 0x1F000;

Each of them ends with an operation that results in a constant. The operations can be replaced with loading the result into their result registers: ADDIU $8, $0, 208 ; $8 = 0x8032B330 - 0x8032B260 = 208; LUI $24, 0x8024 ORI $24, $24, 0x6000 ; $24 = 0x80227000 + 0x1F000 = 0x80246000;

As for the original constants, they may be removed, but only if they are not needed for further computations. More formally, a constant is needed in a register if an instruction following loading it into a register reads the register, or if the basic block containing the subroutine call has a branch, possibly through other basic blocks, to a basic block that reads the register. (Yes, that's similar to the algorithm for callee-saved registers.)

Additionally, a constant in a register is unneeded if its value is fully overwritten later but it has not been read in the meantime.

Fortunately for us, $24 and $25, used by SUBU, are both fully overwritten later. As for the ADDU, $15 is fully overwritten later, and $1 is not read again. So we can remove all the constants that were used for the computations!

Constant reuse (#constreuse)

I used a modern version of GCC for MIPS (4.9.1) just before writing this section to make sure that it performs this optimization. It does, so it is indeed a case of Old Games, New Tech.

Before assigning a register to hold a constant, modern compilers record this fact into a table and attempt to reuse the constant to build other constants later on in the same basic block.

The compiler used for Super Mario 64 has not done this. It loads 0x80340000 5 times, 0x80330000 4 times, and 0x80200000 2 times in the same basic block. (Some of these constants are no longer needed because of the constant propagation step above having removed some operations. They're still counted here.)

In order to reuse a partial constant, we can assign a register to hold it for the entire duration for which it is needed. Let's assign $4 to hold 0x80340000, $5 to hold 0x80330000 and $6 to hold 0x80200000 and rewrite the subroutine.

The rewritten subroutine (#rewritten)

2B3C: ADDIU $29, $29, -8 LUI $4, 0x8034 ; $4 = 0x80340000, just once LW $15, -20364($4) ; Load from 0x8033B074 (via $4) LW $14, -20372($4) ; Load from 0x8033B06C (via $4) SUBU $24, $14, $15 SRA $25, $24, 3 SW $25, 4($29) LW $9, -20376($4) ; Load (via $4) into $9, just once ADDIU $8, $4, -20440 ; Form the constant via $4 SW $8, 64($9) ADDIU $10, $0, 2 SW $10, 68($9) ADDIU $12, $0, 1 SW $12, 0($9) LUI $5, 0x8033 ; $5 = 0x80330000, just once ADDIU $14, $5, -19872 ; Form the constant via $5 SW $14, 8($9) ADDIU $8, $0, 208 ; $8 = 0x8032B330 - 0x8032B260 = 208; SW $8, 12($9) SW $0, 4($9) ADDIU $11, $5, -19664 ; Form the constant via $5 SW $11, 16($9) ADDIU $13, $4, -25920 ; Form the constant via $4 SW $13, 24($9) ADDIU $15, $0, 4096 SW $15, 20($9) ADDIU $25, $0, 2048 SW $25, 28($9) LUI $6, 0x8020 ; $6 = 0x80200000, just once ADDIU $9, $6, 30976 ; Form the constant via $6 SW $9, 32($9) ADDIU $11, $0, 1024 SW $11, 36($9) LUI $13, 0x8022 ADDIU $13, $13, 28672 SW $13, 40($9) LUI $24, 0x8024 ORI $24, $24, 0x6000 ; $24 = 0x80227000 + 0x1F000 = 0x80246000; SW $24, 44($9) LW $8, -20364($4) ; Load from 0x8033B074 (via $4) SW $8, 48($9) LW $10, 4($29) SLL $11, $10, 3 SW $11, 52($9) ADDIU $13, $6, 28160 ; Form the constant via $6 SW $13, 56($9) ADDIU $15, $0, 2304 SW $15, 60($9) JR $31 ADDIU $29, $29, 8

We can actually get rid of the stack adjustment and the store and reload of stack slot 4, but that's not something new; if you like, you can go to part 3 of subroutine 6CD8, about the stack to remind yourself of what to do.

No comments :

Post a Comment