2015-10-19

Super Mario 64 subroutine 2B3C, part 1: Alias analysis

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Unlike the subroutine at 6CD8, this one has only one basic block. Well, it has two, but the end of the first one is a jump directly to the second, so that doesn't count!

You should read the entries on the MIPS instruction set first.

The original subroutine

2B3C: ADDIU $29, $29, -8 LUI $14, 0x8034 LUI $15, 0x8034 LW $15, -20364($15) LW $14, -20372($14) SUBU $24, $14, $15 SRA $25, $24, 3 SW $25, 4($29) LUI $9, 0x8034 LW $9, -20376($9) LUI $8, 0x8034 ADDIU $8, $8, -20440 SW $8, 64($9) LUI $11, 0x8034 LW $11, -20376($11) ADDIU $10, $0, 2 SW $10, 68($11) LUI $13, 0x8034 LW $13, -20376($13) ADDIU $12, $0, 1 SW $12, 0($13) LUI $15, 0x8034 LW $15, -20376($15) LUI $14, 0x8033 ADDIU $14, $14, -19872 SW $14, 8($15) LUI $9, 0x8034 LW $9, -20376($9) LUI $24, 0x8033 LUI $25, 0x8033 ADDIU $25, $25, -19872 ADDIU $24, $24, -19664 SUBU $8, $24, $25 SW $8, 12($9) LUI $10, 0x8034 LW $10, -20376($10) SW $0, 4($10) LUI $12, 0x8034 LW $12, -20376($12) LUI $11, 0x8033 ADDIU $11, $11, -19664 SW $11, 16($12) LUI $14, 0x8034 LW $14, -20376($14) LUI $13, 0x8034 ADDIU $13, $13, -25920 SW $13, 24($14) LUI $24, 0x8034 LW $24, -20376($24) ADDIU $15, $0, 4096 SW $15, 20($24) LUI $8, 0x8034 LW $8, -20376($8) ADDIU $25, $0, 2048 SW $25, 28($8) LUI $10, 0x8034 LW $10, -20376($10) LUI $9, 0x8020 ADDIU $9, $9, 30976 SW $9, 32($10) LUI $12, 0x8034 LW $12, -20376($12) ADDIU $11, $0, 1024 SW $11, 36($12) LUI $14, 0x8034 LW $14, -20376($14) LUI $13, 0x8022 ADDIU $13, $13, 28672 SW $13, 40($14) LUI $25, 0x8034 LW $25, -20376($25) LUI $15, 0x8022 LUI $1, 0x0001 ORI $1, $1, 0xF000 ADDIU $15, $15, 28672 ADDU $24, $15, $1 SW $24, 44($25) LUI $8, 0x8034 LUI $9, 0x8034 LW $9, -20376($9) LW $8, -20364($8) SW $8, 48($9) LW $10, 4($29) LUI $12, 0x8034 LW $12, -20376($12) SLL $11, $10, 3 SW $11, 52($12) LUI $14, 0x8034 LW $14, -20376($14) LUI $13, 0x8020 ADDIU $13, $13, 28160 SW $13, 56($14) LUI $24, 0x8034 LW $24, -20376($24) ADDIU $15, $0, 2304 SW $15, 60($24) BEQ $0, $0, 2CC4 SLL $0, $0, 0 2CC4: JR $31 ADDIU $29, $29, 8

Wow! That's a lot of constants, and a lot of memory accesses! Can any of those be eliminated?

Alias analysis (#alias)

Say you have two instructions that load from the same memory location, but there's a store in the middle. Can you remove the second load?

If you know that the address of both the loads and the store are the same, and the size of the loads and the store is the same, then what you have is a read-after-write dependency, and the load can become a move of the value that was written to memory into another register.

If you know both addresses (they are constants), and you know the size of the store, and it doesn't overlap the loads, you can safely remove the second load.

If both loads and the store are going through the same register, with different offsets, and the store doesn't overlap the loads, you can safely remove the second load.

Otherwise, the compiler must perform alias analysis to figure out whether the store to the unrelated address could be writing to the location used by the second load, thus requiring the reload.

Let's do this alias analysis on the subroutine: LUI $14, 0x8034 LUI $15, 0x8034 LW $15, -20364($15) ; Loads $15 from memory at 0x8033B074 LW $14, -20372($14) ; Loads $14 from memory at 0x8033B06C SUBU $24, $14, $15 ; $24 = $14 - $15 SRA $25, $24, 3 ; $25 = $24 / 8 (signed) > SW $25, 4($29) Here's our first store, which goes to stack slot 4. Given that, in C, pointers to stack variables are only valid if they're taken explicitly during the execution of a function and passed around, unrelated variables having pointers to this stack slot invokes Undefined Behavior. This store must not alias any other memory.

LUI $9, 0x8034 LW $9, -20376($9) ; Loads $9 from memory at 0x8033B068 LUI $8, 0x8034 ADDIU $8, $8, -20440 > SW $8, 64($9) ; Stores 0x8033B028 at the unknown address in $9 + 64 Our second store is to an address that was loaded from memory. It could refer to any address, so reloads after this store must occur.

LUI $11, 0x8034 > LW $11, -20376($11) ; Loads $11 from memory at 0x8033B068 This load, despite being to the same address as the previous one, might read a new value written by the intervening SW. It must stay.

ADDIU $10, $0, 2 ; $10 = 2 > SW $10, 68($11) ; Stores 2 at the unknown address in $11 + 68 Our third store is to an address that was loaded from memory. It could refer to any address, so reloads after this store must occur.

LUI $13, 0x8034 LW $13, -20376($13) ; Loads $13 from memory at 0x8033B068 (again) ADDIU $12, $0, 1 ; $12 = 1 SW $12, 0($13) ; Stores 1 at the unknown address in $13 (again) LUI $15, 0x8034 LW $15, -20376($15) ; Loads $15 from memory at 0x8033B068 (again) LUI $14, 0x8033 ADDIU $14, $14, -19872 SW $14, 8($15) ; Stores 0x8032B260 at the unknown address in $15 (again) + 8 LUI $9, 0x8034 LW $9, -20376($9) ; Loads $15 from memory at 0x8033B068 (again) LUI $24, 0x8033 LUI $25, 0x8033 ADDIU $25, $25, -19872 ADDIU $24, $24, -19664 SUBU $8, $24, $25 ; $8 = 0x8032B330 - 0x8032B260 (= 208) SW $8, 12($9) ; Stores 208 to the unknown address in $9 (again) + 12 LUI $10, 0x8034 LW $10, -20376($10) ; Loads $10 from memory at 0x8033B068 (again) SW $0, 4($10) ; Stores 0 to the unknown address in $10 (again) + 4 LUI $12, 0x8034 LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again) LUI $11, 0x8033 ADDIU $11, $11, -19664 SW $11, 16($12) ; Stores 0x8032B330 at the unknown address in $12 LUI $14, 0x8034 LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again) LUI $13, 0x8034 ADDIU $13, $13, -25920 SW $13, 24($14) ; Stores 0x80339AC0 to the unknown address in $14 (again) + 24 LUI $24, 0x8034 LW $24, -20376($24) ; Loads $24 from memory at 0x8033B068 (again) ADDIU $15, $0, 4096 SW $15, 20($24) ; Stores 4096 at the unknown address in $24 (again) + 20 LUI $8, 0x8034 LW $8, -20376($8) ; Loads $8 from memory at 0x8033B068 (again) ADDIU $25, $0, 2048 SW $25, 28($8) ; Stores 2048 at the unknown address in $24 (again) + 28 LUI $10, 0x8034 LW $10, -20376($10) ; Loads $10 from memory at 0x8033B068 (again) LUI $9, 0x8020 ADDIU $9, $9, 30976 SW $9, 32($10) ; Stores 0x80207900 at the unknown address in $24 (again) + 28 LUI $12, 0x8034 LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again) ADDIU $11, $0, 1024 SW $11, 36($12) ; Stores 1024 at the unknown address in $12 (again) + 36 LUI $14, 0x8034 LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again) LUI $13, 0x8022 ADDIU $13, $13, 28672 SW $13, 40($14) ; Stores 0x80227000 at the unknown address in $14 (again) + 40 LUI $25, 0x8034 LW $25, -20376($25) ; Loads $25 from memory at 0x8033B068 (again) LUI $15, 0x8022 LUI $1, 0x0001 ORI $1, $1, 0xF000 ADDIU $15, $15, 28672 ADDU $24, $15, $1 ; $24 = 0x80227000 + 0x1F000 (= 0x80246000) SW $24, 44($25) ; Stores 0x80246000 at the unknown address in $25 (again) + 44 LUI $8, 0x8034 LUI $9, 0x8034 LW $9, -20376($9) ; Loads $9 from memory at 0x8033B068 (again) > LW $8, -20364($8) ; Loads $8 from memory at 0x8033B074 After all of these operations, the one reading at 0x8033B074 may not have the value that was stored there earlier. This load must stay.

SW $8, 48($9) ; Stores $8 at the unknown address in $8 (again) + 48 > LW $10, 4($29) ; Reload the value computed at the beginning As for this load, it is not allowed to alias any other store because it's on the stack and its address has still not been taken. If the value stored to stack slot 4 was still in $25, it could simply be moved from $25 to $10, but it has been rewritten since being stored.

LUI $12, 0x8034 LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again) SLL $11, $10, 3 ; $11 = $10 * 8; SW $11, 52($12) ; Stores $11 at the unknown address in $12 (again) + 52 LUI $14, 0x8034 LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again) LUI $13, 0x8020 ADDIU $13, $13, 28160 SW $13, 56($14) ; Stores 0x80206E00 at the unknown address in $14 (again) + 56 LUI $24, 0x8034 LW $24, -20376($24) ; Loads $24 from memory at 0x8033B068 (again) ADDIU $15, $0, 2304 SW $15, 60($24) ; Stores 2304 at the unknown address at $24 (again) + 60

So the compiler did its thing? (#results)

Yes, the compiler did its job this time, so it's the programmer's fault.

This subroutine is some sort of state initialization function, giving constant values to 17 members of a structure whose address is contained in a global variable at 0x8033B068. Here's what it could look like: struct state { int member1; int member2; /* ... */ }; /* A global variable */ struct state* global_state; void init_state() { global_state->member17 = 0x8033B028; global_state->member18 = 2; global_state->member1 = 1; global_state->member3 = 0x8032B260; /* ... */ }

The problem is that the write of member17 through the pointer may affect the future value of global_state, if global_state happens to point 64 bytes before the global variable's address in memory. (The write to member18 may also affect it if it's 68 bytes before, and so on.) The compiler knows nothing about this, so it must carry out all operations.

The programmer should have used a non-pointer global variable (struct state global_state) or grabbed the value of the pointer at the start of the function: void init_state() { struct state* local_state = global_state; local_state->member17 = 0x8033B028; local_state->member18 = 2; local_state->member1 = 1; local_state->member3 = 0x8032B260; /* ... */ }

I'll optimize the subroutine myself to get rid of the pointer reloads before the next entry; don't worry!

No comments :

Post a Comment