This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.
Unlike the subroutine at 6CD8
, this one has only one basic block. Well, it has two, but the end of the first one is a jump directly to the second, so that doesn't count!
You should read the entries on the MIPS instruction set first.
The original subroutine
2B3C:
ADDIU $29, $29, -8
LUI $14, 0x8034
LUI $15, 0x8034
LW $15, -20364($15)
LW $14, -20372($14)
SUBU $24, $14, $15
SRA $25, $24, 3
SW $25, 4($29)
LUI $9, 0x8034
LW $9, -20376($9)
LUI $8, 0x8034
ADDIU $8, $8, -20440
SW $8, 64($9)
LUI $11, 0x8034
LW $11, -20376($11)
ADDIU $10, $0, 2
SW $10, 68($11)
LUI $13, 0x8034
LW $13, -20376($13)
ADDIU $12, $0, 1
SW $12, 0($13)
LUI $15, 0x8034
LW $15, -20376($15)
LUI $14, 0x8033
ADDIU $14, $14, -19872
SW $14, 8($15)
LUI $9, 0x8034
LW $9, -20376($9)
LUI $24, 0x8033
LUI $25, 0x8033
ADDIU $25, $25, -19872
ADDIU $24, $24, -19664
SUBU $8, $24, $25
SW $8, 12($9)
LUI $10, 0x8034
LW $10, -20376($10)
SW $0, 4($10)
LUI $12, 0x8034
LW $12, -20376($12)
LUI $11, 0x8033
ADDIU $11, $11, -19664
SW $11, 16($12)
LUI $14, 0x8034
LW $14, -20376($14)
LUI $13, 0x8034
ADDIU $13, $13, -25920
SW $13, 24($14)
LUI $24, 0x8034
LW $24, -20376($24)
ADDIU $15, $0, 4096
SW $15, 20($24)
LUI $8, 0x8034
LW $8, -20376($8)
ADDIU $25, $0, 2048
SW $25, 28($8)
LUI $10, 0x8034
LW $10, -20376($10)
LUI $9, 0x8020
ADDIU $9, $9, 30976
SW $9, 32($10)
LUI $12, 0x8034
LW $12, -20376($12)
ADDIU $11, $0, 1024
SW $11, 36($12)
LUI $14, 0x8034
LW $14, -20376($14)
LUI $13, 0x8022
ADDIU $13, $13, 28672
SW $13, 40($14)
LUI $25, 0x8034
LW $25, -20376($25)
LUI $15, 0x8022
LUI $1, 0x0001
ORI $1, $1, 0xF000
ADDIU $15, $15, 28672
ADDU $24, $15, $1
SW $24, 44($25)
LUI $8, 0x8034
LUI $9, 0x8034
LW $9, -20376($9)
LW $8, -20364($8)
SW $8, 48($9)
LW $10, 4($29)
LUI $12, 0x8034
LW $12, -20376($12)
SLL $11, $10, 3
SW $11, 52($12)
LUI $14, 0x8034
LW $14, -20376($14)
LUI $13, 0x8020
ADDIU $13, $13, 28160
SW $13, 56($14)
LUI $24, 0x8034
LW $24, -20376($24)
ADDIU $15, $0, 2304
SW $15, 60($24)
BEQ $0, $0, 2CC4
SLL $0, $0, 0
2CC4:
JR $31
ADDIU $29, $29, 8
Wow! That's a lot of constants, and a lot of memory accesses! Can any of those be eliminated?
Alias analysis (#alias)
Say you have two instructions that load from the same memory location, but there's a store in the middle. Can you remove the second load?
If you know that the address of both the loads and the store are the same, and the size of the loads and the store is the same, then what you have is a read-after-write dependency, and the load can become a move of the value that was written to memory into another register.
If you know both addresses (they are constants), and you know the size of the store, and it doesn't overlap the loads, you can safely remove the second load.
If both loads and the store are going through the same register, with different offsets, and the store doesn't overlap the loads, you can safely remove the second load.
Otherwise, the compiler must perform alias analysis to figure out whether the store to the unrelated address could be writing to the location used by the second load, thus requiring the reload.
Let's do this alias analysis on the subroutine:
LUI $14, 0x8034
LUI $15, 0x8034
LW $15, -20364($15) ; Loads $15 from memory at 0x8033B074
LW $14, -20372($14) ; Loads $14 from memory at 0x8033B06C
SUBU $24, $14, $15 ; $24 = $14 - $15
SRA $25, $24, 3 ; $25 = $24 / 8 (signed)
> SW $25, 4($29)
Here's our first store, which goes to stack slot 4. Given that, in C, pointers to stack variables are only valid if they're taken explicitly during the execution of a function and passed around, unrelated variables having pointers to this stack slot invokes Undefined Behavior. This store must not alias any other memory.
LUI $9, 0x8034
LW $9, -20376($9) ; Loads $9 from memory at 0x8033B068
LUI $8, 0x8034
ADDIU $8, $8, -20440
> SW $8, 64($9) ; Stores 0x8033B028 at the unknown address in $9 + 64
Our second store is to an address that was loaded from memory. It could refer to any address, so reloads after this store must occur.
LUI $11, 0x8034
> LW $11, -20376($11) ; Loads $11 from memory at 0x8033B068
This load, despite being to the same address as the previous one, might read a new value written by the intervening SW
. It must stay.
ADDIU $10, $0, 2 ; $10 = 2
> SW $10, 68($11) ; Stores 2 at the unknown address in $11 + 68
Our third store is to an address that was loaded from memory. It could refer to any address, so reloads after this store must occur.
LUI $13, 0x8034
LW $13, -20376($13) ; Loads $13 from memory at 0x8033B068 (again)
ADDIU $12, $0, 1 ; $12 = 1
SW $12, 0($13) ; Stores 1 at the unknown address in $13 (again)
LUI $15, 0x8034
LW $15, -20376($15) ; Loads $15 from memory at 0x8033B068 (again)
LUI $14, 0x8033
ADDIU $14, $14, -19872
SW $14, 8($15) ; Stores 0x8032B260 at the unknown address in $15 (again) + 8
LUI $9, 0x8034
LW $9, -20376($9) ; Loads $15 from memory at 0x8033B068 (again)
LUI $24, 0x8033
LUI $25, 0x8033
ADDIU $25, $25, -19872
ADDIU $24, $24, -19664
SUBU $8, $24, $25 ; $8 = 0x8032B330 - 0x8032B260 (= 208)
SW $8, 12($9) ; Stores 208 to the unknown address in $9 (again) + 12
LUI $10, 0x8034
LW $10, -20376($10) ; Loads $10 from memory at 0x8033B068 (again)
SW $0, 4($10) ; Stores 0 to the unknown address in $10 (again) + 4
LUI $12, 0x8034
LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again)
LUI $11, 0x8033
ADDIU $11, $11, -19664
SW $11, 16($12) ; Stores 0x8032B330 at the unknown address in $12
LUI $14, 0x8034
LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again)
LUI $13, 0x8034
ADDIU $13, $13, -25920
SW $13, 24($14) ; Stores 0x80339AC0 to the unknown address in $14 (again) + 24
LUI $24, 0x8034
LW $24, -20376($24) ; Loads $24 from memory at 0x8033B068 (again)
ADDIU $15, $0, 4096
SW $15, 20($24) ; Stores 4096 at the unknown address in $24 (again) + 20
LUI $8, 0x8034
LW $8, -20376($8) ; Loads $8 from memory at 0x8033B068 (again)
ADDIU $25, $0, 2048
SW $25, 28($8) ; Stores 2048 at the unknown address in $24 (again) + 28
LUI $10, 0x8034
LW $10, -20376($10) ; Loads $10 from memory at 0x8033B068 (again)
LUI $9, 0x8020
ADDIU $9, $9, 30976
SW $9, 32($10) ; Stores 0x80207900 at the unknown address in $24 (again) + 28
LUI $12, 0x8034
LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again)
ADDIU $11, $0, 1024
SW $11, 36($12) ; Stores 1024 at the unknown address in $12 (again) + 36
LUI $14, 0x8034
LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again)
LUI $13, 0x8022
ADDIU $13, $13, 28672
SW $13, 40($14) ; Stores 0x80227000 at the unknown address in $14 (again) + 40
LUI $25, 0x8034
LW $25, -20376($25) ; Loads $25 from memory at 0x8033B068 (again)
LUI $15, 0x8022
LUI $1, 0x0001
ORI $1, $1, 0xF000
ADDIU $15, $15, 28672
ADDU $24, $15, $1 ; $24 = 0x80227000 + 0x1F000 (= 0x80246000)
SW $24, 44($25) ; Stores 0x80246000 at the unknown address in $25 (again) + 44
LUI $8, 0x8034
LUI $9, 0x8034
LW $9, -20376($9) ; Loads $9 from memory at 0x8033B068 (again)
> LW $8, -20364($8) ; Loads $8 from memory at 0x8033B074
After all of these operations, the one reading at 0x8033B074
may not have the value that was stored there earlier. This load must stay.
SW $8, 48($9) ; Stores $8 at the unknown address in $8 (again) + 48
> LW $10, 4($29) ; Reload the value computed at the beginning
As for this load, it is not allowed to alias any other store because it's on the stack and its address has still not been taken. If the value stored to stack slot 4 was still in $25
, it could simply be moved from $25
to $10
, but it has been rewritten since being stored.
LUI $12, 0x8034
LW $12, -20376($12) ; Loads $12 from memory at 0x8033B068 (again)
SLL $11, $10, 3 ; $11 = $10 * 8;
SW $11, 52($12) ; Stores $11 at the unknown address in $12 (again) + 52
LUI $14, 0x8034
LW $14, -20376($14) ; Loads $14 from memory at 0x8033B068 (again)
LUI $13, 0x8020
ADDIU $13, $13, 28160
SW $13, 56($14) ; Stores 0x80206E00 at the unknown address in $14 (again) + 56
LUI $24, 0x8034
LW $24, -20376($24) ; Loads $24 from memory at 0x8033B068 (again)
ADDIU $15, $0, 2304
SW $15, 60($24) ; Stores 2304 at the unknown address at $24 (again) + 60
So the compiler did its thing? (#results)
Yes, the compiler did its job this time, so it's the programmer's fault.
This subroutine is some sort of state initialization function, giving constant values to 17 members of a structure whose address is contained in a global variable at 0x8033B068
. Here's what it could look like:
struct state {
int member1;
int member2; /* ... */
};
/* A global variable */
struct state* global_state;
void init_state()
{
global_state->member17 = 0x8033B028;
global_state->member18 = 2;
global_state->member1 = 1;
global_state->member3 = 0x8032B260;
/* ... */
}
The problem is that the write of member17
through the pointer may affect the future value of global_state
, if global_state
happens to point 64 bytes before the global variable's address in memory. (The write to member18
may also affect it if it's 68 bytes before, and so on.) The compiler knows nothing about this, so it must carry out all operations.
The programmer should have used a non-pointer global variable (struct state global_state
) or grabbed the value of the pointer at the start of the function:
void init_state()
{
struct state* local_state = global_state;
local_state->member17 = 0x8033B028;
local_state->member18 = 2;
local_state->member1 = 1;
local_state->member3 = 0x8032B260;
/* ... */
}
I'll optimize the subroutine myself to get rid of the pointer reloads before the next entry; don't worry!
No comments :
Post a Comment