2015-10-28

Super Mario 64: Known values and bits

Three very common optimization possibilities (i.e. many non-trivial subroutines can be optimized in these ways) in Super Mario 64 are exemplified by the following pieces of code. I had to extract the pieces of code in a different way for this entry, so this time, their offsets are Nintendo 64 memory addresses, not file offsets.

You should read the entries on the MIPS instruction set first.

8027BE14: SW $24, 28($29) LW $25, 28($29) 80286790: LUI $10, 0x8034 LUI $8, 0x8034 LH $8, -14516($8) LH $10, -14574($10) LUI $1, 0x8034 80288CE4: ANDI $4, $4, 0xFFFF ANDI $5, $5, 0xFFFF ANDI $6, $6, 0xFFFF ANDI $14, $5, 0x000F OR $5, $14, $0 ANDI $15, $5, 0xFFFF

2015-10-20

Super Mario 64 subroutine 2B3C, part 2: Constants

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Last time, we found that the programmer who wrote this C function didn't do so well, leading to the compiler determining that every use of a pointer in a global variable needed to reload its value from memory in order to store a new value through it.

Before writing this entry, I rewrote the subroutine as if the programmer had written a better version of the C function and fed it to the compiler.

Because some data probably exists around the global pointer variable, I chose to take a copy of the global pointer inside the function (struct state* local_state = global_state;) instead of using a non-pointer global variable (struct state global_state).

2015-10-19

Super Mario 64 subroutine 2B3C, part 1: Alias analysis

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Unlike the subroutine at 6CD8, this one has only one basic block. Well, it has two, but the end of the first one is a jump directly to the second, so that doesn't count!

You should read the entries on the MIPS instruction set first.

Super Mario 64 subroutine 6CD8, part 3: The stack

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

I already simplified this subroutine in part 1, about basic blocks and part 2, about memory optimizations. The original subroutine, as found in the ROM, is in part 1.

This entry will add some finishing touches to the simplified version from part 2. They are somewhat machine-specific, and thus are done after the compiler's code analysis. Some of the simplifications possible here suggest that Super Mario 64 may have been a debug build, not a release build.

2015-10-17

Super Mario 64 subroutine 6CD8, part 2: Memory optimization

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

I already simplified this subroutine in part 1, about basic blocks. The original subroutine, as found in the ROM, is in that part. This entry will build on the simplified version.

Super Mario 64 subroutine 6CD8, part 1: Basic blocks

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

Super Mario 64 appears to have been compiled with one of the earliest compilers available for the Nintendo 64. It does not schedule instructions well; a modern compiler could do much better.

You should read the entries on the MIPS instruction set first.

2015-10-13

MIPS, part 4: Branches and jumps

Conditional branches and unconditional jumps are the last piece of the puzzle. They allow programs to make decisions, skip over code, and call subroutines.

You'll want to read MIPS part 1: Registers and calling convention and MIPS part 2: ALU instructions first.

The branch delay slot (#delay)

An important feature of the MIPS architecture is that the instruction following branches and jumps is executed before the target instruction is. This is known as the branch delay slot, or sometimes delay slot for short.