2015-10-28

Super Mario 64: Known values and bits

Three very common optimization possibilities (i.e. many non-trivial subroutines can be optimized in these ways) in Super Mario 64 are exemplified by the following pieces of code. I had to extract the pieces of code in a different way for this entry, so this time, their offsets are Nintendo 64 memory addresses, not file offsets.

You should read the entries on the MIPS instruction set first.

8027BE14: SW $24, 28($29) LW $25, 28($29) 80286790: LUI $10, 0x8034 LUI $8, 0x8034 LH $8, -14516($8) LH $10, -14574($10) LUI $1, 0x8034 80288CE4: ANDI $4, $4, 0xFFFF ANDI $5, $5, 0xFFFF ANDI $6, $6, 0xFFFF ANDI $14, $5, 0x000F OR $5, $14, $0 ANDI $15, $5, 0xFFFF

2015-10-20

Super Mario 64 subroutine 2B3C, part 2: Constants

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Last time, we found that the programmer who wrote this C function didn't do so well, leading to the compiler determining that every use of a pointer in a global variable needed to reload its value from memory in order to store a new value through it.

Before writing this entry, I rewrote the subroutine as if the programmer had written a better version of the C function and fed it to the compiler.

Because some data probably exists around the global pointer variable, I chose to take a copy of the global pointer inside the function (struct state* local_state = global_state;) instead of using a non-pointer global variable (struct state global_state).

2015-10-19

Super Mario 64 subroutine 2B3C, part 1: Alias analysis

This entry will focus on a second subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 2B3C (hexadecimal) within the ROM.

Unlike the subroutine at 6CD8, this one has only one basic block. Well, it has two, but the end of the first one is a jump directly to the second, so that doesn't count!

You should read the entries on the MIPS instruction set first.

Super Mario 64 subroutine 6CD8, part 3: The stack

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

I already simplified this subroutine in part 1, about basic blocks and part 2, about memory optimizations. The original subroutine, as found in the ROM, is in part 1.

This entry will add some finishing touches to the simplified version from part 2. They are somewhat machine-specific, and thus are done after the compiler's code analysis. Some of the simplifications possible here suggest that Super Mario 64 may have been a debug build, not a release build.

2015-10-17

Super Mario 64 subroutine 6CD8, part 2: Memory optimization

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

I already simplified this subroutine in part 1, about basic blocks. The original subroutine, as found in the ROM, is in that part. This entry will build on the simplified version.

Super Mario 64 subroutine 6CD8, part 1: Basic blocks

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

Super Mario 64 appears to have been compiled with one of the earliest compilers available for the Nintendo 64. It does not schedule instructions well; a modern compiler could do much better.

You should read the entries on the MIPS instruction set first.

2015-10-13

MIPS, part 4: Branches and jumps

Conditional branches and unconditional jumps are the last piece of the puzzle. They allow programs to make decisions, skip over code, and call subroutines.

You'll want to read MIPS part 1: Registers and calling convention and MIPS part 2: ALU instructions first.

The branch delay slot (#delay)

An important feature of the MIPS architecture is that the instruction following branches and jumps is executed before the target instruction is. This is known as the branch delay slot, or sometimes delay slot for short.

2015-10-05

MIPS, part 3: Memory access instructions

Even on MIPS, with its plentiful registers, fast memory access is essential.

To make address resolution simpler for the processor, MIPS I only defines one addressing mode: register plus signed 16-bit offset. This allows it to resolve addresses in one cycle, no more, no less.

Memory load requests are first brought up to the data cache, which may service the load immediately. If the data cache doesn't have the requested data, one whole cache line of data is read using a memory burst command.

Memory store requests are first brought up to the store buffer. The cache line surrounding the write has to be loaded in order to preserve the surrounding bytes, but the store buffer allows more operations to be carried out while waiting for the cache line to be loaded. Then, the data is only written when the data cache line needs to be evicted for another memory access, using a memory burst command.

This setup can be bypassed if needed by using uncached memory access. However, most games, and most code within those games, will be using cached accesses. The data cache is very efficient for these accesses.

You'll want to read MIPS part 1: Registers and calling convention first.

MIPS, part 2: ALU instructions

Here are some common instructions in MIPS I and their effects. Some of those effects are important for optimization purposes, so they'll be described in greater detail.

These instructions are common on the Sony PlayStation. Counter-intuitively, they are also much more common than 64-bit instructions on the Nintendo 64: because its data bus is 32-bit, loading and storing 64-bit quantities to memory would take twice as long, so most games were written with 32-bit data and instructions only. Some games could even be run on a pure 32-bit MIPS processor!

You'll want to read MIPS part 1: Registers and calling convention first.