2015-10-19

Super Mario 64 subroutine 6CD8, part 3: The stack

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

I already simplified this subroutine in part 1, about basic blocks and part 2, about memory optimizations. The original subroutine, as found in the ROM, is in part 1.

This entry will add some finishing touches to the simplified version from part 2. They are somewhat machine-specific, and thus are done after the compiler's code analysis. Some of the simplifications possible here suggest that Super Mario 64 may have been a debug build, not a release build.

Previously on Old Games, New Tech

6CD8: ADDIU $29, $29, -40 ; Grow the stack by 40 bytes SW $31, 28($29) ; There are JALs below, so store the return address SW $4, 40($29) ; Store argument 1 to the stack SW $5, 44($29) ; Store argument 2 to the stack SW $16, 24($29) ; Preserve the caller's value of $16 OR $2, $0, $0 ; Initialize the return value to 0 LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian) BEQ $16, $0, 6D10 ; If those bits are equal to 0, go to 6D10 NOP 6CFC: ADDIU $1, $0, 1 BEQ $16, $1, 6D20 ; If those bits are equal to 1, go to 6D20 NOP BEQ $0, $0, 6D28 ; Unconditional branch to 6D28 NOP 6D10: JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5 NOP BEQ $0, $0, 6D28 ; Unconditional branch to 6D28 NOP 6D20: JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5 NOP 6D28: LW $31, 28($29) ; Restore our return address after JALs LW $16, 24($29) ; Restore the caller's value of $16 ADDIU $29, $29, 40 ; Return the stack to its old size JR $31 ; Return after the JAL that called us NOP

Callee-saved registers (#saved)

You might notice that the use of $16 in the subroutine means its value has to be stored to a stack slot at the start, and restored from the stack at the end. That's because the compiler chose a callee-saved register, and 6CD8 is the callee of another subroutine. These stores and loads have a cost, both in terms of instruction fetch and execution, and in memory accesses (somewhat mitigated by the fact that the stack slot is likely to be in the data cache, but still).

A value absolutely needs to be assigned to a callee-saved register if it needs to survive past a subroutine call. More formally, it needs to be saved if an instruction following a subroutine call in a basic block needs the value, or if the basic block containing the subroutine call has a branch, possibly through other basic blocks, to a basic block that needs the value.

Here, is $16 really needed?

function 6CD8(%arg1 signed 16-bit, %arg2) { _start: %1 = 0 decide %arg1Lo == 0? yes: is0; no: test1 test1: decide %arg1Lo == 1? yes: is1; no: returnValue is0: %2 = call 0x8024BA8C(%arg1, %arg2) goto returnValue is1: %3 = call 0x8024B9B8(%arg1, %arg2) goto returnValue returnValue: %4 = phi(%1 from test1, %2 from is0, %3 from is1) set return value to %4 return }

%arg1Lo, the pseudo-value representing the lower bits of $4 extracted by the following instruction: LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian) was assigned to $16. To determine if we need to save it, we examine all subroutine call instructions in 6CD8.

  • is0 has the first subroutine call, and it is not followed by any instruction that reads %arg1Lo. It branches to returnValue, so we must examine it.
  • returnValue, queued to be examined, does not contain any instruction that reads %arg1Lo. It doesn't branch elsewhere, so the examination stops here.
  • is1 has the second subroutine call, and it is not followed by any instruction that reads %arg1Lo. It branches to returnValue, which has already been examined.

So it doesn't need to be assigned to a callee-saved register after all! We can delete the code in the prologue that saves $16 to the stack, as well as the code that restores $16 for the caller, if we reassign %arg1Lo to register $8.

Stack frame elimination for release builds (#frame)

There now remain these instructions in the prologue of the subroutine: SW $4, 40($29) ; Store argument 1 to the stack SW $5, 44($29) ; Store argument 2 to the stack

Their task is to preserve the initial value of subroutine arguments. A debugger, having access to the offsets on the stack of a subroutine's return address, preserved argument registers and frame size, may display the value of every single variable used in a misbehaving subroutine and all of its callers, as well as show the initial value of a subroutine's arguments even after they have been modified by assignment statements. This is a very powerful service offered to programmers.

However, in release builds, the code only needs to preserve these values (in callee-saved registers) if they are needed for more than one subroutine call on the same code path.

Seeing as this subroutine does not read $4 or $5 after a subroutine call, writing them to the stack is not absolutely required. The value of $4 is written to the stack in order to re-read it as a signed 16-bit value. However, the store and load needs the same number of instructions (i.e. 2) as the code discussed in part 2, here, to sign-extend the contents of $4 solely in registers. Now, it makes sense to use that code.

The final subroutine (#final)

6CD8: ADDIU $29, $29, -24 ; Grow the stack by 24 bytes SW $31, 16($29) ; There are JALs below, so store the return address OR $2, $0, $0 ; Initialize the return value to 0 SLL $8, $4, 16 ; Kill the upper 16 bits of $4 into $8 SRA $8, $8, 16 ; Bring 16 copies of bit #31 down BEQ $8, $0, 6D08 ; If those bits are equal to 0, go to 6D08 NOP 6CF4: ADDIU $1, $0, 1 BEQ $8, $1, 6D18 ; If those bits are equal to 1, go to 6D18 NOP BEQ $0, $0, 6D20 ; Unconditional branch to 6D20 NOP 6D08: JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5 NOP BEQ $0, $0, 6D20 ; Unconditional branch to 6D20 NOP 6D18: JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5 NOP 6D20: LW $31, 16($29) ; Restore our return address after JALs ADDIU $29, $29, 24 ; Return the stack to its old size JR $31 ; Return after the JAL that called us NOP

No comments :

Post a Comment