2015-10-17

Super Mario 64 subroutine 6CD8, part 1: Basic blocks

This entry will focus on a subroutine in Super Mario 64, USA version, for the Nintendo 64. It's a MIPS III subroutine at offset 6CD8 (hexadecimal) within the ROM.

Super Mario 64 appears to have been compiled with one of the earliest compilers available for the Nintendo 64. It does not schedule instructions well; a modern compiler could do much better.

You should read the entries on the MIPS instruction set first.

The original subroutine

6CD8: ADDIU $29, $29, -40 ; Grow the stack by 40 bytes SW $31, 28($29) ; There are JALs below, so store the return address SW $4, 40($29) ; Store argument 1 to the stack SW $5, 44($29) ; Store argument 2 to the stack SW $16, 24($29) ; Preserve the caller's value of $16 SW $0, 36($29) ; Initialize stack slot 36 to 0 LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian) BEQ $16, $0, 6D10 ; If those bits are equal to 0, go to 6D10 NOP 6CFC: ADDIU $1, $0, 1 BEQ $16, $1, 6D24 ; If those bits are equal to 1, go to 6D24 NOP BEQ $0, $0, 6D38 ; Unconditional branch to 6D38 NOP 6D10: JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5 NOP SW $2, 36($29) ; Store the subroutine's return value into stack slot 36 BEQ $0, $0, 6D38 ; Unconditional branch to 6D38 NOP 6D24: JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5 NOP SW $2, 36($29) ; Store the subroutine's return value into stack slot 36 BEQ $0, $0, 6D38 ; Unconditional branch to 6D38 NOP 6D38: BEQ $0, $0, 6D48 ; Unconditional branch to 6D48 LW $2, 36($29) ; Load our return value from stack slot 36 6D40: BEQ $0, $0, 6D48 ; Unconditional branch to 6D48 NOP 6D48: LW $31, 28($29) ; Restore our return address after JALs LW $16, 24($29) ; Restore the caller's value of $16 ADDIU $29, $29, 40 ; Return the stack to its old size JR $31 ; Return after the JAL that called us NOP

Basic blocks

You may have noticed that the subroutine up there contains multiple branches. Compilers, like gcc and LLVM, view these branches as the terminating instructions of basic blocks.

Basic blocks are blocks of instructions having two properties:

  • Only the first instruction of a basic block may be the target of a branch from elsewhere.
  • The last instruction of a basic block must be a branch to elsewhere.
They are very important to compilers. For example, the very first thing that LLVM does when you ask it to emit machine code is figure out where a function's basic blocks start and end – even before performing the first optimization on the code!

This is because some optimizations, notably instruction scheduling and combination, must know if multiple instructions are always executed or if an instruction in the middle may be the target of a branch.

Accordingly, the very first thing we must do is figure out how a compiler saw this function in terms of basic blocks:

function 6CD8(%arg1 signed 16-bit, %arg2) { _start: stack allocate %retVal %retVal = 0 decide %arg1Lo == 0? yes: is0; no: test1 test1: decide %arg1Lo == 1? yes: is1; no: others others: goto returnValue is0: %retVal = call 0x8024BA8C(%arg1, %arg2) goto returnValue is1: %retVal = call 0x8024B9B8(%arg1, %arg2) goto returnValue returnValue: set return value to %retVal goto preEnd preEnd: goto _end _end: return }

So the code does three things depending on argument 1's value:

  • If it's 0, it calls a function at 0x8024BA8C and arranges to eventually return the value returned by that function.
  • If it's 1, it calls a function at 0x8024B9B8 and arranges to eventually return the value returned by that function.
  • Otherwise, 0, the value with which the stack slot was initialized, is returned.
Much simpler now, isn't it?

Between these basic blocks, three optimizations can be applied.

Basic block elimination (#bbe)

The compiler did not clean up branches to basic blocks whose sole purpose was to branch to another basic block. The basic block I labeled as preEnd above, for example, merely branches to _end, which destroys the stack frame and returns. Branches to preEnd can be rewritten as branches to _end. In the subroutine, it's the block at 6D40.

If a basic block (B) is the target of only one unconditional branch done by another basic block (A), then basic blocks A and B in that order can become the new A. The compiler did not clean these up either; if it did, it would have avoided the branch at the end of the block I labeled returnValue. Note that returnValue itself is the target of 3 unconditional branches, in is0, is1 and others, so they can't be combined.

Dead code elimination (#dce)

Once all branches to a basic block are gone, that basic block can be eliminated. After the basic block elimination above, preEnd could be removed.

Branch elimination (#be)

When it was time to generate code for those basic blocks, the compiler eliminated some branches when the terminating instruction of a basic block went to the very next instruction, but not all. This is something that modern compilers usually do consistently.

You can see these superfluous branches at the end of the blocks at 6D24 and 6D40 in the subroutine. If the block at 6D40 was eliminated, as a modern compiler would, then the block at 6D38 would have have been superfluous instead. In all of those cases, the branch goes to the very next instruction, so it can be removed.

The rewritten subroutine (#rewritten)

6CD8: ADDIU $29, $29, -40 ; Grow the stack by 40 bytes SW $31, 28($29) ; There are JALs below, so store the return address SW $4, 40($29) ; Store argument 1 to the stack SW $5, 44($29) ; Store argument 2 to the stack SW $16, 24($29) ; Preserve the caller's value of $16 SW $0, 36($29) ; Initialize stack slot 36 to 0 LH $16, 42($29) ; Grab the low 16 bits of stack slot 40 (big-endian) BEQ $16, $0, 6D10 ; If those bits are equal to 0, go to 6D10 NOP 6CFC: ADDIU $1, $0, 1 BEQ $16, $1, 6D24 ; If those bits are equal to 1, go to 6D24 NOP BEQ $0, $0, 6D30 ; Unconditional branch to 6D30 NOP 6D10: JAL 0x8024BA8C ; Subroutine call - may use our values of $4 and $5 NOP SW $2, 36($29) ; Store the subroutine's return value into stack slot 36 BEQ $0, $0, 6D30 ; Unconditional branch to 6D30 NOP 6D24: JAL 0x8024B9B8 ; Subroutine call - may use our values of $4 and $5 NOP SW $2, 36($29) ; Store the subroutine's return value into stack slot 36 6D30: LW $2, 36($29) ; Load our return value from stack slot 36 LW $31, 28($29) ; Restore our return address after JALs LW $16, 24($29) ; Restore the caller's value of $16 ADDIU $29, $29, 40 ; Return the stack to its old size JR $31 ; Return after the JAL that called us NOP

A good first step, but we can do better!

No comments :

Post a Comment