Even on MIPS, with its plentiful registers, fast memory access is essential.
To make address resolution simpler for the processor, MIPS I only defines one addressing mode: register plus signed 16-bit offset. This allows it to resolve addresses in one cycle, no more, no less.
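As a rough illustration of that single addressing mode, the following Python sketch computes an effective address, assuming 32-bit registers. The names `sign_extend16` and `effective_address` are invented for this example and do not come from any MIPS toolchain.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """Register plus signed 16-bit offset, wrapped to 32 bits (assumed register width)."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(effective_address(0x80010000, 0x0004)))  # 0x80010004
print(hex(effective_address(0x80010000, 0xFFFC)))  # 0x8000fffc
```

An offset of 0xFFFC is read as -4, so the second address lies below the base register's value.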
Memory load requests first go to the data cache, which may service the load immediately. If the data cache doesn't hold the requested data, one whole cache line is read from memory using a memory burst command.
Memory store requests first go to the store buffer. The cache line surrounding the write has to be loaded in order to preserve the surrounding bytes, but the store buffer allows more operations to be carried out while waiting for that cache line to arrive. The data is then written to memory only when the data cache line has to be evicted for another memory access, again using a memory burst command.
This setup can be bypassed if needed by using uncached memory access. However, most games, and most code within those games, will be using cached accesses. The data cache is very efficient for these accesses.
You'll want to read MIPS part 1: Registers and calling convention first.
The load delay slot (#delay)
MIPS I specified that the instruction following a load from memory could not use the register into which the data was loaded.
Consider the following code, illegal in MIPS I:
LW $8, 0($4) ; $8 <- MemWord($4 + 0)
ADD $9, $8, $5 ; reads $8 in the load delay slot: illegal in MIPS I
(See also the syntax for LW.)
This code loads from memory at the address contained in $4 plus 0, and requests that the loaded value be stored into $8. The following instruction tries to read from $8, which does not yet hold that value.
Code in Sony PlayStation games, for example, must be written with an instruction in the middle (or NOP if no meaningful work can be scheduled):
LW $8, 0($4) ; $8 <- MemWord($4 + 0)
NOP ; fills the load delay slot
ADD $9, $8, $5 ; $8 now holds the loaded value
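The rule can be checked mechanically. Below is a minimal Python sketch of such a check, assuming each instruction is described by a hypothetical (mnemonic, destination, sources) tuple. This representation and the function name `find_load_delay_hazards` are made up for illustration.

```python
LOADS = {"LB", "LBU", "LH", "LHU", "LW"}


def find_load_delay_hazards(program):
    """Return the indexes of instructions that read a register
    loaded by the instruction immediately before them."""
    hazards = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        if prev_op in LOADS and prev_dest in sources:
            hazards.append(i)
    return hazards


# The two listings above, in the hypothetical tuple form.
bad = [("LW", "$8", ["$4"]), ("ADD", "$9", ["$8", "$5"])]
good = [("LW", "$8", ["$4"]), ("NOP", None, []), ("ADD", "$9", ["$8", "$5"])]

print(find_load_delay_hazards(bad))   # [1]
print(find_load_delay_hazards(good))  # []
```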
On the Nintendo 64: the memory management unit (#mmu)
On the Nintendo 64, the resolved address may be a virtual address, which then needs to be looked up in the MMU's TLB. This involves another cycle of latency.
Additionally, if the TLB does not map the virtual address to a physical address in RAM, a TLB Refill exception is raised, and when the exception handler returns (after loading up a TLB entry to map the virtual address), the instruction is retried. This takes time, depending on the complexity of the exception handler and whether it needs to be brought into the instruction cache.
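The refill-and-retry sequence can be pictured with a deliberately simplified Python model. The fixed 4096-byte page size, the dictionary-based TLB and page table, and all names here are assumptions made for the sketch; they do not describe the actual Nintendo 64 hardware or any game's exception handler.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class TLBRefill(Exception):
    """Stands in for the TLB Refill exception."""


def translate(tlb, vaddr):
    """Map a virtual address to a physical one, or raise TLBRefill on a miss."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in tlb:
        raise TLBRefill(page)
    return tlb[page] * PAGE_SIZE + offset


def load_address(tlb, page_table, vaddr):
    """Translate vaddr, refilling the TLB and retrying once on a miss."""
    try:
        return translate(tlb, vaddr)
    except TLBRefill as miss:
        page = miss.args[0]
        tlb[page] = page_table[page]  # the "exception handler" loads a TLB entry
        return translate(tlb, vaddr)  # the instruction is retried


tlb = {}
page_table = {0x10: 0x200}
print(hex(load_address(tlb, page_table, 0x10123)))  # 0x200123
print(tlb)  # {16: 512}
```

The first access pays for the handler; later accesses to the same page find the entry already in `tlb`.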
Due to these added latencies, many Nintendo 64 games directly use physical memory. Those that use virtual memory are distinctly slower, for example GoldenEye 007 and Conker's Bad Fur Day.
Load and store instructions (#load, #store)
The mnemonics and operands for load and store instructions follow a pattern.
- Load instructions start with L. Store instructions start with S.
- The width of the access follows: B for bytes, H (half-word) for 16 bits, W (word) for 32 bits.
- In load instructions, an optional U follows. If it's present, the loaded value is zero-extended (unsigned). Otherwise, the loaded value is sign-extended.
- The first operand is the register to load memory into, or the register to store memory from.
- The second operand is a memory reference: Imm16(Reg). The offset, interpreted as a signed 16-bit number, is added to (or subtracted from) the base address register.
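The naming pattern is regular enough to decode mechanically. Here is a small Python sketch that does so; `decode_mnemonic` is a made-up helper, and it only covers the plain load and store mnemonics described in this article.

```python
WIDTHS = {"B": 8, "H": 16, "W": 32}
KINDS = {"L": "load", "S": "store"}


def decode_mnemonic(mnemonic):
    """Split a load/store mnemonic into (kind, width in bits, extension)."""
    if len(mnemonic) < 2 or mnemonic[0] not in KINDS or mnemonic[1] not in WIDTHS:
        raise ValueError(f"not a load/store mnemonic: {mnemonic}")
    kind = KINDS[mnemonic[0]]
    width = WIDTHS[mnemonic[1]]
    suffix = mnemonic[2:]
    if suffix not in ("", "U") or (suffix == "U" and kind == "store"):
        raise ValueError(f"not a load/store mnemonic: {mnemonic}")
    if kind == "store":
        extension = None  # stores do not extend anything
    else:
        extension = "zero" if suffix == "U" else "sign"
    return kind, width, extension


print(decode_mnemonic("LBU"))  # ('load', 8, 'zero')
print(decode_mnemonic("LH"))   # ('load', 16, 'sign')
print(decode_mnemonic("SW"))   # ('store', 32, None)
```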
LB: Load signed byte (#lb)
LB Rt, Imm16(Rs) ; Rt <- (int8_t) MemByte(Rs + (int16_t) Imm16)
Loads one byte from the address formed by the memory reference and sign-extends it, storing the result into register Rt.
LBU: Load unsigned byte (#lbu)
LBU Rt, Imm16(Rs) ; Rt <- (uint8_t) MemByte(Rs + (int16_t) Imm16)
Loads one byte from the address formed by the memory reference and zero-extends it, storing the result into register Rt.
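The difference between LB and LBU only shows when bit 7 of the loaded byte is set. A minimal Python model follows, assuming 32-bit registers and a `bytes` object standing in for memory; the function names `lb` and `lbu` are just labels for this sketch.

```python
def lb(memory, address):
    """Model LB: read one byte and sign-extend it to 32 bits."""
    byte = memory[address]
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def lbu(memory, address):
    """Model LBU: read one byte and zero-extend it to 32 bits."""
    return memory[address]


memory = bytes([0x7F, 0x80])
print(hex(lb(memory, 0)), hex(lbu(memory, 0)))  # 0x7f 0x7f
print(hex(lb(memory, 1)), hex(lbu(memory, 1)))  # 0xffffff80 0x80
```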
LH: Load signed half-word (#lh)
LH Rt, Imm16(Rs) ; Rt <- (int16_t) MemHalf(Rs + (int16_t) Imm16)
Loads one half-word from the address formed by the memory reference and sign-extends it, storing the result into register Rt.
If the resolved address is not half-word-aligned (i.e. bit 0 is set), an Address Error exception is raised and no result is stored into register Rt.
Despite the alignment requirement and size of the load, the memory reference is made by adding Imm16 bytes to the value of register Rs.
LHU: Load unsigned half-word (#lhu)
LHU Rt, Imm16(Rs) ; Rt <- (uint16_t) MemHalf(Rs + (int16_t) Imm16)
Loads one half-word from the address formed by the memory reference and zero-extends it, storing the result into register Rt.
If the resolved address is not half-word-aligned (i.e. bit 0 is set), an Address Error exception is raised and no result is stored into register Rt.
Despite the alignment requirement and size of the load, the memory reference is made by adding Imm16 bytes to the value of register Rs.
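A short Python model of LH and LHU may make the alignment rule and the byte-granular offset more concrete. It assumes 32-bit registers and little-endian byte order by default (MIPS processors can be configured either way, so `byteorder` is a parameter). `AddressError` and `load_half` are names invented for this sketch.

```python
class AddressError(Exception):
    """Stands in for the MIPS Address Error exception."""


def load_half(memory, base, imm16, signed, byteorder="little"):
    """Model LH (signed=True) or LHU (signed=False)."""
    imm16 &= 0xFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16  # offset counts bytes
    address = (base + offset) & 0xFFFFFFFF
    if address & 1:
        raise AddressError(hex(address))  # nothing is loaded
    value = int.from_bytes(memory[address:address + 2], byteorder)
    if signed and value & 0x8000:
        value |= 0xFFFF0000
    return value


memory = bytes([0x34, 0x12, 0xFE, 0xFF])
print(hex(load_half(memory, 0, 2, signed=True)))   # 0xfffffffe
print(hex(load_half(memory, 0, 2, signed=False)))  # 0xfffe
print(hex(load_half(memory, 1, 1, signed=False)))  # 0xfffe
```

The last call uses an odd base and an odd offset; only the resolved address (1 + 1 = 2) has to be aligned.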
LW: Load signed word (#lw)
LW Rt, Imm16(Rs) ; Rt <- (int32_t) MemWord(Rs + (int16_t) Imm16)
Loads one word from the address formed by the memory reference, storing the result into register Rt. On 64-bit MIPS processors, this load will be sign-extended by copying bit 31 to bits 32..63.
If the resolved address is not word-aligned (i.e. any of bits 0..1 is set), an Address Error exception is raised and no result is stored into register Rt.
Despite the alignment requirement and size of the load, the memory reference is made by adding Imm16 bytes to the value of register Rs.
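The 64-bit behavior of LW can be sketched in Python as follows. The model assumes big-endian byte order by default, takes an already-resolved address, and uses a plain `ValueError` in place of the Address Error exception; `lw64` is a name made up for this example.

```python
def lw64(memory, address, byteorder="big"):
    """Model LW on a 64-bit MIPS processor:
    load 32 bits, then copy bit 31 into bits 32..63."""
    if address & 3:
        raise ValueError(f"Address Error: {address:#x} is not word-aligned")
    word = int.from_bytes(memory[address:address + 4], byteorder)
    if word & 0x80000000:
        word |= 0xFFFFFFFF00000000
    return word


memory = bytes.fromhex("7fffffff80000000")
print(hex(lw64(memory, 0)))  # 0x7fffffff
print(hex(lw64(memory, 4)))  # 0xffffffff80000000
```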
SB: Store byte (#sb)
SB Rt, Imm16(Rs) ; MemByte(Rs + (int16_t) Imm16) <- Rt
Stores the low byte of register Rt to the address formed by the memory reference.
SH: Store half-word (#sh)
SH Rt, Imm16(Rs) ; MemHalf(Rs + (int16_t) Imm16) <- Rt
Stores the low half-word of register Rt to the address formed by the memory reference.
If the resolved address is not half-word-aligned (i.e. bit 0 is set), an Address Error exception is raised and no memory is written.
Despite the alignment requirement and size of the store, the memory reference is made by adding Imm16 bytes to the value of register Rs.
SW: Store word (#sw)
SW Rt, Imm16(Rs) ; MemWord(Rs + (int16_t) Imm16) <- Rt
Stores the value of register Rt to the address formed by the memory reference. On 64-bit MIPS processors, the stored value is actually the low word of the register.
If the resolved address is not word-aligned (i.e. any of bits 0..1 is set), an Address Error exception is raised and no memory is written.
Despite the alignment requirement and size of the store, the memory reference is made by adding Imm16 bytes to the value of register Rs.
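The three store instructions differ only in how much of the register they write. A rough Python model, assuming little-endian byte order by default and a `bytearray` standing in for memory, might look like this. The function `store` and its `width` parameter (in bytes) are invented for the sketch, and a `ValueError` stands in for the Address Error exception.

```python
def store(memory, address, value, width, byteorder="little"):
    """Model SB (width=1), SH (width=2) or SW (width=4):
    write the low `width` bytes of value at an already-resolved address."""
    if address % width:
        raise ValueError(f"Address Error: {address:#x} is not aligned to {width} bytes")
    low = value & ((1 << (8 * width)) - 1)  # keep only the low bytes of the register
    memory[address:address + width] = low.to_bytes(width, byteorder)


memory = bytearray(8)
store(memory, 0, 0x12345678, 4)  # SW writes 0x12345678
store(memory, 4, 0x12345678, 2)  # SH writes 0x5678
store(memory, 6, 0x12345678, 1)  # SB writes 0x78
print(memory.hex())  # 7856341278567800
```

The alignment check runs before the write, so a misaligned store leaves `memory` untouched.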