Sunday, 29 January 2017

Frame Rates and I76 Nitro

A Problem with Old Games on Modern Systems

One of the persistent problems the I76 & Nitro games have had on modern systems is the frame rate.
The game uses the graphics frame rate to control physics and timing. Unfortunately, on a modern system the frame rate is much higher than the original authors expected, and the assumptions of the engine break. This makes for some odd behaviours, and impossible jumps.
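
To make the failure mode concrete, here is a minimal Python sketch of frame-coupled movement. This is not the game's code: the per-frame step and the function name are invented for illustration, and it simply assumes an engine tuned for roughly 30 fps.

```python
# Hypothetical illustration only: an engine that advances the world by a
# fixed amount per rendered frame, tuned by its authors for ~30 fps.
STEP_PER_FRAME = 1.0  # assumed distance units moved per frame


def distance_after_one_second(frames_per_second):
    """Distance covered in one second of wall-clock time when every
    rendered frame advances the simulation by the same fixed step."""
    position = 0.0
    for _ in range(frames_per_second):
        position += STEP_PER_FRAME
    return position


print(distance_after_one_second(30))   # 30.0 at the intended rate
print(distance_after_one_second(300))  # 300.0, ten times too fast
```

Under that assumption, anything tuned per frame (speeds, jump distances, timers) scales with the frame rate.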

However the I76 community have solved this problem by manually modifying the Nitro binaries to add a frame rate limiter.

Wait, they've done what?!

The Files

There are a few different patch files available on the interstate 76 site.

  • Nitro Patch I
  • Nitro Patch II
  • Nitro Patch III
  • Nitro 30fps Patch

Patches I - III provide updated executables and libraries; they're replacements for the versions shipped with the game.

The 30fps patch file is slightly different; it contains a single replacement nitro executable provided by the developer known as BofH, who is also responsible for the Patch III release.

The executables in Patches II & III are identical, and the 30fps version of the executable is based on these.

The Change

There are a couple of changes in the binary, but the interesting one for the frame rate touches the main render loop and turns some zero data at the end of the file into a block of code, effectively implementing a binary trampoline. This change by BofH, welded into the existing binary, is (IMO) incredibly clever.

Looking at the decompiled binary in Snowman, the giveaway for locating the point where a frame is complete is the frame dump function. This is located at 0x488ed0 and is called from 0x433069. It's easy to spot by its reference to the string literal "SCRDUMP.BMP". The call is part of a monster of a function starting at 0x431760, which includes the main render loop.

In this section there's a change at 0x4327c8, where we redirect to the trampoline.

The original code has a compare followed by a conditional jump:

4327c8: cmp dword [0x4f30cc], 0x5
4327cf: jnz dword 0x43311d

The new code is different - it has an unconditional jump to 0x4c02a0 replacing the compare, padded with two nops so that the jnz stays at the same address.

4327c8: jmp dword 0x4c02a0
4327cd: nop
4327ce: nop
4327cf: jnz dword 0x43311d
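
The byte counts explain the two nops. The sketch below builds the encodings implied by the two listings, using the standard x86 opcodes (83 /7 for a cmp with an 8-bit immediate, E9 for a near jmp, 90 for nop). The bytes are derived from the disassembly rather than read out of nitro.exe, so treat them as an assumption to check against the file.

```python
import struct

PATCH_ADDR = 0x4327C8   # start of the replaced cmp
TRAMPOLINE = 0x4C02A0   # jump target in the former zero padding

# cmp dword [0x4f30cc], 0x5 -> 83 3D <addr32> <imm8>: 7 bytes
original = bytes([0x83, 0x3D]) + struct.pack("<I", 0x4F30CC) + bytes([0x05])

# jmp rel32 -> E9 <rel32>; the displacement is measured from the address
# of the instruction that follows the 5-byte jmp.
rel32 = TRAMPOLINE - (PATCH_ADDR + 5)
patched = bytes([0xE9]) + struct.pack("<i", rel32) + bytes([0x90, 0x90])

print(original.hex(" "))  # 83 3d cc 30 4f 00 05
print(patched.hex(" "))   # e9 d3 da 08 00 90 90
```

Both sequences are 7 bytes long, which is why the jnz at 0x4327cf does not move.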

So, what's at 0x4c02a0? In the original nitro executable Snowman shows this region as a run of "add [eax], al" instructions, which is simply how a chunk of zero bytes disassembles.

In the new nitro executable this has been replaced with something more interesting. Snowman actually messes up the opcodes slightly - it should look like this though...

4c02a0: push ebx
4c02a1: push ecx
4c02a2: push edx
4c02a3: call dword [0x4c111c]
4c02a9: mov ebx, [0x4cc4f8]
4c02af: test ebx, ebx
4c02b1: jnz 0x4c02b5
4c02b3: jmp 0x4c02de
4c02b5: mov edx, eax
4c02b7: sub edx, ebx
4c02b9: mov ecx, [0x4c02f4]
4c02bf: sub ecx, edx
4c02c1: mov ebx, [0x4cc4f4]
4c02c7: add ebx, ecx
4c02c9: test ebx, ebx
4c02cb: js 0x4c0300
4c02cd: mov [0x4cc4f4], ebx
4c02d3: push ebx
4c02d4: mov ebx, eax
4c02d6: call dword [0x675139]
4c02dc: mov eax, ebx
4c02de: mov [0x4cc4f8], eax
4c02e3: cmp dword [0x4f30cc], 0x5
4c02ea: pop edx
4c02eb: pop ecx
4c02ec: pop ebx
4c02ed: jmp dword 0x4327cf
4c02f2: add [eax], al
4c02f4: and [eax], eax

The short version

This code makes two external calls, one to GetTickCount() and one to Sleep() - it's basically a time check and delay calculation which inserts a delay between frame renders.

The value at 0x4c02f4 is actually a data value of 33, not code - this sets the target frame time of 33 ms, or ~30fps. Changing this value winds the delay target up or down.
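
The last line of the listing is this data being disassembled as code. As a quick check, assuming the four bytes at 0x4c02f4 are 21 00 00 00 (the listing only shows the first two, 21 00, which is the encoding of "and [eax], eax"), reading them as a little-endian dword gives the delay:

```python
import struct

# Assumed bytes at 0x4c02f4; only the leading 21 00 is visible in the listing.
raw = bytes([0x21, 0x00, 0x00, 0x00])

delay_ms = struct.unpack("<I", raw)[0]
print(delay_ms)                    # 33
print(round(1000 / delay_ms, 1))   # 30.3 frames per second
```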

The tail end of this code restores the context and duplicates the test that this modification removed before returning to the jnz, so the subsequent graphics loop processing is otherwise unchanged.

The long version

Let's break this down:

4c02a0: push ebx
4c02a1: push ecx
4c02a2: push edx

Entry context save

4c02a3: call dword [0x4c111c]

This is a call to GetTickCount() - so eax now holds the number of milliseconds since the system started.

4c02a9: mov ebx, [0x4cc4f8]
4c02af: test ebx, ebx
4c02b1: jnz 0x4c02b5
4c02b3: jmp 0x4c02de

This loads the value at 0x4cc4f8 and skips the next part of the processing if it's zero. Tracing through the code we can see that the latest tick count is stored back into this location at the end of the routine, so it's a simple test for a previous tick value; if there isn't one yet, the frame delay calculation is pointless.

4c02b5: mov edx, eax
4c02b7: sub edx, ebx

And this subtracts the previous tick value from the current one - i.e. it tells us how many ms have elapsed between renders.
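
One detail worth noting: GetTickCount() returns a 32-bit value that wraps after about 49.7 days, and sub works modulo 2^32, so the difference stays correct across a wrap. A small Python model of that subtraction (the helper name is made up for illustration):

```python
MASK32 = 0xFFFFFFFF


def elapsed_ms(now, previous):
    """Model of `mov edx, eax` / `sub edx, ebx` on 32-bit registers."""
    return (now - previous) & MASK32


print(elapsed_ms(1_000_033, 1_000_000))    # 33
print(elapsed_ms(0x00000010, 0xFFFFFFF0))  # 32, across the wrap
```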

4c02b9: mov ecx, [0x4c02f4]
4c02bf: sub ecx, edx
4c02c1: mov ebx, [0x4cc4f4]
4c02c7: add ebx, ecx

Looking ahead, there are two values here - 0x4cc4f4 holds the "previous sleep time", and 0x4c02f4 is a constant value, 33. So the calculation is (ms throughout): "new sleep" = "previous sleep" + (33 - "time difference").
So, ideally "previous sleep" will stabilise at the delay needed to hit the target frame time for a given frame render time: think of it as "new delay = delay + error". Carrying on:

4c02c9: test ebx, ebx
4c02cb: js 0x4c0300

This tests for a delay below zero, in which case it jumps to this code fragment:

...
4c0300: mov ebx, 0x0
4c0305: jmp 0x4c02cd

This sets a zero delay value and jumps straight back. In this case we want to render as fast as possible because we're under the target frame rate.

4c02cd: mov [0x4cc4f4], ebx

Stores the requested delay value in 0x4cc4f4 for the next loop.

4c02d3: push ebx
4c02d4: mov ebx, eax

Pushes the delay as the argument for Sleep() and stashes the tick count in ebx, because eax won't survive the call.

4c02d6: call dword [0x675139]

This is a call to Sleep(), and so we delay for a time that limits the outgoing frame rate.

4c02dc: mov eax, ebx
4c02de: mov [0x4cc4f8], eax

Restores the tick count and saves it to 0x4cc4f8.

4c02e3: cmp dword [0x4f30cc], 0x5

This is the test we replaced in the original binary, repeated here so the flags are set for the jnz.

4c02ea: pop edx
4c02eb: pop ecx
4c02ec: pop ebx
4c02ed: jmp dword 0x4327cf

Restore context, and return to the main loop at the jnz. Neither pop nor jmp touches the flags, so the jnz sees the result of the cmp exactly as it did in the original code.
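
Putting the walkthrough together, here is a rough Python model of the routine run against a simulated clock. It is a sketch of my reading of the disassembly, not BofH's source: the function names are invented, Sleep() is assumed to sleep for exactly the requested time, and each frame is assumed to take a fixed time to render.

```python
TARGET_MS = 33  # the constant stored at 0x4c02f4


def limiter_step(now, last_tick, prev_sleep, target_ms=TARGET_MS):
    """One pass through the trampoline.

    Returns (sleep_ms, new_last_tick, new_prev_sleep)."""
    if last_tick == 0:
        # No previous tick yet: skip the calculation (and the Sleep call)
        # and just record the tick count.
        return 0, now, prev_sleep
    elapsed = now - last_tick
    sleep_ms = prev_sleep + (target_ms - elapsed)
    if sleep_ms < 0:
        sleep_ms = 0  # the fix-up at 0x4c0300
    # The stored tick is the one taken before sleeping, so the next
    # elapsed value includes this sleep.
    return sleep_ms, now, sleep_ms


def simulate(render_ms, frames, start_tick=1000):
    """Sleep time requested on each frame for a fixed render time."""
    clock, last_tick, prev_sleep = start_tick, 0, 0
    sleeps = []
    for _ in range(frames):
        clock += render_ms  # the game renders a frame
        sleep_ms, last_tick, prev_sleep = limiter_step(clock, last_tick, prev_sleep)
        clock += sleep_ms   # assumed exact Sleep()
        sleeps.append(sleep_ms)
    return sleeps


print(simulate(5, 4))   # [0, 28, 28, 28]
print(simulate(40, 4))  # [0, 0, 0, 0]
```

With a 5 ms render the delay settles at 28 ms, giving a 33 ms frame. With a 40 ms render the delay is clamped to zero and the limiter stays out of the way.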

Hacking up the binary some more

The magic number is the data value of "33". We can just change this directly in the new nitro.exe to modify the frame rate up or down.

So, opening nitro.exe in a hex editor and going to offset 784116 (0xBF6F4), we can tune the value to verify we're understanding this correctly.

  • 0x21 => 33 ms, or ~30 fps
  • 0x28 => 40 ms, or 25 fps
  • 0x64 => 100 ms, or 10 fps
  • 0xFF => 255 ms, or ~4 fps
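
If you'd rather script the edit than use a hex editor, something like the following would do it. It is a minimal sketch: the function name is hypothetical, it assumes the offset above is right for your copy of the 30fps nitro.exe, and it only touches the single low byte, so delays are limited to 0-255 ms.

```python
from pathlib import Path

DELAY_OFFSET = 784116  # file offset quoted above for the 30fps nitro.exe


def set_frame_delay(exe_path, delay_ms, expected_old=None):
    """Overwrite the single delay byte and return the value it replaced."""
    if not 0 <= delay_ms <= 0xFF:
        raise ValueError("delay must fit in one byte (0-255 ms)")
    data = bytearray(Path(exe_path).read_bytes())
    old = data[DELAY_OFFSET]
    if expected_old is not None and old != expected_old:
        # Refuse to patch a file that doesn't look like the 30fps build.
        raise ValueError(f"unexpected byte {old:#04x} at offset {DELAY_OFFSET}")
    data[DELAY_OFFSET] = delay_ms
    Path(exe_path).write_bytes(data)
    return old
```

For example, set_frame_delay("nitro.exe", 0x28, expected_old=0x21) would switch a copy of the executable from 33 ms to 40 ms. Work on a backup copy.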

Annoyingly this doesn't appear to affect the -recordLoads debug values, but you can't have everything. At 10 fps and below the lower frame rate is very visible.