Dynamic-vsync patch for Salamander

by sd_snatcher on 06-03-2009, 23:45
Topic: Software

A week ago, sd_snatcher released the TurboFix patch for Salamander. The game's very high CPU requirements raised a lot of questions in the days that followed. While explaining why the requirements were so high, sd_snatcher came up with an idea to solve the problem. He later discovered that the same idea is used only in very recent games, such as those for the Xbox 360.

The new patch replaces the original static-vsync timing routine with a dynamic-vsync one. This frees up so much CPU power that even the standard 3.57MHz Z80 now runs the game 50% faster than it used to! In addition, the CPU requirements for playing the game at full speed most of the time have dropped a lot: even a 5.37MHz CPU is now capable of doing so. On a turbo-R, the game runs like a dream.

GameMaster compatibility has been maintained, which turned out to be one of the greatest challenges in implementing the new routine. This new release replaces the previous TurboFix routine.

Relevant link: FRS' MSX page

Comments (22)

By sd_snatcher

Prophet (2990)

07-03-2009, 00:22

Oops! The headline needs to be fixed: where it says "even a 3.57MHz CPU is now capable of doing so", it should say "even a 5.37MHz CPU is now capable of doing so", as I said.

5.37MHz is the clock of the turbo mode of the Panasonic FS-A1WSX/WX/FX. You need at least one of those to run the game at full speed.

At 3.57MHz, the game now runs at around 70% of full speed. Much better than the 47% it ran at before.

By wolf_

Ambassador_ (9734)

07-03-2009, 00:39

Right. It's sometimes hard to tell what's a typo and what's not. :P

By Manuel

Ascended (15289)

07-03-2009, 01:15

Cool, is this optimization also possible for other Konami games?

And can you explain the idea behind it to everyone? :)

By Ivan

Ascended (9061)

07-03-2009, 09:42

Games that can benefit from this patch:

  • Maze of Galious - Knightmare 2
  • Gradius 1
  • Gradius 2
  • Gradius 3
  • Parodius
  • King's Valley 2 - MSX1
  • King's Valley 2 - MSX2
  • Space Manbow (bug on introduction and on splitscreen)
  • Metal Gear 1
  • Vampire Killer
  • Quarth (bug on introduction and on splitscreen)
  • Twinbee (SCC version should have turbo)
  • Pooyan
  • Konami's Soccer
  • SD Snatcher (only on introduction)
  • Snatcher (only on introduction and on in-game cutscenes)
  • F1 Spirit 3D Special

Wow!

By ARTRAG

Enlighted (6153)

07-03-2009, 14:13

Please, any details on how the patch works?
I want to implement it!!

By sd_snatcher

Prophet (2990)

07-03-2009, 14:51

"Cool, is this optimization also possible for other Konami games?"
Quick answer: For most games, yes.

Long answer: Yes, but there are some cases where:

1) It can't be used. Metal Gear 2 is a clear example of that. Konami used a lot of nonstandard tricks in this game to try to reclaim the unused processor time.

2) It can be used, but the gain is negligible. This is the case when the Z80 at 3.57MHz can already reach around 90% of full speed most of the time. The traditional TurboFix is better for these cases, as it is a much smaller routine.

3) There is not enough free space in the cartridge. In this case the choice is straightforward: better performance with a nonstandard additional 4KB page (which will not fit on an EPROM of the original size), or a traditional TurboFix? Quick poll: which one would you choose?

"Games that can benefit from this patch:
<list snipped>"

Oops! Calm down! I didn't say that! DynamicVsync is not the holy grail of performance improvement! :)

That list contains the games that have the bug in the timing routine, but that doesn't mean they WILL get faster with DynamicVsync, for the reasons detailed above.

For example, King's Valley 2 is a case where the game will not get any faster with DynamicVsync. There, the much smaller traditional TurboFix will do the job quite well.

"Please, any details on how the patch works?"

Tsk, tsk. I see neither you nor Manuel read the included READ-ME.TXT file! Shame on you. I'll copy the relevant part here, you lazy bums! ;)

"This patch replaces the bugged-static-vsync-interrupt-handler from Konami by a new one, featuring dynamic vsync. This is a new method used by modern gaming consoles (like Xbox-360) and 3D graphic cards that enables and disables the vsync on the fly, according to the CPU usage. If the CPU is falling behind the target fps, the vsync gets disabled. Once a less cpu intense area of the game is reached and the CPU usage goes down, the vsync is enabled again."

The process is done on a frame-by-frame basis.

By Manuel

Ascended (15289)

07-03-2009, 17:34

But what is the use of enabling vsync at all? (Or is that a dumb question?)

By sd_snatcher

Prophet (2990)

07-03-2009, 17:59

"But what is the use of enabling vsync at all? (Or is that a dumb question?)"
In fact, it's a very good question.

First, it's to avoid frame tearing.

Second, on an MSX up to the 2+, the VDP vblank interrupt is the only standard timer source for keeping your game running at the same speed regardless of how fast the CPU is. Otherwise you get games like those ported from the Spectrum: the game runs as fast as your CPU can take it, because your only option for delays is unreliable empty software loops. The same applies to the BGM.

But a resolution of 60Hz is too low for many applications, such as syncing with slower devices (like the YM2413 or the FDC). This is why those slow devices malfunction when a turbo is used: all the delays for syncing with them were done in software.

On a Turbo-R, the system timer can also be used for devices that require a higher-resolution timer. Unfortunately, this system timer does not generate interrupts, so it can only be used by polling.

In the MSX's past, the RP5C01 RTC could also have been used to generate timed interrupts, but someone forgot to connect its INT line in the MSX2 design, making this function unusable.

The best source of timed interrupts for the MSX architecture was the Y8950 (a.k.a. MSX-Audio). It has two excellent high-resolution timers that generate interrupts. But (again, but) it was pushed aside as a standard by the YM2413, which doesn't offer this feature. Also, to deal efficiently with so many interrupt sources at the same time, the Z80's IM2 should have been implemented in the MSX architecture.

By ARTRAG

Enlighted (6153)

07-03-2009, 20:19

Let me see if I have understood.
A static-vsync based code would be:

main_loop:

do your stuff in ram to prepare the next frame

wait Vblank

output the data to VRAM in the ISR routine

jp main_loop

Here, if the "do your stuff" section takes too long, the ISR is called before the work is done, rendering either partial data or data from the old frame (thus already in VRAM).

The dynamic solution you adopt is instead:

main_loop:

disable Vblank source

do your stuff in ram to prepare the next frame

enable Vblank source

wait Vblank

output the data to VRAM in the ISR routine

jp main_loop

Here, when the "do your stuff" section takes too long, the ISR is not called, thus saving CPU time and avoiding the rendering of partial data or of data from the old frame (thus already in VRAM).

Have I understood, or is there more?

By [D-Tail]

Ascended (8231)

09-03-2009, 00:32

I guess it's more or less like this:

bool I_am_writing_a_new_frame = 0;

enable_interrupts() {
  _asm("ei");
  _asm("ret");
}

do_stuph() {
  ...
  I_am_writing_a_new_frame = 0;
  return;
}

wait_vsync() {
  if (I_am_writing_a_new_frame)
    return;
  else
    _asm("halt");
  return;
}

main() {
  enable_interrupts();
  I_am_writing_a_new_frame = 1;
  do_stuph();
  wait_vsync();
  main();
}

Excuse my lame C/ASM approach here ;)

By [D-Tail]

Ascended (8231)

09-03-2009, 00:33

ARTRAG: as you can see, there is more, because if the INT is missed, the next INT will occur and generate a frame lag, instead of a 'quick&dirty but necessary' frame update.

By ARTRAG

Enlighted (6153)

09-03-2009, 10:59

Just a question:

main() {
  enable_interrupts();
  I_am_writing_a_new_frame = 1;
  do_stuph();
  wait_vsync();
  main();  // Why this recursive call ?
}

Moreover, what is the code in the ISR (interrupt service routine)?
I can't really see what you are doing, as wait_vsync() is always executed after we finish do_stuph().

By Metalbrain

Expert (67)

09-03-2009, 14:30

I think it's not supposed to be a recursive call, but a jp mainloop instead.

By nikodr

Paladin (723)

09-03-2009, 16:47

In most Konami games, the game's main routine was something like

XXXX: JR XXXX

So one had to find the interrupt handler and disassemble it.

Can somebody tell me whether a

XXXX: HALT
JP XXXX (or JR XXXX)

would be faster? I always have that question in mind.

Since the VDP sync was done in the interrupt handler, is having HALT instructions better?

By PingPong

Prophet (3234)

09-03-2009, 20:23

@nikodr: the Z80 responds to an INT at the end of the instruction it is executing. So the faster choice is the instruction that takes fewer T-cycles: JP.

By nikodr

Paladin (723)

10-03-2009, 15:24

I want to change Konami's default main loop from

XXXX: JR XXXX

to

XXXX: HALT (or should I use IN A,(#99), since this checks for VDP interrupts?)
JP XXXX

However, doing this requires inserting extra bytes into the ROM. So any absolute calls to specific addresses would fail unless I adjust them by adding the number of extra bytes I inserted to the address.

The XXXX: JR XXXX is only 2 bytes, but the HALT and the jump to another address take at least 4 bytes (1 for the HALT, 3 for the new JP).

Would the game speed benefit if I did it?

By ARTRAG

Enlighted (6153)

10-03-2009, 15:55

IMHO, no. At best (and I am not sure), you would gain no more than 1 or 2 cycles per interrupt, i.e. nothing.

By Randam

Paladin (901)

10-03-2009, 20:01

Great job yet again, SD_Snatcher. Very good optimisation. As for your small poll: if the speed benefit of future dynamic patches is minimal compared to your earlier TurboFixes, then for me the regular TurboFix is the way to go. If there is not enough room in the ROM file but the game would seriously benefit from a dynamic patch, then a non-standard ROM size would have my preference. On a real MSX you need over 128KB to execute a ROM file anyway, so it doesn't really raise the spec requirement. Or wouldn't programs like romload and execrom be able to execute the new-format ROM files?

By muffie

Paladin (933)

10-03-2009, 21:11

Soooo good to see some real good stuff made in Brazil again.

By msd

Paragon (1372)

10-03-2009, 21:22

Can we expect some more patches?

By karloch

Paragon (2033)

13-03-2009, 00:28

Salamander was already a hard enough game with the slowdowns. I can't imagine finishing it (good ending) at full speed, it must be crazy xD

By gdx

Prophet (2663)

22-11-2017, 10:27

The music loses voices when I press F1 to pause with these patches.