YM2413 wait-state emulation in openMSX 17

By Bengalack

Paladin (802)

16-01-2022, 17:09

To avoid cluttering the discussion in norakomi's thread, I'm starting a new one, based on a comment there:
https://www.msx.org/forum/msx-talk/development/wait-15-t-sta...

Grauw wrote:

Really what matters most is that people count the same way when they specify wait times. In my experience people and documents always specify the time between the start of /IORQ signals. For example the 12-cycle wait that the YM2413 documentation requires after address writes, it means that on a standard MSX Z80 you can OUT address and data back-to-back.

Thank you once again, Grauw! I was not aware of this. With this newly acquired knowledge, I went straight back to my music player and applied it to both address and data writes. I was able to make a tight and perfect 84-cycle loop. The great thing is that it works perfectly on my (physical) A1-WSX. The sad news is that openmsx introduces "static" or a little noise :( I *can* add two extra cycles (with jp => jr), and then the music sounds good. But as the start of many of my tunes has more than 50 register writes, I am losing over 100 cycles on this (in an otherwise tight and fully booked frame "in game" :-) )

I wonder if this is an inaccuracy in openmsx (missing the timing by one cycle, perhaps?), or if it just reflects that there are other YM2413 implementations that are a bit slower, which I am not aware of (as I just realized is the situation among the different VDPs out there). Does anyone know?

By Grauw

Ascended (10820)

16-01-2022, 17:37

OpenMSX does not emulate the timing limitations of FM chips. You can only reliably test this on real hardware for now. See issue #935, issue #936, issue #937, issue #938.

Actually, for the YM2413 it comes close, since openMSX incorporated the Nuked-OPLL sound core. However, for various reasons the core’s accurate emulation of too-fast access timing is currently disabled.

So I’m not sure how this “static” would occur, but maybe the workaround is introducing it?

There are no slower YM2413s, they all work the same. Are you sure you counted the 84 cycles correctly? Either way, a 100 cycle loss worst case seems not too bad on a 59736 cycle frame budget (0.17% of frame time), could be much worse for a 50x factor.
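As a quick check of the 0.17% figure, here is a minimal Python sketch of the arithmetic. The 59736-cycle frame budget and the 100-cycle loss are taken from the post; the function name `frame_share` is only an illustrative choice.

```python
# Frame budget quoted in the post above.
FRAME_CYCLES = 59736


def frame_share(lost_cycles, frame_cycles=FRAME_CYCLES):
    """Return the lost cycles as a percentage of one frame."""
    return 100.0 * lost_cycles / frame_cycles


if __name__ == "__main__":
    # 100 cycles lost in the worst case, as discussed in the thread.
    print(f"{frame_share(100):.2f}% of the frame")
```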

Btw the YM2413 is not very sensitive to too fast access by small amounts of cycles, e.g. if you access it only a few cycles too fast it will result in only very occasional dropped notes or wrong frequencies, etc. It breaks down more frequently as the access time gets more radically out of range. But this makes it more difficult to reproduce small timing issues on real hardware.

By Bengalack

Paladin (802)

16-01-2022, 21:24

Grauw wrote:

OpenMSX does not emulate the timing limitations of FM chips. You can only reliably test this on real hardware for now. See issue #935, issue #936, issue #937, issue #938.

Actually, for the YM2413 it comes close, since openMSX incorporated the Nuked-OPLL sound core. However, for various reasons the core’s accurate emulation of too-fast access timing is currently disabled.

Thanks again, Grauw! Very interesting. But, oh, that bug has been there since 2015. It is probably unlikely to be fixed in the near future.

Grauw wrote:

So I’m not sure how this “static” would occur, but maybe the workaround is introducing it?

I often struggle to get these C++ projects to build, so I haven't been able to compile a version without that commit. I tried openmsx 16, but the code is present there as well.

Grauw wrote:

There are no slower YM2413s, they all work the same. Are you sure you counted the 84 cycles correctly?

I'm now counting from the start of out (DATA_PORT) to the start of the next out (ADDRESS_PORT), and I include the cost of the out itself (12 or 14 cycles, depending on which one you use) in the wait cycles. I use the Visual Studio Code extension "Z80 Assembly Meter" by theNestruo in "MSX mode". I know the cycle count per instruction pretty well by heart too now :-) So I believe this is the correct way.
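A minimal sketch of this way of counting, written in Python rather than Z80 so it can be run anywhere. The 12- and 14-cycle OUT costs come from the post. The jp and jr costs (11 and 13 cycles on a standard MSX, including the M1 wait state) and the names `MSX_CYCLES` and `gap_between_outs` are assumptions added for illustration. The start of the instruction is used here as a stand-in for the start of /IORQ.

```python
# Cycle costs on a standard MSX Z80 (T-states plus M1 wait states).
MSX_CYCLES = {
    "out (n),a": 12,  # from the post
    "out (c),r": 14,  # from the post
    "jp nn": 11,      # assumption: 10 T-states + 1 M1 wait state
    "jr e": 13,       # assumption: 12 T-states + 1 M1 wait state
}


def gap_between_outs(first_out, between):
    """Cycles from the start of one OUT to the start of the next OUT.

    The cost of the first OUT itself is included in the gap, followed by
    every instruction executed before the next OUT begins.
    """
    return MSX_CYCLES[first_out] + sum(MSX_CYCLES[name] for name in between)


if __name__ == "__main__":
    # Back-to-back OUTs already satisfy a 12-cycle wait.
    print(gap_between_outs("out (n),a", []))
    # Swapping jp for jr in the loop adds two cycles to the gap.
    print(gap_between_outs("out (c),r", ["jr e"])
          - gap_between_outs("out (c),r", ["jp nn"]))
```

Counted this way, two consecutive `out (n),a` instructions are already 12 cycles apart, which matches the remark that address and data can be written back-to-back.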

Grauw wrote:

Either way, a 100 cycle loss worst case seems not too bad on a 59736 cycle frame budget (0.17% of frame time), could be much worse for a 50x factor.

Agreed, but I'm always striving for minimum cycles. The game I'm doing at the moment may seem simple and cozy, but what's going on under the hood has been optimized through many iterations. I think I have a possible worst-case scene with 41 sprites on the screen at once (6 of them are enemies with 2 sprites each), plus fullscreen scroll as well as animated objects, sfx and music++, so it adds up :) I'm not going to accept frame drops. But yeah, I can live with this, it used to be worse than now.

In fact, I had another look at the loop, and I've been able to get it to run at 85 cycles. openmsx handles that without static. That means only 50 missing cycles, which isn't bad! I'm not sure yet how webmsx tackles this, though.
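The overhead figures in this thread follow from a simple multiplication, sketched below in Python. The 84-cycle minimum and the 50 register writes come from the posts; `extra_cycles` is a hypothetical helper name, and 50 is used as a round number for "more than 50 writes".

```python
# Minimum loop length discussed in the thread.
MIN_LOOP_CYCLES = 84


def extra_cycles(loop_cycles, writes, minimum=MIN_LOOP_CYCLES):
    """Cycles spent beyond the minimum loop length, summed over all writes."""
    return max(0, loop_cycles - minimum) * writes


if __name__ == "__main__":
    # 84 = tight loop, 85 = current loop, 86 = loop with jr instead of jp.
    for loop in (84, 85, 86):
        print(loop, extra_cycles(loop, writes=50))
```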

Grauw wrote:

Btw the YM2413 is not very sensitive to too fast access by small amounts of cycles, e.g. if you access it only a few cycles too fast it will result in only very occasional dropped notes or wrong frequencies, etc. It breaks down more frequently as the access time gets more radically out of range. But this makes it more difficult to reproduce small timing issues on real hardware.

Thanks!

By Grauw

Ascended (10820)

16-01-2022, 21:43

Bengalack wrote:

Thanks again, Grauw! Very interesting. But, oh, that bug has been there since 2015. It is probably unlikely to be fixed in the near future.

Before the blocker was that the exact timing behaviour of the YM2413 was unknown. Thanks to the Nuked-OPLL core that is now resolved, and the remaining hurdle is an implementation detail of openMSX. So we’re much closer now.

Bengalack wrote:

In fact, I had another look at the loop, and I've been able to get it to run at 85 cycles. openmsx handles that without static. That means only 50 missing cycles, which isn't bad! I'm not sure yet how webmsx tackles this, though.

Nice.

WebMSX for sure won’t emulate timing restrictions, not on FM chips and probably not even on the VDP.

By ARTRAG

Enlighted (6976)

10-04-2022, 22:09

WebMSX doesn't emulate VDP timings for commands. All commands are executed instantaneously.

By Parn

Paladin (854)

11-04-2022, 13:45

I just wanted to remark that I experienced sound corruption using Nuked-OPLL with an overclocked emulated CPU. The corruption seemed more severe if I sped up the overclocked CPU, so at least some of the OPLL sensitivity to fast access is emulated.

By Manuel

Ascended (19677)

15-05-2022, 15:27

When the emulation speed setting is 100%, the OPLL access time restrictions are now emulated in the latest openMSX development build. @Bengalack please try it out.

By Bengalack

Paladin (802)

15-05-2022, 21:57

Wohoo :) At first I didn't think enforcing access restrictions would help my case, as I thought that I didn't violate them. My problem was that jitter/static occurred when I used the tightest loop with a minimum of (the allowed) 84 cycles. I had to use 85.

But now, with the newest build, openmsx-17.0-360-g211efaef2-windows-vc-x64-bin-msi, the music sounds perfect using the tight 84-cycle loop :)

This is great!

But ... I also tested inner loops with a minimum of 81 cycles and 78 cycles, and the music sounds just as good. How are the timing restrictions supposed to sound when applied? Maybe I'm doing something wrong?

By Grauw

Ascended (10820)

15-05-2022, 22:58

The 84 cycles are the worst case. So if you wait 81 or 78 cycles it will sound OK for the most part, but listen long and carefully enough and there will be some wrong notes. As you wait less, the odds of wrong notes increase until it is pretty much guaranteed that you will hear wrong notes all the time.

The OPLL continuously loops through all operators independently of the CPU. It does this one operator every 4 cycles, there are 18 operators so one complete loop takes 72 cycles in total.

When you write a register value, it will store it in a buffer, which takes 12 cycles. Then it is queued until the chip’s loop reaches the operator that is affected by your write, and then it applies it. Worst case this can take 72 cycles, for a total of 84.

So as you can see, if you don’t wait long enough, whether the write goes through correctly depends on where in the operator cycle the OPLL is. In other words, on chance.
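The description above can be turned into a toy model, sketched here in Python. The numbers (18 operators, 4 cycles each, 12 cycles to buffer a write) come from the explanation. Everything else is an assumption made for illustration: that the target operator is equally likely to be 1 to 18 slots away when the write is buffered, and the names `worst_case_wait` and `chance_write_lands`. It is not how openMSX or Nuked-OPLL implement this, and it does not predict how often a wrong note is actually audible.

```python
from fractions import Fraction

OPERATORS = 18            # operators visited by the chip's loop
CYCLES_PER_OPERATOR = 4   # cycles spent on each operator
BUFFER_CYCLES = 12        # cycles needed to buffer a register write
LOOP_CYCLES = OPERATORS * CYCLES_PER_OPERATOR  # one full loop: 72 cycles


def worst_case_wait():
    """Buffering time plus one complete operator loop."""
    return BUFFER_CYCLES + LOOP_CYCLES


def chance_write_lands(wait_cycles):
    """Fraction of loop positions for which the write is applied in time.

    Assumption: the affected operator is uniformly 1..18 slots away at the
    moment the write has been buffered.
    """
    in_time = sum(
        1
        for slots_away in range(1, OPERATORS + 1)
        if BUFFER_CYCLES + slots_away * CYCLES_PER_OPERATOR <= wait_cycles
    )
    return Fraction(in_time, OPERATORS)


if __name__ == "__main__":
    print("worst case:", worst_case_wait())
    for wait in (84, 81, 78):
        print(wait, chance_write_lands(wait))
```

In this model a wait of 84 cycles always succeeds, while shorter waits succeed only for some positions of the operator loop, which is the "on chance" behaviour described above.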

By Bengalack

Paladin (802)

16-05-2022, 09:19

Ah, great explanation. Thanks.

Well, this is an absolutely great fix for me. And this, once again, shows how important openmsx is for us now, I mean just as a pure emulator this time (not as the great developer tool it also is, of course). Even if the game ran correctly on a physical machine, I was willing to make changes so that it sounds correct on this emulator as well, just because I imagine that so many people use it as their "main msx rig".

By tfh

Prophet (3426)

16-05-2022, 09:40

Bengalack wrote:

Ah, great explanation. Thanks.

Well, this is an absolutely great fix for me. And this, once again, shows how important openmsx is for us now, I mean just as a pure emulator this time (not as the great developer tool it also is, of course). Even if the game ran correctly on a physical machine, I was willing to make changes so that it sounds correct on this emulator as well, just because I imagine that so many people use it as their "main msx rig".

Although that is very understandable, it would be the world upside down if we started to take the emulator as the benchmark instead of the real machines.
So it's a very good thing this issue got solved!