VGMPlay for MSX

Page 20/47

By AxelStone

Prophet (2674)

16-06-2017, 07:10

One question: could this player be integrated as the music player in an MSX game made in MSX-C (DOS2 environment)? I see that the player runs fine on DOS2.

Thanks!

By sd_snatcher

Prophet (3047)

22-08-2017, 14:11

Quote:

@sd_snatcher Did you perchance upgrade something related to CPU speed on the fixed ST? Or maybe the other ST was running in Z80 mode? Otherwise, if there are no differences to speak of, I might’ve timed things a bit too tightly according to my own GT’s tolerances. I’ve had an issue with this in the past for an MSX-Audio modded with a clock crystal (Meits’s).

(This question was originally asked in this news post. Since it's very VGMPlay specific, I thought it was better to answer in this thread. I hope you don't mind.)

The only other relevant upgrade this machine has is a 1 MB RAM upgrade. The three possible turboR RAM configurations can slightly affect the R800 speed, due to differences in the way the RAM refresh is performed, but it shouldn't be enough to disturb any software unless busy-waiting is used for synchronization.

By Grauw

Ascended (8385)

22-08-2017, 17:48

Thanks, this is a better place :)

Hm, the refresh is afaik only done in bursts at unpredictable times, and I don’t rely on that in the timing (at least not intentionally). I assume the upgrade didn’t affect the wait states inserted by the S1990.

I employ two types of waits for R800. One is a dummy access to the VDP, for which the S1990 guarantees a 54-cycle wait on R800 while not affecting the Z80 so much. The other, which I use for the smaller waits, relies on the execution time of instructions, which I also picked for their relative slowness on R800 compared to Z80.

I’ve been trying out the songs that were mentioned:

Metal Gear - Can reproduce. The problem is that the VGM does not initialise the envelope frequency in the first 1.62 seconds, while it does get used. When I record a VGM from the start of the game myself it plays fine. The MP3 on VGMRips also does not sound correct. So this is a case of a badly trimmed VGM; no fix is needed in VGMPlay.

Mr. Ghost - I don’t hear any difference between your recordings, my turboR, the VGMRips website player and John’s recording on YouTube, which I assume is from the game. Maybe I’m listening for the wrong thing? If you can be more specific it would be helpful.

Greatest Driver / Psycho World / Undeadline / Xevious - Cannot reproduce :( so I’ll need your help for testing. It definitely sounds like the OPLL is accessed too fast though. I’ll send you some versions of VGMPlay; if you would like to test them for me, I can pinpoint the exact problem.

By sd_snatcher

Prophet (3047)

23-08-2017, 00:54

Ok. I recorded the files and sent the results to you.

But I noticed one peculiar fact: the missing notes won't always happen at the same places. For instance, the 1st time I recorded with VGMPLAY0.COM there were no missing notes. But then I managed to screw up that recording and had to repeat it. The 2nd time, I could notice some missing notes.

By sd_snatcher

Prophet (3047)

23-08-2017, 00:57

Quote:

I employ two types of waits for R800. One is a dummy access to the VDP, for which the S1990 guarantees a 54-cycle wait on R800 while not affecting the Z80 so much. The other, which I use for the smaller waits, relies on the execution time of instructions, which I also picked for their relative slowness on R800 compared to Z80.

Please don't take this as judgmental, but why not use the TR system-timer, as it offers an easy and reliable fixed timing?

By Grauw

Ascended (8385)

23-08-2017, 01:51

sd_snatcher wrote:

Ok. I recorded the files and sent the results to you.

But I noticed one peculiar fact: the missing notes won't always happen at the same places. For instance, the 1st time I recorded with VGMPLAY0.COM there were no missing notes. But then I managed to screw up that recording and had to repeat it. The 2nd time, I could notice some missing notes.

Thanks much! I will look into it tomorrow.

The missing notes not happening in the same places is to be expected. Although if the whole 1st time you didn’t notice any missing or off-key notes then you must’ve been very lucky (which is a bit suspicious), but maybe they just happened in less audible places.

sd_snatcher wrote:

Please don't take this as judgmental, but why not use the TR system-timer, as it offers an easy and reliable fixed timing?

No probs, it’s a valid question :)

I would have to introduce a separate code-path just for the turboR, while for other MSXes with 3.58 MHz, 7.16 MHz, etc. I still need to wait using the CPU. Not only is it more effort, but it also means increased code complexity and testing surface.

Already I can’t test all possible machine / chip combinations before every release (and have had regression bugs reported afterwards), so I’m not so keen on adding more system configuration-dependent code paths when I can avoid it…

Still, I keep it in mind of course. If the puzzle pieces fall in place, I see an elegant way, and it also gives me some particular benefit (like faster execution on Z80), I may opt to do it.
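For reference, the granularity of the turboR system timer can be compared against the two OPLL waits discussed in this thread. The sketch below is only arithmetic under stated assumptions: it takes the commonly documented tick period of about 3.911 µs to be 14 bus cycles at 3.579545 MHz, and it assumes the wait starts at an unknown point within a tick. The function name `ticks_to_guarantee` is made up for illustration and is not part of VGMPlay.

```python
import math

BUS_HZ = 3_579_545          # assumption: standard MSX bus clock
BUS_CYCLES_PER_TICK = 14    # assumption: 3.579545 MHz / 14, about 255682 Hz

def tick_period_us():
    """Duration of one system timer tick in microseconds."""
    return BUS_CYCLES_PER_TICK * 1e6 / BUS_HZ

def ticks_to_guarantee(bus_cycles):
    """Ticks to count so that at least bus_cycles have elapsed.

    Waiting for n counter increments from an unknown phase only
    guarantees (n - 1) full ticks, hence the extra tick.
    """
    return math.ceil(bus_cycles / BUS_CYCLES_PER_TICK) + 1

for required in (12, 84):   # the two waits mentioned in the thread
    n = ticks_to_guarantee(required)
    print(f"{required} bus cycles: count {n} ticks, "
          f"worst case {n * BUS_CYCLES_PER_TICK} bus cycles")
```

Under these assumptions the short wait is smaller than a single tick, so a timer-based wait there would always overshoot.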

By Grauw

Ascended (8385)

23-08-2017, 02:33

I snuck a listen at them just now… did you hear any glitches in 1 or 2? I would expect either the one or the other to have them.

I guess I’ll just re-inspect the timings, probably I mathed up and they’re too tight for the 12 / 84 cycles the application manual prescribes, just would’ve been nice to know which of the two I dun wrong.

p.s. Interesting story, I actually used this timing sensitivity of OPL chips as an accurate means to determine the exact R800 instruction cycle times listed on the MSX Assembly Page :) (and then verified with openMSX findings). Not waiting enough after setting the address or the value had different effects, but I can’t recall which it was that sounded like this…

By sd_snatcher

Prophet (3047)

25-08-2017, 14:21

I almost missed your message, sorry.

I also couldn't notice any glitches on 1 or 2.

Quote:

Interesting story, I actually used this timing sensitivity of OPL chips as an accurate means to determine

I wouldn't rely on that info too much. Many Yamaha chips of that time were known to have different processing times depending on the internal operation that the writes triggered. So they published the delays for the worst case scenario.

Tip: I don't know if you're aware, but MSX Magazine published in its 1990/11 issue what can be considered almost the R800 datasheet. It has all the instructions, how the flags are affected and their precise timings based on the R800 *bus* clock. It's on page 112. :)

By Grauw

Ascended (8385)

25-08-2017, 17:40

My measurements weren’t based on Yamaha’s documentation of the worst case numbers, that would be silly. The way I measured it: the threshold between too fast and not too fast (between the register latch and register write) was very audible, so by putting different instructions in between the two FM chip I/O accesses and padding them with nops until it started glitching, I could find out precisely how many cycles the instructions took to execute.
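The padding procedure described here can be illustrated with a small simulation. This is only a sketch of the reasoning, not the original test code: the `glitches` oracle stands in for listening to the FM chip, the threshold of 20 cycles is an arbitrary made-up value, and a nop is assumed to cost 1 R800 cycle.

```python
NOP_CYCLES = 1  # assumption: one nop costs 1 R800 cycle

def make_chip(min_gap):
    """Hypothetical stand-in for the ear: reports a glitch when the gap
    between the two FM chip I/O accesses is shorter than min_gap."""
    def glitches(gap_cycles):
        return gap_cycles < min_gap
    return glitches

def min_nops_without_glitch(glitches, fixed_cycles, max_nops=200):
    """Smallest number of padding nops at which the glitch disappears."""
    for nops in range(max_nops + 1):
        if not glitches(fixed_cycles + nops * NOP_CYCLES):
            return nops
    raise RuntimeError("still glitching at max_nops")

def measure(glitches, true_cycles):
    """Infer an instruction's cost from how many nops it replaces.

    true_cycles is what the hardware would do; the procedure itself
    only observes the two nop counts.
    """
    baseline = min_nops_without_glitch(glitches, 0)
    with_instr = min_nops_without_glitch(glitches, true_cycles)
    return (baseline - with_instr) * NOP_CYCLES

chip = make_chip(min_gap=20)   # made-up threshold
print(measure(chip, 4))        # instruction that really takes 4 cycles
print(measure(chip, 7))        # instruction that really takes 7 cycles
```

The point of the method is that the absolute threshold never needs to be known; only the difference in nop counts matters.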

My findings (as published on the MAP) matched measurements done by the openMSX team years earlier by a different method, using a timer. So they’re accurate, confirmed from two independent sources.

The overview in MSX Magazine however, which is also in appendix A of MSX Datapack Vol. 3, is not accurate at all, e.g. it says “ld r,(hl)” takes 1 wait / 2 cycles, but in internal RAM in reality it takes 4 R800 cycles to complete. More often than not it does not match reality. I reckon they just carbon copied R800 documentation, without taking the S1990 into account.

Also, I don’t really know how to interpret that wait (B) column, am I supposed to add it? Then ld r,r would be 2 cycles while it is really 1, ld r,(hl) would be 3 while it is really 4, and ld r,(ix) would be 8 while it is really 7. I can’t make heads or tails of it.
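The mismatch can be laid out as a quick comparison. The numbers below are the ones quoted in this post; "documented" is the sum of the cycles and wait (B) columns as read in the post, and the dictionary layout is just for illustration.

```python
# R800 cycle counts as quoted in the post (internal RAM).
timings = {
    "ld r,r":    {"documented": 2, "measured": 1},
    "ld r,(hl)": {"documented": 3, "measured": 4},
    "ld r,(ix)": {"documented": 8, "measured": 7},
}

def differences(table):
    """Documented minus measured cycles for each instruction."""
    return {name: t["documented"] - t["measured"] for name, t in table.items()}

diffs = differences(timings)
for name, d in diffs.items():
    t = timings[name]
    print(f"{name:10s} documented {t['documented']}, "
          f"measured {t['measured']}, off by {d:+d}")
```

The error changes sign between instructions, so neither adding nor ignoring the wait column reproduces the measured values.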

Another fun fact, also not included in ASCII’s documentation, is that when you access the external bus, the S1990 optionally adds an extra wait cycle to align itself with the external bus clock. So if you do out (n),a / ld a,d / out (n),a, it takes exactly as much time as out (n),a / out (n),a. You get the 1-cycle ld a,d for free, because the S1990 would otherwise insert a wait there anyway.
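A toy model shows why the ld a,d comes for free. It is a sketch under assumptions, not a description of the S1990: it assumes two R800 cycles per external bus clock, that an I/O access can only start on a bus clock boundary, and it uses a placeholder OUT cost (chosen odd so that the alignment wait appears) rather than a measured value.

```python
BUS_RATIO = 2       # assumption: R800 cycles per external bus clock
OUT_CYCLES = 9      # placeholder, deliberately odd; not a measured value
LD_A_D_CYCLES = 1   # 1-cycle ld a,d, as stated in the post

def run(sequence):
    """Total R800 cycles for a sequence, with I/O aligned to the bus clock."""
    t = 0
    for instr in sequence:
        if instr == "out":
            t += (-t) % BUS_RATIO   # wait until the next bus clock boundary
            t += OUT_CYCLES
        elif instr == "ld a,d":
            t += LD_A_D_CYCLES
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return t

print(run(["out", "out"]))
print(run(["out", "ld a,d", "out"]))
```

In this model both sequences take the same number of cycles, because the ld a,d occupies the cycle that would otherwise be spent on the alignment wait.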

By Grauw

Ascended (8385)

25-08-2017, 18:02

But for VGMPlay, just going by Yamaha’s worst-case timings is probably the best thing to do at this point, because I think the current waits are perhaps optimised a bit based on “reality” (which may be specific to my machine), and clearly this isn’t working out.

As I recall it (it’s been a while), the register address latch always took an equal amount of time, but the register value write would be very variable, even when writing to the same register. I suspect that rather than (or in addition to) some register-specific processing time, it has an access slot mechanism similar to the VDP, where it waits for a slot before it can write the value. That kind of makes sense if you think about the inner workings of the OP*, where a single operator is used again and again, if registers can only be written between cycles.

So if you have e.g. a 70-cycle wait rather than 84, it would almost always go right, but 1% of the time you get unlucky alignment and it would go wrong.

The difference between test 1 and 2 that I sent you is, I increased the register latch delay in test 1, and the register write delay in test 2 (and both in test 3).

The current situation in VGMPlay:

After the register latch I wait 7 bus cycles; because the OUT takes 5 bus cycles as well, the second IORQ starts exactly 12 cycles after the first. But the space in between the IORQ signals will be only 9-10 cycles or so. This may not be enough; it depends on the interpretation of the “12 cycles” Yamaha recommends. I thought this was good, and it works on my turboR, but maybe it’s not; I hoped test 1 would point that out.

After the value write I wait 37 bus cycles, where I’m assuming the processing of the next VGM command will take enough time to fill up the remainder of the 84 cycles. I did the math on this at some point, but I could’ve miscalculated something there.

Given what I mentioned earlier about that 1% of the time, I suspect the fault may be in the latter wait, but I’m not sure that I’m recalling everything correctly.
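The cycle budget described in this post can be checked with a bit of arithmetic. The wait values come from the post; the conversion to microseconds assumes the standard 3.579545 MHz MSX bus clock, and the variable names are illustrative only.

```python
BUS_HZ = 3_579_545     # assumption: standard MSX bus clock

LATCH_REQUIRED = 12    # cycles recommended after the register latch
WRITE_REQUIRED = 84    # cycles recommended after the value write
OUT_CYCLES = 5         # bus cycles taken by the OUT itself
LATCH_WAIT = 7         # explicit wait after the register latch
WRITE_WAIT = 37        # explicit wait after the value write

def to_us(cycles):
    """Convert bus cycles to microseconds."""
    return cycles * 1e6 / BUS_HZ

# IORQ start to IORQ start after the latch: explicit wait plus the OUT.
latch_gap = LATCH_WAIT + OUT_CYCLES

# Cycles the next VGM command's processing must contribute after the write.
remaining = WRITE_REQUIRED - WRITE_WAIT

print(f"latch gap: {latch_gap} cycles ({to_us(latch_gap):.2f} us)")
print(f"write wait: {WRITE_WAIT} explicit + {remaining} assumed from "
      f"command processing = {WRITE_REQUIRED} cycles "
      f"({to_us(WRITE_REQUIRED):.2f} us)")
```

So the latch wait meets the 12 cycles only when measured from IORQ start to IORQ start, and the write wait holds only if processing the next command never takes fewer than 47 bus cycles.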
