Your VG-8020/40 has a TMS9929A or TMS9129NL.
As I said earlier, I checked
http://www.formauri.es/personal/pgimeno/temp/VG-8020-40_TMS9...
Could you modify src/video/VDP.cc line #815, and change 4930.0 to something smaller?
I'm not yet convinced that this delay tells the whole story. I'd like to make specially crafted tests to get a clearer answer.
A better measurement method is probably to write to VRAM at different speeds and then read it back to see which one is the last changed byte, i.e. determine the last value of the counter register. Syncing the CPU with the VDP is probably going to be tricky, but it could give us more exact values and help us determine if this is merely a delay (and how much exactly), or if there's something else going on such as faster access.
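Something along these lines, maybe (untested, just to illustrate the idea; the ports 0x98/0x99, the addresses and the exact spacing between writes are the usual MSX1 assumptions, nothing measured):

VDPDATA equ 0x98             ; usual MSX1 VDP data port
VDPCTRL equ 0x99             ; usual MSX1 VDP control port
BUFFER  equ 0xC000           ; any free RAM will do

        di
        xor  a               ; set VRAM write address to 0x0000
        out  (VDPCTRL),a
        ld   a,0x40          ; high byte 0x00, bit 6 set = write
        out  (VDPCTRL),a

        ; burst: write an incrementing counter as fast as this spacing
        ; allows (OUT + INC A = 17 T-states on MSX; unroll plain OUTs
        ; for 12, or OUT (C),r for 14)
        xor  a
        out  (VDPDATA),a
        inc  a
        out  (VDPDATA),a
        inc  a
        out  (VDPDATA),a
        inc  a
        out  (VDPDATA),a
        ; ... keep unrolling as far as needed

        ; read everything back slowly (well above any minimum) and see
        ; which counter values actually reached VRAM
        xor  a               ; set VRAM read address to 0x0000
        out  (VDPCTRL),a
        out  (VDPCTRL),a     ; high byte 0x00, bit 6 clear = read
        ld   hl,BUFFER
        ld   b,16
RDLOOP: in   a,(VDPDATA)
        ld   (hl),a
        inc  hl
        push hl              ; burn some cycles to keep the reads legal
        pop  hl
        djnz RDLOOP
        ei
        ret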
Hopefully we can make an automated program that runs the tests and returns meaningful results to the user, who can then report them back to us. I can run it on the HB-10P too. I can't connect the HB-10P to a monitor yet because of the connector; I've already ordered a video cable, but it will probably take a few weeks to arrive. In the meantime I can save a test program to tape and run it.
I wanted to check if adding a short delay actually is able to make the artifacts of the emulation and the real machine match.
If there's no way to adjust the delay to match a certain real machine, it'd invalidate the approach.
But at least for the T6950A it seems to be the case. Of course, more tests are needed, but I think this is the right direction.
A better measurement method is probably to write to VRAM at different speeds and then read it back to see which one is the last changed byte, i.e. determine the last value of the counter register.
Yes, that's the right way.
But with the Z80 there won't be much granularity to test different speeds. A simple test consisting of changing the color table at full speed (with a different color per write) should be enough. For example, in the disk version of Bestial Warrior there's a program that dumps a bitmap to the screen and then changes the color table. The bitmap is written with the right timing, but the color table is written too fast. I think that simply comparing that small program in the emulator and on several real machines should be enough.
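Roughly like this, I'd say (untested sketch; it assumes the default SCREEN 2 colour table at 0x2000 and the usual ports, and uses OUT (C),r so every write can carry a different colour while still going far too fast for the active area):

VDPDATA equ 0x98
VDPCTRL equ 0x99

        di
        xor  a               ; colour table at 0x2000 (SCREEN 2 default)
        out  (VDPCTRL),a
        ld   a,0x60          ; high byte 0x20, plus bit 6 = write
        out  (VDPCTRL),a

        ld   c,VDPDATA
        ld   d,0xF1          ; four distinct colour bytes (fg/bg nibbles)
        ld   e,0xE1
        ld   h,0xD1
        ld   l,0xC1

        ; full speed: 14 T-states between writes on a real MSX
        out  (c),d
        out  (c),e
        out  (c),h
        out  (c),l
        out  (c),d
        out  (c),e
        out  (c),h
        out  (c),l
        ; ... unroll for as much of the colour table as needed
        ei
        ret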
In any case, I agree it'd be good to test different speeds, but I think that's gonna be difficult with the Z80.
The best option would be to put the VDP on a protoboard and write to VRAM with an Arduino or similar, to test different speeds. But let's first try a non-invasive and pain-free strategy.
Just a screenshot of my patched code, now taking the TMS9929A into account, with the constant adjusted to 2500 (it was 4930 for the T6950PAL). The emulation and the real machine look pretty much the same.
Executed with parameter -machine Philips_VG_8020 in openMSX.
I don't mean that more tests are not needed. They absolutely are, also to adjust the constants as accurately as possible.
This is just a proof of concept to show that the problem seems to come from a small timing shift that depends on the actual VDP.
I hope your tests on real machines will confirm this, pgimeno.
I wanted to check if adding a short delay actually is able to make the artifacts of the emulation and the real machine match.
The problem is the subjectivity inherent in this test. A few hundred cycles up or down might make it visually seem to match in the case of this program, while making it behave differently in another.
But with the Z80 there won't be much granularity to test different speeds.
On an MSX2, probably not, but that's not my main concern. We can test with the meaningful delays: 12, 14, and from 17 on.
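(Assuming the usual MSX timing of one extra wait state per opcode fetch, those delays would correspond to write spacings like the following:)

        out  (0x98),a        ; back-to-back OUT (n),A: 12 T-states apart
        out  (0x98),a

        out  (c),d           ; back-to-back OUT (C),r: 14 T-states apart
        out  (c),e

        out  (0x98),a        ; OUT (n),A + NOP + OUT (n),A: 17 T-states,
        nop                  ; and from there any larger spacing can be
        out  (0x98),a        ; built with more padding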
Syncing CPU and VDP should yield reproducible results, rather than random as in the case of the Obliterator test, which I consider too rough to be meaningful. It remains to be seen how doable this is. In the speccy, to find out the contended memory timings I devised an interrupt routine that synced the CPU with the ULA to the exact cycle (not to a multiple of 4 cycles as would happen with a HALT). Hopefully something not too different can be done here, to help us get meaningful readings.
Edit:
Executed with parameter -machine Philips_VG_8020 in openMSX.
Apparently that's the 8020/00. To emulate the 8020/40 I use Philips_VG_8020-20, which has the same VDP as mine in its configuration file. Not that it should matter at the moment, as it doesn't distinguish between VDPs yet.
Honestly, I can't see the point of doing a lot of tests to discover something that is extremely well known about the TMS VDPs. It is even described in detail in the TMS datasheet.
For example, you cannot pump data faster than one access every 8 µs in the active area in SCREEN 2 (roughly 29 Z80 cycles at 3.58 MHz).
If you do, it is illegal and it affects correct behaviour. What does it matter if in openMSX you see three corrupted pixels and on the real machine only two? It is illegal. Full stop. It must not be done that way, and to get correct behaviour you need to change the code.
Even if openMSX does not show the same video artifacts as the real machine, that is not important as long as it properly reports "too fast VRAM access". That is the only thing that matters.
Unless you fix the code, the access does not work as expected: so why bother about whether three or four pixels are corrupted? The only thing that matters is to have a more reliable message telling you: "Hey, guy, you are going too fast."
The problem is the subjectivity inherent in this test. A few hundred cycles up or down might make it visually seem to match in the case of this program, while making it behave differently in another.
In the case of the TMS9129 it means moving the access slot 704 ns earlier, which is approximately 2.5 T-states. It's small, and this time shift is probably in the actual circuit or the VDP itself. And it seems to differ depending on the VDP.
But I agree: many tests are needed to ensure that the emulation with this time shift matches what the real machines do.
But for the moment this fixes Karateka, and it seems to make the artifacts of Obliterator match (I've only seen two pictures, I agree). Could it be just by chance, with the real reason being totally different? It could; that's why we need more tests. Is it likely to be something else? I really doubt it, and I think these time shifts are the reason. I just need more confirmation from real machines.
Syncing CPU and VDP should yield reproducible results, rather than random as in the case of the Obliterator test, which I consider too rough to be meaningful. It remains to be seen how doable this is. In the speccy, to find out the contended memory timings I devised an interrupt routine that synced the CPU with the ULA to the exact cycle (not to a multiple of 4 cycles as would happen with a HALT). Hopefully something not too different can be done here, to help us get meaningful readings.
If you manage to do that with the MSXs, it'd be great!
However, it seems that the time shift is really short (3 T-states), if I did my numbers correctly. I'm not sure whether you could measure it. That's why I'd choose an indirect method like filling in the color table at full speed and interpreting the resulting images from the emulator and the real machine, to then figure out whether there are hidden time shifts.
I'm not a fan of the "Obliterator test", or of looking at pictures. It's just that I don't have a real machine to do it better!
I can't provide more information without the real machines, but your tests can.
Manuel: you have like ten billion MSX machines at home, right?
Apparently that's the 8020/00. To emulate the 8020/40 I use Philips_VG_8020-20, which has the same VDP as mine in its configuration file. Not that it should matter at the moment, as it doesn't distinguish between VDPs yet.
Oops, thanks for telling me. In my configs the Philips_VG_8020 has a TMS9929A and the Philips_VG_8020-20 a TMS9129. They could have slightly different timings.
Even if openMSX does not show the same video artifacts as the real machine, that is not important as long as it properly reports "too fast VRAM access". That is the only thing that matters.
Unless you fix the code, the access does not work as expected: so why bother about whether three or four pixels are corrupted?
Hehe... it's even worse, because we try to make the defect as realistic as possible! From a user's point of view, indeed it's better to ignore the too-fast access and forget about it.
Why do I bother? Well, I just enjoy the challenge of making the emulation a little more realistic and closer to the real machines. Is it important? For some (relatively) yes, for others not at all. But it's fun!
If someone writes a good test program and I can clear up my attic, I'd be happy to run it...
Even if openMSX does not show the same video artifacts as the real machine, that is not important as long as it properly reports "too fast VRAM access". That is the only thing that matters.
Unless you fix the code, the access does not work as expected: so why bother about whether three or four pixels are corrupted? The only thing that matters is to have a more reliable message telling you: "Hey, guy, you are going too fast."
Either the emulation reproduces the glitches, or it doesn't. If it doesn't, it feels less accurate than the real machine. If it does, it has the responsibility of doing it right, lest it show more glitches than the real machine (as in the case of Karateka) or fail to reproduce the glitches at all. That's how I see it, at least.
The point is something else.
openMSX should emulate all features of the MSX hardware, even hidden ones that some software may use.
For example, overscan should be emulated because it is a V9938 feature/behaviour that is actually used.
The VRAM access timing problem is different, because it's more a limit than a feature: it refers to an illegal use of the TMS VDP, so no software used it or was able to take advantage of it.
So what is the point of having this fully emulated if no software can use it or take advantage of it, and if doing I/O too fast just makes the software defective or useless? There is no reason for it to be accurate.
There is no spec for the TMS that says: "if you do I/O too fast, the last (or the first) byte is written to VRAM". You simply should not do it, and if you do, the result depends on factors that are not deterministic; in other words, it is undefined.
It's similar to what the C standard says about dereferencing a dangling pointer: the result is simply "undefined".
What's the point of emulating undefined behaviour?