Just a somewhat stupid question: Karateka works fine on a real MSX-1, right?
I'm doing some tests with this game on the emulated HB-20P (T6950A VDP), and it turns out that the minimal time shift that makes the game work is 1280. Translated to time, this means advancing the appearance of the access windows by 0.372487 μs.
The minimal duration between writes is, according to the datasheet of the V9918, 2 μs + 6 μs = 8 μs.
I suspect that what might be happening is that the VDP actually performs the first delay in less than 2 μs (say, in 2 - 0.372487 = 1.63 μs). Then it performs whatever operations it needs, and at the end it starts over. The time available for the operations will always be 4 cycles, but I think that if the first delay takes less than 2 μs, the result is a time shift (with no smaller or larger slots).
The "operation cells" I'm referring to are the colored cells that one can see here: http://map.grauw.nl/articles/vdp-vram-timing/vdp-timing-v2.png in the timing analysis.
In http://map.grauw.nl/articles/vdp-vram-timing/vdp-timing-2.html we can read: "Though as explained above that could be anywhere from 1.5µs to 2.5µs, this is between 8 to 13 VDP cycles or 5 to 9 Z80 cycles." And 1.63 μs is inside this window.
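As a quick sanity check of the quoted window, here is a minimal Python sketch that converts the durations into cycles. It assumes a Z80 clock of 3.579545 MHz and a TMS9918 clock of 1.5 times that (5.37 MHz, as in the linked article); the names `Z80_HZ`, `VDP_HZ` and `cycles` are just for illustration.

```python
# Clock frequencies in Hz. The VDP figure (1.5 x the Z80 clock) is an
# assumption based on the timing article linked above.
Z80_HZ = 3_579_545
VDP_HZ = Z80_HZ * 1.5  # about 5.37 MHz


def cycles(duration_us, clock_hz):
    """Return the (possibly fractional) number of clock cycles in duration_us."""
    return duration_us * 1e-6 * clock_hz


# Suspected shortened first delay: 2 us minus the 0.372487 us time shift.
shortened_delay_us = 2 - 0.372487

for us in (1.5, shortened_delay_us, 2.5):
    print(f"{us:.3f} us = {cycles(us, VDP_HZ):.2f} VDP cycles, "
          f"{cycles(us, Z80_HZ):.2f} Z80 cycles")
```

The two ends of the window land at a bit over 8 and a bit over 13 VDP cycles, and the shortened delay falls in between.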
I'm not sure if all this is correct. Just some quick computations, but there can be (big!) mistakes.
Why does Karateka work? Actually, it performs OUTs very quickly, way below the 29 T-state limit. I guess that it manages the timing pretty well and makes its writes coincide with the access windows, which it knows in advance. Just a guess. If not, I don't understand at all how it can actually work on real MSX-1 machines. If that's the case, it means that it needs the access windows to appear exactly where it expects them. So probably the 2 μs is actually shorter in the whole v9918 VDP family.
What do you think?
I have only run Karateka up to the scrolling text, not the game itself. The scrolling text looks almost perfect on the VG-8020/40. Just a few characters here and there glitch a bit, leaving aside the tearing, which also happens in the emulator when ignoring timing problems. I haven't looked into the code yet to check how fast it writes to the VDP.
I looked only at the scrolling text too.
I'm copying here what I wrote in the GitHub PR in case it can be useful for your tests on the real machine. Or for anyone else reading this weird thread.
With an emulated Sony_HB-20P, the minimal constant that makes Karateka work is slotTimeShift=1280. This constant is a number of ticks of the main clock, which is defined in EmuDuration.hh (MAIN_FREQ = 3579545ULL * 960).
The duration of the time shift is shift_segs_per_cycle = slotTimeShift / MAIN_FREQ = 372.486820 ns.
Let's see now how many VDP ticks is that time shift: 372.486820 ns / 46.560852 ns = 8.000000.
It's "exactly" 8. The code in openMSX assumes that it's a V99x8 running, so we need to divide by 4 to obtain the number of advanced cycles in the case of the T6950A. So, it's an advance of 2 cycles for the T6950A.
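For anyone who wants to redo this computation, here is a small Python sketch. `MAIN_FREQ` and the value 1280 come from the text above; `V99X8_HZ` (21477270 Hz, i.e. 6 times the Z80 clock) is the V99x8 clock, and the other variable names are made up for the example.

```python
MAIN_FREQ = 3_579_545 * 960   # openMSX main clock, ticks per second
V99X8_HZ = 3_579_545 * 6      # V99x8 clock: 21477270 Hz
slot_time_shift = 1280        # minimal shift found, in main-clock ticks

# Duration of the shift in seconds.
shift_seconds = slot_time_shift / MAIN_FREQ

# The same shift expressed in V99x8 ticks (one tick is about 46.56 ns).
shift_v99x8_ticks = shift_seconds * V99X8_HZ

# openMSX models the MSX1 VDP as a V99x8 with 4 ticks per cycle,
# so divide by 4 to get cycles of the MSX1 VDP.
shift_msx1_cycles = shift_v99x8_ticks / 4

print(f"shift = {shift_seconds * 1e9:.6f} ns")
print(f"      = {shift_v99x8_ticks:.6f} V99x8 ticks")
print(f"      = {shift_msx1_cycles:.6f} MSX1 VDP cycles")
```

The result is an exact integer because one V99x8 tick is 960 / 6 = 160 main-clock ticks, and 1280 / 160 = 8.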
OK, the first step is completed. I don't have the test program yet, but I have a WIP version which performs the base test that the rest will build on: a program to measure exactly how many T-states there are between VDP interrupts.
It took a while longer than expected, first because I had trouble figuring out the algorithm (until I devised the scheme stated in the .asm); then because I ran into a VDP bug where reading the interrupt bit is not reliable; and finally because of a bug in my own program: I re-enabled interrupts too soon and that screwed up the measurement.
In my machines it reports:
NMS-8250: 71364T/frame in 50Hz mode, 59736T/frame in 60Hz mode.
VG-8020/40: 71364T/frame.
HB-10P: 71745T/frame.
The HB-10P contains independent crystals for the CPU and the VDP; the CPU is clocked at the usual 3.58MHz and the VDP (a T6950) at 22.168MHz. Apparently you can't use synced clocks if they use those freqs. On IRC, grauw did some maths to try to explain the differences. He calculated a theoretical value of 71705T/frame.
The most puzzling thing is that I checked whether that 71745 result changed at some point, but it was the same value for the ~15 mins I left it running.
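To put the measured numbers in perspective, here is a short Python sketch that expresses each result in scanlines. It assumes 228 T-states per scanline and the usual 313 lines (50Hz) and 262 lines (60Hz) per frame; the 228 figure need not apply to the HB-10P, since its VDP runs from its own crystal, so the HB-10P row is only a comparison.

```python
T_PER_LINE = 228  # Z80 T-states per scanline (assumed; see note above)

# Expected frame lengths, assuming 313 lines at 50Hz and 262 lines at 60Hz.
expected = {"50Hz": 313 * T_PER_LINE, "60Hz": 262 * T_PER_LINE}

# Values reported by the test program.
measured = {
    "NMS-8250 50Hz": 71364,
    "NMS-8250 60Hz": 59736,
    "VG-8020/40": 71364,
    "HB-10P": 71745,
}

for name, t_states in measured.items():
    lines = t_states / T_PER_LINE
    print(f"{name}: {t_states} T/frame = {lines:.3f} lines of {T_PER_LINE} T")

# Gap between the HB-10P measurement and grauw's theoretical value.
print("HB-10P measured minus theoretical:", measured["HB-10P"] - 71705, "T")
```

The NMS-8250 and VG-8020/40 values come out as whole numbers of scanlines, while the HB-10P value does not.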
The program is here: http://www.formauri.es/personal/pgimeno/files/msx/vdptest.zip - I will update it as new tests are written. I need the exact number of cycles between frames as a prerequisite for the other tests, so this was a necessary first step. The final program will report this number, of course.
If I had to characterise my impression of the difference between the TMS9918 and the T6950 VDPs, I would say the former is “designed for NTSC” and the latter “designed for PAL”.
Given the T6950 VDP master clock of 22.168 MHz (5x PAL subcarrier) (higher than the TMS/V99x8), and a suspected horizontal scan frequency closer to 15625 Hz than to 15700 Hz (lower than the TMS/V99x8), I expect its access slot timings are a bit different from the TMS9918. The VDP timings research linked to earlier really only investigated the TMS9918 and the V99x8 as far as I know.
Do the artifacts look different on the Sony HB-20P compared to the Philips VG-8020/40?
By the way, on this page there's a good picture where one can clearly see the VDP's crystal in the HB-20P, of exactly 22.168 MHz. A direct observation. For other models we need to trust the datasheets. Better to look at pictures than to open up the poor machines.
http://mymsx2.free.fr/montages/FIX_A8H_HB20/fix_port_a8h_hb2...
Independently of pgimeno's tests (which are quite valuable, because they provide data from the real machines), I have an attempted explanation for the 2-tick delay I obtained with the emulator. A time shift of 2 ticks fixes Karateka in the emulator.
I'm copying here what I wrote as a comment in my GitHub PR:
In the document about the timing measurements that was made some time ago [1], we can read (in "general (memory) timings"): "The TMS9918 runs at 5.37MHz (1.5×3.58MHz, 4× slower than the V9938). One display line takes 342 cycles (as expected, 4× less than on V9938). One memory access takes only 2 cycles or 372ns. So compared to V9938 each memory access takes slightly longer (on V9938 one access takes 6 cycles or 279ns)".
So, they have different memory access times: 372 ns vs. 279 ns.
The difference between these two times, expressed in ticks, is (372 - 279) * 1e-9 * 21477270 = 1.99738611 ticks.
We can compute the access times ourselves from the clock frequency, assuming that there will always be 1368 ticks/line. This gives us 372.49 ns and 279.37 ns. With this we obtain (372.49 - 279.37) * 1e-9 * 21477270 = 1.9999633824 ticks.
In practice, both mean 2 ticks.
When the 9918 is used in the emulator, the software expects to see the access windows 2 ticks earlier, so they need to be advanced in the emulation. Actually, the current VDP code in openMSX assumes a 9938 and then makes some adjustments (like multiplying the number of ticks by 4 in VDPAccessSlots.cc for the MSX1 case). This is just an additional adjustment.
In the case of a PAL 9918, we'd have (360.88 - 279.37) * 1e-9 * 22168000 = 1.80691368, which also rounds to 2 ticks.
[1] http://map.grauw.nl/articles/vdp-vram-timing/vdp-timing-2.html
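The tick differences in this post can be reproduced without the rounded intermediate values. The Python sketch below takes the 2-cycle and 6-cycle access lengths from the quoted article. The PAL line is an assumption on my side: 8 ticks of the 22.168 MHz master clock, which is what reproduces the 360.88 ns figure.

```python
V99X8_HZ = 21_477_270          # V9938 clock
TMS_HZ = V99X8_HZ / 4          # TMS9918 clock, 4x slower (about 5.37 MHz)
T6950_MASTER_HZ = 22_168_000   # PAL master clock used in the post

v9938_access = 6 / V99X8_HZ        # 6 cycles per memory access
tms_access = 2 / TMS_HZ            # 2 cycles per memory access
# Assumption: 8 master-clock ticks per access, matching 360.88 ns.
pal_access = 8 / T6950_MASTER_HZ

# Differences relative to the V9938, in ticks of the respective clock.
ntsc_diff_ticks = (tms_access - v9938_access) * V99X8_HZ
pal_diff_ticks = (pal_access - v9938_access) * T6950_MASTER_HZ

print(f"V9938 access:   {v9938_access * 1e9:.2f} ns")
print(f"TMS9918 access: {tms_access * 1e9:.2f} ns")
print(f"PAL access:     {pal_access * 1e9:.2f} ns")
print(f"NTSC difference: {ntsc_diff_ticks:.4f} ticks")
print(f"PAL difference:  {pal_diff_ticks:.4f} ticks")
```

Without rounding, the NTSC difference is exactly 8 - 6 = 2 ticks; the PAL one stays around 1.8 and rounds to 2.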
I've created a Git repository for the VDP test program, for those who want to follow the development. I've started with the version above, and subdivided it into modules to make it easy to add the upcoming tests.
Edit: I'm trying to solve an issue. The results I posted are not reliable. Please stay tuned.
Sorry for the above, I needed to fix a bug in the measurement routine that caused incorrect results.
Here's a preliminary result of my VDP timing tests related to this issue:
In the TMS9129, in screen 2, fast writes always succeed up to 27129 CPU cycles (aka T states) after the interrupt, inclusive. The first unsuccessful write happens at cycle 27130 after the interrupt (119 scanlines of 228T each, minus 2 cycles).
In the emulator, however, the first failing cycle is 27588 (121 scanlines exactly). That's a difference of 458 cycles for this VDP model (2 scanlines + 2 cycles).
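The cycle arithmetic above is easy to double-check with a few lines of Python (the variable names are only for the example; 228 T per scanline is the figure given above):

```python
T_PER_LINE = 228  # CPU cycles (T-states) per scanline

# First failing write, counted in CPU cycles after the interrupt.
real_first_fail = 119 * T_PER_LINE - 2   # real TMS9129
emu_first_fail = 121 * T_PER_LINE        # emulator

# Split the difference into whole scanlines plus leftover cycles.
diff = emu_first_fail - real_first_fail
lines, extra = divmod(diff, T_PER_LINE)

print(f"real: {real_first_fail}, emulator: {emu_first_fail}")
print(f"difference: {diff} cycles = {lines} scanlines + {extra} cycles")
```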
Note that the changes that allowed me to come to this conclusion are not committed yet. I need to wrap my head around some somewhat unexpected results I'm getting before I commit and push them. That's also why the results are preliminary.
This is unrelated, but the other results for the TMS9129 that I got (these are final) are:
- Acknowledging the VDP interrupt works up to 4 or 5 cycles before the interrupt is actually triggered. This means that if you read port 99h 4 (sometimes 5) cycles before the interrupt triggers, the interrupt won't fire.
- Bit 7 of port 99h will be set if you read it up to 3 or 4 cycles before the interrupt is triggered, but always at least 1 cycle after the value in the previous point. This leaves a small window of 1 to 2 cycles at which, if you read the port, you will totally miss both the interrupt and bit 7.
- It takes between 3 and 5 cycles inclusive for the /INT line to go inactive (high) after port 99h is read. I haven't found a way of measuring it with better precision.
My explanation for the varying values of 4 to 5 and 3 to 4 for the first two points is that, since the VDP clock is different from (faster than) the CPU clock, CPU and VDP are not always in the same phase, even if they share the same crystal (which is the case for this model). I think there are 6 possible phases between CPU and VDP. Strangely though, the values always stay the same until I reset or power-cycle. I haven't found any other way of changing that phase yet, assuming the explanation is correct.
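The size of the blind window from the first two points can be enumerated with a small Python sketch. This is only my reading of the measurements: a read `k` cycles before the interrupt acknowledges it if `k` is at most `ack_limit` (4 or 5), and sees bit 7 set if `k` is at most `flag_limit` (3 or 4), with `flag_limit` always at least 1 below `ack_limit`. Both names are made up for the example.

```python
from itertools import product

# blind[(ack_limit, flag_limit)] lists the read positions (in cycles before
# the interrupt) that cancel the interrupt but do not yet see bit 7 set.
blind = {}
for ack_limit, flag_limit in product((4, 5), (3, 4)):
    # The flag always becomes readable at least 1 cycle after the
    # acknowledge window opens, so skip impossible combinations.
    if flag_limit > ack_limit - 1:
        continue
    blind[(ack_limit, flag_limit)] = [
        k for k in range(ack_limit, 0, -1) if k > flag_limit
    ]

for (ack_limit, flag_limit), reads in blind.items():
    print(f"ack up to {ack_limit}, bit 7 up to {flag_limit}: "
          f"blind at {reads} cycles before the interrupt")
```

Under this reading, each allowed combination leaves 1 or 2 blind cycles, which matches the window described above.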
For comparison, the values for the emulator are:
- Acknowledging the VDP interrupt works starting at the exact cycle at which the interrupt is triggered.
- You get bit 7 set if you read the port at the exact cycle at which the interrupt is triggered. This leaves no window for missing bit 7, unlike in the real MSX.
- It takes between 0 and 2 cycles (probably 0) for the interrupt line to go inactive after reading port 99h.