Small inconsistency with 9938 VDP timings...

Page 1/4
| 2 | 3 | 4

By shram86

Expert (117)

01-08-2019, 03:39

This is a fairly unusual issue, and it only gave me trouble because of my relative inexperience working with the 9938.

I'm writing a very simple tile-by-tile scroll using pattern layouts in screen 4 (aka GRAPHIC3). To test this I've been reading rows of tiles from the pattern layout in the VDP to a buffer, shifting them left or right, and moving in the new screen one column at a time.

I got it working in openMSX but noticed a peculiarity - occasionally tiles would be in the wrong spot, but it was very inconsistent (maybe 1-3 tiles per entire screen of scrolling) so I tested it on hardware to see what would happen.

Instead of being relatively okay, everything was reeeally messed up - up to 5 tiles in a row had not read their new values properly, so I looked at my code:

        out [VDP_STATUS], a 
        ld a, (hl)          ; 7 t-states
        out [VDP_DATA], a 

This seemed to be okay, but

        out [VDP_STATUS], a 
        in a, [VDP_DATA]

This most definitely was not. I saw somewhere that it's best practice to have 8 T-states between I/O accesses, so I tried this:

        out [VDP_STATUS], a 
        nop 
        nop 
        in a, [VDP_DATA]

And it worked like a charm on hardware with no glitches.

I don't really know how this would end up being emulated properly in openMSX, but if you have any questions let me know and I can post video or screenies.

By Grauw

Ascended (10818)

01-08-2019, 11:50

The only speed limitations should be related to VRAM access (due to the access slots mechanism); you should need no waits between VDP register accesses. The nop/nop thing is a bit of a myth that dates back to the time when people didn’t realise waits only applied to VRAM access and recommended adding them just to make sure. I myself do exactly what you do without any issues at all on real hardware.

Edit: But actually you don’t mention what your VDP_STATUS and VDP_DATA defines are, so I actually don’t know what you’re doing :). Using the VDP_STATUS identifier to write seems rather strange (I would expect a read). And if it does refer to 99H, then you should always write in pairs, so you’ve got an incomplete write.

By PingPong

Enlighted (4155)

01-08-2019, 11:50

VDP_STATUS probably means port 99h,
so writing to VDP_STATUS really sets the first byte of the VRAM access address.

By Grauw

Ascended (10818)

01-08-2019, 11:54

PingPong wrote:

VDP_STATUS probably means port 99h,
so writing to VDP_STATUS really sets the first byte of the VRAM access address.

Ok, that trick only works on the TMS9918, and I'm not even sure it works on all variants (Toshiba/Yamaha clones). It does not work on the V9938. People should not rely on that implementation detail of the TMS9918; it’s not compatible.

By DarkSchneider

Paragon (1030)

01-08-2019, 11:54

Yes, we need to know those definitions. Also, rendering SC4 with sprites enabled is, I think, the most expensive task for the VDP, so VRAM access could need some extra time. That is why it is important to know exactly what you are doing there.

By Grauw

Ascended (10818)

01-08-2019, 11:57

And as for openMSX emulation correctness, are you using an openMSX machine that matches your real hardware? If openMSX is an MSX1 and your real hardware an MSX2, that could explain the difference.

By shram86

Expert (117)

01-08-2019, 12:02

Status is 99h, data is 98h. The emulated system is identical to my hardware. I omitted most of the code because it's not relevant; I'm doing reads and writes properly (otherwise I wouldn't be getting the result I wanted :)).

I'm not sure how it's a myth when I saw the evidence on hardware. I'll upload videos later today.

By Grauw

Ascended (10818)

01-08-2019, 12:16

Ok, in that case, you probably just set up the VRAM read address and then try to read the value, correct?

There are speed limitations for VRAM access: you need to wait a certain amount of time after writing and a certain amount before reading. This is documented in the TMS9918 and V9938 application manuals, and in extreme detail in this research by the openMSX team. openMSX should be emulating this, but perhaps not absolutely perfectly when you’re right on the edge.

The myth I was referring to was related to waits between register access, but that’s not what you are doing (I assumed you were at first but didn’t actually have the context to make that assumption).

By DarkSchneider

Paragon (1030)

01-08-2019, 12:19

Are you getting the same tile as the previous read for those wrong ones? If that is the case, I think it is a problem of VRAM access time. As mentioned, be careful when using SC4 with sprites.

I recommend using a buffer in RAM, as memory operations are much faster than getting data from VRAM. Once you have your final result (all columns shifted), copy it all at once from RAM to VRAM using the BIOS LDIRVM routine.
If you don't want to make many modifications to your current code, you can also read from VRAM with the BIOS LDIRMV routine.
Try it and tell us if you still get inconsistencies.
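
As a rough sketch of that suggestion (not code from this thread): the block below pulls the whole pattern layout into RAM with LDIRMV and pushes it back with LDIRVM. It assumes the BIOS is visible in page 0, that the pattern layout sits at the screen 4 default of 1800h, and that tile_buffer is a hypothetical label for 768 bytes reserved in RAM.

        LDIRMV     equ 0059h    ; BIOS: VRAM -> RAM (HL = VRAM source, DE = RAM dest, BC = length)
        LDIRVM     equ 005Ch    ; BIOS: RAM -> VRAM (HL = RAM source, DE = VRAM dest, BC = length)
        NAME_TABLE equ 1800h    ; assumption: default screen 4 pattern layout address
        TABLE_SIZE equ 768      ; 32 columns x 24 rows

        ld hl, NAME_TABLE
        ld de, tile_buffer      ; hypothetical 768-byte RAM buffer
        ld bc, TABLE_SIZE
        call LDIRMV             ; copy the pattern layout into RAM

        ; ... shift the rows inside tile_buffer here ...

        ld hl, tile_buffer
        ld de, NAME_TABLE
        ld bc, TABLE_SIZE
        call LDIRVM             ; copy the shifted layout back to VRAM

The BIOS routines take care of the VRAM timing themselves, so no manual padding is needed around these calls.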

By shram86

Expert (117)

01-08-2019, 14:45

Thanks for the tips, I'll check that out as well (I didn't realize there was a BIOS routine for loading from VRAM). I am indeed getting the same tile.

As far as using a buffer in RAM, that's partially what I'm doing - the issue comes in from reading the tile number itself.
i.e. set the location of VRAM access (I store these in hl and increment accordingly) then immediately load. I don't quite understand how a RAM buffer can help in this particular circumstance, and I don't see any solution besides NOP.
The (erroneous) block looks like:

        ld a, l 
        out [VDP_STATUS], a 
        ld a, h 
        out [VDP_STATUS], a 
        in a, [VDP_DATA]  ; < reads incorrect byte

followed by various operations (well over 20 T-states) and a loop.

Grauw, I am aware of the wait time between commands. Your own website states that this limitation is not present during vblank (I only perform this many operations during vblank, then update the screen once and repeat; the operation as a whole takes about 4 frames), and seems to imply that it only matters for VDP writes. I was using your page as a reference when trying to debug this. I didn't see anywhere that explicitly states that you must wait some cycles regardless of which VDP register you use, or that there is a timing issue after setting the VRAM read target through the status port, unless it just went over my head :)

I'm clearly not an expert, but this method in particular is not OK on hardware. As I mentioned in the OP, openMSX *sort of* gets the timing right, but it's off by just enough cycles that hardware gives very visibly different results :)

By PingPong

Enlighted (4155)

01-08-2019, 15:38

When you send data to the VDP data port after setting the VRAM address pointer, the VDP immediately buffers the byte you supplied and writes it to VRAM later.
For read operations, however, this cannot work.
What happens is:
A) You write the low byte of the VRAM pointer. At this point the VDP cannot know whether the value you wrote is the low byte of a VRAM address or the value for a VDP register, because that information only arrives with the next byte.
B) You write the second byte. Some of its bits specify that this is a VRAM access and that you want to read.
Only now does the VDP know that you want to set up the VRAM pointer for reading and that you will read a byte.
But the VDP needs some time after B. If you read the data port immediately, the data is not ready yet.
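
The sequence above could be sketched as follows. This is only an illustration in the thread's notation, not the original poster's code. It sets the read address once, pads before the first read as in the working nop/nop version, and then relies on the address auto-increment to read a whole row. It assumes HL holds a VRAM address whose top two bits are 0, that R#14 already selects the right 16 KB page, and that DE points to a hypothetical RAM buffer of at least 32 bytes.

        di                      ; keep an interrupt from splitting the two address writes
        ld a, l
        out [VDP_STATUS], a     ; step A, port 99h: low byte of the VRAM address
        ld a, h                 ; bits 7-6 = 00 request a VRAM read
        out [VDP_STATUS], a     ; step B, port 99h: the VDP can only start fetching now
        ei
        nop                     ; padding before the first read,
        nop                     ; as in the version that worked on hardware
        ld b, 32                ; one row of the pattern layout
read_row:
        in a, [VDP_DATA]        ; port 98h: read a byte, the address auto-increments
        ld (de), a              ; store it in the RAM buffer
        inc de
        djnz read_row           ; the loop overhead also spaces out the reads

Setting the address once per row rather than once per tile means the delay after step B is paid a single time instead of 32 times.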
