This is a rather unusual issue, and it only caused me trouble because of my relative inexperience with the 9938.
I'm writing a very simple tile-by-tile scroll using the pattern layout (name table) in SCREEN 4 (aka GRAPHIC3). To test it, I've been reading rows of tiles from the pattern layout in VRAM into a buffer, shifting them left or right, and bringing in the new screen one column at a time.
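For illustration, the buffer-shifting part of that scroll can be modelled without any VDP access at all. This is a minimal sketch, assuming a row is a plain list of tile indices already read from the pattern layout and that the caller supplies the incoming column. The function names are hypothetical and are not from the original code. SCREEN 4 is 32 tiles wide and 24 rows tall (256x192 pixels with 8x8 tiles).

```python
from typing import List

ROW_WIDTH = 32  # SCREEN 4 (GRAPHIC3): 256 pixels / 8-pixel tiles


def scroll_row_left(row: List[int], incoming: int) -> List[int]:
    """Drop the leftmost tile and append the new rightmost tile."""
    return row[1:] + [incoming]


def scroll_row_right(row: List[int], incoming: int) -> List[int]:
    """Drop the rightmost tile and insert the new leftmost tile."""
    return [incoming] + row[:-1]


def scroll_screen_left(rows: List[List[int]], new_column: List[int]) -> List[List[int]]:
    """One column step of a left scroll: every row takes one tile from new_column."""
    if len(rows) != len(new_column):
        raise ValueError("need exactly one incoming tile per row")
    return [scroll_row_left(r, t) for r, t in zip(rows, new_column)]


# Example: a 32x24 screen whose tile indices encode (row, column).
screen = [[r * ROW_WIDTH + c for c in range(ROW_WIDTH)] for r in range(24)]
shifted = scroll_screen_left(screen, [0] * 24)
```

On the real machine, each shifted row would then be written back to the pattern layout in VRAM. That is where the access timing below matters.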
I got it working in openMSX but noticed a peculiarity: occasionally tiles would end up in the wrong spot, but very inconsistently (maybe 1-3 tiles over an entire screen of scrolling). So I tested it on real hardware to see what would happen.
On hardware, instead of being mostly okay, everything was reeeally messed up: up to 5 tiles in a row had not read their new values properly. So I looked at my code. The write sequence was:
    out [VDP_STATUS], a
    ld a, (hl)            ; 7 t-states
    out [VDP_DATA], a
This seemed to be okay, but this read sequence:
    out [VDP_STATUS], a
    in a, [VDP_DATA]
This most definitely was not. I'd read somewhere that it's best practice to leave at least 8 T-states between VDP I/O accesses, so I tried this:
    out [VDP_STATUS], a
    nop
    nop
    in a, [VDP_DATA]
And it worked like a charm on hardware with no glitches.
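The "8 T-states between I/O accesses" rule of thumb can be checked mechanically for straight-line code. This is a rough sketch, assuming the gap is counted the way the post counts it: the T-states of the instructions strictly between two VDP port accesses. The 8 T-state minimum is the quoted rule of thumb, not a V9938 datasheet figure. Costs are plain Z80 timings. An MSX adds one wait state per M1 (opcode fetch) cycle, so `nop` really takes 5 there, and this model ignores that. `Instr`, `short_gaps` and `MIN_GAP` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

MIN_GAP = 8  # rule-of-thumb minimum from the post, not a datasheet value


class Instr(NamedTuple):
    text: str      # mnemonic, used only for reporting
    tstates: int   # plain Z80 T-states (MSX M1 wait states not included)
    vdp_io: bool   # True if the instruction reads or writes a VDP port


def short_gaps(seq: List[Instr], min_gap: int = MIN_GAP) -> List[Tuple[str, str, int]]:
    """Return (previous access, next access, gap) for every gap below min_gap.

    The gap is the sum of T-states of the instructions executed strictly
    between two VDP port accesses.
    """
    problems = []
    last_io = None  # text of the most recent VDP access seen so far
    gap = 0         # T-states spent since that access
    for ins in seq:
        if ins.vdp_io:
            if last_io is not None and gap < min_gap:
                problems.append((last_io, ins.text, gap))
            last_io = ins.text
            gap = 0
        else:
            gap += ins.tstates
    return problems


OUT_CMD = Instr("out [VDP_STATUS], a", 11, True)  # OUT (n),A: 11 T-states
IN_DATA = Instr("in a, [VDP_DATA]", 11, True)     # IN A,(n): 11 T-states
NOP = Instr("nop", 4, False)

no_delay = [OUT_CMD, IN_DATA]            # the failing read sequence
with_nops = [OUT_CMD, NOP, NOP, IN_DATA] # the version that worked
```

Here `short_gaps(no_delay)` reports a gap of 0 between the `out` and the `in`, and `short_gaps(with_nops)` reports nothing. This only models the rule of thumb. It says nothing about the VDP's actual internal access slots.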
I'm not sure how openMSX could emulate this accurately, but if you have any questions, let me know and I can post videos or screenshots.