Direct Video Memory Access (DVMA) for V9938

Page 4/6
1 | 2 | 3 | | 5 | 6

By PingPong

Prophet (3834)


26-10-2010, 00:04

Oh, blueMSX does not warn about too-fast VRAM access with this one:

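ld hl,0x0000     ; VRAM address 0x0000; bits 7-6 of H clear = read mode
ld c,0x99        ; VDP control port
out (c),l        ; address bits 7-0
out (c),h        ; address bits 13-8 + mode bits
in a,(0x98)      ; reads VRAM with no delay after the read setup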

It should warn about the missing 8-microsecond delay between the port 0x99 access and the port 0x98 access (whether the port 0x98 access is an IN or an OUT), but only when the VDP was set up for read mode.
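For illustration, here is a minimal Python sketch of the check described above, as a hypothetical emulator hook that receives every port access with a CPU-cycle timestamp. It remembers whether the last port 0x99 pair selected VRAM read mode and flags a port 0x98 access that arrives too soon. The class and method names and the 29-cycle threshold (the MSX1 figure discussed later in this thread) are assumptions; this is not blueMSX's actual code.

```python
class VramReadSetupChecker:
    """Hypothetical emulator hook: warns when a port 0x98 access follows a
    VRAM read-mode address setup on port 0x99 too closely.

    Times are CPU cycles at which the I/O pins are driven. min_gap is an
    assumed threshold (29 cycles, the MSX1 figure discussed in this thread).
    """

    def __init__(self, min_gap=29):
        self.min_gap = min_gap
        self.latch = []            # port 0x99 bytes of the current pair
        self.read_setup_at = None  # cycle of the pending read-mode setup
        self.warnings = []         # gaps (in cycles) that were too short

    def out(self, port, value, cycle):
        if port == 0x99:
            self.latch.append(value)
            if len(self.latch) == 2:
                # Bits 7-6 of the second byte: 00 selects a VRAM read setup.
                is_read_setup = (self.latch[1] & 0xC0) == 0x00
                self.read_setup_at = cycle if is_read_setup else None
                self.latch = []
        elif port == 0x98:
            self._check(cycle)  # OUT to the data port is checked too

    def inp(self, port, cycle):
        if port == 0x98:
            self._check(cycle)

    def _check(self, cycle):
        if self.read_setup_at is None:
            return
        gap = cycle - self.read_setup_at
        if gap < self.min_gap:
            self.warnings.append(gap)
        self.read_setup_at = None  # only the first access after setup matters
```

Feeding it the sequence above (out (c),l at cycle 0, out (c),h at cycle 14, in a,(0x98) 12 cycles later at 26) records a 12-cycle gap, while the same sequence with a write-mode setup records nothing.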

@hit9918:
I've already used this. Its drawback is that you need to waste the C register. Plus, out (c) is slower than out (immediate),a.

About the delay: on a VDP clone (TMS), I've had correct reads with only 3 nops between out (c),h and in a,(0x98). For MSX2, the nops required are 2 instead of 3. I can confirm that when you are writing, there is no need to wait before the first data write.

By Eugeny_Brychkov

Paragon (1184)


26-10-2010, 10:12

Quote:
it should warn lack of 8 microseconds
According to oscillograms, a full video I/O cycle takes about 8 CPUCLKs in screen 0 and 5.5 CPUCLKs in the other video modes. The VDP can insert 3 read or write operations in screen 0 and 2 in the other modes during this cycle. This means that for reliable operation the minimal delay required is about 2.25 µs (nop+nop) in screen 0 and 1.55 µs (nop+nop, though a single nop will work in most cases too) in the other modes. 'ex (sp),hl' twice takes 19*2 = 38 T-states, which seems a far too long delay even at a 7 MHz CPU speed.
'otir' should work perfectly at the standard 3.5 MHz speed, because its duration of 21 T-states gives the VDP enough time to flush the data to video memory. 'inir' should work well too, for the same reason (provided the programmer allows time for the initial byte prefetch). Both should also work at 7 MHz.
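A quick Python check of the arithmetic (a sketch, assuming the standard 3.579545 MHz MSX clock and an assumed 7.16 MHz turbo clock): 8 and 5.5 CPU clocks come out at roughly 2.2 and 1.5, so these delays are microseconds, not milliseconds, and one otir iteration lasts well over 2.25 µs even at 7 MHz.

```python
MSX_CLOCK_HZ = 3_579_545           # standard MSX Z80 clock
TURBO_CLOCK_HZ = 2 * MSX_CLOCK_HZ  # assumed 7.16 MHz turbo clock

def cycles_to_us(cycles, clock_hz=MSX_CLOCK_HZ):
    """Duration of a number of CPU clock cycles, in microseconds."""
    return cycles * 1_000_000 / clock_hz

if __name__ == "__main__":
    print(f"8 CPUCLK       = {cycles_to_us(8):.2f} us")    # screen 0 I/O cycle
    print(f"5.5 CPUCLK     = {cycles_to_us(5.5):.2f} us")  # other screen modes
    print(f"otir @3.58 MHz = {cycles_to_us(21):.2f} us")   # 21 T-states/iteration
    print(f"otir @7.16 MHz = {cycles_to_us(21, TURBO_CLOCK_HZ):.2f} us")
```

With the MSX's extra M1 wait states, otir actually takes 23 cycles per iteration, which only lengthens the gap.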

By hit9918

Prophet (2911)


27-10-2010, 19:27

@PingPong:

Quote:
i've already used this. It has the drawback that you need to waste 'c' register . Plus out (c) is slower than out (immediate),a.

It depends on the usage. Take a vertical laser:

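	; c = 0x99, de = 32 (one row of a 32-column name table), a = tile,
	; hl = VRAM address with the write-mode bit set, b = length in rows
vertical:
	out (c),l         ; address low byte
	out (c),h         ; address high byte, write-mode bit (bit 6) already set
	out (0x98),a      ; write the tile; no wait needed right after the setup
	add hl,de         ; next row, same column
	djnz vertical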

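This time out (c) is much faster: with out (n),a, every address byte would need an extra ld a,r first, and A could no longer hold the tile. C was still free, and after using it for the port, all 7 main registers are in use.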
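The saving can be checked by counting MSX cycles (Z80 T-states plus one M1 wait state per opcode byte, the convention used in this thread). A small Python tally, assuming the alternative without C would be ld a,l / out (0x99),a / ld a,h / out (0x99),a, followed by ld a,c to restore a tile byte kept in C (a hypothetical register assignment):

```python
# MSX cycle costs (Z80 T-states + 1 M1 wait per opcode byte fetched).
MSX_CYCLES = {
    "out (c),r": 14,
    "out (n),a": 12,
    "ld a,r": 5,
}

def cost(sequence):
    """Total MSX cycles of a list of instructions."""
    return sum(MSX_CYCLES[ins] for ins in sequence)

# Address setup as in the vertical loop: out (c),l / out (c),h
with_c = ["out (c),r", "out (c),r"]

# Hypothetical alternative without C as port register: ld a,l / out (0x99),a /
# ld a,h / out (0x99),a, then ld a,c to restore the tile byte (kept in C here).
without_c = ["ld a,r", "out (n),a", "ld a,r", "out (n),a", "ld a,r"]

if __name__ == "__main__":
    print("with out (c),r:", cost(with_c), "cycles per row")
    print("with out (n),a:", cost(without_c), "cycles per row")
```

Per row, the out (c),r version saves 11 cycles of address setup.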


Quote:
About the delay, in a vdp clone (tms) i've experienced a regular read only with 3 nops between out(c),h and in a,(0x98).

3x nop is 15 cycles (with the M1 wait on every opcode byte), and IN is 12 cycles, so this would be 27 cycles in total. This is hairy, because one should not go below 29 cycles on MSX1.

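So I recommend using this on MSX1:

	...
	out (c),h         ; last address byte to port 0x99 (read setup)
	nop               ;  5 cycles
	nop               ;  5 cycles
	and 42            ;  8 cycles (clobbers A and flags; A is reloaded by the IN)
	in a,(0x98)       ; 12 cycles
	                  ; --
	                  ; 30 cycles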


Quote:
for msx2 the nops required are 2 instead of 3.

I heard that MSX2 has no VDP delay problems. Does this mean it can do otir, or even back-to-back outi at 18 cycles each?

dec de + in a,(0x98) = 19 cycles. Does DEC DE work instead of 2x NOP?
You said one NOP is not enough, so 17 cycles is too few.

Quote:
I can confirm that in the case you are writing there is no need to wait for the first data write.

This fits the model of a one-byte buffer inside the VDP that is read/written at a VRAM access slot.
At the beginning of a VRAM write, the VDP buffer is empty and can be fed with an OUT immediately. Because a too-fast OUT does not wreck anything inside the VDP itself (it only lands in the buffer), the first OUT to the internal buffer can go with zero waiting.
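This one-byte buffer model can be simulated in a few lines. The Python toy below is a sketch built on assumptions: the slot period is a free parameter, and a write arriving while the previous byte is still waiting for its slot is assumed to overwrite it (the thread does not say exactly how the data gets wrecked). It shows why the first OUT after the address setup never has to wait, while later OUTs must be at least one slot period apart.

```python
class OneByteBufferVdp:
    """Toy model of a one-byte VDP write buffer (hypothetical).

    The buffer is drained into VRAM only at access slots, which occur at
    multiples of slot_period CPU cycles. A write that arrives while the
    buffer still holds an undrained byte overwrites it (assumed failure mode).
    """

    def __init__(self, slot_period):
        self.slot_period = slot_period
        self.addr = 0
        self.pending = None   # (address, value) waiting for a slot
        self.drain_at = None  # cycle of the slot that will drain it
        self.vram = {}
        self.lost = []        # bytes overwritten before reaching VRAM

    def set_write_address(self, addr):
        self.addr = addr      # the buffer is empty after a fresh setup

    def _drain_until(self, cycle):
        if self.pending is not None and self.drain_at <= cycle:
            addr, value = self.pending
            self.vram[addr] = value
            self.pending = None

    def write(self, value, cycle):
        self._drain_until(cycle)
        if self.pending is not None:
            self.lost.append(self.pending)  # previous byte never got a slot
        self.pending = (self.addr, value)
        self.addr += 1                      # VDP address auto-increment
        # Drained at the first slot strictly after the write.
        self.drain_at = (cycle // self.slot_period + 1) * self.slot_period

    def flush(self, cycle):
        self._drain_until(cycle)
```

With slot_period=29, writes at cycles 0 and 10 lose the first byte, while writes 29 cycles apart both reach VRAM: the next slot after any write is at most one period away.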

The action inside the VDP is too simple ever to be slower than a Z80; that is my model.
Probably a turbo R can continuously change palette registers with back-to-back out 99 / out 99 without wrecking anything?

The actual problem is that the VDPs don't provide a WAIT signal while the internal buffer is still waiting for a VRAM slot.

By hit9918

Prophet (2911)


27-10-2010, 19:35

@Eugeny_Brychkov:

Quote:
It means that for reliable operation minimal delay required is 2.25 µs (nop+nop) in screen0 and 1.55 µs (nop+nop, but single nop will work in most cases too) in other modes.

But the head of any Z80 I/O instruction already takes at least 8 cycles (2.24 µs) during which it does not touch the I/O pins! So by your theory it would be impossible ever to wreck anything on MSX2, including OUT 99 + OUT 99 + IN 98.

The Z80 manual shows how the cycles of each instruction add up per M-cycle (shown here with the MSX wait state included):

out (n),a: 5,3,4 = 12
out (c),r: 5,5,4 = 14
outi: 5,6,3,4 = 18

(For MSX, I added one M1 wait-state cycle to every M-cycle that fetches an opcode byte.)
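The same bookkeeping as a small Python table (a sketch: the per-M-cycle T-states are the Z80 manual's, and the one-extra-wait-per-opcode-fetch rule is the MSX assumption stated above). It reproduces the three sums and a few other instructions used in this thread.

```python
# Z80 machine cycles (T-states per M-cycle) from the Z80 manual, each
# tagged True when that M-cycle fetches an opcode byte (an M1 cycle).
# Assumption used in this thread: MSX adds one wait state to every M1 cycle.
Z80_MCYCLES = {
    "nop":        [(4, True)],
    "dec de":     [(6, True)],
    "and n":      [(4, True), (3, False)],
    "in a,(n)":   [(4, True), (3, False), (4, False)],
    "out (n),a":  [(4, True), (3, False), (4, False)],
    "out (c),r":  [(4, True), (4, True), (4, False)],
    "outi":       [(4, True), (5, True), (3, False), (4, False)],
    "ex (sp),hl": [(4, True), (3, False), (4, False), (3, False), (5, False)],
}

def msx_mcycles(instr):
    """Per-M-cycle lengths on MSX (one M1 wait state per opcode fetch)."""
    return [t + 1 if is_m1 else t for t, is_m1 in Z80_MCYCLES[instr]]

def msx_cycles(instr):
    """Total MSX cycles of one instruction."""
    return sum(msx_mcycles(instr))

if __name__ == "__main__":
    for name in Z80_MCYCLES:
        print(f"{name:11} {msx_mcycles(name)} = {msx_cycles(name)}")
```

By the same rule, 'ex (sp),hl' costs 20 cycles on MSX rather than the 19 T-states quoted earlier.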

The I/O instructions all have the same 4-cycle tail, during which they drive the I/O pins.
So one can compute: delay between two I/O pin accesses = delay instructions + the I/O instruction that follows them.

On MSX1, blueMSX triggers "break on VRAM" with:

outi
nop        ;  5 cycles
nop        ;  5 cycles
outi       ; 18 cycles

Build the sum: nop + nop + outi = 28 cycles. That is below the MSX1 minimum of 29 cycles, so blueMSX does complain. (Side note: at first I had trouble triggering this, because I had forgotten to set C to 0x98.)
But blueMSX does not complain about nop + dec de + outi = 30 cycles.

So blueMSX seems to use the same formula.
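The rule of thumb above can be written down directly. A Python sketch, assuming the MSX cycle counts and the 29-cycle MSX1 minimum quoted in this thread; it reproduces the behaviour described (28 cycles flagged, 30 accepted) and also covers the read sequences discussed earlier. It models the formula only, not blueMSX's source.

```python
# MSX cycle counts (Z80 T-states + 1 M1 wait per opcode byte fetched).
MSX_CYCLES = {"nop": 5, "dec de": 7, "and n": 8,
              "in a,(n)": 12, "out (n),a": 12, "out (c),r": 14, "outi": 18}

MSX1_MIN_GAP = 29  # minimum cycles between VRAM accesses on MSX1 (per thread)

def io_gap(after_previous_io):
    """Cycles from one I/O pin access to the next.

    after_previous_io lists the instructions executed after the previous
    I/O instruction, ending with the next I/O instruction. Since every I/O
    instruction ends with the same 4-cycle I/O M-cycle, the gap equals the
    delay instructions plus the whole next I/O instruction.
    """
    return sum(MSX_CYCLES[i] for i in after_previous_io)

def too_fast(after_previous_io, min_gap=MSX1_MIN_GAP):
    """True when the gap is below the assumed minimum."""
    return io_gap(after_previous_io) < min_gap

if __name__ == "__main__":
    for seq in (["nop", "nop", "outi"],
                ["nop", "dec de", "outi"],
                ["nop", "nop", "nop", "in a,(n)"],
                ["nop", "nop", "and n", "in a,(n)"]):
        print(" + ".join(seq), "=", io_gap(seq),
              "too fast" if too_fast(seq) else "ok")
```

The MSX2 candidates from the question above come out at 19 cycles (dec de + in) and 17 cycles (nop + in); which of them is enough on MSX2 is the open question, so no MSX2 threshold is assumed here.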

Where blueMSX fails to warn is the IN 98 after a port 99 setup for VRAM read mode.

By multi

Expert (74)

multi's picture

27-10-2010, 21:05

Is it just me, or do many people here forget that the MSX standard does not allow direct use of ports #98 and #99 for the video chip? As far as I remember, you have to read the I/O ports from the BIOS ROM at some address (wasn't it #0006 and #0007?)

Anyway, all code examples using instructions like out (0x98),a are obviously not MSX-compatible code. Furthermore, the discussion of whether "out (n),a" or "out (c),r" should be used is also irrelevant...

By hit9918

Prophet (2911)


28-10-2010, 00:40

Quote:
Is it just me or do many people here forget that the msx standard does not allow direct use of port #98 & #99 for the video chip?

Quote:
Furthermore the discussion if "out (n),a" or "out (c),r" should be used is also irrelevant...

A game could, at startup, poke the port numbers of its out (0x98),a opcodes from the BIOS value!
Because this is possible without a speed penalty, I would be willing to support VDP cartridges, but has anyone ever actually seen such a thing?

I don't mean things like a VGA overlay; I mean a cartridge VDP meant to run MSX software.
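A minimal sketch of that startup patch, in Python for illustration: read the VDP ports from the BIOS bytes at 0x0006 (read port) and 0x0007 (write port), then rewrite the immediate operand of every out (n),a (opcode 0xD3) and in a,(n) (opcode 0xDB) that targets the VDP. The function name and the patch-site list are hypothetical; a real program would have to keep such a relocation table, because blindly scanning for the opcode bytes would also hit data.

```python
OUT_N_A = 0xD3  # out (n),a
IN_A_N = 0xDB   # in a,(n)

def patch_vdp_ports(code, sites, bios):
    """Rewrite VDP port operands in code (a bytearray) in place.

    sites: offsets of out (n),a / in a,(n) opcodes that address the VDP
           (hypothetical relocation table kept by the program).
    bios:  BIOS ROM bytes; bios[6] is the VDP read port, bios[7] the
           VDP write port.
    The original operand (0x98..0x9B) tells which VDP port is meant; it is
    rebased onto the port found in the BIOS.
    """
    read_base, write_base = bios[6], bios[7]
    for off in sites:
        opcode, port = code[off], code[off + 1]
        if not 0x98 <= port <= 0x9B:
            raise ValueError(f"offset {off:#x} does not address the VDP")
        if opcode == OUT_N_A:
            base = write_base
        elif opcode == IN_A_N:
            base = read_base
        else:
            raise ValueError(f"offset {off:#x} is not out (n),a / in a,(n)")
        code[off + 1] = (base + port - 0x98) & 0xFF
```

Since the patch runs once at startup, the inner loops keep the fast out (n),a form, which is the "no speed penalty" point above.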

By flyguille

Prophet (3028)


28-10-2010, 02:22


There is the "80 columns expansion for MSX1 systems", a cartridge with a 9938 inside.

But anyway, #0007 gives the I/O port of the onboard VDP, not of cartridge expansions.

And since two VDPs cannot share the same I/O address (the VDP has readable registers), #0007 is only useful IF a given MSX model has its VDP placed elsewhere.

Anyway, the 80 columns expansion has its VDP at other I/O ports, so only specialized software can use it.

By Leo

Paragon (1236)


13-11-2010, 22:09

Looks interesting. How does its speed in KB/s compare with a regular RAM-to-VRAM copy (the one that goes through port 98h)?

By PingPong

Prophet (3834)


13-11-2010, 23:49

Quote:
3x nop is 15 cycles (M1 wait on all opcode bytes). and IN is 12 cycles, so this would be 27 cycles total. this is hairy because one should not go below 29 cycles on MSX 1.

Maybe, but on a VDP clone this does not cause any problems.
Plus, when doing consecutive OUTs, a delay of 26 T-states between them is safe.

Quote:
I heard that MSX2 got no VDP delay problems. does this mean it can do otir or can it do the 18 cycles outi outi?

The problem is the delay between setting up the address pointer and the first data access:
when writing there is no problem; one can write right after the address setup. Of course, the next write must respect the minimum delay. On MSX2, however, the VDP copes even with a block of outi instructions during the active display area.

Reading is different:
the first (and only the first) data read MUST wait the delay I've mentioned; subsequent reads on MSX2 are safe even with ini. It's easy to understand why writes and reads differ.

When you write data, you feed both the address and the data (which gets buffered). Even if the VDP has not completed the write, the byte has most probably been stored in VRAM before the next write to the data port.

But

When you read data, you feed the address, BUT the VDP has not yet performed the read, so if you read too early you get the stale buffered data. That's because of the read-ahead.

Please consider that there are two delays in the VDP:

1) the delay needed after an address setup (must always be respected): about 2 µs
2) the delay between two OUTs on the data port: about 6 µs
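A toy Python model of the read-ahead and the two delays (a sketch: the class is hypothetical, times are in microseconds, and the defaults are PingPong's approximate figures; per his remark on ini, the data delay may be much shorter on MSX2). A read issued before the prefetched byte has landed returns whatever is still in the latch.

```python
class ReadAheadVdp:
    """Toy model of the VDP read-ahead latch with two delays (hypothetical).

    Times are in microseconds. An address setup starts fetching the byte at
    that address into the latch, valid setup_delay later. Each data read
    returns the latch and starts fetching the next byte, valid data_delay
    later. A read before the fetch has landed returns the stale latch.
    """

    def __init__(self, vram, setup_delay=2.0, data_delay=6.0):
        self.vram = vram
        self.setup_delay = setup_delay
        self.data_delay = data_delay
        self.latch = 0x00    # byte currently held in the read-ahead latch
        self.addr = 0
        self.ready_at = 0.0  # time at which the pending fetch lands

    def set_read_address(self, addr, t):
        self.addr = addr
        self.ready_at = t + self.setup_delay

    def read(self, t):
        if t >= self.ready_at:
            self.latch = self.vram[self.addr]  # prefetch completed in time
        value = self.latch                     # stale if read too early
        self.addr += 1                         # auto-increment
        self.ready_at = t + self.data_delay
        return value
```

In this model, reading 1 µs after the setup returns the old latch contents, reading after 2.5 µs returns the right byte, and a second read only 2.5 µs after that returns the same byte again.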

By PingPong

Prophet (3834)


13-11-2010, 23:53

Quote:
but the head of any z80 IO instruction already got at least 8 cycles (2.24 micro seconds) in which it does not touch the IO pins! so by your theory it would be impossible to ever wreck on MSX2, including OUT 99 + OUT 99 + IN 98.

In the case of reading, you are wrong!
If you do out/out/in, you must allow 2 µs (out/out) + 6 µs (in) to read reliable data.
However, when writing (out/out/out), you can safely output data at the Z80's maximum speed, because the third out is buffered!
(Of course, another out to the data port must wait.)
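Converting PingPong's estimate to CPU cycles is simple arithmetic. A Python sketch, assuming the standard 3.579545 MHz MSX clock: 2 µs + 6 µs = 8 µs is about 28.6 cycles, so at least 29 whole cycles, close to the 29-cycle MSX1 minimum mentioned earlier in the thread.

```python
import math

MSX_CLOCK_HZ = 3_579_545  # standard MSX Z80 clock

def us_to_cycles(us, clock_hz=MSX_CLOCK_HZ):
    """Convert microseconds to (fractional) CPU clock cycles."""
    return us * clock_hz / 1_000_000

SETUP_US = 2.0  # delay after the out/out address setup (PingPong's estimate)
READ_US = 6.0   # further delay before the IN returns reliable data (estimate)

# Smallest whole number of CPU cycles covering both delays.
min_read_cycles = math.ceil(us_to_cycles(SETUP_US + READ_US))

if __name__ == "__main__":
    print(f"{SETUP_US + READ_US} us = {us_to_cycles(SETUP_US + READ_US):.2f} cycles")
    print("at least", min_read_cycles, "cycles")
```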
