Direct Video Memory Access (DVMA) for V9938

Page 5/6

By hit9918

14-11-2010, 12:53

Quote:
so by your theory it would be impossible to ever fail on MSX2, including OUT 99 + OUT 99 + IN 98.
in the case of reading you are wrong!

Yes. I was using your own report that this code does fail, to falsify the theory that the 8 cycle loop seen on the oscilloscope is the same as the delay one has to take care of.

By hit9918

14-11-2010, 14:38

@PingPong:

Quote:
3x nop is 15 cycles (M1 wait on all opcode bytes), and IN A,(n) is 12 cycles, so this would be 27 cycles total. this is hairy because one should not go below 29 cycles on MSX 1.
Maybe, but on a vdp clone this does not give any problems.
Plus, when doing consecutive outs, a delay of 26 T-states between each one is safe.

I hope you are not planning games which say "MSX 1 with fast VDP clone only" ;-)
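For reference, the cycle figures in this exchange count the extra wait state the MSX inserts on every M1 (opcode fetch) cycle. A minimal Python sketch of that bookkeeping, assuming the standard Z80 T-state counts and the nominal 3.579545 MHz MSX clock (the instruction table and helper names are illustrative, not from any tool in the thread), reproduces the 27-cycle total and the 29-cycle floor that corresponds to 8 microseconds:

```python
import math

# (T-states, M1 cycles) from the Z80 datasheet; the MSX adds one wait
# state per M1 (opcode fetch) cycle, so ED-prefixed opcodes pay two.
INSTR = {
    "nop": (4, 1),
    "in a,(n)": (11, 1),
    "out (n),a": (11, 1),
    "in a,(c)": (12, 2),
    "outi": (16, 2),
}

Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)


def msx_cycles(seq):
    """Total cycles for an instruction sequence, M1 wait states included."""
    return sum(t_states + m1 for t_states, m1 in (INSTR[op] for op in seq))


def min_cycles_for(microseconds, clock_hz=Z80_CLOCK_HZ):
    """Smallest whole number of cycles lasting at least `microseconds`."""
    return math.ceil(microseconds * clock_hz / 1e6)


delay = msx_cycles(["nop", "nop", "nop", "in a,(n)"])
print(delay, min_cycles_for(8), delay >= min_cycles_for(8))  # 27 29 False
```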


Quote:
On msx2 however, the vdp can work in the active area even with a block of outi instructions

you reported "for msx2 the nops required are 2 instead of 3" between OUT 99 and IN 98.

if that was measured with the 12 cycle "IN A,(n)", then the delay needed is more than "nop + IN A,(n)", which is 5+12 = 17 cycles.
and outi is 18 cycles, so the required delay appears to be 18 cycles.

ld c,0x98
ld a,l
out (0x99),a    ;VRAM address bits A0-A7
ld a,h
and 0x3f        ;A8-A13, bits 7,6 = 00 selects read setup
out (0x99),a
nop             ;5 cycles
in a,(c)        ;14 cycles

this should work because nop + in a,(c) = 5+14 = 19 cycles.

isn't there some official spec sheet for the MSX2 VDP? so far it looks like "MSX2 can take exactly those 18 cycles of an outi outi outi...".
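The inference above is an interval bound: every failing spacing is a lower bound on the required delay, and every passing one is an upper bound. A small sketch of that reasoning with the thread's observations hard-coded; note that folding the OUTI (write) result into a read-timing bound is exactly the assumption under discussion, and `bound_required_delay` is a hypothetical helper:

```python
def bound_required_delay(observations):
    """Given (cycles, passed) pairs, return (lower, upper) bounds on the
    minimum delay: above every failing spacing, at or below every passing one."""
    failing = [c for c, ok in observations if not ok]
    passing = [c for c, ok in observations if ok]
    lower = max(failing) + 1 if failing else None
    upper = min(passing) if passing else None
    return lower, upper


# Cycles between the two port accesses, as reported in the thread.
obs = [
    (5 + 12, False),     # nop + in a,(n): garbage reported
    (5 + 5 + 12, True),  # 2x nop + in a,(n): reported OK
    (18, True),          # back-to-back outi (writes), reported OK
]
lower, upper = bound_required_delay(obs)
print(lower, upper)       # 18 18
print(5 + 14 >= upper)    # nop + in a,(c) = 19 cycles -> True
```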


Quote:
please consider that there are two delays in the vdp

1) the delay needed after an address setup (must always be respected) (about 2us)
2) the delay between two outs on the data port (about 6us)

Actually the only thing one needs to know is the 8 microseconds total,
because no z80 IN/OUT instruction is faster than 2 microseconds
(and one should note the 8 microseconds are the MSX 1 spec).

And the fact that the vram read case behaves differently has nothing to do with those two delay types.
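One way to read the 8-microsecond argument: if even the fastest port instruction already lasts longer than the 2 us setup delay, then keeping consecutive VDP accesses 8 us apart covers both delays. A quick conversion, assuming the nominal 3.579545 MHz clock and the 2 us / 8 us figures quoted above:

```python
Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)


def cycles_to_us(cycles, clock_hz=Z80_CLOCK_HZ):
    """Convert CPU cycles (M1 waits included) to microseconds."""
    return cycles * 1e6 / clock_hz


def us_to_cycles(microseconds, clock_hz=Z80_CLOCK_HZ):
    """Convert microseconds to (fractional) CPU cycles."""
    return microseconds * clock_hz / 1e6


# Fastest port I/O: in a,(n) / out (n),a = 11 T-states + 1 M1 wait.
FASTEST_IO_CYCLES = 12
SETUP_DELAY_US = 2.0  # figure quoted in the thread
TOTAL_SPEC_US = 8.0   # MSX1 spacing quoted in the thread

io_us = cycles_to_us(FASTEST_IO_CYCLES)
print(round(io_us, 2), io_us > SETUP_DELAY_US)  # 3.35 True
print(round(us_to_cycles(TOTAL_SPEC_US), 2))    # 28.64
```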

By PingPong

14-11-2010, 19:32

@hit9918:


@PingPong:

Quote:
Quote:
3x nop is 15 cycles (M1 wait on all opcode bytes), and IN A,(n) is 12 cycles, so this would be 27 cycles total. this is hairy because one should not go below 29 cycles on MSX 1.
Maybe, but on a vdp clone this does not give any problems.
Plus, when doing consecutive outs, a delay of 26 T-states between each one is safe.


I hope you are not planning games which say "MSX 1 with fast VDP clone only" ;-)

What are you saying here? from my info the "vdp clone" is *WORSE* than the real one, so my tests are in the worst case.
That's the reason i've tested the vdp clone; otherwise it would make no sense.
do you have different experience? Is the real vdp slower than the clone? Please don't tell me this. :'(


Quote:
you reported "for msx2 the nops required are 2 instead of 3" between OUT 99 and IN 98.

if that was measured with the 12 cycle "IN A,(n)", then the delay needed is more than "nop + IN A,(n)", which is 5+12 = 17 cycles.
and outi is 18 cycles, so the required delay appears to be 18 cycles.

ld c,0x98
ld a,l
out (0x99),a    ;VRAM address bits A0-A7
ld a,h
and 0x3f        ;A8-A13, bits 7,6 = 00 selects read setup
out (0x99),a
nop             ;5 cycles
in a,(c)        ;14 cycles

this should work because nop + in a,(c) = 5+14 = 19 cycles.

isn't there some official spec sheet for the MSX2 VDP? so far it looks like "MSX2 can take exactly those 18 cycles of an outi outi outi...".

"Should work", but it does not. My data is experimental, on a standard msx2 @3.5Mhz.
the vdp can work with a block of outi instructions, even in the active area, @60Hz, with a copy command in progress and all 32 16x16 sprites on screen, in 4 rows of 8 sprites each. This is the worst condition, and the vdp works correctly.
However, when doing a vram read, if i do not interleave at least 2 nops between the last out and the in instruction, i read garbage.
Those are the results.

Unfortunately, there are no official docs. the yamaha ones have values, but those are aligned to msx1 vdp timings, so theoretically even a block of outi, for example, should not work. However, some MSX2 BIOS functions that are written specifically for the msx2 vdp (such as LDIRMV) are coded in a faster way (for example, using otir in the active area).

Quote:
Actually the only thing one needs to know is the 8 microseconds total,
because no z80 IN/OUT instruction is faster than 2 microseconds
(and one should note the 8 microseconds are the MSX 1 spec).

And the fact that the vram read case behaves differently has nothing to do with those two delay types.

No, i do not agree. if you are reading data after an address setup, you need to take into account the first 2us delay, to 'calibrate' when the next IN on the data port should happen.
After this, successive in/out on the data port should not have this first delay. i'm referring to the TMS docs when i say this.

however, i think a small test routine should be written to find out the *actual* situation and to see the differences between vdps and/or msxs.

I repeat again, afaik the vdp clones are slower than the original.

By Leo

15-11-2010, 13:51

Just one question aside:

In the V9938 databook we can see that there is a 64 KB expansion bank, in addition to the 128 KB VRAM.
one can only copy from/to this expansion bank; i don't think one can display a page from there. Interestingly, would that mean the vdp does not access this bank all the time, so the data/address bus to/from this bank could be fully and exclusively controlled by the Z80 for a long period of time?
IIRC there is a pin on the VDP to select that bank or not, so we can know when the VDP is not accessing this bank.

The question is:
What about extending this scheme to bank switching in the expansion area, so the VDP can see all the Z80 RAM, bank by bank?

(this is my 1000th post !!) 8-)

By RetroTechie

15-11-2010, 14:20

Quote:
(this is my 1000th post !!) 8-)
Congrats! Now you can't reply anymore without destroying that nice number...

For selecting VRAM there are 3 #CAS signals: #CAS0 (1st 64KB), #CAS1 (2nd 64KB) and #CASX (64 KB expanded VRAM). But all (video) RAMs share the same #RAS signal and the same multiplexed address lines, so you can't use that expanded VRAM independently when the VDP isn't using it. Same thing for the data lines. In summary: if you build some sort of banking mechanism to access VRAM from outside the VDP, accessing expanded VRAM would be bound by the same restrictions as regular VRAM. It's not the chip access that's the bottleneck, but the availability of the bus the RAMs sit on. And yes, if you somehow 'disconnected' a set of RAM chips & controlled them directly, you could do as you wish (but then for the VDP they'd 'disappear' too, which you may not want).

Also, because the VDP can't display expanded VRAM contents, there's little point in using it other than as 64 KB of data storage. If you have a banking mechanism to juggle VRAM contents around, expanded VRAM becomes even less useful.

By hit9918

15-11-2010, 20:29

@Leo:
looks interesting, how does the KB/s compare with regular RAM to VRAM copying (the one that goes through port 98h)?

MSX2 doing OUTI is actually the same speed as the RAM instruction LDI.
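To put the KB/s question in numbers, a rough sketch that converts cycles per byte into throughput at the nominal 3.579545 MHz clock (an assumption). It ignores loop setup and any VDP-side stalls, so treat the figures as upper bounds; the loop labels are illustrative:

```python
Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)


def kib_per_second(cycles_per_byte, clock_hz=Z80_CLOCK_HZ):
    """Throughput of a loop that moves one byte every `cycles_per_byte` cycles."""
    return clock_hz / cycles_per_byte / 1024


# Cycle counts include the MSX M1 wait states.
loops = {
    "unrolled outi (VRAM)": 18,  # 16 T-states + 2 M1 waits
    "unrolled ldi (RAM)": 18,    # same cost as outi
    "outi + jp nz,loop": 29,     # MSX1-safe 8 us spacing
}
for name, cycles in loops.items():
    print(f"{name:22s} {kib_per_second(cycles):6.1f} KiB/s")
```

With these assumptions the unrolled OUTI and LDI cases come out at about 194 KiB/s, and the 29-cycle loop at about 120 KiB/s.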

The problematic thing is random VRAM access. That is what would help a direct ZX Spectrum port,
whereas if you make an MSX2 game, the scroller copies VRAM to VRAM with the blitter.

By hit9918

15-11-2010, 20:33

@PingPong

about the 26 cycles: did you count cycles including M1 waitstates?

about the 19 cycle MSX2 vram read with 1 nop delay: did you try out 99 + nop + in a,(c)? the IN A,(C) instruction takes 2 cycles longer!


Quote:
No, i do not agree. if you are reading data after an address setup, you need to take into account the first 2us delay, to 'calibrate' when the next IN on the data port should happen.
After this, successive in/out on the data port should not have this first delay. i'm referring to the TMS docs when i say this.

you mean the TI PDF, section 2.1.5 "cpu read from vram"?
the wording is not so clear. it is unclear when they are talking about the port sequence 99,99,98 and when about 98,98,98...

consider this:
if the spec for a cascade of port 98 accesses were 6 microseconds, that is about 21.5 cycles, so this would mean the MSX 1 VDP can take 22 cycles. And an otir takes 23 cycles! 21 + 2 M1 waitstates for the two-byte opcode, no?
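The same argument as arithmetic, assuming the nominal 3.579545 MHz clock and the hypothetical 6 us figure:

```python
import math

Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)

# Hypothetical spec: successive port 0x98 accesses need only 6 us spacing.
spec_cycles = 6 * Z80_CLOCK_HZ / 1e6  # about 21.48 cycles
needed = math.ceil(spec_cycles)       # 22 whole cycles

# otir: 21 T-states per repeated byte + 2 M1 waits (ED prefix).
OTIR_CYCLES = 21 + 2

print(round(spec_cycles, 2), needed, OTIR_CYCLES >= needed)  # 21.48 22 True
```

If the data-port spec really were 6 us, OTIR would always be slow enough for an MSX1 VDP, which does not match experience outside the blanking period.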

I think nobody ever got his MSX 1 to take an otir, except in the blanking area.

second, making a model of the VDP:
a port 98 access causes an address register increment. one could say that with the increment, the address register was "set up". and IN 98 causes the VDP to put another "vram read to internal buffer is pending" on its to-do list, same as when setting up read mode.

So port 98 action actually sets all the same gears in motion as a port 99 setup! it is not too strange to assume it also takes the 2 microseconds "VDP delay".
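A toy Python model of the hypothesis above, purely illustrative and not a cycle-exact VDP: it assumes a single fetch latency (`latency_us`, a made-up parameter) that applies after both the port 0x99 read setup and every IN from port 0x98, and it ignores display-slot timing entirely. `ToyVdpReader` and `count_bad_reads` are hypothetical names:

```python
Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)


class ToyVdpReader:
    """Hypothetical model of the VDP read-ahead buffer.

    Both the read-address setup (port 0x99) and every IN from port 0x98 queue
    one VRAM fetch into the buffer; the fetch lands `latency_us` later. An IN
    that arrives before the pending fetch has landed returns stale data.
    """

    def __init__(self, vram, latency_us):
        self.vram = vram
        self.latency_us = latency_us
        self.buffer = None   # what the next IN from port 0x98 returns
        self.pending = None  # (ready_time_us, value) of the queued fetch
        self.addr = 0

    def _queue_fetch(self, now_us):
        # Fetch the byte at the current address and advance the address,
        # overwriting any fetch that was still pending.
        value = self.vram[self.addr % len(self.vram)]
        self.pending = (now_us + self.latency_us, value)
        self.addr += 1

    def _settle(self, now_us):
        # Move a finished fetch into the buffer.
        if self.pending is not None and self.pending[0] <= now_us:
            self.buffer = self.pending[1]
            self.pending = None

    def set_read_address(self, addr, now_us):
        self.addr = addr
        self._queue_fetch(now_us)

    def read_data(self, now_us):
        self._settle(now_us)
        value = self.buffer
        self._queue_fetch(now_us)  # same "gear" as the port 0x99 setup
        return value


def count_bad_reads(interval_cycles, latency_us, n=16, clock_hz=Z80_CLOCK_HZ):
    """Set up a read at address 0, then do n INs spaced interval_cycles apart."""
    vram = list(range(n + 1))
    vdp = ToyVdpReader(vram, latency_us)
    step_us = interval_cycles * 1e6 / clock_hz
    now = 0.0
    vdp.set_read_address(0, now)
    bad = 0
    for expected in vram[:n]:
        now += step_us
        if vdp.read_data(now) != expected:
            bad += 1
    return bad


print(count_bad_reads(29, latency_us=8.0))  # 0
print(count_bad_reads(26, latency_us=8.0))  # 16
```

Under this model the failure is all-or-nothing: once the spacing drops below the latency, every IN returns old buffer contents instead of the byte at the current address.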

If you could check the out 99 + nop + in (c) case, this would be an interesting part of the puzzle.

Some further sightings:

in gradius.rom, Konami does "loop: outi + jp nz,loop", which nicely makes 29 cycles (18 + 11).
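Checking the Konami loop with the same M1-aware counting (standard Z80 T-states plus one MSX wait per opcode fetch, nominal 3.579545 MHz clock as an assumption):

```python
Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)

# (T-states, M1 cycles); the MSX adds one wait state per M1 cycle.
OUTI = (16, 2)   # ED-prefixed: two opcode fetches
JP_NZ = (10, 1)  # jp cc,nn costs 10 T-states whether or not it jumps

loop_cycles = sum(t_states + m1 for t_states, m1 in (OUTI, JP_NZ))
loop_us = loop_cycles * 1e6 / Z80_CLOCK_HZ
print(loop_cycles, round(loop_us, 2), loop_us >= 8.0)  # 29 8.1 True
```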

and the MSX2 bios does otir except when the screen mode is 0-3! it looks like the 9938 in the old modes has the same timing as the 9918. good for compatibility, bad for the turbo R brake: that mechanism cannot check the screen mode. I read it waits 57 cycles (R800 at 7.16 MHz), which is 28.5 cycles at 3.58 MHz, which again is the 8 microseconds figure.
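The turbo R figure converts as follows, assuming the quoted 57 cycles are R800 cycles at twice the nominal 3.579545 MHz Z80 clock (the 57 is the thread's number, not independently checked here):

```python
Z80_CLOCK_HZ = 3_579_545          # MSX Z80 clock (assumption)
R800_CLOCK_HZ = 2 * Z80_CLOCK_HZ  # turbo R in R800 mode, ~7.16 MHz

BRAKE_R800_CYCLES = 57  # figure quoted in the thread

brake_us = BRAKE_R800_CYCLES * 1e6 / R800_CLOCK_HZ
brake_z80_cycles = BRAKE_R800_CYCLES * Z80_CLOCK_HZ / R800_CLOCK_HZ
print(round(brake_us, 2), brake_z80_cycles)  # 7.96 28.5
```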

By PingPong

15-11-2010, 21:54


Quote:
@PingPong

about the 26 cycles: did you count cycles including M1 waitstates?

about the 19 cycle MSX2 vram read with 1 nop delay: did you try out 99 + nop + in a,(c)? the IN A,(C) instruction takes 2 cycles longer!


unfortunately no, i've not tried the in (c) variation. about counting cycles: yes, i count wait states, of course.

about delays, you are probably right: the crappy vdp always needs the 2us extra delay, even on repeated data port I/O. shame.


Quote:
If you could check the out 99 + nop + in (c) case, this would be an interesting part of the puzzle.

Umh, all these mysteries about vdp delay should be solved. i proposed, and i will insist on, a vdp access test demo that tests all conditions.

About the test of the configuration you proposed, i will be happy to test it on my vdp clone when a vdp test is available.

Quote:
and the MSX2 bios does otir except when the screen mode is 0-3! it looks like the 9938 in the old modes has the same timing as the 9918. good for compatibility, bad for the turbo R brake: that mechanism cannot check the screen mode. I read it waits 57 cycles (R800 at 7.16 MHz), which is 28.5 cycles at 3.58 MHz, which again is the 8 microseconds figure.

yes, for compatibility, but not on the vdp side. i've already verified that otir works fine on msx2 in screen 2/4/5/6/7/8, so i think the reason was to avoid making msx1 games run too fast (!!!!) when using sc0-3.

Plus, consider that bandwidth requirements are higher in sc4 than in sc2 (8 sprites/scanline means more fetches), and sc5 is even higher. So in lighter situations (sc0, sc3) there is no problem.

This is also true on the TMS: you can do otir while working in sc0, because there are more access slots available to the cpu.

to summarize, if you can make a .rom or .com program to test all possibilities, that would be nice.
It may be interesting to compare those tests across different msx models, to get a clear picture of the actual access times.

By hit9918

16-11-2010, 23:18

I wrote a VDP speed test!

http://jf.peer.name/msx/vdptest.zip

I hope there are no bugs in the cycle counting etc.; see the source.
The buffer gets copied back and forth, so errors should accumulate and stay permanently visible.

By hit9918

20-11-2010, 21:09

I tested a real Hit Bit 75P (PAL)! The manual says it has a "TMS9929ANL".

results:

vpeek 21 cycles FAIL
vpeek dec de/inc de/in a,(0x98) 26 cycles OK
vwrite add #0 outi 26 cycles FAIL
vwrite ld a,(hl)/inc hl/out (0x98),a 27 cycles OK

I just noticed the tool has a BUG in the "vpeek 26 cycles" case; it should be 28 cycles.

No matter what, the VDP fails port 98 accesses spaced far above the 6 microseconds you would get without adding the 2 microseconds of "VDP delay".
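Converting the Hit Bit 75P results to microseconds makes the point explicit. A sketch assuming the nominal 3.579545 MHz clock (the PAL machine's exact clock may differ slightly); the second vpeek entry uses the author's corrected 28 cycles:

```python
Z80_CLOCK_HZ = 3_579_545  # nominal MSX CPU clock (assumption)

# (test, cycles between port 0x98 accesses, result) from the Hit Bit 75P run.
RESULTS = [
    ("vpeek", 21, "FAIL"),
    ("vpeek dec de/inc de/in a,(0x98)", 28, "OK"),  # corrected from 26
    ("vwrite add #0 outi", 26, "FAIL"),
    ("vwrite ld a,(hl)/inc hl/out (0x98),a", 27, "OK"),
]


def to_us(cycles, clock_hz=Z80_CLOCK_HZ):
    """Convert CPU cycles to microseconds."""
    return cycles * 1e6 / clock_hz


for name, cycles, result in RESULTS:
    print(f"{name:38s} {cycles:2d} cycles = {to_us(cycles):4.2f} us  {result}")

# Failures at spacings longer than 6 us, i.e. not explained by a 6 us spec.
fails_above_6us = [n for n, c, r in RESULTS if r == "FAIL" and to_us(c) > 6.0]
print(fails_above_6us)  # ['vwrite add #0 outi']
```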
