R800 DMA...


By PingPong

Prophet (3281)

09-05-2019, 01:11

Quote:

Yes, good point. They can get away with a lot because there is just less stuff to 'draw' on the screen.
Still, if I am not mistaken, an MSX 2+ with V9958 can perform VDP commands in pattern modes? Could you achieve similar results though?

The VDP can write data at about 170-210 KBytes/sec, depending on the command you use. But those commands are intended for graphic operations such as those on screen 8; for example, there is no strict equivalent to the Z80 LDIR instruction. So the efficiency you get depends on how well the commands fit your needs. Of course, you can use the combined power of VDP + CPU to move more data around. There is a karate-like game that uses this approach and gets a relatively high frame rate given the amount of bitmap screen 5 data it moves.
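
As a rough feel for those figures, here is a minimal Python sketch that turns the quoted 170-210 KBytes/sec into a per-frame budget and compares it with one full screen 5 page. It assumes 1 KByte = 1000 bytes, a 60 Hz display and a 256x212 page at 4 bits per pixel; the function name is made up for illustration.

```python
# Assumptions (not from the post): 1 KByte = 1000 bytes, 60 frames per second.
FPS = 60
SCREEN5_BYTES = 256 * 212 // 2  # 4 bits per pixel, two pixels per byte


def frames_to_fill(rate_bytes_per_sec, size=SCREEN5_BYTES, fps=FPS):
    """Number of frames needed to write `size` bytes at the given rate."""
    bytes_per_frame = rate_bytes_per_sec / fps
    return size / bytes_per_frame


if __name__ == "__main__":
    for rate_kb in (170, 210):
        frames = frames_to_fill(rate_kb * 1000)
        print(f"{rate_kb} KB/s: {rate_kb * 1000 / FPS:.0f} bytes/frame, "
              f"{frames:.1f} frames per full page")
```

Under these assumptions a full-page rewrite takes several frames at either rate, which is why it matters how well the VDP commands fit the job and how much the CPU can add on top.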

By Sandy Brand

Master (142)

09-05-2019, 01:26

42 + 72 + 42 + 288 = 444 T-states

Not 360.

Or what am I missing? Smile

(and you are still missing opcodes to set register C, twice).

And yes, of course, you probably need to map VRAM into the address space of the Z80 somehow, which may cost some time. But, as this is a fictitious example, I could argue that there might be a way to do it so that you don't need to do it all the time, just like you suggest for not setting the high bits of the VRAM address.

By PingPong

Prophet (3281)

09-05-2019, 01:36

Grauw wrote:

I agree the TMS9918 and V9938 designs aren’t perfect, but I’m not convinced this is due to the absence of direct memory mapped access to the VRAM and registers. I think PingPong made a good case for this.

You expressed my point more precisely than I did.
The MSX VDP has a lot of bad design issues. They are a more limiting factor than the I/O-based access is.
Of course the I/O protocol could have been better developed, making it more efficient, but the real bottleneck is the many design quirks here and there (due to the TMS roots and the need for some compatibility) that really make life hard.
But a lot of people (especially some coming from a Spectrum background) think that the real problem is the lack of memory-mapped access. It is not. Otherwise the amazing things the C64 is able to achieve would also have been possible on the Spectrum or Amstrad CPC. And we know they are not.

The C64 can move 8 24x21 objects at a full 60fps thanks to its hardware sprites; that is not so easy to achieve on the Amstrad CPC or Speccy. The reason? Hardware sprites, not the memory-mapped access style, which both the CPC and ZX have, along with the bonus of a slightly faster processor (the Z80).
By contrast, the MSX can move 32 16x16 objects (with limits) at 60fps, and it does not have memory-mapped access. It has (more limited) sprite support and I/O-based access. So where do the 60fps come from?

By PingPong

Prophet (3281)

09-05-2019, 01:45

Sandy Brand wrote:

(and you are still missing opcodes to set register C, twice).

An overhead of 8 cycles. As I said, my calcs were approximate, but the VRAM ptr setup is precise. Do not tell me that 8 cycles make a difference. ;-) Otherwise I can tell you that having the DE register pair available can be a speed-up in the OTIR method vs the LDIR one.
Ah OK, now I see the mistake. Anyway, the other calcs are exact; you pay about 25% overhead. Damn Windows calculator ;-)
Of course, in both examples the calcs are not fully exact, because it always depends on how you do things.

By PingPong

Prophet (3281)

09-05-2019, 01:48

Sandy Brand wrote:

Not 360.

360 is only the number of cycles needed to perform the raw data write: 4 bytes + 16 bytes.
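
The numbers exchanged above can be checked with a few lines of Python. This sketch reads the sum 42 + 72 + 42 + 288 as two VRAM pointer setups (42 T-states each) plus two data writes (72 T-states for 4 bytes, 288 for 16 bytes); that split is an interpretation of the posts, and the variable names are made up for illustration.

```python
# Interpretation of the figures in the thread (assumed split, see lead-in).
SETUP_TSTATES = [42, 42]   # two VRAM pointer setups
DATA_TSTATES = [72, 288]   # raw writes of 4 bytes and 16 bytes
DATA_BYTES = [4, 16]

raw_write = sum(DATA_TSTATES)              # cycles spent moving data only
total = raw_write + sum(SETUP_TSTATES)     # data plus pointer setup
overhead = (total - raw_write) / raw_write # setup cost relative to raw write
per_byte = raw_write / sum(DATA_BYTES)     # cycles per byte written

if __name__ == "__main__":
    print(f"raw write: {raw_write} T-states")
    print(f"total:     {total} T-states")
    print(f"overhead:  {overhead:.1%}")
    print(f"per byte:  {per_byte:.0f} T-states")
```

So both figures hold at once: 360 T-states for the raw write, 444 with the pointer setups, and a setup overhead of a bit under a quarter of the raw write, in line with the "about 25%" above.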

By PingPong

Prophet (3281)

09-05-2019, 02:16

Sandy Brand wrote:

(btw. Thanks for the tip on how to speed up setting the 17-bit VRAM destination address! I never knew that was possible. I guess the VDP will just figure out that if bit 7 of the second byte sent to port #99 is a '1', the data is intended for a VDP register?)

There is another trick that is supposed to work on MSX (on the Sega VDP it works for sure).
Assume you need to set the VRAM ptr to a random position, but you know that the high byte of the address does not change. Well, SMS coders only out the low byte to the VDP control port (0x99 on MSX), just like this:
ld a,l          ; A = low byte of the VRAM address
out (0x99),a    ; first control-port byte only, no second byte
This has the effect of setting the low byte of the VRAM ptr, despite the fact that the VDP does not yet know whether this byte is for a VRAM address setup or for a VDP register oO: another strange design. SMS coders use it, and I suspect it may also work on MSX, at least on the TMS VDP...

ref: http://www.smspower.org/Development/VDPProgrammingTechniques
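
To make the claimed behaviour concrete, here is a toy Python model of a VDP address pointer with a two-byte control port. It is only a sketch of the quirk as described in this post: the class and method names are invented, the "first byte updates the low byte immediately" rule and the "data access resets the byte flip-flop" rule are assumptions taken from the Sega VDP description, and nothing here shows that TMS9918, V9938 or V9958 chips behave this way.

```python
class ToyVdpPointer:
    """Toy model of a 14-bit VRAM pointer (hypothetical, for illustration)."""

    def __init__(self):
        self.address = 0          # current VRAM pointer
        self.second_byte = False  # control-port flip-flop

    def write_control(self, value):
        value &= 0xFF
        if not self.second_byte:
            # Assumed quirk: the low byte lands in the pointer right away.
            self.address = (self.address & 0x3F00) | value
            self.second_byte = True
        else:
            if (value & 0x80) == 0:
                # Bit 7 clear: address setup, bits 5-0 are the high bits.
                self.address = ((value & 0x3F) << 8) | (self.address & 0x00FF)
            # Bit 7 set would be a register write; this model ignores it.
            self.second_byte = False

    def write_data(self, vram, value):
        vram[self.address] = value & 0xFF
        self.address = (self.address + 1) & 0x3FFF  # auto-increment
        self.second_byte = False  # assumed: data access resets the flip-flop


if __name__ == "__main__":
    vram = bytearray(0x4000)
    vdp = ToyVdpPointer()
    vdp.write_control(0x00)         # full setup: low byte
    vdp.write_control(0x40 | 0x1B)  # full setup: high bits + write flag
    vdp.write_control(0x10)         # the trick: low byte only
    vdp.write_data(vram, 0x55)
    print(hex(vdp.address - 1), hex(vram[0x1B10]))
```

In this model the low-byte-only write moves the pointer within the same 256-byte page, and the following data write puts the flip-flop back in its initial state. Whether real MSX VDPs do the same is exactly the open question here.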

By ricbit

Champion (434)

09-05-2019, 08:32

PingPong wrote:

Assume you need to set the VRAM ptr to a random position, but you know that the high byte of the address does not change. Well, SMS coders only out the low byte to the VDP control port (0x99 on MSX), just like this:
ld a,l
out (0x99),a

If this can be confirmed, then it is huge news. I can update only x/y coords of sprites and skip pattern and color.
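
A small Python sketch of why this would help with sprites: in the TMS9918-style sprite attribute table each sprite takes 4 bytes (Y, X, pattern, colour), so all 32 entries fit in 128 bytes and can share one high address byte. The base address 0x1B00 is just an example value, and the function name is made up for illustration.

```python
SPRITES = 32
ENTRY_BYTES = 4  # Y, X, pattern, colour


def xy_low_bytes(sat_base):
    """Return (shared high byte, low byte of each sprite's Y entry).

    Raises ValueError if the entries do not all share one high byte,
    in which case a low-byte-only pointer update would not be enough.
    """
    addresses = [sat_base + n * ENTRY_BYTES for n in range(SPRITES)]
    highs = {addr >> 8 for addr in addresses}
    if len(highs) != 1:
        raise ValueError("table crosses a 256-byte boundary")
    return highs.pop(), [addr & 0xFF for addr in addresses]


if __name__ == "__main__":
    high, lows = xy_low_bytes(0x1B00)  # example base address
    print(hex(high), lows[:4])
```

With such a table, pointing at sprite n's Y/X pair would need only the low byte `lows[n]`, followed by two data writes, provided the low-byte trick really works on the target VDP.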

By PingPong

Prophet (3281)

09-05-2019, 09:47

It should be tested on V9938, V9958, TMS, and VDP clones. We need help from real hardware owners.
I expect it to work on TMS; I'm not sure about the others.
The reason why I expect it to work on TMS is the usual one: quirks introduced to save transistor count. We all know that Guttag was a master at this kind of thing, sacrificing good design for cost even at an unacceptable tradeoff.

There may be an unknown reason that makes things work like this, and I am pretty sure that it was done this way to save transistor count on the die. Certainly not to allow some kind of flexibility.

Anyway, maybe one could prove this with a small assembly routine launched on MSX (maybe from BASIC) to test this behaviour.

By hit9918

Prophet (2853)

09-05-2019, 17:51

Quote:

What I see is that by only being able to sequentially write sprite attributes in VRAM I now have to either sort my game objects to update in the correct order, or use some dummy buffer to first build it in RAM and then copy it into VRAM. This adds complexity and overhead.

this was a very good example where the end result is the OPPOSITE of the stereotype: the C64 with its direct registers needs more cpu time for the sprites than the MSX with port IO!!!

there are many cases where one would have to talk about game engine design.
in other words, cpu time was lost in an unknown location.
and then it gets blamed on the VDP.

By Sandy Brand

Master (142)

09-05-2019, 19:59

hit9918 wrote:
Quote:

What I see is that by only being able to sequentially write sprite attributes in VRAM I now have to either sort my game objects to update in the correct order, or use some dummy buffer to first build it in RAM and then copy it into VRAM. This adds complexity and overhead.

this was a very good example where the end result is the OPPOSITE of the stereotype: the C64 with its direct registers needs more cpu time for the sprites than the MSX with port IO!!!

You will need to explain that a bit more?

Because on an MSX I have to run all sorts of additional code to either get everything updating in the desired order, or to first write it in a temporary work-buffer before I can then write it into VRAM.
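
The work-buffer variant mentioned here could look roughly like the following Python sketch: game objects update a RAM copy of the sprite attribute table in any order, and one sequential copy then moves it to VRAM. The class name, the 4-byte entry layout (Y, X, pattern, colour) and the example base address are assumptions for illustration, and a bytearray stands in for VRAM.

```python
SPRITES = 32
ENTRY_BYTES = 4  # Y, X, pattern, colour (assumed TMS9918-style layout)


class ShadowSat:
    """RAM-side copy of the sprite attribute table (hypothetical helper)."""

    def __init__(self):
        self.buffer = bytearray(SPRITES * ENTRY_BYTES)

    def set_position(self, n, y, x):
        # Random access is cheap here because this is plain RAM.
        offset = n * ENTRY_BYTES
        self.buffer[offset] = y & 0xFF
        self.buffer[offset + 1] = x & 0xFF

    def flush(self, vram, sat_base):
        # One sequential block copy, standing in for a single VRAM
        # pointer setup followed by a run of data-port writes.
        vram[sat_base:sat_base + len(self.buffer)] = self.buffer


if __name__ == "__main__":
    vram = bytearray(0x4000)
    sat = ShadowSat()
    sat.set_position(7, 100, 50)   # objects update in any order
    sat.set_position(2, 30, 200)
    sat.flush(vram, 0x1B00)        # example base address
    print(list(vram[0x1B08:0x1B0A]), list(vram[0x1B1C:0x1B1E]))
```

The cost being discussed is visible in the structure: every update touches RAM first, and the whole table is copied once per frame even if only a few sprites moved.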

But on a C64 I can just write it directly into (V)RAM, then how does the C64 need more CPU time?

Or are you talking about the C64 CPU needing to slow down in order to handle the bus sharing?
