R800 DMA...

Page 4/5

By hit9918

Prophet (2866)

09-05-2019, 20:26

The great C64 games worked with a SCROLL REGISTER.
And they worked with 8 sprites per scanline.
Your explanation that it is all based on those direct registers, and that it could not work with port I/O, is utterly false.

By hit9918

Prophet (2866)

09-05-2019, 20:33

The C64 has a sprite multiplexer more painful than you can imagine.
We should forget the C64.

Quote:

Because on an MSX I have to run all sorts of additional code to either get everything updating in the desired order,

Which order is this about? Is it the order for flicker?

By hit9918

Prophet (2866)

09-05-2019, 20:49

I suggest keeping a SAT in RAM and simply copying it to VRAM.
Then there can't be an access problem anymore.
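
A minimal sketch of that idea, modelled in Python rather than Z80 code: the game logic only touches a RAM copy of the sprite attribute table, and one routine copies all 128 bytes to VRAM in a single sequential pass. The names `ram_sat`, `set_sprite`, `flush_sat` and the VRAM address 0x1B00 are assumptions for illustration; on real hardware the copy would be a run of OUTs to port 0x98.

```python
SAT_SIZE = 32 * 4          # 32 sprites, 4 bytes each (Y, X, pattern, color)
SAT_BASE = 0x1B00          # assumed VRAM address of the SAT

vram = bytearray(0x4000)   # stand-in for 16 KB of VRAM
ram_sat = bytearray(SAT_SIZE)

def set_sprite(index, y, x, pattern, color):
    """Update one sprite in the RAM copy only; VRAM is not touched."""
    base = index * 4
    ram_sat[base:base + 4] = bytes(
        (y & 0xFF, x & 0xFF, pattern & 0xFF, color & 0xFF)
    )

def flush_sat():
    """Copy the whole RAM SAT to VRAM in one sequential pass."""
    vram[SAT_BASE:SAT_BASE + SAT_SIZE] = ram_sat

# Game logic may update sprites in any order it likes...
set_sprite(1, 100, 66, 8, 7)
set_sprite(0, 100, 50, 4, 15)
# ...and VRAM only ever sees one ordered block write.
flush_sat()
```

Because the VRAM pointer is set once and then only auto-increments, the order in which the game logic updates sprites no longer matters to the VDP.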

By Grauw

Ascended (8388)

09-05-2019, 21:11

Copying SAT: 128*OUTI = 2304 cycles, 3.9% CPU @ 60 fps
Copying SCT: 512*OUTI = 9216 cycles, 15.4% CPU @ 60 fps

Iterating over your sprite objects in order and OUTing directly to VRAM is much more efficient I think.
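
The figures above can be reproduced with a few lines of arithmetic. This sketch assumes the standard MSX Z80 clock of 3,579,545 Hz and 18 cycles per OUTI (16 T-states plus the two M1 wait states an MSX adds); `copy_cost` is a name made up for illustration.

```python
CPU_HZ = 3_579_545      # standard MSX Z80 clock
FPS = 60
OUTI_CYCLES = 18        # 16 T-states + 2 M1 wait states on MSX

def copy_cost(n_bytes):
    """Return (cycles, percent of one frame's CPU time) for n_bytes OUTIs."""
    cycles = n_bytes * OUTI_CYCLES
    percent = 100 * cycles * FPS / CPU_HZ
    return cycles, percent

for name, size in (("SAT", 128), ("SCT", 512)):
    cycles, percent = copy_cost(size)
    print(f"{name}: {cycles} cycles, {percent:.1f}% CPU @ {FPS} fps")
```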

By PingPong

Prophet (3435)

09-05-2019, 21:21

hit9918 wrote:

The great C64 games worked with a SCROLL REGISTER.
And they worked with 8 sprites per scanline.
Your explanation that it is all based on those direct registers, and that it could not work with port I/O, is utterly false.

I fully agree with you, hit9918. Special effects are mainly due to hardware features like scroll registers and sprites, and to a smarter design than the one used in the VDP, which was constrained by cost.
It is incredible how this I/O addressing is perceived as the only source of problems, while I/O addressing represents less than 1% of the VDP's problems. The VDP is full of quirks that force us to do things in a very inefficient way. Just to mention some:
- The sprite color table. It forces creative approaches like the color byte (C) hit9918 ;-), otherwise you need to move 512 bytes around in VRAM. The color table should be linked to the pattern table, not to the sprite plane number.
- The horrendous early clock bit. My god, could it not have been a simple "most significant bit of X" instead of a strange "shift 32 pixels to the left" bit?
- The stupid magic Y value. It makes no sense on MSX2 with the scroll register. To add insult to injury, you also need to adjust the sprite Y value AND take the prohibited value into account. This takes precious cycles. It gives no benefit and only creates headaches. It's not like trading resolution for colors, where you get a benefit along with a bottleneck. It's only a problem.

- The I/O protocol for setting the VRAM pointer. With a smarter approach it could have been fast, for example:
a) Allow the bytes of the VRAM pointer to be set separately and independently, instead of forcing you to specify the two or three bytes on every addressing operation.
b) Two pointers: one for reading, one for writing. That makes copies FAAAAAAAAAST!!!!!!!!!
[On MSX2 I read bytes from VRAM through port 0x98 and write them with HMMC, and I get fast byte-by-byte copies.]
c) Auto-increment is good and makes things faster. Let's have auto-decrement too. Useful.
d) Allow a windowed mode for the VRAM auto-increment. You specify a start address and a byte count; when that many bytes have been output, a "carry value" held in a register is automatically added to the address, making it jump to the next screen line. Essentially this works like an HMMC operation.
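
To make proposal (d) concrete, here is a small Python model of such a pointer. It describes a hypothetical feature, not anything the real VDP does; the class name `WindowedPointer`, its fields, and the 128-byte line length are assumptions for illustration.

```python
class WindowedPointer:
    """Model of a hypothetical VRAM write pointer with windowed auto-increment."""

    def __init__(self, vram, start, width, carry):
        self.vram = vram
        self.addr = start
        self.width = width    # bytes written per row before the jump
        self.carry = carry    # extra increment applied at the end of each row
        self.count = 0

    def write(self, value):
        """Write one byte, then advance as the proposed hardware would."""
        self.vram[self.addr] = value & 0xFF
        self.addr += 1
        self.count += 1
        if self.count == self.width:
            self.addr += self.carry   # jump to the same column on the next line
            self.count = 0

LINE_BYTES = 128                      # assumed bytes per screen line
vram = bytearray(LINE_BYTES * 4)
ptr = WindowedPointer(vram, start=10, width=3, carry=LINE_BYTES - 3)
for value in range(1, 7):             # two rows of three bytes each
    ptr.write(value)
```

With such a mode the CPU could stream a rectangular block with plain OUTs and never re-set the address between rows.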

..... you can add a lot more if you want....
Please note: NONE of these limits is due to the I/O addressing style.
Even with a memory-mapped style, the sprite color table problem, the early clock problem, and the magic Y value are still there, and they still force you to do 'creative' things to work around the limits.
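
As an illustration of the extra work the early clock bit and the magic Y value cause, here is a hedged Python sketch of the per-sprite fix-ups. It assumes sprite mode 2, where Y = 216 ends sprite processing (208 in mode 1), a sprite is displayed one line below its Y value, and the vertical scroll offset has to be added to keep a sprite in place. The function name `encode_sprite` and the choice to nudge a colliding Y by one line are assumptions, not a known engine's method.

```python
MAGIC_Y = 216   # sprite mode 2; the equivalent value in sprite mode 1 is 208

def encode_sprite(screen_x, screen_y, scroll=0):
    """Return (y_byte, x_byte, ec) for a sprite at signed screen coordinates.

    screen_x may be -32..255; negative values need the early clock (EC) bit.
    scroll is the vertical scroll register value the sprite must compensate for.
    """
    y = (screen_y - 1 + scroll) & 0xFF    # sprite shows one line below its Y
    if y == MAGIC_Y:
        y = (y + 1) & 0xFF                # dodge the prohibited value
    if screen_x < 0:
        # EC shifts the sprite 32 pixels left, so store X + 32 with EC set.
        return y, (screen_x + 32) & 0xFF, True
    return y, screen_x & 0xFF, False

print(encode_sprite(-8, 50))              # partly off the left edge
print(encode_sprite(10, 17, scroll=200))  # would land on the magic value
```

None of these checks would go away with memory-mapped VRAM access; they come from the sprite attribute format itself.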

By contrast, (c) and (d) combined with an I/O-based style could make things EVEN FASTER in I/O mode than in memory-mapped mode. It of course depends on what you want to achieve, but for example an auto-increment/decrement mode lets you spare the DE register pair (or another pair), and the time spent manipulating it, which a memory-mapped move would need to point at the destination address. Even with the actual implementation, OUTI/OTIR does not touch DE, so you can use it for anything else. Having registers available lets you avoid some register/memory moves that take time.

Plus, VRAM operations tend to be block-oriented. That is because you often cannot do small real-time updates and instead face relatively large ones. So you get tearing and other visible artifacts if you operate directly in VRAM (as when doing software sprites), and you need to build the result in a buffer elsewhere and then upload it to VRAM.

The only place where I can see a VDP limit is with A LOT of small operations at random addresses, for example a program that does PSET(RND(1)*100, RND(1)*100). The PSET operation is small, so the overhead of setting the VRAM pointer does weigh in.
Or when you draw a line on screen. Even in this case the overall draw time is probably influenced more by the Bresenham algorithm used to calculate the points than by the difference between a single LD (HL),register and a sequence of 3 or 4 instructions that does the same thing.

By hit9918

Prophet (2866)

09-05-2019, 21:16

The question is no longer speed!
The question is whether sandy can get his sprites in order.

By Grauw

Ascended (8388)

09-05-2019, 21:41

PingPong wrote:

- The sprite color table, it does force creative approaches like the color byte (C) hit9918 ;-)

What’s that creative approach about?

By hit9918

Prophet (2866)

09-05-2019, 21:54

The C mirror byte. It works like an index or ID.
If the desired C is the same as the mirror C, then one does not need to copy those 16 color bytes.

Every SAT gets 32 mirror bytes, one per sprite.

And somewhere in RAM one keeps those colorpatterns, indexed by the C byte.

By hit9918

Prophet (2866)

09-05-2019, 22:01

I call those 16 bytes "colorpatterns", since they have no name.
One can make the upper 128 colorpatterns a copy of the lower 128, but with the EC bits set.
This way the top bit of C is the EC bit.
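
A short sketch of how such a table could be built, assuming sprite mode 2, where the EC flag is bit 7 of each color table byte. The helper name `build_colorpatterns` and the dummy lower-half contents are made up for illustration.

```python
EC_BIT = 0x80   # bit 7 of a sprite mode 2 color byte

def build_colorpatterns(lower):
    """lower: 128 sequences of 16 color bytes. Returns a 256-entry table."""
    assert len(lower) == 128
    # Lower half: the patterns as given, with EC forced clear.
    table = [bytes(b & ~EC_BIT & 0xFF for b in pattern) for pattern in lower]
    # Upper half: the same patterns with EC set on every line.
    table += [bytes(b | EC_BIT for b in pattern) for pattern in table]
    return table

lower = [bytes([i & 0x0F] * 16) for i in range(128)]   # dummy patterns
table = build_colorpatterns(lower)

c = 5 | 0x80            # pattern 5, early clock on
ec = bool(c & 0x80)     # the top bit of C doubles as the EC flag
```

The engine can then decide EC by setting or clearing bit 7 of C, and the right color bytes follow from the table lookup.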

By hit9918

Prophet (2866)

09-05-2019, 22:08

OK, now we got into VDP tricks again.
But what I wanted to say:
there were two pages of crying about the port I/O,
when the actual issue behind it was a question of game engine design!
