SW sprites in SCREEN 2

By PingPong

Prophet (4093)

09-07-2011, 23:25

Has anyone tried doing SW sprites in SCREEN 2 (like ZX Spectrum programmers do)? To get an idea of speed: are they slower or faster than using VDP commands in SCREEN 5?
I assume a 16x16 pixel area. I also assume a monochrome situation, not using the color table.

Pro: fewer bytes to move compared to SCREEN 5.
Cons: more complexity, due to masking and bit shifting, and the need to set the VRAM pointer multiple times.

I'm not sure whether they are faster or slower than using VDP commands in SCREEN 5.
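To put a number on the "fewer bytes" pro, here is a small Python sketch (not Z80 code) counting the bytes a 16x16 monochrome sprite touches in each mode, assuming 1 bit per pixel in the SCREEN 2 pattern table (color table ignored, as above) and 4 bits per pixel in SCREEN 5. The function names are hypothetical.

```python
# Rough byte counts for one 16x16 monochrome software sprite.
# SCREEN 2: 1 bit per pixel in the pattern table (color table ignored).
# SCREEN 5: 4 bits per pixel, i.e. 2 pixels per byte.


def sc2_pattern_bytes(x: int, width: int = 16, height: int = 16) -> int:
    """Pattern-table bytes touched by a sprite whose left edge is at x."""
    first_col = x // 8
    last_col = (x + width - 1) // 8
    return (last_col - first_col + 1) * height


def sc5_bytes(x: int, width: int = 16, height: int = 16) -> int:
    """Bitmap bytes touched in SCREEN 5 for the same sprite."""
    first = x // 2
    last = (x + width - 1) // 2
    return (last - first + 1) * height


for x in (0, 3):
    print(x, sc2_pattern_bytes(x), sc5_bytes(x))
```

An X-aligned sprite touches 32 bytes in SCREEN 2 versus 128 in SCREEN 5; an unaligned one (x = 3) touches 48 versus 144.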

By flyguille

Prophet (3031)

09-07-2011, 23:58

Fewer bytes to move, but a lot more complicated to render: since it is not a bitmap, the formula to convert screen X,Y -> VRAM address adds too much overhead (even if it is done only once each time X\8 changes). So you end up wasting more time rendering to the screen than if you render the whole SW sprite in a bitmap mode like SC5.

So in real time you end up calculating 16x3 = 48 VRAM addresses to know the pattern addresses in VRAM for all X/Y. When the SW sprite does not start at (X mod 8) = 0, you need to handle 3 columns of character patterns, and if the SW sprite's Y size is 16, you need to do the same calculation 48 times.

You can also save calculations if Y\8 doesn't change, e.g. by keeping a pattern address and just handling the offset inside the 8x8 cell.

For example, an assembler routine that does the following:

LastXCalculated: db 255
LastYCalculated: db 255
PatternBaseAddr: dw 65535

GetVRAMAddressOfXY:

if LastXCalculated = (X\8) and LastYCalculated = (Y\8) then
GoBack:
VRAMAddr = PatternBaseAddr + (Y mod 8)
ret
end if

' If they don't match, recalculate PatternBaseAddr as well.
LastXCalculated = X\8
LastYCalculated = Y\8

PatternBaseAddr = Pattern base table + (Y\8)*256+(X\8)*8
JP GoBack

That way you can save a lot of overhead.

But anyway, the main con is that you can't handle colorful sprites, because there are only two colors per 8 horizontal dots. That is why all Spectrum conversions are foreground/background color only.

This formula assumes the NAME TABLE is set up the standard way (each third of the screen filled sequentially with character numbers 0-255).
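The cached lookup in the pseudocode above can be modelled in Python to see how rarely the full formula runs when walking down a sprite column. This is a sketch, not the author's code; the class name and the `misses` counter are hypothetical, and it assumes the standard name table with the pattern table at VRAM 0.

```python
class PatternAddressCache:
    """Python model of the cached X/Y -> VRAM address lookup.

    Assumes the standard SCREEN 2 name table layout; pattern_base is the
    pattern generator table address (0 in the usual setup).
    """

    def __init__(self, pattern_base: int = 0x0000) -> None:
        self.pattern_base = pattern_base
        self.last_cell = None   # (X\8, Y\8), like LastXCalculated/LastYCalculated
        self.cell_base = 0      # PatternBaseAddr
        self.misses = 0         # how often the full formula had to run

    def address(self, x: int, y: int) -> int:
        cell = (x // 8, y // 8)
        if cell != self.last_cell:
            # Different 8x8 cell: recompute the cell's base address.
            self.last_cell = cell
            self.cell_base = self.pattern_base + (y // 8) * 256 + (x // 8) * 8
            self.misses += 1
        # Same cell: only the row offset inside the cell changes.
        return self.cell_base + (y % 8)


cache = PatternAddressCache()
column = [cache.address(13, y) for y in range(16)]
print(cache.misses, column[0], column[15])  # 2 8 271
```

Sixteen consecutive lines of one 8-pixel column cross only two character rows, so the full formula runs twice instead of sixteen times.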

By flyguille

Prophet (3031)

10-07-2011, 00:38

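LastXCalculated: db 255    ; last (X AND $F8); 255 never matches, so the first call recalculates
LastYCalculated: db 255    ; last (Y AND $F8); must directly follow LastXCalculated
PatternBaseAddr: dw 65535

; IN [L] = X, [H] = Y. OUT [HL] = pattern table VRAM address. Destroys A, BC, DE.
GetVRAMAddressOfXY:

ld bc, (LastXCalculated)   ; C = last X AND $F8, B = last Y AND $F8
ld a, l
and $F8
ld e, a                    ; E = X AND $F8 = (X\8)*8
ld a, h
and $F8
ld d, a                    ; D = Y AND $F8 = (Y\8)*8

ld a, e
cp c
jr nz, .IsNot
ld a, d
cp b
jr nz, .IsNot

.Back:
ld a, h
and $07                    ; A = Y mod 8
ld hl, (PatternBaseAddr)
add a, l                   ; low 3 bits of L are 0, so no carry
ld l, a
ret

.IsNot:
ld (LastXCalculated), de   ; cache the new cell (E -> LastX, D -> LastY)

push hl
ld a, d
rrca
rrca
rrca                       ; A = Y\8
ld h, a
ld l, e                    ; HL = (Y\8)*256 + (X\8)*8
ld (PatternBaseAddr), hl
pop hl                     ; restore H = Y for .Back
jr .Back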

No, this sucks; it's quicker to just calculate everything every time anyway.

modifying.

By flyguille

Prophet (3031)

10-07-2011, 00:44

; IN [L] = X, [H] = Y.

GetVRAMAddressOfXY:

ld a, l
and $F8
ld e, a
ld a, h
and $F8
ld d, a
    ; E = X AND $F8 = (X\8)*8
    ; D = Y AND $F8 = (Y\8)*8

ld c, h
    ; C = Y
ld a, d
rrca
rrca
rrca
ld h, a
ld l, e    ; HL = Y\8*256+X\8*8

ld a, c
and $07   ; A = Y mod 8
or l
ld l, a    ; HL = Y\8*256+X\8*8+Y mod 8
ret

And since the pattern table base address is 0, you don't need to add anything.

For the color table address (at $2000), just apply a SET 5,H.

Then, once you have the VRAM address, you need to know which bit to modify: bit number = X mod 8, counted from the leftmost (most significant) bit, i.e. mask = $80 >> (X mod 8).
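The same arithmetic as the final GetVRAMAddressOfXY routine, plus the SET 5,H color-table trick and the pixel mask, can be checked in Python. This sketch assumes the pattern table at $0000 and the color table at $2000 (the standard setup the post relies on); `set_pixel` is a hypothetical helper operating on a bytearray standing in for VRAM.

```python
COLOR_TABLE_BIT = 0x2000  # "set 5,h" sets bit 13 of HL, i.e. adds $2000


def pattern_address(x: int, y: int) -> int:
    """Same arithmetic as GetVRAMAddressOfXY (pattern table at VRAM 0)."""
    h = (y & 0xF8) >> 3          # ld a,d / rrca x3 -> Y\8
    l = (x & 0xF8) | (y & 0x07)  # ld l,e / or (Y mod 8)
    return (h << 8) | l


def color_address(x: int, y: int) -> int:
    """Color table entry for the same byte (color table at $2000)."""
    return pattern_address(x, y) | COLOR_TABLE_BIT


def pixel_mask(x: int) -> int:
    """Bit 7 is the leftmost pixel of a pattern byte."""
    return 0x80 >> (x % 8)


def set_pixel(vram: bytearray, x: int, y: int) -> None:
    vram[pattern_address(x, y)] |= pixel_mask(x)


vram = bytearray(0x1800)               # 6 KB pattern table
set_pixel(vram, 13, 15)
print(hex(pattern_address(13, 15)), hex(vram[271]))  # 0x10f 0x4
```

Because the highest pattern address is $17FF, bit 13 is never already set, so OR-ing $2000 is the same as adding it.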

By flyguille

Prophet (3031)

10-07-2011, 00:55

So, SW sprites consist of three stages...

Just after VBLANK:

1) Restore all backgrounds.
2) Here you can modify the background scenery.

3) Read the X/Y of every sprite to render from RAM.
4) Read and save the background where each sprite will be.
5) Using that saved data, draw the sprite pattern on top of it (in a separate RAM location).
6) Send the result to VRAM.

This way you handle all sprites dynamically; whether they moved or not doesn't matter.

When the gameplay code moves a sprite or changes the figure it uses, you only update its X/Y/P# variables in RAM and let the SW sprite redraw routine make the visual changes in the next frame.
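The per-frame stages listed above can be sketched in Python against a bytearray standing in for the SCREEN 2 pattern table. This is a simplified model under stated assumptions, not the author's routine: sprites are monochrome and OR'd onto the background (no mask), the merge is done in place instead of in a separate RAM buffer, and `SoftSprite`, `restore_backgrounds`, `draw_sprites` and `render_frame` are hypothetical names.

```python
from dataclasses import dataclass, field

SCREEN_W, SCREEN_H = 256, 192


def pattern_address(x: int, y: int) -> int:
    """SCREEN 2 pattern table address, standard name table, table at VRAM 0."""
    return (y // 8) * 256 + (x // 8) * 8 + (y % 8)


@dataclass
class SoftSprite:
    x: int
    y: int
    rows: list                                  # 16 ints, bit 15 = leftmost pixel
    saved: list = field(default_factory=list)   # (address, original byte)


def restore_backgrounds(vram: bytearray, sprites: list) -> None:
    # Step 1: undo last frame, last-drawn sprite first so overlaps unwind correctly.
    for s in reversed(sprites):
        for addr, byte in s.saved:
            vram[addr] = byte
        s.saved = []


def draw_sprites(vram: bytearray, sprites: list) -> None:
    # Steps 3-6: read X/Y, save what is underneath, merge, write back.
    for s in sprites:
        shift, col0 = s.x % 8, s.x // 8
        for row, bits in enumerate(s.rows):
            y = s.y + row
            if y >= SCREEN_H:
                break
            window = bits << (8 - shift)        # 24-bit window over 3 columns
            for k in range(3):
                col = col0 + k
                if col >= SCREEN_W // 8:
                    break
                addr = pattern_address(col * 8, y)
                s.saved.append((addr, vram[addr]))               # step 4
                vram[addr] |= (window >> (16 - 8 * k)) & 0xFF    # steps 5-6, in place


def render_frame(vram: bytearray, sprites: list) -> None:
    restore_backgrounds(vram, sprites)   # step 1
    # step 2: scenery changes to the background would go here
    draw_sprites(vram, sprites)          # steps 3-6
```

Moving a sprite only means changing `x`/`y` between calls; the next `render_frame` restores the old spot from `saved` and draws at the new one.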

So the routine for calculating the VRAM address is executed about 48 times for reading the background and 48 times for writing the SW sprite; in the best case (X a multiple of 8) it is 32 + 32 times. Plus the VPOKE/VPEEK itself.

Then, once you have the background image in RAM, it is 2 or 3 columns of 8 pixels wide, depending on whether it covered 2 or 3 chars (that is, 32 or 48 bytes). To draw the sprite image on it you need to know the X bit offset; Y is already handled, because the pattern copied to RAM already starts at the desired Y position.

Then write it bit by bit, 16x16 times, i.e. a loop of 256 iterations.
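The bit-by-bit drawing step described above can be modelled in Python on a 48-byte background buffer (3 columns of 16 bytes). The column-major layout `buffer[col * 16 + row]` and the function name are assumptions for illustration; the loop body runs exactly 256 times, as in the post.

```python
def plot_sprite_bitwise(buffer: bytearray, rows: list, x_offset: int) -> int:
    """Draw a 16x16 monochrome sprite into a 3-column background buffer.

    buffer   : bytearray(48), column-major: buffer[col * 16 + row]
    rows     : 16 ints, bit 15 = leftmost sprite pixel
    x_offset : X mod 8 of the sprite on screen
    Returns the number of loop iterations.
    """
    iterations = 0
    for row in range(16):
        for px in range(16):
            iterations += 1
            if rows[row] & (0x8000 >> px):
                pos = x_offset + px            # pixel inside the 24-pixel window
                col, bit = divmod(pos, 8)
                buffer[col * 16 + row] |= 0x80 >> bit
    return iterations


buf = bytearray(48)
n = plot_sprite_bitwise(buf, [0xFFFF] * 16, 5)
print(n, hex(buf[0]), hex(buf[16]), hex(buf[32]))  # 256 0x7 0xff 0xf8
```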

Then, post everything drawn back to VRAM.

You can use a trick here: also store in RAM the VRAM address of each byte saved to RAM, so when VPOKEing you already have the VRAM position at hand.

So the RAM buffer can take 48*3 bytes: each data byte plus its 2-byte VRAM address, which saves running the X-Y -> VRAM routine a second time.

By hit9918

Prophet (2927)

10-07-2011, 01:03

The first thing that comes to mind: you need the sprites stored 8 times in RAM (pre-shifted) for soft X positioning.

Then AND/OR without shifts, all into a RAM buffer of the screen:

8	ld a,(bc) ;sprite mask
8	and (hl)  ;background
5	ld b,a
8	ld a,(de) ;sprite data
5	or b
8	ld (hl),a ;background store
5	inc c
5	inc e
5	inc h	  ;column addressing
--
57 cycles

With 16x16 sprites you do 3 bytes per line, so after running the above code 3 times, the background pointer HL needs to be adjusted to jump to the next line.

Thinking about the bitmap scroller, I think the best RAM layout is columns:
256-byte columns, in which you can do a vertical roll with wraparound, similar to MSX2!

This column layout also works very well for moving to the next line with HL, which matters because there is no register free to ADD something to HL:

dec h ;go one column left
dec h ;go one column left
dec h ;go one column left
inc l ;go one line down
--
20 cycles

(57 * 3 + 20) = 191 cycles per line.

Cleaning the sprite from a copy of the background buffer:
(16 for LDI * 3 + 20) = 68 cycles per line (the inc/dec column addressing can be done on DE, which benefits even more from the column layout)

total 191 + 68 = 259 cycles per line.

About 229 lines per NTSC frame, i.e. 14.3 16x16 sprites per frame. I hope I made no calculation errors.
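The frame-budget arithmetic can be reproduced in Python, assuming the MSX Z80 clock of 3,579,545 Hz and 60 frames per second for NTSC. The per-byte and per-line cycle figures are the ones given in the post (including 16 cycles per LDI); the function names are hypothetical.

```python
Z80_HZ = 3_579_545  # MSX Z80 clock


def frame_cycles(fps: int = 60) -> int:
    return Z80_HZ // fps


def line_cost(per_byte: int, bytes_per_line: int = 3, line_overhead: int = 20) -> int:
    return per_byte * bytes_per_line + line_overhead


def sprites_per_frame(cycles_per_line: int, fps: int = 60, height: int = 16) -> float:
    return frame_cycles(fps) / cycles_per_line / height


draw = line_cost(57)       # AND/OR from pre-shifted data: 191
clean = line_cost(16)      # LDI restore: 68
print(draw + clean, round(sprites_per_frame(draw + clean), 1))  # 259 14.4
```

This gives about 230 lines (14.4 sprites) per frame, close to the figures above; the small difference presumably comes from rounding.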

A surprisingly high number, which shows this rule: if the framebuffer machines (ZX, CPC, ST) had had hardware scrolling, they would have been a lot faster. With a scroller included, they never manage a dozen sprites at 60 fps.

Code sprites are faster: the sprite is encoded in the immediate values of the code.
These sprites need roughly 3x as much RAM for all those opcodes.

8 ld a,sprite mask
8 or sprite data
8 ld (hl),a ;background
5 inc h

8 ld a,sprite mask
8 or sprite data
8 ld (hl),a ;background
5 inc h

8 ld a,sprite mask
8 or sprite data
8 ld (hl),a ;background

5 dec h
5 dec h
5 inc l
--
97 cycles per line

total 97 + 68 (for LDI cleanup) = 165 cycles per line

360 lines per NTSC frame, 22.53 sprites per frame.
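A code sprite can be generated from one pre-shifted copy by emitting Z80 machine code with the mask and data as immediates. The listing above loads the mask as an immediate and ORs in the data, which never reads the background; this Python sketch assumes the masked form `ld a,(hl) / and mask / or data / ld (hl),a` instead (one more byte and, by the post's MSX cycle counts, 8 more cycles per sprite byte). It uses the post's column layout (`inc h` = next column, `inc l` = next line); the function name is hypothetical.

```python
# Standard Z80 opcode encodings
LD_A_HL = 0x7E   # ld a,(hl)
AND_N = 0xE6     # and n
OR_N = 0xF6      # or n
LD_HL_A = 0x77   # ld (hl),a
INC_H = 0x24     # inc h
DEC_H = 0x25     # dec h
INC_L = 0x2C     # inc l
RET = 0xC9       # ret


def compile_code_sprite(masks: list, datas: list) -> bytes:
    """masks/datas: 16 rows x 3 bytes for one pre-shifted copy."""
    code = bytearray()
    for mask_row, data_row in zip(masks, datas):
        for col in range(3):
            code += bytes([LD_A_HL, AND_N, mask_row[col], OR_N, data_row[col], LD_HL_A])
            if col < 2:
                code.append(INC_H)             # next column
        code += bytes([DEC_H, DEC_H, INC_L])   # back to first column, next line
    code.append(RET)
    return bytes(code)


code = compile_code_sprite([bytes([0xF8, 0x00, 0x07])] * 16,
                           [bytes([0x07, 0xFF, 0xF8])] * 16)
print(len(code))  # 369 bytes of code for 96 bytes of mask + data
```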

By hit9918

Prophet (2927)

10-07-2011, 01:51

The line overheads are wrong; I forgot I can just render linearly, 16 bytes vertically.

By flyguille

Prophet (3031)

10-07-2011, 02:51

Having 8 pre-shifted copies of each sprite figure for SCREEN 2 takes the same amount of memory as just doing it for a bitmap mode: you end up needing 8x32 = 256 bytes per sprite figure, the same as for a bitmap, which is 16x16 = 256 bytes. And in a bitmap mode each dot has its own color.

Also, the background copy in RAM has a fixed size, and you can use the blitter commands for a VRAM->RAM transfer of the square background.

But! If you don't want to VPEEK from the VDP (and this is important), you can make your scenery rendering routine capable of dumping just a 16x16 area into RAM, so you don't need to fetch the background from the VDP at all. But your background rendering routine must be simple enough, e.g. just a tile-mapped scenario -> bitmap.
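Rebuilding the background under a sprite from the tile map in RAM, instead of VPEEKing it, can be sketched in Python for the SCREEN 2 case. The tile map layout (24 rows x 32 columns of tile numbers) and the tile pattern lookup are assumptions for illustration, and the area is assumed to be fully on screen.

```python
def background_area(tile_map, tiles, x: int, y: int,
                    height: int = 16, columns: int = 3) -> bytearray:
    """Rebuild the SCREEN 2 pattern bytes under a sprite from RAM data.

    tile_map : 24 rows x 32 columns of tile numbers (hypothetical layout)
    tiles    : tile number -> 8 pattern bytes, top row first
    Returns a column-major buffer: buf[col * height + row].
    """
    buf = bytearray(columns * height)
    for col in range(columns):
        cx = x // 8 + col
        for row in range(height):
            yy = y + row
            tile = tile_map[yy // 8][cx]
            buf[col * height + row] = tiles[tile][yy % 8]
    return buf


tiles = [bytes(8), bytes([0xAA] * 8)]
tile_map = [[(cx + cy) % 2 for cx in range(32)] for cy in range(24)]
area = background_area(tile_map, tiles, 13, 20)
print(len(area), hex(area[0]), hex(area[4]))  # 48 0xaa 0x0
```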

By PingPong

Prophet (4093)

10-07-2011, 12:09

Hmm, with the VDP, a full copy cycle of 16x16 sprites, meaning:
- save background (HMMC)
- plot sprite (LMMC -> horribly slower)
- erase background (HMMC)

doesn't let me go beyond 5-6 sprites per frame @50Hz.

So maybe doing it in SCREEN 2 is faster? (assuming monochrome sprites)

By Edwin

Paragon (1182)

10-07-2011, 13:20

If you do a little math, you can pretty much discount that idea immediately. A 16x16 sprite in SC2 covers 9 tiles which have to be updated. This is 9x8 = 72 bytes. A 16x16 area in SC5 is 128 bytes (less than twice the amount for SC2 patterns). To update SC2 you have to: read the sprite (assuming pre-shifted data), read the background, mask the background, merge the sprite, and write it to VRAM. Five actions, of which the last is very slow. The VDP has to do one more read/write to store the background, but it has six times the clock to do it in. The VDP does have to do a lot more in that time, but it can still high-speed copy a byte faster (approx. 12 Z80 T-states) than the Z80 can OUTI. With a slow copy taking about three times as long, you're at approximately 60 T-states per byte. Have a try and see what you can do in that time. ;)
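The tile count depends on alignment: a 16x16 sprite covers 4 tiles when aligned on both axes and up to 9 when misaligned on both. A short Python sketch (hypothetical function names) makes the worst case explicit and also counts only the pattern bytes on the sprite's own 16 lines, which is what a software renderer actually has to rewrite.

```python
def tiles_covered(x: int, y: int, size: int = 16) -> int:
    """8x8 SCREEN 2 tiles touched by a size x size sprite at (x, y)."""
    cols = (x + size - 1) // 8 - x // 8 + 1
    rows = (y + size - 1) // 8 - y // 8 + 1
    return cols * rows


def pattern_bytes_under(x: int, y: int, size: int = 16) -> int:
    """Pattern bytes on the sprite's own lines (one byte per column per line)."""
    cols = (x + size - 1) // 8 - x // 8 + 1
    return cols * size


for pos in [(0, 0), (3, 0), (3, 5)]:
    print(pos, tiles_covered(*pos), tiles_covered(*pos) * 8, pattern_bytes_under(*pos))
```

For (3, 5) this prints 9 tiles, 72 bytes of whole tiles, and 48 bytes actually under the sprite.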

By PingPong

Prophet (4093)

10-07-2011, 14:36

So theoretically, every ZX Spectrum game that has no scrolling and only manages SW sprites could be ported to SCREEN 5 at greater speed?
Still not so sure...
