VDP commands and screen 4

Page 1/3

By ARTRAG

Enlighted (6240)

26-10-2005, 21:41

After a nice discussion with Fudeba about how to save Z80 time in the scrolling of KG, I proposed a trick that uses the VDP commands to scroll the scr4 pages. The idea is simple and could be useful for those who use scr4 and want to gain some Z80 time (and lose their minds :-)

There are still many small errors in the following code, but the idea works and goes like this:

1) wait for VBLANK (use interrupts, not halt as in my test)
2) switch to scr5 (disable sprites!!)
3) issue the VDP copy commands
4) do some Z80 tasks while the VDP is busy
5) return to scr4 before the VBLANK ends

The gain is that during the VBLANK the VDP and the Z80 can work in parallel: while the VDP moves bytes in VRAM, the Z80 can do something else. Unfortunately the VDP command is interrupted when you switch the VDP mode back to scr4, so the time available to VDP commands is limited to the VBLANK period.

The small test program below uses the VDP for horizontal and vertical scrolls in scr4.

For vertical scrolls, moving the scr4 screen one line down (32 tiles) takes two copies in scr5
and a framebuffer.

Assuming the standard base address for the active page in scr4 (name table at #1800, which is line 48 in scr5 because each scr5 line is 128 bytes) and the framebuffer 16KB later (128 scr5 lines lower),
the first command is:
(0,48)-(191,53) to (64,48+128)
the second is:
(192,48)-(255,52) to (0,49+128)

The result is that the buffer now holds a copy of the active page shifted one line down. The Z80 (or the VDP itself, why not?) only needs to fill the first line with the entering tiles.
Swap the active page and the framebuffer and you have your vertical scroll!!
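
To check the arithmetic of the two copies, here is a minimal Python model of VRAM (not MSX code). It assumes the layout used in the test program: name table at #1800, buffer at #5800, 128 bytes per scr5 line. The names `to_scr5` and `hmmm` are made up for this sketch, and `hmmm` only models a byte-aligned rectangle copy, not the real command timing.

```python
LINE_BYTES = 128               # one scr5 line: 256 pixels at 4 bits each
NAME_TABLE = 0x1800            # active scr4 name table
BUFFER = NAME_TABLE + 0x4000   # framebuffer 16KB later

def to_scr5(addr):
    """Return the scr5 (x, y) of the first pixel stored at a VRAM byte address."""
    return (addr % LINE_BYTES) * 2, addr // LINE_BYTES

def hmmm(vram, sx, sy, dx, dy, nx, ny):
    """Model a byte-aligned rectangle copy in scr5 (all x values must be even)."""
    for row in range(ny):
        src = (sy + row) * LINE_BYTES + sx // 2
        dst = (dy + row) * LINE_BYTES + dx // 2
        vram[dst:dst + nx // 2] = vram[src:src + nx // 2]

vram = bytearray(128 * 1024)
# Recognisable pattern: each name table entry holds its own offset modulo 256.
for i in range(768):
    vram[NAME_TABLE + i] = i % 256

hmmm(vram, 0, 48, 64, 48 + 128, 192, 6)   # first copy: three tile rows of each line
hmmm(vram, 192, 48, 0, 49 + 128, 64, 5)   # second copy: fourth tile row of each line

old = vram[NAME_TABLE:NAME_TABLE + 768]
new = vram[BUFFER:BUFFER + 768]
# Everything except the first tile row of the buffer is the old page moved 32 bytes on.
moved_down = new[32:] == old[:736]
```

In this model the first 32 bytes of the buffer are left untouched, which is the tile row that still has to be filled with the entering tiles.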

For horizontal scrolls, moving the scr4 screen one column to the right takes only one scr5
copy (and a framebuffer 16KB later :-)

The copy is:
(0,48)-(253,53) to (2,48+128)

The Z80 then needs to fill the first column on the left in scr4 (which corresponds to 4 columns in scr5, each 2 points wide and 6 lines tall). Doing this with the VDP is harder: you need 4 commands and it is not worth it, because issuing the commands costs more Z80 time than doing the update with direct VRAM accesses. In any case, after the copy swap the active page and the framebuffer and you have your horizontal scroll!!
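
The same kind of Python model can be used to check the horizontal case. It is only a sketch under the same assumptions (name table at #1800, buffer at #5800, 128 bytes per scr5 line), with a copy width of 254 points as in the test data; `hmmm` is a made-up helper that models a byte-aligned rectangle copy.

```python
LINE_BYTES = 128               # one scr5 line: 256 pixels at 4 bits each
NAME_TABLE = 0x1800            # active scr4 name table
BUFFER = NAME_TABLE + 0x4000   # framebuffer 16KB later

def hmmm(vram, sx, sy, dx, dy, nx, ny):
    """Model a byte-aligned rectangle copy in scr5 (all x values must be even)."""
    for row in range(ny):
        src = (sy + row) * LINE_BYTES + sx // 2
        dst = (dy + row) * LINE_BYTES + dx // 2
        vram[dst:dst + nx // 2] = vram[src:src + nx // 2]

vram = bytearray(128 * 1024)
for i in range(768):
    vram[NAME_TABLE + i] = i % 256

hmmm(vram, 0, 48, 2, 48 + 128, 254, 6)    # the single horizontal copy

old = vram[NAME_TABLE:NAME_TABLE + 768]
new = vram[BUFFER:BUFFER + 768]

# Every tile outside column 0 now holds its former left neighbour.
shifted_right = all(new[j] == old[j - 1] for j in range(768) if j % 32 != 0)

# Column 0 of the 32x24 tile map, expressed as scr5 x positions.
column0_x = sorted({(j % LINE_BYTES) * 2 for j in range(0, 768, 32)})
```

`column0_x` comes out as the four x positions 0, 64, 128 and 192, which is where the four 2x6 strips of new tiles go in the buffer (lines 176 to 181).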

Enjoy the test, and sorry for the bugs

db $fe
dw startProgram,endProgram,startProgram

org $C000

; VDP ports
;
vdpport0 equ 098h ; VRAM read/write
vdpport1 equ 099h ; VDP registers read/write
vdpport2 equ 09ah ; Palette registers write
vdpport3 equ 09bh ; Indirect register write

; vdp: send value in A to VDP register
; syntax:vdp reg#
; modifies: A
; note: **** INTERRUPTS MUST BE DISABLED ****
; (6 bytes/31 cycles)
;
MACRO vdp reg
out (vdpport1),a
ld a,reg or 010000000b
out (vdpport1),a
endm

; vdpw: send value to VDP register
; syntax:vdpw reg#, value#
; modifies: A
; note: **** INTERRUPTS MUST BE DISABLED ****
;
MACRO vdpw reg,value
ld a,value
vdp reg
endm

; scr5: set screen 5
; syntax:scr5
; modifies: A
; note: **** INTERRUPTS MUST BE DISABLED ****
;
;
MACRO scr5
vdpw 0,6
vdpw 1,98
vdpw 8,10
endm

; scr4: set screen 4,2
; syntax:scr4
; modifies: A
; note: **** INTERRUPTS MUST BE DISABLED ****
;
;
MACRO scr4
vdpw 0 ,4
vdpw 1 ,98
vdpw 2 ,6
vdpw 3 ,255
vdpw 4 ,3
vdpw 5 ,63
vdpw 6 ,7
vdpw 8 ,8
vdpw 9 ,2
vdpw 10 ,0
vdpw 11 ,0
endm

startProgram:

ld a,8
call #5f

ld a,4
call #5f

xor a
ld hl,0
call SetVdp_Write

rept 3
ld hl,(#F920)
ld bc,256*8
1 ld a,(hl)
inc hl
out (vdpport0),a
dec bc
ld a,c
or b
jr nz,1B
endm

vdpw 7,0

xor a
ld hl,#1800
call SetVdp_Write

ld bc,256*3

1 ld a,c
neg
out (vdpport0),a
dec bc
ld a,c
or b
jr nz,1B

xor a
ld hl,22*1024
call SetVdp_Write

ld bc,256*3

1 ld a,'A'
out (vdpport0),a
dec bc
ld a,c
or b
jr nz,1B

rept 3*60
halt
endm

ld bc,31

loopX
push bc

scr4
halt

scr5
ld hl,testdat1X
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 15,0

halt

scr5
ld ix,testdat2X
vdpw 15,2
call cycleX
scr4
vdpw 2,22
vdpw 15,0

halt

scr5
ld hl,testdat11X
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 2,22
vdpw 15,0

halt

scr5
ld ix,testdat22X
vdpw 15,2
call cycleX
scr4
vdpw 15,0

pop bc

ld a,c
or b
dec bc
jp nz,loopX

;--------------------

ld bc,24

loopY
push bc

scr4
halt

scr5
ld hl,testdat1Y
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 15,0

halt

scr5
ld hl,testdat2Y
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 2,22
vdpw 15,0

halt

scr5
ld hl,testdat11Y
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 2,22
vdpw 15,0

halt

scr5
ld hl,testdat22Y
call DoCopy
vdpw 15,2
1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
scr4
vdpw 15,0

pop bc

ld a,c
or b
dec bc
jp nz,loopY

rept 6*60
halt
endm

xor a
call #5f

ret

;
;Set VDP port #98 to start writing at address AHL (17-bit)
;
SetVdp_Write: rlc h
rla
rlc h
rla
srl h
srl h
vdp 14 ;set address bits 14-16
ld a,l ;set address bits 0-7
out (vdpport1),a
ld a,h ;set address bits 8-13
or 64 ; + write access
out (vdpport1),a
ret

;
;Set VDP port #98 to start reading at address AHL (17-bit)
;
SetVdp_Read: rlc h
rla
rlc h
rla
srl h
srl h
vdp 14 ;set address bits 14-16
ld a,l ;set address bits 0-7
out (vdpport1),a
ld a,h ;set address bits 8-13
; + read access
out (vdpport1),a
ret
;
;In: HL = pointer to 15-byte VDP command data
;Out: HL = updated
;
DoCopy:
vdpw 17,32
ld c,vdpport3
rept 15
outi
endm

ret

; VDP command blocks (#D0 = HMMM), 15 bytes each, loaded into R#32-R#46 by DoCopy

testdat1Y

db 0,0 ; source X
db 48,0 ; source Y
db 64,0 ; destination X
db 48+128,0 ; destination Y
db 192,0 ; block width (X size)
db 6,0 ; block height (Y size)
db #00,#00,#D0

testdat2Y

db 192,0 ; source X
db 48,0 ; source Y
db 0,0 ; destination X
db 49+128,0 ; destination Y
db 64,0 ; block width (X size)
db 5,0 ; block height (Y size)
db #00,#00,#D0

testdat11Y

db 0,0 ; source X
db 48+128,0 ; source Y
db 64,0 ; destination X
db 48,0 ; destination Y
db 192,0 ; block width (X size)
db 6,0 ; block height (Y size)
db #00,#00,#D0

testdat22Y

db 192,0 ; source X
db 48+128,0 ; source Y
db 0,0 ; destination X
db 49,0 ; destination Y
db 64,0 ; block width (X size)
db 5,0 ; block height (Y size)
db #00,#00,#D0

testdat1X

db 0,0 ; source X
db 48,0 ; source Y
db 2,0 ; destination X
db 48+128,0 ; destination Y
db 254,0 ; block width (X size)
db 6,0 ; block height (Y size)
db #00,#00,#D0

testdat2X

db 62,0 ; source X
db 48,0 ; source Y
db 0,0 ; destination X
db 48+128,0 ; destination Y
db 2,0 ; block width (X size)
db 6,0 ; block height (Y size)
db 'A',#00,12 ;#D0

testdat11X

db 0,0 ; source X
db 48+128,0 ; source Y
db 2,0 ; destination X
db 48,0 ; destination Y
db 254,0 ; block width (X size)
db 6,0 ; block height (Y size)
db #00,#00,#D0

testdat22X

db 62,0 ; source X
db 48+128,0 ; source Y
db 0,0 ; destination X
db 48,0 ; destination Y
db 2,0 ; block width (X size)
db 6,0 ; block height (Y size)
db 'A',#00,12 ;#D0

cycleX:
ld bc,62*256+0
call .util
ret ; early exit: the three column copies below are never reached in this test

ld bc,126*256+64
call .util

ld bc,190*256+128
call .util

ld bc,254*256+192

.util
ld (ix+0),b
ld (ix+4),c
ld hl,ix

1 in a,(vdpport1) ;loop if vdp not ready (CE)
rrca
jp c,1B
call DoCopy

ret

endProgram:

END
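
For anyone who wants to generate or check the 15-byte command blocks outside the assembler, here is a small Python sketch. It relies on the V9938 command register order R#32 to R#46 (SX, SY, DX, DY, NX, NY as 16-bit little-endian values, then colour, argument and command bytes), which is the order DoCopy streams them in. `command_block` is a made-up helper name, not part of the listing.

```python
import struct

HMMM = 0xD0   # high-speed VRAM-to-VRAM move, the #D0 used in the listing

def command_block(sx, sy, dx, dy, nx, ny, colour=0, arg=0, cmd=HMMM):
    """Pack one VDP command as the 15 bytes written to R#32..R#46."""
    return struct.pack("<6H3B", sx, sy, dx, dy, nx, ny, colour, arg, cmd)

# Rebuild the first vertical-scroll block (testdat1Y) from its coordinates.
testdat1Y = command_block(0, 48, 64, 48 + 128, 192, 6)
print(list(testdat1Y))
```

The bytes produced for these coordinates match the `db` lines of testdat1Y one for one, so the same helper could be used to tabulate blocks for other scroll steps.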

By GhostwriterP

Champion (511)

26-10-2005, 23:25

Say, do you also need to switch to sc5 on a v9958?

By NYYRIKKI

Enlighted (5366)

27-10-2005, 00:11

No, not if you set bit 6 of register 25 (but then you need to use SCREEN 8 addressing).
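
As a rough illustration of what SCREEN 8 addressing changes for the coordinates used in the first post: assuming the commands then see VRAM as 256 one-byte pixels per line (the SCREEN 8 layout), the name tables land on different lines than in scr5 and a tile row is 32 points wide instead of 64. `to_scr8` is a made-up helper name and the addresses are the ones from the test program.

```python
SCR8_LINE_BYTES = 256          # SCREEN 8: one byte per pixel, 256 pixels per line
NAME_TABLE = 0x1800            # active scr4 name table
BUFFER = NAME_TABLE + 0x4000   # framebuffer 16KB later

def to_scr8(addr):
    """Return the SCREEN 8 (x, y) of the pixel stored at a VRAM byte address."""
    return addr % SCR8_LINE_BYTES, addr // SCR8_LINE_BYTES

tile_row_width = 32                         # 32 name table bytes = 32 points
name_table_lines = 768 // SCR8_LINE_BYTES   # lines covered by the 768-byte table
print(to_scr8(NAME_TABLE), to_scr8(BUFFER), name_table_lines)
```

Under that assumption the active page starts at line 24, the buffer at line 88, and the whole name table covers 3 lines of 8 tile rows each.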

By Sonic_aka_T

Enlighted (4130)

27-10-2005, 00:16

Not if you enable extended commands, no...

By Sonic_aka_T

Enlighted (4130)

27-10-2005, 00:17

Hehe, I'm rite on time, as usual... ^_^

By Maggoo

Paragon (1195)

27-10-2005, 07:30

> Say, do you also need to switch to sc5 on a v9958?

V9938. The point of Artrag's routine is that it works on MSX2.

By GhostwriterP

Champion (511)

27-10-2005, 10:47

> No, if you set bit6 of register 25 (But then you need to use SCREEN 8 addressing)

It is a shame that the V9938/58 has no linear copy commands like the V9990 has. That would really be useful :-)

> V9938. The point of Artrag's routine is that it works on MSX2.

I know that. I was just a bit curious whether my intel was right.

By snout

Ascended (15187)

27-10-2005, 14:38

Maggoo - does this open a world of new possibilities for VSCREEN? Or...

By ARTRAG

Enlighted (6240)

27-10-2005, 15:40


> It is a shame that the V9938/58 has no linear copy commands like the V9990 has. That would really be useful

But the copies I have proposed do the trick for both vertical and horizontal scrolls!!
Note that for the vertical scroll, at the cost of two scr5 copies you get the same result
as one linear copy, while for horizontal scrolls you have almost no loss at all.

This is if you need to scroll by one tile (in any of the 4 directions), but the same idea can be used
for different speeds (two or three tiles only means a larger area of new tiles to be filled)...

As a last resort, in case you need to change the whole screen,
you need to use 20 linear copies of 32 tiles, but it is still twice
as fast as using the Z80.
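
To make the "linear copy of 32 tiles" concrete, here is a sketch of how one 32-byte tile row could be expressed as a scr5 rectangle: 64 points wide and 1 line tall, as long as the row does not cross a 128-byte line boundary. The map address `MAP_BASE` and the helper `row_copy` are assumptions made for the example; the thread does not say where the level map would live in VRAM.

```python
LINE_BYTES = 128        # one scr5 line: 256 pixels at 4 bits each
ROW_BYTES = 32          # one scr4 tile row
NAME_TABLE = 0x1800     # active scr4 name table
MAP_BASE = 0x8000       # hypothetical VRAM address of the level map

def row_copy(src_addr, dst_addr):
    """Return (sx, sy, dx, dy, nx, ny) copying one 32-byte row as a scr5 rectangle."""
    for addr in (src_addr, dst_addr):
        if addr % LINE_BYTES > LINE_BYTES - ROW_BYTES:
            raise ValueError("row would cross a 128-byte scr5 line")
    sx, sy = (src_addr % LINE_BYTES) * 2, src_addr // LINE_BYTES
    dx, dy = (dst_addr % LINE_BYTES) * 2, dst_addr // LINE_BYTES
    return sx, sy, dx, dy, ROW_BYTES * 2, 1

# Copy the map row stored 96 bytes into the map onto the second tile row of the screen.
params = row_copy(MAP_BASE + 96, NAME_TABLE + 32)
print(params)
```

Twenty such parameter sets, one per visible tile row, would describe the full-screen update; rows that are not aligned within a 128-byte line would need two rectangles each.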

By Maggoo

Paragon (1195)

27-10-2005, 19:00

> Maggoo - does this open a world of new possibilities for VSCREEN? Or...

Yes it does, in theory anyway... But it implies major changes and would not get 100% of the benefit from the method. The idea is brilliant but may be better suited to a game with a pre-defined scrolling path or direction (like a shooter or Mkid-type scrolling), where it is really easy to apply. In Vscreen, I don't really scroll the screen per se; it is redrawn entirely every frame to take into account the changes made to the level. It has no limit on scrolling speed or direction. So it's doable and could probably save some CPU time, but it can't be implemented THAT easily. Not saying I won't try to implement it someday...

By ARTRAG

Enlighted (6240)

27-10-2005, 19:41

@Maggoo

I want to build a test case for the 20 VRAM-to-VRAM copies of 32 bytes, to be compared with the brute-force RAM->VRAM approach.
This implies that you must store the level map in both RAM and VRAM (and update both in the case of animated objects made of tiles!!).
But if the speed gain is significant, this improvement could be worth adopting, as there is no complex logic to implement and the two approaches can easily be interchanged (at least this solution is easier than incremental drawing).
If the VDP turns out to be much faster than the Z80 at updating the screen (from my early tests it actually is NOT, but I need more ad hoc experiments), the spare time can be wasted waiting for the CE bit; otherwise the Z80 wait time could be used somehow. In the end, you gain 20 slots of Z80 time for 20 easy tasks to be performed during VBLANK.
