A long time ago, I was a MSX and MSX2 programmer

Page 7/12
1 | 2 | 3 | 4 | 5 | 6 | | 8 | 9 | 10 | 11 | 12

By ARTRAG

Enlighted (6551)

19-12-2012, 23:33

It is an MSX2+, with hardware horizontal scrolling.

By ARTRAG

Enlighted (6551)

19-12-2012, 23:45

hit9918 wrote:

@ARTRAG,
I don't see how high vram bits cause more overhead with 8x8.
The 16k page setting has to happen 3 times, no matter whether 16x16 or 8x8. It is the same work.

The extra overhead is due to the larger number of tiles to be accessed, not specifically to managing VRAM addresses.
With 16x16 tiles the overhead is 1/4 of what it is with 8x8 tiles.

By hit9918

Prophet (2905)

20-12-2012, 00:33

8x8 is not 4x overhead, just 2x overhead. Things go "rept 8 instead of rept 16".
But I don't know how much overhead there is when mapper things are in there.
Are you using 24bit addresses in column pointers?

By hit9918

Prophet (2905)

20-12-2012, 02:57

This problem is a lot of fun :)
8080-style code, without IX and EXX, does 24bit tile addresses in 56 rasterlines :) The core that just paints the 8 pixels takes 80% of that (8*57 = 456 of 575 cycles per tile).
It is 50 rasterlines in 16x16 mode.

The CPU time of the 8x8 version is the same as dumping a nametable of the same height in screen 4 :D

;need disabled interrupts
;call parameters:

;offset variables (NToffset, COLoffset): if they are not selfmodified into the code, they must not be located in the pixel page window

;b  : loop count = number of tiles to draw * 8 (B is counted down by each outi)
;c  : 0x98
;de : vram pointer

;hl : pointer to 16bit nametable (will be put into SP)


drawmultipletiles:
	ld (savesp),sp
	ld sp,hl
	jp enter

dmtl:
	;the loop tail is placed here because outi uses B for looping
	ld hl,(NToffset)	;11 cycles, listed for a selfmodified immediate load
	add hl,sp		;12 offset to get to next line in nametable. actually that offset - 2, because a POP was done.
	ld sp,hl		;5
enter:
	pop hl			;11 fetch 16bit from nametable
	ld a,(COLoffset)	;8  in column 1, offset = 3, because 24bit pointers
	add l			;5
	ld l,a			;5
	;the H part can be left out if descriptors never cross a 256 byte address (it is not counted in the totals).
	ld a,0
	adc h
	ld h,a

	ld a,(hl)		;8  fetch 24bit address of pixels. mapper byte first.
	inc hl			;5  inc l if descriptors never cross a 256 byte address.
	out (0xff),a		;12 e.g. window in page 3, low 48k got nametable and tile descriptors
	ld a,(hl)		;8
	inc hl			;5  inc l if descriptors never cross a 256 byte address.
	ld h,(hl)		;8
	ld l,a			;5  16bit addr fetched to HL. addresses upper bits must be for the page window used.

	rept 8			;8*57 cycles
	ld a,e
	out (0x99),a		;vram address low byte
	ld a,d
	out (0x99),a		;vram address high byte
	inc d			;advance the vram pointer by 256 bytes

	outi			;write one byte from (hl) to port C (0x98), B = B - 1
	endm
	;outi set the zero flag when B reached 0
	jp nz,dmtl		;11
	;--
	;575  cycles for 8 lines, 71.9 cycles per line
	;1031 cycles for 16x16 tiles with rept 16, then 64.4 cycles per line

	ld a,0
	out (0xff),a	;fix page 3 mapping
	ld sp,(savesp)
	ret
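
The two totals quoted with the listing can be cross-checked by adding up the per-instruction counts exactly as they are annotated there. This is only a small Python sketch of that arithmetic, not part of the original routine; the names `PER_TILE_OVERHEAD`, `PER_LINE` and `cycles_per_tile` are made up for the illustration, and the optional H-carry part is left out because the listing gives no counts for it.

```python
# Cycle counts copied from the annotations in the listing above.
# The optional "H part" (ld a,0 / adc h / ld h,a) has no count in the
# listing, so it is left out here as well.
PER_TILE_OVERHEAD = [
    11,  # ld hl,(NToffset)  (self-modified immediate load)
    12,  # add hl,sp
    5,   # ld sp,hl
    11,  # pop hl
    8,   # ld a,(COLoffset)  (self-modified immediate load)
    5,   # add l
    5,   # ld l,a
    8,   # ld a,(hl)         mapper byte
    5,   # inc hl
    12,  # out (0xff),a
    8,   # ld a,(hl)
    5,   # inc hl
    8,   # ld h,(hl)
    5,   # ld l,a
    11,  # jp nz,dmtl
]
PER_LINE = 57  # one body of the rept block (set VRAM address, outi)


def cycles_per_tile(lines):
    """Total cycles for one tile that is `lines` pixel rows high."""
    return sum(PER_TILE_OVERHEAD) + lines * PER_LINE


for lines in (8, 16):
    total = cycles_per_tile(lines)
    print(lines, total, round(total / lines, 1))
```

With these numbers the fixed part per tile is 119 cycles, so halving the tile height doubles only that part, while the 57 cycles per pixel row stay the same.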

By ARTRAG

Enlighted (6551)

20-12-2012, 08:50

hit9918 wrote:

8x8 is not 4x overhead, just 2x overhead. Things go "rept 8 instead of rept 16".
But I don't know how much overhead there is when mapper things are in there.
Are you using 24bit addresses in column pointers?

You forget that you have to process twice as many columns to render the same area.
With 8x8 tiles, everything outside the I/O section has to be executed twice as often because of the extra columns, and twice as often again because the inner loop deals with 8 pixels at a time.

Anyway, thanks for drafting the mapper access at 24bit, but at the moment I'm working on a workaround to keep VDP commands from getting corrupted when R18 changes while the VDP is busy.

It seems that if you blank the screen, wait two lines, then change R18 and enable the screen again, the VDP command is not affected. I've tested this solution on my turbo R and it seems to give correct VDP commands and a clean screen split for the scorebar.
Files for those willing to test on their HW are here:
https://docs.google.com/open?id=0Bx4kWAc-fapqaVNGYXZQd1p1blU

The code now shows 3 bugs, probably due to logical problems in the code:
1) Once every 16 steps, the large 176x16 copy and the CPU blitting a 176x1 line work on the same area of the screen. This causes a visible glitch at the top of the screen once every 16 pixels; solving it needs some ad hoc tweaking.
2) The whole scrolling logic is somehow odd, as the screen misses one line on the left / right when scrolling left / right. I have to review step by step what goes wrong.
3) Randomly, when the scrolling direction changes, I get a wrong 176x16 column on the screen, as if the CPU were reading the wrong nametable data. This issue also needs investigation.

By hit9918

Prophet (2905)

20-12-2012, 18:22

@ARTRAG.
"You forget that you have to process twice as many columns to render the same area"

But what counts is not area, but time :)
The deal is rendering one column every frame, isn't it? One column in a given timeframe.
And when rendering that one column, 8x8 pixel tiles cause two times more nametable accesses than 16x16.

You are just looking at the size of the nametable and the area covered by one entry.
But it is a bit like screen 1 vs screen 2, which both have the SAME amount of color RAM DMA even though a screen 1 colorbyte covers more area.

So, what do you think? 16x16 goes in 50 rasterlines and 8x8 in 56 rasterlines, on a screen that is 176 lines high.
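
The 50 and 56 rasterline figures can be reproduced from the per-tile totals given with the listing (575 cycles for 8x8, 1031 for 16x16). A minimal Python sketch follows, assuming 228 CPU cycles per rasterline (a value not stated in the thread) and ignoring the setup and exit code around the loop; `column_cost` is a name chosen for the illustration.

```python
import math

# Assumption: a 3.58 MHz Z80 gets 228 cycles per rasterline
# (1368 VDP clocks per line, 6 VDP clocks per CPU cycle).
CYCLES_PER_RASTERLINE = 228
SCREEN_HEIGHT = 176  # lines, as stated in the post


def column_cost(tile_height, cycles_per_tile):
    """Return (nametable accesses, CPU cycles, rasterlines) for one column."""
    tiles = SCREEN_HEIGHT // tile_height
    cycles = tiles * cycles_per_tile
    return tiles, cycles, cycles / CYCLES_PER_RASTERLINE


for height, per_tile in ((8, 575), (16, 1031)):
    tiles, cycles, lines = column_cost(height, per_tile)
    print(f"{height}x{height}: {tiles} tiles, {cycles} cycles, "
          f"about {math.ceil(lines)} rasterlines")
```

Under these assumptions one column needs 22 nametable accesses with 8x8 tiles and 11 with 16x16 tiles, which is the factor of two argued above, while the total time differs by only about six rasterlines.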

By hit9918

Prophet (2905)

21-12-2012, 01:45

@ARTRAG,
getting the blitter to run beyond a split, you do groundbreaking research :D

Is it really setting R18 after the two blank lines? Not right after blanking?

So the blitter does somehow sync DMA slots to R18, maybe it syncs when it has finished a line!
Blitting 176 lines in 260 NTSC rasterlines is one line in roughly 1.5 rasterlines (260/176 is about 1.48), so the two blank lines fit the picture!

If there is something to the theory, a much wider blit should still be wrecked.

But I would expect that setting R18 is needed right after switching to blank
(and that in the meanwhile the blitter uses the DMA slots previously used by display DMA. It's all theory).
So the blitter gets those two blank lines of time to sync to the new R18 value.

By hit9918

Prophet (2905)

21-12-2012, 02:01

@ARTRAG,
About the interference of the blitter with the CPU columndraw every 16 pixels: first do the columndraw of the first 16k VRAM area, then start the blitter.
Then the blitter won't overtake the columndraw.

Now that the blitter can blit beyond splits, such a delayed blitter start no longer cuts down the blit size budget, great!
It starts later and can finish later, at a point after the interrupt split.

So now like 99% of the blitter bandwidth can be used for scroll :D

By ARTRAG

Enlighted (6551)

21-12-2012, 10:20

If I understand correctly, that's exactly how I've solved the issue ;-)
https://docs.google.com/open?id=0Bx4kWAc-fapqZkxCMDhoQzhtbUE
I've already moved on to the other two remaining points
;-)

By ARTRAG

Enlighted (6551)

21-12-2012, 10:39

hit9918 wrote:

@ARTRAG,
getting the blitter to run beyond a split, you do groundbreaking research :D

Is it really setting R18 after the two blank lines? Not right after blanking?

So the blitter does somehow sync DMA slots to R18, maybe it syncs when it has finished a line!
Blitting 176 lines in 260 NTSC rasterlines is one line in roughly 1.5 rasterlines (260/176 is about 1.48), so the two blank lines fit the picture!

If there is something to the theory, a much wider blit should still be wrecked.

But I would expect that setting R18 is needed right after switching to blank
(and that in the meanwhile the blitter uses the DMA slots previously used by display DMA. It's all theory).
So the blitter gets those two blank lines of time to sync to the new R18 value.

According to pingping, the VDP copy engine has two modes of access to the VRAM: a "slow" one, with a few preassigned time slots, used while the raster screen is active, and a fast one, with more slots, used when the screen is blank.

R18 controls a sort of timer for the raster's access to the VRAM, so moving the screen left will cause the raster to start earlier.

Unfortunately, R18 does not change the access timing of the VDP copy engine (or only after one or two raster lines), so in the line where R18 has changed and (maybe) in the following one, the VDP copy engine and the raster conflict for VRAM access (and the raster always has priority, so the copy gets disrupted).

When you disable the screen, the VDP switches to "fast VRAM access" after one or two lines of blank screen, so changing R18 does not cause conflicts between the raster and the copy engine, as, basically, the former does not access the VRAM.

In the end, in order to have the VDP copy engine working correctly through a screen split, at the line interrupt:
- disable the screen,
- wait two raster lines,
- change R18,
- enable the screen,
and the VDP command engine will not miss a byte ;)
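
The order of the four steps above can be written down as the sequence of port writes the line interrupt handler would issue. What follows is only a Python model of that ordering, not the code from the linked files: it assumes the usual V9938 register write protocol on port 0x99 (data byte first, then the register number with bit 7 set) and that bit 6 of R#1 switches the display on and off. `reg_write`, `split_sequence` and the example register values are hypothetical.

```python
VDP_PORT = 0x99        # VDP control port, the same port used in the listing earlier
SCREEN_ENABLE = 0x40   # assumption: bit 6 of R#1 turns the display on


def reg_write(reg, value):
    """Two port writes for one VDP register write: data, then 0x80 | register."""
    return [("out", VDP_PORT, value & 0xFF), ("out", VDP_PORT, 0x80 | reg)]


def split_sequence(r1_value, r18_value, blank_lines=2):
    """Ordered steps for changing R18 at a screen split without hurting a running copy."""
    steps = []
    steps += reg_write(1, r1_value & ~SCREEN_ENABLE)  # disable the screen
    steps.append(("wait_rasterlines", blank_lines))    # wait two raster lines
    steps += reg_write(18, r18_value)                  # change R18
    steps += reg_write(1, r1_value | SCREEN_ENABLE)    # enable the screen
    return steps


# Hypothetical values: current R#1 contents 0x62, new R18 value 0x07.
for step in split_sequence(0x62, 0x07):
    print(step)
```

On real hardware the wait step would be a polling or delay loop in the interrupt handler; the model only fixes the order in which R#1 and R18 are touched.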
