No reason apart from speed: the fewer times I access the nametable, the less time the blitting will cost
This is the current wip
https://docs.google.com/open?id=0Bx4kWAc-fapqbXNGZDlmM2FraWs
there are some glitches to be investigated and removed
Scrolling left, I think you can see the "first column problem" you mentioned: a few pixels are missing in the first two or three lines of every 16 pixels
@ARTRAG, if the issue is just speed, then let me try to convince you to go for 8x8:
Below I sketched some code to draw multiple tiles without call/ret overhead, and all in registers.
56 rasterlines for 176 lines.
Remember how the Nemesis plants level caused tile combination bloat with MSX1 scrolling - 16x16 will do that too!
16x16 gives 4 times coarser resolution than 8x8 for puzzling a level together.
A 16x16 limit doesn't fit the picture: you are about to make a high performance engine for the high performance MSX2
;call parameters:
;b  : loop count, amount of tiles to draw
;c  : 0x99
;de : offset to go one line down in nametable
;hl : vram pointer
;ix : pointer to 16bit nametable
;c' : 0x98
;de': offset in tile for X scroll
drawmultipletiles:
dmtl:
5       exx
21      ld l,(ix+0)
21      ld h,(ix+1)     ;get tile address from nametable
12      add hl,de       ;add offset for scroll
8*61    rept 8
        exx
        out (c),l
        out (c),h
        inc h
        exx
        outi
        endm
5       exx
17      add ix,de       ;to next line in nametable
14      djnz dmtl
--
583 for 8 lines, 72,9 cycles per line, 0.32 rasterlines per line.
        ret
If you add the correct management of the high bits of the VRAM addresses, this 8x8 solution is by far slower than the one I posted with 16x16 tiles
The sole strong point in your code is the 16bit nametable, where I have (for now) this
        ld hl,(_p)      ; q = &tile[*p++][offs];
        ld a,(hl)
        inc hl
        ld (_p),hl
        ld bc,(_offs)
        add a,b
        ld b,a
        ld hl,_tile
        add hl,bc       ; from here hl points the current tile column
and you have
21      ld l,(ix+0)
21      ld h,(ix+1)     ;get tile address from nametable
12      add hl,de       ;add offset for scroll
Actually I would like to "compress" tiles by removing repeated columns
This implies the use of two data structures, one holding the data
uchar column[Nmax][256];
one holding the addresses of the actual columns for each tile
uchar* tile[Ntile][16];
in the end, I should compute offline something like
tile[n][offset] = &column[x][16*offset];
thus in the code I should use something like
uchar* q = tile[level[i][j]][offset]
just thinking out loud
;-)
Sorry, the column definition has to be [Nmax][16]
I have trouble with the C 2D array descriptions, especially since I expect double indirection (two 16bit fetches), while your asm seems to do one 8bit fetch and then some high byte offsetting with it.
Doing the 16bit fetch twice, with stack abuse, I end up with surprisingly little extra penalty.
And with DE freed up, it looks like one can use the faster core with 8*57 cycles instead of 8*61.
Funny, in the end it costs practically nothing, lol
instead of the
12 add hl,de
it is
11 ld sp,offset
12 add hl,sp
5 ld sp,hl
10 pop hl
;need disabled interrupts
;call parameters:
;b  : loop count, amount of tiles to draw
;de : offset to go one line down in nametable
;ix : pointer to 16bit nametable
;sp : for X scroll, column number * 2
;c' : 0x98
;de': vram pointer
drawmultipletiles:
        ld (savesp),sp
dmtl:
5       exx
21      ld l,(ix+0)
21      ld h,(ix+1)     ;get address from nametable
11      ld sp,offset    ;selfmodified code will be faster, preliminary can work with ld sp,(offset).
                        ;in column 1, offset = 2, because it is pointers
12      add hl,sp
5       ld sp,hl
10      pop hl          ;fetch pointer to pixel column
8*57    rept 8
        ld a,e
        out (0x99),a
        ld a,d
        out (0x99),a
        inc d
        outi
        endm
5       exx
17      add ix,de       ;to next line in nametable
14      djnz dmtl
        ld sp,(savesp)
        ret
So, once it is finally tuned, I guess there are no speed issues.
But I wonder whether you have some special usage in mind?
Something special in mind that is about 16x16 compression?
In general usage, stuffing an 8x8 level into 16x16 tiles will again cause the tile combination bloat,
and on top of that with column sharing effectively disabled.
E.g. when one 8x8 tile is symmetric but the 8x8 tile below it is not, the 16x16 tile ends up without column sharing.
You ended up with my very same inner loop. Add management of the high bits of the VRAM address and you will arrive at my code, with extra overhead due to the larger number of tiles to be accessed...
Column compression can also apply to 8x8 tiles, with lower gains. The reason is that mapped RAM costs more than VRAM and bank switching is usually much more annoying than VRAM access.
I have no special use in mind atm, maybe MOAM in screen 8, maybe something else ;-)
"This thread has been officially hijacked"
Frederic.markus, you have to play the Manboy 2 game: it has wonderful screen5 horizontal and vertical scrolling at the same time.