3D raycasting

Page 5/16

By hit9918

Prophet (2891)

29-04-2011, 16:56

With the RLE method, drawing transparent sprites goes like this:

If color is 0, then do nothing. Do not draw this chunk: no blitter setup, no blitter run.
Maybe some counting in CPU registers.

For things that are round or have lots of holes, this too sounds like a winner.
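
A minimal Python model of the transparent-run idea above, not MSX code. It assumes the column is stored as (length, color) pairs and that color 0 means "transparent"; the function name and the sample data are made up for illustration. Transparent runs only advance the position counter and issue no fill at all.

```python
def draw_rle_column(runs, column_height):
    """Expand (length, color) runs into a column; color 0 is transparent."""
    column = [None] * column_height   # None = background pixel left untouched
    y = 0
    fills = 0                         # fill commands actually issued
    for length, color in runs:
        if color != 0:
            # Only opaque runs cost a fill; clip at the bottom of the column.
            for i in range(y, min(y + length, column_height)):
                column[i] = color
            fills += 1
        y += length                   # transparent runs just move the cursor
    return column, fills


sprite_runs = [(2, 0), (3, 7), (1, 0), (2, 4)]   # hypothetical sprite column
column, fills = draw_rle_column(sprite_runs, 8)
print(column, fills)
```

Four runs are stored but only two fills are issued, which is where the saving for round or holey sprites would come from.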

By ARTRAG

Enlighted (6367)

29-04-2011, 19:34


> The blitter takes roughly 1/3 of the time per pixel of the cpu zoom.

Did you test this on real HW?
openMSX has a VDP engine that is slower than the one in blueMSX.
IMHO the timings of both emulators are inaccurate (especially blueMSX, which also has some nasty bugs in the VDP command engine).


> The distribution of column sizes in the game is much like drawing a triangle from the 256 pixel column case (best case for the RLE engine) to the 0 pixel column case (worst case). Since "the integral of a triangle = 0.5", I feel this means the "allowed overhead" figure is to be divided by 2.

This also suggests that the best solution is probably to use both:
for columns whose scaled size >= a given threshold, use RLE + HMMV
for columns whose scaled size < a given threshold, use cpu zoom + HMMC

The most interesting thing is that cpu zoom can probably be effective with RLE encoded data too

e.g. something like

....
ld b,<Run Length*scale>
ld a,<color>

core:
 out (0x9B),a
 djnz core
....

takes 26 cycles per pixel

This means that the very same RLE data could feed BOTH routines: the HMMV zoom and the modified cpu zoom + HMMC
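
To make the "same data, two consumers" point concrete, here is a small Python model (not Z80 code). It assumes an integer scale factor for simplicity and (length, color) runs; all function names are hypothetical. One consumer produces the byte stream the cpu loop would send with HMMC, the other produces one fill per run as HMMV would need.

```python
CYCLES_PER_PIXEL = 26   # figure quoted above for the out/djnz loop


def scale_runs(runs, scale):
    """Multiply every run length by an integer scale (simplifying assumption)."""
    return [(length * scale, color) for length, color in runs]


def cpu_stream(runs):
    """cpu zoom + HMMC path: one byte per pixel goes to the VDP."""
    stream = []
    for length, color in runs:
        stream.extend([color] * length)
    return stream


def fill_commands(runs, y0=0):
    """RLE + HMMV path: one (y, height, color) fill per run."""
    commands = []
    y = y0
    for length, color in runs:
        commands.append((y, length, color))
        y += length
    return commands


runs = [(3, 5), (1, 2)]             # hypothetical column: 3 px of color 5, 1 px of color 2
scaled = scale_runs(runs, 2)
stream = cpu_stream(scaled)
commands = fill_commands(scaled)
cpu_cycles = CYCLES_PER_PIXEL * len(stream)
print(stream, commands, cpu_cycles)
```

The cpu cost grows with the pixel count, the fill cost with the run count, which is what makes a size threshold between the two paths plausible.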

By GhostwriterP

Hero (528)

29-04-2011, 20:59

> OK,
> I cannot really understand this latter proposal
> If you work on horizontal lines you have to take care of ceiling and floor...
> but those parts of the screen usually (at least in my code) are not updated, except in the areas where a column reduced its height across frames
> There is no update when the column increases its own size

First of all, it was not really meant as a serious solution. Secondly, an improvement would be swapping b and c; this way we could have 1 texture of 256x64 and keep it straight (no longer turned 90 degrees) in RAM. The idea behind this is that you do not have four textures of 64x64 but 256 of 1x64 (w x h)... you can make a simple brick pattern with just four 1x64 textures, for instance.
And this idea can be implemented either way.

> Moreover your inner loop seems slower than the one proposed by hit9918 and by NYYRIKKI

Is it?

Thirdly, switching back to the vertical approach with HMMC, the inner loop can be rewritten to:

  ld e,texture
  ld h,line ;high <offset_scale_by_line_table>
  ld l,depth

.lus
  ld d,(hl)     ; load texture offset
  inc h         ; advance to next line
  ld a,(de)     ; read texture byte
  out (9Bh),a   ; out texture byte
  djnz .lus     ; next line

Well, how about that? It scales both ways (though limited to 256 scales total) or performs any kind of transformation you store in the <offset_scale_by_line_table>.
And compared to the hit9918 version of the core, which counts 58 cycles, the above core counts 48 cycles.
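
A rough Python model of the table-driven loop above, under simplifying assumptions: one flat offset list per scale instead of the 256-byte-page layout the Z80 code walks with `inc h`, and a plain floor mapping as the table contents. The names `build_offset_table` and `draw_column` are invented for the sketch. Any other transformation could be stored in the table instead.

```python
TEXTURE_HEIGHT = 64


def build_offset_table(screen_lines):
    """offset[line] = texel row to read for that output line (floor mapping)."""
    return [(line * TEXTURE_HEIGHT) // screen_lines for line in range(screen_lines)]


def draw_column(texture_column, offset_table):
    """One table lookup plus one texture read per output pixel, as in the loop."""
    return [texture_column[offset] for offset in offset_table]


texture_column = list(range(TEXTURE_HEIGHT))   # texel value = its own row number
shrunk = draw_column(texture_column, build_offset_table(16))
grown = draw_column(texture_column, build_offset_table(128))
print(shrunk[:4], grown[:4])
```

The same lookup handles both shrinking (rows are skipped) and enlarging (rows repeat), which is the "scales both ways" property.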

By hit9918

Prophet (2891)

29-04-2011, 21:46


> The blitter takes roughly 1/3 of the time per pixel of the cpu zoom.

> did you test this on real HW?

No, I remember the blitter being in the same ballpark as cpu outi, a ballpark figure. But I think that was copy speed. I just found a figure on the "MSX Assembly Page": "HMMV 60Hz 212 lines sprites off 5888"; I assume that's bytes per frame.
That would be 10.1 cycles per byte, roughly 5.7x faster than the 58 cycle cpu loop. Except that the blitter has some overhead for every line (I don't know how much), so drawing vertical columns would be slower.
Still, this is just asking to be tried out.
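
The arithmetic behind the cycles-per-byte estimate can be checked with a few lines of Python. The only figure not taken from the post is the standard MSX Z80 clock of 3.579545 MHz; the per-line blitter overhead is unknown and is left out.

```python
Z80_CLOCK_HZ = 3_579_545       # standard MSX Z80 clock (assumption, not from the post)
FRAME_RATE_HZ = 60
HMMV_BYTES_PER_FRAME = 5888    # figure quoted in the post
CPU_LOOP_CYCLES = 58           # per-pixel cost quoted for the cpu zoom loop

cycles_per_frame = Z80_CLOCK_HZ / FRAME_RATE_HZ
cycles_per_byte = cycles_per_frame / HMMV_BYTES_PER_FRAME
speedup = CPU_LOOP_CYCLES / cycles_per_byte

print(round(cycles_per_byte, 1), round(speedup, 1))
```

This is the ideal fill rate for long horizontal spans; narrow vertical columns would pay the unknown per-line cost on top.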


> This also suggests that the best solution is probably to use both:
> for columns whose scaled size >= a given threshold, use RLE + HMMV
> for columns whose scaled size < a given threshold, use cpu zoom + HMMC

An extreme version would be for every column to get its own threshold value, estimated by the compressor, because the threshold depends on how many color changes the column has.
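
A sketch of how a compressor could estimate such a per-column threshold, in Python. The cost model is an assumption: the cpu path costs a fixed number of cycles per pixel, the blitter path costs a setup per run plus a smaller per-pixel cost. `BLIT_SETUP_CYCLES` and `BLIT_CYCLES_PER_PIXEL` are placeholder values, not measurements, and would have to come from tests on real hardware.

```python
import math

CPU_CYCLES_PER_PIXEL = 26      # out/djnz figure quoted earlier in the thread
BLIT_SETUP_CYCLES = 300        # hypothetical cost of programming one HMMV fill
BLIT_CYCLES_PER_PIXEL = 10     # hypothetical per-pixel blitter cost


def count_runs(column):
    """Number of constant-color runs in a texture column."""
    if not column:
        return 0
    runs = 1
    for previous, current in zip(column, column[1:]):
        if current != previous:
            runs += 1
    return runs


def column_threshold(run_count):
    """Smallest scaled height where the blitter path is no slower than the cpu path."""
    saved_per_pixel = CPU_CYCLES_PER_PIXEL - BLIT_CYCLES_PER_PIXEL
    return math.ceil(run_count * BLIT_SETUP_CYCLES / saved_per_pixel)


column = [1, 1, 2, 2, 2, 1]    # hypothetical column with three runs
print(count_runs(column), column_threshold(count_runs(column)))
```

With this model a column with more color changes gets a higher threshold, so busy columns stay on the cpu path longer.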

By hit9918

Prophet (2891)

29-04-2011, 22:27

@ARTRAG, I think the result of the zoom mul needs to be a fractional length, or else the result will be a bad mess.

I did some code which would be the cpu RLE version:

	;sp  ->RLE stream. one byte amount of runs, then length, color, length, color
	;hl'(exx) ->multable of current zoom stage (integer parts at hl'+256)
	;ix: returnaddress

drawcolumn:
	pop bc
	dec sp
	ld b,c		;amount of runs loaded into B
	
	ld a,0x80	;reset fraction adder. starting with "0.5", just an idea.
			;maybe later a fraction from the column draw could be fed in here.

	ex af,af'	;fraction adder is in af'
nextRLE:
	exx
	pop de		;d = color, e = length
	ld l,e
	ld c,(hl)	;hl     -> multable fraction part. table must be 256 aligned.
	inc h		;hl+256 -> multable integer part
	ld b,(hl)
	dec h
			;bc = length 8:8 fixpoint

	ex af,af'	;switch to fraction adder
	add a,c		;add length fraction part to adder
	jp nc,adderskip
	inc b		;fraction adder overflowed, draw one dot more
adderskip:
	ex af,af'

	ld a,b
	and a
	jr z,zeropixels ;may happen with small zoom values
	
	ld a,d
fill:
	out (0x9b),a
	djnz fill
zeropixels:

	exx
	djnz nextRLE

	jp (ix)		;return after SP abuse
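
For readers who want to check the fraction adder without an assembler, here is a Python model of the routine above. It assumes the multable holds length * zoom as 8.8 fixed point, split into a fraction table and an integer table, and that the adder starts at 0x80 as in the listing. The function names are invented; the stack tricks and register swaps are not modelled.

```python
def build_multable(zoom_8_8):
    """For each run length 0..255: (fraction byte, integer byte) of length * zoom."""
    fraction, integer = [], []
    for length in range(256):
        product = length * zoom_8_8          # 8.8 fixed-point product
        fraction.append(product & 0xFF)
        integer.append(product >> 8)
    return fraction, integer


def draw_column(runs, zoom_8_8):
    """Expand (length, color) runs, carrying the fractional error between runs."""
    fraction, integer = build_multable(zoom_8_8)
    adder = 0x80                             # start at "0.5"
    pixels = []
    for length, color in runs:
        count = integer[length]
        adder += fraction[length]
        if adder > 0xFF:                     # carry out of the 8-bit adder
            adder &= 0xFF
            count += 1                       # draw one dot more
        pixels.extend([color] * count)       # count may be 0 at small zooms
    return pixels


runs = [(10, 1), (10, 2), (10, 3)]           # hypothetical column
pixels = draw_column(runs, 0x0055)           # zoom of 85/256, about 0.33
print([pixels.count(c) for c in (1, 2, 3)])
```

Each run alone would round down to 3 pixels; the shared adder hands the leftover fractions to the middle run, so the column totals 10 pixels instead of 9.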

By ARTRAG

Enlighted (6367)

30-04-2011, 08:51

I'm working on a script to produce RLE encoded textures
I think that this solution could lead to a real game; it all becomes a matter of trade-off between the vertical detail of the textures and the framerate

Off topic
http://spectrum.ieee.org/consumer-electronics/gaming/the-wizardry-of-id/0

By ARTRAG

Enlighted (6367)

30-04-2011, 15:58

Tool released (MATLAB needed)
https://sites.google.com/site/devmsx/rle-textures
It just encodes all the png images it finds in the directory where you put the files

I think the RLE approach is very very promising.

The following 8 textures (actually 4, seen with two levels of light)

https://sites.google.com/site/devmsx/rle-textures/rletextures.bmp?attredirects=0

are stored in 6644 bytes

The plain data for the old (current) code, which is (or should be) slower, needs 16384 bytes!

By ARTRAG

Enlighted (6367)

30-04-2011, 17:14

Same link, 8 new textures compressed into 5712 bytes

the tool now also generates pointers to the columns

By hit9918

Prophet (2891)

30-04-2011, 18:32


> The following 8 textures (actually 4, seen with two levels of light)

Make the palette so that colors with the top bit set are twice as bright.
Then turning on brightness is just another OR 0x88 on the COLOR value.

A more extreme version would send the COLOR values through a lookup table.
The original DOOM looked like it was doing this.
This allows using many more colors; in our case images could use all 16 colors.
The palette just needs enough variety for the darker stages to look ok.

e.g. an image got a bright brown pixel and a dark brown pixel.
in the darkened stage it would be two dark brown pixels.
in an even darker LUT, it would appear as a dark brown pixel next to a black one.

The LUT lookup increases setup overhead in the RLE version, while in the classic zoom it affects the central per-pixel draw loop. Another case where the RLE version is fast.

p.s. if one got e.g. bright green and bright red for lights, one could force their LUT values to stay bright, this makes lights in the dark.
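
Both lighting ideas can be modelled in a few lines of Python. Assumptions: a 4 bits per pixel mode where one byte holds two pixels (hence OR 0x88 sets the top bit of both), a 16-entry LUT indexed by color number, and a made-up palette in which color c - 1 is the darker neighbour of color c. `make_dark_lut` and `apply_lut` are hypothetical names.

```python
def brighten(color_byte):
    """Set the top bit of both 4-bit pixels packed in one byte."""
    return color_byte | 0x88


def make_dark_lut(lights=()):
    """Map each color to its darker neighbour; 'lights' keep their own value."""
    lut = [max(color - 1, 0) for color in range(16)]   # hypothetical palette order
    for color in lights:
        lut[color] = color                              # lights stay bright in the dark
    return lut


def apply_lut(runs, lut):
    """Remap once per run, not once per pixel: this is the RLE advantage."""
    return [(length, lut[color]) for length, color in runs]


runs = [(5, 6), (3, 12), (2, 1)]            # hypothetical column, color 12 is a light
dark_runs = apply_lut(runs, make_dark_lut(lights=(12,)))
print(hex(brighten(0x35)), dark_runs)
```

Three lookups darken the whole 10-pixel column, whereas a per-pixel zoom loop would need ten.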

By ARTRAG

Enlighted (6367)

01-05-2011, 00:22

When I started to optimize the routine I wrote, I actually ended up with exactly the same as hit9918.

I don't have a compiler or an MSX emulator on this machine, so I don't know whether this works or not...



	ORG #9000

	DB 0, low (16384),low (16384/2),low (16384/3) ... ,low (16384/254),low (16384/255)
	DB 0, high(16384),high(16384/2),high(16384/3) ... ,high(16384/254),high(16384/255)

; input HL -> texture column: 64 bytes
; input A = final size in [1...255]
; input DE = (X,Y) coords for the starting point where to plot the scaled column

scaler: ;#9200

	LD IXL,A
	LD B,A
	LD C,#9B
	LD A,#24
	OUT (#99),A
	LD A,#91
	OUT (#99),A

	OUT (C),E
	OUT (C),0
	OUT (C),D
	OUT (C),0

	OUT (C),0
	OUT (C),0
	OUT (C),B
	OUT (C),0
	
	LD A,(HL)
	OUT (#9B),A

	OUT (C),0
	LD A,#F0
	OUT (#9B),A

	LD A,#AC
	OUT (#99),A
	LD A,#91
	OUT (#99),A

	LD L,B
	LD B,H ; (Texture starts also from 256 byte boundary)
	LD H,#90
	LD E,(HL)
	INC H
	LD D,(HL)
	LD HL,0

CORE:
	ADD HL,DE
	LD C,H
	LD A,(BC)
	OUT (#9B),A

	DEC IXL
	JP NZ,CORE

	ret
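
The step table and the CORE loop can be modelled in Python to see which texels get picked. This is a sketch under assumptions: the table entry is 16384 // size (64 texels in 8.8 fixed point divided by the target size), and the model emits exactly `size` texels, with the first one standing in for the byte sent together with the command. Whether the posted loop count needs the same adjustment has not been tested here.

```python
def step_table():
    """step[size] = 16384 // size, the 8.8 fixed-point texel step per output pixel."""
    return [0] + [16384 // size for size in range(1, 256)]


def scale_column(texture, size, steps):
    """Pick `size` texels from a 64-entry column using a 16-bit accumulator."""
    out = [texture[0]]                 # first byte goes out with the command setup
    acc = 0
    for _ in range(size - 1):
        acc = (acc + steps[size]) & 0xFFFF
        out.append(texture[acc >> 8])  # high byte of the accumulator = texel index
    return out


steps = step_table()
texture = list(range(64))              # texel value = its own index
print(scale_column(texture, 3, steps))
```

At size 64 the step is exactly 1.0, so every texel is emitted once; smaller sizes skip texels and larger ones repeat them.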

@NYYRIKKI

I was studying your code to adapt it to RLE sequences
What is this section for?
Why do you set R#17 = 172?

	LD A,#AC
	OUT (#99),A
	LD A,#91
	OUT (#99),A
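
For reference, the value written to R#17 can be split into its fields with a short Python helper. This follows the V9938 register layout as commonly documented: the low six bits select the register that port #9B writes go to, and bit 7 set disables auto-increment. The helper name is made up.

```python
def decode_r17(value):
    """Split an R#17 value into target register and auto-increment flag."""
    return {
        "register": value & 0x3F,              # bits 0-5: register number
        "auto_increment": not (value & 0x80),  # bit 7 set = auto-increment off
    }


print(decode_r17(0x24))   # value used before the command setup
print(decode_r17(0xAC))   # value 172 asked about above
```

Read this way, #24 points port #9B at R#36 with auto-increment on, and #AC (172) points it at R#44 with auto-increment off, so every following OUT to #9B lands in the same register.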