3D raycasting

Page 2/16

By ARTRAG

Enlighted (6367)

27-04-2011, 15:12

Artrag,

Can you define more precisely the scaling operation on the texture array of data?

From what I understand, you have a texture of 64 bytes as an input, and the scaling operation would output a result between 1 and 255 bytes. Is that correct?

But I suppose that apart from the number of bytes, some texture scaling is necessary, resulting in an alteration of the texture bytes. Let's say you have this 4-byte texture:

FF 00
00 FF

Zooming in by a factor of 2 would produce this 16-byte result:

FF FF 00 00
FF FF 00 00
00 00 FF FF
00 00 FF FF

Is that correct?

Whereas zooming out by a factor of 2 would produce this 2-byte result:

F0
0F

Is that correct?

The scaling acts on a single column at a time.

So, assume you work on column 0 of your texture, with height 4:

FF
FF
00
00

If I call the scaler with a final size of 2, I expect it to plot:

FF
00

If I call the scaler with a final size of 8, I expect it to plot:

FF
FF
FF
FF
00
00
00
00

If I call the scaler with a final size of 12, I expect it to plot:

FF
FF
FF
FF
FF
FF
00
00
00
00
00
00

etc etc
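The expected outputs above are what a nearest-neighbour resampler produces. As a minimal sketch (written in Python for readability rather than Z80, and with `scale_column` being a made-up name), each output row k takes the source byte at floor(k * source_height / final_size):

```python
def scale_column(column, final_size):
    """Nearest-neighbour scaling of one texture column to final_size bytes."""
    src_h = len(column)
    # Output row k samples source row floor(k * src_h / final_size).
    return [column[k * src_h // final_size] for k in range(final_size)]


column = [0xFF, 0xFF, 0x00, 0x00]
for size in (2, 8, 12):
    print(size, " ".join(f"{b:02X}" for b in scale_column(column, size)))
```

For the height-4 column above this reproduces the three listings for sizes 2, 8 and 12.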

By hit9918

Prophet (2891)

27-04-2011, 15:41

@hit9918: your "dream core" will output a horizontal line on screen... Not sure that is what's expected.

@metalion: I mean it to use HMMC mode:
blitter-style drawing of rectangles, with the pixels fed by the CPU.
And if I got that right, a rectangle of width 1 will make the VDP draw a column fed by the CPU.

If that stunt works, it will save an entire RAM-to-VRAM screen copy,
and further save cycles in the Z80 draw code.

By ARTRAG

Enlighted (6367)

27-04-2011, 16:17

@ARTRAG:

I would need an extension of the rules:
A texture column should not cross a 256-byte address boundary.
Also, the textures are rotated 90 degrees in memory. Is this OK?

That would allow this core:

	add hl,de	;8:8 fixpoint delta add, H is the low byte of texture column address
	ld c,h
	ld a,(bc)	;B contains hibyte of texture column address
	
	exx		;rest is in other register set
	...

And the second thing is: I assume a column is copied in bytes, i.e. walls are drawn in double pixels.
Single-pixel resolution, with all the masking etc., would be more than 2 times slower.
The double-pixel rendering could also allow the textures to be stored with dithering: the 2 pixels in the even and odd columns belong together.

Another thing I wonder:

How about drawing directly to VRAM in the MSX2 vertical CPU byte-feed mode?
I heard one has to test some "hardware is busy" flag,
but I guess one can skip that given the cycles taken by the rest of the code?

So the loop would do nothing but output the byte with out (0x98) (or whatever the port is), resulting in this dream core:

core:
12	add hl,de	;8:8 fixpoint delta add, H is the low byte of texture column address
5	ld c,h
8	ld a,(bc)	;B contains hibyte of texture column address
	
12	out (vdp),a
	
10	dec ixl
11	jp nz,core
--
58 cycles
About 60000 pixels per second, in screen 8!
120000 pixels per second in screen 5 with double-pixel rendering.

Your assumptions were also mine (textures with columns rotated 90 degrees in memory, and byte copy).
Only the alignment at 256 could be a severe limitation, as it is not very efficient when you use, say, 64 bytes out of 256, 64*16=1024 times (it would waste 192 KBytes!).

Assuming your figures and a viewport 160 bytes high, one could reach, in the worst case (standing against a wall):

60000 bytes per second / 128 columns per screen / 160 pixels per column = about 3 frames per second. That is not bad, BUT I'm pretty sure it is not the best that can be achieved.
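These back-of-the-envelope figures can be rechecked with a few lines of Python. The 3,579,545 Hz Z80 clock of a standard MSX is an assumption here (the posts do not state it), and the function names are illustrative only:

```python
Z80_HZ = 3_579_545  # assumed: standard MSX Z80 clock, not stated in the thread


def pixels_per_second(cycles_per_pixel, clock_hz=Z80_HZ):
    """Throughput of an inner loop that spends cycles_per_pixel on each byte."""
    return clock_hz / cycles_per_pixel


def frames_per_second(bytes_per_second, columns, rows):
    """Worst case: every column of the viewport is filled top to bottom."""
    return bytes_per_second / (columns * rows)


def wasted_bytes(used_per_page, columns, page=256):
    """Bytes left unused when every column occupies its own 256-byte page."""
    return (page - used_per_page) * columns


print(round(pixels_per_second(58)))                   # the 58-cycle core
print(round(frames_per_second(60000, 128, 160), 2))   # 128 columns x 160 rows
print(wasted_bytes(64, 64 * 16) // 1024, "KB wasted")
```

Under these assumptions the 58-cycle core moves a little over 61,000 bytes per second, the full-screen worst case lands just under 3 frames per second, and one page per column wastes 192 KB.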

Yours is probably the best option when the texture is scaled down, or scaled up until the scaled texel reaches a size of about 2-3 pixels.

If the size of the scaled pixel increases, then instead of HMMC (which lets the CPU feed the VRAM to fill a column of data; @Metalion, hit9918 is assuming the use of this command)
it could (unsure) be faster to use a sequence of HMMV commands to render the boxes that represent the scaled pixels.

Note that some VDP registers end a command already holding the right value for the next command.
E.g. after an HMMV command, DY ends with the value DY* = DY + N, so it can be left as it is.

By Metalion

Paragon (1132)

27-04-2011, 16:22

The scaling acts on a single column at a time
Mmmmh ... I see.

But if the data in the column is not vectorized, does it mean that the routine needs some kind of data processing technique in order to compress (or expand) the original data?

Let's say you have this 6-byte data column:

0F
FF
F0
0F
FF
F0

What should be the expected result for a height of 4 bytes, for example?

By NYYRIKKI

Enlighted (5541)

27-04-2011, 18:41

For the scaler itself, could something like this work?



	ORG #9000

; 8.8 fixed-point step per output pixel (64*256/size): low bytes at #9000, high bytes at #9100
	DB 0, low (16384),low (16384/2),low (16384/3) ... ,low (16384/254),low (16384/255)
	DB 0, high(16384),high(16384/2),high(16384/3) ... ,high(16384/254),high(16384/255)

; input HL -> texture column: 64 bytes 
; input A = final size in [1...255]
; input DE = (X,Y) coords for the starting point where to plot the scaled column 

scaler: ;#9200

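	PUSH DE		; save the plot coords
	LD B,H ; (Texture starts also from 256 byte boundary)
	LD H,#90
	LD L,A
	LD E,(HL)	; DE = 8.8 fixed-point step for this size
	INC H
	LD D,(HL)
	LD HL,0		; HL = 8.8 texture position, starting at 0
	EXX
	POP HL		; HL' = plot coords
	LD B,A		; B' = number of pixels to plot

ILOOP:
	EXX
	LD C,H		; integer part of the position = texture index
	LD A,(BC)	; sample first, so the indices stay within 0..63
	ADD HL,DE	; then step to the next texture position
	EXX

	CALL PSET_HL_A
	INC H

	DJNZ ILOOP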

	ret
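To check the table and the loop on paper, here is a rough Python model of the same idea: an 8.8 fixed-point step of 16384/size, with the high byte of a 16-bit accumulator used as the texture index. It is only a sketch. `build_steps` and `scale_fixed` are hypothetical names, and this model reads the texture before adding the step, so the index starts at 0 and never reaches 64.

```python
def build_steps():
    """Entry A holds the 8.8 fixed-point step 16384 // A (entry 0 is unused)."""
    return [0] + [16384 // a for a in range(1, 256)]


def scale_fixed(texture, size, steps):
    """Scale a 64-byte column to `size` bytes with a 16-bit accumulator."""
    assert len(texture) == 64 and 1 <= size <= 255
    acc = 0                                   # plays the role of HL
    out = []
    for _ in range(size):
        out.append(texture[acc >> 8])         # high byte = texture index
        acc = (acc + steps[size]) & 0xFFFF    # models ADD HL,DE
    return out


steps = build_steps()
texture = list(range(64))
print(scale_fixed(texture, 64, steps) == texture)   # 1:1 copy
print(scale_fixed(texture, 4, steps))
```

With a texture holding the values 0..63, the second line shows directly which source rows are picked when shrinking to 4 pixels.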

By ARTRAG

Enlighted (6367)

27-04-2011, 19:14

The scaling acts on a single column at a time
Mmmmh ... I see.

But if the data in the column is not vectorized, does it mean that the routine needs some kind of data processing technique in order to compress (or expand) the original data?

Let's say you have this 6-byte data column:

0F
FF
F0
0F
FF
F0

What should be the expected result for a height of 4 bytes, for example?

Naturally you have some loss of resolution; it all depends on the ratio between the initial size and the final size.

Counting the bytes of the column from 1 and rounding up:

the 1st byte will correspond to 6/4 = 1.5, that is FF
the 2nd byte will correspond to 2*6/4 = 3, that is F0
the 3rd byte will correspond to 3*6/4 = 4.5, that is FF
the 4th byte will correspond to 4*6/4 = 6, that is F0

so the scaled result is:

FF
F0
FF
F0

Many other ways of scaling are possible (involving e.g. filtering and antialiasing), but this is the fastest (and the only one reasonably viable for 8-bit systems).

By hit9918

Prophet (2891)

27-04-2011, 19:39

@ARTRAG:
I don't need every column aligned to address xx00, I only need the 64 bytes of a column not to cross an xx00 boundary! So when 4 columns of 64 bytes are within a 256-byte page, everything is fine. The posted code already works with this.


it could (unsure) be faster to use a sequence of HMMV commands to render the boxes that represent the scaled pixels.

A simple doubling scheme would be like halving the vertical resolution, i.e. more vertical wobble:

2 pixels zoomed to 6 pixels:

1
1
1
2
2
2

2 pixels zoomed to 6 pixels when dumping 2-pixel chunks:

1
1
1
1  <- bad, should be 2, vertical wobble
2
2


But this reminds me of the EOR wallmapper on the Amiga:

Imagine the texture columns are run-length encoded! LENGTH, COLOR, LENGTH, COLOR, ...

At zoom level 100%, the blitter is told to blit LENGTH bytes of COLOR vertically.

The thing may get very slow with an average texture where every next pixel is a different color.
BUT it can be a real blaster when the texture has lots of identical pixels going vertically -
the style of the original DOOM walls actually is a pretty good candidate for that.

What is needed is the multiplication LENGTH*zoom (and maybe some further subpixel issues to sort out; maybe instead of LENGTH one should think on a Y-coordinate basis, I am not sure).

With a max chunk length of 16 pixels and 16-bit fixed-point values (likely a must) in the lookup table, one would need an 8k lookup table.

If a texture has more than 16 identical pixels in a column, the encoder must generate two chunks.
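A rough Python sketch of this run-length idea follows. It is an illustration under assumptions, not the poster's code: `rle_encode` and `scale_runs` are invented names, the scaling works on a Y-coordinate basis (each chunk ends at the rounded-up scaled end coordinate, which keeps the result identical to plain nearest-neighbour sampling), and the 8k figure is reproduced by assuming 256 zoom levels.

```python
def rle_encode(column, max_run=16):
    """Encode a column as (LENGTH, COLOR) chunks, splitting runs longer than max_run."""
    runs = []
    for color in column:
        if runs and runs[-1][1] == color and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, color)
        else:
            runs.append((1, color))
    return runs


def scale_runs(runs, final_size):
    """Scale chunks so that each one ends at ceil(source_end * final_size / height)."""
    height = sum(length for length, _ in runs)
    scaled, src_end, dst_start = [], 0, 0
    for length, color in runs:
        src_end += length
        dst_end = -(-src_end * final_size // height)   # ceiling division
        if dst_end > dst_start:                        # a chunk can vanish when zooming out
            scaled.append((dst_end - dst_start, color))
        dst_start = dst_end
    return scaled


runs = rle_encode([0xFF, 0xFF, 0x00, 0x00])
print(runs)
print(scale_runs(runs, 12))
print(16 * 256 * 2)   # 16 lengths x 256 assumed zoom levels x 2 bytes per entry
```

Each scaled chunk then maps to one fill of LENGTH bytes of COLOR, so a column with long runs needs only a handful of commands however far it is zoomed in.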

brainstorm:
the LENGTH and COLOR bytes can be read from VRAM with IN A,(0x98)!
Well, I hope that still works while the blitter is running.

A normal MSX2 has 128k VRAM and 64k RAM, right?
With 64k of run-length encoded textures in VRAM you will get quite some content.

And then you can maybe afford a bigger LUT for the LENGTH multiplication.

By wouter_

Champion (426)

27-04-2011, 19:57

What about this idea: use a table that contains pre-calculated offsets into the texture for each possible zoom factor. This table would have a size of 255x160, which is approx. 40kB. (If this is meant to be used in a real demo/game, I recommend limiting both the height and the zoom range to 128; then the table is only 16kB.)
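A small Python sketch of how such a table could be generated is given here. Several details are assumptions that the post does not specify: a 64-byte texture column, the wall centred vertically in the viewport, and a hypothetical extra byte at offset 64 that holds a background colour for the rows above and below the wall. The names are illustrative only.

```python
TEX_H = 64          # assumed texture column height
VIEW_H = 160        # viewport height from the post
BACKGROUND = TEX_H  # hypothetical extra texture byte for rows outside the wall


def offset_row(scaled_h, view_h=VIEW_H, tex_h=TEX_H):
    """Texture offset for every viewport row when the wall is scaled_h rows tall."""
    top = (view_h - scaled_h) // 2   # negative when the wall is taller than the viewport
    row = []
    for y in range(view_h):
        k = y - top                  # row index inside the scaled wall
        if 0 <= k < scaled_h:
            row.append(k * tex_h // scaled_h)
        else:
            row.append(BACKGROUND)
    return row


def build_table(max_h=255, view_h=VIEW_H):
    """One row of offsets per scaled height 1..max_h."""
    return [offset_row(h, view_h) for h in range(1, max_h + 1)]


table = build_table()
print(sum(len(r) for r in table))               # 255 x 160 one-byte entries
print(sum(len(r) for r in build_table(128, 128)))
```

The two totals correspond to the roughly 40kB and 16kB sizes mentioned above, with one byte per entry.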

Now, the Z80 has a special instruction to read a 16-bit word from a pointer and at the same time adjust this pointer to the next 16-bit word: the POP instruction. With this trick the core of the routine becomes:

    ... code to setup HMMC VDP command ...
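    ... code to translate 'texture scale factor' into precalculated offset table (result in HL) ...
    LD (save_sp),SP
    LD SP,HL         ; SP now walks the offset table (interrupts must stay disabled meanwhile)
    LD H,...         ; high byte of the texture column address (has to be restored from the input parameter)
    LD B,160/2       ; two pixels per loop iteration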

Loop:
    POP DE
    LD L,E
    LD A,(HL)
    OUT (#9B),A
    LD L,D
    LD A,(HL)
    OUT (#9B),A
    DJNZ Loop

    LD SP,(save_sp)    
    ...

The inner loop takes 75 cycles for two pixels (37.5 cycles/pixel). With some unrolling you can get it down to about 31 cycles/pixel.
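The cycle figures can be rechecked with a short Python tally. The per-instruction counts below are standard Z80 timings plus the one extra wait state per M1 cycle that MSX machines add; that MSX penalty is an assumption here, chosen because it reproduces the 75-cycle figure. The final, not-taken DJNZ is ignored.

```python
# Z80 cycle counts including one MSX wait state per M1 cycle (assumed).
CYCLES = {"POP DE": 11, "LD L,E": 5, "LD L,D": 5, "LD A,(HL)": 8,
          "OUT (n),A": 12, "DJNZ": 14}

# The two-pixel body of the loop, without the DJNZ.
PAIR = ["POP DE", "LD L,E", "LD A,(HL)", "OUT (n),A",
        "LD L,D", "LD A,(HL)", "OUT (n),A"]


def cycles_per_pixel(unroll=1):
    """Average cycles per pixel when the two-pixel body is repeated `unroll` times per DJNZ."""
    body = sum(CYCLES[op] for op in PAIR)
    return (unroll * body + CYCLES["DJNZ"]) / (2 * unroll)


print(cycles_per_pixel(1))   # plain loop
print(cycles_per_pixel(8))   # body unrolled 8 times
```

The unroll factor of 8 is only an example; as the body is repeated more often, the average approaches the 30.5 cycles of the body alone.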

I don't think it's possible to make this routine any faster (though I'd be happy to be proven wrong). But instead of using this routine, it might still be faster to pre-calculate the scaled textures in VRAM and use HMMM commands to display them. Again, if you limit both height and zoom to 128, you can store 8 texture columns in 64kB of VRAM (so you have a texture of width=8; the original height doesn't matter). That should be enough for a simple brick-wall texture. And since the texture is scaled only once, you can even use a slower but much nicer interpolation algorithm than nearest neighbor.

BTW I first heard about this POP-trick from David Heremans, who used it in his 3D-texture-map routines:
www.youtube.com/watch?v=-fos5HTqdWY (the 3D stuff starts at 16'00")

By hit9918

Prophet (2891)

27-04-2011, 20:06

    ... code to setup HMMC VDP command ...
Loop:
    POP DE
    LD L,E
    LD A,(HL)
    OUT (#9B),A
    LD L,D
    LD A,(HL)
    OUT (#9B),A
    DJNZ Loop

I thought the idea of HMMC feeding a column with the CPU does not work so easily?

By wouter_

Champion (426)

27-04-2011, 20:12

I thought the idea of HMMC feeding a column with the CPU does not work so easily?
Why not?
