3D raycasting

Page 4/16

By wouter_

Champion (426)

28-04-2011, 18:33

* I'm fairly sure it's not required to check the TR status bit between OUTs when using the HMMC command. HMMC is even fast enough to handle a sequence of OUT instructions directly following each other (see below).

* I doubt using the HMMV command will be profitable. Or rather, it will be profitable, but only for very large zoom factors. Executing an HMMV command requires at least writes to the VDP registers NY, CLR and CMD (assuming DY is left at the correct value by the previous command, and DX and NX keep their values from the previous command). This requires at least 5 OUT operations (plus some LD instructions between the OUTs to get the correct values into the registers). It's also possible to use the HMMC command in combination with RLE: simply OUT the same value N times (so this doesn't require LD instructions in between).
To summarize: I believe using HMMV will only be profitable when the same color is repeated more than 5 times (perhaps only from 8 times, or even more). I'm not sure, but I think ARTRAG didn't require such large zoom factors. OTOH, using RLE in combination with HMMC is an interesting idea.
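
As a rough way to see where that break-even lies, the Python sketch below counts only OUT instructions. It ignores the LD instructions, the one-time HMMC setup per column and any waiting for the VDP, so it is a simplification rather than a timing model. The figure of 5 OUTs per HMMV run is the lower bound estimated above; the function names are made up for illustration.

```python
HMMV_OUTS_PER_RUN = 5  # lower bound quoted above (NY, CLR and CMD writes)

def outs_hmmc(runs):
    """HMMC streams every byte of the column: one OUT per plotted line."""
    return sum(runs)

def outs_hmmv(runs, outs_per_run=HMMV_OUTS_PER_RUN):
    """HMMV issues one command per run, whatever the run length."""
    return outs_per_run * len(runs)

def hmmv_wins(runs):
    """True when the per-run HMMV approach needs fewer OUTs than HMMC."""
    return outs_hmmv(runs) < outs_hmmc(runs)

if __name__ == "__main__":
    for runs in ([4] * 16, [5] * 16, [8] * 8):
        print(runs[0], outs_hmmc(runs), outs_hmmv(runs), hmmv_wins(runs))
```

Here `runs` is a list of run lengths in screen lines after scaling. With these assumptions a column only favours HMMV once its runs are longer than 5 lines on average.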

By ARTRAG

Enlighted (6367)

28-04-2011, 18:57

Humm...

If I understand hit9918's proposal correctly, using RLE+HMMV implies plotting large columns of constant color.
If L is the length of the "run", C is the color of the "run" and S is the scale factor of the column, the HMMV command should set:

; at init time
DX = X coordinate on the screen ; set only once
DY = Y coordinate on the screen ; set only once
NX = 2 ; set only once

; core loop
wait CE
NY = L*S ; changed and computed each "run"
CLR = C ; changed each "run"
CMD = HMMV code

so, given that one can put all the possible products L*S in a table (if L is stored in 4 bits, the table would be 4*256 = 1 Kbyte), the loop could be faster than HMMC under certain conditions.

My first question: is HMMC at maximum speed really slower than HMMV? And how big is the real difference?

If it is, and the speed difference is large (but I'm really not sure of this), then for some scale S>1, if L is large on average, each time close to its limit, there could be a significant gain.

What do you think?

PS
I do not see any speed gain in using RLE with HMMC; the sole reason for using RLE was to hand as much of the work as possible to the VDP. Am I missing something?

By Metalion

Paragon (1132)

28-04-2011, 19:40

wouter_ wrote: "I'm fairly sure it's not required to check the TR status bit between OUTs when using the HMMC command. HMMC is even fast enough to handle a sequence of OUT instructions directly following each other"
It IS required (just look at the flowchart in the VDP manual), but is it necessary when some T-states are spent between each OUT? ...
That's another question, one that can only be answered by doing some tests and research.

By wouter_

Champion (426)

28-04-2011, 20:35

Metalion wrote: "It IS required (just look at the flowchart in the VDP manual), but is it necessary when some T-states are spent between each OUT? ..."
I know the VDP flowchart says it's required. But it works just fine without testing the TR bit. I've tried it. (And of course there's also no need to check the CE bit at the end of an HMMC command: once you've sent the last byte, you know the command will end very soon.)

ARTRAG wrote: "If I understand hit9918's proposal correctly, using RLE+HMMV implies plotting large columns of constant color. If L is the length of the 'run', C is the color of the 'run' and S is the scale factor of the column, the HMMV command should set..."
I think you understand correctly. But what do you estimate a typical run length L will be? From your problem description (a texture of height 64, scaled to a maximum size of 255) I calculate that the maximum L will be 4 (is that correct?). Well, the cost of sending an HMMV command to the VDP is way more than simply plotting 4 pixels using the HMMC approach. For HMMV you need to execute lots of small commands per column. For HMMC you only need to set up the command once per column and then output all the pixels.

By ARTRAG

Enlighted (6367)

28-04-2011, 20:44

Well, not really.
L refers to the encoding of the column at its standard size.

For a texture of height 64, the max L is 64 <=> a column of one single color,
but any L in 1-64 can occur,
so it is a matter of how detailed the texture is (vertically).
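
To make the meaning of L concrete, here is a minimal Python sketch of run-length encoding one unscaled texture column into (L, C) pairs. The function name and the list-of-colors representation are assumptions for illustration; the actual encoder used in the thread is not shown.

```python
from itertools import groupby

def rle_column(column):
    """Encode a column of color values as (L, C) pairs: run length, color."""
    return [(len(list(group)), color) for color, group in groupby(column)]

if __name__ == "__main__":
    flat = [7] * 64        # one single color: a single run with L = 64
    striped = [1, 2] * 32  # color changes on every texel: 64 runs with L = 1
    print(rle_column(flat))
    print(len(rle_column(striped)))
```

A flat column gives the best case (one run of 64) and a column that changes color on every texel gives the worst case (64 runs of 1), which is the "how detailed is the texture" point above.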

By PingPong

Prophet (3521)

28-04-2011, 21:47


"I ask again to be clear:
Is there a mode where I can just do OUT (port),A and the VDP will paste a column of pixels fed from the CPU?"

Yes, the HMMC command.
It is fast enough that you can skip testing the status bits on a 3.5 MHz Z80 if you use OTIR or OUTI.

By GhostwriterP

Hero (528)

28-04-2011, 22:03

OK, let's mingle with a different (horizontal) approach.

Say we have a:

<offset_scale_by_line_table> table @ 4000h
This contains a texture offset per scale for each display line.
Basically 16 KB blocks that can be swapped in for each group of 64 consecutive display lines.

<texture> 1x 64 x 64 @ 8000h, rotated 90 degrees in RAM (xx00h - xx3fh)

And two arrays, being:

<texture_array> of 128 bytes @ 0c000h

followed closely by the:
<scale_array> of 128 bytes @ 0c100h

The <texture_array> contains the texture column (or texture/texture offset), found by
ray casting per display column, and the <scale_array> contains the ray-casted depth.

Then we could do something like:


  ; * Execute the ray casting code for all columns and store the results
  ;   in <texture_array> and <scale_array> *

  ; * Set the VDP write address to the upper left corner, with auto increment *

  ld h,high texture_array
  ld d,high offset_scale_by_line_table
  ld ixh,64   ; 64 display lines per table block

.lus2
  ld l,0      ; back to the first column
  ld ixl,128  ; 128 columns per display line

.lus
  ld b,(hl)   ; load texture/texture offset
  inc h       ; make hl point to <scale_array>
  ld e,(hl)   ; load scale
  dec h       ; restore hl pointing to <texture_array>
  inc l       ; advance to next column
  ld a,(de)   ; load texture height offset
  ld c,a
  ld a,(bc)   ; read texture byte
  out (98h),a ; out texture byte

  dec ixl
  jp nz,.lus

  inc d       ; proceed to next display line
  dec ixh
  jp nz,.lus2

  ; * Swap in the next part of <offset_scale_by_line_table> and repeat the loops *

Pros: No use of VDP commands.
Cons: Let's not start listing those ;)
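
For readers who find the table indirection hard to follow, the Python sketch below models the same two nested loops with plain lists. It is only a model of the data flow, not of the timing; the parameter names mirror the arrays described above, and the tiny data sizes in the demo are made up.

```python
def render_lines(texture, offset_table, texture_array, scale_array):
    """Return the bytes that would be sent to port 98h, in scan order.

    texture[col][offset]      : rotated texture, one row per texture column
    offset_table[line][scale] : texture offset for a display line and scale
    texture_array[x]          : texture column hit by the ray of screen column x
    scale_array[x]            : scale (depth) found for screen column x
    """
    stream = []
    for line_offsets in offset_table:              # outer loop (.lus2)
        for x in range(len(texture_array)):        # inner loop (.lus)
            tex_col = texture_array[x]             # ld b,(hl)
            scale = scale_array[x]                 # ld e,(hl)
            offset = line_offsets[scale]           # ld a,(de) / ld c,a
            stream.append(texture[tex_col][offset])  # ld a,(bc) / out (98h),a
    return stream

if __name__ == "__main__":
    texture = [[10, 11], [20, 21]]     # 2 texture columns of 2 texels each
    offset_table = [[0, 0], [0, 1]]    # 2 display lines, 2 possible scales
    print(render_lines(texture, offset_table, [0, 1, 1], [1, 0, 1]))
```

The point of the layout is that every lookup in the inner loop is a single indexed read, which is what lets the assembly version get by with 8-bit register loads only.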

By ARTRAG

Enlighted (6367)

29-04-2011, 08:14

I cannot really understand this latter proposal.
If you work on horizontal lines you have to take care of the ceiling and the floor...
but those parts of the screen usually (at least in my code) are not updated, except in the areas where a column has become shorter between frames.
There is no update when a column grows.

Moreover, your inner loop seems slower than the ones proposed by hit9918 and NYYRIKKI.
Is it?

By NYYRIKKI

Enlighted (5541)

29-04-2011, 10:44

About my example...

I was thinking that if you need more time between OUTs, then you can use that time to get more colors using rastering.
Make the loop something like this:

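CORE:
	ADD HL,DE	; step the texture position by DE
	LD A,#40
	XOR H		; toggle bit 6 of H: alternate between the two textures
	LD H,A
	LD C,A		; low byte of the texture address (B holds the high byte)
	LD A,(BC)	; read texture byte
	OUT (#9B),A	; send it to the VDP

	DEC IXL		; one line done
	JP NZ,CORE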

Now you need two textures next to each other. One contains the raster for odd lines and the other the raster for even lines.
Example:

If you put value #45 in one texture and #54 in the other, you get something like

45454545
54545454
45454545
54545454

Or if you want a lighter raster, you can use for example #54 in one texture and #55 in the other:

54545454
55555555
54545454
55555555

(Note: you need to init H bit 6 with Y bit 0)
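
The bit-6 toggle is easy to check with a small Python model of the CORE loop. This is a sketch under some assumptions not stated above: both textures live in one 256-byte page addressed by C, with one texture at offsets #00-#3F and the other at #40-#7F, and HL wraps at 16 bits. The function name is made up.

```python
def raster_column(page, start_hl, step_de, count):
    """Model of the CORE loop: returns the bytes sent to the VDP."""
    hl = start_hl & 0xFFFF
    out = []
    for _ in range(count):
        hl = (hl + step_de) & 0xFFFF   # ADD HL,DE
        h = (hl >> 8) ^ 0x40           # LD A,#40 / XOR H
        hl = (h << 8) | (hl & 0xFF)    # LD H,A
        out.append(page[h])            # LD C,A / LD A,(BC) / OUT (#9B),A
    return out

if __name__ == "__main__":
    # Assumed layout: texture A at #00-#3F, texture B at #40-#7F.
    page = [0x45] * 64 + [0x54] * 64 + [0] * 128
    for value in raster_column(page, 0x4000, 0x0040, 4):
        print(format(value, "02X") * 4)
```

Starting with bit 6 of H set makes the first line come from the texture at #00-#3F, which is why H bit 6 has to be initialised from Y bit 0.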

By hit9918

Prophet (2891)

29-04-2011, 16:44

wouter_ wrote: "I doubt using the HMMV command will be profitable. Actually it will be profitable, but only for very large zoom factors."

With this thing you must look at more than just the zoom factor!

What matters most is the number of color changes as you walk down a texture column.
A texture where every pixel has a different color is the disaster case.

The overhead of blitter setup: let's look at the case of 64 color changes (disaster texture) zoomed to 256 pixels:

The blitter takes roughly 1/3 of the time per pixel that the CPU zoom takes.
So 2/3 of the time of the whole column is the allowed overhead for the blitter setup of 64 runs.

CPU zoom: 58*256 cycles = 14848 cycles, 2/3 of that = 9898 cycles.

9898 / 64 blitter runs = 154 cycles per blitter setup.

The distribution of column sizes in the game is much like a triangle, running from the 256 pixel column case (best case for the RLE engine) down to the 0 pixel column case (worst case). Since "the integral of a triangle = 0.5", I feel this means the "allowed overhead" figure has to be divided by 2.

So if you can do it in about 75 cycles, you can compete with the CPU zoom even in the case of a disaster texture.

It would actually NOT be a disaster if the blitter setup takes 150 cycles,
because this would mean you can compete with a texture that has 32 color changes on average in the vertical direction.

The point of the whole thing is that with very simple textures crafted for the RLE method,
the blitter version would run 3X as fast. Use specially crafted textures for the "main walls" and you've got a winner, even when you add a couple of disaster JPEGs to the scene.
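
The budget arithmetic above can be replayed in a few lines of Python. The inputs (58 cycles per pixel for the CPU zoom, the blitter at roughly 1/3 of that, and the halving for the triangular distribution) are the estimates from this post, not measurements; the function names are made up.

```python
CPU_CYCLES_PER_PIXEL = 58  # estimate quoted above for the CPU zoom loop
COLUMN_HEIGHT = 256        # fully zoomed column, in pixels

def allowed_overhead(height=COLUMN_HEIGHT):
    """Cycles left for blitter setup if the blitter needs 1/3 of the CPU time."""
    return CPU_CYCLES_PER_PIXEL * height * 2 // 3

def cycles_per_setup(runs, halve=False):
    """Setup budget per run; halve=True applies the divide-by-2 for the triangle."""
    budget = allowed_overhead() // (2 if halve else 1)
    return budget // runs

def runs_affordable(setup_cycles, halve=True):
    """How many color changes per column a given setup cost can compete with."""
    budget = allowed_overhead() // (2 if halve else 1)
    return budget // setup_cycles

if __name__ == "__main__":
    print(allowed_overhead())
    print(cycles_per_setup(64))
    print(cycles_per_setup(64, halve=True))
    print(runs_affordable(150))
```

With integer rounding the halved budget comes out at 77 cycles per setup, in line with the "about 75" figure, and a 150-cycle setup corresponds to 32 runs per column.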
