3D rasterisation

Page 1/4
| 2 | 3 | 4

By Grauw

Ascended (10768)

20-06-2017, 23:03

Hey all,

Way long ago, when I was still a kid in middle school, I entered a short BASIC listing from MCM or some other magazine which animated a cube rotating around the Y axis. That was pretty cool. I remember trying for hours to tweak the way the sine functions combined to make the cube rotate around three axes, only to fail :D. Now, many years later, when I apply 3D math daily at work, I’m revisiting 3D on MSX…

So let’s discuss 3D rendering stuff! Open discussion, let’s talk matrix math, scanline rendering, Bresenham, z-buffering and all that! I’d love to read your ideas and see some videos and demo code.

(One small request though, let’s focus on 3D rasterisation rendering techniques (like Quake), and keep 3D raycasting (Wolfenstein, Doom) to other threads.)

By Grauw

Ascended (10768)

20-06-2017, 23:06

Let me start off by introducing a little project which I started a few weeks back; I simply called it “ThreeD”. I’ve got some of the basics implemented. It’s not mega impressive thus far, but I’m enjoying myself with it :).

Wireframe rendering video
Filled polygon rendering video
Source code

It would be nice if I could eventually render some full-screen 3D models and scenes of reasonable complexity at acceptable framerates. I know, ambitious. Target platform is turboR, because it’s fast and has a hardware multiply instruction.

The idea was sparked by some recent discussion on Overflow’s IO demo thread, where a CPC demo was shown which animated a (precalculated) flat-filled 3D gameboy model in full screen by only updating the pixels which changed between frames.

That’s pretty interesting, because on MSX (and turboR in particular) the VDP I/O is the bottleneck; at 60 Hz one can only update about 1/10th of the screen (5) within the time of one frame. By only updating pixels which actually change, this time could be used much more effectively.

So this is the basis of my approach; first I render the scene into a run-length encoded image buffer, and then I draw the screen by comparing this with the previous frame’s buffer and only update the pixels which have changed.

For a flat-shaded 3D model, the only pixels changing between frames are around the edges. So as the model is scaled up, in theory the number of changed pixels should scale fairly linearly with its size, and it could be rendered at large sizes without the usual quadratic increase in cost.
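
To illustrate the scaling argument, here is a toy Python sketch (not taken from ThreeD) that counts how many pixels differ between two frames of a flat-filled square that moves one pixel sideways. The helper names `square` and `changed_pixels` are made up for this illustration, and a real rotating model will behave less neatly than an axis-aligned square.

```python
def square(x0, y0, size):
    """Set of (x, y) pixels covered by a filled axis-aligned square."""
    return {(x, y)
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)}

def changed_pixels(size, dx=1):
    """Pixels that differ between a square and the same square shifted by dx."""
    before = square(0, 0, size)
    after = square(dx, 0, size)
    return len(before ^ after)  # symmetric difference = pixels to redraw

for size in (10, 20, 40):
    print(size, size * size, changed_pixels(size))
```

Doubling the side quadruples the filled area but only doubles the number of pixels that actually change, which is the effect the delta update tries to exploit.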

By Manuel

Ascended (19466)

20-06-2017, 23:10

Do we already know which techniques were used in the Calculus demo by Compjoetania TNG? (David, Wouter?) There it is also pretty impressive IMHO :)

By wolf_

Ambassador_ (10109)

20-06-2017, 23:12

Grauw wrote:

That’s pretty interesting, because on MSX (and turboR in particular) the VDP I/O is the bottleneck; at 60 Hz one can only update about 1/10th of the screen (5) within the time of one frame.

And what's that performance like using a v9990?

By Grauw

Ascended (10768)

21-06-2017, 00:13

Manuel wrote:

Do we already know which techniques were used in the Calculus demo by Compjoetania TNG? (David, Wouter?) There it is also pretty impressive IMHO :)

Definitely! I hope some of the Compjoetania TNG guys will pitch in :D.

For reference, Calculus video (in a bad emulator).
And also this Sandstone video, also by CTNG.

By Grauw

Ascended (10768)

20-06-2017, 23:33

wolf_ wrote:
Grauw wrote:

That’s pretty interesting, because on MSX (and turboR in particular) the VDP I/O is the bottleneck; at 60 Hz one can only update about 1/10th of the screen (5) within the time of one frame.

And what's that performance like using a v9990?

Good question, I started briefly to IFDEF some V9990 draw code into the RLE image class but removed it to keep things simple for now. Should be simple to prototype though.

I/O to the V9990 is a lot faster than to the V9958 (10 cycles rather than 54), but still relatively slow compared to internal memory. Also, the much reduced pixel update count still applies, and VRAM write addresses are quick to set, so random access is easier, too. So I think it will benefit as well.
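
As a rough sanity check of these figures, here is a back-of-envelope sketch in Python. It assumes the cycle counts above are R800 cycles at about 7.16 MHz, that one I/O access writes one byte, and that the screen is a 256×212 image at 4 bits per pixel (screen 5). Those values are assumptions on top of the post, and loop overhead is ignored, so treat the result as an upper bound.

```python
R800_HZ = 7_159_090             # assumed turboR R800 clock
FRAME_HZ = 60                   # frame rate mentioned in the thread
SCREEN5_BYTES = 256 * 212 // 2  # assumed screen 5 size: 4 bits per pixel

def bytes_per_frame(cycles_per_write, cpu_hz=R800_HZ, frame_hz=FRAME_HZ):
    """Upper bound on VRAM bytes written in one frame, ignoring loop overhead."""
    return cpu_hz // frame_hz // cycles_per_write

def screen_fraction(cycles_per_write):
    """Fraction of a screen 5 image that fits in one frame's budget."""
    return bytes_per_frame(cycles_per_write) / SCREEN5_BYTES

print(bytes_per_frame(54), round(screen_fraction(54), 3))  # V9958 figure
print(bytes_per_frame(10), round(screen_fraction(10), 3))  # V9990 figure
```

Under these assumptions the 54-cycle figure gives roughly 2.2 KB per frame, somewhat under a tenth of the image, which is in line with the "about 1/10th" estimate.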

Btw, in the filled polygon test I have currently, the delta update when the cube is closest to the camera takes 1.6 frames on V9938, regardless of screen mode 5 / 7 / 8.

By AxelStone

Prophet (3199)

20-06-2017, 23:26

Man that CPC demo rocks!

By Grauw

Ascended (10768)

21-06-2017, 00:37

Another neat thing about the delta updates: so long as pixels don’t change, they are not touched. This means that the 3D graphics can cohabitate with 2D bitmap content, so long as they do not overlap! Something like this @ 10:47.

By extension, if you populate the span buffer with a fixed mask containing e.g. some box cut-outs, you can guarantee that these pixels will never be touched. The VDP command engine is currently completely idle so it can easily be instructed to play animations in these cutouts, keep score overlays, etc. Think of the character faces in StarFox.

Then, if you special-case the background colour 0 in the RLE image buffer to take its actual colour data from a background image in memory, your 3D model will be able to freely move in a 2D environment, as if it were layered between two 2D bitmap planes. Now, remember Final Fantasy 7?
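
A minimal sketch of that colour 0 special case, in Python rather than Z80 code. The function name `resolve_run` and the flat `background` list are hypothetical; the idea is only that a run of colour 0 is filled from the stored background pixels instead of a solid colour.

```python
def resolve_run(pos, length, color, background):
    """Return the pixel values to write for one run starting at offset pos.

    Colour 0 is treated as "transparent": its pixels are copied from the
    background image instead of being filled with a solid colour.
    """
    if color == 0:
        return list(background[pos:pos + length])
    return [color] * length

background = [9, 9, 8, 8, 7, 7, 6, 6]        # made-up background pixels
print(resolve_run(2, 3, 5, background))      # solid run of colour 5
print(resolve_run(2, 3, 0, background))      # run restored from the background
```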

By hit9918

Prophet (2932)

21-06-2017, 19:03

What does the RLE data structure look like?
Well, I can imagine run-length: just a length byte and a color. But how do you render into it?

By hit9918

Prophet (2932)

21-06-2017, 19:35

OK, I think it is like inserting an element into a list,
and when one hits the middle of a span, it is split into two pieces.

By Grauw

Ascended (10768)

24-02-2020, 21:35

@hit9918 Indeed, basically. There are three steps:

First, the scanline rendering loop renders into a span buffer. This is a linked list describing a series of horizontal spans on a single scanline, and it takes care of z-culling. When a new span is inserted, the code loops through the (next, depth, color) triplets until it finds the start position, and then inserts the span, cutting existing spans up if necessary. The span buffer is stored as three aligned 256-byte arrays for efficiency (the code only operates on h and l, and next doubles as the x coordinate).
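
To make the data structure concrete, here is a Python sketch of such a span buffer. It is an interpretation of the description above, not the ThreeD source: the class and method names are invented, each span has one constant depth, a smaller depth is assumed to mean closer, and 255 is assumed to mean "far". Python lists stand in for the three aligned arrays, and the end of the line is marked with the value `width`, which a byte-sized Z80 version would have to encode differently.

```python
FAR = 255  # assumed "infinitely far" depth; smaller depth means closer

class SpanBuffer:
    """One scanline as a linked list stored in three parallel arrays.

    A span starting at x is the node at index x; nxt[x] is both the link to
    the next node and the x coordinate where this span ends.
    """

    def __init__(self, width=256, background=0):
        self.width = width
        self.nxt = [0] * width
        self.dep = [FAR] * width
        self.col = [background] * width
        self.nxt[0] = width  # a single background span covers the whole line

    def insert(self, x0, x1, depth, color):
        """Insert span [x0, x1) at constant depth, keeping closer pixels."""
        x0, x1 = max(x0, 0), min(x1, self.width)
        if x0 >= x1:
            return
        cur = 0
        while self.nxt[cur] <= x0:  # walk to the span containing x0
            cur = self.nxt[cur]
        while cur < x1:
            end = self.nxt[cur]
            lo, hi = max(cur, x0), min(end, x1)
            if depth < self.dep[cur]:  # new span wins over [lo, hi)
                old_dep, old_col = self.dep[cur], self.col[cur]
                if hi < end:  # keep the tail of the old span
                    self.nxt[hi], self.dep[hi], self.col[hi] = end, old_dep, old_col
                if lo > cur:  # keep the head of the old span
                    self.nxt[cur] = lo
                self.nxt[lo], self.dep[lo], self.col[lo] = hi, depth, color
            cur = end

    def pixels(self):
        """Expand the span list to one colour per pixel (for inspection)."""
        out, x = [], 0
        while x < self.width:
            out += [self.col[x]] * (self.nxt[x] - x)
            x = self.nxt[x]
        return out

line = SpanBuffer(width=16)
line.insert(4, 12, depth=100, color=1)
line.insert(8, 14, depth=50, color=2)   # closer, cuts into the first span
line.insert(0, 16, depth=200, color=3)  # farther, only fills the gaps
print(line.pixels())
```

Because the node index is the span's start x, splitting a span never moves data around: a new node is simply written at the cut position.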

After each scanline, the span buffer is flushed into the RLE image buffer, which is indeed as simple as you describe it, just (length, color) pairs. Because the linked list of spans is already an almost-RLE format, this code is pretty simple.
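
The flush step could look roughly like this in Python, assuming the span list is held in `nxt` and `col` arrays where `nxt[x]` is the end of the span starting at `x`. The function name `flush_scanline` is hypothetical, and merging neighbouring spans of the same colour is an addition of this sketch, not something the post states. Any maximum run length imposed by a byte-sized length field is also ignored.

```python
def flush_scanline(nxt, col, width):
    """Convert one scanline's span list into (length, colour) RLE pairs."""
    runs = []
    x = 0
    while x < width:
        length = nxt[x] - x
        if runs and runs[-1][1] == col[x]:
            # Same colour as the previous run: extend it (sketch-only choice).
            runs[-1] = (runs[-1][0] + length, col[x])
        else:
            runs.append((length, col[x]))
        x = nxt[x]
    return runs

# A 16-pixel line with spans starting at x = 0, 4, 8, 12 and 14.
nxt = [0] * 16
col = [0] * 16
for start, end, colour in [(0, 4, 3), (4, 8, 1), (8, 12, 2), (12, 14, 2), (14, 16, 3)]:
    nxt[start], col[start] = end, colour
print(flush_scanline(nxt, col, 16))
```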

Finally, when it is updating the screen, it takes a reference to the new RLE image buffer in IX and the previous one in IY, loads the current (length, colour) of each into two register pairs, takes the minimum of the two lengths, and compares the colours. If the colours are different, it draws that many pixels in the new colour. It then subtracts the minimum length from both lengths, fetches a new (length, colour) pair for whichever one reached zero, and loops as long as there is still data remaining.
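
The comparison loop can be sketched in Python as follows. This is an illustration of the description above rather than the actual Z80 routine: `delta_runs` is a made-up name, both buffers are assumed to cover the same number of pixels with non-zero run lengths, and instead of writing to the VDP it returns a list of (offset, length, colour) draw commands.

```python
def delta_runs(new_runs, old_runs):
    """Compare two RLE streams and return draw commands for changed pixels."""
    draws = []
    new_it, old_it = iter(new_runs), iter(old_runs)
    new_len, new_col = next(new_it, (0, None))
    old_len, old_col = next(old_it, (0, None))
    pos = 0
    while new_len and old_len:
        step = min(new_len, old_len)   # stretch where both colours are constant
        if new_col != old_col:
            draws.append((pos, step, new_col))
        pos += step
        new_len -= step
        old_len -= step
        if new_len == 0:               # new run used up: fetch the next pair
            new_len, new_col = next(new_it, (0, None))
        if old_len == 0:               # old run used up: fetch the next pair
            old_len, old_col = next(old_it, (0, None))
    return draws

new_frame = [(4, 3), (4, 1), (6, 2), (2, 3)]
old_frame = [(6, 3), (4, 1), (6, 0)]
for command in delta_runs(new_frame, old_frame):
    print(command)
```

In this example 10 of the 16 pixels are redrawn, and comparing a frame with itself produces no draw commands at all.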
