Ancient knowledge for free! (sort of)


By parallax

Expert (85)

06-04-2006, 11:38

Also (I forgot this in the previous post): although big copies are in general faster than multiple small ones, that is not the case here, because we can optimize on the small blocks. Consider that big spider again: for the blocks that make up the bulk of its body, which have no transparent pixels, we only need high-speed copies, and never have to reconstruct the background under them. If you wanted to use one big copy, you would have to use a logical copy for the whole spider, and you would still have to reconstruct all the background under it.

So because of local optimizations of the blocks, and re-use of blocks throughout the animation frames, many small copies are much more efficient than big ones.
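For the curious, issuing such a copy on the V9938 looks roughly like this. A sketch from memory, not the actual Core Dump routine; the parameter block is the standard R#32-R#46 layout, and the caller must first wait until any previous command has finished (the CE flag in status register S#2):

    ; HL -> 15 bytes for R#32..R#46: SX, SY, DX, DY, NX, NY
    ;       (a word each), then colour, ARG and the command byte
    DoCopy:
        ld   a,32          ; start at command register R#32
        out  (99h),a
        ld   a,80h+17      ; R#17 := 32, auto-incrementing
        out  (99h),a
        ld   c,9Bh         ; indirect register data port
        ld   b,15
        otir               ; stream the block; the last byte starts the command
        ret

    ; command byte 0D0h = HMMM: high-speed byte copy, two pixels per
    ;                     byte in screen 5, no transparency check
    ; command byte 098h = LMMM with TIMP: logical copy per pixel that
    ;                     skips colour 0, noticeably slower

So a block without transparent pixels takes the cheap HMMM path, and only the blocks that do have transparent pixels pay for the logical copy.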

By Manuel

Ascended (18234)

02-07-2010, 21:35

For those people who forgot: http://www.youtube.com/view_play_list?p=0643BE42BFE9BEEF
A small set of videos about the Core Dump promo, the last published work of Parallax for MSX...

Too bad this thread died!

By Hrothgar

Champion (479)

05-07-2010, 08:17

Is Parallax still reading along and available for questions?

By parallax

Expert (85)

05-07-2010, 22:41

Hi! Well, logged in because Manuel sent me a mail... Quite nice to see the video after all that time.

Feel free to ask questions, but some things might have faded from memory, so expect a few question marks in my answers.
Don't expect quick replies either; it might take me a while to respond.

By Hrothgar

Champion (479)

06-07-2010, 12:45

What I'd like to know:

Some people on this forum mention that it's very difficult to constantly swap between game code (music, AI) and feeding the VDP, especially if you have to do the latter multiple times per frame. The unpredictable balance between CPU and VDP load also causes grey hairs.

Core Dump seems to combine extremely varying workloads for both VDP and CPU, a large number of moving characters and a very large number of small (8×8) copies, and still maintains a high, constant framerate without much slowdown when things get tough. You already explained the copying efficiency measures (only copy what's needed, high-speed copies where possible), but those seem to have been used by games such as Ys, XAK and SD Snatcher as well, with varying results: those games often show significant slowdown when more blocks must be updated, so they clearly optimize too, but it doesn't prevent slowdown in the worst case.

What is the general framework you used in Core Dump to switch between game code and VDP code so fast and so frequently? Do you have to slice up the AI code into small parts? Do you use 'buckets' of prioritized tasks from which the CPU picks? Do you manage to streamline the calls from CPU to VDP and minimize the 'slack' during which the VDP sits idle between tasks, using some trick other programs didn't know about?

And did you find it increasingly difficult to keep things going at a decent speed as the game progressed and more enemies and action were added, or was the difficulty mainly in thinking out the framework at the beginning, after which the game fitted in without complaining? In other words: is this game reaching each and every limit of the MSX2's capacity, or would you describe it as 'still well within the computer's reach, with room for improvements'?

By parallax

Expert (85)

06-07-2010, 16:03

Hi Hrothgar,

I worked directly on an MSX (and not through an emulator) so I had to rely on experience more than hard numbers from testing.

Some general principles I think helped Core Dump along:

- I had a big advantage coming from Akin: I learned a lot there and could improve on it for Core Dump. It was therefore easier to predict how things would evolve. (The multiple-camera idea came later, though, and I was in fact surprised I could make it work so efficiently; see part 3 of Manuel's video, around 07:10.)

- Compared to the games you mention, which were developed by large teams, I had the advantage of doing level and enemy design in tandem with the code. That makes it easy to balance the complexity of the enemies (either large numbers, or many blocks with transparent pixels) against the complexity of the background (foreground details, scrolling blocks with transparent pixels). Want a complicated enemy? Show some more of the underlying (static) parallax layer, etc.

- Contrary to what you may expect, there is no fancy parallel-OS stuff going on in terms of prioritizing tasks. Every task gets done, most of it sequentially. The only things that get 'cut' are graphical details in some levels (underwater bubbles only appear under lower loads). If I remember correctly, the interrupts handle music and sfx; all other stuff is done sequentially, and updating the screen is one task with nothing much in between. The underlying idea is that tasks are hard enough to balance already, so the focus was on optimizing the tasks that are repeated most: (1) copying blocks, (2) testing which blocks need to be copied (and how). A rough sketch of the loop shape is below.
I imagine this leaves some CPU cycles unused when there is a sequence of copies with few tests in between, but that occurs rarely, as the display tests are quite complicated. There are basically two phases: displaying the areas covered by software sprites (complicated business, which also does collision detection) and the remaining areas (simple, mostly using high-speed copies).
It would be fun to write some extension to an emulator to measure these things.

- As you suggest, it is hard on a Z80 to efficiently switch between different tasks and it causes a significant overhead. If the VDP copies are small and fast, it's debatable how much you can gain from switching to a CPU task in the meantime.

- Comparing my code to what I've seen before, I think two things are interesting. First, I used a lot of self-modifying code, for table lookups etc., which is a great gain in some cases. Similarly, instead of a table lookup for a jump address, you can turn the table into code you jump into; I think I do that for the enemy objects. Second, because of the huge number of copies, it helps to reduce the overhead per copy. I think for Core Dump I switched the default VDP status register to S#2, so a single 'in' tells you whether the VDP is ready.
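To make this concrete, some sketches from memory; the labels and layout are illustrative, not the actual Core Dump source. The overall loop (for the 'everything sequential' point above) was roughly:

    MainLoop:
        call UpdateObjects    ; run every enemy/object routine in turn
        call DrawSpriteAreas  ; software-sprite areas, incl. collision detection
        call DrawOtherAreas   ; remaining changed blocks, high-speed copies
        call SwapPages        ; flip display pages on a vblank
        jp   MainLoop         ; music and sfx run from the interrupt handler

The self-modifying dispatch idea:

    ; patch the operand of a JP once (when the object spawns),
    ; instead of a table lookup on every call
    SetHandler:
        ld   (ObjJump+1),hl   ; HL = handler address for this object
        ret
    RunObject:
    ObjJump:
        jp   0                ; operand overwritten by SetHandler

And the S#2 trick:

    ; switch the default status register to S#2 once...
    InitStatus:
        ld   a,2
        out  (99h),a
        ld   a,80h+15         ; VDP R#15 := 2
        out  (99h),a
        ret

    ; ...then 'is the VDP ready?' is a single IN plus a bit test
    WaitCmd:
        in   a,(99h)          ; read S#2
        rra                   ; bit 0 = CE, set while a command executes
        jr   c,WaitCmd
        ret

(You do have to be careful around the interrupt handler, which normally expects to read S#0.)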

Last point: have the limits been reached? Surely not.

There's always room for improvement. Some of the copies could be replaced by solid fills, and I could probably have pre-computed more. With modern analysis tools (emulators, debugging software to assess VDP/CPU usage) you could find even more. But because of the CPU limitations you can't get too clever in your algorithms, and some newer coding techniques simply don't apply.
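(To illustrate the solid fills: a block that is a single solid colour needs no copy at all; an HMMV fill, command byte 0C0h with the fill byte in R#44, writes the rectangle directly without reading VRAM, so it is faster than even a high-speed copy. Same parameter-block idea as the copy sketch earlier in the thread.)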

However, it is hard to talk about limits in something like a game. In a demo it's clearer when you reach the limits. In a game, so many things need to be done that it is hard to plan in advance how much CPU time each aspect will require. I think that, indeed, experience helps here.

Does that answer your questions?

By PingPong

Prophet (3789)

06-07-2010, 16:20

@Hrothgar,
@parallax.

There is a nice Tcl script for openMSX that can measure the percentage of time the VDP is busy in a frame.

By parallax

Expert (85)

06-07-2010, 16:24

@Pingpong: thanks. Of course you'd also want to see how often the CPU is waiting for the VDP to become ready (i.e. detect those S#2 read loops).

By Hrothgar

Champion (479)

06-07-2010, 16:55

Thanks for the answers, Parallax. Seeing that things are done sequentially, that the copying is apparently done only during part of the visible display (the VBLANK being an even faster window for copies, but used for something else), and then seeing the end result in terms of framerate and playability, was a bit surprising to me. It gives a very good picture of how much you can actually do in one frame.

By parallax

Expert (85)

06-07-2010, 17:23

Hrothgar: say you do a page swap every 4 vblanks. The point is that you are copying for 75% of the time between page swaps anyway, so copying spans 2 or 3 of those vblanks. Then you're doing other things, say during the remaining vblank interval. It's hard to predict in a game when each part occurs. The overhead of trying to squeeze in CPU work during the copy period is quite large, and you cannot be sure what it will gain you in the end. Remember there are a lot of copies, and you want those to be streamlined.
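In rough numbers, assuming 50 Hz: 4 vblanks per page swap is 80 ms per game frame (12.5 frames per second), of which 75% is 60 ms of copying, leaving about 20 ms, one vblank interval, for everything else.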
