Author
| Have the VDP limits been reached?
|
Hrothgar msx freak Posts: 137 | Posted: September 19 2008, 22:21   |
I'm trying to understand your code and the exact way you're trying to achieve the copies (I don't have particularly good coding skills unfortunately so sorry if I'm misunderstanding). You're assuming a near-full Screen 5, built up of 16×16px tiles, and a total stage wide 1024px (or only four horizontal screens) right?
My assumptions:
- Only the initial starting page has to be built from scratch (using 16×16 or even 8×8px tiles);
- The second page is then built up by entirely block-copying the first page leftward by 16px and filling the edge;
- From now on we start swapping page 0 and 1 - no further large-scale copying between pages 0 and 1 is performed after the first step;
- Instead we only copy necessary (changed) blocks (not tiles). Here we limit ourselves to blocks with width one or more times 32px, and height one or more times 16px (as each page will change 32px horizontally relative to its previous state). I guess this is more efficient than 16×16 tiles; larger copies are slightly faster and you'll certainly have fewer copy commands than 15×11 for the whole screen even in worst cases.
- With only multiples of 32px in horizontal and 16px in vertical direction, we should be able to store this rather efficiently;
- Do you have to store each pair of source tile - destination square? Rather store X-position (which is an integer times 32px from the start of the level map), Y-position (which is an integer times 16px), Width (which is an integer from one to e.g. 8) and Height (which is an integer from one to e.g. 8).
- Keep in mind that each 32px scroll position may reuse copy commands from the previous scroll position except for filling the edges. Don't independently store all of those: if on scroll position S you have to copy block [X=6; Y=2; W=4; H=2] 32px to the left, then on scroll position S+1 you have to copy [X=5; Y=2; W=4; H=2) 32px to the left. You can truncate a copy command at screen edge when needed, but this doesn't have to be reflected in how it's stored.
Only filling the edges may require more fine-grained copies including a limited number of source tile -> destination square. Store those separately, but still optimize them by skipping identical blocks and/or copying a 16px wide column from the alternate page and/or copying blocks (as large as possible) from elsewhere on the same or alternate page.
Again: I don't have practical coding experience good enough to code this myself but I hope the situation isn't as bad as the example you mention above. Please correct me if I'm missing something totally obvious.
|
|
ARTRAG msx guru Posts: 2227 | Posted: September 19 2008, 22:33   |
Add some constraints to your algorithm:
You need to build at the very first step (out of 16) the rightmost column 16x176 (or 1x11 in tiles) or you will not be able to build the right border.
You cannot use differential coding on the leftmost column 16x176 due to the fact it is deleted by the black left border.
With this constrains, do you think you can use blocks 32x16 ?
If you stay with one size eg.16x16, you can code in one byte x,y (each in a nibble).
Having two different sizes, you need to distinguish when you encoded 32x16 by 16x16 and this does not fit in a bytes.
|
|
Hrothgar msx freak Posts: 137 | Posted: September 19 2008, 23:32   |
When not using sprites for borders, I agree about the edges. And indeed: even though the pages scroll at 32px intervals there might be 'halves' of these blocks that are still identical to the previous scroll position.
Still, I have the gut feeling that it is more wasteful to store each and every 16×16px block in the center part of the screen and copy all of the changed ones individually instead of trying to find a minimum number of larger block copies (in multiples of 16px in both directions). When using so many copies that small, isn't your 'slack space' as a result of synchronizing CPU game code and VDP busy time getting out of control, reducing the benefits of partial copying altogether?
Take below two images for example. When scrolling that horizontally, would you rather copy the changed pillars and windows in loose tiles or in vertical strips?
http://www.bagofnothing.com/wordpress/wp-content/uploads/2006/04/mario.gif
http://www.videogamecritic.net/images/nes/castlevania.jpg
OK, when standardizing on 16×16px for storing copies, we could still do the center-part copying in up to 4 tiles wide and 8 tiles high blocks in one copy operation in two bytes. Will need different algorithms for the center and the right-edge copying though. |
|
ARTRAG msx guru Posts: 2227 | Posted: September 20 2008, 01:05   |
For your curiosioty
https://sites.google.com/site/testmsx/working-files-of-total-parody/TPengine.rar?attredirects=0
In this version I pre compute in a HUGE ram array, for all the scroll positions, the data of the tiles to be updated.
The big advantage is that I can do a very good load balancing among the 16 interrupts I can use to render the "next" page.
Look at the blue bar. Its length is almost constant. This means equal load among interrupts.
The bad side is memony usage, at certain point you see garbage. The ram buffer was ended so the player plays random data.
a room of 64x11 of 16x16 tiles has 49 scroll positions
each scroll position needs up to 11x15 data for tile update
a data for tile update is 2 bytes
scrolling right needs : 49*11*15*2 = 16170 bytes
the same for left !
and you cannot do vertical scroll without new tables...
we need another way...
|
|
Hrothgar msx freak Posts: 137 | Posted: September 20 2008, 09:07   |
I'm still not getting why you say that each scroll position needs 11×15×2 bytes; The only thing that's unique for each scroll position is filling the 16px right edge with new tiles, for which you need at most 11×2 bytes. For updating the rest of the screen (all columns except the rightmost) you use one lookup table that spans the entire levelmap, of which you read out the section that is relevant for the current scrollposition (which is 14 columns out of the level map width). But in the next scroll position for the same page, 12 out of those 14 columns will be the same so you're not storing that copy information redundantly instead you do a lookup 2 steps further in the level map. For 12/14 you simply reapply the same copy instructions offset 2 columns to the left. That's a dramatic saving already, not even taking into account the possibility to store larger blocks than 16×16 px to copy in one copy instruction, or using some sort of partial repetition/compression for parts of the level.
Indeed scrolling vertically becomes a whole new challenge but I think for the pure horizontal scroll it can't be as bad as you mention.
|
|
ARTRAG msx guru Posts: 2227 | Posted: September 20 2008, 11:56   |
Note1:
all this puzzling is to allow animated background. Otherwise we just copy the existing screen in slices and we forget about the rest (as on MB2). The vdp will be fully used and the CPU will be totally free working in parallel.
Note2:
The "resuse" of the info seems trivial. The fact is that it introduces and additional level of complexity.
First, adding an offset to the destination implies that you do not have a true precomputed info, but that actually you compute the current destination at each step for each 15x11 tile.
Second, the logic managing the calculation of offsets and managing the starting point in the "unique list" of data is not trivial.
let's see this second point that is easier to explain.
If you have a unique "datastream" that encodes differences, for each scroll position you need:
1) a pointer to a starting point into that datastream
2) to know how many data have to be taken from the datastream to render the current scroll position
3) the offset to be added to each data in the datastream.
WRT to 3), this is the very very critical info, as it could change column by column !!
Assume to compute the delta infos for offset 0
you get 11x15 couples of bytes
compute the delta infos for offset 1
you get 11x15 couples of bytes
compare the data:
data for columns from 1 to 14 of offset 0 and
data for columns from 0 to 13 of offset 1 differ for a +16px offset in the X destination
BUT
column 14 of offset 1 is new and unique and has offset 0px has it has been exactly where it is
If you do not see the issue
compute the delta infos for offset 2
you get 11x15 couples of bytes
and compare it to those of offset 0 and of offset 1
compare the data:
data for columns from 2 to 14 of offset 0 and
data for columns from 0 to 12 of offset 2 differ for a +32px offset in the X destination
BUT
data for column 13 of offset 2 differ from column 14 of offset 1 for a +16px offset in the X destination
and
data for column 14 of offset 2 is new and unique and has offset 0px has it has been exactly where it is
Can you think to a nicer way of arranging offsets....
If you can find a good organization of the offset data
the ram storage could be bearable
|
|
Hrothgar msx freak Posts: 137 | Posted: September 20 2008, 15:03   |
First of all you're apparently trying to free up VDP time to do animation and parallax scrolling where I was more thinking about faster or variable scroll speed of 2 or maybe even more pixels. Principle is the same.
When having (e.g.) a 4096px wide level with the screen built up of 16×16px blocks, we have in essence 256 columns of tiles and 240 scrolling positions. We store the delta copy commands in two bytes per command, containing an integer of 1 to 240 to indicate the X-position. When walking from left to right we're going to need copying info for columns S to S+15, where S is the current X-scroll position.
Assuming a one-direction scroll (walk left to right only), we start reading out the copy table for all relevant copy commands, where the relevant ones are the ones with X-coordinate S to S+15. As soon as we encounter the first copy command that is relevant for the next scroll position, store a pointer so we start reading from that position the next scroll state. (Alternatively we can create an index for the 240 scroll positions beforehand.) As soon as you encounter the first copy command that is no longer relevant for the current page (X-coordinate S+16 or later) we're finished for the remainder of the page.
As all of this is quite linear and in multiples of 16px, and using integer counters to do the math, I was hoping this wouldn't be a great burden CPU-wise.
A number of possibilities still:
- If we also do copies wider than 16px (e.g. up to 4 tiles wide) our window for starting to read the copy commands is larger as relevant copies can already occur on X-starting points three columns earlier;
- To be decided if the differential copying is done on the same page (having a 32px difference) or the previous alternate page (having a 16px difference). What do you think is best? Is there a speed penalty of copying from the same or from other pages? Anyhow this probably depends heavily on whether you have a level design with much 16px or 32px repetition.
And yes: you're of course right the rightmost column can never be described in the same way so another algorithm for that one is needed. This may also be true for the second-to-last column if we're copying from the same page (scroll position -2) instead of the previous page (scroll position -1).
|
|
ARTRAG msx guru Posts: 2227 | Posted: September 20 2008, 15:18   |
Two points are not clear to me in your description
1) I do not see how you decide what is the offset to be given to each block to be rendered.
2) the strategy of stopping the scan when you fine a tile belonging to the next frame does not allows a load balancing among the 16 interrupts. That is the sole and major point for pre-computing data...
Only if I know in advance how many tiles I'm going to update in total, I can set equal load at each tick (we have 16 interrupt ticks), otherwise pre-computing is pointless.
What I do without pre-computing is:
I have 16 interrupt ticks,
at each tick I compute on fly if a tile has to be updated or no.
I update no more than 11 tiles per tick.
In this way in the worst case I copy 11 tiles per frame, in the average case
I have that the last ticks are underused as the screen is already drawn.
PS
My sources are in C, looking at them you could get a better insight of what i mean.
|
|
Hrothgar msx freak Posts: 137 | Posted: September 20 2008, 16:25   |
1) I'm not sure I understand the question. If you're at scroll position S=5, we're displaying columns 5 to 20 so we'll need information from the copy table that applies to X-columns 5 to 18/9 (the rightmost one or two need different logic). The actual copy command then translates the copy table position (X,Y) to screen position (X-S,Y) and fills that tile with the pattern already present one or two blocks to the right (copied from the same or the alternate page). As I see it the whole offset-thing can be done by subtracting the current S-scroll counter from the X-position in the copy commands table.
2) My idea for this whole delta-copying was to be able to provide higher scrollspeeds that would not at all be possible with full copying. As long as you can guarantee the deltas are small enough to get a 2px scroller done in time (and you can precompute that it will, otherwise you have to modify your level map until it does), I don't see how it matters if you have two whole idle frames at the end of the cycle, or some slack at the end of each interrupt. Obviously you have a few other use cases such as doing nice things (extra copies/movements) so your interest will be different.
Anyhow: the precomputed copy commands are already stored with an index on their X-position, and they are both stored and read out linearly. So if needed you can interrupt the scan temporarily when you find out you're running ahead of schedule during the current page (which is the case when you run into a copy command with an X-position larger than the current interrupt; or larger than twice the current interrupt for scrolling with 2px per interrupt).
I'll attempt reading your sources, but my experience with C is as limited as with Assembly. Granted, it's *slightly* more readable  |
|
ARTRAG msx guru Posts: 2227 | Posted: September 21 2008, 00:33   |
Having VDP/CPU load balancing is crucial for a game as you cannot forget to move the sprites if the current int has exceed the time of a frame.
Honestly I do not see a true advantage in "precomputing" the deltas if I have to compute on fly their final position.
Do not think that computing if to skip or not blocks is that time consuming...
You do not save a lot of CPU in both the two approaches...
The point was only that if you do not know in advance how many copies you will do in the current frame, you will not be able to split them among the interrupts. (that is very important in real games)
I want to try a new solution, where I do not store which tiles have to be update, but only the total number of them for each scroll position.
In this way I can do a code that computes the delta at run time knowing in advance how many blocks have to process at each step....
|
|
Hrothgar msx freak Posts: 137 | Posted: September 21 2008, 11:11   |
Quote:
| Honestly I do not see a true advantage in "precomputing" the deltas if I have to compute on fly their final position.
Do not think that computing if to skip or not blocks is that time consuming...
|
Honestly? For each scroll position you have to look up 165 tiles from the level map (only for 11 tiles when precomputed, the rest is static + delta table) and then you check all of those 165 individually - I would have guessed that takes a while. Computing the copy position using offsets is one integer addition with a multiple of 16px for each scroll position, where that offset is global for the whole page per scroll position (not different for all the 15 columns) - is that CPU intensive?
Quote:
| You do not save a lot of CPU in both the two approaches...
|
But perhaps you do save VDP time by doing fewer but larger copies? Sometimes you have to copy a larger block (say 8 tiles) - the VDP time alone is smaller with 1 copy instead of 8, let alone the ineffiency of aligning CPU and VDP to send the copies 8 times instead of 1.
Quote:
| The point was only that if you do not know in advance how many copies you will do in the current frame, you will not be able to split them among the interrupts. (that is very important in real games)
|
Understood, but I still don't see the difference between my approach and yours. You apply at most 11 copies per interrupt when needed. In my approach can't you just do something similar as described above? You can either do at most 11 copies as well or at most 1 column per interrupt. In the end computing the total number of copies per scroll position can also be combined with the precalculated ones - but OK: if you say precalculation doesn't save more CPU cycles than it takes again to calculate fixed offsets that's purely theoretical.
By the way, it was mentioned having to double the copy instructions table to also be able to scroll in the other direction - but aren't the delta copy commands symmetrical to begin with? |
|
ARTRAG msx guru Posts: 2227 | Posted: September 21 2008, 23:13   |
Quote:
|
For each scroll position you have to look up 165 tiles from the level map (only for 11 tiles when precomputed, the rest is static + delta table) and then you check all of those 165 individually - I would have guessed that takes a while. Computing the copy position using offsets is one integer addition with a multiple of 16px for each scroll position, where that offset is global for the whole page per scroll position (not different for all the 15 columns) - is that CPU intensive?
|
I agree, the fact is that all that CPU time is spent during the VDP command execution.
Thus, unless you plan to use that time somehow (not trivial at all!!), the CPU cost can be afforded.
The reason for load balance is to have enough time without being busy in feeding the VDP free for the game engine. The time spent in waiting the VDP is hardly usable for that.
Quote:
|
But perhaps you do save VDP time by doing fewer but larger copies? Sometimes you have to copy a larger block (say 8 tiles) - the VDP time alone is smaller with 1 copy instead of 8, let alone the ineffiency of aligning CPU and VDP to send the copies 8 times instead of 1.
|
Larger copies are very complex to manage.
1) You need to find a good strategy of scan of the page
2) You need to encode in the infos you store variable size of the copies
Actually, as I copy all the changes from page 2, i.e. from the tile pages, I cannot exploit large copies.
I have also page flipping (between page 0 and 1): deciding witch page copy from, in order to use variable size of copies, becomes a nightmare.
|
|
Hrothgar msx freak Posts: 137 | Posted: September 22 2008, 09:04   |
As for the complexity: I think it's very much reduced if we store the copy instructions per column (per X-position in the level map, so letting go of the idea of block copies more than 1 tile wide), where we can take all adjacent tiles (below each other) as one single copy. That way we have between zero and six copies per column. Having a few spare bits (only having 11 tiles per column) you can still encode the whole copy in two bytes, usable for both directions.
OK, your constraints of actually having this work in a game are probably valid even though it is technically doable (but I still have the Bombaman comments in mind - how small can you get your sliced game code with VDP commands in between??). And how do you see game code run in your code? Only one small block per interrupt - is that enough?
|
|
ARTRAG msx guru Posts: 2227 | Posted: September 24 2008, 08:22   |
Hrothgar
I followed your main idea, now the X position is computed on fly, thus the ram consumption is divided by 15.
The screen is "delta" encoded, all the copy positions are pre-computed and stored in a dynamic list with max size of map_w*11.
The rendering engine restore about the same number of tiles at each frame, thus the load is almost perfectly balanced.
(the exceptions are/should be column 14 and 0, due to borders)
The code is still buggy but working, (the demo image is done to not show the bugs
https://sites.google.com/site/testmsx/working-files-of-total-parody/TPengine080924.rar?attredirects=0
send me a map and a tileset in any format you like and I'll implement them
apropos
data in the opposite direction have to be different due to the order of the delta blocks in the list.
In order to do the borders I need to put in the list first the new entering column
i.e. 14 when moving right an 0 moving left |
|
Hrothgar msx freak Posts: 137 | Posted: September 24 2008, 16:42   |
Quote:
| In order to do the borders I need to put in the list first the new entering column
i.e. 14 when moving right and 0 moving left
|
OK, but how does this make bidirectional scrolling with one delta list impossible? If you standardize on copying from the alternate page (I still guess that doesn't give a performance problem compared to copying from the same page) you always build one column from scratch (the entering column) and for the other columns you can copy the necessary tiles from the alternate page which by definition has exactly those columns still available.
I only see a problem with the main character turning around; that necessitates some extra code to make sure the correct columns are filled during a delay cycle when scrolling is halted, before continuing with the regular loop.
What are the best tools available to make tilesets and maps BTW? (either on MSX or rather on PC) |
|
|
|
|