Grauw’s RPG in development

Page 22/22

By Grauw

Ascended (8516)


07-09-2019, 17:23

Grauw wrote:

With this I can update the SCT with eight 64-byte HMMM copy commands, a decent size. I did some pen-and-paper maths and this should take about 1250 cycles per copy, so roughly 10000 cycles in total, which is already a small performance win.

I measured in openMSX the exact number of CPU cycles to perform a 64-byte HMMM and YMMM during active display with sprites enabled:

HMMM: 1465 cycles (calculated during blank: 939, 1.56x)
YMMM: 1345 cycles (calculated during blank: 683, 1.97x)

Additionally the CPU overhead to dispatch the command is 285 ± 15 cycles.

This can be explained by looking at the spacing of the access slots during active display: they are 32 VDP cycles apart, and every 4th slot is used for the refresh. Due to access slot scarcity, YMMM’s transfer cost goes up from 64 VDP cycles per byte to 128 VDP cycles per byte (2x). HMMM’s transfer cost goes from 88 VDP cycles per byte to 128 VDP cycles per byte as well (1.45x).

The small difference remaining between the two is due to access slots being a little less regularly spaced in the horizontal blanking area, but for the most part HMMM and YMMM are the same speed here.

HMMV in comparison goes from 48 cycles per byte to 64 cycles per byte (1.33x) so it is twice as fast and gets a lower penalty.
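
To relate the per-byte figures to the measured numbers: the per-byte costs are in VDP cycles, while the measurements are in CPU cycles. Here is a rough sketch of the conversion, assuming the usual MSX clocks (VDP at 21.48 MHz, Z80 at 3.58 MHz, so 6 VDP cycles per CPU cycle). The function name and the constant are only for illustration.

```python
# Assumption: 21.48 MHz VDP clock / 3.58 MHz Z80 clock = 6 VDP cycles per CPU cycle.
VDP_CYCLES_PER_CPU_CYCLE = 6


def copy_cpu_cycles(vdp_cycles_per_byte, byte_count=64):
    """Estimated CPU cycles for a VDP command that moves byte_count bytes."""
    return vdp_cycles_per_byte * byte_count / VDP_CYCLES_PER_CPU_CYCLE


# (command, VDP cycles per byte during blank, during active display)
for name, blank, active in (("HMMM", 88, 128), ("YMMM", 64, 128), ("HMMV", 48, 64)):
    print(name,
          round(copy_cpu_cycles(blank)),   # estimate during blanking
          round(copy_cpu_cycles(active)),  # estimate during active display
          round(active / blank, 2))        # slowdown factor
```

This reproduces the calculated 939 and 683 cycles for blanking, and gives about 1365 cycles for both HMMM and YMMM during active display, which sits between the measured 1345 and 1465.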

This is good info for future maths :), ’cause I underestimated it this time.

By DarkSchneider

Paladin (882)


07-09-2019, 18:08

Grauw wrote:

You can still select individual positions and patterns for each sprite. Just the colours must be pre-set in a table in groups of four, so if the legs change then you just need to update it. This is not something that needs to change every frame, and even if it does then you can put all permutations in VRAM. (Also this isn’t really a common case I think.)

I forgot to explain my idea for using the VDP commands. What I wanted was extra animations: it is nice to have a character with 20-30 frames that does some body dancing after idling for a while. 32 patterns (256 (8x8) / 4 (16x16) / 2 (multicolor)) without mirroring, for all characters in the area, is too few. Animations bring characters to life.

The idea, and what I feel is the best way to do it, was likewise to use blocks of 4 sprites, then use a VDP command to update the sprite gfx, as they are aligned to SC5 128-byte lines (so you copy full lines), and meanwhile copy the SCT data for that set with the CPU. The problem is that this way it could only draw 8 characters of up to 4 sprites each, including single bullets!

By Grauw

Ascended (8516)


07-09-2019, 18:19

I do exactly that actually for the player animations! :)

It works very well, the player sprite animations are super cheap on the CPU. The pattern table entry must be aligned to 16. Because I currently use 16-colour sprites for the player (for graphics conversion convenience), I don’t need to change the SCT (but I do so anyway :P).

By DarkSchneider

Paladin (882)


07-09-2019, 20:02

Well, then add that for any number of characters. Plus, the V9938 cannot use commands in SC4, so you only have the VBLANK plus the score layout in SC5 to operate in. I am not sure how much can fit there, and there are many alignment problems, on top of the SAT reordering for flickering, which adds extra handling work.
In the end the idea is stored in a drawer until I think of a way to manage it. So currently I am using the CPU.

But back to the topic: in your case, do you really have something to do while the command runs? With the CPU it is 64 OUTI versus 15 OUTI plus reading the finish flag to start the next command. Some tasks are easily parallelizable, but in this case what task is there to do in the meantime? There is no difference between OUTIing and polling the finish flag if there is nothing else to do.

By Grauw

Ascended (8516)


07-09-2019, 20:38

DarkSchneider wrote:

Well, then add that for any number of characters. Plus, the V9938 cannot use commands in SC4, so you only have the VBLANK plus the score layout in SC5 to operate in. I am not sure how much can fit there, and there are many alignment problems, on top of the SAT reordering for flickering, which adds extra handling work.

I think you’re proving my point that “it’s unavoidable to tailor the game’s design to the system’s limitations, and the engine to the game’s requirements.” :)

For this game, I think this approach is going to work, but there is no generic solution.

DarkSchneider wrote:

But back to the topic: in your case, do you really have something to do while the command runs? With the CPU it is 64 OUTI versus 15 OUTI plus reading the finish flag to start the next command. Some tasks are easily parallelizable, but in this case what task is there to do in the meantime? There is no difference between OUTIing and polling the finish flag if there is nothing else to do.

Yes, the logical task to do in the meantime is to write the two attribute table frames for the 4 sprites that it’s copying the colour data for! This will save 9001 cycles (a lot!).

By DarkSchneider

Paladin (882)


08-09-2019, 09:54

Grauw wrote:

I think you’re proving my point that “it’s unavoidable to tailor the game’s design to the system’s limitations, and the engine to the game’s requirements.” :)

For this game, I think this approach is going to work, but there is no generic solution.

There is, there is. Thinking about it, I'll probably use a single flag (only one dynamic character can have it), and that character will use the VDP command at vblank, all the others the CPU. So usually, and at the start, it will be the player, but if a larger final boss wants to be dynamic, the initial script of the area would move the flag to the boss, and the death or area exit script would give it back to the player.
It must be a controlled environment, as the character must be aligned. That is the reason there is no automatic selection; each game takes care of it on its own, based on its design.
But this would be for the future, as it also requires the character to have all its gfx in blocks, so it cannot use frame composition with non-consecutive sprite gfx (that is, full-frame sprites always have to be stored for each frame). This is not much of a problem though, as there is plenty of VRAM, so parts can be duplicated.
For the V9958 or SC5 it would have to be adapted (no need to be at vblank), but as that requires its own modules, it would be included.

By Grauw

Ascended (8516)


08-09-2019, 19:39

I was looking at the random number generation I use for the mana shroud effect; currently I am generating a new base x and y position using the Xorshift algorithm, which is reasonably performant. However, I was thinking, wouldn't it be nicer to just have an incrementing number and scramble that to produce a random sequence?

My current thought is to pregenerate a 256-byte random number array, and then index into it to get a random number. To provide different sequences, I can xor the index with some arbitrarily chosen value.

This has the advantage that 1. it is faster than Xorshift, 2. it can produce 256 random values rather than 255, 3. I can generate different sequences more simply, whereas for Xorshift I need to parameterise it with 3 values from a very specific set (statically via macro parameters), 4. if I want to have a longer period I can simply use the random number for the MSB of a 16-bit index as the xor value for the LSB and I will get a unique (semirandom) sequence. The cost is 256 bytes of aligned memory, but that's okay.

Edit: it works really well, and I was able to reduce the amount of state for the effect to just one word.
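
For illustration, the table lookup could be sketched like this in Python (the real thing would be Z80 code). The function names, the seed and the keys are made up for the example, and filling the table with a shuffled 0..255 is an assumption; the post only says it is a 256-byte random number array.

```python
import random


def make_table(seed=1):
    """Pregenerate a 256-entry table (assumption: each value 0..255 once)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


TABLE = make_table()


def lut_random(counter, key):
    """8-bit variant: scramble an incrementing counter through the table.

    Xor-ing the index with a different key gives a different sequence.
    """
    return TABLE[(counter ^ key) & 0xFF]


def lut_random16(counter):
    """16-bit variant: the table entry for the MSB is the xor value for the LSB."""
    msb = (counter >> 8) & 0xFF
    lsb = counter & 0xFF
    return TABLE[lsb ^ TABLE[msb]]


# Two sequences from the same counter, e.g. one for X and one for Y.
xs = [lut_random(i, 0x5A) for i in range(256)]
ys = [lut_random(i, 0xC3) for i in range(256)]
```

With a shuffled table each key yields all 256 values exactly once per period, and in the 16-bit variant every block of 256 counter values walks the table in a different order.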

By DarkSchneider

Paladin (882)


09-09-2019, 10:36

I cannot help much with the generator as I use the C function, but LUTs are indeed a good resource. The typical case is trigonometry, where you generate the required SIN LUTs. But those values are fixed; in the case of random numbers, I'd suggest updating the values ring-buffer style when you can (like when there has been no scrolling for some time) so as not to lose randomness. If the same values are always used over a long play session, the LUT could become noticeable as a loss of randomness. Probably the player wouldn't notice, as they play the game "as is", but in the end we know it is there.

By Grauw

Ascended (8516)


09-09-2019, 12:25

It just needs to appear random, I’m not using it for cryptographic purposes or anything like that :). So as long as the period is long enough I think I’m good. I also need different sequences, e.g. if I set X and Y to a random value they should be different; I could offset their index, but a xor mixes it up even more.

I could randomise the xor values every once in a while. I don’t think it’s necessary though.

Anyway this approach with the LUT and xor-ing the index seems pretty powerful and flexible so I like it!

By DarkSchneider

Paladin (882)


28-09-2019, 09:51

How do you handle visibility? I check the object's AABB for overlap with the camera frustum (that is, the rectangle from the camera position (x, y) to (x+resX, y+resY)).
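
A minimal sketch of that kind of check, in Python for readability. The `Rect` type, the function names and the 256x212 default resolution are assumptions for the example, not taken from either engine.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def overlaps(a, b):
    """True if two axis-aligned boxes overlap (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def is_visible(obj, cam_x, cam_y, res_x=256, res_y=212):
    """True if the object's AABB overlaps the camera rectangle
    from (cam_x, cam_y) to (cam_x + res_x, cam_y + res_y)."""
    return overlaps(obj, Rect(cam_x, cam_y, res_x, res_y))
```

An object that is only partly on screen still counts as visible here, which is usually what you want for sprites entering from the edge.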
