FM register log

By iamweasel2

Paladin (685)

29-11-2020, 18:29

I was wondering about this type of replayer (where you keep reading the register and the data to be sent to it in a loop). Of course you need more space to store the data (if you don't compress it), but is it too inefficient to be used in a game, for instance? What would be the best way (in terms of CPU cycles spent) to play music in a game?

By Grauw

Ascended (10158)

30-11-2020, 01:17

I would say this type of replayer (stream of register write commands) is the most efficient in terms of speed, however inefficient in terms of space. Space efficiency is improved if you compress the data.

I think for a 32K game with PSG music that isn’t too lengthy, you could use it if the game itself is not too tight on space and if it really needs the extra performance. Otherwise, an MML-based or tracker-based format will probably be a better choice.

For a MegaROM game, I think space is not an issue.

For a disk game, disk space itself is not an issue, but since it typically needs to fit in RAM together with level data and gameplay code, it’s somewhere in the middle between the two. It’s not a perfect fit but not a terrible one either.

Aside from speed there are other advantages; for example, because of the generic nature of the format you can make music in any way you want, regardless of whether a replayer is available.

I’ve recently investigated compression by adding “subroutines” to the sound data for repeating sections. The idea is to automate the optimisation done in MML or a tracker by hand. It does not need RAM to decompress to. The compression seems pretty good, but it’s hard to give exact predictions because it depends on how much repetition the sound data has. You can see in the percentages on that link that it varies a lot. The sound data I tested is quite short so has limited repetition, suggesting that greater gains can be made, but is also not very complex, suggesting that it is quite optimisable.

I should probably do a comparison with the equivalent tracker or MML sound data. Also, in my code I do the compression on the whole sound data. Possibly the effectiveness of the compression could be increased by having a separate stream per channel. However this would reduce the performance.
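
To make the subroutine idea a bit more concrete, here is a minimal Python sketch of the principle, using a simplified Re-Pair-style pass that repeatedly factors the most frequent adjacent pair of commands out into a numbered non-terminal. This is only an illustration of grammar-based factoring in general, not Grauw's actual algorithm (which substitutes longer repeats and weighs the real byte cost of a call), and the name compress_grammar is made up for the example.

from collections import Counter

def compress_grammar(stream, min_count=2):
    # Toy grammar compression: repeatedly replace the most frequent
    # adjacent pair of symbols with a new non-terminal (a "subroutine").
    symbols = list(stream)          # terminals, e.g. (register, value) tuples
    rules = {}                      # non-terminal -> the pair it stands for
    next_id = 0
    while True:
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        nt = ('NT', next_id)
        next_id += 1
        rules[nt] = list(pair)
        # Rewrite the stream, replacing each occurrence of the pair.
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, rules

When such a grammar is rendered back out, each rule becomes a short subsequence of data ending in a return marker, and each non-terminal in the main stream becomes a call to it, so the data stays directly playable without unpacking to RAM.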

By Bengalack

Champion (420)

21-02-2021, 09:33

VGM logging is great! And probably my way out of the historical "CPU-hungry music replay" problem.

I just did what you people have been talking about: used the "output.vgm" as a basis for direct register writes. Currently working on FM/YM2413. I only use the "51RV", "61WW" and "66" commands from the vgm-file, where R, V and WW are a register byte, a value byte and a 16-bit wait, respectively. I then prepare a stream. The 61, or the length, is currently handled outside the stream. The stream is then assumed to be:

RV RV RV RV ...

But, as the register number can never be higher than 0x38, I insert waits by replacing R with a value that has bit 7 set. The other bits of that byte are the wait value, in *frames* (at the moment at least). The wait value in frames can never be more than 89 at 60 Hz (65535/44100*60 ≈ 89). It becomes something like this:

RV RV RV RV RV RV RV RV W RV RV RV RV W RV RV RV RV W ...

So, I always check the first value (R or W) before sending it, but this is mostly OK, as I have to wait 84 cycles between every data write anyway (I use this time to check for stream-end too). The performance is really good!
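
For anyone who wants to reproduce this, a rough Python sketch of the VGM-to-stream conversion could look like the one below. It only handles the three commands mentioned (0x51 reg val, 0x61 nn nn wait in samples, 0x66 end of data), packs pending waits into a single byte with bit 7 set and the frame count in the low bits, and assumes 60 Hz (735 samples per frame). The function name, the fixed 0x40 header size and the error handling are assumptions for the example; this is not Bengalack's actual tool.

def vgm_to_stream(vgm, header_size=0x40, samples_per_frame=735):
    # Convert the YM2413 writes from a VGM body into a flat
    # "RV RV ... W RV ..." stream: register/value byte pairs, with waits
    # encoded as a byte that has bit 7 set and the frame count below it.
    out = bytearray()
    pending = 0                               # accumulated wait, in samples
    i = header_size                           # naive: real VGM parsing should
    while i < len(vgm):                       # honour the data offset field
        cmd = vgm[i]
        if cmd == 0x51:                       # YM2413 write: 0x51 reg val
            frames, pending = divmod(pending, samples_per_frame)
            while frames:                     # flush waits before the write
                chunk = min(frames, 0x7F)
                out.append(0x80 | chunk)
                frames -= chunk
            out += vgm[i + 1:i + 3]           # copy reg and value as-is
            i += 3
        elif cmd == 0x61:                     # wait: 0x61 nn nn samples (LE)
            pending += vgm[i + 1] | (vgm[i + 2] << 8)
            i += 3
        elif cmd == 0x66:                     # end of sound data
            break
        else:
            raise ValueError(f"unhandled VGM command 0x{cmd:02X} at offset {i}")
    return bytes(out)

The Z80-side player then only has to test bit 7 of the first byte of each pair: clear means write the pair to the chip, set means wait that many frames.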

Here are numbers for a test file I have:

1. Original vgm-file:       53045 bytes
2. Direct stream-file:      19215 bytes ( 64% red.)(storage in ram)
3. Pletter'ed stream-file:   3760 bytes ( 93% red.)(for storage in file/rom)

This was just my first attempt, so I'm sure there are smarter ways to do things. For example, I'd like the data, when in RAM, to be smaller. Therefore I'm super curious about the research Grauw is doing in this field.

By Grauw

Ascended (10158)

21-02-2021, 12:51

Nice, good to hear!

About the compression research, I can say that it works and it can reduce data size by a few tens of percent (the average in my test data set was 42%). I’ve got code, but it needs some work to be brushed up.

By Bengalack

Champion (420)

21-02-2021, 13:26

Ok, actually it might be good to have the original mbm-file size in there as well. Excluding the custom so-called sample-kit file (which I couldn't see being used anyway) at 32 kB (standard), we have:

1. Original mbm-file:        4297 bytes
2. Pletter'ed mbm-file:      1705 bytes

1. Original vgm-file:       53045 bytes
2. Direct stream-file:      19215 bytes ( 64% red. from vgm. 347% incr. from mbm)
3. Pletter'ed stream-file:   3760 bytes ( 93% red. from vgm)

From what I could read, the mbm-file looks the same in memory as on disk. If so, we still have a way to go to close in on the original memory usage :)

By Bengalack

Champion (420)

23-02-2021, 08:04

And once again, I find myself curious about how these things work. I have never looked into the inner workings of compression algorithms before, and here I am, reading up on LZ77 and such. Ha ha. In this "going back to retro programming" of mine, I've certainly been taken in many different directions! Interesting :) Let's see how far we can get those numbers.

By Grauw

Ascended (10158)

23-02-2021, 11:15

At least, if you’re interested in this style of compression, I would suggest reading the posts I made in the assembler optimizer thread, starting here. It’s a different compression approach from LZ77 (though of course reading up on that is no wasted effort either).

By Bengalack

Champion (420)

28-02-2021, 20:12

Thanks a lot! I read through most of it. Compression is all new to me, so I don't think I grasped it all, but I took an idea from LZ77 and made some custom code.

I ended up with around a 70% reduction, which I am super happy with.

1. Original vgm-file:      53045 bytes
2. vgm-YM2413-redux:       20832 bytes
3. Homebrew-pack (from #2): 5843 bytes (89% red. from vgm)

The idea is that the compact form should reside in memory, but it will take a certain number of cycles to read this memory (like "decompress-on-read"). I haven't written this code in Z80 yet, but I fear that it might be too time-consuming for my own needs. Lots of recursion and whatnot. Even if I get it pretty fast, I fear it is not fast enough for a game that has already spent most of the cycles in the frame. But I'll let you guys know the timings when it's done.
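
For readers new to the term, "decompress-on-read" just means the player resolves the packed data on the fly instead of unpacking it to a buffer first, and with LZ77-style back-references that naturally leads to the recursion mentioned above. The Python sketch below shows the general idea only; the token layout and names are invented for the example, not Bengalack's actual format.

def read_byte(tokens, pos):
    # Fetch the uncompressed byte at `pos` straight from LZ77-style tokens,
    # with no output buffer: ('lit', b) contributes one byte, and
    # ('copy', src, length) repeats bytes from the earlier uncompressed
    # position `src`.  Copies can point into other copies, hence recursion.
    cursor = 0
    for token in tokens:
        if token[0] == 'lit':
            if cursor == pos:
                return token[1]
            cursor += 1
        else:                                  # ('copy', src, length)
            _, src, length = token
            if cursor <= pos < cursor + length:
                return read_byte(tokens, src + (pos - cursor))
            cursor += length
    raise IndexError(pos)

# "ABABAB" packed as two literals plus one self-overlapping back-reference.
packed = [('lit', ord('A')), ('lit', ord('B')), ('copy', 0, 4)]
print(bytes(read_byte(packed, i) for i in range(6)))   # b'ABABAB'

Each read re-scans the token list (and may recurse), so it is far slower than a plain unpack, which is exactly the frame-budget worry described above.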

This might not be the right thread to discuss it in though, so I'll stop here :)

By Bengalack

Champion (420)

28-03-2021, 19:04

Ok, @Grauw, I looked more into this. Your papers, with LFS/LFS2 and Larsson's doc with MFFS and all kinds of stuff, are definitely the way to go. I'm totally scrapping my current attempt at this, as it turns out it only compresses about 30% with no nesting, and my algorithm becomes too heavy CPU-wise with nesting. Nesting gives up to 70% reduction, but I have found that I can't use it at runtime.

The brilliance of your approach is the fact that it remains compact in memory/ROM and at the same time (almost) needs no decompression. So it is fast, and sometimes *fast* is the only sensible alternative on a Z80. If I understand this correctly, there is nesting, so there will be some (but little!) overhead for each nest, but testing on real data will reveal if this is any problem at all. As for vgm-data for the YM2413, we have a stream of byte-pairs for every command to the chip. During one frame I have seen a worst case of 32 byte-pairs in my sample tunes (typically at the start of a file). The question is how many levels of nesting you risk in such a sequence.

Anyway, the crux of all of this is the compressor's approach and how (well) it chooses to build up the file, and this is where your stuff seems great.

The papers mention non-terminal characters. I presume these are non-ASCII (8th bit set?). How do you deal with that? GhostnGoblins includes bytes in the full range, I guess? And also, do you plan to release any of your tools? :)

By Grauw

Ascended (10158)

28-03-2021, 21:33

Yeah, I think the nesting won’t be too deep in practice. And you can limit the nesting depth if needed, at the cost of compressed size.

The non-terminal codes are only used in the compression process to build the context-free grammar. When you render it out to a command stream, the non-terminal codes are replaced with a CALL command to the address of the subsequence elsewhere in memory, and the subsequence gets a RET command at the end.
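
So the rendered stream stays directly playable and only needs a small return stack in the player. Below is a minimal Python model of such a player; the CALL/RET/END marker values (0xC0/0xC1/0xC2) are made up for the example and wait handling is ignored, so it is a sketch of the mechanism rather than a real format.

def play_commands(data, write_reg=print):
    # Walk a rendered command stream in place: ordinary bytes come in
    # (register, value) pairs, CALL pushes a return address and jumps to a
    # shared subsequence, RET pops it, END stops.  No decompression buffer
    # is needed.  YM2413 register numbers never exceed 0x38, so marker
    # values this high cannot be confused with a register byte.
    CALL, RET, END = 0xC0, 0xC1, 0xC2          # made-up marker values
    pc, stack = 0, []
    while True:
        op = data[pc]
        if op == CALL:                         # CALL addr (16-bit little endian)
            stack.append(pc + 3)
            pc = data[pc + 1] | (data[pc + 2] << 8)
        elif op == RET:                        # end of a shared subsequence
            pc = stack.pop()
        elif op == END:
            return
        else:                                  # ordinary register/value pair
            write_reg(op, data[pc + 1])
            pc += 2

# A shared subsequence at offset 7, played twice via two CALLs.
song = bytes([0xC0, 7, 0,  0xC0, 7, 0,  0xC2,  0x30, 0x81,  0x10, 0x42,  0xC1])
play_commands(song)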
