Assembler Optimizer

By Grauw

Ascended (9684)

08-11-2020, 23:21

I completed the output file generation, including tail call optimisation, tail jump elimination, and inlining of repeats that do not provide any gain. The file size reductions (sizes in bytes) are as follows:

Track Original Compressed Reduction    Repeats
 01      348      244      30%            14
 02     1827     1383      24%            93
 03      552      469      15%            33
 04     3899     2820      28%           329
 05     2918     1662      43%           196
 06     1196      776      35%            62
 07     1023      360      65%            27
 08     3116     1431      54%           200
 09     1256     1040      17%            89
 10     3121     1531      51%           192
 11     4685     2748      41%           410
 12     2652      901      66%           141
 13     6320     3406      46%           562
 14      950      774      19%            52
 15     4835     3533      27%           445
 16     1814      433      76%            26
 17      758      690       9%            43
 18     1395     1157      17%            85
 19     7476     2399      68%           283
 20     1169     1010      14%           100
 21     5490     3863      30%           390
 22      963      750      22%            55
 23      486      386      21%            23
 24     1758      576      67%            90
 25      690      639       7%            47
 26      403      388       4%             6
 27      601      491      18%            12

Total  61701    35860      42%          4005

I also added a column with how many repeated sections were extracted from each file.

Since each repeat call corresponds to 3 bytes in my implementation, if the call command can be encoded in 2 bytes it will save an additional 4 kB. It would also be interesting to see if the call command size could be reduced further; e.g. for the most frequently occurring repeats one could use a small jump table and single-byte calls, as sketched below.

This is similar to JP vs JR and CALL vs RST. But this brings us back into the territory of compacting by tweaking the encoding, and with Z80 code optimisation the opportunities for that are limited since the instruction set is fixed.
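
For illustration, a rough sketch (Python, with made-up repeat ids and sizes; not part of the actual tool) of estimating the call stream size when the most frequent repeats get a single-byte call through a small jump table while the rest keep a 3-byte call:

from collections import Counter

def call_stream_size(calls, table_slots=8, short_size=1, long_size=3):
    # calls: one repeat id per call site in the command stream
    freq = Counter(calls)
    # the most frequently called repeats get a slot in the jump table
    table = {rid for rid, _ in freq.most_common(table_slots)}
    size = sum(short_size if rid in table else long_size for rid in calls)
    # each jump table slot itself costs a 3-byte JP entry
    return size + table_slots * 3

# made-up call sites, purely for illustration
calls = [1, 2, 1, 3, 1, 2, 4, 1, 5, 2]
print(call_stream_size(calls, table_slots=2))  # 22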

By santiontanon

Paragon (1283)

09-11-2020, 03:46

Cool! So, just to make sure I understand, this last table already includes the extra space for the "call/ret" (or "jp" in case of tail call), right? 42% is quite amazing!

As you mentioned, the gains on actual source code might be much smaller, as the programmer has already done part of this optimization work, but it still looks very promising! I was also wondering if it could be turned into a general data compression scheme, and how it would compare with pletter/aplib/exomizer in terms of compression rate and decompression speed.

By Grauw

Ascended (9684)

09-11-2020, 11:20

Correct. The tail call optimisation and tail jump elimination are responsible for 2193 of the saved bytes.

Yes, this compression is also applicable to byte data; the challenge there is to come up with an efficient encoding of the repeats when the alphabet has 256 entries. What comes to mind initially is a fixed bit-oriented Huffman encoding, or LZEXE’s byte-oriented method of interleaving a literal-or-repeat command stream with a data stream. The compression ratio compared to e.g. Pletter would have to be tested; it depends a lot on the encoding, but it could be good.
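
As a rough sketch of the second idea (not LZEXE’s actual format, just the general shape: a separate command stream of flags says whether the next item in the data stream is a literal byte or a repeat index, so the 256-entry byte alphabet stays untouched):

def expand(flags, data, repeats):
    # flags: 0 = literal byte, 1 = repeat call; data: byte values and repeat
    # indices in stream order; repeats: index -> (flags, data) of that repeat
    out = bytearray()
    it = iter(data)
    for flag in flags:
        value = next(it)
        if flag == 0:
            out.append(value)  # literal byte, copied as-is
        else:
            sub_flags, sub_data = repeats[value]
            out += expand(sub_flags, sub_data, repeats)  # expand the repeat
    return bytes(out)

# repeat 0 = the byte pair AB; main stream = AB C AB D
repeats = {0: ([0, 0], [0x41, 0x42])}
print(expand([1, 0, 1, 0], [0, 0x43, 0, 0x44], repeats))  # b'ABCABD'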

By pgimeno

Champion (274)

09-11-2020, 11:47

santiontanon wrote:

I was also wondering if it could be turned into a general data compression scheme, and how it would compare with pletter/aplib/exomizer in terms of compression rate and decompression speed.

The rate would not be so good as in LZ77, it would require more memory, and the speed would be comparable.

LZ77 encodes repetitions of a string as references to the previous appearance of the string. This is done by adding a pointer and a length to the stream. For example, ABCABCDABFCD could be encoded as: ABC, (0, 3), D, (0, 2), F, (5, 2). (The pointers are references to the already decompressed part of the stream, so it may not be immediately obvious that 5 points to where CD starts, but it's easy to see by examining the input string). In practice the pointers are typically encoded as negative offsets from the current position. This scheme only needs to "see" the current character or pair, therefore it can work with a very small input buffer.
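
A minimal decoder for this kind of token stream (a sketch using the (position, length) convention of the example above, rather than the negative offsets used in practice):

def lz77_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)        # literal run
        else:
            pos, length = token      # copy from the already-decoded output
            for i in range(length):
                out.append(out[pos + i])
    return ''.join(out)

print(lz77_decode(['ABC', (0, 3), 'D', (0, 2), 'F', (5, 2)]))  # ABCABCDABFCD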

On the other hand, this method can only expand a previously seen string in full, up to its termination; it can't encode only part of a string previously seen. In the case above, I guess that one possible encoding would be AB, C, call (address of AB), CD, call (address of AB), F, call (address of CD). Another possible encoding would be: ABC, call (address of ABC), DABFCD. Unlike LZ77, it would need the full compressed stream to remain in memory during decompression, because the call targets and RET markers exist only there; they are lost in the decompressed stream.
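
A toy model of this call-style encoding (storing each extracted repeat once and expanding calls recursively, rather than literally calling back into the stream as in the guesses above):

def expand(stream, subs):
    # stream: literal strings and ('call', name) items;
    # subs: name -> the stream that the call expands to
    out = []
    for item in stream:
        if isinstance(item, str):
            out.append(item)
        else:
            out.append(expand(subs[item[1]], subs))  # expand the called repeat
    return ''.join(out)

# roughly the second encoding from above: ABC, call(ABC), DABFCD
subs = {'ABC': ['ABC']}
print(expand([('call', 'ABC'), ('call', 'ABC'), 'DABFCD'], subs))  # ABCABCDABFCD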

By Grauw

Ascended (9684)

09-11-2020, 14:31

pgimeno wrote:

The rate would not be so good as in LZ77, it would require more memory, and the speed would be comparable.

I’m not sure I agree with that; from what I’ve read, the category of off-line grammar compression is competitive with, if not better than, LZ77 on-line compression (e.g. see the introduction here). Primarily because it can consider the whole file rather than limited dictionary and look-ahead windows. Apparently this can outweigh the benefit of LZ77’s ability to use overlaps.

After a quick google I found this survey paper which, amongst others, compares LZ77 to LZ78 and confirms that their performance is comparable, with the best performing one being the LZ78-based LZFG. Note that LZ78 falls in the grammar-based compression category, but is an inefficient variant (see the comments in section II here).

I think I’ve also kind of seen this empirically in my own tests, since I gained a similar 63% sequence size reduction as zip did on the file size, and zip applies Huffman compression in addition to LZ77. I didn’t apply Huffman encoding on the output due to constraints of my intended use, but if I had, I think I would’ve achieved similar file sizes.

As for requiring more memory, that depends on your perspective. Off-line means that it can’t be streamed so you have to keep the entire compressed data in memory. But this is not an issue in many cases, e.g. when decompressing from ROM to RAM. However, it’s true that when used as a general decompressor which outputs to a RAM buffer, the benefit that it doesn’t need a RAM buffer for the dictionary is lost.

The longest-first context-free grammar for the ABCABCDABFCD example would be:

s -> 11D2FCD
1 -> 2C
2 -> AB
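
As a quick check (a trivial Python sketch), expanding the rules reproduces the original string:

rules = {'s': '11D2FCD', '1': '2C', '2': 'AB'}

def derive(symbol):
    if symbol not in rules:
        return symbol                               # terminal symbol
    return ''.join(derive(c) for c in rules[symbol])

print(derive('s'))  # ABCABCDABFCD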

By santiontanon

Paragon (1283)

10-11-2020, 01:05

I'm not sure either. In my experience, each compressor has some biases that favor some data over other data, since (at least from what I remember) no compression algorithm has been shown to be optimal in the general case yet (some methods are optimal under some assumptions though). So, as long as we have no optimal compression algorithm (which theoretically sounds like we will never have it), I don't think there's going to be any compressor better than the others for all possible data strings.

So, I was just curious to see if this could result in a new compressor to add to the current set of compressors we have. In my latest game I decided to use two separate ones (aplib and pletter), since each of them compressed different sets of files better, and the savings from using both outweighed having the source code of both unpackers. So, having more options is always good :)

By Grauw

Ascended (9684)

10-11-2020, 12:35

By the way, I tested Pletter on the set of music files prior to my compaction (60 kB), and the total compressed size added up to 17 kB, whereas zip only got down to 22 kB. After my compaction (36 kB) they compress to 24 kB and 28 kB respectively. I haven’t verified the output, but under the assumption that it’s correct, its performance on this data set is quite impressive given that it’s presumably a much more simplistic compression scheme.

By pgimeno

Champion (274)

10-11-2020, 15:54

santiontanon wrote:

So, as long as we have no optimal compression algorithm (which theoretically sounds like we will never have it), I don't think there's going to be any compressor better than the others for all possible data strings.

Compression is always relative to a model. There are compression methods that are objectively better than others. For example, arithmetic compression is always better than (or, in the extreme case, equal to) Huffman compression for every file; the model it is based on (zero-order Markov chains) is very simplistic and not common in real data, but it's been proven to be the best possible compression for that model.
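
A quick numeric illustration of the fractional-bit point, for a hypothetical two-symbol source with probabilities 0.9 and 0.1: Huffman has to spend a whole bit per symbol, while arithmetic coding can approach the entropy of roughly 0.47 bits per symbol.

import math

p = [0.9, 0.1]
huffman = sum(pi * 1 for pi in p)               # both Huffman codes are 1 bit long
entropy = -sum(pi * math.log2(pi) for pi in p)  # limit an arithmetic coder approaches
print(huffman, round(entropy, 3))               # 1.0 0.469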

The question that remains is whether one of the methods discussed (LZ77 and grammar-based) is a subset of the other, as in the case of Huffman and arithmetic (arithmetic is like Huffman, but allowing for fractional bit lengths). I guess each one's performance depends on the details of the coding and on how hard the compressor tries to minimize the output, and that can favour one kind of file over another. Again, these details influence the compression model that the compressor applies. For example, typical LZ77 compressors tend to favour shorter lengths and offsets, in the sense that these are represented with fewer bits because they are assumed to be more common, but if asked to compress files where that's not the case, the compression will suffer in comparison to a different model. Which ones are more common in practice is what matters. It's also often the case that some methods are better suited to certain types of files than others.

By pgimeno

Champion (274)

10-11-2020, 16:29

Grauw wrote:

Primarily because it can consider the whole file rather than limited dictionary and look-ahead windows.

I assumed typical sizes for MSX. Pletter seems to be using a 64K window, which I assume is the maximum size of a decompressed stream in MSX. Therefore the window covers the whole file.
 

Grauw wrote:

and zip applies Huffman compression in addition to LZ77.

Yeah, zlib is not really comparable. One like Pletter is what I had in mind. LZEXE uses a 16K sliding window, which rules it out, as uncompressed sizes bigger than 16K are common. Anyway, did you try zlib with compression level 9? AFAIK the only difference is in how hard the compressor tries to find repetitions, so using anything other than the maximum would be unfair to the comparison of how well LZ77 is able to compress (but its use of Huffman discards it as a fair competitor). Also, better try with gzip, as the zip header overhead is not negligible for small file sizes, unless you're talking about the compressed size reported when listing the contained files.
 

Grauw wrote:

As for requiring more memory, that depends on your perspective. Off-line means that it can’t be streamed so you have to keep the entire compressed data in memory. But this is not an issue in many cases, e.g. when decompressing from ROM to RAM. However, it’s true that when used as a general decompressor which outputs to a RAM buffer, the benefit that it doesn’t need a RAM buffer for the dictionary is lost.

That was my point, as it's what santiontanon asked about: using it as a general decompressor. I was thinking of e.g. using it for decompressing disk streams. You could decompress maybe a full 60K (decompressed size) file with a customized Pletter without too many problems, leaving ~5K for paging the disk handler and other stuff. I don't think you could do that with a grammar-based approach if you need to keep the grammar in memory, except in rare cases where it is extremely small, or by paging the grammar in and out, making the decompression speed suffer enormously in comparison.

By Grauw

Ascended (9684)

10-11-2020, 22:11

Ah, I interpreted santiontanon’s question about the applicability for general data compression in a somewhat more limited sense: to compress opaque byte data in the context of fitting game data into a (mega) ROM. So it is more general than the original use case of size-optimising specific command sequences like assembly code or VGM-like data with procedural abstraction, but narrower than what you had in mind. For truly general purpose compression it is indeed important that it can compress and decompress on-line with small-ish buffers and in O(n) running time.

If I gzip -9 the uncompacted files it’s 18 kB total. If I gzip the compacted files it’s 24 kB total. So that’s closer to Pletter, but I’m still surprised at how competitive it is, at least on this data set. Also, I didn’t expect Pletter’s dictionary window to be 64 kB; I would’ve thought it was smaller, to reduce the size of dictionary references (deflate’s is 32 kB).
