Assembler Optimizer


By Grauw

Ascended (9334)


12-10-2020, 01:20

Uninteresting wrote:

I believe that well over ten years ago suffix arrays were becoming preferable to suffix trees in some fields like bioinformatics, for reasons like lower memory usage. They represent mostly the same thing, but I tried googling for suffix arrays in relation to compilers and found nothing of interest. Maybe you'd have better luck, but I suppose suffix trees are easier abstractions to handle.

Implementing algorithms from the original articles tends to be harder than working from some online lecture notes. I think the latter is the route I took when I implemented Ukkonen's algorithm in my previous life. (Plus I once implemented one algorithm from the original mid-80s article, only to find the algorithm didn't work and a fix was released in a later issue of the journal.)

Hehe well I managed to make my way through. But thanks for the tip, I’ll broaden my search a bit :). I notice finding the right keywords (like “procedural abstraction” and “code compaction” rather than “compression”) also makes a lot of difference for what you can find. Amongst others I found a survey of different techniques and a thesis about code compaction (not read yet), which seemed easier to read than papers like Ukkonen’s, so there are also differences there.

I think indeed if lowering memory use of the compressor was a goal, suffix arrays would be interesting. I’m sure they see many applications in databases, search engines, DNA analysis, etc.

However since I’m not working with big data, I haven’t looked into them much since it seemed to me working with the more explicit tree structure is more convenient. Just like working with a tree structure is generally more convenient than a tree postfix-encoded as a sequence. (Though I found several articles about detecting repeats in ordered trees by postfix encoding them and feeding them into a suffix tree.)
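
To give an idea of how that works, here’s a small sketch in Python (with names of my own choosing, not taken from those articles): postfix-encoding an ordered tree turns identical subtrees into identical token runs, which a suffix tree or suffix array built over the encoding can then report as repeats.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def postfix_encode(node):
    """Return the tree as a flat sequence of (label, arity) tokens in postorder."""
    tokens = []
    for child in node.children:
        tokens.extend(postfix_encode(child))
    tokens.append((node.label, len(node.children)))
    return tokens

# Two identical subtrees b(c, d) under the root yield the same token run,
# so any repeat finder over the sequence will report them.
tree = Node("a", [Node("b", [Node("c"), Node("d")]),
                  Node("b", [Node("c"), Node("d")])])
print(postfix_encode(tree))
# [('c', 0), ('d', 0), ('b', 2), ('c', 0), ('d', 0), ('b', 2), ('a', 2)]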

As long as the algorithms are O(n) or O(n log n) it’s probably good enough for my purposes.

Btw I found a project called Squeeze, referenced in a paper, which does code compaction.

https://www2.cs.arizona.edu/projects/squeeze/

By santiontanon

Paragon (1092)


12-10-2020, 07:28

Oh wow, didn't realize there's so much literature on the topic! In one of the papers on the Squeeze project they claim a 30% reduction in code size on average, which is quite impressive! (MDL gets 1% at best!). I suspect savings on binaries written directly in assembler will be lower, as there'll be much less boilerplate code than in those generated from C/C++, but still!

By Grauw

Ascended (9334)


17-10-2020, 01:23

I implemented a first version which just does a single repetition elimination pass on the “stream of commands” type music files generated by my gngplay project. The repeat finding is still very basic and not optimised, though it runs pretty quickly. But I wanted to share my first results of running it on the Ghosts ’n Goblins music files:

Track Original Compressed Reduction
 01      132       73      45%
 02      657      471      28%
 03      286      206      28%
 04     2052     1355      34%
 05     1863      923      50%
 06      486      329      32%
 07      413      249      40%
 08     1817     1088      40%
 09      763      513      33%
 10     2130     1206      43%
 11     2742     1707      38%
 12     1214      704      42%
 13     3449     2131      38%
 14      392      304      22%
 15     2269     1595      30%
 16      831      471      43%
 17      304      206      32%
 18      653      491      25%
 19     3782     2076      45%
 20      519      397      24%
 21     1722     1284      25%
 22      473      339      28%
 23      209      169      19%
 24      890      465      48%
 25      272      230      15%
 26       80       70      12%
 27      170      108      36%

Total  30570    19161      37%

Note:

  1. The sizes are expressed as the number of commands
  2. Call / return command overhead for repeats is not counted
  3. It only goes 1 level deep; repeats inside repeats are not found

So although I did not count certain factors contributing to final file size, there’s also a lot of missed opportunity still. I also haven’t iterated on various strategies to improve the compression, like different pattern search ordering or reverse prefix trees.

Of course these results do not directly translate to assembler files that have already been procedurally abstracted by the programmer.

By Grauw

Ascended (9334)


17-10-2020, 02:40

I note from the above table that, generally speaking, the size reduction improves as the files get bigger. So I concatenated the data of all the tracks together and it gets a little smaller; 30570 -> 18216 commands, a reduction of 40%. Mh. Longer music means more repetition I guess, and the tracks don’t share too much with each other.

My current implementation functionally matches the “LFS” algorithm in the paper Simple Linear-Time Off-Line Text Compression by Longest-First Substitution. They also describe an improvement called “LFS2” which is what I describe in my third point above. For their test corpus they get a 54% size reduction for LFS and 80% (another 55%) for LFS2.
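
To make the idea concrete, here’s a deliberately naive sketch of longest-first substitution in Python (quadratic, with my own naming; the paper does this in linear time with a suffix tree, which this does not attempt to reproduce):

def longest_repeat(seq):
    """Longest substring that occurs at least twice without overlapping."""
    n = len(seq)
    for length in range(n // 2, 1, -1):          # longest first
        first_seen = {}
        for i in range(n - length + 1):
            key = tuple(seq[i:i + length])
            if key in first_seen and i - first_seen[key] >= length:
                return list(key)
            first_seen.setdefault(key, i)
    return None

def lfs(seq):
    """Repeatedly factor out the longest repeat as a new rule (a 'subroutine')."""
    rules = {}
    while True:
        repeat = longest_repeat(seq)
        if repeat is None:
            return seq, rules
        name = ("SUB", len(rules))
        rules[name] = repeat                     # LFS2 would also compress the rules
        out, i = [], 0
        while i < len(seq):
            if seq[i:i + len(repeat)] == repeat:
                out.append(name)                 # in my case: a call command
                i += len(repeat)
            else:
                out.append(seq[i])
                i += 1
        seq = out

# e.g. lfs(list("abcabcabc")) ->
#   ([('SUB', 0), ('SUB', 0), ('SUB', 0)], {('SUB', 0): ['a', 'b', 'c']})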

So extrapolating this to my results: if the size reduction for the music files is 37%, then with the LFS2 algorithm I expect an additional size reduction of about 37%, or roughly a 60% reduction in total. Looking at the sizes of the repeats found, several large ones are internally still uncompressed, so such a further reduction would be in line with my expectations.

p.s. The MFFS (Re-pair) method it mentions works bottom-up from shortest rather than longest matches, see here (chapter 7). It does not use a suffix tree but is its own stand-alone algorithm, so it may be simpler to implement with comparable results.
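
For comparison, a rough sketch of the Re-pair direction (not the efficient version from that text, just to show that it builds its rules bottom-up from pairs):

from collections import Counter

def repair(seq):
    """Repeatedly replace the most frequent adjacent pair with a fresh rule."""
    seq, rules = list(seq), {}
    while True:
        pair_counts = Counter(zip(seq, seq[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break                                # no pair occurs twice any more
        name = ("R", len(rules))
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

# e.g. repair("abcabcabc") first makes R0 = ('a', 'b'), then R1 = (R0, 'c'), and
# so on: longer repeats emerge as hierarchies of short rules.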

By santiontanon

Paragon (1092)


18-10-2020, 02:52

This is very interesting! Thanks for the update Grauw! I have not yet had a chance to read through the algorithm in detail, but from the explanations you've been putting here, it seems very related to adaptive dictionary compression methods like LZW, right? Trying to find common subsequences. So, in addition to using it for compressing code, I wonder if this could also be used as a compression algorithm for data rather than commands.

Also, I'm curious to see how the call/ret overhead would play with the numbers. However, even if the compression goes down from 35% to 20% or even 10% when including that overhead, that's still much better than the current levels of compression MDL can do, which I'd say are in the single-digit percent range. So, this is all very promising!

By Grauw

Ascended (9334)


18-10-2020, 05:46

Yeah, LZ78 (which LZW is based on) is related, although from what I understand it’s less effective than LFS2 or Re-pair, because the dictionary itself contains duplication due to the way it is constructed (it is not irreducible).
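
To illustrate what I mean, a stripped-down LZ78 parse in Python (ignoring how the phrases are actually coded): on a highly repetitive input the dictionary keeps accumulating overlapping entries instead of factoring the repetition out once.

def lz78_dictionary(text):
    """Collect the phrases an LZ78 parse adds to its dictionary."""
    dictionary = {"": 0}
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                                  # extend the current match
        else:
            dictionary[phrase + ch] = len(dictionary)     # add a new phrase
            phrase = ""
    return dictionary

print(lz78_dictionary("abababababab"))
# {'': 0, 'a': 1, 'b': 2, 'ab': 3, 'aba': 4, 'ba': 5, 'bab': 6}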

These off-line dictionary compression techniques (which form a context-free grammar) apply to any type of data, but due to their hierarchical nature they are also suitable for code. They do not need a decompression buffer and can be processed straight from ROM, while on-line LZ77-based algorithms require RAM because the dictionary is built on the fly.
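
In other words, the “decompressor” is nothing more than a recursive walk over the rules, so the only working state it needs is the call stack. A sketch (with made-up rule names):

def play(seq, rules, emit):
    """Execute a compressed command stream in place; no output buffer needed."""
    for symbol in seq:
        if symbol in rules:
            play(rules[symbol], rules, emit)   # a call into a shared subsequence
        else:
            emit(symbol)                       # a plain command, handled directly

rules = {"S1": ["a", "b", "c"]}
play(["S1", "x", "S1"], rules, emit=print)     # prints a b c x a b c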

Some ret overhead and call stack depth can be reduced by eliminating tail calls. Ordering subroutines such that a subroutine directly follows its tail call allows you to eliminate the tail jump as well. The opportunities for tail call optimisation might be improved by reversing the input before compression.
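
For illustration, on a symbolic command list (my own notation here, not the actual command format) the transformation is simply:

def eliminate_tail_call(body):
    """Turn a trailing call + ret into a jump, saving the ret and a stack level."""
    if len(body) >= 2 and body[-1] == ("ret",) and body[-2][0] == "call":
        return body[:-2] + [("jp", body[-2][1])]
    return body

print(eliminate_tail_call([("call", "A"), ("call", "B"), ("ret",)]))
# [('call', 'A'), ('jp', 'B')]; if B is placed directly after this subroutine,
# the jp can be dropped and execution simply falls through.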

About the percentages, note that my input data contains ample repetition, while program code already has repetitions factored out by the programmer as subroutines; additionally, repeats that conditionally branch or unbalance the call stack must be split up or rejected. Nevertheless I think typical code still contains repetitions in a bunch of places.

p.s. Given a program execution trace, suffix trees can also be used to identify hot code paths.
