Assembler Optimizer

Page 17/18
10 | 11 | 12 | 13 | 14 | 15 | 16 | | 18

By santiontanon

Paladin (1013)

santiontanon's picture

01-08-2020, 20:02

Ok!! Thanks for the link mzoran!! Added to my todo list Smile

By santiontanon

Paladin (1013)

santiontanon's picture

02-08-2020, 07:40

U just pushed an updated version (v0.8) of MDL to GitHub with many updates: https://github.com/santiontanon/mdlz80optimizer/releases/tag...

This time, it's mainly better support for different assembler syntax (better sjasm and glass, and initial support for pasmo and winAPE). I cannot guarantee it working 100% with all projects, but I have personally tried it, and it works on the following projects for each supported dialect:
- Glass: VGMPlay, xspelunker, Menace from Triton
- asMSX: wrally, pong, mine, robots, g-monkey
- sjasm: Uridium, Deep Dungeon Adventure
- pasmo: FlappyBird
- winAPE: Subcommander

I am testing it with as many projects as I can find online. If you have a project online in which you want to try MDL and it does not work, please let me know, and I'll investigate! :)

The optimizer is also a bit better in this new version (although I mainly focused on parser support this week). So, we can now have patterns that contain repetitions like this:

pattern: Replace dec a; ... dec a with add a,-?const
0: [?const] dec a
replacement:
0: add a,-?const
constraints:
equal(?const >= 2, -1) 

This pattern will replace a sequence of "dec a; ...; dec a" by "add a,-n" where n is the number of "dec a", but only if there is at least 2 of them. There's a few patterns of this style now, that caught a few cases in a few of the projects above. For example, an interesting one is this:

pattern: Replace inc ?regixiy; ... inc ?regixiy with push bc; ld bc,?const; add ?regixiy,bc; pop bc
0: [?const] inc ?regixiy
replacement:
0: push bc
1: ld bc,?const
2: add ?regixiy,bc
3: pop bc
constraints:
in(?regixiy,IX,IY)
equal(?const >= 5, -1) 

That replaces a series of of inc ix or inc iy, by a direct addition via bc, if there's at least 5 inc in a row (which is when it starts to pay off).

Anyway, I hope sjasm support is better now! But please do let me know if there is still some syntax that I got wrong. I have never used sjasm myself, so I am learning all of its (very complex) syntax as I encounter it in the projects I use for testing!

By ARTRAG

Enlighted (6396)

ARTRAG's picture

02-08-2020, 10:22

The [] sintax for repeated instructions is specific to sjasm
Do the above optimizations hold also for other dialects?

About the last pattern, if bc is unused you can skip push/pop and make it hold for 3 increments
And if de is unused you can use it instead of bc

How do you deal with the case you have 2 or 3 valid patterns?
Do you choose the shortest one or just the fist that matches the replacement rule?

I mean that if de is unused and bc is, you should use the pattern with de without push/pop

By santiontanon

Paladin (1013)

santiontanon's picture

02-08-2020, 17:23

Oh, I use the sjasm [] syntax I the patterns because I already had it in the parser. But it does not matter if the target assembler doesn't support it. As long as there is a sequence of, say, "inc ix", in a row (specified in anyway, either manually repeating, with a [], with a "repeat", or "rept", etc.), it will match!

As for pattern ordering. I don't know what would be best! Right now it applies the patterns in the order in which they are in the pattern definition file. So, the main loop is:

for line_idx in 0 to number_of_lines_in_the_code:
   for each pattern:
     if pattern applies to line_idx:
        apply pattern 
        line_idx -- (in case after applying this pattern, now a new pattern applies to the same line)

But I wonder if I should give them some "priority", or sort by the amount of savings. I haven't yet thought about it! Smile

And good point about the unused bc! Making a note!

By ARTRAG

Enlighted (6396)

ARTRAG's picture

03-08-2020, 12:51

If more patterns can apply you should choose the one that gives the best result

I mean that in the code above you can use bc or de, if bc is used, you must include push/pop, but if de is unused you should use the pattern with de without push/pop

With respect to the specific case, you have 4 patterns, two with de (with and without push), two with bc (with and without push)

The optimizer should select the best out of 4 possibilities

By thegeps

Champion (474)

thegeps's picture

03-08-2020, 13:09

Well I think he can set (maybe in future) a variable called "priority" assigned to patterns. When multiple patterns will fit the same condition the choice can be done by priority value

By mzoran

Resident (57)

mzoran's picture

03-08-2020, 13:14

Thank you for adapting the optimizer on WinAPE syntax. I ran the pattern matching one and yes, plenty of call+ret stuff, forgotten instructions that can be removed (obviously left after various code editing), some ld optimizations, ...
After studying the results I have a few suggestions/questions:
1. I believe that options
-warn-off-unofficial -warn-off-labelnocolon
should be default. I see no reason an optimizer should police how we prefer to write our code. If the native compiler can live with something like add a,7 there is no reason to suggest changing it to add 7 by default
Same goes for equ lines like
label equ
2. Since this code is my first Z80 project I did not dare use unofficial opcodes and used only stuff documented in the Zilog books. This brings me to how you treat undocumented opcodes ? Since in step 1 I can see you are checking syntax, do you also warn about the use of such instructions ?
3. It seems that at this point you suggest a change only if there is some memory saved. I have half expected that something like
push bc
pop iy
would be a candidate for a replacement with undocumented ld instructions since it saves cycles .
Of course this also depends on question 2, will you ever suggest using unofficial opcodes ?
4. You should start thinking about defining and recognizing some directives in the form of comments in the code about turning on/off optimizations.
I did get a lot of suggestions to replace something like
ld d, n
ld e, m
with a single line.
That is very true but that code is just like that on purpose to make understanding and editing easier.
If one could just turn off and then on optimization after a certain block it would be nice.
5. I am not sure suggestions on replacing 16-bit LD with 8-bit one when only lower part has changed is a good idea when doing assignments based on defined label/equ. For example
PLAYER1_STRUCT_LOCATION EQU &C000
PLAYER2_STRUCT_LOCATION EQU &C0FF
...
LD HL, PLAYER1_STRUCT_LOCATION
....
LD HL, PLAYER2_STRUCT_LOCATION
is impractical/weird to change to
LD L, lower byte of (PLAYER2_STRUCT_LOCATION)
I mean yes, this is true, but perhaps an optional suggestion ?

If something has been answered already, my apologies as this thread has grown significantly.
Excellent work !

By theNestruo

Master (155)

theNestruo's picture

03-08-2020, 15:11

I insist that having mutiple pattern files, and beign able to decide which ones to use -and their order- is a simple solution that would solve many problems at the same time:

  • po-default.txt: unambiguos safe patterns
  • po-speed.txt: patterns that can be optimized for speed
  • po-size.txt: patterns that can be optimized for size
  • po-undoc-speed.txt: patterns that can be optimized for speed using undocumented instructions
  • po-undoc-size.txt: patterns that can be optimized for size using undocumented instructions
  • etc

So you can type -po for the default settings (that may be -po default,size,speed), go for just the basics (-po default), use the full speed package with -po undoc-speed,speed,default, or even choose any weird combination your project requires, such as -po size,undoc-speed,undoc-size...

This approach solves disabling specific sets of patterns, solves priorities (each pattern should appear at most once on every file), and provides an easy and homogeneous extension point (no more flags needed if more optimization sets appear... and may be even support user-provided files).

By ARTRAG

Enlighted (6396)

ARTRAG's picture

03-08-2020, 15:41

I think that maintaining all those lists is a work itself... I would only support size and speed.
Undocumented instructions are now a day documented, there is no reason to exclude them

Moreover having multiple files it does not solve the fact that you have one code sequence that can match multiple patterns. Consider this

pattern:
0: [?const] inc ?regixiy

it can have:
replacement a):
0: push bc
1: ld bc,?const
2: add ?regixiy,bc
3: pop bc
constraints:
in(?regixiy,IX,IY)
equal(?const >= 5, -1)

replacement b):
0: push de
1: ld bc,?const
2: add ?regixiy,de
3: pop de
constraints:
in(?regixiy,IX,IY)
equal(?const >= 5, -1)

replacement c):
0: ld bc,?const
1: add ?regixiy,bc
constraints:
in(?regixiy,IX,IY)
equal(?const >= 3, -1, bc unused)

replacement d):
0: ld de,?const
1: add ?regixiy,de
constraints:
in(?regixiy,IX,IY)
equal(?const >= 3, -1, de unused)

By ARTRAG

Enlighted (6396)

ARTRAG's picture

03-08-2020, 16:25

The selection should be guided by testing on bc and de being used or unused
The slow version can be selected if bc and de are both used, while the fast version if there is or bc or de unused
But I don't see a simple expression to test what is the best replacement

Page 17/18
10 | 11 | 12 | 13 | 14 | 15 | 16 | | 18