If you are looking for something to develop and tweak single ASM *routines* (e.g. multiplication or unpackers), did you give Z80Runner a try? It's a testbed for developing assembly routines: an interactive debugger for ASM code that shows you each instruction's cycle count and can measure the actual execution time of whole subroutines, depending on the input parameters. When you are done, you take the routine back to your production code or library.
You edit the code with your favorite editor, and the tool reloads the ASM file whenever you save your changes, ready to run.
It doesn't emulate other hardware like graphics or sound. Also, it uses the standard "Zilog" instruction notation, due to its back-end assembler (from the Z88DK package), but it's not that hard to convert the SDCC notation to Zilog with regular expressions. For example, to convert all the index register operands, you can use the following regular expression in Notepad++ for search & replace:
Find:
(\-?\d{1,3}) \((i[xy])\)
Replace with:
\(\2+\1\)
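For batch conversion outside an editor, the same search and replace can be sketched in Python. This is only an illustration: the function name `sdcc_to_zilog` is made up here, and the pattern accepts any number of offset digits, which is an assumption beyond the Notepad++ expression above.

```python
import re

# Matches an SDCC-style indexed operand such as "-1 (ix)" or "4 (iy)".
# Group 1 is the signed offset, group 2 is the index register.
SDCC_INDEXED = re.compile(r"(-?\d+) \((i[xy])\)")


def sdcc_to_zilog(line: str) -> str:
    """Rewrite 'offset (ix)' as '(ix+offset)', leaving everything else alone."""
    return SDCC_INDEXED.sub(r"(\2+\1)", line)


if __name__ == "__main__":
    for source in ("ld a, -1 (ix)", "ld 4 (iy), b", "add hl, de"):
        print(source, "->", sdcc_to_zilog(source))
```

Like the editor version, a negative offset comes out as `(ix+-1)` rather than `(ix-1)`.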
Z80Runner shows you errors and line numbers in a live window, so you can quickly walk through the list in your favorite editor, save, check the error window, rinse and repeat.
This is an example to test a (slow) 16*8 multiplication routine (not recommended in your production code!):
;Mul_16_8.asm
        org 0x100

        ld de,123
        ld b,8
        call Mul_16_8_Slow

        ld de,8
        ld b,123
        call Mul_16_8_Slow

        nop
        nop
        nop

; Multiplication: HL = DE * B
Mul_16_8_Slow:
        ld hl,0
Loop:
        add hl,de
        djnz Loop
        ret
The code window shows you the instruction times (and hex output):
Cycles  Opcodes    Command
                   Mul_16_8_Slow:
10      21 00 00       ld hl,0
                   Loop:
11      19             add hl,de
8/13    10 FD          djnz Loop
10      C9             ret
When you mark all the lines of the routine, it shows you a summary (here: 7 bytes, 39-44 cycles), but the real power comes when executing the code and stepping over the calls. The first call (DE=123, B=8) reveals an execution time of 207+17 cycles (17 = the cost of the call itself), and the second call (DE=8, B=123) runs for 2967+17 cycles.
This is a very simple and obvious example, but it gives you an exact idea of where the actual cycles are burned.
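The measured numbers can be checked by hand from the cycle column. Here is a small sketch of that arithmetic in Python; the constant and function names are made up for illustration, and the timings are taken from the listing above.

```python
# Cycle counts per instruction, as shown in the code window listing.
LD_HL_NN = 10        # ld hl,0
ADD_HL_DE = 11       # add hl,de
DJNZ_TAKEN = 13      # djnz Loop, jump taken (B not yet zero)
DJNZ_NOT_TAKEN = 8   # djnz Loop, falling through (B reached zero)
RET = 10             # ret


def mul_16_8_slow_cycles(b: int) -> int:
    """Cycles spent inside Mul_16_8_Slow for a given B (excluding the call)."""
    # djnz decrements before testing, so B=0 loops 256 times.
    iterations = b if b != 0 else 256
    return (LD_HL_NN
            + iterations * ADD_HL_DE
            + (iterations - 1) * DJNZ_TAKEN
            + DJNZ_NOT_TAKEN
            + RET)


if __name__ == "__main__":
    print(mul_16_8_slow_cycles(8))    # first call:  B=8
    print(mul_16_8_slow_cycles(123))  # second call: B=123
```

The loop count depends only on B, which is why swapping the two operands makes such a difference.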
For more complex things like decompression routines, you can also load binary data from files to specific memory locations, and whenever you start over after a code change, all the memory is reinitialized with data from these files.
Specifically: If you are planning to develop hand-written assembly routines or tweak the assembly compiler output, or even compare how much you can do better than the compiler - interactively! - this might be the tool for you.
Sounds like an interesting tool mi-chi! btw, any chance of a unix build (Linux or Mac)? or at least a 64bit Windows one? the current binary is a 32bit Windows binary, and some modern 64bit unix distributions would not run it even with wine unless it's a 64bit binary.
Also, @aoineko, I improved the function detection heuristics, and now it detects all the functions in your code! The latest development version can be found here: https://github.com/santiontanon/mdlz80optimizer/releases/tag...
with this version, on the example code you shared, it produces this output:
source file (.function name)    self size    total size    accum t-states
../MSX/others/sdcc/aoineko.asm    2392    2392    15753/15598
../MSX/others/sdcc/aoineko.asm._GamePawn_Initialize    146    1029/1019
../MSX/others/sdcc/aoineko.asm._GamePawn_SetPosition    54    349
../MSX/others/sdcc/aoineko.asm._GamePawn_SetAction    59    373/368
../MSX/others/sdcc/aoineko.asm._GamePawn_Update    1713    11208/11088
../MSX/others/sdcc/aoineko.asm._GamePawn_Draw    331    2221/2201
../MSX/others/sdcc/aoineko.asm._GamePawn_SetTargetPosition    34    221
../MSX/others/sdcc/aoineko.asm._GamePawn_InitializePhysics    52    352
And, of course, those times are just the sum of all the assembler instructions (not actual execution time, for which you'll need an actual emulator-based tool, like mi-chi's, or measuring it in openMSX). Also, some functions have two numbers, e.g. "2221/2201", because instructions like conditional jumps can have a different duration depending on the condition. So, you should read them as [upper-bound]/[lower-bound]. When the upper and lower bounds are the same, I just show a single number for simplicity.
This works perfectly.
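If you want to post-process these numbers in a script, the "upper/lower" fields are easy to split. This is a minimal sketch, not part of the optimizer itself: the helper names are made up, and the values are copied from the output above.

```python
def parse_tstates(field):
    """Return (upper, lower) for 'upper/lower', or the same value twice
    when the field holds a single number."""
    parts = [int(p) for p in field.split("/")]
    return parts[0], parts[-1]


def total_bounds(fields):
    """Sum the upper and lower bounds over several t-state fields."""
    bounds = [parse_tstates(f) for f in fields]
    return sum(u for u, _ in bounds), sum(l for _, l in bounds)


# Per-function t-states from the listing above.
FUNCTION_TSTATES = {
    "_GamePawn_Initialize": "1029/1019",
    "_GamePawn_SetPosition": "349",
    "_GamePawn_SetAction": "373/368",
    "_GamePawn_Update": "11208/11088",
    "_GamePawn_Draw": "2221/2201",
    "_GamePawn_SetTargetPosition": "221",
    "_GamePawn_InitializePhysics": "352",
}

if __name__ == "__main__":
    print(total_bounds(FUNCTION_TSTATES.values()))
```

Summing the per-function bounds gives back the file-level "15753/15598" line.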
For an offline tool, this is the best we can get for testing code optimization.
Thank you santiontanon.
no problem! I am glad it is useful
About the Linux and 64 bit question:
As converting the application to Linux will take some time (Qt seems to be a good candidate for porting it, but I will have to spend some time re-learning the designer and slots and signals), and as I'm busy with another project, I won't be able to convert it any time soon.
That said, I'm not a Linux expert, but I just installed a freshly downloaded Ubuntu 20.x (64 bit) in VirtualBox, updated Wine, and after a bit of fiddling with getting the VC 2010 runtimes installed, I could launch Z80Runner as a 32 bit EXE.
For testing, I built a 64 bit version, but needed to install the VC 2010 (64 bit) runtime as well. Anyway, both versions did run in the end.
About what you said: Are there plans to drop 32 bit support in Wine?
Also, can you give more details on what you tried and what exactly is failing on your side? As I could get it to run on my system, I'm sure there is a way to get it running on yours.
I believe Santi is running macOS, and since macOS 10.15 (Catalina), running 32-bit binaries is no longer supported. That is especially true on new Macs with the M1 ARM processors, which rely on Rosetta to translate x86-64 instructions only. This also extends to binaries run with Wine, since Wine Is Not an Emulator.
Here is a link to a download of the 64 bit version.
Indeed, I'm using an M1 machine, with 64-bit support only. It's a bit of a pain, but 6x faster build times for projects than on my previous Intel machine is definitely worth this little pain
Thanks a lot mi-chi! I just downloaded it, and now wine can run this! I'm still missing some .dll file (I think it's some visual studio run time dll that is present in Windows machines), but that will be easy to get. I'll play with this tomorrow, thanks a lot for the build!
Santi, besides the aforementioned VC 2010 runtime (MFC100U.DLL), which can be installed by running "winetricks" with the param "vcrun2010" (and probably by running it once with "--self-update" if that checksum error pops up), you will still need to get the external assembler (Z88DK) working, which the tool uses to assemble the code. I know that its sources are available, but I have never attempted to build it.
That said, porting my project to Qt could be a chance to integrate an assembler. I saw that your tool understands all kinds of ASM dialects, which would be tempting, especially when working with SDCC's really weird index register and literal number syntax, and given that this is a primary target for optimizations. Would that parser allow me to get the actual address and the bytes emitted per assembly line? That's what Z80Runner requires to get started on a source, and that's what the external assembler is needed for.
I’ve seen those differences in performance comparisons. I think the LLVM ARM compiler is more efficient because ARM has a much more regular instruction set. It would be interesting to see a performance comparison when cross-compiling to x86-64. But that won’t give such impressive numbers of course, so YouTubers can’t make a nice clickbait headline out of it. Anyway, off-topic.