Performance Profiler - A first attempt

Page 2/9
1 | | 3 | 4 | 5 | 6 | 7

By Grauw

Ascended (10821)

Grauw's picture

10-08-2019, 00:59

I want to change the profiler to let the start and end points to be marked by specifying a <slot, bank, address> triplet which it then registers as a breakpoint. Then the profile start/end points can be set up either by the program during its initialisation, or externally through TCL (e.g. with lookups in a symbol table), and will not affect performance during execution.

Reason is that I have interrupts getting in the way of my measurements, and I prefer not to add those OUTs to mark the start and end points in the ISR (it is timing sensitive due to splits). Also it seems nice if it makes use of the capabilities of the emulator to live completely outside of the program execution.

By Manuel

Ascended (19687)

Manuel's picture

10-08-2019, 00:08

What do you mean with a 'triplet'?

By Grauw

Ascended (10821)

Grauw's picture

10-08-2019, 01:00

MRC swallowed the <, fixed.

By Manuel

Ascended (19687)

Manuel's picture

10-08-2019, 09:02

Well, that makes a lot of sense. This is also how things were intended. Smile

By Grauw

Ascended (10821)

Grauw's picture

25-07-2020, 02:01

I updated my profiler script for openMSX: profile.tcl. You no longer need to instrument the MSX program, instead the profiler is now hooked in from openMSX TCL scripts. To assist with this, you may also be interested in the symbols.tcl script to read symbol files.

Hook up the profiler by setting up openMSX breakpoints and calling profile::section_begin and profile::section_end with a section ID to log the time under. Use the ID “frame” to delimit frames.

There are also a number of convenience functions which set up the breakpoints for you, amongst others profile::section_scope_bp which you can point to a method which starts the section, and it will automatically end the section when the method returns. There is also profile::section_irq_bp which will start and end a section when the Z80 handles an interrupt request. Note that these two rely on openMSX commit 6529495bd which is not in the 0.15.0 release, so you need to get a recent nightly build. Everything else works on the current openMSX 0.15.0 though.

Additional new functions are, you can specify section IDs to profile to limit its output, and both profile and profile_osd now have the ability to show their statistics in units of seconds (s), milliseconds (ms), clock cycles (t) or as a percentage (%). Furthermore you can use profile_break or click on the OSD to advance execution until a specific section is hit.

Last but not least, you can now exclude the time spent in one section from another, so that e.g. interrupts no longer disturb your measurements, or to omit vsync wait loops from your game loop.

A little example:

profile::section_scope_bp frame 0xFD9F  # H.TIMI
profile::section_scope_bp chput 0xA2 { [pc_in_slot 0 0] }
profile::section_irq_bp int
profile::section_exclude int chput
profile_osd t

This marks a new frame when H.TIMI is called (on the vblank interrupt), and it adds a measurement section for calls to the BIOS CHPUT routine. Another section is added for the ISR, and it is excluded from the time spent in CHPUT. Lastly we show the OSD with clock cycle (t) stats.

From this little test profile we can see exactly how long the ISR takes and how much time is spent in the H.TIMI hook. Additionally we can tell that MSX-BASIC rarely calls CHPUT, but MSX-DOS does.

See this code snippet in tiletile for a more elaborate example.

By Grauw

Ascended (10821)

Grauw's picture

25-02-2020, 00:41

By the way, I removed the Z80 interface entirely, if you want it back then simply put something like these two watchpoints in your project scripts. It was removed because I have so far found the TCL-only set-up more powerful and more convenient so I didn’t see value in keeping it around. But just putting it out there in case anyone’s missing the old style of profiling.

If I were to re-add the Z80 interface today, I would do it slightly different, making it interface with profile::section_begin_bp & co. rather than profile::section_begin, and pass in the various arguments needed via the registers. Then the Z80 program can set-up the breakpoints in its initialisation, and doesn’t incur a run-time cost which the old approach did. The benefit of the Z80 interface would be that it’s easy to determine where everything is located (address, slot, mapper page), though I think it’s not very difficult to do the same in openMSX script either.

About the concept of the <slot, bank, address> triplet I mentioned earlier, that’s not in there although I still think it’s a nice idea, currently you can just pass in a condition to the *_bp methods to achieve this just like you do in openMSX debug set_bp. Actually I was thinking it would be nice if you could set openMSX breakpoints directly on debuggables (so physical memory locations rather than logical locations), however that’s not how things work at the moment.

What I like about this new iteration of the profiler is that 1. it lets me get very precise numbers which are not affected by interrupt handling (my primary motivation for the changes), and 2. it is very easy to profile a method now without touching the code at all, even just quickly from the openMSX console, and without having to worry about multiple return paths (with the scope_bp).

By Grauw

Ascended (10821)

Grauw's picture

25-07-2020, 02:44

Grauw wrote:

I updated my profiler script for openMSX: profile.tcl.

Added VDP command execution profiling thanks to this openMSX addition by wouterv_.

By Grauw

Ascended (10821)

Grauw's picture

10-09-2019, 22:12

Sneak preview (wip):

Timeline chart!

By theNestruo

Champion (430)

theNestruo's picture

11-09-2019, 20:04

Grauw wrote:

Sneak preview (wip):
Timeline chart!

I want that!!! Big smile

By jltursan

Prophet (2619)

jltursan's picture

11-09-2019, 21:42

I've already seen it in your video Smile, looks impressive and a great tool to fine tune the code. Good work!

Page 2/9
1 | | 3 | 4 | 5 | 6 | 7