Execution tracing for call graph generation

Page 2/3
1 | | 3

By jblang

Rookie (21)

jblang's picture

10-02-2019, 00:56

It is mostly a matter of familiarity since I haven't spent as long using the openMSX debugger as I have on blueMSX. I will spend some more time using it and let you know.

Initially I thought I couldn't resize the individual views (e.g., code view, registers, stack, memory, etc.). When I maximized the main debug window, the code view gets very big (way wider than could ever be useful), and the rest of the views stay the same size. If anything, it would make sense for the memory view to use most of the space, since it has the most information to display. However, I just discovered that I can drag the different components out of the main window and then resize them, so that solves my main complaint. On a related note though, I can't figure out how to put the view back inside the main window once I have removed it.

A nice feature of blueMSX's debugger that I miss in openMSX is the header on the memory view that shows the address of the least significant byte in each column. I.e., there are 16 bytes 0-F in each row, and blueMSX has a header on each column that shows that. It makes it much easier to eyeball rather than having to count which byte you're looking at.

My impression of openMSX after using it for a while is it is more powerful than blueMSX but it has a much higher learning curve. See for example, the trouble I had figuring out which machine to use. In blueMSX, I just fired it up, picked the disk image, and it worked. I didn't have to figure out the specific machine to use. Similarly, on blueMSX debugger, it is obvious how to resize the views because they look like normal windows; openMSX, not as much. My overall advice for improving openMSX, not just the debugger, would be to try to look at it through the eyes of someone who has never used it before and think about what might not be obvious to them. Spend some time using blueMSX and think about things it does differently that might be more user friendly.

By jblang

Rookie (21)

jblang's picture

10-02-2019, 05:14

Here is the most recent update to the TCL script:

namespace eval call_graph {

set_help_text call_graph \
{Builds a call graph of functions used in the running code.}

variable log_file

proc log_call {} {
	variable log_file

	if {![info exists log_file]} return
	set addr [reg PC]
    set instr [peek $addr]
    # call instruction
    if {$instr == 0xCD || $instr & 0xC7 == 0xC4} {
        set dest [peek16 [expr {$addr + 1}]]
        puts $log_file [format "call %04x %04x" $addr $dest]
    }
}

proc call_graph {filename} {
	variable log_file
	set log_file [open $filename {WRONLY TRUNC CREAT}]
	debug set_condition 1 {call_graph::log_call}
	return ""
}

namespace export call_graph

} ;# namespace call_log

namespace import call_graph::*

I also wrote a python script to convert the call log to a graphviz file:

#!/usr/bin/env python

import fileinput
from collections import defaultdict
from bisect import bisect

subs = defaultdict(int)
calls = defaultdict(int)
labels = {}

subs[100] = 1

for line in fileinput.input("labels"):
    label, addr = line.split()
    addr = int(addr, 16)
    labels[addr] = label

for line in fileinput.input():
    ins, src, dst = line.split()
    src = int(src, 16)
    dst = int(dst, 16)
    if ins == "call":
        subs[dst] += 1
        calls[(src, dst)] += 1

subsorted = sorted(subs.keys())

subcalls = defaultdict(int)

for pair, count in calls.items():
    src, dst = pair
    i = bisect(subsorted, src)
    if i > 0:
        subcalls[(subsorted[i-1], dst)] += count
    else:
        subcalls[("unknown", dst)] += count

print "digraph G {"
for pair, count in subcalls.items():
    src, dst = pair
    if src in labels:
        src = labels[src]
    else:
        src = "%04x" % src
    if dst in labels:
        dst = labels[dst]
    else:
        dst = "%04x" % dst
    print '\t"%s" -> "%s" [label="%s"]' % (src, dst, count)
print "}"

And then I generated a picture using graphviz:

dot -Tpng calls.gv -o calls.png

It is not perfect yet, but it is a start...

By jblang

Rookie (21)

jblang's picture

10-02-2019, 18:19

Last night, I found z80dismblr, which does almost everything I want. It disassembles code, generates call graphs, and tracks data locations in memory. It does most of this through static analysis but it does use trace files for hints about what areas of memory are code and data. It takes a MAME trace file as input, which has the format:

FD19: jp   (hl)

Basically, just addr: instr. z80dismblr really only cares about 2 things in this file: the list of addresses that are executed, and the target of indirect jumps--i.e., jp (hl) instructions. So I've modified my script to only output what it wants in exactly the format it wants:

namespace eval mame_trace {

set_help_text mame_trace \
{Outputs a MAME cpu trace file.}

variable log_file

proc log_instr {} {
	variable log_file

	if {![info exists log_file]} return
	set address [reg PC]
	set instr [peek $address]
	if {$instr == 0xE9} {
    		puts $log_file [format "%04X: jp   (hl)" $address]
	} else {
    		puts $log_file [format "%04X:" $address]
	}
}

proc mame_trace {filename} {
	variable log_file
	set log_file [open $filename {WRONLY TRUNC CREAT}]
	debug set_condition 1 {mame_trace::log_instr}
	return ""
}

namespace export mame_trace

} ;# namespace call_log

namespace import mame_trace::*

For jp (hl), it outputs "ADDR: jp (hl)", with exactly 3 spaces between jp and (hl), the way the disassembler expects it. For all other instructions, it just outputs "ADDR:", since the disassembler doesn't really care about the instruction, only the address.

This works, but it's really inefficient. The trace from the entire demo's runtime is over 500MB, and very much of this data is redundant. It would be more useful to just output the unique addresses that code execucted from. Instead of just the addresses, it would be good to output the address with the number of times the address was executed. It would be nice to also track the number of times an address is read or written as well to identify frequently accessed variables. To hint the disassembler about indirect jumps, it would also need to output a list of unique source/destination pairs for each indirect jumps. The disassembler could be modified pretty easily to read this new format.

I am going to try to do this using a TCL script with associative arrays later, and I will report how it works.

However, I have no doubt that this could be done much more efficiently in the C code of the emulator without slowing down the emulation nearly as much. Is this something the emulator already tracks, or that could be added pretty easily? In C, it would only take 1.5MB of data to track reads, writes, and executions in 3 different arrays of 64-bit ints containing one int for each address in the 64-KB address space. Each int would contain the count of times that location had been read/written/executed. Actually when slots are taken into account, it would be multiplied a few times, but still manageable, I think.

By jblang

Rookie (21)

jblang's picture

10-02-2019, 18:13

PS: should we start a new thread for this? Or maybe a moderator can rename the thread. After the first few posts, it's not really about which hardware setting and ROM to use for a European MSX1 with disk support.

By hamlet

Scribe (2278)

hamlet's picture

10-02-2019, 21:28

@jblang
I have, at your request, renamed the thread to do more justice to its content.

By jblang

Rookie (21)

jblang's picture

10-02-2019, 22:09

Thanks, hamlet!

By yzi

Champion (441)

yzi's picture

10-02-2019, 23:14

500 megs of trace feels like overkill, when there are only 64k possible memory locations. Using a 32 bit counter for every memory location would be only 256kB of data. I could use a script that counts instruction execution counts and clock cycles, and prints disassembly, with each instruction line commented with the total number of emulated CPU clock cycles spent on executing that line. I once tried making something like this, but I don't remember how far I got with it.

By jblang

Rookie (21)

jblang's picture

11-02-2019, 02:25

Here's an updated version that only outputs unique addresses and counts. The largest it got when running the Bold demo is 70KB, and the largest it could theoretically get is around 1MB or so. There are separate hist_start, hist_stop, and hist_reset commands to start and stop recording histograms, and to reset them. There are also hist_save_exec and hist_save_ijmp commands to dump the execution and indirect jump histograms to a file.

namespace eval histogram {

set_help_text histogram \
{Generate histograms for execution, read, and write pcesses.}

variable exec_cond
variable exec
variable ijmp

proc update_exec {} {
	variable exec
	variable ijmp
	set pc [reg PC]
	set hl [reg HL]
	set opcode [peek $pc]
	if {[info exists exec($pc)]} {
		incr exec($pc)
	} else {
		set exec($pc) 1
	}
	# indirect jump
	if {$opcode == 0xE9} {
		if {[info exists ijmp($pc,$hl)]} {
	    	incr ijmp($pc,$hl)
		} else {
			set ijmp($pc,$hl) 1
		}
	}
}

proc hist_reset {} {
	variable exec
	variable ijmp
	array unset exec
	array unset ijmp
}

proc hist_start {} {
	variable exec_cond
	set exec_cond [debug set_condition 1 {histogram::update_exec}]
	return ""
}

proc hist_stop {} {
	variable exec_cond
	catch {debug remove_condition $exec_cond}
	return ""
}

proc hist_save_exec {filename} {
	variable exec
	set file [open $filename {WRONLY TRUNC CREAT}]
	foreach addr [lsort [array names exec]] {
		puts $file [format "%04X: %d" $addr $exec($addr)]
	}
}

proc hist_save_ijmp {filename} {
	variable ijmp
	set file [open $filename {WRONLY TRUNC CREAT}]
	foreach pair [lsort [array names ijmp]] {
		set fields [split $pair ","]
		lassign $fields src dst
		puts $file [format "%04X %04X: %d" $src $dst $ijmp($pair)]
	}
}

namespace export hist_start
namespace export hist_stop
namespace export hist_reset
namespace export hist_save_exec
namespace export hist_save_ijmp

} ;# namespace call_log

namespace import histogram::*

This is only minimally tested, and not guaranteed to be bug-free. And I still need to update the disassembler to accept this new format.

By Sandy Brand

Master (142)

Sandy Brand's picture

11-02-2019, 17:25

Oh that is very interesting! Smile

I attempted to build a profiler quite some time ago, but it required injecting specific pieces of code to mark the start and end of a function call. Upside is that, with a little tool, you can easily control where the profile markers are added and what meta information they should have (e.g.: interrupt handler vs 'normal' execution) and that it also works if code is on 'hidden' memory pages. Downside is of course that it influences the measured results and makes the binaries bigger.

This thread may be of interest to you as well, it has a couple of ideas and suggestions from various people on this forum: https://www.msx.org/forum/msx-talk/openmsx/performance-profiler-first-attempt

Do you have any performance issues using larger TCL sctips, btw? I found that, while my approach sort of worked, such a heavy use of TCL scripting and writing data to output ports had a tendency to lock-up the emulator.

By jblang

Rookie (21)

jblang's picture

12-02-2019, 01:04

My goal here is to reverse-engineer a program, so the requirement to add additional code would not really work for my scenario. I will check out the thread you suggested.

Enabling the condition slows the emulator down a lot. The frame rate lowers significantly and looks like there are dropped frames between. However, I haven't noticed that the slow down is particularly correlated with the complexity of the TCL script. Enabling any condition seems to have the same effect.

Page 2/3
1 | | 3