Introduction
In this post I discuss how I used Dehydra to do analysis of the entire Tracemonkey tree.
Note
I talk briefly about some Tracemonkey things to motivate what I used Dehydra for, but a knowledge of Tracemonkey is not required to appreciate the main point of this post.
The problem
One of the optimizations in my Tracemonkey inline threading work (Bug 506182), which I will be posting about later, is “PC update elimination”. Doing this requires figuring out which functions out of a certain set can access the JavaScript virtual program counter (which is stored as a variable named “pc” in a class named “JSFrameRegs”). There are about 230 functions in this set, so doing it manually is impractical.
The solution
I used Dehydra to help solve this problem mechanically. Dehydra is a static analysis tool built on top of gcc. It allows the semantic information of a program to be queried with JavaScript scripts.
First steps
By providing a process_function() function in a Dehydra script, I can inspect the variables used and the functions called by every function. Determining whether the pc is used is as simple as seeing if any variable has the name “JSFrameRegs::pc”.
function process_function(f, statements) {
for each (let v in iterate_vars(statements)) {
if (v.name == "JSFrameRegs::pc") {
print(f.name);
break;
}
}
}
The catch
Unfortunately, this isn’t quite what we want. This will tell us which functions directly use the PC, but not which functions can indirectly use it through function calls. Figuring out that requires looking at all the functions called by a given function. This is not straightforward with Dehydra, as functions frequently call functions declared in other files. Dehydra is a gcc plugin and is driven by the normal build system. Thus, it works on a file-by-file basis. Normal builds output an object file for each source file and rely on the linker to stitch it all together. Likewise, to do a whole tree analysis we need to output per-file information and then link it together later.
Collating the data
I reworked my process_function() to determine both whether the PC is directly accessed and the set of functions called directly. This data is then printed with the dump() function (discussed below):
function process_function(f, statements) {
usespc = false;
calls = {};
for each (let v in iterate_vars(statements)) {
if (v.name == "JSFrameRegs::pc") {
usespc = true;
}
if (v.isFunction) {
calls[v.name] = true;
}
}
dump(f.name, usespc, calls);
}
The remaining question is how to structure our output. Outputting it as data declarations for some programming language seems like an easy way to do it. Since I am more comfortable with Python than JavaScript, I output the data as Python code:
function dump(name, usespc, calls) {
s = '@@@';
s += '\t"' + name + '": ({';
for (let f in calls) {
s += '"' + f + '", ';
}
s += '}, ' + (usespc ? "True" : "False") + '),';
print(s);
}
We create a dictionary mapping function names to tuples of the set containing the name of each function called and whether the PC is accessed directly. When doing a build, the output will be intermixed with output from other parts of the build infrastructure, so we tag all of the output lines with ‘@@@’ so that a post-processing shell script can recognize and extract the relevant data.
The relevant bit of the shell script (which is linked below), is:
(echo "callgraph = {";
cd "$DIR" && make -s CXX="$CXX" 2>&1 | grep ": @@@" | cut -d@ -f4-;
echo "}")
This data gathering method could easily be modified to analyze different problems. For simplicity, I extracted only the relevant information and converted it directly to Python data structures. Another approach would be to dump all of the function information (probably in JSON), allowing a later analysis script access to everything.
The analysis
I wrote an analysis script in Python to process the data. The input is a large dictionary, mapping functions to the set of functions they call and whether they touch PC directly.
I viewed the problem in terms of graph theory. The mapping from functions to the functions they call is simply a directed graph. Each function is a node and there is a edge from a function to each function that it calls. In this graph, some nodes (those that touch the PC) are initially marked. We want to color red every node that has a path to one of the initially marked nodes.
Another way to state this is that a node is colored red if and only if it is either initially marked or has an edge to a red node. Doing the coloring is a simple problem. We simply compute the reverse graph (reversing all of the edges) and then perform a depth first search of the reverse graph starting from the marked nodes. Every node we see, we color red.
When doing the DFS, we keep track of which node we first reached a newly colored node from, so that we can determine a path back to an initially marked node.
My implementation also supports providing a predicate to exclude certain nodes from the search, in order to investigate how fixing certain functions to not require the PC would change things.
Caveats
While my method is useful, it is not perfect. It can produce both false positives and false negatives.
False negatives result from polymorphism in the form of virtual functions and function pointers. When a call to such a function is made, there needs to be an edge from the caller to all of the functions that could be called. For virtual functions, this is all functions overriding the virtual function. For function pointers, I think this is every function of the proper type that is used in a way that it not a call at some point. It should be possible to address this problem, but at a significantly increased complexity compared to what I have. Fortunately for me, Tracemonkey does not use virtual classes much and does not use function pointers in places where it matters for my analysis.
False positives are a little trickier, and can’t be worked around in general. Consider the following:
void foo(bool bar) {
if (bar)
// access the PC
}
void baz() { foo(false); }
Calling baz() will not result in the PC being accessed, but my analysis will report that it can access the PC (since foo() can). Knowing whether a function can actually cause the PC to be accessed is isomorphic to the halting problem and thus undecidable.
The code
My code is available here. make_callgraph.sh expects to find g++ and Dehydra under ~/gcc-dehydra/, as suggested in the Dehydra installation instructions. make_callgraph.sh must be run with its output redirected to callgraph.py. search.py contains both the general search algorithm and code that uses it to analyze functions introduced in my inline threading patch.