Trace and replay algorithms < Internals

Tracing and Replaying algorithms

Algorithm 1: Saving Control Flow

Necessity: Consider a scenario where the user is replaying a program, currently at a statement just after an if-then-else statement. If he requests a backward step, the system must be able to determine whether to go to the then part or the else part. Or consider another scenario where the user is at the first statement in a loop block, and requests a backward step. The system must know whether to go to the last statement of the loop's previous iteration, or to go to the statement just before the loop.

Therefore the system must know the control flow of the program—the path taken through the program at trace time should be known at replay time.

Definitions

The tracing problem: To run a program and save its control flow.

Basic block: A sequence of statements that has no branch statements except possibly at the end of the block, and has no branch targets except possibly at the beginning of the block. In other words, it is a sequence of instructions that execute in a straight line. If we know that any of the statements in a basic block has executed, then we know that all the statements have executed (unless the program crashes).

Control Flow Graph: A program expressed as a directed graph in which the nodes are basic blocks and the edges are jumps or fall-throughs.

Marker: A marker is a unique id associated with a control flow graph edge. If control flows through that edge during tracing, then that id is written into a file called a trace file.

The Algorithm:

A simple algorithm for saving control flow is to place markers on every control flow graph edge. Then we will easily know the control flow.

The problem with such an algorithm is that there are too many markers being written. Every marker adds an overhead to the execution time of the program. So, we would like to minimize the number of markers written into the file while still writing enough markers to be able to deduce the control flow from the trace file.

The algorithm is as follows: For each function, determine a spanning tree of its undirectized control flow graph. Mark all edges that are not on this spanning tree. Put a special marker at the beginning and end of each function (these markers are "special" in the sense that they are not associated with control flow graph edges but with nodes.)

Justification:

Consider the following facts:

A program that has only one basic block has no need for any markers; we always know its control flow.

A program that has one if-then-else needs one marker, on either the then or the else part. The presence or absence of the marker tells us the control flow (i.e. whether then or else was executed).

A program that has one loop needs one marker, on the back edge. The number of times that marker appears tells the control flow (i.e. how many times the loop executed).

Whenever there are two or more ways to go from one node to another we must be able to distinguish between all the ways.

Two or more ways to go from one node to another is nothing but a circuit in the undirectized control flow graph.

A connected graph with no circuits is a spanning tree. Any edge added to a spanning tree causes a circuit to be formed. Such edges must be marked.

Therefore, we mark the edges that are not on the spanning tree.

Algorithm 2: Replaying

Given the checkpoints and the trace file produced during trace, the replayer must allow the user to navigate through the execution of the program—i.e. step forwards and backwards, or jump to an arbitrary point forward or backward in the execution.

Definitions:

Pre-set: The pre-set of an edge is the set of markers that can appear immediately before that edge.

Post-set: The post-set of an edge is the set of markers that can appear immediately after that edge.

Target IP: The Instruction Pointer where the user wishes to go to.

Target node: The basic block in which the target IP lies.

The Algorithm:

When the user requests an arbitrary forward jump to some target IP:

Starting from the current point in the trace file, find a pair such that the first element of the pair lies in the pre-set of the target node and the second element lies in the post-set of the target-node. Restore the last checkpoint before that location. Run the child till it hits the target IP.

Justification:

Obviously, if a node is executed, then there will be a pair (A, B) in the trace file such that A lies in the pre-set of the node and B lies in the post-set.

What is interesting is that the converse is true: if a pair (A, B) exists in the trace file such that A lies in the pre-set of the node and B lies in the post-set of the node, then that node has been executed.

We will prove this by contradiction. Assume that the pair exists but the node has not been executed. This means that there are two paths of execution between the elements of the pair—one through the node, and one not through the node. But any unmarked path between two markers must lie entirely in a spanning tree of the control flow graph. Since only one such path may exist, we have a contradiction. Hence the existence of the pair implies that the node has been executed.

This is the result that enables to simply search for a pair in the trace file.

Efficiency:

Set membership checking is implemented in O(1) time. Then, this algorithm takes time proportional to the number of markers between the source and target.

References

Ball and Larus, "Optimally Profiling And Tracing Programs"

LIZARD

Tracing and Replaying algorithms

References