In this part you need to reuse and extend the dataflow analysis framework to implement a liveness analysis and a may-point-to analysis.
Your first task for this part is to implement a liveness analysis based on the framework implemented in the previous assignment. Recall that a variable is live at a particular point in the program if its value at that point will be used in the future. It is dead otherwise.
void initializeBackwardMap(Function * func)
in 231DFA.h
builds a CFG of the given function func
for backwards analyses. Let us call it BDFA CFG. Analyses based on 231DFA.h
should work on DFA CFGs or BDFA CFGs. The choice between the two is determined by the Direction
parameter of the DataFlowAnalysis
class. Below are some of the main points of constructing BDFA CFGs:
First, the edges of a BDFA CFG are reversed with respect to the original LLVM CFG (or the respective DFA CFG for that matter).
Also, instructions may have multiple incoming and outgoing edges, so the first step of any flow function should be joining the incoming data flows.
Second, phi instructions are handled similarly to part 2: the phi
instruction is an IR instruction and any IR instruction can return at most one value. If multiple values need to be synthesized there will be a number of phi
instructions at the beginning of a basic block. I.e. in the LLVM CFG, an IR basic block may start with a number of consecutive phi
instructions. In the BDFA CFG, the first non-phi instruction connects to the first phi
in the same basic block directly, bypassing all the other phi
instructions. Below is an example that illustrates these two kinds of CFG for the same code:
There are k phi
instructions at the start of the basic block C. In the LLVM CFG, these phi
instructions are connected sequentially. In the BDFA CFG, the first non-phi instruction %res = add %a, %b
has an outgoing edge connecting to phi 1
directly. When encountering a phi
instruction, the flow function should process the series of phi
instructions together (effectively a PHI node from the lecture) rather than process each phi
instruction individually. This means that the flow function needs to look at the LLVM CFG to iterate through all the later phi
instructions at the beginning of the same basic block until the first non-phi instruction.
add
IR instruction:
%result = add i32 4, %var
%result
is the variable defined by this add
instruction. No other instructions can redefine %result
. This means that we can use the index of this add
instruction to the variable %result
.
At each program point we are interested for the set of live variables. So the domain D for this analysis is Powerset(S) where S is a set of the indices of all the instructions in the function. The bottom is the empty set. The top is S. ⊑ is ⊆ ("is subset of").
You are asked to implement flow functions that process IR instructions. The flow functions are specified below. You need to implement them in flowfunction
in your subclass of DataFlowAnalysis
. There are three categories of IR instructions:
First Category: IR instructions that return a value (define a variable)
where
operands is the set of variables used (and therefore needed to be live) in the body of the instruction, and
index is the index of the IR instruction, which corresponds to the variable <result>
being defined.
For example, in the instruction %result = add i32 4, %var
, operands would be the singleton set { I%var }, where I%var is the index of the instruction that defines %var
.
Second Category: IR instructions that do not return a value
Third Category: phi instructions
We will handle the phi instructions in a slightly specialized way to boost the precision of our analysis.
But first lets take a look at the problem that we would be facing, if we followed a naive approach. Suppose we compute a single out set, common to all outgoing edges. Following the intuition of the previous cases, this set will contain all values from <v_11> to <v_xn> that correspond to variables. This set is unnecessarily large: it includes variables that might never meet their corresponding definition following the CFG all the way to the beginning of the function. For example, the instruction corresponding to <v_11> will not be found in any basic block other than the one under label_11.
The root of the problem is that the above approach ignores an important piece of information that the phi instruction provides, namely the label of the basic block that each value originates from. The pair [<v_11>, label_11], for example, guarantees that <v_11> comes from the block labeled with label_11. Therefore, it is pointless to propagate the fact for <v_11> to any other basic block, and hence safe to avoid doing so. This intuition leads to the following flow function:
Here we include all incoming facts like before and exclude variables defined at each phi instruction. In addition, each outgoing set out[k] contains those phi variables v_ij that are defined in its matching basic block (labeled with label_k). The function ValueToInstr is merely used to extract the defining instruction from each phi value.
For the liveness analysis, you only need to consider the following IR instructions:
To implement the liveness analysis, you need to
initializeBackwardMap
method in 231DFA.h
that creates the BDFA CFG;class LivenessInfo
in
LivenessAnalysis.cpp
: the class represents the information at each program point for your liveness analysis. It should be a subclass of class Info
in 231DFA.h
;class LivenessAnalysis
in LivenessAnalysis.cpp
: this class performs the liveness analysis. It should be a subclass of DataFlowAnalysis
. Function flowfunction
needs to be implemented in this subclass according to the specifications above;LivenessAnalysisPass
in LivenessAnalysis.cpp
: This pass should be registered by the name cse231-liveness. After processing a function, this pass should print out the liveness information at each program point to stderr
. The output should be in the following form:
Edge[space][src]->Edge[space][dst]:[def 1]|[def 2]| ... [def K]|\n
where [src]
is the index of the instruction that is the start of the edge, [dst]
is the index of the instruction that is the end of the edge, [def 1]
, [def 2]
... [def K]
are indices of instructions (definitions) that reach at this program point. The order of the indices does not matter;
call
is not listed explicitly above;In part 3, you will also need to implement a may-point-to analysis based on the framework you implemented.
For this analysis, only memory objects allocated locally by LLVM IR instruction alloca can be pointees. A pointer can be either an LLVM IR variable of some pointer type (IR pointer), or a memory object (memory pointer). The analysis considers a pointer points to a memory object as long as the pointer points to any part of the object. Also, a memory object may contain multiple pointers (For example, a C struct that constains int * p1
and char * p2
). The analysis considers such a memory object as one memory pointer and it may point to the union of the the memory objects that the pointers inside of it may point to. DFA identifiers are used to reference IR pointers and memory objects: The DFA identifier of a IR pointer is Ri
where i
is the index of the defining IR instruction of this IR pointer (IR variable). The DFA identifier of a memory object is Mi
where i
is the index of the IR instruction that allocates this memory object. For example, suppose that the following IR instruction's index is 10
%ptr = alloca i32, i32 4
then the DFA identifier of the IR pointer, %ptr
, created by this instruction is R10, and the DFA identifier of the memory object allocated by this instruction is M10.
Let Pointers be the set of the DFA identifiers of the pointers in the function (including IR pointers and memory pointers) and MemoryObjects the set of the DFA identifiers of the memory objects allocated in the function. The domain D for this analysis is Powerset(S), where S={p → o | p ∊ Pointers && o ∊ MemoryObjects}. The bottom is the empty set. The top is S. ⊑ is ⊆ ("is subset of").
Like the reaching definition analysis from part 2, this analysis works at the LLVM IR level, so operations that the flow functions process are IR instructions. Again, each instruction may have multiple incoming edges, so the first step is to merge all incoming edges with the join operation. For simplicity, the flow functions below assume that the incoming edges have been merged. Also, many LLVM IR instructions have a variation that takes vector type parameter. That variation should be ignored in this analysis. The flow functions are specified below. Assume that the index of the IR instruction in question is i.
phi
instruction. You need to make sure that all the phi
instructions at the beginning of a basic block are processed properly.
To implement the may-point-to analysis, you need to create
class MayPointToInfo
in
MayPointToAnalysis.cpp
: the class represents the information at each program point for your may-point-to analysis. It should be a subclass of class Info
in 231DFA.h
;class MayPointToAnalysis
in MayPointToAnalysis.cpp
: this class performs a points-to analysis. It should be a subclass of DataFlowAnalysisflowfunction
needs to be implemented in this subclass according to the specifications above;MayPointToAnalysisPass
in MayPointToAnalysis.cpp
: This pass should be registered by the name cse231-maypointto. After processing a function, this pass should print out the points-to information at each program point to stderr
. The output should be in the following form:
Edge[space][src]->Edge[space][dst]:[point-to 1]|[point-to 2]| ... [point-to K]|\n
where [src]
is the index of the instruction that is the start of the edge, [dst]
is the index of the instruction that is the end of the edge. [point-to i]
represents what the ith pointer may point to at this moment, and it should be in the following form:
[pointer's DFA ID]->([pointee 1's DFA ID][slash][pointee 2's DFA ID][slash] ... [pointee m's DFA ID][slash])
The orders of the pointers and pointees do not matter;
submission_pt3
and the passes need to be named cse231-liveness
and cse231-maypointto