phi
instruction.In this part you need to implement a generic optimistic iterative dataflow analysis framework, and a reach definition analysis on top of it. The framework will be reused and extended in part 3.
docker pull prodromou87/llvm:3.9
This will give you the latest Docker image with the solution of part 2 in /solutions/optgit pull origin master
to retrieve the latest version. We created a new folder under "Passes" called DFA. You will implement your pass in this folder. The folder contains the 231DFA.h
, which you will have to modify and use in this part of the project. As usual, after you launch Docker, the DFA folder will be found at /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/
231DFA.h
and the updated solution opt
here.You need to implement a generic intra-procedural dataflow analysis framework. We provide an incomplete base template class class DataFlowAnalysis
in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/231DFA.h
. To create a dataflow analysis based on the base class DataFlowAnalysis
, you need to provide:
class Info
: the class that represents the information at each program point;bool Direction
: the direction of analysis. If it is true, then the analysis is a forward analysis; otherwise it is a backward analysis.Info InitialState
: the initial state of the analysis.Info Bottom
: the bottom of the lattice.DataFlowAnalysis
. For example, to create subclass of DataFlowAnalysis
for a forward analysis in which information is represented by class MyInfo
, we use
class MyForwardAnalysis : public DataFlowAnalysis<MyInfo, true> { ... };
The last two parameters are the parameters to the constructor of DataFlowAnalysis
and its subclasses. For example, in class MyForwardAnalysis
, we need to implement a constructor declared as MyForwardAnalysis(MyInfo & InitialStates, MyInfo & Bottom);
.
You need to implement class DataFlowAnalysis
in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/231DFA.h
so that it performs a forward analysis. We provide most of the code for forward analysis in class DataFlowAnalysis
. You only need to implement the worklist algorithm in function runWorklistAlgorithm
.
In part 2, you will also need to implement a reaching definition analysis based on the framework you implemented.
void initializeForwardMap(Function * func)
in 231DFA.h
also builds a CFG of the given function func
for forward analyses. Let us call it DFA CFG. Analyses based on 231DFA.h
should work on DFA CFGs. DFA CFGs are slightly different from what we have seen during lectures.
First, any instruction may have more than one incoming data flows, as is shown below
So the first step of any flow function should be joining the incoming data flows.
Second, in the lecture PHI nodes are used to synthesize values from different
paths in an SSA representation. In LLVM IR,
the phi instructions serve a similar functionality. However, because the phi
instruction is an IR instruction and any IR instruction can return at most one value, if multiple values need to be synthesized there will be a number of phi
instructions at the beginning of a basic block. I.e. in the LLVM CFG, an IR basic block may start with a number of consecutive phi
instructions. In the DFA CFG, the first phi
instruction connects to the first non-phi instruction in the same basic block directly, bypassing all the other phi
instructions. Below is an example to illustrate these two kinds of CFG for the same code:
There are k phi
instructions at the start of the basic block C. In the LLVM CFG, these phi
instructions are connected sequentially. In the DFA CFG, phi 1
has an outgoing edge connecting to the first non-phi instruction %res = add %a, %b
directly. When encountering a phi
instruction, the flow function should process the series of phi
instructions together (effectively a PHI node from the lecture) rather than process each phi
instruction individually. This means that the flow function needs to look at the LLVM CFG to iterate through all the later phi
instructions at the beginning of the same basic block until the first non-phi instruction.
add
IR instruction:
%result = add i32 4, %var
%result
is the variable defined by this add
instruction. None of any other instructions can redefine %result
. This means that we can use the index of this add
instruction to represent this definition of %result
.
Since a definition can be represented by the index of the defining instruction, unlike the reaching analysis taught in class, the domain D for this analysis is Powerset(S) where S is a set of the indices of all the instructions in the function. The bottom is the empty set. The top is S. ⊑ is ⊆ ("is subset of").
This analysis works at the LLVM IR level, so operations that the flow functions process are IR instructions. The flow functions are specified below. You need to implement them in flowfunction
in your subclass of DataFlowAnalysis
. There are three categories of IR instructions:
First Category: IR instructions that return a value (defines a variable)
where index is the index of the IR instruction, which corresponds to the variable <result>
being defined.
Second Category: IR instructions that do not return a value
Third Category: phi instructions
where indices is the set of all the indices of the phi instructions.
For the reaching definition analysis, you only need to consider the following IR instructions:
To implement the reaching definition analysis, you need to create
class ReachingInfo
in
/LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/ReachingDefinitionAnalysis.cpp
: the class represents the information at each program point for your reaching definition analysis. It should be a subclass of class Info
in 231DFA.h
;class ReachingDefinitionAnalysis
in ReachingDefinitionAnalysis.cpp
: this class performs reaching definition analysis. It should be a subclass of DataFlowAnalysisflowfunction
needs to be implemented in this subclass according to the specifications above;ReachingDefinitionAnalysisPass
in ReachingDefinitionAnalysis.cpp
: This pass should be registered by the name cse231-reaching. After processing a function, this pass should print out the reaching definition information at each progrm point to stderr. The output should be in the following form:
Edge[space][src]->Edge[space][dst]:[def 1]|[def 2]| ... [def K]|\n
where [src]
is the index of the instruction that is the start of the edge, [dst]
is the index of the instruction that is the end of the edge, [def 1]
, [def 2]
... [def K]
are indices of instructions (definitions) that reach at this program point. The order of the indices does not matter;
call
is not listed explicitly above.The turnin script for part 2 is here.
Directions
231DFA.h
ReachingDefinitionAnalysis.cpp