Discussion slides can be accessed from here.
Here are some clarifications that might help you in formulating the structure of your project -
ConstantPropAnalysis.cpp
In this part you will implement a modified global variables (mod) analysis. The results of the MOD analysis will be used in the flow functions of the ConstPropAnalysis
Your first task for this part is to implement an inter-procedural modified global variables analysis. For each function, our analysis pass describes which global variables may be modified. This analysis pass operates on LLVM IR.
Implementation hint - Just to make things less complicated, for now, we will store all the variables that adhere to the conditions mentioned below irrespective of them being global or not. While printing out the information in the print function for ConstantPropAnalysis, we will keep an additional condition so that we only print out the global variables.
function foo (....&operand....){}
This implies that the operand is passed by reference to the function and therefore can be pointed/modified by a variable inside the function. Therefore, we will add this operand to the MPT set. Use&
data type provided by LLVM. For any given call instruction I, you can get the operands that were passed by reference as follows - for (Use& operand : I.operands()) {}
std::set <Value*> MPT;
glob = ....
*var = ....
, you'll need to add the subset of MPT containing GlobalVariables to the LMOD for that particular function. (This is the condition where MPT needs to be added to MOD as mentioned in before getting started section)
Note: Since this is a MAY analysis we can choose to not remove any thing from our Infos. The analysis will be correct but may be not very precise.
Now we come to CMOD, the tricky part. To calculate CMOD, you will need to know the caller-callee relationships between functions. This information can be obtained from the call graph. Every function's CMOD is the union of all of its callee's MODs.231DFA.h
, we will use the concept of strongly connected components here and that is precisely the reason why we use the CallGraphSCCPass
, because it gives us those strongly connected components.
More information on SCC can be found here. You don't need to worry about the actual details of SCC but all you need to remember is that there's a smart way to start iterating over your functions and the SCC pass gives you the necessary APIs to do that.
doInitialization
method, your runOnSCC
method(which is analogous to the runOnFunction method of a function pass), and the doFinalization
method.
Detailed information about this pass can be found on LLVMs documentation here
The following calculation are supposed to be made in each of the methods -bool doInitialization(CallGraph &CG)
You'll build your MPT and LMOD data structures in this method. Just one run across the call graph is sufficient for calculating them
Remember that since you have the call graph with you, you have access to all the functions in the module.
bool runOnSCC(CallGraphSCC &SCC)
You'll build your CMOD data structures in this method
The SCC
will give you all the caller-callee relationship that you can use to calculate the MOD for the functions in your module.
You'll get the LMOD[caller]
and then you'll get the MOD[callee]
. The union of these two things is MOD[caller]
bool doFinalization(CallGraph &CG)
The above two functions would have built the MOD map and MPT set.
This is the function where you'll iterate over the entire call graph and call the worklist algorithm that you had written in 231DFA.h
to do the Constant Prop Analysis on each function and print the analysis results.
The SCC data structure contains the CallGraphNode. The CallGraphNode
further provides you access to the function that is present in that CallGraphNode. To access the function present in a CallGraphNode, use the getFunction() method on it.
Once you get the Function from the CallGraphNode, you need to find the Callee(s) of that function, if any, and union the MODS.
One special case I want to draw your attention towards is -
Notice that in this case the MOD[main] = MOD[foo], otherwise we would never reach a fixed point
This basically implies that the MOD of all the functions within one specific SCC has to be the same.The following functions might be useful when working with a callgraph -
CG.getModule().getGlobalList()
- gives a list of GlobalVariables in the call graphCG.getModule().functions()
- gives you a list of all the functions in the call graphFunction*
as key and GlobalVariable
set as the valuestd::unordered_map<Function*, std::set<GlobalVariable*>> MOD;
To implement the mod analysis, you need to
In part 4, you will also need to implement a constant propagation analysis based on the data-flow analysis framework you used earlier in the project. The analysis describes at each point in the program which global variables must be a constant value, and their corresponding constant values.
For this analysis, you will map every global variable to a single constant value, top (not constant), or bottom (all constants). As discussed in lectures earlier in the class, using a lattice of height 2 for every variable guarantees termination in the worklist algorithm.
Again, we'll be similar same flow functions to the flow functions described in lecture. You should use LLVM's ConstantFolder
to fold in unary and binary expressions with constant operands into a constant. You need to include #include "llvm/IR/ConstantFolder.h"
for ConstantFolder to work
To simplify the analysis, after a call instruction we set all variables in the callee's MOD set to top (we assume every variable in MOD was modified to some non constant value).
Also, when we encounter an instruction which modifies a dereferenced pointer, set all variables in MPT to top.
NOTE - We will not be providing explicit flow function definitions for this analysis like we did in the previous parts. It is your responsibility to come up with the flow functions yourselves. To check if your understanding is correct, run it against the reference solution provided or if you really can't figure out the flow function for any instruction, discussions on Piazza/OHs are always welcome
For the ConstantProp analysis, you only need to consider the following IR instructions:
These following data structures can be used while implementing your constant prop analysis :
enum ConstState { Bottom, Const, Top };
- enum to store which state the operand belongs tostruct Const { ConstState state ; Constant* value; };
- struct for storing info about a constant valuetypedef std::unordered_map<Value*, struct Const> ConstPropContent;
- the map used by ConstPropInfo for storing the contentYou are in no way limited to using these data structures, if you have a better design in mind, please feel free to use that. If you do decide to use this design then you don't need to worry about the IndexToInstr and InstrToIndex map. You can directly store the instructions as is
To implement the constant propagation analysis, you need to create
class ConstPropInfo
in
ConstPropAnalysis.cpp
: the class represents the information at each program point for your constant propagation analysis. It should be a subclass of class Info
in 231DFA.h
;class ConstPropAnalysis
in ConstPropAnalysis.cpp
: this class performs a constant propagation analysis. It should be a subclass of DataFlowAnalysisflowfunction
needs to be implemented in this subclass according to the specifications above;ConstPropAnalysisPass
in ConstPropAnalysis.cpp
: This pass should be registered by the name cse231-constprop. After processing a function, this pass should print out the constant propagation information at each program point to stderr
. The output should be in the following form:
Edge[space][src]->Edge[space][dst]:[glob1]=[val1]|[glob2]=[val2]| ... [globK]=[valK]|\n
where [src]
is the index of the instruction that is the start of the edge, [dst]
is the index of the instruction that is the end of the edge. [globi]
is a string representation of the ith global variable (the variable name). [vali]
is the string representation of the ith value (⊤, ⊥, or a constant value).
To help you test your code, we provide our solution contained in the docker image. All solutions have been compiled in a module named "231_solution.so". Our constant propagation analysis solution is registered with the same name (cse231-constprop). For example, to run the constant propagation solution, type
opt -load 231_solution.so -cse231-constprop < input.ll > /dev/null
Moreover, you are also required to write 10 test cases of your own, ideally, covering all the instruction types mentioned above and make sure that the output produced by your solution is the same as that you the solution provided to you. You'll be required to submit these test cases and you'll be graded on them. Submission instructions can be found in the section below
submission_pt4
and the pass need to be named cse231-constprop
test_{i}.c
or test_{i}.cpp
In conclusion, within the zipped folder named tests.zip
you should have the following test files - test_1.cpp, test_2.cpp......test_10.cpp
We will test these cases against our solution as well as your submission and grade you based on the output produced.
This is worth 10% of this assignment's grade