Getting Started

If you are using docker,
    There is folder under "Passes" called "DFA". You will implement your pass in this folder. The folder contains 231DFA.h, which you will have to modify and use in this part of the project. As usual, after you launch Docker, the DFA folder will be found at /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/.
If you are not using docker,
  1. Download 231DFA.h.

Dataflow Analysis Framework

You need to implement a generic intra-procedural dataflow analysis framework. We provide an incomplete base template class class DataFlowAnalysis in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/231DFA.h. To create a dataflow analysis based on the base class DataFlowAnalysis, you need to provide:

  1. class Info: the class that represents the information at each program point;
  2. bool Direction: the direction of analysis. If it is true, then the analysis is a forward analysis; otherwise it is a backward analysis.
  3. Info InitialState: the initial state of the analysis.
  4. Info Bottom: the bottom of the lattice.
The first two parameters are the parameters of the template class DataFlowAnalysis. For example, to create subclass of DataFlowAnalysis for a forward analysis in which information is represented by class MyInfo, we use

class MyForwardAnalysis : public DataFlowAnalysis<MyInfo, true> { ... };

The last two parameters are the parameters to the constructor of DataFlowAnalysis and its subclasses. For example, in class MyForwardAnalysis, we need to implement a constructor declared as MyForwardAnalysis(MyInfo & InitialStates, MyInfo & Bottom);.

Directions

You need to implement class DataFlowAnalysis in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/231DFA.h so that it performs a forward analysis. We provide most of the code for forward analysis in class DataFlowAnalysis. You only need to implement the worklist algorithm in function runWorklistAlgorithm.

Reaching Definition Analysis

In part 2, you will also need to implement a reaching definition analysis based on the framework you implemented.

Control Flow Graph (CFG)

LLVM builds a CFG for a given function, and this CFG is available when your function pass is running. Let us call it LLVM CFG. The function void initializeForwardMap(Function * func) in 231DFA.h also builds a CFG of the given function func for forward analyses. Let us call it DFA CFG. Analyses based on 231DFA.h should work on DFA CFGs. DFA CFGs are slightly different from what we have seen during lectures.

First, any instruction may have more than one incoming data flows, as is shown below

So the first step of any flow function should be joining the incoming data flows.
Second, in the lecture PHI nodes are used to synthesize values from different paths in an SSA representation. In LLVM IR, the phi instructions serve a similar functionality. However, because the phi instruction is an IR instruction and any IR instruction can return at most one value, if multiple values need to be synthesized there will be a number of phi instructions at the beginning of a basic block. I.e. in the LLVM CFG, an IR basic block may start with a number of consecutive phi instructions. In the DFA CFG, the first phi instruction connects to the first non-phi instruction in the same basic block directly, bypassing all the other phi instructions. Below is an example to illustrate these two kinds of CFG for the same code:

There are k phi instructions at the start of the basic block C. In the LLVM CFG, these phi instructions are connected sequentially. In the DFA CFG, phi 1 has an outgoing edge connecting to the first non-phi instruction %res = add %a, %b directly. When encountering a phi instruction, the flow function should process the series of phi instructions together (effectively a PHI node from the lecture) rather than process each phi instruction individually. This means that the flow function needs to look at the LLVM CFG to iterate through all the later phi instructions at the beginning of the same basic block until the first non-phi instruction.

Lattice

This analysis operates on LLVM IR, so we only care about the reaching definitions of LLVM IR variables. Note that LLVM IR is in Single Static Assignment (SSA) form, so every LLVM IR variable has exactly one definion. Also, because any LLVM IR instruction that can define a variable (i.e. has a non-void return type), can only define one variable at a time, there is a one-to-one mapping between LLVM IR variables and IR instructions that can define variables. For example, below is an add IR instruction:
%result = add i32 4, %var
%result is the variable defined by this add instruction. None of any other instructions can redefine %result. This means that we can use the index of this add instruction to represent this definition of %result. Since a definition can be represented by the index of the defining instruction, unlike the reaching analysis taught in class, the domain D for this analysis is Powerset(S) where S is a set of the indices of all the instructions in the function. The bottom is the empty set. The top is S. ⊑ is ⊆ ("is subset of").

Flow Function

This analysis works at the LLVM IR level, so operations that the flow functions process are IR instructions. The flow functions are specified below. You need to implement them in flowfunction in your subclass of DataFlowAnalysis. There are three categories of IR instructions:

First Category: IR instructions that return a value (defines a variable)

where index is the index of the IR instruction, which corresponds to the variable <result> being defined.

Second Category: IR instructions that do not return a value

Third Category: phi instructions

where indices is the set of all the indices of the phi instructions.

For the reaching definition analysis, you only need to consider the following IR instructions:

  1. All the instructions under binary operations;
  2. All the instructions under binary bitwise operations;
  3. br;
  4. switch;
  5. alloca;
  6. load;
  7. store;
  8. getelementptr;
  9. icmp;
  10. fcmp;
  11. phi;
  12. select.
Every instruction above falls into one of the three categories. If an instruction has multiple outgoing edges, all edges have the same information. Any other IR instructions that are not mentioned above should be treated as IR instructions that do not return a value (the second categories above).

Directions

To implement the reaching definition analysis, you need to create

  1. class ReachingInfo in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/DFA/ReachingDefinitionAnalysis.cpp: the class represents the information at each program point for your reaching definition analysis. It should be a subclass of class Info in 231DFA.h;
  2. class ReachingDefinitionAnalysis in ReachingDefinitionAnalysis.cpp: this class performs reaching definition analysis. It should be a subclass of DataFlowAnalysis. Function flowfunction needs to be implemented in this subclass according to the specifications above;
  3. A function pass called ReachingDefinitionAnalysisPass in ReachingDefinitionAnalysis.cpp: This pass should be registered by the name cse231-reaching. After processing a function, this pass should print out the reaching definition information at each progrm point to stderr. The output should be in the following form:
    Edge[space][src]->Edge[space][dst]:[def 1]|[def 2]| ... [def K]|\n
    
    where [src] is the index of the instruction that is the start of the edge, [dst] is the index of the instruction that is the end of the edge, [def 1], [def 2] ... [def K] are indices of instructions (definitions) that reach at this program point. The order of the indices does not matter;
  4. You may assume that only C functions will be used to test your implementation;
  5. You may assume that the testing functions do not make any functions calls. However, LLVM does insert calls to intrinsics for analysis and optimization purposes. You should just treat them as second category instructions in your flow function, as call is not listed explicitly above;
  6. You have to implement the exact flow functions as specified above. Given these flow functions, you have to get the most precise result for each edge.

Turnin Instructions

The turnin script for part 2 is here.

Directions

  • Put the script in the same directory as the following files:
    • 231DFA.h
    • ReachingDefinitionAnalysis.cpp
  • Run it in that directory.
  • You may submit your code as many times as you like before the deadline.

Tutorial

You can find a an overview on the dataflow analysis project here.