The purpose of this project is to get you acquainted with LLVM. In particular, it will give you familiarity with the data structures, types, and code organization you will need to understand to implement program analyses. All code will be written in C++, as that is the language in which LLVM is written.

Section 1: Collecting Static Instruction Counts

Your task is to write a function pass that statically counts how many times each kind of instruction (opcode) appears in a function. After processing a function, the pass should output the counts to stderr in the following format:

[instruction name]\t[count]\n

For example, if the pass processes a function that consists of 2 load and 3 add instructions, the output should be:

load 2
add 3

Directions

Implement the pass in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/Passes/part1/CountStaticInstructions.cpp as a function pass.
Register the pass by the name cse231-csi. This means that to run this pass, type
```
 opt -load /LLVM_ROOT/build/lib/CSE231.so -cse231-csi < input.bc > /dev/null 
```
where CSE231.so is the shared object containing your pass and input.bc is the bitcode file to be analyzed. You may name the shared object whatever you want; the following two sections also assume the name CSE231.so.
The order of the instructions in output does not matter.
For each instruction, the name printed out by the pass should be consistent with Instruction::getOpcodeName.
Do not use the STATISTIC convenience macro provided by LLVM.
You may assume that all the test cases are in C.
You may assume that each test case (compilation unit) only has one function.

Hints

Modules, Functions, and BasicBlocks each provide iterators over their children. This section of the LLVM Programmer’s Guide is a good resource for understanding this interface.
You may use any readily available data structure, such as the STL’s std::map or LLVM’s DenseMap, to store instruction counts.
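
The counting and output logic described above can be sketched without LLVM. The following is a minimal illustration, not the reference solution: MockBlock and MockFunction are hypothetical stand-ins for LLVM's BasicBlock and Function, and the strings stand in for the names that Instruction::getOpcodeName would return. The function names countStatic and formatCounts are also assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM's IR hierarchy: a function is a list of
// basic blocks, and a block is a list of opcode names (the strings that
// Instruction::getOpcodeName would return in a real pass).
using MockBlock = std::vector<std::string>;
using MockFunction = std::vector<MockBlock>;

// Walk every block and every instruction, tallying by opcode name.
std::map<std::string, unsigned> countStatic(const MockFunction &fn) {
  std::map<std::string, unsigned> counts;
  for (const MockBlock &block : fn) {
    for (const std::string &opcode : block) {
      assert(!opcode.empty() && "every instruction has an opcode name");
      ++counts[opcode];
    }
  }
  return counts;
}

// Render the tally as "[instruction name]\t[count]\n" lines. A real pass
// would write these lines to stderr instead of returning a string.
std::string formatCounts(const std::map<std::string, unsigned> &counts) {
  std::ostringstream out;
  for (const auto &entry : counts)
    out << entry.first << '\t' << entry.second << '\n';
  return out.str();
}
```

In an actual pass, the two nested loops would iterate over the Function and its BasicBlocks through the iterators mentioned in the first hint. std::map happens to print the names in sorted order, which is acceptable because the order of the output does not matter.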

Section 2: Collecting Dynamic Instruction Counts

The analysis you wrote in Section 1 was a static analysis because it did not actually execute the program to analyze it. In this section, you will write a dynamic analysis that executes the program and collects runtime information. In particular, you will write a pass that instruments the original program by inserting code that counts the number of times each instruction executes. Each instrumented function should output the analysis results before function termination.

The general strategy you should follow:

You need a C++ runtime library to keep track of the runtime information (in this section, how many times each instruction is executed). We provide a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures. You are free to implement your own runtime library, but you must name it lib231.cpp.
From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
When it’s time to run the analysis, run your LLVM pass on the original bitcode file, link the runtime library to the instrumented bitcode, then execute the linked program.
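
If you write your own runtime library, its core can be quite small. The sketch below shows one possible shape and is not the provided lib231.cpp: the names recordInstructions and printAndResetCounts, and the choice to pass opcode names as C strings, are assumptions made for illustration. It prints to stderr to match the Section 1 format.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Global tally shared by all instrumented code. It is not thread-safe,
// which is fine when multi-threading is out of scope.
static std::map<std::string, unsigned long> dynamicCounts;

// Hypothetical entry point: instrumented code calls this to report that
// `count` instructions named `opcodeName` have just executed.
// extern "C" keeps the symbol name unmangled, so the pass can refer to it
// by the same name that appears in the source.
extern "C" void recordInstructions(const char *opcodeName, unsigned count) {
  assert(opcodeName != nullptr);
  dynamicCounts[opcodeName] += count;
}

// Hypothetical entry point: called just before an instrumented function
// returns. Prints "[instruction name]\t[count]\n" lines to stderr, then
// clears the tally so the next instrumented function starts from zero.
extern "C" void printAndResetCounts() {
  for (const auto &entry : dynamicCounts)
    std::fprintf(stderr, "%s\t%lu\n", entry.first.c_str(), entry.second);
  dynamicCounts.clear();
}
```

With this design, the pass only needs to insert calls to two functions: one that updates the tally and one that prints it before each return.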

Directions

Implement the pass in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/Passes/part1/CountDynamicInstructions.cpp as a function pass.
Register the pass by the name cse231-cdi. This means that to run this pass, type
```
 opt -load /LLVM_ROOT/build/lib/CSE231.so -cse231-cdi < input.bc -o input-instrumented.bc 
```
where input.bc is the original bitcode file and input-instrumented.bc is the instrumented version.
The output of your analysis should be in the same format as in Section 1 (the only change is that the numbers are now dynamic instruction counts).
You may assume that the execution of each instrumented function always terminates normally.
You may assume that a function that will be instrumented by your pass does not make any function calls.
If you develop your own runtime library, you do not need to worry about multi-threading.
When you link the runtime library with the instrumented code, you may want to put the runtime library before the instrumented code. For example, if input-instrumented.bc is the instrumented code and lib231.bc is the library, link them by running
```
clang++ lib231.bc input-instrumented.bc -o mytest
```
(Why do we need this?)

Hints

A basic block is a single-entry, single-exit section of code. For this reason, you are guaranteed that if execution enters a basic block, all instructions in the basic block will be executed in a straight line. You can use this to your advantage to avoid instrumenting every instruction individually.
For inserting new instructions, have a look at the IRBuilder class, which is a convenience class for adding instructions. IRBuilder::SetInsertPoint sets the insertion point, and various create... methods on the IRBuilder instance insert instructions at the specified point.
To insert calls to a function, you will find the following functions useful:
- FunctionType::get
- Module::getOrInsertFunction
- IRBuilder::CreateCall
Clang mangles the names of externally visible C++ functions according to the standard set forth in the Itanium C++ ABI. (Although this may seem strange, consider what would happen if we tried to compile multiple overloaded functions without name mangling.) This is why the function names you see in LLVM IR differ from the ones you see in the source code.
You may be stuck wondering what instruction sequence corresponds to the C++ statements you want to insert. A useful approach is to write out the C++, compile it to LLVM IR with Clang, then examine the resulting IR.
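
To make the basic-block hint concrete, the sketch below simulates it without LLVM. It is an illustration under stated assumptions, not the reference solution: MockBlock is a hypothetical stand-in for a BasicBlock, the strings stand in for opcode names, and `trace` is an invented list of block indices in the order they executed. Each block is summarized once, and every execution of a block adds its whole summary in a single step rather than once per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using OpcodeCounts = std::map<std::string, unsigned long>;
using MockBlock = std::vector<std::string>;  // opcode names (hypothetical)

// "Compile time": summarize one block. A real pass would compute this once
// per basic block while instrumenting.
OpcodeCounts summarizeBlock(const MockBlock &block) {
  OpcodeCounts summary;
  for (const std::string &opcode : block)
    ++summary[opcode];
  return summary;
}

// "Run time": each time a block executes, add its whole summary at once.
// `trace` lists the indices of the blocks in the order they executed.
OpcodeCounts countDynamic(const std::vector<MockBlock> &blocks,
                          const std::vector<std::size_t> &trace) {
  std::vector<OpcodeCounts> summaries;
  for (const MockBlock &block : blocks)
    summaries.push_back(summarizeBlock(block));

  OpcodeCounts totals;
  for (std::size_t index : trace) {
    assert(index < summaries.size());
    for (const auto &entry : summaries[index])
      totals[entry.first] += entry.second;
  }
  return totals;
}
```

In an instrumented program, the inner update would correspond to one call into the runtime library per basic block execution, carrying that block's precomputed counts.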

Section 3: Profiling Branch Bias

Now, write a dynamic analysis that computes the branch bias on a per-function basis: count the number of times conditional branch instructions are executed and the number of times conditional branch instructions are taken. Note that we only consider conditional branches. A conditional branch is considered taken if its condition evaluates to true. Each instrumented function should output these two counts before function termination. The output should be in the following format:

taken\t[count of taken]\n
total\t[count of total]\n

The general strategy you should follow:

You need a C++ runtime library to keep track of the runtime information (in this section, how many times a branch is executed or taken). We provide a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures. You are free to implement your own runtime library, but you must name it lib231.cpp and put it in /lib231/.
From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
When it’s time to run the analysis, run your LLVM pass on the original bitcode file, link the runtime library to the instrumented bitcode, then execute the linked program.
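
For branch bias, the runtime side reduces to two counters. The following is a minimal sketch, not the provided lib231.cpp: recordBranch and printAndResetBranchCounts are hypothetical names, and writing to stderr is an assumption carried over from Section 1.

```cpp
#include <cassert>
#include <cstdio>

// Per-function counters. Not thread-safe, which is fine when
// multi-threading is out of scope.
static unsigned long branchesTaken = 0;
static unsigned long branchesTotal = 0;

// Hypothetical entry point: instrumented code calls this immediately before
// each conditional branch, passing the branch condition. A branch counts as
// taken when its condition is true.
extern "C" void recordBranch(bool taken) {
  ++branchesTotal;
  if (taken)
    ++branchesTaken;
}

// Hypothetical entry point: called just before an instrumented function
// returns. Prints "taken\t[count]\n" and "total\t[count]\n", then resets
// both counters so the next instrumented function starts from zero.
extern "C" void printAndResetBranchCounts() {
  assert(branchesTaken <= branchesTotal);
  std::fprintf(stderr, "taken\t%lu\ntotal\t%lu\n", branchesTaken,
               branchesTotal);
  branchesTaken = 0;
  branchesTotal = 0;
}
```

The pass would then insert a call to the recording function before every conditional branch, using the branch's condition value as the argument, and a call to the printing function before each return.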

Directions

Implement the pass in /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project/Passes/part1/BranchBias.cpp as a function pass.
Register the pass by the name cse231-bb. This means that to run this pass, type
```
 opt -load /LLVM_ROOT/build/lib/CSE231.so -cse231-bb < input.bc -o input-instrumented.bc 
```
where input.bc is the original bitcode file and input-instrumented.bc is the instrumented version.
You may assume that the execution of each instrumented function always terminates normally.
You may assume that a function that will be instrumented by your pass does not make any function calls.
If you develop your own runtime library, you do not need to worry about multi-threading.
When you link the runtime library with the instrumented code, you may want to put the runtime library before the instrumented code. For example, if input-instrumented.bc is the instrumented code and lib231.bc is the library, link them by running
```
clang++ lib231.bc input-instrumented.bc -o mytest
```
(Why do we need this?)

Testing

To help you test your code, we provide our solution as a binary. If you use the docker image, you can find it at /solution/opt. Otherwise, you can download it here. All three passes have been statically linked into opt, which means that you do not need to load any shared object to run them. For example, to run the cse231-csi pass from Section 1, type

 /solution/opt -cse231-csi < input.bc > /dev/null

Note that the cse231-cdi and cse231-bb passes assume the use of our runtime library. That is, you have to link the bitcode that is instrumented by the solution opt with the runtime library we provide.

In addition, we provide you a test case in /tests/test-example. This directory has test code in C/C++, and a bash script, /tests/test-example/run.sh, that builds and runs the tests.

Directions

Make sure that the paths set at the beginning of /tests/test-example/run.sh are correct.
/tests/test-example/run.sh runs all three passes. If you have not implemented them all, comment out the lines for the passes you have not implemented.
The outputs of cse231-csi, cse231-cdi, and cse231-bb are saved in /tmp/csi.result, /tmp/cdi.result, and /tmp/bb.result, respectively.

Turnin Instructions

The turnin script for part 1 is here.

Directions

Put the script in the same directory as the following files:
- CountStaticInstructions.cpp
- CountDynamicInstructions.cpp
- BranchBias.cpp
- lib231.cpp
Run it in that directory.
You may submit your code as many times as you like before the deadline.

Tutorial

You can find a tutorial on the relevant parts of the LLVM API here and here.

Part 1

Due Jan 26 11:59:59 PM
