The purpose of this project is to get you acquainted with LLVM. In particular, it will give you familiarity with the data structures, types, and code organization you will need to understand to implement program analyses. All code will be written in C++, as that is the language in which LLVM is written.

Section 1: Collecting Static Instruction Counts

Your task is to write a function pass that statically counts how many times each distinct instruction (opcode) appears in a function. After processing a function, the pass should output the counts to stderr in the following format:

[instruction name]\t[count]\n

For example, if the pass processes a function that consists of 2 load and 3 add instructions, the output should be:

load 2
add 3

Directions

Create a new directory called part1 in the Passes directory you pulled from GitHub in part 0. Implement your pass in that directory (.../Passes/part1/CountStaticInstructions.cpp). This pass will be implemented as a function pass.
Create a CMakeLists.txt file inside the part1 directory. Name the module submission_pt1. You can use a different name while developing, but you must switch to this name before submitting to Gradescope. See the provided test pass for an example of how to do this.
Update the CMakeLists.txt file in the higher-level directory (Passes) to include the new pass.
Register the pass by the name cse231-csi. This means that to run this pass, type
```
 opt -load submission_pt1.so -cse231-csi < input.ll > /dev/null 
```
where input.ll is the IR code file to be analyzed.
The order of the instructions in output does not matter.
For each instruction, the name printed out by the pass should be consistent with Instruction::getOpcodeName.
Do not use the STATISTIC convenience macro provided by LLVM.
You may assume that all the test cases are in C.
You may assume that each test case (compilation unit) only has one function.

Hints

Modules, Functions, and BasicBlocks each provide iterators over their children. The "Basic Inspection and Traversal Routines" section of the LLVM Programmer’s Manual is a good resource for understanding this interface.
You are free to use (but not limited to) readily-available data structures such as STL’s std::map or LLVM’s DenseMap to store instruction counts.
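
As a rough illustration of the counting and output format described above, here is a minimal standalone sketch. It does not use LLVM at all: the OpcodeList type, countOpcodes, and formatCounts are hypothetical names made up for this example, and the strings stand in for what Instruction::getOpcodeName would return. In an actual pass you would fill the map while iterating over the function's basic blocks and instructions, and write the result to stderr.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one function's instruction stream: each entry is
// the string that Instruction::getOpcodeName() would return in a real pass.
using OpcodeList = std::vector<std::string>;

// Tally how many times each opcode name appears.
std::map<std::string, unsigned> countOpcodes(const OpcodeList &opcodes) {
  std::map<std::string, unsigned> counts;
  for (const std::string &name : opcodes) {
    assert(!name.empty() && "opcode names are never empty");
    ++counts[name];
  }
  return counts;
}

// Render the counts as "[instruction name]\t[count]\n", one line per opcode.
// std::map iterates in key order, so the lines come out sorted by name.
std::string formatCounts(const std::map<std::string, unsigned> &counts) {
  std::ostringstream out;
  for (const auto &entry : counts)
    out << entry.first << '\t' << entry.second << '\n';
  return out.str();
}
```

Since the order of the output does not matter, the sorted order produced by std::map is only a side effect of the container choice; a DenseMap or std::unordered_map would work equally well.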

Section 2: Collecting Dynamic Instruction Counts

The analysis you wrote in Section 1 was a static analysis because it did not actually execute the program to analyze it. In this section, we are going to write a dynamic analysis that executes the program and collects runtime information. In particular, you will write a pass that instruments the original program by inserting code to count the number of times each instruction executes. Each instrumented function should output the analysis results before function termination.

The general strategy you should follow:

You need a C++ runtime library to keep track of the runtime information (in this section, how many times each instruction is executed). We provide a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures.
From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
When it’s time to run the analysis, run your LLVM pass on the original IR file, link the runtime library to the instrumented bitcode, then execute the linked program.
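
To make the role of the runtime library more concrete, here is a minimal sketch of what such a library could look like. It is not the provided lib231.cpp: demo_update_counts and demoDrainCounts are hypothetical names invented for this example, and the sketch prints numeric opcodes instead of opcode names so that it stays free of LLVM headers. The entry point is declared extern "C" so that its symbol name is not mangled, which is one way to make a runtime function easy to reference from instrumented code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Global table: opcode number -> number of times it has executed so far.
static std::map<uint32_t, uint64_t> g_dynamicCounts;

// Hypothetical runtime entry point. The instrumented program would call this
// with parallel arrays: opcodes[i] executed counts[i] more times.
extern "C" void demo_update_counts(uint32_t n, const uint32_t *opcodes,
                                   const uint32_t *counts) {
  assert(n == 0 || (opcodes != nullptr && counts != nullptr));
  for (uint32_t i = 0; i < n; ++i)
    g_dynamicCounts[opcodes[i]] += counts[i];
}

// Hypothetical helper: format the table as "[opcode]\t[count]\n" lines and
// clear it, so the next instrumented function starts from zero.
std::string demoDrainCounts() {
  std::ostringstream out;
  for (const auto &entry : g_dynamicCounts)
    out << entry.first << '\t' << entry.second << '\n';
  g_dynamicCounts.clear();
  return out.str();
}
```

In this model, an instrumented function would call demo_update_counts as it runs and emit the drained table to stderr just before it returns.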

Directions

Implement the pass in .../Passes/part1/CountDynamicInstructions.cpp as a function pass.
Register the pass by the name cse231-cdi. This means that to run this pass, type
```
 opt -load submission_pt1.so -cse231-cdi < input.ll -o input-instrumented.bc 
```
where input.ll is the original IR file and input-instrumented.bc is the instrumented version.
The output of your analysis should be in the same format as in Section 1 (the only change is that the numbers are now dynamic instruction counts).
You may assume that the execution of each instrumented function always terminates normally.
You may assume that a function that will be instrumented by your pass does not make any function calls.
When you link the runtime library with the instrumented code, you may want to put the runtime library before the instrumented code. For example, if input-instrumented.bc is the instrumented code and lib231.bc is the library, link them by running
```
clang++ lib231.bc input-instrumented.bc -o mytest
```

Info!

We recently updated the provided lib231.cpp file. Due to this change, compiling it requires some extra flags. Read below for further information. You can also take a look at the provided run.sh scripts, found under /tests/conditionalSum and /tests/test-example.

Our latest lib231.cpp calls the Instruction::getOpcodeName() LLVM function. Since lib231.cpp is not contained within the LLVM directory structure (i.e. it's not under /LLVM_ROOT/llvm/....), we have to provide clang with flags that point to the appropriate include directories. LLVM provides the llvm-config tool, which generates the appropriate flags given some arguments. Specifically for our case, the shell command $ llvm-config --system-libs --cppflags --ldflags --libs core returns

-I/LLVM_ROOT/llvm/include -I/LLVM_ROOT/build/include -D_GNU_SOURCE 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-L/LLVM_ROOT/build/lib
-lLLVMCore -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle
-lrt -ldl -lpthread -lm

In fact, these flags are more than we need, so we also pass the -Wno-unused-command-line-argument flag simply to avoid warnings about the unused ones. Putting it all together, whenever lib231.cpp (or lib231.bc, or lib231.ll) is involved in a clang/clang++ command, this is what it should look like (note the backticks (`)):

$ clang++ /lib231/lib231.cpp -emit-llvm -S `llvm-config --system-libs --cppflags --ldflags --libs core` -Wno-unused-command-line-argument -o /tmp/lib231.ll

To make things a little more organized, we can assign a flags variable like this:

FLAGS=`llvm-config --system-libs --cppflags --ldflags --libs core`
FLAGS="$FLAGS -Wno-unused-command-line-argument"

clang++ /lib231/lib231.cpp -emit-llvm -S $FLAGS -o /tmp/lib231.ll

Hints

A basic block is a single-entry, single-exit section of code. For this reason, you are guaranteed that if execution enters a basic block, all instructions in the basic block will be executed in a straight line. You can use this to your advantage to avoid instrumenting every instruction individually.
For inserting new instructions, have a look at the IRBuilder class, which is a convenience class for adding instructions. IRBuilder::SetInsertPoint sets the insertion point, and various create... methods on the IRBuilder instance insert instructions at the specified point.
To insert calls to a function, you will find the following functions useful:
- FunctionType::get
- Module::getOrInsertFunction
- IRBuilder::CreateCall
Clang mangles the names of externally visible C++ functions according to the standard set forth in the Itanium C++ ABI. (Although this may seem strange, consider what would happen if we tried to compile multiple overloaded functions without name mangling.) This is why the function names you see in LLVM IR differ from the ones you see in the source code.
You may be stuck wondering what instruction sequence corresponds to the C++ statements you want to insert. A useful approach is to write out the C++, compile it to LLVM IR with Clang, then examine the resulting IR.
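
The basic-block hint above can be sketched without LLVM as follows. This is only a model of the idea, not a pass: Block, Histogram, summarizeBlock, and dynamicCounts are hypothetical names, a block is represented as a plain list of opcode names, and the trace is the sequence of block indices that a run happens to enter. The point is that each block's opcode histogram is computed once (what the pass would do at instrumentation time) and then added once per block entry (what a single inserted runtime call per block would do).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a basic block: just the opcode names it contains.
using Block = std::vector<std::string>;
using Histogram = std::map<std::string, unsigned long>;

// Instrumentation-time side: summarize a block's opcodes once.
Histogram summarizeBlock(const Block &block) {
  Histogram summary;
  for (const std::string &opcode : block)
    ++summary[opcode];
  return summary;
}

// Run-time side: every time execution enters a block, add that block's
// precomputed summary to the running totals. No per-instruction work is needed
// because all instructions in an entered block execute.
Histogram dynamicCounts(const std::vector<Block> &blocks,
                        const std::vector<std::size_t> &trace) {
  std::vector<Histogram> summaries;
  for (const Block &block : blocks)
    summaries.push_back(summarizeBlock(block));

  Histogram total;
  for (std::size_t index : trace) {
    assert(index < blocks.size());
    for (const auto &entry : summaries[index])
      total[entry.first] += entry.second;
  }
  return total;
}
```

For a loop body that is entered three times, each opcode in that block contributes three times its static count, which is the difference between the static counts of Section 1 and the dynamic counts of this section.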

Section 3: Profiling Branch Bias

Now, write a dynamic analysis that computes the branch bias on a per-function basis: count the number of times conditional branch instructions are executed and the number of times conditional branch instructions are taken. Note that we only consider conditional branches. A conditional branch is considered taken if its condition evaluates to true. Each instrumented function should output these two counts before function termination. The output should be in the following format:

taken\t[count of taken]\n
total\t[count of total]\n

The general strategy you should follow:

You need a C++ runtime library to keep track of the runtime information (in this section, how many times a branch is executed or taken). We provide a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures.
From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
When it’s time to run the analysis, run your LLVM pass on the original IR file, link the runtime library to the instrumented bitcode, then execute the linked program.
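
The bookkeeping for branch bias is small enough to sketch in full. The BranchStats type below is a hypothetical illustration, not the interface of the provided runtime library: record would be called once per executed conditional branch with the value of its condition, and format produces the two output lines in the required layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical per-function branch statistics.
struct BranchStats {
  unsigned long taken = 0; // conditional branches whose condition was true
  unsigned long total = 0; // all executed conditional branches

  // Call once for every executed conditional branch.
  void record(bool conditionIsTrue) {
    ++total;
    if (conditionIsTrue)
      ++taken;
  }

  // Produce "taken\t[count of taken]\ntotal\t[count of total]\n".
  std::string format() const {
    assert(taken <= total);
    std::ostringstream out;
    out << "taken\t" << taken << '\n' << "total\t" << total << '\n';
    return out.str();
  }
};
```

Unconditional branches never reach record in this model, which matches the rule that only conditional branches are considered.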

Directions

Implement the pass in .../Passes/part1/BranchBias.cpp as a function pass.
Register the pass by the name cse231-bb. This means that to run this pass, type
```
 opt -load submission_pt1.so -cse231-bb < input.ll -o input-instrumented.bc 
```
where input.ll is the original IR file and input-instrumented.bc is the instrumented version.
You may assume that the execution of each instrumented function always terminates normally.
You may assume that a function that will be instrumented by your pass does not make any function calls.
When you link the runtime library with the instrumented code, you may want to put the runtime library before the instrumented code. For example, if input-instrumented.bc is the instrumented code and lib231.bc is the library, link them by running
```
clang++ lib231.bc input-instrumented.bc -o mytest
```

Testing

To help you test your code, we provide our solution in the Docker image. All three passes have been compiled into a module named "231_solution.so". Our passes are registered with the same names (cse231-csi, cse231-cdi, cse231-bb). For example, to run the cse231-csi pass from Section 1, type

 opt -load 231_solution.so -cse231-csi < input.ll > /dev/null

Note that the cse231-cdi and cse231-bb passes assume the use of our runtime library. That is, you have to link the bitcode that is instrumented by the solution opt with the runtime library we provide.

Use the provided test cases or write your own to try out your passes. In addition, take a look at the /tests/test-example/run.sh bash script. This script is now outdated and won't work as is, but it shows the process you need to follow. You should be able to get it working by fixing the paths in it.

Directions

Make sure that the paths set at the beginning of /tests/test-example/run.sh are correct.
Make sure the module names are correct.
/tests/test-example/run.sh runs all three passes. If you have not implemented them all, comment out the lines for the pass(es) you have not implemented.
The outputs of cse231-csi, cse231-cdi, and cse231-bb are saved in /tmp/csi.result, /tmp/cdi.result, and /tmp/bb.result, respectively.

Turnin Instructions

You will turn in your submission in Gradescope. As soon as you submit, your code will be auto-graded and you should have your grade and some feedback within a few minutes. You are allowed to submit as many times as you want until the deadline.

Grading

Your submission will be graded against 4 benchmarks we developed which satisfy all the requirements described in Sections 1-3 (only one function, written in C, no function calls, etc). For this part of the project, you will get 1 point for each case your solution matches ours, for a total of 12 points (4 benchmarks, 3 passes). The order of your output does not matter. Make sure your output is not polluted with debug print statements (happens more often than you'd think).

Submission Directions

You have to submit all source files necessary to compile your passes. This includes the CMakeLists.txt file you wrote in your part1 directory (not the one under Passes).
Since you are providing a CMakeLists.txt file, your source code files can have any name you want. However, the LLVM module must be named submission_pt1 and the passes must be named cse231-csi, cse231-cdi, and cse231-bb.
Do not submit a custom lib231.cpp library. Use the one provided under /lib231/lib231.cpp.

Tutorial

You can find a tutorial on the relevant parts of the LLVM API here and here.

Part 1

Due Jan 25 11:59:59 PM
