• Section 1: Collecting Static Instruction Counts
  • Section 2: Collecting Dynamic Instruction Counts
  • Section 3: Profiling Branch Bias
  • Testing
  • Turnin Instructions
  • Tutorial

  • The purpose of this project is to get you acquainted with LLVM. In particular, it will give you familiarity with the data structures, types, and code organization you will need to understand to implement program analyses. All code will be written in C++, as that is the language in which LLVM is written.

    Section 1: Collecting Static Instruction Counts

    Your task is to write a function pass that counts the number of each unique instruction in a function statically. After processing a function, the pass should output the counts to stderr in the following format:

    [instruction name]\t[count]\n
    For example, if the pass processes a function that consists of 2 load and 3 add instructions, the output should be:

    load 2
    add 3
    

    Directions

    Hints

    Section 2: Collecting Dynamic Instruction Counts

    The analysis you wrote in Section 1 was a static analysis because it did not actually execute the program to analyze it. In this section, we are going to write a dynamic analysis that executes the program and collects runtime information. In particular, you will write a pass that instruments the original program by inserting code to count the number times each instruction executes. Each instrumented function should output the analysis results before function termination.

    The general strategy you should follow:

    1. You need a C++ runtime library to keep track of the runtime information (In this section, it is how many times each instruction is executed.). We provide you a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures.
    2. From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
    3. When it’s time to run the analysis, run your LLVM pass on the original IR file, link the runtime library to the instrumented bitcode, then execute the linked program.

    Directions

    Info!

    We recently updated the provided lib231.cpp file. Due to this change, compiling it requires some extra flags. Read below for further information. You can also take a look at the provided run.sh scripts, found under /tests/conditionalSum and /tests/test-example

    Our latest lib231.cpp calls the Instruction::getOpcodeName() LLVM function. Since lib231.cpp is not contained within the LLVM directory structure (i.e. it's not under /LLVM_ROOT/llvm/....), we have to provide clang with flags to point to the appropriate include directories. LLVM provides the llvm-config tool that generates the approriate flags given some arguments. Specifically for our case, the shell command $ llvm-config --system-libs --cppflags --ldflags --libs core returns
    -I/LLVM_ROOT/llvm/include -I/LLVM_ROOT/build/include -D_GNU_SOURCE 
    -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
    -L/LLVM_ROOT/build/lib
    -lLLVMCore -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle
    -lrt -ldl -lpthread -lm
    
    In fact, these flags are more than what we need, so we can also pass the -Wno-unused-command-line-argument flag, simply to avoid getting warnings about this. Putting it all together, whenever lib231.cpp (or lib231.bc, or lib231.ll) is involved in a clang/clang++ command, this is how it should look like (note the backticks (`) ):
    $ clang++ /lib231/lib231.cpp -emit-llvm -S `llvm-config --system-libs --cppflags --ldflags --libs core` -Wno-unused-command-line-argument -o /tmp/lib231.ll
    
    To make things a little more organized, we can assign a flags variable like this:
    FLAGS=`llvm-config --system-libs --cppflags --ldflags --libs core`
    FLAGS="$FLAGS -Wno-unused-command-line-argument"
    
    clang++ /lib231/lib231.cpp -emit-llvm -S $FLAGS -o /tmp/lib231.ll  
    

    Hints

    Section 3: Profiling Branch Bias

    Now, write a dynamic analysis that computes the branch bias on a per-function basis: count the number of times conditional branch instructions are executed and the number of times conditional branch instructions are taken. Note that we only consider conditional branches. A conditional branch is considered taken if its condition evaluates to true. Each instrumented function should output these two counts before function termination. The output should be in the following format:

    taken\t[count of taken]\n
    total\t[count of total]\n
    

    The general strategy you should follow:

    1. You need a C++ runtime library to keep track of the runtime information (In this section, it is how many times a branch is executed or taken.). We provide you a reference implementation in /lib231/lib231.cpp that contains helper functions and global data structures.
    2. From your LLVM pass, instrument the program with calls to the runtime library at the correct locations.
    3. When it’s time to run the analysis, run your LLVM pass on the original bitcode file, link the runtime library to the instrumented bitcode, then execute the linked program.

    Directions

    Testing

    To help you test your code, we provide our solution contained in the docker image. All the three passes have been compiled in a module named "231_solution.so". Our passes are registered with the same names (cse231-csi, cse231-cdi, cse231-bb). For example, to run the cse231-csi pass from Section 1, type

     opt -load 231_solution.so -cse231-csi < input.ll > /dev/null 
    Note that the cse231-cdi and cse231-bb passes assume the use of our runtime library. That is, you have to link the bitcode that is instrumented by the solution opt with the runtime library we provide.

    Use the provided test cases or write your own to try out your passes. In addition, take a look at the /tests/test-example/run.sh bash script. This script is now outdated and won't work as is, but it will show you the process you need to use (You should be able to easily update the script and make it work - try to fix the paths in it).

    Directions

    Turnin Instructions

    You will turn in your submission in Gradescope. As soon as you submit, your code will be auto-graded and you should have your grade and some feedback within a few minutes. You are allowed to submit as many times as you want until the deadline.

    Grading

    Your submission will be graded against 4 benchmarks we developed which satisfy all the requirements described in Sections 1-3 (only one function, written in C, no function calls, etc). For this part of the project, you will get 1 point for each case your solution matches ours, for a total of 12 points (4 benchmarks, 3 passes). The order of your output does not matter. Make sure your output is not polluted with debug print statements (happens more often than you'd think).

    Submission Directions

    Tutorial

    You can find a tutorial on the relevant parts of the LLVM API here and here.