The purpose of this part is to ensure you all have a working and compatible LLVM installation. To avoid compatibility issues caused by students using an LLVM version other than the expected one (v9.0.1), we provide a Docker image with a bare-bones Ubuntu 18.04 and a clean LLVM 9 installation.
In this guide, we provide instructions on how to install Docker and pull the LLVM image. In case you are not able to use Docker, you will have to install LLVM manually.
Installing Docker should be straightforward for Windows and Mac OS users: download it from the Docker website. Linux users will have to follow this guide. The Linux guides essentially try to upgrade your system to a compatible version (for example, to Ubuntu 16.04), so be careful not to break your current system. If you are working with Linux, an Ubuntu 16.04 or newer system should make the Docker installation easier.
For Windows users, Docker requires you to enable Hyper-V and restart your computer. Some Windows 10 versions do not have Hyper-V. If you face any issues installing Docker on Windows, installing Docker Toolbox instead of Docker should be the easiest way out. As an added benefit, you will later be able to use our Linux scripts to start the Docker image.
After successfully installing Docker, open a command prompt or shell and execute the following command:
$ docker pull yalhessi/cse231_student:llvm9
Docker should automatically start downloading and extracting the provided LLVM image. If you skip this step, the image will be downloaded automatically the first time you attempt to start it. Once the download finishes, you can verify that you have the image by typing:
$ docker images
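At a high level, LLVM can be split conceptually into three parts:
(1) a front end, such as clang, that translates source code in languages such as C or C++ into LLVM IR;
(2) a set of passes that analyze and transform LLVM IR;
(3) a back end that translates LLVM IR into machine code for a target architecture.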
In this class, we’ll only be adding code to (2). In particular, we will be writing additional passes that analyze and transform LLVM IR programs.
LLVM is architected such that a single run consists of a sequence of many passes run back to back. By keeping the logic of each pass relatively isolated from the others, LLVM achieves a nice modularity that makes it easy to add new passes. Additionally, you’ll probably find that LLVM IR is a refreshingly simple representation compared to, say, C++ or x86 assembly. This allows our analyses to avoid the complexity of reasoning about the idiosyncrasies of those more complex languages.
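To make the idea of passes running back to back concrete, here is a toy model in plain C++ rather than LLVM. It assumes a made-up IR that is just a list of instruction strings; ToyIR, ToyPass, runPipeline, removeNops, and makeCounter are hypothetical names, and this is not how LLVM's pass manager is actually implemented. The point it illustrates is that each pass only sees the IR it is handed, which is what keeps passes independent of one another.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy IR: one string per instruction. Not LLVM IR, just enough to show the idea.
using ToyIR = std::vector<std::string>;

// A toy pass may inspect the IR (analysis) or rewrite it (transformation).
using ToyPass = std::function<void(ToyIR &)>;

// Runs each pass to completion, in order, before starting the next one.
void runPipeline(ToyIR &ir, const std::vector<ToyPass> &passes) {
  for (const ToyPass &pass : passes) {
    pass(ir);
  }
}

// Hypothetical transformation pass: deletes every "nop" instruction.
void removeNops(ToyIR &ir) {
  ir.erase(std::remove(ir.begin(), ir.end(), std::string("nop")), ir.end());
}

// Hypothetical analysis pass: records the instruction count in `count`,
// which must outlive the pipeline run.
ToyPass makeCounter(std::size_t &count) {
  return [&count](ToyIR &ir) { count = ir.size(); };
}
```

For example, runPipeline(ir, {removeNops, makeCounter(n)}) on {"add", "nop", "ret"} leaves two instructions and sets n to 2. Running the counter first would count the nop as well, so the order in which passes run matters.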
This guide will provide information on how you can develop code locally and then compile and run it within the docker container.
Start by cloning the starter code from GitHub:
$ git clone https://github.com/UCSD-PL/cse-231-starter.git
Take a minute to see what we prepared for you in this folder. You get a set of guides on how to generate IR code, how to compile a pass, and how to run it. All these guides are also covered on this page.
The mount_and_launch.sh script will automatically mount the other three folders to the appropriate mount points and start a shell in the Docker image. All you have to do is write your code under the Passes directory (pay attention to the CMakeLists.txt files; they are important for compiling your passes), run mount_and_launch.sh, and you are ready to compile and run. You will use the Output folder to store results from running your passes. Spend some time familiarizing yourself with the directory structure. Also take a look at the script's code so you understand exactly how it works, in case you need to modify it later on. We run the image with the "--rm" option so that the container is automatically removed when you exit it, and with "-it" to get an interactive session (a shell).
To start a shell in the LLVM docker image, type:
$ sudo ./mount_and_launch.sh
Windows users will have to open the file named "windows_docker_command.txt", modify the paths in the command, and then copy and paste it into PowerShell. Some Windows machines forbid the execution of scripts in PowerShell for security reasons. It's up to you whether to enable script execution and convert the provided file into a PowerShell script in the future.
The Tests directory you pulled from GitHub is now included in the Docker image we provide, so you no longer need to mount it before running the Docker container. If, however, you add your own test programs to that directory, use the -v flag (see below for more info) to mount it.
If you installed Docker Toolbox instead of Docker, you should be able to use the mount_and_launch.sh script. Avoid spaces in the path to the cloned "starter code", because they can lead to unpredictable errors.
Regardless of the host OS, once you start an interactive session in Docker you will all be working in the same environment. The remaining guides on this page are the same for Windows, Mac OS, and Linux users.
When you start a shell in Docker, you will find the LLVM source code under /LLVM_ROOT/llvm. The compiled LLVM is located in /LLVM_ROOT/build. The environment in the provided image comes pre-configured so you don't have to type in full paths every time you need to run an LLVM command.
The directories in the "starter pack" code you cloned in the previous step will be mounted at the following points within the container. Verify this by moving into those mount points (cd command) and making sure the content of the folders is there.
Spend some time exploring these folders and verify that changes made on the host are immediately visible in the container and vice versa. Be careful when deleting files: they will be deleted on the host as well.
If you installed LLVM manually, you will have to create a folder under lib/Transforms to store your passes, as well as modify the appropriate CMakeLists.txt files.
In the Docker container, you will find a "Hello World" program written in C++ under the provided "tests" folder. We will use it as the first test case for your first LLVM pass. You will need to be able to convert source code to IR code for your passes to operate on. Feel free to write your own small program while following this guide: all you have to do is create a new folder under "tests" and write your code in it. Otherwise, simply follow this guide to compile the Hello World program.
First, start the docker image by executing the provided script. Then navigate (cd) to /tests. Then cd into the HelloWorld folder. You need to use clang to generate IR code by typing the following command:
# clang -O0 -S -emit-llvm HelloWorld.cpp
This will generate a new file called HelloWorld.ll. Keep in mind that files generated inside a Docker container are deleted as soon as you exit it (mounted directories excluded). You can read the file to see what IR code looks like. You just compiled your first program into LLVM IR. Congratulations!
For reference, here are all the commands you will need to run:
$ sudo ./mount_and_launch.sh
(automatically running cmake before opening the shell)
# cd /tests
# cd HelloWorld
# clang -O0 -S -emit-llvm HelloWorld.cpp
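Go ahead and open HelloWorld.ll, which contains the human-readable LLVM IR produced from HelloWorld.cpp in the same directory. You will notice that LLVM IR looks a lot like assembly code. In fact, LLVM IR is an assembly language, with a few unusual features. It is typed: every value has an explicit type, such as i32 (a 32-bit integer) or i8* (a pointer to 8-bit values, which is how a C string appears). It also has an unlimited supply of virtual registers, whose names begin with %, and each register is assigned exactly once (static single assignment form).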
Otherwise, the usual suspects are there: arithmetic instructions, logical instructions, branch and jump instructions, etc. The LLVM Language Reference lists them all, in case you are interested.
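If you want a test input that exercises more of the IR than Hello World does, a small file like the one below is a reasonable start. It is a hypothetical example: the location tests/MyTest/MyTest.cpp and the function names are assumptions, not part of the starter code. Compiling it with the same clang -O0 -S -emit-llvm command should, on a typical 64-bit Linux target, produce i32 arithmetic (mul, add), an i8* value for the C string, and icmp comparisons feeding conditional br instructions.

```cpp
#include <cstddef>

// Integer arithmetic: at -O0 expect i32 loads plus mul and add instructions.
int scaleAndShift(int x) {
  return x * 3 + 1;
}

// Loop over a C string: `s` should appear as an i8* value, and the loop and
// if conditions as icmp instructions feeding conditional br instructions.
std::size_t countChar(const char *s, char c) {
  std::size_t n = 0;
  for (; *s != '\0'; ++s) {
    if (*s == c) {
      ++n;
    }
  }
  return n;
}
```

Since this is C++, the function names in the IR, and therefore in TestPass's output, are mangled; for example, scaleAndShift typically appears as _Z13scaleAndShifti. Declaring the functions extern "C" keeps the plain names.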
The LLVM pass code is provided for convenience. To understand exactly what each line of code means (or, even better, to try writing it on your own), follow this guide.
For each new pass you create in the future, you will have to create a new directory under "Passes" with your code and the appropriate CMakeLists.txt file in it. Instructions on what to include in that file can be found in the link provided earlier. To make sure your new pass will be compiled, you have to include your new directory in the CMakeLists.txt file under the "Passes" folder, simply following the existing syntax.
You are now ready to compile your first pass. Start the docker image and move (cd) into the build directory (/LLVM_ROOT/build). We first have to invoke cmake and point it to the source code, which in turn will visit the CMakeLists.txt files and prepare everything for compilation.
We only need to compile our pass, so after cmake is done, move into the pass directory (/LLVM_ROOT/build/lib/Transforms/CSE231_Project) and compile ("make" command).
If everything went well, you should find your pass module (a shared library, LLVMTestPass.so for the provided test pass) under /LLVM_ROOT/build/lib. Keep in mind that after exiting the Docker image, you will have to re-compile your passes, so you may want to keep the container running between compiling and running them.
# cd /LLVM_ROOT/build
# cmake /LLVM_ROOT/llvm
(... cmake ends without errors ...)
# cd /LLVM_ROOT/build/lib/Transforms/CSE231_Project
# make
(Pass can now be found under /LLVM_ROOT/build/lib)
After compiling your first LLVM pass and generating the IR code for your test cases, all that's left is to run it. The provided test pass prints the message "TEST: " followed by each function name in the IR code. This output goes to standard error because the pass's source code prints with errs() (outs() would have sent it to standard output). Because of the pre-configured environment, you should be able to run the pass from any directory. Just make sure to store the output in the mounted /output folder if you want it to persist after exiting Docker.
(Start docker image and compile your pass following the previous guides)
# opt -load LLVMTestPass.so -TestPass < /tests/HelloWorld/HelloWorld.ll > /dev/null 2> /output/test_pass_output.txt
We used 2> /output/test_pass_output.txt to redirect standard error to the mounted output directory; you can organize that directory any way you want. We discarded standard output with > /dev/null because we have no use for it. For reference, try running the pass without it to see exactly what is printed.
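The split between the two streams can be mimicked in standard C++. The sketch below is a hypothetical stand-in for TestPass's output behavior, not its real source: runToyPass is an invented name, and its std::ostream parameters play the roles of errs() and outs(), so a caller can pass std::cerr and std::cout (or string streams in a test).

```cpp
#include <ostream>
#include <string>
#include <vector>

// Hypothetical stand-in for TestPass's output behavior. `diag` plays the role
// of errs() and `out` the role of opt's standard output.
void runToyPass(const std::vector<std::string> &functionNames,
                const std::string &programText, std::ostream &diag,
                std::ostream &out) {
  // One diagnostic line per function, like TestPass's "TEST: <name>" lines.
  for (const std::string &name : functionNames) {
    diag << "TEST: " << name << '\n';
  }
  // The program itself is passed through unchanged, as opt would emit it.
  out << programText;
}
```

If a main function called runToyPass with std::cerr and std::cout and the program were run as ./a.out > /dev/null 2> out.txt, only the TEST: lines would end up in out.txt, just as with the opt command above.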
Let’s dissect this command line:
opt is LLVM’s command line tool for executing passes.
-load LLVMTestPass.so causes opt to load the shared library that contains TestPass. We don't need to specify the full path to the library, since the Docker image's environment was configured to know where to search. The name of the module (LLVMTestPass) is defined in CMakeLists.txt.
-TestPass tells opt that we wish to run the TestPass pass, which was bound to the -TestPass command line flag by the RegisterPass<TestPass> declaration in TestPass.cpp.
The output of opt is a transformed program. Since TestPass doesn’t perform any transformations, we just redirect that output to /dev/null to ignore it.
Analogous to -TestPass, opt has flags to enable each of the other passes provided by LLVM. If you’re curious, running opt -help dumps out a massive laundry list of all the passes that are built into LLVM.
Although the TestPass pass may be trivial, it contains a lot of the boilerplate that you will need to use and understand to write your own passes. Part of this assignment will be to read the documentation and learn how to use the existing infrastructure provided by LLVM.
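Two pieces of that boilerplate are worth seeing in isolation: a registration object that binds a command line flag to a pass, and a driver that calls the pass's entry point once per function. The toy model below shows both in plain C++ without LLVM headers. ToyFunction, ToyFunctionPass, ToyRegisterPass, passRegistry, and runByFlag are hypothetical names that only loosely mirror Function, FunctionPass, RegisterPass, and opt; they are not LLVM's real classes. As with runOnFunction, the entry point returns true only if it modified the function.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <ostream>
#include <string>
#include <vector>

// Stand-in for llvm::Function: here a function is just its name.
struct ToyFunction {
  std::string name;
};

// Loosely mirrors FunctionPass: subclasses implement one entry point that the
// driver calls once per function. Returns true if the function was modified.
class ToyFunctionPass {
public:
  virtual ~ToyFunctionPass() = default;
  virtual bool runOnFunction(ToyFunction &F, std::ostream &errs) = 0;
};

using PassFactory = std::function<std::unique_ptr<ToyFunctionPass>()>;

// Maps a command line flag name to a factory for the pass. The function-local
// static is created on first use, so registration order is safe.
std::map<std::string, PassFactory> &passRegistry() {
  static std::map<std::string, PassFactory> registry;
  return registry;
}

// Static-registration idiom: constructing a global ToyRegisterPass<P>
// records a factory for P under the given flag name.
template <typename PassT>
struct ToyRegisterPass {
  explicit ToyRegisterPass(const std::string &flag) {
    passRegistry()[flag] = [] { return std::make_unique<PassT>(); };
  }
};

// Plays opt's role: finds the pass bound to `flag` and runs it on every
// function. Returns false if no pass is registered under that flag.
bool runByFlag(const std::string &flag, std::vector<ToyFunction> &functions,
               std::ostream &errs) {
  auto it = passRegistry().find(flag);
  if (it == passRegistry().end()) {
    return false;
  }
  std::unique_ptr<ToyFunctionPass> pass = it->second();
  for (ToyFunction &F : functions) {
    pass->runOnFunction(F, errs);
  }
  return true;
}

// TestPass look-alike: prints "TEST: " and each function name, changes nothing.
struct ToyTestPass : ToyFunctionPass {
  bool runOnFunction(ToyFunction &F, std::ostream &errs) override {
    errs << "TEST: " << F.name << '\n';
    return false;
  }
};

// Binds the "TestPass" flag to ToyTestPass before main() runs.
static ToyRegisterPass<ToyTestPass> registerToyTestPass("TestPass");
```

Because registerToyTestPass is a global object, its constructor runs before main, so by the time the driver looks up "TestPass" the pass is already registered. This is the same static-registration idiom that a declaration like RegisterPass<TestPass> relies on.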
Here is some high-level guidance for understanding TestPass.cpp.
Pass class: TestPass subclasses FunctionPass, which implements functionality for analyses and optimizations that only look at a single function at a time. There are analogous passes like ModulePass and BasicBlockPass that look at entire modules or single basic blocks, respectively.
Entry point: for each kind of Pass, there is a corresponding entry point function called runOn<suffix>, where <suffix> depends on the kind of pass. For example, FunctionPass defines a virtual function called runOnFunction(Function &F) that you fill in with your pass’s implementation. You don’t have to worry about how this function gets called: you simply write the details of the function, and the system makes sure it gets called for each function in the program.
Output: LLVM code prints through the output streams declared in raw_ostream.h rather than those in standard C++. errs() is the error output stream, and there is a corresponding outs() for standard output.
After you are done installing LLVM manually (steps below), follow the previous guides to get the "getting started" code, compile, and run an LLVM pass. You will not need to use the Docker commands. In most cases you will have to specify full paths to LLVM passes and tools, unless you configure the PATH variable.
First, create a new directory for this project.
$ mkdir cse231-proj1
$ cd cse231-proj1
Check out the source code for LLVM and clang. The Subversion repository used by older instructions has been retired; LLVM and clang now live together in the llvm-project Git repository, and the llvmorg-9.0.1 tag matches the version used in the Docker image:
$ git clone --depth 1 --branch llvmorg-9.0.1 https://github.com/llvm/llvm-project.git
Now compile LLVM and clang using cmake, from a build directory inside the project root (cse231-proj1):
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Release ../llvm-project/llvm
$ make
LLVM is a large codebase, so this will take a long time. If your machine allows it, pass the -j flag to make (for example, make -j4) to run several compilation jobs in parallel.