Installing LLVM manually

The purpose of this part is to ensure you all have a working and compatible LLVM installation. To avoid compatibility issues caused by students using LLVM versions other than the expected one (v9.0.1), we provide a Docker image with bare-bones Ubuntu 18.04 and a clean LLVM-9 installation.

In this guide, we provide instructions on how to install Docker and pull the LLVM image. In case you are not able to use Docker, you will have to install LLVM manually.

Warning! Read Carefully.

Using the provided Docker container requires installing Docker, 3 GB of free space, and root access on the host machine (admin rights on Windows).
If you are not able to use the provided container, you can install LLVM on your own. Make sure you install LLVM 9. We will expect your results to match ours.
Docker containers are not intended to store data. We highly recommend you develop your solutions locally and only use docker to compile and run. The following guides show you how to do that. Obtained results should be stored locally as well. If you develop within the container you are at risk of losing your work. You have been warned.
When you work within the provided container (interactively or not) you are automatically logged in as root. If you delete the mounted directory containing your work, it will be deleted from the host system as well. Make sure your work is secure at all times. We recommend you use some version control system (e.g. git).

Installing Docker

Installing Docker should be straightforward for Windows and Mac OS users: download it from the Docker website. Linux users will have to follow this guide. The Linux guides essentially try to upgrade your system to a compatible version (for example, upgrading to Ubuntu 16.04). Be careful not to break your current system. If you are working with Linux, having an Ubuntu 16.04 or higher system should result in an easier Docker installation.

For Windows users, Docker will require you to enable Hyper-V and restart your computer. Some Windows 10 versions do not have Hyper-V. If you face any issues with installing Docker on Windows, installing Docker Toolbox instead of Docker should be the easiest way out. As an added benefit, you will later be able to use our Linux scripts to start the Docker image.

Pulling the LLVM image.

After successfully installing Docker, open a command prompt or shell and execute the following command:

$ docker pull yalhessi/cse231_student:llvm9

Docker should automatically start downloading and extracting the provided LLVM image. If you skip this step, the image will be downloaded automatically the first time you attempt to start it. Once it has finished, you can verify that you have the image by typing:

$ docker images
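If you would rather check for the image from a script than read the table by eye, the following is a minimal sketch. The helper name image_is_listed is hypothetical; it only inspects text on standard input, so it assumes you pipe in the output of docker images with the --format option shown in the comment.

```shell
# Hypothetical helper: succeeds if the exact "repository:tag" given as $1
# appears as a whole line on standard input.
image_is_listed() {
  grep -qxF "$1"
}

# Intended use (needs Docker, so it is left as a comment here):
#   docker images --format '{{.Repository}}:{{.Tag}}' |
#     image_is_listed yalhessi/cse231_student:llvm9 && echo "image present"
```

Matching the whole line with -x avoids false positives from other images whose names merely contain the same text.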

The Design of LLVM

At a high level, LLVM can be split conceptually into three parts:

1. A compiler frontend called Clang that takes C/C++ and translates it to a simplified program representation called LLVM IR,
2. a target-independent backend that performs various analyses and transformations on LLVM IR, and
3. a target-dependent backend that lowers LLVM IR to native machine code like x86.

In this class, we’ll only be adding code to (2). In particular, we will be writing additional passes that analyze and transform LLVM IR programs.

LLVM is architected such that a single run consists of a sequence of many passes run back to back. By keeping the logic of each pass relatively isolated from the others, LLVM achieves a nice modularity that makes it easy to add new passes. Additionally, you’ll probably find that LLVM IR is a refreshingly simple representation compared to, say, C++ or x86 assembly. This allows our analyses to avoid the complexity of reasoning about the idiosyncrasies of those more complex languages.

Getting started with LLVM in Docker.

This guide will provide information on how you can develop code locally and then compile and run it within the docker container.

Clone the "getting started" code

Start by cloning the CSE231_Project folder from GitHub:

$ git clone https://github.com/UCSD-PL/cse-231-starter.git

Take a minute to see what we prepared for you in this folder. You get a set of guides on how to generate IR code, how to compile and how to run a pass. All these guides are also covered in this page.

The mount_and_launch.sh script will automatically mount the other three folders to the appropriate mount points and start a shell in the docker image. All you have to do is write your code under the Passes directory (pay attention to the CMakeLists.txt files - they are important for compiling your passes), run mount_and_launch.sh and you are ready to compile and run. You will use the Output folder to store results from running your passes. Spend some time familiarizing yourselves with the directory structure. Also take a look at the script's code so you understand exactly how it works, in case you need to modify it later on. We run the image with the "--rm" option so the container is automatically removed after you exit it, and with "-it" for an interactive session (opening a shell).

To start a shell in the LLVM docker image, type:

Linux:

$ sudo ./mount_and_launch.sh

Windows:

Windows users will have to open the file named "windows_docker_command.txt", modify the paths in the command and then copy-paste it into PowerShell. Some Windows machines forbid the execution of scripts in PowerShell for security purposes. It's up to you if you wish to enable it and convert the provided file to a PowerShell script in the future.

Info!

The Tests directory you pulled from GitHub is now included in the Docker image we provide, so you no longer need to mount it before running the docker container. If, however, you write your own passes in that directory, use the -v flag (see below for more info) to mount it.

Info!

If you installed Docker Toolbox instead of Docker, you should be able to use the mount_and_launch.sh script. Try not to have spaces in the path to the cloned "starter code" because they might lead to unpredictable errors.

Info!

Regardless of the host OS, after you start an interactive session in Docker you will all be working in the same system. The remaining guides in this page will be the same for Windows, Mac OS and Linux users.

Understanding Docker mount points

When you start a shell in Docker, you will find the LLVM source code under /LLVM_ROOT/llvm. The compiled LLVM is located in /LLVM_ROOT/build. The environment in the provided image comes pre-configured so you don't have to type in full paths every time you need to run an LLVM command.

Mount points:

The directories in the "starter pack" code you cloned in the previous step will be mounted at the following points within the container. Verify this by moving into those mount points (cd command) and making sure the contents of the folders are there.

Passes --> /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project

Tests --> /tests (included in the container; no need to mount anymore)

Output --> /output

Spend some time exploring these folders and verify that changes from the host are immediately observable in the image and vice versa. Be careful when deleting files, because they will be deleted on the host as well.
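To make the role of the -v flag concrete, here is a sketch of the kind of command a launch script could build from the mount points listed above. It is not the contents of mount_and_launch.sh: the function name launch_command is hypothetical, the /bin/bash entry command is an assumption, and the function only prints the command instead of running it.

```shell
# Hypothetical dry-run helper: print a docker run command that bind-mounts
# the Passes and Output folders of the starter directory given as $1.
launch_command() {
  starter_dir=$1
  printf 'docker run --rm -it'
  # Each -v takes host_path:container_path.
  printf ' -v %s:%s' "$starter_dir/Passes" /LLVM_ROOT/llvm/lib/Transforms/CSE231_Project
  printf ' -v %s:%s' "$starter_dir/Output" /output
  # Image name from the pull step; /bin/bash is an assumed entry command.
  printf ' %s /bin/bash\n' yalhessi/cse231_student:llvm9
}

launch_command /home/me/cse-231-starter
```

Each -v option pairs a host path with a container path, which is why edits and deletions on either side are visible on the other. Docker expects the host side to be an absolute path.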

Info!

If you installed LLVM manually, you will have to generate a folder under the lib/Transforms folder to store your passes, as well as modify the appropriate CMakeLists.txt files.

Generating IR code for a test case

In the Docker container, you will find a "Hello World" program written in C++ under the provided "tests" folder. We will use this as the first test case for your first LLVM pass. You will need to be able to convert source code to IR code for your passes to operate on. Feel free to write your own small program while following this guide. All you have to do is create a new folder under "tests" and write your code in it. Otherwise, simply follow this guide to compile the Hello World program.

First, start the docker image by executing the provided script. Then navigate (cd) to /tests. Then cd into the HelloWorld folder. You need to use clang to generate IR code by typing the following command:

# clang -O0 -S -emit-llvm HelloWorld.cpp

This will generate a new file called HelloWorld.ll. Keep in mind that files generated inside a docker container will be deleted as soon as you exit it (mounted directories excluded). You can read the file to see what IR code looks like. You just compiled your first program into LLVM IR. Congratulations.

For reference, here are all the commands you will need to run:

$ sudo ./mount_and_launch.sh
(automatically running cmake before opening the shell)
# cd /tests
# cd HelloWorld
# clang -O0 -S -emit-llvm HelloWorld.cpp

First Look at LLVM IR

Go ahead and open HelloWorld.ll, which contains the human-readable LLVM IR produced from HelloWorld.cpp in the same directory. You will notice that LLVM IR looks a lot like assembly code. In fact, LLVM IR is an assembly language, with a few unusual features:

It contains types, such as i32 and i8*
It has an unlimited number of registers with names that start with %
No register occurs on the left hand side of more than one assignment.

Otherwise, the usual suspects are there: arithmetic instructions, logical instructions, branch and jump instructions, etc. The LLVM Language Reference lists them all, in case you are interested.
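The single-assignment property can be checked mechanically. The sketch below uses a small hand-written IR fragment (not the actual contents of HelloWorld.ll) and two hypothetical helper functions that list the registers on the left-hand side of an assignment and report any that appear more than once.

```shell
# A hand-written IR fragment for illustration; real clang output is longer.
ir_sample() {
  cat <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  %twice = mul i32 %sum, 2
  ret i32 %twice
}
EOF
}

# Print every register that appears on the left-hand side of an assignment.
assigned_registers() {
  sed -n 's/^[[:space:]]*\(%[A-Za-z0-9_.]*\) = .*/\1/p'
}

# Print registers assigned more than once; no output means none were found.
duplicate_assignments() {
  assigned_registers | sort | uniq -d
}

ir_sample | assigned_registers      # prints %sum, then %twice
ir_sample | duplicate_assignments   # prints nothing
```

The same pipeline can be pointed at a real file, for example duplicate_assignments < HelloWorld.ll. Note that register names are local to a function, so on a file with several functions the same name may legitimately show up once per function.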

Writing and compiling your first LLVM pass

The LLVM pass code is provided for convenience. To understand exactly what each line of code means (or even better try to write it on your own), follow this guide.

For each new pass you create in the future, you will have to create a new directory under "Passes" with your code and the appropriate CMakeLists.txt file in it. Instructions on what to include in that file can be found in the link provided earlier. To make sure your new pass will be compiled, you have to include your new directory in the CMakeLists.txt file under the "Passes" folder, simply following the existing syntax.
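As a rough illustration of these two steps, the sketch below scaffolds a pass directory and registers it in the parent CMakeLists.txt. The function name new_pass_dir is hypothetical, and the add_llvm_library form follows the out-of-tree example in LLVM's "Writing an LLVM Pass" guide; the files in the starter code may be written differently, so compare against them before relying on this.

```shell
# Hypothetical scaffold: new_pass_dir <Passes directory> <PassName>
new_pass_dir() {
  passes_root=$1
  pass_name=$2
  mkdir -p "$passes_root/$pass_name" || return 1

  # Per-pass CMakeLists.txt: build <PassName>.cpp as a loadable module.
  cat > "$passes_root/$pass_name/CMakeLists.txt" <<EOF
add_llvm_library(LLVM$pass_name MODULE
  $pass_name.cpp

  PLUGIN_TOOL
  opt
)
EOF

  # Register the new directory in the parent file, only if not already there.
  line="add_subdirectory($pass_name)"
  grep -qxF "$line" "$passes_root/CMakeLists.txt" 2>/dev/null ||
    printf '%s\n' "$line" >> "$passes_root/CMakeLists.txt"
}
```

Running new_pass_dir Passes MyPass twice leaves a single add_subdirectory(MyPass) line in Passes/CMakeLists.txt, so the helper is safe to re-run. You still have to write MyPass.cpp yourself.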

You are now ready to compile your first pass. Start the docker image and move (cd) into the build directory (/LLVM_ROOT/build). We first have to invoke cmake and point it to the source code, which in turn will visit the CMakeLists.txt files and prepare everything for compilation.

We only need to compile our pass, so after cmake is done, move into the pass directory (/LLVM_ROOT/build/lib/Transforms/CSE231_Project) and compile ("make" command).

If everything went well you should be able to find your module (an LLVM module includes passes) under /LLVM_ROOT/build/lib. Keep in mind that after exiting the docker image, you will have to re-compile your passes. You might want to keep it running between compiling and running.

# cd /LLVM_ROOT/build
# cmake /LLVM_ROOT/llvm
(... cmake ends without errors ...)
# cd /LLVM_ROOT/build/lib/Transforms/CSE231_Project
# make
(Pass can now be found under /LLVM_ROOT/build/lib)

Running your first LLVM pass

After compiling your first LLVM pass and generating the IR code for your test cases, all that's left is to run it. The provided test pass prints the message "TEST: ", followed by each function name in the IR code. The output goes to standard error because the source code of the pass uses the function errs() (outs() would have sent it to standard output). Because of the pre-configured environment, you should be able to run the pass from any directory. You just have to make sure to store the output in the mounted /output folder if you want it to persist after exiting docker.

(Start docker image and compile your pass following the previous guides)
# opt -load LLVMTestPass.so -TestPass < /tests/HelloWorld/HelloWorld.ll > /dev/null 2> /output/test_pass_output.txt

We used 2> /output/test_pass_output.txt to redirect the standard error output to the mounted output directory. You can organize that directory in any way you want. We got rid of the standard output with > /dev/null because we have no use for it. Try to run the pass without it to see what exactly is printed out just for reference.
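The two redirections can be tried without LLVM at all. In the sketch below, fake_opt is a hypothetical stand-in for opt that writes one line to each stream, and a temporary directory stands in for the mounted /output folder.

```shell
# Stand-in for opt: a "program" on stdout, a pass message on stderr.
fake_opt() {
  echo "; transformed IR would go here"
  echo "TEST: main" >&2
}

out_dir=$(mktemp -d)   # stand-in for the mounted /output directory

# Same redirections as the opt command: discard stdout, keep stderr in a file.
fake_opt > /dev/null 2> "$out_dir/test_pass_output.txt"

cat "$out_dir/test_pass_output.txt"   # prints: TEST: main
```

Dropping the > /dev/null part would send the first line to the terminal as well, which is the experiment suggested above.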

Let’s dissect this command line:

opt is LLVM’s command line tool for executing passes.
-load LLVMTestPass.so causes opt to load the shared library that contains the TestPass pass. We don't need to specify the full path to the library since the docker image's environment was configured to know where to search. The name of the module (LLVMTestPass) is defined in CMakeLists.txt.
-TestPass tells opt that we wish to run the TestPass pass, which was bound to the -TestPass command line flag by the RegisterPass<TestPass> declaration in TestPass.cpp.
By default, the output of opt is a transformed program. Since the TestPass pass doesn’t perform any transformations, we just redirect the output to /dev/null to ignore it.

Analogous to -TestPass, opt has flags to enable each of the other passes provided by LLVM. If you’re curious, running opt -help dumps out a massive laundry list of all the passes that are built into LLVM.

Although the TestPass pass may be trivial, it contains a lot of the boilerplate that you will need to use and understand to write your own passes. Part of this assignment will be to read the documentation and learn how to use the existing infrastructure provided by LLVM.

Here is some high-level guidance for how to understand TestPass.cpp.

In LLVM, each pass is implemented in a separate C++ class that inherits from one of the subclasses of the Pass class: TestPass subclasses FunctionPass, which implements functionality for analyses and optimizations that only look at a single function at a time. There are analogous passes like ModulePass and BasicBlockPass that look at entire modules or single basic blocks, respectively.
For each kind of Pass, there is a corresponding entry point function called runOn<suffix>, where <suffix> depends on the kind of pass. For example, FunctionPass defines a virtual function called runOnFunction(Function &F) that you fill in with your pass’s implementation. You don’t worry about how this function gets called: you simply write the details of the function, and the system makes sure it gets called for each function in the program.
Idiomatic I/O in LLVM uses the output streams provided by raw_ostream.h rather than those in standard C++. errs() is the error output stream, and there is a corresponding outs() for standard output.
Read the Quick Start section of Writing An LLVM Pass, which goes line-by-line explaining Hello.cpp. Use the rest of the guide as a resource for navigating the LLVM APIs.
For more detailed reference and guidance, see the LLVM Programmer’s Manual, especially the section Helpful Hints for Common Operations. LLVM’s documentation is quite extensive, as you can see from the full list here.
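To tie the points above together, the following sketch writes out a minimal FunctionPass skeleton in the style of the legacy pass manager used by LLVM 9. The helper name write_pass_skeleton and the description string are hypothetical, and the provided TestPass.cpp may differ in its details; treat this as an outline to compare against the real file, not a copy of it.

```shell
# Hypothetical helper: write a minimal FunctionPass skeleton to the file $1.
write_pass_skeleton() {
  cat > "$1" <<'EOF'
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// A FunctionPass is run once for each function in the module.
struct TestPass : public FunctionPass {
  static char ID; // the address of ID identifies the pass
  TestPass() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "TEST: " << F.getName() << "\n";
    return false; // false: the IR was not modified
  }
};
} // namespace

char TestPass::ID = 0;

// Binds the pass to the -TestPass flag of opt.
static RegisterPass<TestPass> X("TestPass", "Prints each function name",
                                false /* only looks at CFG */,
                                false /* is an analysis pass */);
EOF
}
```

For example, write_pass_skeleton Passes/MyPass/MyPass.cpp gives you a starting file; rename the struct and the registered flag to match the pass you are writing.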

Installing LLVM manually

Info!

After you are done installing LLVM manually, follow the previous guides to get the "getting started" code, compile and run an LLVM pass. You will not need to use the docker commands. In most cases you will have to specify full paths to LLVM passes and tools, unless you configure the PATH variable.

First, create a new directory for this project.

$ mkdir cse231-proj1
$ cd cse231-proj1
$ mkdir llvm

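Check out the source code for LLVM and clang using subversion (svn). Note that trunk is the development head rather than the 9.0.1 release, so check which version you end up building.

$ cd llvm
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ cd llvm/tools
$ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang

Now compile LLVM and clang using cmake. The LLVM sources were checked out into llvm/llvm, so that is the directory cmake must point to.

$ cd cse231-proj1 (the root directory we created earlier)
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../llvm/llvm
$ make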

LLVM is a large codebase, so this will take a long time (use the -j flag to run multiple compilation threads for speedup if you can).
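One way to choose a value for -j is to ask the system how many processors are online. This is a small sketch with a hypothetical helper name, build_jobs; it assumes getconf supports the _NPROCESSORS_ONLN variable and falls back to 1 when it does not.

```shell
# Hypothetical helper: print the number of online processors, or 1 if unknown.
build_jobs() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case $n in
    ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: fall back to 1
  esac
  echo "$n"
}

# Intended use, from the build directory:
#   make -j"$(build_jobs)"
```

Each parallel compile job needs memory as well as a core, so on machines with little RAM a smaller number than the core count may build more reliably.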

