Getting started with LLVM core libraries

LLVM is an inspiring software project that started with the passion for compilers of a single person, Chris Lattner. The events that followed the first versions of LLVM and how it became widely adopted later reveal a pattern that may be observed across the history of other successful open source projects: they did not start within a company, but instead they are the product of simple human curiosity with respect to a given subject. For example, the first Linux kernel was the result of a Finnish student being intrigued by the area of operating systems and being motivated to understand and see in practice how a real operating system should work.

For both Linux and LLVM, the contributions of many other programmers matured the project and raised it to first-class software that rivals, in quality, any established competitor. It is unfair, therefore, to attribute the success of any big project to a single person. However, in the open source community, the leap from a student's project to incredibly complex yet robust software depends on a key factor: attracting contributors and programmers who enjoy spending their time on the project.

Schools create a fascinating atmosphere because education involves the art of teaching people how things work. For these people, unraveling how intricate mechanisms work, and moving from being puzzled to finally mastering them, brings a sense of victory and achievement. In this environment, at the University of Illinois at Urbana-Champaign (UIUC), the LLVM project grew by being used both as a research prototype and as a teaching framework for compiler classes taught by Vikram Adve, Lattner's Master's advisor. Students contributed the first bug reports, setting in motion the LLVM trajectory as a well-designed and easy-to-study piece of software.

The blatant disparity between software theory and practice befuddles many Computer Science students. A clean and simple concept in computing theory may involve so many levels of implementation detail that real-life software projects become simply too complex for the human mind to grasp in all their nuances. A clever design with powerful abstractions is the key to helping the human brain navigate all the levels of a project: from the high-level view, which implements how the program works in a broader sense, to the lowest level of detail.

This is particularly true for compilers. Students who have a great passion to learn how compilers work often face a tough challenge when it comes to understanding an actual compiler implementation. Before LLVM, GCC was one of the few open source options for hackers and curious students who wanted to learn how a real compiler is implemented, beyond the theory taught in schools.

However, a software project reflects, in its purest sense, the view of the programmers who created it. This happens through the abstractions employed to distinguish modules and data representation across several components. Programmers may have different views about the same topic. As a result, old and large software bases such as GCC, which is almost 30 years old, frequently embody a collection of different views from different generations of programmers, which makes the software increasingly difficult for newer programmers and curious observers to understand.

The LLVM project not only attracted experienced compiler programmers, but also a lot of young and curious minds that saw in it a much cleaner, simpler, and more hackable piece of software, which represented a compiler with a lot of potential. This was clearly observed in the incredible number of scientific papers that chose LLVM as a prototype to do research. The reason is simple: in academia, students are frequently in charge of the practical aspects of the implementation, and thus, it is of paramount importance for research projects that the student be able to master the code base of the experimental framework. Seduced by its newer design using the C++ language (instead of C used in GCC), modularity (instead of the monolithic structure of GCC), and concepts that map more easily to the theory being taught in modern compiler courses, many researchers found it easy to hack LLVM in order to implement their ideas, and they were successful. The success of LLVM in academia, therefore, was a consequence of this reduced gap between theory and practice.

Beyond an experimental framework for scientific research, the LLVM project also attracted industry interest due to its considerably more liberal license in comparison with the GPL license of GCC. A big frustration for researchers who write code in academia is the fear that it will be used only for a single experiment and be discarded immediately afterwards. To fight this fate, Chris Lattner, in the Master's project at UIUC that gave birth to LLVM, decided to license the project under the University of Illinois/NCSA Open Source License, allowing its use, commercial or not, as long as the copyright notice is maintained. The goal was to maximize LLVM adoption, and this goal was amply fulfilled. In 2012, LLVM was awarded the ACM Software System Award, a highly distinguished recognition of notable software that contributed to science.

Many companies with different needs embraced the LLVM project and made different contributions, widening the range of languages that an LLVM-based compiler can operate with as well as the range of machines for which the compiler is able to generate code. This new phase of the project brought an unprecedented level of maturity to the libraries and tools, allowing LLVM to permanently leave the state of experimental academic software and enter the status of a robust framework used in commercial products. With this, the name of the project also changed from Low Level Virtual Machine to the acronym LLVM.

The decision to retire the name Low Level Virtual Machine in favor of just LLVM reflects the change of goals of the project across its history. As a Master's project, LLVM was created as a framework to study lifelong program optimizations. These ideas were initially published in a 2003 MICRO (International Symposium on Microarchitecture) paper entitled LLVA: A Low-level Virtual Instruction Set Architecture, describing its instruction set, and in a 2004 CGO (International Symposium on Code Generation and Optimization) paper entitled LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation.

Outside of an academic context, LLVM became a well-designed compiler with the interesting property of writing its intermediate representation to disk. In commercial systems, it was never truly used as a virtual machine such as the Java Virtual Machine (JVM), and thus, it made little sense to continue with the Low Level Virtual Machine name. On the other hand, some other curious names remained as a legacy. The file on the disk that stores a program in the LLVM intermediate representation is referred to as the LLVM bitcode, a parody of the Java bytecode, as a reference to the amount of space necessary to represent programs in the LLVM intermediate representation versus the Java one.

Our goal in writing this book is twofold. First, since the LLVM project grew a lot, we want to present it to you in small pieces, a component at a time, making it as simple as possible to understand while providing you with the joy of working with a powerful compiler library. Second, we want to evoke the spirit of an open source hacker, inspiring you to go far beyond the concepts presented here and never stop expanding your knowledge.

Happy hacking!

What this book covers

Chapter 1, Build and Install LLVM, will show you how to install the Clang/LLVM package on Linux, Windows, or Mac, including a discussion about building LLVM on Visual Studio and Xcode. It will also cover the different flavors of LLVM distributions and help you decide which is best for you: pre-built binaries, distribution packages, or source code.

Chapter 2, External Projects, will present external LLVM projects that live in separate packages or repositories, such as extra Clang tools, the DragonEgg GCC plugin, the LLVM debugger (LLDB), and the LLVM test suite.

Chapter 3, Tools and Design, will explain how the LLVM project is organized in different tools, working out an example on how to use them to go from source code to assembly language. It will also present how the compiler driver works, and finally, how to write your very first LLVM tool.

Chapter 4, The Frontend, will present the LLVM compiler frontend, the Clang project. It will walk you through all the steps of the frontend while explaining how to write small programs that use each part of the frontend as it is presented. It finishes by explaining how to write a small compiler driver with Clang libraries.

Chapter 5, The LLVM Intermediate Representation, will explain a crucial part of the LLVM design: its intermediate representation. It will show you what characteristics make it special, present its syntax, structure, and how to write a tool that generates the LLVM IR.

Chapter 6, The Backend, will introduce you to the LLVM compiler backend, responsible for translating the LLVM IR to machine code. This chapter will walk you through all the backend steps and provide you with the knowledge to create your own LLVM backend. It finishes by showing you how to create a backend pass.

Chapter 7, The Just-in-Time Compiler, will explain the LLVM Just-in-Time compilation infrastructure, which allows you to generate and execute machine code on demand. This technology is essential in applications where the program source code is only known at runtime, such as JavaScript interpreters in Internet browsers. This chapter walks you through the steps to use the right libraries in order to create your own JIT compiler.

Chapter 8, Cross-platform Compilation, will guide you through the steps for Clang/LLVM to create programs for other platforms such as ARM-based ones. This involves configuring the right environment to correctly compile programs that will run outside the environment where they were compiled.

Chapter 9, The Clang Static Analyzer, will present a powerful tool for discovering bugs in large source code bases without even running the program, but simply by analyzing the code. This chapter will also show you how to extend the Clang Static Analyzer with your own bug checkers.

Chapter 10, Clang Tools with LibTooling, will present the LibTooling framework and a series of Clang tools that are built upon this library, which allow you to perform source code refactoring or simply analyze the source code in an easy way. This chapter finishes by showing you how to write your own C++ source code refactoring tool by using this library.

At the time of this writing, LLVM 3.5 had not been released. While this book focuses on LLVM Version 3.4, we plan to release an appendix updating the examples in this book to LLVM 3.5 by the third week of September 2014, allowing you to exercise the content of the book with the newest versions of LLVM. This appendix will be available at https://www.packtpub.com/sites/default/files/downloads/6924OS_Appendix.pdf.

What you need for this book

To begin exploring the world of LLVM, you can use a UNIX system, a Mac OS X system, or a Windows system, as long as it is equipped with a modern C++ compiler. The LLVM source code is very demanding on the C++ compiler used to compile it and uses the newest standards. This means that on Linux, you will need at least GCC 4.8.1; on Mac OS X, you will need at least Xcode 5.1; and on Windows, you will need Visual Studio 2012.

Even though we explain how to build LLVM on Windows with Visual Studio, this book does not focus on this platform because some LLVM features are unavailable for it. For example, LLVM lacks loadable module support on Windows, yet we show you how to write LLVM plugins that are built as shared libraries. In cases like this, the only way to try the examples in practice is to use either Linux or Mac OS X.

If you do not want to build LLVM yourself, you can use a prebuilt binary bundle. However, you will be restricted to the platforms where this convenience is available.

Who this book is for

This book is intended for enthusiasts, computer science students, and compiler engineers interested in learning about the LLVM framework. You need a background in C++ and, although not mandatory, should know at least some compiler theory. Whether you are a newcomer or a compiler expert, this book provides a practical introduction to LLVM and avoids complex scenarios. If you are interested enough and excited about this technology, then this book is definitely for you.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The prebuilt package for Windows comes with an easy-to-use installer that unpacks the LLVM tree structure in a subfolder of your Program Files folder."

A block of code is set as follows:

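#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>

int main(void) {
    uint64_t a = 0ULL, b = 0ULL;
    /* SCNu64 and PRIu64 are the portable format macros for uint64_t */
    if (scanf("%" SCNu64 " %" SCNu64, &a, &b) != 2 || b == 0)
        return EXIT_FAILURE; /* reject bad input and division by zero */
    printf("64-bit division is %" PRIu64 "\n", a / b);
    return EXIT_SUCCESS;
}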

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

KEYWORD(float                       , KEYALL)
KEYWORD(goto                        , KEYALL)
KEYWORD(inline                      , KEYC99|KEYCXX|KEYGNU)
KEYWORD(int                         , KEYALL)
KEYWORD(return                      , KEYALL)
KEYWORD(short                       , KEYALL)
KEYWORD(while                       , KEYALL)

Any command-line input or output is written as follows:

$ sudo mv clang+llvm-3.4-x86_64-linux-gnu-ubuntu-13.10 llvm-3.4
$ export PATH="$PATH:/usr/local/llvm-3.4/bin"

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "During installation, make sure to check the Add CMake to the system PATH for all users option."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
