Book Image

LLVM Techniques, Tips, and Best Practices Clang and Middle-End Libraries

By : Min-Yih Hsu
Book Image

LLVM Techniques, Tips, and Best Practices Clang and Middle-End Libraries

By: Min-Yih Hsu

Overview of this book

Every programmer or engineer, at some point in their career, works with compilers to optimize their applications. Compilers convert a high-level programming language into low-level machine-executable code. LLVM provides the infrastructure, reusable libraries, and tools needed for developers to build their own compilers. With LLVM’s extensive set of tooling, you can effectively generate code for different backends as well as optimize them. In this book, you’ll explore the LLVM compiler infrastructure and understand how to use it to solve different problems. You’ll start by looking at the structure and design philosophy of important components of LLVM and gradually move on to using Clang libraries to build tools that help you analyze high-level source code. As you advance, the book will show you how to process LLVM IR – a powerful way to transform and optimize the source program for various purposes. Equipped with this knowledge, you’ll be able to leverage LLVM and Clang to create a wide range of useful programming language tools, including compilers, interpreters, IDEs, and source code analyzers. By the end of this LLVM book, you’ll have developed the skills to create powerful tools using the LLVM framework to overcome different real-world challenges.
Table of Contents (18 chapters)
1
Section 1: Build System and LLVM-Specific Tooling
6
Section 2: Frontend Development
11
Section 3: "Middle-End" Development

Tweaking CMake arguments

This section will show you some of the most common CMake arguments in LLVM's build system that can help you customize your build and achieve maximum efficiency.

Before we start, you should have a build folder that has been CMake-configured. Most of the following subsections will modify a file in the build folder; that is, CMakeCache.txt.

Choosing the right build type

LLVM uses several predefined build types provided by CMake. The most common types among them are as follows:

  • Release: This is the default build type if you didn't specify any. It will adopt the highest optimization level (usually -O3) and eliminate most of the debug information. Usually, this build type will make the building speed slightly slower.
  • Debug: This build type will compile without any optimization applied (that is, -O0). It preserves all the debug information. Note that this will generate a huge number of artifacts and usually take up ~20 GB of space, so please be sure you have enough storage space when using this build type. This will usually make the building speed slightly faster since no optimization is being performed.
  • RelWithDebInfo: This build type applies as much compiler optimization as possible (usually -O2) and preserves all the debug information. This is an option balanced between space consumption, runtime speed, and debuggability.

You can choose one of them using the CMAKE_BUILD_TYPE CMake variable. For example, to use the RelWithDebInfo type, you can use the following command:

$ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo …

It is recommended to use RelWithDebInfo first (if you're going to debug LLVM later). Modern compilers have gone a long way to improve the debug information's quality in optimized program binaries. So, always give it a try first to avoid unnecessary storage waste; you can always go back to the Debug type if things don't work out.

In addition to configuring build types, LLVM_ENABLE_ASSERTIONS is another CMake (Boolean) argument that controls whether assertions (that is, the assert(bool predicate) function, which will terminate the program if the predicate argument is not true) are enabled. By default, this flag will only be true if the build type is Debug, but you can always turn it on manually to enforce stricter checks, even in other build types.

Avoiding building all targets

The number of LLVM's supported targets (hardware) has grown rapidly in the past few years. At the time of writing this book, there are nearly 20 officially supported targets. Each of them deals with non-trivial tasks such as native code generation, so it takes a significant amount of time to build. However, the chances that you're going to be working on all of these targets at the same time are low. Thus, you can select a subset of targets to build using the LLVM_TARGETS_TO_BUILD CMake argument. For example, to build the X86 target only, we can use the following command:

$ cmake -DLLVM_TARGETS_TO_BUILD="X86" …

You can also specify multiple targets using a semicolon-separated list, as follows:

$ cmake -DLLVM_TARGETS_TO_BUILD="X86;AArch64;AMDGPU" …

Surround the list of targets with double quotes!

In some shells, such as BASH, a semicolon is an ending symbol for a command. So, the rest of the CMake command will be cut off if you don't surround the list of targets with double-quotes.

Let's see how building shared libraries can help tweak CMake arguments.

Building as shared libraries

One of the most iconic features of LLVM is its modular design. Each component, optimization algorithm, code generation, and utility libraries, to name a few, are put into their own libraries where developers can link individual ones, depending on their usage. By default, each component is built as a static library (*.a in Unix/Linux and *.lib in Windows). However, in this case, static libraries have the following drawbacks:

  • Linking against static libraries usually takes more time than linking against dynamic libraries (*.so in Unix/Linux and *.dll in Windows).
  • If multiple executables link against the same set of libraries, like many of the LLVM tools do, the total size of these executables will be significantly larger when you adopt the static library approach compared to its dynamic library counterpart. This is because each of the executables has a copy of those libraries.
  • When you're debugging LLVM programs with debuggers (GDB, for example), they usually spend quite some time loading the statically linked executables at the very beginning, hindering the debugging experience.

Thus, it's recommended to build every LLVM component as a dynamic library during the development phase by using the BUILD_SHARED_LIBS CMake argument:

$ cmake -DBUILD_SHARED_LIBS=ON …

This will save you a significant amount of storage space and speed up the building process.

Splitting the debug info

When you're building a program in debug mode – adding the -g flag when using you're GCC and Clang, for example – by default, the generated binary contains a section that stores debug information. This information is essential for using a debugger (for example, GDB) to debug that program. LLVM is a large and complex project, so when you're building it in debug mode – using the cmAKE_BUILD_TYPE=Debug variable – the compiled libraries and executables come with a huge amount of debug information that takes up a lot of disk space. This causes the following problems:

  • Due to the design of C/C++, several duplicates of the same debug information might be embedded in different object files (for example, the debug information for a header file might be embedded in every library that includes it), which wastes lots of disk space.
  • The linker needs to load object files AND their associated debug information into memory during the linking stage, meaning that memory pressure will increase if the object file contains a non-trivial amount of debug information.

To solve these problems, the build system in LLVM provides allows us to split debug information into separate files from the original object files. By detaching debug information from object files, the debug info of the same source file is condensed into one place, thus avoiding unnecessary duplicates being created and saving lots of disk space. In addition, since debug info is not part of the object files anymore, the linker no longer needs to load them into memory and thus saves lots of memory resources. Last but not least, this feature can also improve our incremental building speed – that is, rebuild the project after a (small) code change – since we only need to update the modified debug information in a single place.

To use this feature, please use the LLVM_USE_SPLIT_DWARF cmake variable:

$ cmake -DcmAKE_BUILD_TYPE=Debug -DLLVM_USE_SPLIT_DWARF=ON

Note that this CMake variable only works for compilers that use the DWARF debug format, including GCC and Clang.

Building an optimized version of llvm-tblgen

TableGen is a Domain-Specific Language (DSL) for describing structural data that will be converted into the corresponding C/C++ code as part of LLVM's building process (we will learn more about this in the chapters to come). The conversion tool is called llvm-tblgen. In other words, the running time of llvm-tblgen will affect the building time of LLVM itself. Therefore, if you're not developing the TableGen part, it's always a good idea to build an optimized version of llvm-tblgen, regardless of the global build type (that is, CMAKE_BUILD_TYPE), making llvm-tblgen run faster and shortening the overall building time.

The following CMake command, for example, will create build configurations that build a debug version of everything except the llvm-tblgen executable, which will be built as an optimized version:

$ cmake -DLLVM_OPTIMIZED_TABLEGEN=ON -DCMAKE_BUILD_TYPE=Debug …

Lastly, you'll see how you can use Clang and the new PassManager.

Using the new PassManager and Clang

Clang is LLVM's official C-family frontend (including C, C++, and Objective-C). It uses LLVM's libraries to generate machine code, which is organized by one of the most important subsystems in LLVM – PassManager. PassManager puts together all the tasks (that is, the Passes) required for optimization and code generation.

In Chapter 9, Working with PassManager and AnalysisManager, will introduce LLVM's new PassManager, which builds from the ground up to replace the existing one somewhere in the future. The new PassManager has a faster runtime speed compared to the legacy PassManager. This advantage indirectly brings better runtime performance for Clang. Therefore, the idea here is pretty simple: if we build LLVM's source tree using Clang, with the new PassManager enabled, the compilation speed will be faster. Most of the mainstream Linux distribution package repositories already contain Clang. It's recommended to use Clang 6.0 or later if you want a more stable PassManager implementation. Use the LLVM_USE_NEWPM CMake variable to build LLVM with the new PassManager, as follows:

$ env CC=`which clang` CXX=`which clang++` \
  cmake -DLLVM_USE_NEWPM=ON …

LLVM is a huge project that takes a lot of time to build. The previous two sections introduced some useful tricks and tips for improving its building speed. In the next section, we're going to introduce an alternative build system to build LLVM. It has some advantages over the default CMake build system, which means it will be more suitable in some scenarios.