Book Image

Learn LLVM 17 - Second Edition

By : Kai Nacke, Amy Kwan
Book Image

Learn LLVM 17 - Second Edition

By: Kai Nacke, Amy Kwan

Overview of this book

LLVM was built to bridge the gap between the theoretical knowledge found in compiler textbooks and the practical demands of compiler development. With a modular codebase and advanced tools, LLVM empowers developers to build compilers with ease. This book serves as a practical introduction to LLVM, guiding you progressively through complex scenarios and ensuring that you navigate the challenges of building and working with compilers like a pro. The book starts by showing you how to configure, build, and install LLVM libraries, tools, and external projects. You’ll then be introduced to LLVM's design, unraveling its applications in each compiler stage: frontend, optimizer, and backend. Using a real programming language subset, you'll build a frontend, generate LLVM IR, optimize it through the pipeline, and generate machine code. Advanced chapters extend your expertise, covering topics such as extending LLVM with a new pass, using LLVM tools for debugging, and enhancing the quality of your code. You'll also focus on just-in-time compilation issues and the current state of JIT-compilation support with LLVM. Finally, you’ll develop a new backend for LLVM, gaining insights into target description and how instruction selection works. By the end of this book, you'll have hands-on experience with the LLVM compiler development framework through real-world examples and source code snippets.
Table of Contents (20 chapters)
1
Part 1: The Basics of Compiler Construction with LLVM
4
Part 2: From Source to Machine Code Generation
10
Part 3: Taking LLVM to the Next Level
14
Part 4: Roll Your Own Backend

Customizing the build process

The CMake system uses a project description in the CMakeLists.txt file. The top-level file is in the llvm directory, llvm/CMakeLists.txt. Other directories also have CMakeLists.txt files, which are recursively included during the generation process.

Based on the information provided in the project description, CMake checks which compilers are installed, detects libraries and symbols, and creates the build system files, for example, build.ninja or Makefile (depending on the chosen generator). It is also possible to define reusable modules, for example, a function to detect whether LLVM is installed. These scripts are placed in the special cmake directory (llvm/cmake), which is searched automatically during the generation process.

The build process can be customized with the definition of CMake variables. The command-line option –D is used to set a variable to a value. The variables are used in the CMake scripts. Variables defined by CMake itself are almost always prefixed with CMAKE_ and these variables can be used in all projects. Variables defined by LLVM are prefixed with LLVM_ but they can only be used if the project definition includes the use of LLVM.

Variables defined by CMake

Some variables are initialized with the value of environment variables. Most notable are CC and CXX, which define the C and C++ compilers to be used for building. CMake tries to locate a C and a C++ compiler automatically, using the current shell search path. It picks the first compiler found. If you have several compilers installed, for example, gcc and clang or different versions of clang, then this might not be the compiler you want for building LLVM.

Suppose you like to use clang17 as a C compiler and clang++17 as a C++ compiler. Then, you can invoke CMake in a Unix shell in the following way:

$ CC=clang17 CXX=clang++17 cmake –B build –S llvm

This sets the value of the environment variables only for the invocation of cmake. If necessary, you can specify an absolute path for the compiler executables.

CC is the default value of the CMAKE_C_COMPILER CMake variable, and CXX is the default value of the CMAKE_CXX_COMPILER CMake variable. Instead of using the environment variables, you can set the CMake variables directly. This is equivalent to the preceding call:

$ cmake –DCMAKE_C_COMPILER=clang17 \
  -DCMAKE_CXX_COMPILER=clang++17 –B build –S llvm

Other useful variables defined by CMake are as follows:

Variable name

Purpose

CMAKE_INSTALL_PREFIX

This is a path prefix that is prepended to every path during installation. The default is /usr/local on Unix and C:\Program Files\<Project> on Windows. To install LLVM in the /opt/llvm directory, you specify -DCMAKE_INSTALL_PREFIX=/opt/llvm. The binaries are copied to /opt/llvm/bin, library files to /opt/llvm/lib, and so on.

CMAKE_BUILD_TYPE

Different types of build require different settings. For example, a debug build needs to specify options to generate debug symbols and usually link against debug versions of system libraries. In contrast, a release build uses optimization flags and links against production versions of libraries. This variable is only used for build systems that can only handle one build type, for example, Ninja or Make. For IDE build systems, all variants are generated and you have to use the mechanism of the IDE to switch between build types. Possible values are as follows:

DEBUG: build with debug symbols

RELEASE: build with optimization for speed

RELWITHDEBINFO: release build with debug symbols

MINSIZEREL: build with optimization for size

The default build type is taken from the CMAKE_BUILD_TYPE environment variable. If this variable is not set, then the default depends on the used toolchain and is often empty. In order to generate build files for a release build, you specify -DCMAKE_BUILD_TYPE=RELEASE.

CMAKE_C_FLAGS

CMAKE_CXX_FLAGS

These are extra flags used when compiling C and C++ source files. The initial values are taken from the CFLAGS and CXXFLAGS environment variables, which can be used as an alternative.

CMAKE_MODULE_PATH

This specifies additional directories that are searched for CMake modules. The specified directories are searched before the default ones. The value is a semicolon-separated list of directories.

PYTHON_EXECUTABLE

If the Python interpreter is not found or if the wrong one is picked in case you have installed multiple versions, you can set this variable to the path of the Python binary. This variable only has an effect if the Python module of CMake is included (which is the case for LLVM).

Table 1.1 - Additional useful variables provided by CMake

CMake provides built-in help for variables. The --help-variable var option prints help for the var variable. For instance, you can type the following to get help for CMAKE_BUILD_TYPE:

$ cmake --help-variable CMAKE_BUILD_TYPE

You can also list all variables with the following command:

$ cmake --help-variable-list

This list is very long. You may want to pipe the output to more or a similar program.

Using LLVM-defined build configuration variables

The build configuration variables defined by LLVM work in the same way as those defined by CMake except that there is no built-in help. The most useful variables are found in the following tables, where they are divided into variables that are useful for first-time users installing LLVM, and also variables for more advanced LLVM users.

Variables useful for first-time users installing LLVM

Variable name

Purpose

LLVM_TARGETS_TO_BUILD

LLVM supports code generation for different CPU architectures. By default, all these targets are built. Use this variable to specify the list of targets to build, separated by semicolons. The current targets are AArch64, AMDGPU, ARM, AVR, BPF, Hexagon, Lanai, LoongArch, Mips, MSP430, NVPTX, PowerPC, RISCV, Sparc, SystemZ, VE, WebAssembly, X86, and XCore. all can be used as shorthand for all targets. The names are case-sensitive. To only enable the PowerPC and the System Z target, you specify -DLLVM_TARGETS_TO_BUILD="PowerPC;SystemZ".

LLVM_EXPERIMENTAL_TARGETS_TO_BUILD

In addition to the official targets, the LLVM source tree also contains experimental targets. These targets are under development and often do not yet support the full functionality of a backend. The current list of experimental targets is ARC, CSKY, DirectX, M68k, SPIRV, and Xtensa. To build the M68k target, you specify -D LLVM_EXPERIMENTAL_TARGETS_TO_BUILD=M68k.

LLVM_ENABLE_PROJECTS

This is a list of the projects you want to build, separated by semicolons. The source for the projects must be on the same level as the llvm directory (side-by-side layout). The current list is bolt, clang, clang-tools-extra, compiler-rt, cross-project-tests, libc, libclc, lld, lldb, mlir, openmp, polly, and pstl. all can be used as shorthand for all projects in this list. Additionally, you can specify the flang project here. Due to some special build requirements, it is not yet part of the all list.

To build clang and bolt together with LLVM, you specify -DLLVM_ENABLE_PROJECT="clang;bolt".

Table 1.2 - Useful variables for first-time LLVM users

Variables for advanced users of LLVM

LLVM_ENABLE_ASSERTIONS

If set to ON, then assertion checks are enabled. These checks help to find errors and are very useful during development. The default value is ON for a DEBUG build and otherwise OFF. To turn assertion checks on (e.g. for a RELEASE build), you specify –DLLVM_ENABLE_ASSERTIONS=ON.

LLVM_ENABLE_EXPENSIVE_CHECKS

This enables some expensive checks that can really slow down compilation speed or consume large amounts of memory. The default value is OFF. To turn these checks on, you specify -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON.

LLVM_APPEND_VC_REV

LLVM tools such as llc display the LLVM version they are based on besides other information if the –version command-line option is given. This version information is based on the LLVM_REVISION C macro. By default, not only the LLVM version but also the current Git hash is part of the version information. This is handy in case you are following the development of the master branch because it makes clear on which Git commit the tool is based. If not needed, then this can be turned off with –DLLVM_APPEND_VC_REV=OFF.

LLVM_ENABLE_THREADS

LLVM automatically includes thread support if a threading library is detected (usually the pthreads library). Further, LLVM assumes in this case that the compiler supports TLS (thread-local storage). If you don’t want thread support or your compiler does not support TLS, then you can turn it off with -DLLVM_ENABLE_THREADS=OFF.

LLVM_ENABLE_EH

The LLVM projects do not use C++ exception handling and therefore turn exception support off by default. This setting can be incompatible with other libraries your project is linking with. If needed, you can enable exception support by specifying –DLLVM_ENABLE_EH=ON.

LLVM_ENABLE_RTTI

LLVM uses a lightweight, self-build system for runtime type information. The generation of C++ RTTI is turned off by default. Like the exception handling support, this may be incompatible with other libraries. To turn generation of C++ RTTI on, you specify –DLLVM_ENABLE_RTTI=ON.

LLVM_ENABLE_WARNINGS

Compiling LLVM should generate no warning messages if possible. The option to print warning messages is therefore turned on by default. To turn it off, you specify –DLLVM_ENABLE_WARNINGS=OFF.

LLVM_ENABLE_PEDANTIC

The LLVM source should be C/C++ language standard-conforming; hence, pedantic checking of the source is enabled by default. If possible, compiler-specific extensions are also disabled. To reverse this setting, you specify –DLLVM_ENABLE_PEDANTIC=OFF.

LLVM_ENABLE_WERROR

If set to ON, then all warnings are treated as errors – the compilation aborts as soon as warnings are found. It helps to find all remaining warnings in the source. By default, it is turned off. To turn it on, you specify –DLLVM_ENABLE_WERROR=ON.

LLVM_OPTIMIZED_TABLEGEN

Usually, the tablegen tool is built with the same options as all other parts of LLVM. At the same time, tablegen is used to generate large parts of the code generator. As a result, tablegen is much slower in a debug build, increasing the compile time noticeably. If this option is set to ON, then tablegen is compiled with optimization turned on even for a debug build, possibly reducing compile time. The default is OFF. To turn it on, you specify –DLLVM_OPTIMIZED_TABLEGEN=ON.

LLVM_USE_SPLIT_DWARF

If the build compiler is gcc or clang, then turning on this option will instruct the compiler to generate the DWARF debug information in a separate file. The reduced size of the object files reduces the link time of debug builds significantly. The default is OFF. To turn it on, you specify -LLVM_USE_SPLIT_DWARF=ON.

Table 1.3 - Useful variables for advanced LLVM users

Note

LLVM defines many more CMake variables. You can find the complete list in the LLVM documentation about CMake https://releases.llvm.org/17.0.1/docs/CMake.html#llvm-specific-variables. The preceding list contains only the ones you are most likely to need.