Learn LLVM 17 - Second Edition

By : Kai Nacke, Amy Kwan

Overview of this book

LLVM was built to bridge the gap between the theoretical knowledge found in compiler textbooks and the practical demands of compiler development. With a modular codebase and advanced tools, LLVM empowers developers to build compilers with ease. This book serves as a practical introduction to LLVM, guiding you progressively through complex scenarios and ensuring that you navigate the challenges of building and working with compilers like a pro. The book starts by showing you how to configure, build, and install LLVM libraries, tools, and external projects. You’ll then be introduced to LLVM's design, unraveling its applications in each compiler stage: frontend, optimizer, and backend. Using a real programming language subset, you'll build a frontend, generate LLVM IR, optimize it through the pipeline, and generate machine code. Advanced chapters extend your expertise, covering topics such as extending LLVM with a new pass, using LLVM tools for debugging, and enhancing the quality of your code. You'll also focus on just-in-time compilation issues and the current state of JIT-compilation support with LLVM. Finally, you’ll develop a new backend for LLVM, gaining insights into target description and how instruction selection works. By the end of this book, you'll have hands-on experience with the LLVM compiler development framework through real-world examples and source code snippets.

Cloning the repository and building from source

With the build tools ready, you can now check out all LLVM projects from GitHub and build LLVM. This process is essentially the same on all platforms:

Configure Git.
Clone the repository.
Create the build directory.
Generate the build system files.
Finally, build and install LLVM.
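
As a preview, the steps above map onto roughly the following commands, each of which is explained in the subsections that follow. This is a minimal sketch rather than an official script: the run helper, the build_llvm function, and the DRY_RUN switch are hypothetical names introduced here, and by default the script only prints the commands instead of executing them.

```shell
# DRY_RUN=1 (the default in this sketch) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical helper that prints or executes its arguments.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_llvm: the steps of this section, chained so that a failure stops the sequence.
build_llvm() {
  run git config user.email &&                    # 1. check the Git identity
  run git clone https://github.com/llvm/llvm-project.git &&   # 2. clone
  run cd llvm-project &&
  run git checkout -b llvm-17 llvmorg-17.0.1 &&
  run mkdir build &&                              # 3. create the build directory
  run cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_PROJECTS=clang -B build -S llvm &&   # 4. generate build files
  run cmake --build build &&                      # 5. build, test, install
  run cmake --build build --target check-all &&
  run cmake --install build   # may need elevated permissions for the default prefix
}

build_llvm
```

Printing the plan first is a deliberate choice: the clone and the build are both long-running, so it helps to review the exact commands before committing to them.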

Let’s begin with configuring Git.

Configuring Git

The LLVM project uses Git for version control. If you have not used Git before, you should first do some basic configuration: set your username and email address. Git records both pieces of information in every commit you make.

You can check whether an email address and username are already configured in Git with the following commands:

$ git config user.email
$ git config user.name

The preceding commands print the email address and username that are currently set. If you are setting them for the first time, use the following commands, replacing Jane with your name and [email protected] with your email address:

$ git config --global user.email "[email protected]"
$ git config --global user.name "Jane"

These commands change the global Git configuration. Inside a Git repository, you can override those values locally by omitting the --global option.
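
The difference between the global and the repository-local setting can be tried out safely in a sandbox. The following is a minimal sketch: the sandbox_git helper, the scratch directories, and the name "Jane (LLVM work)" are hypothetical and exist only for illustration. The helper points Git at a throwaway home directory so that your real configuration is not modified.

```shell
# Sketch: run Git against a throwaway HOME so the real configuration stays untouched.
scratch=$(mktemp -d)
mkdir -p "$scratch/home" "$scratch/repo"

# sandbox_git: hypothetical wrapper that redirects Git's global configuration.
sandbox_git() {
  HOME="$scratch/home" XDG_CONFIG_HOME="$scratch/home/.config" \
    GIT_CONFIG_NOSYSTEM=1 git "$@"
}

sandbox_git config --global user.name "Jane"        # stored in the sandbox's ~/.gitconfig
sandbox_git init --quiet "$scratch/repo"
sandbox_git -C "$scratch/repo" config user.name "Jane (LLVM work)"   # stored in .git/config

global_name=$(sandbox_git config --global user.name)
local_name=$(sandbox_git -C "$scratch/repo" config user.name)
echo "global value:          $global_name"
echo "inside the repository: $local_name"
```

Inside the repository, the local value takes precedence, while the global value remains in effect everywhere else.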

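Unless an editor is configured through the core.editor setting or the GIT_EDITOR, VISUAL, or EDITOR environment variables, Git falls back to the vi editor for commit messages. If you prefer another editor, then you can change the configuration in a similar way. To use the nano editor, you type the following: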

$ git config --global core.editor nano
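
To see which editor Git would actually launch, you can ask Git itself with git var. The sketch below assumes nothing about your setup: it first forces the answer through the GIT_EDITOR environment variable, which takes precedence over core.editor, and then prints whatever your own configuration resolves to.

```shell
# GIT_EDITOR overrides core.editor, VISUAL, and EDITOR for this one command.
editor_from_env=$(GIT_EDITOR=nano git var GIT_EDITOR)
echo "with GIT_EDITOR=nano: $editor_from_env"

# Without the override, the answer depends on this machine's configuration;
# Git may report an error if no editor can be determined, hence the fallback.
git var GIT_EDITOR || echo "no editor could be determined"
```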

For more information about Git, please see the Git Version Control Cookbook (https://www.packtpub.com/product/git-version-control-cookbook-second-edition/9781789137545).

Now you are ready to clone the LLVM repository from GitHub.

Cloning the repository

The command to clone the repository is essentially the same on all platforms. On Windows, however, it is recommended to turn off the auto-translation of line endings.

On all non-Windows platforms, you type the following command to clone the repository:

$ git clone https://github.com/llvm/llvm-project.git

On Windows, add the option to disable auto-translation of line endings. Here, you type the following:

$ git clone --config core.autocrlf=false \
  https://github.com/llvm/llvm-project.git

This Git command clones the latest source code from GitHub into a local directory named llvm-project. Now change the current directory into the new llvm-project directory with the following command:

$ cd llvm-project

Inside the directory are all LLVM projects, each one in its own directory. Most notably, the LLVM core libraries are in the llvm subdirectory. The LLVM project uses branches for subsequent release development (“release/17.x”) and tags (“llvmorg-17.0.1”) to mark a certain release. With the preceding clone command, you get the current development state. This book uses LLVM 17. To check out the first release of LLVM 17 into a branch called llvm-17, you type the following:

$ git checkout -b llvm-17 llvmorg-17.0.1

With the previous steps, you cloned the whole repository and created a branch from a tag. This is the most flexible approach.

Git also allows you to clone only a branch or a tag (including history). With git clone --branch release/17.x --single-branch https://github.com/llvm/llvm-project, you clone only the release/17.x branch and its history; without --single-branch, Git still fetches every branch and merely checks out the named one. You then have the latest state of the LLVM 17 release branch, so you only need to create a branch from the release tag, as before, if you need the exact release version. With the additional --depth=1 option, which Git calls a shallow clone, you skip the history as well. This saves time and space but limits what you can do locally, including checking out a branch based on the release tags.
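
The effect of a shallow clone is easy to observe without downloading LLVM. The following sketch builds a tiny scratch repository and clones it twice through a file:// URL; the repository, the branch name release/1.x, and the placeholder identity are all hypothetical and stand in for the real llvm-project repository.

```shell
# Build a scratch "origin" repository with three commits on a release branch.
scratch=$(mktemp -d)
origin="$scratch/origin"
git init --quiet "$origin"
for n in 1 2 3; do
  # Placeholder identity, used for these scratch commits only.
  git -C "$origin" -c user.name="Jane" -c user.email="jane@example.invalid" \
      -c commit.gpgsign=false commit --quiet --allow-empty -m "commit $n"
done
git -C "$origin" branch release/1.x

# A file:// URL makes Git use its normal transport, so --depth is honored.
git clone --quiet --branch release/1.x "file://$origin" "$scratch/full"
git clone --quiet --depth=1 --branch release/1.x "file://$origin" "$scratch/shallow"

full_count=$(git -C "$scratch/full" rev-list --count HEAD)
shallow_count=$(git -C "$scratch/shallow" rev-list --count HEAD)
is_shallow=$(git -C "$scratch/shallow" rev-parse --is-shallow-repository)
echo "full clone: $full_count commits, shallow clone: $shallow_count commit(s)"
echo "shallow repository: $is_shallow"
```

The full clone carries all three commits, whereas the shallow clone holds only the tip of the branch, which is why operations that need older history or other tags are limited there.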

Creating a build directory

Unlike many other projects, LLVM does not support in-source builds and requires a separate build directory. The easiest place to create it is inside the llvm-project directory, which is your current directory. For simplicity, let us name the build directory build. Here, the commands for Unix and Windows systems differ. On a Unix-like system, you use the following:

$ mkdir build

And on Windows, use the following:

$ md build

Now you are ready to create the build system files with the CMake tool inside this directory.

Generating the build system files

In order to generate build system files to compile LLVM and clang using Ninja, you run the following:

$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=clang -B build -S llvm

The -G option tells CMake for which system to generate build files. Often-used values for that option are as follows:

Ninja – for the Ninja build system
Unix Makefiles – for GNU Make
Visual Studio 17 2022 – for Visual Studio and MS Build
Xcode – for Xcode projects
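
Running cmake --help prints the generators available on your machine at the end of its output. In a script, one possible way to pick a generator is to prefer Ninja when it is installed and fall back to Makefiles otherwise. The sketch below assumes a Unix-like shell; the generator variable is a name chosen for illustration, and the final command is only printed, not executed.

```shell
# Prefer Ninja when it is on PATH; otherwise fall back to GNU Make.
if command -v ninja >/dev/null 2>&1; then
  generator="Ninja"
else
  generator="Unix Makefiles"
fi

# Print the resulting configure command for review.
echo "cmake -G \"$generator\" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang -B build -S llvm"
```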

With the -B option, you tell CMake the path of the build directory. Similarly, you specify the source directory with the -S option. The generation process can be influenced by setting various variables with the -D option. Usually, they are prefixed with CMAKE_ (if defined by CMake) or LLVM_ (if defined by LLVM).

As mentioned previously, we are also interested in compiling clang alongside LLVM. Setting LLVM_ENABLE_PROJECTS=clang makes CMake generate the build files for clang in addition to LLVM. Furthermore, the CMAKE_BUILD_TYPE=Release setting tells CMake to generate build files for a release build.

The default value for the -G option depends on your platform, and the default value for the build type depends on the toolchain. However, you can define your own preference with environment variables. The CMAKE_GENERATOR variable controls the generator (CMake 3.15 or later), and the CMAKE_BUILD_TYPE variable specifies the build type (CMake 3.22 or later). If you use bash or a similar shell, then you can set the variables with the following:

$ export CMAKE_GENERATOR=Ninja
$ export CMAKE_BUILD_TYPE=Release

If you are using the Windows command prompt instead, then you set the variables with the following:

$ set CMAKE_GENERATOR=Ninja
$ set CMAKE_BUILD_TYPE=Release

With these settings, the command to create the build system files becomes the following, which is easier to type:

$ cmake -DLLVM_ENABLE_PROJECTS=clang -B build -S llvm

You will find more about CMake variables in the Customizing the build process section.
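
To confirm which values a configure run recorded, you can look them up in the CMakeCache.txt file that CMake writes into the build directory, where each entry has the form NAME:TYPE=VALUE. The following sketch uses a hypothetical helper named cache_value and, so that it runs without an LLVM build, a small sample file whose three entries are illustrative; a real build/CMakeCache.txt is far longer.

```shell
# cache_value FILE NAME: print the value of one NAME:TYPE=VALUE entry.
cache_value() {
  sed -n "s/^$2:[^=]*=//p" "$1"
}

# Sample cache content for illustration only.
sample_cache=$(mktemp)
cat > "$sample_cache" <<'EOF'
//Choose the type of build
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_GENERATOR:INTERNAL=Ninja
LLVM_ENABLE_PROJECTS:STRING=clang
EOF

build_type=$(cache_value "$sample_cache" CMAKE_BUILD_TYPE)
cache_generator=$(cache_value "$sample_cache" CMAKE_GENERATOR)
projects=$(cache_value "$sample_cache" LLVM_ENABLE_PROJECTS)
echo "generator=$cache_generator build_type=$build_type projects=$projects"
```

Against a real build tree, the same helper would be called as cache_value build/CMakeCache.txt CMAKE_BUILD_TYPE.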

Compiling and installing LLVM

After the build files are generated, LLVM and clang can be compiled with the following:

$ cmake --build build

This command runs Ninja under the hood because we told CMake to generate Ninja files in the configuration step. However, if you generate build files for a system such as Visual Studio, which supports multiple build configurations, then you need to specify the configuration to use for the build with the --config option. Depending on the hardware resources, this command runs for between 15 minutes (server with lots of CPU cores, memory, and fast storage) and several hours (dual-core Windows notebook with limited memory).

By default, Ninja utilizes all available CPU cores. This is good for the speed of compilation but may prevent other tasks from running; for example, on a Windows-based notebook, it is almost impossible to surf the internet while Ninja is running. Fortunately, you can limit the resource usage with the -j option.

Let’s assume you have four CPU cores available and Ninja should only use two (because you have parallel tasks to run); you then use this command for compilation:

$ cmake --build build -j2
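
Instead of hard-coding the number, a script can derive it from the machine. The following sketch assumes a Unix-like system where either nproc or sysctl is available, and it uses half of the detected cores, which is just one possible policy; the variable names cores and build_jobs are chosen for illustration, and the build command is only printed.

```shell
# Detect the core count: nproc (GNU coreutils), sysctl (macOS/BSD), or 1 as a fallback.
cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Use half of the cores, but never fewer than one job.
build_jobs=$(( cores / 2 ))
[ "$build_jobs" -ge 1 ] || build_jobs=1

echo "cmake --build build -j$build_jobs"
```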

After compilation is finished, a best practice is to run the test suite to check whether everything works as expected:

$ cmake --build build --target check-all

Again, the runtime of this command varies widely with the available hardware resources. The check-all Ninja target runs all test cases. Targets are generated for each directory containing test cases. Using check-llvm instead of check-all runs the LLVM tests but not the clang tests; check-llvm-codegen runs only the tests in the CodeGen directory from LLVM (that is, the llvm/test/CodeGen directory).

You can also do a quick manual check. One of the LLVM applications is llc, the LLVM compiler. If you run it with the --version option, it shows the LLVM version, the host CPU, and all supported architectures:

$ build/bin/llc --version

If you have trouble getting LLVM compiled, then you should consult the Common Problems section of the Getting Started with the LLVM System documentation (https://releases.llvm.org/17.0.1/docs/GettingStarted.html#common-problems) for solutions to typical problems.

As the last step, you can install the binaries:

$ cmake --install build

On a Unix-like system, the install directory is /usr/local. On Windows, C:\Program Files\LLVM is used. This can be changed, of course. The next section explains how.
