Learn LLVM 12

By: Kai Nacke

Overview of this book

LLVM was built to bridge the gap between compiler textbooks and actual compiler development. It provides a modular codebase and advanced tools which help developers to build compilers easily. This book provides a practical introduction to LLVM, gradually helping you navigate through complex scenarios with ease when it comes to building and working with compilers. You’ll start by configuring, building, and installing LLVM libraries, tools, and external projects. Next, the book will introduce you to LLVM design and how it works in practice during each LLVM compiler stage: frontend, optimizer, and backend. Using a subset of a real programming language as an example, you will then learn how to develop a frontend and generate LLVM IR, hand it over to the optimization pipeline, and generate machine code from it. Later chapters will show you how to extend LLVM with a new pass and how instruction selection in LLVM works. You’ll also focus on Just-in-Time compilation issues and the current state of JIT-compilation support that LLVM provides, before finally going on to understand how to develop a new backend for LLVM. By the end of this LLVM book, you will have gained real-world experience in working with the LLVM compiler development framework with the help of hands-on examples and source code snippets.
Table of Contents (17 chapters)
Section 1 – The Basics of Compiler Construction with LLVM
Section 2 – From Source to Machine Code Generation
Section 3 – Taking LLVM to the Next Level

Lexical analysis

As we saw in the example in the previous section, a programming language consists of many elements, such as keywords, identifiers, numbers, operators, and so on. The task of lexical analysis is to take the textual input and create a sequence of tokens from it. The calc language consists of the tokens with, :, +, -, *, /, (, and ), plus identifiers and numbers, which are described by the regular expressions ([a-zA-Z])+ and ([0-9])+, respectively. We assign a unique number to each kind of token to make them easier to handle.
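To make the idea of turning text into a sequence of numbered tokens concrete, here is a minimal, self-contained sketch in standard C++. It is not the Lexer class developed in this chapter: the names TokenKind, Token, and tokenize are hypothetical, std::string_view stands in for the LLVM string types, and the sketch assumes that whitespace separates tokens and that any other character becomes an Unknown token.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for calc. The enumerator value serves as the
// unique number assigned to each kind of token.
enum class TokenKind {
  Unknown, Ident, Number, KwWith,
  Colon, Plus, Minus, Star, Slash, LParen, RParen
};

struct Token {
  TokenKind Kind;
  std::string_view Text; // View into the input; the input must outlive it.
};

// Matches the character class [a-zA-Z] from the identifier expression.
static bool isLetter(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z');
}

// Matches the character class [0-9] from the number expression.
static bool isDigit(char C) { return C >= '0' && C <= '9'; }

// Splits Input into tokens. Whitespace is skipped (an assumption of this
// sketch); unrecognized characters are reported as Unknown tokens.
std::vector<Token> tokenize(std::string_view Input) {
  std::vector<Token> Tokens;
  std::size_t Pos = 0;
  while (Pos < Input.size()) {
    char C = Input[Pos];
    if (std::isspace(static_cast<unsigned char>(C))) {
      ++Pos;
      continue;
    }
    std::size_t Start = Pos;
    if (isLetter(C)) {
      // ([a-zA-Z])+ : consume the longest run of letters.
      while (Pos < Input.size() && isLetter(Input[Pos]))
        ++Pos;
      std::string_view Text = Input.substr(Start, Pos - Start);
      // The keyword "with" has the same shape as an identifier, so it is
      // told apart only after the whole word has been read.
      TokenKind Kind = Text == "with" ? TokenKind::KwWith : TokenKind::Ident;
      Tokens.push_back(Token{Kind, Text});
      continue;
    }
    if (isDigit(C)) {
      // ([0-9])+ : consume the longest run of digits.
      while (Pos < Input.size() && isDigit(Input[Pos]))
        ++Pos;
      Tokens.push_back(Token{TokenKind::Number, Input.substr(Start, Pos - Start)});
      continue;
    }
    // Everything else is a single-character token.
    TokenKind Kind = TokenKind::Unknown;
    switch (C) {
    case ':': Kind = TokenKind::Colon; break;
    case '+': Kind = TokenKind::Plus; break;
    case '-': Kind = TokenKind::Minus; break;
    case '*': Kind = TokenKind::Star; break;
    case '/': Kind = TokenKind::Slash; break;
    case '(': Kind = TokenKind::LParen; break;
    case ')': Kind = TokenKind::RParen; break;
    default: break;
    }
    ++Pos;
    Tokens.push_back(Token{Kind, Input.substr(Start, 1)});
  }
  return Tokens;
}
```

Calling tokenize on an input such as "with a: a * 3" yields the kinds KwWith, Ident, Colon, Ident, Star, and Number, in that order. Each token stores a view into the input rather than a copy, so the input buffer has to stay alive for as long as the tokens are in use.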

A handwritten lexer

The implementation of a lexical analyzer is often called a lexer. Let's create a header file called Lexer.h and start defining Token. The file begins with the usual header guard and the required headers:

#ifndef LEXER_H
#define LEXER_H
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"

The llvm::MemoryBuffer class provides read-only access to a block of memory, filled with the content of a file. On request, a trailing zero...