Learn Bosque Programming

By: Sebastian Kaczmarek, Joel Ibaceta

Overview of this book

Bosque is a new high-level programming language inspired by the impact of structured programming in the 1970s. It adopts TypeScript syntax and ML semantics and is designed for writing code that is easy for both humans and machines to reason about. With this book, you'll understand how Bosque supports high productivity and cloud-first development by removing sources of accidental complexity and introducing novel features. This short book covers all the language features that you need to know to work with Bosque. You'll learn about basic data types, variables, functions, operators, statements, and expressions in Bosque and become familiar with advanced features such as typed strings, bulk algebraic data operations, namespace declarations, and concept and entity declarations. The book provides a complete language reference for learning to program with Bosque and understanding the regularized programming paradigm. You'll also explore real-world examples that will help you reinforce the knowledge you've acquired. Additionally, you'll discover more advanced topics such as the Bosque project structure and contributing to the project. By the end of this book, you'll have learned how to configure the Bosque environment and build better, more reliable software with this exciting new open-source language.

Learning what Intermediate Representation is

Many high-level programming languages use one or more intermediate representations (IRs) when translating source code into machine code. An IR simplifies the compilation process without giving up the advantages of a high-level language. It also makes it easier to develop new programming languages that are friendlier to developers and closer to human reasoning. Bosque is no exception.

Let's learn how intermediate representation works by looking at an example.

First, the compiler models the program being compiled as a data structure, often a graph. This abstract representation can take different forms:

  • An abstract syntax tree (AST)
  • A linear IR, such as three-address code or postfix notation

Let's take a look at the following expression:

5 * a - b

This expression can be represented by the following AST:

Figure 1.1 – AST graph representation
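
To make the tree concrete, here is a minimal Python sketch of the same AST. The nested-tuple encoding and the names `ast`, `to_postfix`, and `evaluate` are assumptions made for illustration; they are not part of Bosque or of any compiler. A post-order walk of the tree yields the postfix form mentioned earlier, and a recursive walk evaluates the expression.

```python
import operator

# Operators supported by this sketch (illustrative subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# AST for `5 * a - b`: inner nodes are (operator, left, right) tuples;
# leaves are integer literals or variable names.
ast = ("-", ("*", 5, "a"), "b")

def to_postfix(node):
    """Post-order walk: emit both operands first, then the operator."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]

def evaluate(node, env):
    """Evaluate the tree, looking up variable names in env."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]
    return node

print(" ".join(to_postfix(ast)))         # 5 a * b -
print(evaluate(ast, {"a": 3, "b": 4}))   # 11
```

The multiplication sits deeper in the tree than the subtraction, which is how the structure records that `5 * a` must be computed first.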

If we quickly inspect the graph, we can identify the code's intent through its structure. From this abstract representation, we can generate an intermediate code representation. Although it is closer to the final object code, this intermediate code is usually a sequential form of the syntax tree that remains agnostic to the architecture and execution environment.

The previous structure can be written in intermediate code as follows:

t1 = 5 * a 
t2 = b
t3 = t1 - t2
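
As a rough illustration of how such a listing can be derived from the tree, the following Python sketch lowers an AST into three-address-style instructions. The tuple encoding and the function names `lower` and `to_three_address` are hypothetical. Unlike the listing above, this sketch uses `b` directly instead of first copying it into its own temporary, so it emits two instructions rather than three.

```python
from itertools import count

def lower(node, code, temps):
    """Append instructions for node to code and return the name
    (or literal) that holds its value."""
    if not isinstance(node, tuple):
        return str(node)                 # leaf: literal or variable name
    op, left, right = node
    lhs = lower(left, code, temps)       # operands are lowered first
    rhs = lower(right, code, temps)
    target = f"t{next(temps)}"           # fresh temporary for this result
    code.append(f"{target} = {lhs} {op} {rhs}")
    return target

def to_three_address(node):
    """Return the instruction list for an expression tree."""
    code = []
    lower(node, code, count(1))
    return code

# AST for `5 * a - b` as (operator, left, right) tuples.
tac = to_three_address(("-", ("*", 5, "a"), "b"))
print("\n".join(tac))
# t1 = 5 * a
# t2 = t1 - b
```

Each instruction has at most one operator and writes to a fresh temporary, which is what makes this form easy to analyze and to translate further.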

From this intermediate code, a suitable compiler can generate the final executable binary, which completes the compilation of our source code. Let's look at a more complex example, based on the following C# snippet:

while ( x > y ) {
    if ( x > 0 ) {
        x = x - y;
    }
}

The AST representation of the previous code is as follows:

Figure 1.2 – AST graph representation

During compilation, the C# source code is converted into an intermediate code called IL (Intermediate Language), which preserves the original code's intent. In this case, however, it is harder to read at first glance, as shown here:

.locals init ( 
    [0] int32 x, 
    [1] int32 y, 
    [2] bool, 
    [3] bool 
)
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: ldc.i4.0
IL_0004: stloc.1
// sequence point: hidden
IL_0005: br.s IL_0017
// loop start (head: IL_0017)
    IL_0007: nop
    IL_0008: ldloc.0
    IL_0009: ldc.i4.0
    IL_000a: cgt
    IL_000c: stloc.2
    // sequence point: hidden
    IL_000d: ldloc.2
    IL_000e: brfalse.s IL_0016
    IL_0010: nop
    IL_0011: ldloc.0
...
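
The listing is easier to follow once you see that IL targets a stack machine: instructions push values, pop them, and jump. The Python sketch below is a toy model of that idea, not the actual .NET runtime. The opcode names are borrowed from the listing (plus `sub` and `br` for subtraction and unconditional jumps), but the instruction encoding, the jump targets, and the hand-written `program` are assumptions made for illustration and do not reproduce the C# compiler's real output.

```python
def run(program, local_vars):
    """Execute (opcode, operand) pairs on a toy stack machine.
    Jump operands are indexes into the program list."""
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "ldloc":                # push a local variable
            stack.append(local_vars[arg])
        elif op == "ldc":                # push a constant
            stack.append(arg)
        elif op == "stloc":              # pop into a local variable
            local_vars[arg] = stack.pop()
        elif op == "sub":                # pop two values, push left - right
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif op == "cgt":                # push 1 if left > right, else 0
            right, left = stack.pop(), stack.pop()
            stack.append(1 if left > right else 0)
        elif op == "br":                 # unconditional jump
            pc = arg
        elif op == "brfalse":            # jump if the popped value is 0
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return local_vars

# while (x > y) { if (x > 0) { x = x - y; } }  with x in slot 0, y in slot 1
program = [
    ("br", 9),         #  0: jump to the loop condition
    ("ldloc", 0),      #  1: push x
    ("ldc", 0),        #  2: push 0
    ("cgt", None),     #  3: x > 0
    ("brfalse", 9),    #  4: skip the assignment if false
    ("ldloc", 0),      #  5: push x
    ("ldloc", 1),      #  6: push y
    ("sub", None),     #  7: x - y
    ("stloc", 0),      #  8: x = x - y
    ("ldloc", 0),      #  9: push x (loop condition starts here)
    ("ldloc", 1),      # 10: push y
    ("cgt", None),     # 11: x > y
    ("brfalse", 14),   # 12: leave the loop when the condition is false
    ("br", 1),         # 13: otherwise run the body again
]

print(run(program, [10, 3]))   # [1, 3]
```

As in the real IL, the loop starts with a jump to the condition, which sits after the body.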

Later, this intermediate code is translated into native code by the runtime that executes the program, which in this case is the .NET Framework's Common Language Runtime (CLR). Some additional examples of known intermediate languages are as follows:

  • GNU RTL: The Register Transfer Language is an intermediate language that the GNU Compiler Collection (GCC) uses to support many of its source languages.
  • CIL: The Common Intermediate Language is used by the high-level languages of the Microsoft .NET Framework. The final binary code is generated from this representation.
  • C: Even though it wasn't designed as an intermediate language, it's often used as a layer of abstraction over assembly language, which is why many languages have adopted it as their intermediate representation.

An intermediate representation gives the compiler an intermediate stage between the high-level language and the final code. In this stage, we can optimize, analyze, or correct the code in the way that best suits the language's design and objectives.
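
As a small example of the kind of optimization an IR makes possible, the following Python sketch performs constant folding on three-address-style instructions. The `(target, op, lhs, rhs)` tuple encoding and the name `fold_constants` are hypothetical, and the sketch assumes that each temporary is assigned only once and that a folded temporary is not itself the final result. It is not how Bosque or any particular compiler implements this pass.

```python
import operator

# Operators supported by this sketch (illustrative subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Evaluate instructions whose operands are both known constants
    and substitute the results into later instructions."""
    known = {}     # temporaries whose value is known at compile time
    result = []
    for target, op, lhs, rhs in code:
        lhs = known.get(lhs, lhs)        # replace known temporaries
        rhs = known.get(rhs, rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            known[target] = OPS[op](lhs, rhs)   # computed now, nothing emitted
        else:
            result.append((target, op, lhs, rhs))
    return result

# IR for `2 * 3 * a - b`
code = [
    ("t1", "*", 2, 3),
    ("t2", "*", "t1", "a"),
    ("t3", "-", "t2", "b"),
]
optimized = fold_constants(code)
for instruction in optimized:
    print(instruction)
# ('t2', '*', 6, 'a')
# ('t3', '-', 't2', 'b')
```

Because the pass works on the IR rather than on source text or machine code, it does not depend on the source language or on the target architecture.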

Transforming an intermediate representation into native code before the program runs, so that the result is an executable binary, is called ahead-of-time (AOT) compilation. Compared with just-in-time (JIT) compilation, which translates code while the program is running, AOT compilation generally offers shorter startup times and resource savings, and it can reduce execution time.

This is why the Bosque compilation process uses ExeGen, a binary generation tool based on AOT compilation, whose usage we will explore in more detail in the next chapter. Additionally, the Bosque project has an IR that's been explicitly designed to support the needs of the regularized programming paradigm. The project explores how to build intermediate representations so that they include support for symbolic testing, enhanced fuzzing, GC compilation, and API auto-marshaling, among others.

Some of the characteristics that the Bosque IR supports are part of the regularized programming paradigm, as we will see next.