Build Your Own Programming Language - Second Edition

By: Clinton L. Jeffery

Overview of this book

There are many reasons to build a programming language: out of necessity, as a learning exercise, or just for fun. Whatever your reasons, this book gives you the tools to succeed. You’ll build the frontend of a compiler for your language and generate a lexical analyzer and parser using Lex and YACC tools. Then you’ll explore a series of syntax tree traversals before looking at code generation for a bytecode virtual machine or native code. This edition adds a new chapter explaining the distinctions between preprocessors and transpilers. Code examples have been modernized, expanded, and rigorously tested, and all content has been thoroughly revised. You’ll learn to implement code generation techniques through practical examples, including the Unicon preprocessor and transpiling Jzero code to Unicon. You’ll then move on to domain-specific language features and learn to create them as built-in operators and functions. You’ll also cover garbage collection. Dr. Jeffery’s experience building the Unicon language adds context to the concepts, and examples are provided in both Unicon and Java so that you can follow along in your language of choice. By the end of this book, you’ll be able to build and deploy your own domain-specific language.

Preface

Who this book is for

What this book covers

To get the most out of this book

Section I: Programming Language Frontends

Why Build Another Programming Language?

Motivations for writing your own programming language

Types of programming language implementations

Organizing a bytecode language implementation

Languages used in the examples

The difference between programming languages and libraries

Applicability to other software engineering tasks

Establishing the requirements for your language

Case study – requirements that inspired the Unicon language

Programming Language Design

Determining the kinds of words and punctuation to provide in your language

Specifying the control flow

Deciding on what kinds of data to support

Overall program structure

Completing the Jzero language definition

Case study – designing graphics facilities in Unicon

Scanning Source Code

Technical requirements

Lexemes, lexical categories, and tokens

Regular expressions

Using UFlex and JFlex

Writing a scanner for Jzero

Regular expressions are not always enough

Parsing

Technical requirements

Syntax analysis

Context-free grammars

Using iyacc and BYACC/J

Writing a parser for Jzero

Syntax Trees

Technical requirements

Learning about trees

Creating leaves from terminal symbols

Building internal nodes from production rules

Forming syntax trees for the Jzero language

Debugging and testing your syntax tree

Section II: Syntax Tree Traversals

Symbol Tables

Technical requirements

Establishing the groundwork for symbol tables

Creating and populating symbol tables for each scope

Checking for undeclared variables

Finding redeclared variables

Handling package and class scopes in Unicon

Testing and debugging symbol tables

Checking Base Types

Technical requirements

Type representation in the compiler

Assigning type information to declared variables

Determining the type at each syntax tree node

Runtime type checks and type inference in Unicon

Checking Types on Arrays, Method Calls, and Structure Accesses

Technical requirements

Checking operations on array types

Checking method calls

Checking structured type accesses

Intermediate Code Generation

Technical requirements

What is intermediate code?

An intermediate code instruction set

Annotating syntax trees with labels for control flow

Generating code for expressions

Generating code for control flow

Syntax Coloring in an IDE

Writing your own IDE versus supporting an existing one

Downloading the software used in this chapter

Adding support for your language to Visual Studio Code

Integrating a compiler into a programmer’s editor

Avoiding reparsing the entire file on every change

Using lexical information to colorize tokens

Highlighting errors using parse results

Section III: Code Generation and Runtime Systems

Preprocessors and Transpilers

Understanding preprocessors

Code generation in the Unicon preprocessor

The difference between preprocessors and transpilers

Transpiling Jzero code to Unicon

Bytecode Interpreters

Technical requirements

Understanding what bytecode is

Comparing bytecode with intermediate code

Building a bytecode instruction set for Jzero

Implementing a bytecode interpreter

Writing a runtime system for Jzero

Running a Jzero program

Examining iconx, the Unicon bytecode interpreter

Generating Bytecode

Technical requirements

Converting intermediate code to Jzero bytecode

Comparing bytecode assembler with binary formats

Linking, loading, and including the runtime system

Unicon example – bytecode generation in icont

Native Code Generation

Technical requirements

Deciding whether to generate native code

Introducing the x64 instruction set

Using registers

Converting intermediate code to x64 code

Generating x64 output

Implementing Operators and Built-In Functions

Implementing operators

Writing built-in functions

Integrating built-ins with control structures

Developing operators and functions for Unicon

Domain Control Structures

Knowing when a new control structure is needed

Scanning strings in Icon and Unicon

Rendering regions in Unicon

Garbage Collection

Grasping the importance of garbage collection

Counting references to objects

Marking live data and sweeping the rest

Final Thoughts

Reflecting on what was learned from writing this book

Deciding where to go from here

Exploring references for further reading

Section IV: Appendix

Answers

Other Books You May Enjoy

Index

Appendix: Unicon Essentials

Syntactic shorthand

Using Unicon’s declarations and data types

Evaluating expressions

Debugging and environmental issues

Function mini-reference

Selected keywords

Questions

1. A bytecode interpreter could use an instruction set with up to three addresses (operands) per instruction, as in three-address code. Instead, the Jzero interpreter uses zero or one operand per instruction. What are the pros and cons of using three-address code in the bytecode interpreter, just as it is used in intermediate code?
2. On real CPUs and in many C-based bytecode interpreters, bytecode addresses are represented by literal machine addresses. However, the bytecode interpreters shown in this chapter implement bytecode addresses as positions or offsets within allocated blocks of memory. Is a programming language that does not have a pointer data type at a fatal disadvantage in implementing a bytecode interpreter, compared to a language that does support pointer data types?
3. If code is represented in memory as an immutable string value, what constraints does that impose on the implementation of a bytecode interpreter?
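The questions above turn on three design choices: how many operands each instruction carries, how addresses are represented, and whether the code can be modified. The following Java sketch puts those choices side by side. It is a hypothetical toy, not the Jzero interpreter from the book. The class name `OffsetVM`, the opcode numbering, and the variable-slot layout are all assumptions made for illustration.

The sketch encodes x = (a + b) * c in two ways: once as zero/one-operand stack code and once as three-address code. It stores both programs in immutable `String`s. In place of pointers, it uses plain `int` offsets for the program counter and the variable slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch (not the book's Jzero VM): x = (a + b) * c encoded two ways.
// Code lives in an immutable String; the pc and variable slots are int offsets,
// so no pointer type is needed.
class OffsetVM {
    // Assumed opcode numbering, chosen only for this example.
    static final char HALT = 0, LOAD = 1, STORE = 2, ADD = 3, MUL = 4; // 0 or 1 operand
    static final char ADD3 = 5, MUL3 = 6;                             // 3 operands

    // Variable slots: a=0, b=1, c=2, x=3, t=4 (temporary used by three-address code).
    static final String STACK_CODE = new String(new char[] {
        LOAD, 0, LOAD, 1, ADD, LOAD, 2, MUL, STORE, 3, HALT });
    static final String THREE_ADDR_CODE = new String(new char[] {
        ADD3, 4, 0, 1, MUL3, 3, 4, 2, HALT });

    // Stack machine: operands are implicit on the stack, so most instructions are short.
    static void runStack(String code, int[] vars) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // offset of the next code unit to fetch
        while (true) {
            char op = code.charAt(pc++);
            switch (op) {
                case LOAD -> stack.push(vars[code.charAt(pc++)]);
                case STORE -> vars[code.charAt(pc++)] = stack.pop();
                case ADD -> stack.push(stack.pop() + stack.pop());
                case MUL -> stack.push(stack.pop() * stack.pop());
                case HALT -> { return; }
                default -> throw new IllegalStateException("bad opcode at offset " + (pc - 1));
            }
        }
    }

    // Three-address machine: fewer dispatches, but every instruction carries 3 operands.
    static void runThreeAddress(String code, int[] vars) {
        int pc = 0;
        while (code.charAt(pc) != HALT) {
            char op = code.charAt(pc);
            int dst = code.charAt(pc + 1), src1 = code.charAt(pc + 2), src2 = code.charAt(pc + 3);
            switch (op) {
                case ADD3 -> vars[dst] = vars[src1] + vars[src2];
                case MUL3 -> vars[dst] = vars[src1] * vars[src2];
                default -> throw new IllegalStateException("bad opcode at offset " + pc);
            }
            pc += 4; // opcode + three operands
        }
    }

    public static void main(String[] args) {
        int[] v1 = {2, 3, 4, 0, 0}; // a=2, b=3, c=4
        runStack(STACK_CODE, v1);
        System.out.println("stack: x = " + v1[3] + ", " + STACK_CODE.length() + " code units");
        int[] v2 = {2, 3, 4, 0, 0};
        runThreeAddress(THREE_ADDR_CODE, v2);
        System.out.println("three-address: x = " + v2[3] + ", " + THREE_ADDR_CODE.length() + " code units");
    }
}
```

With a = 2, b = 3, and c = 4, both encodings compute x = 20. The two versions differ in cost:

- The stack version dispatches seven instructions (counting HALT) in 11 code units.
- The three-address version dispatches three instructions in 9 units. In exchange, it needs an extra temporary slot and fixed four-unit instructions.

Because the code is a Java `String`, these interpreters can only read it. All mutable state lives in `pc`, the operand stack, and `vars`. A technique that patches instructions in place would therefore need a separate mutable copy of the code. In this scheme, a jump would simply assign a new offset to `pc`.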
