Build Your Own Programming Language

By: Clinton L. Jeffery

Overview of this book

The need for different types of computer languages is growing rapidly, and developers increasingly prefer to create domain-specific languages to solve problems in specific application domains. Building your own programming language has its advantages: it can be your antidote to the ever-increasing size and complexity of software. In this book, you’ll start by implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book then covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you’ll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. The book concludes with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts. Where relevant, examples are provided in both Unicon and Java, so that you can follow the code in your choice of either a very high-level language with advanced features or a mainstream language. By the end of this book, you’ll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Share Your Thoughts

Section 1: Programming Language Frontends

Chapter 1: Why Build Another Programming Language?

So, you want to write your own programming language…

Language versus library – what's the difference?

Applicability to other software engineering tasks

Establishing the requirements for your language

Case study – requirements that inspired the Unicon language

Chapter 2: Programming Language Design

Determining the kinds of words and punctuation to provide in your language

Specifying the control flow

Deciding on what kinds of data to support

Overall program structure

Completing the Jzero language definition

Case study – designing graphics facilities in Unicon

Chapter 3: Scanning Source Code

Technical requirements

Lexemes, lexical categories, and tokens

Regular expressions

Using UFlex and JFlex

Writing a scanner for Jzero

Regular expressions are not always enough

Chapter 4: Parsing

Technical requirements

Analyzing syntax

Understanding context-free grammars

Using iyacc and BYACC/J

Writing a parser for Jzero

Improving syntax error messages

Chapter 5: Syntax Trees

Technical requirements

Learning about trees

Creating leaves from terminal symbols

Building internal nodes from production rules

Forming syntax trees for the Jzero language

Debugging and testing your syntax tree

Section 2: Syntax Tree Traversals

Chapter 6: Symbol Tables

Technical requirements

Establishing the groundwork for symbol tables

Creating and populating symbol tables for each scope

Checking for undeclared variables

Finding redeclared variables

Handling package and class scopes in Unicon

Testing and debugging symbol tables

Chapter 7: Checking Base Types

Technical requirements

Type representation in the compiler

Assigning type information to declared variables

Determining the type at each syntax tree node

Runtime type checks and type inference in Unicon

Chapter 8: Checking Types on Arrays, Method Calls, and Structure Accesses

Technical requirements

Checking operations on array types

Checking method calls

Checking structured type accesses

Chapter 9: Intermediate Code Generation

Technical requirements

Preparing to generate code

An intermediate code instruction set

Annotating syntax trees with labels for control flow

Generating code for expressions

Generating code for control flow

Chapter 10: Syntax Coloring in an IDE

Downloading the example IDEs used in this chapter

Integrating a compiler into a programmer's editor

Avoiding reparsing the entire file on every change

Using lexical information to colorize tokens

Highlighting errors using parse results

Adding Java support

Section 3: Code Generation and Runtime Systems

Chapter 11: Bytecode Interpreters

Technical requirements

Understanding what bytecode is

Comparing bytecode with intermediate code

Building a bytecode instruction set for Jzero

Implementing a bytecode interpreter

Writing a runtime system for Jzero

Running a Jzero program

Examining iconx, the Unicon bytecode interpreter

Chapter 12: Generating Bytecode

Technical requirements

Converting intermediate code to Jzero bytecode

Comparing bytecode assembler with binary formats

Linking, loading, and including the runtime system

Unicon example – bytecode generation in icont

Chapter 13: Native Code Generation

Technical requirements

Deciding whether to generate native code

Introducing the x64 instruction set

Using registers

Converting intermediate code to x64 code

Generating x64 output

Chapter 14: Implementing Operators and Built-In Functions

Implementing operators

Writing built-in functions

Integrating built-ins with control structures

Developing operators and functions for Unicon

Chapter 15: Domain Control Structures

Knowing when you need a new control structure

Scanning strings in Icon and Unicon

Rendering regions in Unicon

Chapter 16: Garbage Collection

Appreciating the importance of garbage collection

Counting references to objects

Marking live data and sweeping the rest

Chapter 17: Final Thoughts

Reflecting on what was learned from writing this book

Deciding where to go from here

Exploring references for further reading

Section 4: Appendix

Assessments

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Appendix: Unicon Essentials

Using Unicon's declarations and data types

Evaluating expressions

Debugging and environmental issues

Function mini-reference

Selected keywords

Lexemes, lexical categories, and tokens

A programming language implementation reads characters and groups adjacent characters together when they are part of the same entity in the language. Such an entity can be a multi-character name or reserved word, a constant value, or an operator.

A lexeme is a string of adjacent characters that form a single entity. Most punctuation marks are lexemes unto themselves, in addition to separating what came before from what comes after them. In reasonable languages, whitespace characters such as spaces and tabs are ignored other than to separate lexemes. Almost all languages also have a way of including comments in the source code, and comments are typically treated the same as whitespace: they can be the boundary that separates two lexemes, but they are discarded and not considered further.
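To make the idea of a lexeme concrete, here is a minimal hand-written sketch in Java that splits a string into lexemes while discarding whitespace and comments. It is an illustration only and not the scanner developed in this chapter: the class name `LexemeDemo`, the method `lexemes`, the regular expression, and the tiny set of operators are all assumptions chosen for the example, and only `//` comments are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the book's Jzero code.
public class LexemeDemo {
    // Alternatives are tried left to right at the current position.
    // The "skip" alternative comes first so that "//" starts a comment
    // instead of being read as two division operators.
    private static final Pattern LEXEME = Pattern.compile(
        "(?<skip>\\s+|//[^\\n]*)"                   // whitespace or a // comment
      + "|(?<name>[A-Za-z_][A-Za-z0-9_]*)"          // name or reserved word
      + "|(?<number>[0-9]+)"                        // integer constant
      + "|(?<op>==|<=|>=|!=|[-+*/=<>(){};,])");     // operator or punctuation

    // Returns the lexemes of source in order, with whitespace and comments dropped.
    public static List<String> lexemes(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = LEXEME.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "unrecognized character at offset " + pos);
            }
            // Whitespace and comments separate lexemes but are not kept.
            if (m.group("skip") == null) {
                result.add(m.group());
            }
            pos = m.end();
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [count, =, count, +, 1, ;, x, ==, 42]
        System.out.println(lexemes("count=count+1; // bump\nx == 42"));
    }
}
```

Note that `count=count+1;` contains no whitespace at all, yet it still yields six lexemes: the punctuation marks `=`, `+`, and `;` are lexemes themselves and also mark where the neighboring lexemes end. The comment and the newline only serve as boundaries and leave no trace in the result.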

Each lexeme has a lexical category. In natural languages, lexical categories are called parts of speech. In a programming language implementation, the lexical category is generally...