Book Image

Groovy for Domain-Specific Languages, Second Edition

By : Fergal Dearle
Book Image

Groovy for Domain-Specific Languages, Second Edition

By: Fergal Dearle

Overview of this book

Table of Contents (20 chapters)
Groovy for Domain-specific Languages Second Edition
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Free Chapter
1
Introduction to DSLs and Groovy
Index

DSL – a new name for an old idea


I've mentioned domain-specific language (DSL) several times now, so what does this really mean? The term "DSL" describes a programming language that is dedicated to a specific problem domain. The idea is not new. DSLs have been around for a long time. One of the most exciting features of Unix has always been its mini languages. These include a rich set of typesetting languages (troff, eqn, pic, and so on), shell tools (awk, sed, and so on), and software development tools (make, yacc, and lex).

The Java platform has a multitude of mini DSLs in the form of XML config files for configuration of everything from EJBs to web applications. In many JEE applications, Enterprise Java Beans (EJB) can be configured using an XML configuration file, ejb-jar.xml. While the ejb-jar.xml file is written in the general-purpose language XML, the contents of the file need to conform to a document type definition (DTD) or XML schema, which describes the valid structure of the file.

XML configuration files can be found across a wide range of libraries and frameworks. Spring is configured by using a spring-config.xml file, and Struts with struts-config.xml. In each case, the DTD or schema defines the elements and tags, which are valid for the specific domain, be it EJB, Spring, or Struts. So, ejb-jar.xml can be considered a mini DSL for configuring EJB, spring-config.xml is a mini DSL for configuring Spring beans, and so on.

In essence, DSL is a fancy name for something that we use every day of our professional programming lives. There are not many applications that can be fully written in a single general-purpose language. As such, we are the everyday consumers of many different DSLs, each of which is specific to a particular purpose.

A typical day's work could involve working with Java code for program logic, CSS for styling a web page, JavaScript for providing some dynamic web content, and Ant, Maven, or Gradle to build the scripts that tie it all together. We are well used to consuming DSLs, but seldom consider producing new DSLs to implement our applications—which we should.

The evolution of programming languages

My own background is probably typical of many of my generation of old-school programmers. Back in 1986, I was a young software engineer fresh out of college. During my school and college years, I studied many different programming languages. I was fortunate in high school to have had a visionary Math teacher who taught us to program in BASIC, so I cut my teeth programming as early as 1974. Through various college courses, I came to know about Pascal, C, Fortran, Lisp, Assembler, and COBOL.

My school, college, and early professional career all reinforced a belief that programming languages were for the exclusive use of us programmers. We liked nothing better than spending hours locked away in dark rooms writing reams of arcane and impenetrable code. The more arcane and impenetrable the better! The hacker spirit prevailed, and annual competitions such as the International Obfuscated C Code Contest (IOCCC) were born.

Note

The IOCCC runs to this day. The point of the contest is to write valid but impenetrable C code that works. Check out http://www.ioccc.org to see how not to write code.

General-purpose languages

All of the teaching in college in those days revolved around the general-purpose languages. I recall sitting in class and being taught about the "two" types of programming language: machine language, and high-level languages. Both were types of general-purpose languages, in which you could build any type of application, but each language had its own strengths and weaknesses. The notion of a DSL was not yet considered as part of the teaching program. Nor was the idea that anyone other than a cadre of trained professional programmers (hackers) would ever write programs for computers. These days, the word "hacker" has bad connotations of being synonymous with virus writers and the likes. In those days, a good "hack" was an elegant programming solution to a hard problem and being called a hacker by one's peers was a badge of pride for most programmers.

The high-level programming language you used defined what type of an application programmer you were. COBOL was for business application programming, Fortran was for scientific programmers, and C was for hackers building Unix and PC software. Although COBOL and Fortran were designed to be used in a particular business domain, they were still considered general-purpose languages. You could still write a scientific application in COBOL or a business application in Fortran if you wanted to. However, you were unlikely to try any low-level device driver development in COBOL.

Although it was possible to build entire applications in assembly language (and many people did), high-level languages, such as C, BASIC, and COBOL, were much better suited to this task. The first version of the world-beating spreadsheet Lotus 1-2-3 was written entirely in 8086 assembly language, and ironically, it was the rewrite of this into the supposed high-level language C that nearly broke the company in the late 1980's.

Languages such as C and C++ provide the low-level functionality in a high-level language, which enabled them to be used across a much greater range of domains, including those where assembly was utilized before. These days, Java, C# and C++ compete with each other like the Swiss Army knives of general-purpose languages. There are almost no application domains to which these languages have not been applied, from space exploration, through to enterprise business systems, and mobile phones.

Spreadsheets and 4GLs

Programs such as Lotus 1-2-3 and its precursor VisiCalc revolutionized people's view of who would program computers. A whole generation of accountants, financial analysts, scientists, and engineers came to realize that they can develop sophisticated turnkey solutions for themselves, armed only with a spreadsheet and a little knowledge of macros. Spreadsheet macros are probably one of the first DSLs to find their way out of the cloisters of the IT community and into the hands of the general business user.

Around this time, there was also much media attention paid to the new 4GL (fourth-generation language) systems. 4GLs were touted as being hugely more efficient for developing applications than traditional high-level languages, which then became known as third-generation language (3GL). From the hype in the media at the time, you would be forgiven for thinking that the age of the professional programmer was coming to an end and that an ordinary business user could use a 4GL to develop their own business applications. I viewed this claim with a degree of healthy skepticism—how could a non-programmer build software?

Like DSLs, 4GLs were, generally speaking, targeted at particular problem spaces, and tended to excel at providing solutions in those narrow target markets. The sophistication of most applications in those days was such that it was possible to build them with a few obvious constructs. 4GLs tended to be turnkey environments with integrated tools and runtime environments. You were restricted by the environment that the 4GL provided, but the applications that could be built with a 4GL could be built rapidly, and with a minimal amount of coding.

4GLs differ from our modern understanding of a DSL. We generally think of a DSL as being a mini language with a particular purpose, and they do not generally impose an entire runtime or tool set on their use. The best DSLs can be mixed and matched together, and used in conjunction with a general-purpose programming language, such as C++ or Java, to build our applications.

Language-oriented programming

Martin Fowler has spoken about the use of many mini DSLs in application development. He advocates building applications out of many mini DSLs, which are specific to the particular problem space, in a style of development called language-oriented programming. In a way, this style of programming is the norm for most developers these days, when we mix and match HTML, CSS, SQL, and Java together to build our applications.

The thrust of language-oriented programming is that we should all be going beyond exploiting these generally available languages and implementing our own DSLs that represent the particular problem space that we are working on. With a language-oriented programming approach, we should be building DSLs that are as narrowly focused as the single application that we are currently working on. A DSL does not need to be generally applicable to be useful to us.

Who are DSLs for?

It's worth considering for a moment who the different types of users of a DSL might be. Most DSLs require some programming skills in order to get to grips with them, and are used by software and IT professionals in their daily chores, building, and maintaining and managing systems. They are specific to a particular technical aspect of system development. So the domain of CSS as a DSL is web development in general, and specifically page styling and layout. Many web developers start from a graphic design background and become proficient as coders of HTML, CSS, and JavaScript simply because it gives them better fine-grained control of the design process.

Many graphic designers, for this reason, eventually find themselves eschewing graphical tools such as Dreamweaver in favor of code. Hopefully, our goal in life will not be to turn everybody into a coder. Whereas most DSLs will remain in the realm of the programmer, there are many cases where a well-designed DSL can be used by other stakeholders in the development process other than professional developers. In some cases, DSLs can enable stakeholders to originate parts of the system by enabling them to write the code themselves. In other cases, the DSL can become a shared representation of the system. If the purpose of a particular DSL is to implement business rules then, ideally, that DSL should express the business rule in such a way that it can be clearly understood upon reading by both the business stakeholder who specified it and the programmer who wrote it.

A DSL for process engineers

My own introduction to the concept of DSLs came about in 1986 when I joined Computer Products Inc. (CPI) as a software engineer. In this case, the DSL in question was sophisticated enough to enable the stakeholders to develop large parts of a running system.

CPI developed a process control system, which was primarily sold to chemical and pharmaceutical industries. It was a genuinely distributed system when most process control systems were based on centralized mini or mainframe computers. It had its own real-time kernel, graphics, and a multitude of device drivers for all types of control and measurement devices. But the most innovative part of the system, which excited customers, was a scripting language called EXTended Operations Language (EXTOL). EXTOL was a DSL in the purest sense because it drew the domain experts right into the development process, as originators of the running code.

With EXTOL, a chemical process engineer or chemist could write simple scripts to define the logic for controlling their plant. Each control block and measurement block in the system was addressable from EXTOL. Using EXTOL, a process engineer could write control logic in the same pseudo English that they used to describe the logic to their peers.

The following script could be deployed on a reactor vessel to control the act of half-filling the vessel with the reactant from VALVE001:

drive VALVE001 to OPEN
when LEVELSENSOR.level >= 50%
drive VALVE001 to CLOSED

This was an incredibly powerful concept. Up to this point, most process control systems were programmed in a combination of high-level languages on the main process system, and relay logic on PLCs in the plant. Both tasks required specific programming skills, and could not generally be completed by the chemists or chemical engineers, who designed the high-level chemical processing undertaken at the plant. I recall a room full of white-coated chemists at one plant happily writing EXTOL scripts, as we commissioned the plant.

The proof of the pudding is always in the eating, and I don't recall a CPI engineer ever being called upon to write a single line of EXTOL code on behalf of a customer. Given an appropriate DSL that fit their needs, our customers could write all of the code that they need themselves, without having to be programmers.

This shows the power of DSLs at their best. At this extreme end of the spectrum, a DSL becomes a programming tool that a domain expert can use independently, and without recourse to the professional programmer. It's important to remember, however, that the domain experts in this case were mostly process engineers. Process engineers are already well accustomed to devising stepwise instructions, and building process flows. They will often use the same visual representations as a programmer, such as a flow chart to express a process that they are working on.

When devising a DSL for a particular domain, we should always consider the stakeholders who need to be involved in using it. In the case of EXTOL, the DSL was targeted at a technical audience who could take the DSL and become part of the system development process. Not all of our stakeholders will be quite as technical as this. But, at the very least, the goal when designing a DSL should be to make the DSL understandable to nontechnical stakeholders.

Stakeholder participation

It's an unfortunate fact that with many DSLs, especially those based on XML, the code that represents a particular domain problem is often only legible to the programming staff. This leads to a disconnect between what the business analysts and domain experts define, and what eventually gets implemented in the system. For instance, a business rule is most likely to be described in plain English by a business analyst in a functional specification document. But these rules will most likely be translated by developers into an XML representation that is specific to the particular rules engine, which is then deployed as a part of the application. If the business analyst can't read the XML representation and understand it, then the original intent of the rule can easily be lost in translation.

With language-oriented programming, we should aim to build DSLs that can be read and understood by all stakeholders. As such, these DSLs should become the shared living specification of the system, even if in the end they must, by necessity, be written by a programmer with a technical understanding of the DSL.

DSL design and implementation

DSLs can take many different forms. Some DSLs, such as Unix mini languages, (sed, awk, and troff) have a syntactical structure, which is unique to that particular language. To implement such DSLs, we need to be able to parse this syntax out of the text files that contain the source code of that particular language. To implement our own DSL in this style involves implementing a mini compiler that uses lexing and parsing tools such as lex, yacc, or antlr.

Compiler writing is one particular skill that is outside the skill set of most application development teams. Writing your own parser or compiler grammar is a significant amount of effort to go to, unless the DSL is going to be used generally, and is beyond the scope of most application-specific DSLs.

EXTOL circumvented this problem by having its own syntax-sensitive editor. Users edited their EXTOL scripts from within the editor, and were prompted for the language constructs that they needed to use for each circumstance. This ensured that the scripts were always well-formed and syntactically correct. This also meant that the editor can save the scripts in an intermediate p-code form so that the scripts never existed as text-based program files, and therefore never needed to be compiled.

Many of the DSLs that we use are embedded within other languages. The multitude of XML configuration scripts in the Java platform are an example of this. These mini DSLs piggyback on the XML syntax, and can optionally use an XML DTD or schema definition to define their own particular syntax. These XML-based DSLs can be easily validated for "well-formedness" by using the DTD or schema.

External versus internal DSLs

We generally refer to DSLs that are implemented with their own unique syntax as external DSLs, and those that are implemented within the syntax of a host language as embedded or internal DSLs. Ideally, whenever building a new DSL, it would be best to give it its own unique and individual syntax. By designing our own unique syntax, we can provide language constructs, which are designed with both the problem domain and the target audience in mind.

If the intended user of the DSL is a non-programmer, then developing an XML-based syntax can be problematic. XML has its own particular rules about opening, closing, and properly terminating tags that appear arcane to anybody except a programmer. This is a natural constraint when working with DSLs that are embedded/internal to another language. An XML-based DSL cannot help being similar to XML.

Embedded/internal DSLs will never be as free-form as a custom external DSL due to the constraints of the host language. Fortunately, Groovy-based DSLs are capable of being structured in a more human-readable format. However, they always need to use well-formed Groovy syntax, and there are always going to be compromises when designing Groovy-based DSLs that are readable by your target audience.

Operator overloading

Some general-purpose languages, such as C++, Lisp, and now Groovy, have language features that assist in the development of mini language syntaxes. C++ was one of the earliest languages to implement the concept of operator overloading. By using operator overloading, we can make non-numeric objects behave like numeric values by implementing the appropriate operators. So, we can add a plus operator to a String object in order to support concatenation. When we implement a class that represents a numeric type, we can add the numeric operators again to make them behave like numeric primitives. We can implement a ComplexNumber class, which represents complex numbers, as follows:

class ComplexNumber {
public:
double real, imag;
        ComplexNumber() { real = imag = 0; }
        ComplexNumber(double r, double i) { real = r; imag = i; }
        ComplexNumber& operator+(const ComplexNumber& num);
};

To add one complex number to another, we need to correctly add each of the real and imaginary parts together to generate the result. We implement an equality operator for ComplexNumber as follows:

ComplexNumber& ComplexNumber::operator=(const ComplexNumber& num) {
       real = num.real;
       imag = num.imag;
       return *this;
}

This allows us then to add ComplexNumber objects together as if they were simple numeric values:

int main(int argc, const char* argv[]) {
      ComplexNumber a(1, 2), b(3, 4);
      ComplexNumber sum;
      sum = a + b;
      cout << "sum is " << sum.real << " ; "
           << sum.imaginary << "i" << endl;
}

One of the criticisms of the operator overload feature in C++ is that when using operator overloading, there is no way to control what functionality is being implemented in the overloaded function. It is perfectly possible—but not very sensible—to make the + operator subtract values and the operator add values. Misused operator overloading has the effect of obfuscating the code rather than simplifying it. However, sometimes this very obfuscation can be used to good effect.

The preceding example illustrates what could be considered as a classic case of obfuscation in C++. If your use of C++ predated the introduction of the standard C++ libraries and the streams libraries in particular, you will probably do a double take when looking at this code.

The example uses what has become commonly known as the stream operator <<. This operator can be used to send a character stream to standard output, the logic being that it looks very much like how we stream output from one program to another in a Unix shell script. In fact, there really is no such thing as a stream operator in C++ and what has been overloaded here is the binary left shift operator <<. I have to admit that my first encounter with a code like this left me perplexed. Why would anybody want to left shift the address of a string into another object was beyond me? Common use over the intervening years means that this is now a perfectly natural coding style for all C++ programmers. In effect, the streaming operator implements a mini internal DSL for representing streaming. It subverts the original language a little by using an operator out of context, but the end effect is perfectly understandable and makes sense.

During a fireside chat event at JavaOne some years ago, James Gosling was asked if he would ever consider operator overloading for the Java language, and the answer was a resolute no! Fortunately, we don't have to wait and see if Oracle will ever add operator overloading to Java. With Groovy, we can have it now. Groovy has an extensive set of features, including operator overloading that allow us to implement feature-rich DSLs from within the language. We'll take a look at some of those features that distinguish it from Java, now.