Modular Programming in Java 9

By: Koushik Srinivas Kothagal

Overview of this book

The Java 9 module system is an important addition to the language that affects the way we design, write, and organize code and libraries in Java. It provides a new way to achieve maintainable code through the encapsulation of Java types, as well as a way to write better libraries that have clear interfaces. Effectively using the module system requires an understanding of how modules work and what the best practices for creating modules are. This book will give you step-by-step instructions to create new modules as well as migrate code from earlier versions of Java to the Java 9 module system. You'll be working on a fully modular sample application and adding features to it as you learn about Java modules. You'll learn how to create module definitions, set up inter-module dependencies, and use the built-in modules from the modular JDK. You will also learn about module resolution and how to use jlink to generate custom runtime images. We will end our journey by taking a look at the road ahead. You will learn some powerful best practices that will help you as you start building modular applications. You will also learn how to upgrade an existing Java 8 codebase to Java 9, handle issues with libraries, and test Java 9 applications.

Rethinking Java development with packages


Think about why we use packages in Java. We could very well write entire Java applications without creating any packages and, thereby, using just the default unnamed package. It would work! However, unless it's a simple or throwaway application, that's not a good idea. The idea of packages is to group your Java types into namespaces that signify the relationship, or perhaps a common theme among those types. It makes code easier to read, understand, and navigate.

The following diagram shows an example of classes organized in packages. Adding all classes to a single package (left) is not good practice. We typically group related classes into well-named packages that describe the nature of the classes in them (right):

There's really no rule about what types belong together in a package. However, it's generally understood that when you create a package and put a bunch of Java types in it, the types are usually related in some way. You could very well put any random set of types in the same package and the compiler wouldn't care. However, anyone else who ends up working on your code could potentially hate you forever, so this is not a wise thing to do! Having related types in a common package also has the benefit of those types being able to access each other's protected and package-private (no modifier) members. This is another level of encapsulation: such members are hidden from types outside the package. (There's an exception to this, though, as subclasses are able to access inherited protected members across packages.)

So, if the idea of modular programming is to break code and functionality into encapsulated units, there's a sense in which you can do some kind of modular programming in Java well before Java 9.

The following table shows the various ways in which you can encapsulate code in Java before Java 9:

| What to encapsulate | How to encapsulate | Encapsulation boundary |
|---|---|---|
| Member variables and methods | private modifier | Class |
| Member variables and methods | protected modifier | Package (and subclasses) |
| Member variables, methods, and types | No modifier (default, package-private) | Package |
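To make the rows of the table concrete, here is a minimal sketch. The Account and AccessLevelsDemo types are hypothetical and exist only for illustration; the demo uses reflection to print the access level the compiler recorded for each member and for the class itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical type, for illustration only.
// No modifier on the class: the type is visible only inside its own package.
class Account {
    // private: visible only inside Account.
    private long balance;

    // protected: visible inside the package and to subclasses in any package.
    protected void audit() { }

    // No modifier: visible only inside the package.
    void deposit(long amount) { balance += amount; }
}

class AccessLevelsDemo {
    // Translates modifier bits into the access level names used in the table.
    static String describe(int modifiers) {
        if (Modifier.isPrivate(modifiers)) return "private";
        if (Modifier.isProtected(modifiers)) return "protected";
        if (Modifier.isPublic(modifiers)) return "public";
        return "package-private";
    }

    public static void main(String[] args) throws Exception {
        Field balance = Account.class.getDeclaredField("balance");
        Method audit = Account.class.getDeclaredMethod("audit");
        Method deposit = Account.class.getDeclaredMethod("deposit", long.class);

        System.out.println("balance: " + describe(balance.getModifiers()));
        System.out.println("audit:   " + describe(audit.getModifiers()));
        System.out.println("deposit: " + describe(deposit.getModifiers()));
        System.out.println("Account: " + describe(Account.class.getModifiers()));
    }
}
```

Notice that the only access level available for hiding the Account type itself is the last one, which ties its visibility to a single package.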

 

Isn't that good enough? Well, not really. The preceding table is where a limitation in the modular ability of the language becomes apparent. Notice the What to encapsulate column. Most of the encapsulation features provided by these modifiers focus on controlling access to member variables and methods. The only way you can really restrict access to a type is by making it package-private (no modifier). That, unfortunately, hides the type from your own library code in other packages as well, and you are forced to move all the code that uses the type into the same package. What if you want more?

Why would you want more, you ask? There are a couple of problems with approaching modularity using only the mechanisms available in Java 8 and earlier. Let me explain those problems with two stories.

The unfortunate tale of a library developer

Meet Jack. He's a Java developer at a medium-sized enterprise organization. He's a part of a team that writes code to do data processing. One day, Jack wrote some Java code to sort a list of usernames in alphabetical order. His code worked well without any errors and Jack was proud of his work. Since this was something that could be used by other developers in the organization, he decided to build it as a reusable library and share it with his colleagues as a packaged JAR file. Here's the structure of Jack's library:

His code belonged to two packages: acme.util.stringsorter and acme.util.stringsorter.internal. The main utility class was StringSorterUtil, with one method: sortStrings. The method in turn delegated the sorting to BubbleSortUtil.sortStrings(), a method of a class in the acme.util.stringsorter.internal package. The BubbleSortUtil class used the popular bubble sort algorithm to sort a given list of strings.
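Here is a rough sketch of what Jack's two classes might have looked like. The class and method names come from the story, but the signatures (a List of String in, a sorted copy out) are assumptions. To keep the sketch runnable as a single file, both classes sit in one place; in Jack's JAR they would live in separate packages, which means BubbleSortUtil and its method would have to be public for StringSorterUtil to reach them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In Jack's JAR this class would live in acme.util.stringsorter.internal
// and would need to be public so the other package could call it.
class BubbleSortUtil {
    // Returns a sorted copy of the input, using bubble sort.
    static List<String> sortStrings(List<String> input) {
        List<String> result = new ArrayList<>(input);
        for (int end = result.size() - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (result.get(i).compareTo(result.get(i + 1)) > 0) {
                    String tmp = result.get(i);
                    result.set(i, result.get(i + 1));
                    result.set(i + 1, tmp);
                    swapped = true;
                }
            }
            if (!swapped) break; // no swaps in this pass, so the list is sorted
        }
        return result;
    }
}

// In Jack's JAR this class would live in acme.util.stringsorter.
class StringSorterUtil {
    // The one method callers are meant to use; it delegates to the internal class.
    static List<String> sortStrings(List<String> input) {
        return BubbleSortUtil.sortStrings(input);
    }
}

class StringSorterDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("mallory", "alice", "bob");
        System.out.println(StringSorterUtil.sortStrings(names));
    }
}
```

The design intent is that callers only ever see StringSorterUtil. Nothing in the language enforces that intent, though: a public class in a package named internal is still a public class.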

All that any developer had to do was to drop the JAR in the classpath and call the StringSorterUtil.sortStrings() method, passing in a list of strings they needed sorted. And they did! Jack's little library became a hit! His colleagues loved the convenience that his library provided and they started using it to sort many things, such as names, tokens, addresses, and so on.

A few months later, Jack happened to talk to Daryl at the water cooler, and as usual, their conversation veered towards a discussion about their current favorite sorting algorithms. Daryl couldn't stop talking about his new-found love for hash sort. He said he found it performs much better than bubble sort, and it was unabashedly his new favorite algorithm! Jack was intrigued. He went to his desk and ran a few tests. Daryl was right! Hash sort outperformed bubble sort in most of his tests. Jack knew right then that he had to update his sorting utility to use hash sort. He added a new class, HashSortUtil, to the acme.util.stringsorter.internal package and removed BubbleSortUtil.

The following is the structure of Jack's library after the change:

Thankfully, he had a separate internal class that did the sorting, so the process to invoke the StringSorterUtil.sortStrings() utility wouldn't change. Everyone could just drop in the newer version of the JAR and everything would work just fine.

But it didn't! A few of the code builds in his company started failing. It turned out the culprit was the newer version of Jack's library. Jack couldn't believe it. He didn't miss anything, did he? Well, no. All the projects that used just the StringSorterUtil class worked just fine. However, it turned out that some of the developers ended up using the BubbleSortUtil class in the internal package directly. It was available in the classpath, so they had just imported and used it. Now, since that class didn't exist in the new jar anymore, their code couldn't compile!

Jack sent out an email instructing everyone using BubbleSortUtil to update their code to use StringSorterUtil instead. However, it turned out the BubbleSortUtil class was being used in multiple places by that time, and it wasn't an easy task to change them all. "Couldn't Jack just put the BubbleSortUtil class back?" they asked. Jack yielded to their requests and the next version of the library had both the SortUtil classes (and would possibly do so well into the foreseeable future), even though it internally used only one of those two classes.

After the dust settled, Jack sat at his desk and wondered what had gone wrong. What could he have done to prevent this problem? Clearly, naming the package internal did not prevent developers from using it. One solution would have been to make that internal bubble sort type package-private (no modifier) and move the external type into the same package. This way, he could leverage the third mechanism in the preceding encapsulation table. However, he liked the idea of separating the bubble sort class into its own type and package. Also, imagine if this were a bigger library and there was a common shared class that was supposed to be internal. In that case, pretty much all the types in that library that need the internal type would have to exist in the same package as that internal type! Wasn't there a better way to encapsulate the internal types?

The impossible task of a deployment engineer

Meet Amit, a deployment engineer at yet another enterprise technology firm. His job is to make sure that during every product release, the organization's code base is compiled and deployed properly in the production environment. During every release, he pulls in the application code and all the necessary JAR files and places them in the classpath. He then starts the application, which results in the Java Virtual Machine (JVM) loading the classes and starting execution.

One night, there was a major product feature release. There were a lot of changes to the code that were all supposed to be deployed and launched together. Amit made sure that all the new code was compiled properly and he had all the necessary jars in the classpath. He then had to start the application. Before he clicked on the button to launch the build, Amit wondered if there was some way he could make sure everything was good and that the application would work without any runtime class errors.

One thing that could potentially go wrong was if he had missed adding a certain class or jar in the classpath. Was there a way he could statically verify whether all the classes were available without actually running the application?

Each JAR bundled a set of types in a set of packages. Each type therein could potentially import other types, either from the same JAR or from other JARs. To make sure he had all the classes in the classpath, he would have to go to each class and verify that all its imports were in the classpath. Considering that the number of classes in his application ran into the thousands, it was a Herculean task.

The following diagram is a simplified version of what a sample deployed Java application looks like:

There are four jar files in the picture above, each of which contains packages and classes within them. Jar 1 is deployed in Classpath A, Jar 2 and Jar 3 in Classpath B, and Jar 4 in Classpath C. Let's assume each jar has two classes as indicated by the smaller white boxes. The three paths are configured as classpaths for the Java runtime, so the runtime knows to look at all three paths to scan and pick up classes.

After scanning all the classpaths, this is what the structure looks like to the Java runtime:

Notice that the runtime doesn't care which directory or classpath the package/type is in. It also doesn't care which jar the package/type is bundled in. As far as the Java runtime is concerned, it's just a flattened list of types in packages!

In Java, a classpath is just a set of paths. Any of those locations could have the JARs and classes that the application needs to work. You can immediately see how easy it is for things to break! There's always a possibility that some of the classes that the application uses are not available in the classpath, perhaps because of a missing JAR or library. If the runtime doesn't have a specific class it needs, the application could start running fine, but throw a NoClassDefFoundError much later, and only when the execution hits a point where the missing class is actually needed.
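The lazy nature of class lookup can be seen in a small sketch. ClasspathProbe and its isAvailable method are hypothetical names, and this is only a rough probe, not a way to validate a whole deployment: it asks the class loader for one class by name, without running that class's static initializers. An explicit lookup like this reports a missing class as a ClassNotFoundException, whereas NoClassDefFoundError is what the JVM raises when it fails to resolve a reference that was compiled into the running code.

```java
class ClasspathProbe {
    // Returns true if the named class can be located by this class's loader.
    // Passing 'false' for the initialize flag avoids running static initializers.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing, or it was found but could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList ships with the JDK, so it can always be found.
        System.out.println(isAvailable("java.util.ArrayList"));
        // A class name from Jack's story; assumed not to be on this classpath.
        System.out.println(isAvailable("acme.util.stringsorter.internal.BubbleSortUtil"));
    }
}
```

A probe like this only answers the question for the names you think to ask about. It says nothing about the classes those classes depend on, which is exactly the gap described above.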

This is a huge and very real problem in large Java applications today. There is a whole ecosystem of solutions that have sprung up to address this. For example, tools and build utilities, such as Maven or Gradle, standardize the process of specifying and acquiring external dependencies. Process-based solutions such as continuous integration aim to solve the unpredictable nature of builds across various development environments. However, all that such tools can do is make the process predictable. They cannot verify the validity or accuracy of the result that they help assemble. Once the dependencies are fetched, there's nothing that those tools can do to detect missing or duplicate types in the classpath.

Back to Amit's story. Having no way to verify up front whether all the classes were available, Amit hoped for the best and deployed the application. The application started up fine and ran for a couple of hours without any errors. However, there was still no saying if he had got it right. Maybe there was a class in there that hadn't been executed yet, and when it was, the JVM might find that it could not locate one of its imports. Or maybe there were duplicate versions of the same class in the classpath and the JVM had picked up the first copy it found. Wasn't there a better way to ensure in advance that any given Java application will work reliably?

The classpath problem

We've seen two problems in Jack's and Amit's stories. Jack needed an effective way to encapsulate portions of his library, but couldn't. Amit needed a way to ensure reliable execution of his application without actually executing it. Neither Jack nor Amit really had a solution to his problem because of the way classpath resolution works in Java. We may sometimes mistakenly think of a JAR file as a way to build a reusable module in Java, but that's unfortunately not the case. A JAR file is just a convenient bundle of classes. Nothing more! Once in the classpath, the JVM treats classes in a JAR no differently from separate class files all in the same root directory. At runtime, as far as the JVM is concerned, an application is just a set of classes in a flat list of packages.

What's worse is, once a class is in the classpath, it's free for all. It's incredibly easy for any developer to use a type they are not supposed to, or a type that might be available for them during compile time, but not at deployment/runtime. Or there could be multiple copies or even multiple versions of the same class in two different classpath locations, making it unpredictable which version the runtime will actually pick up during execution. There's a problem commonly called JAR hell, which refers to several issues resulting from mismatched and incorrect classes and versions in JAR files.
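The flat view and the duplicate problem can be illustrated with a toy model. This is not how the JVM is implemented; it is a simplified sketch in which a classpath is an ordered map from a JAR name to the class names inside it. The FlatClasspathModel class and the JAR file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlatClasspathModel {
    // Maps each class name to the first JAR that provides it, in classpath order.
    // Later copies of the same class name are silently shadowed.
    static Map<String, String> flatten(Map<String, List<String>> jarsInOrder) {
        Map<String, String> winners = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> jar : jarsInOrder.entrySet()) {
            for (String className : jar.getValue()) {
                winners.putIfAbsent(className, jar.getKey());
            }
        }
        return winners;
    }

    // Lists class names provided by more than one JAR
    // (assumes each JAR lists a given class at most once).
    static List<String> duplicates(Map<String, List<String>> jarsInOrder) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> classes : jarsInOrder.values()) {
            for (String className : classes) {
                counts.merge(className, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical JAR names: two versions of Jack's library on one classpath.
        Map<String, List<String>> classpath = new LinkedHashMap<>();
        classpath.put("stringsorter-1.0.jar", Arrays.asList(
                "acme.util.stringsorter.StringSorterUtil",
                "acme.util.stringsorter.internal.BubbleSortUtil"));
        classpath.put("stringsorter-2.0.jar", Arrays.asList(
                "acme.util.stringsorter.StringSorterUtil",
                "acme.util.stringsorter.internal.HashSortUtil"));

        System.out.println(flatten(classpath));
        System.out.println(duplicates(classpath));
    }
}
```

In this model, swapping the order of the two JARs changes which StringSorterUtil wins, and the resulting flat map mixes classes from both versions. That mix is the kind of mismatch the term JAR hell refers to.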

This problem is exacerbated in huge code bases with hundreds of thousands of classes. Imagine all those classes in your application as a flat list with no structure! It's a nightmare to maintain and organize. The bigger the code base, the bigger the problem. To illustrate this, let's take the classic example of a code base that's written in Java, that's incredibly large and complex, and has lasted for many years now. It is perhaps one of the oldest Java code bases ever, and still it continues to grow and change at a fairly rapid pace. Any guesses? Well, it's the Java platform itself!