Scala Microservices

By: Selvam Palanimalai, Jatin Puri
Overview of this book

In this book we will learn what it takes to build great applications using microservices, the pitfalls associated with such a design, and the techniques to avoid them.

We learn to build highly performant applications using the Play Framework. You will understand the importance of writing code that is asynchronous and non-blocking, and how Play leverages this paradigm for higher throughput. The book introduces the Reactive Manifesto and uses the Lagom Framework to implement the suggested paradigms. Lagom teaches us to build applications that are scalable and resilient to failures, and solves problems faced with microservices, such as the service gateway, service discovery, communication, and so on. Message passing is used as a means to achieve resilience, and CQRS with Event Sourcing helps us in modelling data for highly interactive applications.

The book also shares effective development processes for large teams, using good version control workflow, continuous integration, and deployment strategies. We introduce Docker containers and the Kubernetes orchestrator. Finally, we look at the end-to-end deployment of a set of Scala microservices in Kubernetes, with load balancing, service discovery, and rolling deployments.
Table of Contents (12 chapters)

Microservices

Microservices are one such design pattern, one that helps fix the problems we have faced so far. But what exactly are microservices?

Code example

Let's look at a simple program. We have an array of floating-point numbers. We need to sort them in ascending order and finally print them rounded to the nearest whole number. And, because we believe that the word Yo has the power to inspire the young generation, we will also append Yo while printing the numbers. Simple code for this would be as follows:

double[] arr = ....// array to sort 
 
//sort the array 
for (int i = 0; i < arr.length; i++) { 
   for (int j = i + 1; j < arr.length; j++) { 
       double temp = 0; 
       if (arr[i] > arr[j]) { 
           temp = arr[i]; 
           arr[i] = arr[j]; 
           arr[j] = temp; 
       } 
   } 
} 
 //loop at every element of the array 
for(double num : arr){ 
   //round it to nearest whole number and print 
   long temp = (long) num; 
   if(num - temp > 0.5) 
       System.out.println("Yo - "+(temp+1)); 
   else System.out.println("Yo - "+temp); 
} 

The preceding code definitely doesn't look pretty, even though it is a very simple piece of code. Here's a quick summary of the problems with it:

  • It looks complex and appears to do many things.
  • The sort implementation cannot be vouched for just by looking at it; it has to be tested.
  • The same applies to the logic for rounding up.
  • We are mixing the rounding of numbers with printing them. Writing a test case for the rounding would be difficult, as we would have to check the printed output of the program.
  • If we have to sort at some other place in the code base, we will have to rewrite it.
  • Any change in the sort or round-up logic would require testing the complete code.

The code is difficult to read and difficult to test. The program does four primary things: sort, round to the nearest whole number, append Yo to the number, and print.

We can, of course, simplify it by splitting each functionality as a different function, as shown here:

sort(arr); 
for (double num : arr) { 
   long rounded = round(num); 
   System.out.println(getYoed(rounded)); 
} 
 
private static String getYoed(long num){ 
   return "Yo - "+Long.toString(num); 
} 

The advantages of the preceding code include the following:

  • The code is readable; we can clearly see that we are sorting, iterating, rounding up, and printing.
  • We can reuse the sort and round methods.
  • It is easier to test the sort and round methods.
  • We could individually optimize the sort algorithm, or even rewrite it completely, without changing any other part of the code; only the contract of the argument type has to be maintained.
  • We could cache results as part of the sort implementation if the input array is the same as previously invoked.
  • Supervising strategy: suppose you implement your own sort algorithm based on heuristics that is faster than the default sort available in the JDK. You can always call your sort implementation first; however, if it throws an exception or produces incorrect results (remember, it is heuristics based), you can catch the failure and fall back to the built-in JDK sort. Thus, there is a supervisor to handle failures and not let the program crash.
  • The code is more maintainable when compared to the previous code.
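The supervising strategy can be sketched as follows. Note that `heuristicSort` is a hypothetical placeholder (here it simply fails) standing in for a heuristics-based implementation:

```java
import java.util.Arrays;

public class SupervisedSort {
    // Hypothetical heuristic sort that may fail on some inputs;
    // here we simply simulate a failure to exercise the supervisor.
    static void heuristicSort(double[] arr) {
        throw new IllegalStateException("heuristic failed");
    }

    // The supervisor: try the fast heuristic first, and fall back to
    // the built-in JDK sort if it throws, so the program never crashes.
    static void sort(double[] arr) {
        try {
            heuristicSort(arr);
        } catch (RuntimeException e) {
            Arrays.sort(arr);
        }
    }

    public static void main(String[] args) {
        double[] arr = {3.7, 1.2, 2.9};
        sort(arr);
        System.out.println(Arrays.toString(arr)); // prints [1.2, 2.9, 3.7]
    }
}
```

The caller never sees the failure; the contract of "sort this array" is honored either way.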

So, using the right abstraction is important. Years ago, writing assembly was the only alternative but, with evolution, came several paradigms. Languages such as C made programmability easier with a procedural style of coding. Lisp showed the power of functional composition and recursion. Languages such as Erlang introduced the actor model as an abstraction to write highly concurrent code. And now we have all sorts of paradigms, such as polymorphism with inheritance, parametric polymorphism, ad hoc polymorphism, F-bounded polymorphism, structural typing, pattern matching, immutability, and so on; the list is never ending. Each abstraction gave us the ability to express ourselves better than before.

Microservices are not a new concept but an abstraction. They attempt exactly what we attempted before, by extracting our modules from a single giant application (often called a monolith) into different standalone applications (often called microservices). Each microservice has the necessary isolation from the other modules of the application, and the communication protocol of each microservice is well defined. Our application then becomes a collaboration of different microservices. The advantages we expect from doing this are as follows:

  • We would have clear semantics for the different parts of the application
  • It would be easier to scale, as we can target each microservice individually
  • It would be easier to test each module
  • Development would be easier, as developers, who clearly know their constraints, can focus on one module at a time
  • Failures would be easier to handle, and handled more effectively

Restructuring

In our talent-search engine application, we can split the application to a set of individual microservices:

  • A separate application/process for each of the Stack Overflow, GitHub, and LinkedIn engines. Each will collect and process data for its kind of site and expose the final results, comprising the rank and score of each user of the respective site, over a communication protocol, normally HTTP.
  • A separate microservice to consolidate all the developer meta information. This will act as a single source of information related to all the developers on the platform. It will also link handles on different sites to a single developer (the same user can have accounts on each of GitHub, Stack Overflow, and LinkedIn, and it helps if they can be linked to a single user).
  • A frontend server that will receive the searches. This will also store user preferences.
  • A rank server that consolidates results from different microservices and generates a global rank of developers on the platform.
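As a rough sketch of how one of these services could expose its results over HTTP, here is a minimal example using the JDK's built-in com.sun.net.httpserver; the /rank path and the JSON payload are made up for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class RankService {
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Hypothetical endpoint: returns a canned rank for a user.
        // A real engine would compute this from collected site data.
        server.createContext("/rank", exchange -> {
            byte[] body = "{\"user\":\"alice\",\"rank\":1}"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        start(8080);
        System.out.println("rank service listening on :8080");
    }
}
```

Consumers only need to know the URL and the response format; how the rank is computed stays hidden inside the service.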

We have discussed isolation and the need to split up into a system of microservices, but how do we identify the basis to form a microservice and the boundary?

What exactly are microservices

Unix tools such as pipelines, diff, ls, sort, grep, and others are interesting metaphors. Doug McIlroy has documented the Unix philosophy, summarized at https://en.wikipedia.org/wiki/Unix_philosophy.

This is the Unix philosophy--Write programs that do one thing and do it well. Write programs to work together.

A command to search for an exception message in the last lines of a log file is:

tail output.log | grep Exception

This works great because we know distinctly what tail and grep do individually. Each does one thing and does it well. And then we integrate the two to work together via a clean interface.

In the paper Program Design in the UNIX Environment, the Unix authors Rob Pike and Brian W. Kernighan summarize it (http://harmful.cat-v.org/cat-v/unix_prog_design.pdf):

Much of the power of the UNIX operating system comes from a style of program design that makes programs easy to use and, more important, easy to combine with other programs. This style has been called the use of software tools, and depends more on how the programs fit into the programming environment and how they can be used with other programs than on how they are designed internally. But, as the system has become commercially successful and has spread widely, this style has often been compromised, to the detriment of all users. Old programs have become encrusted with dubious features. Newer programs are not always written with attention to proper separation of function and design for interconnection.

The Unix philosophy tends to favor composability over monolithic designs.

There is no global definition for a microservice. But here we intend to define it as:

Do one thing and do it well!

This clarity makes composition extremely powerful, as we understand very well the consequences of using a microservice. The semantics are clear, just like with the Unix toolset. Everything that has to be done for a given piece of functionality is done in one place. This means the developers working on a microservice are only concerned with the code base of a single functionality; the database architects know the precise optimizations to make, as there is clarity on usage (whereas many modules using the same database complicates matters); and scalability is easier, as you have to worry about a small portion of the problem rather than the complete application.

Sharing of a database

A microservice needs to do one thing and must do it well. There also needs to be a clear boundary between microservices on what distinguishes them.

In software development, every application will have some state to be maintained and persisted. Can this state or database be shared across microservices? If it can, would it mean that the boundary is being crossed when microservices share the state amongst each other?

In order to answer this, we need to better understand what we mean by boundaries for each microservice so that isolation remains intact. In general, we want our microservices to be autonomous. Webster's Dictionary defines Autonomous as:

  • Having the right or power of self-government
  • Undertaken or carried on without outside control

So, an autonomous system must have the sovereign right over everything it does and no outside system must be able to influence or control it. This means an autonomous system must have the capability of existing independently in all situations and thus responding, reacting, or developing independently of the whole.

By this definition and understanding, it is important for a microservice to be immune to outside changes and to have total control over what it does. So, it is crucial for a microservice to own and have command over its state. This means the persistence layer or database cannot be shared across microservices. If it is, there is no clear boundary between the microservices, and the systems cannot be truly autonomous. For example, if two microservices share the same set of tables, then:

  • Any update in the schema by one microservice will unknowingly affect the other.
  • A write lock by one microservice will affect the reads of others.
  • A microservice might decide to cache the data. If another microservice updates the entries in a table, this would mean that the cache of the first microservice now stores incorrect data and the cache will have to be invalidated. So microservices will have to communicate with each other to invalidate respective caches. This also means they will need to understand the internals of each other for an application to survive, adding to development and maintenance complexity.
  • A table can be optimized either for faster reads or for faster writes with appropriate indexes, but rarely for both. One microservice might want the table to exhibit faster writes and the other might wish for faster reads. In such a scenario, they are at cross purposes.
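The cache-invalidation problem in the third point can be seen in miniature below; a single HashMap stands in for the shared table, which is of course a simplification:

```java
import java.util.HashMap;
import java.util.Map;

public class SharedTableDemo {
    // One map standing in for a table shared by two microservices.
    static final Map<String, Integer> scores = new HashMap<>();

    public static void main(String[] args) {
        scores.put("alice", 10);

        // Service A reads the shared table once and caches the value.
        Map<String, Integer> serviceACache = new HashMap<>(scores);

        // Service B updates the shared table directly; it has no way
        // of telling service A to invalidate its cache.
        scores.put("alice", 99);

        System.out.println("table: " + scores.get("alice"));        // prints 99
        System.out.println("cache: " + serviceACache.get("alice")); // prints stale 10
    }
}
```

With separate databases, service A would be the only writer of its data, so its cache could never silently go stale under it.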

This situation would be very similar to a monolith. It is like breaking the different interfaces of a monolith into different applications, except that now they interact with each other over the network. Of course, this means more things can go wrong.

Services must be loosely coupled so that they can be developed, deployed, and scaled independently, and changes to one microservice's state should not impact other microservices. For a microservice to be autonomous in the truest sense, it cannot share the same state or even the same database server. One could argue that there is no harm in sharing the same instance of a database server as long as the services don't share the same set of tables. But this adds unnecessary complexity: over time, no one remains sure about what each microservice owns and what it doesn't. Perhaps a solution could be to prefix table names with the application that owns them.

But that doesn't solve the problem that if the database server is under heavy load from one microservice, it inadvertently affects the other. Moreover, each application might want to configure the database to best suit its usage patterns. In Microsoft SQL Server, for instance, we usually set the number of data files (data files contain data and objects such as tables, indexes, stored procedures, and views); for example, we could set up a 160 GB database as 1 * 160 GB file, 2 * 80 GB, or 4 * 40 GB, and so on. Each setup provides the best performance for a different scenario.

In short, microservices do not share their persistence storage with each other. If they do, the isolation is affected.

However, sometimes things are not very straightforward. It might be that common tables need to be shared across microservices and we need to learn mechanisms to handle such scenarios.

Different things are inferred differently by different people. For example, if you are decorating a children's bedroom, as a parent you focus on the wall paint color, decor, a study table, the height of the bed, and so on. The construction agency, however, is only concerned with the dimensions of the room, not the decor. So, even though the domain is the same, the context of each is different. Hence, the two would model the same domain differently, which may mean different schemas even though both of them deal with a single object, the room.

So, it might appear that it makes sense to share the same state (a room here), but what each party represents can be very different. We will jump into the details of modeling such applications when we introduce domain-driven design and its implications for architecting microservices in Chapter 8, Effective Communication, along with strategies to set boundaries in complex domain designs. Then we will introduce design patterns on how to model them effectively in practice using event sourcing and CQRS.

Defining microservice

To summarize what we mean by a microservice:

  • Microservices do one thing and they do it well.
  • A microservice owns its own state. It does not share it with other microservices.

Micro in microservice

The word micro in microservice can be very misleading.

Micro in microservice does not mean that the microservice should be tiny in terms of the number of lines of code, or that it shouldn't do many operations. There is no constraint on the size of a microservice. For example, in our search engine application, we could have a separate microservice for the Stack Overflow engine. The code base for this microservice has no limits. It could be 1,000 lines of code or maybe a hundred thousand lines long. It could use different types of databases to store content, different technologies to cache, and a plethora of libraries to function. All of this is fine as long as the microservice stays within its own boundary, does not interfere with others, and everything related to Stack Overflow is done in this single place.

The size of the microservice is irrelevant as long as the right boundary across domains is maintained.

Polyglot

In our application, once the communication semantics are set, we could implement any of the Stack Overflow or LinkedIn engines in any programming language. One team might wish to do it in Haskell and another in Groovy, and they would all have the independence and flexibility to implement it as they see fit. Designing with microservices gives us this as a first-class mechanism.

The dark side of microservices architecture

Not everything is rosy. Simply breaking different parts of the system into different applications does not make things easy if we do not do it the right way. Worse, the result becomes like a giant distributed system where the method calls are over the network. And the network always has numerous reasons to mess up: it becomes unpredictable, and the application becomes difficult to debug.

With distributed systems, there could be other problems, such as:

  • Deployment becomes painful, as we now have numerous applications to deploy.
  • As if tracking and monitoring the monolith was easy! We now have several microservices to monitor, adding to the workload.
  • Collecting and viewing logs, and restarting applications, become chores multiplied across services.
  • Distributed transactions. With a monolith, transactions are easier, as it is a single application and one could lock across several tables (and modules) for a transaction. With distributed systems, transactions can get extremely complicated (or impossible), as they are different applications altogether and holding a lock across all of them is a nightmare. We need to learn the skill of modeling data in distributed systems.
  • An application needs the addresses of the other applications it accesses. If the communication is via HTTP, it needs to store the URLs of all the applications it wishes to access. If another application changes its URL, this has to be communicated to all its dependents so they can update their URL lists. Worse, if the input format changes, such as the method signature of a service that takes HTTP requests, the change has to be conveyed to all the other teams. Not doing so will lead to errors.
  • Logs are scattered across microservices. If something goes wrong, tracing back gets difficult, which also makes finding the root cause of a problem difficult.
  • Version management becomes harder, as each microservice evolves and is released on its own schedule.
  • One of the tactics mentioned in this chapter is asynchronous communication by message passing. But asynchronous communication can get very complicated, as the response to an operation is notified sometime in the future. This adds to code complexity and makes debugging difficult unless the right abstractions are used.
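As a small illustration of the asynchronous style (the remote call is simulated with a hypothetical fetchRank that returns a canned value), the response is handled by a callback registered on a future rather than by blocking:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncCall {
    // Hypothetical remote call, simulated here with a canned result.
    static CompletableFuture<Integer> fetchRank(String user) {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    public static void main(String[] args) {
        // The response arrives sometime in the future; we register a
        // callback instead of blocking the calling thread on it.
        CompletableFuture<Void> done = fetchRank("alice")
                .thenAccept(rank -> System.out.println("rank = " + rank));
        done.join(); // block only so this demo JVM does not exit early
    }
}
```

The caller's control flow no longer reads top to bottom, which is exactly the complexity the bullet above warns about; abstractions such as futures (or Scala's for comprehensions over them) keep it manageable.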

We need to be well equipped to handle all these scenarios in order to see the true benefits of the microservices architecture. This book is a journey: we begin with how to write great microservices, and then focus on tackling the pain points effectively towards the end.

Why Scala

For the functionalities we desire, we need the right toolset to express them better.

Scala provides abstractions (such as a powerful type system, pattern matching, implicits, immutable collections, and many others) and syntactic sugar (such as case classes, for comprehensions, extractors, and many others) that let us express ourselves better. This results in code that is not only less verbose (and less code means fewer bugs) but also scalable, the very quality the language is named after.

Coupled with the fact that Scala runs on a fantastic VM, the Java Virtual Machine, one of the best platforms out there for high performance, it also gives us access to all the Java libraries. This makes Scala the best of both worlds: great runtime execution thanks to the JVM, and an expressive, type-safe programming language.

If you would like to learn more about Scala, we suggest you read the book Programming in Scala by Martin Odersky, Lex Spoon, and Bill Venners (Artima Inc.). The free online course Functional Programming Principles in Scala on Coursera (https://www.coursera.org/learn/progfun1), by Martin Odersky, is also a good introduction.