Book Image

Concurrent Patterns and Best Practices

By : Atul S. Khot
Book Image

Concurrent Patterns and Best Practices

By: Atul S. Khot

Overview of this book

Selecting the correct concurrency architecture has a significant impact on the design and performance of your applications. Concurrent design patterns help you understand the different characteristics of parallel architecture to make your code faster and more efficient. This book will help Java developers take a hands-on approach to building scalable and distributed apps by following step-by-step explanations of essential concepts and practical examples. You’ll begin with basic concurrency concepts and delve into the patterns used for explicit locking, lock-free programming, futures, and actors. You’ll explore coding with multithreading design patterns, including master, slave, leader, follower, and map-reduce, and then move on to solve problems using synchronizer patterns. You'll even discover the rationale for these patterns in distributed and parallel applications, and understand how future composition, immutability, and the monadic flow help you create more robust code. By the end of the book, you’ll be able to use concurrent design patterns to build high performance applications confidently.
Table of Contents (14 chapters)

Concurrency in a breeze

We start with a simple definition. When things happen at the sametime, we say that things are happening concurrently. As far as this book is concerned, whenever parts of an executable program run at the same time, we are dealing with concurrent programming. We use the term parallel programming as a synonym for concurrent programming.  

The world is full of concurrent occurrences. Let's look at a real-life example. Say that there are a certain number of cars driving on a multilane highway. In the same lane, though, cars need to follow other cars, the ones that are already ahead of them. A road lane, in this case, is a resource to be shared. 

When a toll plaza is built, however, things change. Each car stops in its lane for a minute or two to pay the toll and collect a receipt. While the toll attendant is engaged with the car, other cars behind it need to queue up and wait. However, a toll plaza has more than one payment gate. There are attendants at each gate, attending to different cars at the same timeIf there are three attendants, each serving one gate, then three cars could pay the toll at the same point in time; that is, they can get serviced in parallel, as shown in the following diagram:      

Note that the cars queuing up at the same booth get serviced in sequence. At any given time, a toll attendant can service only one car, so others in the queue need to wait for their turn. 

It would be really odd to see a toll booth with just one gate! People wouldn't be served in parallelStrictly sequential processing of toll charges would make life unbearable for the frequent traveler.

Even when there are multiple gates and an abnormally large influx of cars (say on vacations), each gate becomes a bottleneck; there are far fewer resources for servicing the workload.     

The push for concurrency

Let's come back to the software world. You want to listen to some music at the same time that you are writing an article. Isn't that a basic need? Oh yes, and your mail program should also be running so that you get important emails in time. It is difficult to imagine life if these programs don't run in parallel.

As time goes by, software is becoming bigger and demand for faster CPUs is ever increasing; for example, database transactions/second are increasing in number. The data processing demand is beyond the capabilities of any single machine. So a divide and conquer strategy is applied: many machines work concurrently on different data partitions. 

Another problem is that chip manufacturers are hitting limits on how fast they can make chips go! Improving the chip to make the CPU faster has inherent limitations. See for a lucid explanation of this problem. 

Today's big data systems are processing trillions of messages per second, and all using commodity hardware (that is, the hardware you and me are using for our day-to-day development)—nothing fancy, like a supercomputer. 

The rise of the cloud has put provisioning power in the hands of almost everyone. You don't need to spend too much upfront to try out new ideas—just rent the processing infrastructure on the cloud to try out the implementation of the idea. The following diagram shows both scaling approaches:


The central infrastructure design themes are horizontal versus vertical scaling. Horizontal scaling essentially implies the use of a distributed concurrency pattern; it's cost effective, and a prominent idea in the big data world. For example, NoSQL databases (such as Cassandra), analytics processing systems (such as Apache Spark), and message brokers (such as Apache Kafka) all use horizontal scaling, and that means distributed and concurrent processing.

On the other hand, installing more memory or processing power in a single computer is a good example of vertical scaling. See for a comparison between the two scaling approaches.

We will look at two common concurrency themes for horizontally scaled systems: MapReduce and fault tolerance.   

The MapReduce pattern

The MapReduce pattern is an example of a common case where concurrency is needed. The following diagram shows a word frequency counter; given a huge text stream of trillions of words, we need to see how many times every word occurs in the text. The algorithm is super simple: we keep the running count of each word in a hash table, with the word as the key and the counter as the value. The hash table allows us to quickly look up the next word and increment the associated value (counter). 

Given the size of the input text, one single node does not have the memory for the entire hash table! Concurrency provides a solution, using the MapReduce pattern, as shown in the following diagram:


The solution is divide and conquer: we maintain a distributed hash table and run the same algorithm, suitably adapted for a cluster.

The master node reads—parses—the text and pushes it to a set ofslave processing nodes. The idea is to distribute the text in such a way that one word is processed by one slave node only. For example, given three processing nodes, as shown in the preceding diagram, we would dividerangewise: push nodes starting with the characters {a..j} to node 1, {k..r} to node 2 and the rest—{s..z}—onto node 3. This is the map part (distribution of work).

Once the stream is exhausted, each slave node sends its frequency result back to the master, which prints the result. 

The slave nodes are all doing the same processing concurrently. Note that the algorithm would run faster if we add, more slave nodes; that is, if we scale it horizontally.  

Fault tolerance

Another common approach is to build in intentional redundancy to provide fault tolerance; for example, big data processing systems, such as Cassandra, Kafka, and ZooKeeper, can't afford to go down completely. The following diagram shows how concurrently replicating the input stream protects against any one slave node going down. This pattern is commonly used by Apache Kafka, Apache Cassandra, and many other systems:

The right side of the diagram shows redundant machines on which a data stream is replicated 

In case any one node goes down (hardware failure), other redundant nodes take its place, thus ensuring that the system as a whole is never down.

Time sharing 

In the real world, we also perform a number of tasks concurrentlyWe attend to a task and then if another task also needs our attention, we switch to it, attend to it for a while, and then go back to the first task. Let's look at a real-world example of how an office receptionist deals with their tasks. 

When you visit any office, there is usually a receptionist who receives you and asks for your business. Say that, just as they are asking about who you need to meet, the office buzzer rings. They take the call, say "hello," speak for a while, and then ask you to wait for a second. Once the call is finished, they resume talking to you. These actions are shown in the following diagram:

The receptionist is sharing her time among all the parties interested in talking to her. She is working in a way so that everyone gets a slice of her time. 

Now, keeping the toll plaza and the receptionist in mind, replace the toll operators with a CPU core and the cars with tasks, and you get a fairly good mental model of today's concurrent hardware. If we increase the number of toll operators from three to six, we will increase the number of cars getting serviced in parallel (at the exact same time) to six. A pleasant side effect is that the queued cars will also spread out, and every car will get serviced fasterThe same holds true when we execute a concurrent program. Hence, things are faster overall.

Just as the receptionist is doing multiple things at the same time, such as time sharing between the visitors, a CPU shares time with processes (running programs). This is how concurrency gets supported on a single CPU. 

Two models for concurrent programming

Concurrency implies that multiple tasks are happening in parallel to achieve a common goal. Just like communication with a group, we need to communicate and coordinate with the concurrently executing entities.

For example, let us say that we present the previously mentioned word frequency counter via a UI functionality. A user uploads a huge file and clicks the start button, resulting in a long-running MapReduce job. We need to distribute the work among the slaves. To send the workload, we need a way to communicate with them. The following diagram shows the various streams of communications that are required in this system:

If the user changes their mind and aborts the job, we need to communicate the stop message to each concurrent entity, as any further processing is futile.

There are two prominent models for concurrent communications: message passing and shared memory. The preceding diagram shows a message passing model.   

We will first discuss the message passing model, using the celebrated UNIX shell pipeline as an example. Next, we will see the shared memory approach in depth and the problems that are associated with it.