Mastering Akka

By: Christian Baxter

Overview of this book

For a programmer, writing multi-threaded applications is critical as it is important to break large tasks into smaller ones and run them simultaneously. Akka is a distributed computing toolkit that uses the abstraction of the Actor model, enabling developers to build correct, concurrent, and distributed applications using Java and Scala with ease. The book begins with a quick introduction that simplifies concurrent programming with actors. We then proceed to master all aspects of domain-driven design. We’ll teach you how to scale out with Akka Remoting/Clustering. Finally, we introduce ConductR as a means to deploy and manage microservices across a cluster.

So what's wrong with this application?

By this point, you've had some time to interact with the example app to see what it can do. You've also looked at the code enough to see how everything is coded. In fact, maybe you've coded something similar to this yourself when building reactive apps on top of Akka. So now, the million dollar question is, "What's actually wrong with this app?"

The short answer is probably nothing. Wrong is a very black and white word, and when it comes to coding and application design, you're dealing more with shades of gray. This app may suit some needs perfectly well, for example, if high scalability is not a concern, if you have a small development team, or if the app's functionality doesn't need to expand much more.

This wouldn't be much of a book if we left it at that though. The long answer is that while nothing is absolutely wrong, there is a lot that we can improve upon to help our app and team continue to grow and scale. I'll break down some of the areas that I think can be made better in the following sections. This will help serve as a primer for some of the refactors that we will do in the upcoming chapters.

Understanding the meanings of scalability

To me, scalability is a nebulous term. I think a lot of people, when they hear scalability, immediately begin to think of things such as performance, throughput, queries per second, and the like. These areas address the runtime characteristics of whatever application you have deployed. This is certainly a big aspect of the scalability umbrella and very important to your app's growth, but it's not the only thing you should be thinking of.

Another key area of scalability that I think of when discussing the topic is how well your application codebase will scale with the growth of both the in-app functionality and the development team. When your team is small (like me as the single developer of this example app) and the feature set of the application is minimal (again, like this example app), then codebase scalability is probably not the first thing on your mind. However, if you expect your company to be successful and grow, then your codebase needs to grow along with it, or else you run the risk of becoming an impediment to the business as opposed to an enabler of the business. Therefore, there are some decisions that you can make early on in the growth process to help the codebase scale with the growth of the business and development team.

There's a lot to discuss related to these two areas of application scalability, and we will break them down in more detail in the subsequent sections.

The scalability cube

If you haven't encountered Martin L. Abbott and Michael T. Fisher's excellent book, The Art of Scalability, then you should give it a look some time. This book covers all aspects of scaling a business and the technology that goes along with it. In many a meeting, I've referenced materials from this book when discussing how to architect software components. There are a ton of valuable lessons in here for everyone from beginners all the way up to more seasoned technologists.

In the book, the authors discuss the concept of the Three Dimensions of Scalability for a running application. Those dimensions are represented as the three axes of a cube (X, Y, and Z), and each one is described in turn below.

X axis scaling

X axis scaling is the one that most people are familiar with. You run multiple copies of the application code on different servers and put a load balancer in front to partition inbound traffic. This kind of scaling gives you high availability, in that, if one server dies, the app can still serve traffic as the load balancer will redirect that traffic to the other nodes. This technique also gives you better per-node throughput as each node is only handling a percentage of the traffic. If you see your nodes are struggling to handle the current traffic rate, simply add another node to ease the burden a little on the existing nodes.

This is the kind of scaling that a monolith, like the example app, is most likely to use. It's pretty simple to keep scaling out by just adding more application nodes, but it seems a little inefficient in terms of resource usage. For example, if in your monolith, it's really only one of the services that is receiving the bulk of the additional load, you still need to deploy every other service into the new server even though they do not need the additional headroom.

In this kind of deployment, your services can't really have their own individual scaling profile. They are all scaling out together because they are co-deployed in a monolith. When you are sizing the new instance node in terms of CPU and RAM, you are stuck with a more generic profile as this node will handle traffic for all of the services. If certain services are more CPU heavy and others are more RAM heavy, you end up having to pick a node that has both a lot of RAM and a high number of CPUs as opposed to being able to choose one or the other. These kinds of decisions can be cost prohibitive in the cloud-based world, where an increase in either of those two areas costs more money, and even more so when the two need to be coupled together.

Microservices and Y axis scaling

The Y axis scaling approach addresses this exact kind of problem. With Y axis scaling, you break up the application around functional boundaries and then deploy these different functional areas separately and independently of one another. This way, if functional area A needs high CPU and functional area B needs high RAM, you can select individual node instances that are the best suited to those needs. In addition, if functional area A receives the bulk of the traffic, then you can increase the number of nodes that handle that area without having to do the same thing for functional areas that receive less traffic.

If you've heard of the microservices approach to building software (and it would be hard not to have, given how much technical literature on the Web is dedicated to it), then you are familiar with an example of using Y axis scaling to build software. This kind of approach achieves the goal of small independent services that can scale independently of each other, but it comes with additional complexity around the deployment and management of those components. In a simple monolithic, X axis style deployment, you know that every component is on every node, so the load balancer can send traffic to any of the available nodes.

In a Microservice deployment, you need to know where in the node set service A lives (and it should be in multiple nodes to give high availability) so that you can route traffic accordingly. This service location concern complicates these kinds of deployments and can involve bringing in another moving part (software component) to handle it, which further increases the complexity. Many times though, the benefits gained from decoupled, independent services outweigh these additional complexities enough to make this kind of approach worth pursuing.
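To make the service location concern a little more concrete, here is a minimal sketch of a lookup that maps a service name to the nodes hosting it and rotates through them. Everything in it (the Registry class, the Node type, the service names, and the addresses) is hypothetical and only meant to illustrate the idea. A real deployment would typically lean on a dedicated service discovery component rather than an in-memory map.

```scala
object ServiceLocation {
  // Hypothetical address of a node that hosts an instance of a service.
  final case class Node(host: String, port: Int)

  // A toy in-memory registry: service name -> nodes that host that service.
  final class Registry(entries: Map[String, Vector[Node]]) {
    // Next index to hand out for each service (round-robin position).
    private val counters = scala.collection.mutable.Map.empty[String, Int]

    // Returns the next node for the service, or None if nothing hosts it.
    def locate(service: String): Option[Node] = synchronized {
      entries.get(service).filter(_.nonEmpty).map { nodes =>
        val i = counters.getOrElse(service, 0)
        counters(service) = (i + 1) % nodes.size
        nodes(i)
      }
    }
  }
}
```

Registering a service on more than one node is what gives you the high availability mentioned above: the caller keeps getting an answer as long as at least one node remains in the list.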

Z axis scaling

In Z axis scaling, you take the same component and duplicate it across an entire set of nodes, but you make each node responsible for only a subset of the requests or data. This is commonly referred to as sharding, and it is quite often seen in database-related technologies. If you are running in-memory caching on the nodes, then this kind of approach eliminates duplicate cached data in each node (which can occur with X axis scaling). Only the node responsible for each data set (defined by your partitioning scheme) will receive traffic for that data.

When using Akka Cluster Sharding, you can be sure that only one actor instance is receiving requests for a particular entity or piece of data, and this is an example of Z axis scaling.

Monolith versus microservices

Our initial example app is a monolith, and even though that approach now carries negative connotations, it's not necessarily a bad thing. When starting out on a new application, a monolith-first kind of approach may actually be the right choice. It's simpler to code, build, deploy, and deal with when it's in production as opposed to a more sophisticated (but complicated) microservices deployment. Sometimes, trying to start out with something like microservices can lead to the team and application collapsing under the weight of the additional complexity of that pattern.

Within an Agile development approach, getting the software in the hands of users quickly so you can get feedback and iterate further is going to be way more important than having a fancy, newfangled architecture. The product team is not going to want to hear that they have deployment impediments because you still can't figure out how to get all of your decoupled services to communicate together. At the end of the day, as an engineer, you're there to build and deploy a product. If your initial architecture prevents doing this easily, then it will be hard for you and your business to be successful.

In the beginning stages of a new business, it's important to be able to iterate and change features easily in response to feedback. Take too long (due to a complicated initial architecture) and you risk being passed by a competitor, or dropped by those very users who provided that valuable feedback. That's why it's not necessarily bad to start out with something like a monolith. It's simple to deploy and scale initially with an X axis style approach, and that can suit most needs just fine in the beginning.

Scale issues with our monolith

In the case of our bookstore application, we followed the simpler monolith-first approach, but now, the rubber is finally hitting the road. We have traction with users. Our product feature set is growing and so is the team and the complexity of the application. Our monolith is starting to become an impediment to our agility and that's going to be a problem. Because of this, we are going to embrace the microservices style of small, independent, decoupled services. The subsections to follow will detail how we arrived at this decision.

Issues with the size of deployments

Currently, we need to build and deploy the entire codebase even if we are touching a single line within a single service. The less code you have to deploy, the less risky the deployment is. Also, as the monolith continues to grow, so does the build and test cycle associated with it. If all you have to do is change a single line buried deep down within a service, but the ensuing build and test cycle takes 30 minutes because it's rebuilding everything, this will eventually become a problem. If you have an issue in production and need to get a hotfix out as soon as possible, you don't want this side effect of the monolith to get in the way of that. This is where a microservices-like approach will help alleviate that problem.

Supporting different runtime characteristics of services

Now that our app has been in production for a while, we are starting to better understand the runtime usage patterns of our different sets of services. We know which services are being hit the most in the normal app flows, and we would like to be able to have more instances of these critical services available compared to other less important services. Our current monolith does not support us in doing this, but switching to a microservices-like system will enable it.

The pain of a shared domain model

The shared domain model in the app (in the common project) is going to be an issue when it comes to isolating the deployments. Changes to this shared model will necessitate full deployments, and that will prevent our goal of smaller isolated deployments. When we initially designed the example app structure, we thought we were being forward looking in separating services into different projects and then allowing them to communicate via the shared domain model. We eventually wanted to package and deploy these projects separately, but now, in hindsight, the shared domain model is actually going to make this harder as opposed to enabling it to happen.

If we do end up packaging and separating the services fully for deployments, each one will have to have a copy of the shared domain library code available to it at runtime in order to run. If we change the domain model and then only deploy one of the services, we run the risk of having issues when that service communicates with another service that was not rebuilt and redeployed after the model change.

If we had decided to use Akka remoting to handle remote communication between actor services, then we could run into issues with Java serialization when deserializing the messages and result types exchanged by the services.

We could work around this by using a different serialization scheme (such as protobuf), but this is certainly more work, and there are more flexible ways to communicate between our services.

We should try and decouple our components and modules as much as possible. We need an approach that allows them to communicate indirectly, outside of the normal request/response cycle of a user interaction with the application. This most likely means some form of event based interaction between our modules, with schemas and versioning for those events, so that other modules can consume them safely even as the model continues to evolve.
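As a rough illustration of what schemas and versioning for events could look like, the sketch below wraps each event in an envelope that carries a type name and a version number, and lets the consuming module decode it into its own local representation. The Envelope type, the OrderCreated fields, and the default currency for version 1 are all assumptions made up for this example. They are not taken from the bookstore code.

```scala
object VersionedEvents {
  // Hypothetical envelope: consumers dispatch on (eventType, version).
  final case class Envelope(eventType: String, version: Int, payload: Map[String, String])

  // The consumer's own view of the event, independent of the producer's model.
  final case class OrderCreated(orderId: String, userId: String, currency: String)

  // Decodes every version this consumer knows about; anything else is rejected.
  def decode(e: Envelope): Either[String, OrderCreated] = e match {
    case Envelope("OrderCreated", 1, p) =>
      // Assumption: version 1 carried no currency, so a default is filled in.
      for {
        id   <- p.get("orderId").toRight("missing orderId")
        user <- p.get("userId").toRight("missing userId")
      } yield OrderCreated(id, user, "USD")
    case Envelope("OrderCreated", 2, p) =>
      for {
        id       <- p.get("orderId").toRight("missing orderId")
        user     <- p.get("userId").toRight("missing userId")
        currency <- p.get("currency").toRight("missing currency")
      } yield OrderCreated(id, user, currency)
    case Envelope(t, v, _) =>
      Left(s"unsupported event $t v$v")
  }
}
```

Because the consumer only depends on the envelope and on the versions it understands, the producing module can add a version 3 later without forcing a redeploy of every consumer at the same time.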

We can fall back on direct communication over HTTP (with versioning of endpoints) if necessary to support some interactions. We should only use that as a last resort though, when an indirect approach just won't fit a certain situation well enough. Direct communications like this create the kind of coupling that we are trying to avoid, so they should be used sparingly.

Issues with our relational database usage

When we built out our bookstore application, the team decided to use a relational database to store data, selecting Postgres as the one to use. As far as relational databases go, Postgres is a solid choice. As I mentioned earlier, it's fast and has a lot of great additional features such as JSON column type support. But now that we are moving towards a microservices approach, is it still going to be the right choice for our application? I see the following shortcomings with our current usage of Postgres that will likely lead to us moving away from it as we evolve our application.

Sharing the same schema across all services

The microservices approach promotes the shared-nothing model of software development. This means that each microservice should not share any of its code models or database schemas with any other component in the system. Sharing creates coupling and we are trying to decouple our components as much as possible. If you try and have fully decoupled services, but they end up sharing a single database and schema underneath them, then you're going to end up in trouble pretty quickly. If you make a change to the schema, it's highly possible that change is going to ripple through multiple services, causing you to have to recode and redeploy more than you intended. When this happens, you're right back to where you were with your monolith and don't actually have service independence.

A traditional relational database model is designed around having a highly related and normalized model that will span all subdomains within your business. You will more than likely have database entities from one subdomain related (via foreign key) to an entity from another subdomain. We see it in the example app's schema where we have foreign keys from SalesOrderHeader to StoreUser and from SalesOrderLineItem to Book. These cross-subdomain relationships will end up causing problems if we are trying to follow a shared-nothing, microservices-style model. If we are going to go down this path, then we will need to consider alternatives to a relational model.

Fixing a single point of failure

When building out complex systems, you are only ever as strong as your weakest link. In our current application, that weak link is our Postgres database because it's not highly available; in fact, it's a single point of failure.

With our current deployment model, the app itself is highly available because we have duplicated it across a set of nodes and put a load balancer in front (X axis scaling). We can survive the failure of a node because we have others that can pick up the slack for it until we get it back online. Unfortunately, we cannot say the same for Postgres. Currently, we are only rolling out a single Postgres instance, and so, if that goes down, it's game over for our application.

Postgres certainly supports techniques to eliminate it being a single point of failure. You can start by setting it up with a node as a hot standby using log shipping. In the case of a failure in the master, you can cut over to the secondary node with only minimal data loss. You can't write to that secondary node (it's master/slave, not master/master). However, if you put a little work into your application layer, you could leverage that secondary for reads and ease the burden on the master node a little as long as you can deal with potential replication lag (stale data) when performing reads.

This kind of model is better than the single point of failure we had before, but it still seems prone to the database itself having to deal with a lot of activity as our user base grows, especially the master node. We need to make sure we size that instance correctly (vertically scaled) to allow it to handle the load we expect to happen as our user base grows.

We could try and ease this burden by sharding (Z axis scaling) the data in Postgres, but as this is not natively supported, we would need to roll out our own solution. If we distribute the data to a bunch of Postgres instances, we can no longer rely on the auto-generated keys in the tables to be globally unique across all of the database nodes. Because of this, we would have to do something like generating the keys in the application layer (as GUIDs perhaps) as opposed to letting the database generate them. We would then have to write our own shard-routing logic in the application layer to consistently hash the key to determine which node to store it in or retrieve it from. In addition, for queries that look up more than one record, we would have to write our own logic to distribute that query across all shards (a global query) as the matching records will likely be in multiple shards.
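To get a feel for how much application-layer code that implies, here is a small sketch of the three pieces just described: application-generated keys, key-based shard routing, and a global query that fans out to every shard. It is only an illustration under stated assumptions. Each Shard is an in-memory map standing in for one Postgres instance, the class and method names are hypothetical, and the routing uses simple modulo hashing rather than a true consistent-hashing ring (which you would want if shards are ever added or removed).

```scala
import java.util.UUID

object ShardRouting {
  // Toy shard: an in-memory map standing in for one database instance.
  final class Shard(val name: String) {
    private val rows = scala.collection.mutable.Map.empty[UUID, String]
    def put(id: UUID, title: String): Unit = rows.update(id, title)
    def get(id: UUID): Option[String] = rows.get(id)
    def findByTitlePrefix(prefix: String): Vector[String] =
      rows.values.filter(_.startsWith(prefix)).toVector
  }

  final class Router(shards: Vector[Shard]) {
    require(shards.nonEmpty, "at least one shard is required")

    // Modulo hashing of the key; floorMod keeps the index non-negative.
    def shardFor(id: UUID): Shard =
      shards(Math.floorMod(id.hashCode, shards.size))

    // Keys are generated in the application layer, not by the database.
    def save(title: String): UUID = {
      val id = UUID.randomUUID()
      shardFor(id).put(id, title)
      id
    }

    // Single-key lookup: only the owning shard is consulted.
    def find(id: UUID): Option[String] = shardFor(id).get(id)

    // Global query: every shard is asked and the results are merged.
    def findByTitlePrefix(prefix: String): Vector[String] =
      shards.flatMap(_.findByTitlePrefix(prefix)).sorted
  }
}
```

Even in this toy form, you can see the two failure modes to worry about: a bug in shardFor makes existing rows invisible, and every multi-record query now costs one round trip per shard.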

A custom sharding solution like this could certainly work, but this seems like a lot more complexity being put on our code base. If we don't get this shard routing logic right, then the consequences are pretty bad as we could miss data, and the app will act as if it didn't exist even though it might. There must be something we can more easily do to give us high availability and avoid having all of the data stored in one single location.

Avoiding cross-domain transactions

Another potential problem that has crept up with our usage of a relational database is that we are performing a database transaction that crosses service-domain boundaries. You can see this transaction within the OrderManager service when it's creating a new SalesOrder. If you look at the code in the DAO class, you can see these three steps being executed in a single transaction:

1. Insert the SalesOrderHeader record.
2. Insert each SalesOrderLineItem record for the order.
3. For each book (on each line item), decrement the inventory for that book.

We coded this using a transaction because we felt it was required to have strong consistency between the number of sales for a book and the remaining inventory for that book. The code does check to make sure that the inventory is available before attempting to write to the database, but that inventory could be sold out from underneath us after we checked it and before we commit it.

The statement to decrement inventory uses an optimistic concurrency-checking technique to ensure this does not happen. That statement looks like this:

update Book set inventoryAmount = inventoryAmount - ${item.quantity} where id = ${item.bookId} and inventoryAmount >= ${item.quantity}

The key there is the where clause, where we are checking to make sure that the row we are about to apply our atomic decrement to still has at least the quantity we plan on deducting from it. If it doesn't, then our code fails the transaction explicitly by applying a filter on the result, making sure the number of rows updated is 1 and not 0. The code that handles that responsibility is as follows:

insert. 
andThen(decrementInv). 
filter(_ == 1)
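If it helps to see the guard-then-decrement idea outside of SQL, the following sketch models the same behavior in memory. It is an analogy only and not the app's DAO code: the BookRow class is hypothetical, an AtomicInteger stands in for the inventoryAmount column, and a compare-and-set loop plays the role that the database plays when it applies the guarded update atomically.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

object InventoryDecrement {
  // Stand-in for the inventoryAmount column of a single Book row.
  final class BookRow(initial: Int) {
    private val inventoryAmount = new AtomicInteger(initial)
    def current: Int = inventoryAmount.get()

    // Mirrors the guarded update: returns the number of "rows" updated (1 or 0).
    @tailrec
    final def decrement(quantity: Int): Int = {
      val seen = inventoryAmount.get()
      if (seen < quantity) 0 // where clause not satisfied, nothing updated
      else if (inventoryAmount.compareAndSet(seen, seen - quantity)) 1
      else decrement(quantity) // another writer got in first, so check again
    }
  }

  // Mirrors filter(_ == 1): anything other than one updated row is a failure.
  def reserve(row: BookRow, quantity: Int): Either[String, Unit] =
    if (row.decrement(quantity) == 1) Right(()) else Left("insufficient inventory")
}
```

The important property is the same in both versions: the check and the decrement happen as one atomic step, so inventory can never go negative even if two orders race for the last copy.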

This is all coded soundly and works as expected, but is this the best way that we can be handling the requirement of keeping inventory aligned with sales quantities?

I think the main issue here is the fact that we are executing a transaction that really spans two separate subdomains within our application: sales order processing and inventory management. We did this because we thought that the strong consistency gained from an atomicity, consistency, isolation, and durability (ACID) transaction was the only way to make this work properly. The problem with this approach though is that it's not going to scale, both from a performance perspective and from a code design and deployment perspective.

These kinds of ACID transactions are heavyweight for the database and can start to cripple it if they are happening at a high frequency. We obviously expect and want sales to be happening at a very high frequency, so there's a clear conflict of interest here. Also, currently, we are sort of benefitting from the fact that there is a single shared database under the app. What if we decided to keep using a relational database, but separated it out so that each service had its own schema or db instance? How would we make something like this multitable, cross-domain transaction work then?

If we were faced with such a problem, we'd probably need to look into getting distributed transactions (XA transactions) working across the different databases, and that's not a good direction to be forced into. While it would allow us to keep our strong consistency guarantees and ACID compliance, XA transactions can be initially difficult to set up and get working correctly in your code. In addition, they are a big performance drain, as the two-phase commit involves longer lock durations than the same transaction would in a local only mode, also increasing the possibility of deadlocks. Distributed transactions are also tied to the availability of multiple systems (databases in our case), so if either of those systems is not available, you cannot proceed with your transaction.

So, we need to be able to support high throughput handling of sales orders while at the same time being able to properly keep inventory in sync with the sales of our books. Also, we want to avoid crossing over into another domain's responsibility when processing that sales order. There must be a technique that will allow us to do this and fit well into our proposed microservices model for developing our services.

Understanding the CAP theorem

When designing the way sales orders were handled, we were sure that ACID level consistency was the proper way to handle things. Now that we are faced with the issues discussed in the previous section, that decision is starting to look more like a problem than a solution. We do want some level of consistency in the data, but strong consistency is not the only game in town, and there is another model that can help us avoid being burned by ACID.

We have also realized that our current Postgres deployment sets the db up as a single point of failure. Ideally, we need to embrace a model where the data is distributed across a set of nodes (with replication) so that we can both get high availability and deal with the temporary loss of a node within that cluster.

The CAP theorem, also known as Brewer's theorem after the University of California, Berkeley computer scientist Eric Brewer, is a way to think about consistency guarantees within a distributed system. The theorem states that it's not possible for a distributed system to supply all three of the following simultaneously:

Consistency: Do all my nodes see (on a read) the same exact data after a write has occurred?
Availability: Do all my requests get a response?
Partition tolerance: Will my system continue to operate in the face of arbitrary loss of parts of the system?

We all want a system that is consistent, highly available, and has partition failure tolerance, but this theorem states that the best you can do is two out of the three. There are a few databases out there now that can give us high availability and partition tolerance, sacrificing strong consistency for an eventually consistent model instead. These kinds of databases do not support the atomic, consistent, isolated, and durable guarantees that an ACID compliant database will give you. Instead, these databases give you guarantees of basically available, soft state, eventually consistent, or BASE.

It's a bit of a mental shift to embrace this new model of eventual consistency, but the payoff of a highly available system with partition failure tolerance can justify that change. We need to find a database that fits in this space as that will best support our shift to a shared-nothing, microservices-like model.

Cassandra to the rescue

Apache Cassandra is a distributed, partitioned wide-column store in which rows are located by key, and it also supports queries of that data using secondary indexes. From the CAP theorem, Cassandra's model gives you both high availability and partition tolerance. Cassandra achieves this by having a cluster of nodes where a specific range of keys is assigned to multiple nodes in the cluster. This way, if you need to look up a key, there will be multiple possible nodes that the request could be serviced by. If one of the nodes goes down, then you will still be able to have that request serviced by another cluster member that also handles that range of keys.

The thing to be careful of in Cassandra is that the data in the cluster is only eventually consistent. If nodes A, D, and F in my cluster house the key foo, and I update that key in node A, and then I read it again, and the read goes to node D, it's not guaranteed yet that my update has been received by that node D, which can lead to a stale read. Cassandra offers the ability to tune the consistency model to make it more consistent, but this comes at the price of latency, so be careful if you decide to go this route.
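The tuning tradeoff can be reasoned about with a little arithmetic. With N replicas for a key, a read that waits on R replicas is guaranteed to overlap the latest write acknowledged by W replicas whenever R + W > N. The sketch below encodes that rule for three levels named after Cassandra's ONE, QUORUM, and ALL consistency levels. The object and function names are hypothetical, and this is a back-of-the-envelope model, not a call into any Cassandra driver.

```scala
object TunableConsistency {
  // Simplified levels, named after Cassandra's ONE, QUORUM, and ALL.
  sealed trait Level
  case object One extends Level
  case object Quorum extends Level
  case object All extends Level

  // Number of replicas that must respond at the given level.
  def required(level: Level, replicationFactor: Int): Int = level match {
    case One    => 1
    case Quorum => replicationFactor / 2 + 1
    case All    => replicationFactor
  }

  // A read overlaps the latest acknowledged write when R + W > N.
  def readsSeeLatestWrite(write: Level, read: Level, replicationFactor: Int): Boolean =
    required(write, replicationFactor) + required(read, replicationFactor) > replicationFactor
}
```

With a replication factor of 3, writing and reading at One leaves room for the stale read described above, while Quorum on both sides closes that gap at the cost of waiting on two replicas for every operation.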

So, how can we apply Cassandra and its model to our initial cross-domain transaction problem? Well, we could use Akka Persistence (which works with Cassandra) to first write the SalesOrder into the system in an event-sourced manner, with an initial status of pending as it's awaiting inventory allocation. The Book subsystem could be listening on the event stream for SalesOrder activity, and when it sees a new one, it can see what books and quantities are on it and reserve inventory for it (if available), resulting in a new InventoryReserved event for that SalesOrder ID. The Order subsystem is in turn listening on the event stream for inventory-related activity and will update its status to approved and start the process of packing and shipping the order once it sees that inventory is available.
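The flow just described can be sketched in plain Scala without pulling in Akka Persistence or Cassandra. In this simplified model, the inventory side reacts to an order event by emitting a follow-up event, and the order side derives its status by replaying the events it has seen. InventoryReserved and the pending and approved statuses come from the description above. The OrderCreated event, the InventoryUnavailable event, the Rejected status, and the function names are assumptions added to make the example complete.

```scala
object OrderFlow {
  sealed trait Event
  // lines maps a book ID to the quantity ordered (hypothetical shape).
  final case class OrderCreated(orderId: String, lines: Map[String, Int]) extends Event
  final case class InventoryReserved(orderId: String) extends Event
  // Hypothetical event for the "not available" branch.
  final case class InventoryUnavailable(orderId: String) extends Event

  sealed trait Status
  case object Pending extends Status
  case object Approved extends Status
  case object Rejected extends Status

  // Inventory side: reacts to a new order and emits a follow-up event.
  def reserve(stock: Map[String, Int], e: OrderCreated): (Map[String, Int], Event) = {
    val available = e.lines.forall { case (book, qty) => stock.getOrElse(book, 0) >= qty }
    if (available) {
      val updated = e.lines.foldLeft(stock) { case (s, (book, qty)) =>
        s.updated(book, s.getOrElse(book, 0) - qty)
      }
      (updated, InventoryReserved(e.orderId))
    } else (stock, InventoryUnavailable(e.orderId))
  }

  // Order side: the status is derived by replaying the events for one order.
  def status(events: Seq[Event]): Option[Status] =
    events.foldLeft(Option.empty[Status]) {
      case (_, _: OrderCreated)                     => Some(Pending)
      case (Some(Pending), _: InventoryReserved)    => Some(Approved)
      case (Some(Pending), _: InventoryUnavailable) => Some(Rejected)
      case (current, _)                             => current
    }
}
```

Notice that neither side calls the other directly and no transaction spans both: the order sits in Pending for however long it takes the inventory event to arrive, which is exactly the eventual consistency tradeoff being accepted.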

So, using Cassandra here, we get a database that is very fast in writing SalesOrder into the system. It's also a database that is highly available and can handle node failure, which are guarantees that our current Postgres database can't make. Then, leveraging Akka Persistence and using an event-sourced model on top of Cassandra, we can use an eventually consistent approach to get the SalesOrder and book inventory systems working together.

This approach eliminates direct interaction between those services and also does away with that nasty cross-domain transaction. It allows us to better scale the runtime performance of the app and the codebase itself, which are both big wins for the future health of our app.

Assessing the application's domain model

If you have looked at the example app's code, you have probably seen the following structure related to entities and services:

Entities (such as Book, BookstoreUser, and SalesOrder) are modeled as very simple case classes without any business logic
Services (ending with the *Manager suffix) are set up to handle the business logic for those entities

This is a common paradigm seen in software development, so it's not like we were going off the rails with this approach. It's pretty simple to develop and understand, and can be a good approach to use when your problem domain and codebase are small and simple. The only problem is that its modeling of our problem domain is not entirely representative of how things work in the real world of book sales. In fact, models like this have been referred to as being anemic, in that, they only weakly resemble the problem domain they are trying to represent.

Domain-driven design (DDD) is a newer approach to software modeling that aims to make software components more representative of the domain. The term was coined by Eric Evans in his book of the same name. The goal of a DDD approach is to model the software components after representations within the domain. These domain representations encapsulate the business logic and functions of those business entities entirely. In doing so, you end up with business entities in your system that are much richer representations of their real-life counterparts.

In our current example app, this means something such as Book, which is a very simple case class, becomes an actor that accepts messages that allow you to do things to that book as part of the user interactions with our app. The DDD approach has its own set of building blocks, such as aggregates and bounded contexts, that we can use to remodel our current app into something that better represents the business domain.
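To make the anemic versus rich contrast concrete, here is a minimal sketch. It makes several assumptions: the fields, the BookManagerLike object, and the addInventory and reserve operations are hypothetical, and the rich entity is shown as a plain case class rather than an actor so the example runs without Akka.

```scala
object DomainModelSketch {
  // Anemic style: the entity is only data...
  final case class BookData(id: Int, title: String, inventory: Int)

  // ...and a separate service owns the rules (mirrors the *Manager pattern).
  // Nothing stops another caller from bypassing these rules entirely.
  object BookManagerLike {
    def addInventory(book: BookData, amount: Int): BookData =
      book.copy(inventory = book.inventory + amount)
  }

  // Richer style: the entity owns and enforces its own rules
  final case class Book(id: Int, title: String, inventory: Int) {
    def addInventory(amount: Int): Either[String, Book] =
      if (amount <= 0) Left("amount must be positive")
      else Right(copy(inventory = inventory + amount))

    def reserve(quantity: Int): Either[String, Book] =
      if (quantity <= 0) Left("quantity must be positive")
      else if (quantity > inventory) Left("insufficient inventory")
      else Right(copy(inventory = inventory - quantity))
  }
}
```

In the actor-based version the text describes, the same operations would arrive as messages instead of method calls, but the rules would still live with the entity.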

This kind of approach is a bit more complex than the simpler model the app currently uses. Its true benefit is realized within complex business domains, as the one-to-one relationship between the software and the domain concepts eases the development burden of that complex domain. This is a bit of a stretch for the relatively simple model in our example app, but we're going to give it a shot anyway as part of one of our refactors. At the very least, we can explore a different way of modeling software, one that might really benefit our app if it starts to get more complex in what it's doing.

Recognizing poorly written actors

If you happened to look at the logic within the SalesOrderManager actor, you will have noticed a fairly complicated actor, at least compared to the other actors in this app. This actor needs to work with a bunch of the other services in the app to first gather some data (to perform validations) before it talks to the database to create the order. The bulk of the work is laid out in the createOrder method and is as follows:

// Start all three lookups up front so they can proceed concurrently
val bookMgrFut = lookup(BookMgrName)
val userMgrFut = lookup(UserManagerName)
val creditMgrFut = lookup(CreditHandlerName)
for {
  bookMgr <- bookMgrFut
  userMgr <- userMgrFut
  creditMgr <- creditMgrFut
  // Load the user and build the line items together
  (user, lineItems) <- loadUser(request, userMgr)
    .zip(buildLineItems(request, bookMgr))
  total = lineItems.map(_.cost).sum
  creditTxn <- chargeCreditCard(request, total, creditMgr)
  order = SalesOrder(0, user.id, creditTxn.id,
    SalesOrderStatus.InProgress, total,
    lineItems, new Date, new Date)
  daoResult <- dao.createSalesOrder(order)
} yield daoResult

This all looks nice and neat, and the code is pretty well organized. Readability is enhanced by delegating a lot of the work into separate methods instead of directly in the body of the for comprehension. So what's wrong with an approach like this?

Mixing actors and Futures is a topic that has received much chatter out there on the Internet. A lot of people call it an anti-pattern, and I mostly agree. I think a little Future usage, like how the other actors use the DAO and then pipe the result back to the sender, is okay, but this clearly crosses the line.

One of the biggest concerns with mixing Futures and actors is that you might accidentally close over mutable state and access it in an unsafe way. Once the actor's code hits the first Future callback, you are done processing that message and the actor moves on to the next one in the mailbox. If you close over something that is mutable (sender() being the classic example), you run the risk of accessing it while another thread (the one processing the next message in the mailbox) is also accessing it. This basically eliminates one of the biggest benefits of actors: you get serialized mailbox handling and thus don't have to worry about concurrent modifications to internal state. This particular actor doesn't mishandle mutable state, but that alone doesn't mean it's a good use of actors. So much work is being done outside the context of message handling via the mailbox that, coded as is, it's not really worth using an actor.
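The closing-over hazard can be reproduced without Akka. In this sketch, FakeActor is a hypothetical stand-in for an actor, and its currentSender field plays the role of sender(): it is overwritten each time a new "message" is handled. The names handleUnsafe and handleSafe are invented for illustration.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ClosingOverSketch {
  // Hypothetical stand-in for an actor; not an Akka class
  final class FakeActor {
    // Plays the role of sender(): changes with every message handled
    private var currentSender: String = "nobody"

    // Unsafe: the callback reads the mutable field whenever it eventually runs,
    // by which time a later message may have overwritten it
    def handleUnsafe(from: String, work: Future[Unit]): Future[String] = {
      currentSender = from
      work.map(_ => currentSender)
    }

    // Safe: capture the value in a local val while still handling the message
    def handleSafe(from: String, work: Future[Unit]): Future[String] = {
      currentSender = from
      val replyTo = currentSender
      work.map(_ => replyTo)
    }
  }
}
```

If a message from "alice" starts slow work and a message from "bob" is handled before that work completes, the unsafe callback sees "bob" while the safe one still sees "alice". The same local-val capture is the usual remedy for sender() inside a real actor.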

On top of that, there is a fair amount of use of the ask pattern here (the ? operator). This pattern imposes request/response semantics on what is normally a one-way messaging pattern with tell (!). The Akka team encourages you to limit its use, as it leads to more mixing of Futures with your actors, which can lead to undesired behavior. In addition, a short-lived actor instance is created behind the Future so that the receiver has an actual sender() ActorRef to send a response back to. Creating these short-lived instances is inefficient, especially when you consider how many times it happens while servicing a single CreateOrder request.

We need to clean up this actor so that it doesn't set a precedent for future actors that also have complex workflows. When faced with a rather involved flow like this one, I use an actor-per-request pattern in combination with Akka's finite-state machine (FSM) support. Using FSM, we can design all of the different aspects of the flow as states and then progress through them as we react to the different data we're loading and processing.

An approach like this makes the code more intuitive, as you just need to understand the different states and which triggers move you between them until you eventually reach a termination point. As a side effect, this approach also allows me to get rid of ask and focus completely on tell when communicating with other actors. Having cleaner, more intuitive, and idiomatic actor code is a big win, and we will jump right into this refactor in Chapter 2, Simplifying Concurrent Programming with Actors.
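The idea of modeling the order flow as states and triggers can be sketched as a pure transition function. This is not Akka's FSM API and not the refactor from Chapter 2; every state and input name below is a hypothetical choice made for illustration.

```scala
object OrderFlowSketch {
  // Hypothetical states of a single order-creation request
  sealed trait State
  case object LoadingData extends State // waiting for user and line items
  final case class ChargingCard(total: BigDecimal) extends State
  final case class Persisting(total: BigDecimal, txnId: String) extends State
  final case class Done(orderId: Int) extends State
  final case class Failed(reason: String) extends State

  // Hypothetical inputs; in an actor these would be messages received via tell
  sealed trait Input
  final case class DataLoaded(lineCosts: Seq[BigDecimal]) extends Input
  final case class CardCharged(txnId: String) extends Input
  final case class OrderStored(orderId: Int) extends Input
  final case class StepFailed(reason: String) extends Input

  // One place that lists every legal move between states
  def next(state: State, input: Input): State = (state, input) match {
    case (LoadingData, DataLoaded(costs))        => ChargingCard(costs.sum)
    case (ChargingCard(total), CardCharged(txn)) => Persisting(total, txn)
    case (Persisting(_, _), OrderStored(id))     => Done(id)
    case (_: Done | _: Failed, _)                => state // terminal states ignore input
    case (_, StepFailed(reason))                 => Failed(reason)
    case (other, unexpected)                     => Failed(s"unexpected $unexpected in $other")
  }
}
```

Because each step reacts to an incoming message rather than awaiting a reply, a flow written this way has no need for ask.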

Replacing our HTTP libraries

Within the example app, we have the need for both inbound and outbound HTTP handling. These two needs are met by Unfiltered and Dispatch, respectively. These two sister libraries have served me well in my Scala development projects. Unfortunately, neither is as actively maintained as we would like, and that can be a problem moving forward. For instance, if you suddenly needed a new feature within this app, such as HTTP/2 support, you might be stuck waiting a while to get it. If you are going all in on using a third-party library, it's always a good practice to use one that is very actively maintained and has a lot of people using it. This means that things such as bugs will be fixed early and often, and there is also a lot of community information out there to help you if and when you get stuck.

Fortunately, starting with release version 2.4.2, Akka now includes full support for both inbound and outbound non-blocking HTTP handling. The HTTP support is based on the excellent spray library and is fully integrated with Akka's Reactive Streams project. These modules (Akka Streams and Akka HTTP) had been available separately before, with their own versioning scheme, but now they are available and versioned along with the other core Akka projects. Also, before being folded into the core Akka repo, these modules had been tagged as experimental. Now these modules are no longer tagged as experimental, with akka-http being the only exception as of the writing of this book.

Being an Akka library, and thus built on top of actors, Akka HTTP will be a much better fit with our current use of actors than either the Unfiltered or Dispatch libraries were. We can eliminate some extra thread pools that Unfiltered and Dispatch were using, using the actor system's dispatcher(s) instead, which should lead to better use of our CPU (fewer total threads to deal with).

As with all Akka libraries, the HTTP module is very actively maintained, and there is great community support out there. As we are all in on Akka already with this app, getting rid of two third-party libraries and replacing them with something from Akka can also be considered a big win. Depending on too many third-party libraries from many different sources can be an impediment to upgrading Scala (due to binary incompatibility) when that need arises, and that's not a boat we want to be in.

All in all, this seems like a great decision and will be part of our ongoing refactor process in the upcoming chapters.
