Book Image

Oracle Coherence 3.5

By : Aleksandar Seovic
Book Image

Oracle Coherence 3.5

By: Aleksandar Seovic

Overview of this book

Scalability, performance, and reliability have to be designed into an application from the very beginning, as there may be substantial cost or implementation consequences if they need to be added down the line. This indispensible book will teach you how to achieve these things using Oracle Coherence, a leading data grid product on the market.Authored by leading Oracle Coherence authorities, this essential book will teach you how to use Oracle Coherence to build high-performance applications that scale to hundreds of machines and have no single points of failure. You will learn when and how to use Coherence features such as distributed caching, parallel processing, and real-time events within your application, and understand how Coherence fits into the overall application architecture. Oracle Coherence provides a solid architectural foundation for scalable, high-performance and highly available enterprise applications, through features such as distributed caching, parallel processing, distributed queries and aggregations, real-time events, and the elimination of single points of failure.However, in order to take full advantage of these features, you need to design your application for Coherence from the beginning. Based on the authors' extensive knowledge of Oracle Coherence, and how to use it in the real world, this book will provide you with all the information you need in order to leverage various Coherence features properly. It contains a collection of best practice-based solutions and mini-frameworks that will allow you to be more productive from the very beginning.The early chapters cover basics like installation guidelines and caching topologies, before moving on to the domain model implementation guidelines, distributed queries and aggregations, parallel processing, and real-time events. Towards the end, you learn how to integrate Coherence with different persistence technologies, how to access Coherence from platforms other than Java, and how to test and debug classes and applications that depend on Coherence.
Table of Contents (22 chapters)
Oracle Coherence 3.5
Credits
Foreword
About the author
Acknowledgements
About the co-authors
About the reviewers
Preface
12
The Right Tool for the Job
Index

Preface

As an architect of a large, mission-critical website or enterprise application, you need to address at least three major non-functional requirements: performance, scalability, and availability.

Performance is defined as the amount of time that an operation takes to complete. In a web application, it is usually measured as "time to last byte" (TTLB)—the amount of time elapsed from the moment the web server received a request, until the moment the last byte of response has been sent back to the client. Performance is extremely important, because experience has shown us that no matter how great and full-featured an application is, if it is slow and unresponsive, the users will hate it.

Scalability is the ability of the system to maintain acceptable performance as the load increases, or to support additional load by adding hardware resources. While it is relatively simple to make an application perform well in a single-user environment, it is significantly more difficult to maintain that level of performance as the number of simultaneous users increases to thousands, or in the case of very large public websites, to tens or even hundreds of thousands. The bottom line is, if your application doesn't scale well, its performance will degrade as the load increases and the users will hate it.

Finally, availability is measured as the percentage of time an application is available to the users. While some applications can crash several times a day without causing major inconvenience to the user, most mission-critical applications simply cannot afford that luxury and need to be available 24 hours a day, every day. If your application is mission critical, you need to ensure that it is highly available or the users will hate it. To make things even worse, if you build an e-commerce website that crashes during the holiday season, your investors will hate you as well.

The moral of the story is that in order to keep your users happy and avoid all that hatred, you as an architect need to ensure that your application is fast, remains fast even under heavy load, and stays up and running even when the hardware or software components that it depends on fail. Unfortunately, while it is relatively easy to satisfy any one of these three requirements individually and not too difficult to comply with any two of them, it is considerably more difficult to fulfill all three at the same time.

Introducing Oracle Coherence

Over the last few years, In-Memory Data Grids have become an increasingly popular way to solve many of the problems related to performance and scalability, while improving availability of the system at the same time.

Oracle Coherence is an In-Memory Data Grid that allows you to eliminate single points of failure and single points of bottleneck in your application by distributing your application's objects and related processing across multiple physical servers.

There are several important points in the definition above:

  • Coherence manages application objects, which are ready for use within the application. This eliminates the need for repeated, and often expensive, loading and transformation of the raw data into objects.

  • Coherence distributes application objects across many physical servers while ensuring that a coherent, Single System Image (SSI) is presented to the application.

  • Coherence ensures that no data or in-flight operations are lost by assuming that any node could fail at any time and by ensuring that every piece of information is stored in multiple places.

  • Coherence stores data in memory in order to achieve very high performance and low latency for data access.

  • Coherence allows you to distribute not only application objects, but also the processing that should be performed on these objects. This can help you eliminate single points of bottleneck.

The following sections provide a high-level overview of Coherence features; the remainder of the book will teach you "how", and more importantly, "when" to use them.

Distributed caching

One of the easiest ways to improve application performance is to bring data closer to the application, and keep it in a format that the application can consume more easily.

Most enterprise applications are written in one of the object-oriented languages, such as Java or C#, while most data is stored in relational databases, such as Oracle, MySql or SQL Server. This means that in order to use the data, the application needs to load it from the database and convert it into objects. Because of the impedance mismatch between tabular data in the database and objects in memory, this conversion process is not always simple and introduces some overhead, even when sophisticated O-R mapping tools, such as Hibernate or EclipseLink are used.

Caching objects in the application tier minimizes this performance overhead by avoiding unnecessary trips to the database and data conversion. This is why all production-quality O-R mapping tools cache objects internally and short-circuit object lookups by returning cached instances instead, whenever possible.

However, when you scale out your application across multiple servers, you will start running into cache synchronization issues. Each server will cache its own copy of the data, and will have no way of knowing if that same data has been changed on another server—in this case, the locally cached copy should be invalidated and evicted from the cache.

Oracle Coherence solves this problem by allowing you to distribute your cache across a cluster of machines, while providing a unified, fully coherent view of the data. This means that you can configure Coherence as an L2 cache for Hibernate or EclipseLink, and forget about distributed cache synchronization!

If this was all Coherence did, it would be impressive enough. However, it actually does so much more that I don't recommend using it purely as an L2 cache, unless you have an existing application that you need to scale out. While Coherence works like a charm as an L2 cache behind an O-R mapper, this architecture barely scratches the surface of what Coherence can do. It is like "killing an ox for a pound of meat", as the Serbian proverb says.

It is much more powerful to use Coherence as a logical persistence layer of your application, which sits between the application logic and the physical data store. Whenever the application needs data, it asks Coherence for it. If the data is not already in the cache, Coherence will transparently load it from the data store, cache it, and return it to the application. Similarly, when the application needs to store data, it simply puts objects into the cache, and Coherence updates the underlying data store automatically.

This architecture is depicted in the following diagram and is the basis for the architecture we will use throughout the book:

Although Coherence is not really a persistent store in the preceding scenario, the fact that the application thinks that it is decouples the application from the data store and enables you to achieve very high scalability and availability. You can even configure Coherence so the application will be isolated from a complete data store failure.

Distributed queries

Having all the data in the world is meaningless unless there is a way to find the information you need, when you need it. One of the many advantages of In-Memory Data Grids over clustered caches, such as Memcached, is the ability to find data not just by the primary key, but also by executing queries and aggregations against the cache.

Coherence is no exception—it allows you to execute queries and aggregations in parallel, across all the nodes in the cluster. This allows for the efficient processing of large data sets within the grid and enables you to improve aggregation and query performance by simply adding more nodes to the cluster.

In-place and parallel processing

In many situations, you can improve performance enormously if you perform the processing where the data is stored, instead of retrieving the data that needs to be processed. For example, while working with a relational database, you can use bulk update or a stored procedure to update many records without moving any data across the network.

Coherence allows you to achieve the same thing. Instead of retrieving the whole dataset that needs to be processed and iterating over it on a single machine, you can create an entry processor—a class that encapsulates the logic you want to execute for each object in a target dataset. You can then submit an instance of the processor into the cluster, and it will be executed locally on each node. By doing so, you eliminate the need to move a large amount of data across the network. The entry processor itself is typically very small and allows processing to occur in parallel.

The performance benefit of this approach is tremendous. Entry processors, just like distributed queries, execute in parallel across grid nodes. This allows you to improve performance by simply spreading your data across more nodes.

Coherence also provides a grid-enabled implementation of CommonJ Work Manager, which is the basis for JSR-237. This allows you to submit a collection of work items that Coherence will execute "in parallel" across the grid. Again, the more nodes you have in the grid, the more work items can be executed in parallel, thereby improving the overall performance.

Cache events

In many applications, it is useful to know when a particular piece of data changes. For example, you might need to update a stock price on the screen as it changes, or alert the user if a new workflow task is assigned to them.

The easiest and the most common solution is to periodically poll the server to see if the information on the client needs to be updated. This is essentially what Outlook does when it checks for new e-mail on the POP3 mail server, and you (the user) control how often the polling should happen.

The problem with polling is that the more frequently it occurs, the more load it puts on the server, decreasing its scalability, even if there is no new information to be retrieved.

On the other hand, if the server knows which information you are interested in, it can push that information to you. This is how Outlook works with Exchange Server—when the new mail arrives, the Exchange Server notifies Outlook about this event, and Outlook displays the new message in your inbox.

Coherence allows you to register interest in a specific cache, a specific item, or even a specific subset of the data within the cache using a query. You can specify if you are interested in cache insertions, updates or deletions only, as well as whether you would like to receive the old and the new cache value with the event.

As the events occur in the cluster, your application is notified and can take the appropriate action, without the need to poll the server.

Coherence within the Oracle ecosystem

If you look at Oracle marketing material, you will find out that Coherence is a member of the Oracle Fusion Middleware product suite. However, if you dig a bit deeper, you will find out that it is not just another product in the suite, but a foundation for some of the high-profile initiatives that have been announced by Oracle, such as Oracle WebLogic Application Grid and Complex Event Processing.

Coherence is also the underpinning of the "SOA grid"—a next-generation SOA platform that David Chappell, vice president and chief technologist for SOA at Oracle, wrote about for The SOA Magazine [SOAGrid1&2].

I believe that over the next few years, we will see Coherence being used more and more as an enabling technology within various Oracle products, because it provides an excellent foundation for fast, scalable, and highly-available solutions.

Coherence usage scenarios

There are many possible uses for Coherence, some more conventional than the others.

It is commonly used as a mechanism to off-load expensive, difficult-to-scale backend systems, such as databases and mainframes. By fronting these systems with Coherence, you can significantly improve performance and reduce the cost of data access.

Another common usage scenario is eXtreme Transaction Processing (XTP). Because of the way Coherence partitions data across the cluster, you can easily achieve throughput of several thousand transactions per second. What's even better is that you can scale the system to support an increasing load by simply adding new nodes to the cluster.

As it stores all the data in memory and allows you to process it in-place and in parallel, Coherence can also be used as a computational grid. In one such application, a customer was able to reduce the time it took to perform risk calculation from eighteen hours to twenty minutes.

Coherence is also a great integration platform. It allows you to load data from multiple data sources (including databases, mainframes, web services, ERP, CRM, DMS, or any other enterprise system), providing a uniform data access interface to client applications at the same time.

Finally, it is an excellent foundation for applications using the Event Driven Architecture, and can be easily integrated with messaging, ESB, and Complex Event Processing (CEP) systems.

That said, for the remainder of the book I will use the "conventional" web application architecture described earlier, to illustrate Coherence features—primarily because most developers are already familiar with it and also because it will make the text much easier to follow.

Oracle Coherence editions

Coherence has three different editions—Standard, Enterprise, and Grid Editions. As is usually the case, each of these editions has a different price point and feature set, so you should evaluate your needs carefully before buying.

The Coherence client also has two different editions—Data Client and Real-Time Client. However, for the most part, the client edition is determined by the server license you purchase—Standard and Enterprise Edition give you a Data Client license, whereas the Grid Edition gives you a Real-Time Client license.

A high-level overview of edition differences can be found at http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_editions.html, but you are likely to find the following documents available in the Coherence Knowledge Base much more useful:

Throughout the book, I will assume that you are using the Grid Edition and Real-Time Client Edition, which provide access to all Coherence features.

The important thing to note is that when you go to Oracle's website to download Coherence for evaluation, you will find only one download package for Java, one for .NET, and one for each supported C++ platform. This is because all the editions are included into the same binary distribution; choosing the edition you want to use is simply a matter of obtaining the appropriate license and specifying the edition in the configuration file.

By default, Grid Edition features are enabled on the server and Real-Time Client features on the client, which is exactly what you will need in order to run the examples given in the book.

What this book covers

Chapter 1, Achieving Performance, Scalability, and Availability Objectives discusses obstacles to scalability, performance, and availability and also some common approaches that are used to overcome these obstacles. It also talks about how these solutions can be improved using Coherence.

Chapter 2, Getting Started teaches you how set up Coherence correctly in a development environment, and the basics of how to access Coherence caches, both by using the supplied command-line client and programmatically.

Chapter 3, Planning Your Caches covers various cache topologies supported by Coherence and provides guidance on when to use each one and how to configure them.

Chapter 4, Implementing Domain Objectsintroduces the sample application we will be building throughout the book and shows you how to design your domain objects to take full advantage of Coherence.

Chapter 5, Querying the Data Grid teaches you how to use Coherence queries and aggregators to retrieve data from the cache in parallel.

Chapter 6, Parallel and In-Place Processing covers Coherence features that allow you to perform in-place or parallel processing within a data grid.

Chapter 7, Processing Data Grid Events shows you how to use powerful event mechanisms provided by Coherence.

Chapter 8, Implementing Persistence Layer discusses options for integration with various data repositories, including relational databases.

Chapter 9, Bridging Platform and Network Boundaries covers the Coherence*Extend protocol, which allows you to access a Coherence cluster from remote clients and from platforms and languages other than Java, such as .NET and C++.

Chapters 10, Accessing Coherence from .NET and Chapter 11, Accessing Coherence from C++ teach you how to access Coherence from .NET and C++ clients, respectively.

Chapter 12, The Right Tool for the Job, provides some parting thoughts and reiterates practices you should apply when building scalable applications.

Appendix, Coherent Bank Sample Application, describes how to set up the sample application that accompanies the book in your environment.

The main goal of this book is to provide the missing information that puts various Coherence features into context and teaches you when to use them. As such, it does not cover every nook and cranny Coherence has to offer, and you are encouraged to refer to the Coherence product documentation [CohDoc] for details.

On the other hand, real-world applications are not developed using a single technology, no matter how powerful that one technology is. While the main focus of the book is Coherence, it will also discuss how Coherence fits into the overall application architecture, and show you how to integrate Coherence with some popular open source frameworks and tools.

You are encouraged to read this book in the order it was written, as the material in each chapter builds on the topics that were previously discussed.

What you need for this book

In addition to some spare time, an open mind, and a desire to learn, you will need to have Java SDK 1.5 or higher in order to run Coherence and the examples given in this book. While Coherence itself will run just fine on Java 1.4, the examples use some features that are only available in Java 1.5 or higher, such as enums and generics.

To run .NET examples from Chapter 10, you will need .NET Framework 3.5 and Visual Studio 2008. Although you can access Coherence using .NET Framework 1.1 and higher, the examples use features such as generics and Windows Presentation Foundation, which are only available in the more recent releases of the .NET Framework.

Finally, to run the C++ examples from Chapter 11, you need an appropriate version of the C++ compiler and related tools depending on your platform (for details check http://download.oracle.com/docs/cd/E14526_01/coh.350/e14513/cpprequire.htm#BABDCDFG), a fast machine to compile and link examples on, and a lot of patience!

Who this book is for

The primary audience for this book is experienced architects and developers who are interested in, or responsible for, the design and implementation of scalable, high-performance systems using Oracle Coherence.

However, Coherence has features that make it useful even in smaller applications, such as applications based on Event Driven Architecture, or Service Oriented Applications that would benefit from the high-performance, platform-independent binary protocol built into Coherence.

Finally, this book should be an interesting read for anyone who wants to learn more about the implementation of scalable systems in general, and how Oracle Coherence can be used to remove much of the pain associated with the endeavor.

Who this book is not for

This book is not for a beginner looking forward to learning how to write computer software. While I will try to introduce the concepts in a logical order and provide background information where necessary, for the most part I will assume that you, the reader, are an experienced software development professional with a solid knowledge of object-oriented design, Java, and XML.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "As a matter of fact, such a class already exists within coherence.jar, and is called AbstractEvolvable".

A block of code is set as follows:

public interface QueryMap extends Map {
  Set keySet(Filter filter);
  Set entrySet(Filter filter);
  Set entrySet(Filter filter, Comparator comparator);
  ...
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Filter filter = new BetweenFilter(
                     new PropertyExtractor("time"),
                     from, to);

Any command-line input or output is written as follows:

$ . bin/multicast-test.sh –ttl 0

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking the OK button finishes the installation".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or e-mail to .

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book on, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Tip

Downloading the example code for the book

Visit http://www.packtpub.com/files/code/6125_Code.zip to directly download the example code.

The downloadable files contain instructions on how to use them.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the let us know link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.