Book Image

Persistence Best Practices for Java Applications

By : Otavio Santana, Karina Varela
Book Image

Persistence Best Practices for Java Applications

By: Otavio Santana, Karina Varela

Overview of this book

Having a solid software architecture breathes life into tech solutions. In the early stages of an application’s development, critical decisions need to be made, such as whether to go for microservices, a monolithic architecture, the event-driven approach, or containerization. In Java contexts, frameworks and runtimes also need to be defi ned. But one aspect is often overlooked – the persistence layer – which plays a vital role similar to that of data stores in modern cloud-native solutions. To optimize applications and data stores, a holistic understanding of best practices, technologies, and existing approaches is crucial. This book presents well-established patterns and standards that can be used in Java solutions, with valuable insights into the pros and cons of trending technologies and frameworks used in cloud-native microservices, alongside good Java coding practices. As you progress, you’ll confront the challenges of cloud adoption head-on, particularly those tied to the growing need for cost reduction through stack modernization. Within these pages, you’ll discover application modernization strategies and learn how enterprise data integration patterns and event-driven architectures enable smooth modernization processes with low-to-zero impact on the existing legacy stack.
Table of Contents (18 chapters)
1
Part 1: Persistence in Cloud Computing – Storing and Managing Data in Modern Software Architecture
6
Part 2: Jakarta EE, MicroProfile, Modern Persistence Technologies, and Their Trade-Offs
9
Chapter 7: The Missing Guide for jOOQ Adoption
11
Part 3: Architectural Perspective over Persistence

Exploring the trade-offs of distributed database systems – a look into the CAP theorem and beyond

If the perfect Distributed Database System (DDBS) were to be described, it would certainly be a database that was highly scalable, provided perfectly consistent data, and didn’t require too much attention in regard to management (tasks such as backup, migrations, and managing the network). Unfortunately, the CAP theorem, formulated by Eric Brewer, states that that’s not possible.

Note

To date, there is no database solution that can provide the ideal combination of features such as total data consistency, high availability, and scalability all together.

For details, check: Towards robust distributed systems. PODC. 7. 10.1145/343477.343502 (https://www.researchgate.net/publication/221343719_Towards_robust_distributed_systems).

The CAP theorem is a way of understanding the trade-offs between different properties of a DDBS. Eric Brewer, at the 2000 Symposium on Principles of Distributed Computing (PODC), conjectured that when creating a DDBS, “you can have at most two of these properties for any shared-data system,” referring to the properties consistency, availability, and tolerance to network partitions.

Figure 1.2 – Representation inspired by Eric Brewer’s keynote presentation

Figure 1.2 – Representation inspired by Eric Brewer’s keynote presentation

Note

Towards Robust Distributed Systems. For more information on Eric Brewer’s work, refer to Brewer, Eric. (2000), presentation: https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.

The three characteristics described in the CAP theorem can be described as follows:

  • Consistency: The guarantee that every node in a distributed cluster returns the same, most recent, successful write.
  • Availability: Every non-failing node returns a response for all read and write requests in a reasonable amount of time.
  • Partition tolerance: The system continues to function and uphold its consistency guarantees despite network partitions. In other words, the service is running despite crashes, disk failures, database, software, and OS upgrades, power outages, and other factors.

In other words, the DDBSes we can pick and choose from would only be CA (consistent and highly available), CP (consistent and partition-tolerant), or AP (highly available and partition-tolerant).

Tip

As stressed in the book Fundamentals of Software Architecture: An Engineering Approach, good software architecture requires dealing with trade-offs. This is yet another trade-off to take into consideration (https://www.amazon.com/Fundamentals-Software-Architecture-Engineering-Approach-ebook/dp/B0849MPK73/).

By considering the CAP theorem, we can then apply this new knowledge to back us up in decision-making processes in regard to choosing between SQL and NoSQL. For example, traditional DBMSes thrive when (mostly) providing the Atomicity, Consistency, Isolation, and Durability (ACID) properties; however, in regard to distributed systems, it may be necessary to give up consistency and isolation in order to achieve higher availability and better performance. This is commonly known as sacrificing consistency for availability.

Almost 12 years after the idea of CAP was proposed, Seth Gilbert and Nancy Lynch at MIT published some research, a formal proof of Brewer’s conjecture. However, another expert on database system architecture and implementation has also done some research on scalable and distributed systems, adding, to the existing theorem, the consideration of the consistency and latency trade-off.

In 2012, Prof. Daniel Abadi published a study stating CAP has become “increasingly misunderstood and misapplied, causing significant harm” leading to unnecessarily limited Distributed Database Management System (DDBMS) creation, as CAP only presents limitations in the face of certain types of failures – not during normal operations.

Abadi’s paper Consistency Tradeoffs in Modern Distributed Database System Design proposes a new formulation, Performance and Consistency Elasticity Capabilities (PACELC), which argues that the trade-offs between consistency and performance can be managed through the use of elasticity. The following question quoted in the paper clarifies the main idea: “If there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?”

According to Abadi, a distributed database could be both highly consistent and highly performant, but only under certain conditions – only when the system can adjust its consistency level based on network conditions through the use of elasticity.

At this point, the intricacies of building database systems, particularly distributed ones, have been made crystal clear. As professionals tasked with evaluating and selecting DDBSes and designing solutions on top of them, having a fundamental understanding of the concepts discussed in these studies serves as a valuable foundation for informed decision-making.