Book Image

Mastering Blockchain - Fourth Edition

By : Imran Bashir
5 (3)
Book Image

Mastering Blockchain - Fourth Edition

5 (3)
By: Imran Bashir

Overview of this book

Blockchain is the backbone of cryptocurrencies, it has had a massive impact in many sectors, including finance, supply chains, healthcare, government, and media. It’s also being used for cutting edge technologies such as AI and IoT. This new edition is thoroughly revised to offer a practical approach to using Ethereum, Hyperledger, Fabric, and Corda with step-by-step tutorials and real-world use-cases to help you understand everything you need to know about blockchain development and implementation. With new chapters on Decentralized Finance and solving privacy, identity, and security issues, as well as bonus online content exploring alternative blockchains, this is an unmissable read for everyone who wants to gain a deep understanding of blockchain. The book doesn’t shy away from advanced topics and practical expertise, such as decentralized application (DApp) development using smart contracts and oracles, and emerging trends in the blockchain space. Throughout the book, you’ll explore blockchain solutions beyond cryptocurrencies, such as the IoT with blockchain, enterprise blockchains, and tokenization, and gain insight into the future scope of this fascinating and disruptive technology. By the end of this blockchain book, you will have gained a thorough comprehension of the various facets of blockchain and understand the potential of this technology in diverse real-world scenarios.
Table of Contents (24 chapters)
23
Index

Distributed systems

Understanding distributed systems is essential to our understanding of blockchain, as blockchain is a distributed system at its core. It is a distributed ledger that can be centralized or decentralized. A blockchain is originally intended to be and is usually used as a decentralized platform. It can be thought of as a system that has properties of both decentralized and distributed paradigms. It is a decentralized-distributed system.

A distributed system is a computing paradigm whereby two or more nodes work with each other in a coordinated fashion to achieve a common outcome. It is modeled in such a way that end users see it as a single logical platform. For example, Google’s search engine is based on a large distributed system; however, to a user, it looks like a single, coherent platform. It is composed of processes (nodes) and channels (communication channels) where nodes communicate by passing messages. A blockchain is a message-passing distributed system.

A node is an individual player (a computer) in a distributed system. All nodes can send and receive messages to and from each other. Nodes can be honest, faulty, or malicious, and they have memory and a processor. A node that exhibits arbitrary behavior is known as a Byzantine node after the Byzantine Generals problem.

The Byzantine Generals problem

In 1982, a thought experiment was proposed by Lamport et al. in their research paper, The Byzantine Generals Problem, which is available here: https://www.microsoft.com/en-us/research/publication/byzantine-generals-problem/

In this problem, a group of army generals who lead different parts of the Byzantine army is planning to attack or retreat from a city. The only way of communicating with them is via a messenger. They need to agree to strike at the same time to win. The issue is that one or more generals might be traitors who could send a misleading message. Moreover, the messenger could be captured by the city, resulting in no message delivery. Therefore, there is a need for a viable mechanism that allows agreement among the generals, even in the presence of the treacherous ones, and message loss, so that the attack can still take place at the same time. As an analogy for distributed systems, the generals can be considered as honest nodes, the traitors as Byzantine nodes (that is, nodes with arbitrary behavior), the messenger can be thought of as a channel of communication with the generals, and a captured messenger as a delayed or lost message. Several solutions were presented to this problem in the paper by Lamport et al. in 1982.

This type of inconsistent behavior of Byzantine nodes can be intentionally malicious, which is detrimental to the operation of the network. Any unexpected behavior by a node on the network, whether malicious or not, can be categorized as Byzantine.

A small-scale example of a distributed system is shown in the following diagram. This distributed system has six nodes, of which one (N4) is a Byzantine node, leading to possible data inconsistency. L2 is a link that is broken or slow, and this can lead to a partition in the network:

Diagram  Description automatically generated

Figure 1.1: Design of a distributed system: N4 is a Byzantine node and L2 is broken or a slow network link

Two key challenges of a distributed system design are the coordination between nodes and fault tolerance. Even if some (a certain threshold dictated by the consensus protocol) of the nodes become faulty or network links break, the distributed system should be able to tolerate this and continue to work to achieve the desired result. This problem has been an active area of distributed system design research for many years, and several algorithms and mechanisms have been proposed to overcome these issues.

Distributed systems are challenging to design. It has been proven that a distributed system cannot have all three of the much-desired properties of consistency, availability, and partition tolerance simultaneously. This principle is known as the CAP theorem.

CAP theorem

The CAP theorem, also known as Brewer’s theorem, was introduced by Eric Brewer in 1998 as a conjecture. In 2002, it was proven as a theorem by Seth Gilbert and Nancy Lynch. The theorem states that any distributed system cannot have consistency, availability, and partition tolerance simultaneously:

  • Consistency is a property that ensures that all nodes in a distributed system have a single, current, and identical copy of the data. Consistency is achieved using consensus algorithms to ensure that all nodes have the same copy of the data. This is also called state machine replication (SMR). The blockchain is a means of achieving state machine replication.
  • Availability means that the nodes in the system are up, accessible, and are accepting incoming requests and responding with data without any failures as and when required. In other words, data is available at each node and the nodes are responding to requests.
  • Partition tolerance ensures that if a group of nodes is unable to communicate with other nodes due to network failures, the distributed system continues to operate correctly. This can occur due to network and node failures.

A Venn diagram is commonly used to visualize the CAP theorem, as shown below:

Diagram, venn diagram  Description automatically generated

Figure 1.2: CAP theorem

The preceding diagram depicts that only two properties at a time can be attained; either AP, CA, or CP.

In summary:

  • If we opt for CP (consistency and partition tolerance), we sacrifice availability.
  • If we opt for AP (availability and partition tolerance), we sacrifice consistency.
  • If we opt for AC (availability and consistency), we sacrifice partition tolerance.

Note that AC does not really exist. The CAP theorem in practice means that in the case of a network partition, a distributed system is either available or consistent. As network partitions cannot be ignored, the choice is between consistency or availability when a network partition occurs.

We can explain this concept with the following example.

Let’s imagine that there is a distributed system with two nodes. Now, let’s apply the three theorem properties to this smallest of possible distributed systems with only two nodes:

  • Consistency is achieved if both nodes have the same shared state; that is, they have the same up-to-date copy of the data.
  • Availability is achieved if both nodes are up and running and responding with the latest copy of data.
  • Partition tolerance is achieved if, despite communication failure or delay between nodes, the network (distributed system) continues to operate.

Now think of a scenario where a partition occurs and nodes can no longer communicate with each other. If newly updated data comes in now, it can only be updated on one node. If the node accepts the update, then only one node in the network is updated, and consistency is lost. If the node rejects the update, that will result in a loss of availability. This means that either availability or consistency is unachievable due to the network partition. This is strange because somehow, blockchain manages to achieve all these properties, violating the theorem (especially in its most successful implementation, Bitcoin)—or does it?

In blockchains, consistency is sacrificed in favor of availability and partition tolerance. In this scenario, consistency (C) in the blockchain is not achieved simultaneously with partition tolerance (P) and availability (A), but it is achieved over time. This is called eventual consistency, where consistency is achieved due to validation from multiple nodes over time. There can be a temporary disagreement between nodes on the final state, but it is eventually agreed upon. For example, in Bitcoin, multiple transaction confirmations are required to achieve a good confidence level that transactions may not be rolled back in the future. Eventually, a consistent view of transaction history is available in all nodes. Multiple confirmations of a transaction over time provide eventual consistency in Bitcoin. For this purpose, the process of mining was introduced in Bitcoin. Mining is a process that facilitates the achievement of consensus by using the Proof of Work (PoW) algorithm. At a higher level, mining can be defined as a process that’s used to add more blocks to the blockchain. We will cover more on this later in Chapter 6, Bitcoin Architecture.

PACELC theorem

An extension of the CAP theorem called PACELC was first proposed by Daniel J. Abadi from Yale University. It states that, in addition to the three properties proposed by CAP, there are also tradeoffs between latency and consistency. It states that tradeoffs between consistency and latency always exist in replicated systems, whereas CAP is only applicable when there are network partitions. In other words, it means that even if no network partitions occur, under normal operation the tradeoff between consistency and latency exists. For example, some databases may choose to give up consistency for lower latency, and some databases could pay the availability and latency costs to achieve consistency. This is true for replicated systems and presents a more inclusive picture of consistency tradeoffs in distributed systems.

PACELC was formally proved in a paper available here: https://dl.acm.org/doi/10.1145/3197406.3197420

With a better understanding of distributed systems, let’s now talk about blockchain itself. First, we’ll begin with a brief rundown of the history of blockchain and Bitcoin.