
Blockchain Quick Start Guide

By: Xun (Brian) Wu, Weimin Sun

Overview of this book

Blockchain is a technology that powers the development of decentralized applications. This technology allows the construction of a network with no single point of control, enabling participants to contribute to and benefit from the network directly. This book will give you a thorough overview of blockchain and explain how a blockchain works. You will begin by going through various blockchain consensus mechanisms and cryptographic hash functions. You will then learn the fundamentals of programming in Solidity, the de facto language for developing decentralized applications in Ethereum. After that, you will set up an Ethereum development environment and develop, package, build, and test campaign-decentralized applications. The book also shows you how to set up Hyperledger Composer tools, analyze business scenarios, design business models, and write chaincode. Finally, you will get a glimpse of how blockchain is actually used in different real-world domains. By the end of this guide, you will be comfortable working with basic blockchain frameworks and developing secure, decentralized applications in a hassle-free manner.

The genealogy analogy


One of the authors recently attended a Chinese university alma mater reunion event in Beijing, where blockchain became a hot discussion topic. A very well-regarded schoolmate and scholar, Professor Yang, who has authored books on cryptography and public data safeguards, used genealogy to describe a blockchain. This is a well-thought-out analogy since it explains blockchain intuitively and easily. The analogy is borrowed here to illustrate the basic ideas behind the technology. 

Back in the old days in China, it was a custom for each family of a clan (sharing the same last name) to keep a copy of the genealogical tree of the clan. When the membership of a family changed through marriage, the birth of a child, or adoption, the new member's name would appear in each copy. However, the new member had to be accepted by the clan before the name could be added. There were cases when a marriage was not endorsed by a majority of the clan for various reasons. In such a case, the new member's name would not be entered into the genealogy. In other words, when a new member joined a family, the news was broadcast to the other families of the clan. If the clan reached a consensus on accepting the new member, each family would update its copy of the genealogical tree to reflect the change. On the other hand, if the clan decided not to accept the new member, the name would not be added. The genealogy could also be used for verification purposes. For example, if a stranger claimed to be a member of the clan, or two people with the same last name were eager to find out whether they shared the same ancestor, the genealogy made this easy to verify. The outcome would be accepted since the genealogy was considered reliable thanks to the aforementioned consensus and decentralized records, which were difficult to manipulate unless the majority of families agreed.

A blockchain shares many of the characteristics of a genealogy. They are summarized as follows:

  • Like a clan consisting of many related families, a blockchain network consists of nodes. Each node is like a family.
  • Like every family keeping a copy of the clan's genealogy, each node of a blockchain maintains a copy of all transactions that have occurred on the chain, starting from the very beginning. The collection of all transactions is a ledger. This makes a blockchain a decentralized data repository.
  • A genealogy starts with a common ancestor of the clan and names with direct relationships, such as parents and children, that are connected by a line for linkage. Similarly, a ledger consists of blocks. Each block contains one or multiple transactions depending on the type of blockchain. (As you will see later, blocks on Bitcoin or Ethereum host multiple transactions, while R3's Corda uses a block with only one transaction). Transactions are like names, and a block is similar to the invisible box containing a couple's names. An equivalent of the root ancestor is called the genesis block, which is the first block of a blockchain. Similar to a line linking parents and children, a hash, which will later be explained in more detail, points from the current block to its ancestor block. 
  • Like the consensus mechanism for adding new names to a genealogy, the Bitcoin blockchain uses a mechanism called Proof-of-Work to decide whether a block can be added to the chain. Like a genealogy entry, once a block is added to the chain, it is difficult to change (hack) unless one controls the majority of the network's computing power; exploiting such a majority is called a 51% attack.
  • Genealogy provides transparency in a clan's history. Similarly, a blockchain allows a user to query the whole ledger or just a part of the ledger and find out about coin movements.
  • Since every family kept a copy of the genealogy, it was unlikely to lose the genealogy even if many copies were lost due to a natural disaster, a war, or other reasons. As long as at least one family survived, the genealogy survived. Similarly, a decentralized ledger will survive as long as at least one node survives.

While genealogy is a good analogy to explain some key concepts of a blockchain, the two are not the same. Inevitably, there are features they do not share. For example, a blockchain uses cryptography and hashes extensively for data protection and deterring hackers. A genealogy does not have such a need. Therefore, next we move away from the genealogy analogy and explain key blockchain concepts chronologically.

Bitcoin

Blockchain technology initially caught people's attention due to the Bitcoin blockchain, an idea outlined by a white paper authored by Satoshi Nakamoto and published in October 2008 on the cryptography mailing list at metzdowd.com. It describes the Bitcoin digital currency (BTC) and was titled Bitcoin: A Peer-to-Peer Electronic Cash System. In January 2009, Satoshi Nakamoto released the first Bitcoin software, which launched the network and the first units of the Bitcoin cryptocurrency: BTC coins.

 

Why Bitcoin

Bitcoin was created right after the 2008 financial crisis, the most severe economic crisis since the Great Depression. This is not coincidental. The inventor of the Bitcoin cryptocurrency aimed to address people's disillusionment with financial institutions, whose epic failures in risk control resulted in the 2008 financial crisis.

A fundamental role played by financial institutions is to be an intermediary entity and bring untrusting parties together to facilitate transactions. For example, a retail bank attracts residual money from individuals and lends to individuals or companies that need the money. The difference in interest paid to the money suppliers and borrowers is the fee a bank charges for providing the intermediary service. Financial institutions are very successful in providing these services and play a pivotal role in powering economies worldwide. However, there are many deficiencies associated with this business model. Here are some examples:

  • Slow: It often takes days to complete a financial transaction. For instance, it takes three days (after an order is initially entered) to complete and settle a cross-border money transfer. To make it happen, multiple departments and application systems within an institution and across institutions have to work together to facilitate the transaction. Another example is stock trading. An investor hires a broker to enter an order to be routed to a stock exchange. Here, the broker is either a member of the exchange or routes the order to another intermediary institution with membership. After a match is found between a buyer and a seller at the exchange, the transaction details are recorded by two parties who send it to their back offices respectively. The back-office teams work with a clearing house for clearance and settlement. It takes T + 3 for both parties to complete the action of exchanging ownership of the security (stock) and the cash.
  • Expensive: Financial intermediaries often charge hefty fees when providing these services. For example, a US bank could charge $10 to $30 USD to serve an individual by sending money from the US to a receiver in another country. In the case of stock trading, a full-service broker often charges tens of USD or more for a transaction. Even with a discount broker, an investor needs to pay $7 to $10 USD per transaction.
  • Prone to be hacked: Since details on a customer and the transactions are saved in a centralized area within an institution, it is prone to being hacked and causing severe financial loss or leakage of confidential personal information about customers. Recently, there have been high-profile personal data leakage incidents at reputable companies such as JP Morgan (83 million accounts hacked in 2014), Target (up to 70 million customers' information hacked in 2013), and Equifax (148 million US consumers' information hacked in 2017).
  • Not transparent: Financial institutions keep both detailed and aggregated information on transactions. However, most of the information is not open to the individual customer, and this results in information asymmetry. In the example of cross-border money transfers, both the sender and receiver have to wait for three days to know whether the transaction has been completed successfully or not. If a transaction fails, a lengthy investigation has to be triggered. Imagine if the receiver was in an emergency and needed the funds immediately. Such a service is unsatisfactory even though the client has to pay a high fee.

With blockchain technology, the preceding problems are resolved elegantly. In the case of the Bitcoin blockchain, the underlying asset to be transferred is the digital coin, BTC. A cross-border BTC transaction can complete in no more than 1 hour. No separate settlement is needed since transaction and settlement happen in one action. The cost of this transaction is a tiny fraction of a transfer via a bank. For example, a recent report published by Bank of America (BoA) claims a transfer via blockchain costs 1/6000 of what BoA charges. However, for some clients, waiting an hour is still too long. Ripple, a payment provider for sending money globally, completes a transfer in under 1 minute.

The word Bitcoin often causes confusion as people use the word interchangeably for three things: the cryptocurrency, the blockchain, and the protocol. To avoid this confusion, we use BTC to refer to the cryptocurrency, and Bitcoin to refer to the blockchain and the corresponding network that uses the distributed ledger. For the protocol, we will fully spell out Bitcoin protocol or simply protocol.

A peer-to-peer network

To explain how Bitcoin works, let's look at what steps are involved with the existing business model for completing a cross-border transaction:

  • A customer enters an order either by visiting a bank branch or via the web. The sender provides the details of the order, such as the amount, sending currency, receiver name, receiving currency, receiver's bank name, account and branch numbers, and a SWIFT number. Here, SWIFT stands for the Society for Worldwide Interbank Financial Telecommunication, a messaging network used by financial institutions to transmit information and instructions securely through a standardized system of codes. SWIFT assigns each financial organization a unique code called, interchangeably, the bank identifier code (BIC), SWIFT code, SWIFT ID, or ISO 9362 code.
  • The sending bank takes the order and verifies that the sender has sufficient funds available.
  • The bank charges a fee and converts the remaining amount from the sending currency to an amount in the receiving currency by executing an FX transaction.
  • The sending bank enters a transferring message to SWIFT with all the needed information.
  • Upon receiving the message, the receiving bank verifies the receiver's account information.
  • Upon a successful verification and settling the funds between sending and receiving banks following the protocol, the receiving bank credits the amount to the receiver's account.

Since there are multiple steps, entities, and systems involved, the preceding activities take days to complete.

A Bitcoin network connects computers around the world. Each computer is a node with equal status, except for a subset of nodes called miners, which choose to play the role of verifying transactions, building blocks and linking to the chain. With Bitcoin, the business model for completing a money transfer involves the following steps:

  1. Using an e-wallet, a sender enters the amount of BTC to send, the addresses the BTC will be taken from, and the addresses it will be transferred to.
  2. The transaction request is sent to the Bitcoin network by the e-wallet.
  3. After miners have successfully verified the transaction and committed it to the network, the BTCs are now available for use by the receiver.

The Bitcoin transfer is a lot faster (in 1 hour, or minutes if using Ripple) for the following reasons:

  • The transaction and settlement are one step. This avoids the need to go through a time-consuming and expensive reconciliation process.
  • No FX trade is needed since BTC is borderless. It can move worldwide freely and rapidly.
  • No fund settlement is needed between banks since the transaction requires no intermediary banks.

In a case where a sender or receiver prefers to use a fiat currency such as USD, GBP, CNY, or JPY, a cryptocurrency market can be used to convert between BTC and the fiat currency. A website, CoinMarketCap, lists these markets: https://coinmarketcap.com/rankings/exchanges/. As of September 21, 2018, there were 14,044 markets. In terms of market capitalization, the top three were Binance (https://www.binance.com/), OKEx (https://www.okex.com/), and Huobi (https://www.huobi.pro).

A peer-to-peer network can connect nodes worldwide. However, a merely physical connection is not enough to make two untrusting parties trade with each other. To allow them to trade, Bitcoin takes the following measures:

  • Every node saves a complete copy of all transactions in a decentralized ledger. This makes any alteration to a transaction on the chain infeasible.
  • The ledger transactions are grouped in blocks. Each non-genesis block is linked to its previous block by storing the hash of that block's header, which in turn depends on the block's transactions and on the hash stored for the block before it. Consequently, changing a transaction requires recomputing the block that contains it and all subsequent blocks. This makes hacking the decentralized ledger extremely difficult.
  • Bitcoin addresses the double-spending issue, that is the same BTC being spent twice, by using the Proof-of-Work consensus algorithm.
  • Hashes are used extensively to protect the identities of parties and detect any changes occurring in a block.
  • Public/private keys and addresses are used to mask the identities of trading parties and to sign a transaction digitally.
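The hash-linking measure in the list above can be illustrated with a minimal sketch. It is not Bitcoin's actual block format: the field names (prev_hash, transactions), the JSON serialization, and the helper functions are assumptions made for illustration only. It shows how altering an old transaction breaks the link stored in the next block.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(transaction_batches):
    """Link each block to its predecessor by storing the predecessor's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder value stored in the genesis block
    for batch in transaction_batches:
        block = {"prev_hash": prev_hash, "transactions": batch}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose stored prev_hash no longer
    matches the recomputed hash of its predecessor, or None if all match."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain([["A pays B 1"], ["B pays C 1"], ["C pays D 1"]])
intact = first_broken_link(chain)          # no tampering yet

chain[0]["transactions"][0] = "A pays B 100"  # tamper with an old transaction
broken_at = first_broken_link(chain)       # the next block's link now fails
```

To hide the change, an attacker would have to recompute the stored hash in every later block, and do so on the copies held by other nodes as well.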

With these measures, untrusting parties feel comfortable trading for the following reasons:

  • The transaction is immutable and permanent. Neither party can nullify a transaction unilaterally.
  • No double spending is possible.
  • Transaction and settlement occur simultaneously; therefore, there is no settlement risk.
  • Identities are protected.
  • Transactions are digitally signed by the sender, which helps avoid future legal disputes.

Cryptography and hash functions

Cryptography or cryptology is research on techniques for securing communication in the presence of adversaries. In the old days, cryptography was synonymous with encryption. Modern cryptography relies heavily on mathematical theory and computer science. It also utilizes works from other disciplines such as electrical engineering, communications science, and physics.

Cryptographic algorithms are designed around the assumption that with foreseeable computational hardware advances, it will not be feasible for any adversary to decipher encrypted messages based on these algorithms. In other words, in theory, it is possible to decode the encrypted message, but it is infeasible to do so practically. These algorithms are therefore defined to be computationally secure. Theoretical research (for instance, parallel or integer factorization algorithms) and computational technology advancements (for instance, quantum computers) can make these algorithms practically insecure and, therefore, encryption algorithms need to be adapted continuously.

Encryption is the process of converting plaintext into unintelligible text, called ciphertext. Decryption is the reverse, in other words moving from the unintelligible ciphertext back to plaintext.

The cryptographic algorithms used by Bitcoin mining are hash functions. A hash function is a function that maps data of any size to data of a fixed size. The values returned by a hash function are called hash values or simply hashes. A cryptographic hash function allows one to verify easily that some input data maps to a given hash value. However, in the reverse direction, when the input data is unknown, it is practically infeasible to reconstruct the input plaintext from a hash value. In other words, hashing is a one-way operation. Another notable attribute of a cryptographic hash function is that a minor change in the input plaintext will result in a completely different hash value. This feature is desirable for safeguarding information, as any tiny change to the original data by a hacker results in a visibly different hash.
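The two properties described above, a fixed-size output and sensitivity to small input changes, can be observed with Python's standard hashlib module. This is a minimal sketch; the helper names and the sample messages are made up for illustration.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count the bit positions in which two hex-encoded hashes differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Inputs of very different sizes map to outputs of the same size.
short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
print(len(short_digest), len(long_digest))

# Changing a single character of the input changes the hash.
h1 = sha256_hex("pay Alice 10 BTC")
h2 = sha256_hex("pay Alice 11 BTC")
changed = differing_bits(h1, h2)
print(h1)
print(h2)
print(changed, "of 256 bits differ")
```

Verification runs in the forward direction only: to check that some data matches a given hash, one hashes the data again and compares the results.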

Two common hash algorithms are MD5 (message-digest algorithm 5) and SHA-1 (secure hash algorithm 1):

  • Developed by Ronald Rivest in 1991, MD5 maps input plaintext into a 128-bit resulting hash value. MD5 Message-Digest checksums are commonly used to validate data integrity when digital files are transferred or stored. MD5 has been found to suffer from extensive vulnerabilities.
  • SHA-1 is a cryptographic hash function mapping input plaintext into a 160-bit (20-byte) hash known as a message digest, often displayed as a hexadecimal number 40 digits long. SHA-1 was designed by the United States National Security Agency (NSA) and is a US Federal Information Processing Standard.
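The digest sizes quoted in the list above can be checked with Python's standard hashlib module. This is a small sketch; the sample message is arbitrary, and it assumes the local Python build exposes both algorithms (some hardened builds disable MD5).

```python
import hashlib

message = b"blockchain"

digest_bits = {}
hex_lengths = {}
for name in ("md5", "sha1"):
    h = hashlib.new(name, message)
    digest_bits[name] = h.digest_size * 8     # digest_size is in bytes
    hex_lengths[name] = len(h.hexdigest())    # 4 bits per hexadecimal digit

for name in digest_bits:
    print(name, digest_bits[name], "bits,", hex_lengths[name], "hex digits")
```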

SHA-256, a member of the SHA-2 family, is a successor hash function to SHA-1. It is one of the strongest hash functions available, and no practical attack against it is publicly known. SHA-256 generates an almost unique 256-bit (32-byte) digest for a text. For example, My test string maps to 5358c37942b0126084bb16f7d602788d00416e01bc3fd0132f4458dd355d8e76. With a small change, the hash of My test strings is 98ff9f0555435f792339d6b7bf5fbcca82f1a83fde2bb76f6aa95d66050887cc, a completely different value. SHA-256 produces 2^256 possible hashes. There is yet to be a known case where two different inputs have produced the same SHA-256 hash, an issue called a collision in cryptography. Even with the fastest supercomputer, it would take longer than the age of our universe to hit a collision. As a result, SHA-256 is used by Bitcoin for hashing.
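A back-of-the-envelope calculation makes the claim above about collisions concrete. By the birthday bound, a brute-force search is expected to find a collision after roughly the square root of the number of possible hashes, that is, about 2^128 evaluations. The hash rate below is a hypothetical figure chosen for illustration, not a measurement of any real machine, and the estimate ignores the memory such a search would need.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate
HASHES_PER_SECOND = 1e18          # hypothetical machine (an assumption)

output_space = 2 ** 256           # number of possible SHA-256 values
birthday_bound = 2 ** 128         # about sqrt(output_space) evaluations

years = birthday_bound / HASHES_PER_SECOND / SECONDS_PER_YEAR
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"about {years:.2e} years, {ratio:.0f} times the age of the universe")
```

Under these assumptions, the search takes on the order of 10^13 years, several hundred times the age of the universe.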

The distributed ledger, blocks, transactions, addresses, and UTXO

At a financial institution, a ledger is a book for recording financial transactions. Similarly, Bitcoin maintains a ledger for bookkeeping BTC transactions and balances by address. One key difference is that a bank's ledger is centralized and Bitcoin's ledger is decentralized. Consequently, a bank's ledger is much easier to manipulate (to "cook"). Bitcoin's ledger, on the other hand, is very difficult to cook, as one would have to change the copies held by nodes worldwide.

A user submits a transaction containing the following information:

  • Sources of the BTCs to be transferred from
  • The amount of BTCs to be transferred
  • Destinations the BTCs should be transferred to

As per the Wiki site, a transaction has a general structure shown as follows:

Both source and destination addresses are 64-character hashes. Here is an example of an address: 979e6b063b436438105895939f4ff13d068428d2f71312cf5594c132905bfxy1.

The term address is a bit confusing. A programmer may think it to be an address related to a disk or memory location. However, it has nothing to do with a physical location. Instead, it is a logical label for grouping BTCs that have been transferred from/to it. In a way, one can think of it as a bank account number, yet there are fundamental differences between them. For example, a bank has a centralized place where metadata on an account, for instance, owner name, account open date, and account type, is saved. In addition, the account balance is precalculated and saved. In Bitcoin, there is no metadata on an address and one has to query the entire ledger to find the balance of an address by counting the net BTCs being transferred in and out of the address.  Addresses are referred to only in Bitcoin transactions. When the balance of an address falls to 0, any future request for taking BTCs from the address will fail the transaction validation due to insufficient funds.
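The idea that an address balance is not stored anywhere but derived from the ledger can be sketched as follows. This is a simplified illustration, not Bitcoin's data model: the Transfer record, the address labels, and the helper functions are hypothetical, and amounts are plain integers.

```python
from collections import namedtuple

# A simplified ledger entry: an amount moved from one address to another.
Transfer = namedtuple("Transfer", ["source", "destination", "amount"])

ledger = [
    Transfer(None, "addr_A", 50),       # newly minted coins have no source
    Transfer("addr_A", "addr_B", 20),
    Transfer("addr_B", "addr_C", 5),
    Transfer("addr_A", "addr_C", 30),
]


def balance(ledger, address):
    """Scan the whole ledger and net the amounts moved in and out."""
    total = 0
    for transfer in ledger:
        if transfer.destination == address:
            total += transfer.amount
        if transfer.source == address:
            total -= transfer.amount
    return total


def can_spend(ledger, address, amount):
    """A request passes validation only if the address holds enough funds."""
    return 0 < amount <= balance(ledger, address)


print(balance(ledger, "addr_A"), balance(ledger, "addr_B"), balance(ledger, "addr_C"))
print(can_spend(ledger, "addr_A", 1))   # addr_A is empty, so this fails
```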

 

 

Bitcoin utilizes the UTXO model to manage its BTC transfers. The term comes from cryptocurrency, where UTXO stands for unspent transaction output: an output of a blockchain transaction that has not been spent and can be used as an input for a future transaction. In a Bitcoin transaction, only unspent outputs can be used as inputs, which helps to prevent double spending and fraud. As a result, a committed transaction consumes the outputs it uses as inputs and creates new outputs in the form of UTXOs. The newly created unspent transaction outputs can be spent by the owner holding the corresponding private keys. In other words, UTXOs are processed continuously, and a committed transaction leads to removing spent coins from, and adding new unspent coins to, the UTXO database.
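A minimal sketch of this bookkeeping follows. It is an illustration under simplifying assumptions, not Bitcoin's implementation: the UTXO database is a plain dictionary keyed by a hypothetical (transaction id, output index) pair, owners are plain names, and signature checks are omitted.

```python
def apply_transaction(utxo_set, tx_id, inputs, outputs):
    """Return a new UTXO set with `inputs` consumed and `outputs` added.

    inputs:  list of (tx_id, output_index) keys referencing unspent outputs
    outputs: list of (owner, amount) pairs created by this transaction
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("the same output is listed twice as an input")
    for key in inputs:
        if key not in utxo_set:
            raise ValueError(f"output {key} is unknown or already spent")

    total_in = sum(utxo_set[key][1] for key in inputs)
    total_out = sum(amount for _, amount in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")

    new_set = dict(utxo_set)
    for key in inputs:
        del new_set[key]                      # spent outputs leave the set
    for index, (owner, amount) in enumerate(outputs):
        new_set[(tx_id, index)] = (owner, amount)   # new unspent outputs
    return new_set


# Alice holds one unspent output of 50 and pays Bob 20, keeping 30 as change.
utxos = {("tx0", 0): ("alice", 50)}
utxos = apply_transaction(utxos, "tx1", [("tx0", 0)], [("bob", 20), ("alice", 30)])

# Trying to spend the already consumed output again is rejected.
try:
    apply_transaction(utxos, "tx2", [("tx0", 0)], [("carol", 50)])
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True
```

Because a spent output is removed from the set, the second attempt to use it cannot pass validation, which is how the model blocks double spending.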

Like an address, a BTC is not associated with any physical object such as a digital token file or a physically minted coin. Instead, it only exists in transactions in the distributed ledger. For example, if one wants to know the total number of BTCs minted so far, one has to go through all nonzero balance addresses on the blockchain and add up all the BTCs. Since every node of Bitcoin keeps a copy of the ledger, it is only a matter of taking computing time to find an answer.

When a user enters a BTC transaction request at a node, the Bitcoin software installed at the node broadcasts the transaction to all nodes. Nodes on the network verify the validity of the transaction by retrieving all historical transactions containing the input addresses and ensuring that the BTCs from these addresses are legitimate and sufficient. After that, the mining nodes start to construct a block by collecting the verified transactions. Normally, a Bitcoin block contains between 1,500 and 2,000 transactions. The miner who wins the race to solve a difficult mathematical puzzle earns the right to build a new block and link it to the chain. On the Bitcoin blockchain, a new block is created around every 10 minutes. As of September 21, 2018, approximately 542,290 blocks had been created on Bitcoin. The structure of a Bitcoin block is shown as follows:

 

Here, the block header contains the following fields:

The concept of a nonce will be explained in the subsection on mining. hashPrevBlock is the hash of the previous block's header, while hashMerkleRoot is the Merkle tree root hash, which is essentially the hash of all transaction hashes in the block, aggregated via a binary tree structure. The following diagram explains the idea: