Modern Computer Architecture and Organization – Second Edition

By: Jim Ledin

Overview of this book

Are you a software developer, systems designer, or computer architecture student looking for a methodical introduction to digital device architectures, but are overwhelmed by the complexity of modern systems? This step-by-step guide will teach you how modern computer systems work with the help of practical examples and exercises. You’ll gain insights into the internal behavior of processors down to the circuit level and will understand how the hardware executes code developed in high-level languages. This book will teach you the fundamentals of computer systems including transistors, logic gates, sequential logic, and instruction pipelines. You will learn details of modern processor architectures and instruction sets including x86, x64, ARM, and RISC-V. You will see how to implement a RISC-V processor in a low-cost FPGA board and write a quantum computing program and run it on an actual quantum computer. This edition has been updated to cover the architecture and design principles underlying the important domains of cybersecurity, blockchain and bitcoin mining, and self-driving vehicles. By the end of this book, you will have a thorough understanding of modern processors and computer architecture and the future directions these technologies are likely to take.

Computer architecture

The descriptions of a number of key architectures from the history of computing presented in the previous sections of this chapter included some terms that may or may not be familiar to you. This section will introduce the conceptual building blocks that are used to construct modern-day processors and related computer subsystems.

Representing numbers with voltage levels

One ubiquitous feature of modern computers is the use of voltage levels to indicate data values. In general, only two voltage levels are recognized: a low level and a high level. The low level is often assigned the value 0, and the high level is assigned the value 1.

The voltage at any point in a circuit (digital or otherwise) is analog in nature and can take on any voltage within its operating range. When changing from the low level to the high level, or vice versa, the voltage must pass through all voltages in between. In the context of digital circuitry, the transitions between low and high levels happen quickly and the circuitry is designed to not react to voltages between the high and low levels.

Binary and hexadecimal numbers

The circuitry within a processor does not work directly with numbers, in any sense. Processor circuit elements obey the laws of electricity and electronics and simply react to the inputs provided to them. The inputs that drive these actions result from the code developed by programmers and from the data provided as input to the program. The interpretation of the output of a program as, say, numbers in a spreadsheet, or characters in a word processing program, is a purely human interpretation that assigns meaning to the result of the electronic interactions within the processor. The decision to assign 0 to the low voltage and 1 to the high voltage is the first step in the interpretation process.

The smallest unit of information in a digital computer is a binary digit, called a bit, which represents a discrete data element containing the value 0 or 1. Multiple bits can be placed together to enable the representation of a greater range of values. A byte is composed of 8 bits placed together to form a single value. A byte is the smallest unit of information that can be read from or written to memory by most modern processors. Some computers, past and present, use a different number of bits for the smallest addressable data item, but the 8-bit byte is the most common size.

A single bit can take on two values: 0 and 1. Two bits placed together can take on four values: 00, 01, 10, and 11. Three bits can take on eight values: 000, 001, 010, 011, 100, 101, 110, and 111. In general, a group of n bits can take on 2ⁿ values. An 8-bit byte, therefore, can represent 2⁸, or 256, unique values.
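As a quick check of the 2ⁿ rule, the following Python sketch enumerates every bit pattern for a few widths and compares the count against 2ⁿ. The helper name bit_patterns is hypothetical and chosen only for this illustration.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in (1, 2, 3, 8):
    patterns = bit_patterns(n)
    # The number of distinct patterns equals 2 raised to the power n.
    print(n, len(patterns), 2 ** n)

print(bit_patterns(2))   # ['00', '01', '10', '11']
```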

The binary number format is not most people’s first choice when it comes to performing arithmetic. Working with numbers such as 11101010 can be confusing and error-prone, especially when dealing with 32- and 64-bit values. To make working with these numbers somewhat easier, hexadecimal numbers are often used instead. The term hexadecimal is often shortened to hex.

In the hexadecimal number system, binary numbers are separated into groups of 4 bits. With 4 bits in the group, the number of possible values is 2⁴, or 16. The first 10 of these 16 numbers are assigned the digits 0-9, and the last 6 are assigned the letters A-F. Table 1.1 shows the first 16 binary values starting at 0, along with the corresponding hexadecimal digit and the decimal equivalent to the binary and hex values:

Binary	Hexadecimal	Decimal
`0000`	`0`	`0`
`0001`	`1`	`1`
`0010`	`2`	`2`
`0011`	`3`	`3`
`0100`	`4`	`4`
`0101`	`5`	`5`
`0110`	`6`	`6`
`0111`	`7`	`7`
`1000`	`8`	`8`
`1001`	`9`	`9`
`1010`	`A`	`10`
`1011`	`B`	`11`
`1100`	`C`	`12`
`1101`	`D`	`13`
`1110`	`E`	`14`
`1111`	`F`	`15`

Table 1.1: Binary, hexadecimal, and decimal numbers

The binary number 11101010 can be represented more compactly by breaking it into two 4-bit groups (1110 and 1010) and writing them as the hex digits EA. A 4-bit grouping is sometimes referred to as a nibble, meaning it is half a byte. Because binary digits can take on only two values, binary is a base-2 number system. Hex digits can take on 16 values, so hexadecimal is base-16. Decimal digits can have 10 values, and therefore decimal is base-10.

When working with these different number bases, it is easy for things to become confusing. Is a number written as 100 a binary, hexadecimal, or decimal value? Without additional information, you can’t tell. Various programming languages and textbooks have taken different approaches to remove this ambiguity. In most cases, decimal numbers are unadorned, so the number 100 is usually decimal. In programming languages such as C and C++, hexadecimal numbers are prefixed by 0x, so the number 0x100 is 100 hex. In assembly languages, either the prefix character $ or the suffix h might be used to indicate hexadecimal numbers. The use of binary values in programming is less common, mostly because hexadecimal is preferred due to its compactness. Some compilers support the use of 0b as a prefix for binary numbers.
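The nibble-by-nibble conversion and the ambiguity of an unadorned 100 can both be tried out in a few lines of Python, which happens to accept the 0x and 0b prefixes mentioned above. This is only a sketch; the variable names are illustrative.

```python
value = 0b11101010                  # binary literal using the 0b prefix

# Split the byte into its two 4-bit groups (nibbles).
high_nibble = (value >> 4) & 0xF    # 1110b
low_nibble = value & 0xF            # 1010b

# Each nibble maps to exactly one hex digit, as in Table 1.1.
hex_digits = "0123456789ABCDEF"
as_hex = hex_digits[high_nibble] + hex_digits[low_nibble]

print(as_hex)                       # EA
print(value == 0xEA)                # True

# The same three characters mean different things in each base.
print(int("100", 2), int("100", 10), int("100", 16))   # 4 100 256
```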

HEXADECIMAL NUMBER REPRESENTATION

This book uses either the prefix $ or the suffix h to represent hexadecimal numbers, depending on the context. The suffix b will represent binary numbers, and the absence of a prefix or suffix indicates decimal numbers.

Bits are numbered individually within a binary number, with bit 0 as the rightmost, least significant bit. Bit numbers increase in magnitude leftward, up to the most significant bit at the far left.

Some examples should make this clear. In Table 1.1, the binary value 0001b (1 decimal) has bit number 0 set to 1 and the remaining three bits are cleared to 0. For 0010b (2 decimal), bit 1 is set and the other bits are cleared. For 0100b (4 decimal), bit 2 is set and the other bits are cleared.

SET VERSUS CLEARED

A bit that is set has the value 1. A bit that is cleared has the value 0.
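The bit numbering convention translates directly into shift and mask operations. The following Python sketch uses hypothetical helper names (bit_is_set, set_bit, clear_bit) to test, set, and clear an individual bit, with bit 0 as the least significant bit.

```python
def bit_is_set(value, bit_number):
    """Return True if the numbered bit (0 = least significant) is 1."""
    return ((value >> bit_number) & 1) == 1

def set_bit(value, bit_number):
    """Return value with the numbered bit set to 1."""
    return value | (1 << bit_number)

def clear_bit(value, bit_number):
    """Return value with the numbered bit cleared to 0."""
    return value & ~(1 << bit_number)

print(bit_is_set(0b0001, 0))          # True: bit 0 is set in 0001b
print(bit_is_set(0b0100, 2))          # True: bit 2 is set in 0100b
print(bin(set_bit(0b0000, 1)))        # 0b10
print(bin(clear_bit(0b0111, 0)))      # 0b110
```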

An 8-bit byte can take on values from $00 to $FF, equivalent to the decimal range 0-255. When performing addition at the byte level, the result can exceed 8 bits. For example, adding $01 to $FF results in the value $100. When using 8-bit registers, this represents a carry into the ninth bit, which must be handled appropriately by the processor hardware and by the software performing the addition.

In unsigned 8-bit arithmetic, subtracting $01 from $00 wraps around to $FF. Depending on the computation being performed, this may or may not be the desired result. Once again, the processor hardware and the software must handle this situation to arrive at the desired result.
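Both effects are easy to reproduce in Python. Because Python integers are not limited to 8 bits, this sketch models an 8-bit register by masking results with $FF; the function names add8 and sub8 are assumptions made for the example.

```python
def add8(a, b):
    """Add two bytes; return the 8-bit result and the carry out of bit 7."""
    total = a + b
    return total & 0xFF, total > 0xFF

def sub8(a, b):
    """Subtract b from a, keeping only the low 8 bits of the result."""
    return (a - b) & 0xFF

result, carry = add8(0xFF, 0x01)
print(f"${result:02X} carry={carry}")   # $00 carry=True
print(f"${sub8(0x00, 0x01):02X}")       # $FF
```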

When appropriate, negative values can be represented using binary numbers. The most common signed number format in modern processors is two’s complement. In two’s complement, 8-bit signed numbers span the range from -128 to 127. The most significant bit of a two’s complement data value is the sign bit: a 0 in this bit represents a positive number and a 1 represents a negative number. A two’s complement number can be negated (multiplied by -1) by inverting all the bits, adding 1, and ignoring the carry. Inverting a bit means changing a 0 bit to 1 and a 1 bit to 0. See Table 1.2 for some step-by-step examples negating signed 8-bit numbers:

Decimal value	Binary value	Invert the bits	Add one	Negated result
`0`	`00000000b`	`11111111b`	`00000000b`	`0`
`1`	`00000001b`	`11111110b`	`11111111b`	`-1`
`-1`	`11111111b`	`00000000b`	`00000001b`	`1`
`127`	`01111111b`	`10000000b`	`10000001b`	`-127`
`-127`	`10000001b`	`01111110b`	`01111111b`	`127`

Table 1.2: Negation operation examples

Negating 0 returns a result of 0, as you would expect mathematically.
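The invert, add one, ignore the carry procedure from Table 1.2 can be written in a few lines of Python. The helper names negate8 and to_signed8 are hypothetical, and the $FF mask stands in for discarding the carry out of an 8-bit register.

```python
def negate8(value):
    """Negate an 8-bit two's complement value given as a bit pattern."""
    inverted = value ^ 0xFF          # invert all 8 bits
    return (inverted + 1) & 0xFF     # add 1 and ignore any carry

def to_signed8(value):
    """Interpret an 8-bit pattern as a signed two's complement number."""
    return value - 256 if value & 0x80 else value

for pattern in (0b00000000, 0b00000001, 0b11111111, 0b01111111, 0b10000001):
    negated = negate8(pattern)
    print(f"{to_signed8(pattern):5d} -> {negated:08b}b = {to_signed8(negated)}")
```

One edge case is worth trying yourself: negate8(0x80) returns $80 again, because -128 has no positive counterpart within the 8-bit signed range of -128 to 127.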

TWO’S COMPLEMENT ARITHMETIC

Two’s complement arithmetic is identical to unsigned arithmetic at the bit level. The manipulations involved in addition and subtraction are the same whether the input values are intended to be signed or unsigned. The interpretation of the result as signed or unsigned depends entirely on the intent of the user.

Table 1.3 shows how the binary values 00000000b to 11111111b correspond to signed values over the range -128 to 127, and unsigned values from 0 to 255:

Binary	Signed Decimal	Unsigned Decimal
`00000000b`	`0`	`0`
`00000001b`	`1`	`1`
`00000010b`	`2`	`2`
…	…	…
`01111110b`	`126`	`126`
`01111111b`	`127`	`127`
`10000000b`	`-128`	`128`
`10000001b`	`-127`	`129`
`10000010b`	`-126`	`130`
…	…	…
`11111101b`	`-3`	`253`
`11111110b`	`-2`	`254`
`11111111b`	`-1`	`255`

Table 1.3: Signed and unsigned 8-bit numbers

Signed and unsigned representations of binary numbers extend to larger integer data types. 16-bit values can represent unsigned integers from 0 to 65,535, and signed integers in the range -32,768 to 32,767. 32-bit, 64-bit, and even larger integer data types are commonly available in modern processors and programming languages.
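The ranges quoted here follow from the bit width alone. The Python sketch below computes them for any width and shows how one bit pattern yields two different values depending on the chosen interpretation; the function names are illustrative only.

```python
def unsigned_range(bits):
    """Smallest and largest unsigned values that fit in the given width."""
    return 0, 2 ** bits - 1

def signed_range(bits):
    """Smallest and largest two's complement values for the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def as_signed(value, bits=8):
    """Reinterpret an unsigned bit pattern as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

for width in (8, 16, 32):
    print(width, unsigned_range(width), signed_range(width))

# The same 8 bits, read two ways (compare with Table 1.3).
print(0b10000000, as_signed(0b10000000))   # 128 -128
print(0b11111111, as_signed(0b11111111))   # 255 -1
```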

The 6502 microprocessor

This section introduces a processor architecture that is relatively simple compared to more powerful modern processors.

The intent here is to provide a whirlwind introduction to some basic concepts shared by processors spanning the spectrum from very low-end microcontrollers to sophisticated multi-core 64-bit processors.

The 6502 processor was introduced by MOS Technology in 1975. The 6502 found widespread use in video game consoles from Atari and Nintendo and in computers marketed by Commodore and Apple. Versions of the 6502 continue to be in widespread use today in embedded systems, with estimates of between 5 and 10 billion (yes, billion) units produced as of 2018. In popular culture, both Bender, the robot in Futurama, and the T-800 robot in The Terminator appear to have employed the 6502, based on onscreen evidence.

Like many early microprocessors, the 6502 was powered by 5 volts (V) direct current (DC). In these circuits, a low signal level is any voltage between 0 and 0.8 V. A high signal level is any voltage between 2 and 5 V. Voltages between these ranges occur only during transitions from low to high and from high to low. The low signal level is defined as logical 0, and the high signal level is defined as logical 1. Chapter 2, Digital Logic, will delve further into the electronic circuits used in digital electronics.
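To make the two voltage bands concrete, here is a minimal Python sketch that classifies a voltage using the thresholds quoted above. The function name logic_level is hypothetical, and returning None for the in-between region is simply one way to model the fact that such voltages occur only during transitions.

```python
def logic_level(volts):
    """Map a voltage in a 5 V circuit to logical 0, logical 1, or None."""
    if 0.0 <= volts <= 0.8:
        return 0        # low signal level
    if 2.0 <= volts <= 5.0:
        return 1        # high signal level
    return None         # neither band: seen only while a signal is changing

for v in (0.3, 1.5, 3.3, 5.0):
    print(v, logic_level(v))
```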

The word length of a processor defines the size of the fundamental data element the processor operates upon. The 6502 has a word length of 8 bits. This means the 6502 reads and writes memory 8 bits at a time and stores data internally in 8-bit wide registers.

Program memory and data memory share the same address space and the 6502 accesses its memory over a single bus. Like the Intel 8088, the 6502 implements the von Neumann architecture. The 6502 has a 16-bit address bus, enabling the addressing of 64 kilobytes of memory.

1 KB is defined as 2¹⁰, or 1,024 bytes. The number of unique binary combinations of the 16 address lines is 2¹⁶, which permits access to 65,536 byte-wide memory locations. Note that just because a device can address 64 KB, it does not mean there must be memory at each of those locations. The Commodore VIC-20, based on the 6502, contained just 5 KB of RAM and 20 KB of ROM.
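The address space arithmetic can be verified with a short Python calculation. The constant names are assumptions made for this sketch; the VIC-20 figures are the ones given above.

```python
ADDRESS_LINES = 16
KB = 2 ** 10                          # 1,024 bytes

locations = 2 ** ADDRESS_LINES        # unique combinations of 16 address lines
first, last = 0, locations - 1

print(locations)                      # 65536
print(locations // KB)                # 64
print(f"${first:04X} to ${last:04X}") # $0000 to $FFFF

# The VIC-20 populated only part of the 64 KB address space.
vic20_populated_kb = 5 + 20           # 5 KB of RAM plus 20 KB of ROM
print(vic20_populated_kb, "of", locations // KB, "KB populated")
```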

The 6502 contains internal storage areas called registers. A register is a location in a logical device in which a word of information can be stored and acted upon during computation. A typical processor contains a small number of registers for temporarily storing data values and performing operations such as addition or address computations.

Figure 1.1 shows the 6502 register structure. The processor contains five 8-bit registers (A, X, Y, P, and S) and one 16-bit register (PC). The numbers above each register indicate the bit numbers at each end of the register:

Figure 1.1: 6502 register set

Each of the A, X, and Y registers can serve as a general-purpose storage location. Program instructions can load a value into one of those registers and, some instructions later, use the saved value for some purpose if the intervening instructions did not modify the register contents. The A register is the only register capable of performing arithmetic operations. The X and Y registers, but not the A register, can be used as index registers in calculating memory addresses.

The P register contains processor flags. Each bit in this register has a unique purpose, except for the bit labeled 1. The 1 bit is unused and can be ignored. Each of the remaining bits in this register is called a flag and indicates a specific condition that has occurred or represents a configuration setting. The 6502 flags are as follows:

N: Negative sign flag: This flag is set when an operation produces a result in which bit 7 is set. This flag is used in signed arithmetic.
V: Overflow flag: This flag is set when a signed addition or subtraction results in overflow or underflow outside the range -128 to 127.
B: Break flag: This flag indicates a Break (BRK) instruction has executed. This bit is not present in the P register itself. The B flag value is only relevant when examining the P register contents as stored on the stack by a BRK instruction or by an interrupt. The B flag is set to distinguish a software interrupt resulting from a BRK instruction from a hardware interrupt during interrupt processing.
D: Decimal mode flag: If set, this flag indicates processor arithmetic will operate in Binary-Coded Decimal (BCD) mode. BCD mode is rarely used and won’t be discussed here, other than to note that this base-10 computation mode evokes the architectures of the Analytical Engine and ENIAC.
I: Interrupt disable flag: If set, this flag indicates that interrupt inputs (other than the non-maskable interrupt) will not be processed.
Z: Zero flag: This flag is set when an operation produces a result of 0.
C: Carry flag: This flag is set when an arithmetic operation produces a carry.

The N, V, Z, and C flags are the most important flags in the context of general computing involving loops, counting, and arithmetic.

The S register is the stack pointer. In the 6502, the stack is the region of memory from addresses $100 to $1FF. This 256-byte range is used for the temporary storage of parameters within subroutines and holds the return address when a subroutine is called. At system startup, the S register is initialized to point to the top of this range. Values are “pushed” onto the stack using instructions such as PHA, which pushes the contents of the A register onto the stack.

When a value is pushed onto the stack, the 6502 stores the value at the address indicated by the S register, after adding the fixed $100 offset, and then decrements the S register. Additional values can be placed on the stack by executing more push instructions. As additional values are pushed, the stack grows downward in memory. Programs must take care not to exceed the fixed 256-byte size of the stack when pushing data onto it.

Data stored on the stack must be retrieved in the reverse of the order from which it was pushed onto the stack. The stack is a Last-In, First-Out (LIFO) data structure, meaning when you “pop” a value from the stack, it is the byte most recently pushed onto it. The PLA instruction increments the S register by 1 and then copies the value at the address indicated by the S register (plus the $100 offset) into the A register.
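The push and pull sequences described above can be modeled with a small Python class. This is a simplified sketch rather than a full 6502 model: the class name Stack6502 is hypothetical, memory is a plain 64 KB byte array, and S is assumed to start at $FF, the top of the stack range.

```python
class Stack6502:
    STACK_BASE = 0x100                   # fixed offset added to S

    def __init__(self):
        self.memory = bytearray(0x10000) # 64 KB address space
        self.s = 0xFF                    # S points to the top of the stack range

    def push(self, value):
        """Model PHA: store at $100 + S, then decrement S."""
        self.memory[self.STACK_BASE + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF     # S is an 8-bit register

    def pull(self):
        """Model PLA: increment S, then read from $100 + S."""
        self.s = (self.s + 1) & 0xFF
        return self.memory[self.STACK_BASE + self.s]

stack = Stack6502()
stack.push(0x11)
stack.push(0x22)
print(f"S=${stack.s:02X}")               # S=$FD after two pushes
print(f"${stack.pull():02X}")            # $22: last in, first out
print(f"${stack.pull():02X}")            # $11
```

The first pushed byte lands at address $1FF and the second at $1FE, which shows the stack growing downward in memory.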

The PC register is the program counter. This register contains the memory address of the next instruction to be executed. Unlike the other registers, the PC is 16 bits long, allowing access to the entire 6502 address space.

Each instruction consists of a 1-byte operation code, called opcode for short, and may be followed by 0 to 2 operand bytes, depending on the type of instruction. After each instruction executes, the PC updates to point to the next instruction following the one that just completed. In addition to automatic updates during sequential instruction execution, the PC can be modified by jump instructions, branch instructions, and subroutine call and return instructions.

The 6502 instruction set

We will now examine the 6502 instruction set. Instructions are individual processor commands that, when strung together sequentially, execute the algorithm coded by the programmer. An instruction contains a binary number called an operation code (or opcode) that tells the processor what to do when that instruction executes.

If they wish, programmers can write code directly using processor instructions. We will see examples of this later in this section. Programmers can also write code in a so-called high-level language. The programmer then uses a software tool called a compiler that translates the high-level code into a (usually much longer) sequence of processor instructions.

In this section, we are working with code written as sequences of processor instructions. This form of source code is called assembly language.

Each of the 6502 instructions has a three-character mnemonic. In assembly language source files, each line of code contains an instruction mnemonic followed by any operands associated with the instruction. The combination of the mnemonic and the operands defines the addressing mode. The 6502 supports several addressing modes providing a great deal of flexibility in accessing data in registers and memory. For this introduction, we’ll only work with the immediate addressing mode, in which the operand itself contains a value rather than indicating a register or memory location containing the value. An immediate value is preceded by a # character.

In 6502 assembly, decimal numbers have no adornment (48 means 48 decimal), while hexadecimal values are preceded by a $ character ($30 means 30 hexadecimal, equivalent to 00110000b and to 48 decimal). An immediate decimal value looks like #48 and an immediate hexadecimal value looks like #$30.
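To see how this notation relates to the opcode and operand bytes described earlier, the Python sketch below parses an immediate operand and encodes a single source line as machine-code bytes. It is a toy illustration, not a real assembler: the function names are hypothetical, and only the five instructions used in this section are covered. The opcode values come from the published 6502 opcode table rather than from this chapter.

```python
# Opcode bytes for the immediate-mode and implied instructions used here.
OPCODES = {"LDA": 0xA9, "ADC": 0x69, "SBC": 0xE9, "CLC": 0x18, "SEC": 0x38}

def parse_immediate(operand):
    """Convert '#48' or '#$30' to an integer in the range 0-255."""
    if not operand.startswith("#"):
        raise ValueError("only immediate operands are handled in this sketch")
    digits = operand[1:]
    value = int(digits[1:], 16) if digits.startswith("$") else int(digits, 10)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate value does not fit in one byte")
    return value

def encode(line):
    """Encode one source line, such as 'LDA #$30', as machine-code bytes."""
    parts = line.split()
    opcode = OPCODES[parts[0]]
    if len(parts) == 1:
        return bytes([opcode])                      # opcode only
    return bytes([opcode, parse_immediate(parts[1])])

print(encode("LDA #48").hex())     # a930
print(encode("LDA #$30").hex())    # a930
print(encode("CLC").hex())         # 18
```

Both spellings of the operand produce the same two bytes, because #48 and #$30 are the same value written in different bases.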

Some assembly code examples will demonstrate the 6502 arithmetic capabilities. Five 6502 instructions are used in the following examples:

LDA loads register A with a value
ADC performs addition using Carry (the C flag in the P register) as an additional input and output
SBC performs subtraction using the C flag as an additional input and output
SEC sets the C flag directly
CLC clears the C flag directly

Since the C flag is an input to the addition and subtraction instructions, it is important to ensure it has the correct value prior to executing the ADC or SBC instructions. Before initiating an addition operation, the C flag must be clear to indicate there is no carry from a prior addition. When performing multi-byte additions (say, with 16-bit, 32-bit, or 64-bit numbers), the carry, if any, will propagate from the sum of one byte pair to the next as you add the more significant bytes together. If the C flag is set when the ADC instruction executes, the effect is to add 1 to the result. After the ADC completes, the C flag serves as the ninth bit of the result: a C flag result of 0 means there was no carry, and a 1 indicates there was a carry from the 8-bit register.
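The carry propagation in a multi-byte addition can be sketched in Python as follows. The function adc here models only the arithmetic of the instruction (A plus operand plus carry in) and ignores the other flags; add16 is a hypothetical helper that chains two such byte additions the way 6502 code would for a 16-bit sum.

```python
def adc(a, operand, carry_in):
    """Add two bytes plus the incoming carry; return (result, carry_out)."""
    total = a + operand + carry_in
    return total & 0xFF, 1 if total > 0xFF else 0

def add16(x, y):
    """Add two 16-bit values one byte at a time, low byte first."""
    lo, carry = adc(x & 0xFF, y & 0xFF, 0)     # carry cleared first, as with CLC
    hi, carry = adc(x >> 8, y >> 8, carry)     # carry from the low bytes flows in
    return (hi << 8) | lo, carry

total, carry = add16(0x01FF, 0x0001)
print(f"${total:04X} carry={carry}")           # $0200 carry=0

total, carry = add16(0xFFFF, 0x0001)
print(f"${total:04X} carry={carry}")           # $0000 carry=1
```

In the first case the low-byte sum overflows and the carry adds 1 to the high byte. In the second case the final carry acts as the seventeenth bit of the result.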

Subtraction using the SBC instruction tends to be a bit more confusing to novice 6502 assembly language programmers. Schoolchildren learning subtraction use the technique of borrowing when subtracting a larger digit from a smaller digit. In the 6502, the C flag represents the opposite of Borrow. If C is 1, then Borrow is 0, and if C is 0, Borrow is 1. Performing a simple subtraction with no incoming Borrow requires setting the C flag before executing the SBC instruction.
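The inverted relationship between Carry and Borrow may be easier to follow in code. This Python sketch models only the arithmetic of SBC, not its effect on the N, V, and Z flags; the function name sbc and its argument names are assumptions for the example.

```python
def sbc(a, operand, carry_in):
    """Subtract operand and the incoming borrow from a.

    Borrow is the opposite of Carry: carry_in = 1 means no borrow.
    Returns (8-bit result, carry_out), where carry_out = 0 signals a borrow.
    """
    borrow_in = 1 - carry_in
    difference = a - operand - borrow_in
    carry_out = 1 if difference >= 0 else 0
    return difference & 0xFF, carry_out

print(sbc(1, 1, 1))    # (0, 1): 1 - 1 with no borrow in, no borrow out
print(sbc(1, 1, 0))    # (255, 0): the incoming borrow subtracts one more
print(sbc(0, 1, 1))    # (255, 0): 0 - 1 borrows, so C ends up clear
```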

The following examples employ the 6502 as a calculator using inputs defined as immediate values in the code and with the result stored in the A register. The Results columns show the final value of the A register and the states of the N, V, Z, and C flags:

Instruction Sequence	Description	A	N	V	Z	C
`CLC` `LDA #1` `ADC #1`	8-bit addition with no Carry input: Clear the Carry flag, then load an immediate value of 1 into the A register and add 1 to it.	`$02`	`0`	`0`	`0`	`0`
`SEC` `LDA #1` `ADC #1`	8-bit addition with a Carry input: Set the Carry flag, then load an immediate value of 1 into the A register and add 1 to it.	`$03`	`0`	`0`	`0`	`0`
`SEC` `LDA #1` `SBC #1`	8-bit subtraction with no Borrow input: Set the Carry flag, then load an immediate value of 1 into the A register and subtract 1 from it. C = 1 indicates no Borrow occurred.	`$00`	`0`	`0`	`1`	`1`
`CLC` `LDA #1` `SBC #1`	8-bit subtraction with a Borrow input: Clear the Carry flag, then load an immediate value of 1 into the A register and subtract 1 from it. C = 0 indicates a Borrow occurred.	`$FF`	`1`	`0`	`0`	`0`
`CLC` `LDA #$FF` `ADC #1`	Unsigned overflow: Add 1 to $FF. C = 1 indicates a Carry occurred.	`$00`	`0`	`0`	`1`	`1`
`SEC` `LDA #0` `SBC #1`	Unsigned underflow: Subtract 1 from 0. C = 0 indicates a Borrow occurred.	`$FF`	`1`	`0`	`0`	`0`
`CLC` `LDA #$7F` `ADC #1`	Signed overflow: Add 1 to $7F. V = 1 indicates signed overflow occurred.	`$80`	`1`	`1`	`0`	`0`
`SEC` `LDA #$80` `SBC #1`	Signed underflow: Subtract 1 from $80. V = 1 indicates signed underflow occurred.	`$7F`	`0`	`1`	`0`	`1`

Table 1.4: 6502 arithmetic instruction sequences
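If you would like to check the rows of Table 1.4 without an emulator, the following Python sketch models just the five instructions involved, in binary (non-BCD) mode with immediate operands only. It is a simplified illustration and not a complete 6502 model: the function name run and the tuple-based program format are assumptions. It treats SBC as an ADC of the inverted operand, a common way to express 6502 binary-mode subtraction.

```python
def run(program):
    """Execute (mnemonic, operand) pairs; return (A, N, V, Z, C)."""
    a, n, v, z, c = 0, 0, 0, 0, 0
    for mnemonic, operand in program:
        if mnemonic == "CLC":
            c = 0
        elif mnemonic == "SEC":
            c = 1
        elif mnemonic == "LDA":
            a = operand & 0xFF
            n, z = a >> 7, int(a == 0)
        elif mnemonic in ("ADC", "SBC"):
            m = operand & 0xFF
            if mnemonic == "SBC":
                m ^= 0xFF                 # subtract by adding the inverted operand
            total = a + m + c
            result = total & 0xFF
            c = int(total > 0xFF)         # carry out of bit 7
            # Signed overflow: both inputs have a sign the result does not share.
            v = int(((a ^ result) & (m ^ result) & 0x80) != 0)
            a = result
            n, z = a >> 7, int(a == 0)
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return a, n, v, z, c

# Last two rows of Table 1.4: signed overflow and signed underflow.
print(run([("CLC", None), ("LDA", 0x7F), ("ADC", 1)]))   # (128, 1, 1, 0, 0)
print(run([("SEC", None), ("LDA", 0x80), ("SBC", 1)]))   # (127, 0, 1, 0, 1)
```

The returned tuple lists A first (in decimal), followed by the N, V, Z, and C flags in the same order as the table columns.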

If you don’t happen to have a 6502-based computer with an assembler and debugger handy, there are several free 6502 emulators available online that you can run in your web browser. One excellent emulator is available at https://skilldrick.github.io/easy6502/. Visit the website and scroll down until you find a default code listing with buttons for assembling and running 6502 code. Replace the default code listing with a group of three instructions from Table 1.4 and then assemble the code.

To examine the effect of each instruction in the sequence, use the debugger controls to single-step through the instructions and observe the result of each instruction on the processor registers.

This section has provided a very brief introduction to the 6502 processor and a small subset of its capabilities. One point of this analysis was to illustrate the challenge of dealing with the issue of carries when performing addition and borrows when doing subtraction. From Charles Babbage to the designers of the 6502 to the developers of modern computer systems, computer architects have developed solutions to the problems of computation and implemented them using the best technology available to them.