Modern Computer Architecture and Organization

By: Jim Ledin

Overview of this book

Are you a software developer, systems designer, or computer architecture student looking for a methodical introduction to digital device architectures but overwhelmed by their complexity? This book will help you to learn how modern computer systems work, from the lowest level of transistor switching to the macro view of collaborating multiprocessor servers. You'll gain unique insights into the internal behavior of processors that execute the code developed in high-level languages, and those insights will enable you to design more efficient and scalable software systems. The book will teach you the fundamentals of computer systems including transistors, logic gates, sequential logic, and instruction operations. You will learn details of modern processor architectures and instruction sets including x86, x64, ARM, and RISC-V. You will see how to implement a RISC-V processor in a low-cost FPGA board and how to write a quantum computing program and run it on an actual quantum computer. By the end of this book, you will have a thorough understanding of modern processor and computer architectures and the future directions these architectures are likely to take.

Computer architecture

The previous section's descriptions of a few key architectures from the history of computing included some terms that may or may not be familiar to you. This section introduces the building blocks used to construct modern-day processors and related computer subsystems.

One ubiquitous feature of modern computers is the use of voltage levels to indicate data values. In general, only two voltage levels are recognized: a low level and a high level. The low level is often assigned the value zero and the high level assigned the value one. The voltage at any point in a circuit (digital or otherwise) is analog in nature and can take on any voltage within its operating range. When changing from the low level to the high level, or vice versa, the voltage must pass through all voltages in between. In the context of digital circuitry, the transitions between low and high levels happen quickly and the circuitry is designed to not react to voltages between the high and low levels.

Binary and hexadecimal numbers

The circuitry within a processor does not work directly with numbers, in any sense. Processor circuit elements obey the laws of electricity and electronics and simply react to the inputs provided to them. The inputs that drive these actions result from the code developed by programmers and from the data provided as input to the program. The interpretation of the output of a program as, say, numbers in a spreadsheet, or characters in a word processing program, is a purely human interpretation that assigns meaning to the result of the electronic interactions within the processor. The decision to assign zero to the low voltage and one to the high voltage is the first step in the interpretation process.

The smallest unit of information in a digital computer is a binary digit, called a bit, which represents a discrete data element containing the value zero or one. A number of bits can be placed together to enable representation of a greater range of values. A byte is composed of eight bits placed together to form a single value. The byte is the smallest unit of information that can be read from or written to memory by most modern processors.

A single bit can take on two values: 0 and 1. Two bits placed together can take on four values: 00, 01, 10, and 11. Three bits can take on eight values: 000, 001, 010, 011, 100, 101, 110, and 111. In fact, any number of bits, n, can take on 2^n values, where 2^n indicates multiplying n copies of two together. An 8-bit byte, therefore, can take on 2^8, or 256, different values.

The binary number format is not most people's first choice when it comes to performing arithmetic, and working with numbers such as 11101010 can be confusing and error prone, especially when dealing with 32- and 64-bit values. To make working with these numbers somewhat easier, hexadecimal numbers are often used instead. The term hexadecimal is often shortened to hex. In the hexadecimal number system, binary numbers are separated into groups of four bits. Since there are four bits in the group, the number of possible values is 2^4, or 16. The first ten of these 16 numbers are assigned the digits 0-9. The last six are assigned the letters A-F. Table 1.1 shows the first 16 binary values starting at zero along with the corresponding hexadecimal digit and the decimal equivalent to the binary and hex values.

Table 1.1: Binary, hexadecimal, and decimal numbers
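The rows of a table like this follow directly from the 4-bit grouping rule, so they can be generated rather than memorized. The following Python sketch is only an illustration; the function name nibble_table is an assumption made for this example.

```python
def nibble_table():
    """Return (binary, hex, decimal) rows for the 16 possible 4-bit values."""
    return [(format(v, "04b"), format(v, "X"), v) for v in range(16)]


# Print one row per 4-bit value: binary, hexadecimal digit, decimal.
for binary, hex_digit, decimal in nibble_table():
    print(f"{binary}  {hex_digit}  {decimal:2d}")
```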

The binary number 11101010 can be represented more compactly by breaking it into two 4-bit groups (1110 and 1010) and writing them as the hex digits EA. Because binary digits can take on only two values, binary is a base-2 number system. Hex digits can take on 16 values, so hexadecimal is base-16. Decimal digits can have ten values, therefore decimal is base-10.
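The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is a string of 0 and 1 characters; the helper name binary_to_hex is hypothetical.

```python
def binary_to_hex(bits):
    """Convert a string of binary digits to hex by grouping four bits at a time."""
    # Pad on the left with zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each 4-bit group maps to exactly one hex digit.
    return "".join(format(int(group, 2), "X") for group in groups)


print(binary_to_hex("11101010"))  # EA
```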

When working with these different number bases, it is possible for things to become confusing. Is the number written as 100 a binary, hexadecimal, or decimal value? Without additional information, you can't tell. Various programming languages and textbooks have taken different approaches to remove this ambiguity. In most cases, decimal numbers are unadorned, so the number 100 is usually decimal. In programming languages such as C and C++, hexadecimal numbers are prefixed by 0x, so the number 0x100 is 100 hex. In assembly languages, either the prefix character $ or the suffix h might be used to indicate hexadecimal numbers. The use of binary values in programming is less common, mostly because hexadecimal is preferred due to its compactness. Some compilers support the use of 0b as a prefix for binary numbers.

Hexadecimal number representation

This book uses either the prefix $ or the suffix h to represent hexadecimal numbers, depending on the context. The suffix b will represent binary numbers, and the absence of a prefix or suffix indicates decimal numbers.
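To see how the same digits yield different values under each notation, the conventions above can be expressed as a small parser. This is a sketch, not a complete assembler literal parser; the function name parse_literal is an assumption, and the 0x case is included only because the C and C++ prefix was mentioned earlier.

```python
def parse_literal(text):
    """Interpret a numeric literal written with a $ or 0x prefix, or an h or b suffix."""
    text = text.strip()
    if text.startswith("$"):              # $ prefix: hexadecimal
        return int(text[1:], 16)
    if text.startswith(("0x", "0X")):     # C-style prefix: hexadecimal
        return int(text[2:], 16)
    if text[-1:] in ("h", "H"):           # h suffix: hexadecimal
        return int(text[:-1], 16)
    if text[-1:] in ("b", "B"):           # b suffix: binary
        return int(text[:-1], 2)
    return int(text)                      # no adornment: decimal


for literal in ("100", "$100", "100h", "100b"):
    print(literal, "=", parse_literal(literal))
```

The prefix checks run before the suffix checks so that a value such as $1B is read as hexadecimal rather than as a binary number ending in b.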

Bits are numbered individually within a binary number, with bit zero as the rightmost, least significant bit. Bit numbers increase in magnitude leftward. Some examples should make this clear: In Table 1.1, the binary value 0001b (1 decimal) has bit number zero set and the remaining three bits are cleared. In 0010b (2 decimal), bit 1 is set and the other bits are cleared. In 0100b (4 decimal), bit 2 is set and the other bits are cleared.

Set versus cleared

A bit that is set has the value 1. A bit that is cleared has the value 0.
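Bit numbering maps naturally onto shift and mask operations. The sketch below, with the hypothetical helper names is_set and set_bits, reports which bit numbers are set in a value, counting from bit 0 at the least significant end.

```python
def is_set(value, bit_number):
    """Return True if the given bit (0 = least significant) has the value 1."""
    return (value >> bit_number) & 1 == 1


def set_bits(value):
    """Return the list of bit numbers that are set in a non-negative integer."""
    return [n for n in range(value.bit_length()) if is_set(value, n)]


print(set_bits(0b0001))  # [0]
print(set_bits(0b0010))  # [1]
print(set_bits(0b0100))  # [2]
```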

An 8-bit byte can take on values from $00 to $FF, equivalent to the decimal range 0-255. When performing addition at the byte level, it is possible for the result to exceed 8 bits. For example, adding $01 to $FF results in the value $100. An 8-bit register retains only the low byte, $00, and the ninth bit represents a carry, which must be handled appropriately.

In unsigned arithmetic, subtracting $01 from $00 wraps around to a result of $FF. Depending on the computation being performed, this may or may not be the desired result.
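Both effects, the carry out of an 8-bit addition and the wraparound of an 8-bit subtraction, can be reproduced by masking results to eight bits. The following Python sketch uses the assumed helper names add8 and sub8 and models only the arithmetic, not any particular processor.

```python
def add8(a, b):
    """Add two bytes; return the 8-bit result and whether a carry occurred."""
    total = a + b
    return total & 0xFF, total > 0xFF


def sub8(a, b):
    """Subtract b from a, keeping only the low 8 bits (unsigned wraparound)."""
    return (a - b) & 0xFF


result, carry = add8(0xFF, 0x01)
print(f"${result:02X} carry={carry}")   # $00 carry=True
print(f"${sub8(0x00, 0x01):02X}")       # $FF
```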

When desired, negative values can be represented using binary numbers. The most common signed number format in modern processors is two's complement. In two's complement, 8-bit signed numbers span the range from -128 to 127. The most significant bit of a two's complement data value is the sign bit: a 0 in this bit represents a positive value and a 1 represents a negative value. A two's complement number can be negated (multiplied by -1) by inverting all of the bits, adding 1, and ignoring any carry. Inverting a bit means changing a 0 bit to 1 and a 1 bit to 0.

Table 1.2: Negation operation examples

Note that negating zero returns a result of zero, as you would expect mathematically.
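The negation rule (invert all bits, add 1, ignore any carry) translates directly into code. This is a minimal sketch for 8-bit values; the function name negate8 is an assumption.

```python
def negate8(value):
    """Negate an 8-bit two's complement value: invert the bits, add 1, drop the carry."""
    inverted = value ^ 0xFF          # change every 0 bit to 1 and every 1 bit to 0
    return (inverted + 1) & 0xFF     # add 1 and ignore any carry out of bit 7


print(f"${negate8(0x01):02X}")  # $FF, the representation of -1
print(f"${negate8(0xFF):02X}")  # $01
print(f"${negate8(0x00):02X}")  # $00, negating zero gives zero
```

One edge case is worth noting: negating $80 (-128) yields $80 again, because +128 lies outside the 8-bit signed range.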

Two's complement arithmetic

Two's complement arithmetic is identical to unsigned arithmetic at the bit level. The manipulations involved in addition and subtraction are the same whether the input values are intended to be signed or unsigned. The interpretation of the result as signed or unsigned depends entirely on the intent of the user.

Table 1.3: Signed and unsigned 8-bit numbers
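The point that one bit pattern supports two interpretations can be illustrated with a short Python sketch. The helper name as_signed8 is hypothetical; it reads a byte as a two's complement value, while the raw byte itself is the unsigned reading.

```python
def as_signed8(value):
    """Interpret the low 8 bits of value as a two's complement signed number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


# The same 8-bit addition, viewed as unsigned and as signed.
a, b = 0xFF, 0x02
result = (a + b) & 0xFF
print(f"bits:     ${a:02X} + ${b:02X} = ${result:02X}")
print(f"unsigned: {a} + {b} wraps to {result}")
print(f"signed:   {as_signed8(a)} + {as_signed8(b)} = {as_signed8(result)}")
```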

Signed and unsigned representations of binary numbers extend to larger integer data types. 16-bit values can represent unsigned integers from 0 to 65,535 and signed integers in the range -32,768 to 32,767. 32-bit, 64-bit, and even larger integer data types are commonly available in modern programming languages.

The 6502 microprocessor

This section will introduce the architecture of a processor with a relatively simple design compared to more powerful modern processors. The intent here is to provide a whirlwind introduction to some basic concepts shared by processors spanning the spectrum from the very low end to sophisticated modern processors.

The 6502 processor was introduced by MOS Technology in 1975. The 6502 found widespread use in its early years in video game consoles from Atari and Nintendo and in computers marketed by Commodore and Apple. The 6502 continues in widespread use today in embedded systems, with estimates of between five and ten billion (yes, billion) units produced as of 2018. In popular culture, both Bender the robot in Futurama and the T-800 robot in The Terminator appear to have employed the 6502, based on onscreen evidence.

Many early microprocessors, like the 6502, were powered by a constant voltage of 5 volts (5V). In these circuits, a low signal level is any voltage between 0 and 0.8V. A high signal level is any voltage between 2 and 5V. The low signal level is defined as logical 0 and the high signal level is defined as logical 1. Chapter 2, Digital Logic, will delve further into digital electronics.

The word length of a processor defines the size of the fundamental data element the processor operates upon. The 6502 has a word length of 8 bits. This means the 6502 reads and writes memory 8 bits at a time and stores data internally in 8-bit wide registers.

Program memory and data memory share the same address space and the 6502 accesses its memory over a single bus. As was the case with the Intel 8088, the 6502 implements the von Neumann architecture. The 6502 has a 16-bit address bus, enabling access to 64 KB of memory.

One kilobyte (abbreviated KB) is defined as 2^10, or 1,024, bytes. The number of unique binary combinations of the 16 address lines is 2^16, equal to 64 multiplied by 1,024, or 65,536 locations. Note that just because a device can address 64 KB, it does not mean there must be memory at all of those locations. The Commodore VIC-20, based on the 6502, contained just 5 KB of Random Access Memory (RAM) and 20 KB of Read-Only Memory (ROM).
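The relationship between address lines and addressable memory is a single exponentiation, as the following sketch shows. The names KB and addressable_bytes are assumptions made for this example.

```python
KB = 2 ** 10  # one kilobyte is 1,024 bytes


def addressable_bytes(address_lines):
    """Return the number of distinct locations selectable with the given address lines."""
    return 2 ** address_lines


print(addressable_bytes(16))        # 65536
print(addressable_bytes(16) // KB)  # 64
```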

The 6502 contains internal storage areas called registers. A register is a location in a logical device in which a word of information can be stored and acted upon during computation. A typical processor contains a small number of registers for temporarily storing data values and performing operations such as addition or address computation.

Figure 1.1 shows the 6502 register structure. The processor contains five 8-bit registers (A, X, Y, P, and S) and one 16-bit register (PC). The numbers above each register indicate the bit numbers at each end of the register:

Figure 1.1: The 6502 register structure


Each of the A, X, and Y registers can serve as a general-purpose storage location. Program instructions can load a value into one of those registers and, some instructions later, use the saved value for some purpose, as long as the intervening instructions did not modify the register contents. The A register is the only register capable of performing arithmetic operations. The X and Y registers, but not the A register, can be used as index registers in calculating memory addresses.

The P register contains processor flags. Each bit in this register has a unique purpose, except for the bit labeled 1. The 1 bit is unused and can be ignored. Each of the remaining bits in this register is called a flag and indicates a specific condition that has occurred or represents a configuration setting. The 6502 flags are as follows:

N: Negative sign flag: This flag is set when bit 7 of an operation's result is set. This flag is used in signed arithmetic.
V: Overflow flag: This flag is set when a signed addition or subtraction results in overflow or underflow outside the range -128 to 127.
B: Break flag: This flag indicates a Break (BRK) instruction has executed. This bit is not present in the P register itself. The B flag value is only relevant when examining the P register contents as stored onto the stack by a BRK instruction or by an interrupt. The B flag is set to distinguish a software interrupt resulting from a BRK instruction from a hardware interrupt during interrupt processing.
D: Decimal mode flag: This flag indicates processor arithmetic will operate in Binary-Coded Decimal (BCD) mode. BCD mode is rarely used and won't be discussed here, other than to note that this base-10 computation mode evokes the architectures of the Analytical Engine and ENIAC.
I: Interrupt disable flag: This flag indicates that interrupt inputs (other than the non-maskable interrupt) will not be processed.
Z: Zero flag: This flag is set when an operation produces a result of zero.
C: Carry flag: This flag is set when an arithmetic operation produces a carry.

The N, V, Z, and C flags are the most important flags in the context of general computing involving loops, counting, and arithmetic.
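As an illustration of how software might inspect these flags, the sketch below decodes a P register byte into named flags. It assumes the standard 6502 bit layout (N in bit 7, V in bit 6, the unused bit in bit 5, then B, D, I, Z, and C down to bit 0), which is not spelled out in the text above; the names FLAG_BITS and decode_flags are hypothetical.

```python
# Assumed bit positions of each flag in the P register (bit 5 is unused).
FLAG_BITS = {"N": 7, "V": 6, "B": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def decode_flags(p):
    """Return a dict mapping each flag name to 0 or 1 for the given P register byte."""
    return {name: (p >> bit) & 1 for name, bit in FLAG_BITS.items()}


flags = decode_flags(0x81)      # bit 7 and bit 0 set
print(flags["N"], flags["C"])   # 1 1
```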

The S register is the stack pointer. In the 6502, the stack is the region of memory from addresses $100 to $1FF. This 256-byte range is used for temporary storage of parameters within subroutines and holds the return address when a subroutine is called. At system startup, the S register is initialized to point to the top of this range. Values are "pushed" onto the stack using instructions such as PHA, which pushes the contents of the A register onto the stack. When a value is pushed onto the stack, the 6502 stores the value at the address indicated by the S register, after adding the fixed $100 offset, then decrements the S register. Additional values can be placed on the stack by executing more push instructions. As additional values are pushed, the stack grows downward in memory. Programs must take care not to exceed the fixed 256-byte size of the stack when pushing data onto it.

Data stored on the stack must be retrieved in the reverse of the order from which it was pushed onto the stack. The stack is a Last-In, First-Out (LIFO) data structure, meaning when you "pop" a value from the stack, it is the byte most recently pushed onto it. The PLA instruction increments the S register by one, then copies the value at the address indicated by the S register (plus the $100 offset) into the A register.
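The push and pull behavior described above can be modeled in a few lines. This is a simplified sketch rather than an emulator: the class name Stack6502 and its methods are assumptions, S starts at $FF as described, and S wraps within its 8-bit range.

```python
class Stack6502:
    """Minimal model of the 6502 stack in the $100-$1FF region of memory."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # the full 64 KB address space
        self.s = 0xFF                     # S points to the top of the stack region

    def push(self, value):
        """Model PHA: store at $100 + S, then decrement S."""
        self.memory[0x100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self):
        """Model PLA: increment S, then read from $100 + S."""
        self.s = (self.s + 1) & 0xFF
        return self.memory[0x100 + self.s]


stack = Stack6502()
stack.push(0x11)
stack.push(0x22)
print(f"${stack.pull():02X}")  # $22, the most recently pushed byte
print(f"${stack.pull():02X}")  # $11
```

Because S is masked to 8 bits, pushing more than 256 bytes in this model silently overwrites the oldest entries, which mirrors the warning about exceeding the fixed stack size.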

The PC register is the program counter. This register contains the memory address of the next instruction to be executed. Unlike the other registers, the PC is 16 bits long, allowing access to the entire 6502 address space. Each instruction consists of a 1-byte operation code, called opcode for short, followed by zero, one, or two operand bytes, depending on the instruction. After each instruction executes, the PC updates to point to the instruction that follows it in memory. In addition to these automatic updates during sequential instruction execution, the PC can be modified by jump instructions, branch instructions, and subroutine call and return instructions.
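The sequential PC update can be sketched by walking a byte sequence and advancing by each instruction's length. The opcode values used here ($18 for CLC, $38 for SEC, $A9 for LDA immediate, $69 for ADC immediate, $E9 for SBC immediate) are the standard 6502 encodings and do not appear in the text above; the start address $0600 and the names INSTRUCTION_LENGTH and instruction_addresses are assumptions for illustration.

```python
# Instruction length in bytes (opcode plus operands) for a handful of opcodes.
INSTRUCTION_LENGTH = {
    0x18: 1,  # CLC
    0x38: 1,  # SEC
    0xA9: 2,  # LDA #immediate
    0x69: 2,  # ADC #immediate
    0xE9: 2,  # SBC #immediate
}


def instruction_addresses(program, start):
    """Return the PC value at the beginning of each instruction in program."""
    pc = start
    end = start + len(program)
    addresses = []
    while pc < end:
        addresses.append(pc)
        opcode = program[pc - start]
        pc += INSTRUCTION_LENGTH[opcode]  # advance past the opcode and its operands
    return addresses


# CLC; LDA #$01; ADC #$02
program = bytes([0x18, 0xA9, 0x01, 0x69, 0x02])
print([f"${address:04X}" for address in instruction_addresses(program, 0x0600)])
# ['$0600', '$0601', '$0603']
```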

The 6502 instruction set

Each of the 6502 instructions has a three-character mnemonic. In assembly language source files, each line of code contains an instruction mnemonic followed by any operands associated with the instruction. The combination of the mnemonic and the operands defines the addressing mode. The 6502 supports several addressing modes providing a great deal of flexibility in accessing data in registers and memory. For this introduction, we'll only work with the immediate addressing mode, in which the operand itself contains a value rather than indicating a register or memory location containing the value. An immediate value is preceded by a # character.

In 6502 assembly, decimal numbers have no adornment (48 means 48 decimal) while hexadecimal values are preceded by a $ character ($30 means 30 hexadecimal, equivalent to 00110000b and to 48 decimal). An immediate decimal value looks like #48 and an immediate hexadecimal value looks like #$30.

Some assembly code examples will demonstrate the 6502 arithmetic capabilities. Five 6502 instructions are used in the following examples:

LDA loads register A with a value.
ADC performs addition using the Carry (C flag) as an additional input and output.
SBC performs subtraction using the Carry flag as an additional input and output.
SEC sets the Carry flag directly.
CLC clears the Carry flag directly.

Since the Carry flag is an input to the addition and subtraction instructions, it is important to ensure it has the correct value prior to executing the ADC or SBC instructions. Before initiating an addition operation, the C flag must be clear to indicate there is no carry from a prior addition. When performing multi-byte additions (say, with 16-bit, 32-bit, or 64-bit numbers), the carry, if any, will propagate from the sum of one byte pair to the next as you add the more significant bytes to each other. If the C flag is set when the ADC instruction executes, the effect is to add one to the result. After the ADC completes, the C flag serves as the ninth bit of the result: a C flag result of 0 means there was no carry, and a 1 indicates there was a carry from the 8-bit register.
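The role of the C flag as both an input and a ninth result bit can be modeled in Python. This sketch covers binary mode only (the D flag clear); the function name adc and the dictionary of flags are assumptions for this example, and the V computation uses the common rule that overflow occurs when both inputs share a sign that differs from the sign of the result.

```python
def adc(a, operand, carry_in):
    """Model 6502 ADC in binary mode; return the new A value and the N, V, Z, C flags."""
    total = a + operand + carry_in
    result = total & 0xFF
    flags = {
        "N": (result >> 7) & 1,
        # Overflow: inputs have the same sign bit, and the result's sign bit differs.
        "V": 1 if (~(a ^ operand) & (a ^ result) & 0x80) else 0,
        "Z": 1 if result == 0 else 0,
        "C": 1 if total > 0xFF else 0,   # ninth bit of the sum
    }
    return result, flags


# 16-bit addition $01FF + $0001, one byte at a time, low byte first.
low, flags = adc(0xFF, 0x01, 0)          # C cleared before the first ADC
high, _ = adc(0x01, 0x00, flags["C"])    # carry propagates into the high byte
print(f"${high:02X}{low:02X}")           # $0200
```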

Subtraction using the SBC instruction tends to be a bit more confusing to novice 6502 assembly language programmers. Schoolchildren learning subtraction use the technique of borrowing when subtracting a larger digit from a smaller digit. In the 6502, the C flag represents the opposite of Borrow. If C is 1, then Borrow is 0, and if C is 0, Borrow is 1. Performing a simple subtraction with no incoming Borrow requires setting the C flag before executing the SBC instruction.
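The inverted-borrow convention is easier to follow in code. The following sketch models SBC in binary mode only; the function name sbc and the flag dictionary are assumptions, and V is computed with the common rule that overflow occurs when the inputs have different signs and the result's sign differs from that of A.

```python
def sbc(a, operand, carry_in):
    """Model 6502 SBC in binary mode; return the new A value and the N, V, Z, C flags."""
    borrow = 1 - carry_in                 # C is the opposite of Borrow
    difference = a - operand - borrow
    result = difference & 0xFF
    flags = {
        "N": (result >> 7) & 1,
        # Overflow: A and the operand differ in sign, and the result's sign differs from A's.
        "V": 1 if ((a ^ operand) & (a ^ result) & 0x80) else 0,
        "Z": 1 if result == 0 else 0,
        "C": 1 if difference >= 0 else 0,  # C stays set when no borrow was needed
    }
    return result, flags


result, flags = sbc(0x00, 0x01, 1)        # C set first: no incoming Borrow
print(f"${result:02X} C={flags['C']}")    # $FF C=0, a borrow occurred
```

Calling sbc(0x03, 0x01, 0) returns a result of 1 rather than 2, which shows the effect of forgetting to set the C flag before a simple subtraction.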

The examples in the following table employ the 6502 as a calculator, using inputs defined directly in the code and storing the result in the A register. The Results columns show the final value of the A register and the N, V, Z, and C flags.

Table 1.4: 6502 arithmetic instruction sequences

If you don't happen to have a 6502-based computer with an assembler and debugger handy, there are several free 6502 emulators available online that you can run in your web browser. One excellent emulator is at https://skilldrick.github.io/easy6502/. Visit the website and scroll down until you find a default code listing with buttons for assembling and running 6502 code. Replace the default code listing with a group of three instructions from Table 1.4, then assemble the code. To examine the effect of each instruction in the sequence, use the debugger controls to single-step through the instructions and observe the result of each instruction on the processor registers.

This section has provided a very brief introduction to the 6502 processor and a small subset of its capabilities. One point of this analysis was to illustrate the challenge of dealing simply with the issue of carries when performing addition and borrows when doing subtraction. From Charles Babbage to the designers of the 6502, computer architects have developed solutions to the problems of computation and implemented them using the best technology available to them.