Book Image

Architecting and Building High-Speed SoCs

By : Mounir Maaref
5 (1)
Book Image

Architecting and Building High-Speed SoCs

5 (1)
By: Mounir Maaref

Overview of this book

Modern and complex SoCs can adapt to many demanding system requirements by combining the processing power of ARM processors and the feature-rich Xilinx FPGAs. You’ll need to understand many protocols, use a variety of internal and external interfaces, pinpoint the bottlenecks, and define the architecture of an SoC in an FPGA to produce a superior solution in a timely and cost-efficient manner. This book adopts a practical approach to helping you master both the hardware and software design flows, understand key interconnects and interfaces, analyze the system performance and enhance it using the acceleration techniques, and finally build an RTOS-based software application for an advanced SoC design. You’ll start with an introduction to the FPGA SoCs technology fundamentals and their associated development design tools. Gradually, the book will guide you through building the SoC hardware and software, starting from the architecture definition to testing on a demo board or a virtual platform. The level of complexity evolves as the book progresses and covers advanced applications such as communications, security, and coherent hardware acceleration. By the end of this book, you'll have learned the concepts underlying FPGA SoCs’ advanced features and you’ll have constructed a high-speed SoC targeting a high-end FPGA from the ground up.
Table of Contents (20 chapters)
1
Part 1: Fundamentals and the Main Features of High-Speed SoC and FPGA Designs
7
Part 2: Implementing High-Speed SoC Designs in an FPGA
12
Part 3: Implementation and Integration of Advanced High-Speed FPGA SoCs

Xilinx FPGA devices overview

An FPGA is a very large-scale integration (VLSI) integrated circuit (IC) that can contain hundreds of thousands of configurable logic blocks (CLBs), tens of thousands of predefined hardware functional blocks, hundreds of predefined external interfaces, thousands of memory blocks, thousands of input/output (I/O) pads, and even a fully predefined SoC centered around an IBM PowerPC or an ARM Cortex-A class processor in certain FPGA families. These functional elements are optimally spread around the FPGA silicon area and can be interconnected via programmable routing resources. This allows them to behave in a functional manner that’s desired by a logic designer so that they can meet certain design specifications and product requirements.

Application-specific integrated circuits (ASICs) and application-specific standard products (ASSPs) are VLSI devices that have been architected, designed, and implemented for a given product or a particular application domain. In contrast to ASICs and ASSPs, FPGA devices are generic ICs that can be programmed to be used in many applications and industries. FPGAs are usually reprogrammable as they are based on static random-access memory (SRAM) technology, but there is a type that is only programmed once: one-time programmable (OTP) FPGAs. Standard SRAM-based FPGAs can be reprogrammed as their design evolves or changes, even once they have been populated in the electronics design board and after being deployed in the field. The following diagram illustrates the concept of an FPGA IC:

Figure 1.1 – FPGA IC conceptual diagram

Figure 1.1 – FPGA IC conceptual diagram

As we can see, the FPGA device is structured as a pool of resources that the design assembles to perform a given logical task.

Once the FPGA’s design has been finalized, a corresponding configuration binary file is generated to program the FPGA device. This is typically done directly from the host machine at development and verification time over JTAG. Alternatively, the configuration file can be stored in a non-volatile media on the electronics board and used to program the FPGA at powerup.

A brief historical overview

Xilinx shipped its first FPGA in 1985 and its first device was the XC2064; it offered 800 gates and was produced on a 2.0μ process. The Virtex UltraScale+ FPGAs, some of the latest Xilinx devices, are produced in a 14nm process node and offer high performance and a dense integration capability. Some modern FPGAs use 3D ICs stacked silicon interconnect (SSI) technology to work around the limitations of Moore’s law and pack multiple dies within the same package. Consequently, they now provide an immense 9 million system logic cells in a single FPGA device, a four order of magnitude increase in capacity alone compared to the first FPGA; that is, XC2064. Modern FPGAs have also evolved in terms of their functionality, higher external interface bandwidth, and a vast choice of supported I/O standards. Since their initial inception, the industry has seen a multitude of quantitative and qualitative advances in FPGA devices’ performance, density, and integrated functionalities. Also, the adoption of the technology has seen a major evolution, which has been aided by adequate pricing and Moore’s law advancements. These breakthroughs, combined with matching advances in software development tools, intellectual property (IP), and support technologies, have created a revolution in logic design that has also penetrated the SoC segment.

There has also been the emergence of the new Xilinx Versal devices portfolio, which targets the data center’s workload acceleration and offers a new AI-oriented architecture. This device class family is outside the scope of this book.

FPGA devices and penetrated vertical markets

FPGAs were initially used as the electronics board glue logic of digital devices. They were used to implement buses, decode functions, and patch minor issues discovered in the board ASICs post-production. This was due to their limited capacities and functionalities. Today’s FPGAs can be used as the hearts of smart systems and are designed with their full capacities in terms of parallel processing and their flexible adaptability to emerging and changing standards, specifically at the higher layers, such as the Link and Transactions layers of new communication or interface protocols. These make reconfiguring FPGA the obvious choice in medium or even large deployments of these emerging systems. With the addition of ASIC class embedded processing platforms within the FPGA for integrating a full SoC, FPGA applications have expanded even deeper into industry verticals where it has seen limited useability in the past. It is also very clear that, with the prohibitive cost of non-recurring engineering (NRE) and producing ASICs at the current process nodes, FPGAs are becoming the first choice for certain applications. They also offer a very short time to market for certain segments where such a factor is critical for the product’s success.

FPGAs can be found across the board in the high-tech sector and range from the classical fields such as wired and wireless communication, networking, defense, aerospace, industrial, audio-video broadcast (AVB), ASIC prototyping, instrumentation, and medical verticals to the modern era of ADAS, data centers, the cloud and edge computing, high-performance computing (HPC), and ASIC emulation simulators. They have an appealing reason to be used almost everywhere in an electronics-based application.

An overview of the Xilinx FPGA device families

Xilinx provides a comprehensive portfolio of FPGA devices to address different system design requirements across a wide range of the application’s spectrum. For example, Xilinx FPGA devices can help system designers construct a base platform for a high-performance networking application necessitating a very dense logic capacity, a very wide bandwidth, and performance. They can also be used for low-cost, small-footprint logic design applications using one of the low-cost FPGA devices either for high or low-volume end applications.

In this large offering, there are the cost-optimized families such as the Spartan-7 family and the Spartan-6 family, which are built using a 45nm process node, the Artix-7 family, and the Zynq-7000 family, which is built using a 28nm process node.

There is also the 7-series family in a 28nm process, which includes the Artix-7, Kintex-7, and Virtex-7 families of FPGAs, in addition to the Spartan-7 family.

Additionally, there are FPGAs from the UltraScale Kintex and Virtex families in a 20nm process node.

The UltraScale+ category contains three more additional families – the Artix UltraScale+, the Kintex UltraScale+, and the Virtex UltraScale+, all in a 16nm process node.

Each device family has a matrixial offering table that is defined by the density of logic, the number of functional hardware blocks, the capacity of the internal memory blocks, and the amount of I/Os in each package. This makes the offered combinations an interesting catalog to pick a device that meets the requirements of the system to build using the specific FPGA. To examine a given device offering matrix, you need to consult the specific FPGA family product table and product selection guide. For example, for the UltraScale+ FPGAs, please go to https://www.xilinx.com/content/dam/xilinx/support/documentation/selection-guides/ultrascale-plus-fpga-product-selection-guide.pdf.

An overview of the Xilinx FPGA devices features

As highlighted in the introduction to this chapter, modern Xilinx FPGA devices contain a vast list of hardware block features and external interfaces that relatively define their category or family and, consequently, make them suitable for a certain application or a specific market vertical. This chapter looks at the rich list of these features to help you understand what today’s FPGAs are capable of offering system designers. It is worth noting that not all the FPGAs contain all these elements.

For a detailed overview of these features, you are encouraged to examine the Xilinx UltraScale+ Data Sheet as a good starting point at https://www.xilinx.com/content/dam/xilinx/support/documentation/data_sheets/ds890-ultrascale-overview.pdf.

In the following subsections, we will summarize some of these features.

Logic elements

Modern Xilinx FPGAs have an abundance of CLBs. These CLBs are formed by lookup tables (LUTs) and registers known as flip-flops. These CLBs are the elementary ingredients that logic user functions are built from to form the desired engine to perform a combinatorial function that’s coupled (or not) with sequential logic. These are also built from Flip-Flop resources contained within the CLBs. Following a full design process from design capture, to synthesizing and implementing the production of a binary image to program the FPGA device, these CLBs are configured to operate in a manner that matches the aforementioned required partial task within the desired function defined by the user. The CLB can also be configured to behave as a deep shift register, a multiplexer, or a carry logic function. It can also be configured as distributed memory from which more SRAM memory is synthesized to complement the SRAM resources that can be built using the FPGA device block’s RAM.

Storage

Xilinx FPGAs have many block RAMs with built-in FIFO. Additionally, in UltraScale+ devices, there are 4Kx72 UltraRAM blocks. As mentioned previously, the CLB can also be configured as distributed memory from which more SRAM memory can be synthesized.

The Virtex UltraScale+ HBM FPGAs can integrate up to 16 GB of high-bandwidth memory (HBM) Gen2.

Xilinx Zynq UltraScale+ MPSoC also provides many layers of SRAM memory within its ARM-based SoC, such as OCM memory and the Level 1 and Level 2 caches of the integrated CPUs and GPUs.

Signal processing

Xilinx FPGAs are rich in resources for digital signal processing (DSP). They have DSP slices with 27x18 multipliers and rich local interconnects. The DSP slice has many usage possibilities, as described in the FPGA datasheet.

Routing and SSI

The Xilinx FPGA’s device interconnect employs a routing infrastructure, which is a combination of configurable switches and nets. These allow the FPGA elements such as the I/O blocks, the DSP slices, the memories, and the CLBs to be interconnected.

The efficiency of using these routing resources is as important as the device hardware’s logical resources and features. This is because they represent the nerve system of the FPGA device, their abundance of interconnect logic, and their functional elements, which are crucial to meeting the design performance criteria.

Design clocking

Xilinx FPGA devices contain many clock management elements, including digital local loops (DLLs) for clock generation and synthesis, global buffers for clock signal buffering, and routing infrastructure to meet the demands of many challenging design requirements. The flexibility of the clocking network minimizes the inter-signal delays or skews.

External memory interfaces

The Xilinx FPGAs can interface to many external parallel memories, including DDR4 SDRAM. Some FPGAs also support interfacing to external serial memories, such as Hybrid Memory Cube (HMC).

External interfaces

Xilinx FPGA devices interface to the external ICs through I/Os that support many standards and PHY protocols, including the serial multi-gigabit transceivers (MGTs), Ethernet, PCIe, and Interlaken.

ARM-based processing subsystem

The first device family that Xilinx brought to the market that integrated an ARM CPU was the Zynq-7000 SoC FPGA with its integrated ARM Cortex-A9 CPU. This family was followed by the Xilinx Zynq UltraScale+ MPSoCs and RFSoCs, which feature a processing system (PS) that includes a dual or a quad-core variant of the ARM Cortex-A53, and a dual-core ARM Cortex-R5F. Some variants have a graphics processing unit (GPU). We will delve into the Xilinx SoCs in the next chapter.

Configuration and system monitoring

Being SRAM-based, the FPGA requires a configuration file to be loaded when powered up to define its functionality. Consequently, any errors that are encountered in the FPGA’s configuration binary image, either at configuration time or because of a physical problem in mission mode, will alter the overall system functionality and may even cause a disastrous outcome for sensitive applications. Therefore, it is a necessity for critical applications to have system monitoring to urgently intervene when such an error is discovered to correct it and limit any potential damage via its built-in self-monitoring mechanism.

Encryption

Modern FPGAs provide decryption blocks to address security needs and protect the device’s hardware from hacking. FPGAs with integrated SoC and PS blocks have a configuration and security unit (CSU) that allows the device to be booted and configured safely.