The Art of Writing Efficient Programs

By: Fedor G. Pikus
Overview of this book

The great free lunch of "performance taking care of itself" is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that no longer happens. The clock frequency of new processors has almost peaked, and new architectures provide only modest improvements to existing programs. To write efficient software, you now have to know how to make good use of the available computing resources, and this book will teach you how to do that. The Art of Writing Efficient Programs covers all the major aspects of writing efficient programs, such as using CPU resources and memory efficiently, avoiding unnecessary computations, measuring performance, and putting concurrency and multithreading to good use. You'll also learn about compiler optimizations and how to use the programming language (C++) more efficiently. Finally, you'll understand how design decisions impact performance. By the end of this book, you'll not only have enough knowledge of processors and compilers to write efficient programs, but you'll also be able to understand which techniques to use and what to measure while improving performance. At its core, this book is about learning how to learn.
Table of Contents (18 chapters)

Section 1 – Performance Fundamentals
Section 2 – Advanced Concurrency
Section 3 – Designing and Coding High-Performance Programs

Why focus on performance?

In the early days of computing, programming was hard. The processors were slow, the memory was limited, the compilers were primitive, and nothing could be achieved without a major effort. The programmer had to know the architecture of the CPU, the layout of the memory, and when the compiler did not cut it, the critical code had to be written in assembler.

Then things got better. The processors were getting faster every year, the number that used to be the capacity of a huge hard drive became the size of the main memory in an average PC, and the compiler writers learned a few tricks to make programs faster. The programmers could spend more time actually solving problems. This was reflected in the programming languages and design styles: between the higher-level languages and evolving design and programming practices, the programmers' focus shifted from what they wanted to say in code to how they wanted to say it.

Formerly common knowledge, such as exactly how many registers the CPU has and what their names are, became esoteric, arcane matter. A "large code base" used to be one that needed both hands to lift the card deck; now, it was one that taxed the capacity of the version control system. There was hardly ever a need to write code specialized for a particular processor or a memory system, and portable code became the norm.

As for assembler, it was actually difficult to outperform the compiler-generated code, a task well out of reach for most programmers. For many applications, and those writing them, there was "enough performance," and other aspects of the programmers' trade became more important (to be clear, the fact that the programmers could focus on the readability of their code without worrying whether adding a function with a meaningful name would make the program unacceptably slow was a good thing).

Then, and rather suddenly, the free lunch of "performance taking care of itself" was over. The seemingly unstoppable progress of the ever-growing computing power just … stopped.

Figure 1.1 – Charting 35 years of microprocessor evolution 
(Refer to https://github.com/karlrupp/microprocessor-trend-data and https://github.com/karlrupp/microprocessor-trend-data/blob/master/LICENSE.txt)


Around the year 2005, the computing power of a single CPU reached saturation. To a large extent, this was directly related to the CPU frequency, which also stopped growing. The frequency, in turn, was limited by several factors, one of which was power consumption (had the frequency trend continued unchanged, today's CPUs would dissipate more power per square millimeter than the engines that lift rockets into space).

It is evident from the preceding figure that not every measure of progress stalled in 2005: the number of transistors packed into a single chip kept growing. So, what were all those transistors doing if not making chips faster? The answer is twofold, and part of it is revealed by the bottom curve: instead of making the single processor larger, the designers had to settle for putting several processor cores on the same die. The combined computing power of all these cores does increase with the number of cores, but only if the programmer knows how to use them. The second part of the "great transistor mystery" (where do all the transistors go?) is that they went into various advanced enhancements of the processor's capabilities, enhancements that can improve performance, but, again, only if the programmer makes an effort to use them.
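To make the first point concrete, here is a minimal sketch (an illustration for this discussion, not code from the book, assuming a C++11-or-later compiler with standard thread support) of summing a large array by explicitly dividing the work among as many threads as the machine reports; the helper name partial_sum_range and the array size are arbitrary choices. The extra cores contribute nothing until the programmer splits the loop among them like this; the same loop run on a single thread gains nothing from the cores sitting idle.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Sum one slice of the array; each thread writes only its own result slot.
void partial_sum_range(const std::vector<std::uint64_t>& data,
                       std::size_t begin, std::size_t end,
                       std::uint64_t& result) {
    result = std::accumulate(data.begin() + begin, data.begin() + end,
                             std::uint64_t{0});
}

int main() {
    const std::size_t n = std::size_t{1} << 26;      // ~67 million elements
    std::vector<std::uint64_t> data(n, 1);

    // Use one thread per hardware core (fall back to 1 if unknown).
    const unsigned num_threads =
        std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t != num_threads; ++t) {
        const std::size_t begin = n / num_threads * t;
        const std::size_t end =
            (t + 1 == num_threads) ? n : n / num_threads * (t + 1);
        workers.emplace_back(partial_sum_range, std::cref(data),
                             begin, end, std::ref(partial[t]));
    }
    for (auto& w : workers) w.join();   // wait for every slice to finish
    const std::uint64_t total =
        std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
    const auto stop = std::chrono::steady_clock::now();

    std::cout << "sum=" << total << " using " << num_threads << " threads in "
              << std::chrono::duration<double>(stop - start).count() << " s\n";
}

(On GCC or Clang, compile with a thread-aware flag such as -pthread.)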

The change in the progress of processors that we have just seen is often cited as the reason that concurrent programming entered the mainstream. But the change was even more profound than that. You will learn throughout this book how, in order to obtain the best performance, the programmer once again needs to understand the intricacies of the processor and memory architecture and their interactions. Great performance doesn't "just happen" anymore. At the same time, the progress we have made in writing code that clearly expresses what needs to be done, rather than how it is done, is not to be rolled back. We still want to write readable and maintainable code, and (that is an and, not a but) we want it to be efficient as well.

To be sure, for many applications there is still enough performance in modern CPUs. But performance is getting more attention than it used to, partly because of the change in CPU development we just discussed, and partly because we want to do more computing in more applications, including applications that do not necessarily have access to the best computing resources (for example, a portable medical device today may run a full neural network).

Fortunately, we do not have to rediscover some lost art of performance by digging through piles of decaying punch cards in a dark storage room. There have always been hard problems, and the phrase "there is never enough computing power" has always been true for many programmers. As computing power grew exponentially, so did the demands on it. The art of extreme performance was kept alive in the few domains that needed it. An example of one such domain may be instructive and inspiring at this point.