Book Image

Hands-On High Performance Programming with Qt 5

By : Marek Krajewski
5 (1)
Book Image

Hands-On High Performance Programming with Qt 5

5 (1)
By: Marek Krajewski

Overview of this book

Achieving efficient code through performance tuning is one of the key challenges faced by many programmers. This book looks at Qt programming from a performance perspective. You'll explore the performance problems encountered when using the Qt framework and means and ways to resolve them and optimize performance. The book highlights performance improvements and new features released in Qt 5.9, Qt 5.11, and 5.12 (LTE). You'll master general computer performance best practices and tools, which can help you identify the reasons behind low performance, and the most common performance pitfalls experienced when using the Qt framework. In the following chapters, you’ll explore multithreading and asynchronous programming with C++ and Qt and learn the importance and efficient use of data structures. You'll also get the opportunity to work through techniques such as memory management and design guidelines, which are essential to improve application performance. Comprehensive sections that cover all these concepts will prepare you for gaining hands-on experience of some of Qt's most exciting application fields - the mobile and embedded development domains. By the end of this book, you'll be ready to build Qt applications that are more efficient, concurrent, and performance-oriented in nature
Table of Contents (14 chapters)

Traditional wisdom and basic guidelines

When I started with programming (a long time ago), the pieces of advice about performance optimization traditionally given to a newbie were the following:

  • Don't do it (yet)
  • Premature optimization is the root of all evil
  • First make it run, then make it right, then make it fast

The first advice contained the yet only in its variant for the experts; the second was (and still is) normally misquoted, leaving out the "in say 97% of the cases" part, and the third quote gives you the impression that merely writing a program is already so difficult that fretting about performance is a luxury. It's no wonder then that the normal approach to performance was to fix it later!

But all of the adages nonetheless highlight an important insight—performance isn't distributed evenly through your code. The 80-20, or maybe even the 90-10 rule, applies here, because there are some hotspots where extreme care is needed, but we shouldn't try to optimize every nook and cranny in our code. So, our first guideline will be premature optimization—we should forget about it in, say, 95% of cases.

But what exactly are the 20%, 10%, or 5% of code where we shouldn't forget about it? Another old-age programming wisdom states this—programmers are notoriously bad at guessing performance bottlenecks.

So, we shouldn't try to predict the tight spot and measure the performance of a ready program instead. This does sound a lot like the fix it later cowboy coder's approach. Well, this book takes the stance that though premature optimization should be avoided, nonetheless, premature pessimization should be avoided at all costs, as it's even worse! However, avoiding premature pessimizations requires much detailed knowledge about which language constructs, which framework use cases, and which architectural decisions come with what kind of performance price tags. This book will try to provide this knowledge in the context of the Qt framework.

But, first, let's talk about quite general principles that address the question of what should be avoided, lest the performance degrades. As I see it, we can distill from the traditional performance wisdom from the following basic common-sense advice:

  • Don't do the same thing twice.
  • Don't do slow things often.
  • Don't copy data unnecessarily.

You'll agree that all that can't be good for performance? So, let's discuss these three simple but fundamental insights in some more detail.

Avoiding repeated computation

The techniques falling under the first point are concerned with unneeded repetition of work. The basic counter measure here is caching, that is, saving the results of computation for later use. A more extreme example of avoiding repletion of work is to precompute results even before their first usage. This is normally achieved by hand-coded (or generated by a script) precomputed tables or, if your programming language allows that, with compile-time computation. In the latter case, we sacrifice compilation times for better run-time performance. We'll have a look at C++ compile time techniques in Chapter 3, Deep Dive into C++ and Performance.

Choosing the optimal algorithm and data structure also falls into that realm, as different algorithms and data structures are optimized for different use cases, and you have to make your choice wisely. We'll have a look at some gotchas pertaining Qt's own data structures in Chapter 4, Using Data Structures and Algorithms Efficiently.

The very basic techniques such as pulling code out of a loop, such as the repeated computations or initializations of local variables, fall into that class as well, but I'm convinced you knew about this already.

Avoiding paying the high price

The techniques falling under the second point come into play if there's something we can't avoid doing, but it has a pretty high cost tagged on to it. An example of this is interaction with the operating system or hardware, such as writing data to a file or sending a packet over the network. In this case, we resort to batching, also known in I/O context as buffering—instead of writing or sending a couple of small chunks of data right away, we first gather them and then write or send them together to avoid paying the high cost each time.

On the other hand, we can apply techniques of this type too. In I/O or memory context, this would be the prefetching of data, also known as read-ahead. When reading data from a file, we read more than the user actually requested, hoping that the next portion of data will be needed soon. In the networking context, there are examples of speculative pre-resolving of Domain Name System (DNS) addresses when a user is hovering over a link in browsers or even pre-connecting to such addresses. However, such measures can turn into its counterpart when the prediction fails, and such techniques require very careful tuning!

Related techniques to be mentioned in this context are also avoidance of system calls and avoidance of locking to spare the costs of system call and switching to the kernel context.

We'll see some applications of such techniques in last chapters of the book when we discuss I/O, graphics , and networking.

Another example of when this rule can be used is memory management. General-purpose memory allocators tend to incur rather high costs on single allocations, so the remedy is to preallocate one big buffer at first and then use it for all needs of the program by managing it by ourselves using a custom allocation strategy. If we additionally know how big our objects are going to be, we can just allocate several buffer pools for different object sizes, making the custom allocation strategy rather simple. Preallocating memory at the start used to be a classic measure to improve the performance of memory intensive programs. We'll discuss these technical C++ details in Chapter 3, Deep Dive into C++ and Performance.

Avoiding copying data around

The techniques falling under the third point tend to be somehow of a lower-level nature. The first example is avoiding copying data when passing parameters to a function call. A suitable choice of data structure will avoid copying of data as well—just think about an automatically growing vector. In many cases, we can use preallocation techniques to prevent this (such as the reserve() method of std::vector) or choose a different data structure that will better match the intended use case.

Another common case when the copying of data can be a problem is string processing. Just adding two strings together will, in the naive implementation, allocate a new one and copy the contents of the two strings to be joined. And as much of programming contains some string manipulations, this can be a big problem indeed! The remedy for that could be using static string literals or just choosing a better library implementation for strings.

We'll discuss these themes in Chapter 3, Deep Dive into C++ and Performance, and Chapter 4, Using Data Structures and Algorithms Efficiently.

Another example of this optimization rule is the holy grail of network programming—the zero-copy sending and receiving of data. The idea is that data isn't copied between user buffers and network stack before sending it out. Most modern network hardware supports scatter-gather (also known as vectored I/O), where the data to be sent doesn't have to be provided in a single contiguous buffer but can be made available as a series of separate buffers.

In that way, a user's data doesn't have to be consolidated before sending, sparing us copying of data. The same principle can be applied to software APIs as well; for example, Facebook's recent TSL 1.3 implementation (codename Fizz, open sourced) supports scatter-gather API on library level!

General performance optimization approach

Up to now, we listed the following classic optimization techniques:

  • Optimal algorithms
  • Optimal data structures
  • Caching
  • Precomputed tables
  • Preallocation and custom allocators
  • Buffering and batching
  • Read-ahead
  • Copy avoidance
  • Finding a better library

With our current stand of knowledge, we can formulate the following general-performance optimization procedure:

  1. Write your code, avoiding unnecessary pessimizations where it doesn't cost much, as in the following examples:
    • Pass parameters by reference.
    • Use reasonably good, widely known algorithms and data structures.
    • Avoid copying data and unnecessary allocations.

This alone should give you a pretty decent baseline performance.

  1. Measure the performance, find the tight spots, and use some of the standard techniques listed. Then, measure again and iterate. This step must be done if the performance of our program isn't satisfactory despite our sound programming practices. Unfortunately, we can't know or anticipate everything that will happen in the complex interplay of hardware and software—there can always be surprises waiting for us.
  2. If you still can't achieve good performance, then your hardware is probably too slow. Even with performance optimization techniques, we still can't do magic, sorry!

The preceding advice looks quite reasonable, and you might ask: Are we done? That wasn't that scary! Unfortunately, it's not the whole story. Enter the leaky abstraction of modern processor architectures.