Book Image

C++ High Performance - Second Edition

By : Björn Andrist, Viktor Sehr
5 (2)
Book Image

C++ High Performance - Second Edition

5 (2)
By: Björn Andrist, Viktor Sehr

Overview of this book

C++ High Performance, Second Edition guides you through optimizing the performance of your C++ apps. This allows them to run faster and consume fewer resources on the device they're running on without compromising the readability of your codebase. The book begins by introducing the C++ language and some of its modern concepts in brief. Once you are familiar with the fundamentals, you will be ready to measure, identify, and eradicate bottlenecks in your C++ codebase. By following this process, you will gradually improve your style of writing code. The book then explores data structure optimization, memory management, and how it can be used efficiently concerning CPU caches. After laying the foundation, the book trains you to leverage algorithms, ranges, and containers from the standard library to achieve faster execution, write readable code, and use customized iterators. It provides hands-on examples of C++ metaprogramming, coroutines, reflection to reduce boilerplate code, proxy objects to perform optimizations under the hood, concurrent programming, and lock-free data structures. The book concludes with an overview of parallel algorithms. By the end of this book, you will have the ability to use every tool as needed to boost the efficiency of your C++ projects.
Table of Contents (17 chapters)
15
Other Books You May Enjoy
16
Index

Function objects and lambda expressions

Lambda expressions, introduced in C++11, and further enhanced with every C++ version since, are one of the most useful features in modern C++. Their versatility comes not only from easily passing functions to algorithms, but also their use in a lot of circumstances where you need to pass the code around, especially as you can store a lambda in a std::function.

Although lambdas made these programming techniques vastly simpler to work with, everything mentioned in this section is possible to perform without them. A lambda—or, more formally, a lambda expression—is a convenient way of constructing a function object. But instead of using lambda expressions, we could instead implement classes with operator() overloaded, and then instantiate these to create function objects.

We will explore the lambda's similarities to these kinds of classes later, but first I will introduce lambda expressions in a simple use case.

The basic syntax of a C++ lambda

In a nutshell, lambdas enable programmers to pass functions to other functions, just as easily as a variable is passed.

Let's compare passing a lambda to an algorithm with passing a variable:

// Prerequisite 
auto v = std::vector{1, 3, 2, 5, 4}; 
 
// Look for number three 
auto three = 3; 
auto num_threes = std::count(v.begin(), v.end(), three); 
// num_threes is 1 
 
// Look for numbers which is larger than three 
auto is_above_3 = [](int v) { return v > 3; }; 
auto num_above_3 = std::count_if(v.begin(), v.end(), is_above_3);
// num_above_3 is 2

In the first case, we pass a variable to std::count(), and in the latter case we pass a function object to std::count_if(). This is a typical use case for lambdas; we pass a function to be evaluated many times by another function (in this case, std::count_if()).

Also, the lambda does not need to be tied to a variable; just as we can put a variable right into an expression, we can do the same with a lambda:

auto num_3 = std::count(v.begin(), v.end(), 3); 
auto num_above_3 = std::count_if(v.begin(), v.end(), [](int i) { 
  return i > 3; 
});

The lambdas you have seen so far are called stateless lambdas; they don't copy or reference any variables from outside the lambda and therefore don't need any internal state. Let's make this a little more advanced by introducing stateful lambdas by using capture blocks.

The capture clause

In the previous example, we hard-coded the value 3 inside the lambda so that we always counted the numbers greater than three. What if we want to use external variables inside the lambda? What we do is capture the external variables by putting them in the capture clause; that is, the [] part of the lambda:

auto count_value_above(const std::vector<int>& v, int x) { 
  auto is_above = [x](int i) { return i > x; }; 
  return std::count_if(v.begin(), v.end(), is_above); 
}

In this example, we captured the variable x by copying it into the lambda. If we want to declare x as a reference, we put an & at the beginning, like this:

auto is_above = [&x](int i) { return i > x; };

The variable is now merely a reference to the outer x variable, just like a regular reference variable in C++. Of course, we need to be very cautious about the lifetime of objects we pass by reference into a lambda since the lambda might execute in a context where the referenced objects have ceased to exist. It's therefore safer to capture by value.

Capture by reference versus capture by value

Using the capture clause for referencing and copying variables works just like regular variables. Take a look at these two examples and see if you can spot the difference:

Capture by value

Capture by reference

auto func() {
  auto vals = {1,2,3,4,5,6};
  auto x = 3;
  auto is_above = [x](int v) {
    return v > x;
  };
  x = 4;
  auto count_b = std::count_if(
    vals.begin(),
    vals.end(),
    is_above
   );  // count_b equals 3 }
auto func() {
  auto vals = {1,2,3,4,5,6};
  auto x = 3;
  auto is_above = [&x](int v) {
    return v > x;
  };
  x = 4;
  auto count_b = std::count_if(
    vals.begin(),
    vals.end(),
    is_above
   );  // count_b equals 2 }

In the first example, x was copied into the lambda and was therefore not affected when x was mutated; consequently std::count_if() counts the number of values above 3.

In the second example, x was captured by reference, and therefore std::count_if() instead counts the number of values above 4.

Similarities between a lambda and a class

I mentioned earlier that lambda expressions generate function objects. A function object is an instance of a class that has the call operator, operator()(), defined.

To understand what a lambda expression consists of, you can view it as a regular class with restrictions:

  • The class only consists of one member function
  • The capture clause is a combination of the class' member variables and its constructor

The following table shows lambda expressions and the corresponding classes. The left column uses capture by value and the right column capture by reference:

A lambda with capture by value…

A lambda with capture by reference...

auto x = 3;auto is_above = [x](int y) { return y > x;};auto test = is_above(5);
auto x = 3;auto is_above = [&x](int y) { return y > x;};auto test = is_above(5);

...corresponds to this class:

...corresponds to this class:

auto x = 3;class IsAbove {
public: IsAbove(int x) : x{x} {} auto operator()(int y) const {   return y > x; }private: int x{}; // Value };auto is_above = IsAbove{x};
auto test = is_above(5);
auto x = 3;class IsAbove {
public: IsAbove(int& x) : x{x} {} auto operator()(int y) const {   return y > x; }private: int& x; // Reference };
auto is_above = IsAbove{x};
auto test = is_above(5);

Thanks to lambda expressions, we don't have to manually implement these function object types as classes.

Initializing variables in capture

As seen in the previous example, the capture clause initializes member variables in the corresponding class. This means that we can also initialize member variables inside a lambda. These variables will only be visible from inside the lambda. Here is an example of a lambda that initializes a capture variable called numbers:

auto some_func = [numbers = std::list<int>{4,2}]() {
  for (auto i : numbers)
    std::cout << i;
};
some_func();  // Output: 42

The corresponding class would look something like this:

class SomeFunc {
public:
 SomeFunc() : numbers{4, 2} {}
 void operator()() const {
  for (auto i : numbers)
    std::cout << i;
 }
private:
 std::list<int> numbers;
};
auto some_func = SomeFunc{};
some_func(); // Output: 42

When initializing a variable inside a capture, you can imagine that there is a hidden auto keyword in front of the variable name. In this case, you can think about numbers as being defined like auto numbers = std::list<int>{4, 2}. If you want to initialize a reference, you can use an ampersand in front of the name, which would correspond to auto&. Here is an example:

auto x = 1;
auto some_func = [&y = x]() {
  // y is a reference to x
};

Again, you have to be very cautious about lifetimes when referencing (and not copying) objects outside the lambda.

It's also possible to move an object inside a lambda, which is necessary when using move-only types such as std::unique_ptr. Here is how it can be done:

auto x = std::make_unique<int>(); 
auto some_func = [x = std::move(x)]() {
  // Use x here..
};

This also demonstrates that it is possible to use the same name (x) for the variable. This is not necessary. Instead, we could have used some other name inside the lambda, for example [y = std::move(x)].

Mutating lambda member variables

As the lambda works just like a class with member variables, it can also mutate them. However, the function call operator of a lambda is const by default, so we explicitly need to specify that the lambda can mutate its members by using the mutable keyword. In the following example, the lambda mutates the counter variable every time it's invoked:

auto counter_func = [counter = 1]() mutable {
  std::cout << counter++;
};
counter_func(); // Output: 1
counter_func(); // Output: 2
counter_func(); // Output: 3

If a lambda only captures variables by reference, we do not have to add the mutable modifier to the declaration, as the lambda itself doesn't mutate. The difference between mutable and non-mutable lambdas is demonstrated in the following code snippets:

Capture by value

Capture by reference

auto some_func() {
  auto v = 7;
  auto lambda = [v]() mutable {
    std::cout << v << " ";
    ++v;
  };
  assert(v == 7);
  lambda();  lambda();
  assert(v == 7);
  std::cout << v;
}
auto some_func() {
  auto v = 7;
  auto lambda = [&v]() {
    std::cout << v << " ";
    ++v;
  };
  assert(v == 7);
  lambda();
  lambda();
  assert(v == 9);
  std::cout << v;
}

Output: 7 8 7

Output: 7 8 9

In the example to the right where v is captured by reference, the lambda will mutate the variable v, which is owned by the scope of some_func(). The mutating lambda in the left column will only mutate a copy of v, owned by the lambda itself. This is the reason why we will end up with different outputs in the two versions.

Mutating member variables from the compiler's perspective

To understand what's going on in the preceding example, take a look at how the compiler sees the previous lambda objects:

Capture by value

Capture by reference

class Lambda {
 public:
 Lambda(int m) : v{m} {}
 auto operator()() {
   std::cout<< v << " ";
   ++v;
 }
private:
  int v{};
};
class Lambda {
 public:
 Lambda(int& m) : v{m} {}
 auto operator()() const {
   std::cout<< v << " ";
   ++v;
 }
private:
 int& v;
};

As you can see, the first case corresponds to a class with a regular member, whereas the capture by reference case simply corresponds to a class where the member variable is a reference.

You might have noticed that we add the modifier const on the operator() member function of the capture by reference class, and we also do not specify mutable on the corresponding lambda. The reason this class is still considered const is that we do not mutate anything inside the actual class/lambda; the actual mutation applies to the referenced value, and therefore the function is still considered const.

Capture all

In addition to capturing variables one by one, all variables in the scope can be captured by simply writing [=] or [&].

Using [=] means that every variable will be captured by value, whereas [&] captures all variables by reference.

If we use lambdas inside a member function, it is also possible to capture the entire object by reference using [this] or by copy by writing [*this]:

class Foo { 
public: 
 auto member_function() { 
   auto a = 0; 
   auto b = 1.0f;
   // Capture all variables by copy 
   auto lambda_0 = [=]() { std::cout << a << b; }; 
   // Capture all variables by reference 
   auto lambda_1 = [&]() { std::cout << a << b; }; 
   // Capture object by reference 
   auto lambda_2 = [this]() { std::cout << m_; }; 
   // Capture object by copy 
   auto lambda_3 = [*this]() { std::cout << m_; }; 
 }
private: 
 int m_{}; 
};

Note that using [=] does not mean that all variables in the scope are copied into the lambda; only the variables actually used inside the lambda are copied.

When capturing all variables by value, you can specify variables to be captured by reference (and vice versa). The following table shows the result of different combinations in the capture block:

Capture block

Resulting capture types

int a, b, c;auto func = [=] { /*...*/ };

Capture a, b, c by value.

int a, b, c;auto func = [&] { /*...*/ };

Capture a, b, c by reference.

int a, b, c;auto func = [=, &c] { /*...*/ };

Capture a, b by value.

Capture c by reference.

int a, b, c;auto func = [&, c] { /*...*/ };

Capture a, b by reference.

Capture c by value.

Although it is convenient to capture all variables with [&] or [=], I recommend capturing variables one by one, as it improves the readability of the code by clarifying exactly which variables are used inside the lambda scope.

Assigning C function pointers to lambdas

Lambdas without captures can be implicitly converted to function pointers. Let's say you are using a C library, or an older C++ library, that uses a callback function as a parameter, like this:

extern void download_webpage(const char* url,
                              void (*callback)(int, const char*));

The callback is called with a return code and some downloaded content. It is possible to pass a lambda as a parameter when calling download_webpage(). Since the callback is a regular function pointer, the lambda must not have any captures and you have to use a plus (+) in front of the lambda:

auto lambda = +[](int result, const char* str) {
  // Process result and str
};
download_webpage("http://www.packt.com", lambda);

This way, the lambda is converted into a regular function pointer. Note that the lambda cannot have any captures at all in order to use this functionality.

Lambda types

Since C++20, lambdas without captures are default-constructible and assignable. By using decltype, it's now easy to construct different lambda objects that have the same type:

auto x = [] {};   // A lambda without captures
auto y = x;       // Assignable
decltype(y) z;    // Default-constructible
static_assert(std::is_same_v<decltype(x), decltype(y)>); // passes
static_assert(std::is_same_v<decltype(x), decltype(z)>); // passes

However, this only applies to lambdas without captures. Lambdas with captures have their own unique type. Even if two lambda functions with captures are plain clones of each other, they still have their own unique type. Therefore, it's not possible to assign one lambda with captures to another lambda.

Lambdas and std::function

As mentioned in the previous section, lambdas with captures (stateful lambdas) cannot be assigned to each other since they have unique types, even if they look exactly the same. To be able to store and pass around lambdas with captures, we can use std::function to hold a function object constructed by a lambda expression.

The signature of a std::function is defined as follows:

std::function< return_type ( parameter0, parameter1...) >

So, a std::function returning nothing and having no parameters is defined like this:

auto func = std::function<void(void)>{};

A std::function returning a bool with an int and a std::string as parameters is defined like this:

auto func = std::function<bool(int, std::string)>{};

Lambda functions sharing the same signature (same parameters and same return type) can be held by the same type of std::function objects. A std::function can also be reassigned at runtime.

What is important here is that what is captured by the lambda does not affect its signature, and therefore both lambdas with and without captures can be assigned to the same std::function variable. The following code shows how different lambdas are assigned to the same std::function object called func:

// Create an unassigned std::function object 
auto func = std::function<void(int)>{}; 
// Assign a lambda without capture to the std::function object 
func = [](int v) { std::cout << v; }; 
func(12); // Prints 12 
// Assign a lambda with capture to the same std::function object 
auto forty_two = 42; 
func = [forty_two](int v) { std::cout << (v + forty_two); }; 
func(12); // Prints 54 

Let's put the std::function to use in something that resembles a real-world example next.

Implementing a simple Button class with std::function

Assume that we set out to implement a Button class. We can then use the std::function to store the action corresponding to clicking the button, so that when we call the on_click() member function, the corresponding code is executed.

We can declare the Button class like this:

class Button {
public: 
  Button(std::function<void(void)> click) : handler_{click} {} 
  auto on_click() const { handler_(); } 
private: 
  std::function<void(void)> handler_{};
};

We can then use it to create a multitude of buttons with different actions. The buttons can conveniently be stored in a container because they all have the same type:

auto create_buttons () { 
  auto beep = Button([counter = 0]() mutable {  
    std::cout << "Beep:" << counter << "! "; 
    ++counter; 
  }); 
  auto bop = Button([] { std::cout << "Bop. "; }); 
  auto silent = Button([] {});
  return std::vector<Button>{beep, bop, silent}; 
}

Iterating the list and calling on_click() on each button will execute the corresponding function:

const auto& buttons = create_buttons();
for (const auto& b: buttons) {
  b.on_click();
}
buttons.front().on_click(); // counter has been incremented
// Output: "Beep:0! Bop. Beep:1!"

The preceding example with buttons and click handlers demonstrates some of the benefits of using std::function in combination with lambdas; even though each stateful lambda will have its own unique type, a single std::function type can wrap lambdas that share the same signature (return type and arguments).

As a side note, you might have noticed that the on_click() member function is declared const. However, it's mutating the member variable handler_ by increasing the counter variable in one of the click handlers. This might seem like it breaks const-correctness rules, as a const member function of Button is allowed to call a mutating function on one of its class members. The reason it is allowed is the same reason that member pointers are allowed to mutate their pointed-to value in a const context. Earlier in this chapter, we discussed how to propagate constness for pointer data members.

Performance consideration of std::function

A std::function has a few performance losses compared to a function object constructed by a lambda expression directly. This section will discuss some of the things related to performance to consider when using std::function.

Prevented inline optimizations

When it comes to lambdas, the compiler has the ability to inline the function call; that is, the overhead of the function call is eliminated. The flexible design of std::function make it nearly impossible for the compiler to inline a function wrapped in a std::function. The prevention of inline optimizations can have a negative impact on the performance if small functions wrapped in std::function are called very frequently.

Dynamically allocated memory for captured variables

If a std::function is assigned to a lambda with captured variables/references, the std::function will, in most cases, use heap-allocated memory to store the captured variables. Some implementations of std::function do not allocate additional memory if the size of the captured variable is below some threshold.

This means that not only is there a performance penalty due to the extra dynamic memory allocation, but also that it is slower, as heap-allocated memory can increase the number of cache misses (read more about cache misses in Chapter 4, Data Structures).

Additional run-time computation

Calling a std::function is generally a bit slower than executing a lambda, as a little more code is involved. For small and frequently called std::functions, this overhead may become significant. Imagine that we have a really small lambda defined like this:

auto lambda = [](int v) { return v * 3; };

The benchmark that follows demonstrates the difference between executing 10 million function calls for a std::vector of the explicit lambda type versus a std::vector of a corresponding std::function. We will begin with the version using the explicit lambda:

auto use_lambda() { 
  using T = decltype(lambda);
  auto fs = std::vector<T>(10'000'000, lambda);
  auto res = 1;
  // Start clock
  for (const auto& f: fs)
    res = f(res);
  // Stop clock here
  return res;
}

We only measure the time it takes to execute the loop inside the function. The next version wraps our lambda in a std::function, and looks like this:

auto use_std_function() { 
  using T = std::function<int(int)>;
  auto fs = std::vector<T>(10'000'000, T{lambda});
  auto res = 1;
  // Start clock
  for (const auto& f: fs)
    res = f(res);
  // Stop clock here
  return res;
}

I'm compiling this code on my MacBook Pro from 2018 using Clang with optimizations turned on (-O3). The first version, use_lambda(), executes the loop at roughly 2 ms, whereas the second version, use_std_function(), takes almost 36 ms to execute the loop.

Generic lambdas

A generic lambda is a lambda accepting auto parameters, making it possible to invoke it with any type. It works just like a regular lambda, but the operator() has been defined as a member function template.

Only the parameters are template variables, not the captured values. In other words, the captured value, v, in the following example will be of type int regardless of the types of v0 and v1:

auto v = 3; // int
auto lambda = [v](auto v0, auto v1) {
  return v + v0*v1;
};

If we translate the above lambda to a class, it would correspond to something like this:

class Lambda {
public:
  Lambda(int v) : v_{v} {}
  template <typename T0, typename T1>
  auto operator()(T0 v0, T1 v1) const { 
    return v_ + v0*v1; 
  }
private:
  int v_{};
};
auto v = 3;
auto lambda = Lambda{v};

Just like the templated version, the compiler won't generate the actual function until the lambda is invoked. So, if we invoke the previous lambda like this:

auto res_int = lambda(1, 2);
auto res_float = lambda(1.0f, 2.0f);

the compiler will generate something similar to the following lambdas:

auto lambda_int = [v](int v0, const int v1) { return v + v0*v1; };
auto lambda_float = [v](float v0, float v1) { return v + v0*v1; };
auto res_int = lambda_int(1, 2);
auto res_float = lambda_float(1.0f, 2.0f);

As you might have figured out, these versions are further handled just like regular lambdas.

A new feature of C++20 is that we can use typename instead of just auto for the parameter types of a generic lambda. The following generic lambdas are identical:

// Using auto
auto x = [](auto v) { return v + 1; };
// Using typename
auto y = []<typename Val>(Val v) { return v + 1; };

This makes it possible to name the type or refer to the type inside the body of the lambda.