The C++ Programmer's Mindset
There are many reasons to use C++ for solving problems, but the main reason for choosing C++ is the need to build high-performance solutions. Modern C++ has many features that make it easy to write fast and (relatively) safe code without taking away control of the low-level primitives that the programmer can use to achieve the best possible performance. In this section, we will look at some of the features of modern C++ that can be used to write high-quality code for solving complex problems without compromising performance.
Before we start, we need to make something clear. Just because C++ provides the tools for micro-optimizing your code, that doesn’t mean that you should be using them. Modern compilers are far better at producing optimized machine code than even very experienced programmers writing hand-crafted assembly code. Trust the compiler toolchain to optimize your code and only spend time making micro-optimizations if it is absolutely necessary. Remember, you need to keep the bigger picture in mind, and over-optimizing one part of the code probably means that you are neglecting another.
That being said, it is important to understand that not all code is going to produce optimal performance. We have already seen an example where the C++ code is unlikely to achieve maximum performance with the recursive algorithm for parsing strings. But, as we mentioned in the commentary on that algorithm, the first task is to solve the problem and obtain a correct solution. Then, and only then, should you consider whether the algorithm has the desired performance characteristics.
It is a good idea to keep track of the parts of the code where performance will really matter. For example, any tight loops that perform an operation on (potentially) large sets of data are likely to need to be optimized, but a function that obtains records from a database is not, since this will always be constrained by the connection to the database. This allows you to focus on the most important parts when it comes to optimizing your code.
Another thing we need to address early is the issue of memory management. Do not manage your memory by hand with new and delete – or worse, with malloc and free. This is a recipe for creating memory leaks and invoking undefined behavior. Use standard containers such as std::vector, smart pointers (std::unique_ptr, std::shared_ptr), and other mechanisms provided by the standard template library (or other high-quality libraries such as Boost and Abseil). This is especially true if you make use of multithreading, where it is essential that your memory management is thread-safe.
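As a brief sketch (the Node type here is just a hypothetical example), standard containers and smart pointers release their memory automatically when they go out of scope, so there is nothing to delete by hand:
#include <memory>
#include <vector>

struct Node {
    int value;
};

int main() {
    // The vector owns its buffer and frees it automatically.
    std::vector<int> numbers(1'000, 0);
    // make_unique/make_shared allocate and tie the lifetime to the handle object.
    auto owned = std::make_unique<Node>(Node{42});
    auto shared = std::make_shared<Node>(Node{7});
}   // no delete required; the destructors release everything here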
This is a good point to discuss how we will configure and build our C++ projects throughout this book. CMake is a cross-platform build-system generator that constructs a set of build files (Makefiles, Ninja configurations, or otherwise) from a source file called CMakeLists.txt in the project root. The syntax can be a little frustrating at first, but you will quickly get used to it (if you aren’t already).
Modern CMake organizes code and dependencies into targets, which are usually either static or shared libraries or executables. It has a sophisticated mechanism for finding dependencies with its find_package function; these can then be linked to our targets, providing the include directories and link lines necessary to successfully integrate the functionality of the dependency into our project. CMake also provides various functions for controlling the configuration of the compiler, such as setting the appropriate flags for a particular C++ standard in a portable way. This really takes the pain out of configuring cross-platform builds. A very basic skeleton for a C++ project's CMakeLists.txt is as follows.
cmake_minimum_required(VERSION 3.30)
project(MyProject)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_executable(MyExecutable main.cpp)
The first line specifies the version of CMake that is required to configure the project. (At the time of writing, 3.30 was a fairly recent release of CMake.) The next line declares the project, which is the point at which CMake performs some background tasks such as finding the compiler and checking various settings. The next two lines set the C++ standard and mark that standard as required, which will cause CMake to emit an error if the compiler does not support it. Finally, we add an executable, called MyExecutable, which has one source file called main.cpp attached to it. We can link dependencies – either external packages or other targets we declare in the CMake file – using this line:
target_link_libraries(MyExecutable PRIVATE MyDep)
Here, MyDep is the name of a target (either constructed using add_library or via a call to find_package) that should be linked. The PRIVATE specifier declares that the link information does not need to be propagated along with our target. This is sensible for an executable, which cannot be linked by another target, but it might not be appropriate for library targets.
To configure the build system, you use the cmake executable from the command line or via an integration in your IDE of choice. (CLion has excellent CMake support, there is a CMake extension for VS Code, and Visual Studio also supports CMake projects.) On the command line, the following invocation configures and builds the project in release mode.
cmake -B out/Release -S . -DCMAKE_BUILD_TYPE=Release
cmake --build out/Release --config=Release
The configured build files are placed in the out/Release directory, as specified by the -B argument, and the source directory is specified with the -S argument (using '.' for the current working directory). The final argument of the first command sets the build type to Release for the configuration. On most build systems, this is sufficient to build in release mode, but some build systems, such as MSBuild, support multiple configurations, in which case the --config=Release argument on the second command becomes necessary.
One of the advantages of CMake is that it can be paired with package managers, such as vcpkg or Conan, to make obtaining, finding, and linking dependencies easier. Moreover, CMake is more feature-complete and easier to use than some of the similar tools that exist, such as Bazel and Meson. The documentation is quite readable, and it is extremely flexible.
Throughout the remainder of this book, and in the corresponding code repository, you’ll see many examples of CMake files and how to use them, particularly in Chapter 7 and Chapter 12.
Recent C++ standards brought many features to the language that had been standard elsewhere for many years. These include copy-less memory views (string_view in C++17 and span in C++20), along with ranges and constrained algorithms in C++20. These are great improvements over the iterator-based interfaces that existed before, as they allow for cleaner and safer code.
String views and spans can be thought of as ranges in which the elements are stored contiguously in memory. (All the bytes are stored together and in order in a single block with no gaps between them.) String views are immutable in that the elements in the range cannot be modified. (Modifying strings in-place is dangerous because a new UTF-8 character might require more space than the character that it replaces, forcing a new allocation.) Spans provide mutable or immutable access to the block of elements.
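As a brief, hypothetical illustration (the function names are invented for this example), a string_view gives read-only access to character data, while a span of non-const elements allows modification in place:
#include <cstddef>
#include <iostream>
#include <span>
#include <string_view>
#include <vector>

// A read-only, non-owning view over character data; no copy is made.
void print_prefix(std::string_view text, std::size_t count) {
    std::cout << text.substr(0, count) << '\n';
}

// A non-owning view over a contiguous block of int; this one is mutable.
void double_all(std::span<int> values) {
    for (int& v : values) {
        v *= 2;
    }
}

int main() {
    print_prefix("hello world", 5);   // prints "hello"
    std::vector<int> data{1, 2, 3};
    double_all(data);                 // data is now {2, 4, 6}
}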
A range is an abstraction on top of the usual iterator access to containers. Loosely speaking, a range is any object that exposes a begin and an end that allow sequential access to the elements of the container; a far more precise description can be found at https://en.cppreference.com/w/cpp/ranges. The power of ranges comes from the fact that they can be composed with views, which adapt the underlying range. For example, the enumerate view (added in C++23) adapts the range so that each element is paired with its index. Combined with a range-based for loop, these make for some very simple and easy-to-read code. For example, we could rewrite the more optimized version of the word-finding function from before using this mechanism as follows.
#include <cstddef>
#include <ranges>
#include <string_view>

std::size_t end_of_first_word(std::string_view s) noexcept {
    int depth = 0;
    for (const auto [position, ch] : std::views::enumerate(s)) {
        switch (ch) {
        case '[': ++depth; break;
        case ']': --depth; [[fallthrough]];
        default:
            if (depth == 0) {
                return position;
            }
        }
    }
    return s.size();  // nothing found: the word runs to the end of the string
}
This isn't substantially different from the previous implementation, but it does make the intent clearer. Now it is obvious that the position variable tracks the current index of the iteration. Moreover, this has the added benefit of not polluting the surrounding scope, since position is declared inside the range-based for loop. This is a small thing, but it does help keep code clean.
Templates are arguably one of the best features of C++, and also one of the most difficult and frustrating; anyone who has had to track down a bug in a template metaprogram will understand. The reason they are so powerful is that they allow the coder to write a single piece of code that can apply to many types on demand, without requiring a new implementation for each combination of types that is needed in a given program. There are some downsides to this: template instantiation is expensive for the compiler and very complex, and errors are very difficult to find and debug.
Templates actually form a complete programming language in themselves. They can be used to compute values at compile time, reducing the runtime cost of using those values to effectively zero. A template (class, function, or value) is instantiated by the compiler for each combination of types with which it is used, meaning that the compiler takes the body of the template and replaces the template parameters with the specific types that were provided by the code. For instance, our max_element template function could be instantiated by the snippet of code shown below. This instantiation is performed recursively, and if at any point in this process the compiler encounters an expression that is not valid, it raises a compiler error. These errors can be very difficult to diagnose because the error could have been caused far away from the place where it is first detected.
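Assuming a max_element function template that takes a single container argument (the simplified, unconstrained sketch below stands in for the version developed later in this section), an instantiation might be triggered like this:
#include <vector>

// A simplified, unconstrained max_element, used here only to illustrate
// instantiation (it assumes a non-empty container for brevity).
template <typename Container>
auto max_element(const Container& container) {
    auto it = container.begin();
    auto max = *it;
    for (++it; it != container.end(); ++it) {
        if (max < *it) {
            max = *it;
        }
    }
    return max;
}

int main() {
    std::vector<int> values{3, 1, 4, 1, 5};
    // This call instantiates max_element<std::vector<int>>: the compiler
    // substitutes std::vector<int> for Container and compiles the resulting body.
    int largest = max_element(values);
    (void)largest;  // silence the unused-variable warning in this sketch
}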
Concepts are an extension of the template mechanism that allows the user to declare the exact set of requirements for a template parameter up front. This means the compiler does not need to recursively instantiate the template to find out whether it is valid or not; it just checks whether the concept is satisfied by the type and emits an error if this is not the case. This leads to better error messages for the coder and potentially improves the speed of compilation. A basic concept for our max_element function might be defined as follows.
template <typename T>
concept OrderableContainer = requires(const T& t) {
    // Has a dependent type called "value_type" that is ordered by <
    requires std::totally_ordered<typename T::value_type>;
    // Has begin and end methods that are valid on a const T&
    t.begin();
    t.end();
};
This is not a complete description because we do not specify that the begin and end methods should return iterators. Moreover, we don't check that the value_type is copy constructible. Fortunately, we don't need to reinvent the wheel here; we can just build on the input_range concept from the standard library's ranges support, as shown here.
template <typename T>
concept OrderableContainer = std::ranges::input_range<const T>
    && std::totally_ordered<std::ranges::range_value_t<const T>>
    && std::copy_constructible<std::ranges::range_value_t<const T>>;
This implementation checks that const T is an input_range, meaning that it provides a begin and end from which values can be read in a single forward pass, and that the value type of this range is totally ordered and copy constructible. This is actually significantly more general than the concept we defined ourselves, because ranges can be declared in other ways besides exposing an iterator to the first element and one past the last element. To account for this generalization, we really should make use of the ranges library rather than calling the begin and end methods directly. This has the added benefit of creating cleaner code, as follows.
template <OrderableContainer Container>
std::ranges::range_value_t<Container> max_element(const Container& container) {
    auto begin = std::ranges::begin(container);
    const auto end = std::ranges::end(container);
    if (begin == end) {
        throw std::invalid_argument("Container must be non-empty");
    }
    auto max = *begin;
    ++begin;
    for (; begin != end; ++begin) {
        if (max < *begin) {
            max = *begin;
        }
    }
    return max;
}
Notice that we still have to check that the container is not empty. This can only be tested at runtime, whereas concepts are a compile-time construction. The only real difference is using the ranges::begin and ranges::end functions to get the iterators. This is probably overkill for such a simple function, but thinking in terms of concepts can be a great help as you try to formulate abstractions of your problem. Moreover, the more concepts you use, the better experience you will have debugging large, complex bodies of templated code. This covers how to handle data flexibly and efficiently. The next section shows how to handle cases where things go wrong.
You should always account for things that might go wrong when implementing solutions to problems. There are always things that can go wrong: preconditions might be violated, the algorithm might fail to produce an outcome (for instance, if a numerical method fails to converge), data might be ill-formed or in a format that is not supported, there might be imposed limits that are reached (such as limiting the amount of time that can be spent on a single computation). Your implementation should have a mechanism for gracefully handling these kinds of errors that can occur.
It’s worth pointing out the difference between a failure and an error. For instance, if we implement a search algorithm, then this might reasonably fail to find an entry that satisfies the condition. This is a failure, but not an error. An error occurs when the program enters an invalid state or encounters a problem that it cannot handle. Failures should be handled as a routine problem using constructions such as std::optional or returning the end iterator for a failed search. Errors should be propagated to a point where they can be handled gracefully or terminate the application.
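As a brief sketch (find_index is a hypothetical name invented for this example), a search that finds nothing can report that failure with std::optional rather than an exception:
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// A failed search is a normal outcome, not an error, so we report it
// with std::optional rather than by throwing.
std::optional<std::size_t> find_index(const std::vector<std::string>& names,
                                      const std::string& target) {
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == target) {
            return i;
        }
    }
    return std::nullopt;  // nothing found: a failure, but not an error
}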
For a long time, there were two ways to handle errors in C++: lightweight C-style error codes and the exception mechanism.
Ideally, there should be an alternative that is lightweight, like the C-style error codes, but expressive and flexible, like the exception model. In C++23, the expected template was added to the C++ standard library, which allows one to return a single object that can either contain the valid result of a function call or an unexpected (error) value and is never empty. This has the advantage of permitting a very lightweight error handling mechanism that remains local (unless explicitly propagated), which provides a great deal of flexibility, especially on interface boundaries.
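A minimal sketch of how std::expected might be used (parse_port and its error enumeration are hypothetical names invented for this example):
#include <expected>
#include <string>

enum class parse_error { empty_input, not_a_number, out_of_range };

// Returns either a valid port number or a parse_error, so no exception
// crosses the interface boundary.
std::expected<int, parse_error> parse_port(const std::string& text) {
    if (text.empty()) {
        return std::unexpected(parse_error::empty_input);
    }
    try {
        int value = std::stoi(text);
        if (value < 0 || value > 65535) {
            return std::unexpected(parse_error::out_of_range);
        }
        return value;
    } catch (const std::exception&) {
        return std::unexpected(parse_error::not_a_number);
    }
}

int main() {
    auto port = parse_port("8080");
    if (port) {
        // use *port
    } else {
        // inspect port.error() and handle it locally
    }
}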
Note
If C++23 is not an option, Abseil has the StatusOr template class, which serves a similar function, and Boost has the LEAF library, which provides a similar result template class.
Besides errors, you might also consider adding logging capabilities to your code (though not in places where performance is critical). This can be extremely helpful when tracking down bugs or unexpected behavior once the code leaves the development environment. Remember that any code that is “shipped” or incorporated into a package needs to be maintained by somebody (including future you). Any time you can spend to make this process less arduous is time well spent. Adding logging using a standard logging library such as spdlog or equivalent is a low-effort way of providing a wealth of debugging information to users who cannot simply launch a debugger to see what is going on inside the library.
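For example, a minimal sketch using spdlog (assuming it is available as a dependency of the project) might look like this:
#include <spdlog/spdlog.h>

int main() {
    spdlog::set_level(spdlog::level::debug);  // also emit debug-level messages
    spdlog::info("Loading {} records from the input file", 1000);
    spdlog::debug("Intermediate checksum: {:#x}", 0xdeadbeef);
    spdlog::warn("Solver did not converge after {} iterations", 50);
}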
Even very simple software projects should have tests. Every new feature should add new tests. Every reported bug should be confirmed by adding tests. This is the only way to know that your code is “correct” and performs appropriately. There are several layers of tests that you should include: unit tests, integration tests, and end-to-end tests. A complete suite of tests should contain all three kinds of tests that cover as much of the package as feasibly possible.
There are several testing harnesses available in C++. Two of the most common are GoogleTest and Catch2. GoogleTest is a very flexible library, is quite easy to set up and use, and is very extensible. Catch2 is a simpler library that is less flexible but even easier to set up and use. Catch2 (in its version 2 form) is distributed as a single header that does not require linking against an external library, whereas GoogleTest (and Catch2 version 3) is built and linked as a separate library. The tests included with the code for this book use GoogleTest.
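As an illustrative sketch (assuming the max_element function from earlier in this section is visible through an included header, and that the test executable links against GoogleTest's gtest_main so that no main function is needed), a couple of unit tests might look like this:
#include <gtest/gtest.h>

#include <stdexcept>
#include <vector>

// Assumes the max_element template from earlier in this section is visible here.
TEST(MaxElementTest, FindsLargestValue) {
    const std::vector<int> values{3, 1, 4, 1, 5};
    EXPECT_EQ(max_element(values), 5);
}

TEST(MaxElementTest, ThrowsOnEmptyContainer) {
    const std::vector<int> empty;
    EXPECT_THROW(max_element(empty), std::invalid_argument);
}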
Git (together with hosting services such as GitHub) is the de facto standard for version control and managing a code base, and it is a very effective tool for this task. Even relatively simple software projects should be kept in version control – which doesn't have to be public or even stored on a server somewhere. There are a number of reasons for this. The first is that, at some point, you might need to revert some code to what it was at some point in the past because of mistakes or performance regressions. The second reason, which might not be an issue for very small projects, is sharing your code with a larger team of developers (or even just between your own personal computers).
Services such as GitHub and GitLab provide the ability to run continuous integration testing that can help identify any changes that break existing functionality or otherwise find problems. This also helps to make sure that your code runs on all the different platforms that you support (Windows, Linux, macOS, and various different architectures). No single computer can test all of these configurations on its own. Continuous integration tools make this easy.