Book Image

Hands-On System Programming with C++

By : Dr. Rian Quinn
Book Image

Hands-On System Programming with C++

By: Dr. Rian Quinn

Overview of this book

C++ is a general-purpose programming language with a bias toward system programming as it provides ready access to hardware-level resources, efficient compilation, and a versatile approach to higher-level abstractions. This book will help you understand the benefits of system programming with C++17. You will gain a firm understanding of various C, C++, and POSIX standards, as well as their respective system types for both C++ and POSIX. After a brief refresher on C++, Resource Acquisition Is Initialization (RAII), and the new C++ Guideline Support Library (GSL), you will learn to program Linux and Unix systems along with process management. As you progress through the chapters, you will become acquainted with C++'s support for IO. You will then study various memory management methods, including a chapter on allocators and how they benefit system programming. You will also explore how to program file input and output and learn about POSIX sockets. This book will help you get to grips with safely setting up a UDP and TCP server/client. Finally, you will be guided through Unix time interfaces, multithreading, and error handling with C++ exceptions. By the end of this book, you will be comfortable with using C++ to program high-quality systems.
Table of Contents (16 chapters)

Benefits of using C++ when system programming

Although the focus of this book is on system programming and not C++, and we do provide a lot of examples in C, there are several benefits to system programming in C++ compared to standard C.

Note that this section assumes some general knowledge of C++. A more complete explanation of the C++ standard will be provided in Chapter 2, Learning the C, C++17, and POSIX Standards.

Type safety in C++

Standard C is not a type-safe language. Type safety refers to protections put in place to prevent one type from being confused with another type. Some languages, such as ADA, are extremely type-safe, providing so many protections that the language, at times, can be frustrating to work with.

Conversely, languages such as C are so type-unsafe that hard-to-find type errors occur frequently, and often lead to instability.

C++ provides a compromise between the two approaches, encouraging reasonable type safety by default, while providing mechanisms to circumvent this when needed.

For example, consider the following code:

/* Example: C */
int *p = malloc(sizeof(int));

// Example: C++
auto p = new int;

Allocating an integer on the heap in C requires the use of malloc(), which returns void *. There are several issues with this code that are addressed in C++:

  • C automatically converts the void * type to int *, meaning that an implicit type conversion has occurred even though there is no connection between the type the user stated and the type returned. The user could easily allocate short (which is not the same thing as int, a topic we will discuss in Chapter 3, System Types for C and C++). The type conversion would still be applied, meaning that the compiler would not have the proper context to detect that the allocation was not large enough for the type the user was attempting to allocate.
  • The size of the allocation must be stated by the programmer. Unlike C++, C has no understanding of the type that is being allocated. Thus, it is unaware of the size of the type, and so the programmer must explicitly state this. The problem with this approach is that hard-to-find allocation bugs can be introduced. Often, the type that is provided to sizeof() is incorrect (for example, the programmer might provide a pointer instead of the type itself, or the programmer might change the code later on, but forget to change the value being provided to sizeof()). As stated previously, there is no connection between what malloc() allocates and returns, and the type the user attempts to allocate, providing an opportunity to introduce a hard-to-find logic error.
  • The type must be explicitly stated twice. malloc() returns void *, but C implicitly converts to whatever pointer type the user stateswhich means a type has been declared twice (in this case, void * and int *). In C++, the use of auto means that the type is only declared once (in this case, int states the type is an int *), and auto will take on whatever type is returned. The use of auto and the removal of implicit type conversions means whatever type is declared in the allocation is what the p variable will take on. If the code after this allocation expects a different type to the one p takes on, the compiler will know about it at compile time in C++, while a bug like this would likely not be caught in C until runtime, when the program crashes (we hope this code is not controlling an airplane!).

In addition to the preceding example of the dangers of implicit type casting, C++ also provides run-time type information (RTTI). This information has many uses, but the most important use case involves the dynamic_cast<> operator, which performs runtime type checking.

Specifically, converting from one type to another can be checked during runtime, to ensure a type error doesn't occur. This is often seen when performing the following:

  • Polymorphic type conversions: In C, polymorphism is possible, but it must be done manually, a pattern that is seen often in kernel programming. C, however, doesn't have the ability to determine whether a pointer was allocated for a base type or not, resulting in the potential for a type error. Conversely, C++ is capable of determining at runtime whether a provided pointer is being cast to the proper type, including when using polymorphism.
  • Exception support: When catching an exception, C++ uses RTTI (essentially dynamic_cast<>), to ensure that the exception being thrown is caught by the proper handler.

Objects of C++

Although C++ supports object-oriented programming with built-in constructs, object-oriented programming is a design pattern that is often used in C as well, and in POSIX in general. Take the following example:

/* Example: C */

struct point
{
int x;
int y;
};

void translate(point *p; int val)
{
if (p == NULL) {
return;
}

p->x += val;
p->y += val;
}

In the preceding example, we have a struct that stores a point{}, which contains x and y positions. We then offer a function that is capable of translating this point{} in both the x and y positions, using a given value (that is, a diagonal translation).

There are a couple of notes with respect to this example:

  • Often, people will claim to dislike object-oriented programming, but then you see this sort of thing in their code, which is, in fact, an object-oriented design. The use of class isn't the only way to create an object-oriented design. The difference with C++ is that the language provides additional constructs for cleanly and safely working with objects, while with C this same functionality must be done by handa process that is prone to error.
  • The translate() function is only related to the point{} object because it takes a point{} as a parameter. As a result, the compiler has no contextual information to understand how to manipulate a point{} struct, without translate() being given a pointer to it as a parameter. This means that every single public-facing function that wishes to manipulate a point{} struct must take a pointer to it as its first parameter, and verify that the pointer is valid. Not only is this a clunky interface, it's slow.

In C++, the preceding example can be written as the following:

// Example: C++

struct point
{
int x;
int y;

void translate(int val)
{
p->x += val;
p->y += val;
}
};

In this example, a struct is still used. The only difference between a class and a struct in C++ is that all variables and functions are public by default with a struct, while they are private by default with a class.

The difference is that the translate() function is a member of point{}, which means it has access to the contents of its structure, and so no pointers are needed to perform the translation. As a result, this code is safer, more deterministic, and easier to reason about, as there is never the fear of a null dereference.

Finally, objects in C++ provide construction and destruction routines that help prevent objects from not being properly initialized or properly deconstructed. Take the following example:

// Example: C++

struct myfile
{
int fd{0};

~myfile() {
close(fd);
}
};

In the preceding example, we create a custom file object that holds a file descriptor, often seen and used when system programming with POSIX APIs.

In C, the programmer would have to remember to manually set the file descriptor to 0 on initialization, and close the file descriptor when it is no longer in scope. In C++, using the preceding example, both of these operations would be done for you any time you use myfile.

This is an example of the use of Resource Acquisition Is Initialization (RAII), a topic that will be discussed in more detail in Chapter 4, C++, RAII, and the GSL Refresher, as this pattern is used a lot by C++. We will leverage this technique when system programming to avoid a lot of common POSIX-style pitfalls.

Templates used in C++

Template programming is often an undervalued, misunderstood addition to C++ that is not given enough credit. Most programmers need to look no further than attempting to create a generic linked list to understand why.

C++ templates provides you with the ability to define your code without having to define type information ahead of time.

One way to create a linked list in C is to use pointers and dynamic memory allocation, as seen in this simple example:

struct node 
{
void *data;
node next;
};

void add_data(node *n, void *val);

In the preceding example, we store data in the linked list using void *. An example of how to use this is as follows:

node head;
add_data(&head, malloc(sizeof(int)));
*(int*)head.data = 42;

There are a few issues with this approach:

  • This type of linked list is clearly not type-safe. The use of the data and the data's allocation are completely unrelated, requiring the programmer using this linked list to manage all of this without error.
  • A dynamic memory allocation is needed for both the nodes and the data. As was discussed earlier, memory allocations are slow as they require system calls.
  • In general, this code is hard to read and clunky.

Another way to create a generic linked list is to use macros. There are several implementations of these types of linked lists (and other data structures) floating around on the internet, which provide a generic implementation of a linked list without the need for dynamically allocating data. These macros provide the user with a way to define the data type the linked list will manage at compile time.

The problem with these approaches, other than reliability, is these implementations use macros to implement template programming in a way that is far less elegant. In other words, the solution to adding generic data structures to C is to use C's macro language to manually implement template programming. The programmer would be better off just using C++ templates.

In C++, a data structure like a linked list can be created without having to declare the type the linked list is managing until it is declared, as follows:

template<typename T>
class mylinked_list
{
struct node
{
T data;
node *next;
};

public:

...

private:

node m_head;
};

In the preceding example, not only are we able to create a linked list without macros or dynamic allocations (and all the problems that come with the use of void * pointers), but we are also able to encapsulate the functionality, providing a cleaner implementation and user API.

One complaint that is often made about template programming is the amount of code it generates. Most code bloat from templates typically originates as a programming error. For example, a programmer might not realize that integers and unsigned integers are not the same types, resulting in code bloat when templates are used (as a definition for each type is created).

Even aside from that issue, the use of macros would produce the same code bloat. There is no free lunch. If you want to avoid the use of dynamic allocation and type casting while still providing generic algorithms, you have to create an instance of your algorithm for each type you plan to use. If reliability is your goal, allowing the compiler to generate the code needed to ensure your program executes properly outweighs the disadvantages.

Functional programming associated with C++

Functional programming is another addition to C++ that provides the user with compiler assistance, in the form of lambda functions. Currently, this must be carried out by hand in C.

In C, a functional programming construct can be achieved using a callback. For example, consider the following code:

void
guard(void (*ptr)(int *val), int *val)
{
lock();
ptr(val);
unlock();
}

void
inc(int *val)
{
*val++;
}

void
dec(int *val)
{
*val--;
}

void
foo()
{
int count = 0;
guard(inc, &count);
guard(dec, &count);
}

In the preceding code example, we create a guard function that locks a mutex, calls a function that operates on a value, and then unlocks the mutex on exit. We then create two functions, one that increments a value given to it, and one that decrements a value given to it. Finally, we create a function that instantiates a count, and then increments the count and decrements the count using the guard function.

There are a couple of issues with this code:

  • The first issue is the need for pointer logic to ensure we can manipulate the variable we wish to operate on. We are also required to manually pass this pointer around to keep track of it. This makes the APIs clunky, as we have a lot of extra code that we have to write manually for such a simple example.
  • The function signature of the helper functions is static. The guard function is a simple one. It locks a mutex, calls a function, and then unlocks it. The problem is that, since the parameters of the function must be known while writing the code instead of at compile time, we cannot reuse this function for other tasks. We will need to hand-write the same function for each function signature type we plan to support.

The same example can be written using C++ as follows:

template<typename FUNC>
guard(FUNC f)
{
lock();
f();
unlock();
}

void
foo()
{
int count = 0;
guard(inc, [&]{ count++ });
guard(inc, [&]{ count-- });
}

In the preceding example, the same functionality is provided, but without the need for pointers. In addition, the guard function is generic and can be used for more than one case. This is accomplished by leveraging both template programming and functional programming.

The lambda provides the callback, but the parameters of the callback are encoded into the lambda's function signature, which is absorbed by the use of a template function. The compiler is capable of generating a version of the guard function for use that takes the parameters (in this case, a reference to the count variable) and storing it in the code itself, removing the need for users to do this by hand.

The preceding example will be used a lot in this book, especially when creating benchmarking examples, as this pattern gives you the ability to wrap functionality in code designed to time the execution of your callback.

Error handling mechanism in C++

Error handling is another issue with C. The problem, at least until set jump exceptions were added, was that the only ways to get an error code from a function were as follows:

  • Constrain the output of a function, so that certain output values from the function could be considered an error
  • Get the function to return a structure, and then manually parse that structure

For example, consider the following code:

struct myoutput 
{
int val;
int error_code;
}

struct myoutput myfunc(int val)
{
struct myoutput = {0};

if (val == 42) {
myoutput.error_code = -1;
}

myoutput.val = val;
return myoutput;
}

void
foo(void)
{
struct myoutput = myfunc(42);

if (myoutput.error_code == -1) {
printf("yikes\n");
return;
}
}

The preceding example provides a simple mechanism for outputting an error from a function without having to constrain the output of the function (for example, by assuming that -1 is always an error).

In C++, this can be implemented using the following C++17 logic:

std::pair<int, int>
myfunc(int val)
{
if (val == 42) {
return {0, -1};
}

return {val, 0};
}

void
foo(void)
{
if (auto [val, error_code] = myfunc(42); error_code == -1) {
printf("yikes\n");
return;
}
}

In the preceding example, we were able to remove the need for a dedicated structure by leveraging std::pair{}, and we were able to remove the need to work with std::pair{} by leveraging an initializer_list{} and C++17-structured bindings.

There is, however, an even easier method for handling errors without the need for checking the output of every function you execute, and that is to use exceptions. C provides exceptions through the set jump API, while C++ provides C++ exception support. Both of these will be discussed at length in Chapter 13, Error - Handling with Exceptions.

APIs and C++ containers in C++

As well as the language primitives that C++ provides, it also comes with a Standard Template Library (STL) and associated APIs that greatly aid system programming. A good portion of this book will focus on these APIs, and how they support system programming.

It should be noted that the focus of this book is system programming and not C++, and for this reason, we do not cover C++ containers in any detail, but instead assume the reader has some general knowledge of what they are and how they work. With that said, C++ containers support system programming by preventing the user from having to re-write them manually.

We teach students how to write their own data structures, not so that when they need a data structure they know how to write one, but instead so that, when they need one, they know which data structure to use and why. C++ already provides most, if not all, of the data structures you might need when system programming.