Book Image

Extreme C

By : Kamran Amini
5 (1)
Book Image

Extreme C

5 (1)
By: Kamran Amini

Overview of this book

There’s a lot more to C than knowing the language syntax. The industry looks for developers with a rigorous, scientific understanding of the principles and practices. Extreme C will teach you to use C’s advanced low-level power to write effective, efficient systems. This intensive, practical guide will help you become an expert C programmer. Building on your existing C knowledge, you will master preprocessor directives, macros, conditional compilation, pointers, and much more. You will gain new insight into algorithm design, functions, and structures. You will discover how C helps you squeeze maximum performance out of critical, resource-constrained applications. C still plays a critical role in 21st-century programming, remaining the core language for precision engineering, aviations, space research, and more. This book shows how C works with Unix, how to implement OO principles in C, and fully covers multi-processing. In Extreme C, Amini encourages you to think, question, apply, and experiment for yourself. The book is essential for anybody who wants to take their C to the next level.
Table of Contents (23 chapters)

Variable pointers

The concept of a variable pointer, or for short pointer, is one of the most fundamental concepts in C. You can hardly find any direct sign of them in most high-level programming languages. In fact, they have been replaced by some twin concepts, for example, references in Java. It is worth mentioning that pointers are unique in the sense that the addresses they point to can be used directly by hardware, but this is not the case for the higher-level twin concepts like references.

Having a deep understanding about pointers and the way they work is crucial to become a skilled C programmer. They are one of the most fundamental concepts in memory management, and despite their simple syntax, they have the potential to lead to a disaster when used in a wrong way. We will cover memory management-related topics in Chapter 4, Process Memory Structure, and Chapter 5, Stack and Heap, but here in this chapter, we want to recap everything about pointers. If you feel confident about the basic terminology and the concepts surrounding the pointers, you can skip this section.

Syntax

The idea behind any kind of pointer is very simple; it is just a simple variable that keeps a memory address. The first thing you may recall about them is the asterisk character, *, which is used for declaring a pointer in C. You can see it in example 1.9. The following code box demonstrates how to declare and use a variable pointer:

int main(int argc, char** argv) {
  int var = 100;
  int* ptr = 0;
  ptr = &var;
  *ptr = 200;
  return 0;
}

Code Box 1-19 [ExtremeC_examples_chapter1_9.c]: Example on how to declare and use a pointer in C

The preceding example has everything you need to know about the pointer's syntax. The first line declares the var variable on top of the Stack segment. We will discuss the Stack segment in Chapter 4, Process Memory Structure. The second line declares the pointer ptr with an initial value of zero. A pointer which has the zero value is called a null pointer. As long as the ptr pointer retains its zero value, it is considered to be a null pointer. It is very important to nullify a pointer if you are not going to store a valid address upon declaration.

As you see in Code Box 1-19, no header file is included. Pointers are part of the C language, and you don't need to have anything included to be able to use them. Indeed, we can have C programs, which do not include any header file at all.

All of the following declarations are valid in C:

int* ptr = 0;

int * ptr = 0;

int *ptr = 0;

The third line in the main function introduces the & operator, which is called the referencing operator. It returns the address of the variable next to it. We need this operator to obtain the address of a variable. Otherwise, we cannot initialize pointers with valid addresses.

On the same line, the returned address is stored into the ptr pointer. Now, the ptr pointer is not a null pointer anymore. On the fourth line, we see another operator prior to the pointer, which is called the dereferencing operator and denoted by *. This operator allows you to have indirect access to the memory cell that the ptr pointer is pointing to. In other words, it allows you to read and modify the var variable through the pointer that is pointing to it. The fourth line is equivalent to the var = 200; statement.

A null pointer is not pointing to a valid memory address. Therefore, dereferencing a null pointer must be avoided because it is considered as an undefined behavior, which usually leads to a crash.

As a final note regarding the preceding example, we usually have the default macro NULL defined with value 0, and it can be used to nullify pointers upon declaration. It is a good practice to use this macro instead of 0 directly because it makes it easier to distinguish between the variables and the pointers:

char* ptr = NULL;

Code Box 1-20: Using the NULL macro to nullify a pointer

The pointers in C++ are exactly the same as in C. They need to be nullified by storing 0 or NULL in them, but C++11 has a new keyword for initializing the pointers. It is not a macro like NULL nor an integer like 0. The keyword is nullptr and can be used to nullify the pointers or check whether they are null. The following example demonstrates how it is used in C++11:

char* ptr = nullptr;

Code Box 1-21: Using nullptr to nullify a pointer in C++11

It is crucial to remember that pointers must be initialized upon declaration. If you don't want to store any valid memory address while declaring them, don't leave them uninitialized. Make it null by assigning 0 or NULL! Do this otherwise you may face a fatal bug!

In most modern compilers, an uninitialized pointer is always nullified. This means that the initial value is 0 for all uninitialized pointers. But this shouldn't be considered as an excuse to declare pointers without initializing them properly. Keep in mind that you are writing code for different architectures, old and new, and this may cause problems on legacy systems. In addition, you will get a list of errors and warnings for these kinds of uninitialized pointers in most memory profilers. Memory profilers will be explained thoroughly as part of Chapter 4, Process Memory Structure, and Chapter 5, Stack and Heap.

Arithmetic on variable pointers

The simplest picture of memory is a very long one-dimensional array of bytes. With this picture in mind, if you're standing on one byte, you can only go back and forth in the array; there's no other possible movement. So, this would be the same for the pointers addressing different bytes in the memory. Incrementing the pointer makes the pointer go forward and decrementing it makes the pointer go backward. No other arithmetic operation is possible for the pointers.

Like we said previously, the arithmetic operations on a pointer are analogous to the movements in an array of bytes. We can use this figure to introduce a new concept: the arithmetic step size. We need to have this new concept because when you increment a pointer by 1, it might go forward more than 1 byte in the memory. Each pointer has an arithmetic step size, which means the number of bytes that the pointer will move if it is incremented or decremented by 1. This arithmetic step size is determined by the C data type of the pointer.

In every platform, we have one single unit of memory and all pointers store the addresses inside that memory. So, all pointers should have equal sizes in terms of bytes. But this doesn't mean that all of them have equal arithmetic step sizes. As we mentioned earlier, the arithmetic step size of a pointer is determined by its C data type.

For example, an int pointer has the same size as a char pointer, but they have different arithmetic step sizes. int* usually has a 4-byte arithmetic step size and char* has a 1-byte arithmetic step size. Therefore, incrementing an integer pointer makes it move forward by 4 bytes in the memory (adds 4 bytes to the current address), and incrementing a character pointer makes it move forward by only 1 byte in the memory. The following example, example 1.10, demonstrates the arithmetic step sizes of two pointers with two different data types:

#include <stdio.h>
int main(int argc, char** argv) {
  int var = 1;
  int* int_ptr = NULL; // nullify the pointer
  int_ptr = &var;
  char* char_ptr = NULL;
  char_ptr = (char*)&var;
  printf("Before arithmetic: int_ptr: %u, char_ptr: %u\n",
          (unsigned int)int_ptr, (unsigned int)char_ptr);
  int_ptr++;    // Arithmetic step is usually 4 bytes
  char_ptr++;   // Arithmetic step in 1 byte
  printf("After arithmetic: int_ptr: %u, char_ptr: %u\n",
          (unsigned int)int_ptr, (unsigned int)char_ptr);
  return 0;
}

Code Box 1-22 [ExtremeC_examples_chapter1_10.c]: Arithmetic step sizes of two pointers

The following shell box shows the output of example 1.10. Note that the printed addresses can be different for two successive runs on the same machine, and even from a platform to another, therefore you probably observe different addresses in your output:

$ gcc ExtremeC_examples_chapter1_10.c
$ ./a.out
Before arithmetic: int_ptr: 3932338348, char_ptr: 3932338348
After arithmetic:  int_ptr: 3932338352, char_ptr: 3932338349
$

Shell Box 1-4: Output of example 1.10 after first run

It is clear from the comparison of the addresses before and after the arithmetic operations that the step size for the integer pointer is 4 bytes, and it is 1 byte for the character pointer. If you run the example again, the pointers probably refer to some other addresses, but their arithmetic step sizes remain the same:

$ ./a.out
Before arithmetic: int_ptr: 4009638060, char_ptr: 4009638060
After arithmetic:  int_ptr: 4009638064, char_ptr: 4009638061
$

Shell Box 1-5: Output of example 1.10 after second run

Now that you know about the arithmetic step sizes, we can talk about a classic example of using pointer arithmetic to iterate over a region of memory. The examples 1.11 and 1.12 are about to print all the elements of an integer array. The trivial approach without using the pointers is brought in example 1.11, and the solution based on pointer arithmetic is given as part of example 1.12.

The following code box shows the code for example 1.11:

#include <stdio.h>
#define SIZE 5
int main(int argc, char** argv) {
 int arr[SIZE];
 arr[0] = 9;
 arr[1] = 22;
 arr[2] = 30;
 arr[3] = 23;
 arr[4] = 18;
 for (int i = 0; i < SIZE; i++) {
   printf("%d\n", arr[i]);
 }
 return 0;
}

Code Box 1-23 [ExtremeC_examples_chapter1_11.c]: Iterating over an array without using pointer arithmetic

The code in Code Box 1-23 should be familiar to you. It just uses a loop counter to refer to a specific index of the array and read its content. But if you want to use pointers instead of accessing the elements via the indexer syntax (an integer between [ and ]), it should be done differently. The following code box demonstrates how to use pointers to iterate over the array boundary:

#include <stdio.h>
#define SIZE 5
int main(int argc, char** argv) {
 int arr[SIZE];
 arr[0] = 9;
 arr[1] = 22;
 arr[2] = 30;
 arr[3] = 23;
 arr[4] = 18;
 int* ptr = &arr[0];
 for (;;) {
   printf("%d\n", *ptr);
   if (ptr == &arr[SIZE - 1]) {
     break;
   }
   ptr++;
 }
 return 0;
}

Code Box 1-24 [ExtremeC_examples_chapter1_12.c]: Iterating over an array using pointer arithmetic

The second approach, demonstrated in Code Box 1-24, uses an infinite loop, which breaks when the address of the ptr pointer is the same as the last element of the array.

We know that arrays are adjacent variables inside the memory, so incrementing and decrementing a pointer which is pointing to an element effectively makes it move back and forth inside the array and eventually point to a different element.

As is clear from the preceding code, the ptr pointer has the data type int*. That's because of the fact that it must be able to point to any individual element of the array which is an integer of type int. Note that all the elements of an array are from the same type hence they have equal sizes. Therefore, incrementing the ptr pointer makes it point to the next element inside the array. As you see, before the for loop, ptr points to the first element of the array, and by further increments, it moves forward along the array's memory region. This is a very classic usage of pointer arithmetic.

Note that in C, an array is actually a pointer that points to its first element. So, in the example, the actual data type of arr is int*. Therefore, we could have written the line as follows:

int* ptr = arr;

Instead of the line:

int* ptr = &arr[0];

Generic pointers

A pointer of type void* is said to be a generic pointer. It can point to any address like all other pointers, but we don't know its actual data type hence we don't know its arithmetic step size. Generic pointers are usually used to hold the content of other pointers, but they forget the actual data types of those pointers. Therefore, a generic pointer cannot be dereferenced, and one cannot do arithmetic on it because its underlying data type is unknown. The following example, example 1.13, shows us that dereferencing a generic pointer is not possible:

#include <stdio.h>
int main(int argc, char** argv) {
 int var = 9;
 int* ptr = &var;
 void* gptr = ptr;
 printf("%d\n", *gptr);
 return 0;
}

Code Box 1-25 [ExtremeC_examples_chapter1_13.c]: Dereferencing a generic pointer generates a compilation error!

If you compile the preceding code using gcc in Linux, you will get the following error:

$ gcc ExtremeC_examples_chapter1_13.c
In function 'main':warning: dereferencing 'void *' pointer
  printf("%d\n", *gptr);
                 ^~~~~
error: invalid use of void expression
  printf("%d\n", *gptr);
$

Shell Box 1-6: Compiling example 1.13 in Linux

And if you compile it using clang in macOS, the error message is different, but it refers to the same issue:

$ clang ExtremeC_examples_chapter1_13.c
error: argument type 'void' is incomplete
  printf("%d\n", *gptr);
                 ^
1 error generated.
$

Shell Box 1-7: Compiling example 1.13 in macOS

As you see, both compilers don't accept dereferencing a generic pointer. In fact, it is meaningless to dereference a generic pointer! So, what are they good for? In fact, generic pointers are very handy to define generic functions that can accept a wide range of different pointers as their input arguments. The following example, example 1.14, tries to uncover the details regarding generic functions:

#include <stdio.h>
void print_bytes(void* data, size_t length) {
  char delim = ' ';
  unsigned char* ptr = data;
  for (size_t i = 0; i < length; i++) {
    printf("%c 0x%x", delim, *ptr);
    delim = ',';
    ptr++;
  }
  printf("\n");
}
int main(int argc, char** argv) {
 int a = 9;
 double b = 18.9;
 print_bytes(&a, sizeof(int));
 print_bytes(&b, sizeof(double));
 return 0;
}

Code Box 1-26 [ExtremeC_examples_chapter1_14.c]: An example of a generic function

In the preceding code box, the print_bytes function receives an address as a void* pointer and an integer indicating the length. Using these arguments, the function prints all the bytes starting from the given address up to the given length. As you see, the function accepts a generic pointer, which allows the user to pass whatever pointer they want. Keep in mind that assignment to a void pointer (generic pointer) does not need an explicit cast. That is why we have passed the addresses of a and b without explicit casts.

Inside the print_bytes function, we have to use an unsigned char pointer in order to move inside the memory. Otherwise, we cannot do any arithmetic on the void pointer parameter, data, directly. As you may know, the step size of a char* or unsigned char* is one byte. So, it is the best pointer type for iterating over a range of memory addresses one byte at a time and processing all of those bytes one by one.

As a final note about this example, size_t is a standard and unsigned data type usually used for storing sizes in C.

size_t is defined in section 6.5.3.4 of the ISO/ICE 9899:TC3 standard. This ISO standard is the famous C99 specification revised in 2007. This standard has been the basis for all C implementations up until today. The link to ISO/ICE 9899:TC3 (2007) is http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf.

Size of a pointer

If you search for the size of a pointer in C on Google, you may realize that you cannot find a definitive answer to that. There are many answers out there, and it is true that you cannot define a fixed size for a pointer in different architectures. The size of a pointer depends on the architecture rather than being a specific C concept. C doesn't worry too much about such hardware-related details, and it tries to provide a generic way of working with pointers and other programming concepts. That is why we know C as a standard. Only pointers and the arithmetic on them are important to C.

Architecture refers to the hardware used in a computer system. You will find more details in the upcoming chapter, Compilation and Linking.

You can always use the sizeof function to obtain the size of a pointer. It is enough to see the result of sizeof(char*) on your target architecture. As a rule of thumb, pointers are 4 bytes in 32-bit architectures and 8 bytes in 64-bit architectures, but you may find different sizes in other architectures. Keep in mind that the code you write should not be dependent on a specific value for the size of a pointer, and it should not make any assumptions about it. Otherwise, you will be in trouble while porting your code to other architectures.

Dangling pointers

There are many known issues caused by misusing pointers. The issue of dangling pointers is a very famous one. A pointer usually points to an address to which there is a variable allocated. Reading or modifying an address where there is no variable registered is a big mistake and can result in a crash or a segmentation fault situation. Segmentation fault is a scary error that every C/C++ developer should have seen it at least once while working on code. This situation usually happens when you are misusing pointers. You are accessing places in memory that you are not allowed to. You had a variable there before, but it is deallocated by now.

Let's try to produce this situation as part of the following example, example 1.15:

#include <stdio.h>
int* create_an_integer(int default_value) {
  int var = default_value;
  return &var;
}
int main() {
  int* ptr = NULL;
  ptr = create_an_integer(10);
  printf("%d\n", *ptr);
  return 0;
}

Code Box 1-27 [ExtremeC_examples_chapter1_15.c]: Producing a segmentation fault situation

In the preceding example, the create_an_integer function is used to create an integer. It declares an integer with a default value and returns its address to the caller. In the main function, the address of the created integer, var, is received and it gets stored in the ptr pointer. Then, the ptr pointer is dereferenced, and the value stored in the var variable gets printed.

But things are not that easy. When you want to compile this code using the gcc compiler on a Linux machine, it generates a warning as follows, but still successfully finishes the compilation, and you get the final executable:

$ gcc ExtremeC_examples_chapter1_15.c
In function 'f':
warning: function returns address of local variable [-Wreturn-local-addr]
   return &var;
          ^~~~
$

Shell Box 1-8: Compiling example 1.15 in Linux

This is indeed an important warning message which can be easily missed and forgotten by the programmer. We'll talk more about this later as part of Chapter 5, Stack and Heap. Let's see what happens if we proceed and execute the resulting executable.

When you run example 1.15, you get a segmentation fault error, and the program crashes immediately:

$ ./a.out
Segmentation fault (core dumped)
$

Shell Box 1-9: Segmentation fault when running example 1.15

So, what went wrong? The ptr pointer is dangling and points to an already deallocated portion of memory that was known to be the memory place of the variable, var.

The var variable is a local variable to the create_an_integer function, and it will be deallocated after leaving the function, but its address can be returned from the function. So, after copying the returned address into ptr as part of the main function, ptr becomes a dangling pointer pointing to an invalid address in memory. Now, dereferencing the pointer causes a serious problem and the program crashes.

If you look back at the warning generated by the compiler, it is clearly stating the problem.

It says that you are returning the address of a local variable, which will be deallocated after returning from the function. Smart compiler! If you take these warnings seriously, you won't face these scary bugs.

But what is the proper way to rewrite the example? Yes, using the Heap memory. We will cover heap memory fully in Chapter 4, Process Memory Structure, and Chapter 5, Stack and Heap, but, for now, we will rewrite the example using Heap allocation, and you will see how you can benefit from using Heap instead of the Stack.

Example 1.16 below shows how to use Heap memory for allocating variables, and enabling the passing addresses between functions without facing any issues:

#include <stdio.h>
#include <stdlib.h>
int* create_an_integer(int default_value) {
  int* var_ptr = (int*)malloc(sizeof(int));
  *var_ptr = default_value;
  return var_ptr;
}
int main() {
  int* ptr = NULL;
  ptr = create_an_integer(10);
  printf("%d\n", *ptr);
  free(ptr);
  return 0;
}

Code Box 1-28 [ExtremeC_examples_chapter1_16.c]: Rewriting example 1.15 using Heap memory

As you see in the preceding code box, we have included a new header file, stdlib.h, and we are using two new functions, malloc and free. The simple explanation is like this: the created integer variable inside the create_an_integer function is not a local variable anymore. Instead, it is a variable allocated from the Heap memory and its lifetime is not limited to the function declaring it. Therefore, it can be accessed in the caller (outer) function. The pointers pointing to this variable are not dangling anymore, and they can be dereferenced as long as the variable exists and is not freed. Eventually, the variable becomes deallocated by calling the free function as an end to its lifetime. Note that deallocating a Heap variable is mandatory when it is not needed anymore.

In this section, we went through all the essential discussions regarding variable pointers. In the upcoming section, we'll be talking about functions and their anatomy in C.