Modern C++: Efficient and Scalable Application Development

By : Richard Grimes, Marius Bancila

Overview of this book

C++ is one of the most widely used programming languages. It is fast, flexible, and used to solve many programming problems. This Learning Path gives you an in-depth and hands-on experience of working with C++, using the latest recipes and understanding most recent developments. You will explore C++ programming constructs by learning about language structures, functions, and classes, which will help you identify the execution flow through code. You will also understand the importance of the C++ standard library as well as memory allocation for writing better and faster programs. Modern C++: Efficient and Scalable Application Development deals with the challenges faced with advanced C++ programming. You will work through advanced topics such as multithreading, networking, concurrency, lambda expressions, and many more recipes. By the end of this Learning Path, you will have all the skills to become a master C++ programmer. This Learning Path includes content from the following Packt products: • Beginning C++ Programming by Richard Grimes • Modern C++ Programming Cookbook by Marius Bancila • The Modern C++ Challenge by Marius Bancila

Writing C++


C++ is a very flexible language when it comes to formatting and writing code. It is also a strongly typed language, meaning there are rules about declaring the types of variables, which you can use to your advantage by making the compiler help you write better code. In this section, we will cover how to format C++ code and rules on declaring and scoping variables.

Using whitespace

Other than string literals, you have free usage of white space (spaces, tabs, newlines), and are able to use as much or as little as you like. C++ statements are delimited by semicolons, so in the following code there are three statements, which will compile and run:

    int i = 4; 
    i = i / 2; 
    std::cout << "The result is " << i << std::endl;

The entire code could be written as follows:

    int i=4;i=i/2; std::cout<<"The result is "<<i<<std::endl;

There are some cases where whitespace is needed (for example, when declaring a variable you must have white space between the type and the variable name), but the convention is to be as judicious as possible to make the code readable. And while it is perfectly correct, language-wise, to put all the statements on one line (like JavaScript), it makes the code almost completely unreadable.

Note

If you are interested in some of the more creative ways of making code unreadable, have a look at the entries for the annual International Obfuscated C Code Contest (http://www.ioccc.org/). Since C is the progenitor of C++, many of the lessons in C shown at IOCCC apply to C++ code too.

Bear in mind that, if the code you write is viable, it may be in use for decades. You may have to come back to it years after you wrote it, and other people will have to support it, too. Making your code readable is a courtesy to other developers, and it also protects your work: unreadable code is always a likely target for replacement.

Formatting Code

Inevitably, whoever you are writing code for will dictate how you format code. Sometimes it makes sense, for example, if you use some form of preprocessing to extract code and definitions to create documentation for the code. In many cases, the style that is imposed on you is the personal preference of someone else.

Note

Visual C++ allows you to place XML comments in your code. To do this, you use a three-slash comment (///) and then compile the source file with the /doc switch. This creates an intermediate XML file, called an xdc file, with a <doc> root element containing all the three-slash comments. The Visual C++ documentation defines standard XML tags (for example, <param> and <returns>, to document the parameters and return value of a function). The intermediate file is compiled to the final documentation XML file with the xdcmake utility.

There are two broad styles in C++: K&R and Allman.

Kernighan and Ritchie (K&R) wrote the first, and most influential, book about C (Dennis Ritchie was the author of the C language). The term K&R style describes the formatting used in that book. In general, K&R places the opening brace of a code block on the same line as the statement that introduces it. If your code has nested statements (and typically, it will), then this style can get a bit confusing:

    if (/* some test */) { 
        // the test is true  
        if (/* some other test */) { 
            // second test is true  
        } else { 
            // second test is false    
        } 
    } else { 
        // the test is false  
    }

This style is typically used in Unix (and Unix-like) code.

The Allman style (named after the developer Eric Allman) places the opening brace on a new line, so the nested example looks as follows:

    if (/* some test */)
    {
        // the test is true
        if (/* some other test */)
        {
            // second test is true
        }
        else
        {
            // second test is false
        }
    }
    else
    {
        // the test is false
    }

The Allman style is typically used by Microsoft.

Remember that your code is unlikely to be presented on paper, so the fact that K&R is more compact will save no trees. If you have the choice, you should choose the style that is the most readable; the decision of this author, for this book, is that Allman is more readable.

If you have multiple nested blocks, the indents can give you an idea of which block the code resides in. However, comments can help. In particular, if a code block has a large amount of code, it is often helpful to comment the reason for the code block. For example, in an if statement, it is helpful to put the result of the test in the code block so you know what the variable values are in that block. It is also useful to put a comment on the closing brace of the test:

    if (x < 0)  
    { 
       // x < 0 
       /* lots of code */ 
    }  // if (x < 0) 

    else  
    { 
       // x >= 0 
       /* lots of code */ 
    }  // if (x < 0)

If you put the test as a comment on a closing brace, you have a search term that you can use to find the test that resulted in the code block. In an example as short as the preceding one, this commenting is redundant, but when you have code blocks with many tens of lines of code, and with many levels of nesting, comments like this can be very helpful.

Writing Statements

A statement can be a declaration of a variable, an expression that evaluates to a value, or it can be a definition of a type. A statement may also be a control structure to affect the flow of the execution through your code.

A statement ends with a semicolon. Other than that, there are few rules about how to format statements. You can even use a semicolon on its own, and this is called a null statement. A null statement does nothing, so having too many semicolons is usually benign.

Working with Expressions

An expression is a sequence of operators and operands (variables or literals) that results in some value. Consider the following:

    int i; 
    i = 6 * 7;

On the right side 6 * 7 is an expression, and the assignment (from i on the left-hand side to the semicolon on the right) is a statement.

Every expression is either an lvalue or an rvalue. You are most likely to see these terms used in error descriptions. In effect, an lvalue is an expression that refers to some memory location. Items on the left-hand side of an assignment must be lvalues. However, an lvalue can appear on the left- or right-hand side of an assignment. All variables are lvalues. An rvalue is a temporary item that does not exist longer than the expression that uses it; it will have a value, but cannot have a value assigned to it, so it can only exist on the right-hand side of an assignment. Literals are rvalues. The following shows a simple example of lvalues and rvalues:

    int i; 
    i = 6 * 7;

In the second line, i is an lvalue, and the expression 6 * 7 results in an rvalue (42). The following will not compile because there is an rvalue on the left:

    6 * 7 = i;

Broadly speaking, an expression becomes a statement when you append a semicolon. For example, the following are both statements:

    42;
    std::sqrt(2);

The first line is an rvalue of 42, but since it is temporary it has no effect. A C++ compiler will optimize it away. The second line calls the standard library function to calculate the square root of 2. Again, the result is an rvalue and the value is not used, so the compiler will optimize this away. However, it illustrates that a function can be called without using its return value. Although it is not the case with std::sqrt, many functions have a lasting effect other than their return value. Indeed, the whole point of a function is usually to do something, and the return value is often used merely to indicate if the function was successful; often developers assume that a function will succeed and ignore the return value.

Using the Comma Operator

Operators will be covered later in this chapter; however, it is useful to introduce the comma operator here. You can have a sequence of expressions separated by a comma as a single statement. For example, the following code is legal in C++:

    int a = 9;
    int b = 4;
    int c;
    c = a + 8, b + 1;

Suppose the writer intended to type c = a + 8 / b + 1; but pressed the comma key instead of /. The intention was for c to be assigned 9 + 2 + 1, or 12. The mistyped code will still compile and run, and the variable c will be assigned a value of 17 (a + 8). The reason is that the comma splits the statement into two expressions, c = a + 8 and b + 1. Later in this chapter, we will look at operator precedence. However, it is worth saying here that the comma has the lowest precedence and + has a higher precedence than =, so the statement is executed in this order: the addition a + 8, then the assignment, and then the comma operator (with the result of b + 1 thrown away).

You can change the precedence using parentheses to group expressions. For example, the mistyped code could have been as follows:

    c = (a + 8, b + 1);

The result of this statement is that the variable c is assigned 5 (b + 1). The reason is that, with the comma operator, expressions are evaluated from left to right, and the value of the group of expressions is the value of the right-most one. There are some cases, for example, in the initialization or loop expression of a for loop, where you will find the comma operator useful, but as you can see here, even used intentionally, the comma operator produces hard-to-read code.
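The following is a minimal sketch of both behaviors, plus one intentional use in a for loop. The function names (comma_without_parentheses, comma_with_parentheses, is_palindrome) are illustrative assumptions and do not come from the original text.

```cpp
#include <cstddef>
#include <string>

// The mistyped statement from the text: parsed as (c = a + 8), (b + 1).
int comma_without_parentheses()
{
    int a = 9;
    int b = 4;
    int c;
    c = a + 8, b + 1;     // c is assigned 17; the value of b + 1 is discarded
    return c;
}

// The parenthesized form: the group evaluates to its right-most expression.
int comma_with_parentheses()
{
    int a = 9;
    int b = 4;
    int c;
    c = (a + 8, b + 1);   // c is assigned 5
    return c;
}

// Hypothetical helper: returns true if text reads the same in both directions.
bool is_palindrome(const std::string& text)
{
    if (text.empty()) return true;

    std::size_t left;
    std::size_t right;
    // The initialization (left = 0, right = ...) and the loop expression
    // (++left, --right) both use the comma operator.
    for (left = 0, right = text.size() - 1; left < right; ++left, --right)
    {
        if (text[left] != text[right]) return false;
    }
    return true;
}
```

Most compilers will warn about the discarded b + 1 in the first function, which is a useful hint that a comma was probably a typing error.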

Using Types and Variables

It is useful to give basic information here. C++ is a strongly typed language, which means that you have to declare the type of the variables that you use. The reason for this is that the compiler needs to know how much memory to allocate for the variable, and it can determine this by the type of the variable. In addition, the compiler needs to know how to initialize a variable, if it has not been explicitly initialized, and to perform this initialization the compiler needs to know the type of the variable.

Note

C++11 provides the auto keyword, which relaxes the requirement to spell out the type: the compiler deduces the type from the initializer, and the variable is still strongly typed. The type checking of the compiler is so important that you should take advantage of it as much as possible.

C++ variables can be declared anywhere in your code as long as they are declared before they are used. Where you declare a variable determines how you use it (this is called the scope of the variable). In general, it is best to declare the variable as close as possible to where you will use it, and within the most restrictive scope. This prevents name clashes, where you will have to add additional information to disambiguate two or more variables.
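As a small sketch of declaring variables in the most restrictive scope, consider the following function. The name sum_of_squares is a made-up example, not something from the original text.

```cpp
// Hypothetical example: adds the squares of 1..limit.
int sum_of_squares(int limit)
{
    int total = 0;                    // needed for the whole function

    for (int i = 1; i <= limit; ++i)  // i exists only for the duration of the loop
    {
        int square = i * i;           // square exists only inside the loop body
        total += square;
    }

    // Neither i nor square can be named here, so they cannot clash with
    // other variables declared later in the function.
    return total;
}
```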

You may, and should, give your variables descriptive names. This makes your code much more readable and easier to understand. C++ names must start with an alphabetic character or an underscore. They can contain alphanumeric characters and underscores, but not spaces. So, the following are valid names:

    numberOfCustomers 
    NumberOfCustomers 
    number_of_customers

C++ names are case-sensitive, and in Visual C++ the first 2,048 characters are significant. You can start a variable name with an underscore, but you cannot use two underscores, nor can you use an underscore followed by a capital letter (these are reserved by C++). C++ also reserves keywords (for example, while and if), and clearly you cannot use type names as variable names, neither built-in type names (int, long, and so on) nor your own custom types.

You declare a variable in a statement, ending with a semicolon. The basic syntax of declaring a variable is that you specify the type, then the name, and, optionally, any initialization of the variable.

Built-in types must be initialized before you use them:

    int i; 
    i++;           // C4700 uninitialized local variable 'i' used 
    std::cout << i;

There are essentially three ways to initialize variables. You can assign a value, you can call the type constructor (constructors for classes will be defined in Chapter 4, Classes) or you can initialize a variable using function syntax:

    int i = 1; 
    int j = int(2); 
    int k(3);

These three are all legal C++, but stylistically the first is the better because it is more obvious: the variable is an integer, it is called i, and it is assigned a value of 1. The third looks confusing; it looks like the declaration of a function when it is actually declaring a variable. 

Chapter 4, Classes will cover classes, your own custom types. A custom type may be defined to have a default value, which means that you may decide not to initialize a variable of a custom type before using it. However, this will result in poorer performance, because the compiler will initialize the variable with the default value and subsequently your code will assign a value, resulting in an assignment being performed twice.

Using constants and literals

Each type will have a literal representation. An integer will be a number written without a decimal point and, if it is a signed integer, the literal can also use the plus or minus symbol to indicate the sign. Similarly, a real number can have a literal value that contains a decimal point, and you may even use the scientific (or engineering) format including an exponent. C++ has various rules to use when specifying literals in code. Some examples of literals are shown here:

    int pos = +1; 
    int neg = -1; 
    double micro = 1e-6; 
    double unit = 1.; 
    std::string name = "Richard";

Note that for the unit variable, the compiler knows that the literal is a real number because the value has a decimal point. For integers, you can provide a hexadecimal literal in your code by prefixing the number with 0x, so 0x100 is 256 in decimal. By default, the output stream will print numeric values in base 10; however, you can insert a manipulator into an output stream to tell it to use a different number base. The default behavior is std::dec, which means the numbers should be displayed as base 10, std::oct means display as octal (base 8), and std::hex means display as hexadecimal (base 16). If you prefer to see the prefix printed, then you use the stream manipulator std::showbase (more details will be given in Chapter 5, Using the Standard Library Containers).
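The manipulators described above can be sketched as follows. The helper in_three_bases is a hypothetical name used only for this illustration; it writes to a string stream so that the result is easy to inspect, but the same manipulators work with std::cout.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats value in decimal, octal, and hexadecimal,
// separated by spaces, with the base prefix shown.
std::string in_three_bases(int value)
{
    std::ostringstream out;
    out << std::showbase              // print 0 for octal and 0x for hexadecimal
        << std::dec << value << ' '
        << std::oct << value << ' '
        << std::hex << value;
    return out.str();
}
```

For example, in_three_bases(0x100) produces the text 256 0400 0x100.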

C++ defines some literals. For bool, the logic type, there are true and false constants, where false is zero and true is 1. There is also the nullptr constant, again, zero, which is used as an invalid value for any pointer type.

Defining constants

In some cases, you will want to provide constant values that can be used throughout your code. For example, you may decide to declare a constant for π. You should not allow this value to be changed because it will change the underlying logic in your code. This means that you should mark the variable as being constant. When you do this, the compiler will check the use of the variable and if it is used in code that changes the value of the variable the compiler will issue an error:

    const double pi = 3.1415; 
    double radius = 5.0; 
    double circumference = 2 * pi * radius;

In this case the symbol pi is declared as being constant, so it cannot change. If you subsequently decide to change the constant, the compiler will issue an error:

    // add more precision, generates error C3892 
    pi += 0.00009265359;

Once you have declared a constant, you can be assured that the compiler will make sure it remains so. You can assign a constant with an expression as follows:

    #include <cmath> 
    const double sqrtOf2 = std::sqrt(2);

In this code, a global constant called sqrtOf2 is declared and assigned with a value using the std::sqrt function. Since this constant is declared outside a function, it is global to the file and can be used throughout the file.

Another way to provide a constant is a preprocessor symbol defined with #define. The problem with that approach is that the preprocessor does a simple text replacement. With constants declared with const, the C++ compiler will perform type checking to ensure that the constant is being used appropriately.
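A short sketch shows why simple replacement by the preprocessor can surprise you. The names SIX, six, macro_times_two, and const_times_two are invented for this illustration.

```cpp
#define SIX 1 + 5        // the preprocessor pastes the text "1 + 5" wherever SIX appears
const int six = 1 + 5;   // a typed constant with the value 6

int macro_times_two()
{
    return SIX * 2;      // expands to 1 + 5 * 2, which is 11
}

int const_times_two()
{
    return six * 2;      // 6 * 2, which is 12
}
```

The macro version silently gives a different answer because the replacement happens before the compiler sees the expression, whereas the const version behaves like any other int.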

You can also use const to declare a constant that will be used as a constant expression. For example, you can declare an array using the square bracket syntax (more details will be given in Chapter 2, Working with Memory, Arrays, and Pointers):

    int values[5];

This declares an array of five integers on the stack and these items are accessed through the values array variable. The 5 here is a constant expression. When you declare an array on the stack, you have to provide the compiler with a constant expression so it knows how much memory to allocate and this means the size of the array must be known at compile time. (You can allocate an array with a size known only at runtime, but this requires dynamic memory allocation, explained in Chapter 2, Working with Memory, Arrays, and Pointers.) In C++, you can declare a constant to do the following:

    const int size = 5;  
    int values[size];

Elsewhere in your code, when you access the values array, you can use the size constant to make sure that you do not access items past the end of the array. Since the size variable is declared in just one place, if you need to change the size of the array at a later stage, you have just one place to make this change. The const keyword can also be used on pointers and references (see Chapter 2, Working with Memory, Arrays, and Pointers) and on objects (see Chapter 4, Classes); often, you'll see it used on parameters to functions (see Chapter 3, Using Functions). This is used to get the compiler to help ensure that pointers, references, and objects are used appropriately, as you intended.
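As a minimal sketch of using the size constant to stay within the bounds of the array, consider the following. The function sum_values and the initial values are assumptions made for the illustration.

```cpp
const int size = 5;

// Hypothetical example: adds up the items of an array whose length is size.
int sum_values()
{
    int values[size] = {1, 2, 3, 4, 5};

    int total = 0;
    // The same constant that sized the array also bounds the loop, so a
    // later change to size needs to be made in one place only.
    for (int i = 0; i < size; ++i)
    {
        total += values[i];
    }
    return total;
}
```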

Using Constant Expressions

C++11 introduces a keyword called constexpr. This is applied to an expression, and indicates that the expression should be evaluated at compile time rather than at runtime:

    constexpr double pi = 3.1415; 
    constexpr double twopi = 2 * pi;

This is similar to initializing a constant declared with the const keyword. However, the constexpr keyword can also be applied to functions that return a value that can be evaluated at compile time, and so this allows the compiler to optimize the code:

    constexpr int triang(int i) 
    { 
       return (i == 0) ? 0 : triang(i - 1) + i;
    }

In this example, the function triang calculates triangular numbers recursively. The code uses the conditional operator. In the parentheses, the function parameter is tested to see if it is zero, and if so the function returns zero, in effect ending the recursion and returning to the original caller. If the parameter is not zero, then the return value is the sum of the parameter and the return value of triang called with the parameter decremented.

This function, when called with a literal in your code, can be evaluated at compile time. The constexpr is an indication to the compiler to check the usage of the function to see if it can determine the parameter at compile time. If this is the case, the compiler can evaluate the return value and produce code more efficiently than by calling the function at runtime. If the compiler cannot determine the parameter at compile time, the function will be called as normal. In C++11, the body of a function marked with the constexpr keyword must consist of a single return statement (hence the use of the conditional operator ?: in the triang function); C++14 relaxes this restriction.
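The difference between compile-time and runtime use of triang can be sketched like this. The names slots and triang_at_runtime are illustrative assumptions.

```cpp
constexpr int triang(int i)
{
    return (i == 0) ? 0 : triang(i - 1) + i;
}

// Compile-time use: the compiler itself evaluates the call and checks the result.
static_assert(triang(4) == 10, "1 + 2 + 3 + 4 is 10");

// A constexpr result is a constant expression, so it can size an array.
int slots[triang(3)];   // six elements

// Runtime use: n is not known until the program runs, so triang is
// called like any ordinary function.
int triang_at_runtime(int n)
{
    return triang(n);
}
```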

Using Enumerations

A final way to provide constants is to use an enum variable. In effect, an enum is a group of named constants, which means that you can use an enum as a parameter to a function. For example:

    enum suits {clubs, diamonds, hearts, spades};

This defines an enumeration called suits, with named values for the suits in a deck of cards. An enumeration is an integer type and by default the compiler will assume an int, but you can change this by specifying the integer type in the declaration. Since there are just four possible values for card suits, it is a waste of memory to use int (usually 4 bytes) and instead, we can use char (a single byte):

    enum suits : char {clubs, diamonds, hearts, spades};

When you use an enumerated value, you can use just the name; however, it is usual to scope it with the name of the enumeration, making the code more readable:

    suits card1 = diamonds; 
    suits card2 = suits::diamonds;

Both forms are allowed, but the latter makes it more explicit that the value is taken from an enumeration. To force developers to specify the scope, you can apply the keyword class:

    enum class suits : char {clubs, diamonds, hearts, spades};

With this definition and the preceding code, the line declaring card2 will compile, but the line declaring card1 will not. With a scoped enum, the compiler treats the enumeration as a new type and has no inbuilt conversion from your new type to an integer variable. For example:

    suits card = suits::diamonds; 
    char c = card + 10; // errors C2784 and C2676

The enum type is based on char, but when you define the suits enumeration as being scoped (with class), the second line will not compile. If the enumeration is defined as not being scoped (without class), then there is an inbuilt conversion between the enumerated value and char.
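If you do need the underlying value of a scoped enumeration, one option is an explicit conversion. The sketch below uses static_cast for this; the helper name offset_suit is an assumption made for the illustration.

```cpp
enum class suits : char {clubs, diamonds, hearts, spades};

// Hypothetical helper: adds an offset to the underlying value of a suit.
char offset_suit(suits card, char offset)
{
    // The inner cast converts the scoped enumerator to char explicitly.
    // The addition is performed as int, so the outer cast narrows it back.
    return static_cast<char>(static_cast<char>(card) + offset);
}
```

The explicit cast makes the conversion visible in the code, which is the point of scoping the enumeration in the first place.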

By default, the compiler will give the first enumerator a value of 0 and then increment the value for the subsequent enumerators. Thus suits::diamonds will have a value of 1 because it is the second value in suits. You can assign values yourself:

    enum ports {ftp=21, ssh, telnet, smtp=25, http=80};

In this case, ports::ftp has a value of 21, ports::ssh has a value of 22 (21 incremented), ports::telnet is 23, ports::smtp is 25, and ports::http is 80.
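You can ask the compiler to confirm the enumerator values with static_assert, as in this small sketch (the messages are just descriptive text):

```cpp
enum ports {ftp = 21, ssh, telnet, smtp = 25, http = 80};

static_assert(ftp == 21, "explicitly assigned");
static_assert(ssh == 22, "one more than ftp");
static_assert(telnet == 23, "one more than ssh");
static_assert(smtp == 25, "explicitly assigned");
static_assert(http == 80, "explicitly assigned");
```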

Note

Often the point of enumerations is to provide named symbols within your code and their values are unimportant. Does it matter what value is assigned to suits::hearts? The intention is usually to ensure that it is different from the other values. In other cases, the values are important because they are a way to provide values to other functions.

Enumerations are useful in a switch statement (see later) because the named value makes it clearer than using just an integer. You can also use an enumeration as a parameter to a function and hence restrict the values passed via that parameter:

    void stack(suits card) 
    { 
        // we know that card is only one of four values 
    }
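As a sketch of an enumeration used both as a function parameter and in a switch statement, consider the following. The function colour_of is a hypothetical example and is not part of the original text.

```cpp
#include <string>

enum class suits : char {clubs, diamonds, hearts, spades};

// Hypothetical helper: returns the colour of a card suit.
std::string colour_of(suits card)
{
    switch (card)
    {
    case suits::diamonds:
    case suits::hearts:
        return "red";
    case suits::clubs:
    case suits::spades:
        return "black";
    }
    return "unknown";   // ensures every path through the function returns a value
}
```

The named cases read much more clearly than the equivalent tests against 0, 1, 2, and 3.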

Declaring Pointers

Since we are covering the use of variables, it is worth explaining the syntax used to define pointers and arrays because there are some potential pitfalls. Chapter 2, Working with Memory, Arrays, and Pointers, covers this in more detail, so we will just introduce the syntax so that you are familiar with it.

In C++, you will access memory using a typed pointer. The type indicates the type of the data that is held in the memory that is pointed to. So, if the pointer is a (4-byte) integer pointer, it will point to four bytes that can be used as an integer. If the integer pointer is incremented, then it will point to the next four bytes, which can be used as an integer.

Note

Don't worry if you find pointers confusing at this point. Chapter 2, Working with Memory, Arrays, and Pointers, will explain this in more detail. The purpose of introducing pointers at this time is to make you aware of the syntax.

In C++, pointers are declared using the * symbol, and you obtain the memory address of a variable with the & operator:

    int *p; 
    int i = 42; 
    p = &i;

The first line declares a variable, p, which will be used to hold the memory address of an integer. The second line declares an integer and assigns it a value. The third line assigns a value to the pointer p to be the address of the integer variable just declared. It is important to stress that the value of p is not 42; it will be a memory address where the value of 42 is stored.

Note how the declaration has the * on the variable name. This is common convention. The reason is that if you declare several variables in one statement, the * applies only to the immediate variable. So, for example:

    int* p1, p2;

Initially, this looks like you are declaring two integer pointers. However, this line does not do this; it declares just one pointer to integer called p1. The second variable is an integer called p2. The preceding line is equivalent to the following:

    int *p1;  
    int p2;

If you wish to declare two integer pointers in one statement, then you should do it as follows:

    int *p1, *p2;
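The pitfall with int* p1, p2; can be checked by the compiler itself. The following sketch uses std::is_same from the standard <type_traits> header; the function name read_through_pointer is an assumption, and the * operator used to read the value is explained properly in Chapter 2.

```cpp
#include <type_traits>

// Hypothetical example: shows the types declared by "int* p1, p2;".
int read_through_pointer()
{
    int* p1, p2;

    static_assert(std::is_same<decltype(p1), int*>::value, "p1 is a pointer to int");
    static_assert(std::is_same<decltype(p2), int>::value, "p2 is a plain int");

    p2 = 42;
    p1 = &p2;      // p1 now holds the address of p2, not the value 42
    return *p1;    // reading through the pointer gives the int stored at that address
}
```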

Using Namespaces

Namespaces give you one mechanism to modularize code. A namespace allows you to label your types, functions, and variables with a unique name so that, using the scope resolution operator, you can give a fully qualified name. The advantage is that you know exactly which item will be called. The disadvantage is that using a fully qualified name you are in effect switching off C++'s argument-dependent lookup mechanism for overloaded functions where the compiler will choose the function that has the best fit according to the arguments passed to the function.

Defining a namespace is simple: you decorate the types, functions, and global variables with the namespace keyword and the name you give to it. In the following example, two functions are defined in the utilities namespace:

    namespace utilities 
    { 
        bool poll_data() 
        { 
            // code that returns a bool 
        } 
        int get_data() 
        { 
            // code that returns an integer 
        } 
    }

Note

Do not use a semicolon after the closing brace of a namespace.

Now when you use these symbols, you need to qualify the name with the namespace:

    if (utilities::poll_data()) 
    { 
        int i = utilities::get_data(); 
        // use i here... 
    }

The namespace declaration may just declare the functions, in which case the actual functions would have to be defined elsewhere, and you will need to use a qualified name:

    namespace utilities 
    { 
        // declare the functions 
        bool poll_data(); 
        int get_data(); 
    } 

    //define the functions 
    bool utilities::poll_data() 
    { 
        // code that returns a bool 
    } 

    int utilities::get_data() 
    { 
       // code that returns an integer 
    }

One use of namespaces is to version your code. The first version of your code may have a side-effect that is not in your functional specification and is technically a bug, but some callers will use it and depend on it. When you update your code to fix the bug, you may decide to allow your callers the option to use the old version so that their code does not break. You can do this with a namespace:

    namespace utilities 
    { 
        bool poll_data(); 
        int get_data(); 

        namespace V2 
        { 
            bool poll_data(); 
            int get_data(); 
            int new_feature(); 
        } 
    }

Now callers who want a specific version can call the fully qualified names. For example, callers could use utilities::V2::poll_data to use the newer version and utilities::poll_data to use the older version. When an item in a specific namespace calls an item in the same namespace, it does not have to use a qualified name. So, if the new_feature function calls get_data, it will be utilities::V2::get_data that is called. It is important to note that, before C++17, you have to do the nesting manually to declare a nested namespace (as shown here); only from C++17 onward can you declare a namespace named utilities::V2 directly.

The preceding example has been written so that callers of the first version of the code use the utilities namespace. C++11 provides a facility called an inline namespace that allows you to define a nested namespace, but allows the compiler to treat the items as being in the parent namespace when it performs an argument-dependent lookup:

    namespace utilities 
    { 
        inline namespace V1 
        { 
            bool poll_data(); 
            int get_data(); 
        } 

        namespace V2 
        { 
            bool poll_data(); 
            int get_data(); 
            int new_feature(); 
        } 
    }

Now to call the first version of get_data, you can use utilities::get_data or utilities::V1::get_data.
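A minimal sketch with placeholder function bodies shows which version each name resolves to. The return values 1 and 2 are assumptions chosen only so that the versions can be told apart.

```cpp
namespace utilities
{
    inline namespace V1
    {
        int get_data() { return 1; }   // placeholder body for the first version
    }

    namespace V2
    {
        int get_data() { return 2; }   // placeholder body for the second version
    }
}

// utilities::get_data()     calls the V1 function, because V1 is inline
// utilities::V1::get_data() calls the same V1 function
// utilities::V2::get_data() calls the V2 function
```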

Fully qualified names can make the code difficult to read, especially if your code will only use one namespace. To help here you have several options. You can place a using statement to indicate that symbols declared in the specified namespace can be used without a fully qualified name:

    using namespace utilities; 
    int i = get_data(); 
    int j = V2::get_data();

You can still use fully qualified names, but this statement allows you to ease the requirement. Note that a nested namespace is a member of a namespace, so the preceding using statement means that you can call the second version of get_data with either utilities::V2::get_data or V2::get_data. If you use the unqualified name, then it means that you will call utilities::get_data.

A namespace can contain many items, and you may decide that you only want to relax the use of fully qualified names with just a few of them. To do this, use using and give the name of the item:

    using std::cout; 
    using std::endl; 
    cout << "Hello, World!" << endl;

This code says that, whenever cout is used, it refers to std::cout. You can use using within a function, or you can put it at file scope and make the intention global to the file.

You do not have to declare a namespace in one place; you can declare it over several files. The following could be in a different file from the previous declaration of utilities:

    namespace utilities 
    { 
        namespace V2 
        { 
            void print_data(); 
        } 
    }

The print_data function is still part of the utilities::V2 namespace.

You can also put an #include in a namespace, in which case the items declared in the header file will now be part of the namespace. The standard library header files that have a prefix of c (for example, cmath, cstdlib, and ctime) give access to the C runtime functions by including the appropriate C header in the std namespace.

The great advantage of a namespace is to be able to define your items with names that may be common, but are hidden from other code that does not know the name of the namespace. The namespace means that the items are still available to your code via the fully qualified name. However, this only works if you use a unique namespace name, and the likelihood is that, the longer the namespace name, the more unique it is likely to be. Java developers often name their classes using a URI, and you could decide to do the same thing:

    namespace com_packtpub_richard_grimes 
    { 
        int get_data(); 
    }

The problem is that the fully qualified name becomes quite long:

    int i = com_packtpub_richard_grimes::get_data();

You can get around this issue using an alias:

    namespace packtRG = com_packtpub_richard_grimes; 
    int i = packtRG::get_data();

C++ allows you to define a namespace without a name, an anonymous (or unnamed) namespace. As mentioned previously, namespaces allow you to prevent name clashes between code defined in several files. If you intend to use such a name in only one file, you could define a unique namespace name. However, this could get tedious if you had to do it for several files. A namespace without a name has a special meaning: its members have internal linkage, that is, they can only be used in the current translation unit (the current file) and not in any other file.
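A minimal sketch of an anonymous namespace follows. The names counter, next_id, and make_id are hypothetical and are used only to show which items are private to the file.

```cpp
namespace
{
    // Internal linkage: these names cannot be referenced from other
    // translation units, so another file may reuse them safely.
    int counter = 0;

    int next_id() { return ++counter; }
}

// External linkage: the entry point that other files would call.
// Members of the anonymous namespace are used without qualification.
int make_id() { return next_id(); }
```

Each call to make_id returns the next number in the sequence, while counter and next_id remain implementation details of this one file.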

Code that is not declared in a namespace will be a member of the global namespace. You can call the code without a namespace name, but you may want to explicitly indicate that the item is in the global namespace using the scope resolution operator without a namespace name:

    int version = 42; 

    void print_version() 
    { 
        std::cout << "Version = " << ::version << std::endl; 
    }
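The explicit form matters most when a local name hides the global one. The following sketch assumes a hypothetical function, version_gap, that declares its own variable called version.

```cpp
int version = 42;        // member of the global namespace

int version_gap()
{
    int version = 7;     // hides the global variable inside this function

    // The unqualified name is the local variable (7);
    // ::version is the global one (42).
    return ::version - version;
}
```

Here the function returns the difference between the global and the local value, which shows that both variables can be reached from the same scope.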

C++ Scoping of Variables

The compiler will compile your source files as individual items called translation units. The compiler will determine the objects and variables you declare and the types and functions you define, and once declared you can use any of these in the subsequent code within the scope of the declaration. At its very broadest, you can declare an item at the global scope by declaring it in a header file that will be used by all of the source files in your project. If you do not use a namespace, it is often wise to refer to such global variables explicitly as members of the global namespace:

    // in version.h
    extern int version;    // declaration only; no memory is allocated here

    // in version.cpp
    #include "version.h"
    int version = 17;      // definition; memory is allocated here

    // print.cpp
    #include <iostream>
    #include "version.h"
    void print_version()
    {
        std::cout << "Version = " << ::version << std::endl;
    }

This code has the C++ for two source files (version.cpp and print.cpp) and a header file (version.h) included by both source files. The header file declares the global variable version, which can be used by both source files; it declares the variable, but does not define it. The actual variable is defined and initialized in version.cpp; it is here that the compiler will allocate memory for the variable. The extern keyword used on the declaration in the header indicates to the compiler that version has external linkage, that is, the name is visible in files other than where the variable is defined. The version variable is used in the print.cpp source file. In this file, the scope resolution operator (::) is used without a namespace name and hence indicates that the variable version is in the global namespace.

You can also declare items that will only be used within the current translation unit, by declaring them within the source file before they are used (usually at the top of the file). This produces a level of modularity and allows you to hide implementation details from code in other source files. For example:

    // in print.h 
    void usage(); 

    // print.cpp
    #include <iostream>
    #include <string>
    #include "print.h"
    #include "version.h"
    std::string app_name = "My Utility";
    void print_version() 
    { 
       std::cout << "Version = " << ::version << std::endl; 
    } 

    void usage() 
    { 
       std::cout << app_name << " "; 
       print_version(); 
    }

The print.h header contains the interface for the code in the file print.cpp. Only the functions declared in the header are advertised to other source files. The caller does not need to know about the implementation of the usage function, and as you can see here it is implemented using a call to a function called print_version, which print.h does not declare. The variable app_name is defined at file scope in print.cpp and is likewise not declared in the header, so other source files have no declaration through which to use it. Note, however, that both of these names still have external linkage.

If another source file also defines a std::string variable called app_name at file scope, each file will compile, but the linker will complain when it tries to link the object files. The reason is that the linker will see the same variable defined in two places and it will not know which one to use.

A function also defines a scope; variables defined within the function can only be accessed by name within that function. The parameters of the function are also variables within the function, so when you declare other variables, you have to use different names. If a parameter is not marked as const, then you can alter the value of the parameter in your function.
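The following sketch illustrates parameters as local variables. It assumes parameters passed by value, and the names add_bonus, add_fixed_bonus, and caller_value_after_call are hypothetical.

```cpp
// The parameter is not const, so the function may change it.
int add_bonus(int score)
{
    score += 10;          // alters the function's own copy of the argument
    return score;
}

// Marking the parameter const means the function cannot alter it.
int add_fixed_bonus(const int score)
{
    // score += 10;       // would not compile: score is const
    return score + 10;
}

int caller_value_after_call()
{
    int score = 5;
    add_bonus(score);     // result deliberately ignored
    return score;         // the caller's variable is unchanged
}
```

Because the parameter here is a copy, changing it inside add_bonus has no effect on the variable that the caller passed in.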

You can declare variables anywhere within a function as long as you declare them before you use them. Curly braces ({}) are used to define code blocks, and they also define local scope; if you declare a variable within a code block, then you can only use it there. This means that you can declare variables with the same name outside the code block, and the compiler will use the variable declared in the scope closest to the point where it is accessed.

Before finishing this section, it is important to mention one aspect of C++ storage classes. When a variable is declared in a function, the compiler allocates memory for it on the stack frame created for the function call. When the function finishes, the stack frame is torn down and the memory recycled. This means that, after a function returns, the values in any local variables are lost; when the function is called again, the variable is created anew and initialized again.

C++ provides the static keyword to change this behavior. The static keyword means that the variable is allocated when the program starts, just like variables declared at global scope. A static variable declared in a function keeps its value between calls, but its name has no linkage, that is, the compiler restricts access to that variable to that function:
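A short sketch of this nearest-scope rule is shown here; the function name nearest_scope_demo and the variable names are hypothetical.

```cpp
int nearest_scope_demo()
{
    int value = 1;              // function-level variable
    int seen_inside = 0;

    {
        int value = 2;          // a different variable that hides the outer one
        seen_inside = value;    // uses the inner variable
    }                           // the inner variable ceases to exist here

    // Outside the block, value refers to the outer variable again.
    return seen_inside * 10 + value;
}
```

The result combines the value seen inside the block (2) with the value seen after it (1), giving 21.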

    #include <iostream>

    int inc(int i)
    {
        static int value;    // initialized to 0 once; retained between calls
        value += i; 
        return value; 
    } 

    int main() 
    { 
        std::cout << inc(10) << std::endl; 
        std::cout << inc(5) << std::endl; 
    }

By default, the compiler will initialize a static variable to 0, but you can provide an initialization value, and this will be used when the variable is first initialized. When this program starts, the value variable will be initialized to 0 before the main function is called. The first time the inc function is called, the value variable is incremented to 10, which is returned by the function and printed to the console. When the inc function returns, the value variable is retained, so that when the inc function is called again, the value variable is incremented by 5 to a value of 15.
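To illustrate providing an initialization value, the following sketch contrasts a static local variable with an ordinary one. The function names inc_static and inc_automatic and the starting value of 100 are assumptions chosen for this example.

```cpp
int inc_static(int i)
{
    static int value = 100;   // the initializer is applied once only
    value += i;
    return value;
}

int inc_automatic(int i)
{
    int value = 100;          // created and initialized anew on every call
    value += i;
    return value;
}
```

Calling inc_static with 10 and then 5 gives 110 and then 115, because the variable is retained between calls. The same two calls to inc_automatic give 110 and then 105, because the variable starts from 100 each time.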