Learning Boost C++ Libraries

By: Arindam Mukherjee

Move semantics and rvalue references


Copy semantics create clones of objects. Copying is sometimes useful, but it is not always needed or even meaningful. Consider the following class that encapsulates a TCP client socket. A TCP socket is represented by an integer that identifies one endpoint of a TCP connection, through which data can be sent to or received from the other endpoint. The TCP socket class can have the following interface:

class TCPSocket
{
public:
  TCPSocket(const std::string& host, const std::string& port);
  ~TCPSocket();

  bool is_open();
  std::vector<char> read(size_t to_read);
  size_t write(std::vector<char> payload);

private:
  int socket_fd_;

  TCPSocket(const TCPSocket&);
  TCPSocket& operator = (const TCPSocket&);
};

The constructor opens a connection to a host on a specified port and initializes the socket_fd_ member variable. The destructor closes the connection. TCP does not define a way to make clones of open sockets (unlike file descriptors with dup/dup2), so cloning a TCPSocket would not be meaningful either. We therefore disable copy semantics by declaring the copy constructor and copy assignment operator private. In C++11, the preferred way to do this is to declare these members as deleted:

TCPSocket(const TCPSocket&) = delete;
TCPSocket& operator = (const TCPSocket&) = delete;
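
As a minimal sketch of what deleting the copy members achieves, the following uses a hypothetical UniqueHandle class (a stand-in invented for illustration, not part of the TCPSocket interface) together with the standard type traits to confirm at compile time that copying is rejected:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for TCPSocket: it only stores an integer handle.
class UniqueHandle
{
public:
  explicit UniqueHandle(int fd) : fd_(fd) { assert(fd >= 0); }

  // C++11: any attempt to copy is rejected at compile time.
  UniqueHandle(const UniqueHandle&) = delete;
  UniqueHandle& operator=(const UniqueHandle&) = delete;

  int fd() const { return fd_; }

private:
  int fd_;
};

// These checks stop compiling if the class ever becomes copyable.
static_assert(!std::is_copy_constructible<UniqueHandle>::value,
              "UniqueHandle must not be copy constructible");
static_assert(!std::is_copy_assignable<UniqueHandle>::value,
              "UniqueHandle must not be copy assignable");
```

Compared with private, undefined copy members, deleted members produce a diagnostic at the point of the attempted copy, even inside member functions and friends.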

Although not copyable, it would make perfect sense to create a TCPSocket object in one function and then return it to a calling function. Consider a factory function that creates connections to some remote TCP service:

TCPSocket connectToService()
{
  TCPSocket socket(get_service_host(),  // function gets hostname
                   get_service_port()); // function gets port
  return socket;
}

Such a function would encapsulate the details about which host and port to connect to, and would create an object of TCPSocket to be returned to the caller. This would not really call for copy semantics at all, but move semantics, in which the contents of the TCPSocket object created in the connectToService function would be transferred to another TCPSocket object at the call site:

TCPSocket socket = connectToService();

In C++03, this would not be possible to write without enabling the copy constructor. We could subvert the copy constructor to provide move semantics, but there are many problems with this approach:

TCPSocket::TCPSocket(TCPSocket& that) {
  socket_fd_ = that.socket_fd_;
  that.socket_fd_ = -1;
}

Note that this version of the "copy" constructor actually moves the contents out of its argument, which is why the argument is non-const. With this definition, we can actually implement the connectToService function, and use it as shown earlier. But nothing would prevent situations like the following:

 1 void performIO(TCPSocket socket)
 2 {
 3   socket.write(...);
 4   socket.read(...);
 5   // etc.
 6 }
 7
 8 TCPSocket socket = connectToService();
 9 performIO(socket);   // moves TCPSocket into performIO
10 // now socket.socket_fd_ == -1
11 performIO(socket);   // Oops: not a valid socket

We obtain an instance of TCPSocket called socket by calling connectToService (line 8) and pass this instance to performIO (line 9). But the copy constructor used to pass socket by value to performIO moves its contents out, and when performIO returns, socket no longer encapsulates a valid TCP socket. By disguising a move as a copy, we have created an unintuitive and error-prone interface; if you are familiar with std::auto_ptr, you will have seen this problem before.
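
The hazard can be reproduced in a few lines. This is a sketch under stated assumptions: StealingSocket and survives_one_call are hypothetical names, the descriptor is a plain int, and no real connection is opened.

```cpp
#include <cassert>

// Hypothetical stand-in for TCPSocket with the "moving copy constructor".
class StealingSocket
{
public:
  explicit StealingSocket(int fd) : fd_(fd) { assert(fd >= 0); }

  // "Copy" constructor that actually moves: note the non-const parameter.
  StealingSocket(StealingSocket& that) : fd_(that.fd_)
  {
    that.fd_ = -1;
  }

  bool is_open() const { return fd_ != -1; }

private:
  int fd_;
};

// Takes its argument by value, so the stealing constructor runs on entry.
inline bool performIO(StealingSocket socket)
{
  return socket.is_open();
}

// Reports whether the caller's object is still usable after one call.
inline bool survives_one_call()
{
  StealingSocket socket(42);
  performIO(socket);        // reads like a copy, silently moves
  return socket.is_open();  // false: the descriptor has been stolen
}
```

Nothing at the call site hints that performIO(socket) invalidates socket, which is exactly the problem described above.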

rvalue references

In order to support move semantics better, we must first answer the question: which objects can be moved from? Consider the TCPSocket example again. If connectToService returned the expression TCPSocket(get_service_host(), get_service_port()) directly, that expression would denote an unnamed temporary object of type TCPSocket whose sole purpose is to be transferred to the caller's context. There is no way for anyone to refer to this object beyond the statement where it gets created. It makes perfect sense to move the contents out of such an object. But consider the following snippet:

TCPSocket socket = connectToService();
performIO(socket);

Here, it would be dangerous to move the contents out of the socket object because, in the calling context, the object is still bound to the name socket and can be used in further operations. The expression socket is called an lvalue expression: one that has an identity and whose address can be taken by applying the & operator to it. Non-lvalue expressions are referred to as rvalue expressions. These are unnamed expressions whose address cannot be taken with the & operator. An expression such as TCPSocket(get_service_host(), get_service_port()) is an rvalue expression.

We can say that, in general, it is dangerous to move contents from an lvalue expression but safe to move contents from rvalue expressions. Thus, the following is dangerous:

TCPSocket socket = connectToService();
performIO(socket);

But the following is alright:

performIO(connectToService());

Note here that the expression connectToService() is not an lvalue expression and therefore qualifies as an rvalue expression. In order to distinguish between lvalue and rvalue expressions, C++11 introduced a new class of references called rvalue references that can refer to rvalue-expressions but not lvalue-expressions. Such references are declared using a new syntax involving double ampersands as shown below:

TCPSocket&& socketref = TCPSocket(get_service_host(),
                                  get_service_port());

The other class of references that were earlier simply called references are now called lvalue references. A non-const lvalue reference can only refer to an lvalue expression, while a const lvalue reference can also refer to an rvalue expression:

/* ill-formed */
TCPSocket& socketref = TCPSocket(get_service_host(),
                                 get_service_port());

/* well-formed */
const TCPSocket& socketref = TCPSocket(get_service_host(),
                                       get_service_port());

An rvalue reference can be, and usually is, non-const:

TCPSocket&& socketref = TCPSocket(...);
socketref.read(...);

In the preceding snippet, the expression socketref itself is an lvalue expression because you can compute its address using the & operator. But it is bound to an rvalue expression, and the object referred to by the non-const rvalue reference can be modified through it.
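
These binding rules can be observed with a small overload set. The sketch below uses std::string instead of TCPSocket so that it is self-contained; bound_as and binding_of_named_rvalue_ref are hypothetical helpers that merely report which kind of reference was selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload set: each overload reports the reference it binds.
inline std::string bound_as(std::string&)       { return "lvalue ref"; }
inline std::string bound_as(const std::string&) { return "const lvalue ref"; }
inline std::string bound_as(std::string&&)      { return "rvalue ref"; }

inline std::string binding_of_named_rvalue_ref()
{
  std::string&& ref = std::string("temporary"); // binds to an rvalue
  ref += "!";                                   // modifiable through ref
  assert(ref == "temporary!");
  return bound_as(ref);  // the name ref is itself an lvalue expression
}
```

A call such as bound_as(std::string("z")) selects the rvalue reference overload, while passing the named reference ref selects the non-const lvalue reference overload, even though ref was declared with &&.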

rvalue-reference overloads

We can create overloads of a function based on whether they take lvalue expressions or rvalue expressions. In particular, we can overload the copy constructor to take rvalue expressions. For the TCPSocket class, we can write the following:

TCPSocket(const TCPSocket&) = delete;

TCPSocket(TCPSocket&& rvref) : socket_fd_(-1)
{
  std::swap(socket_fd_, rvref.socket_fd_);
}

While the lvalue overload is the deleted copy constructor, the rvalue overload is called the move constructor because it is implemented to usurp or "steal" the contents of the rvalue expression passed to it. It moves the contents of the source to the target, leaving the source (rvref) in some unspecified state that is safe to destruct. In this case, this amounts to setting the socket_fd_ member of rvref to -1.

With this definition of the move constructor, TCPSocket becomes movable but not copyable. The connectToService implementation would work correctly:

TCPSocket connectToService()
{
  return TCPSocket(get_service_host(),get_service_port());
}

This would move the temporary object back to the caller. But the following call to performIO would be ill-formed because socket is an lvalue expression and TCPSocket defines only move semantics, for which an rvalue expression is required:

TCPSocket socket = connectToService();
performIO(socket);

This is a good thing because it prevents you from accidentally moving the contents out of an object like socket that you could still use later. An rvalue expression of a movable type can be passed by value, and thus the following is well-formed:

performIO(connectToService());

Note that the expression connectToService() is an rvalue expression because it is not bound to a name and its address cannot be taken.
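
Putting the pieces together, here is a compilable sketch of a movable but non-copyable type. MoveOnlySocket is a hypothetical stand-in for TCPSocket: an int plays the role of the descriptor, the value 42 is arbitrary, and nothing is actually connected or closed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for TCPSocket.
class MoveOnlySocket
{
public:
  explicit MoveOnlySocket(int fd) : fd_(fd) { assert(fd >= 0); }

  MoveOnlySocket(const MoveOnlySocket&) = delete;
  MoveOnlySocket& operator=(const MoveOnlySocket&) = delete;

  // Move constructor: start out invalid, then take over the source's state.
  MoveOnlySocket(MoveOnlySocket&& that) noexcept : fd_(-1)
  {
    std::swap(fd_, that.fd_);
  }

  bool is_open() const { return fd_ != -1; }

private:
  int fd_;
};

inline MoveOnlySocket connectToService()
{
  return MoveOnlySocket(42);  // temporary: moved (or elided) to the caller
}

inline bool performIO(MoveOnlySocket socket)
{
  return socket.is_open();
}

static_assert(!std::is_copy_constructible<MoveOnlySocket>::value,
              "copying stays disabled");
static_assert(std::is_move_constructible<MoveOnlySocket>::value,
              "moving is enabled");
```

With these definitions, performIO(connectToService()) compiles, whereas passing a named MoveOnlySocket to performIO is rejected by the compiler.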

A type can be both copyable and movable. For example, we could implement a move constructor for the String class in addition to its copy constructor:

// move-constructor
String::String(String&& source) noexcept
      : buffer_(0), len_(0)
{
  swap(source); // See listing A.1c
}

The nothrow swap plays a central role in the implementation of move semantics. The contents of the source and target objects are exchanged. So when the source object goes out of scope in the calling scope, it releases its new contents (the target object's old state). The target object lives on with its new state (the source object's original state). The move is implemented in terms of the nothrow swap, which just swaps pointers and values of primitive types, and it is guaranteed to succeed; hence, the noexcept specification. In fact, moving objects usually requires less work involving swapping pointers and other data bits, while copying often requires new allocations that could potentially fail.

Move assignment

Just as we can construct an object by stealing the contents of another object, we can also move the contents of one object to another after both have been constructed. To do this, we can define a move assignment operator, an rvalue-overload of the copy assignment operator:

// move assignment
String& String::operator=(String&& rhs) noexcept
{
  swap(rhs);
  return *this;
}

Alternatively, we can define a universal assignment operator that works for both lvalue and rvalue expressions:

// universal assignment (copy-and-swap)
String& String::operator=(String rhs)
{
  swap(rhs);
  return *this;
}

Note that the universal assignment operator cannot coexist with either the lvalue or the rvalue overload, else there would be ambiguity in overload resolution.
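
The following sketch shows the universal assignment operator at work on a hypothetical, simplified Text class (it is not the String class of listing A.1, and all of its member names are assumptions made for this example). The by-value parameter is copy-constructed from lvalue arguments and move-constructed from rvalue arguments, so one operator serves both cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical, simplified string class used only for illustration.
class Text
{
public:
  explicit Text(const char* s = "")
    : len_(0), buffer_(nullptr)
  {
    assert(s != nullptr);
    len_ = std::strlen(s);
    buffer_ = new char[len_ + 1];
    std::memcpy(buffer_, s, len_ + 1);
  }

  ~Text() { delete[] buffer_; }

  Text(const Text& that)
    : len_(that.len_),
      buffer_(that.buffer_ ? new char[that.len_ + 1] : nullptr)
  {
    if (buffer_) std::memcpy(buffer_, that.buffer_, len_ + 1);
  }

  Text(Text&& that) noexcept : len_(0), buffer_(nullptr)
  {
    swap(that);  // the source is left empty but safe to destruct
  }

  // Universal assignment: rhs already holds a copy of (or the stolen
  // contents of) the argument, so a swap completes the assignment.
  Text& operator=(Text rhs) noexcept
  {
    swap(rhs);
    return *this;  // rhs releases our old contents when it is destroyed
  }

  void swap(Text& that) noexcept
  {
    std::swap(len_, that.len_);
    std::swap(buffer_, that.buffer_);
  }

  bool empty() const { return len_ == 0; }
  bool equals(const char* s) const
  {
    return buffer_ != nullptr && std::strcmp(buffer_, s) == 0;
  }

private:
  std::size_t len_;
  char* buffer_;
};
```

Assigning from an lvalue leaves the source intact, while assigning from std::move(source) or from a temporary transfers the buffer without a new allocation.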

xvalues

When you call a function with an rvalue expression, the compiler resolves function calls to an rvalue-overload of the function if one is available. But if you call the function with a named variable, it gets resolved to an lvalue overload if one is available or the program is ill-formed. Now you might have a named variable that you can move from because you have no use for it later:

void performIO(TCPSocket socket);

TCPSocket socket = connectToService();
// do stuff on socket
performIO(socket);  // ill-formed because socket is lvalue

The preceding example fails to compile because performIO takes its sole parameter by value, and socket is of a move-only type but is not an rvalue expression. By using std::move, you can cast an lvalue expression to an rvalue expression and pass it to a function that expects an rvalue expression. The std::move function template is defined in the standard header utility:

#include <utility> // for std::move
performIO(std::move(socket));

The call to std::move(socket) gives us an rvalue reference to socket; it does not cause any data to be moved out of socket. When we pass this expression of rvalue-reference type to the function performIO, which takes its parameter by value, a new TCPSocket object is created in the performIO function, corresponding to its by-value parameter. It is move initialized from socket, that is, its move constructor steals the contents of socket. Following the call to performIO, the variable socket loses its contents and therefore should not be used in further operations. If the move constructor of TCPSocket is correctly implemented, then socket should still be safe to destruct.

The expression std::move(socket) shares the identity of socket, but it could be moved from within the function it is passed to. Such expressions are called xvalues, the x standing for expiring.

Tip

xvalues have a well-defined identity like lvalues, but can be moved from like rvalues. xvalues bind to rvalue reference parameters of a function.
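
One way to see these categories for yourself is to inspect decltype((expr)), which yields T& for an lvalue, T&& for an xvalue, and plain T for every other rvalue (the standard calls those prvalues, a term not used in the text above). The helper category_name and the macro VALUE_CATEGORY below are hypothetical names introduced only for this sketch.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Maps the type produced by decltype((expr)) to a category name.
template <typename T>
constexpr const char* category_name()
{
  return std::is_lvalue_reference<T>::value ? "lvalue"
       : std::is_rvalue_reference<T>::value ? "xvalue"
       : "prvalue";
}

// The extra parentheses make decltype look at the expression, not the name.
#define VALUE_CATEGORY(expr) category_name<decltype((expr))>()

inline void check_categories()
{
  std::string name("socket");
  assert(std::string(VALUE_CATEGORY(name)) == "lvalue");
  assert(std::string(VALUE_CATEGORY(std::move(name))) == "xvalue");
  assert(std::string(VALUE_CATEGORY(std::string("tmp"))) == "prvalue");
  assert(name == "socket");  // decltype does not evaluate its operand
}
```

The last assertion also illustrates that forming the expression std::move(name) does not by itself move anything.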

If performIO did not take its parameter by value but as an rvalue-reference then one thing would change:

void performIO(TCPSocket&& socket);
performIO(std::move(socket));

The call to performIO(std::move(socket)) would still be well-formed, but would not automatically move out the contents of socket. This is because we pass a reference to an existing object here, whereas we create a new object that is move initialized from socket when we pass by value. In this case, unless the performIO function implementation explicitly moves out the contents of socket, it would still remain valid in the calling context after the call to performIO.

Tip

In general, if you have cast your object to an rvalue-expression and passed it to a function that expects an rvalue-reference, you should just assume that it has been moved from and not use it beyond the call.
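
The difference between the two parameter styles can be sketched as follows. Handle is a hypothetical move-only type standing in for TCPSocket, and consume, inspect, and take are illustrative function names, not part of any interface shown earlier.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only handle standing in for TCPSocket.
class Handle
{
public:
  explicit Handle(int fd) : fd_(fd) { assert(fd >= 0); }
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  Handle(Handle&& that) noexcept : fd_(that.fd_) { that.fd_ = -1; }

  bool is_open() const { return fd_ != -1; }

private:
  int fd_;
};

// By value: a new Handle is move-constructed from the argument.
inline bool consume(Handle h) { return h.is_open(); }

// By rvalue reference: only binds; nothing moves unless the body says so.
inline bool inspect(Handle&& h) { return h.is_open(); }

// By rvalue reference, with an explicit move inside the function.
inline bool take(Handle&& h)
{
  Handle local(std::move(h));  // h is a named lvalue here, hence std::move
  return local.is_open();
}
```

After inspect(std::move(a)) the object a is still open, whereas after consume(std::move(a)) or take(std::move(a)) it is not. Since a caller usually cannot tell which of these a function does, the advice in the tip above remains the safe default.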

An object of type T that is local to a function can be returned by value from that function if T has an accessible move or copy constructor. If a move constructor is available, the returned value is move-initialized; otherwise, it is copy-initialized. If, however, the object is not local to the function, then T must have an accessible copy constructor for the object to be returned by value. Additionally, compilers optimize away copies and moves whenever they can.

Consider the implementation of connectToService and how it is used:

 1 TCPSocket connectToService()
 2 {
 3   return TCPSocket(get_service_host(),get_service_port());
 4 }
 5
 6 TCPSocket socket = connectToService();

In this case, the compiler will actually construct the temporary (line 3) directly in the storage for the socket object (line 6), where the return value of connectToService was meant to be moved to. This way, it simply optimizes away the move initialization of socket (line 6). This optimization is performed even if the move constructor has side effects, which means that those side effects may not occur. In the same way, the compiler can optimize away copy initialization and directly construct the returned object at the target site. This is referred to as Return Value Optimization (RVO) and has been the norm for all major compilers since C++03, when it optimized away only copies. Although the copy or move constructors are not invoked when RVO takes effect, in C++11 and C++14 they must nevertheless be defined and accessible for RVO to work; C++17 guarantees the elision when a temporary is returned and drops this requirement.

While RVO applies when rvalue expressions are returned, the compiler can sometimes optimize away a copy or move, even when a named local object on the stack is returned from a function. This is known as Named Return Value Optimization (NRVO).

Return Value Optimization is a specific case of Copy Elision, in which the compiler optimizes away a move or copy of an rvalue expression to construct it directly in the target storage:

std::string reverse(std::string input);

std::string a = "Hello";
std::string b = "World";
reverse(a + b);

In the preceding example, the expression a + b is an rvalue expression that generates a temporary object of type std::string. This object will not be copied into the function reverse; instead, the copy would be elided, and the object resulting from the expression a + b would be constructed directly in the storage for reverse's parameter.

Tip

Passing and returning an object of type T by value requires either move or copy semantics to be defined for T. If a move constructor is available, it is used, otherwise the copy constructor is used. Whenever possible, the compiler optimizes away copy or move operations and constructs the object directly at the target site in the calling or called function.
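
To observe these rules, you can count constructor calls with a small tracer type. Tracer, make_unnamed, make_named, and check_no_copies are hypothetical names for this sketch. How many moves you see depends on the compiler, the language standard, and optimization settings, so the only thing the sketch relies on is that no copy is made when a move constructor is available.

```cpp
#include <cassert>

// Hypothetical tracer type that counts how often it is copied or moved.
struct Tracer
{
  static int copies;
  static int moves;

  Tracer() {}
  Tracer(const Tracer&) { ++copies; }
  Tracer(Tracer&&) noexcept { ++moves; }
};

int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns an unnamed temporary: a candidate for RVO.
inline Tracer make_unnamed() { return Tracer(); }

// Returns a named local: a candidate for NRVO, otherwise it is moved.
inline Tracer make_named()
{
  Tracer local;
  return local;
}

inline void check_no_copies()
{
  Tracer a = make_unnamed();
  Tracer b = make_named();
  (void)a;
  (void)b;
  // Each return is either elided or performed as a move, never as a copy.
  assert(Tracer::copies == 0);
}
```

Printing Tracer::moves after such calls, with and without optimizations, is a quick way to check whether your compiler elided the moves.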

Move emulation using Boost.Move

In this section, we look at how, with relative ease, you can retrofit much of the move semantics onto your own legacy classes using the Boost.Move library. First, consider the interface of the String class in C++11 syntax:

class String
{
public:
  // Constructor
  String(const char *str = 0);

  // Destructor
  ~String();

  // Copy constructor
  String(const String& that);

  // Copy assignment operator
  String& operator=(const String& rhs);

  // Move constructor
  String(String&& that);

  // Move assignment
  String& operator=(String&& rhs);
  …
};

Let us now see how you would define an equivalent interface using Boost's facilities:

Listing A.2a: Move emulation with Boost.Move

 1 #include <boost/move/move.hpp>
 2 #include <boost/swap.hpp>
 3
 4 class String {
 5 private:
 6   BOOST_COPYABLE_AND_MOVABLE(String);
 7
 8 public:
 9   // Constructor
10   String(const char *str = 0);
11
12   // Destructor
13   ~String();
14
15   // Copy constructor
16   String(const String& that);
17
18   // Copy assignment operator
19   String& operator=(BOOST_COPY_ASSIGN_REF(String) rhs);
20
21   // Move constructor
22   String(BOOST_RV_REF(String) that);
23
24   // Move assignment
25   String& operator=(BOOST_RV_REF(String) rhs);
26 
27   void swap(String& rhs);
28
29 private:
30   char *buffer_;
31   size_t size_;
32 };

The key changes are as follows:

  • Line 6: The macro BOOST_COPYABLE_AND_MOVABLE(String) defines some internal infrastructure to support copy and move semantics, and distinguish between lvalues and rvalues of type String. This is declared as private.

  • Line 19: A copy assignment operator that takes the type BOOST_COPY_ASSIGN_REF(String). This is a wrapper type for String to which String lvalues can be implicitly converted.

  • Lines 22 and 25: A move constructor and a move-assignment operator that take the wrapper type BOOST_RV_REF(String). String rvalues implicitly convert to this type.

  • Note that on line 16, the copy constructor does not change.

Under a C++03 compiler, the emulation of move semantics is provided without any special support from the language or the compiler. With a C++11 compiler, the macros automatically use C++11 native constructs for supporting move semantics.

The implementation is pretty much the same as the C++11 version except for the parameter types.

Listing A.2b: Move emulation with Boost.Move

// Copy constructor
String::String(const String& that) : buffer_(0), size_(0)
{
  buffer_ = dupstr(that.buffer_, size_);
}

// Copy assignment operator
String& String::operator=(BOOST_COPY_ASSIGN_REF(String) rhs)
{
  String tmp(rhs);
  swap(tmp);        // calls String::swap member
  return *this;
}

// Move constructor
String::String(BOOST_RV_REF(String) that) : buffer_(0),
                                            size_(0)
{
  swap(that);      // calls String::swap member
}

// Move assignment operator
String& String::operator=(BOOST_RV_REF(String) rhs)
{
  swap(rhs);        // take over the contents of rhs
  String tmp;
  rhs.swap(tmp);    // leave rhs empty; tmp releases our old contents

  return *this;
}

void String::swap(String& that)
{
  boost::swap(buffer_, that.buffer_);
  boost::swap(size_, that.size_);
}

If we wanted our class to support only move semantics and not copy semantics, we would use the macro BOOST_MOVABLE_BUT_NOT_COPYABLE in place of BOOST_COPYABLE_AND_MOVABLE and would not define the copy constructor and copy assignment operator.

In the copy/move assignment operators, we could check for self-assignment if we wanted by putting the code that does the swapping/copying inside an if-block like this:

if (this != &rhs) {
  …
}

This will not change the correctness of the code as long as the implementation of copy/move is exception-safe. But it would help to improve performance by avoiding further work in the case of self-assignment.
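
As a sketch of where such a check goes, the following uses plain C++11 rather than the Boost macros, and a hypothetical IntBox class that owns a single heap-allocated int. In this particular move assignment the guard does more than save work: it also keeps x = std::move(x) from emptying x.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource-owning class used only for illustration.
class IntBox
{
public:
  explicit IntBox(int v) : p_(new int(v)) {}
  ~IntBox() { delete p_; }

  IntBox(const IntBox& that) : p_(that.p_ ? new int(*that.p_) : nullptr) {}
  IntBox(IntBox&& that) noexcept : p_(that.p_) { that.p_ = nullptr; }

  IntBox& operator=(const IntBox& rhs)
  {
    if (this != &rhs) {        // skip the allocation on self-assignment
      IntBox tmp(rhs);
      std::swap(p_, tmp.p_);
    }
    return *this;
  }

  IntBox& operator=(IntBox&& rhs) noexcept
  {
    if (this != &rhs) {
      std::swap(p_, rhs.p_);   // take the source's contents
      delete rhs.p_;           // release our old contents right away
      rhs.p_ = nullptr;        // leave the source empty but destructible
    }
    return *this;
  }

  bool has_value() const { return p_ != nullptr; }
  int value() const { assert(p_ != nullptr); return *p_; }

private:
  int* p_;
};
```

Whether self-move should preserve the value is a design decision; the guard here simply makes it a no-op.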

So in summary, the following macros help us emulate move semantics in C++03:

#include <boost/move/move.hpp>

BOOST_COPYABLE_AND_MOVABLE(classname)
BOOST_MOVABLE_BUT_NOT_COPYABLE(classname)
BOOST_COPY_ASSIGN_REF(classname)
BOOST_RV_REF(classname)

You can also use BOOST_RV_REF(…) encapsulated types for parameters of other member methods, besides the move constructors and assignment operators.

If you want to move from an lvalue, you would naturally have to cast it to an "rvalue-emulating" expression. You do this using boost::move, which corresponds to std::move in C++11. Here are some examples of invoking different copy and move operations on String objects using the Boost move emulation:

String getName();                       // return by value
void setName(BOOST_RV_REF(String) str); // rvalue ref overload
void setName(const String& str);        // lvalue ref overload

String str1("Hello");
String str2(str1);                      // copy ctor
str2 = getName();                       // move assignment
String str3(boost::move(str2));         // move ctor
String str4;
str4 = boost::move(str1);               // move assignment
setName(String("Hello"));               // rvalue ref overload
setName(str4);                          // lvalue ref overload
setName(boost::move(str4));             // rvalue ref overload
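
If you want to verify which operation each of these statements selects, a probe type that records the last special member function invoked can help. The sketch below uses native C++11 with std::move in place of boost::move so that it compiles without Boost; Probe, getProbe, setProbe, and check_probe_calls are hypothetical names that mirror String, getName, and setName only in spirit.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical class that records which special member function ran last.
struct Probe
{
  const char* last;

  Probe() : last("default ctor") {}
  Probe(const Probe&) : last("copy ctor") {}
  Probe(Probe&&) noexcept : last("move ctor") {}
  Probe& operator=(const Probe&) { last = "copy assignment"; return *this; }
  Probe& operator=(Probe&&) noexcept { last = "move assignment"; return *this; }
};

inline Probe getProbe() { return Probe(); }

inline const char* setProbe(const Probe&) { return "lvalue ref overload"; }
inline const char* setProbe(Probe&&)      { return "rvalue ref overload"; }

inline void check_probe_calls()
{
  Probe p1;
  Probe p2(p1);                                  // copy ctor
  assert(std::strcmp(p2.last, "copy ctor") == 0);
  p2 = getProbe();                               // move assignment
  assert(std::strcmp(p2.last, "move assignment") == 0);
  Probe p3(std::move(p2));                       // move ctor
  assert(std::strcmp(p3.last, "move ctor") == 0);
  assert(std::strcmp(setProbe(p3), "lvalue ref overload") == 0);
  assert(std::strcmp(setProbe(std::move(p3)), "rvalue ref overload") == 0);
}
```

The same probe idea should carry over to the Boost emulation by replacing the parameter types with the corresponding macros, although that variant is not shown or tested here.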