Why use move constructors? clang-tidy modernize-pass-by-value [duplicate] - C++11

I saw code somewhere in which someone decided to copy an object and subsequently move it into a data member of a class. This confused me, since I thought the whole point of moving was to avoid copying. Here is the example:
struct S
{
std::string data;
S(std::string str) : data(std::move(str))
{}
};
Here are my questions:
Why aren't we taking an rvalue-reference to str?
Won't a copy be expensive, especially given something like std::string?
What would be the reason for the author to decide to make a copy then a move?
When should I do this myself?

Before I answer your questions, one thing you seem to be getting wrong: taking by value in C++11 does not always mean copying. If an rvalue is passed, that will be moved (provided a viable move constructor exists) rather than being copied. And std::string does have a move constructor.
Unlike in C++03, in C++11 it is often idiomatic to take parameters by value, for the reasons I am going to explain below. Also see this Q&A on StackOverflow for a more general set of guidelines on how to accept parameters.
Why aren't we taking an rvalue-reference to str?
Because that would make it impossible to pass lvalues, such as in:
std::string s = "Hello";
S obj(s); // s is an lvalue, this won't compile!
If S only had a constructor that accepts rvalues, the above would not compile.
Won't a copy be expensive, especially given something like std::string?
If you pass an rvalue, that will be moved into str, and that will eventually be moved into data. No copying will be performed. If you pass an lvalue, on the other hand, that lvalue will be copied into str, and then moved into data.
So to sum it up, two moves for rvalues, one copy and one move for lvalues.
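To check that summary, here is a minimal sketch that replaces std::string with a hypothetical instrumented type, Tracker, which counts its own copies and moves. Tracker, Counts, reset and the pass_* helpers are illustrative names, not part of the original code; S has the same take-by-value-then-move shape as in the question.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Same shape as S in the question: take by value, then move into the member.
struct S {
    Tracker data;
    explicit S(Tracker t) : data(std::move(t)) {}
};

struct Counts { int copies; int moves; };

void reset() { Tracker::copies = 0; Tracker::moves = 0; }

// Passing an lvalue: copied into the parameter, then moved into data.
Counts pass_lvalue() {
    Tracker t;
    reset();
    S s(t);
    return {Tracker::copies, Tracker::moves};
}

// Passing an rvalue (an xvalue via std::move): moved into the parameter, then moved again.
Counts pass_rvalue() {
    Tracker t;
    reset();
    S s(std::move(t));
    return {Tracker::copies, Tracker::moves};
}
```

pass_lvalue() should report one copy and one move, and pass_rvalue() zero copies and two moves, matching the text.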
What would be the reason for the author to decide to make a copy then a move?
First of all, as I mentioned above, the first operation is not always a copy. That said, the answer is: "Because it is efficient (moves of std::string objects are cheap) and simple".
Under the assumption that moves are cheap (ignoring the small string optimization, SSO, here), they can be practically disregarded when considering the overall efficiency of this design. If we do so, we have one copy for lvalues (as we would have if we accepted an lvalue reference to const) and no copies for rvalues (while we would still have a copy if we accepted an lvalue reference to const).
This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided.
P.S.: To provide some context, I believe this is the Q&A the OP is referring to.

To understand why this is a good pattern, we should examine the alternatives, both in C++03 and in C++11.
We have the C++03 method of taking a std::string const&:
struct S
{
std::string data;
S(std::string const& str) : data(str)
{}
};
In this case, there will always be a single copy performed. If you construct from a raw C string, a temporary std::string will be constructed and then copied again: two allocations.
There is also the C++03 method of taking a non-const reference to a std::string and then swapping it into the member std::string:
struct S
{
std::string data;
S(std::string& str)
{
std::swap(data, str);
}
};
That is the C++03 version of "move semantics", and swap can often be optimized to be very cheap (much like a move). It should also be analyzed in context:
S tmp("foo"); // illegal
std::string s("foo");
S tmp2(s); // legal
This forces you to create a named (non-temporary) std::string and then discard it, because a temporary std::string cannot bind to a non-const reference. Only one allocation is done, however. The C++11 version would take a && and require you to call it with std::move or with a temporary: this requires the caller to explicitly create a copy outside of the call and move that copy into the function or constructor.
struct S
{
std::string data;
S(std::string&& str): data(std::move(str))
{}
};
Use:
S tmp("foo"); // legal
std::string s("foo");
S tmp2(std::move(s)); // legal
Next, we can write the full C++11 version, which supports both copy and move:
struct S
{
std::string data;
S(std::string const& str) : data(str) {} // lvalue const, copy
S(std::string && str) : data(std::move(str)) {} // rvalue, move
};
We can then examine how this is used:
S tmp( "foo" ); // a temporary `std::string` is created, then moved into tmp.data
std::string bar("bar"); // bar is created
S tmp2( bar ); // bar is copied into tmp2.data
std::string bar2("bar2"); // bar2 is created
S tmp3( std::move(bar2) ); // bar2 is moved into tmp3.data
It is pretty clear that this two-overload technique is at least as efficient as, if not more efficient than, the two C++03 styles above. I'll dub this two-overload version the "most optimal" version.
Now, we'll examine the take-by-copy version:
struct S2 {
std::string data;
S2( std::string arg ):data(std::move(arg)) {}
};
In each of those scenarios:
S2 tmp( "foo" ); // a `std::string` is constructed in arg from "foo" (the intermediate temporary is normally elided), then moved into S2::data
std::string bar("bar"); // bar is created
S2 tmp2( bar ); // bar is copied into arg, then moved into S2::data
std::string bar2("bar2"); // bar2 is created
S2 tmp3( std::move(bar2) ); // bar2 is moved into arg, then moved into S2::data
If you compare this side-by-side with the "most optimal" version, we do at most one additional move (in the lvalue and std::move cases). Not once do we do an extra copy.
So if we assume that moves are cheap, this version gets us nearly the same performance as the most optimal version, with half the code.
And if you are taking, say, 2 to 10 arguments, the reduction in code is exponential, because covering every lvalue/rvalue combination takes 2^N overloads for N arguments: 2x less code with 1 argument, 4x with 2, 8x with 3, 16x with 4, 1024x with 10 arguments.
Now, we can get around this via perfect forwarding and SFINAE, writing a single constructor or function template that takes 10 arguments, uses SFINAE to ensure that the arguments are of appropriate types, and then moves or copies them into the local state as required. While this prevents the thousandfold increase in program size, a whole pile of functions can still be generated from this template, since every distinct combination of argument types produces its own instantiation.
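For a single argument, the constrained forwarding template might look roughly like this. It is a sketch, not code from the answer: S3 is a hypothetical name, and the std::enable_if constraint accepts only arguments a std::string can be constructed from. As a side effect, the constraint also stops the template from hijacking S3's copy constructor when copying a non-const S3 lvalue.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct S3 {
    std::string data;

    // One template instead of a const& / && overload pair. SFINAE removes it
    // from overload resolution unless std::string is constructible from T.
    template <typename T,
              typename = typename std::enable_if<
                  std::is_constructible<std::string, T&&>::value>::type>
    explicit S3(T&& str) : data(std::forward<T>(str)) {}  // copies lvalues, moves rvalues
};
```

Each distinct argument type (const char array, std::string&, std::string, ...) still produces its own instantiation, which is the code-size cost the text mentions.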
And lots of generated functions means larger executable code size, which can itself reduce performance.
For the cost of a few moves, we get shorter, often easier-to-understand code with nearly the same performance.
Now, this only works because we know, when the function (in this case, a constructor) is called, that we will be wanting a local copy of that argument. The idea is that if we know that we are going to be making a copy, we should let the caller know that we are making a copy by putting it in our argument list. They can then optimize around the fact that they are going to give us a copy (by moving into our argument, for example).
Another advantage of the "take by value" technique is that move constructors are often noexcept. That means functions that take by value and move out of their argument can often be noexcept, moving any throws out of their body and into the calling scope (which can sometimes avoid them via direct construction, or can construct the items and move them into the argument, to control where throwing happens). Making methods noexcept is often worth it.
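A small sketch of the noexcept point, assuming C++11 type traits. S4 is a hypothetical variant of S whose constructor is marked noexcept. The two static_asserts check that any potential throw lives in the caller's copy into the parameter, not in the constructor body; the noexcept operator treats constructing a function argument as part of the call expression.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct S4 {
    std::string data;
    // The body only moves, and std::string's move constructor is noexcept,
    // so the constructor itself can promise not to throw.
    explicit S4(std::string str) noexcept : data(std::move(str)) {}
};

// Passing an rvalue: the parameter is move-constructed, which cannot throw.
static_assert(std::is_nothrow_constructible<S4, std::string&&>::value,
              "moving into the parameter is noexcept");

// Passing an lvalue: the copy into the parameter happens in the caller and may
// throw (for example std::bad_alloc), so the whole construction is not noexcept.
static_assert(!std::is_nothrow_constructible<S4, const std::string&>::value,
              "copying into the parameter may throw");
```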

This is probably intentional and is similar to the copy-and-swap idiom. Since the string is copied before the constructor runs, the constructor itself is exception-safe: it only moves the already-constructed string str, much as copy-and-swap only swaps.

You don't want to repeat yourself by writing a constructor for the move and one for the copy:
S(std::string&& str) : data(std::move(str)) {}
S(const std::string& str) : data(str) {}
That is a lot of boilerplate code, especially if you have multiple arguments. Your solution avoids that duplication at the cost of an unnecessary move. (The move operation should be quite cheap, however.)
The competing idiom is to use perfect forwarding:
template <typename T>
S(T&& str) : data(std::forward<T>(str)) {}
The template magic will choose to move or copy depending on the argument that you pass in. It basically expands to the first version, where both constructors were written by hand. For background information, see Scott Meyers's post on universal references.
From a performance aspect, the perfect forwarding version is superior to your version as it avoids the unnecessary moves. However, one can argue that your version is easier to read and write. The possible performance impact should not matter in most situations, anyway, so it seems to be a matter of style in the end.

Related

iterate over non-const std::unordered_set

I have a std::unordered_set that contains instances of class bar.
I'd like to iterate over all the bars in the set and call some void foo(bar& b) function on each one.
You'll probably notice from the function signature that I want foo to change the state of the bar& b parameter in some way.
Now, I do know that foo won't change bar in a way that affects hashing or equality comparisons, but I still have a problem.
However I iterate over the set, the best I can hope for is a const bar& which obviously won't work.
I can think of a couple of possible ways around this:
Use const_cast. Don't know if this will work (yet). It kind of smells bad to me, but I'm happy to be enlightened!!
Use a std::unordered_map instead of std::unordered_set, so that even if I can only get a const of the key, I can just use that key to lookup the bar object and safely call foo on it.
I'd really appreciate some advice!
Thanks in advance!
Some clean solutions have already been shown in this answer.
Another clean way would be to add a layer of indirection through a pointer. Even if the pointer itself is const, the data pointed to will not be:
struct Bar
{
int key;
std::unique_ptr<int> pValue;
};
std::unordered_set< Bar, BarHash, BarEqual > bars;
for( const auto& bar : bars )
{
// Works because only the pointer is constant, not the data pointed to.
*bar.pValue = 42;
}
This obviously has the overhead of an additional memory allocation, the space required to store the pointer and the indirection when accessing the value through the pointer.
You will also have to write a custom copy constructor and an assignment operator if you want to keep value semantics.
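The snippet above leaves BarHash and BarEqual undefined. Here is a self-contained sketch in which both functors are filled in (hypothetically hashing and comparing only key), so the pointer-indirection trick can be compiled and run. make_bars and set_all are illustrative helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>
#include <utility>

// Only `key` takes part in hashing and equality; the value sits behind a pointer.
struct Bar {
    int key;
    std::unique_ptr<int> pValue;
};

// Hypothetical hash and equality functors: both look at `key` only.
struct BarHash {
    std::size_t operator()(const Bar& b) const { return std::hash<int>()(b.key); }
};
struct BarEqual {
    bool operator()(const Bar& a, const Bar& b) const { return a.key == b.key; }
};

using BarSet = std::unordered_set<Bar, BarHash, BarEqual>;

BarSet make_bars() {
    BarSet bars;
    for (int k = 0; k < 3; ++k) {
        // Bar is move-only because of unique_ptr, so insert a temporary (it is moved in).
        bars.insert(Bar{k, std::unique_ptr<int>(new int(0))});
    }
    return bars;
}

void set_all(BarSet& bars, int value) {
    for (const auto& bar : bars) {
        // bar is const, but only the pointer is const, not the int it points to.
        *bar.pValue = value;
    }
}
```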
Use const_cast. Don't know if this will work (yet). It kind of smells bad to me, but I'm happy to be enlightened!!
Yes, it will work. You can easily have an overload foo(bar const&), const_cast the reference, and then call foo(bar&). I agree with you that it smells bad and points to a flaw in design. You might want to take a fresh look at the design and see if there is a clean solution.
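A minimal sketch of that const_cast overload. The bar layout, bar_hash and foo_all are hypothetical; the key assumption, as in the question, is that foo never touches the members used for hashing and equality, so the set's invariants are not disturbed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical bar: `id` drives hashing and equality, `counter` does not.
struct bar {
    int id;
    int counter;
    bool operator==(const bar& other) const { return id == other.id; }
};

struct bar_hash {
    std::size_t operator()(const bar& b) const { return std::hash<int>()(b.id); }
};

// The mutating function from the question.
void foo(bar& b) { ++b.counter; }

// Overload that casts away const and forwards to foo(bar&). Only acceptable
// because foo never modifies `id`, so hash and equality stay unchanged.
void foo(const bar& b) { foo(const_cast<bar&>(b)); }

void foo_all(std::unordered_set<bar, bar_hash>& bars) {
    for (const bar& b : bars) foo(b);  // picks foo(const bar&)
}
```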
Use a std::unordered_map instead of std::unordered_set, so that even if I can only get a const of the key, I can just use that key to lookup the bar object and safely call foo on it
That is not too different from the first approach. std::unordered_set<T> is essentially std::unordered_map<T, bool>.
Potential clean solutions:
Get a copy of the object from the set, remove the entry from the set, update the copy, and put the copy back in the set. If that proves too expensive ...
Use a std::vector<Bar>. You can get a Bar& from the vector and all is well.
Make the member variables of Bar that don't affect its hash value or equality comparison mutable. Then you can just use foo(Bar const&) and call it directly on references to the objects in the set.
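A sketch of the mutable option, with a hypothetical Bar whose hits counter is excluded from hashing and equality and can therefore be marked mutable:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical Bar: `key` is used for hashing and equality; `hits` is not,
// so it is declared mutable and may change through a const reference.
struct Bar {
    int key;
    mutable int hits;
    bool operator==(const Bar& other) const { return key == other.key; }
};

struct BarHash {
    std::size_t operator()(const Bar& b) const { return std::hash<int>()(b.key); }
};

// foo can now take Bar const& and still update the non-key state.
void foo(const Bar& b) { ++b.hits; }
```

The set can then be iterated with const references and foo called on each element, with no cast involved.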

Move assignment operator, move constructor

I've been trying to nail down the rule of 5, but most of the information online is vastly over-complicated, and the example codes differ.
Even my textbook doesn't cover this topic very well.
On move semantics:
Templates, rvalues and lvalues aside, as I understand it, move semantics are simply this:
int other = 0; //Initial value
int number = 3; //Some data
int *pointer1 = &number; //Source pointer
int *pointer2 = &other; //Destination pointer
*pointer2 = *pointer1; //Both pointers now point to same data
pointer1 = nullptr; //Pointer2 now points to nothing
//The reference to 'data' has been 'moved' from pointer1 to pointer2
As opposed to copying, which would be the equivalent of something like this:
pointer1 = &number; //Reset pointer1
int newnumber = 0; //New address for the data
newnumber = *pointer1; //Address is assigned value
pointer2 = &newnumber; //Assign pointer to new address
//The data from pointer1 has been 'copied' to pointer2, at the address 'newnumber'
No explanation of rvalues, lvalues or templates is necessary, I would go as far as to say those topics are unrelated.
The fact that the first example is faster than the second should be a given. And I would also point out that any efficient code prior to C++11 will do this.
To my understanding, the idea was to bundle all of this behavior in a neat little operator move() in std library.
When writing copy constructors and copy assignment operators, I simply do this:
Text::Text(const Text& copyfrom) {
data = nullptr; //The object is empty
*this = copyfrom;
}
const Text& Text::operator=(const Text& copyfrom) {
if (this != &copyfrom) {
filename = copyfrom.filename;
entries = copyfrom.entries;
if (copyfrom.data != nullptr) { //If the object is not empty
delete[] data;
}
data = new std::string[entries];
for (int i = 0; i < entries; i++) {
data[i] = copyfrom.data[i];
//std::cout << data[i];
}
std::cout << "Data is assigned" << std::endl;
}
return *this;
}
The equivalent, one would think, would be this:
Text::Text(Text&& movefrom){
*this = movefrom;
}
Text&& Text::operator=(Text&& movefrom) {
if (&movefrom != this) {
filename = movefrom.filename;
entries = movefrom.entries;
data = movefrom.data;
if (data != nullptr) {
delete[] data;
}
movefrom.data = nullptr;
movefrom.entries = 0;
}
return std::move(*this);
}
I'm quite certain this won't work, so my question is: How do you achieve this type of constructor functionality with move semantics?
It's not entirely clear to me what your code examples are supposed to prove, or what the focus of this question is.
Is it, conceptually, "what does the phrase 'move semantics' mean in C++?"
Or is it "how do I write move ctors and move assignment operators?"
Here is my attempt to introduce the concept. If you want to see code examples then look at any of the other SO questions that were linked in comments.
Intuitively, in C and C++ an object is supposed to represent a piece of data residing in memory. For any number of reasons, commonly you want to send that data somewhere else.
Often one can take a direct approach of simply passing a pointer / reference to the object to the place where the data is needed. Then, it can be read using the pointer. Taking the pointer and moving the pointer around is very cheap, so this is often very efficient. The chief drawback is that you have to ensure that the object will live for as long as it is needed, or you get a dangling pointer / reference and undefined behavior (often a crash). Sometimes that's easy to ensure, sometimes it's not.
When it isn't, one obvious alternative is to make a copy and pass it (pass-by-value) rather than passing by reference. When the place where the data is needed has its own personal copy of the data, it can ensure that the copy stays around as long as is needed. The chief drawback here is that you have to make a copy, which may be expensive if the object is big.
A third alternative is to move the object rather than copying it. When an object is moved, it is not duplicated, and instead becomes available exclusively in the new site, and no longer in the old site. You can only do this when you won't need it at the old site anymore, obviously, but in that case this saves you a copy which can be a big savings.
When the objects are simple, all of these concepts are fairly trivial to implement and get right. For instance, when you have a trivial object, that is, one whose copy operations and destructor are all trivial (a trivially copyable type), it is safe to copy it exactly as you do in the C programming language, using memcpy. memcpy produces a byte-for-byte copy of a block of bytes. If a trivial object was properly initialized, then since its copying has no possible side effects, and neither does its later destruction, the memcpy'd copy is also properly initialized and results in a valid object.
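A sketch of the distinction, assuming a hypothetical Point aggregate. std::is_trivially_copyable (C++11, in type_traits) is the compile-time check for when a memcpy copy like this is valid; it is false for std::string, whose copy operations must manage a separately owned buffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// A trivial aggregate: no user-provided copy operations or destructor.
struct Point {
    int x;
    double y;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "memcpy is valid for trivially copyable types");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must be copied with its copy constructor, not memcpy");

Point copy_with_memcpy(const Point& src) {
    Point dst;  // indeterminate contents are fine: memcpy overwrites every byte
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```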
However, in modern C++ many of your objects are not trivial -- they may "own" references to heap memory, and manage this memory using RAII, which ties the lifetime of the object to the usage of some resource. For instance, if you have a std::string as a local variable in a function, the string is not totally a "contiguous" object and rather is connected to two different locations in memory. There is a small, fixed-size (sizeof(std::string), in fact) block on the stack, which contains a pointer and some other info, pointing to a dynamically sized buffer on the heap. Formally, only the small "control" part is the std::string object, but intuitively from the programmer's point the buffer is also "part" of the string and is the part that you usually think about. You can't copy a std::string object like this using memcpy -- think about what will happen if you have std::string s and you try to copy sizeof(std::string) bytes from address &s to get a second string. Instead of two distinct string objects, you'll end up with two control blocks, each pointing to the same buffer. And when the first one is destroyed, that buffer is deleted, so using the second one will cause a segfault, or when the second one is destroyed, you get a double delete.
Generally, copying nontrivial C++ objects with memcpy is illegal and causes undefined behavior. This is because it conflicts with one of the core ideas of C++ which is that object creation and destruction may have nontrivial consequences defined by the programmer using ctors and dtors. Object lifetimes may be used to create and enforce invariants which you use to reason about your program. memcpy is a "dumb" low-level way to just copy some bytes -- potentially it bypasses the mechanisms that enforce the invariants which make your program work, which is why it can cause undefined behavior if used incorrectly.
Instead, in C++ we have copy constructors which you can use to safely make copies of nontrivial objects. You should write these in a way that preserves what invariants you need for your object. The rule of three is a guideline about how to actually do that.
The C++11 "move semantics" idea is a collection of new core language features which were added to extend and refine the traditional copy-construction mechanism from C++98. Specifically, it's about how we move potentially complex RAII objects, not just trivial objects, which we could already move. How do we make the language generate move constructors and such for us automatically when possible, similarly to how it does for copy constructors? How do we make it use the move options when it can to save us time, without causing bugs in old code or breaking core assumptions of the language? (This is why I would say that your code example with ints and int*s has little to do with C++11 move semantics.)
The rule of five, then, is the corresponding extension of the rule of three which describes conditions when you may need to implement a move ctor / move assignment operator also for a given class and not rely on the default behavior of the language.
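The answer deliberately stays conceptual, so here is one possible sketch of the rule of five for a simplified, hypothetical version of the question's Text class (a filename plus an owned array of std::string). Copy assignment uses copy-and-swap, and both move operations leave the source empty but valid; these are design choices, not the only correct way to write them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Text {
public:
    explicit Text(std::string name = std::string(), std::size_t n = 0)
        : filename(std::move(name)), entries(n), data(n ? new std::string[n] : nullptr) {}

    // 1. Destructor: release the owned buffer.
    ~Text() { delete[] data; }

    // 2. Copy constructor: allocate a fresh buffer and copy every element.
    Text(const Text& other)
        : filename(other.filename), entries(other.entries),
          data(other.entries ? new std::string[other.entries] : nullptr) {
        std::copy(other.data, other.data + other.entries, data);
    }

    // 3. Move constructor: steal the buffer and leave `other` empty but valid.
    Text(Text&& other) noexcept
        : filename(std::move(other.filename)), entries(other.entries), data(other.data) {
        other.entries = 0;
        other.data = nullptr;
    }

    // 4. Copy assignment via copy-and-swap (strong exception guarantee).
    Text& operator=(const Text& other) {
        Text tmp(other);
        swap(tmp);
        return *this;
    }

    // 5. Move assignment: free our buffer, then take over other's.
    Text& operator=(Text&& other) noexcept {
        if (this != &other) {
            delete[] data;
            filename = std::move(other.filename);
            entries = other.entries;
            data = other.data;
            other.entries = 0;
            other.data = nullptr;
        }
        return *this;
    }

    void swap(Text& other) noexcept {
        using std::swap;
        swap(filename, other.filename);
        swap(entries, other.entries);
        swap(data, other.data);
    }

    const std::string& name() const { return filename; }
    std::size_t size() const { return entries; }
    std::string& operator[](std::size_t i) { return data[i]; }
    const std::string& operator[](std::size_t i) const { return data[i]; }

private:
    std::string filename;
    std::size_t entries;
    std::string* data;
};
```

Unlike the question's move constructor, which calls the copy assignment through `*this = movefrom;` (movefrom is an lvalue inside the function), these operations transfer the pointer directly.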

How to insert / emplace into map to avoid creation of temporary objects?

I'm reading a recommendation from one of the C++11 books to prefer emplace over insert when adding items to a container, in order to avoid creating temporary objects (calls of constructors / destructors of the objects being inserted). But I'm a bit confused, because there are several ways to add an object to the map, e.g.
#include <iostream>
#include <string>
#include <cstdint>
#include <map>
int main()
{
std::string one { "one" };
std::string two { "two" };
std::map<uint32_t, std::string> testMap;
testMap.insert(std::make_pair(1, one)); // 1
testMap.emplace(2, two); // 2
testMap.insert(std::make_pair(3, "three")); // 3
testMap.emplace(4, "four"); // 4
using valType = std::map < uint32_t, std::string >::value_type;
testMap.emplace(valType(5, "five")); // 5
testMap.insert(valType(6, "six")); // 6
return 0;
}
There are also some under-the-hood-mechanisms involved which are not immediately visible when reading such a code - perfect forwarding, implicit conversions ...
What is the optimal way of adding items to the map container?
Let's consider your options one at a time (plus one or two you haven't mentioned).
Options 1 and 6 are essentially identical as far as semantics go. The using and the pair are just two different ways of spelling the value_type of the map. If you wanted you could add a third way using a typedef instead of a using statement:
typedef std::map<uint32_t, std::string>::value_type valType;
...and have a C++98/03 equivalent of your #6. All three end up doing the same thing though: creating a temporary object of the pair type, and inserting that into the map.
Versions 3 and 5 do pretty much the same thing. Both pass an already-constructed pair object rather than its components: a pair<int, const char*> from make_pair in #3, and the map's value_type itself in #5. By the time insert or emplace itself starts to execute, a temporary pair has already been built. Again, the only difference between the two is in the syntax used to specify that pair type (and, again, with a typedef like I've shown above, you could spell the alias used in #5 in C++98/03 style). The fact that version 3 uses insert and version 5 uses emplace makes almost no real difference: by the time either member function is invoked, we've already created and passed a temporary object.
Options 2 and 4 both use emplace the way it was probably intended: passing individual components, perfect-forwarding them to the constructor, and constructing a value_type object in place, so we avoid creating a temporary pair at any point. The primary (sole?) difference between the two is whether the thing we pass for the string component of the value_type is a string literal (from which the std::string member is constructed directly inside the map node) or a std::string object that was created ahead of time (which is then copied into the node).
The choice between those could be non-trivial. If (as above) you're only doing it once, it won't really make any difference at all: regardless of when you create it, you're creating a string object, then putting it into the map.
So, to make a real difference, we need to pre-create the string object, then repeatedly insert that same string object into the map. That's pretty unusual in itself; in most cases, you're going to do something like read external data into a string, then insert that into the map. If you really do insert (a std::string constructed from) the same string literal repeatedly, chances are pretty good that any reasonable compiler can detect that the resulting string is loop-invariant, and hoist the string construction out of the loop, giving essentially the same effect.
Bottom line: as far as the use of map itself goes, choices 2 and 4 are equivalent. Between those two, I wouldn't go to any real effort to use option 2 over option 4 (i.e., pre-creating the string) but it's likely to happen most of the time, simply because inserting a single string literal into a map is rarely useful. The string you put in a map will much more frequently come from some external data source, so you'll have a string because that's what (for example) std::getline gave you when you read the data from the file.
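To see the difference in temporaries concretely, here is a sketch with a hypothetical instrumented Counted type in place of std::string. It compares option 4 (emplace with components) against option 6 (insert of a value_type temporary); Counted, Tally and the helper functions are illustrative names.

```cpp
#include <cassert>
#include <map>

// Hypothetical instrumented value type that counts how it gets constructed.
struct Counted {
    static int conversions, copies, moves;
    Counted(const char*) { ++conversions; }
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::conversions = 0;
int Counted::copies = 0;
int Counted::moves = 0;

struct Tally { int conversions, copies, moves; };

static void reset() { Counted::conversions = Counted::copies = Counted::moves = 0; }
static Tally tally() { return {Counted::conversions, Counted::copies, Counted::moves}; }

// Option 4: the components are forwarded and value_type is built inside the node.
Tally emplace_components() {
    std::map<int, Counted> m;
    reset();
    m.emplace(4, "four");
    return tally();
}

// Option 6: a value_type temporary is built first, then moved into the node.
Tally insert_value_type() {
    std::map<int, Counted> m;
    using valType = std::map<int, Counted>::value_type;
    reset();
    m.insert(valType(6, "six"));
    return tally();
}
```

On mainstream standard libraries, emplace_components() should report one conversion and no copies or moves, while insert_value_type() reports one conversion plus a move into the node; the exact number of moves is an implementation detail.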

std::shared_ptr assignment of data vs. memcpy

I am using std::shared_ptr in C++11 and I would like to understand if it's better to assign structures of type T in this way:
T a_data;
std::shared_ptr<T> my_pointer(new T);
*my_pointer = a_data;
or like:
memcpy(&my_pointer, data, sizeof(T));
or like:
my_pointer.reset(a_data);
Regards
Mike
They each do a different thing.
1.
T a_data;
std::shared_ptr<T> my_pointer(new T);
*my_pointer = a_data;
Here, a new object (call it n) of type T will be allocated, managed by my_pointer. Then, object a_data will be copy-assigned into n.
2.
memcpy(&my_pointer, &a_data, sizeof(T)); // I assume you meant &a_data here, not data
That's total nonsense - that's overwriting the shared_ptr itself with the contents of a_data. Undefined behaviour at its finest (expect a crash or memory corruption).
Perhaps you actually meant my_pointer.get() instead of &my_pointer (that is, you wanted to copy into the object being pointed to)? If that's the case, it can work, as long as T is trivially copyable - which means that it doesn't have non-trivial copy or move ctors, doesn't have non-trivial copy or move assignment operators, and has a trivial destructor. But why rely on that, when normal assignment (*my_pointer = a_data;) does exactly the same for that case, and also works for non-trivially-copyable classes?
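If memcpy is really wanted, the destination must be the pointee, my_pointer.get(), not &my_pointer. A sketch, assuming a hypothetical trivially copyable Data type standing in for T; copy_both_ways is an illustrative helper.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical trivially copyable type standing in for the question's T.
struct Data {
    int id;
    double value;
};

std::shared_ptr<Data> copy_both_ways(const Data& a_data) {
    std::shared_ptr<Data> my_pointer(new Data());

    // Plain assignment: works for any copy-assignable type.
    *my_pointer = a_data;

    // memcpy into the pointee (not into the shared_ptr object itself) is only
    // valid because Data is trivially copyable; it does the same as the line above.
    static_assert(std::is_trivially_copyable<Data>::value, "memcpy needs a trivially copyable type");
    std::memcpy(my_pointer.get(), &a_data, sizeof(Data));

    return my_pointer;
}
```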
3.
my_pointer.reset(a_data);
This normally won't compile as-is; it would need to be my_pointer.reset(&a_data);. That's a disaster waiting to happen: you point my_pointer at the automatic (= local) variable a_data and give it ownership of it. This means that when my_pointer goes out of scope (actually, when the last pointer sharing ownership with it does), it will call the deleter, which normally calls delete, on a_data, which was not allocated with new. Welcome to UB land again!
If you just need to manage a dynamically-allocated copy of a_data with a shared_ptr, do this:
T a_data;
std::shared_ptr<T> my_pointer(new T(a_data));
Or even better:
T a_data;
auto my_pointer = std::make_shared<T>(a_data);

Why does one of these two code snippets use move semantics but the other doesn't?

Based on this article, http://www.drdobbs.com/cpp/when-is-it-safe-to-move-an-object-instea/240156579#disqus_thread
Following code will NOT call the move constructor:
void func()
{
Thing t;
work_on(t);
// never using t variable again ...
}
Following code will call the move constructor:
work_on(Thing());
The reason given is that in the first snippet, the constructor may save the address of the object being constructed and use it later.
My question is:
But in the second snippet, according to the C++ standard, the temporary object is still alive until work_on finishes, so the author could also save the address of the object being constructed and use it inside the work_on function.
So, for the same reason, it also shouldn't call the move constructor. Doesn't this make sense?
void func()
{
Thing t;
work_on(t); // <--- POINT 1
work_on(move(t)); // <--- POINT 2
work_on(Thing()); // <--- POINT 3
}
The expression t at POINT 1 is an lvalue.
The expression move(t) at POINT 2 is an xvalue.
The expression Thing() at POINT 3 is a prvalue.
Based on this value category of an expression, a best viable function is chosen from the overloaded set.
Suppose the two available functions were:
work_on(const Thing&); // lvalue reference version
work_on(Thing&&); // rvalue reference version
An lvalue will select the lvalue reference version, and will never bind to the rvalue reference version.
An xvalue or prvalue (collectively called rvalues) will viably bind to either, but will select the rvalue reference version as the better match if available.
Inside the implementation of the two versions of work_on, the parameters are largely the same. The purpose of this is that the rvalue reference version can assume that the argument is theirs to modify or move. So it may call the move constructor on its argument - whereas the lvalue reference version should not.
So suppose we had some vector<Thing> V that work_on should add its parameter to:
void work_on(Thing&& t)
{
V.push_back(move(t));
}
void work_on(const Thing& t)
{
V.push_back(t);
}
std::vector::push_back is overloaded in a similar fashion to work_on, and a similar overload resolution takes place. Inside the two different implementations of push_back, the rvalue reference version will call the move constructor to push the value onto its array, possibly leaving t in a valid but unspecified (typically emptied) state. The lvalue reference version will call the copy constructor, leaving t intact.
The main purpose of this language mechanic is simply to keep track of variables (lvalues), intentionally marked expiring values (xvalues) and temporaries (prvalues) - so we know when we can safely reuse their resources (move them) and when we can copy them.
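A runnable sketch of the overload resolution described above. Thing, the global V and the string return values (added only so the chosen overload can be observed) are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Thing {
    std::string payload;
};

// Hypothetical container that work_on appends to, as in the answer.
std::vector<Thing> V;

// Each overload reports which one was chosen.
const char* work_on(const Thing& t) {
    V.push_back(t);             // lvalue: push_back copies, t stays intact
    return "lvalue";
}

const char* work_on(Thing&& t) {
    V.push_back(std::move(t));  // rvalue: push_back may move from t
    return "rvalue";
}
```

Calling work_on(t) selects the const Thing& overload, while work_on(std::move(t)) (an xvalue) and work_on(Thing{...}) (a prvalue) both select the Thing&& overload.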
You got all your reasons wrong. There's nothing about "saving addresses". (Anyone can write any manner of horribly broken code by randomly storing addresses. That's not an argument.)
The simple reason is that in the first snippet, t continues living and can be used, so you can't move from it:
Thing t;
work_on(t);
t.x = 12;
foo(t);
In the second snippet, though, the temporary value Thing() only lives on that one line, till the end of the full-expression, so nobody can possibly refer to that value after the end of the statement. Thus it's perfectly safe to move from (i.e. mutate) the object.

Resources