C++ return value and move rule exceptions - c++11

When we return a value from a C++ function copy-initialisation happens. Eg:
std::string hello() {
std::string x = "Hello world";
return x; // copy-init
}
Assume that RVO is disabled.
As per copy-init rule if x is a non-POD class type, then the copy constructor should be called. However for C++11 onward, I see move-constrtuctor being called. I could not find or understand the rules regarding this https://en.cppreference.com/w/cpp/language/copy_initialization. So my first question is -
What does the C++ standard say about move happening for copy-init when value is returned from function?
As an extension to the above question, I would also like to know in what cases move does not happen. I came up with the following case where copy-constructor is called instead of move:
std::string hello2(std::string& param) {
return param;
}
Finally, in some library code I saw that std::move was being explicitly used when returning (even if RVO or move should happen). Eg:
std::string hello3() {
std::string x = "Hello world";
return std::move(x);
}
What is the advantage and disadvantage of explicitly using std::move when returning?

You are confused by the fact that initialization via the move constructor is a special case of "copy initialization", and does not come as seperate concept. Check the notes on the cppreference page.
If other is an rvalue expression, move constructor will be selected by overload resolution and called during copy-initialization. There is no such term as move-initialization.
For returning a value from the function, check the description of returning on cppreference. It says in a box called "automatic move from local variables and parameters", where expression refers to what you return (warning: that quote is shortened! read the original for full details about other cases):
If expression is a (possibly parenthesized) id-expression that names a variable whose type is [...] a non-volatile object type [...] and that variable is declared [...] in the body or as a parameter of the [...] function, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor), and if the first overload resolution failed [...] then overload resolution is performed as usual, with expression considered as an lvalue (so it may select the copy constructor).
So in the special case of returning a local variable, the variable can be treated as r-value, even if normal syntactic rules would make it a l-value. The spirit of the rule is that after the return, you can't find out whether the value of the local variable has been destroyed during the copy-initialization of the returned value, so moving it does not do any damage.
Regarding your second question: It is considered bad style to use std::move while returning, because moving will happen anyway, and it inhibits NRVO.
Quoting the C++ core guidelines linked above:
Never write return move(local_variable);, because the language already knows the variable is a move candidate. Writing move in this code won’t help, and can actually be detrimental because on some compilers it interferes with RVO (the return value optimization) by creating an additional reference alias to the local variable.
So that library code you quote is suboptimal.
Also, you can not implicitly move from anything that is not local to the function (that is local variables and value parameters), because implicit moving may move from something that is still visible after the function returned. In the quote from cppreference, the important point is "a non-volatile object type". When you return std::string& param, that is a variable with reference type.

Related

Is it still necessary to use std move even if auto && has been used

As we know, STL usually offered two kinds of functions to insert an element: insert/push and emplace.
Let's say I want to emplace all of elements from one container to another.
for (auto &&element : myMap)
{
anotherMap.emplace(element); // vs anotherMap.empalce(std::move(element));
}
In this case, if I want to call the emplace, instead of insert/push, must I still call std::move here or not?
If you indeed want to move all elements from myMap into anotherMap then yes you must call std::move(). The reason is that element here is still an lvalue. Its type is rvalue reference as declared, but the expression itself is still an lvalue, and thus the overload resolution will give back the lvalue reference constructor better known as the copy constructor.
This is a very common point of confusion. See for example this question.
Always keep in mind that std::move doesn't actually do anything itself, it just guarantees that the overload resolver will see an appropriately-typed rvalue instead of an lvalue associated with a given identifier.

Avoiding self assignment in std::shuffle

I stumbled upon the following problem when using the checked implementation of glibcxx:
/usr/include/c++/4.8.2/debug/vector:159:error: attempt to self move assign.
Objects involved in the operation:
sequence "this" # 0x0x1b3f088 {
type = NSt7__debug6vectorIiSaIiEEE;
}
Which I have reduced to this minimal example:
#include <vector>
#include <random>
#include <algorithm>
struct Type {
std::vector<int> ints;
};
int main() {
std::vector<Type> intVectors = {{{1}}, {{1, 2}}};
std::shuffle(intVectors.begin(), intVectors.end(), std::mt19937());
}
Tracing the problem I found that shuffle wants to std::swap an element with itself. As the Type is user defined and no specialization for std::swap has been given for it, the default one is used which creates a temporary and uses operator=(&&) to transfer the values:
_Tp __tmp = _GLIBCXX_MOVE(__a);
__a = _GLIBCXX_MOVE(__b);
__b = _GLIBCXX_MOVE(__tmp);
As Type does not explicitly give operator=(&&) it is default implemented by "recursively" applying the same operation on its members.
The problem occurs on line 2 of the swap code where __a and __b point to the same object which results in effect in the code __a.operator=(std::move(__a)) which then triggers the error in the checked implementation of vector::operator=(&&).
My question is: Who's fault is this?
Is it mine, because I should provide an implementation for swap that makes "self swap" a NOP?
Is it std::shuffle's, because it should not try to swap an element with itself?
Is it the checked implementation's, because self-move-assigment is perfectly fine?
Everything is correct, the checked implementation is just doing me a favor in doing this extra check (but then how to turn it off)?
I have read about shuffle requiring the iterators to be ValueSwappable. Does this extend to self-swap (which is a mere runtime problem and can not be enforced by compile-time concept checks)?
Addendum
To trigger the error more directly one could use:
#include <vector>
int main() {
std::vector<int> vectorOfInts;
vectorOfInts = std::move(vectorOfInts);
}
Of course this is quite obvious (why would you move a vector to itself?).
If you where swapping std::vectors directly the error would not occur because of the vector class having a custom implementation of the swap function that does not use operator=(&&).
The libstdc++ Debug Mode assertion is based on this rule in the standard, from [res.on.arguments]
If a function argument binds to an rvalue reference parameter, the implementation may assume that this parameter is a unique reference to this argument.
i.e. the implementation can assume that the object bound to the parameter of T::operator=(T&&) does not alias *this, and if the program violates that assumption the behaviour is undefined. So if the Debug Mode detects that in fact the rvalue reference is bound to *this it has detected undefined behaviour and so can abort.
The paragraph contains this note as well (emphasis mine):
[Note: If a program casts an lvalue to an xvalue while passing that lvalue to a library function (e.g., by calling the function with the argument
std::move(x)), the program is effectively asking that function to treat that lvalue as a temporary object. The implementation is free to optimize away aliasing checks which might be needed if the
argument was an lvalue. —end note]
i.e. if you say x = std::move(x) then the implementation can optimize away any check for aliasing such as:
X::operator=(X&& rval) { if (&rval != this) ...
Since the implementation can optimize that check away, the standard library types don't even bother doing such a check in the first place. They just assume self-move-assignment is undefined.
However, because self-move-assignment can arise in quite innocent code (possibly even outside the user's control, because the std::lib performs a self-swap) the standard was changed by Defect Report 2468. I don't think the resolution of that DR actually helps though. It doesn't change anything in [res.on.arguments], which means it is still undefined behaviour to perform a self-move-assignment, at least until issue 2839 gets resolved. It is clear that the C++ standard committee think self-move-assignment should not result in undefined behaviour (even if they've failed to actually say that in the standard so far) and so it's a libstdc++ bug that our Debug Mode still contains assertions to prevent self-move-assignment.
Until we remove the overeager checks from libstdc++ you can disable that individual assertion (but still keep all the other Debug Mode checks) by doing this before including any other headers:
#include <debug/macros.h>
#undef __glibcxx_check_self_move_assign
#define __glibcxx_check_self_move_assign(x)
Or equivalently, using just command-line flags (so no need to change the source code):
-D_GLIBCXX_DEBUG -include debug/macros.h -U__glibcxx_check_self_move_assign '-D__glibcxx_check_self_move_assign(x)='
This tells the compiler to include <debug/macros.h> at the start of the file, then undefines the macro that performs the self-move-assign assertion, and then redefines it to be empty.
(In general defining, undefining or redefining libstdc++'s internal macros is undefined and unsupported, but this will work, and has my blessing).
It is a bug in GCC's checked implementation. According to the C++11 standard, swappable requirements include (emphasis mine):
17.6.3.2 §4 An rvalue or lvalue t is swappable if and only if t is swappable with any rvalue or lvalue, respectively, of type T
Any rvalue or lvalue includes, by definition, t itself, therefore to be swappable swap(t,t) must be legal. At the same time the default swap implementation requires the following
20.2.2 §2 Requires: Type T shall be MoveConstructible (Table 20) and MoveAssignable (Table 22).
Therefore, to be swappable under the definition of the default swap operator self-move assignment must be valid and have the postcondition that after self assignment t is equivalent to it's old value (not necessarily a no-op though!) as per Table 22.
Although the object you are swapping is not a standard type, MoveAssignable has no precondition that rv and t refer to different objects, and as long as all members are MoveAssignable (as std::vector should be) the generate move assignment operator must be correct (as it performs memberwise move assignment as per 12.8 §29). Furthermore, although the note states that rv has valid but unspecified state, any state except being equivalent to it's original value would be incorrect for self assignment, as otherwise the postcondition would be violated.
I read a couple of tutorials about copy constructors and move assignments and stuff (for example this). They all say that the object must check for self assignment and do nothing in that case. So I would say it is the checked implementation's fault, because self-move-assigment is perfectly fine.

Why this two code snippet for using Move semantic but the next doesn't?

Base on this artical, http://www.drdobbs.com/cpp/when-is-it-safe-to-move-an-object-instea/240156579#disqus_thread
Following code will NOT call the move constructor:
void func()
{
Thing t;
work_on(t);
// never using t variable again ...
}
Following code will call the move constructor:
work_on(Thing());
The reason is for the first code snippet, the constructor may save the constructing object address, and use it later.
My question is:
But for the second code snippet, the temp object still are alive before work_on finished base on the C++ standard, so the author can also save the address of the constructing object, and use it inside work_on function.
So base on the same reason, it also shouldn't call move constructor, doesn't this make sense?
void func()
{
Thing t;
work_on(t); // <--- POINT 1
work_on(move(t)); // <--- POINT 2
work_on(Thing()); // <--- POINT 3
}
The expression t at POINT 1 is an lvalue.
The expression move(t) at POINT 2 is an xvalue.
The expression Thing() at POINT 3 is a prvalue.
Based on this value category of an expression, a best viable function is chosen from the overloaded set.
Suppose the two available functions were:
work_on(const Thing&); // lvalue reference version
work_on(Thing&&); // rvalue reference version
An lvalue will select the lvalue reference version, and will never bind to the rvalue reference version.
An xvalue or prvalue (collectively called rvalues) will viably bind to either, but will select the rvalue reference version as the better match if available.
Inside the implementation of the two versions of work_on, the parameters are largely the same. The purpose of this is that the rvalue reference version can assume that the argument is theirs to modify or move. So it may call the move constructor on its argument - whereas the lvalue reference version should not.
So suppose we had some vector<Thing> V that work_on should add their parameter to:
void work_on(Thing&& t)
{
V.push_back(move(t));
}
void work_on(const Thing& t)
{
V.push_back(t);
}
std::vector::push_back is overloaded in a similar fashion to work_on, and a similar overload resolution takes place. Inside the two different implementations of push_back, the rvalue reference version will call the move constructor to push the value onto its array, possibly destroying t. The lvalue reference version will call the copy constructor, leaving t intact.
The main purpose of this language mechanic is simply to keep track of variables (lvalues), intentionally marked expiring values (xvalues) and temporaries (prvalues) - so we know when we can safely reuse their resources (move them) and when we can copy them.
You got all your reasons wrong. There's nothing about "saving addresses". (Anyone can write any manner of horribly broken code by randomly storing addresses. That's not an argument.)
The simple reason is that in the first snippet, t continues living and can be used, so you can't move from it:
Thing t;
work_on(t);
t.x = 12;
foo(t);
In the second snippet, though, the temporary value Thing() only lives on that one line, till the end of the full-expression, so nobody can possibly refer to that value after the end of the statement. Thus it's perfectly safe to move from (i.e. mutate) the object.

What expressions are allowed in tracepoints?

When creating a tracepoint in Visual Studio (right-click the breakpoint and choose "When Hit..."), the dialog has this text, emphasis mine:
You can include the value of a variable or other expression in the message by placing it in curly braces...
What expressions are allowed?
Microsoft's documentation is rather sparse on the exact details of what is and is not allowed. Most of the below was found by trial and error in the Immediate window. Note that this list is for C++, as that's what I code in. I believe in C#, some of the prohibited items below are actually allowed.
Most basic expressions can be evaluated, including casting, setting variables, and calling functions.
General Restrictions
Only C-style casts supported; no static_cast, dynamic_cast, reinterpret_cast, const_cast
Can't declare new variables or create objects
Can't use overloaded operators
Ternary operator doesn't work
Can't use the comma operator because Visual Studio uses it to format the result of the expression; use multiple sets of braces for multiple expressions
Function Calls
Prohibited calls:
Lambdas (can't define or call them)
Functions in an anonymous namespace
Functions that take objects by value (because you can't create objects)
Permitted calls:
Member functions, both regular and virtual
Functions taking references or pointers, to either fundamental or class types
Passing in-scope variables
Using "&" to pass pointers to in-scope variables
Passing the literals "true", "false", numbers
Passing string literals, as long you don't run afoul of the "can't create objects" rule
Calling multiple functions with one tracepoint by using multiple sets of braces
Variable Assignment
Prohibited:
Objects
String literals
Permitted:
Variables with fundamental types, value either from literals or other variables
Memory addresses, after casting: { *(bool*)(0x1234) = true }
Registers: { #eip = 0x1234 }
Use Cases
Calling functions from tracepoints can be quite powerful. You can get around most of the restrictions listed above with a carefully set up function and the right call. Here are some more specific ideas.
Force an if
Pretty straightforward: set up a tracepoint to set a variable and force an if-condition to true or false, depending on what you need to test. All without adding code or leaving the debug session.
Breakpoint "toggling"
I've seen the question a few times, "I need to break in a spot that gets hit a lot. I'd like to simply enable that breakpoint from another breakpoint, so the one I care about only gets breaks from a certain code path. How can I do that?" With our knowledge above, it's easy, although you do need a helper variable.
Create a global boolean, set to false.
Create a breakpoint at your final destination, with a condition to break only when the global flag is true.
Set tracepoints in the critical spots that assign the global flag to true.
The nice thing is that you can move the tracepoints around without leaving the debugging session. Use the Immediate window or the Watch window to reset your global flag, if you need to make another run at it. When you're done, all you need to clean up is that global boolean. No other code to remove.
Automatically skip code
The EIP register (at least on x86) is the instruction pointer. If you assign to it, you can change your program flow.
Find the address of the line you want to skip to by breaking on it once and looking at the value of EIP, either in the Registers window or the Watch window with "#eip,x". (Note that the value in the Registers window is hex, but without the leading "0x".)
Add a tracepoint on the line you want to skip from, with an expression like {#eip = address}, using the address from step 1.
EIP assignment will happen before anything on the line is executed.
Although this can be handy, be careful because skipping code like this can cause weird behavior.
As Kurt Hutchinson says, string assignment is not allowed in a tracepoint. You can get around this by creating a method that assigns the string variable, and call that.
public static class Helper
{
public static void AssignTo(this string value, out string variable)
{
variable = value;
}
}
Then in your tracepoint message:
{"new string value".AssignTo(out stringVariable)}

General programming - calling a non void method but not using value

This is general programming, but if it makes a difference, I'm using objective-c. Suppose there's a method that returns a value, and also performs some actions, but you don't care about the value it returns, only the stuff that it does. Would you just call the method as if it was void? Or place the result in a variable and then delete it or forget about it? State your opinion, what you would do if you had this situation.
A common example of this is printf, which returns an int... but you rarely see this:
int val = printf("Hello World");
Yeah just call the method as if it was void. You probably do it all the time without noticing it. The assignment operator '=' actually returns a value, but it's very rarely used.
It depends on the environment (the language, the tools, the coding standard, ...).
For example in C, it is perfectly possible to call a function without using its value. With some functions like printf, which returns an int, it is done all the time.
Sometimes not using a value will cause a warning, which is undesirable. Assigning the value to a variable and then not using it will just cause another warning about an unused variable. For this case the solution is to cast the result to void by prefixing the call with (void), e.g.
(void) my_function_returning_a_value_i_want_to_ignore().
There are two separate issues here, actually:
Should you care about returned value?
Should you assign it to a variable you're not going to use?
The answer to #2 is a resounding "NO" - unless, of course, you're working with a language where that would be illegal (early Turbo Pascal comes to mind). There's absolutely no point in defining a variable only to throw it away.
First part is not so easy. Generally, there is a reason value is returned - for idempotent functions the result is function's sole purpose; for non-idempotent it usually represents some sort of return code signifying whether operation was completed normally. There are exceptions, of course - like method chaining.
If this is common in .Net (for example), there's probably an issue with the code breaking CQS.
When I call a function that returns a value that I ignore, it's usually because I'm doing it in a test to verify behavior. Here's an example in C#:
[Fact]
public void StatService_should_call_StatValueRepository_for_GetPercentageValues()
{
var statValueRepository = new Mock<IStatValueRepository>();
new StatService(null, statValueRepository.Object).GetValuesOf<PercentageStatValue>();
statValueRepository.Verify(x => x.GetStatValues());
}
I don't really care about the return type, I just want to verify that a method was called on a fake object.
In C it is very common, but there are places where it is ok to do so and other places where it really isn't. Later versions of GCC have a function attribute so that you can get a warning when a function is used without checking the return value:
The warn_unused_result attribute causes a warning to be emitted if a caller of the function with this attribute does not use its return value. This is useful for functions where not checking the result is either a security problem or always a bug, such as realloc.
int fn () __attribute__ ((warn_unused_result));
int foo ()
{
if (fn () < 0) return -1;
fn ();
return 0;
}
results in warning on line 5.
Last time I used this there was no way of turning off the generated warning, which causes problems when you're compiling 3rd-party code you don't want to modify. Also, there is of course no way to check if the user actually does something sensible with the returned value.

Resources