In C++, how can one predict if move or copy semantics would be invoked? - c++11

Given the latitude that a C++ compiler has in instantiating temporary objects and in invoking mechanisms like return value optimization, it is not always clear from looking at some code whether move or copy semantics will be invoked (or how many times).
It almost feels as if these primitives exist for incidental optimizations. That is, you may or may not get them. It seems like it's difficult to design any kind of resource management strategy that leverages moves, when it is hard to control the invocation of moves themselves.
Is there a way to predict clearly (and simply) where and how many copies and moves might occur in some code? Ideally, one would not need to be an expert in compiler internals to be able to do this.

It seems like it's difficult to design any kind of resource management strategy that leverages moves, when it is hard to control the invocation of moves themselves.
I have to disagree here. Leveraging move semantics when designing a resource-handling class should be done independently of how or when copy or move construction occurs in the client code. Once the move ctor/assignment is there, client code can be designed to leverage the existence of these special member functions.
Is there a way to predict clearly (and simply) where and how many copies and moves might occur in some code?
A bit hard to tell what "simply" means here, but this is how I understand it:
Given that a class has no move ctor/assignment operator, you will always get a copy. This is trivial, but important to keep in mind when working with, e.g., classes in legacy code that have user-defined destructors and/or copy ctor/assignment, because the compiler doesn't generate move ctor/assignment in this case.
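For example (a minimal sketch; the class is made up), a user-provided destructor alone is enough to suppress the implicit move operations, so a std::move silently degrades to a copy:

#include <utility>
#include <vector>

struct Legacy {
    std::vector<int> data;
    ~Legacy() {}   // user-provided dtor: no implicit move ctor/assignment
};

int main() {
    Legacy a;
    Legacy b = std::move(a);   // copies a.data, despite the std::move
    (void)b;
}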
Return value optimization. The question is tagged C++11, so you don't have the guaranteed copy elision for initialization with prvalues brought by C++17. However, it is fair to assume that identical mechanisms are already implemented by your compiler. Hence,
struct A {};
A func() { return A{}; }
can be assumed to construct, in place, the instance of A to which the function return value is bound on the calling side. This causes neither move nor copy construction. The same behavior can optimistically be assumed if the returned object has a name, as long as func() has no branching that renders NRVO impossible.
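To illustrate (a minimal sketch; the function names are made up), one named object returned on every path is an NRVO candidate, while returning different named objects from different branches usually defeats it:

struct A {};

A good() {
    A a;
    return a;   // single named object: NRVO typically constructs it in place
}

A risky(bool flag) {
    A a, b;
    if (flag) return a;   // two candidates: expect a move (or copy) on some path
    return b;
}

int main() {
    A x = good();
    A y = risky(true);
    (void)x; (void)y;
}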
As an exception to this guideline, function return values that are also function parameters do not qualify for return value optimization. Hence, move/forward them to prevent a copy in case A is move-constructible:
A func(A& a) { return std::move(a); }
The object created by the return value of func(A&) will hence be move-constructed.
Function parameters do not reveal per se how they behave, it depends on the type and its special member functions. Given
void f1(A a1) { A a2{std::move(a1)}; }
void f2(A& a1) { /* Same as above. */ }
void f3(A&& a1) { /* Again, same. */ }
the instances a2 are move-constructed if A has a move ctor; otherwise, they are copy-constructed.
There is a lot to discover beyond the exemplary cases above; I am neither capable of going into more detail, nor would that fit the desired simplicity of an answer. Also, the scenario is different when you don't know the types you are dealing with, e.g. in function or class templates. In that case, a good read on how to deal with the resulting uncertainty of whether copies or moves are made is Item 29 in Effective Modern C++ ("Assume that move operations are not present, not cheap, and not used").

Related

Why isn't std::move a keyword in C++?

Obviously, move semantics/r-value references were a much-needed addition in C++11. One thing that has always bugged me, though, is std::move. The purpose of std::move is to transform an l-value into an r-value. And yet the compiler is perfectly happy to let you continue using that value as an l-value, and you get to find out at runtime that you screwed up.
It seems like there is a missed opportunity to define move (or some other name) as a keyword (similar to *_cast) and actually have the compiler understand that the referenced value can no longer be used as an l-value here. I'm sure there is some implementation work to do this, but is there some fundamental reason why this wasn't done?
In C++, moved-from objects are still objects. They can be used. They are usually in a defined state.
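For instance (a small illustration; the values are arbitrary):

#include <string>
#include <utility>

int main() {
    std::string s = "hello";
    std::string t = std::move(s);  // s is now in a valid but unspecified state

    s.clear();      // fine: clear() has no preconditions
    s = "reused";   // assignment gives s a well-defined value again
    (void)t;
}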
There are some optimizations you can do when you are willing to 'rip the guts' out of an object and use it elsewhere. The C++ committee decided these optimizations should be done implicitly and automatically in a few cases; usually where elision was already permitted, but where it wouldn't work for whatever reason.
Then the ability to do this explicitly was added. Making this operation end the lifetime of its right-hand side would complicate the lifetime rules of C++ to an extreme degree; rather than doing that, the committee noted that moves could be highly efficient without complicating the lifetime rules of C++, leaving them exactly as-is.
It turns out there are a handful of flaws in this; to that end, C++20 may be adding some "move and destroy the source" operations. In particular, a number of move-construction-like operations are easier to write as nothrow if you can both move and destroy the source in one fell swoop.
Actually having it change the lifetime of automatic storage variables is not in the cards. Even describing how such a change would work, let alone making sure it doesn't break anything horribly, would be a challenge.
A simple example of why having it always happen wouldn't be good might be:
Foo foo;
if (some_condition) {
    bar = std::move(foo);
}
Is the lifetime of foo now a function of some_condition? You'd either have to ban that kind of construct, or go down a pit of madness you may never get out of.

Indirect Member RAII: unique_ptr or optional?

Consider a class with a member that can't be stored directly, e.g., because it does not have a default constructor, and the enclosing class's constructor doesn't have enough information to create it:
class Foo
{
public:
    Foo() {} // Default ctor
private:
    /* Won't build: Bar has no default ctor, and there is no way to call
       its non-default ctor from Foo's ctor. */
    Bar m_bar;
};
Clearly, m_bar needs to be stored differently, e.g., through a pointer. A std::unique_ptr seems better than a raw pointer, though, as it will handle destruction automatically:
std::unique_ptr<Bar> m_bar;
It's also possible to use std::experimental::optional, though:
std::experimental::optional<Bar> m_bar;
My questions are: 1. What are the tradeoffs? and 2. Does it make sense to build a class automating the choice between them?
Specifically, looking at the exception guarantees for the ctor of std::unique_ptr and the exception guarantees for the ctor of std::experimental::optional, it seems clear that the former must perform dynamic allocation and deallocation (a runtime-speed disadvantage), while the latter stores things in some (aligned) memory buffer (a size disadvantage). Are these the only tradeoffs?
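As a rough illustration of that size tradeoff (the struct is hypothetical, exact sizes are implementation-dependent, and this assumes your standard library ships <experimental/optional>):

#include <cstdio>
#include <memory>
#include <experimental/optional>

struct Big { char buf[256]; };

int main() {
    // optional stores Big inline (plus an engaged flag); unique_ptr is one pointer
    std::printf("optional:   %zu bytes\n", sizeof(std::experimental::optional<Big>));
    std::printf("unique_ptr: %zu bytes\n", sizeof(std::unique_ptr<Big>));
}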
If these are indeed the tradeoffs, and given that both types share enough of their interface (ctor, operator*), does it make sense to automate the choice between them with something like
template<typename T>
using indirect_raii = typename std::conditional<
    // 20 - arbitrary constant
    (sizeof(std::experimental::optional<T>) >
     20 + sizeof(std::unique_ptr<T>)),
    std::unique_ptr<T>,
    std::experimental::optional<T>>::type;
(Note: there is a question discussing the tradeoffs between these two as return types, but the question and answers focus on what each conveys to the callers of the function, which is irrelevant for these private members.)
IMO there are other trade-offs at play here:
unique_ptr is not copyable or copy-assignable, while optional is.
I suppose one thing you could do is make indirect_RAII a class-type and conditionally add definitions to make it copyable by calling Bar's copy ctor, even when unique_ptr is selected. (Or conversely, disable copying when it's an optional.)
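A rough sketch of that idea (the name and interface are illustrative, and std::make_unique assumes C++14):

#include <memory>
#include <utility>

template <typename T>
class indirect_RAII {
    std::unique_ptr<T> ptr_;
public:
    explicit indirect_RAII(T value)
        : ptr_(std::make_unique<T>(std::move(value))) {}

    // Copying goes through T's copy ctor, even though storage is a unique_ptr.
    indirect_RAII(const indirect_RAII& other)
        : ptr_(other.ptr_ ? std::make_unique<T>(*other.ptr_) : nullptr) {}

    indirect_RAII& operator=(const indirect_RAII& other) {
        ptr_ = other.ptr_ ? std::make_unique<T>(*other.ptr_) : nullptr;
        return *this;
    }

    indirect_RAII(indirect_RAII&&) noexcept = default;
    indirect_RAII& operator=(indirect_RAII&&) noexcept = default;

    T& operator*() { return *ptr_; }
    const T& operator*() const { return *ptr_; }
};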
optional types can have a constexpr constructor -- you can't really do the equivalent thing with a unique_ptr at compile-time.
Bar can be incomplete at the time that unique_ptr<Bar> is declared; it cannot be incomplete at the point where optional<Bar> is instantiated. In your example I guess you assume that Bar is complete, since you take its size, but potentially you might want to implement a class using indirect_RAII where this isn't the case.
Even in cases where Bar is large, you still may find that e.g. std::vector<Foo> will perform better when optional is selected than when unique_ptr is. I would expect this to happen in cases where the vector is populated once, and then iterated over many times.
It may be that, as a general rule of thumb, your size rule is good for common use in your program, but for "common use" it doesn't really matter which one you pick. An alternative to your indirect_RAII type is to just pick one or the other in each case, and in places where you would have taken advantage of the "generic interface", pass the type as a template parameter when necessary. In performance-critical areas, make the appropriate choice manually.

What is the rationale of Go not having the const qualifier?

I'm a senior C++ programmer, currently doing some Go programming. The only feature I really miss is the const qualifier. In Go, if you want to modify an object, you pass its pointer; if you don't want to modify it, you pass it by value. But if the struct is big, you should pass it by pointer, which forfeits the no-modification guarantee. Worse, you can pass an object by value, but if it contains a pointer, you can still modify its contents, with terrible race-condition dangers. Some built-in types, like maps and slices, behave this way. This happens in a language that's supposed to be built for concurrency. So there is really no way to express non-modification in Go, and you should pass small objects that do not contain pointers (you must be aware whether the object contains a pointer) by value, if they aren't going to be modified.
With const, you can pass objects by const pointer and not worry about modification. Type safety is about having a contract that allows speed and prevents type-related bugs. Another feature that does this is the const qualifier.
The const type qualifier in C/C++ has various meanings. When applied to a variable, it means that the variable is immutable. That's a useful feature, and one that is missing from Go, but it's not the one you seem to be talking about.
You are talking about the way that const can be used as a partially enforced contract for a function. A function can give a pointer parameter the const qualifier to mean that the function won't change any values using that pointer. (Unless, of course, the function uses a cast (a const_cast in C++). Or, in C++, the pointer points to a field that is declared mutable.)
Go has a very simple type system. Many languages have a complex type system in which you enforce the correctness of your program by writing types. In many cases this means that a good deal of programming involves writing type declarations. Go takes a different approach: most of your programming involves writing code, not types. You write correct code by writing correct code, not by writing types that catch cases where you write incorrect code. If you want to catch incorrect code, you write analyzers, like go vet that look for cases that are invalid in your code. These kinds of analyzers are much much easier to write for Go than for C/C++, because the language is simpler.
There are advantages and disadvantages to this kind of approach. Go is making a clear choice here: write code, not types. It's not the right choice for everyone.
Please treat this as an expanded comment. I'm not a programming language designer, so I can't go deep into the details here, but I will present my opinion as a long-term developer in C++ and short-term developer in Go.
Const is a non-trivial feature for the compiler, so one would have to make sure that it provides enough advantage to the user to be worth implementing, and that it won't sacrifice the simplicity of the syntax. You might think it's just a const qualifier we're talking about, but looking at C++ itself, it's not so easy; there are a lot of caveats.
You say const is a contract and you shouldn't be able to modify it under any circumstances. One of your arguments against using read-only interfaces is that you can cast them to the original type and do whatever you want. Sure you can. The same way you can show a middle finger to the contract in C++ by using const_cast. For some reason it was added to the language and, not sure I should be proud of it, I've used it once or twice.
There's another modifier in C++ allowing you to relax the contract – mutable. Someone realised that const structures might actually need to have some fields modified, usually mutexes protecting internal variables. I guess you would need something similar in Go in order to be able to implement thread-safe structures.
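The classic example is a mutex that must be lockable even inside const member functions (a minimal sketch; the class is made up):

#include <mutex>

class Counter {
    mutable std::mutex m_;  // mutable: can be locked from const methods
    int value_ = 0;
public:
    int get() const {
        std::lock_guard<std::mutex> lock(m_);  // OK despite the const
        return value_;
    }
    void increment() {
        std::lock_guard<std::mutex> lock(m_);
        ++value_;
    }
};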
When it comes to a simple const int x, people can easily follow. But then pointers jump in, and people really get confused. const int * x, int * const x, const int * const x – these are all valid declarations of x, each with a different contract. I know it's not rocket science to choose the right one, but does your experience as a senior C++ programmer tell you that people widely understand these and always use the right one? And I haven't even mentioned things like const int * const * * * const * const x. It blows my mind.
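For reference, a small cheat sheet (variable names are illustrative):

int main() {
    int n = 0, m = 0;

    const int *p1 = &n;        // pointer to const int
    p1 = &m;                   // OK: the pointer can be reseated
    // *p1 = 1;                // error: the pointee is read-only

    int *const p2 = &n;        // const pointer to int
    *p2 = 1;                   // OK: the pointee is writable
    // p2 = &m;                // error: the pointer cannot be reseated

    const int *const p3 = &n;  // const pointer to const int
    // *p3 = 1;                // error
    // p3 = &m;                // error
    (void)p3;
}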
Before I move to point 4, I would like to cite the following:
Worse, you can pass an object by value, but if it contains a pointer,
you can actually modify its contents
Now this is an interesting accusation. There's the same issue in C++; worse, it exists even if you declare the object as const, which means you can't solve the problem with a simple const qualifier. See the next point:
Per point 3, and pointers, it's not so easy to express exactly the right contract, and things sometimes get unexpected. This piece of code has surprised a few people:
struct S {
    int *x;
};

int main() {
    int n = 7;
    const S s = {&n}; // don't touch s, it's read only!
    *s.x = 666;       // wait, what? s is const! is satan involved?
}
I'm sure it's natural for you why the code above compiles. It's the pointer value you can't modify (the address it points to), not the value behind it. You must admit there are people around who would raise their eyebrows.
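One conventional workaround is to expose the pointee only through accessors that propagate the object's constness, a hand-rolled stand-in for what std::experimental::propagate_const automates (the accessor name is made up):

struct S {
    int *x;

    int& value() { return *x; }             // non-const S: writable access
    const int& value() const { return *x; } // const S: read-only access
};

int main() {
    int n = 7;
    const S s{&n};
    int v = s.value();   // OK: reads through the const overload
    // s.value() = 666;  // error now: cannot assign through const int&
    (void)v;
}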
I don't know if it makes any point, but I've been using const in C++ all the time. Very accurately. Going mental about it. Not sure whether it has ever saved my ass, but after moving to Go I must admit I've never missed it. And having in mind all these edge cases and exceptions, I can really believe the creators of a minimalistic language like Go would decide to skip this one.
Type-safety is about having a contract that allows speed and prevents
type-related bugs.
Agreed. For example, in Go, I love that there are no implicit conversions between types. This really prevents type-related bugs.
Another feature that does this too is the const qualifier.
Per my whole answer: I don't agree. While a general const contract would do this for sure, a simple const qualifier is not enough. You then need a mutable one, maybe a kind of const_cast feature, and it can still leave you with misleading beliefs of protection, because it's hard to understand what exactly is constant.
Hopefully some language creators will design a perfect way of defining constants all over in our code and then we'll see it in Go. Or move over to the new language. But personally, I don't think C++'s way is a particularly good one.
(Alternative would be to follow functional programming paradigms, which would love to see all their "variables" immutable.)

C++: why is noexcept required in the context of move constructors and move assignment operators to enable optimizations?

Consider the following class, with a move constructor and move assignment operator:
#include <cstdint>
#include <utility>

class my_class
{
protected:
    double *my_data;
    uint64_t my_data_length;

public:
    my_class(my_class&& other) noexcept
        : my_data{other.my_data}, my_data_length{other.my_data_length}
    {
        // Steal the data
        other.my_data = nullptr;
        other.my_data_length = 0;
    }

    my_class& operator=(my_class&& other) noexcept
    {
        // Steal the data
        std::swap(my_data_length, other.my_data_length);
        std::swap(my_data, other.my_data);
        return *this;
    }
};
What is the purpose of noexcept here? I know that it hints to the compiler that no exceptions should be thrown by the function, but how does this enable compiler optimizations?
The special importance of noexcept on move constructors and assignment operators is explained in detail in https://vimeo.com/channels/ndc2014/97337253
Basically, it doesn't enable "optimisations" in the traditional sense of allowing the compiler to generate better code. Instead it allows other types, such as containers in the library, to take a different code path when they can detect that moving the element type will never throw; that alternate path would not be safe if moves could throw (e.g. because it would prevent the container from meeting its exception-safety guarantees).
For example, when you do push_back(t) on a vector, if the vector is full (size() == capacity()) then it needs to allocate a new block of memory and copy all the existing elements into the new memory. If copying any of the elements throws an exception then the library just destroys all the elements it created in the new storage and deallocates the new memory, leaving the original vector unchanged (thus meeting the strong exception-safety guarantee). It would be faster to move the existing elements to the new storage, but if moving could throw then any already-moved elements would have been altered, and meeting the strong guarantee would not be possible; so the library will only try to move them when it knows that can't throw, which it can only know if they are noexcept.
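You can observe this mechanism directly with std::move_if_noexcept, which is essentially what vector implementations use during reallocation (the types below are illustrative):

#include <iostream>
#include <utility>

struct Safe {
    Safe() = default;
    Safe(const Safe&) { std::cout << "copy\n"; }
    Safe(Safe&&) noexcept { std::cout << "move\n"; }
};

struct Risky {
    Risky() = default;
    Risky(const Risky&) { std::cout << "copy\n"; }
    Risky(Risky&&) { std::cout << "move\n"; }  // not noexcept
};

int main() {
    Safe s;
    Risky r;
    Safe s2{std::move_if_noexcept(s)};    // prints "move"
    Risky r2{std::move_if_noexcept(r)};   // prints "copy": the move might throw
    (void)s2; (void)r2;
}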
IMHO, using noexcept will not enable any compiler optimization on its own. There are traits in the STL:
std::is_nothrow_move_constructible
std::is_nothrow_move_assignable
STL containers like vector etc. use these traits to test a type T and use move construction and assignment instead of copy construction and assignment.
Why does the STL use these traits instead of:
std::is_move_constructible
std::is_move_assignable
Answer: to provide the strong exception guarantee.
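A quick way to check what a container will see (the types are illustrative; declarations suffice for the traits):

#include <type_traits>

struct M {
    M(M&&) noexcept;  // noexcept move: vector will move on reallocation
};

struct C {
    C(const C&);
    C(C&&);           // potentially-throwing move: vector copies instead
};

static_assert(std::is_nothrow_move_constructible<M>::value, "M is moved");
static_assert(!std::is_nothrow_move_constructible<C>::value, "C is copied");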
First of all, I would remark that nothing in a move constructor or move assignment operator should ever need to throw. The only thing that must be done there is dealing with already-allocated memory and pointers to it; normally you should not call any other methods that can throw, and your own moving has no need to do so. On the other hand, even a simple debug-message output breaks this rule.
Optimization can happen in a few different ways: automatically by the compiler, and also in the implementations of code that uses your constructors and assignment operator. Take a look at the STL: there are specializations that differ depending on whether operations can throw, implemented via type traits.
The compiler itself can optimize better when it has the guarantee that code will never throw. It then has a guaranteed call tree through your code, which can be better inlined, evaluated at compile time, and so on. The minimum gain is not having to store the stack-frame information needed to handle a throw, such as for deallocating variables on the stack.
There was also a question here: noexcept, stack unwinding and performance
Maybe your question is a duplicate of that one?
A possibly helpful related question: Are move constructors required to be noexcept?
It discusses the need for throwing in move operations.
What is the purpose of noexcept here?
At minimum, saving some program space, which is relevant not only to move operations but to all functions. And if your class is used with STL containers or algorithms, it can be handled differently, which can result in better optimization if your STL implementation uses this information. And maybe the compiler can do better general optimization because of a known call tree, if everything else is compile-time constant.

Why isn't DRY considered a good thing for type declarations?

It seems like people who would never dare cut and paste code have no problem specifying the type of something over and over and over. Why isn't it emphasized as a good practice that type information should be declared once and only once so as to cause as little ripple effect as possible throughout the source code if the type of something is modified? For example, using pseudocode that borrows from C# and D:
MyClass<MyGenericArg> foo = new MyClass<MyGenericArg>(ctorArg);

void fun(MyClass<MyGenericArg> arg) {
    gun(arg);
}
void gun(MyClass<MyGenericArg> arg) {
    // do stuff.
}
Vs.
var foo = new MyClass<MyGenericArg>(ctorArg);

void fun(T)(T arg) {
    gun(arg);
}
void gun(T)(T arg) {
    // do stuff.
}
It seems like the second one is a lot less brittle if you change the name of MyClass, or change the type of MyGenericArg, or otherwise decide to change the type of foo.
I don't think you're going to find a lot of disagreement with your argument that the latter example is "better" for the programmer. A lot of language design features are there because they're better for the compiler implementer!
See Scala for one reification of your idea.
Other languages (such as the ML family) take type inference much further, and create a whole style of programming where the type is enormously important, much more so than in the C-like languages. (See The Little MLer for a gentle introduction.)
It isn't considered a bad thing at all. In fact, C# maintainers are already moving a bit towards reducing the tiring boilerplate with the var keyword, where
MyContainer<MyType> cont = new MyContainer<MyType>();
is exactly equivalent to
var cont = new MyContainer<MyType>();
Although you will see many people who argue against var usage, which kind of shows that many people are not familiar with strongly typed languages that have type inference; type inference is mistaken for dynamic/soft typing.
Repetition may lead to more readable code, and sometimes may be required in the general case. I've always seen the focus of DRY as being more about duplicating logic than repeating literal text. Technically, you can eliminate 'var' and 'void' from your bottom code as well. Not to mention that you indicate scope with indentation, so why repeat yourself with braces?
Repetition can also have practical benefits: parsing by a program is easier by keeping the 'void', for example.
(However, I still strongly agree with you on preferring "var name = new Type()" over "Type name = new Type()".)
It's a bad thing. This very topic was mentioned in Google's Go language Techtalk.
Albert Einstein said, "Everything should be made as simple as possible, but not one bit simpler."
Your complaint makes no sense in the case of a dynamically typed language, so you must intend this to refer to statically typed languages. In that case, your replacement example implicitly uses generics (aka template classes), which means that any time fun or gun is used, a new definition is generated based upon the type of the argument. That could result in dozens of extra methods, regardless of the intent of the programmer. In particular, you're throwing away the benefit of compiler-checked type safety for a runtime error.
If your goal was to simply pass through the argument without checking its type, then the correct type would be Object not T.
Type declarations are intended to make the programmer's life simpler, by catching errors at compile-time, instead of failing at runtime. If you have an overly complex type definition, then you probably don't understand your data. In your example, I would have suggested adding fun and gun to MyClass, instead of defining them separately. If fun and gun don't apply to all possible template types, then they should be defined in an explicit subclass, not as separate functions that take a templated class argument.
Generics exist as a way to wrap behavior around more specific objects. List, Queue, Stack, these are fine reasons for Generics, but at the end of the day, the only thing you should be doing with a bare Generic is creating an instance of it, and calling methods on it. If you really feel the need to do more than that with a Generic, then you probably need to embed your Generic class as an instance object in a wrapper class, one that defines the behaviors you need. You do this for the same reason that you embed primitives into a class: because by themselves, numbers and strings do not convey semantic information about their contents.
Example:
What semantic information does List<Tuple<int, int, int>> convey? Just that you're working with multiple triples of integers. On the other hand, List<Color>, where a Color has 3 integers (red, blue, green) with bounded values (0-255), conveys the intent that you're working with multiple Colors, but provides no hint as to whether the List is ordered, allows duplicates, or any other information about the Colors. Finally, a Palette can add those semantics for you: a Palette has a name, contains multiple Colors, has no duplicates, and order isn't important.
This has gotten a bit far afield from the original question, but what it means to me is that DRY (Don't Repeat Yourself) means specifying information once, but that specification should be as precise as is necessary.
