I am using std::shared_ptr in C++11 and I would like to understand if it's better to assign structures of type T in this way:
T a_data;
std::shared_ptr<T> my_pointer(new T);
*my_pointer = a_data;
or like:
memcpy(&my_pointer, data, sizeof(T));
or like:
my_pointer.reset(a_data);
Regards
Mike
They each do a different thing.
1.
T a_data;
std::shared_ptr<T> my_pointer(new T);
*my_pointer = a_data;
Here, a new object (call it n) of type T will be allocated, managed by my_pointer. Then, object a_data will be copy-assigned into n.
2.
memcpy(&my_pointer, a_data, sizeof(T)); // I assume you meant a_data here, not data
That's total nonsense - tha's overwriting the shared_ptr itself with the contents of a_data. Undefined behaviour at its finest (expect a crash or memory corruption).
Perhaps you actually meant my_pointer.get() instead of &my_pointer (that is, you wanted to copy into the object being pointed to)? If that's the case, it can work, as long as T is trivially copyable - which means that it doesn't have non-trivial copy or move ctors, doesn't have non-trivial copy or move assignment operators, and has a trivial destructor. But why rely on that, when normal assignment (*my_pointer = a_data;) does exactly the same for that case, and also works for non-trivially-copyable classes?
3.
my_pointer.reset(a_data);
This normally won't compile as-is, it would need to be my_pointer.reset(&a_data);. That's disaster waiting to happen - you point my_pointer to the automatic (= local) variable a_data and give it ownership of that. Which means that when my_pointer goes out of scope (actually, when the last pointer sharing ownership wiht it does), it will call the deleter, which normally calls delete. On a_data, which was not allocated with new. Welcome to UB land again!
If you just need to manage a dynamically-allocated copy of a_data with a shared_ptr, do this:
T a_data;
std::shared_ptr<T> my_pointer(new T(a_data));
Or even better:
T a_data;
auto my_pointer = std::make_shared<T>(a_data);
Related
I saw code somewhere in which someone decided to copy an object and subsequently move it to a data member of a class. This left me in confusion in that I thought the whole point of moving was to avoid copying. Here is the example:
struct S
{
S(std::string str) : data(std::move(str))
{}
};
Here are my questions:
Why aren't we taking an rvalue-reference to str?
Won't a copy be expensive, especially given something like std::string?
What would be the reason for the author to decide to make a copy then a move?
When should I do this myself?
Before I answer your questions, one thing you seem to be getting wrong: taking by value in C++11 does not always mean copying. If an rvalue is passed, that will be moved (provided a viable move constructor exists) rather than being copied. And std::string does have a move constructor.
Unlike in C++03, in C++11 it is often idiomatic to take parameters by value, for the reasons I am going to explain below. Also see this Q&A on StackOverflow for a more general set of guidelines on how to accept parameters.
Why aren't we taking an rvalue-reference to str?
Because that would make it impossible to pass lvalues, such as in:
std::string s = "Hello";
S obj(s); // s is an lvalue, this won't compile!
If S only had a constructor that accepts rvalues, the above would not compile.
Won't a copy be expensive, especially given something like std::string?
If you pass an rvalue, that will be moved into str, and that will eventually be moved into data. No copying will be performed. If you pass an lvalue, on the other hand, that lvalue will be copied into str, and then moved into data.
So to sum it up, two moves for rvalues, one copy and one move for lvalues.
What would be the reason for the author to decide to make a copy then a move?
First of all, as I mentioned above, the first one is not always a copy; and this said, the answer is: "Because it is efficient (moves of std::string objects are cheap) and simple".
Under the assumption that moves are cheap (ignoring SSO here), they can be practically disregarded when considering the overall efficiency of this design. If we do so, we have one copy for lvalues (as we would have if we accepted an lvalue reference to const) and no copies for rvalues (while we would still have a copy if we accepted an lvalue reference to const).
This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided.
P.S.: To provide some context, I believe this is the Q&A the OP is referring to.
To understand why this is a good pattern, we should examine the alternatives, both in C++03 and in C++11.
We have the C++03 method of taking a std::string const&:
struct S
{
std::string data;
S(std::string const& str) : data(str)
{}
};
in this case, there will always be a single copy performed. If you construct from a raw C string, a std::string will be constructed, then copied again: two allocations.
There is the C++03 method of taking a reference to a std::string, then swapping it into a local std::string:
struct S
{
std::string data;
S(std::string& str)
{
std::swap(data, str);
}
};
that is the C++03 version of "move semantics", and swap can often be optimized to be very cheap to do (much like a move). It also should be analyzed in context:
S tmp("foo"); // illegal
std::string s("foo");
S tmp2(s); // legal
and forces you to form a non-temporary std::string, then discard it. (A temporary std::string cannot bind to a non-const reference). Only one allocation is done, however. The C++11 version would take a && and require you to call it with std::move, or with a temporary: this requires that the caller explicitly creates a copy outside of the call, and move that copy into the function or constructor.
struct S
{
std::string data;
S(std::string&& str): data(std::move(str))
{}
};
Use:
S tmp("foo"); // legal
std::string s("foo");
S tmp2(std::move(s)); // legal
Next, we can do the full C++11 version, that supports both copy and move:
struct S
{
std::string data;
S(std::string const& str) : data(str) {} // lvalue const, copy
S(std::string && str) : data(std::move(str)) {} // rvalue, move
};
We can then examine how this is used:
S tmp( "foo" ); // a temporary `std::string` is created, then moved into tmp.data
std::string bar("bar"); // bar is created
S tmp2( bar ); // bar is copied into tmp.data
std::string bar2("bar2"); // bar2 is created
S tmp3( std::move(bar2) ); // bar2 is moved into tmp.data
It is pretty clear that this 2 overload technique is at least as efficient, if not more so, than the above two C++03 styles. I'll dub this 2-overload version the "most optimal" version.
Now, we'll examine the take-by-copy version:
struct S2 {
std::string data;
S2( std::string arg ):data(std::move(x)) {}
};
in each of those scenarios:
S2 tmp( "foo" ); // a temporary `std::string` is created, moved into arg, then moved into S2::data
std::string bar("bar"); // bar is created
S2 tmp2( bar ); // bar is copied into arg, then moved into S2::data
std::string bar2("bar2"); // bar2 is created
S2 tmp3( std::move(bar2) ); // bar2 is moved into arg, then moved into S2::data
If you compare this side-by-side with the "most optimal" version, we do exactly one additional move! Not once do we do an extra copy.
So if we assume that move is cheap, this version gets us nearly the same performance as the most-optimal version, but 2 times less code.
And if you are taking say 2 to 10 arguments, the reduction in code is exponential -- 2x times less with 1 argument, 4x with 2, 8x with 3, 16x with 4, 1024x with 10 arguments.
Now, we can get around this via perfect forwarding and SFINAE, allowing you to write a single constructor or function template that takes 10 arguments, does SFINAE to ensure that the arguments are of appropriate types, and then moves-or-copies them into the local state as required. While this prevents the thousand fold increase in program size problem, there can still be a whole pile of functions generated from this template. (template function instantiations generate functions)
And lots of generated functions means larger executable code size, which can itself reduce performance.
For the cost of a few moves, we get shorter code and nearly the same performance, and often easier to understand code.
Now, this only works because we know, when the function (in this case, a constructor) is called, that we will be wanting a local copy of that argument. The idea is that if we know that we are going to be making a copy, we should let the caller know that we are making a copy by putting it in our argument list. They can then optimize around the fact that they are going to give us a copy (by moving into our argument, for example).
Another advantage of the 'take by value" technique is that often move constructors are noexcept. That means the functions that take by-value and move out of their argument can often be noexcept, moving any throws out of their body and into the calling scope (who can avoid it via direct construction sometimes, or construct the items and move into the argument, to control where throwing happens). Making methods nothrow is often worth it.
This is probably intentional and is similar to the copy and swap idiom. Basically since the string is copied before the constructor, the constructor itself is exception safe as it only swaps (moves) the temporary string str.
You don't want to repeat yourself by writing a constructor for the move and one for the copy:
S(std::string&& str) : data(std::move(str)) {}
S(const std::string& str) : data(str) {}
This is much boilerplate code, especially if you have multiple arguments. Your solution avoids that duplication on the cost of an unnecessary move. (The move operation should be quite cheap, however.)
The competing idiom is to use perfect forwarding:
template <typename T>
S(T&& str) : data(std::forward<T>(str)) {}
The template magic will choose to move or copy depending on the parameter that you pass in. It basically expands to the first version, where both constructor were written by hand. For background information, see Scott Meyer's post on universal references.
From a performance aspect, the perfect forwarding version is superior to your version as it avoids the unnecessary moves. However, one can argue that your version is easier to read and write. The possible performance impact should not matter in most situations, anyway, so it seems to be a matter of style in the end.
For C++11, is there still a performance difference between the following?
(for std::map<Foo, std::vector<Bar> > as an example)
map[key] = myVector and map.emplace(key, myVector)
The part I'm not figuring out is the exact internal of operator[]. My understanding so far has been (when key doesn't exist):
Create a new key and the associated empty default vector in place inside the map
Return the reference of the associated empty vector
Assign myVector to the reference???
The point 3 is the part I couldn't understand, how can you assign a new value to a reference in the first place?
Though I cannot sort through point 3 I think somehow there's just a copy/move required. Assuming C++11 will be smart enough to know it's gonna be a move operation, is this whole "[]" assignment then already cheaper than insert()? Is it almost equivalent to emplace()? ---- default construction and move content over, versus construct vector with content directly in place?
There are a lot of differences between the two.
If you use operator[], then the map will default construct the value. The return value from operator[] will be this default constructed object, which will then use operator= to assign to it.
If you use emplace, the map will directly construct the value with the parameters you provide.
So the operator[] method will always use two-stage construction. If the default constructor is slow, or if copy/move construction is faster than copy/move assignment, then it could be problematic.
However, emplace will not replace the value if the provided key already exists. Whereas operator[] followed by operator= will always replace the value, whether there was one there or not.
There are other differences too. If copying/moving throws, emplace guarantees that the map will not be changed. By contrast, operator[] will always insert a default constructed element. So if the later copy/move assignment fails, then the map has already been changed. That key will exist with a default constructed value_type.
Really, performance is not the first thing you should be thinking about when deciding which one to use. You need to focus first on whether it has the desired behavior.
C++17 will provide insert_or_assign, which has the effect of map[] = v;, but with the exception safety of insert/emplace.
how can you assign a new value to a reference in the first place?
It's fundamentally no different from assigning to any non-const reference:
int i = 5;
int &j = i;
j = 30;
i == 30; //This is true.
Under what situations would it be valid to compare shared_ptr instances instead of the underly lying type the shared_ptr manages?
As an example, would there ever be a situation where the size of personset being 2 would be valid after the following code has run?
shared_ptr<person> p0 = make_shared<person>(....);
shared_ptr<person> p1 = p0;
set<shared_ptr<person>> personset;
personset.insert(p0);
personset.insert(p1);
There is no viable reason to compare the instances. Infact shared_ptrs by default will perform equality/inequality comparators based via the underlying pointer to the control block (via .get method).
http://en.cppreference.com/w/cpp/memory/shared_ptr/operator_cmp
I've been trying to nail down the rule of 5, but most of the information online is vastly over-complicated, and the example codes differ.
Even my textbook doesn't cover this topic very well.
On move semantics:
Templates, rvalues and lvalues aside, as I understand it, move semantics are simply this:
int other = 0; //Initial value
int number = 3; //Some data
int *pointer1 = &number; //Source pointer
int *pointer2 = &other; //Destination pointer
*pointer2 = *pointer1; //Both pointers now point to same data
pointer1 = nullptr; //Pointer2 now points to nothing
//The reference to 'data' has been 'moved' from pointer1 to pointer2
As apposed to copying, which would be the equivalent of something like this:
pointer1 = &number; //Reset pointer1
int newnumber = 0; //New address for the data
newnumber = *pointer1; //Address is assigned value
pointer2 = &newnumber; //Assign pointer to new address
//The data from pointer1 has been 'copied' to pointer2, at the address 'newnumber'
No explanation of rvalues, lvalues or templates is necessary, I would go as far as to say those topics are unrelated.
The fact that the first example is faster than the second, should be a given. And I would also point out that any efficient code prior to C++ 11 will do this.
To my understanding, the idea was to bundle all of this behavior in a neat little operator move() in std library.
When writing copy constructors and copy assignment operators, I simply do this:
Text::Text(const Text& copyfrom) {
data = nullptr; //The object is empty
*this = copyfrom;
}
const Text& Text::operator=(const Text& copyfrom) {
if (this != ©from) {
filename = copyfrom.filename;
entries = copyfrom.entries;
if (copyfrom.data != nullptr) { //If the object is not empty
delete[] data;
}
data = new std::string[entries];
for (int i = 0; i < entries; i++) {
data[i] = copyfrom.data[i];
//std::cout << data[i];
}
std::cout << "Data is assigned" << std::endl;
}
return *this;
}
The equivalent, one would think, would be this:
Text::Text(Text&& movefrom){
*this = movefrom;
}
Text&& Text::operator=(Text&& movefrom) {
if (&movefrom != this) {
filename = movefrom.filename;
entries = movefrom.entries;
data = movefrom.data;
if (data != nullptr) {
delete[] data;
}
movefrom.data = nullptr;
movefrom.entries = 0;
}
return std::move(*this);
}
I'm quite certain this won't work, so my question is: How do you achieve this type of constructor functionality with move semantics?
It's not entirely clear to me what is supposed to be proved by your code examples -- or what the focus is of this question is.
Is it conceptually what does the phrase 'move semantics' mean in C++?
Is it "how do I write move ctors and move assignment operators?" ?
Here is my attempt to introduce the concept. If you want to see code examples then look at any of the other SO questions that were linked in comments.
Intuitively, in C and C++ an object is supposed to represent a piece of data residing in memory. For any number of reasons, commonly you want to send that data somewhere else.
Often one can take a direct approach of simply passing a pointer / reference to the object to the place where the data is needed. Then, it can be read using the pointer. Taking the pointer and moving the pointer around is very cheap, so this is often very efficient. The chief drawback is that you have to ensure that the object will live for as long as is needed, or you get a dangling pointer / reference and a crash. Sometimes that's easy to ensure, sometimes its not.
When it isn't, one obvious alternative is to make a copy and pass it (pass-by-value) rather than passing by reference. When the place where the data is needed has its own personal copy of the data, it can ensure that the copy stays around as long as is needed. The chief drawback here is that you have to make a copy, which may be expensive if the object is big.
A third alternative is to move the object rather than copying it. When an object is moved, it is not duplicated, and instead becomes available exclusively in the new site, and no longer in the old site. You can only do this when you won't need it at the old site anymore, obviously, but in that case this saves you a copy which can be a big savings.
When the objects are simple, all of these concepts are fairly trivial to actually implement and get right. For instance, when you have a trivial object, that is, one with trivial construction / destruction, it is safe to copy it exactly as you do in the C programming language, using memcpy. memcpy produces a byte-for-byte copy of a block of bytes. If a trivial object was properly initialized, since its creation has no possible side-effects, and its later destruction doesn't either, then memcpy copy is also properly initialized and results in a valid object.
However, in modern C++ many of your objects are not trivial -- they may "own" references to heap memory, and manage this memory using RAII, which ties the lifetime of the object to the usage of some resource. For instance, if you have a std::string as a local variable in a function, the string is not totally a "contiguous" object and rather is connected to two different locations in memory. There is a small, fixed-size (sizeof(std::string), in fact) block on the stack, which contains a pointer and some other info, pointing to a dynamically sized buffer on the heap. Formally, only the small "control" part is the std::string object, but intuitively from the programmer's point the buffer is also "part" of the string and is the part that you usually think about. You can't copy a std::string object like this using memcpy -- think about what will happen if you have std::string s and you try to copy sizeof(std::string) bytes from address &s to get a second string. Instead of two distinct string objects, you'll end up with two control blocks, each pointing to the same buffer. And when the first one is destroyed, that buffer is deleted, so using the second one will cause a segfault, or when the second one is destroyed, you get a double delete.
Generally, copying nontrivial C++ objects with memcpy is illegal and causes undefined behavior. This is because it conflicts with one of the core ideas of C++ which is that object creation and destruction may have nontrivial consequences defined by the programmer using ctors and dtors. Object lifetimes may be used to create and enforce invariants which you use to reason about your program. memcpy is a "dumb" low-level way to just copy some bytes -- potentially it bypasses the mechanisms that enforce the invariants which make your program work, which is why it can cause undefined behavior if used incorrectly.
Instead, in C++ we have copy constructors which you can use to safely make copies of nontrivial objects. You should write these in a way that preserves what invariants you need for your object. The rule of three is a guideline about how to actually do that.
The C++11 "move semantics" idea is a collection of new core language features which were added to extend and refine the traditional copy construction mechanism from C++98. Specifically, it's about, how do we move potentially complex RAII objects, not just trivial objects, which we already were able to move. How do we make the language generate move constructors and such for us automatically when possible, similarly to how it does it for copy constructors. How do we make it use the move options when it can to save us time, without causing bugs in old code, or breaking core assumptions of the language. (This is why I would say that your code example with int's and int *'s has little to do with C++11 move semantics.)
The rule of five, then, is the corresponding extension of the rule of three which describes conditions when you may need to implement a move ctor / move assignment operator also for a given class and not rely on the default behavior of the language.
Given the code:
#include <stdlib.h>
#include <stdint.h>
typedef struct { int32_t x, y; } INTPAIR;
typedef struct { int32_t w; INTPAIR xy; } INTANDPAIR;
void foo(INTPAIR * s1, INTPAIR * s2)
{
s2->y++;
s1->x^=1;
s2->y--;
s1->x^=1;
}
int hey(int x)
{
static INTPAIR dummy;
void *p = calloc(sizeof (INTANDPAIR),1);
INTANDPAIR *p1 = p;
INTPAIR *p2a = p;
INTPAIR *p2b = &p1->xy;
p2b->x = x;
foo(p2b,p2a);
int result= p2b->x;
free(p);
return result;
}
#include <stdio.h>
int main(void)
{
for (int i=0; i<10; i++)
printf("%d.",hey(i));
}
Behavior depends upon gcc optimization level, which implies that gcc thinks
this code invokes Undefined Behavior (the definition of "foo" collapses to nothing, but interestingly the definition of "hey" increments the value passed in). I'm not quite sure what if anything it does that runs afoul of the Standard's rules, though.
The code very deliberately and evilly constructs two pointers such that
s2a->y and s2b->x will alias, but the pointers are deliberately constructed in such a way that both identify legitimate potential objects of type INTPAIR. Because code used calloc to get the memory, all field members have legitimate initial defined values of zero. All accesses to the allocated memory are done via an int32_t member of an INTPAIR*.
I can understand why it would make sense for the Standard to forbid aliasing structure fields in this fashion, but I couldn't find anything in the Standard which actually does so. Is gcc operating in Standard-compliant fashion here, or is it violating some clause in the Standard which isn't referenced by Annex J.2 and doesn't use any of the terms I searched for?
UPDATE:
I felt this answer was OK, but not still a little imprecise, and not cut and dry as to what the UB was. After a lot of very interesting discussion and comments I have tried again with a new answer
The right part of the C99 standard is quoted in this answer. I'm copying it here for convenience. The question and several of the answers are quite thorough.
(C99; ISO/IEC 9899:1999 6.5/7:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types 73) or 88):
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of
the object,
a type that is the signed or unsigned type corresponding to the
effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
What is an effective type then? (C99; ISO/IEC 9899:1999 6.5/6:
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
So at the line p2b->x = x the object at p+4 becomes of effective type INTPAIR. Is it aligned correctly? If it isn't then Undefined Behavior (UB). But to keep it interesting, assume it is as it must be in this case because of the layout of INTANDPAIR.
By the same analysis there are two 8 byte objects, p2a (s2) at #(p+4) and p2b #p. As your example is demonstrating the 2nd element of p2a and the first of p2b end up being aliased.
In the foo(), the object p2b #p+4 is accessed by the normal method via s1->x. But then the "stored value" of object p2b is also accessed by a side effect of modifying a different object p2a #p. Since this falls under none of the bullets of 6.5/7, it is UB. Note that 6.5/7 says only, so objects shall not be accessed in any other ways.
I think the main distinction is that the "object" in question is the whole structure p2a/s2 and p2b/s1, not the integer members. If you change the argument of the function to take the integers and alias them it works "fine" because the function can't know s1 and s2 alias. For example:
void foo2(int *s1, int *s2)
{
(*s2)++;
(*s1)^=1;
(*s2)--;
(*s1)^=1;
}
...
/*foo(p2b,p2a);*/
foo2((int*)p, (int*)p); /* or p+4 or whatever you want */
This more or less confirms that this is the way GCC chose to interpret things: modifying a member is modifying the whole struct object and that since side effects of modifying one object are not on the listed legal ways to indirectly modify a different object, whee! we can do whatever silly thing we feel like doing.
So whether GCC interprets the ambiguities in standard to decide that by deriving s1 and s2 pointers through different typed pointers and then accessing them constitutes indirectly accessing the memory via different original types via p1 and p or whether it interprets the standard in the way I'm suggesting that "object" s2->y modifies is not just the integer but the s2 object, it is UB either way. Or is GCC just being especially snarky and pointing out that if the standard doesn't very clearly specify the semantics of dynamically allocated yet overlapping objects, it is free to do whatever it wants because by definition it is "undefined".
I don't think at this microscopic level anyone other than the standards body can definitively answer whether this should be UB or not because at this level it requires some "interpretation". The GCC's implementers opinion's seem to favor very aggressive interpretations.
I like Linus's reaction to this whole thing. And it is true, why not just be conservative and let the programmer tell the compiler when it is safe? Very Excellent Linus Rant
My previous answer was lacking, maybe not completely wrong, but the sample program is deliberately designed to sidestep each of the more obvious explicit Undefined Behaviors (UB) dictated by the C99 standard, like 6.5/7. But with both GCC (and Clang) this example demonstrates strict aliasing failure like symptoms under optimization. They appear to be assuming s1->y and s2-x can't alias. So, is the compiler wrong? Is this a loophole in the strict aliasing legalese?
Short answer: No. I wouldn't be surprised if there was a loophole of some kind in the standard, given its complexity. But in this example, creating overlapping objects on the heap is explicitly undefined behavior, and there are several other things happening that the standard does not define.
I think the point of the example is not that it fails - it is obvious that "playing fast and loose" with pointers is a bad idea and relying on corner cases and legalese to prove the compile "wrong" is of little help if the code doesn't work. The key questions are: is GCC wrong? and what in the standard says so.
First, lets look at the obvious strict aliasing rules and how this example is trying to avoid them.
C99 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 76)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This is the main strict aliasing section. It means that accessing the same memory via two different type pointers is UB. This example sidesteps it by accessing both using INTPAIR pointers in foo().
The key problem with this is that it is talking about accessing the stored value via two different effective types (e.g. pointers). It doesn't talk about accessing via two different objects.
What is being accessed? is it the integer member or the entire object s1 / s2? Is accessing s2->x via s1->y access via "a type compatible with the effective type of the object". I believe an argument can be made that a) the access as a side effect of modifying a different object does not fall under the permissible methods in 6.5/7 and that b) modifying one member of the aggregate transitively modifies the aggregate (*s1 or *s2) also.
Since this is not specified, it is UB, but it is a bit hand-wavy.
How did we get pointers to two overlapping objects? Are the pointer casts leading to them OK? Section 6.3.2.3 contains the rules for casting pointers and the example carefully does not violate any of them. In particular, because p2b is a pointer to INTANDPAIR member xy the alignment is guaranteed to be right, otherwise it would definitely run afoul of 6.3.2.3/7.
Furthermore, &p1->xy is not a problem - it can't be - it is a perfectly legitimate pointer to an INTPAIR. Simply casting pointers and/or taking addresses is safely outside the definition of "access" (3.1/1).
It is clear that the problem comes about by accessing two integer members that overlay each other as different parts of overlapping objects. Any attempt to do this via pointers of different types would clearly run afoul of 6.5/7. If accessed by the same type pointer at the same address, there would be no problem whatsoever. So the only way left that they could alias this way is that if two objects at different addresses overlapped in some fashion.
Obviously this could occur as part of a union, but that is not the case for this example. Type punning through unions may not be UB in C99, but it would be a different question whether a variant of this example could be made misbehave via unions.
The example uses dynamic allocation and casts the resultant void pointer to two different types. Going from from a pointer to an object to void * and back again is valid (6.3.2.3/1). Several other ways of obtaining pointers to objects that would overlap are explicitly UB by the pointer conversion rules of 6.3.2.3, the aliasing rules of 6.5/7, and/or the compatible type rules 6.2.7.
So what else is wrong?
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3
The storage for each of the objects is allocated by calloc() so the duration we want is "allocated". So we check 7.20.3: (emphasis added)
7.20.3 Memory management functions
1 The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, 25) and retains its last-stored value throughout its lifetime. 26) If an object is referred to outside of its lifetime, the behavior is undefined.
To avoid UB, the accesses to the two different objects must be to a valid object within its lifetime. You can get a single valid object (or an array) with malloc()/calloc(), but these guarantee that you will receive a pointer disjoint from all other objects. So is the object returned from calloc() p or is it p1? It can't be both.
The UB is triggered by attempting to reuse the same dynamically allocated object to hold two objects that are not disjoint. While calloc() guarantees it will return a pointer to a disjoint object, there is nothing that says it will still work if you then start using parts of the buffer for a 2nd overlapping one. In fact, it even explicitly says it is UB if you access an object outside its lifetime and there is only a single allocation ergo a single lifetime.
Also note:
4. Conformance
In this International Standard, ‘‘shall’’ is to be interpreted as a requirement on an implementation or on a program; conversely, ‘‘shall not’’ is to be interpreted as a prohibition.
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition
of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
For this to be a compiler error it must fail on a program that only uses constructs explicitly defined. Anything else is outside the safe-harbor and is still undefined, even if it the standard doesn't explicitly state that it is Undefined Behavior.