Is it safe to write to a std::string's buffer directly? - c++11

If I have the following code:
std::string hello = "hello world";
char* internalBuffer = &hello[0];
Is it then safe to write to internalBuffer up to hello.length()? Or is this UB/implementation-defined? Obviously I can write tests and see that this works, but that doesn't answer my question.

Yes, it's safe. No, it's not explicitly allowed by the standard.
According to my copy of the standard draft from about half a year ago, it guarantees that data() points at a contiguous array, and that this array is the same one you access through operator[]:
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
From this one can conclude that operator[] returns a reference to some place within that contiguous array. The standard also allows the reference returned from (non-const) operator[] to be modified.
Having a non-const reference to one member of an array, I dare say that we can modify the entire array.
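To make the argument above concrete, here is a minimal sketch of writing into a string through a pointer obtained from operator[]. The helper name fill_prefix is hypothetical, and the sketch assumes a C++11 library where the characters are stored contiguously; it only touches indices below size().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: overwrite the first n characters of s through a raw
// pointer obtained from operator[]. Only indices in [0, s.size()) are written.
inline void fill_prefix(std::string& s, const char* src, std::size_t n) {
    assert(n <= s.size());        // never write past the last character
    if (n == 0) return;           // nothing to do, and avoids &s[0] on an empty string
    char* buffer = &s[0];         // relies on the contiguous layout discussed above
    std::memcpy(buffer, src, n);  // size() and the terminator are left untouched
}
```

Calling fill_prefix(hello, "HELLO", 5) on the string from the question would leave its length unchanged and replace only the first five characters.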

The relevant section in the standard is §21.4.5:
const_reference operator[](size_type pos) const noexcept;
reference operator[](size_type pos) noexcept;
[...]
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an
object of type T with value charT(); the referenced value shall not be modified.
If I understand this correctly, it means that as long as the index given to operator[] is smaller than the string's size, one is allowed to modify the value. If however, the index is equal to size and thus we obtain the \0 terminating the string, we must not write to this value.
Cppreference uses a slightly different wording here:
If pos == size(), a reference to the character with value CharT() (the null character) is returned.
For the first (non-const) version, the behavior is undefined if this character is modified.
I read this such that 'this character' here only refers to the default constructed CharT, and not to the reference returned in the other case. But I admit that the wording is a bit confusing here.

In practice it is safe; in theory it was not guaranteed before C++11.
The C++03 standard doesn't require string to be implemented as a sequential character array, as it does for vector. C++11 added that requirement: the char-like objects in a basic_string shall be stored contiguously. I'm not aware of any implementation of string where it is not safe.
http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/

Related

Why std::move() is not stealing an int value?

std::move() steals the string's value but not the int's. Why is that? Please help me.
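#include <iostream>
#include <string>
#include <utility>  // std::move

int main()
{
    int i = 50;
    std::string str = "Mahesh";
    int j = std::move(i);              // int has no move constructor: plain copy
    std::string name = std::move(str); // selects std::string's move constructor
    std::cout << "i: " << i << " J: " << j << std::endl;
    std::cout << "str: " << str << " name: " << name << std::endl;
    return 0;
}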
Output
i: 50 J: 50
str: name: Mahesh
std::move is a cast to an rvalue reference. This can change overload resolution, particularly with regard to constructors.
int is a fundamental type, it doesn't have any constructors. The definition for int initialisation does not care about whether the expression is const, volatile, lvalue or rvalue. Thus the behaviour is a copy.
One reason this is the case is that there is no benefit to a (destructive) move. Another reason is that there is no such thing as an "empty" int, in the sense that there are "empty" std::strings and "empty" std::unique_ptrs.
std::move() itself doesn't actually do any moving. It is simply used to indicate that an object may be moved from. The actual moving must be implemented for the respective types by a move constructor/move assignment operator.
std::move(x) returns an unnamed rvalue reference to x. rvalue references are really just like normal references. Their only purpose is simply to carry along the information about the "rvalue-ness" of the thing they refer to. When you then use the result of std::move() to initialize/assign to another object, overload resolution will pick a move constructor/move assignment operator if one exists. And that's it. That is literally all that std::move() does. However, the implementation of a move constructor/move assignment operator knows that the only way it could have been called is when the value passed to it is about to expire (otherwise, the copy constructor/copy assignment operator would have been called instead). It, thus, can safely "steal" the value rather than make a copy, whatever that may mean in the context of the particular type.
There is no general answer to the question what exactly it means to "steal" a value from an object. Whoever defines a type has to define whether it makes sense to move objects of this type and what exactly it means to do so (by declaring/defining the respective member functions). Built-in types don't have any special behavior defined for moving their values. So in the case of an int you just get what you get when you initialize an int with a reference to another int, which is a copy…
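As an illustration of what "stealing" can mean for a user-defined type, here is a minimal sketch. The Buffer class is hypothetical and not taken from any library; it only shows a move constructor taking over a heap pointer, while an int initialised from std::move(i) is simply copied.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical type owning a heap buffer, to show what "stealing" can mean.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }

    // Copy constructor: allocates new storage and copies every element.
    Buffer(const Buffer& other) : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move constructor: takes over the pointer and leaves the source empty.
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Assignment is left out to keep the sketch short.
    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    int* data_;
    std::size_t size_;
};
```

With this class, Buffer b = std::move(a); selects the move constructor, so b ends up holding the pointer that a used to own and a is left with size 0.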

Why does the STL Output Iterator allow only once assignment?

As mentioned here:
http://www.cplusplus.com/reference/iterator/OutputIterator/
Can be dereferenced as an lvalue (if in a dereferenceable state).
It shall only be dereferenced as the left-side of an assignment statement.
Once dereferenced, its iterator value may no longer be dereferenceable.
Next to it there is an example of a valid expression:
*a = t
After this expression (the dereference) I can't dereference again.
I don't understand why for example I can't do:
*a = t2
After the first expression.
One reason is that output iterators are used for output streams, such as terminals, pipes and sockets. Once data have been written into the stream, it is considered sent elsewhere and thus cannot be changed.
Other iterator types, including Trivial Iterator and Input Iterator, define the notion of a value type, the type returned when an iterator is dereferenced. This notion does not apply to Output Iterators, however, since the dereference operator (unary operator*) does not return a usable value for Output Iterators. The only context in which the dereference operator may be used is assignment through an output iterator: *x = t. Although Input Iterators and output iterators are roughly symmetrical concepts, there is an important sense in which accessing and storing values are not symmetrical: for an Input Iterator operator* must return a unique type, but, for an Output Iterator, in the expression *x = t, there is no reason why operator= must take a unique type. Consequently, there need not be any unique "value type" for Output Iterators.
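The usual way to respect that rule is to assign through the iterator once and then increment it before the next write. A minimal sketch, using std::ostream_iterator as the stream-like output iterator mentioned above (the function name write_three is hypothetical):

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Writes three values through an output iterator using the
// "assign once, then increment" pattern.
inline std::string write_three() {
    std::ostringstream out;
    std::ostream_iterator<int> it(out, " ");  // each write is followed by a space
    *it = 1; ++it;   // one assignment per position
    *it = 2; ++it;   // advance before writing again
    *it++ = 3;       // the usual compact form of the same two steps
    return out.str();
}
```

Each assignment sends a value to the stream; there is no stored element to go back to and overwrite, which is why the concept only promises a single assignment per position.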

About the underlying storage of std::basic_string

After reading through the description about std::basic_string on cppreference, I'm uncertain about the following two questions regarding the underlying storage of std::basic_string:
1) Since C++11, does the contiguity of std::basic_string extend to the terminating null character? Note that str[str.size()] returns a reference to a terminating null character. But I want to make sure that this is the one stored right after str[str.size() - 1].
2) Since C++11, data() and c_str() become equivalent. But does it hold that data() == c_str() == &front()?
Any quotation from the standard would be appreciated.
21.4.7.1 basic_string accessors
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
This effectively requires that the terminating NUL be stored contiguously together with the character sequence (it forces an additional requirement on operator[] that s[s.size()] not do anything fancy, though the plain text of 21.4.5 appears to give it some latitude).
It also explicitly requires that s.c_str() == &s[0], which in turn means that s.c_str() == &s.front() (front() is defined as operator[](0)).
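These relationships can be checked directly. The following is a small sketch rather than a proof; the function name storage_is_consistent is hypothetical, and it assumes a C++11 standard library.

```cpp
#include <cassert>
#include <string>

// Checks the pointer relationships discussed above for a given string.
inline bool storage_is_consistent(const std::string& s) {
    if (s.empty()) return true;              // front() requires a non-empty string
    const char* p = s.c_str();
    return p == s.data()                     // data() and c_str() agree
        && p == &s[0]                        // p + 0 == &operator[](0)
        && p == &s.front()                   // front() is operator[](0)
        && p + s.size() == &s[s.size()]      // terminator sits right after the last char
        && p[s.size()] == '\0';              // and it is the null character
}
```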

How are null-terminated strings terminated in C++11?

Maybe it's stupid or obvious but I couldn't google any answer. What character ends a null-terminated string in C++11? NULL (which is in fact 0) or new nullptr? On the one hand, nullptr is supposed to replace NULL. On the other, though, I'm not sure if nullptr is a character at all. Or can be interpreted as one.
NULL and nullptr have little to do with null-terminated strings. Both NULL and nullptr are used to denote a pointer which points to nothing, i.e. null.
The null-termination of C-style strings is still, and has always been, denoted by a CharT having the integral value 0; or, as it's most often written, by the char literal '\0'.
Remember that character types are nothing more than integral types with some special meaning.
Comparing a char to an int (which is the type of the literal 0) is allowed, and so is assigning the value 0 to a char. As stated, a character type is an integral type, and integral types hold integral values.
Why this confusion?
Back in the days before nullptr, we had the macro NULL to denote that a certain pointer didn't have anything to point to. The value of NULL is, and was, implementation-specific but the behaviour was well-defined; it shall not compare equal to any pointer value that is actually pointing to something.
As a result of how the behaviour of NULL was described, plenty of compilers used #define NULL 0, or a similar construct, resulting in a "feature" where one could easily compare NULL to a value of any integral type (including char) to see whether that value is zero.
With that in mind, you would often stumble upon code such as the below, where the for-condition is equivalent to *ptr != 0.
char const * str = "hello world";
for (char const * ptr = str; *ptr != NULL; ++ptr) {
    ...
}
Lesson learned: Just because something works doesn't mean that it is correct...
NULL and nullptr are completely separate concepts from the "null terminator". They have nothing more in common than the word "null". The null terminator is a character with value 0. It has nothing to do with null pointers.
You can use 0 or '\0' etc.
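For comparison, a minimal sketch of the same kind of loop written against the null terminator itself rather than NULL (the function name c_string_length is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Counts characters up to, but not including, the null terminator.
inline std::size_t c_string_length(const char* str) {
    std::size_t n = 0;
    while (str[n] != '\0') {  // '\0' is a char with integral value 0
        ++n;
    }
    return n;
}
```

No pointer constant is involved anywhere: the loop compares characters against a character.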

C++: shared_ptr as unordered_set's key

Consider the following code
#include <boost/unordered_set.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
int main()
{
boost::unordered_set<int> s;
s.insert(5);
s.insert(5);
// s.size() == 1
boost::unordered_set<boost::shared_ptr<int> > s2;
s2.insert(boost::make_shared<int>(5));
s2.insert(boost::make_shared<int>(5));
// s2.size() == 2
}
The question is: how come the size of s2 is 2 instead of 1? I'm pretty sure it must have something to do with the hash function. I tried looking at the boost docs and playing around with the hash function without luck.
Ideas?
make_shared allocates a new int, and wraps a shared_ptr around it. This means that your two shared_ptr<int>s point to different memory, and since you're creating a hash table keyed on pointer value, they are distinct keys.
For the same reason, this will result in a size of 2:
boost::unordered_set<int *> s3;
s3.insert(new int(5));
s3.insert(new int(5));
assert(s3.size() == 2);
For the most part you can consider shared_ptrs to act just like pointers, including for comparisons, except for the auto-destruction.
You could define your own hash function and comparison predicate, and pass them as template parameters to unordered_set, though:
struct your_equality_predicate
    : std::binary_function<boost::shared_ptr<int>, boost::shared_ptr<int>, bool>
{
    bool operator()(boost::shared_ptr<int> i1, boost::shared_ptr<int> i2) const {
        return *i1 == *i2;
    }
};
struct your_hash_function
    : std::unary_function<boost::shared_ptr<int>, std::size_t>
{
    std::size_t operator()(boost::shared_ptr<int> x) const {
        return *x; // BAD hash function, replace with something better!
    }
};
boost::unordered_set<boost::shared_ptr<int>, your_hash_function, your_equality_predicate> s4;
However, this is probably a bad idea for a few reasons:
You have the confusing situation where x != y but s4 treats x and y as the same key.
If someone ever changes the value pointed to by a hash key, your hash will break! That is:
boost::shared_ptr<int> tmp(new int(42));
s4.insert(tmp);
*tmp = 24; // UNDEFINED BEHAVIOR
Typically with hash functions you want the key to be immutable; it will always compare the same, no matter what happens later. If you're using pointers, you usually want the pointer identity to be what is matched on, as in extra_info_hash[&some_object] = ...; this will normally always map to the same hash value whatever some_object's members may be. With keys that stay mutable after insertion, it is all too easy to actually modify them, resulting in undefined behavior in the hash.
Notice that in Boost <= 1.46.0, the default hash_value of a boost::shared_ptr is its boolean value, true or false.
For any shared_ptr that is not NULL, hash_value evaluates to 1 (one), as the (bool)shared_ptr == true.
In other words, you downgrade a hash set to a linked list if you are using Boost <= 1.46.0.
This is fixed in Boost 1.47.0, see https://svn.boost.org/trac/boost/ticket/5216 .
If you are using std::shared_ptr, please define your own hash function, or use boost/functional/hash/extensions.hpp from Boost >= 1.51.0
As you found out, the two objects inserted into s2 are distinct.
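To see both behaviours side by side, here is a minimal sketch using the standard library counterparts (std::unordered_set, std::shared_ptr) instead of Boost, which is an assumption of this example. The functor and function names are hypothetical. The caveat about not modifying a pointed-to value after insertion applies to the value-based set here as well.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>

// Hypothetical functors: hash and compare by pointed-to value, not by address.
struct DerefHash {
    std::size_t operator()(const std::shared_ptr<int>& p) const {
        return p ? std::hash<int>()(*p) : 0;
    }
};

struct DerefEqual {
    bool operator()(const std::shared_ptr<int>& a,
                    const std::shared_ptr<int>& b) const {
        if (!a || !b) return a == b;  // null pointers only equal each other
        return *a == *b;
    }
};

// Value-based keys: the second 5 is treated as a duplicate.
inline std::size_t insert_two_fives_by_value() {
    std::unordered_set<std::shared_ptr<int>, DerefHash, DerefEqual> s;
    s.insert(std::make_shared<int>(5));
    s.insert(std::make_shared<int>(5));
    return s.size();
}

// Default behaviour: keys are compared by pointer identity.
inline std::size_t insert_two_fives_by_identity() {
    std::unordered_set<std::shared_ptr<int>> s;
    s.insert(std::make_shared<int>(5));
    s.insert(std::make_shared<int>(5));
    return s.size();
}
```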
