About the underlying storage of std::basic_string - c++11

After reading through the description about std::basic_string on cppreference, I'm uncertain about the following two questions regarding the underlying storage of std::basic_string:
1) Since C++11, does the contiguity of std::basic_string extend to the terminating null character? Note that str[str.size()] returns a reference to a terminating null character, but I want to make sure that this is the one right after str[str.size() - 1].
2) Since C++11, data() and c_str() become equivalent. But does it hold that data() == c_str() == &front()?
Any quotation from the standard would be appreciated.

21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
This effectively requires that the terminating NUL be stored contiguously together with the character sequence. It also places an additional requirement on operator[]: &s[s.size()] must equal s.data() + s.size(), so s[s.size()] cannot return a reference to some separate null character, even though the plain text of 21.4.5 appears to give it that latitude.
It also explicitly requires that s.c_str() == &s[0], which in turn means that s.c_str() == &s.front() (front() is defined as operator[](0)).

Related

Eigen cast with auto return type - Less efficient than explicit return type?

When casting a vector of integers (i.e. Eigen::VectorXi) to a vector of doubles, and then operating on that vector of doubles, the generated assembly is dramatically different if the return type of the cast is auto.
In other words, using:
Eigen::VectorXi int_vec(3);
int_vec << 1, 2, 3;
Eigen::VectorXd dbl_vec = int_vec.cast<double>();
Compared to:
Eigen::VectorXi int_vec(3);
int_vec << 1, 2, 3;
auto dbl_vec = int_vec.cast<double>();
Here are two examples on godbolt:
VectorXd return type: https://godbolt.org/z/0FLC4r
auto return type: https://godbolt.org/z/MGxCaL
What are the ramifications of using auto for the return here? I thought it would be more efficient by avoiding a copy, but now I'm not sure.
Indeed, the code in your question avoids a copy (until dbl_vec is used, it's essentially a no-op). However, in the code on godbolt, you traverse the original int_vec and evaluate dbl_vec at least twice, possibly three times:
max + std::log((dbl_vec.array() - max)
^^^             ^^^^^^^           ^^^
I'm not sure if the two calls to max are collapsed into a temporary or not. I'd hope so.
In any case, kmdreko is right and you should avoid using auto with Eigen unless you know exactly what you're doing. In this case, the auto is an expression template that does not get evaluated until used. If you use it more than once, then it gets evaluated more than once. If the evaluation is expensive, then the savings from not using a copy are lost (with interest) to the additional evaluation times.

Why is std::move() not stealing an int value?

std::move() steals the string's value but not the int's. Please help me.
#include <iostream>
#include <string>
#include <utility>

int main()
{
    int i = 50;
    std::string str = "Mahesh";
    int j = std::move(i);
    std::string name = std::move(str);
    std::cout << "i: " << i << " J: " << j << std::endl;
    std::cout << "str: " << str << " name: " << name << std::endl;
    return 0;
}
Output
i: 50 J: 50
str: name: Mahesh
std::move is a cast to an rvalue reference. This can change overload resolution, particularly with regard to constructors.
int is a fundamental type; it doesn't have any constructors. The definition for int initialisation does not care about whether the expression is const, volatile, lvalue or rvalue. Thus the behaviour is a copy.
One reason this is the case is that there is no benefit to a (destructive) move. Another reason is that there is no such thing as an "empty" int, in the sense that there are "empty" std::strings and "empty" std::unique_ptrs.
std::move() itself doesn't actually do any moving. It is simply used to indicate that an object may be moved from. The actual moving must be implemented for the respective types by a move constructor/move assignment operator.
std::move(x) returns an unnamed rvalue reference to x. rvalue references are really just like normal references. Their only purpose is simply to carry along the information about the "rvalue-ness" of the thing they refer to. When you then use the result of std::move() to initialize/assign to another object, overload resolution will pick a move constructor/move assignment operator if one exists. And that's it. That is literally all that std::move() does. However, the implementation of a move constructor/move assignment operator knows that the only way it could have been called is when the value passed to it is about to expire (otherwise, the copy constructor/copy assignment operator would have been called instead). It, thus, can safely "steal" the value rather than make a copy, whatever that may mean in the context of the particular type.
There is no general answer to the question what exactly it means to "steal" a value from an object. Whoever defines a type has to define whether it makes sense to move objects of this type and what exactly it means to do so (by declaring/defining the respective member functions). Built-in types don't have any special behavior defined for moving their values. So in the case of an int you just get what you get when you initialize an int with a reference to another int, which is a copy…

Why can't std::is_permutation act between two different types of data?

Suppose I have a vector of integers and a vector of strings, and I want to compare whether they have equivalent elements, without consideration of order. Ultimately, I'm asking if the integer vector is a permutation of the string vector (or vice versa). I'd like to be able to just call is_permutation, specify a binary predicate that allows me to compare the two, and move on with my life, e.g.:
bool checkIntStringComparison( const std::vector<int>& intVec,
const std::vector<std::string>& stringVec,
const std::map<int, std::string>& intStringMap){
return std::is_permutation<std::vector<int>::const_iterator, std::vector<std::string>::const_iterator>(
intVec.cbegin(), intVec.cend(), stringVec.cbegin(), [&intStringMap](const int& i, const std::string& string){
return string == intStringMap.at(i);
});
}
But trying to compile this (in gcc) returns an error message that boils down to:
no match for call to '(stuff::<lambda(const int&, const string&)>) (const std::__cxx11::basic_string<char>&, const int&)'
See how the call signature is swapped relative to the lambda's? If I swap the lambda's parameters, the call signature swaps the other way too.
Digging around about this error, it seems that the standard specifies for std::is_permutation that ForwardIterator1 and ForwardIterator2 must have the same value type. So I understand the compiler error in that regard. But why should it be this way? If I provide a binary predicate that allows me to compare the two (or if we had previously defined some equality operator between the two), isn't the real core of the algorithm just searching through container 1 to make sure all its elements are in container 2 uniquely?
The problem is that an element can occur more than once. That means that the predicate needs to be able to not only compare the elements of the first range to the elements of the second range, but to compare the elements of the first range to themselves:
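template <class Range1, class Range2, class Pred>
bool is_permutation_sketch(const Range1& range1, const Range2& range2, Pred pred) {
    // hypothetical name; needs <algorithm> and <iterator>; pred must accept (T1, T1) and (T1, T2)
    if (range1.size() != range2.size())
        return false;
    for (auto const& x1 : range1)
        if (std::count_if(std::begin(range1), std::end(range1), [&](auto const& y1) { return pred(x1, y1); }) !=
            std::count_if(std::begin(range2), std::end(range2), [&](auto const& y2) { return pred(x1, y2); }))
            return false;
    return true;
}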
Since it's relatively tricky to create a function object that takes two distinct signatures, and passing two predicates would be confusing, the easiest option was to specify that both ranges must have the same value type.
Your options are:
Wrap one range (or both) in a transform that gives the same value type (e.g. use Boost.Adaptors.Transformed);
Write your own implementation of std::is_permutation (e.g. copying the example implementation on cppreference);
Actually, note that the gcc (i.e. libstdc++) implementation does not enforce that the value types are the same; it just requires several signatures which you'd have to provide anyway, so write a polymorphic predicate as e.g. a function object or a polymorphic lambda, or with parameter types convertible from both range value types (e.g. in your case boost::variant<int, string> - ugly, but probably not that bad). This is non-portable, as another implementation might choose to enforce that requirement.

How are null-terminated strings terminated in C++11?

Maybe it's stupid or obvious, but I couldn't google any answer. What character ends a null-terminated string in C++11? NULL (which is in fact 0) or the new nullptr? On the one hand, nullptr is supposed to replace NULL. On the other, though, I'm not sure if nullptr is a character at all, or whether it can be interpreted as one.
NULL and nullptr have little to do with null-terminated strings. Both NULL and nullptr are used to denote a pointer which points to nothing, i.e. null.
The null-termination of C-style strings is still (and always has been) denoted by a CharT having the integral value 0, or, as it is most often written, by the character literal '\0'.
Remember that character types are nothing more than integral types with some special meaning.
Comparing a char to an int (which is the type of the literal 0) is allowed, and so is assigning the value 0 to a char: as stated, a character type is an integral type, and integral types hold integral values.
Why this confusion?
Back in the days when we didn't have nullptr, we instead had the macro NULL to denote that a certain pointer didn't have anything to point to. The value of NULL is, and was, implementation-defined, but the behaviour was well-defined: it shall not compare equal to any pointer value that actually points to something.
As a result of how the behaviour of NULL was described, plenty of compilers used #define NULL 0 or a similar construct, resulting in a "feature" where one could easily compare NULL against any integral type (including char) to check whether it was equal to zero.
With this in mind, you would often stumble upon code such as the below, where the for-condition is equivalent to *ptr != 0.
char const * str = "hello world";
for (char const * ptr = str; *ptr != NULL; ++ptr) {
...
}
Lesson learned: Just because something works doesn't mean that it is correct...
NULL and nullptr are completely separate concepts from the "null terminator". They have nothing more in common than the word "null". The null terminator is a character with value 0. It has nothing to do with null pointers.
You can use 0 or '\0' etc.

Is it safe to write to a std::string's buffer directly?

If I have the following code:
std::string hello = "hello world";
char* internalBuffer = &hello[0];
Is it then safe to write to internalBuffer up to hello.length()? Or is this UB/implementation-defined? Obviously I can write tests and see that this works, but that doesn't answer my question.
Yes, it's safe. No, it's not explicitly allowed by the standard.
According to my copy of the standard draft from like half a year ago, they do assure that data() points at a contiguous array, and that that array be the same as what you receive from operator[]:
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
From this one can conclude that operator[] returns a reference to some place within that contiguous array. The standard also allows the reference returned from (non-const) operator[] to be modified.
Having a non-const reference to one member of an array, I dare to say that we can modify the entire array.
The relevant section in the standard is §21.4.5:
const_reference operator[](size_type pos) const noexcept;
reference operator[](size_type pos) noexcept;
[...]
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
If I understand this correctly, it means that as long as the index given to operator[] is smaller than the string's size, one is allowed to modify the value. If, however, the index is equal to size() and we thus obtain the \0 terminating the string, we must not write to this value.
Cppreference uses a slightly different wording here:
If pos == size(), a reference to the character with value CharT() (the null character) is returned.
For the first (non-const) version, the behavior is undefined if this character is modified.
I read this such that 'this character' here only refers to the default constructed CharT, and not to the reference returned in the other case. But I admit that the wording is a bit confusing here.
In practice it is safe; theoretically, no.
The C++03 standard doesn't require string to be implemented as a contiguous character array the way it requires it for vector. I'm not aware of any implementation of string where it is not safe, but in theory there is no guarantee.
http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/
