How are null-terminated strings terminated in C++11? - c++11

Maybe it's stupid or obvious but I couldn't google any answer. What character ends a null-terminated string in C++11? NULL (which is in fact 0) or new nullptr? On the one hand, nullptr is supposed to replace NULL. On the other, though, I'm not sure if nullptr is a character at all. Or can be interpreted as one.

NULL and nullptr have little to do with null-terminated strings. Both NULL and nullptr are used to denote a pointer which points to nothing, i.e. a null pointer.
The null-termination of C-style strings is still, and has always been, denoted by a CharT having the integral value 0; or, as it's most often written, through the char literal '\0'.
Remember that character types are nothing more than integral types with some special meaning.
Comparing a char to an int (which is the type of the literal 0) is allowed, and it's also allowed to assign the value 0 to a char. As stated, a character type is an integral type, and integral types hold integral values.
Why this confusion?
Back in the days before nullptr, we had the macro NULL to denote that a certain pointer didn't have anything to point to. The value of NULL is, and was, implementation-defined, but the behaviour is well-defined: it shall not compare equal to any pointer value that is actually pointing to something.
As a result of how the behaviour of NULL was described, plenty of implementations used #define NULL 0, or a similar construct, resulting in a "feature" where one could compare NULL to a value of any integral type (including char) to check whether that value is zero.
With that in mind, you'd often stumble upon code such as the below, where the for-condition is equivalent to *ptr != 0.
char const * str = "hello world";
for (char const * ptr = str; *ptr != NULL; ++ptr) {
...
}
Lesson learned: Just because something works doesn't mean that it is correct...
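For comparison, here is a minimal sketch of the same kind of loop written against the character '\0' rather than NULL. The function name count_chars is made up for illustration; it is not from the question or the standard library.

```cpp
#include <cassert>
#include <cstddef>

// Counts the characters before the null terminator of a C-style string.
// The terminator is a char with the value 0, written '\0'. It is unrelated
// to the null pointer constants NULL and nullptr.
std::size_t count_chars(char const* str) {
    assert(str != nullptr);  // nullptr used for what it is meant for: a pointer check
    std::size_t n = 0;
    for (char const* ptr = str; *ptr != '\0'; ++ptr) {
        ++n;
    }
    return n;
}
```

The pointer itself is checked against nullptr, while the characters it points to are checked against '\0', which keeps the two concepts visibly apart.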

NULL and nullptr are completely separate concepts from the "null terminator". They have nothing more in common than the word "null". The null terminator is a character with value 0. It has nothing to do with null pointers.
You can use 0 or '\0' etc.

Related

Why std::move() is not stealing an int value?

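std::move() steals the value of a string but not of an int. Why? Please help me.
#include <iostream>
#include <string>
#include <utility>

using std::string;

int main()
{
    int i = 50;
    string str = "Mahesh";
    int j = std::move(i);            // int has no move constructor: this is a copy
    string name = std::move(str);    // selects string's move constructor
    std::cout << "i: " << i << " J: " << j << std::endl;
    std::cout << "str: " << str << " name: " << name << std::endl;
    return 0;
}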
Output
i: 50 J: 50
str: name: Mahesh
std::move is a cast to an rvalue reference. This can change overload resolution, particularly with regard to constructors.
int is a fundamental type, it doesn't have any constructors. The definition for int initialisation does not care about whether the expression is const, volatile, lvalue or rvalue. Thus the behaviour is a copy.
One reason this is the case is that there is no benefit to a (destructive) move. Another reason is that there is no such thing as an "empty" int, in the sense that there are "empty" std::strings and "empty" std::unique_ptrs.
std::move() itself doesn't actually do any moving. It is simply used to indicate that an object may be moved from. The actual moving must be implemented for the respective types by a move constructor/move assignment operator.
std::move(x) returns an unnamed rvalue reference to x. rvalue references are really just like normal references. Their only purpose is simply to carry along the information about the "rvalue-ness" of the thing they refer to. When you then use the result of std::move() to initialize/assign to another object, overload resolution will pick a move constructor/move assignment operator if one exists. And that's it. That is literally all that std::move() does. However, the implementation of a move constructor/move assignment operator knows that the only way it could have been called is when the value passed to it is about to expire (otherwise, the copy constructor/copy assignment operator would have been called instead). It, thus, can safely "steal" the value rather than make a copy, whatever that may mean in the context of the particular type.
There is no general answer to the question what exactly it means to "steal" a value from an object. Whoever defines a type has to define whether it makes sense to move objects of this type and what exactly it means to do so (by declaring/defining the respective member functions). Built-in types don't have any special behavior defined for moving their values. So in the case of an int you just get what you get when you initialize an int with a reference to another int, which is a copy…
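To make the point about user-defined move operations concrete, here is a small sketch. Buffer is a hypothetical type invented for this example; its move constructor is what does the "stealing", while std::move only selects it. The int case is shown next to it for contrast.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical type that owns a heap-allocated buffer.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new char[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }  // delete[] on nullptr is a no-op

    Buffer(Buffer const&) = delete;
    Buffer& operator=(Buffer const&) = delete;

    // The move constructor defines what "stealing" means for this type:
    // take over the pointer and leave the source empty.
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
};

// Returns the size of the source object after it has been moved from.
std::size_t size_after_move() {
    Buffer a(16);
    Buffer b(std::move(a));  // overload resolution picks Buffer(Buffer&&)
    assert(b.size() == 16);
    return a.size();
}

// Returns the value of the source int after std::move was applied to it.
int int_after_move() {
    int i = 50;
    int j = std::move(i);  // no move constructor exists for int: plain copy
    assert(j == 50);
    return i;
}
```

Under these assumptions size_after_move() yields 0 because Buffer's own move constructor reset the source, while int_after_move() yields 50 because nothing ever modifies i.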

Retrieving keys with maximum values in a Hashmap in java 8 [duplicate]

As of Java 1.5, you can pretty much interchange Integer with int in many situations.
However, I found a potential defect in my code that surprised me a bit.
The following code:
Integer cdiCt = ...;
Integer cdsCt = ...;
...
if (cdiCt != null && cdsCt != null && cdiCt != cdsCt)
mismatch = true;
appeared to be incorrectly setting mismatch when the values were equal, although I can't determine under what circumstances. I set a breakpoint in Eclipse and saw that the Integer values were both 137, and I inspected the boolean expression and it said it was false, but when I stepped over it, it was setting mismatch to true.
Changing the conditional to:
if (cdiCt != null && cdsCt != null && !cdiCt.equals(cdsCt))
fixed the problem.
Can anyone shed some light on why this happened? So far, I have only seen the behavior on my localhost on my own PC. In this particular case, the code successfully made it past about 20 comparisons, but failed on 2. The problem was consistently reproducible.
If it is a prevalent problem, it should be causing errors on our other environments (dev and test), but so far, no one has reported the problem after hundreds of tests executing this code snippet.
Is it still not legitimate to use == to compare two Integer values?
In addition to all the fine answers below, the following stackoverflow link has quite a bit of additional information. It actually would have answered my original question, but because I didn't mention autoboxing in my question, it didn't show up in the selected suggestions:
Why can't the compiler/JVM just make autoboxing “just work”?
Autoboxing caches Integer objects for small values. Hence the comparison with == only appears to work for numbers between -128 and 127.
Refer: #Immutable_Objects_.2F_Wrapper_Class_Caching
You can't compare two Integers with a simple ==. They're objects, so most of the time the references won't be the same.
There is one catch: for Integers between -128 and 127 the references will be the same, because autoboxing uses Integer.valueOf(), which caches small integers.
If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.
Resources :
JLS - Boxing
On the same topic :
autoboxing vs manual boxing java
For objects, "==" always compares the object references (memory locations), not the values. The equals method compares the values, although internally it uses the "==" operator on the unboxed primitive values to do so.
Integer uses a cache to store the values from -128 to +127. If the == operator is used on two autoboxed Integers holding the same value in that range, it returns true, because both refer to the same cached object. For equal values outside that range it will usually return false.
An Integer variable holds a reference, so when you compare two of them with == you're checking whether they point to the same object, not whether they hold the same value. Hence the issue you're seeing. The reason it works with plain int types is that the value contained in the Integer is unboxed before the comparison.
May I add that if you're doing what you're doing, why have the if statement to begin with?
mismatch = ( cdiCt != null && cdsCt != null && !cdiCt.equals( cdsCt ) );
The issue is that your two Integer objects are just that, objects. They do not match because you are comparing your two object references, not the values within. Obviously .equals is overridden to provide a value comparison as opposed to an object reference comparison.
Besides the great answers already given, what I have learned is this:
NEVER compare objects with == unless you intend to compare them by their references.
Alternatively, to use == correctly, you can unbox one of the compared Integer values before doing the == comparison, like:
if ( firstInteger.intValue() == secondInteger ) {..
The second will be auto unboxed (of course you have to check for nulls first).

About the underlying storage of std::basic_string

After reading through the description about std::basic_string on cppreference, I'm uncertain about the following two questions regarding the underlying storage of std::basic_string:
1) Since C++11, does the contiguity of std::basic_string extend to the terminating null character? Note that str[str.size()] returns a reference to a terminating null character. But I want to make sure that this is the one stored right after str[str.size() - 1].
2) Since C++11, data() and c_str() become equivalent. But does it hold that data() == c_str() == &front()?
Any quotation from the standard would be appreciated.
21.4.7.1 basic_string accessors
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
This effectively requires that the terminating NUL be stored contiguously together with the character sequence (it forces an additional requirement on operator[] that s[s.size()] not do anything fancy, though the plain text of 21.4.5 appears to give it some latitude).
It also explicitly requires that s.c_str() == &s[0], which in turn means that s.c_str() == &s.front() (front() is defined as operator[](0)).
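The guarantees quoted above can be spot-checked on a given implementation with a sketch like the following. The function name is made up for illustration, and a passing check only demonstrates the behaviour for the strings tested; the guarantee itself comes from the quoted wording, not from this code.

```cpp
#include <cassert>
#include <string>

// Returns true if, for this string, c_str(), data(), operator[] and front()
// all refer to one contiguous array that includes the terminating null.
bool storage_matches_accessor_wording(std::string const& s) {
    char const* p = s.c_str();
    if (p != s.data()) {
        return false;
    }
    // p + i == &operator[](i) for each i in [0, size()]; note the closed range.
    for (std::string::size_type i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    // front() requires a non-empty string, so only check it in that case.
    if (!s.empty() && p != &s.front()) {
        return false;
    }
    return p[s.size()] == '\0';
}
```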

Is it safe to write to a std::string's buffer directly?

If I have the following code:
std::string hello = "hello world";
char* internalBuffer = &hello[0];
Is it then safe to write to internalBuffer up to hello.length()? Or is this UB/implementation-defined? Obviously I can write tests and see that this works, but that doesn't answer my question.
Yes, it's safe. No, it's not explicitly allowed by the standard.
According to my copy of the standard draft from like half a year ago, they do assure that data() points at a contiguous array, and that that array be the same as what you receive from operator[]:
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
From this one can conclude that operator[] returns a reference to some place within that contiguous array. They also allow the reference returned from (non-const) operator[] to be modified.
Having a non-const reference to one member of an array I dare to say that we can modify the entire array.
The relevant section in the standard is §21.4.5:
const_reference operator[](size_type pos) const noexcept;
reference operator[](size_type pos) noexcept;
[...]
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an
object of type T with value charT(); the referenced value shall not be modified.
If I understand this correctly, it means that as long as the index given to operator[] is smaller than the string's size, one is allowed to modify the value. If however, the index is equal to size and thus we obtain the \0 terminating the string, we must not write to this value.
Cppreference uses a slightly different wording here:
If pos == size(), a reference to the character with value CharT() (the null character) is returned.
For the first (non-const) version, the behavior is undefined if this character is modified.
I read this such that 'this character' here only refers to the default constructed CharT, and not to the reference returned in the other case. But I admit that the wording is a bit confusing here.
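Following that reading, a write through the pointer stays within the allowed range as long as it only touches indices below size(). A minimal sketch, assuming C++11 or later; fill_through_pointer is a hypothetical helper, not a standard function:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Overwrites every character of s in place through a raw pointer.
// Only indices [0, size()) are written; the terminator at s[size()]
// is deliberately left untouched.
void fill_through_pointer(std::string& s, char c) {
    if (s.empty()) {
        return;  // &s[0] would refer to the terminator, which must not be modified
    }
    char* buffer = &s[0];
    std::memset(buffer, c, s.size());
}
```

To write more characters than the string currently holds, it would need to be resized first, since the pointer only covers size() characters plus the terminator.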
In practice it is safe; in theory, before C++11, it was not.
Before C++11, the C++ standard didn't force string to be implemented as a contiguous character array like it does for vector. I'm not aware of any implementation of string where it is not safe, but under those older standards there is theoretically no guarantee.
http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/

Easy way to get NUMBERFMT populated with defaults?

I'm using the Windows API GetNumberFormatEx to format some numbers for display with the appropriate localization choices for the current user (e.g., to make sure they have the right separators in the right places). This is trivial when you want exactly the user default.
But in some cases I have to override the number of digits after the radix separator. That requires providing a NUMBERFMT structure. What I'd like to do is to call an API that returns the NUMBERFMT populated with the appropriate defaults for the user, and then override just the fields I need to change. But there doesn't seem to be an API to get the defaults.
Currently, I'm calling GetLocaleInfoEx over and over and then translating that data into the form NUMBERFMT requires.
NUMBERFMT fmt = {0};
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_IDIGITS | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.NumDigits),
sizeof(fmt.NumDigits)/sizeof(WCHAR));
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_ILZERO | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.LeadingZero),
sizeof(fmt.LeadingZero)/sizeof(WCHAR));
WCHAR szGrouping[32] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SGROUPING, szGrouping,
ARRAYSIZE(szGrouping));
if (::lstrcmp(szGrouping, L"3;0") == 0 ||
::lstrcmp(szGrouping, L"3") == 0
) {
fmt.Grouping = 3;
} else if (::lstrcmp(szGrouping, L"3;2;0") == 0) {
fmt.Grouping = 32;
} else {
assert(false); // unexpected grouping string
}
WCHAR szDecimal[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SDECIMAL, szDecimal,
ARRAYSIZE(szDecimal));
fmt.lpDecimalSep = szDecimal;
WCHAR szThousand[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_STHOUSAND, szThousand,
ARRAYSIZE(szThousand));
fmt.lpThousandSep = szThousand;
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
LOCALE_INEGNUMBER | LOCALE_RETURN_NUMBER,
reinterpret_cast<LPWSTR>(&fmt.NegativeOrder),
sizeof(fmt.NegativeOrder)/sizeof(WCHAR));
Isn't there an API that already does this?
I just wrote some code to do this last week. Alas, there does not seem to be a GetDefaultNumberFormat(LCID lcid, NUMBERFMT* fmt) function; you will have to write it yourself as you've already started. On a side note, the grouping string has a well-defined format that can be easily parsed; your current code is wrong for "3" (should be 30) and obviously will fail on more exotic groupings (though this is probably not much of a concern, really).
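As a rough sketch of what parsing the grouping string could look like, the helper below covers the three mappings mentioned here ("3;0" to 3, "3;2;0" to 32, "3" to 30). The name grouping_from_string is made up, the function is plain C++ with no Windows dependency, and it assumes every group size is a single digit.

```cpp
#include <cassert>
#include <string>

// Converts a LOCALE_SGROUPING-style string into the decimal encoding used
// for the Grouping field of NUMBERFMT.
// Assumptions of this sketch: groups are single digits separated by ';',
// and a zero only ever appears as the final field.
unsigned grouping_from_string(std::wstring const& s) {
    unsigned value = 0;
    bool ends_with_zero = false;
    for (wchar_t ch : s) {
        if (ch >= L'0' && ch <= L'9') {
            unsigned const digit = static_cast<unsigned>(ch - L'0');
            ends_with_zero = (digit == 0);
            if (digit != 0) {
                value = value * 10 + digit;  // append the group size as a decimal digit
            }
        }
        // Anything else (the ';' separators) is skipped.
    }
    // A trailing ";0" means "repeat the last group", encoded by the bare digits.
    // Without it, the encoding gets a trailing 0 appended.
    return ends_with_zero ? value : value * 10;
}
```

The result would then be assigned to fmt.Grouping in place of the chain of lstrcmp comparisons.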
If all you want to do is cut off the fractional digits from the end of the string, you can go with one of the default formats (like LOCALE_NAME_USER_DEFAULT), then check for the presence of the fractional separator (comma in continental languages, point in English) in the resulting character string, and then chop off the fractional part by replacing it with a null byte:
#define cut_off_decimals(sz, cch) \
do { \
if ((cch) >= 5 && ((sz)[(cch)-4] == _T('.') || (sz)[(cch)-4] == _T(','))) \
(sz)[(cch)-4] = _T('\0'); \
} while (0)
(Hungarian alert: sz is the C string, cch is the character count, including the terminating null character. And _T is the Windows generic-text macro for either char or wchar_t depending on whether UNICODE is defined, only needed for compatibility with Windows 9x/ME.)
Note that this will produce incorrect results for the very odd case of a user-defined format where the third-to-last character is a dot or a comma that has some special meaning to the user other than fractional separator. I have never seen such a number format in my whole life, and hence I conclude that this is good and safe enough.
And of course this won't do anything if the third-to-last character is neither a dot nor a comma.
