Easy way to get NUMBERFMT populated with defaults? - winapi

I'm using the Windows API GetNumberFormatEx to format some numbers for display with the appropriate localization choices for the current user (e.g., to make sure they have the right separators in the right places). This is trivial when you want exactly the user default.
But in some cases I have to override the number of digits after the radix separator. That requires providing a NUMBERFMT structure. What I'd like to do is call an API that returns a NUMBERFMT populated with the appropriate defaults for the user, and then override just the fields I need to change. But there doesn't seem to be an API to get the defaults.
Currently, I'm calling GetLocaleInfoEx over and over and then translating that data into the form NUMBERFMT requires.
NUMBERFMT fmt = {0};
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
                  LOCALE_IDIGITS | LOCALE_RETURN_NUMBER,
                  reinterpret_cast<LPWSTR>(&fmt.NumDigits),
                  sizeof(fmt.NumDigits)/sizeof(WCHAR));
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
                  LOCALE_ILZERO | LOCALE_RETURN_NUMBER,
                  reinterpret_cast<LPWSTR>(&fmt.LeadingZero),
                  sizeof(fmt.LeadingZero)/sizeof(WCHAR));
WCHAR szGrouping[32] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SGROUPING, szGrouping,
                  ARRAYSIZE(szGrouping));
if (::lstrcmp(szGrouping, L"3;0") == 0 ||
    ::lstrcmp(szGrouping, L"3") == 0
) {
    fmt.Grouping = 3;
} else if (::lstrcmp(szGrouping, L"3;2;0") == 0) {
    fmt.Grouping = 32;
} else {
    assert(false); // unexpected grouping string
}
WCHAR szDecimal[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SDECIMAL, szDecimal,
                  ARRAYSIZE(szDecimal));
fmt.lpDecimalSep = szDecimal;
WCHAR szThousand[16] = L"";
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_STHOUSAND, szThousand,
                  ARRAYSIZE(szThousand));
fmt.lpThousandSep = szThousand;
::GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
                  LOCALE_INEGNUMBER | LOCALE_RETURN_NUMBER,
                  reinterpret_cast<LPWSTR>(&fmt.NegativeOrder),
                  sizeof(fmt.NegativeOrder)/sizeof(WCHAR));
Isn't there an API that already does this?

I just wrote some code to do this last week. Alas, there does not seem to be a GetDefaultNumberFormat(LCID lcid, NUMBERFMT* fmt) function; you will have to write it yourself, as you've already started. On a side note, the grouping string has a well-defined format that can be parsed easily; your current code is wrong for "3" (which should map to 30) and will obviously fail on more exotic groupings (though that is probably not much of a concern in practice).
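For reference, the parsing alluded to above can be sketched in a few lines. GroupingStringToUint is a hypothetical helper name of my own; it packs a LOCALE_SGROUPING string into the value NUMBERFMT.Grouping expects. The rule (as documented for NUMBERFMT) is that a trailing ";0" means the last group repeats, which is the packed default, while a string without it means the grouping applies only once, signalled by appending a 0 digit:

```cpp
// Hypothetical helper, not a Windows API: convert a LOCALE_SGROUPING string
// such as L"3;2;0" into the packed UINT that NUMBERFMT.Grouping expects.
unsigned GroupingStringToUint(const wchar_t* s) {
    unsigned result = 0;
    unsigned group = 0;
    bool sawGroup = false;
    for (;; ++s) {
        if (*s >= L'0' && *s <= L'9') {
            group = group * 10 + static_cast<unsigned>(*s - L'0');
            sawGroup = true;
        } else if (*s == L';' || *s == L'\0') {
            if (sawGroup) {
                if (*s == L'\0' && group == 0)
                    return result;      // trailing ";0": last group repeats
                result = result * 10 + group;
            }
            if (*s == L'\0')
                return result * 10;     // no trailing ";0": apply only once
            group = 0;
            sawGroup = false;
        } else {
            return 0;                   // unexpected character: no grouping
        }
    }
}
```

So L"3;0" maps to 3, L"3;2;0" to 32, and L"3" to 30, covering the cases handled (and mishandled) in the question's code.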

If all you want to do is cut off the fractional digits from the end of the string, you can go with one of the default formats (like LOCALE_NAME_USER_DEFAULT), then check for the presence of the fractional separator (comma in continental languages, point in English) in the resulting character string, and then chop off the fractional part by replacing the separator with a null character:
#define cut_off_decimals(sz, cch) \
    if (cch >= 5 && (sz[cch-4] == _T('.') || sz[cch-4] == _T(','))) \
        sz[cch-4] = _T('\0');
(Hungarian alert: sz is the C string, cch is the character count, including the terminating null character. And _T is the Windows generic-text macro for either char or wchar_t depending on whether UNICODE is defined, only needed for compatibility with Windows 9x/ME.)
Note that this will produce incorrect results for the very odd case of a user-defined format where the third-to-last character is a dot or a comma that has some special meaning to the user other than fractional separator. I have never seen such a number format in my whole life, and hence I conclude that this is good and safe enough.
And of course this won't do anything if the third-to-last character is neither a dot nor a comma.

Related

What is %P in Tcl?

I saw this code en Tcl:
entry .amount -validate key -validatecommand {
    expr {[string is int %P] || [string length %P]==0}
}
I know that it's an entry validation, but what does "%P" mean in that code? I looked in the Tcl docs but couldn't find anything.
I think this is another way to do it but it has the same symbols:
proc check_the_input_only_allows_digits_only {P} {
    expr {[string is int P] || [string length P] == 0}
}
entry .amount \
    -validate key \
    -validatecommand {check_the_input_only_allows_digits_only %P}
The tcl-tk page for entry says
%P
The value of the entry if the edit is allowed. If you are configuring the entry widget to have a new textvariable, this will be the value of that textvariable.
https://www.tcl.tk/man/tcl8.4/TkCmd/entry.html#M25
I think this is another way to do it but it has the same symbols:
You're close. You just have to use $ in a few places, because inside the procedure you're accessing its parameters the normal way:
proc check_the_input_only_allows_digits_only {P} {
    expr {[string is int $P] || [string length $P] == 0}
}
entry .amount \
    -validate key \
    -validatecommand {check_the_input_only_allows_digits_only %P}
It's recommended that you write things like that using a procedure for anything other than the most trivial of validations (or other callbacks); putting the complexity directly in the callback gets confusing quickly.
I recommend keeping validation loose during the input phase and only validating strictly on form submission (or pressing the OK/Apply button, or whatever makes sense in the GUI), precisely because in many forms it's convenient for invalid states to exist for a while as the input is being entered. Per-key validation should therefore probably only indicate whether form submission is expected to succeed, not outright prevent transient states from existing.
The string is int command returns true for zero-length input precisely because it was originally added to work with that validation mechanism. It grinds my gears that real validation of an integer needs string is int -strict. Can't change it now, though; it's just a wrong default…
entry .amount -validate key -validatecommand {string is int %P}

strcmp() return different values for same string comparisons [duplicate]

This question already has an answer here:
Why does strcmp() in a template function return a different value?
(1 answer)
Closed 2 years ago.
char s1[] = "0";
char s2[] = "9";
printf("%d\n", strcmp(s1, s2)); // Prints -9
printf("%d\n", strcmp("0", "9")); // Prints -1
Why does strcmp return different values when it receives the same parameters?
Those values are still legal, since strcmp's man page says the return value can be less than, greater than, or equal to 0, but I don't understand why they differ in this example.
I assume you are using GCC when compiling this; I tried it on 4.8.4. The trick here is that GCC understands the semantics of certain standard library functions (strcmp being one of them). In your case, the compiler completely eliminates the second strcmp call, because it knows that the result of strcmp given the string constants "0" and "9" will be negative, and a standard-compatible value (-1) is used instead of making the call. It cannot do the same with the first call, because s1 and s2 might have been changed in memory (imagine an interrupt, or multiple threads, etc.).
You can do an experiment to validate this. Add the const qualifier to the arrays to let GCC know that they cannot be changed:
const char s1[] = "0";
const char s2[] = "9";
printf("%d\n", strcmp(s1, s2)); // Now this will print -1 as well
printf("%d\n", strcmp("0", "9")); // Prints -1
You can also look at the assembler output form the compiler (use the -S flag).
The best way to check, however, is to use -fno-builtin, which disables this optimization. With this option, your original code will print -9 in both cases.
The difference is due to the implementation of strcmp. As long as it conforms to the (<0, 0, >0) contract, it shouldn't matter to the developer. You cannot rely on anything else. For all you know, the implementation could determine that the result should be negative and then randomly generate a negative number just to throw you off.
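To make the -9 concrete: a typical byte-difference implementation returns the difference of the first mismatching characters, and '0' (0x30) minus '9' (0x39) is -9. This is only a sketch of one plausible implementation (my_strcmp is a made-up name), not necessarily the one your library uses:

```cpp
// Sketch of a classic byte-difference strcmp: compare until a mismatch or
// terminator, then return the difference of the mismatching characters.
int my_strcmp(const char* a, const char* b) {
    while (*a && *a == *b) { ++a; ++b; }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}
```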

How are null-terminated strings terminated in C++11?

Maybe it's stupid or obvious but I couldn't google any answer. What character ends a null-terminated string in C++11? NULL (which is in fact 0) or new nullptr? On the one hand, nullptr is supposed to replace NULL. On the other, though, I'm not sure if nullptr is a character at all. Or can be interpreted as one.
NULL and nullptr have little to do with null-terminated strings. Both NULL and nullptr are used to denote a pointer which points to nothing, i.e., null.
The null termination of C-style strings is still (and has always been) denoted by a CharT with the integral value 0, or, as it's most often written, by the char literal '\0'.
Remember that character types are nothing more than integral types with some special meaning.
Comparing a char to an int (which is the type of the literal 0) is allowed, and so is assigning the value 0 to a char; as stated, a character type is an integral type, and integral types hold integral values.
Why this confusion?
Back in the days when we didn't have nullptr, we instead had the macro NULL to denote that a certain pointer didn't have anything to point to. The value of NULL is, and was, implementation-specific, but the behaviour was well-defined: it shall not compare equal to any pointer value that actually points to something.
As a result of how the behaviour of NULL was described, plenty of compilers used #define NULL 0, or a similar construct, resulting in a "feature" where one could easily compare NULL to any integral type (including char) to test its relation to the value zero.
With that in mind, you would often stumble upon code such as the below, where the for-condition is equivalent to *ptr != 0.
char const * str = "hello world";
for (char const * ptr = str; *ptr != NULL; ++ptr) {
    ...
}
Lesson learned: Just because something works doesn't mean that it is correct...
NULL and nullptr are completely separate concepts from the "null terminator". They have nothing more in common than the word "null". The null terminator is a character with value 0. It has nothing to do with null pointers.
You can use 0 or '\0' etc.
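A minimal sketch of what "null-terminated" means in practice (my_strlen is a hypothetical name): the scan stops at the element whose integral value is 0, and no pointer, NULL or nullptr, is involved at all:

```cpp
#include <cstddef>

// Walk a C string up to its terminating '\0'. This is a character
// comparison against the value 0, not a null-pointer test.
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0')  // equivalent to: s[n] != 0
        ++n;
    return n;
}
```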

Determine if one string is a prefix of another

I have written down a simple function that determines if str1 is a prefix of str2. It's a very simple function, that looks like this (in JS):
function isPrefix(str1, str2) // determine if str1 is a prefix of a candidate string
{
    if (str2.length < str1.length) // candidate string can't be shorter than prefix string
        return false;
    var i = 0;
    while (i < str1.length && str1.charAt(i) == str2.charAt(i))
        i++;
    if (i < str1.length) // mismatch before the end => str1 is not a prefix of str2
        return false;
    return true;
}
As you can see, it loops through the entire length of the prefix string to determine whether it is a prefix of the candidate string. This means its complexity is O(N), which isn't bad, but it becomes a problem when I have a huge data set to loop through to find which strings start with the prefix string. That makes the overall complexity O(M*N), where M is the total number of strings in the data set. Not good.
I explored the Internet a bit and determined that the best answer would be a Patricia/radix trie, where strings are stored as shared prefixes. Even then, when I attempt to insert or look up a string, there will be considerable string-matching overhead if I use the aforementioned prefix-gauging function.
Say I had a prefix string 'rom' and a set of candidate words
var dataset =["random","rapid","romance","romania","rome","rose"];
that would look like this in a radix trie:
r
/ \
a o
/ \ / \
ndom pid se m
/ \
an e
/ \
ia ce
This means, for every node, I will be using the prefix match function, to determine which node has a value that matches the prefix string at the index. Somehow, this solution still seems arduous and does not sit too well with me. Is there something better or anyway I can improve the core prefix matching function ?
Looks like you've got two different problems.
One is to determine if a string is contained as a prefix in another string. For this I would suggest using a function already implemented in the language's string library. In JavaScript you could do this
if (str2.indexOf(str1) === 0) {
    // string str1 is a prefix of str2
}
See documentation for String.indexOf here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf
For the other problem, in a bunch of strings, find out which ones have a given string as a prefix, building a data structure like a Trie or the one you mention seems like the way to go, if you want fast look-ups.
Check out this thread on Stack Overflow: How to check if a string "StartsWith" another string?. Mark Byers' solution seems to be very efficient. Also, Java has the built-in String methods endsWith and startsWith: http://docs.oracle.com/javase/tutorial/java/data/comparestrings.html

Visual Studio C++ 2008 Manipulating Bytes?

I'm trying to write strictly binary data to files (no encoding). The problem is, when I hex dump the files, I'm noticing rather weird behavior. Using either one of the below methods to construct a file results in the same behavior. I even used the System::Text::Encoding::Default to test as well for the streams.
StreamWriter^ binWriter = gcnew StreamWriter(gcnew FileStream("test.bin",FileMode::Create));
(Also used this method)
FileStream^ tempBin = gcnew FileStream("test.bin",FileMode::Create);
BinaryWriter^ binWriter = gcnew BinaryWriter(tempBin);
binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);
.
.
binWriter->Write(0x9F);
Writing that sequence of bytes, I noticed the only bytes that weren't converted to 0x3F in the hex dump were 0x81,0x8D,0x90,0x9D, ... and I have no idea why.
I also tried making character arrays, and a similar situation happens. i.e.,
array<wchar_t,1>^ OT_Random_Delta_Limits = {0x00,0x00,0x03,0x79,0x00,0x00,0x04,0x88};
binWriter->Write(OT_Random_Delta_Limits);
0x88 would be written as 0x3F.
If you want to stick to binary files then don't use StreamWriter. Just use a FileStream and Write/WriteByte. StreamWriters (and TextWriters in general) are expressly designed for text. Whether you want an encoding or not, one will be applied, because when you call StreamWriter.Write, you're writing a char, not a byte.
Don't create arrays of wchar_t values either - again, those are for characters, i.e. text.
BinaryWriter.Write should have worked for you unless it was promoting the values to char in which case you'd have exactly the same problem.
By the way, without specifying any encoding, I'd expect you to see not 0x3F values but the bytes of the UTF-8 encodings of those characters.
When you specified Encoding.Default, you'd have seen 0x3F for any Unicode values not in that encoding.
Anyway, the basic lesson is to stick to Stream when you want to deal with binary data rather than text.
EDIT: Okay, it would be something like:
public static void ConvertHex(TextReader input, Stream output)
{
    while (true)
    {
        int firstNybble = input.Read();
        if (firstNybble == -1)
        {
            return;
        }
        int secondNybble = input.Read();
        if (secondNybble == -1)
        {
            throw new IOException("Reader finished half way through a byte");
        }
        int value = (ParseNybble(firstNybble) << 4) + ParseNybble(secondNybble);
        output.WriteByte((byte) value);
    }
}

// value would actually be a char, but as we've got an int in the above code,
// it just makes things a bit easier
private static int ParseNybble(int value)
{
    if (value >= '0' && value <= '9') return value - '0';
    if (value >= 'A' && value <= 'F') return value - 'A' + 10;
    if (value >= 'a' && value <= 'f') return value - 'a' + 10;
    throw new ArgumentException("Invalid nybble: " + (char) value);
}
This is very inefficient in terms of buffering etc, but should get you started.
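For completeness, the same lesson in native C++ terms: open the file in binary mode and write bytes directly, so no encoding layer ever touches the data and 0x80..0x9F arrive on disk untouched. WriteRawBytes and the file name in the test are my own choices for illustration:

```cpp
#include <fstream>

// Write each byte value verbatim; std::ios::binary suppresses newline
// translation, and no character encoding is involved at all.
void WriteRawBytes(const char* path) {
    std::ofstream out(path, std::ios::binary);
    for (int b = 0x80; b <= 0x9F; ++b)
        out.put(static_cast<char>(b));
}
```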
A BinaryWriter() class initialized with a stream will use a default encoding of UTF8 for any chars or strings that are written. I'm guessing that the
binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);
calls are binding to the Write(char) overload, so they're going through the character encoder. I'm not very familiar with C++/CLI, but it seems to me that these calls should bind to Write(Int32), which shouldn't have this problem (maybe your code is really calling Write() with a char variable set to the values in your example; that would account for this behavior).
0x3F is commonly known as the ASCII character '?'; the characters that are mapping to it are control characters with no printable representation. As Jon points out, use a binary stream rather than a text-oriented output mechanism for raw binary data.
EDIT -- actually your results look like the inverse of what I would expect. In the default code page 1252, the non-printable characters (i.e., the ones likely to map to '?') in that range are 0x81, 0x8D, 0x8F, 0x90 and 0x9D.