Is ZeroMemory the Windows equivalent of null terminating a buffer?

For example, by convention I zero out a buffer (set the whole buffer to zero) the following way, example 1:
char buffer[1024] = {0};
And with the windows.h header we can call ZeroMemory, example 2:
char buffer[1024];
ZeroMemory(buffer, sizeof(buffer));
According to the documentation provided by Microsoft, ZeroMemory "fills a block of memory with zeros." I want to be accurate in my Windows application, so I thought: what better place to ask than Stack Overflow.
Are these two examples equivalent in logic?

Yes, the two examples are equivalent in logic: the entire array is filled with zeros in both cases.
In the case of char buffer[1024] = {0};, you explicitly set only the first char element to 0, and the compiler then implicitly value-initializes the remaining 1023 char elements to 0 for you.
In C++11 and later, you can omit that first element value:
char buffer[1024] = {};
char buffer[1024]{};
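For what it's worth, in the Windows SDK headers ZeroMemory is normally just a macro over memset (via RtlZeroMemory), so a plain memset is a third equivalent spelling. A minimal sketch:
#include <windows.h>  // ZeroMemory
#include <string.h>   // memset

int main(void)
{
    char buffer[1024];
    // After either call the whole array is zero, matching = {0}:
    ZeroMemory(buffer, sizeof(buffer));  // typically expands to memset(dst, 0, len)
    memset(buffer, 0, sizeof(buffer));   // the plain C equivalent
    return 0;
}
One difference worth keeping in mind: = {0} zeroes the array only at the point of definition, while ZeroMemory/memset can re-zero it at any later point.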

Related

How can sscanf produce such strange results?

I have been fighting with this code for four days:
unsigned long baudrate = 0;
unsigned char databits = 0;
unsigned char stop_bits = 0;
char parity_text[10];
char flowctrl_text[4];
const char xformat[] = "%lu,%hhu,%hhu,%[^,],%[^,]\n";
const char xtext[] = "115200,8,1,EVEN,NFC\n";
int res = sscanf(xtext, xformat, &baudrate, &databits, &stop_bits, (char*) &parity_text, (char*) &flowctrl_text);
printf("Res: %d\r\n", res);
printf("baudrate: %lu, databits: %hhu, stop: %hhu, \r\n", baudrate, databits, stop_bits);
printf("parity: %s \r\n", parity_text);
printf("flowctrl: %s \r\n", flowctrl_text);
It returns:
Res: 5
baudrate: 115200, databits: 0, stop: 1,
parity:
flowctrl: NFC
Databits and parity are missing!
Actually, the memory under the parity variable holds '\0','V','E','N','\0', so it looks like the first character was somehow overwritten by the sscanf call.
The return value of sscanf is 5, which suggests that it was able to parse the input.
My configuration:
gcc-arm-none-eabi 7.2.1
Visual Studio Code 1.43.2
PlatformIO Core 4.3.1
PlatformIO Home 3.1.1
Lib ST-STM 6.0.0 (Mbed 5.14.1)
STM32F446RE (Nucleo-F446RE)
I have tried (without success):
compiling with mbed RTOS and without
variable types uint8_t, uint32_t
gcc-arm versions: 6.3.1, 8.3.1, 9.2.1
using another IDE (CLion+PlatformIO)
compiling on another computer (same config)
What actually helps:
making the variables static
compiling in Mbed online compiler
The behavior of sscanf is, as a whole, very unpredictable; mixing the order or data type of the variables sometimes helps, but most often it just produces different flaws in the output.
This took me longer than I care to admit. But like most issues it ended up being very simple.
char parity_text[10];
char flowctrl_text[4];
Needs to be changed to:
char parity_text[10] = {0};
char flowctrl_text[5] = {0};
The flowctrl_text array is not large enough at size four: the final %[^,] conversion also consumes the trailing '\n' (a newline is not a comma), so it stores "NFC\n" plus the terminating '\0', five bytes in all. That stray terminator is exactly what lands on parity_text[0] and turns "EVEN" into "\0VEN". If you bump the size to 5 you should have no problem. Just to be safe I would also initialize the arrays to 0.
Once I increased the size I had no issues with your existing code. Let me know if this helps.
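As an extra guard (my addition, not from the original answer), maximum field widths on the %[ conversions keep sscanf from writing past either array no matter what the input looks like. A minimal sketch:
#include <stdio.h>

int main(void)
{
    unsigned long baudrate = 0;
    unsigned char databits = 0, stop_bits = 0;
    char parity_text[10] = {0};
    char flowctrl_text[5] = {0};

    // %9[^,] and %4[^,] stop one byte short of each buffer size,
    // leaving room for the terminating '\0'.
    int res = sscanf("115200,8,1,EVEN,NFC\n",
                     "%lu,%hhu,%hhu,%9[^,],%4[^,]",
                     &baudrate, &databits, &stop_bits,
                     parity_text, flowctrl_text);
    printf("res=%d parity=%s flowctrl=%s", res, parity_text, flowctrl_text);
    return 0;
}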

Compatibility of printf with utf-8 encoded strings

I'm trying to format some UTF-8 encoded strings in C code (char *) using the printf function. I need to specify a length in the format. Everything goes well when there are no multi-byte characters in the parameter string, but the result seems to be incorrect when the data contains some multi-byte chars.
My glibc is kind of old (2.17), so I tried with some online compilers, and the result is the same.
#include <stdio.h>   /* was missing: needed for printf */
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    setlocale( LC_CTYPE, "en_US.UTF-8" );
    setlocale( LC_COLLATE, "en_US.UTF-8" );
    printf( "'%-4.4s'\n", "elephant" );
    printf( "'%-4.4s'\n", "éléphant" );
    printf( "'%-20.20s'\n", "éléphant" );
    return 0;
}
The result of execution is:
'elep'
'él�'
'éléphant '
The first line is correct (4 chars in the output).
The second line is obviously wrong (at least from a human point of view).
The last line is also wrong: only 18 Unicode chars are written instead of 20.
It seems that printf counts chars before UTF-8 decoding (counting bytes instead of Unicode chars).
Is that a bug in glibc or a well documented limitation of printf ?
It's true that printf counts bytes, not multibyte characters. If it's a bug, the bug is in the C standard, not in glibc (the standard library implementation usually used in conjunction with gcc).
In fairness, counting characters wouldn't help you align Unicode output either, because Unicode characters are not all the same display width even with fixed-width fonts. (Many codepoints have width 0, for example.)
I'm not going to attempt to argue that this behaviour is "well-documented". Standard C's locale facilities have never been particularly adequate to the task, imho, and they have never been particularly well documented, in part because the underlying model attempts to encompass so many possible encodings without ever grounding itself in a concrete example that it is almost impossible to explain. (...Long rant deleted...)
You can use the wchar.h formatted output functions, which count in wide characters. (That still isn't going to give you correct output alignment, but it will count the precision the way you expect.)
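A minimal sketch of that suggestion (my addition, assuming a UTF-8 locale): with the wide functions, the precision of %ls counts wide characters rather than bytes.
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    // Precision 4 now means "at most 4 wide characters", so no
    // multi-byte sequence can be cut in half on output.
    wprintf(L"'%-4.4ls'\n", L"éléphant");  // prints 'élép'
    return 0;
}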
Let me quote rici: It's true that printf counts bytes, not multibyte characters. If it's a bug, the bug is in the C standard, not in glibc (the standard library implementation usually used in conjunction with gcc).
However, don't conflate wchar_t and UTF-8. See Wikipedia to grasp the sense of the former. UTF-8, on the other hand, can be dealt with almost as if it were good old ASCII; just avoid truncating in the middle of a character.
In order to get alignment, you want to count characters, but then pass the byte count to printf. That can be achieved by using the * precision and passing the count of bytes. For example, since an accented e takes two bytes:
printf("'%-4.*s'\n", 6, "éléphant");
A function to count bytes is easily coded based on the format of UTF-8 characters:
static int count_bytes(char const *utf8_string, int length)
{
    char const *s = utf8_string;
    for (;;)
    {
        int ch = *(unsigned char *)s++;
        if ((ch & 0xc0) == 0xc0) // first byte of a multi-byte UTF-8 sequence
            while (((ch = *(unsigned char *)s) & 0xc0) == 0x80)
                ++s;
        if (ch == 0)
            break;
        if (--length <= 0)
            break;
    }
    return s - utf8_string;
}
At this point however, one would end up with lines like so:
printf("'-4.*s'\n", count_bytes("éléphant", 4), "éléphant");
Having to repeat the string twice quickly becomes a maintenance nightmare. At a minimum, one can define a macro to make sure the string is the same. Assuming the above function is saved in some utf8-util.h file, your program could be rewritten as follows:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include "utf8-util.h"
#define INT_STR_PAIR(i, s) count_bytes(s, i), s
int main(void)
{
    setlocale( LC_CTYPE, "en_US.UTF-8" );
    setlocale( LC_COLLATE, "en_US.UTF-8" );
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "elephant"));
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "éléphant"));
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "é𐅫éphant"));
    printf( "'%-20.*s'\n", INT_STR_PAIR(20, "éléphant"));
    return 0;
}
The last-but-one test uses 𐅫, the Greek acrophonic thespian three hundred character (U+1016B). Given how the counting works, testing with consecutive non-ASCII characters makes sense. The ancient Greek character looks "wide" enough to show how much space it takes in a fixed-width font. The output may look like:
'elep'
'élép'
'é𐅫ép'
'éléphant '
(On my terminal, those 4-char strings are of equal length.)
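For completeness (my addition, not part of the original answers): on POSIX systems, wcswidth() reports how many terminal columns a wide string occupies, which is what true alignment would actually require. A minimal sketch, assuming a UTF-8 locale:
#define _XOPEN_SOURCE 700  /* glibc: expose wcswidth */
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    const wchar_t *w = L"éléphant";
    // wcswidth counts display columns (0 for combining marks,
    // 2 for East Asian wide characters), not bytes or code points.
    printf("columns: %d\n", wcswidth(w, wcslen(w)));
    return 0;
}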

What would be most efficient way to compare QString and char*

What would be the most efficient way to compare a QString and a char*?
if( mystring == mycharstar ) {} will perform a malloc,
and
if(strcmp(mystring.toLocal8Bit().constData(), mycharstar) == 0) {}
will allocate a QByteArray.
I would like to avoid any allocation happening; what would you guys recommend?
What about
if(mystring == QLatin1String(mycharstar))
Would it be any better?
There is no "efficient" way that only uses casts. This is because QtString internally uses 16 bits to encode a single character while C strings use only 8 bits. That means any comparison based on memory pointers will simply almost always return false.
That's why you have to encode the 16 bit wide characters of QtString to the same encoding as your C string and that always needs at least one call to malloc().
See also: How to convert QString to std::string?
It could be if( mystring == QLatin1String(mycharstar) ), as suggested here.
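To make the no-allocation route concrete, here is a minimal sketch (my addition; it assumes the char* really holds Latin-1/ASCII data, since QLatin1String interprets the bytes as Latin-1):
#include <QString>

// QLatin1String is a thin view over the char*: no QString or QByteArray
// temporary is created, and QString::operator== compares against it
// character by character.
bool equalsNoAlloc(const QString &mystring, const char *mycharstar)
{
    return mystring == QLatin1String(mycharstar);
}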

changing the alignment requirement while casting

I get the warning "cast increases required alignment of target type" while compiling the following code for ARM.
char data[2] = "aa";
int *ptr = (int *)(data);
I understand that the alignment requirement for char is 1 byte and that of int is 4 bytes and hence the warning.
I tried to change the alignment of char by using the aligned attribute.
char data[2] __attribute__((aligned (4)));
memcpy(data, "aa", 2);
int *ptr = (int *)(data);
But the warning doesn't go away.
My questions are
Why doesn't the warning go away?
As ARM generates a hardware exception for misaligned accesses, I want to make sure that alignment issues don't occur. Is there another way to write this code so that the alignment issue won't arise?
By the way, when I print alignof(data), it prints 4, which means the alignment of data was indeed changed.
I'm using gcc version 4.4.1. Is it possible that gcc gives the warning even though the alignment was changed using the aligned attribute?
I don't quite understand why you would want to do this... but the problem is that the string literal "aa" isn't stored at an aligned address. The compiler likely optimized away the variable data entirely, and therefore only sees the code as int* ptr = (int*)"aa"; and then gives the misalignment warning. No amount of fiddling with the data variable will change how the literal "aa" is aligned.
To avoid the literal being allocated on a misaligned address, you would have to tweak around with how string literals are stored in the compiler settings, which is probably not a very good idea.
Also note that it doesn't make sense to have a pointer to non-constant data pointing at a string literal.
So your code is nonsense. If you still, for reasons unknown, insist on having an int pointer to a string literal, I'd use some kind of workaround, for example like this:
typedef union
{
    char arr[3];
    int dummy;
} data_t;

const data_t my_literal = { .arr="aa" };
const int* strange_pointer = (const int*)&my_literal;
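As an aside (my addition, not part of the original answer), the usual portable way to read potentially misaligned bytes as an int is memcpy, which lets the compiler pick an access pattern that is legal on ARM:
#include <string.h>

// Reads an int from a possibly misaligned address with no undefined
// behavior; compilers typically inline this into a safe load sequence.
static int read_int_unaligned(const char *p)
{
    int value;
    memcpy(&value, p, sizeof value);
    return value;
}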

Stack around the variable 'xyz' was corrupted

I'm trying to get a simple piece of code I found on a website to work in VC++ 2010 on Windows Vista 64:
#include "stdafx.h"
#include <windows.h>
int _tmain(int argc, _TCHAR* argv[])
{
    DWORD dResult;
    BOOL result;
    char oldWallPaper[MAX_PATH];
    result = SystemParametersInfo(SPI_GETDESKWALLPAPER, sizeof(oldWallPaper)-1, oldWallPaper, 0);
    fprintf(stderr, "Current desktop background is %s\n", oldWallPaper);
    return 0;
}
it does compile, but when I run it, I always get this error:
Run-Time Check Failure #2 - Stack around the variable 'oldWallPaper' was corrupted.
I'm not sure what is going wrong, but I noticed that the value of oldWallPaper looks something like "C\0:\0\0U\0s\0e\0r\0s[...]" -- I'm wondering where all the \0s come from.
A friend of mine compiled it on Windows XP 32 (also VC++ 2010) and is able to run it without problems.
Any clues/hints/opinions?
Thanks!
The doc isn't very clear. The returned string is an array of WCHARs, two bytes per character, not one, so you need to allocate twice as much space or you get a buffer overrun. Try:
BOOL result;
WCHAR oldWallPaper[MAX_PATH + 1];
result = SystemParametersInfo(SPI_GETDESKWALLPAPER,
                              MAX_PATH, oldWallPaper, 0);
See also:
http://msdn.microsoft.com/en-us/library/ms724947(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms235631(VS.80).aspx (string conversion)
Every Windows function that deals with strings has 2 versions:
SystemParametersInfoA() // ANSI
SystemParametersInfoW() // Unicode
The version ending in W is the wide-character (i.e. Unicode) version of the function. All the \0s you are seeing are because every character you're getting back is UTF-16 encoded: 2 bytes per character, and for ASCII text the second byte happens to be 0. So you need to store the result in a wchar_t array and use wprintf instead of printf:
wchar_t oldWallPaper[MAX_PATH];
result = SystemParametersInfo(SPI_GETDESKWALLPAPER, MAX_PATH-1, oldWallPaper, 0);
wprintf( L"Current desktop background is %s\n", oldWallPaper );
So you can use the A version, SystemParametersInfoA(), if you are hell-bent on not using Unicode. For the record, though, you should always prefer Unicode.
Usually SystemParametersInfo() is a macro that evaluates to the W version when UNICODE is defined for your build.
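Putting the two answers together, a minimal corrected sketch (my wording; for SPI_GETDESKWALLPAPER the uiParam argument is the pvParam buffer size in characters):
#include <windows.h>
#include <wchar.h>

int main(void)
{
    WCHAR oldWallPaper[MAX_PATH + 1] = {0};

    // uiParam = buffer size in characters, pvParam = the buffer itself
    BOOL ok = SystemParametersInfoW(SPI_GETDESKWALLPAPER,
                                    MAX_PATH, oldWallPaper, 0);
    if (ok)
        wprintf(L"Current desktop background is %ls\n", oldWallPaper);
    return 0;
}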
