CryptStringToBinary bug? Or do I not understand something?

It seems the value returned by CryptStringToBinary() in pdwSkip parameter is wrong.
The documentation says:
pdwSkip - a pointer to a DWORD value that receives the number of
characters skipped to reach the beginning of the actual base64 or
hexadecimal strings.
char buf[100] = { 0 };
DWORD bufSize = sizeof(buf);
DWORD skip = 0, flags = 0;
BOOL rv = CryptStringToBinary("\r\n\t c3Nzc3Nzcw==\r\n\t ", 0, CRYPT_STRING_BASE64,
                              (BYTE*)buf, &bufSize, &skip, &flags);
if (rv) {
    printf("skip=%u\n", skip);
}
The code prints:
skip=0
I expected it to be 4, because "the actual base64 string" is "c3Nzc3Nzcw==" and there are 4 characters before it. I tested this on Windows 8.1 with the latest updates.

You are passing CRYPT_STRING_BASE64 as the third parameter - that tells the function there are no headers.
If you pass CRYPT_STRING_BASE64HEADER instead, the function will interpret your string as PEM-encoded data. PEM looks like this:
------ BEGIN STUFF --------------
AAAAAAAAABBBBBBBBBBBBBBBBBCCCCCCD
GGGGGGGGGGGaaaaaaaaaaaaasss666666
------ END STUFF---------
I'm not sure exactly what heuristic the function uses to detect the header (probably a sequence of dashes, followed by some ASCII, followed by more dashes, then an EOL), but "\r\n\t " is definitely not a reasonable header in a PEM-encoded crypto object - those are just whitespace characters that can legitimately appear inside Base64 data. The docs make a reference to "certificate beginning and ending headers" - that's a very specific thing, the PEM header/footer lines.
I'm not sure whether the function is designed to quietly skip whitespace between Base64 characters; the docs are silent on that. That said, quietly skipping whitespace is pretty much a requirement for any PEM-friendly Base64 decoder: PEM includes whitespace (the newlines) by design. But that whitespace definitely doesn't count as a header - for one thing, whitespace in PEM occurs in the middle of the data.

Once you add a header before the beginning of the actual Base64 data, you will receive the number of skipped characters.
Try this:
#include <Windows.h>
#include <wincrypt.h>
#include <stdio.h>
#pragma comment(lib, "Crypt32.lib")

int main()
{
    LPCSTR szPemPublicKey =
        "\r\n\t "
        "-----BEGIN CERTIFICATE -----"
        "MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAqRLhZXK29Xo5YdSoMdAe"
        "MHwDYAmThPSJzbQaBhVLCY1DTQr0JRkvd+0xfdwih97bWUXVpxuOgYH9hofIzZGP"
        "-----END CERTIFICATE -----";
    BYTE derPrivateKey[2048];
    DWORD bufSize = sizeof(derPrivateKey);
    DWORD skip = 0, flags = 0;
    BOOL rv = CryptStringToBinary(szPemPublicKey, 0, CRYPT_STRING_BASE64HEADER,
                                  derPrivateKey, &bufSize, &skip, &flags);
    if (rv) {
        printf("skip=%u\n", skip);
    }
}

Related

Compatibility of printf with utf-8 encoded strings

I'm trying to format some UTF-8 encoded strings in C code (char *) using the printf function. I need to specify a length in the format. Everything goes well when there are no multi-byte characters in the parameter string, but the result seems to be incorrect when there are multi-byte chars in the data.
My glibc is kind of old (2.17), so I tried with some online compilers, and the result is the same.
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    setlocale( LC_CTYPE, "en_US.UTF-8" );
    setlocale( LC_COLLATE, "en_US.UTF-8" );
    printf( "'%-4.4s'\n", "elephant" );
    printf( "'%-4.4s'\n", "éléphant" );
    printf( "'%-20.20s'\n", "éléphant" );
    return 0;
}
The result of execution is:
'elep'
'él�'
'éléphant '
The first line is correct (4 chars in the output).
The second line is obviously wrong (at least from a human point of view).
The last line is also wrong: only 18 Unicode chars are written instead of 20.
It seems that printf counts chars before UTF-8 decoding (counting bytes instead of Unicode chars).
Is that a bug in glibc or a well-documented limitation of printf?
It's true that printf counts bytes, not multibyte characters. If it's a bug, the bug is in the C standard, not in glibc (the standard library implementation usually used in conjunction with gcc).
In fairness, counting characters wouldn't help you align unicode output either, because unicode characters are not all the same display width even with fixed-width fonts. (Many codepoints are width 0, for example.)
I'm not going to attempt to argue that this behaviour is "well-documented". Standard C's locale facilities have never been particularly adequate to the task, imho, and they have never been particularly well documented, in part because the underlying model attempts to encompass so many possible encodings without ever grounding itself in a concrete example that it is almost impossible to explain. (...Long rant deleted...)
You can use the wchar.h formatted output functions, which count in wide characters. (Which still isn't going to give you correct output alignment, but it will count precision the way you expect.)
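For instance, a minimal sketch of that approach (assuming an en_US.UTF-8 locale is installed; the strings here are illustrative, not the asker's exact data):
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    /* On a wide-oriented stream, width and precision count wide characters, not bytes. */
    wprintf(L"'%-4.4ls'\n", L"elephant");   /* 'elep' */
    wprintf(L"'%-4.4ls'\n", L"éléphant");   /* 'élép' - no broken character */
    return 0;
}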
Let me quote rici: It's true that printf counts bytes, not multibyte characters. If it's a bug, the bug is in the C standard, not in glibc (the standard library implementation usually used in conjunction with gcc).
However, don't conflate wchar_t and UTF-8. See wikipedia to grasp the sense of the former. UTF-8, instead, can be dealt with almost as if it were good old ASCII. Just avoid truncating in the middle of a character.
In order to get alignment, you want to count characters, then pass the byte count to printf. That can be achieved by using the * precision and passing the count of bytes. For example, since each accented e takes two bytes:
printf("'%-4.*s'\n", 6, "éléphant");
A function to count bytes is easily coded based on the format of UTF-8 characters:
static int count_bytes(char const *utf8_string, int length)
{
    char const *s = utf8_string;
    for (;;)
    {
        int ch = *(unsigned char *)s++;
        if ((ch & 0xc0) == 0xc0)   // first byte of a multi-byte UTF-8 sequence
            while (((ch = *(unsigned char *)s) & 0xc0) == 0x80)   // skip continuation bytes
                ++s;
        if (ch == 0)
            break;
        if (--length <= 0)
            break;
    }
    return s - utf8_string;
}
At this point however, one would end up with lines like so:
printf("'%-4.*s'\n", count_bytes("éléphant", 4), "éléphant");
Having to repeat the string twice quickly becomes a maintenance nightmare. At a minimum, one can define a macro to make sure the string is the same. Assuming the above function is saved in some utf8-util.h file, your program could be rewritten as follows:
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include "utf8-util.h"

#define INT_STR_PAIR(i, s) count_bytes(s, i), s

int main(void)
{
    setlocale( LC_CTYPE, "en_US.UTF-8" );
    setlocale( LC_COLLATE, "en_US.UTF-8" );
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "elephant"));
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "éléphant"));
    printf( "'%-4.*s'\n", INT_STR_PAIR(4, "é𐅫éphant"));
    printf( "'%-20.*s'\n", INT_STR_PAIR(20, "éléphant"));
    return 0;
}
The last but one test uses 𐅫, the Greek acrophonic thespian three hundred (U+1016B) character. Given how the counting works, testing with consecutive non-ASCII characters makes sense. The ancient Greek character looks "wide" enough to see how much space it takes using a fixed-width font. The output may look like:
'elep'
'élép'
'é𐅫ép'
'éléphant '
(On my terminal, those 4-char strings are of equal length.)

Read a utf8 file to a std::string without BOM

I am trying to read UTF-8 content into a char*. My file does not have any BOM, so the code is straightforward. (The file contains Unicode punctuation:)
char* fileData = "\u2010\u2020";
I cannot see how a single unsigned char (range 0-255) can contain a character with a value in the range 0-65535, so I must be missing something.
...
std::ifstream fs8("../test_utf8.txt");
if (fs8.is_open())
{
    unsigned line_count = 0;
    std::string line;
    while ( getline(fs8, line))
    {
        std::cout << ++line_count << '\t' << line << '\n';
    }
}
...
So how can I read a utf8 file into a char*, (or even a std::string)
Well, you ARE reading the file correctly into the std::string, and std::string does support UTF-8; the problem is probably your console*, which cannot show non-ASCII characters.
Basically, when a character's code point is bigger than what a single char can hold, you simply represent that character with multiple chars.
How, and with how many chars? That is exactly what an encoding defines.
UTF-32, for example, represents each character, ASCII and non-ASCII alike, as 4 bytes - hence the "32" (each byte is 8 bits, 4*8 = 32).
Without any additional information on what OS you are using, we can't give advice on how your program can show the file's lines.
*Or, more exactly, the standard output, which will probably be implemented as console text.
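On Windows, for example (an assumption, since the asker's OS isn't stated), one fix is to switch the console output code page to UTF-8 before printing; a minimal sketch:
#include <windows.h>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Tell the console to interpret the bytes we write as UTF-8.
    SetConsoleOutputCP(CP_UTF8);

    std::ifstream fs8("../test_utf8.txt");  // path from the question
    std::string line;
    while (std::getline(fs8, line))
        std::cout << line << '\n';          // raw UTF-8 bytes pass through unchanged
}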

using MultiByteToWideChar

The following code prints the desired output, but it prints garbage at the end of the string. There is something wrong with the last call to MultiByteToWideChar, but I can't figure out what. Please help?
#include "stdafx.h"
#include <Windows.h>
#include <iostream>
#include <tchar.h>
using namespace std;

int main( int, char *[] )
{
    TCHAR szPath[MAX_PATH];
    if(!GetModuleFileName(NULL, szPath, MAX_PATH))
    {
        cout << "Unable to get module path";
        exit(0);
    }
    char ansiStr[MAX_PATH];
    if(!WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, szPath, -1,
                            ansiStr, MAX_PATH, NULL, NULL))
    {
        cout << "Unicode to ANSI failed\n";
        cout << GetLastError();
        exit(1);
    }
    string s(ansiStr);
    size_t pos = 0;
    while(1)
    {
        pos = s.find('\\', pos);
        if(pos == string::npos)
            break;
        s.insert(pos, 1, '\\');
        pos += 2;
    }
    if(!MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, s.c_str(), s.size(), szPath, MAX_PATH))
    {
        cout << "ANSI to Unicode failed";
        exit(2);
    }
    wprintf(L"%s", szPath);
}
MSDN has this to say about the cbMultiByte parameter:
If this parameter is -1, the function processes the entire input
string, including the terminating null character. Therefore, the
resulting Unicode string has a terminating null character, and the
length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes
exactly the specified number of bytes. If the provided size does not
include a terminating null character, the resulting Unicode string is
not null-terminated, and the returned length does not include this
character.
...so if you want the output string to be 0-terminated, you should include the 0 terminator in the length you pass in, OR 0-terminate it yourself based on the return value.
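For example, a minimal sketch of both options against the question's variables (s and szPath):
// Option 1: let the function convert the terminator too.
MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, s.c_str(), -1, szPath, MAX_PATH);

// Option 2: pass the exact byte count, then terminate manually from the return value.
int n = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, s.c_str(), (int)s.size(),
                            szPath, MAX_PATH - 1);
szPath[n] = L'\0';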

WM_GETTEXT usage

I'm trying to get the status of a text field in my application, but I can't get it to work. I'm using SendMessage to send WM_GETTEXT and save the content to a char *.
I write the char * to a file, but I only get "D" back. This is what I have now:
LRESULT result;
char * output = (char*)malloc(1024);
result = SendMessage(hwnd,WM_GETTEXT,1024,(LPARAM)output);
ofstream file("test.txt");
file << *output;
file.close();
delete [] output;
Pointer concepts:
file << *output; will print only the first character of the string, because dereferencing the char pointer yields a single char.
file << output; will print the entire string.
C# code (note: SendMessageGetText is the answerer's own P/Invoke alias for SendMessage; a declaration along these lines is assumed):
[DllImport("user32.dll", EntryPoint = "SendMessage", CharSet = CharSet.Auto)]
static extern IntPtr SendMessageGetText(IntPtr hWnd, uint msg, UIntPtr wParam, StringBuilder sb);

public const uint WM_GETTEXT = 0xD;
const int bufferSize = 10000;
StringBuilder sb = new StringBuilder(bufferSize);
SendMessageGetText(handle, WM_GETTEXT, new UIntPtr(bufferSize), sb);
Console.WriteLine(sb.ToString());
Works properly for me!
Sophia's answer is correct. However, the default now for a Visual Studio project is to create a Unicode project. You will only get the first letter if your project is Unicode and not MBCS.
Have you examined the buffer returned from WM_GETTEXT to verify it has the entire string?
If not, try declaring your output variable as TCHAR* (to be generic) or as a wchar_t* and see what results you get in the buffer.
p.s. It is bad form to allocate memory with malloc and release it with delete. You should use either malloc/free pairs or new/delete pairs. An even safer way to get a char buffer is to use std::string (or std::wstring for a wide string).
p.p.s. Try making sure your project settings are for a Multibyte project and not a Unicode project. Then everything in Sophia's answer will work.
One more thing... just use the GetWindowText() API instead of the SendMessage stuff. That's why it exists: so you don't have to go through the rigmarole of casting a pointer to an LPARAM or WPARAM. It's more typesafe and will give you a compile-time error (better than a runtime error) if your types don't match up - especially with Unicode/MBCS and wchar_t/char.
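A minimal sketch of that suggestion (GetForegroundWindow() is a stand-in here, since the question doesn't show where hwnd comes from):
#include <windows.h>
#include <string>
#include <cstdio>

int main()
{
    HWND hwnd = GetForegroundWindow();          // stand-in for the question's hwnd

    std::wstring text(1024, L'\0');
    int len = GetWindowTextW(hwnd, &text[0], (int)text.size());
    text.resize(len);                           // trim to what was actually copied

    wprintf(L"%ls\n", text.c_str());            // whole string, not just the first char
    return 0;
}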

Stack around the variable 'xyz' was corrupted

I'm trying to get a simple piece of code I found on a website to work in VC++ 2010 on Windows Vista 64-bit:
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[])
{
    DWORD dResult;
    BOOL result;
    char oldWallPaper[MAX_PATH];
    result = SystemParametersInfo(SPI_GETDESKWALLPAPER, sizeof(oldWallPaper)-1, oldWallPaper, 0);
    fprintf(stderr, "Current desktop background is %s\n", oldWallPaper);
    return 0;
}
it does compile, but when I run it, I always get this error:
Run-Time Check Failure #2 - Stack around the variable 'oldWallPaper' was corrupted.
I'm not sure what is going wrong, but I noticed that the value of oldWallPaper looks something like "C\0:\0\0U\0s\0e\0r\0s[...]" - I'm wondering where all the \0s come from.
A friend of mine compiled it on Windows XP 32-bit (also VC++ 2010) and can run it without problems.
Any clues/hints/opinions?
Thanks
The doc isn't very clear. The returned string is made up of WCHARs, two bytes per character, not one, so you need to allocate twice as much space, otherwise you get a buffer overrun. Try:
BOOL result;
WCHAR oldWallPaper[MAX_PATH + 1];
result = SystemParametersInfo(SPI_GETDESKWALLPAPER,
                              MAX_PATH, oldWallPaper, 0);
See also:
http://msdn.microsoft.com/en-us/library/ms724947(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms235631(VS.80).aspx (string conversion)
Every Windows function has 2 versions:
SystemParametersInfoA() // ANSI
SystemParametersInfoW() // Unicode
The version ending in W is the wide-character (i.e. Unicode) version of the function. All the \0's you are seeing are because every character you're getting back is UTF-16: two bytes per character, and for ASCII text the second byte happens to be 0. So you need to store the result in a wchar_t array and use wprintf instead of printf:
wchar_t oldWallPaper[MAX_PATH];
result = SystemParametersInfo(SPI_GETDESKWALLPAPER, MAX_PATH-1, oldWallPaper, 0);
wprintf( L"Current desktop background is %ls\n", oldWallPaper );
So you can use the A version, SystemParametersInfoA(), if you are hell-bent on not using Unicode. For the record, though, you should always try to use Unicode.
Usually SystemParametersInfo() is a macro that evaluates to the W version when UNICODE is defined for your project.
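For completeness, a minimal sketch of the explicit ANSI route, so the buffer type and the function always match regardless of the project's UNICODE setting:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    char oldWallPaper[MAX_PATH];
    // The explicit A version takes a char buffer; the size is in characters.
    if (SystemParametersInfoA(SPI_GETDESKWALLPAPER, MAX_PATH, oldWallPaper, 0))
        printf("Current desktop background is %s\n", oldWallPaper);
    return 0;
}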
