Converting LPCWSTR with WideCharToMultiByte. Need help - winapi

I have a function like this:
BOOL WINAPI MyFunction(HDC hdc, LPCWSTR text, UINT cbCount){
char AnsiBuffer[255];
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 0, NULL, NULL);
if(written > -1) AnsiBuffer[written] = '\0';
if(written>0){
ofstream myfile;
myfile.open ("C:\\example.txt", ios::app);
myfile.write(AnsiBuffer, sizeof(AnsiBuffer));
myfile.write("\n", 1);
myfile.close();
}
....
When I display the input LPCWSTR text with MessageBoxW(), the text shows up fine. When I try to convert it to multibyte, the return value looks normal (e.g. 22, 45, etc.), but the result is a string of gibberish (e.g. ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ). Suggestions?

I see two problems:
1) You are passing '0' to WideCharToMultiByte for the size of the multibyte buffer. According to the documentation, this makes the function return the NUMBER of bytes needed without performing any conversion. (This lets you allocate a buffer of the correct size and then call the function again.)
2) In myfile.write, sizeof(AnsiBuffer) results in 255 bytes being written regardless of what is in the buffer. sizeof is a compile-time calculation that returns the size of a variable. You should replace it with the 'written' variable, which holds the length of the converted string.
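The second point can be illustrated with a small portable C++ sketch. The helper name converted_line is hypothetical and not part of the original code; it only shows that the number of bytes to emit should come from the conversion result, not from the size of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the line to append to the log file from the
// first `written` bytes of `buffer`, ignoring whatever follows them.
std::string converted_line(const char* buffer, std::size_t written) {
    assert(buffer != nullptr);
    std::string line(buffer, written);  // copies exactly `written` bytes
    line.push_back('\n');
    return line;
}
```

With a 255-byte array that holds only a 2-byte result, converted_line(buffer, 2) yields 3 bytes, whereas using sizeof(buffer) as the length would copy all 255 bytes, including the uninitialized tail.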

You need to pass the size of the buffer to the API instead of 0. When you pass 0, the function returns the required size of the buffer but doesn't write to it, so you're seeing the contents of the uninitialized array.
Here's the corrected call. It passes the buffer size minus one, so there is always room for the '\0' that the next line stores at AnsiBuffer[written]:
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer, sizeof(AnsiBuffer) - 1, NULL, NULL);

Related

C Program Strange Characters retrieved due to language setting on Windows

If the below code is compiled with UNICODE as compiler option, the GetComputerNameEx API returns junk characters.
Whereas if compiled without UNICODE option, the API returns truncated value of the hostname.
This issue is mostly seen with Asia-Pacific languages like Chinese, Japanese, Korean to name a few (i.e., non-English).
Can anyone shed some light on how this issue can be resolved?
# define INFO_SIZE 30
int main()
{
int ret;
TCHAR infoBuf[INFO_SIZE+1];
DWORD bufSize = (INFO_SIZE+1);
char *buf;
buf = (char *) malloc(INFO_SIZE+1);
if (!GetComputerNameEx((COMPUTER_NAME_FORMAT)1,
(LPTSTR)infoBuf, &bufSize))
{
printf("GetComputerNameEx failed (%d)\n", GetLastError());
return -1;
}
ret = wcstombs(buf, infoBuf, (INFO_SIZE+1));
buf[INFO_SIZE] = '\0';
return 0;
}
In the languages you mentioned, most characters are represented by more than one byte, because these languages use far more than 256 distinct characters. So you may need more than 30 bytes to encode 30 characters.
The usual pattern for calling a function like wcstombs goes like this: first get the number of bytes required, then allocate a buffer, then convert the string.
(Edit: passing NULL to get the required size actually relies on a POSIX extension, which is also implemented on Windows.)
size_t size = wcstombs(NULL, infoBuf, 0);
if (size == (size_t) -1) {
// some character can't be converted
}
char *buf = new char[size + 1];
size = wcstombs(buf, infoBuf, size + 1);
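The same two-pass pattern can be wrapped in a function that also handles the failure case and avoids a manual new[]. This is only a sketch under a few assumptions: the name narrow is hypothetical, the conversion uses whatever locale is currently active, and the NULL-destination size query is the POSIX extension mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts `wide` to the current locale's multibyte
// encoding. Returns false if some character cannot be represented.
bool narrow(const wchar_t* wide, std::string& out) {
    assert(wide != nullptr);
    // Pass 1: NULL destination, so only the required byte count is returned
    // (the terminating '\0' is not included in that count).
    const std::size_t needed = std::wcstombs(nullptr, wide, 0);
    if (needed == static_cast<std::size_t>(-1)) return false;
    out.assign(needed, '\0');
    if (needed == 0) return true;
    // Pass 2: convert into a buffer of exactly the reported size.
    // std::string keeps its own terminator, so none has to be written here.
    const std::size_t written = std::wcstombs(&out[0], wide, needed);
    if (written == static_cast<std::size_t>(-1)) return false;
    out.resize(written);
    return true;
}
```

Because the buffer is sized from the first call, the result is not truncated when a character needs more than one byte.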

Powerbuilder: ImportFile of UTF-8 (Converting UTF-8 to ANSI)

My Powerbuilder version is 6.5, cannot use a higher version as this is what I am supporting.
My problem is that when I do dw_1.ImportFile(file), the first column of the first row contains a funny string like this:
ï»¿
I didn't understand this until I opened the file, saved it as a new text file, and imported that new file instead, which worked flawlessly, without the funny string.
My conclusion is that this happens because the original file is UTF-8 (as shown in Notepad++) and the new file is ANSI. The file I am trying to import is delivered automatically by a third party, and my users don't want the extra work of re-saving it.
How do I force-convert these files to ANSI in PowerBuilder? If there is no way, I might have to do the conversion from the command prompt. Any ideas?
The weird ï»¿ characters are the (optional) UTF-8 BOM, which tells editors that the file is UTF-8 encoded (otherwise this can be hard to detect unless the file contains a character above code 127). You cannot simply strip it, because if your file contains any character above 127 (accents or other special characters), you will still get garbage in your displayed data (for example: é -> Ã©, € -> â‚¬, ...): each special character turns into 2 to 4 garbage characters.
I recently needed to convert some UTF-8 encoded strings to the "ANSI" Windows-1252 encoding. With PB 10 and later, re-encoding between UTF-8 and ANSI is as simple as
b = blob(s, encodingutf8!)
s2 = string(b, encodingansi!)
But string() and blob() do not support an encoding argument before release 10 of PB.
What you can do is to read the file yourself, skip the BOM, ask Windows to convert the string encoding via MultiByteToWideChar() + WideCharToMultiByte() and load the converted string in the DW with ImportString().
Proof of concept to get the file contents (with this reading method, the file cannot be bigger than 2GB):
string ls_path, ls_file, ls_chunk, ls_ansi
ls_path = sle_path.text
int li_file
if not fileexists(ls_path) then return
li_file = FileOpen(ls_path, streammode!)
if li_file > 0 then
FileSeek(li_file, 3, FromBeginning!) //skip the utf-8 BOM
//read the file by blocks, FileRead is limited to 32kB
do while FileRead(li_file, ls_chunk) > 0
ls_file += ls_chunk //concatenate in loop works but is not so performant
loop
FileClose(li_file)
ls_ansi = utf8_to_ansi(ls_file)
dw_tab.importstring( text!, ls_ansi)
end if
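The proof of concept above skips three bytes unconditionally. Since the BOM is optional, a slightly more defensive variant checks for the bytes EF BB BF first. The following is a sketch in C++ rather than PowerScript, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns 3 if `data` starts with the UTF-8 BOM
// (bytes EF BB BF), otherwise 0.
std::size_t utf8_bom_length(const std::string& data) {
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF) {
        return 3;
    }
    return 0;
}

// Hypothetical helper: returns a copy of `data` without a leading BOM.
// Data that has no BOM is returned unchanged.
std::string strip_utf8_bom(const std::string& data) {
    const std::size_t skip = utf8_bom_length(data);
    assert(skip <= data.size());
    return data.substr(skip);
}
```

This only removes the marker. Any characters above code 127 in the remaining text still need the UTF-8 to ANSI conversion.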
utf8_to_ansi() is a global function. It was written for PB9, but it should work the same with PB6.5:
global type utf8_to_ansi from function_object
end type
type prototypes
function ulong MultiByteToWideChar(ulong CodePage, ulong dwflags, ref string lpmultibytestr, ulong cchmultibyte, ref blob lpwidecharstr, ulong cchwidechar) library "kernel32.dll"
function ulong WideCharToMultiByte(ulong CodePage, ulong dwFlags, ref blob lpWideCharStr, ulong cchWideChar, ref string lpMultiByteStr, ulong cbMultiByte, ref string lpDefaultChar, ref boolean lpUsedDefaultChar) library "kernel32.dll"
end prototypes
forward prototypes
global function string utf8_to_ansi (string as_utf8)
end prototypes
global function string utf8_to_ansi (string as_utf8);
//convert utf-8 -> ansi
//use a wide-char native string as pivot
constant ulong CP_ACP = 0
constant ulong CP_UTF8 = 65001
string ls_wide, ls_ansi, ls_null
blob lbl_wide
ulong ul_len
boolean lb_flag
setnull(ls_null)
lb_flag = false
//get utf-8 string length converted as wide-char
setnull(lbl_wide)
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, 0)
//allocate buffer to let windows write into
ls_wide = space(ul_len * 2)
lbl_wide = blob(ls_wide)
//convert utf-8 -> wide char
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, ul_len)
//get the final ansi string length
setnull(ls_ansi)
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, 0, ls_null, lb_flag)
//allocate buffer to let windows write into
ls_ansi = space(ul_len)
//convert wide-char -> ansi
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, ul_len, ls_null, lb_flag)
return ls_ansi
end function

Windows RegQueryValueEx odd return results

I am getting odd results when using RegQueryValueEx and I cannot figure out why.
This is what I had set up before calling RegQueryValueEx:
DWORD dataSize;
TCHAR data[256];
The first time I call
LONG ret = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
ret is equal to 234 (ERROR_MORE_DATA)
But when I call the same thing on the next line
LONG ret2 = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
ret2 is equal to 0 (ERROR_SUCCESS)
Why would this function return ERROR_MORE_DATA the first time I call it, then return ERROR_SUCCESS for the same call on the very next line?
I tried changing the buffer to TCHAR data[1024], but I got exactly the same results. Any ideas?
Complete code:
for( int i=0; i<NUM_HISTORY; i++){
CString dataKey = getDataKey(i);
DWORD dataSize = 1024;
TCHAR data[1024];
LONG ret = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
LONG ret2 = RegQueryValueEx( hKey, dataKey, NULL, NULL, (LPBYTE)data, &dataSize);
// Breakpoint to see what ret and ret2 are equal to
int j = 0;
}
This is by design. The first call failed because the size you specified was too small. What you didn't count on is that the call also updated your dataSize variable, to tell you how much memory to allocate so that the call can succeed.
The second call therefore succeeded, since you now specified a size that is exactly right. But you never did the other thing you needed to do: actually make the buffer bigger. Nothing good can happen when the call then overflows the buffer and corrupts your stack frame. Be sure to use the /RTC compile option so that you get a runtime error from that.
You avoided this problem by increasing the buffer size from 256 to 1024. But your code is still incorrect: your program will fail miserably if the registry value ever gets larger than 1024 bytes. Don't use a local array; use the new operator or malloc() to allocate the buffer so it can never fail like this. Or simply fail the call and declare "bad data".
Also note another bug: dataSize is in bytes, but the buffer consists of TCHARs, not bytes. That is probably why you didn't corrupt the stack frame: the buffer was big enough by accident. You don't want to rely on accidents like this. Consider a helper class like CRegKey to avoid this kind of mistake.
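The grow-and-retry pattern described above can be sketched in portable C++. Everything here is illustrative: read_value and the Query callable are hypothetical names, and the callable stands in for a size-reporting API such as RegQueryValueEx (on Windows it would typically be a lambda that forwards to that function). The status values 0 and 234 are the ones quoted in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Status values as quoted in the question.
constexpr long kSuccess  = 0;    // ERROR_SUCCESS
constexpr long kMoreData = 234;  // ERROR_MORE_DATA

// Hypothetical stand-in for the API call: fills `data`, and updates `*size`
// (in BYTES) with either the bytes written or the bytes required.
using Query = std::function<long(unsigned char* data, unsigned long* size)>;

// Calls `query`, enlarging the heap buffer whenever it reports kMoreData.
// On success, `out` holds exactly the bytes that were returned.
long read_value(const Query& query, std::vector<unsigned char>& out) {
    assert(query);
    unsigned long size = 256;  // initial guess, in bytes
    for (int attempt = 0; attempt < 8; ++attempt) {
        out.resize(size);  // the buffer really is `size` bytes long
        const long status = query(out.data(), &size);
        if (status == kSuccess) {
            out.resize(size);  // trim to what was actually written
            return status;
        }
        if (status != kMoreData) {
            out.clear();
            return status;  // some other error: give up
        }
        // kMoreData: `size` now holds the required byte count; try again.
    }
    out.clear();
    return kMoreData;  // the value kept growing between calls
}
```

The buffer size and the size passed to the call are always the same number of bytes, which avoids both the overflow and the bytes-versus-TCHAR mismatch.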

In Win32, how can a text file be successfully read into memory?

I am trying to get simple file IO working in Win32. So far the writing is working fine, but the reading is not: although it successfully reads the contents, additional "garbage" is appended to the string. The code I have so far is below. The program has UNICODE defined.
For writing:
DWORD dwTextSize = GetWindowTextLength(hWndTextBox);
WCHAR *lpszText = new WCHAR[dwTextSize];
GetWindowText(hWndTextBox, lpszText, dwTextSize + 1);
hTextFile = CreateFile(lpszTextFileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesWritten;
WriteFile(hTextFile, lpszText, 2 * dwTextSize, &dwBytesWritten, NULL); // x2 for 2 bytes per Unicode character
CloseHandle(hTextFile);
DeleteObject(hTextFile);
For this example, Hello, World! is saved successfully as Hello, World!.
For reading:
lpszTextFileName = L"text.txt"; // LPCTSTR Variable
hTextFile = CreateFile(lpszTextFileName, GENERIC_READ, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwFileSize = GetFileSize(hTextFile, &dwFileSize);
DWORD dwBytesRead;
WCHAR *lpszText = new WCHAR[dwFileSize / 2];
ReadFile(hTextFile, lpszText, dwFileSize, &dwBytesRead, NULL);
CloseHandle(hTextFile);
The string is then used to set the text of an EDIT control:
SendMessage(hWndTextBox, WM_SETTEXT, NULL, (LPARAM)lpszText); // SetWindowText() also possible
When Hello, World! is read back in, it reads back in as Hello, World!﷽﷽ꮫꮫꮫꮫﻮ or a visual variation upon this, but basically "garbage"!
I have probably missed something rather obvious, but I cannot see where! Is there a solution to this problem and if so, what is it?
OK, I started this as a comment, but it's getting out of control.
For Writing
This:
WCHAR *lpszText = new WCHAR[dwTextSize];
should be:
WCHAR *lpszText = new WCHAR[dwTextSize+1];
This:
DeleteObject(hTextFile);
should not be there at all. Get rid of it.
I'm assuming you delete [] lpszText; somewhere when you're done with it. If not, do so.
For Reading
The second parameter to GetFileSize() should not be the same variable as your return value. It receives the high 32 bits of the 64-bit size for large files. If you know your file is smaller than 4 GB, you can leave it NULL, so change this:
DWORD dwFileSize = GetFileSize(hTextFile, &dwFileSize);
to this:
DWORD dwFileSize = GetFileSize(hTextFile, NULL);
You must account for the null terminator of your file buffer, so this:
WCHAR *lpszText = new WCHAR[dwFileSize / 2];
should be changed to this:
WCHAR *lpszText = new WCHAR[dwFileSize / 2 + 1];
lpszText[dwFileSize / 2] = 0;
With those changes, the rest should work as you hoped. There is no error checking, which is not good, but I've seen worse. As before, I'm assuming you delete [] lpszText; somewhere when you're done with it. If not, do so.
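For comparison, here is a portable C++ sketch of the same read that lets std::u16string own the buffer and its terminator. It is not Win32 code: it uses std::ifstream instead of CreateFile/ReadFile, the function name read_utf16_file is hypothetical, and it assumes the file holds raw 2-byte code units in the machine's native byte order with no BOM.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a whole file of raw UTF-16 code units.
// Returns an empty string if the file cannot be opened or is empty.
std::u16string read_utf16_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff bytes = in.tellg();  // opened at the end: file size
    if (bytes <= 0) return {};
    // One element per 2-byte code unit. The string adds the terminating
    // null itself, so no "+ 1" is needed here.
    std::u16string text(static_cast<std::size_t>(bytes) / sizeof(char16_t), u'\0');
    assert(text.size() * sizeof(char16_t) <= static_cast<std::size_t>(bytes));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(&text[0]),
            static_cast<std::streamsize>(text.size() * sizeof(char16_t)));
    // Keep only the code units that were actually read.
    text.resize(static_cast<std::size_t>(in.gcount()) / sizeof(char16_t));
    return text;
}
```

The returned string's c_str() is always null-terminated, which is what a control expecting a C-style string needs.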

playing files after accepting them through open dialog box

I am a new member; I joined this site after referring to it many times when I was stuck on programming problems. I am trying to code a media player (Win32 SDK, VC++ 6.0) for my college project, and I am stuck. I have searched various forums and MSDN and finally landed on the function GetShortPathName, which lets me play files from folders and files that have whitespace in their names. I will paste the code here so it is clearer what I am trying to do.
case IDM_FILE_OPEN :
ZeroMemory(&ofn, sizeof(ofn));
ofn.lStructSize = sizeof(ofn);
ofn.hwndOwner = hwnd;
ofn.lpstrFilter = "Media Files (All Supported Types)\0*.avi;*.mpg;*.mpeg;*.asf;*.wmv;*.mp2;*.mp3\0"
"Movie File (*.avi;*.mpg;*.mpeg)\0*.avi;*.mpg;*.mpeg\0"
"Windows Media File (*.asf;*.wmv)\0*.asf;*.wmv\0"
"Audio File (*.mp2;*.mp3)\0*.mp2;*.mp3\0"
"All Files(*.*)\0*.*\0";
ofn.lpstrFile = szFileName;
ofn.nMaxFile = MAX_PATH;
ofn.Flags = OFN_EXPLORER | OFN_FILEMUSTEXIST | OFN_HIDEREADONLY | OFN_ALLOWMULTISELECT | OFN_CREATEPROMPT;
ofn.lpstrDefExt = "mp3";
if(GetOpenFileName(&ofn))
{
length = GetShortPathName(szFileName, NULL, 0);
buffer = (TCHAR *) malloc (sizeof(length));
length = GetShortPathName(szFileName, buffer, length);
for(i = 0 ; i < MAX_PATH ; i++)
{
if(buffer[i] == '\\')
buffer[i] = '/';
}
SendMessage(hList,LB_ADDSTRING,0,(LPARAM)buffer);
mciSendString("open buffer alias myFile", NULL, 0, NULL);
mciSendString("play buffer", NULL, 0, NULL);
}
return 0;
Using the GetShortPathName function, I get the path as: D:/Mp3z/DEEPBL~1/03SLEE~1.mp3
Putting this path directly in Play button case
mciSendString("open D:/Mp3jh/DEEPBL~1/03SLEE~1.mp3 alias myFile", NULL, 0, NULL);
mciSendString("play myFile", NULL, 0, NULL);
the file opens and plays fine. But as soon as I try to open and play it through the open file dialog box, nothing happens. Any input appreciated.
It looks like the problem is that you're passing the name of the buffer variable to the mciSendString function as a string, rather than passing the contents of the buffer.
You need to concatenate the arguments you want to pass (open and alias myFile) with the contents of buffer.
The code can also be much simplified by replacing malloc with an automatic array. You don't need to malloc it because you don't need it outside of the block scope. (And you shouldn't be using malloc in C++ code anyway; use new[] instead.)
Here's a modified snippet of the code shown in your question:
(Warning: changes made using only my eyes as a compiler! Handle with care.)
if(GetOpenFileName(&ofn))
{
// Get the short path name, and place it in the buffer array.
// We know that a short path won't be any longer than MAX_PATH, so we can
// simply allocate a statically-sized array without futzing with new[].
//
// Note: In production code, you should probably check the return value
// of the GetShortPathName function to make sure it succeeded.
TCHAR buffer[MAX_PATH];
GetShortPathName(szFileName, buffer, MAX_PATH);
// Add the short path name to your ListBox control.
//
// Note: In C++ code, you should probably use C++-style casts like
// reinterpret_cast, rather than C-style casts!
SendMessage(hList, LB_ADDSTRING, 0, reinterpret_cast<LPARAM>(buffer));
// Build the argument string to pass to the mciSendString function.
//
// Note: In production code, you probably want to use the more secure
// alternatives to the string concatenation functions.
// See the documentation for more details.
// And, as before, you should probably check return values for error codes.
TCHAR arguments[MAX_PATH * 2]; // this will definitely be large enough
lstrcpy(arguments, TEXT("open ")); // copy first: the array starts out uninitialized
lstrcat(arguments, buffer);
lstrcat(arguments, TEXT(" alias myFile")); // note the separating spaces
// Or, better yet, use a string formatting function, like StringCbPrintf,
// which takes the destination size in bytes:
// StringCbPrintf(arguments, sizeof(arguments), TEXT("open %s alias myFile"),
// buffer);
// Call the mciSendString function with the argument string we just built.
mciSendString(arguments, NULL, 0, NULL);
mciSendString(TEXT("play myFile"), NULL, 0, NULL);
}
Do note that, as the above code shows, working with C-style strings (character arrays) is a real pain in the ass. C++ provides a better alternative, in the form of the std::string class. You should strongly consider using that instead. To call Windows API functions, you'll still need a C-style string, but you can get one of those by using the c_str method of the std::string class.
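As a rough illustration of the std::string approach, the command could be assembled like this. The function name build_open_command is hypothetical, and the sketch assumes a non-Unicode build where mciSendString accepts a const char*.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: builds the MCI "open" command for `path`, replacing
// backslashes with forward slashes as the question's loop does.
std::string build_open_command(std::string path, const std::string& alias) {
    assert(!alias.empty());
    std::replace(path.begin(), path.end(), '\\', '/');
    return "open " + path + " alias " + alias;
}
```

The result would then be passed on with something like mciSendString(command.c_str(), NULL, 0, NULL), where command holds the returned string.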

Resources