Write a unicode CString to a file using WriteFile API - winapi

How can I write contents of a CString instance to a file opened by CreateFile using WriteFile Win32 API function?
Please note MFC is not used, and CString is used by including "atlstr.h"
edit: Can just I do
WriteFile(handle, cstr, cstr.GetLength(), &dwWritten, NULL);
or
WriteFile(handle, cstr, cstr.GetLength() * sizeof(TCHAR), &dwWritten, NULL);
?

With ATL it's like this:
CString sValue;
CStringW sValueW(sValue); // NOTE: CT2CW() can be used instead
CAtlFile File;
ATLENSURE_SUCCEEDED(File.Create(sPath, GENERIC_WRITE, ...));
static const BYTE g_pnByteOrderMark[] = { 0xFF, 0xFE }; // UTF-16, Little Endian
ATLENSURE_SUCCEEDED(File.Write(g_pnByteOrderMark, sizeof g_pnByteOrderMark));
ATLENSURE_SUCCEEDED(File.Write(sValueW, (DWORD) (sValueW.GetLength() * sizeof (WCHAR))));
It's byte order mark (BOM) which lets Notepad know that encoding is UTF-16.

You need to pick a text encoding, convert the string to that encoding, and write the text file accordingly. Simplest is ANSI, so basically you would use the T2CA macro to convert your TCHAR string to ansi, then dump the contents in a file. Something like (untested/compiled):
// assumes handle is already opened in an empty new file
void DumpToANSIFile(const CString& str, HANDLE hFile)
{
USES_CONVERSION;
PCSTR ansi = T2CA(str);
DWORD dwWritten;
WriteFile(hFile, ansi, strlen(ansi) * sizeof(ansi[0]), &dwWritten, NULL);
}
Because it's ANSI encoding though, it will only be readable on computers that have your same code page settings. For a more portable solution, use UTF-8 or UTF-16.

Converting to ANSI can cause problems with code pages, so it is not acceptable in many cases. Here is a function that saves unicode string to unicode text file:
void WriteUnicodeStringToFile(const CString& str, LPCWSTR FileName)
{
HANDLE f = CreateFileW(FileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (f == INVALID_HANDLE_VALUE) return; //failed
DWORD wr;
unsigned char Header[2]; //unicode text file header
Header[0] = 0xFF;
Header[1] = 0xFE;
WriteFile(f, Header, 2, &wr, NULL);
WriteFile(f, (LPCTSTR)str, str.GetLength() * 2, &wr, NULL);
CloseHandle(f);
}
Using:
CString str = L"This is a sample unicode string";
WriteUnicodeStringToFile(str, L"c:\\Sample.txt");
Notepad understands unicode text files.

I believe you need additional casting:
WriteFile(handle, (LPCVOID)(LPCTSTR)cstr, cstr.GetLength() * sizeof(TCHAR), &dwWritten, NULL);

Also make sure that you write proper byte order mark in the beginning so that Notepad can read it properly.

Related

C Program Strange Characters retrieved due to language setting on Windows

If the below code is compiled with UNICODE as compiler option, the GetComputerNameEx API returns junk characters.
Whereas if compiled without UNICODE option, the API returns truncated value of the hostname.
This issue is mostly seen with Asia-Pacific languages like Chinese, Japanese, Korean to name a few (i.e., non-English).
Can anyone throw some light on how this issue can be resolved.
# define INFO_SIZE 30
int main()
{
int ret;
TCHAR infoBuf[INFO_SIZE+1];
DWORD bufSize = (INFO_SIZE+1);
char *buf;
buf = (char *) malloc(INFO_SIZE+1);
if (!GetComputerNameEx((COMPUTER_NAME_FORMAT)1,
(LPTSTR)infoBuf, &bufSize))
{
printf("GetComputerNameEx failed (%d)\n", GetLastError());
return -1;
}
ret = wcstombs(buf, infoBuf, (INFO_SIZE+1));
buf[INFO_SIZE] = '\0';
return 0;
}
In the languages you mentioned, most characters are represented by more than one byte. This is because these languages have alphabets of much more than 256 characters. So you may need more than 30 bytes to encode 30 characters.
The usual pattern for calling a function like wcstombs goes like this: first get the amount of bytes required, then allocate a buffer, then convert the string.
(edit: that actually relies on a POSIX extension, which also got implemented on Windows)
size_t size = wcstombs(NULL, infoBuf, 0);
if (size == (size_t) -1) {
// some character can't be converted
}
char *buf = new char[size + 1];
size = wcstombs(buf, infoBuf, size + 1);

Powerbuilder: ImportFile of UTF-8 (Converting UTF-8 to ANSI)

My Powerbuilder version is 6.5, cannot use a higher version as this is what I am supporting.
My problem is, when I am doing dw_1.ImportFile(file) the first row and first column has a funny string like this:

Which I dont understand until I tried opening the file and saving it to a new text file and trying to import that new file.which worked flawlessly without the funny string.
My conclusion is that this is happening because the file is UTF-8 (as shown in NOTEPAD++) and the new file is Ansi. The file I am trying to import is automatically given by a 3rd party and my users dont want the extra job of doing this.
How do I force convert this files to ANSI in powerbuilder. If there is none, I might have to do a command prompt conversion, any ideas?
The weird  characters are the (optional) utf-8 BOM that tells editors that the file is utf-8 encoded (as it can be difficult to know it unless we encounter an escaped character above code 127). You cannot just rid it off because if your file contains any character above 127 (accents or any special char), you will still have garbage in your displayed data (for example: é -> é, € -> €, ...) where special characters will become from 2 to 4 garbage chars.
I recently needed to convert some utf-8 encoded string to "ansi" windows 1252 encoding. With version of PB10+, a reencoding between utf-8 and ansi is as simple as
b = blob(s, encodingutf8!)
s2 = string(b, encodingansi!)
But string() and blob() do not support encoding specification before the release 10 of PB.
What you can do is to read the file yourself, skip the BOM, ask Windows to convert the string encoding via MultiByteToWideChar() + WideCharToMultiByte() and load the converted string in the DW with ImportString().
Proof of concept to get the file contents (with this reading method, the file cannot be bigger than 2GB):
string ls_path, ls_file, ls_chunk, ls_ansi
ls_path = sle_path.text
int li_file
if not fileexists(ls_path) then return
li_file = FileOpen(ls_path, streammode!)
if li_file > 0 then
FileSeek(li_file, 3, FromBeginning!) //skip the utf-8 BOM
//read the file by blocks, FileRead is limited to 32kB
do while FileRead(li_file, ls_chunk) > 0
ls_file += ls_chunk //concatenate in loop works but is not so performant
loop
FileClose(li_file)
ls_ansi = utf8_to_ansi(ls_file)
dw_tab.importstring( text!, ls_ansi)
end if
utf8_to_ansi() is a globlal function, it was written for PB9, but it should work the same with PB6.5:
global type utf8_to_ansi from function_object
end type
type prototypes
function ulong MultiByteToWideChar(ulong CodePage, ulong dwflags, ref string lpmultibytestr, ulong cchmultibyte, ref blob lpwidecharstr, ulong cchwidechar) library "kernel32.dll"
function ulong WideCharToMultiByte(ulong CodePage, ulong dwFlags, ref blob lpWideCharStr, ulong cchWideChar, ref string lpMultiByteStr, ulong cbMultiByte, ref string lpUsedDefaultChar, ref boolean lpUsedDefaultChar) library "kernel32.dll"
end prototypes
forward prototypes
global function string utf8_to_ansi (string as_utf8)
end prototypes
global function string utf8_to_ansi (string as_utf8);
//convert utf-8 -> ansi
//use a wide-char native string as pivot
constant ulong CP_ACP = 0
constant ulong CP_UTF8 = 65001
string ls_wide, ls_ansi, ls_null
blob lbl_wide
ulong ul_len
boolean lb_flag
setnull(ls_null)
lb_flag = false
//get utf-8 string length converted as wide-char
setnull(lbl_wide)
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, 0)
//allocate buffer to let windows write into
ls_wide = space(ul_len * 2)
lbl_wide = blob(ls_wide)
//convert utf-8 -> wide char
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, ul_len)
//get the final ansi string length
setnull(ls_ansi)
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, 0, ls_null, lb_flag)
//allocate buffer to let windows write into
ls_ansi = space(ul_len)
//convert wide-char -> ansi
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, ul_len, ls_null, lb_flag)
return ls_ansi
end function

In Win32, how can a text file be successfully read into memory?

I am trying to get simple file IO working in Win32. So far the writing is working fine, but the reading is not: although it successfully reads the contents, additional "garbage" is appended to the string. The code I have so far is below. The program has UNICODE defined.
For writing:
DWORD dwTextSize = GetWindowTextLength(hWndTextBox);
WCHAR *lpszText = new WCHAR[dwTextSize];
GetWindowText(hWndTextBox, lpszText, dwTextSize + 1);
hTextFile = CreateFile(lpszTextFileName, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesWritten;
WriteFile(hTextFile, lpszText, 2 * dwTextSize, &dwBytesWritten, NULL); // x2 for 2 bytes per Unicode character
CloseHandle(hTextFile);
DeleteObject(hTextFile);
For this example, Hello, World! is saved successfully as Hello, World!.
For reading:
lpszTextFileName = L"text.txt"; // LPCTSTR Variable
hTextFile = CreateFile(lpszTextFileName, GENERIC_READ, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwFileSize = GetFileSize(hTextFile, &dwFileSize);
DWORD dwBytesRead;
WCHAR *lpszText = new WCHAR[dwFileSize / 2];
ReadFile(hTextFile, lpszText, dwFileSize, &dwBytesRead, NULL);
CloseHandle(hTextFile);
The string is then used to set the text of an EDIT control:
SendMessage(hWndTextBox, WM_SETTEXT, NULL, (LPARAM)lpszText); // SetWindowText() also possible
When Hello, World! is read back in, it reads back in as Hello, World!﷽﷽ꮫꮫꮫꮫﻮ or a visual variation upon this, but basically "garbage"!
I have probably missed something rather obvious, but I cannot see where! Is there a solution to this problem and if so, what is it?
Ok I started this with a comment, but its getting out of control.
For Writing
This:
WCHAR *lpszText = new WCHAR[dwTextSize];
should be:
WCHAR *lpszText = new WCHAR[dwTextSize+1];
This:
DeleteObject(hTextFile);
should not be there at all.. Get rid of it.
I'm assuming you delete [] lpszText; somewhere when you're done with it. if not, do so.
For Reading
The second parameter to GetFileSize() should not be the same variable as your return value. It is the HIGH 32bit of a 64-bit value for large file sizes. If you know you're file size is smaller than 4gB, you can leave it NULL, so change this:
DWORD dwFileSize = GetFileSize(hTextFile, &dwFileSize);
to this:
DWORD dwFileSize = GetFileSize(hTextFile, NULL);
You must account for the null terminator of your file buffer, so this:
WCHAR *lpszText = new WCHAR[dwFileSize / 2];
should be changed to this:
WCHAR *lpszText = new WCHAR[dwFileSize / 2 + 1];
lpszText[dwFileSize / 2] = 0;
and the rest should work as you're hoping it would. No error checking, which is not good, but I've seen worse. And as before, I'm assuming you delete [] lpszText; somewhere when you're done with it. if not, do so.

playing files after accepting them through open dialog box

I am a new member and joined this site after referring to it loads of times when i was stuck with some programming problems. I am trying to code a media player (Win32 SDK VC++ 6.0) for my college project and I am stuck. I have searched on various forums and msdn and finally landed on the function GetShortPathName which enables me to play through folders and files which have a whitespace in their names. I will paste the code here so it will be much more clearer as to what i am trying to do.
case IDM_FILE_OPEN :
ZeroMemory(&ofn, sizeof(ofn));
ofn.lStructSize = sizeof(ofn);
ofn.hwndOwner = hwnd;
ofn.lpstrFilter = "Media Files (All Supported Types)\0*.avi;*.mpg;*.mpeg;*.asf;*.wmv;*.mp2;*.mp3\0"
"Movie File (*.avi;*.mpg;*.mpeg)\0*.avi;*.mpg;*.mpeg\0"
"Windows Media File (*.asf;*.wmv)\0*.asf;*.wmv\0"
"Audio File (*.mp2;*.mp3)\0*.mp2;*.mp3\0"
"All Files(*.*)\0*.*\0";
ofn.lpstrFile = szFileName;
ofn.nMaxFile = MAX_PATH;
ofn.Flags = OFN_EXPLORER | OFN_FILEMUSTEXIST | OFN_HIDEREADONLY | OFN_ALLOWMULTISELECT | OFN_CREATEPROMPT;
ofn.lpstrDefExt = "mp3";
if(GetOpenFileName(&ofn))
{
length = GetShortPathName(szFileName, NULL, 0);
buffer = (TCHAR *) malloc (sizeof(length));
length = GetShortPathName(szFileName, buffer, length);
for(i = 0 ; i < MAX_PATH ; i++)
{
if(buffer[i] == '\\')
buffer[i] = '/';
}
SendMessage(hList,LB_ADDSTRING,0,(LPARAM)buffer);
mciSendString("open buffer alias myFile", NULL, 0, NULL);
mciSendString("play buffer", NULL, 0, NULL);
}
return 0;
using the GetShortPathName function i get the path as : D:/Mp3z/DEEPBL~1/03SLEE~1.mp3
Putting this path directly in Play button case
mciSendString("open D:/Mp3jh/DEEPBL~1/03SLEE~1.mp3 alias myFile", NULL, 0, NULL);
mciSendString("play myFile", NULL, 0, NULL);
the file opens and plays fine. But as soon as i try to open and play it through the open file dialog box, nothing happens. Any input appreciated.
It looks like the problem is that you're passing the name of the buffer variable to the mciSendString function as a string, rather than passing the contents of the buffer.
You need to concatenate the arguments you want to pass (open and alias myFile) with the contents of buffer.
The code can also be much simplified by replacing malloc with an automatic array. You don't need to malloc it because you don't need it outside of the block scope. (And you shouldn't be using malloc in C++ code anyway; use new[] instead.)
Here's a modified snippet of the code shown in your question:
(Warning: changes made using only my eyes as a compiler! Handle with care.)
if(GetOpenFileName(&ofn))
{
// Get the short path name, and place it in the buffer array.
// We know that a short path won't be any longer than MAX_PATH, so we can
// simply allocate a statically-sized array without futzing with new[].
//
// Note: In production code, you should probably check the return value
// of the GetShortPathName function to make sure it succeeded.
TCHAR buffer[MAX_PATH];
GetShortPathName(szFileName, buffer, MAX_PATH);
// Add the short path name to your ListBox control.
//
// Note: In C++ code, you should probably use C++-style casts like
// reinterpret_cast, rather than C-style casts!
SendMessage(hList, LB_ADDSTRING, 0, reinterpret_cast<LPARAM>(buffer));
// Build the argument string to pass to the mciSendString function.
//
// Note: In production code, you probably want to use the more secure
// alternatives to the string concatenation functions.
// See the documentation for more details.
// And, as before, you should probably check return values for error codes.
TCHAR arguments[MAX_PATH * 2]; // this will definitely be large enough
lstrcat(arguments, TEXT("open"));
lstrcat(arguments, buffer);
lstrcat(arguments, TEXT("alias myFile"));
// Or, better yet, use a string formatting function, like StringCbPrintf:
// StringCbPrintf(arguments, MAX_PATH * 2, TEXT("open %s alias myFile"),
// buffer);
// Call the mciSendString function with the argument string we just built.
mciSendString(arguments, NULL, 0, NULL);
mciSendString("play myFile", NULL, 0, NULL);
}
Do note that, as the above code shows, working with C-style strings (character arrays) is a real pain in the ass. C++ provides a better alternative, in the form of the std::string class. You should strongly consider using that instead. To call Windows API functions, you'll still need a C-style string, but you can get one of those by using the c_str method of the std::string class.

Converting LPCWSTR with WideCharToMultiByte. Need help

i have a function like this:
BOOL WINAPI MyFunction(HDC hdc, LPCWSTR text, UINT cbCount){
char AnsiBuffer[255];
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 0, NULL, NULL);
if(written > -1) AnsiBuffer[written] = '\0';
if(written>0){
ofstream myfile;
myfile.open ("C:\\example.txt", ios::app);
myfile.write(AnsiBuffer, sizeof(AnsiBuffer));
myfile.write("\n", 1);
myfile.close();
}
....
When i display the input LPCWSTR text with MessageBoxW(), the text shows up fine. When i try to convert it to multibyte, the return value looks normal (ex: 22, 45, etc), but the result is strings of gibberish (ex ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ). Suggestions?
I see two problems;
1) You are passing '0' to WideCharToMultiByte for the size of the multibyte buffer. If you read the documents this results in the function returning the NUMBER of bytes needed but performing no actual conversion. (This is to allow you to subsequently allocate a buffer of the correct size and recall the function).
2) in file.write sizeof(AnsiBuffer) will result in 255 bytes being written regardless of what is in the buffer. sizeof is a compile-time calculation that returns the size of a variable. You should replace this with the 'written' variable that represents the length of the string.
You need to pass the length of the buffer to the API, instead of passing 0. When you pass 0, the function returns the required length of the buffer, but doesn't write to it. You're seeing the results of the uninitialized array.
Here's the right call, with the 255 in the right place:
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 255, NULL, NULL);

Resources