convert BSTR to wstring - windows

How do I convert a char[256] to a std::wstring?
Update: here is my current code:
char testDest[256];
char *p = _com_util::ConvertBSTRToString(url->bstrVal);
for (int i = 0; i <= strlen(p); i++)
{
    testDest[i] = p[i];
}
// need to convert testDest to a wstring so I can pass it to the function below...
writeToFile(testDestwstring);

If your input is a BSTR (as it seems to be), the data is already Unicode and you can construct a wstring from it directly, as follows. _bstr_t has implicit conversions to both char* and wchar_t*, which avoid the need for manual Win32 code-page conversion.
if (url->bstrVal)
{
    // true => make a new copy - can avoid this if the source is
    // no longer needed, by passing false here and not calling SysFreeString on the source
    const _bstr_t wrapper(url->bstrVal, true);
    std::wstring wstrVal((const wchar_t*)wrapper);
}
See here for more details on this area of Windows usage. It's easy to mess up the use of the Win32 API in this area - using the BSTR wrapper to do this avoids both data copy (if used judiciously) and code complexity.
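For example, a minimal sketch of the no-copy variant hinted at in the comment above (an assumption here: we own url->bstrVal and no longer need it, because the wrapper takes ownership and will free it; _bstr_t comes from <comdef.h>, as in the snippet above):
// Sketch: attach without copying - the wrapper now owns the BSTR,
// so do NOT call SysFreeString(url->bstrVal) yourself afterwards.
_bstr_t wrapper(url->bstrVal, false);
std::wstring wstrVal(static_cast<const wchar_t*>(wrapper), wrapper.length());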

MultiByteToWideChar will return a UTF-16 string. You need to specify the source codepage.
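For the question's char buffer, a minimal sketch might look like this (an assumption: the narrow string is in the system ANSI code page, CP_ACP; substitute CP_UTF8 or another code page as appropriate):
#include <windows.h>
#include <string>

// Sketch: convert a NUL-terminated narrow string to std::wstring via
// MultiByteToWideChar, assuming the source is in the given code page.
std::wstring widen(const char* src, UINT codePage = CP_ACP)
{
    // First call asks for the required size in wide characters (including the NUL).
    int len = MultiByteToWideChar(codePage, 0, src, -1, nullptr, 0);
    if (len <= 0)
        return std::wstring();

    std::wstring result(static_cast<size_t>(len), L'\0');
    MultiByteToWideChar(codePage, 0, src, -1, &result[0], len);
    result.resize(static_cast<size_t>(len) - 1);   // drop the terminating NUL written by the API
    return result;
}

// Hypothetical usage with the question's buffer:
// writeToFile(widen(testDest));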

Related

UNICODE_STRING to std::string Conversion

I am using pFileObject->FileName to get the name of a file opened in a kernel-mode filter driver. The file name returned is a UNICODE_STRING, and I want to convert it into a std::string. What is the method? Please provide an example if possible.
Below is the code:
NTSTATUS FsFilterDispatchCreate(
    __in PDEVICE_OBJECT DeviceObject,
    __in PIRP Irp
    )
{
    PFILE_OBJECT pFileObject = IoGetCurrentIrpStackLocation(Irp)->FileObject;
    DbgPrint("%wZ\n", &pFileObject->FileName);
    return FsFilterDispatchPassThrough(DeviceObject, Irp);
}
I agree with Hans' comment. Making std:: classes work in Windows kernel mode is extremely difficult, if not impossible; the default WinDDK environment is C rather than C++. Your best bet is to convert the UNICODE_STRING to an ANSI null-terminated string (you can then print it with DbgPrint("%s", ...) etc.). See the example below.
UNICODE_STRING tmp;
// ...
ANSI_STRING dest;
ULONG unicodeBufferSize = tmp.Length;
// Length of unicode string in bytes must be enough to keep ANSI string
dest.Buffer = (PCHAR)ExAllocatePool(NonPagedPool, unicodeBufferSize+1);
// check for allocation failure...
dest.Length = 0;
dest.MaximumLength = unicodeBufferSize+1;
RtlUnicodeStringToAnsiString(&dest, &tmp, FALSE);
// check for failure...
dest.Buffer[dest.Length] = 0; // now we get it in dest.Buffer
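For the question's dispatch routine, a rough usage sketch might look like this (illustrative only: error handling is minimal, the buffer must be freed with ExFreePool, and it relies on the same WDK environment as the code above):
// Sketch: convert pFileObject->FileName to a NUL-terminated ANSI string
// inside FsFilterDispatchCreate, then print and free it.
UNICODE_STRING *fileName = &pFileObject->FileName;
ANSI_STRING ansiName;

ansiName.Buffer = (PCHAR)ExAllocatePool(NonPagedPool, fileName->Length + 1);
if (ansiName.Buffer != NULL)
{
    ansiName.Length = 0;
    ansiName.MaximumLength = (USHORT)(fileName->Length + 1);
    if (NT_SUCCESS(RtlUnicodeStringToAnsiString(&ansiName, fileName, FALSE)))
    {
        ansiName.Buffer[ansiName.Length] = '\0';
        DbgPrint("%s\n", ansiName.Buffer);
    }
    ExFreePool(ansiName.Buffer);
}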

Convert QString to BSTR and vice versa

I want to convert QString to BSTR and vice versa.
This is what I try for converting QString to BSTR:
std::wstring str_ = QString("some texts").toStdWString();
BSTR bstr_ = str_.c_str();
and to convert BSTR to QString:
BSTR bstr_;
wchar_t *str_ = bstr_;
QString qstring_ = QString::fromWCharArray(str_);
Is this correct? In other words, is there any data loss? If so, what is the correct solution?
You should probably use SysAllocString to do this - a BSTR also contains a length prefix, which your code does not produce.
std::wstring str_ = QString("some texts").toStdWString();
BSTR bstr_ = SysAllocString(str_.c_str());
Other than that, there isn't anything to be lost here - both BSTR and QString use 16-bit Unicode encoding, so converting between them should not modify the internal data buffers at all.
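As a quick sanity check, a minimal round-trip sketch (assuming a Windows build of Qt, where wchar_t is 16-bit and matches OLECHAR):
#include <QString>
#include <oleauto.h>
#include <string>

// Sketch: QString -> BSTR -> QString should preserve the text exactly.
void roundTripCheck()
{
    QString original = QString::fromUtf8("some texts");
    std::wstring wide = original.toStdWString();
    BSTR bstr = SysAllocString(wide.c_str());

    QString back = QString::fromWCharArray(bstr, static_cast<int>(SysStringLen(bstr)));
    Q_ASSERT(original == back);

    SysFreeString(bstr);
}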
To convert a BSTR to a QString you can simply use the QString::fromUtf16 function:
BSTR bstrTest = SysAllocString(L"ConvertMe");
QString qstringTest = QString::fromUtf16(reinterpret_cast<const ushort*>(bstrTest));
BSTR strings consist of two parts: four bytes for the string length, and the content itself, which can contain null characters.
The short way to do it would be:
Convert the QString to a two-byte null-terminated string using QString::utf16. Do not use toWCharArray; a wide char is different on Windows (two bytes) and Linux (four bytes). (I know COM is Microsoft tech, but better to be sure.)
Use SysAllocString to create a BSTR string, which stores the string length already.
Optionally free the BSTR string with SysFreeString when you are done using it. Please read the following article to know when you need to release it.
https://learn.microsoft.com/en-us/cpp/atl-mfc-shared/allocating-and-releasing-memory-for-a-bstr?view=vs-2017
BSTR bstr = ::SysAllocString(reinterpret_cast<const OLECHAR*>(QString("stuff").utf16()));
// use it
::SysFreeString(bstr);
To convert from BSTR to QString, you can reinterpret_cast the BSTR to a ushort pointer and then use QString::fromUtf16. Remember to free the BSTR when you are done with it.
QString qstr = QString::fromUtf16(reinterpret_cast<ushort*>(bstr));
The following article explains BSTR strings very well.
https://www.codeproject.com/Articles/13862/COM-in-plain-C-Part
BSTR oldStr;   // assumed to already hold a valid string
QString newStr{QString::fromWCharArray(oldStr)};

WM_GETTEXT usage

I'm trying to get the contents of a text field in my application, but I can't get it to work. I'm using SendMessage to send WM_GETTEXT and save the content to a char*.
I write the char* to a file, but I only get "D" back. This is what I have now:
LRESULT result;
char * output = (char*)malloc(1024);
result = SendMessage(hwnd,WM_GETTEXT,1024,(LPARAM)output);
ofstream file("test.txt");
file << *output;
file.close();
delete [] output;
Pointer concepts:
file << *output; will print only the first character of the string.
file << output; will print the entire string.
C# code:
public const uint WM_GETTEXT = 0xD;
const int bufferSize = 10000;
StringBuilder sb = new StringBuilder(bufferSize);
SendMessageGetText(handle, WM_GETTEXT, new UIntPtr(bufferSize), sb);
Console.WriteLine(sb.ToString());
Working properly for me!
Sophia's answer is correct. However, the default now for a Visual Studio project is to create a Unicode project. You will only get the first letter if your project is Unicode and not MBCS.
Have you examined the buffer returned from WM_GETTEXT to verify it has the entire string?
If not, try declaring your output variable as TCHAR* (to be generic) or as a wchar_t* and see what results you get in the buffer.
p.s. It is bad form to allocate memory with malloc and release it with delete. You should use either malloc/free pairs or new/delete pairs. An even safer way to allocate a char buffer is to use std::string, or std::wstring for a wide string.
p.p.s. Try making sure your project settings are for a multibyte (MBCS) project and not a Unicode project. Then everything in Sophia's answer will work.
One more thing... Just use the GetWindowText() API instead of the SendMessage stuff. That's why it is there, so you don't have to go through the rigmarole of casting a pointer to an LPARAM or WPARAM. It's more typesafe and will give you a compile-time error (better than a runtime error) if your types don't match up--especially with Unicode/MBCS and wchar_t/char.
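Putting those points together, a minimal corrected sketch (an assumption here: a Unicode build, with hwnd being the window handle from the question):
#include <windows.h>
#include <fstream>
#include <string>

// Sketch: read the control's text with GetWindowTextW into a std::wstring
// and write it to a file, avoiding raw malloc/delete entirely.
void dumpWindowText(HWND hwnd)
{
    std::wstring text(1024, L'\0');
    int copied = GetWindowTextW(hwnd, &text[0], static_cast<int>(text.size()));
    text.resize(copied);                 // trim to the length actually copied

    std::wofstream file("test.txt");     // narrows through the stream's locale on write
    file << text;                        // streams the whole string, not just one char
}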

How do you convert a 'System::String ^' to 'TCHAR'?

I asked a question here involving C++ and C# communicating. The problem got solved but led to a new problem.
This returns a String (C#):
return Marshal.PtrToStringAnsi(decryptsn(InpData));
This expects a TCHAR* (C++):
lpAlpha2[0] = Company::Pins::Bank::Decryption::Decrypt::Decryption("123456");
I've googled how to solve this problem, but I am not sure why the String has a caret (^) on it. Would it be best to change the return type from String to something else that C++ would accept, or would I need to do a conversion before assigning the value?
String has a ^ because that's the marker for a managed reference. Basically, it's used the same way as * in unmanaged land, except it can only point to an object type, not to other pointer types, or to void.
TCHAR is #defined (or perhaps typedefed, I can't remember) to either char or wchar_t, based on the _UNICODE preprocessor definition. Therefore, I would use that and write the code twice.
Either inline:
TCHAR* str;
String^ managedString;
#ifdef _UNICODE
str = (TCHAR*) Marshal::StringToHGlobalUni(managedString).ToPointer();
#else
str = (TCHAR*) Marshal::StringToHGlobalAnsi(managedString).ToPointer();
#endif
// use str.
Marshal::FreeHGlobal(IntPtr(str));
or as a pair of conversion methods, both of which assume that the output buffer has already been allocated and is large enough. Method overloading should make it pick the correct one, based on what TCHAR is defined as.
void ConvertManagedString(String^ managedString, char* outString)
{
    char* str;
    str = (char*) Marshal::StringToHGlobalAnsi(managedString).ToPointer();
    strcpy(outString, str);
    Marshal::FreeHGlobal(IntPtr(str));
}
void ConvertManagedString(String^ managedString, wchar_t* outString)
{
    wchar_t* str;
    str = (wchar_t*) Marshal::StringToHGlobalUni(managedString).ToPointer();
    wcscpy(outString, str);
    Marshal::FreeHGlobal(IntPtr(str));
}
The syntax String^ is C++/CLI talk for "(garbage collected) reference to a System.String".
You have a couple of options for converting a String into a C string, which is another way to express the TCHAR*. My preferred way in C++ would be to store the converted string in a C++ string type, either std::wstring or std::string, depending on whether you are building the project as Unicode or MBCS.
In either case you can use something like this:
std::wstring tmp = msclr::interop::marshal_as<std::wstring>( /* Your .NET String */ );
or
std::string tmp = msclr::interop::marshal_as<std::string>(...);
Once you've converted the string into the correct wide or narrow string format, you can then access its C string representation using the c_str() function, like so:
callCFunction(tmp.c_str());
Assuming that callCFunction expects you to pass it a C-style char* or wchar_t* (which TCHAR* will "degrade" to, depending on your compilation settings).
That is a really rambling way to ask the question, but if you mean how to convert a String ^ to a char *, then you use the same marshaller you used before, only backwards:
char* unmanagedstring = (char *) Marshal::StringToHGlobalAnsi(managedstring).ToPointer();
Edit: don't forget to release the allocated memory using Marshal::FreeHGlobal when you're done with it.

How do I read Unicode-16 strings from a file using POSIX methods in Linux?

I have a file containing UNICODE-16 strings that I would like to read into a Linux program. The strings were written raw from Windows' internal WCHAR format. (Does Windows always use UTF-16? e.g. in Japanese versions)
I believe that I can read them using raw reads and then converting with wcstombs_l. However, I cannot figure out what locale to use. Running "locale -a" on my up-to-date Ubuntu and Mac OS X machines yields zero locales with utf-16 in their names.
Is there a better way?
Update: the correct answer and others below helped point me to using libiconv. Here's a function I'm using to do the conversion. I currently have it inside a class that makes the conversions into a one-line piece of code.
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Function for converting wchar_t* to char*. (Really: UTF-16LE --> UTF-8)
// It will allocate the space needed for dest. The caller is
// responsible for freeing the memory.
static int iwcstombs_alloc(char **dest, const wchar_t *src)
{
    iconv_t cd;
    const char from[] = "UTF-16LE";
    const char to[] = "UTF-8";

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
    {
        printf("iconv_open(\"%s\", \"%s\") failed: %s\n",
               to, from, strerror(errno));
        return(-1);
    }

    // How much space do we need?
    // Guess that we need the same amount of space as used by src.
    // TODO: There should be a while loop around this whole process
    //       that detects insufficient memory space and reallocates
    //       more space.
    int len = sizeof(wchar_t) * (wcslen(src) + 1);
    //printf("len = %d\n", len);

    // Allocate space
    int destLen = len * sizeof(char);
    *dest = (char *)malloc(destLen);
    if (*dest == NULL)
    {
        iconv_close(cd);
        return -1;
    }

    // Convert
    size_t inBufBytesLeft = len;
    char *inBuf = (char *)src;
    size_t outBufBytesLeft = destLen;
    char *outBuf = (char *)*dest;

    size_t rc = iconv(cd,
                      &inBuf,
                      &inBufBytesLeft,
                      &outBuf,
                      &outBufBytesLeft);
    if (rc == (size_t)-1)
    {
        printf("iconv() failed: %s\n", strerror(errno));
        iconv_close(cd);
        free(*dest);
        *dest = NULL;
        return -1;
    }

    iconv_close(cd);
    return 0;
} // iwcstombs_alloc()
The simplest way is to convert the file from UTF-16 to UTF-8, the native UNIX encoding, and then read it:
iconv -f UTF-16 -t UTF-8 file_in.txt -o file_out.txt
You can also use iconv(3) (see man 3 iconv) to convert strings in C. Most other languages have bindings to iconv as well.
Then you can use any UTF-8 locale, such as en_US.UTF-8, which is usually the default on most Linux distros.
(Does Windows always use UTF-16? e.g. in Japanese versions)
Yes, NT's WCHAR is always UTF-16LE.
(The ‘system codepage’, which for Japanese installs is indeed cp932/Shift-JIS, still exists in NT for the benefit of the many, many applications that aren't Unicode-native, FAT32 paths, and so on.)
However, wchar_t is not guaranteed to be 16 bits, and on Linux it won't be; UTF-32 (UCS-4) is used instead. So wcstombs_l is unlikely to be happy.
The Right Thing would be to use a library like iconv to read it into whichever format you are using internally - presumably wchar_t. You could try to hack it yourself by poking bytes in, but you'd probably get things like the surrogates wrong.
Runing "locale -a" on my up-to-date Ubuntu and Mac OS X machines yields zero locales with utf-16 in their names.
Indeed, Linux can't use UTF-16 as a locale default encoding because of all the embedded \0 bytes.
You can read as binary, then do your own quick conversion:
http://unicode.org/faq/utf_bom.html#utf16-3
But it is probably safer to use a library (like libiconv) which handles invalid sequences properly.
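For illustration, a minimal sketch of such a hand-rolled conversion (UTF-16LE bytes to UTF-32 code points, with basic surrogate-pair handling; this is exactly the part that is easy to get wrong, which is why the library route is safer):
#include <cstddef>
#include <vector>

// Sketch: decode raw UTF-16LE bytes into UTF-32 code points.
// Illustrative only - invalid or unpaired surrogates are simply skipped.
std::vector<char32_t> decodeUtf16le(const unsigned char* data, size_t size)
{
    std::vector<char32_t> out;
    for (size_t i = 0; i + 1 < size; )
    {
        char32_t unit = data[i] | (data[i + 1] << 8);   // little-endian 16-bit unit
        i += 2;
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < size)
        {
            char32_t low = data[i] | (data[i + 1] << 8);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                // combine the surrogate pair into a single code point
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
                i += 2;
                continue;
            }
        }
        if (unit < 0xD800 || unit > 0xDFFF)   // skip lone surrogates
            out.push_back(unit);
    }
    return out;
}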
I would strongly recommend using a Unicode encoding as your program's internal representation. Use either UTF-16 or UTF-8. If you use UTF-16 internally, then obviously no translation is required. If you use UTF-8, you can use a locale with .UTF-8 in it such as en_US.UTF-8.
