What's the meaning of BSTR, LPCOLESTR, LPCWSTR, LPTSTR, LPCWCHAR, and many others if they're all just a bunch of defines that resolve to wchar_t anyway?
LPTSTR indicates that the string buffer can be ANSI or Unicode depending on whether the macro UNICODE is defined.
LPCOLESTR was invented by the OLE team; it switches between char and wchar_t based on the definition of OLE2ANSI.
LPCWSTR is a pointer to a constant wchar_t string.
BSTR is an LPOLESTR that has been allocated with SysAllocString.
LPCWCHAR is a pointer to a single constant wide character.
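Roughly, the non-BSTR names boil down to plain typedefs along these lines (a simplified sketch, not the literal SDK headers, which go through several more intermediate macros):
// Simplified sketch of the SDK typedefs (illustration only)
typedef char     CHAR;
typedef wchar_t  WCHAR;             // historically unsigned short

typedef WCHAR        *LPWSTR;       // wide string
typedef const WCHAR  *LPCWSTR;      // const wide string
typedef const WCHAR  *LPCWCHAR;     // pointer to a const wide character

#ifdef UNICODE
typedef WCHAR TCHAR;                // LPTSTR becomes WCHAR*
#else
typedef CHAR  TCHAR;                // LPTSTR becomes char*
#endif
typedef TCHAR *LPTSTR;

#ifndef OLE2ANSI
typedef WCHAR OLECHAR;              // LPOLESTR/LPCOLESTR are wide
#else
typedef char  OLECHAR;
#endif
typedef OLECHAR       *LPOLESTR;
typedef const OLECHAR *LPCOLESTR;
typedef OLECHAR       *BSTR;        // same shape, but must come from SysAllocString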
They're actually all rather different. Or at least, they were different at some point. OLE was developed when the Windows API was still Win16, which didn't support wide strings natively at all, yet OLE needed them.
Also, early versions of the Windows SDK didn't use wchar_t for WCHAR but unsigned short. The Windows SDK on GCC gets interesting: I'm led to believe that 32-bit GCC has a 32-bit wchar_t, and on compilers with a 32-bit wchar_t, WCHAR would be defined as unsigned short or some other type that is 16 bits wide on that compiler.
LPTSTR, LPWSTR, and similar names really are just defines. BSTR and LPOLESTR have special meanings - they indicate that the string pointed to is allocated in a particular way.
The string pointed to by a BSTR must be allocated with the SysAllocString() family of functions. The string pointed to by an LPOLESTR is usually allocated with CoTaskMemAlloc() (check the documentation of the COM call that accepts or returns it).
If the allocation/deallocation requirements for strings pointed to by BSTR and LPOLESTR are violated, the program can run into undefined behaviour.
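As an illustration, here is a minimal sketch of the two allocation conventions (error handling trimmed):
#include <windows.h>
#include <oleauto.h>   // SysAllocString / SysFreeString
#include <objbase.h>   // CoTaskMemAlloc / CoTaskMemFree
#include <wchar.h>

int main()
{
    // BSTR: allocate with the SysAllocString family, release with SysFreeString
    BSTR bstr = SysAllocString(L"hello");
    // ... pass bstr to a COM method that expects a BSTR ...
    SysFreeString(bstr);

    // LPOLESTR strings returned by many COM calls are CoTaskMemAlloc'd,
    // so the caller frees them with CoTaskMemFree (always check the method's docs)
    LPOLESTR ole = static_cast<LPOLESTR>(CoTaskMemAlloc(6 * sizeof(OLECHAR)));
    if (ole) {
        wcscpy_s(ole, 6, L"hello");
        CoTaskMemFree(ole);
    }
    return 0;
}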
MSDN's page on Windows Data Types might provide clarification as to the differences between some of these data types.
LPCWSTR - Pointer to a constant null-terminated string of 16-bit Unicode characters.
LPTSTR - An LPWSTR if UNICODE is defined, an LPSTR otherwise.
Related
I was wondering why some functions have parameters that must be set to NULL because they are "reserved". For example:
LONG WINAPI RegQueryValueEx(
  __in        HKEY    hKey,
  __in_opt    LPCTSTR lpValueName,
  __reserved  LPDWORD lpReserved,
  __out_opt   LPDWORD lpType,
  __out_opt   LPBYTE  lpData,
  __inout_opt LPDWORD lpcbData
);
I can't understand why lpReserved exists. I mean, if it's reserved, why include it at all? Wouldn't it be simpler to omit it?
Thanks! :) (pay no attention to my English please..)
I see at least two reasons.
One would be that this parameter is reserved for future use and a possible extension of the functionality. Making sure it is set to NULL gives some guarantee that when new functionality is added in the future, it won't break older programs.
A second possible reason is that this parameter could actually be used internally as part of a private API, while the public part of the API dictates setting it to NULL.
Why not omit it altogether? Because it's much easier to extend the functionality of the system later without changing the interface. The function stays binary- and source-compatible with the old API and does not require rebuilding older software.
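In practice the reserved parameter is simply passed as NULL. A minimal sketch (the key path and value name here are made up for illustration):
#include <windows.h>
#include <tchar.h>

void QueryExampleValue()
{
    HKEY hKey;
    if (RegOpenKeyEx(HKEY_CURRENT_USER, _T("Software\\Example"), 0,
                     KEY_QUERY_VALUE, &hKey) != ERROR_SUCCESS)
        return;

    TCHAR buffer[256];
    DWORD type = 0;
    DWORD size = sizeof(buffer);
    // lpReserved must be NULL today; a future revision could give it a meaning
    if (RegQueryValueEx(hKey, _T("SomeValue"), NULL, &type,
                        reinterpret_cast<LPBYTE>(buffer), &size) == ERROR_SUCCESS
        && type == REG_SZ)
    {
        // buffer now holds the string data of the value
    }
    RegCloseKey(hKey);
}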
I'm in the process of trying to learn Unicode. For me the most difficult part is the encoding. Can BSTRs (Basic Strings) contain code points U+10000 or higher? If not, what's the encoding for BSTRs?
In Microsoft-speak, Unicode is generally synonymous with UTF-16 (little endian if memory serves). In the case of BSTR, the answer seems to be it depends:
On Microsoft Windows, consists of a string of Unicode characters (wide or
double-byte characters).
On Apple Power Macintosh, consists of a single-byte string.
May contain multiple embedded null characters.
So, on Windows, yes, it can contain characters outside the basic multilingual plane but these will require two 'wide' chars to store.
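A small sketch of what that means in practice, using U+1D11E (musical G clef) as an example code point outside the BMP:
#include <windows.h>
#include <oleauto.h>
#include <stdio.h>

int main()
{
    // U+1D11E lies outside the BMP; in UTF-16 it is stored as the
    // surrogate pair 0xD834 0xDD1E, i.e. two 16-bit units
    OLECHAR clef[] = { 0xD834, 0xDD1E, 0 };
    BSTR b = SysAllocString(clef);

    // SysStringLen counts WCHAR units, not code points
    printf("units: %u\n", SysStringLen(b));   // prints 2

    SysFreeString(b);
    return 0;
}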
BSTRs on Windows originally contained UCS-2, but can in principle hold the entire Unicode repertoire using surrogate pairs. UTF-16 support is actually up to the API that receives the string - the BSTR has no say in how it gets treated. Most APIs support UTF-16 by now. (Michael Kaplan sorts out the details.)
The Windows headers still contain another definition for BSTR; it's basically
#if defined(_WIN32) && !defined(OLE2ANSI)
typedef wchar_t OLECHAR;
#else
typedef char OLECHAR;
#endif
typedef OLECHAR * BSTR;
There's no real reason to consider the char case, however, unless you desperately want to be compatible with whatever this was for. (IIRC it was active - or could be activated - for early MFC builds, and might even have been used in Office for Mac or something like that.)
What are the various differences between the two symbols TCHAR and _TCHAR defined in the Windows header tchar.h? Explain with examples. Briefly describe scenarios where you would use TCHAR as opposed to _TCHAR in your code. (10 marks)
In addition to what @RussC said, TCHAR is used by the Win32 API and is based on the UNICODE define, whereas _TCHAR is used by the C runtime and is based on the _UNICODE define instead. UNICODE and _UNICODE are usually defined (or omitted) together, making TCHAR and _TCHAR interchangeable, but that is not a requirement. They are semantically separated for use by different frameworks.
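A rough sketch of the two parallel mappings (simplified; the real winnt.h and tchar.h are more involved):
typedef wchar_t WCHAR;     // as in the SDK headers (historically unsigned short)

// winnt.h side (Win32 API) -- controlled by UNICODE
#ifdef UNICODE
typedef WCHAR TCHAR;
#else
typedef char  TCHAR;
#endif

// tchar.h side (C runtime) -- controlled by _UNICODE
#ifdef _UNICODE
typedef wchar_t _TCHAR;
#define _tcslen wcslen
#else
typedef char    _TCHAR;
#define _tcslen strlen
#endif

// Normally UNICODE and _UNICODE are defined (or omitted) together,
// so TCHAR and _TCHAR end up as the same underlying type.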
Found your answer over here:
MSDN Forums >> Visual Studio Developer Center >> TCHAR vs _TCHAR
TCHAR and _TCHAR are identical, although since TCHAR doesn't have a
leading underscore, Microsoft aren't allowed to reserve it as a
keyword (imagine if you had a variable called TCHAR; think what would
happen). Hence TCHAR will not be #defined when Language Extensions are
disabled (/Za).
TCHAR is defined in winnt.h (which you'll get when you #include
<windows.h>), and also in tchar.h under /Ze.
_TCHAR is available only in tchar.h (which also #defines _TSCHAR and _TUCHAR). Those are signed and unsigned variants of the normal TCHAR data type.
I'd like to preview WCHAR strings in the variable display of the Xcode 3.2 debugger.
Basically, if I have
WCHAR wtext[128];
wcscpy(wtext, L"Hello World");
I'd like to see "Hello World" for wtext when tracing into the function.
You'll have to write a Custom Data Formatter for your wchar_t types, complete with your assumption of what the byte width is and what the character encoding is. The C++ standard does not specify either of these, which is why wchar_t and wstring are not very portable and not well supported on Mac OS X.
One example, with caveats about how you have to customize it for your particular mode of use, is here.
I need to convert a CString instance into a properly allocated BSTR and pass that BSTR into a COM method. To have code that compiles and works identically for both ANSI and Unicode builds, I use CString::AllocSysString() to convert the CString, whatever its format, to a Unicode BSTR.
Since no one owns the returned BSTR, I need to take care of it and release it after the call is done, in the most exception-safe manner possible and with as little code as possible.
Currently I use ATL::CComBSTR for lifetime management:
ATL::CComBSTR converted;
converted.Attach( sourceString.AllocSysString() ); //simply attaches to BSTR, doesn't reallocate it
interface->CallMethod( converted );
What I don't like here is that I need two separate statements just to construct the ATL::CComBSTR bound to the conversion result.
Is there a better way to accomplish the same task?
CComBSTR has overloaded constructors for both char* and wchar_t*, which make the call to SysAllocString() on your behalf. So the explicit allocation in your code snippet is actually unnecessary. The following would work just as well:
ATL::CComBSTR converted = sourceString;
interface->CallMethod(converted);
Furthermore, if you have no need to use the converted BSTR elsewhere in your code, you can perform the object construction in-place in the method call, like so:
interface->CallMethod(ATL::CComBSTR(sourceString));
The same applies to the _bstr_t class, which can be used instead of CComBSTR if you don't want a dependency on the ATL.
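For completeness, the _bstr_t version would look something like this (assuming sourceString is the CString from the question; _bstr_t lives in comutil.h and converts implicitly to BSTR):
#include <comutil.h>   // _bstr_t; link with comsuppw.lib or comsuppwd.lib

_bstr_t converted(sourceString);    // copies the string into a freshly allocated BSTR
interface->CallMethod(converted);   // _bstr_t converts to BSTR for the call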
One of the confusing aspects of Windows programming is managing the conversion of Visual Basic style strings to and from C language style strings. It isn't that it is so difficult; it is just difficult to remember the details. It is usually not done often, and the MSDN documentation is so voluminous that it is hard to find answers to your questions. The worst part is that you could perform some typecast that compiles fine but doesn't work the way you expect. This results in code that doesn't work, and the bugs are hard to track down. With some experience, you learn to make sure your string conversions are doing what you expect.
C strings are arrays of characters terminated by a NULL character. Visual Basic strings differ in that the length of the string precedes the characters in the string. So, a VB string knows its own length. In addition, all VB strings are Unicode (16 bits per character).
String Types
BSTR/C String conversions are required if:
You are doing COM programming in C/C++
You are writing multiple-language applications, such as C++ DLLs accessed by Visual Basic applications.
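As a concrete illustration, here is a minimal sketch of going from an ANSI C string to a BSTR and back; the helper names AnsiToBstr and BstrToAnsi are made up for this example, and error handling is trimmed:
#include <windows.h>
#include <oleauto.h>

// ANSI C string -> BSTR: widen with MultiByteToWideChar, then allocate with SysAllocStringLen
BSTR AnsiToBstr(const char* src)
{
    int wideLen = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);   // count includes the terminator
    BSTR result = SysAllocStringLen(NULL, wideLen - 1);               // sets the length prefix
    MultiByteToWideChar(CP_ACP, 0, src, -1, result, wideLen);
    return result;    // caller releases with SysFreeString
}

// BSTR -> ANSI C string: narrow with WideCharToMultiByte into a caller-owned buffer
char* BstrToAnsi(BSTR src)
{
    int narrowLen = WideCharToMultiByte(CP_ACP, 0, src, -1, NULL, 0, NULL, NULL);
    char* result = new char[narrowLen];
    WideCharToMultiByte(CP_ACP, 0, src, -1, result, narrowLen, NULL, NULL);
    return result;    // caller releases with delete[]
}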
One of the _bstr_t constructors allows you to simply attach to an existing BSTR, so you still get the exception you want from CString::AllocSysString when the BSTR allocation fails.
// _bstr_t simply attaches to BSTR, doesn't reallocate it
interface->CallMethod( _bstr_t(sourceString.AllocSysString(), false) );
The _bstr_t constructor documentation says:
_bstr_t(
BSTR bstr,
bool fCopy
);
fCopy
If false, the bstr argument is attached to the new object without making a copy by calling SysAllocString.
On the other hand, the CComBSTR constructor doesn't seem to have a corresponding signature, although it can be used as well if the exception on BSTR allocation failure is not really needed, as mentioned by Phil Booth in his answer.