Why does GetWindowLong have ANSI and Unicode variants? - winapi

I found out today that GetWindowLong (and GetWindowLongPtr) has 'ANSI' (A) and 'Unicode' (W) flavours, even though they don't have LPTSTR (or any other text) arguments. The MSDN page on GetWindowLong only indicates that these variants exist, but doesn't mention why.
I can imagine that it must match the encoding of CreateWindowEx (which also has A/W flavours) or RegisterClass, but for several reasons I don't think this makes sense. Apparently it matters, because someone reported that the Unicode version may fail on XP (even though XP is NT-based and, as I understand it, all Unicode under the hood). I have also tried disassembling the 32-bit version of USER32.DLL (which contains both flavours of GetWindowLong), and there is extra work done based on some apparent encoding difference*.
Which function am I supposed to choose?
*The flavours of GetWindowLong are identical, except for a boolean they pass around to other functions. This boolean is compared to a flag bit in a memory structure I can't be bothered to track down using static code analysis.

I believe the reason is explained in Raymond Chen's article, What are these strange values returned from GWLP_WNDPROC?
If the current window procedure is incompatible with the caller of GetWindowLongPtr, then the real function pointer cannot be returned since you can't call it. Instead, a "magic cookie" is returned. The sole purpose of this cookie is to be recognized by CallWindowProc so it can translate the message parameters into the format that the window procedure expects.
For example, suppose that you are running Windows XP and the window is a UNICODE window, but a component compiled as ANSI calls GetWindowLong(hwnd, GWL_WNDPROC). The raw window procedure can't be returned, because the caller is using ANSI window messages but the window procedure expects UNICODE window messages. So instead, a magic cookie is returned. When you pass this magic cookie to CallWindowProc, it recognizes it and says, in effect, "Oh, I need to convert the message from ANSI to UNICODE and then give the UNICODE message to that window procedure over there."
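The practical consequence is that whatever GetWindowLongPtr hands back for GWLP_WNDPROC must only ever be invoked through CallWindowProc, which knows how to decode the cookie. A minimal subclassing sketch under that assumption (the names are mine, not from the question):

    // Sketch: never call the previous WNDPROC directly; the value from
    // SetWindowLongPtr/GetWindowLongPtr may be a magic cookie rather than
    // a real function pointer.
    #include <windows.h>

    static WNDPROC g_prevProc;

    LRESULT CALLBACK SubclassProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        // ... observe or handle messages here ...

        // CallWindowProc recognizes magic cookies and performs any needed
        // ANSI<->Unicode translation before reaching the real procedure.
        return CallWindowProcW(g_prevProc, hwnd, msg, wp, lp);
    }

    void Subclass(HWND hwnd)
    {
        g_prevProc = (WNDPROC)SetWindowLongPtrW(hwnd, GWLP_WNDPROC,
                                                (LONG_PTR)SubclassProc);
    }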

Related

Getting the default RTL codepage in Lazarus

The Lazarus wiki states:
Lazarus (actually its LazUtils package) takes advantage of that API and changes it to UTF-8 (CP_UTF8). It also means that Windows users now use UTF-8 strings in the RTL.
In our cross-platform and cross-compiler code, we'd like to detect this specific situation. The GetACP() Windows API function still returns 1252, and so does Lazarus's GetDefaultTextEncoding() function. But the text (specifically, a filename returned by the FindFirst() function) is a UTF-8-encoded string, and the codepage of the string variable is 65001 too.
So how do we figure out that the RTL operates with UTF-8 strings by default? I've spent several hours trying to figure this out from the Lazarus source code, but I am probably missing something...
I understand that in many scenarios we need to inspect the codepage of each specific string, but I am interested in a way to find out the default RTL codepage, which is UTF-8 under Lazarus, yet the Windows-defined one under FPC on Windows without Lazarus.
It turns out that there is no single code-page variable or function. Results of filesystem API calls are converted to the codepage defined in the DefaultRTLFileSystemCodePage variable. The only problem is that this variable is present in the source code and is supposed to be in the system unit, but the compiler doesn't see it.

What is the difference between the GetMessageA and GetMessageW functions?

I am learning Windows GUI programming, and I don't know the difference between the two functions GetMessageA and GetMessageW. As far as I can see, GetMessage doesn't have any parameters that involve ANSI or Unicode.
All older Win32 calls that involve strings are actually macros that expand to either a Unicode version or an ANSI version, based upon the "Character Set" property of the project.
GetMessage(...) will map to either GetMessageA(...) or GetMessageW(...), where the "A" version will handle messages that involve strings as ANSI-formatted text and the "W" version will use UTF-16.
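In other words, the selection happens at compile time in the headers. Roughly (a simplified sketch; the real <winuser.h> declarations are more involved):

    // Simplified sketch of the <winuser.h> mechanism: the generic name is
    // a macro chosen by whether the project defines UNICODE.
    #ifdef UNICODE
    #define GetMessage  GetMessageW   // strings in messages are UTF-16
    #else
    #define GetMessage  GetMessageA   // strings in messages are ANSI
    #endif

Code that uses GetMessage compiles against one or the other; you can also call GetMessageA or GetMessageW explicitly to pick a variant regardless of the project setting.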

Can mvprintw() and other curses functions work with the usual ASCII codes?

I've developed a little console C++ game that uses ASCII graphics, using cout for the moment. But because I want to make things work better, I have to use PDCurses. The thing is, curses functions like printw() or mvprintw() don't use the regular ASCII codes, and for this game I really need the smiley characters, hearts, spades and so on.
Is there a way to make curses work with the regular ASCII codes?
You shouldn't think of characters like the smiley face as "regular ASCII codes", because they really aren't ASCII at all. (ASCII only covers characters 32-127, plus a handful of control codes under 32.) They're a special case, and the only reason you're able to see them in (I assume?) your Windows CMD shell is that it's maintaining backwards compatibility with IBM Code Page 437 (or similar) from ancient DOS systems. Meanwhile, outside of the DOS box, Windows uses a completely different mapping, Windows-1252 (a modified version of ISO-8859-1), or similar, for its 8-bit, so-called "ANSI" character set. But both of these types of character sets are obsolete, compared to Unicode. Confused yet? :)
With curses, your best bet is to use pure ASCII, plus the defined ACS_* macros, wherever possible. That will be portable. But it won't get you a smiley face. With PDCurses, there are a couple of ways to get that smiley face: If you can safely assume that your console is using an appropriate code page, then you can pass the A_ALTCHARSET attribute, or'ed with the character, to addch(); or you can use addrawch(); or you can call raw_output(TRUE) before printing the character. (Those are all roughly equivalent.) Alternatively, you can use the "wide" build of PDCurses, figure out the Unicode equivalents of the CP437 characters, and print those, instead. (That approach is also portable, although it's questionable whether the characters will be present on non-PCs.)
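As a rough sketch of the A_ALTCHARSET approach (assuming the console really is using CP437 or similar, where 0x01 is the smiley face):

    /* Sketch: print the CP437 smiley (code 0x01) with PDCurses, assuming
       an appropriate console code page; A_ALTCHARSET bypasses the usual
       control-code interpretation. */
    #include <curses.h>

    int main(void)
    {
        initscr();
        addch(0x01 | A_ALTCHARSET);
        refresh();
        getch();            /* wait for a key so the output stays visible */
        endwin();
        return 0;
    }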

How does Windows wchar_t handle Unicode characters outside the basic multilingual plane?

I've looked at a number of other posts here and elsewhere (see below), but I still don't have a clear answer to this question: how does the Windows wchar_t handle Unicode characters outside the Basic Multilingual Plane?
That is:
Many programmers seem to feel that UTF-16 is harmful because it is a variable-length encoding.
wchar_t is 16 bits wide on Windows, but 32 bits wide on Unix/macOS.
The Windows APIs use wide characters, not Unicode.
So what does Windows do when you want to encode a character like 𠂊 (U+2008A, a Han character) on Windows?
The implementation of wchar_t under the Windows stdlib is UTF-16-oblivious: it knows only about 16-bit code units.
So you can put a UTF-16 surrogate sequence in a string, and you can choose to treat that as a single character using higher level processing. The string implementation won't do anything to help you, nor to hinder you; it will let you include any sequence of code units in your string, even ones that would be invalid when interpreted as UTF-16.
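A small illustration of that obliviousness (assuming a Windows toolchain, where wchar_t is 16 bits): a single non-BMP character such as U+2008A occupies two code units, and wcslen counts it as two.

    // Sketch: on Windows, one character outside the BMP is two wchar_t
    // code units (a UTF-16 surrogate pair).
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        const wchar_t *s = L"\U0002008A";   // stored as L"\xD840\xDC8A"
        printf("%zu\n", wcslen(s));         // prints 2, not 1
        printf("%04X %04X\n", (unsigned)s[0], (unsigned)s[1]); // D840 DC8A
        return 0;
    }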
Many of the higher-level features of Windows do support characters made out of UTF-16 surrogates, which is why you can call a file 𐐀.txt and see it both render correctly and edit correctly (taking a single keypress, not two, to move past the character) in programs like Explorer that support complex text layout (typically using Windows's Uniscribe library).
But there are still places where you can see the UTF-16-obliviousness shining through, such as the fact you can create a file called 𐐀.txt in the same folder as 𐐨.txt, where case-insensitivity would otherwise disallow it, or the fact that you can create [U+DC01][U+D801].txt programmatically.
This is how pedants can have a nice long and basically meaningless argument about whether Windows “supports” UTF-16 strings or only UCS-2.
Windows used to use UCS-2 but adopted UTF-16 with Windows 2000. Windows wchar_t APIs now produce and consume UTF-16.
Not all third party programs handle this correctly and so may be buggy with data outside the BMP.
Also, note that UTF-16, being a variable-length encoding, does not conform to the C or C++ requirements for an encoding used with wchar_t. This causes some problems: some standard functions that take a single wchar_t, such as wctomb, can't handle characters beyond the BMP on Windows, and Windows defines some additional functions that use a wider type in order to be able to handle single characters outside the BMP. I forget which function it was, but I ran into a Windows function that returned int instead of wchar_t (and it wasn't one where EOF was a possible result).

Exhaustive lists of all possible errors for various Windows API calls?

Take CreateFile, for example. When I get INVALID_HANDLE_VALUE, what are all the possible values that can be returned by GetLastError? MSDN doesn't say. It mentions some, and I can guess others, but how (if at all) can I be sure that my switch statement will never reach default?
Such a list doesn't exist, and in fact you can't ever have such a list: in some future version of Windows, a function may well start returning an error code that did not exist when you compiled your program.
The standard way to deal with this is handle any error codes that you know about that need special treatment, and let all others fall through to a default handler. Call FormatMessage() to get a descriptive text string for the error.
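A hedged sketch of that pattern (the specific cases shown are just examples, not an exhaustive list):

    #include <windows.h>
    #include <stdio.h>

    // Handle the codes we know need special treatment; everything else
    // falls through to a generic handler that asks the system for text.
    void ReportCreateFileError(DWORD err)
    {
        switch (err) {
        case ERROR_FILE_NOT_FOUND:
            /* special handling */
            break;
        case ERROR_ACCESS_DENIED:
            /* special handling */
            break;
        default: {
            wchar_t msg[512];
            if (FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM |
                               FORMAT_MESSAGE_IGNORE_INSERTS,
                               NULL, err, 0, msg, 512, NULL))
                wprintf(L"error %lu: %ls", err, msg); // msg ends with \r\n
            else
                wprintf(L"error %lu (no description available)\n", err);
            break;
        }
        }
    }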
