I am learning Windows GUI programming, and I don't understand the difference between the two functions GetMessageA and GetMessageW. I can see that the GetMessage function doesn't take any parameters related to ANSI or Unicode.
All older Win32 calls that involve strings are actually macros that expand to either a Unicode version or an ANSI version, based upon the "Character Set" property of the project.
GetMessage(...) will map to either GetMessageA(...) or GetMessageW(...), where the "A" version handles messages that involve strings as ANSI-formatted text and the "W" version uses UTF-16.
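Roughly, the dispatch in <winuser.h> looks like this simplified sketch (not the verbatim header contents):

    // Simplified sketch of the <winuser.h> pattern, keyed off the UNICODE macro:
    #ifdef UNICODE
    #define GetMessage  GetMessageW   // string-carrying messages use UTF-16 text
    #else
    #define GetMessage  GetMessageA   // string-carrying messages use ANSI text
    #endif

The same A/W pair plus a suffix-less macro exists for nearly every string-taking Win32 function.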
The Lazarus Wiki states:

"Lazarus (actually its LazUtils package) takes advantage of that API and changes it to UTF-8 (CP_UTF8). It means also Windows users now use UTF-8 strings in the RTL."
In our cross-platform and cross-compiler code, we'd like to detect this specific situation. The GetACP() Windows API function still returns 1252, and so does the GetDefaultTextEncoding() function in Lazarus. But the text (specifically, the filename returned by the FindFirst() function) contains a UTF-8-encoded filename, and the codepage of the string variable is 65001.
So how do we figure out that the RTL operates with UTF-8 strings by default? I've spent several hours trying to work this out from the Lazarus source code, but I am probably missing something...
I understand that in many scenarios we need to inspect the codepage of each specific string, but I am interested in a way to find out the default RTL codepage: the one that is UTF-8 in Lazarus, yet the Windows-defined one in FPC on Windows without Lazarus.
It turns out that there's no single codepage variable or function. The results of filesystem API calls are converted to the codepage defined in the DefaultRTLFileSystemCodePage variable. The only problem is that this variable is present in the source code and is supposed to be in the system unit, but the compiler doesn't see it.
I've developed a little console C++ game that uses ASCII graphics, using cout for the moment. But because I want to make things work better, I have to use PDCurses. The thing is, curses functions like printw() or mvprintw() don't use the regular ASCII codes, and for this game I really need to use the smiley characters, hearts, spades, and so on.
Is there a way to make curses work with the regular ASCII codes?
You shouldn't think of characters like the smiley face as "regular ASCII codes", because they really aren't ASCII at all. (ASCII only covers characters 32-127, plus a handful of control codes under 32.) They're a special case, and the only reason you're able to see them in (I assume?) your Windows CMD shell is that it's maintaining backwards compatibility with IBM Code Page 437 (or similar) from ancient DOS systems. Meanwhile, outside of the DOS box, Windows uses a completely different mapping, Windows-1252 (a modified version of ISO-8859-1), or similar, for its 8-bit, so-called "ANSI" character set. But both of these types of character sets are obsolete, compared to Unicode. Confused yet? :)
With curses, your best bet is to use pure ASCII, plus the defined ACS_* macros, wherever possible. That will be portable. But it won't get you a smiley face. With PDCurses, there are a couple of ways to get that smiley face: If you can safely assume that your console is using an appropriate code page, then you can pass the A_ALTCHARSET attribute, or'ed with the character, to addch(); or you can use addrawch(); or you can call raw_output(TRUE) before printing the character. (Those are all roughly equivalent.) Alternatively, you can use the "wide" build of PDCurses, figure out the Unicode equivalents of the CP437 characters, and print those, instead. (That approach is also portable, although it's questionable whether the characters will be present on non-PCs.)
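To make that concrete, here is a minimal sketch of those three PDCurses approaches, assuming the console really is using code page 437 (where byte 0x01 is the smiley):

    #include <curses.h>   // PDCurses

    int main(void) {
        initscr();
        // 1. OR the raw CP437 byte with A_ALTCHARSET so addch() doesn't
        //    interpret it as a control character:
        addch(0x01 | A_ALTCHARSET);
        // 2. addrawch() (a PDCurses extension) skips the translation entirely:
        addrawch(0x01);
        // 3. raw_output(TRUE) makes all subsequent output raw:
        raw_output(TRUE);
        addch(0x01);
        refresh();
        getch();
        endwin();
        return 0;
    }

All three lines should put the same smiley glyph on screen; which one you prefer is mostly a matter of how much of your output needs the raw treatment.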
I have an inquiry about the "Character set" option in Visual Studio. The Character Set options are:
Not Set
Use Unicode Character Set
Use Multi-Byte Character Set
I want to know: what is the difference between these three Character Set options?
Also, if I choose one of them, will it affect support for languages other than English (like RTL languages)?
It is a compatibility setting, intended for legacy code that was written for old versions of Windows that were not Unicode enabled: the versions in the Windows 9x family, of which Windows ME was the last and widely ignored one. With "Not Set" or "Use Multi-Byte Character Set" selected, all Windows API functions that take a string as an argument are redefined to a little compatibility helper function that translates char* strings to wchar_t* strings, the API's native string type.
Such code critically depends on the system's default code page setting. The code page maps 8-bit characters to Unicode code points, which in turn select the font glyphs. Your program will only produce correct text when the machine that runs your code has the correct code page; characters with values >= 128 will be rendered wrong if the code page doesn't match.
Always select "Use Unicode Character Set" for modern code. Especially when you want to support languages with a right-to-left layout and you don't have an Arabic or Hebrew code page selected on your dev machine. Use std::wstring or wchar_t[] in your code. Getting actual RTL layout requires turning on the WS_EX_RTLREADING style flag in the CreateWindowEx() call.
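For illustration, here is a minimal sketch of a wide-character window with right-to-left reading order; the class name and caption are just placeholders, and a UNICODE build is assumed:

    #include <windows.h>

    int WINAPI wWinMain(HINSTANCE hInst, HINSTANCE, PWSTR, int nCmdShow) {
        WNDCLASSW wc = {};
        wc.lpfnWndProc   = DefWindowProcW;
        wc.hInstance     = hInst;
        wc.lpszClassName = L"RtlDemo";       // illustrative class name
        RegisterClassW(&wc);

        HWND hwnd = CreateWindowExW(
            WS_EX_RTLREADING,                // right-to-left reading order
            L"RtlDemo",
            L"\u0645\u0631\u062D\u0628\u0627",  // Arabic caption as UTF-16
            WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT,
            640, 480, nullptr, nullptr, hInst, nullptr);
        ShowWindow(hwnd, nCmdShow);

        MSG msg;
        while (GetMessageW(&msg, nullptr, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessageW(&msg);
        }
        return 0;
    }

Note that the caption renders correctly regardless of the machine's "ANSI" code page, because it never passes through an 8-bit encoding at all.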
Hans has already answered the question, but I found these settings to have curious names. (What exactly is not being set, and why do the other two options sound so similar?) Regarding that:
"Unicode" here is Microsoft-speak for UCS-2 encoding in particular. This is the recommended and non-codepage-dependent described by Hans. There is a corresponding C++ #define flag called _UNICODE.
"Multi-Byte Character Set" (aka MBCS) here the official Microsoft phrase for describing their former international text-encoding scheme. As Hans described, there are different MBCS codepages describing different languages. The encodings are "multi-byte" in that some or all characters may be represented by multiple bytes. (Some codepages use a variable-length encoding akin to UTF-8.) Your typical codepage will still represent all the ASCII characters as one-byte each. There is a corresponding C++ #define flag called _MBCS
"Not set" apparently refers to compiling with_UNICODE nor _MBCS being #defined. In this case Windows works with a strict one-byte per character encoding. (Once again there are several different codepages available in this case.)
Difference between MBCS and UTF-8 on Windows goes into these issues in a lot more detail.
I found out today that GetWindowLong (and GetWindowLongPtr) has 'ANSI' (A) and 'Unicode' (W) flavours, even though they don't have TSTR arguments. The MSDN page on GetWindowLong only indicates that these variants exist, but doesn't mention why.
I can imagine that it must match the encoding of CreateWindowEx (which also has A/W flavours) or RegisterClass, but for many reasons, I don't think this makes sense. Apparently, it matters, because someone reported that the Unicode version may fail on XP (even though XP is NT and, as I understand it, all Unicode under the hood). I have also tried to disassemble the 32-bit version of USER32.DLL (which contains both flavours of GetWindowLong), and there is extra work done based on some apparent encoding difference*.
Which function am I supposed to choose?
*The flavours of GetWindowLong are identical, except for a boolean they pass around to other functions. This boolean is compared to a flag bit in a memory structure I can't be bothered to track down using static code analysis.
I believe the reason is explained in Raymond Chen's article, What are these strange values returned from GWLP_WNDPROC?
If the current window procedure is incompatible with the caller of GetWindowLongPtr, then the real function pointer cannot be returned since you can't call it. Instead, a "magic cookie" is returned. The sole purpose of this cookie is to be recognized by CallWindowProc so it can translate the message parameters into the format that the window procedure expects.
For example, suppose that you are running Windows XP and the window is a UNICODE window, but a component compiled as ANSI calls GetWindowLong(hwnd, GWL_WNDPROC). The raw window procedure can't be returned, because the caller is using ANSI window messages, but the window procedure expects UNICODE window messages. So instead, a magic cookie is returned. When you pass this magic cookie to CallWindowProc, it recognizes it and, in effect, says: "Oh, I need to convert the message from ANSI to UNICODE and then give the UNICODE message to that window procedure over there."
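This is why subclassing code must treat the returned value as opaque and only ever invoke it through CallWindowProc, as in this sketch (the names here are illustrative):

    #include <windows.h>

    static WNDPROC g_prevProc;   // may hold a real pointer or a magic cookie

    LRESULT CALLBACK SubclassProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp) {
        // ... inspect or filter messages here ...
        // CallWindowProc understands magic cookies; calling g_prevProc
        // directly would crash whenever a cookie was returned.
        return CallWindowProcW(g_prevProc, hwnd, msg, wp, lp);
    }

    void Subclass(HWND hwnd) {
        g_prevProc = (WNDPROC)SetWindowLongPtrW(hwnd, GWLP_WNDPROC,
                                                (LONG_PTR)SubclassProc);
    }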
What is the difference between these six functions?
LoadLibrary
LoadLibraryA
LoadLibraryEx
LoadLibraryExA
LoadLibraryExW
LoadLibraryW
What is the meaning of each suffix in the WinAPI, and what is the difference between all of these functions?
LoadLibrary and LoadLibraryEx are macros which are defined depending on whether your project is compiled with Unicode support. If so, they point to LoadLibraryW and LoadLibraryExW; otherwise they point to LoadLibraryA and LoadLibraryExA.
Typically, you are expected to write code using the versions without the A or W suffix and let the compiler definitions do the magic for you.
The Ex suffix is a standard way of denoting an "EXtended" function: one that is similar to the regular version, but provides additional functionality. Generally, they were added in a newer version of Windows and may not always be available (although most of them are so old now that they were added back in Windows 3.1 or 95).
The exact difference between functions, as mentioned before, should always be checked on MSDN.
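A short sketch of how the six names relate in practice (the module name and flag here are just examples):

    #include <windows.h>

    void Demo(void) {
        HMODULE a = LoadLibraryA("user32.dll");       // explicit ANSI path
        HMODULE w = LoadLibraryW(L"user32.dll");      // explicit wide path
        HMODULE m = LoadLibrary(TEXT("user32.dll"));  // macro: A or W per UNICODE
        HMODULE x = LoadLibraryExW(L"user32.dll", NULL,        // Ex adds flags,
                                   LOAD_LIBRARY_AS_DATAFILE);  // e.g. data only
        if (a) FreeLibrary(a);
        if (w) FreeLibrary(w);
        if (m) FreeLibrary(m);
        if (x) FreeLibrary(x);
    }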
A means ANSI; W means Wide (Unicode).
The A versions do not support Unicode strings; they're relics from Win9X.
The suffix-less version will expand to the A or W versions at compile-time, depending on whether the symbol UNICODE is defined.
The Ex versions are newer versions of the API method with additional functionality; consult the documentation for more details.
A - ANSI
W - Unicode (wide)
Ex - extended version of the same function, e.g. with some additional parameters