How can I programmatically determine the current default codepage of Windows? - winapi

I have to convert a string output by a VB6 application to a specific encoding.
The problem is that I don't know the encoding of the string, for the following reason:
According to the VB6 documentation, when certain API functions are accessed, the internal Unicode strings are converted to ANSI strings using the default codepage of Windows.
Because of that, the encoding of the string output can differ from system to system, but I have to know it to perform the conversion.
How can I read the default codepage using the Win32 API or, if there's no other way, by reading the registry?

It can be even more succinct using GetACP - the Win32 API call that returns the default code page! (The default code page is often called the "ANSI" code page.)
UINT nCodePage = GetACP();
Also, many API calls (such as MultiByteToWideChar) accept the constant value CP_ACP (zero), which always means "use the system ANSI code page". So you may not actually need to know the current code page, depending on what you want to do with it.
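For instance, here is a minimal sketch of converting the ANSI output of the VB6 application to UTF-16 without ever asking for the code page number (AnsiToWide is a hypothetical helper name, not part of any API):

#include <windows.h>
#include <string>

std::wstring AnsiToWide(const char* ansiText)
{
    // First call: ask how many wide characters are needed.
    // (-1 means "ansiText is NUL-terminated"; the count includes the NUL.)
    int cchWide = MultiByteToWideChar(CP_ACP, 0, ansiText, -1, NULL, 0);
    if (cchWide == 0)
        return std::wstring();

    // Second call: do the conversion using the system default ANSI code page.
    std::wstring wide(cchWide, L'\0');
    MultiByteToWideChar(CP_ACP, 0, ansiText, -1, &wide[0], cchWide);
    wide.resize(cchWide - 1);  // drop the terminating NUL written into the buffer
    return wide;
}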

GetSystemDefaultLCID() gives you the system locale.
If the LCID is not enough and you truly need the codepage, use this code:
TCHAR szCodePage[10];
int cch = GetLocaleInfo(
    GetSystemDefaultLCID(),       // or any LCID you may be interested in
    LOCALE_IDEFAULTANSICODEPAGE,  // the ANSI code page of that locale, as a string
    szCodePage,
    _countof(szCodePage));        // _countof, since plain "countof" is not a standard macro
int nCodePage = cch > 0 ? _ttoi(szCodePage) : 0;

That worked for me, thanks, but it can be written more succinctly as:
UINT nCodePage = CP_ACP;
const int cch = ::GetLocaleInfo(LOCALE_SYSTEM_DEFAULT,
    LOCALE_RETURN_NUMBER | LOCALE_IDEFAULTANSICODEPAGE,
    (LPTSTR)&nCodePage,                   // with LOCALE_RETURN_NUMBER the buffer receives a DWORD
    sizeof(nCodePage) / sizeof(_TCHAR));  // buffer size in TCHARs, as the API expects

Convert AnsiString to UnicodeString in Lazarus with FreePascal

I found similar topics here but none of them had the solution to my question, so I am asking it in a new thread.
A couple of days ago, I changed the format in which the preferences of an application I am developing are saved, from INI to JSON.
I use the jsonConf unit for this.
A sample of the code I use to save a key-value pair in the file looks like this:
procedure TMyClass.SaveSettings();
var
  c: TJSONConfig;
begin
  c := TJSONConfig.Create(nil);
  try
    c.Filename := m_settingsFilePath;
    c.SetValue('/Systems/CustomName', m_customName);
  finally
    c.Free;
  end;
end;
In my code, m_customName is a variable of type AnsiString. The TJSONConfig.SetValue procedure requires both the key and the value to be of type UnicodeString. The application compiles fine, but I get warnings such as:
Warning: Implicit string type conversion from "AnsiString" to "UnicodeString"
Some of the messages also warn that there is potential data loss.
Of course I could go and change everything to UnicodeString, but that is too risky. I haven't seen any issues so far from ignoring these warnings, but they show up all the time and might cause issues on a different PC.
How do I fix this?
To avoid the warning, do an explicit conversion; this way you tell the compiler that you know what you are doing (I hope...). In the case of c.SetValue the expected type is UnicodeString (UTF-16). m_customName should be declared as a plain string unless there is a good reason to do otherwise (see below); anything else may trigger unwanted internal conversions.
A string in Lazarus is UTF-8 encoded by default. Therefore, you can use the function UTF8Decode() for the conversion from UTF-8 to Unicode, or UTF8ToUTF16() (unit LazUTF8).
var
  c: TJSONConfig;
  m_customName: String;
...
  c.SetValue('/Systems/CustomName', UTF8Decode(m_customName));
You say above that the key-value pairs are in a file. Then the conversion depends on the encoding of the file. Normally I open the file in a good text editor and find the encoding somewhere - Notepad++, for example, displays the name of the encoding at the right end of the status bar. Suppose the encoding is that of codepage 1252 (Latin-1). These are ansistrings; therefore, you can declare the strings read from the file as ansistring. Because UTF-8 strings are so common in Lazarus, there is no direct conversion from ansistring to Unicode, and you must convert to UTF-8 first. In the unit LConvEncoding you will find many conversion routines between various encodings. Select CP1252ToUTF8() to go to UTF-8, and then apply UTF8Decode() to finally get Unicode.
var
  c: TJSONConfig;
  m_customName: ansistring;
...
  c.SetValue('/Systems/CustomName', UTF8Decode(CP1252ToUTF8(m_customName)));
The FreePascal 3.0 compiler can handle many of these conversions automatically using strings with predefined encodings. But I think explicit conversions make it very clear what is happening. And FPC 3.0 still emits the warnings which you want to avoid...

Predefined Windows icons: Unicode

I am assigning to the lpszIcon member of the MSGBOXPARAMSW structure (notice the W). I want to use one of the predefined icons like IDI_APPLICATION or IDI_WARNING, but they are all ASCII (defined as MAKEINTRESOURCE). I tried doing this:
MSGBOXPARAMSW mbp = { 0 };
mbp.lpszIcon = (LPCWSTR) IDI_ERROR;
but then no icon was displayed at all. So how can I use the Unicode versions of the IDI_ icons?
There is no ANSI or Unicode variant of a numeric resource ID. The code that you use to set lpszIcon is correct. It is idiomatic to use the MAKEINTRESOURCE macro rather than a cast, but the cast has identical meaning. Your problem lies in the other code, the code that we cannot see.
Reading between the lines, I think that you are targeting ANSI or MBCS. You tried to use MAKEINTRESOURCE but that expands to MAKEINTRESOURCEA. That's what led you to cast. You should have used MAKEINTRESOURCEW to match MSGBOXPARAMSW. That would have resolved the compilation error you encountered. You could equally have changed the project to target UNICODE.
But none of that explains why the icon does not appear in the dialog. There has to be a problem elsewhere. If the dialog appears then the most likely explanation is that you have set hInstance to a value other than NULL. But the code to set lpszIcon is correct, albeit not idiomatic.
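To illustrate the point about hInstance, here is a minimal sketch of a call that displays the predefined error icon (the caption and message text are placeholders; the essential details are the NULL hInstance, the MB_USERICON style, and the numeric resource ID in lpszIcon):

#include <windows.h>

void ShowErrorBox()
{
    MSGBOXPARAMSW mbp = { 0 };
    mbp.cbSize = sizeof(mbp);
    mbp.hwndOwner = NULL;
    mbp.hInstance = NULL;                        // NULL selects the system's predefined icons
    mbp.lpszText = L"Something went wrong.";     // placeholder text
    mbp.lpszCaption = L"Error";                  // placeholder caption
    mbp.dwStyle = MB_OK | MB_USERICON;           // MB_USERICON makes the API honor lpszIcon
    mbp.lpszIcon = MAKEINTRESOURCEW(IDI_ERROR);  // numeric resource ID; no A/W variant exists
    MessageBoxIndirectW(&mbp);
}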

How do format a date/time/number/currency in another locale?

How do I format something for another locale in Windows?
For example, in managed C# code, I would try to render a DateTime using the en-US locale with:
String s = DateTime.Now.ToString(CultureInfo.CreateSpecificCulture("en-US"));
TextRenderer.DrawText(
    e.Graphics, s, SystemFonts.IconTitleFont,
    new Point(16, 16), SystemColors.ControlText);
And that works fine when my computer's locale is en-US.
It even works fine when my computer's locale is de-DE.
But it completely falls apart when my computer's locale is ps-AF.
Note: My sample code is in .NET, but can also be native.
Update: Attempting to set System.Threading.Thread.CurrentThread.CurrentCulture to en-US before calling DrawText:
var oldCulture = System.Threading.Thread.CurrentThread.CurrentCulture;
System.Threading.Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");
try
{
    // String s = DateTime.Now.ToString(CultureInfo.CreateSpecificCulture("en-US"));
    String s = DateTime.Now.ToString();
    TextRenderer.DrawText(e.Graphics, s, SystemFonts.IconTitleFont, new Point(16, 16), SystemColors.ControlText);
}
finally
{
    System.Threading.Thread.CurrentThread.CurrentCulture = oldCulture;
}
No help.
Nine, no help
Jack, no help
Eight, possible straight
King, possible flush
Ace, no help
Six, possible straight
Dave of love for the dealer
Ace bets.
Update Two:
From Michael Kaplan's blog entry:
Sometimes, GDI respects users (even if no one else does!)
GDI doesn't give a crap about formatting or really anything related to locales, with one single exception:
Digit Substitution
Any time you go to render text it will grab those digit substitution settings in the user locale (including the user override information) and use the info to decide how to display numbers.
And there is no way to override those settings at the level where GDI uses them.
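To see which digit-substitution value GDI will pick up for the current user, you can query it directly. A minimal sketch using GetLocaleInfoEx (Vista and later); the value meanings are taken from the LOCALE_IDIGITSUBSTITUTION documentation:

#include <windows.h>

DWORD GetUserDigitSubstitution()
{
    DWORD value = 0;
    // LOCALE_RETURN_NUMBER makes the API write a DWORD instead of a string.
    GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT,
                    LOCALE_IDIGITSUBSTITUTION | LOCALE_RETURN_NUMBER,
                    (LPWSTR)&value,
                    sizeof(value) / sizeof(wchar_t));
    return value;  // 0 = context-based, 1 = none (always 0123456789), 2 = native digits
}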
I wonder how Chrome manages it. When I write digits here, in the Stack Overflow question, Chrome renders them using Latin digits:
0123456789
What you are seeing is due to the digit substitution that occurs when your system's locale is ps-AF.
I believe that's OK -- users of such a locale are used to seeing digits presented this way.
Normally the way this is done is slightly different, see here for example, but I don't actually think this should make any difference:
String s = DateTime.Now.ToString(new CultureInfo("en-US"));
An alternative is to set the current thread's culture to your desired locale.
I.e. do this:
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
And you can then replace the first line of your code with this:
String s = DateTime.Now.ToString();
I am not quite sure, but I believe that this would solve the digit substitution issue, as DrawText would now be based on the en-US culture rather than ps-AF.
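Since the question notes that native code is also acceptable: on the Win32 side, the NLS Ex functions (Vista and later) take an explicit locale name, so the formatting does not depend on the thread or user locale at all. A minimal sketch, assuming you want today's date and a currency value rendered as en-US:

#include <windows.h>
#include <wchar.h>

void DrawEnUSFormatted(HDC hdc)
{
    wchar_t date[80];
    // Format the current date (NULL SYSTEMTIME) using the en-US long date format.
    GetDateFormatEx(L"en-US", DATE_LONGDATE, NULL, NULL, date, ARRAYSIZE(date), NULL);

    wchar_t money[80];
    // Format a numeric string as en-US currency ("$1,234.56").
    GetCurrencyFormatEx(L"en-US", 0, L"1234.56", NULL, money, ARRAYSIZE(money));

    TextOutW(hdc, 16, 16, date, (int)wcslen(date));
    TextOutW(hdc, 16, 40, money, (int)wcslen(money));
}

Note that, per the update above, GDI may still apply digit substitution when the text is actually drawn, regardless of how the string was formatted.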

convert case of wide characters, given the LCID (Visual C++)

I have some existing Visual C++ code where I need to add the conversion of wide character strings to upper or lower case.
I know there are pitfalls to this (such as the Turkish "I"), but most of these can be ironed out if you know the language. Fortunately, in this area of the code I know the LCID value (locale ID), which I guess is the same as knowing the language.
As LCID is a Windows type, is there a Windows function that will convert wide strings to upper or lower case?
The C runtime function _towupper_l() sounds like it would be ideal but it takes a _locale_t parameter instead of LCID, so I guess it's unsuitable unless there is a completely reliable way of converting an LCID to a _locale_t.
The function you're searching for is called LCMapString and it is part of the Windows NLS APIs. The LCMAP_UPPERCASE flag maps characters to uppercase, while LCMAP_LOWERCASE maps characters to lowercase.
For applications targeting Windows Vista and later, there is an Ex variant (LCMapStringEx) that works on locale names instead of locale identifiers; locale names are what Microsoft now says you should prefer to use.
In fact, in the CRT implementation provided with VS 2010 (and presumably other versions as well), functions such as _towupper_l ultimately end up calling LCMapString after they extract the locale ID (LCID) from the specified _locale_t.
If you're like me, and less familiar with the i18n APIs than you should be, you probably already know about the CharUpper, CharLower, CharUpperBuff, and CharLowerBuff family of functions. These have been the old standbys from the early days of Windows for altering the case of chars/strings, but as their documentation warns:
Note that CharXxx always maps uppercase I to lowercase I ("i"), even when the current language is Turkish or Azeri. If you need a function that is linguistically sensitive in this respect, call LCMapString.
What it neglects to mention is filled in by a couple of posts on Michael Kaplan's wonderful blog on internationalization issues: What does "linguistic casing" mean?, How best to alter case. The executive summary is that you achieve the same results as the CharXxx family of functions by calling LCMapString and not specifying the LCMAP_LINGUISTIC_CASING flag, whereas you can be linguistically sensitive by ensuring that you do specify the LCMAP_LINGUISTIC_CASING flag.
Sample code:
std::wstring test(L"Does my code pass the Turkey test?");  // note the L prefix: a wide-string literal
if (!LCMapStringW(lcid,                 /* your LCID, defined elsewhere */
                  LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                  test.c_str(),         /* input string */
                  (int)test.length(),   /* length of input string */
                  &test[0],             /* output buffer (in-place is allowed for case mapping) */
                  (int)test.length()))  /* length of output buffer (same as input) */
{
    // Uh-oh! Something went wrong in the call to LCMapString, so you need to
    // handle the error somehow here.
    // A good start is calling GetLastError to determine the error code.
}
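For completeness, a sketch of the same call using the locale-name-based LCMapStringEx mentioned above (the "tr-TR" locale name is only an example):

std::wstring test2(L"Does my code pass the Turkey test?");
if (!LCMapStringEx(L"tr-TR",            // a locale name instead of an LCID
                   LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                   test2.c_str(),
                   (int)test2.length(),
                   &test2[0],
                   (int)test2.length(),
                   NULL, NULL, 0))      // version info, reserved, sort handle: unused
{
    // Handle the failure; GetLastError gives the error code.
}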

Hiding an entry (or a "folder") in the registry

I'm trying to hide some values in the registry (such as serial numbers) with C++/Windows,
so I've been looking at this article http://www.codeproject.com/KB/system/NtRegistry.aspx
which says:
How is this possible? The answer is that a name which is counted as a Unicode string can explicitly include NULL characters (0) as part of the name. For example, "Key\0". To include the NULL at the end, the length of the Unicode string is specified as 4. There is absolutely no way to specify this name using the Win32 API, since if "Key\0" is passed as a name, the API will determine that the name is "Key" (3 characters in length) because the "\0" indicates the end of the name. When a key (or any other object with a name, such as a named Event, Semaphore, or Mutex) is created with such a name, any application using the Win32 API will be unable to open the name, even though it might seem to see it.
so I tried doing something similar:
HKEY keyHandle;
unsigned long status = 0;
wchar_t *wKeyName = new wchar_t[m_keyLength];
MultiByteToWideChar(CP_ACP, 0, m_keyName, m_keyLength, wKeyName, m_keyLength);
wKeyName[18] = L'\0';  // embed a NUL in the middle of the name
long result = RegCreateKeyExW(HKEY_LOCAL_MACHINE,
                              wKeyName,
                              0,
                              NULL,
                              0,
                              KEY_ALL_ACCESS,
                              NULL,
                              &keyHandle,
                              &status);
where m_keyName is the ASCII text and wKeyName is the wide-character text. But in regedit I see that it is treated the same, and the key is just cut where I put the '\0'.
What is wrong with it?
The problem is that you are using the Win32 API and not the NT Native API. There is a table about halfway through the article you referenced that contains the list of Native APIs. For example, you would use NtCreateKey or ZwCreateKey instead of RegCreateKeyExW. The Win32 API assumes that all strings are terminated by a NUL character, whereas the Native API counterparts use a UNICODE_STRING structure for the name.
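Here is a minimal sketch of what that looks like in practice, assuming you resolve NtCreateKey from ntdll.dll at runtime and already hold an open handle to the parent key (obtained elsewhere, for example via NtOpenKey on a \Registry\Machine\... path):

#include <windows.h>
#include <winternl.h>

#ifndef NT_SUCCESS
#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)
#endif

typedef NTSTATUS (NTAPI *NtCreateKey_t)(
    PHANDLE KeyHandle, ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes,
    ULONG TitleIndex, PUNICODE_STRING Class, ULONG CreateOptions, PULONG Disposition);

HANDLE CreateKeyWithEmbeddedNul(HANDLE parentKey)
{
    NtCreateKey_t pNtCreateKey = (NtCreateKey_t)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtCreateKey");
    if (!pNtCreateKey)
        return NULL;

    // The name "Key\0": four characters, the last of which is an embedded NUL.
    wchar_t nameChars[] = { L'K', L'e', L'y', L'\0' };
    UNICODE_STRING name;
    name.Buffer = nameChars;
    name.Length = sizeof(nameChars);         // counted length in bytes, INCLUDING the NUL
    name.MaximumLength = sizeof(nameChars);

    OBJECT_ATTRIBUTES oa;
    InitializeObjectAttributes(&oa, &name, OBJ_CASE_INSENSITIVE, parentKey, NULL);

    HANDLE hKey = NULL;
    ULONG disposition = 0;
    NTSTATUS status = pNtCreateKey(&hKey, KEY_ALL_ACCESS, &oa, 0, NULL, 0, &disposition);
    return NT_SUCCESS(status) ? hKey : NULL;
}

Win32-based tools such as regedit will then see the key but be unable to open it, which is exactly the behavior the quoted article describes.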
I'll take a stab in the dark, as I have never tried to do this.
It appears that you are using the wrong function to create your registry key. You should be using the NtCreateKey method, because RegCreateKeyEx[AW] will notice your '\0' and truncate the name there.
Why not use the class provided in the example? It provides a method called CreateHiddenKey. To use it, simply call SetKey before it. It would be much cleaner.
