How to display unicode Arabic string in VS output window?

How to display unicode Arabic string in VS output window? - visual-studio

I have a uni-code string in Arabic to display in output window rather than in console, so I could only use OutputDebugStringW, and I call SetConsoleOutputCP(1256) to set Arabic code page but still it only output "????". What should I do...

This is a documented restriction for OutputDebugStringW():
OutputDebugStringW converts the specified string based on the current system locale information and passes it to OutputDebugStringA to be displayed. As a result, some Unicode characters may not be displayed correctly.
Calling SetConsoleOutputCP() doesn't solve the problem, that changes the code page for the console window, not the debugger. You'd have to change your system locale, Control Panel + Region, Administrative tab. If Arabic is your favorite language then changing it to 1256 is the appropriate thing to do. It will of course have system-wide effects.

Related

How to set default unicode of notepad to UTF8?

Every time I'm saving a file that has some Unicode characters in
notepad, it prompts me that this file is going to be saved in ansi
format and you will losing some data and I should cancel saving
and choose UTF8 as unicode.
How can I set default encoding to UTF8 so it will not prompt me every time?
thanks in advance.

In windows 10, get to
Control Panel > Region > Tab Administrative
Hit button "Change system".
Then choose the language you use from the combobox labeled "Current system locale".
And check the checkbox labeled "Beta: Use Unicode UTF-8 for worldwide language s".
Hit the ok button.

Short answer - Notepad simply does not support what you are asking for. It will always default to ANSI, you have to tell it explicitly not to use ANSI. However, there are alternatives available, see Changing the default ANSI to UTF-8 in Notepad on SuperUser.

Display Arabic text in Rich TextBox

Hi i am using VB6 and I have a field in my DB which stores Arabic text like "ÓÔÄÇåì". Now I want to display this in RichTextbox. I have set RichTextBox1.Font.CharSet property to 178 and RichTextBox1.Font.Name to Arial. But still it's not displaying in the correct format which is "سشؤاهى". Please help.

You can use the Text Object Model (tom) type library to get Unicode access to the RichEdit control wrapped within a RichTextBox.
You can use API calls to get Unicode access to the same thing.
You can replace usage of the RichTextBox with another control, for example the InkEdit control that ships as part of Windows beginning in Vista. Turn off the ink capture capability and you have a Unicode RTF box via the normal properties (.Text, .TextRTF, etc.).
All of this presumes that your database actually contains Unicode text.

Has anyone heard of this strange bug with the standard Windows message box?

Years ago, I was messing around with Visual Basic and I discovered a bug with the MsgBox function. I tried searching for it, but nobody had ever said anything about it. It's not just with Visual Basic though; it's with anything that uses the standard Windows MessageBox API call.
The bug is triggered when the title text has more than one character, and the first character is a lowercase 'y' with an umlaut ('ÿ'). What's so special about this character? It almost definitely not the character itself, but rather its ASCII value that's special. 'ÿ' is character 255 (0xFF), meaning it's the highest value that can be stored in an unsigned byte, and all its bits are set to 1.
What does this bug do? Well, there are two different possibilities, which depend on the number of characters in the title text. If there are an even number of characters (unless it's 2) in the title text, no message box appears, and you just hear the alert sound. If there are two characters in the title text, or any odd number other than 1 (in which case the bug wouldn't be triggered)...then this happens:
And that's not all--the message will also be truncated to one line. It seems like the kind of bug that would occur in at least one semi-high-profile incident, considering how often this API call is used. Are there any reports of this on the Internet, or anything showing what could cause it? Maybe it's a Unicode-related glitch, like that "bush hid the facts" glitch in Notepad?
I made a program in case you want to play around with this; download it here.
Alternatively, copy the following into Notepad, save it with a .vbs extension, and double-click it to display the dialog box seen above:
MsgBox "Windows 3.1 font, anyone?", 0, "ÿ ODD NUMBER!"
Or for a different font:
MsgBox "I CAN HAS CHEEZBURGER?", 0, "ÿ HImpact"
EDIT: It seems that if the first four characters are ÿ's, it doesn't ever display the message, even if there's an odd number of characters.

This is a bug with dialog templates generally. It is not a message box bug as such.
For example, in Visual Studio create the default win32 application. In the .rc file, change the caption in the template for the about box from
CAPTION "About sampleapp"
to
CAPTION "ÿT"
and the bug will manifest itself when you display the about box.
In the DLGTEMPLATEEX documentation note that the menu and class name have type sz_Or_Ord which means either a null-terminated string or 0xFFFF followed by a single word resource identifier.
Windows incorrectly applies a similar scheme to the dialog title: if the first character is 0xFF then it treats the title as being two WORDs long, but only when it is trying to locate the font information. When it is displaying the title it correctly treats the title as a string.
In other words, Windows is looking for the font information inside the title string. In most case this won't specify a valid font, so Windows defaults to the system font.
To prove this, I constructed a dialog template in memory (based on this). Once this was working I deleted the code that writes the font information to the template and used the dialog title "ÿa\xd\x200\x21SimSun". This displays the dialog in italic SimSun because windows is reading the font information from the title string.
This bug is likely a hangover from 16-bit Windows, where (I guess) 0xFF was used as the resource ID marker.

A strange bug. I suspect the symptoms are the result of the way the MessageBox() actually displays the dialog.
Internally, MessageBox() builds a dialog template dynamically. If you look at the description of a DLGTEMPLATE structure you'll find the following nugget of information:
In a standard template for a dialog box, the DLGTEMPLATE structure is
always immediately followed by three variable-length arrays that
specify the menu, class, and title for the dialog box. When the
DS_SETFONT style is specified, these arrays are also followed by a
16-bit value specifying point size and another variable-length array
specifying a typeface name.
So, the in-memory layout of a dialog template has the font specification immediately following the dialog box title.
Visual Basic does not use Unicode and so the function you're calling is actually MessageBoxA(). This is simply a thunk that converts the passed-in strings from multibyte to Unicode and then calls MessageBoxW().
I believe what's happening is that, for some reason, the conversion of that string from multibyte to Unicode is either going wrong, or returning a spurious length value. This has the knock-on effect, when the dialog template is built in memory, of corrupting the memory immediately following the title string - which, as we know, is the font specification.

What does the charset parameter of CreateFont exactly set?

The code page on my Windows is set to ANSI (Latin1, Windows-1252).
I create a font with CreateFont and pass RUSSIAN_CHARSET in fdwCharSet
This is what I experience:
Windows controls (like a Static for example) using this font ignore the font's character set: the string passed to SetWindowTextA is displayed with Latin characters
After selecting this font on the DC, GDI text functions (Ext)TextOutA and DrawTextA use the character set of the font. Strings passed to them are displayed with cyrillic letters.
Why? When is the charset parameter of the font taken into account and when is it ignored? Can I force windows controls to use the font's character set?

You will have to convert the text to Unicode and call SetWindowTextW() instead of SetWindowTextA().
Make sure the window's class is registered with RegisterClassW() and not RegisterClassA(). This is what really determines if a window is Unicode. You can use IsWindowUnicode() to verify that the window is really Unicode.
Make sure you pass unhandled messages to DefWindowProcW() and not DefWindowProcA().
Or, if the window is a dialog, just make sure it is created with CreateDialogW() or DialogBoxParamW().

>Can I force windows controls to use the font's character set?
AFAIK no, you can't.
SetWindowTextA just converts the argument to Unicode, then calls SetWindowTextW: both windows kernel, shell, and GDI are unicode.
To convert the argument to Unicode, SetWindowTextA uses the setting from Window's regional options ("Language for non-Unicode programs").

Here is what is happening:
You call SetWindowText with something like "\xC4\xEE\xE1\xF0\xEE\xE5 xF3\xF2\xF0\xEE".
You're compiled as an ANSI application rather than Unicode, so that maps to a call to SetWindowTextA.
SetWindowTextA sees that the window was created in ANSI mode, so it sets the string directly. (If it had been a Unicode window, then it would converts the ANSI input string to Unicode and passes it onto SetWindowTextW.)
The ANSI window converts the ANSI string to Unicode so that it can display it. But it doesn't know the string is in a different code page than the default one for the system. It's this conversion that's changing everything back to Latin characters. It assumes that the input string is in the process's default code page (Windows 1252 in your case). So now you have a bunch of accented Latin characters instead of a string of Cyrillic ones.
The control tries to display this Unicode string using something like DrawTextW or TextOutW.
The lower level part of the system says, "Oh crap, this string is a bunch of Latin characters, but the user has chosen a Cyrillic font." To solve the problem, it uses font linking (or font fallback, I get those terms confused) to effectively select a font compatible with 1252.
You see Latin gobbledygook instead of proper Russian.
I tried coming up with a minimal way to do what you need, but I failed. My first idea was to do the conversion yourself and call SetWindowTextW directly:
void SetWindowTextRussian(HWND hwnd, char *pszCyrillic) {
const int cchCyrillic = ::lstrlen(pszCyrillic);
const int cchUnicode = 4 * cchCyrillic; // worst case
WCHAR *pszUnicode = new WCHAR[cchUnicode];
// See: http://msdn.microsoft.com/en-us/library/dd317756(v=vs.85).aspx
const UINT CP_CYRILLIC = 1251;
if (::MultiByteToWideChar(CP_CYRILLIC, 0, pszCyrillic, -1,
pszUnicode, cchUnicode) > 0) {
::SetWindowTextW(hwnd, pszUnicode);
}
delete [] pszUnicode;
}
But this doesn't work. I suspect that since the window was created as an ANSI window, the Unicode string is converted back to ANSI (assuming the wrong code page again), and then you get question marks instead of Latin nonsense.
I think you're going to have to convert to a Unicode app, or only run with the default code page set to 1251.
Update: If you control the creation of the window (e.g., you call CreateWindow directly rather than having a dialog box instantiate the controls), then you can probably make the above work by calling CreateWindowW directly and creating a Unicode window for the controls that matter.

Consider hooking gdi32full.dll GetCodePage to select the code page that you need. CP_UTF8 for example. It has single pointer param, returns single DWORD (the code page) and stdcall calling convention.

Unicode characters in window caption

We're having trouble setting window captions using cyrillic or japanese characters. We either see question marks or random garbage, but not the text we want. We've tried using different encodings, SetWindowText(), SetWindowTextW(), SetWindowTextA(), and so on. We can't even get it to work by passing a string literal to SetWindowText().
Our Windows install does appear to have everything it needs - Internet Explorer and Firefox do show cyrillic and japanese captions correctly, for example. So I'm pretty sure we're not finding the correct encoding/method combination. Any suggestions?

The problem you've got (at a guess) is that the top-level frame window of your application is an ANSI window. Under the hood, when you create a window (with CreateWindow() or CreateWindowEx()) a window class must be supplied. This window class determines the window's properties, including whether or not it, by default, accepts ANSI messages or Unicode messages. In turn, this is set by whether you (or your framework) register the window class by calling RegisterClassExA() or RegisterClassExW().
What's almost certainly happing is that your top-level window's class is being registered with RegisterClassExA(). This means that the default window procedure for the window will translate all Unicode strings in messages to ANSI before processing them, hence the question marks and odd characters everywhere.
The easiest solution to all this is to just make your application Unicode throughout (usually done by defining _UNICODE). The other way is to figure out where RegisterClassEx() is being called, and make it RegisterClassExW(). This may cause ANSI/Unicode issues with other messages, but it should (in theory, at least) work. Of course, either way will break Windows 9X, if that's an issue.
If all this sounds horrendously complicated, you're not wrong ...

SetWindowText()? Did you compile your application as Unicode? If not, SetWindowText() is equivalent to SetWindowTextA(), which in turn is limited to your current system locale (aka "Language for non-Unicode applications").
Also, how did you CREATE your window? Using an explicitely Unicode-aware API such as CreateWindowExW()? If not, make sure that your program is compiled as Unicode.
If your program is not compiled as Unicode, you might want to either modify your "Language for non-Unicode applications" in CP/Regional Options. Reboot required. Or more easily: Use MS AppLocale to simulate a cyrillic system locale

You have to compile your application with _UNICODE defined. Otherwise all windows will still be MBCS and not utf-16 and therefore can't show cyrillic or japanese characters if the codepage doesn't match.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio