How to display unicode control characters in visual studio text visualizer? - visual-studio-2010

I get a text string from a service that contains Unicode control characters
(e.g. \u202B or \u202A and others used for Arabic language support).
While debugging, I can't see them in the default text visualizer, so I need to make such characters visible to determine which of them my text contains. The text visualizer has a "show all characters" checkbox, but it doesn't work as I expect.
Any suggestions?
Thanks in advance

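Those are explicit directional formatting codes: \u202A is LEFT-TO-RIGHT EMBEDDING (LRE) and \u202B is RIGHT-TO-LEFT EMBEDDING (RLE). They mark a run of text that should be displayed in the given direction, e.g. a left-to-right run embedded in right-to-left text.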
http://unicode.org/reports/tr9/#Directional_Formatting_Codes
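If the visualizer will not show these characters, one workaround is to escape them before inspecting the string. The following is a minimal Python sketch of that idea, not a Visual Studio feature; the function name reveal_controls and the <U+XXXX> notation are assumptions made for illustration. It replaces every control (Cc) or format (Cf) character with a visible marker.

```python
import unicodedata

def reveal_controls(text: str) -> str:
    """Replace control (Cc) and format (Cf) characters with <U+XXXX> markers."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Cc", "Cf"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical sample: Arabic text wrapped in RLE ... PDF (pop directional formatting).
sample = "\u202B\u0645\u0631\u062D\u0628\u0627\u202C"
print(reveal_controls(sample))
for ch in sample:
    if unicodedata.category(ch) == "Cf":
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The same approach carries over to any language that exposes Unicode general categories.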

Related

Why is facepalm emoji 🤦‍♀️ followed by U+200D♀

I am trying to send UTF-8 symbols via a serial device to a browser and display them. I found that when I enter the facepalm emoji 🤦‍♀️ (from the Windows 10 emoji picker, Win+.), it is followed by U+200D and ♀ characters. Other emojis don't have that. I was using the "View non-printable unicode characters" tool. I also found that if you enter it in Notepad it shows ♀; if you enter it in the browser address bar, ♀ is invisible, but pressing backspace deletes it. And finally, if you enter it in an HTML text input, you can delete the whole emoji with a single backspace. Why is that?
Emoji sequences have more than one code point to signify variations (below may or may not look different for each sequence depending on browser):
🤦 PERSON FACEPALMING U+1F926
🤦‍♂️ MAN FACEPALMING U+1F926 U+200D U+2642 U+FE0F
🤦‍♀️ WOMAN FACEPALMING U+1F926 U+200D U+2640 U+FE0F
References:
Emoji List, v13.1 No. 260-262.
Full Emoji List, v13.1, No. 260-262 (With browser-specific images)
Unicode® Standard Annex #29, UNICODE TEXT SEGMENTATION
Some editors and browsers handle the sequences better than others: they may not show the differences between all variations, or may not recognize the latest Unicode specification and newer emojis.
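To see the individual code points of such a sequence, the string can be broken apart programmatically. This is a small Python sketch using the standard unicodedata module; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(text: str):
    """Return (code point, character name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# WOMAN FACEPALMING as listed above: base emoji, zero width joiner,
# female sign, and an emoji variation selector.
woman_facepalming = "\U0001F926\u200D\u2640\uFE0F"

for code_point, name in describe(woman_facepalming):
    print(code_point, name)

print(len(woman_facepalming), "code points in one visible emoji")
```

A program that deletes one code point per backspace removes only the trailing part of the sequence, while one that deletes a whole grapheme cluster removes the entire emoji, which would match the different behaviors observed.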

How to set default unicode of notepad to UTF8?

Every time I save a file that contains Unicode characters in
Notepad, it warns me that the file is going to be saved in ANSI
format and that I will lose some data, so I should cancel saving
and choose UTF-8 as the encoding.
How can I set the default encoding to UTF-8 so that it does not prompt me every time?
Thanks in advance.
In Windows 10, go to
Control Panel > Region > Administrative tab.
Click the "Change system locale..." button.
Then choose the language you use from the combo box labeled "Current system locale".
Check the checkbox labeled "Beta: Use Unicode UTF-8 for worldwide language support".
Click OK.
Short answer - Notepad simply does not support what you are asking for. It will always default to ANSI, you have to tell it explicitly not to use ANSI. However, there are alternatives available, see Changing the default ANSI to UTF-8 in Notepad on SuperUser.

Has anyone heard of this strange bug with the standard Windows message box?

Years ago, I was messing around with Visual Basic and I discovered a bug with the MsgBox function. I tried searching for it, but nobody had ever said anything about it. It's not just with Visual Basic though; it's with anything that uses the standard Windows MessageBox API call.
The bug is triggered when the title text has more than one character, and the first character is a lowercase 'y' with an umlaut ('ÿ'). What's so special about this character? It is almost definitely not the character itself, but rather its character code that's special. 'ÿ' is character 255 (0xFF), meaning it's the highest value that can be stored in an unsigned byte, and all its bits are set to 1.
What does this bug do? Well, there are two different possibilities, which depend on the number of characters in the title text. If there is an even number of characters (other than 2) in the title text, no message box appears, and you just hear the alert sound. If there are two characters in the title text, or any odd number other than 1 (in which case the bug wouldn't be triggered)...then this happens:
And that's not all--the message will also be truncated to one line. It seems like the kind of bug that would occur in at least one semi-high-profile incident, considering how often this API call is used. Are there any reports of this on the Internet, or anything showing what could cause it? Maybe it's a Unicode-related glitch, like that "bush hid the facts" glitch in Notepad?
I made a program in case you want to play around with this; download it here.
Alternatively, copy the following into Notepad, save it with a .vbs extension, and double-click it to display the dialog box seen above:
MsgBox "Windows 3.1 font, anyone?", 0, "ÿ ODD NUMBER!"
Or for a different font:
MsgBox "I CAN HAS CHEEZBURGER?", 0, "ÿ HImpact"
EDIT: It seems that if the first four characters are ÿ's, it doesn't ever display the message, even if there's an odd number of characters.
This is a bug with dialog templates generally. It is not a message box bug as such.
For example, in Visual Studio create the default win32 application. In the .rc file, change the caption in the template for the about box from
CAPTION "About sampleapp"
to
CAPTION "ÿT"
and the bug will manifest itself when you display the about box.
In the DLGTEMPLATEEX documentation note that the menu and class name have type sz_Or_Ord which means either a null-terminated string or 0xFFFF followed by a single word resource identifier.
Windows incorrectly applies a similar scheme to the dialog title: if the first character is 0xFF then it treats the title as being two WORDs long, but only when it is trying to locate the font information. When it is displaying the title it correctly treats the title as a string.
In other words, Windows is looking for the font information inside the title string. In most cases this won't specify a valid font, so Windows defaults to the system font.
To prove this, I constructed a dialog template in memory (based on this). Once this was working I deleted the code that writes the font information to the template and used the dialog title "ÿa\xd\x200\x21SimSun". This displays the dialog in italic SimSun because Windows is reading the font information from the title string.
This bug is likely a hangover from 16-bit Windows, where (I guess) 0xFF was used as the resource ID marker.
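The misreading described in this answer can be simulated outside Windows. The Python sketch below is not Windows' actual code; it only mimics the described behavior under the assumption that, after skipping two WORDs, the font fields are read in the DLGTEMPLATEEX order (point size, weight, italic, charset, typeface). The function name misparse_font is hypothetical.

```python
import struct

def misparse_font(title: str) -> dict:
    """Simulate reading font fields from inside a title that starts with U+00FF."""
    data = title.encode("utf-16-le") + b"\x00\x00"  # null-terminated UTF-16 string
    (first_word,) = struct.unpack_from("<H", data, 0)
    if first_word != 0x00FF:
        raise ValueError("title does not start with U+00FF")
    offset = 4  # the title is wrongly treated as exactly two WORDs long
    point_size, weight, italic, charset = struct.unpack_from("<HHBB", data, offset)
    offset += 6
    end = offset
    # The typeface name runs up to the next UTF-16 null terminator.
    while end + 2 <= len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2
    return {
        "point_size": point_size,
        "weight": weight,
        "italic": bool(italic),
        "charset": charset,
        "typeface": data[offset:end].decode("utf-16-le"),
    }

# The title from the answer, written with Python escapes.
print(misparse_font("\u00FFa\u000D\u0200\u0021SimSun"))
```

Under these assumptions the title decodes to a 13-point italic SimSun font, which is consistent with the result reported above.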
A strange bug. I suspect the symptoms are the result of the way MessageBox() actually displays the dialog.
Internally, MessageBox() builds a dialog template dynamically. If you look at the description of a DLGTEMPLATE structure you'll find the following nugget of information:
In a standard template for a dialog box, the DLGTEMPLATE structure is
always immediately followed by three variable-length arrays that
specify the menu, class, and title for the dialog box. When the
DS_SETFONT style is specified, these arrays are also followed by a
16-bit value specifying point size and another variable-length array
specifying a typeface name.
So, the in-memory layout of a dialog template has the font specification immediately following the dialog box title.
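As a rough illustration of that layout, the following Python sketch packs a DLGTEMPLATE-style buffer with no controls and reports where the font data begins. It is a simplified model for inspection only, not a template meant to be passed to Windows; the function name build_template and the sample values are assumptions.

```python
import struct

DS_SETFONT = 0x40  # dialog style bit: font data follows the title

def build_template(title: str, point_size: int, typeface: str):
    """Return (template bytes, byte offset of the font data)."""
    # DLGTEMPLATE header: style, extended style, item count, x, y, cx, cy.
    header = struct.pack("<IIHhhhh", DS_SETFONT, 0, 0, 0, 0, 200, 80)
    no_menu = struct.pack("<H", 0)        # menu array: 0x0000 means no menu
    default_class = struct.pack("<H", 0)  # class array: 0x0000 means default class
    title_bytes = title.encode("utf-16-le") + b"\x00\x00"
    font_offset = len(header) + len(no_menu) + len(default_class) + len(title_bytes)
    font = struct.pack("<H", point_size) + typeface.encode("utf-16-le") + b"\x00\x00"
    return header + no_menu + default_class + title_bytes + font, font_offset

blob, font_offset = build_template("Hi", 8, "MS Shell Dlg")
print("font data starts at byte", font_offset)
print("point size read back:", struct.unpack_from("<H", blob, font_offset)[0])
```

Because the font offset depends entirely on where the title is believed to end, any miscalculation of the title length shifts where the font fields are read from.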
Visual Basic does not use Unicode and so the function you're calling is actually MessageBoxA(). This is simply a thunk that converts the passed-in strings from multibyte to Unicode and then calls MessageBoxW().
I believe what's happening is that, for some reason, the conversion of that string from multibyte to Unicode is either going wrong, or returning a spurious length value. This has the knock-on effect, when the dialog template is built in memory, of corrupting the memory immediately following the title string - which, as we know, is the font specification.

How to display unicode Arabic string in VS output window?

I have a Unicode string in Arabic that I want to display in the output window rather than in the console, so I can only use OutputDebugStringW. I call SetConsoleOutputCP(1256) to set the Arabic code page, but it still only outputs "????". What should I do...
This is a documented restriction for OutputDebugStringW():
OutputDebugStringW converts the specified string based on the current system locale information and passes it to OutputDebugStringA to be displayed. As a result, some Unicode characters may not be displayed correctly.
Calling SetConsoleOutputCP() doesn't solve the problem; it changes the code page for the console window, not for the debugger. You'd have to change your system locale: Control Panel + Region, Administrative tab. If Arabic is your preferred language, then changing the system locale to an Arabic one (code page 1256) is the appropriate thing to do. It will of course have system-wide effects.
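The "????" output is what a lossy Unicode-to-ANSI conversion typically produces. The Python sketch below uses Python's own codecs as a stand-in for the system conversion (an assumption; Windows performs the real conversion internally, and its exact fallback behavior may differ). The helper name to_ansi is hypothetical.

```python
def to_ansi(text: str, codepage: str) -> str:
    """Round-trip text through a single-byte code page, replacing unmappable characters."""
    return text.encode(codepage, errors="replace").decode(codepage)

arabic = "\u0645\u0631\u062D\u0628\u0627"  # the Arabic word for "hello"

# A Western European code page has no Arabic letters, so each becomes '?'.
print(to_ansi(arabic, "cp1252"))

# The Arabic code page 1256 can represent every letter, so the text survives.
print(to_ansi(arabic, "cp1256") == arabic)
```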

Nabla Special Character Shows as Null Character

I have a page that uses the nabla character for bullets on a menu. The code for it is ∇. However, on my machine as well as some others around the office, this character comes up as the null character, [], in IE. In Firefox, though, it displays the actual character. My question is whether there is a fix for this so I can make sure anyone viewing this site will see the actual character. Is there a browser font that needs to be installed on any machine that views this site, or is it an issue that can be fixed from my end?
The empty square character is not the null character, it is a visual placeholder - a representation of a Unicode character for which there is no associated glyph in the current font.
There are a couple of possible issues:
your page does not have an explicit encoding associated with it, and your IE is configured to use Win-1252 as the default instead of Unicode.
the font family you specify in the CSS is missing on your computer, and the IE fallback font is different from the Firefox one and does not have the nabla character.
Make sure your page explicitly specifies a Unicode encoding, and use the IE developer tools and Firebug to examine the actual rendered style for the bullet and see what font is being used by the two browsers.
