How to display "OEM extended ASCII" characters with PDCurses?


I've been trying to display the "box characters" using PDCurses, but for some reason they are not available in the character set. I used a loop to print every character from 0x00 to 0xFF (by calling the PDCurses function printw("%c", index)), but for part of that range PDCurses displays question marks instead of the expected characters.
I have no idea how to display the characters that should be in the region where the question marks appear. Does anyone know why this happens? If it's something to do with the code page, how can I change the code page? Thanks!
PS: I'm on Windows 7, and my program is compiled with MSVC 10.

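You can print some box-drawing characters with the curses ACS constants, such as ACS_ULCORNER, by passing them to addch() rather than to printw("%c", ...).
In the PDCurses documentation, search for "alternate character set".
In a wide-character build, you may need to use the WACS_* constants with the wide-character output functions (for example add_wch()) instead.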

Related

Why is the facepalm emoji 🤦‍♀️ followed by U+200D and ♀?

I am trying to send UTF-8 symbols over a serial device to a browser and display them. I found that when I type the facepalm emoji 🤦‍♀️ (on Windows 10, with Win+.), it is followed by U+200D and ♀ characters. Other emojis don't have that. I was using a "View non-printable Unicode characters" tool. I also found that if you type it in Notepad, it shows the ♀; if you type it in the browser address bar, the ♀ is invisible, but pressing Backspace deletes it. Finally, if you type it in an HTML text input, a single Backspace deletes the whole emoji. Why is that?

Some emoji are sequences of more than one code point, where the extra code points signify a variation (the items below may or may not look different from one another, depending on your browser):
🤦 PERSON FACEPALMING U+1F926
🤦‍♂️ MAN FACEPALMING U+1F926 U+200D U+2642 U+FE0F
🤦‍♀️ WOMAN FACEPALMING U+1F926 U+200D U+2640 U+FE0F
References:
Emoji List, v13.1, No. 260-262
Full Emoji List, v13.1, No. 260-262 (with browser-specific images)
Unicode® Standard Annex #29, Unicode Text Segmentation
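U+200D is ZERO WIDTH JOINER, which asks the renderer to combine the neighboring characters into a single glyph, and U+FE0F is VARIATION SELECTOR-16, which requests emoji-style presentation. Software that implements the grapheme-cluster rules of UAX #29 treats the whole sequence as one user-perceived character, which is why a single Backspace removes it in some text inputs but not in others. Some editors and browsers handle these sequences better than others: they may not show the differences between all variations, or may not recognize the latest Unicode specification and newer emoji.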

Has anyone heard of this strange bug with the standard Windows message box?

Years ago, I was messing around with Visual Basic and I discovered a bug with the MsgBox function. I tried searching for it, but nobody had ever said anything about it. It's not just with Visual Basic though; it's with anything that uses the standard Windows MessageBox API call.
The bug is triggered when the title text has more than one character and the first character is a lowercase 'y' with an umlaut ('ÿ'). What's so special about this character? It's almost certainly not the character itself, but rather its code value that's special: 'ÿ' is character 255 (0xFF) in the Windows-1252 code page, meaning it's the highest value that can be stored in an unsigned byte, and all its bits are set to 1.
What does this bug do? There are two possibilities, depending on the number of characters in the title text. If there is an even number of characters in the title (other than 2), no message box appears and you just hear the alert sound. If there are two characters in the title, or any odd number other than 1 (in which case the bug isn't triggered), the message box does appear, but its text is drawn in the wrong font.
And that's not all: the message is also truncated to one line. It seems like the kind of bug that would have caused at least one semi-high-profile incident, considering how often this API call is used. Are there any reports of this on the Internet, or anything showing what could cause it? Maybe it's a Unicode-related glitch, like the "bush hid the facts" glitch in Notepad?
Alternatively, copy the following into Notepad, save it with a .vbs extension, and double-click it to display the affected dialog box:
MsgBox "Windows 3.1 font, anyone?", 0, "ÿ ODD NUMBER!"
Or for a different font:
MsgBox "I CAN HAS CHEEZBURGER?", 0, "ÿ HImpact"
EDIT: It seems that if the first four characters are ÿ's, the message is never displayed, even if there is an odd number of characters.

This is a bug with dialog templates generally. It is not a message box bug as such.
For example, in Visual Studio, create the default Win32 application. In the .rc file, change the caption in the template for the About box from
CAPTION "About sampleapp"
to
CAPTION "ÿT"
and the bug will manifest itself when you display the about box.
In the DLGTEMPLATEEX documentation, note that the menu and class name have the type sz_Or_Ord, which means either a null-terminated string or 0xFFFF followed by a single WORD resource identifier.
Windows incorrectly applies a similar scheme to the dialog title: if the first character is 0xFF, it treats the title as being two WORDs long, but only when it is trying to locate the font information. When it is displaying the title, it correctly treats the title as a string.
In other words, Windows looks for the font information inside the title string. In most cases this won't specify a valid font, so Windows falls back to the system font.
To prove this, I constructed a dialog template in memory. Once this was working, I deleted the code that writes the font information to the template and used the dialog title "ÿa\xd\x200\x21SimSun". This displays the dialog in italic SimSun, because Windows reads the font information from the title string.
This bug is likely a hangover from 16-bit Windows, where (I guess) 0xFF was used as the resource ID marker.

A strange bug. I suspect the symptoms are the result of the way MessageBox() actually displays the dialog.
Internally, MessageBox() builds a dialog template dynamically. If you look at the description of a DLGTEMPLATE structure you'll find the following nugget of information:
> In a standard template for a dialog box, the DLGTEMPLATE structure is always immediately followed by three variable-length arrays that specify the menu, class, and title for the dialog box. When the DS_SETFONT style is specified, these arrays are also followed by a 16-bit value specifying point size and another variable-length array specifying a typeface name.
So, the in-memory layout of a dialog template has the font specification immediately following the dialog box title.
Visual Basic does not use Unicode and so the function you're calling is actually MessageBoxA(). This is simply a thunk that converts the passed-in strings from multibyte to Unicode and then calls MessageBoxW().
I believe what's happening is that, for some reason, the conversion of that string from multibyte to Unicode is either going wrong, or returning a spurious length value. This has the knock-on effect, when the dialog template is built in memory, of corrupting the memory immediately following the title string - which, as we know, is the font specification.

How to display Unicode control characters in the Visual Studio text visualizer?

I get a text string from a service that contains Unicode control characters (e.g. \u202B or \u202A, among others, for Arabic language support).
But while debugging I can't see them in the default text visualizer. I need to make such characters visible so I can determine which of them my text contains. There is a "show all characters" checkbox in the text visualizer, but it doesn't work as I expect.
Any suggestions?
Thanks in advance

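Those are explicit directional embedding codes: U+202B is RIGHT-TO-LEFT EMBEDDING (RLE) and U+202A is LEFT-TO-RIGHT EMBEDDING (LRE). Each starts a run of text that is laid out in the given direction (for example, a left-to-right run embedded inside right-to-left Arabic text), and the run ends at U+202C POP DIRECTIONAL FORMATTING (PDF).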
http://unicode.org/reports/tr9/#Directional_Formatting_Codes

What does the charset parameter of CreateFont exactly set?

The code page on my Windows is set to ANSI (Latin1, Windows-1252).
I create a font with CreateFont and pass RUSSIAN_CHARSET in fdwCharSet.
This is what I experience:
Windows controls (like a Static, for example) using this font ignore the font's character set: the string passed to SetWindowTextA is displayed with Latin characters.
After the font is selected into the DC, the GDI text functions (Ext)TextOutA and DrawTextA use the character set of the font: strings passed to them are displayed with Cyrillic letters.
Why? When is the charset parameter of the font taken into account and when is it ignored? Can I force windows controls to use the font's character set?

You will have to convert the text to Unicode and call SetWindowTextW() instead of SetWindowTextA().
Make sure the window's class is registered with RegisterClassW() and not RegisterClassA(). This is what really determines if a window is Unicode. You can use IsWindowUnicode() to verify that the window is really Unicode.
Make sure you pass unhandled messages to DefWindowProcW() and not DefWindowProcA().
Or, if the window is a dialog, just make sure it is created with CreateDialogW() or DialogBoxParamW().

> Can I force Windows controls to use the font's character set?
AFAIK, no, you can't.
SetWindowTextA just converts the argument to Unicode and then calls SetWindowTextW: the Windows kernel, shell, and GDI all work in Unicode.
To convert the argument to Unicode, SetWindowTextA uses the setting from Windows' regional options ("Language for non-Unicode programs").

Here is what is happening:
You call SetWindowText with something like "\xC4\xEE\xE1\xF0\xEE\xE5 \xF3\xF2\xF0\xEE".
You're compiled as an ANSI application rather than a Unicode one, so that maps to a call to SetWindowTextA.
SetWindowTextA sees that the window was created in ANSI mode, so it sets the string directly. (If it had been a Unicode window, it would convert the ANSI input string to Unicode and pass it on to SetWindowTextW.)
The ANSI window converts the ANSI string to Unicode so that it can display it. But it doesn't know the string is in a different code page than the default one for the system. It's this conversion that's changing everything back to Latin characters. It assumes that the input string is in the process's default code page (Windows 1252 in your case). So now you have a bunch of accented Latin characters instead of a string of Cyrillic ones.
The control tries to display this Unicode string using something like DrawTextW or TextOutW.
The lower-level part of the system says, "Oh crap, this string is a bunch of Latin characters, but the user has chosen a Cyrillic font." To solve the problem, it uses font linking (or font fallback; I get those terms confused) to effectively select a font compatible with 1252.
You see Latin gobbledygook instead of proper Russian.
I tried coming up with a minimal way to do what you need, but I failed. My first idea was to do the conversion yourself and call SetWindowTextW directly:
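#include <windows.h>
#include <vector>

// Convert a Windows-1251 (Cyrillic) string to UTF-16 and pass it to SetWindowTextW.
void SetWindowTextRussian(HWND hwnd, const char *pszCyrillic) {
    // See: http://msdn.microsoft.com/en-us/library/dd317756(v=vs.85).aspx
    const UINT CP_CYRILLIC = 1251;
    // First call: ask for the required buffer size, including the terminating null.
    const int cchUnicode = ::MultiByteToWideChar(CP_CYRILLIC, 0, pszCyrillic, -1, NULL, 0);
    if (cchUnicode <= 0) {
        return;  // conversion failed
    }
    std::vector<WCHAR> unicode(cchUnicode);
    // Second call: convert into the buffer, then hand the UTF-16 text to the window.
    if (::MultiByteToWideChar(CP_CYRILLIC, 0, pszCyrillic, -1,
                              &unicode[0], cchUnicode) > 0) {
        ::SetWindowTextW(hwnd, &unicode[0]);
    }
}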
But this doesn't work. I suspect that since the window was created as an ANSI window, the Unicode string is converted back to ANSI (assuming the wrong code page again), and then you get question marks instead of Latin nonsense.
I think you're going to have to convert to a Unicode app, or only run with the default code page set to 1251.
Update: If you control the creation of the window (e.g., you call CreateWindow directly rather than having a dialog box instantiate the controls), then you can probably make the above work by calling CreateWindowW directly and creating a Unicode window for the controls that matter.

Consider hooking the GetCodePage function in gdi32full.dll to select the code page you need (CP_UTF8, for example). It takes a single pointer parameter, returns a single DWORD (the code page), and uses the stdcall calling convention.

Extended ASCII chars and ANSI in the Mac OS X Terminal app

I wanted to create some shell scripts that display pretty ANSI-colored graphics on OS X, but unfortunately I can find very little information about the topic.
OS X seems to use Monaco 10 as its default console font. Is there some way to find out all the displayable characters for this font?
As far as I found out, the OS X Terminal runs in UTF-8 by default (can somebody confirm this?).
Is there a way to show or enter the extended ASCII characters on OS X (the way it was done on Windows/DOS by holding Alt and typing the digits)?
Thanks!

In Bash, you should be able to do:
for i in {32..255}; do printf "$i "\\$(($i/64*100+$i%64/8*10+$i%8))"\n"; done | column
to get a table of ASCII characters.

Extended characters are usually typed with Option+key and Shift+Option+key. I believe Terminal is UTF-8 by default. Monaco should support the complete character map. If you open Font Book, you can access its character map there.
Additionally, you can customize a lot of this type of thing from Terminal's preferences pane.
