Meaning of nonprintable characters

Sometimes, when I accidentally route nonprintable characters to the console, I get little boxes with 0's, 1's and other things in them.
What is the meaning of these boxes?
What is the meaning of the 0's and 1's?
Why show the characters this way?

Those boxes are the glyphs printed for Unicode characters which are not included in the console font you're using. The numbers indicate the code point for the character. They're shown that way so that there's a visual indication of a missing glyph in the font.
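As a rough illustration of what those boxes encode, the following Python sketch prints the four-digit hexadecimal code point for every nonprintable character in a string. The function name `missing_glyph_labels` is made up for this example, and it uses `str.isprintable()` as a stand-in for "the font has no glyph", which is only an approximation of what a real console does.

```python
def missing_glyph_labels(text):
    """Return a hex code point label for each nonprintable character.

    Assumption: "nonprintable" approximates "no glyph in the font";
    a real renderer decides this per font, not per character class.
    """
    labels = []
    for ch in text:
        if not ch.isprintable():
            # Four hex digits, like the digits drawn inside the box.
            labels.append(f"{ord(ch):04X}")
    return labels


if __name__ == "__main__":
    print(missing_glyph_labels("ok\x01\x1b"))
```

A box containing 0, 0, 0 and 1 would then correspond to the character U+0001.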

Related

NSTextView line height changes when using CJK characters

I use NSTextView to render NSAttributedStrings that may contain non-Latin characters, and it seems that lines containing any CJK character are always 6 pixels taller than lines without them. Even setting the NSParagraphStyle's minimumLineHeight property to a much higher value (e.g. 32 pixels, when using the standard system font size) does not fix this: lines with CJK characters are then rendered 38 pixels tall.
Moreover, NSAttributedString's boundingRectWithSize seems to report the wrong ("correct") size (without the extra 6 pixels).
What am I missing?

Setting layoutManager.usesFontLeading to NO solved this problem.

GetCharABCWidthsFloat works for most of UNICODE, except CJKV characters

I am attempting to render a series of UNICODE characters onto a spritesheet. This all works quite well for most characters, including Cyrillic ones.
When using GetCharABCWidthsFloat with certain CJKV characters however, the ABCFLOAT::abcfB parameter provides a value lower than expected. It does not account for underhangs or overhangs, which is the exact purpose of the ABCs:
The B spacing is the width of the drawn portion of the character glyph.
Source: ABCFLOAT | Microsoft Docs
As you can see, all characters do not overlap left-to-right, except the last few characters:
I get around this by creating a customizable padding option, to handle such cases, but this bloats the rest of the glyphs and thus requires a larger surface:
The font being used is Arial. For the character 美, ABC returns (2, 10, 2), which sums to an advance of 14 pixels, when in fact 17 pixels are needed.
I use TextOut to actually render the glyphs, but I do wonder if there is someone out there who's experienced this and came up with a universal solution.
Using functions like GetTextExtentPoint32W or DrawTextEx to get the rectangle does not allow precise per-character placement, which is the whole point of the ABC. And some unmentioned functions only work with TrueType fonts.
I question if certain characters shift to a different font under certain conditions, causing the results to be inaccurate. If that is the case, is there a way to determine if a character is not available for a font, knowing what Windows does automatically so I can reproduce the behaviour? That is, is there some sort of way to determine when a character should fall back on another font, and a way to determine what that font should be?
I have been on this problem for quite some time, so anyone with experience with these APIs would be greatly welcomed!

From the documentation on GetCharABCWidthsFloat:
The ABC widths of the default character are used for characters outside the range of the currently selected font.
Arial contains a lot of characters, including Cyrillic, but it does not contain CJKV ideographs. Other text-related calls may give you the false impression that it does have those characters (through a default/fallback font mechanism).
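Before requesting or using the ABCFLOAT values, you should first check that the characters you want metrics for are within the range of the currently selected font, for example with GetFontUnicodeRanges, or with GetGlyphIndices and the GGI_MARK_NONEXISTING_GLYPHS flag.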

Is there a font or (better!) set of unicode characters representing the numbers 0-255 for displaying ASCII character codes (or other uses)?

I have strings that are mostly standard alpha-numeric and other printable ASCII characters, and would like to display these in a console window on Windows. What I'm looking for is either a console font or a unicode set of characters that represent the numbers 0-255 (0-FF) using a single glyph for each. The thing that comes closest that I am aware of is that unicode has a small set of circled numbers, 1-20, and elsewhere numbers 21-50. Something along those lines, but for 0-255 (or 0-FF) is what I'm trying to find.
It seems to me that this would be a relatively common need/desire, but I've been unable to track down a solution. Any help appreciated!

The C0 controls (and DEL) can be represented by control pictures; as far as I know, the Control Pictures block has no dedicated pictures for the C1 controls, so those need some other placeholder. The rest of the C0 Controls and Basic Latin, and C1 Controls and Latin-1 Supplement blocks can be represented by themselves. You may have to test a few fonts to find one that supports all these characters.
However, you said "ASCII" and "0-255". But, ASCII has only 128 codepoints. Your codepoints 128-255 must be from an unnamed character set. Although you probably mean one of the well-known ones, they are so numerous that a detailed answer isn't practical.
There is also the Unicode BMP Fallback SIL font that covers U+0000 to U+FFFF (but not U+10000 to U+10FFFF).
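The control-picture idea can be sketched in Python as follows. This is only an illustration: the function name `to_control_pictures` is hypothetical, and it maps just the C0 controls (U+0000 to U+001F) and DEL (U+007F) to the Control Pictures block starting at U+2400, leaving every other character unchanged. Whether the result actually displays depends on the console font.

```python
def to_control_pictures(text):
    """Replace C0 controls and DEL with Unicode control pictures.

    Assumption: only U+0000..U+001F and U+007F are handled here;
    other code points (including 128-255) pass through untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+2400..U+241F mirror the C0 controls one-to-one.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # U+2421 is SYMBOL FOR DELETE.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_control_pictures("a\x00\tb\x7f"))
```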

TextOut() and the Cambria Math font

The Cambria Math font has Unicode characters beyond 0xFFFF. You can see them in a Word document by inserting a Symbol and selecting the Cambria Math font. (The Windows Character Map does not show these characters.) My question is: how can I display those Unicode characters in a Windows app using TextOut()?

To display these supplementary code points you need to use UTF-16 surrogate pairs.
A surrogate pair is a way of representing single code points beyond 0xFFFF as two wide characters. You simply pass a surrogate pair to TextOut() and it will be displayed.
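For reference, here is a minimal Python sketch of how a code point above 0xFFFF is split into a UTF-16 surrogate pair. The function name `to_surrogate_pair` is made up for illustration; in a Windows program, the two resulting 16-bit values would be the two wide characters passed to TextOut().

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    # U+1D400 is MATHEMATICAL BOLD CAPITAL A.
    high, low = to_surrogate_pair(0x1D400)
    print(f"{high:04X} {low:04X}")
```

For U+1D400 this yields D835 DC00, which matches what a UTF-16 encoder produces for the same character.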

Which characters to choose when "drawing" a box on Windows console?

I'm trying to port a curses program to Windows. Now one of the problems is that the default ACS_XXXX characters become double-width on Windows console, thus breaking the alignment.
I tried looking for other characters to do the job, like '-' or '|' in basic ASCII, but none of them looks good because the line is not continuous. And finding characters to "draw" corners seems more difficult.
Are there any commonly used characters in such a situation?

I got it to work using the MingLiu font. That is, to draw boxes around Chinese characters with ASCII characters without any alignment issues.

There are border characters in the system font. This includes joints, corners, and both double and single edges. They appear in the higher positions.
Check out http://www.asciitable.com/ for details. They range from 179 to 218 (decimal) in the extended ascii table.
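As a sketch of how those positions map to drawable characters, the Python snippet below decodes a few of the values in the 179 to 218 range and builds a small box from them. It assumes the "extended ascii table" in question is code page 437, which is an assumption on my part, and the helper names `cp437_char` and `draw_box` are hypothetical. Whether the characters come out single-width still depends on the console font and code page.

```python
def cp437_char(code):
    """Decode one byte value using code page 437 (assumed table)."""
    return bytes([code]).decode("cp437")


def draw_box(width, height):
    """Return a single-line box of the given outer size as a string."""
    if width < 2 or height < 2:
        raise ValueError("box needs at least 2 columns and 2 rows")
    horizontal, vertical = cp437_char(196), cp437_char(179)
    top_left, top_right = cp437_char(218), cp437_char(191)
    bottom_left, bottom_right = cp437_char(192), cp437_char(217)
    top = top_left + horizontal * (width - 2) + top_right
    middle = vertical + " " * (width - 2) + vertical
    bottom = bottom_left + horizontal * (width - 2) + bottom_right
    return "\n".join([top] + [middle] * (height - 2) + [bottom])


if __name__ == "__main__":
    print(draw_box(10, 4))
```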

There are a few box drawing characters that were available in the old DOS days - you should be able to use those.
However, keep in mind that the Windows console may require some jumping through hoops to output this as Unicode, which might be a problem unless you accept that your code editor is unlikely to display the character correctly. Michael Kaplan summarizes the problem quite nicely, with information about how to get around this.
