Which characters to choose when "drawing" a box on Windows console?

I'm trying to port a curses program to Windows. Now one of the problems is that the default ACS_XXXX characters become double-width on Windows console, thus breaking the alignment.
I tried looking for other characters to do the job, like '-' or '|' in basic ASCII, but none of them looks good because the line is not continuous. And finding characters to "draw" corners seems more difficult.
Are there any commonly used characters in such a situation?

I got it to work using the MingLiu font: with it, I can draw boxes around Chinese characters using ASCII characters without any alignment issues.

There are border characters in the system font. This includes joints, corners, and both double and single edges. They appear in the higher positions.
Check out http://www.asciitable.com/ for details. They range from 179 to 218 (decimal) in the extended ASCII table (the old DOS code page 437).
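As a rough illustration of the range mentioned above, here is a minimal Python sketch that decodes a few of those byte values through Python's built-in `cp437` codec and frames a string with them. The function names `cp437` and `draw_box` are made up for this example, and the sketch assumes every character of the text is one console cell wide, which will not hold for CJK text.

```python
# Byte values of the single-line box characters in code page 437.
TOP_LEFT, TOP_RIGHT = 218, 191
BOTTOM_LEFT, BOTTOM_RIGHT = 192, 217
HORIZONTAL, VERTICAL = 196, 179


def cp437(code):
    """Decode one CP437 byte value into the matching Unicode character."""
    return bytes([code]).decode("cp437")


def draw_box(text):
    """Return three lines that frame `text` with single-line box characters.

    Assumes each character in `text` occupies exactly one console cell.
    """
    h, v = cp437(HORIZONTAL), cp437(VERTICAL)
    bar = h * (len(text) + 2)  # one cell of padding on each side
    return [
        cp437(TOP_LEFT) + bar + cp437(TOP_RIGHT),
        f"{v} {text} {v}",
        cp437(BOTTOM_LEFT) + bar + cp437(BOTTOM_RIGHT),
    ]


if __name__ == "__main__":
    print("\n".join(draw_box("hello")))
```

Whether these characters come out single-width still depends on the console font and code page, so it is worth checking on the actual target console.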

There are a few box drawing characters that were available in the old DOS days - you should be able to use those.
However, keep in mind that the Windows console may require some jumping through hoops to output this as Unicode, which might be a problem unless you accept that your code editor is unlikely to display the character correctly. Michael Kaplan summarizes the problem quite nicely, with information about how to get around this.

Related

GetCharABCWidthsFloat works for most of UNICODE, except CJKV characters

I am attempting to render a series of UNICODE characters onto a spritesheet. This all works quite well for most characters, including Cyrillic ones.
When using GetCharABCWidthsFloat with certain CJKV characters however, the ABCFLOAT::abcfB parameter provides a value lower than expected. It does not account for underhangs or overhangs, which is the exact purpose of the ABCs:
The B spacing is the width of the drawn portion of the character glyph.
Source: ABCFLOAT | Microsoft Docs
As you can see, all characters do not overlap left-to-right, except the last few characters:
I get around this by creating a customizable padding option, to handle such cases, but this bloats the rest of the glyphs and thus requires a larger surface:
The font being used is Arial. For the character 美, ABC returns (2, 10, 2), which sums to an advance of 14 pixels, when in fact 17 pixels are needed.
I use TextOut to actually render the glyphs, but I do wonder if there is someone out there who's experienced this and came up with a universal solution.
Using functions like GetTextExtentPoint32W or DrawTextEx to get the rectangle does not allow precise per-character placement, which is the whole point of the ABC. And some unmentioned functions only work with TrueType fonts.
I question if certain characters shift to a different font under certain conditions, causing the results to be inaccurate. If that is the case, is there a way to determine if a character is not available for a font, knowing what Windows does automatically so I can reproduce the behaviour? That is, is there some sort of way to determine when a character should fall back on another font, and a way to determine what that font should be?
I have been on this problem for quite some time, so anyone with experience with these APIs would be greatly welcomed!

From the documentation on GetCharABCWidthsFloat:
The ABC widths of the default character are used for characters outside the range of the currently selected font.
Arial contains a lot of characters, including Cyrillic, but it does not contain CJKV ideographs. Other text-related calls may give you the false impression that it does have those characters (through a default/fallback font mechanism).
Before using (maybe before getting) the ABCFLOAT, you should first check that the characters you want metrics for are within the range of the currently selected font.

Is there a font or (better!) set of unicode characters representing the numbers 0-255 for displaying ASCII character codes (or other uses)?

I have strings that are mostly standard alpha-numeric and other printable ASCII characters, and would like to display these in a console window on Windows. What I'm looking for is either a console font or a unicode set of characters that represent the numbers 0-255 (0-FF) using a single glyph for each. The thing that comes closest that I am aware of is that unicode has a small set of circled numbers, 1-20, and elsewhere numbers 21-50. Something along those lines, but for 0-255 (or 0-FF) is what I'm trying to find.
It seems to me that this would be a relatively common need/desire, but I've been unable to track down a solution. Any help appreciated!

The C0 and C1 controls can be represented by control pictures. The rest of the C0 Controls and Basic Latin, and C1 Controls and Latin-1 Supplement blocks can be represented by themselves. You may have to test a few fonts to find one that supports all these characters.
However, you said "ASCII" and "0-255". But, ASCII has only 128 codepoints. Your codepoints 128-255 must be from an unnamed character set. Although you probably mean one of the well-known ones, they are so numerous that a detailed answer isn't practical.
There is also the Unicode BMP Fallback SIL font that covers U+0000 to U+FFFF (but not U+10000 to U+10FFFF).
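A minimal Python sketch of the control-picture idea, under some assumptions: as far as I know the Unicode Control Pictures block (starting at U+2400) covers the C0 controls and DEL but has no dedicated pictures for C1, so this sketch falls back to a bracketed hex code there. It also assumes bytes 160-255 are meant as Latin-1. The names `visible` and `render` are made up for the example.

```python
def visible(byte):
    """Map one byte value (0-255) to a printable representation."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    if byte < 0x20:
        return chr(0x2400 + byte)  # C0 control pictures, U+2400..U+241F
    if byte == 0x7F:
        return "\u2421"            # SYMBOL FOR DELETE
    if 0x80 <= byte <= 0x9F:
        return f"<{byte:02X}>"     # C1: no control picture, show hex instead
    return chr(byte)               # printable ASCII / Latin-1 (assumption)


def render(data):
    """Render a bytes object with every control byte made visible."""
    return "".join(visible(b) for b in data)


if __name__ == "__main__":
    print(render(b"line one\r\n\x00done\x7f"))
```

The C1 fallback is not a single glyph, so it does not fully meet the one-glyph-per-value goal. A font such as the fallback font mentioned above would be needed for that.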

MacOS CALayer Character Spacing

I have a problem which I hope you can help me solving.
I'm creating a program using Xamarin.Mac (C# for Mac) and I need to draw a DNA sequence (ATGC and so on). However, I need to know the exact position of each character so I can draw several other objects which should be aligned with the characters in DNA sequence.
Screenshot of the Windows version of my app which illustrates the behavior I'm looking for:
Currently I'm looking to use the CALayer drawing method, which appears to be fast enough to render 12 lines of 70 characters in less than 50 ms. CALayers are not fast enough to render 1000 CATextLayers with one (A/T/G/C) character each, so (I think) I need to render them as lines with specific spacing. This means that I need to have exactly 10 (example) pixels between the center of each character.
However, I cannot find a way to do this.
The NSAttributedString Kerning seems to be added to an unknown existing tracking (or spacing) of the font, and thus may be used with monospace fonts but still results in an unknown actual spacing.
I CAN get around the issue by trial and error until the letter spacing appears to match the desired spacing, but I'm not very confident in robustness of this method across different devices (screen resolutions). This requires that I use a monospace font, which is okay, but not optimal.
Is it possible to have specific character spacing using a single CATextLayer, and what are my options if not? Is it possible to have 1000 characters drawn individually without a huge performance impact?
Thank you.

Meaning of nonprintable characters

Sometimes, when I accidentally route nonprintable characters to the console, I get little boxes with 0's, 1's and other things in them.
What is the meaning of these boxes?
What is the meaning of the 0's and 1's?
Why show the characters this way?

Those boxes are the glyphs printed for Unicode characters which are not included in the console font you're using. The numbers indicate the code point for the character. They're shown that way so that there's a visual indication of a missing glyph in the font.
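To make the "numbers in the box" concrete, here is a tiny Python sketch that computes the hexadecimal code point digits such a placeholder glyph would typically show. The function name `missing_glyph_label` is hypothetical, and the four-digit padding is an assumption about how the fallback glyph is laid out.

```python
def missing_glyph_label(ch):
    """Return the hex code point digits a missing-glyph box would show.

    Assumes the box displays the code point in hexadecimal, zero-padded
    to at least four digits.
    """
    return f"{ord(ch):04X}"


if __name__ == "__main__":
    for ch in "\x00\x01\x1b":
        print(missing_glyph_label(ch))
```

So a box containing 0's and 1's, such as 0001, would correspond to the control character U+0001.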

Algorithm for separating Japanese characters one by one from an image using OpenCV

I have an application that needs separating Japanese characters one by one from an image.
Input: an image with ONE line of Japanese text. It can have halfwidth Katakana, halfwidth numbers, fullwidth Katakana, Hiragana and numbers as well. Maybe halfwidth or fullwidth English characters as well. (let's forget about English characters for the moment)
Issue:
I can easily separate out the characters by using adaptive thresholding, dilating and eroding. But there is one big issue.
Some of the Japanese characters have a gap inside them, like 川, 体, 休, 非, so simply looking for vertical white gaps doesn't help. Measuring the width doesn't help either, because there can be fullwidth characters (2-byte) or halfwidth characters (1-byte). I seem to need a more sophisticated way to do this.
Any idea how I should proceed with this? Any idea is a good idea :)
Here are a couple of sample images. (Characters circled in red are the problematic ones.)
http://imageshack.us/a/img833/3810/e31z.png
http://imageshack.us/a/img12/2395/7mqn.png

Don't expect to find one single simple algorithm able to do what you want. Be prepared to combine a handful of techniques, including, but not limited to, those you already mentioned.
My personal advice, taken out of previous personal experience, would be for you to take a look at template matching techniques.
Basically, this is what you'll need to do:
Select a few sample images of each symbol you want to identify to form your template database.
Develop an algorithm to segment each individual character out of the image. That, I think, you've accomplished already.
Here it is important that you scale the characters and normalize their perspective so that they match the exact conditions under which the templates were generated. getPerspectiveTransform and warpPerspective might come in handy.
Compare each character against each of your templates, using cv::matchTemplate for example.
Out of the top matches, do some fine selection using heuristics like those you mentioned yourself, namely checking for the existence of gaps in expected places and so on.
Test and retest, refining the heuristics for the closest cases until you reach the desired accuracy.
If you find yourself dealing with too much variety in terms of lighting conditions, characters colors, fonts, sizes and so on, you'll realize you'll be needing a huge database to cover all the various possibilities. In this case, it might help to use some transform invariant to the varying conditions. For character identification I believe skeletonization could work well. Take a look at topological skeleton and morphological skeleton and also here for a brief example.
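The segment-then-match idea can be sketched in plain Python. This is only a toy under stated assumptions: images are small binary grids (lists of rows of 0/1), templates are already the same size as the glyphs they should match, and a simple pixel-agreement score stands in for cv::matchTemplate. All function names here are hypothetical. Pieces found by vertical white gaps are greedily merged, longest span first, until a span matches a template, so a character with internal gaps such as 川 is kept whole.

```python
def column_runs(img):
    """Return (start, end) column ranges that contain ink, split at blank columns."""
    width = len(img[0])
    runs, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in img)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def crop(img, x0, x1):
    """Cut columns x0..x1 (end exclusive) out of the image."""
    return [row[x0:x1] for row in img]


def similarity(a, b):
    """Fraction of equal pixels, or 0.0 when the shapes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / (len(a) * len(a[0]))


def best_template(patch, templates, threshold):
    """Return the name of the best template scoring >= threshold, else None."""
    scored = [(similarity(patch, t), name) for name, t in templates.items()]
    score, name = max(scored)
    return name if score >= threshold else None


def segment(img, templates, threshold=0.9):
    """Greedily merge gap-separated pieces into spans that match a template."""
    runs = column_runs(img)
    result, i = [], 0
    while i < len(runs):
        match = None
        for j in range(len(runs), i, -1):  # try the longest merge first
            x0, x1 = runs[i][0], runs[j - 1][1]
            name = best_template(crop(img, x0, x1), templates, threshold)
            if name is not None:
                match = (name, (x0, x1), j)
                break
        if match is None:
            result.append(("?", runs[i]))  # unknown piece, keep it as-is
            i += 1
        else:
            result.append((match[0], match[1]))
            i = match[2]
    return result
```

A real pipeline would still need the scaling and perspective normalization described above before any comparison, and the longest-first preference is just one possible heuristic for resolving ambiguous splits.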

It sounds like OCR is what you need. As this link says, OpenCV doesn't support OCR, but there is another open-source project, Tesseract, that can do this. Check whether it helps.
A few more links I found by googling:
OpenCV OCR
OCR example in OpenCV
Hope this helps!
