Why can some software display all characters and some not? - windows

Reference text: どうもありがとうございました
Copied to:
Notepad/Notepad++: displays it with no problems
LibreOffice Writer: it switches the font family so that the text displays; if you change the font to Lucida Console, square boxes appear
Windows: displays it with no problems
Console: if I am right, it needs the correct code page (chcp) and a font family that can display the characters (Lucida Console displays square boxes here too)
Is it possible to explain why Notepad can display any text in any font family while LibreOffice and the console cannot? Where is the difference? Is it possible to get the same behaviour in the console as in Notepad, for example?

Some Windows fonts have glyphs for many different scripts, some cover a few scripts, and many cover just one. (Fonts which support many scripts are sometimes called "Unicode fonts," which can be a misleading term. In other OSes, these kinds of fonts are more prevalent. Windows itself doesn't ship with any, though I think you get one or two with the Office suite.)
When you output text in multiple scripts through the standard Windows functions with one of the well-known fonts, Windows uses font fallback and/or font linking, which automatically switches between fonts as needed to output the whole string. Most programs, like Notepad and Notepad++, thus get coverage automatically.
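The switching described above can be pictured with a toy model. This is only a sketch of the idea, not how Windows implements fallback or linking: the coverage table is invented for illustration, and the function name split_into_font_runs is hypothetical.

```python
# Toy model of font fallback. The coverage rules are made up for illustration
# and do not reflect what these fonts really contain.
HYPOTHETICAL_COVERAGE = {
    "Lucida Console": lambda ch: ord(ch) < 0x0500,        # pretend: Latin, Greek, Cyrillic
    "MS Gothic": lambda ch: 0x3040 <= ord(ch) <= 0x30FF,  # pretend: hiragana and katakana
}


def split_into_font_runs(text, preferred, fallbacks):
    """Return (font, substring) runs; font is None where no font has a glyph."""
    runs = []
    for ch in text:
        # Try the selected font first, then each fallback in order.
        font = next(
            (f for f in [preferred, *fallbacks] if HYPOTHETICAL_COVERAGE[f](ch)),
            None,
        )
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))              # start a new run
    return runs


print(split_into_font_runs("abc ありがとう", "Lucida Console", ["MS Gothic"]))
```

A run whose font is None corresponds to the square boxes: nothing in the chain had a glyph for those characters.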
I haven't read the LibreOffice code, but I suspect that when you select a font for a span of text, it sticks with that font, effectively preventing Windows's font fallback and font linking mechanisms from helping. This isn't surprising, since a WYSIWYG editor is likely to use lower-level APIs for outputting text in order to have more typographic control. But using the lower-level APIs means you don't get fallback and linking for free, so you'd have to implement it yourself, and that's a lot of extra work that may not be important to very many users.
The Windows console carries a lot of legacy and limitations that persist for backward compatibility with older programs. The console mostly emulates DOS systems, which didn't have any sort of Unicode support and instead relied on "code pages," which are, roughly speaking, alternate mappings between character values and glyphs. A code page is geared toward just one (or maybe two) scripts, so if you need characters from another script, you are basically out of luck. I think modern versions of Windows have hacked in some support for a pseudo code page that supports UTF-8, but I've never gotten it to work well and it, too, has limitations.
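The "one or two scripts per code page" limitation is easy to see outside the console. The sketch below uses Python's built-in codecs as stand-ins for console code pages: cp437 (the original US DOS code page), cp932 (Japanese) and UTF-8 (what chcp 65001 selects). The helper can_encode is a name made up for this example.

```python
text = "どうもありがとうございました"


def can_encode(text, code_page):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


for code_page in ("cp437", "cp932", "utf-8"):
    print(code_page, can_encode(text, code_page))
```

The reference text fits in the Japanese code page and in UTF-8, but not in cp437, which has no slot for any of these characters regardless of the font in use.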

Related

How does the Windows clipboard retain formatting information such as color between a browser and OneNote?

Just out of curiosity. I observed that when I copied some webpage text in Firefox that contained font size and color (set by CSS) and pasted them into OneNote, the font size and color were copied along with it. How is this formatting information transferred between the two applications?
OneNote offers several paste operations: keep the original formatting, merge formatting, and keep only the text. So is this formatting information saved to the Windows clipboard when the copy button is pressed? I have no knowledge of Windows application development, but I assume that Firefox is the active window when I press the copy key, so it is Firefox that accepts and handles this keyboard event?
I went searching for Firefox's guidance documentation and didn't find anything related to the system clipboard.
By reading Microsoft's technical documentation I learned that there are many kinds of clipboard data formats (the clipboard needs them because it can carry many kinds of data). If you want to pass data between two applications, I think the format must be one of the standard formats, but I'm not sure which one.
Or is the truth a completely different mechanism from my guess?
When an application is asked to copy something to the clipboard it can store "that something" in multiple formats simultaneously and when another application is asked to paste, it can pick from all the applicable formats.
OneNote perhaps prefers the formats in the order CF_HTML > CF_RTF > CF_UNICODETEXT. On the other hand, when you ask it to paste without formatting it might pick CF_UNICODETEXT first (and if it is not available, manually strip the formatting from the HTML/RTF).
There are various tools that let you see which formats are on the clipboard...
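The selection logic guessed at above can be sketched without touching the real clipboard. Here the clipboard is mocked as a plain dictionary, the format names are the ones used in this answer, and both preference orders are assumptions; what OneNote actually does is not documented here.

```python
# Preference orders as guessed in the answer; the real ones are unknown.
RICH_PASTE_ORDER = ["CF_HTML", "CF_RTF", "CF_UNICODETEXT"]
PLAIN_PASTE_ORDER = ["CF_UNICODETEXT", "CF_HTML", "CF_RTF"]


def pick_format(available, preference):
    """Return the first preferred format the source application offered, else None."""
    for fmt in preference:
        if fmt in available:
            return fmt
    return None


# Mock clipboard: the copying application stored the same text in two formats.
clipboard = {
    "CF_HTML": "<span style='color:red'>hello</span>",
    "CF_UNICODETEXT": "hello",
}

print(pick_format(clipboard, RICH_PASTE_ORDER))   # rich paste
print(pick_format(clipboard, PLAIN_PASTE_ORDER))  # "keep text only" paste
```

The point is that the copying application decides which formats exist, and the pasting application decides which one to consume.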

How does Windows deal with drawing chars not in the current font

I have an app that is trying to display U+23CE (⏎). This is a terminal app, so we are using "Consolas"/"Cascadia"/"Courier". As far as I can see, none of these fonts have this character. And yet, in Visual Studio, when I am debugging this app, it actually displays it correctly in the debugger. Also, when displayed by the new Windows Terminal, it displays correctly. But when I use the app I am working with (actually Putty), it displays the "I don't know this character" glyph.
Putty is a classic Win32 app using ExtTextOutW() to draw that text. I have checked that the correct font is bound to the HDC.
I am assuming that Visual Studio and Windows Terminal are using DirectWrite or other more modern text output logic, but ultimately they have to be getting these unknown glyphs from somewhere.
UPDATE:
I found a font with that character ("Segoe UI Symbol"), and if I set Putty to use that font, it displays the missing character (woohoo). Sadly, this is a proportional font, so it looks terrible, and this is not the solution.
#dvix pointed me at a Microsoft page discussing this exact topic, but it's not clear which things are done by Windows and which by an app developer. I tried linking "Courier New" (Putty's default) to "Segoe UI Symbol", but it made no difference. Does the Putty code need to do all the work itself? Detect an unknown character, read the Registry, and substitute the font for that one char? That is certainly doable, but a pain.
Windows can be directed to "borrow" missing glyphs in a font from another font that carries them using font linking. This applies to both consoles and GUI apps that use GDI (DrawText, ExtTextOut) to render text in Windows 2000 and later.
For example, the following registry entry links the Consolas font to Segoe UI Symbol. It can be saved as a .reg file and merged into the registry, and it takes effect at the next logon.
REGEDIT4
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink]
"Consolas"=hex(7):53,45,47,55,49,53,59,4d,2e,54,54,46,2c,53,65,\
67,6f,65,20,55,49,20,53,79,6d,62,6f,6c,00,00
; "Consolas"=REG_MULTI_SZ:"SEGUISYM.TTF,Segoe UI Symbol"
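To check that the hex(7) bytes really spell out the value shown in the trailing comment, they can be decoded by hand. This is a small sketch that assumes the bytes are single-byte characters, which is how a REGEDIT4-style file stores them; the function name decode_multi_sz is made up for this example.

```python
# The hex(7) payload from the .reg file above, with the line continuation removed.
REG_HEX = (
    "53,45,47,55,49,53,59,4d,2e,54,54,46,2c,53,65,"
    "67,6f,65,20,55,49,20,53,79,6d,62,6f,6c,00,00"
)


def decode_multi_sz(hex_csv):
    """Decode comma-separated hex bytes of a single-byte REG_MULTI_SZ into its strings."""
    raw = bytes(int(byte, 16) for byte in hex_csv.split(","))
    # Each string ends with a NUL byte and the whole list ends with one more NUL,
    # so splitting on NUL and dropping empty pieces yields the strings.
    return [part.decode("ascii") for part in raw.split(b"\x00") if part]


print(decode_multi_sz(REG_HEX))
```

The single entry is the font file name and the face name separated by a comma, matching the commented REG_MULTI_SZ line.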
One handy tool to explore the coverage of different fonts is BabelMap. For example, it can list the fonts that carry U+23CE (⏎) on a fairly clean Win10 system.
Another feature of BabelMap is the option to create temporary user-defined composite fonts on the fly, as opposed to the ones "statically" defined in the registry. This is presumably done using the MLang IMLangFontLink interface; there is more about that in Raymond Chen's How to display a string without those ugly boxes and Michael Kaplan's Font substitution and linking #2.

Cygwin displays error messages in Hebrew and garbled

I have been using Cygwin to build my Android library using the NDK's ndk-build script and Cygwin's make tool. It started giving me errors with a bunch of Latin non-English characters. When copying the text to Google, it was pasted as Hebrew (which I can read). Is there any way to force it to output errors in English? Any idea why this happens?
Check your environment variables for the correct locale. LANG or LC_MESSAGES are probably responsible. Set those to an English locale (in your profile to have that in future sessions as well) to get English error messages. Sorry, I'm a Windows person and know nearly nothing of Unix so you'd have to look up the specifics elsewhere, but this should be the general direction to go.
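As a rough sketch of that direction: the snippet below builds an environment in which message translation is switched off and starts a child process with it. The locale name en_US.UTF-8 is an assumption (it may not be installed on a given system), the helper english_env is hypothetical, and the child process is only a stand-in for the real build command.

```python
import os
import subprocess
import sys


def english_env(base=None):
    """Copy an environment and force untranslated (English) messages."""
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)       # LC_ALL, if set, would override LC_MESSAGES
    env["LANG"] = "en_US.UTF-8"   # assumed to exist on the system
    env["LC_MESSAGES"] = "C"      # "C" selects a program's built-in messages
    return env


# Stand-in child process that just reports the setting it received.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LC_MESSAGES'])"],
    env=english_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Setting the same variables in the shell profile has the same effect for every later session, which is what the answer suggests.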
Some programs/libraries try to be overly smart by guessing the locale from the keyboard layout or the user's locale, often ignoring the fact that on Windows locale and UI language are two different concepts (and that different languages on the console are even harder to get right).
As for why the messages appear garbled, that's likely because the console window uses the wrong code page. The easiest fix is usually to use a TrueType font for the console window, but in this case neither Consolas nor Lucida Console includes glyphs for Hebrew, so you'd only see boxes anyway.
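The wrong-code-page effect can be reproduced in a few lines. This sketch assumes, purely for illustration, that the message bytes are in the Windows Hebrew code page (cp1255) and that the display interprets them as Western European (cp1252); the actual pair of code pages in the question is not known.

```python
message = "שגיאה"  # sample Hebrew text ("error")

# The program writes Hebrew bytes, the display reads them with the wrong code page.
garbled = message.encode("cp1255").decode("cp1252")
print(garbled)

# The bytes themselves are intact, so reinterpreting them restores the text.
repaired = garbled.encode("cp1252").decode("cp1255")
print(repaired == message)
```

This matches the symptom in the question: accented Latin letters on screen that turn back into Hebrew once something interprets the same bytes with the right code page.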

Displaying Hebrew text in a console

How do I add a new font to the console (Windows 7), and where can I find the right font for Hebrew?
I've already checked this, but it didn't help.
Thanks.
There is another alternative console - ConEmu (open source too). It may be more useful for you.
I'm an author of this utility.
Here is a short list of its advantages: support for proportional and BDF fonts, ANSI X3.64 and Xterm 256 colors, running simple GUI apps in tabs, text search in the console, a configurable status bar, optional settings (e.g. palette) for selected applications...
In case you just want it for short testing purposes while debugging, just use Debug.WriteLine, which does support Unicode (tested with Hebrew characters only).
This will enable you to get some sort of output while debugging the program.
Just download Console2. It's an alternative console for Windows.

View plain text files with different background colors in Mac OSX, for different programming languages

I work with Mac OS X Leopard. I usually have 5 or 10 text files open at the same time in different programming languages: one for a bash script, another for a Python one, etc. When I use Exposé all of them look the same, so it is difficult to pick the right one.
I wonder how I could work with just plain text files in OS X so that, when they are opened in an editor, the background color or something else changes, and when using Exposé it is clear to me which window belongs to which language.
I thought about inserting some kind of info into the last line of each document and then creating an AppleScript that converts it to RTF or some other text format that includes a background color, so that it can then be opened with TextMate or some other app.
Do you know a better approach for this?
Thanks
Some editors allow different settings for different programming languages. Try TextMate or BBEdit, for example...
Try Smultron... it's free and I think it does what you are looking for

Resources