Does the Windows console support ANSI control characters?
It doesn't support many ANSI control characters by default (which is also mentioned in the Wikipedia article http://en.wikipedia.org/wiki/ANSI_escape_code), but there are ways to make that possible.
Look into the answers to this question: How to load ANSI escape codes or get coloured file listing in WinXP cmd shell?
You might happen upon something useful.
I assume you're referring to ASCII control characters.
The answer is "some". You can read backspace keypresses, for example, and you can pipe-in things like the ASCII "Bell" character.
However, if you mean that the Windows console automatically resolves escaped characters, such as converting "\a" into "Bell", then no, you have to do that yourself.
Note that I am talking about entering keypresses directly into the console and not about batch files; for that, see @ProblemFactory's answer.
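As a minimal sketch of the point about escapes: in a language like Go it is the compiler, not the console, that turns "\a" or "\b" into a single control byte. The helper name controlName and the handful of names it knows are illustrative assumptions, not a complete ASCII table.

```go
package main

import "fmt"

// controlName returns a conventional ASCII name for a few control bytes.
// The selection is an illustrative assumption, not a complete table.
func controlName(b byte) string {
	switch b {
	case 0x07:
		return "BEL"
	case 0x08:
		return "BS"
	case 0x09:
		return "HT"
	case 0x0A:
		return "LF"
	case 0x0D:
		return "CR"
	case 0x1B:
		return "ESC"
	}
	return "other"
}

func main() {
	// The compiler resolves each escape to one byte before the program
	// ever writes anything to the console.
	for _, s := range []string{"\a", "\b", "\x1b"} {
		fmt.Printf("0x%02X %s\n", s[0], controlName(s[0]))
	}
}
```

Typing the two characters backslash and "a" at a console prompt produces two ordinary bytes instead, which is why the conversion has to be done by the program.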
Related
I am currently making an app that deals with Ancient Greek (Unicode extended) characters. To put it simply, the user types a word in Greek and the program analyses it morphologically and shows its full declension, i.e. all its possible forms. Pretty simple, I am new to this. I made it with Latin (a language with no special Unicode characters) a week ago and it works perfectly.
I work with Ruby 3.0.2 and the Command Prompt that comes with its installation. I can write code using Greek Unicode letters (like puts "ἀγαθός") and they are displayed on the Command Prompt without problem. UTF-8 works fine there. I think the default codepage is UTF-8 for the .rb file.
The problem is when the Command Prompt tries to recognize the letters the user has typed. To test recognition, when the user types a word, the program echoes back the letters typed (I typed them using the Windows Polytonic Greek Keyboard). To my dismay these letters are only squares with question marks inside them. After a while, I couldn't type a single Greek character (not even a non-extended one) because the Command Prompt crashes when I do.
One workaround I tried was to change the codepage to Windows-1253 before running the program; it supports Greek characters but not the Greek Extended Unicode characters. That made it possible to type common Greek characters, and the Command Prompt recognizes them well. But of course it still crashes if I dare to type a single alpha with spiritus asper.
But I really would like to use UTF-8 for everything, and I don't know why the program is doing this. Of course Windows Powershell does the same.
I hope I have explained the problem well. Sorry if my language is not appropriate, but I hope you have got the point. Thanks!
In the following code, the ü is not the single Unicode character U+00FC but is a single grapheme cluster composed of two Unicode characters, the plain ASCII u U+0075 followed by the combining diaeresis U+0308.
fmt.Println("Jürgen Džemal")
fmt.Println("Ju\u0308rgen \u01c5emel")
If I run it in the Go Playground, it works as expected.
If I run it in a MS Windows 10 "Command Prompt" window, it doesn't visually combine the combining character with the prior character.
However when I cut and paste the text into here it appears correctly:
C:\> ver
Microsoft Windows [Version 10.0.17134.228]
C:\> test
Jürgen Džemal
Jürgen Džemel
On screen, in the "Command Prompt" window it looked more like:
Ju¨rgen Džemel
Changing the code page (chcp) from 850 to 65001 made no difference. Changing fonts (Consolas, Courier etc) made no difference.
In the past I have experienced problems that were fundamentally because Microsoft require Windows programs to use a different API to output characters to STDOUT depending on whether STDOUT is attached to a console or to a file. I don't know if this is a different manifestation of the same issue.
Is there something I can do to make this Unicode grapheme-cluster appear correctly?
As eryksun and Peter commented,
The Windows console (conhost.exe) doesn't support combining codes. You'll have to first normalize to an equivalent string that uses precomposed characters.
you can use golang.org/x/text/unicode/norm to do the normalization (e.g. norm.NFC.String("Jürgen Džemal"))
I tried this
s := "Ju\u0308rgen \u01c5emel"
fmt.Println(s) // dieresis not combined with u by conhost.exe
s = norm.NFC.String(s)
fmt.Println(s) // shows correctly
And the output looked like this
or, for the visually impaired with fabulously sophisticated screen readers - a bit like this:
Ju¨rgen Džemel
Jürgen Džemel
Note that Unicode has four different normalised forms but NFC is the most used on the Internet in web-pages and is also appropriate for this situation.
There are other methods in this package that may be more efficient or more useful.
I have read that there are characters in use which can only be represented in Unicode using combining characters; in other words, there is no precomposed character for them. A more thorough approach would be needed to do something appropriate with those. Essentially the complications of Unicode (or perhaps more accurately of human languages and their typography) are almost without end. It sometimes seems that way to me.
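To see what the console is actually being handed, it can help to list the code points of both spellings. This is a small sketch using only the standard library; the helper name codePoints is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints formats every code point of s as U+XXXX, separated by spaces.
func codePoints(s string) string {
	out := ""
	for i, r := range []rune(s) {
		if i > 0 {
			out += " "
		}
		out += fmt.Sprintf("U+%04X", r)
	}
	return out
}

func main() {
	decomposed := "Ju\u0308rgen" // u followed by combining diaeresis
	precomposed := "J\u00fcrgen" // single precomposed u with diaeresis
	fmt.Println(utf8.RuneCountInString(decomposed), codePoints(decomposed))
	fmt.Println(utf8.RuneCountInString(precomposed), codePoints(precomposed))
}
```

The decomposed string carries one more code point than the precomposed one, and that extra U+0308 is what conhost.exe draws as a separate glyph.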
References
https://blog.golang.org/normalization
https://godoc.org/golang.org/x/text/unicode/norm
https://learn.microsoft.com/en-us/windows/desktop/intl/using-unicode-normalization-to-represent-strings
For example, several characters used in writing Lithuanian have double diacritics and exist only in decomposed form. An example is lowercase U with macron and tilde ("ū̃", U+016B U+0303, where the first code point is a lowercase U with macron and the second is a combining tilde).
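One way to find out whether a string still contains such combining marks (for instance after normalization) is to test each code point against the Unicode nonspacing-mark category. This is only a sketch; the helper name hasCombiningMark is hypothetical, and category Mn covers more than just the Latin diacritics.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasCombiningMark reports whether s contains any code point in the Unicode
// category Mn (nonspacing mark), which includes the combining diacritics.
func hasCombiningMark(s string) bool {
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasCombiningMark("\u016b\u0303")) // u with macron, then combining tilde
	fmt.Println(hasCombiningMark("\u00fc"))       // precomposed u with diaeresis
}
```

A program targeting conhost.exe could use a check like this to decide whether a fallback (such as dropping or replacing the mark) is needed for a given string.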
Open up irb and
type gets. It should work fine.
Then try system("choice /c YN") It should work as expected.
Now try gets again, it behaves oddly.
Can someone tell me why this is?
EDIT: For some clarification on the "odd" behavior, it allows me to type for gets, but doesn't show me the characters and I have to press the enter key twice.
Terminal input/output handling is a dark and mysterious art. Anyone who has tried to make colorized bash output work in Windows PowerShell over ssh knows that. (And various shortcut habits like Ctrl+Backspace only make things worse.)
One possible reason for your problem is special-character handling. Every terminal can operate in a number of different modes, and it parses its own output in search of certain character sequences in order to toggle states.
For example, here one can find ANSI escape code sequences, one of the standards that different kinds of terminals may support.
See Esc[5;45m there? That makes all the following output blink on a magenta background. And there is significantly more stuff like that out there.
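To make the shape of such a sequence concrete, here is a small sketch that assembles it: the ESC byte, an opening bracket, the numeric codes joined by semicolons, and a closing m. The function name sgr is an assumption for illustration; whether a given terminal honours the result depends on the terminal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sgr builds an ANSI "Select Graphic Rendition" sequence: ESC [ codes m.
func sgr(codes ...int) string {
	parts := make([]string, len(codes))
	for i, c := range codes {
		parts[i] = strconv.Itoa(c)
	}
	return "\x1b[" + strings.Join(parts, ";") + "m"
}

func main() {
	// %q prints the sequence in escaped form instead of sending it raw
	// to the terminal, so the bytes can be inspected safely.
	fmt.Printf("%q\n", sgr(5, 45)) // blink, magenta background
	fmt.Printf("%q\n", sgr(0))     // reset all attributes
}
```

Writing sgr(5, 45) followed by text and then sgr(0) to a terminal that supports these codes switches the mode on and back off again.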
So, the answer to your question taken literally is: your choice command messes with the terminal modes using special escape sequences, and Ruby's gets breaks in that quirky mode of terminal operation.
But more useful will be the link to the HighLine gem documentation. Why implement platform-specific and obtrusive behavior when the same can be done in about 12 LOC? All credit for the Gist goes to botimer; I only stumbled upon his code while searching.
This is quite a low-level (low in the sense of "closer to the metal") question.
I was wondering if any of you could point me to documentation, explanations, etc. of how, upon receiving a Unicode character (or any character code, but I'm particularly interested in the Unicode Standard) the console in Windows, good ol' cmd.exe (using, say, codepage 65001) and xterm in Linux started with, say, LC_CTYPE=en_US.UTF-8 look up the corresponding glyph (and where).
I know it may be harder to know in Windows, but I can't really find much information.
Thank you.
As far as I can tell, cmd.exe is bound to whatever 256-character code page you defined as the "codepage for non-Unicode programs" or whatever it was called.
To elaborate, if I set the above setting to Japanese, cmd.exe suddenly replaces backslashes with yen signs (as does every other non-Unicode app on the system) and correctly interprets ShiftJIS codes, for example. Setting it to Dutch gives me an accented I (I forgot which), while another codepage would give a half-filled vertical solid instead on the same character.
Not Unicode. Unicode would let me do all three at the same time.
The console uses a TextWriter with an encoding created from the codepage. That means that the characters written are encoded into bytes using the specific Encoding object for the codepage.
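The effect of encoding through a single-byte code page can be sketched as follows. ISO-8859-1 is used here purely as a stand-in because its mapping is trivial (code points up to U+00FF map to the same byte value); the real console code page and its tables will differ, and the function name encodeLatin1 is hypothetical.

```go
package main

import "fmt"

// encodeLatin1 encodes s as ISO-8859-1 bytes, substituting '?' for any
// character the code page cannot represent.
func encodeLatin1(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r <= 0xFF {
			out = append(out, byte(r))
		} else {
			out = append(out, '?')
		}
	}
	return out
}

func main() {
	fmt.Printf("% X\n", encodeLatin1("J\u00fcrgen"))        // every character fits in one byte
	fmt.Printf("% X\n", encodeLatin1("\u1f00\u03b3\u03b1")) // Greek letters are outside the code page
}
```

Anything outside the code page is lost at this step, which is one way characters end up as question marks.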
The console doesn't support Unicode. :)
I've made a program in MSVC++ which outputs memory contents (in ASCII). The ASCII I see in the Windows console seems to match what I see in various ASCII tables (smiley, diamond, club, right arrow etc). This program needs to compile under Linux (which it does), but the ASCII output looks completely different. A few symbols are the same but the rest are very different. Is there any way to change how the terminal displays ASCII codes?
EDIT: The program executes correctly, it's just the ASCII that is being displayed differently.
ASCII defines character codes from 0x00 through 0x7f. Everything else (0x80-0xff) is not part of the ASCII standard and depends on what the operating system defines as the characters to display. However, the characters you mention (smiley, diamond, club, etc) are the representations of the ASCII "control characters" that don't normally have a visual representation. Windows lets you print such characters and see the glyphs it has defined for them, but your Linux is probably interpreting the control characters as formatting control codes (which they are) instead of printing corresponding glyphs.
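The three ranges described above can be written down as a small classifier. This is a sketch; the function name classify and the label strings are assumptions for illustration.

```go
package main

import "fmt"

// classify places a byte into one of three ranges: ASCII control codes
// (0x00-0x1F and 0x7F), printable ASCII (0x20-0x7E), or non-ASCII (0x80-0xFF).
func classify(b byte) string {
	switch {
	case b < 0x20 || b == 0x7F:
		return "ASCII control"
	case b < 0x7F:
		return "ASCII printable"
	default:
		return "not ASCII"
	}
}

func main() {
	for _, b := range []byte{0x01, 0x07, 'A', 0x7F, 0x80, 0xFF} {
		fmt.Printf("0x%02X %s\n", b, classify(b))
	}
}
```

A memory dumper that wants identical output on both systems could print only the printable range as-is and substitute a placeholder such as '.' for everything else.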
What you are seeing is the "extended" character set that IBM initially included when PCs were first unleashed upon the world. Yes, we are going back to the age of mighty dinosaurs, so bear with me. These characters live above $7F and the interpretation of their symbols on the screen can even be influenced by the font chosen. Most linux distros are now using UTF-8 (or something close) and as such, the fonts installed may have completely different symbols, or even missing glyphs. In cases where you are comparing "ASCII" representations (which is a misnomer, as it's not really true ASCII) of the same data, it may or may not exactly match, as you must have the same "glyph" renderings in both display fonts to correctly see similar representations. Try getting both your Windows and Linux installs to use the same font if possible, and then see if there is a change.
If your browser supports Unicode (and you have the correct fonts installed), you will see them below.
You can copy and paste them into an editor with Unicode support (Notepad) and save as UTF-16BE.
Then if you open the file in a hex editor you will see the Unicode code for each visible glyph.
For example, the first ASCII char, Null, has the visible Unicode glyph 0x2639.
In C/C++/Java you can use it like \u2639.
It's not a null char but its visual representation.
http://en.wikipedia.org/wiki/Code_page_437
☹☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⌂ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñÑªº¿⌐¬½¼¡«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■⓿
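To get the same symbols on a UTF-8 terminal, a program can translate each byte to the matching Unicode glyph before printing. The sketch below covers only bytes below 0x80, using the code page 437 glyphs for 0x01 through 0x1F; the names cp437Low and glyph are assumptions, and byte 0x00 and the upper half of the table are deliberately left out.

```go
package main

import "fmt"

// cp437Low holds the glyphs code page 437 uses for bytes 0x01 through 0x1F.
var cp437Low = []rune("☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼")

// glyph maps a byte below 0x80 to the symbol code page 437 would draw.
// Bytes 0x00 and 0x80-0xFF are out of scope for this sketch and yield '?'.
func glyph(b byte) rune {
	switch {
	case b >= 0x01 && b <= 0x1F:
		return cp437Low[b-1]
	case b >= 0x20 && b <= 0x7E:
		return rune(b)
	case b == 0x7F:
		return '⌂'
	}
	return '?'
}

func main() {
	// Print a few raw bytes the way a code page 437 console would show them.
	for _, b := range []byte{0x01, 0x03, 0x04, 0x1A, 'A'} {
		fmt.Printf("0x%02X %c\n", b, glyph(b))
	}
}
```

Because the output is ordinary Unicode text, it should look the same wherever the font has these glyphs, independent of the active code page.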