ANSI questions: "\x1B[?25h" and "\x1BE" - bash

What does "\x1B[?25h" do?
How is "\x1BE" different from "\n"? According to http://ascii-table.com/ansi-escape-sequences-vt-100.php it "moves to next line"? Seems like that's what "\n" does?
I tried echo "xxx\nxxx\n" and echo "xxx\x1BExxx\n" in PHP and they both output the same thing.
Any ideas?
Thanks!

These are ANSI escape sequences (also known as VT100 codes) are an early standardisation of control codes pre-dating ASCII.
The escape sequence \x1BE, or Esc+E, is NEL or "Next line", and is used on older terminals and mainframes to denote CR+LF, or \r\n.
The escape sequence \x1B[ (Esc+[) is an example of a Control Sequence Introducer. (\x9B is another single-character CSI.) The control sequence ?25h following it is used to show the cursor.
Most terminals will support these control codes; to enter escape sequences you can type Ctrl+V, Ctrl+[ which should render as ^[ (the C0 code for ESC), followed by the escape code.
References:
ANSI escape code
C0 and C1 control codes

Related

What is the meaning of special character sequences like `\027[0K`?

I found this commit from facebook infer, and I have no idea what \027[0K and \027[%iA means.
What does these special string mean? And (I think) if there are more strings like this, where can I find the full documentation about this?
Those are escape sequences to tell your terminal what to do.
For example, the sequence of characters represented by \027[0K (where \027 is ASCII decimal value for Esc character) tells the terminal to "clear line from cursor to the end."
One helpful document/guide on this subject can be found at https://shiroyasha.svbtle.com/escape-sequences-a-quick-guide-1
The facebook code is copied from another source here, which uses hard-coded formatters imitating termcap (this page gives some background). The original has comments indicating where its information came from.
The formatter uses "%i" for integers. That's a repeat-count for the cursor movement "cursor-up" \033[A
In most languages, \033 (octal) is used for the ASCII escape character. But this source (according to the github analysis) is written in OCaml, and is using the decimal value for the ASCII escape character. According to the OCaml syntax, you could use an octal value like this: \o033
Once you see that the formatting parts (how the escape character is represented, the use of %i to format a number), the rest of this is documented in several places.
The relevant standard is ECMA-48
the termcap (or analogous terminfo) information is in the terminal database.

Space characters inside the ESC[ (0x1b/0x5b) sequence invalidate the sequence?

I can not find information about what should be done to spaces within ESC sequence. Example: position cursor
ESC[10;20H
is a valid ESC sequence, but is the one including spaces like
ESC[ 10; 20H
valid too? The point is that while ESC character is a control character with code 0x1b, text following it is human and machine readable text, and in general spaces should not harm the meaning of ESC sequence, thus I would just remove all the spaces found within ESC sequence.
Lots of article on the internet talking what ESC sequence is and what they may consist of (however there're only just few good and really informative ones), but none of them clarify this matter.
I found this one, and it says
Since ASCII control functions do not follow a structured syntax, the notation used to describe function sequences and parameters is important to avoid confusion. Escape sequences are shown with a space between each character to make them easier to read. These spaces are not part of the Escape sequence.
While it says that space char separates characters for readability, they do not say if keeping space invalidates the ESC sequence.
Is there any related RFC for it? I hope it unambiguously defines this case.
Update: thanks Thomas to pointing to space char being one of the ESC sequence operators. So now it is clear that [ should follow ESC character, and space is not allowed between them.
But what is about following arguments? As in the example above, spaces in row and column coordinates ESC[SP10;SP20H makes sequence invalid and I must stop processing it starting displaying space character instead?
Update1: I did small test using Windows telnet application. Logged into the remote server, and that server responds with ESC sequence. The result is:
ESC[2;5H positions properly row 2 column 5
ESC[ 2; 5H displays "2; 5H" in current cursor position
ESC[2 ; 5H displays "; 5H" in current cursor position
So basing on the empirical findings I suspect spaces are NOT allowed, and space char invalidates/cancels the sequence.
ECMA-48's the place to look (if you want an RFC). Look for mention of 02/00 (the way it represents the hexadecimal 0x20 for space).
For what it's worth, there are DEC control sequences (VT220 and up) with an embedded space, e.g., the ones marked with SP:
Controls beginning with ESC
This excludes controls where ESC is part of a 7-bit equivalent to 8-bit
C1 controls, ordered by the final character(s).
ESC SP F 7-bit controls (S7C1T), VT220.
ESC SP G 8-bit controls (S8C1T), VT220.
ESC SP L Set ANSI conformance level 1 (dpANS X3.134.1).
ESC SP M Set ANSI conformance level 2 (dpANS X3.134.1).
ESC SP N Set ANSI conformance level 3 (dpANS X3.134.1).
In your examples, the whitespace is just for readability, and non-printing characters such as ASCII escape (decimal 27, hexadecimal 01xb) are shown by a name such as ESC.

Where can I find a list of terminal ANSI codes sent by Ctrl-key sequences?

I am writing some behavioural tests for code that interacts with a terminal and I need to assert behaviour on the sequence C-p C-q (ctrl-p ctrl-q). In order to do this, I need to write the raw characters to the PTY. I have a small mapping at the moment for things like C-d => 0x04, C-h => 0x08.
Is there somewhere I can get a basic mapping of human readable control sequences, mapped to raw byte sequences for xterm?
Take the ASCII value of the character (e.g., for ^H, take 72), and subtract 64. Thus, ^H is 8.
This works for any control character. Using it, you can discover that, for example, ^# is the NUL character and ^[ is ESC.

Using non-ASCII characters in a cmd batch file

I'm working on a .bat program, and the program is written in Finnish. The problem is that CMD doesn't know these "special" letters, such as Ä, Ö, Å.
Is there a way to make those work? I'd also like it if the user could use those letters too.
Part of my code:
#echo off
/u
title JustATestProgram
goto test123
:test123
echo Letters : Ää Öö Åå
pause
exit
When I open this file, the letters look like this:
Try putting this line at the top of the batch file:
chcp 65001
It should change the console encoding to UTF-8, and you should be able to read the file properly in the script after that.
Theoretically you just need to use the /u (Unicode) switch:
c:\>cmd /u
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
c:\>echo Ä
Ä
If you use Notepad++, you can simply change the charset. Doing this will allow you to write letters from desired charset. The western region -US. should support it.
You can do it in a drop down menu in Notepad++ or by hand by writing chcp 437. But I recommend doing this in Notepad++ as it will show you the output as it will be in the batch. So you will then easily see if you use the right code page. And at same time it's easy to switch if you want more special symbols. You can also as stated in previous posts. Try UTF-8.
You can read more about this here: http://ss64.com/nt/chcp.html. And here's a list over different code pages (check out the OEM pages): Code Page Identifiers
The command prompt uses DOS encoding. Windows uses ANSI or Unicode.
PS I'm assuming you are in the US with code page 437 rather than international English/Western European 850.
So I used Character Map to get the DOS code then find out what ANSI character that code maps to.
This is the notepad contents.
echo Ž„™”†
Which was made by putting the DOS codes for your characters into notepad.
0142, 0132, 0153, 0148, 0143, 0134 which display as the above ANSI characters.
Command prompt output
C:\Windows\system32>echo ÄäÖöÅå
ÄäÖöÅå
Alt + Character Code [Prev | Next | Contents]
Holding down alt and pressing the character code on the numeric keypad will enter that character. The keyboard language in use must support entering that character. If your keyboard supports it the code is shown on the right hand side of the status bar in Character Map else this section of the status bar is empty. The status bar us also empty for characters with well known keys, like the letters A to Z.
However there is two ways of entering codes. The point to remember here that the characters are the same for the first 127 codes. The difference is if the first number typed is a zero of not. If it is then the code will insert the character from the current character set else it will insert a character from the OEM character set. Codes over 255 enter the unicode character and are in decimal. Characters entered are converted to OEM for Dos applications and either ANSI or Unicode depending on the Windows' application. See Converting Between Decimal and Hexadecimal.
E.G., Alt + 0 then 6 then 5 then release Alt enters the letter A
From Shortcut Keys and Key Modifiers by Me at https://1drv.ms/f/s!AvqkaKIXzvDieQFjUcKneSZhDjw

ncurses escape sequence detection

what is the best way to detect escape sequences in ncurses raw mode.
The thing that come to my mind is do getch and then add it to some kind of buffer and then when
the text matches known escape sequence execute appropriate command otherwise ignore the sequence.
However in this algorithm if i hit an escape sequence that system doesn't know about even if a continue typing other characters they will be considered as a part of escape sequence.
Are there any rules of how long the escape sequence can be or some standart characters that end escape sequence
Or is there a way of detecting in ncurses when user stopped typing the characters? like escape sequence usually come as sequence of characters but i have no way to detect the last character because i have a blocking getch that just blocks when there are no characters from input system
like for example if i press page down and c button i have continuous stream of
escape sequence characters ^[[6~ and then character c. how can i detect between those two that the user pressed first escape sequence and then c character if i don't have a predefined set of known escape sequences
You may wish to use libtermkey to parse the key events.
http://www.leonerd.org.uk/code/libtermkey/
Either you can connect it directly to stdin, or feed it bytes as given to you by getch().
As to the general idea of detecting the end; there are certain rules that sequences follow. Sequences starting ESC [ are CSI sequences, and end on the first octet in the range [\x40-\x7e]. There are a few other kinds but they're rare to come from a terminal.
Without a predefined set of escape sequences you will not be able to determine the end of a sequence.
Ncurses uses termcap/terminfo to enumerate the sequences and their meanings.

Resources