ncurses escape sequence detection - terminal

what is the best way to detect escape sequences in ncurses raw mode.
The thing that come to my mind is do getch and then add it to some kind of buffer and then when
the text matches known escape sequence execute appropriate command otherwise ignore the sequence.
However in this algorithm if i hit an escape sequence that system doesn't know about even if a continue typing other characters they will be considered as a part of escape sequence.
Are there any rules of how long the escape sequence can be or some standart characters that end escape sequence
Or is there a way of detecting in ncurses when user stopped typing the characters? like escape sequence usually come as sequence of characters but i have no way to detect the last character because i have a blocking getch that just blocks when there are no characters from input system
like for example if i press page down and c button i have continuous stream of
escape sequence characters ^[[6~ and then character c. how can i detect between those two that the user pressed first escape sequence and then c character if i don't have a predefined set of known escape sequences

You may wish to use libtermkey to parse the key events.
http://www.leonerd.org.uk/code/libtermkey/
Either you can connect it directly to stdin, or feed it bytes as given to you by getch().
As to the general idea of detecting the end; there are certain rules that sequences follow. Sequences starting ESC [ are CSI sequences, and end on the first octet in the range [\x40-\x7e]. There are a few other kinds but they're rare to come from a terminal.

Without a predefined set of escape sequences you will not be able to determine the end of a sequence.
Ncurses uses termcap/terminfo to enumerate the sequences and their meanings.

Related

What is the meaning of special character sequences like `\027[0K`?

I found this commit from facebook infer, and I have no idea what \027[0K and \027[%iA means.
What does these special string mean? And (I think) if there are more strings like this, where can I find the full documentation about this?
Those are escape sequences to tell your terminal what to do.
For example, the sequence of characters represented by \027[0K (where \027 is ASCII decimal value for Esc character) tells the terminal to "clear line from cursor to the end."
One helpful document/guide on this subject can be found at https://shiroyasha.svbtle.com/escape-sequences-a-quick-guide-1
The facebook code is copied from another source here, which uses hard-coded formatters imitating termcap (this page gives some background). The original has comments indicating where its information came from.
The formatter uses "%i" for integers. That's a repeat-count for the cursor movement "cursor-up" \033[A
In most languages, \033 (octal) is used for the ASCII escape character. But this source (according to the github analysis) is written in OCaml, and is using the decimal value for the ASCII escape character. According to the OCaml syntax, you could use an octal value like this: \o033
Once you see that the formatting parts (how the escape character is represented, the use of %i to format a number), the rest of this is documented in several places.
The relevant standard is ECMA-48
the termcap (or analogous terminfo) information is in the terminal database.

Space characters inside the ESC[ (0x1b/0x5b) sequence invalidate the sequence?

I can not find information about what should be done to spaces within ESC sequence. Example: position cursor
ESC[10;20H
is a valid ESC sequence, but is the one including spaces like
ESC[ 10; 20H
valid too? The point is that while ESC character is a control character with code 0x1b, text following it is human and machine readable text, and in general spaces should not harm the meaning of ESC sequence, thus I would just remove all the spaces found within ESC sequence.
Lots of article on the internet talking what ESC sequence is and what they may consist of (however there're only just few good and really informative ones), but none of them clarify this matter.
I found this one, and it says
Since ASCII control functions do not follow a structured syntax, the notation used to describe function sequences and parameters is important to avoid confusion. Escape sequences are shown with a space between each character to make them easier to read. These spaces are not part of the Escape sequence.
While it says that space char separates characters for readability, they do not say if keeping space invalidates the ESC sequence.
Is there any related RFC for it? I hope it unambiguously defines this case.
Update: thanks Thomas to pointing to space char being one of the ESC sequence operators. So now it is clear that [ should follow ESC character, and space is not allowed between them.
But what is about following arguments? As in the example above, spaces in row and column coordinates ESC[SP10;SP20H makes sequence invalid and I must stop processing it starting displaying space character instead?
Update1: I did small test using Windows telnet application. Logged into the remote server, and that server responds with ESC sequence. The result is:
ESC[2;5H positions properly row 2 column 5
ESC[ 2; 5H displays "2; 5H" in current cursor position
ESC[2 ; 5H displays "; 5H" in current cursor position
So basing on the empirical findings I suspect spaces are NOT allowed, and space char invalidates/cancels the sequence.
ECMA-48's the place to look (if you want an RFC). Look for mention of 02/00 (the way it represents the hexadecimal 0x20 for space).
For what it's worth, there are DEC control sequences (VT220 and up) with an embedded space, e.g., the ones marked with SP:
Controls beginning with ESC
This excludes controls where ESC is part of a 7-bit equivalent to 8-bit
C1 controls, ordered by the final character(s).
ESC SP F 7-bit controls (S7C1T), VT220.
ESC SP G 8-bit controls (S8C1T), VT220.
ESC SP L Set ANSI conformance level 1 (dpANS X3.134.1).
ESC SP M Set ANSI conformance level 2 (dpANS X3.134.1).
ESC SP N Set ANSI conformance level 3 (dpANS X3.134.1).
In your examples, the whitespace is just for readability, and non-printing characters such as ASCII escape (decimal 27, hexadecimal 01xb) are shown by a name such as ESC.

Why do xterm's docs call ' ' a control character?

I'm writing a parser for ANSI escape codes using xterm's docs as a guideline. Under the list of single character functions, they include:
SP Space.
Now, for most of the single character functions, I understand the purpose: BEL, for example, is going to require some special help from your terminal emulator to process, and TAB is likely to be involved in autocompletion rather than being printed as a normal character.
I can't imagine any situation where SP would need to be treated as anything other than a literal space character, so I'm considering dropping the SP control code from my parser. Would I risk anything by doing so? Is there a use for SP in the console that I'm not aware of?
Space isn't a "control" character. In ASCII, the control characters are codes 0 to 31 (space is 32), and 127 (DEL). The POSIX locale uses the same data, not coincidentally.
They are called control characters, because they allow the host (computer) to control (tell) the terminal to perform functions rather than simply print text:
A space is actually "printing" in this regard because (like all of the other ASCII characters), it advances the carriage position by one column. In the C language of course, a space is treated as non-graphic, which is a different shade of meaning. "Graphic" characters are visible.
In contrast, a TAB requires the terminal to do something special: move the carriage position by an amount that depends on where it happens to be at the moment.
"Carriage position" of course refers to printing terminals (such as those on which Unix was originally developed), or typewriters. The "carriage" (noun) is the mechanism which moved left/right to allow the terminal (or typewriter) to print at different positions along the line. "Carriage controls" in turn refer to the control characters which move the carriage left and right (other than as a side-effect of printing individual characters). It's obvious if you have ever used a typewriter...
In XTerm Control Sequences, SP is shown for clarity (to be able to reuse that name in other places, e.g., where a 32 is actually part of a control sequence). That wording was added in patch #25 to support the description of the group of controls S7C1T, S8C1T, and DECSCL — setting ANSI conformance level, none of which fall within ECMA-48.
A quick check shows 8 control sequences containing a space (which happens to be a valid intermediate byte, per ECMA-48, just like semicolon, which is visually distinct and does not require a name in the control sequences descriptions — you might find the PDF clearer than the HTML). None of those sequences are used in the obscure sense referred to in ECMA-48:
ECMA 48 section 6.1.1 is talking about overstriking one character on another to render a mixture of the two. This is very rare in video terminals, but assumed in most printing devices. The closest to this in a terminfo description might be ul (underline character overstrikes), and reviewing the few possibilities, some of those appear to be incorrect. xterm doesn't do that.
ECMA 48 section 8.3.140 in its comment about "character escapement" is referring to proportional fonts or variable-width character pitch (again, very rare in video terminals, but implemented in some printing devices). There are a few terminfo capabilities referring to pitch, and all of those are marked as "printer support". ncurses has one entry (att5310) using the cpi capability.
So: if you are referring to xterm's documentation, it is unlikely that you intend your parser for any other use than for video terminals. But if you intend it to be more general, then reading about printers would be a good way to improve your application.
ECMA 48 sheds some light on this.
tl;dr:
Some terminals may choose to differentiate between erased characters and space characters.
In terminals with variable width fonts, SP can be considered a control character that introduces a configurable amount of horizontal spacing.
Neither is really relevant today, so you're entirely free to just treat as just another character.
ECMA 48 section 6.1.1:
Depending on the implementation, there may or may not be a distinction between a character position in
the erased state and a character position imaging SPACE
ECMA 48 section 8.3.140:
SSW is used to establish for subsequent text the character escapement associated with the character
SPACE. The established escapement remains in effect until the next occurrence of SSW in the data
stream or until it is reset to the default value by a subsequent occurrence of CARRIAGE RETURN/LINE
FEED (CR/LF), CARRIAGE RETURN/FORM FEED (CR/FF), or of NEXT LINE (NEL) in the data
stream, see annex C.

Where can I find a list of terminal ANSI codes sent by Ctrl-key sequences?

I am writing some behavioural tests for code that interacts with a terminal and I need to assert behaviour on the sequence C-p C-q (ctrl-p ctrl-q). In order to do this, I need to write the raw characters to the PTY. I have a small mapping at the moment for things like C-d => 0x04, C-h => 0x08.
Is there somewhere I can get a basic mapping of human readable control sequences, mapped to raw byte sequences for xterm?
Take the ASCII value of the character (e.g., for ^H, take 72), and subtract 64. Thus, ^H is 8.
This works for any control character. Using it, you can discover that, for example, ^# is the NUL character and ^[ is ESC.

Easiest way to react on cursor keys (arrows) in interactive shell script

I'd like to write a shell script which shall be interactive and react on the use of cursor keys. My current approach seems a little complicated. I would determine the escape sequence for the cursor keys using tput kcud1 etc. and then use read -s -n 1 a to read byte-by-byte, append the byte to a collected string, and then compare the collected string to the determined escape sequences. If one matches, then I can react.
This is problematic because I've got no idea where an escape sequence ends. For instance, tput kcud1 (down arrow) here returns "\eOB". I would use a timeout in order to ignore unknown escape sequences (and probably start over if an escape character arrives). This all doesn't look nice to me.
Isn't there a simpler way of reacting on arrow key usage in a shell script?

Resources