Cygwin BASH and ANSI control sequences

Several things here:
Can anyone point me at C code to decode ANSI console escape sequences?
Is there a way to get Cygwin BASH to emulate a dumb old TTY?
Maybe this should be 2 questions.
Thanks.

It's a somewhat indirect answer, but the GNU ncurses library handles terminals of all sorts. One way of finding out which control sequences are applicable to ANSI terminals would be to decompile an ANSI terminal description:
infocmp ansi
This would give you the set of terminfo attributes that are used by curses programs to achieve effects on an ANSI terminal. Of course, you then have to know what those hieroglyphs mean.
On Cygwin, I got:
$ infocmp ansi
# Reconstructed via infocmp from file: /usr/share/terminfo/61/ansi
ansi|ansi/pc-term compatible with color,
am, mc5i, mir, msgr,
colors#8, cols#80, it#8, lines#24, ncv#3, pairs#64,
acsc=+\020\,\021-\030.^Y0\333`\004a\261f\370g\361h\260j\331k\277l\332m\300n\305o~p\304q\304r\304s_t\303u\264v\301w\302x\263y\363z\362{\343|\330}\234~\376,
bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, clear=\E[H\E[J,
cr=^M, cub=\E[%p1%dD, cub1=\E[D, cud=\E[%p1%dB, cud1=\E[B,
cuf=\E[%p1%dC, cuf1=\E[C, cup=\E[%i%p1%d;%p2%dH,
cuu=\E[%p1%dA, cuu1=\E[A, dch=\E[%p1%dP, dch1=\E[P,
dl=\E[%p1%dM, dl1=\E[M, ech=\E[%p1%dX, ed=\E[J, el=\E[K,
el1=\E[1K, home=\E[H, hpa=\E[%i%p1%dG, ht=\E[I, hts=\EH,
ich=\E[%p1%d@, il=\E[%p1%dL, il1=\E[L, ind=^J,
indn=\E[%p1%dS, invis=\E[8m, kbs=^H, kcbt=\E[Z, kcub1=\E[D,
kcud1=\E[B, kcuf1=\E[C, kcuu1=\E[A, khome=\E[H, kich1=\E[L,
mc4=\E[4i, mc5=\E[5i, nel=\r\E[S, op=\E[39;49m,
rep=%p1%c\E[%p2%{1}%-%db, rev=\E[7m, rin=\E[%p1%dT,
rmacs=\E[10m, rmpch=\E[10m, rmso=\E[m, rmul=\E[m,
s0ds=\E(B, s1ds=\E)B, s2ds=\E*B, s3ds=\E+B,
setab=\E[4%p1%dm, setaf=\E[3%p1%dm,
sgr=\E[0;10%?%p1%t;7%;%?%p2%t;4%;%?%p3%t;7%;%?%p4%t;5%;%?%p6%t;1%;%?%p7%t;8%;%?%p9%t;11%;m,
sgr0=\E[0;10m, smacs=\E[11m, smpch=\E[11m, smso=\E[7m,
smul=\E[4m, tbc=\E[3g, u6=\E[%i%d;%dR, u7=\E[6n,
u8=\E[?%[;0123456789]c, u9=\E[c, vpa=\E[%i%p1%dd,
$
The '\E' notation refers to the ESC character.
Failing that, you could look up the standard itself.
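On the first question: there is no single canonical decoder, but the heart of one is a small state machine over the byte stream. The following is a minimal sketch, not a library routine (handle_csi() is a hypothetical hook of ours), that recognizes CSI sequences of the form ESC [ parameters final-byte and passes every other byte through:
#include <stdio.h>

/* Minimal sketch of an ANSI/CSI decoder: recognizes ESC [ <params> <final>
   and reports it; every other byte passes through unchanged.
   handle_csi() is a hypothetical hook, not a library call. */

enum state { PLAIN, GOT_ESC, IN_CSI };

static void handle_csi(const char *params, int final)
{
    printf("<CSI params=\"%s\" final='%c'>", params, final);
}

int main(void)
{
    enum state st = PLAIN;
    char params[64];
    size_t np = 0;
    int ch;

    while ((ch = getchar()) != EOF) {
        switch (st) {
        case PLAIN:
            if (ch == 0x1B)                  /* ESC */
                st = GOT_ESC;
            else
                putchar(ch);
            break;
        case GOT_ESC:
            if (ch == '[') { st = IN_CSI; np = 0; }
            else st = PLAIN;                 /* non-CSI escapes: skipped here */
            break;
        case IN_CSI:
            if (ch >= 0x40 && ch <= 0x7E) {  /* final byte ends the sequence */
                params[np] = '\0';
                handle_csi(params, ch);
                st = PLAIN;
            } else if (np < sizeof(params) - 1) {
                params[np++] = (char)ch;     /* parameter/intermediate bytes */
            }
            break;
        }
    }
    return 0;
}
The final byte selects the action: in the listing above, 'H' is cup (cursor position), 'J' is ed (erase display), and 'm' is the sgr attribute-setting sequence.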

Tweaking the TERM environment variable might make applications based on terminfo/termcap avoid using advanced escape sequences. (export TERM=dumb)
I am not sure that's what you want, though.

Related

ncurses function keys only return escape

I'm testing an ncurses program that ran under ncurses5 but was recently compiled under ncurses6 in a new environment (putty/xterm/virtualbox), and I can't get it to recognize any function keys. The arrow keys work fine; only the keys that use an escape sequence appear to fail.
#include <curses.h>
#include <stdio.h>

int main(void)
{
    int c;                    /* getch() returns int; KEY_F(1) == 265 */
    initscr();
    start_color();
    noecho();
    cbreak();
    intrflush(stdscr, TRUE);
    keypad(stdscr, TRUE);     /* ask ncurses to assemble function keys */
    c = getch();
    endwin();                 /* leave curses mode before using stdio */
    printf("c=%d\n", c);
    return 0;
}
Pressing F1 returns "c=27". I'm using putty and tried various settings with TERM set to xterm. Outside of curses, F1 returns \EOP as expected, and TERM=xterm appears to define the function key properly in termcap. I understand the keypad() routine is supposed to make the getch/wgetch routines return the numeric key equivalent, 265 (KEY_F(1)), but I can't get anything but 27 with various combos of break, raw, notimeout, etc.
Both putty and xterm have several options for their function-keys. The default configurations for each differ, which you can see using
infocmp putty xterm
and it seems that kf1 (F1) is one of many differences, e.g., (putty on the left, xterm on the right):
kent: NULL, '\EOM'.
kf1: '\E[11~', '\EOP'.
kf13: '\E[25~', '\E[1;2P'.
kf14: '\E[26~', '\E[1;2Q'.
kf15: '\E[28~', '\E[1;2R'.
kf16: '\E[29~', '\E[1;2S'.
kf17: '\E[31~', '\E[15;2~'.
kf18: '\E[32~', '\E[17;2~'.
kf19: '\E[33~', '\E[18;2~'.
kf2: '\E[12~', '\EOQ'.
kf20: '\E[34~', '\E[19;2~'.
kf21: NULL, '\E[20;2~'.
kf22: NULL, '\E[21;2~'.
kf23: NULL, '\E[23;2~'.
kf24: NULL, '\E[24;2~'.
(Some copies of ncurses' terminal database are minimal, but there's a full database which includes the putty description).
If the terminal database doesn't show the key as you're configured, ncurses will not recognize it, and you will see the escape character.
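One way to see exactly which sequence ncurses expects for a key is to query the terminfo entry directly with setupterm() and tigetstr() (link with -lncurses; the printing loop is just our sketch):
#include <stdio.h>
#include <curses.h>
#include <term.h>

/* Print the escape sequence terminfo defines for F1 under the current
   TERM, so you can compare it with what the terminal actually sends. */
int main(void)
{
    int err;
    if (setupterm(NULL, fileno(stdout), &err) != OK) {
        fprintf(stderr, "setupterm failed (err=%d)\n", err);
        return 1;
    }
    const char *kf1 = tigetstr("kf1");
    if (kf1 == NULL || kf1 == (const char *)-1) {
        printf("kf1 is not defined for this terminal\n");
        return 1;
    }
    for (const char *p = kf1; *p != '\0'; p++) {
        if (*p == 0x1B)
            fputs("\\E", stdout);   /* show ESC as \E, infocmp-style */
        else
            putchar(*p);
    }
    putchar('\n');
    return 0;
}
With TERM=putty this should print \E[11~, and with TERM=xterm, \EOP; if the printed sequence doesn't match what the terminal actually sends (compare by pressing F1 under cat -v), getch() will hand back the raw ESC (27).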
Doh! I finally found there was an alias "alias cmd='TERM=Linux cmd'" in an old .bashrc file, so my TERM was set to linux only for the duration of the command. Stupid simple issue that took hours of debugging to figure out. Lesson learned.
At least on my copy of putty (0.76), the default TERM is set to xterm. As others have explained, putty does not use the same escape codes as xterm, so this is definitely an unfortunate default. Set the term option (available in the putty menus before you open the connection) to putty and everything will work right, unless your .bashrc file also overrides TERM (which would be highly unfortunate).
You can check what your current TERM is with: echo $TERM

How to pass argument values containing non-ASCII characters to CLI apps?

I'm unable to correctly pass UTF-8 string values as arguments to command line apps.
Approaches I've tried:
pass the value between double quotes: "café"
pass with single quotes: 'café'
use the char code: 'caf\233'
use a $ sign before the string: $'café'
I'm using Mac OS 10.10 with iTerm, and my current locale output is:
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
It is doubtful this has anything to do with the shell. I would make sure that your tools (both the writer tools and whatever you're reading with) correctly deal with UTF-8 at all. I suspect that whatever you're reading your tags with is interpreting and printing them as Latin-1. You should look inside the file with a hex editor and look for the tag. I'm betting it will be correct (C3 A9, which is é in UTF-8 but renders as Ã© when misread as Latin-1). Your output tool is probably the problem, not the writer (and definitely not the shell).
If your reading tool demands Latin-1, then you need to encode é as E9. The iconv tool can be useful in making those conversions for scripts.
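For one-off conversions, the command iconv -f UTF-8 -t LATIN1 in.txt > out.txt does the job. Doing the same programmatically looks roughly like this minimal sketch, assuming the POSIX iconv(3) interface (on macOS, link with -liconv; the bytes in the comments are just the café example above):
#include <stdio.h>
#include <string.h>
#include <iconv.h>

int main(void)
{
    char in[] = "caf\xc3\xa9";      /* "café" in UTF-8: é is C3 A9 */
    char out[64];
    char *inp = in, *outp = out;
    size_t inleft = strlen(in), outleft = sizeof(out);

    iconv_t cd = iconv_open("LATIN1", "UTF-8");   /* to, from */
    if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        perror("iconv");
        return 1;
    }
    iconv_close(cd);

    for (char *p = out; p < outp; p++)            /* prints: 63 61 66 E9 */
        printf("%02X ", (unsigned char)*p);
    putchar('\n');
    return 0;
}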

Ruby - Win32Console - using colors changes encoding?

I have recently installed the Win32Console gem for my program. The program has a Polish “interface”, which includes Polish diacritics. These display fine in every statement like:
puts "Ciekawym polskim słowem jest: żółć"
However, using escape characters to colorize the text (which works) seems to change the encoding, and the Windows 7 CMD displays such diacritic marks incorrectly:
green = "\e[1;32;40m"
puts "#{green}Ciekawym polskim słowem jest: żółć"
Honestly, with my limited knowledge of how Ruby treats different encodings, I don't really even know where to start: is this a problem with Ruby, Win32Console, or Command Prompt itself?
The Windows console does not support ANSI escape sequences (\e[...) at all (see ANSI escape code - Wikipedia).
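That was true of the Windows 7 console the question mentions. On Windows 10 and later the console can be asked to interpret VT sequences; a minimal C sketch, assuming a Windows 10+ console (with a fallback define for older SDK headers):
#include <windows.h>
#include <stdio.h>

#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004  /* older SDK headers */
#endif

int main(void)
{
    /* Opt the console in to VT (ANSI) escape processing; Windows 10+. */
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (h == INVALID_HANDLE_VALUE || !GetConsoleMode(h, &mode))
        return 1;
    if (!SetConsoleMode(h, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        return 1;                    /* console lacks VT support */

    printf("\x1b[1;32;40mgreen on black\x1b[0m\n");
    return 0;
}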
Turns out it was the gem I installed. I later found out that Ruby 2.0 and higher has built-in support for escape codes and it works just fine with UTF-8.

Can a makefile contain UTF-8 characters?

I'm currently trying to figure out whether anything can be done about dmake producing this error message on a makefile with a simple filename containing UTF-8 characters:
Name contains non-printable character [0xffffffe0]
In my research I've been unable to find any mention of whether GNU make or dmake are even supposed to be able to handle makefiles with UTF-8 characters in them.
Thus my question is: can a makefile contain UTF-8 characters, and if the answer is known, where is that documented?
To answer myself:
GNU make can deal with UTF-8 just fine.
dmake, being a mostly abandoned reimplementation of make, can only deal with ASCII.
Make on Windows does not work with UTF-8: you will get the "missing separator" error even with a blank file. Use notepad.exe to convert the makefile to ANSI. Note: there is a little dropdown list box next to the Save button.
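The usual culprit with a "blank" UTF-8 file is the three-byte byte-order mark (EF BB BF) that Notepad writes at the start, which make reads as garbage before the first rule. A quick hypothetical checker in C (a hex dump works just as well) to see whether a makefile starts with it:
#include <stdio.h>

/* Report whether the named file starts with the UTF-8 BOM (EF BB BF). */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s makefile\n", argv[0]);
        return 2;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) { perror(argv[1]); return 2; }

    unsigned char buf[3] = {0};
    size_t n = fread(buf, 1, 3, fp);
    fclose(fp);

    if (n == 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        printf("%s starts with a UTF-8 BOM\n", argv[1]);
        return 1;
    }
    printf("no UTF-8 BOM\n");
    return 0;
}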

Windows Perl --> Unix not working after port, possible encoding issue

I've got a Perl program that I wrote on Windows. It starts with:
$unused_header = <STDIN>;
my @header_fields = split('\|\^\|', $unused_header, -1);
Which should split input that consists of a very large file of:
The|^|Quick|^|Brown|^|Fox|!|
Into:
{The, Quick, Brown, Fox|!|}
Note: this line just handles the header alone; there's another one like it to do the repetitive data lines.
It worked great on Windows, but on Linux it fails. However, if I define a string with the same contents within Perl and run the split on that, it works fine.
I think it's a UTF-16 encoding handling issue, but I'm not sure how to handle it. Does anyone know how I can get Perl to understand the UTF-16 being piped into STDIN?
I found: http://www.haboogo.com/matching_patterns/2009/01/utf-16-processing-issue-in-perl.html but I'm not sure what to do with it.
If STDIN is UTF-16, use one of the following
binmode(STDIN, ':encoding(UTF-16le)'); # Byte order used by Windows.
binmode(STDIN, ':encoding(UTF-16be)'); # The other byte order.
binmode(STDIN, ':encoding(UTF-16)'); # Use BOM to determine byte order.
Tom has written a lengthy answer regarding Perl and Unicode. It contains some boilerplate code to properly and fully support UTF-8, but you can substitute UTF-16 as needed.
I doubt it's a UTF-xx encoding issue, as neither Windows Perl nor Unix Perl will try to read data with those encodings unless you tell it to.
If the Unix script is reading the exact same file as the Windows script but behaves differently, maybe it's a line-ending issue. The dos2unix command on most Unix-y systems can change the line endings on a file, or you can strip off the line endings yourself in the Perl script:
$unused_header = <STDIN>;
$unused_header =~ s/\r?\n$//; # chop \r\n (Windows) or \n (Unix)
