Escaping zero length characters in bash - bash

Based on Bash color how-to I've tried to output a string in grey:
printf " \[\033[1;30m\]foo\[\033[0m\]"
What I get out though is: \[\]foo\[\]
According to the link above any zero width characters need to be surrounded by \[ and \] but it seems those characters are being output.
Any idea how to ensure the \[ is handled correctly?

As @chepner mentioned in his comment: the \[ and \] sequences are recognized in some contexts, such as the PS1 prompt string, where they stand for the \001 and \002 control codes:
\[ => \x01 or \001
\] => \x02 or \002
printf and echo don't do this translation from \[ to \001.
So the solution was to do the conversion myself. Instead of wrapping the zero-length sequences in \[ and \]:
echo -e "\[\033[1;30m\]foo\[\033[0m\]"
which outputs \[\]foo\[\] (with foo in grey),
I output the actual control codes:
echo -e "\x01\033[1;30m\x02foo\x01\033[0m\x02"
which outputs foo in PS1 as well as with printf and echo -e.
For a concrete example see this commit on git-radar.
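A minimal sketch of the same idea as a reusable helper, assuming bash and a terminal that understands SGR colour codes. The names prompt_color and grey_foo are made up for this illustration and are not taken from git-radar:

```bash
# Hypothetical helper: wrap one SGR code in SOH (\001) and STX (\002)
# so that readline treats the colour sequence as zero-width.
prompt_color() {
    # $1 is an SGR parameter string such as "1;30" (bold grey) or "0" (reset)
    printf '\001\033[%sm\002' "$1"
}

# Hypothetical example: the word foo in grey, followed by a reset
grey_foo() {
    printf '%sfoo%s' "$(prompt_color '1;30')" "$(prompt_color 0)"
}

# The function output can be used in the prompt through command substitution
PS1='$(grey_foo) \$ '
```

Because the markers are emitted as raw \001 and \002 bytes, no translation of \[ and \] is needed at any point.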

printf "\033[1;30mfoo\033[0m"
is enough for plain terminal output. \033[ (ESC [) introduces an ANSI escape sequence and m terminates it.

Related

Escape ANSI sequence of non-printing characters between \[ and \]

The bash manual says that, in the prompt, any sequences of non-printing characters should be enclosed like: \[this\]:
\[ Begin a sequence of non-printing characters.
This could be used to embed a terminal control sequence into the prompt.
\] End a sequence of non-printing characters.
Given a string to be included in the prompt, how can I automatically escape all ANSI control / color codes, so that the prompt displays and wraps correctly under all circumstances?
Differentiation: Here I assume that a string with ANSI control codes has already been produced.
This related question assumes that the delimiters can be inserted by editing the string's generating function.
The following will enclose ANSI control sequences in ASCII SOH (^A) and STX (^B) which are equivalent to \[ and \] respectively:
function readline_ANSI_escape() {
    # Wrap each ESC [ ... m/G sequence in SOH/STX unless it is already wrapped
    if [[ $# -ge 1 ]]; then
        printf '%s\n' "$*"
    else
        cat  # Read string from STDIN
    fi |
        perl -pe 's/(?<!\x01)(?<!\\\[)(\x1b\[[0-9;]*[mG])(?!\x02|\\\])/\x01$1\x02/g'
}
Use it like:
$ echo $'\e[0;1;31mRED' | readline_ANSI_escape
Or:
$ readline_ANSI_escape "$string"
As a bonus, running the function multiple times will not re-escape already escaped control codes.
Don't try to automate it.
The reason Bash asks you to add the \[...\] markers manually is that the shell can't reasonably know how any given terminal will interpret a given escape code. If it could, Bash would just do it in the first place.
For example, here are a few of the many cases that the other answer fails to handle. In each case, no output is printed on my particular terminal and yet the escaping function fails to escape them:
printf '\e]1;Hello\a' # Set window title
printf '\e[0;2;0;0;0}' # 24-bit color
printf '\e[13]' # Unblank screen
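To check this yourself, the sketch below tests a few sequences against a pattern of the same shape as the one in the escaping function (ESC [, then digits and semicolons, then m or G). The helper name matches_pattern is hypothetical, and grep -E stands in for Perl here:

```bash
esc=$(printf '\033')
# ESC [ <digits and semicolons> m-or-G, written as a POSIX extended regex
csi_pattern="${esc}\\[[0-9;]*[mG]"

matches_pattern() {
    # Succeeds if $1 contains a sequence that the pattern would wrap
    printf '%s' "$1" | grep -qE "$csi_pattern"
}

# A colour code, a window-title sequence, and the unblank-screen sequence
for seq in "$(printf '\033[0;1;31m')" "$(printf '\033]1;Hello\007')" "$(printf '\033[13]')"; do
    if matches_pattern "$seq"; then
        echo "would be escaped"
    else
        echo "would be missed"
    fi
done
```

Only the first sequence is recognized; the other two pass through without markers even though they print nothing.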

Unexpected strings escape in process argv

Got kinda surprised with:
$ node -p 'process.argv' $SHELL '$SHELL' \t '\t' '\\t'
[ 'node', '/bin/bash', '$SHELL', 't', '\\t', '\\\\t' ]
$ python -c 'import sys; print sys.argv' $SHELL '$SHELL' \t '\t' '\\t'
['-c', '/bin/bash', '$SHELL', 't', '\\t', '\\\\t']
Expected the same behavior as with:
$ echo $SHELL '$SHELL' \t '\t' '\\t'
/bin/bash $SHELL t \t \\t
Which is how I need the stuff to be passed in.
Why the extra escape with '\t', '\\t' in process argv? Why handled differently than '$SHELL'? Where's this actually coming from? Why different from the echo behavior?
First I thought this to be some extras on the minimist part, but then got the same with both bare Node.js and Python. Might be missing something obvious here.
Use the $'...' form to pass escape sequences like \t, \n, \r, \0 etc. in Bash:
python -c 'import sys; print sys.argv' $SHELL '$SHELL' \t $'\t' $'\\t'
['-c', '/bin/bash', '$SHELL', 't', '\t', '\\t']
As per man bash:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
\uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
\cx a control-x character
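A quick way to confirm the difference, assuming bash (the $'...' form is not available in every shell). The variable names are only for illustration:

```bash
literal='\t'   # single quotes: backslash and t stay as two characters
tab=$'\t'      # $'...': the escape is decoded to one tab character

printf '%s %s\n' "${#literal}" "${#tab}"   # prints: 2 1
```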
In both python and node.js, there is a difference between the way print works with scalar strings and the way it works with collections.
Strings are printed simply as a sequence of characters. The resulting output is generally what the user expects to see, but it cannot be used as the representation of the string in the language. But when a list/array is printed out, what you get is a valid list/array literal, which can be used in a program.
For example, in python:
>>> print("x")
x
>>> print(["x"])
['x']
When printing the string, you just see the characters. But when printing the list containing the string, python adds quote characters, so that the output is a valid list literal. Similarly, it would add backslashes, if necessary:
>>> print("\\")
\
>>> print(["\\"])
['\\']
node.js works in exactly the same way:
$ node -p '"\\"'
\
$ node -p '["\\"]'
[ '\\' ]
When you print the string containing a single backslash, you just get a single backslash. But when you print a list/array containing a string consisting of a single backslash, you get a quoted string in which the backslash is escaped with a backslash, allowing it to be used as a literal in a program.
As with the printing of strings in node and python, the standard echo shell utility just prints the actual characters in the string. In a standard shell, there is no mechanism similar to node and python printing of arrays. Bash, however, does provide a mechanism for printing out the value of a variable in a format which could be used as part of a bash program:
$ quote=\"
# $quote is a single character:
$ echo "${#quote}"
1
# $quote prints out as one double-quote character, as you would expect
$ echo "$quote"
"
# If you needed a representation, use the 'declare' builtin:
$ declare -p quote
declare -- quote="\""
# You can also use the "%q" printf format (a bash extension)
$ printf "%q\n" "$quote"
\"
(References: bash manual on declare and printf. Or type help declare and help printf in a bash session.)
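Building on %q, here is a small sketch that shows each argument in re-usable shell syntax, much like the list output of node and python. It assumes bash, and show_args is a made-up name:

```bash
# Hypothetical helper: print every argument on its own line, quoted with %q
show_args() {
    local arg
    for arg in "$@"; do
        printf '%q\n' "$arg"
    done
}

show_args \t '\t' '\\t'
```

The three lines printed are t, \\t and \\\\t, which mirrors the argv lists in the question: the extra backslashes belong to the quoted representation, not to the strings themselves.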
That's not the full story, though. It is also important to understand how the shell interprets what you type. In other words, when you write
some_utility \" "\"" '\"'
What does some_utility actually see in the argv array?
In most contexts in a standard shell (including bash), C-style escape sequences like \t are not interpreted as such. (The standard shell utility printf does interpret these sequences when they appear in a format string, and some other standard utilities also interpret the sequences, but the shell itself does not.) The handling of backslash by a standard shell depends on the context:
Unquoted strings: the backslash quotes the following character, whatever it is (unless it is a newline, in which case both the backslash and the newline are removed from the input).
Double-quoted strings: backslash can be used to escape the characters $, \, ", `; also, a backslash followed by a newline is removed from the input, as in an unquoted string. In bash, if history expansion is enabled (as it is by default in interactive shells), backslash can also be used to avoid history expansion of !, but the backslash is retained in the final string.
Single-quoted strings: backslash is treated as a normal character. (As a result, there is no way to include a single quote in a single-quoted string.)
Bash adds two more quoting mechanisms:
C-style quoting, $'...'. If a single-quoted string is preceded by a dollar sign, then C-style escape sequences inside the string are interpreted in roughly the same way a C compiler would. This includes the standard whitespace characters such as newline (\n), octal, hexadecimal and unicode escapes (\010, \x0a, \u000A, \U0000000A), plus a few non-C sequences including "control" characters (\cJ) and the ESC character \e or \E (the same as \x1b). Backslashes can also be used to escape \, ' and ". (Note that this is a different list from the list of backslashable characters in double-quoted strings; here, a backslash before a dollar sign or a backtick is not special, while a backslash before a single quote is special; moreover, the backslash-newline sequence is not interpreted.)
Locale-specific Translation: $"...". If a double-quoted string is preceded by a dollar sign, backslashes (and variable expansions and command substitutions) are interpreted as with a normal double-quoted string, and then the string is looked up in a message catalog determined by the current locale.
(References: Posix standard, Bash manual.)
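The contexts above can be compared by counting the characters a command actually receives. This is a sketch assuming bash (for the $'...' case); count_chars is a hypothetical helper:

```bash
# Hypothetical helper: print the number of characters in the first argument
count_chars() { printf '%s\n' "${#1}"; }

count_chars \t       # 1: unquoted, the backslash quotes t and is removed
count_chars "\t"     # 2: double quotes, t is not escapable, backslash stays
count_chars "\\t"    # 2: double quotes, \\ collapses to one backslash
count_chars '\\t'    # 3: single quotes, every character is literal
count_chars $'\\t'   # 2: C-style quoting, \\ decodes to one backslash
```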

Rewriting commands from history causes pieces of command and PS1 to be deleted and cursor to move unexpectedly [duplicate]

I'm using custom bash prompt to show git branch.
Everything is in /etc/bash/bashrc:
function formattedGitBranch {
_branch="$(git branch 2>/dev/null | sed -e "/^\s/d" -e "s/^\*\s//")"
# tried these:
echo -e "\e[0;91m ($_branch)"
echo -e "\e[0;91m ($_branch) \e[m"
echo -e $'\e[0;91m'"($_branch)"
echo "($_branch)"
echo "$(tput setaf 2) ($_branch) $(tput setaf 9)"
printf "\e[0;91m ($_branch)"
}
# color is set before function call
PS1='\[\033[01;34m\] \[\033[0;91m\]$(formattedGitBranch) \$\[\033[00m\] '
# color is set inside function
PS1='\[\033[01;34m\] $(formattedGitBranch) \$\[\033[00m\] '
The problem is that when I set the color for $_branch inside the function, my prompt is overwritten when the end of the line is reached.
I tried all possible variants: tput, printf, $'' notation.
I solved the problem by setting the colour only in PS1.
But:
I would like to know why it is overwriting my prompt
How to fix this issue when a function is used
I'm using Gentoo Linux. GNU bash, version 4.2.37(1)-release (i686-pc-linux-gnu)
1) I would like to know why it is overwriting my prompt
Because every sequence of non-printable characters has to be enclosed in \[ and \]; otherwise readline cannot keep track of the cursor position correctly.
You must put \[ and \] around any non-printing escape sequences in your prompt.
Without the \[ \] bash will think the bytes which constitute the escape sequences for the color codes will actually take up space on the screen, so bash won't be able to know where the cursor actually is.
\[ Begin a sequence of non-printing characters. (like color escape sequences). This
allows bash to calculate word wrapping correctly.
\] End a sequence of non-printing characters.
-- BashFAQ
...note the escapes for the non printing characters, these ensure that readline can keep track of the cursor position correctly. -- ss64.com
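A rough illustration of the miscount, assuming a branch called main (a made-up value). Without markers, every byte of the colour codes is counted as if it occupied a column:

```bash
colored=$(printf '\033[0;91m(main)\033[m')   # what the function prints
plain='(main)'                                # what is visible on screen

# Characters bash would count without markers vs. columns actually used
printf '%s %s\n' "${#colored}" "${#plain}"    # prints: 16 6
```

The ten extra characters are exactly the two escape sequences, which is why the line wraps and redraws in the wrong place.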
2) How to fix this issue when function is used
If you want to set colours inside a function whose output is used in PS1, you have two options.
Either escape the whole function call (this is only correct if the function prints nothing but non-printing sequences, because everything between \[ and \] is counted as zero-width):
PS1='\[ $(formattedGitBranch) \] '
Or mark the non-printing escape sequences inside the function itself, writing \001 and \002 in place of \[ and \]
(thanks to user grawity!)
echo -e is not aware of bash's \[ and \], so you have to use the \001 and \002 ASCII control codes instead to delimit non-printable characters from printable ones:
function formattedGitBranch { echo -e "\001\e[0;91m\002 ($_branch)"; }
PS1='$(formattedGitBranch) '
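A slightly fuller sketch of the second option, assuming bash and git on the PATH. The helper color_branch and the exact prompt layout are illustrative choices, not part of the original answer; printf is used instead of echo -e so that the output does not depend on echo's options:

```bash
# Hypothetical helper: colour a branch name, marking the colour codes
# with \001 and \002 so readline does not count them.
color_branch() {
    [ -n "$1" ] || return 0   # print nothing outside a repository
    printf '\001\033[0;91m\002(%s)\001\033[0m\002' "$1"
}

formattedGitBranch() {
    # Keep only the current branch, i.e. the line starting with "* "
    color_branch "$(git branch 2>/dev/null | sed -n 's/^\* //p')"
}

# Colours written directly in PS1 still use \[ and \]
PS1='\[\033[01;34m\]\w\[\033[0m\] $(formattedGitBranch)\$ '
```

The reset sequence after the branch name keeps the colour from leaking into the rest of the prompt.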
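Strings like \e[0;91m need to be marked as non-printing so that bash does not count them toward the prompt length.
In PS1 itself, enclose them in \[ and \], as in \[\e[0;91m\]
You have done it correctly in other places; it is only missing around the codes emitted by formattedGitBranch. There the markers have to be written as \001 and \002, because the output of a command substitution is not scanned for \[ and \].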
You have to enclose non-printable characters in \[ and \]; otherwise the cursor may end up on top of the prompt, as shown in the question. I found the following and am sharing it.
To get the cursor after the PS1 output on the same line, here are a few examples:
PS1='[\u@\h:\w]\$ '
PS1='[\[\033[0;32m\]\u@\h:\[\033[36m\]\W\[\033[0m\]]\$ '
Reference: syntax for bash PS1

What's going on with tr(anslate) here?

In applescript, if I do:
do shell script "echo \"G:\\CRE\\MV Studios\\Exhibition Projects\"|tr \"\\\\\" \"/\""
I'd expect all my backslashes to come back as forward slashes. To make it slightly easier to understand, the tr command would look like this without all the escapes
tr "\\" "/" #there's still an escaped \ for the shell
But what I get is:
"G:/CRE/MV Studiosxhibition Projects"
Note that when I copied that from Script Editor, it added a weird character where the missing \E should be; the character doesn't show up in the event log or in this post. Obviously something odd is happening with \E.
Any ideas on what to do about it?
It appears that echo is interpreting \E as an escape character (ASCII code 27, ESC). You can turn off the interpretation of escape sequences with echo's -E option.
From help echo on my Mac:
echo: echo [-neE] [arg ...]
Output the ARGs. If -n is specified, the trailing newline is
suppressed. If the -e option is given, interpretation of the
following backslash-escaped characters is turned on:
\a alert (bell)
\b backspace
\c suppress trailing newline
\E escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\0nnn the character whose ASCII code is NNN (octal). NNN can be
0 to 3 octal digits
You can explicitly turn off the interpretation of the above characters
with the -E option.
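As an alternative that does not depend on echo's options, printf with a %s format prints its argument without interpreting backslashes. This is a sketch at the shell level only; the function name to_forward_slashes is made up, and the AppleScript string escaping would still have to be added around it:

```bash
# Hypothetical helper: convert backslashes to forward slashes
to_forward_slashes() {
    # %s prints the argument verbatim, so \E is not turned into ESC
    printf '%s\n' "$1" | tr '\\' '/'
}

to_forward_slashes 'G:\CRE\MV Studios\Exhibition Projects'
# prints: G:/CRE/MV Studios/Exhibition Projects
```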

Resources