What characters does $# count? - bash

I'm having trouble determining which characters ${#string} actually counts.
Example:
bold=$'\e[1m'
red=$'\e[0;31m'
clr=$'\e[0m'
string="${red}[!]${clr} ${bold}Warning:${clr} foo bar"
printf "String count: %s\n" "${#string}"
Output:
String count: 39
The length of string with its variables is 45 characters. The length of string with its variables substituted is 43 characters (i.e. \e[0;31m in place of ${red}, etc.).
So, what characters is the shell not counting for it to output 39 as the total length of the string?

To simplify the answer, consider the strings '\n' and $'\n'.
The first is 2 characters (a backslash and an n). The second is a single character (a newline).
As per bash man:
Words of the form $'string' are treated specially. The word expands
to string, with backslash-escaped characters replaced as specified by
the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:
...
\n new line
\r carriage return
\t horizontal tab
...
Backslash-escaped characters are also replaced by echo -e and when bash evaluates various dynamic parameters (TIMEFORMAT, PS2, ...). Many utilities (e.g., sed, tr) parse escape sequences as well, sometimes with different rules. Unfortunately, these situations, in many cases dictated by backward compatibility, can be confusing and may conflict with each other, so strings that have to be passed to external programs sometimes need doubled, tripled, or even quadrupled escapes.
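To see where the 39 comes from, a quick check along these lines may help. It is only a sketch that reuses the variables from the question; the names literal and newline are made up for the comparison.

```shell
#!/usr/bin/env bash
# $'...' decodes each escape sequence when the word is parsed,
# so \e is stored as one ESC character rather than two characters.
bold=$'\e[1m'     # ESC [ 1 m       -> 4 characters
red=$'\e[0;31m'   # ESC [ 0 ; 3 1 m -> 7 characters
clr=$'\e[0m'      # ESC [ 0 m       -> 4 characters
string="${red}[!]${clr} ${bold}Warning:${clr} foo bar"

# Hypothetical names, used only for this comparison.
literal='\n'      # a backslash followed by n
newline=$'\n'     # a single newline character

printf 'red=%d bold=%d clr=%d\n' "${#red}" "${#bold}" "${#clr}"
printf 'string=%d\n' "${#string}"
printf 'literal=%d newline=%d\n' "${#literal}" "${#newline}"
```

Adding up the pieces of string (7 + 3 + 4 + 1 + 4 + 8 + 4 + 8) gives the 39 that the shell reports.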

Related

Brace needs to be escaped with \ inside single quotes

I expect the following to work:
ls -l | grep '^.{38}<some date>'
It should give me the files which have said date in modification time. But it does not work. The following works:
ls -l | grep '^.\{38\}<some date>'
Isn't '...' supposed to turn off special meaning for all the meta characters? Why should we have to escape braces?
Under extended regular expression rules, .{38} matches an arbitrary string of exactly 38 characters, and to match literal braces you need to escape them:
.\{38\}
To ensure that grep sees that exact 7-character sequence, you need to quote the string so that the shell doesn't perform quote removal and reduce it to .{38} before grep gets a chance to see it.
That reading misunderstood the question, though: grep is using basic regular expressions here, in which unescaped braces are literal characters and escaped ones introduce an interval (repetition) expression. In extended regular expressions, it's the other way around. In either case, the single quotes protect all enclosed characters from special treatment by the shell; whether grep treats them specially is another question.
There are many variants of regular expression syntax. By default, grep uses the "basic" ("BRE" or "obsolete") regular expression syntax, in which braces must be escaped to be treated as repetition bounds (what you're trying to do here); without the escapes, they're treated as just literal characters. In the "extended" ("ERE" or "modern"), Perl-compatible ("PCRE"), and ... well, pretty much all other variants, it's the other way around: escaped braces are treated as literal characters, and unescaped ones define repetition bounds.
grep '^.{38}<some date>' # Matches any character followed by literal braces around "38"
grep '^.\{38\}<some date>' # Matches 38 characters
grep -E '^.{38}<some date>' # Matches 38 characters (-E invokes "extended" syntax)
egrep '^.{38}<some date>' # Matches 38 characters (egrep uses "extended" syntax)
BTW, parentheses are the same: literal unless escaped in the basic syntax, literal if escaped in the extended syntax. And there are a few other differences; see the re_format man page. There are also many other syntax variants (Perl-compatible, etc). It's important to know what variant the tool you're using accepts, and format your RE appropriately for it.
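The difference between the two syntaxes can be checked on a tiny input. This is a minimal sketch, assuming a POSIX-style grep is on the PATH; the sample text and variable names are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Two sample lines: one contains literal braces, the other three a's in a row.
sample=$'a{3}\naaa'

# Basic syntax (the default): unescaped braces are ordinary characters.
bre_plain=$(printf '%s\n' "$sample" | grep 'a{3}')
# Basic syntax: escaped braces form a repetition bound.
bre_escaped=$(printf '%s\n' "$sample" | grep 'a\{3\}')
# Extended syntax (-E): unescaped braces form a repetition bound.
ere_plain=$(printf '%s\n' "$sample" | grep -E 'a{3}')

printf 'BRE a{3}   matched: %s\n' "$bre_plain"
printf 'BRE a\\{3\\} matched: %s\n' "$bre_escaped"
printf 'ERE a{3}   matched: %s\n' "$ere_plain"
```

In every case the single quotes deliver the pattern to grep unchanged; only grep's own syntax mode decides what the braces mean.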
BTW2, as @Charles Duffy pointed out in a comment, parsing ls output isn't a good idea. In this case, the number of characters before the date depends on the width of the other fields (user, group, size), which is not consistent, so skipping 38 characters might skip part of the date field or not skip enough. You'd be much better off using something like find with the -mtime or -mmin tests, or at least using stat instead of ls: with stat you control the fields through the format string and can, e.g., put the date at the beginning of the line, although stat still has some of ls's other problems.
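As a rough illustration of the find approach, the sketch below selects files by modification time without looking at ls output at all. The temporary directory and file names are made up, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/new.txt"                  # modified just now
touch -t 200001010000 "$workdir/old.txt"  # modification time forced to 1 Jan 2000

# -mtime -1 selects files modified less than one day ago.
recent=$(find "$workdir" -type f -mtime -1)
# -mtime +1 selects files whose age, counted in whole days, is greater than one.
stale=$(find "$workdir" -type f -mtime +1)

printf 'recent: %s\nstale:  %s\n' "${recent##*/}" "${stale##*/}"
rm -rf "$workdir"
```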

Unexpected strings escape in process argv

Got kinda surprised with:
$ node -p 'process.argv' $SHELL '$SHELL' \t '\t' '\\t'
[ 'node', '/bin/bash', '$SHELL', 't', '\\t', '\\\\t' ]
$ python -c 'import sys; print(sys.argv)' $SHELL '$SHELL' \t '\t' '\\t'
['-c', '/bin/bash', '$SHELL', 't', '\\t', '\\\\t']
Expected the same behavior as with:
$ echo $SHELL '$SHELL' \t '\t' '\\t'
/bin/bash $SHELL t \t \\t
Which is how I need the stuff to be passed in.
Why the extra escape with '\t', '\\t' in process argv? Why handled differently than '$SHELL'? Where's this actually coming from? Why different from the echo behavior?
First I thought this to be some extras on the minimist part, but then got the same with both bare Node.js and Python. Might be missing something obvious here.
Use the $'...' form to pass escape sequences like \t, \n, \r, etc. in Bash:
python -c 'import sys; print(sys.argv)' $SHELL '$SHELL' \t $'\t' $'\\t'
['-c', '/bin/bash', '$SHELL', 't', '\t', '\\t']
As per man bash:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
\uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
\cx a control-x character
In both python and node.js, there is a difference between the way print works with scalar strings and the way it works with collections.
Strings are printed simply as a sequence of characters. The resulting output is generally what the user expects to see, but it cannot be used as the representation of the string in the language. But when a list/array is printed out, what you get is a valid list/array literal, which can be used in a program.
For example, in python:
>>> print("x")
x
>>> print(["x"])
['x']
When printing the string, you just see the characters. But when printing the list containing the string, python adds quote characters, so that the output is a valid list literal. Similarly, it would add backslashes, if necessary:
>>> print("\\")
\
>>> print(["\\"])
['\\']
node.js works in exactly the same way:
$ node -p '"\\"'
\
$ node -p '["\\"]'
[ '\\' ]
When you print the string containing a single backslash, you just get a single backslash. But when you print a list/array containing a string consisting of a single backslash, you get a quoted string in which the backslash is escaped with a backslash, allowing it to be used as a literal in a program.
As with the printing of strings in node and python, the standard echo shell utility just prints the actual characters in the string. In a standard shell, there is no mechanism similar to node and python printing of arrays. Bash, however, does provide a mechanism for printing out the value of a variable in a format which could be used as part of a bash program:
$ quote=\"
# $quote is a single character:
$ echo "${#quote}"
1
# $quote prints out as one double-quote character, as you would expect
$ echo "$quote"
"
# If you need a representation, use the 'declare' builtin:
$ declare -p quote
declare -- quote="\""
# You can also use the "%q" printf format (a bash extension)
$ printf "%q\n" "$quote"
\"
(References: bash manual on declare and printf. Or type help declare and help printf in a bash session.)
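One way to convince yourself that the %q output really is a usable representation is to feed it back to the shell. The following is a sketch only; the variable names value, quoted and copy are invented, and eval is applied solely to data the script itself produced.

```shell
#!/usr/bin/env bash
# A value containing a tab, double quotes, a backslash and a single quote.
value=$'tab\there "quoted" back\\slash it\'s'

# %q produces a representation that bash can read back as the same string.
printf -v quoted '%q' "$value"
printf 'representation: %s\n' "$quoted"

# Feeding the representation back through the parser recovers the original.
eval "copy=$quoted"
[ "$copy" = "$value" ] && echo "round trip OK"
```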
That's not the full story, though. It is also important to understand how the shell interprets what you type. In other words, when you write
some_utility \" "\"" '\"'
What does some_utility actually see in the argv array?
In most contexts in a standard shell (including bash), C-style escape sequences like \t are not interpreted as such. (The standard shell utility printf does interpret these sequences when they appear in a format string, and some other standard utilities also interpret the sequences, but the shell itself does not.) The handling of backslash by a standard shell depends on the context:
Unquoted strings: the backslash quotes the following character, whatever it is (unless it is a newline, in which case both the backslash and the newline are removed from the input).
Double-quoted strings: backslash can be used to escape the characters $, \, ", `; also, a backslash followed by a newline is removed from the input, as in an unquoted string. In bash, if history expansion is enabled (as it is by default in interactive shells), backslash can also be used to avoid history expansion of !, but the backslash is retained in the final string.
Single-quoted strings: backslash is treated as a normal character. (As a result, there is no way to include a single quote in a single-quoted string.)
Bash adds two more quoting mechanisms:
C-style quoting, $'...'. If a single-quoted string is preceded by a dollar sign, then C-style escape sequences inside the string are interpreted in roughly the same way a C compiler would. This includes the standard whitespace characters such as newline (\n), octal, hexadecimal and Unicode escapes (\010, \x0a, \u000A, \U0000000A), plus a few non-C sequences including "control" characters (\cJ) and the ESC character \e or \E (the same as \x1b). Backslashes can also be used to escape \, ' and ". (Note that this is a different list from the list of backslashable characters in double-quoted strings; here, a backslash before a dollar sign or a backtick is not special, while a backslash before a single quote is special; moreover, the backslash-newline sequence is not interpreted.)
Locale-specific Translation: $"...". If a double-quoted string is preceded by a dollar sign, backslashes (and variable expansions and command substitutions) are interpreted as with a normal double-quoted string, and then the string is looked up in a message catalog determined by the current locale.
(References: POSIX standard, Bash manual.)
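The rules above can be observed directly by measuring what a command receives. In this sketch, arg_lengths is a hypothetical helper (not a standard utility) that reports the length of each argument, i.e. what a program would find in its argv.

```shell
#!/usr/bin/env bash
# arg_lengths is a hypothetical helper: it prints the length of every
# argument it receives, separated by spaces.
arg_lengths() {
  local arg lengths=()
  for arg in "$@"; do
    lengths+=("${#arg}")
  done
  echo "${lengths[*]}"
}

# Received as:         t   \t   \t   TAB   "   "    \"
lengths=$(arg_lengths \t '\t' "\t" $'\t' \" "\"" '\"')
echo "$lengths"   # prints: 1 2 2 1 1 1 2
```

Only the unquoted backslash and the $'...' form change \t; inside single or double quotes the backslash before t survives, which is why the length is 2 there.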

Strings encapsulated like $"Hello World." [duplicate]

I was just reading through the /etc/init.d/httpd on a CentOS 6.5 box and noticed that all of the strings seem to be quoted like $"Hello World.". I've never seen this syntax before, and I can't seem to turn up anything via Google.
Excerpt:
if ! LANG=$HTTPD_LANG $httpd $OPTIONS -t >&/dev/null; then
RETVAL=6
echo $"not reloading due to configuration syntax error"
failure $"not reloading $httpd due to configuration syntax error"
What's the deal?
From man bash
Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value
nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal
value HH (one or two hex digits)
\uHHHH the Unicode (ISO/IEC 10646) character whose value is the
hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the
hexadecimal value HHHHHHHH (one to eight hex digits)
\cx a control-x character
The expanded result is single-quoted, as if the dollar sign had not
been present.
A double-quoted string preceded by a dollar sign ($"string") will cause
the string to be translated according to the current locale. If the
current locale is C or POSIX, the dollar sign is ignored. If the
string is translated and replaced, the replacement is double-quoted.
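When no translation is available, the effect is easy to verify. The sketch below assumes that no message catalog is configured (TEXTDOMAIN is unset); the value given to httpd is hypothetical and only there to show that expansion still happens.

```shell
#!/usr/bin/env bash
# Assumption: no message catalog is configured, so there is nothing to
# translate and $"..." behaves like an ordinary double-quoted string.
unset TEXTDOMAIN TEXTDOMAINDIR
httpd=/usr/sbin/httpd   # hypothetical value, for illustration only

plain="not reloading $httpd due to configuration syntax error"
localized=$"not reloading $httpd due to configuration syntax error"

[ "$plain" = "$localized" ] && echo "identical: $localized"
```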

understanding $' ' quotes in bash

At least in bash pattern substitution, the following quotes are often used: $' '. For example, ${arr[@]/%/$'\n\n\n'} appends three newline characters to each item of the array "arr". Are those some sort of special quotes? What are they called? Where are they used besides bash pattern substitution?
ANSI-C Quoting
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.
For example:
$'hello\nworld'
You'll get 11 characters with newline in the middle.
echo -e 'hello\nworld'
echo $'hello\nworld'
They give you the same result.
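A short check of these claims, written as a sketch with made-up variable names (greeting, from_echo, arr, padded):

```shell
#!/usr/bin/env bash
greeting=$'hello\nworld'
echo "length: ${#greeting}"   # 5 letters + 1 newline + 5 letters

# echo -e decodes the escape at run time; $'...' decodes it when the word is parsed.
from_echo=$(echo -e 'hello\nworld')
[ "$from_echo" = "$greeting" ] && echo "same text"

# The pattern substitution from the question: append three newlines to every element.
arr=(one two)
padded=("${arr[@]/%/$'\n\n\n'}")
echo "first element grew from ${#arr[0]} to ${#padded[0]} characters"
```

Because $'...' is ordinary shell quoting, it can be used anywhere a word is accepted (assignments, arguments, patterns), not only in pattern substitution.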

new line in bash parameter substitution ${REV%%\n*}

does not work
echo ${REV%%\n*}
does work
echo ${REV%%
*}
After reading through http://tldp.org/LDP/abs/html/parameter-substitution.html I still cannot figure out how to make \n work.
${REV%%$'\n'*} seems to work; the * has to stay outside the quotes, otherwise it is matched literally. See the quoting section of the bash documentation.
If your intent is to try and get it on one line and you're willing to go "outside" of bash, you can use:
echo "${REV}" | head -n 1
But, assuming your version of bash is recent enough, you can try:
pax> export REV="abc
...> def"
pax> echo "${REV}"
abc
def
pax> echo "${REV%%$'\n'*}"
abc
You need $'\n' because bash does not decode backslash escapes in an ordinary word: in ${REV%%\n*} the \n is simply an escaped letter n, not a newline. Only words of the $'string' form are decoded. The bash manpage has this to say:
Words of the form $'string' are treated specially. The word expands
to string, with backslash-escaped characters replaced as specified by
the ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value
nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal
value HH (one or two hex digits)
\uHHHH the Unicode (ISO/IEC 10646) character whose value is
the hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is
the hexadecimal value HHHHHHHH (one to eight hex digits)
\cx a control-x character
More portably (i.e., POSIX):
var="yo
yo"
newline="
"
echo "${var#*"$newline"}"
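The variants discussed in this thread can be compared side by side. This is a sketch with invented variable names; the REV value is chosen so that it contains no letter n, which makes the failure of the plain \n form visible.

```shell
#!/usr/bin/env bash
REV=$'abc\ndef\nghi'

# \n in an ordinary word is just an escaped "n"; REV contains no "n",
# so nothing matches and nothing is removed.
unchanged=${REV%%\n*}

# $'\n' is a real newline; the * stays outside the quotes so it remains a wildcard.
first_line=${REV%%$'\n'*}

# With the * inside the quotes it is literal, so again nothing matches.
star_quoted=${REV%%$'\n*'}

# POSIX-style alternative: keep a literal newline in a variable.
newline='
'
first_line_posix=${REV%%"$newline"*}
rest=${REV#*"$newline"}

printf 'first line: %s\n' "$first_line"
```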

Resources