Why isn't bash splitting fields inside $() operator? - bash

Just for the sake of learning, while I was trying a bunch of stuff, I noticed a bash behavior that I couldn't logically explain.
$ ls $(echo a)
ls: cannot access 'a': No such file or directory
$ ls $($'\145\143\150\157\40\141')
echo a: command not found
$ ls $($"\145\143\150\157\40\141")
\145\143\150\157\40\141: command not found
In the first case, echo a is evaluated to a, which becomes an argument to ls. Fair enough. In the second case, however, the octal-encoded string is evaluated to echo a as expected, but $() treats the entire string as the command name. Moreover, in the third case, no expansion takes place with double quotes. Why don't the fields split? I guess it has something to do with field splitting in bash, but I couldn't work out what exactly the problem is. Is there any way I can make the field splitting work so that it gets treated like the first case?

A word of the form $'string' is expanded to a single-quoted string, as if the dollar sign had not been present. That means,
$($'\145\143\150\157\40\141')
is the same as
$('echo a')
And single-quoted strings don't undergo word splitting or any other kind of expansion. See Bash Reference Manual § ANSI-C Quoting.
Is there any way I can make the field splitting work so that it gets treated like the first case?
$ ls $(eval $'\145\143\150\157\40\141')
ls: cannot access 'a': No such file or directory
This is inadvisable though, see BashFAQ#048.
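Concerning the $"string" syntax, see Bash Reference Manual § Locale-Specific Translation; it's a whole other thing. It marks a string for translation according to the current locale and does not process backslash escapes.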
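If the goal is just to get the decoded text split into a command name and its arguments, an unquoted variable expansion does that without eval. The following is a minimal sketch, not something from the manual: the variable names are made up for illustration, and printf is used to decode the octal escapes so that the snippet also runs in plain POSIX sh. The usual caveat applies: only do this with text you trust.

```shell
# Decode the octal escapes with printf (a POSIX alternative to $'...').
cmd=$(printf '\145\143\150\157\40\141')   # cmd now holds the text: echo a

# Unquoted, $cmd is split on IFS into the words "echo" and "a",
# so the command substitution runs echo with one argument.
split_result=$($cmd)

# Quoted, "$cmd" stays a single word, so the shell looks for a command
# literally named "echo a" and fails, as in the question.
unsplit_status=0
unsplit_result=$("$cmd" 2>/dev/null) || unsplit_status=$?

printf 'split result: %s\n' "$split_result"
printf 'quoted exit status: %s\n' "$unsplit_status"
```

The quoted case fails with status 127, the shell's "command not found" status, which matches the second example in the question.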

Related

How can I confirm whether whitespace or special characters are escaped in a wildcard pattern?

I know that when you use a for loop in Bash, the items that you loop through are separated using the $IFS variable.
However, if I run the following commands, I correctly show the two files I have created - even though they have spaces:
touch file\ {1..2}.txt
for file in *.txt; do
echo "Found: ${file}"
done
The output is:
Found: file 1.txt
Found: file 2.txt
I am assuming that this is because when the shell sees the wildcard pattern, it expands it and escapes any special characters or whitespace. This is in contrast to if I run:
touch file\ {1..2}.txt
files=$(ls *.txt)
for file in $files; do
echo "Found: ${file}"
done
This results in:
Found: file
Found: 1.txt
Found: file
Found: 2.txt
Which makes sense - by default $IFS contains whitespace, so the file names are split.
What I want to understand is:
Am I correct that wildcard expansion results in a set of strings that contain escaped special characters
Where is it documented that this is the case, if I am correct?
Is there any way to show that this is happening?
I was hoping I could use something like set -x to show what the wildcard expands to and actually see the escaped characters, because I really want to be able to understand what is going on here.
I am writing a long series of articles on effective shell usage (effective-shell.com) and I'm struggling to find a way to explain the differences of behaviour here, I'm assuming that the shell is escaping characters but I'd like to know if this is the case and how to see it if possible!
Thanks in advance.
Am I correct that wildcard expansion results in a set of strings that contain escaped special characters
No. There is no need for the shell to escape special characters at that point, because filename expansion is the last word expansion to be performed; strings resulting from it are not subjected to word splitting or any other expansion; they stay as-is. This is documented in the manual as follows:
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion.
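One way to see this for yourself is to count the words each form produces. This is a small sketch under a few assumptions: it works in a throwaway directory from mktemp -d, and the variable names are only for illustration.

```shell
# Work in a scratch directory so the demo does not touch real files.
start_dir=$PWD
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 'file 1.txt' 'file 2.txt'

# Filename expansion runs last: each match is already one complete word.
set -- *.txt
glob_count=$#

# Command substitution runs earlier, so its output is word-split on IFS.
set -- $(ls *.txt)
split_count=$#

printf 'glob: %s words, ls substitution: %s words\n' "$glob_count" "$split_count"

# Clean up.
cd "$start_dir" || exit 1
rm -rf "$tmpdir"
```

The glob yields 2 words and the substitution yields 4, with no escaping involved in either case.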

Why isn't a semicolon in command substitution output treated identical to one in the original code?

As I understand command substitution, this should work, but it doesn't. Can you explain why, and how to do something like this?
Why does this work:
cd ..; echo 123 # output is "123", after changing directories
...when this doesn't:
cd $(echo "..; echo 123") # error message is "cd: too many arguments"
Command substitution results (like expansion of variables) do not go through all parsing phases; they only go through word-splitting[1] and glob expansion[2], and even those happen only when the expansion itself is unquoted.
That means that your result is identical to:
cd "..;" "echo" "123"
...the semicolon is treated as literal text to be passed to the cd command, not shell syntax.
This is a feature, not a bug: If command substitution results were parsed as syntax, writing secure shell scripts handling untrusted data would be basically impossible.
[1] dividing the results into "words" on characters in IFS -- or, by default, whitespace.
[2] looking at whether each resulting word can be treated as a filename-matching pattern, and, if so, matching them against the local filesystem.
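To make the word-splitting point concrete, here is a small sketch that counts the arguments a command would receive. The helper count_args is hypothetical and exists only for this demonstration.

```shell
# Print how many arguments were passed.
count_args() { echo "$#"; }

# Unquoted: the output is split on whitespace into "..;" "echo" "123".
unquoted=$(count_args $(echo "..; echo 123"))

# Quoted: the output stays one word, semicolon and spaces included.
quoted=$(count_args "$(echo "..; echo 123")")

printf 'unquoted: %s arguments, quoted: %s argument\n' "$unquoted" "$quoted"
```

In neither case is the semicolon treated as a command separator; it is just a character inside an argument.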

Bash wildcard pattern using `seq`

I am trying the following command:
ls myfile.h1.{`seq -s ',' 3501 3511`}*
But ls raises the error:
ls: cannot access myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}*: No such file or directory
Seems like ls is thinking the entire line is a filename and not a wildcard pattern. But if I just copy that command ls myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}* in the terminal I get the listing as expected.
Why does typing out the command in full work, but not the usage with seq?
seq is not needed for your case; try
$ ls myfile.h1.{3501..3511}
If you want to use seq, I would suggest using its format option:
$ ls $(seq -f 'myfile.h1.%g' 3501 3511)
but I don't think there is any reason to do so.
UPDATE:
Note that I didn't notice the globbing in the original post. With that, brace expansion is still the preferred way:
$ ls myfile.h1.{3501..3511}*
perhaps even factoring out the common digits, if your bash supports zero padding:
$ ls myfile.h1.35{01..11}*
If not, you can factor out at least the 3:
$ ls myfile.h1.3{501..511}*
Note that the seq alternative won't work with globbing.
The other answer has more details...
karakfa's answer, which uses a literal sequence brace expansion expression, is the right solution.
As for why your approach didn't work:
Bash's brace expansion {...} only works with literal expressions - neither variable references nor, as in your case, command substitutions (`...`, or, preferably, $(...)) work[1] - for a concise overview, see this answer of mine.
With careful use of eval, however, you can work around this limitation; to wit:
from=3501 to=3511
# CAVEAT: Only do this if you TRUST that $from and $to contain
# decimal numbers only.
eval ls "myfile.h1.{$from..$to}*"
@ghoti suggests the following improvement in a comment to make the use of eval safe here:
# Use parameter expansion to remove all non-digit characters from the values
# of $from and $to, thus ensuring that they either contain only a decimal
# number or the empty string; this expansion happens *before* eval is invoked.
eval ls "myfile.h1.{${from//[^0-9]/}..${to//[^0-9]/}}*"
As for how your command was actually evaluated:
Note: Bash applies 7-8 kinds of expansions to a command line; only the ones that actually come into play here are discussed below.
first, the command in command substitution `seq -s ',' 3501 3511` is executed, and replaced by its output (also note the trailing ,):
3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,
the result then forms a single word with its prefix, myfile.h1.{ and its suffix, }*, yielding:
myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}*
pathname expansion (globbing) is then applied to the result - in your case, since no files match, it is left as-is (by default; shell options shopt -s nullglob or shopt -s failglob could change that).
finally, literal myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}* is passed to ls, which - because it doesn't refer to an existing filesystem item - results in the error message you saw.
[1] Note that the limitation only applies to sequence brace expansions (e.g., {1..3}); list brace expansions (e.g., {1,2,3}) are not affected, because no up-front interpretation (interpolation) is needed; e.g. {$HOME,$USER} works, because brace expansion first expands the list into the separate words $HOME and $USER, which are only expanded later.
Historically, sequence brace expansions were introduced later, at a time when the order of shell expansions was already fixed.
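If the bounds have to come from variables and you would rather avoid eval altogether, a plain loop is another option. This is only a sketch, not part of either answer: the file names are sample data created in a scratch directory, and match_count and matches are illustrative names.

```shell
# Scratch directory with sample files; only two fall inside the range.
start_dir=$PWD
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch myfile.h1.3501.dat myfile.h1.3505.dat myfile.h1.3600.dat

from=3501 to=3511
match_count=0
matches=""
i=$from
while [ "$i" -le "$to" ]; do
    # The variable is expanded first; globbing then applies to the result.
    for f in myfile.h1."$i"*; do
        # An unmatched pattern is left as-is, so check that the file exists.
        if [ -e "$f" ]; then
            match_count=$((match_count + 1))
            matches="$matches$f
"
        fi
    done
    i=$((i + 1))
done

printf '%s' "$matches"

# Clean up.
cd "$start_dir" || exit 1
rm -rf "$tmpdir"
```

This is more verbose than the brace expansion, but nothing in $from or $to is ever parsed as shell syntax.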

Bash - Difference between echo `basename $HOME` and echo $(basename $HOME)

Thank you very much in advance for helping.
The title says everything: what's the difference between using:
echo `basename $HOME`
and
echo $(basename $HOME)
Please note that I know what the basename command does, that both syntaxes are valid and that both commands give the same output.
I was just wondering if there is any difference between both and if it's possible, why there are two syntaxes for this.
Cheers
Rafael
The second form has different escaping rules making it much easier to nest. e.g.
echo $(echo $(basename $HOME))
I'll leave working out how to do that with ` as an exercise for the reader, it should prove enlightening.
They are one and the same.
please read this.
EDIT (from the link):
Command substitution
Command substitution allows the output of a command to replace the command itself. Command substitution occurs when a command is enclosed like this:
$(command)
or like this using backticks:
`command`
Bash performs the expansion by executing COMMAND and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting.
$ franky ~> echo `date`
Thu Feb 6 10:06:20 CET 2003
When the old-style backquoted form of substitution is used, backslash retains its literal meaning except when followed by "$", "`", or "\". The first backtick not preceded by a backslash terminates the command substitution. When using the $(COMMAND) form, all characters between the parentheses make up the command; none are treated specially.
Command substitutions may be nested. To nest when using the backquoted form, escape the inner backticks with backslashes.
If the substitution appears within double quotes, word splitting and file name expansion are not performed on the results.
They are alternative syntaxes for command substitution. As @Steve mentions, they have different quoting rules, and backticks are harder to nest. On the other hand, backticks are more portable to older shells and to other shells, e.g. csh.
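For anyone who wants to compare the two forms side by side, here is a small sketch of one level of nesting. The path is made-up sample data and the variable names are only for illustration.

```shell
# A sample path; nothing needs to exist on disk for dirname/basename.
dir=/home/pax/xyzzy/plugh

# $(...) nests without any escaping, and quotes pair up inside each level.
new_style=$(basename "$(dirname "$dir")")

# With backticks, the inner pair must be escaped with backslashes.
old_style=`basename \`dirname "$dir"\``

printf 'new: %s, old: %s\n' "$new_style" "$old_style"
```

Both forms print xyzzy here; the difference is only in how much escaping the nested one needs.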

What is the benefit of using $() instead of backticks in shell scripts? [duplicate]

There are two ways to capture the output of command line in bash:
Legacy Bourne shell backticks ``:
var=`command`
$() syntax (which as far as I know is Bash specific, or at least not supported by non-POSIX old shells like original Bourne)
var=$(command)
Is there any benefit to using the second syntax compared to backticks? Or are the two fully 100% equivalent?
The major one is the ability to nest them, commands within commands, without losing your sanity trying to figure out if some form of escaping will work on the backticks.
An example, though somewhat contrived:
deps=$(find /dir -name $(ls -1tr 201112[0-9][0-9]*.txt | tail -1l) -print)
which will give you a list of all files in the /dir directory tree which have the same name as the earliest dated text file from December 2011 (a).
Another example would be something like getting the name (not the full path) of the parent directory:
pax> cd /home/pax/xyzzy/plugh
pax> parent=$(basename $(dirname $PWD))
pax> echo $parent
xyzzy
(a) Now that specific command may not actually work, I haven't tested the functionality. So, if you vote me down for it, you've lost sight of the intent :-) It's meant just as an illustration as to how you can nest, not as a bug-free production-ready snippet.
Suppose you want to find the lib directory corresponding to where gcc is installed. You have a choice:
libdir=$(dirname $(dirname $(which gcc)))/lib
libdir=`dirname \`dirname \\\`which gcc\\\`\``/lib
The first is easier than the second - use the first.
Backticks (`...`) are the legacy syntax, required only by the very oldest non-POSIX-compatible Bourne shells; $(...) is POSIX and preferred for several reasons:
Backslashes (\) inside backticks are handled in a non-obvious manner:
$ echo "`echo \\a`" "$(echo \\a)"
a \a
$ echo "`echo \\\\a`" "$(echo \\\\a)"
\a \\a
# Note that this is true for *single quotes* too!
$ foo=`echo '\\'`; bar=$(echo '\\'); echo "foo is $foo, bar is $bar"
foo is \, bar is \\
Nested quoting inside $() is far more convenient:
echo "x is $(sed ... <<<"$y")"
instead of:
echo "x is `sed ... <<<\"$y\"`"
or writing something like:
IPs_inna_string=`awk "/\`cat /etc/myname\`/"'{print $1}' /etc/hosts`
$() avoids this because it starts an entirely new context for quoting. The backtick versions are also not portable: Bourne and Korn shells require these backslashes, while Bash and dash don't.
Syntax for nesting command substitutions is easier:
x=$(grep "$(dirname "$path")" file)
than:
x=`grep "\`dirname \"$path\"\`" file`
because $() enforces an entirely new context for quoting, so each command substitution is protected and can be treated on its own without special concern over quoting and escaping. When using backticks, it gets uglier and uglier after two and above levels.
Few more examples:
echo `echo `ls`` # INCORRECT
echo `echo \`ls\`` # CORRECT
echo $(echo $(ls)) # CORRECT
It solves a problem of inconsistent behavior when using backquotes:
echo '\$x' outputs \$x
echo `echo '\$x'` outputs $x
echo $(echo '\$x') outputs \$x
Backticks syntax has historical restrictions on the contents of the embedded command and cannot handle some valid scripts that include backquotes, while the newer $() form can process any kind of valid embedded script.
For example, these otherwise valid embedded scripts do not work in the left column, but do work on the right:
echo `                          echo $(
cat <<\eof                      cat <<\eof
a here-doc with `               a here-doc with )
eof                             eof
`                               )
echo `                          echo $(
echo abc # a comment with `     echo abc # a comment with )
`                               )
echo `                          echo $(
echo '`'                        echo ')'
`                               )
Therefore the syntax for $-prefixed command substitution should be the preferred method, because it is visually clear with clean syntax (improves human and machine readability), it is nestable and intuitive, its inner parsing is separate, and it is also more consistent (with all other expansions that are parsed from within double-quotes) where backticks are the only exception and ` character is easily camouflaged when adjacent to " making it even more difficult to read, especially with small or unusual fonts.
Source: Why is $(...) preferred over `...` (backticks)? at BashFAQ
See also:
POSIX standard section "2.6.3 Command Substitution"
POSIX rationale for including the $() syntax
Command Substitution
bash-hackers: command substitution
From man bash:
$(command)
or
`command`
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted. Embedded newlines are not deleted, but they may be removed during word splitting. The command substitution $(cat file) can be replaced by the equivalent but faster $(< file).
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \. The first backquote not preceded by a backslash terminates the command substitution. When using the $(command) form, all characters between the parentheses make up the command; none are treated specially.
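The "trailing newlines deleted" part is easy to overlook, so here is a short sketch of it. The sentinel trick in the second half is a common idiom rather than something from the man page, and the variable names are only for illustration.

```shell
# All trailing newlines are stripped; the embedded one is kept.
raw=$(printf 'line1\nline2\n\n\n')

# Idiom (not from the man page): append a sentinel character so the
# newlines are no longer trailing, then remove the sentinel afterwards.
kept=$(printf 'line1\nline2\n\n\n'; printf x)
kept=${kept%x}

printf 'raw length: %s, kept length: %s\n' "${#raw}" "${#kept}"
```

The first value is 11 characters long and the second 14, the difference being the three trailing newlines.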
In addition to the other answers,
$(...)
stands out visually better than
`...`
Backticks look too much like apostrophes; this varies depending on the font you're using.
(And, as I just noticed, backticks are a lot harder to enter in inline code samples.)
$() allows nesting.
out=$(echo today is $(date))
Backticks allow it too, but only if the inner backticks are escaped with backslashes.
It is the POSIX standard that defines the $(command) form of command substitution. Most shells in use today are POSIX compliant and support this preferred form over the archaic backtick notation. The command substitution section (2.6.3) of the Shell Language document describes this:
Command substitution allows the output of a command to be substituted in place of the command name itself.  Command substitution shall occur when the command is enclosed as follows:
$(command)
or (backquoted version):
`command`
The shell shall expand the command substitution by executing command
in a subshell environment (see Shell Execution Environment) and
replacing the command substitution (the text of command plus the
enclosing "$()" or backquotes) with the standard output of the
command, removing sequences of one or more <newline> characters at the
end of the substitution. Embedded <newline> characters before the end
of the output shall not be removed; however, they may be treated as
field delimiters and eliminated during field splitting, depending on
the value of IFS and quoting that is in effect. If the output contains
any null bytes, the behavior is unspecified.
Within the backquoted style of command substitution, <backslash> shall
retain its literal meaning, except when followed by: '$' , '`', or
<backslash>. The search for the matching backquote shall be satisfied
by the first unquoted non-escaped backquote; during this search, if a
non-escaped backquote is encountered within a shell comment, a
here-document, an embedded command substitution of the $(command)
form, or a quoted string, undefined results occur. A single-quoted or
double-quoted string that begins, but does not end, within the "`...`"
sequence produces undefined results.
With the $(command) form, all characters following the open
parenthesis to the matching closing parenthesis constitute the
command. Any valid shell script can be used for command, except a
script consisting solely of redirections which produces unspecified
results.
The results of command substitution shall not be processed for further
tilde expansion, parameter expansion, command substitution, or
arithmetic expansion. If a command substitution occurs inside
double-quotes, field splitting and pathname expansion shall not be
performed on the results of the substitution.
Command substitution can be nested. To specify nesting within the
backquoted version, the application shall precede the inner backquotes
with <backslash> characters; for example:
\`command\`
The syntax of the shell command language has an ambiguity for expansions beginning with "$((",
which can introduce an arithmetic expansion or a command substitution that starts with a subshell.
Arithmetic expansion has precedence; that is, the shell shall first determine
whether it can parse the expansion as an arithmetic expansion
and shall only parse the expansion as a command substitution
if it determines that it cannot parse the expansion as an arithmetic expansion.
The shell need not evaluate nested expansions when performing this determination.
If it encounters the end of input without already having determined
that it cannot parse the expansion as an arithmetic expansion,
the shell shall treat the expansion as an incomplete arithmetic expansion and report a syntax error.
A conforming application shall ensure that it separates the "$(" and '(' into two tokens
(that is, separate them with white space) in a command substitution that starts with a subshell.
For example, a command substitution containing a single subshell could be written as:
$( (command) )
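As a quick illustration of the ambiguity described in the quoted text, the sketch below puts the two forms next to each other. The variable names are made up for the example.

```shell
# "$((" with no space is parsed as arithmetic expansion.
sum=$((1 + 2))

# With a space, "$( (" is a command substitution whose command is a
# subshell; the cd only affects that subshell.
where=$( (cd / && pwd) )

printf 'sum: %s, where: %s\n' "$sum" "$where"
```

Keeping the space after $( is what makes the second form unambiguous to every conforming shell.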
I came up with a perfectly valid example of $(...) over `...`.
I was using a remote desktop to Windows running Cygwin and wanted to iterate over a result of a command. Sadly, the backtick character was impossible to enter, either due to the remote desktop thing or Cygwin itself.
It's sane to assume that a dollar sign and parentheses will be easier to type in such strange setups.
Here in 2021 it is worth mentioning a curious fact as a supplement to the other answers.
The Microsoft DevOps YAML "scripting" for pipelines may include Bash tasks. However, the notation $() is used for referring to variables defined in the YAML context, so in this case backticks should be used for capturing the output of commands.
This is mostly a problem when copying scripting code into a YAML script since the DevOps preprocessor is very forgiving about nonexisting variables, so there will not be any error message.

Resources