Different result from $((++n)) when running bash vs dash - bash

I'm getting different outputs when running the program in bash and dash
#!/bin/sh
echo $SHELL
n=1
a=$((++n))
echo $n
Bash:
$ bash shell_test.sh
2
Dash:
$ dash shell_test.sh
1

dash is the Debian Almquist shell and an extreme light-weight version of a full POSIX-compliant shell-implementation of /bin/sh that aims to be as small as possible creating faster bootup times.
Operators such as $((n++)), $((--n)) and similar are features that are not required by POSIX and therefore not implemented.
To see how dash interprets these statements, see Chepner's answer
A nice page explaining how to make your script POSIX compliant, is here.
2.6.4 Arithmetic Expansion: Arithmetic expansion provides a mechanism for evaluating an arithmetic expression and substituting its value. The format for arithmetic expansion shall be as follows:
$((expression))
The expression shall be treated as if it were in double-quotes, except that a double-quote inside the expression is not treated specially. The shell shall expand all tokens in the expression for parameter expansion, command substitution, and quote removal.
Next, the shell shall treat this as an arithmetic expression and substitute the value of the expression. The arithmetic expression shall be processed according to the rules given in Arithmetic Precision and Operations, with the following exceptions:
Only signed long integer arithmetic is required.
Only the decimal-constant, octal-constant, and hexadecimal-constant constants specified in the ISO C standard, Section 6.4.4.1 are required to be recognized as constants.
The sizeof() operator and the prefix and postfix ++ and -- operators are not required.
Selection, iteration, and jump statements are not supported.
source: POSIX IEEE Std 1003.1-2017

Prefix ++ is not required by POSIX, and dash doesn't implement it. Instead, it's parsed as two unary + operators:
$ n=1
$ echo $((+(+n)))
1
$ echo $((++n))
1
$ echo $n
1

Related

Is there a syntactical difference between single and double quoted empty strings?

According to the bash manual, there is no syntactical difference. The bash-parser on the other hand seems to have a different opinion on that when dealing with arithmetic expressions:
$ echo "$BASH_VERSION"
5.2.15(1)-release
$ echo $((""))
0
$ echo $((''))
bash: '': syntax error: operand expected (error token is "''")
Related:
Difference between single and double quotes in Bash
There seems to be a subtle difference introduced in Bash 5.2. The manual states:
(( expression ))
The arithmetic expression is evaluated according to the rules described below (see Shell Arithmetic). The expression undergoes the same expansions as if it were within double quotes, but double quote characters in expression are not treated specially and are removed. If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1.
Source: Bash Reference Manual: Section Conditional Constructs
This implies that (("")) is equivalent to (()) but (('')) is a syntactical error as single quotes are not removed from expression.
Exploring how different shell brands handles this
bash version 5.1-6
dash version 0.5.11
ksh93 version 1.0.0~beta.2
zsh version 5.8.1
Ksh93 seems to show the most distinctive behavior.
What it teaches is:
Within an arithmetic context, shells interpret a single quote as the single quote character itself, but not as the quoted literal value.
#!/usr/bin/env sh
for shell in bash dash ksh93 zsh; do
printf 'Testing with %s:\n' "$shell"
"$shell" <<'EOF'
LC_ALL=C
echo "$((''))"
EOF
echo
done
Output:
Testing with bash:
bash: line 2: '': syntax error: operand expected (error token is "''")
Testing with dash:
ash: 2: arithmetic expression: expecting primary: "''"
Testing with ksh93:
39
Testing with zsh:
zsh: bad math expression: illegal character: '
Seems like a bug, as the manual says
All tokens in the expression undergo parameter and variable expansion, command substitution, and quote removal.

Order of brace expansion and parameter expansion

A common trope on StackOverflow bash is: "Why doesn't x=99; echo {1..$x} work?"
The answer is "because braces are expanded before parameters/variables".
Therefore, I thought it should be possible to expand multiple variables using a single $ and a brace. I'd expect a=1; b=2; c=3; echo ${{a..c}} to print 1 2 3. First, the inner brace would expand to ${a} ${b} ${c} (which it does when writing echo \${{a..c}}). Then that result would undergo parameter expansion.
However, I got -bash: ${{a..c}}: bad substitution so {a..c} wasn't expanded at all.
Bash's manual is a bit more specific (emphasis mine).
Expansion is performed on the command line after it has been split into tokens [...]
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion.
Note the ; and , in that list. "Left-to-right fashion" seems to apply to the whole (therefore unordered) list before the ;. Just like the mathematical operators * and / have no precedence over each other.
Ok, so brace expansion is not really of higher precedence than parameter expansion. It's just that both {1..$x} and ${{a..c}} are evaluated from left to right, meaning the brace { comes before the parameter $x and the parameter ${ comes before the brace {a..c}.
Or so I thought. However, when using $ instead of ${ then parameters on the left expand after braces on the right:
# in bash 5.0.3(1)
x=nil; x1=one; x2=two
echo ${x{1..2}} # prints `-bash: ${x{1..2}}: bad substitution`
echo $x{1..2} # prints `one two`
Question
Could it be that the bash manual is flawed or did I read it wrong?
If the manual is flawed: What is the exact order of all expansions?
I'm just asking because I'm curious. I don't plan to use thinks like $x{1..2} anywhere. I'm not interested in better solutions or alternatives to address multiple variables (e.g. array slices ${array[#]:1:2}). I just want to get a deeper understanding.
from: https://www.gnu.org/software/bash/manual/html_node/Brace-Expansion.html
To avoid conflicts with parameter expansion, the string ‘${’ is not
considered eligible for brace expansion, and inhibits brace expansion
until the closing ‘}’.
That said, for echo $x{1..2} , first the brace expansion takes place, and then the parameter expansion, so we have echo $x1 $x2. For echo ${x{1..2}} the brace expansion doesn't happen, because we are after the ${ and haven't reached the closing } of the parameter expansion.
Regarding the bash manual part you have quoted, left-to-right order still exists for the expansions (with respect to allowed nested ones). Things get clearer if you format the list instead of using , and ;:
brace expansion
In a left-to-right fashion:
tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution
word splitting
filename expansion.
Read Mo Budlong's 1988 classic Command Line Psychology, which was written for regular Unix, but most of it still applies to bash. The order of evaluation goes:
1 History substitution (except for the Bourne shell)
2 Splitting words, including special characters
3 Updating the history list (except for the Bourne shell)
4 Interpreting single and double quotes
5 Alias substitution (except for the Bourne shell)
6 Redirection of input and output (< > and |)
7 Variable substitution (variables starting with $)
8 Command substitution (commands inside back quotes)
9 File name expansion (file name wild cards)
So what bash does with code like {1..3} happens before step 7 above, and that's why the OP code fails.
But if we must, there's always eval, (which should only be used if the variables are known in advance, or first cautiously type checked):
a=1; b=2; c=3; eval echo \{$a..$c}
Output:
1 2 3

A behavior when we parse non-number in arithmetic evaluation

I know from the manpage from bash that the variable that is null or unset is regarded as zero.
And I guess that non-number should be regarded as zero in arithmetic evaluation.
But without official ruling, it could be ambiguous like the second case of below example.
$ FOO=10
$ echo $((FOO))
10
$ FOO=10.abc
$ echo $((FOO))
bash: 10.abc: syntax error: invalid arithmetic operator (error token is ".abc")
atoi() from C parses the second example as 10.
What's the formal semantic of
parsing non-number in bash's arithmetic evaluation?
The $(( ... )) construct for arithmetic expansion was first introduced in the Korn shell back in the '80s, and then adopted for bash (which originally had a different syntax for arithmetic expansions) and POSIX.
Bash and ksh93 expect well-formed arithmetic expressions inside $(( ... )). They go way beyond atoi() in parsing them.
How the shells handle empty and unset variables is a red herring in this case. It's a sensible convenience that just happens to make sense in the context of the shell.

What is the benefit of using $() instead of backticks in shell scripts? [duplicate]

This question already has answers here:
What is the difference between $(command) and `command` in shell programming?
(6 answers)
Closed last year.
There are two ways to capture the output of command line in bash:
Legacy Bourne shell backticks ``:
var=`command`
$() syntax (which as far as I know is Bash specific, or at least not supported by non-POSIX old shells like original Bourne)
var=$(command)
Is there any benefit to using the second syntax compared to backticks? Or are the two fully 100% equivalent?
The major one is the ability to nest them, commands within commands, without losing your sanity trying to figure out if some form of escaping will work on the backticks.
An example, though somewhat contrived:
deps=$(find /dir -name $(ls -1tr 201112[0-9][0-9]*.txt | tail -1l) -print)
which will give you a list of all files in the /dir directory tree which have the same name as the earliest dated text file from December 2011 (a).
Another example would be something like getting the name (not the full path) of the parent directory:
pax> cd /home/pax/xyzzy/plugh
pax> parent=$(basename $(dirname $PWD))
pax> echo $parent
xyzzy
(a) Now that specific command may not actually work, I haven't tested the functionality. So, if you vote me down for it, you've lost sight of the intent :-) It's meant just as an illustration as to how you can nest, not as a bug-free production-ready snippet.
Suppose you want to find the lib directory corresponding to where gcc is installed. You have a choice:
libdir=$(dirname $(dirname $(which gcc)))/lib
libdir=`dirname \`dirname \\\`which gcc\\\`\``/lib
The first is easier than the second - use the first.
The backticks (`...`) is the legacy syntax required by only the very oldest of non-POSIX-compatible bourne-shells and $(...) is POSIX and more preferred for several reasons:
Backslashes (\) inside backticks are handled in a non-obvious manner:
$ echo "`echo \\a`" "$(echo \\a)"
a \a
$ echo "`echo \\\\a`" "$(echo \\\\a)"
\a \\a
# Note that this is true for *single quotes* too!
$ foo=`echo '\\'`; bar=$(echo '\\'); echo "foo is $foo, bar is $bar"
foo is \, bar is \\
Nested quoting inside $() is far more convenient:
echo "x is $(sed ... <<<"$y")"
instead of:
echo "x is `sed ... <<<\"$y\"`"
or writing something like:
IPs_inna_string=`awk "/\`cat /etc/myname\`/"'{print $1}' /etc/hosts`
because $() uses an entirely new context for quoting
which is not portable as Bourne and Korn shells would require these backslashes, while Bash and dash don't.
Syntax for nesting command substitutions is easier:
x=$(grep "$(dirname "$path")" file)
than:
x=`grep "\`dirname \"$path\"\`" file`
because $() enforces an entirely new context for quoting, so each command substitution is protected and can be treated on its own without special concern over quoting and escaping. When using backticks, it gets uglier and uglier after two and above levels.
Few more examples:
echo `echo `ls`` # INCORRECT
echo `echo \`ls\`` # CORRECT
echo $(echo $(ls)) # CORRECT
It solves a problem of inconsistent behavior when using backquotes:
echo '\$x' outputs \$x
echo `echo '\$x'` outputs $x
echo $(echo '\$x') outputs \$x
Backticks syntax has historical restrictions on the contents of the embedded command and cannot handle some valid scripts that include backquotes, while the newer $() form can process any kind of valid embedded script.
For example, these otherwise valid embedded scripts do not work in the left column, but do work on the rightIEEE:
echo ` echo $(
cat <<\eof cat <<\eof
a here-doc with ` a here-doc with )
eof eof
` )
echo ` echo $(
echo abc # a comment with ` echo abc # a comment with )
` )
echo ` echo $(
echo '`' echo ')'
` )
Therefore the syntax for $-prefixed command substitution should be the preferred method, because it is visually clear with clean syntax (improves human and machine readability), it is nestable and intuitive, its inner parsing is separate, and it is also more consistent (with all other expansions that are parsed from within double-quotes) where backticks are the only exception and ` character is easily camouflaged when adjacent to " making it even more difficult to read, especially with small or unusual fonts.
Source: Why is $(...) preferred over `...` (backticks)? at BashFAQ
See also:
POSIX standard section "2.6.3 Command Substitution"
POSIX rationale for including the $() syntax
Command Substitution
bash-hackers: command substitution
From man bash:
$(command)
or
`command`
Bash performs the expansion by executing command and replacing the com-
mand substitution with the standard output of the command, with any
trailing newlines deleted. Embedded newlines are not deleted, but they
may be removed during word splitting. The command substitution $(cat
file) can be replaced by the equivalent but faster $(< file).
When the old-style backquote form of substitution is used, backslash
retains its literal meaning except when followed by $, `, or \. The
first backquote not preceded by a backslash terminates the command sub-
stitution. When using the $(command) form, all characters between the
parentheses make up the command; none are treated specially.
In addition to the other answers,
$(...)
stands out visually better than
`...`
Backticks look too much like apostrophes; this varies depending on the font you're using.
(And, as I just noticed, backticks are a lot harder to enter in inline code samples.)
$() allows nesting.
out=$(echo today is $(date))
I think backticks does not allow it.
It is the POSIX standard that defines the $(command) form of command substitution. Most shells in use today are POSIX compliant and support this preferred form over the archaic backtick notation. The command substitution section (2.6.3) of the Shell Language document describes this:
Command substitution allows the output of a command to be substituted in place of the command name itself.  Command substitution shall occur when the command is enclosed as follows:
$(command)
or (backquoted version):
`command`
The shell shall expand the command substitution by executing command
in a subshell environment (see Shell Execution Environment) and
replacing the command substitution (the text of command plus the
enclosing "$()" or backquotes) with the standard output of the
command, removing sequences of one or more <newline> characters at the
end of the substitution. Embedded <newline> characters before the end
of the output shall not be removed; however, they may be treated as
field delimiters and eliminated during field splitting, depending on
the value of IFS and quoting that is in effect. If the output contains
any null bytes, the behavior is unspecified.
Within the backquoted style of command substitution, <backslash> shall
retain its literal meaning, except when followed by: '$' , '`', or
<backslash>. The search for the matching backquote shall be satisfied
by the first unquoted non-escaped backquote; during this search, if a
non-escaped backquote is encountered within a shell comment, a
here-document, an embedded command substitution of the $(command)
form, or a quoted string, undefined results occur. A single-quoted or
double-quoted string that begins, but does not end, within the "`...`"
sequence produces undefined results.
With the $(command) form, all characters following the open
parenthesis to the matching closing parenthesis constitute the
command. Any valid shell script can be used for command, except a
script consisting solely of redirections which produces unspecified
results.
The results of command substitution shall not be processed for further
tilde expansion, parameter expansion, command substitution, or
arithmetic expansion. If a command substitution occurs inside
double-quotes, field splitting and pathname expansion shall not be
performed on the results of the substitution.
Command substitution can be nested. To specify nesting within the
backquoted version, the application shall precede the inner backquotes
with <backslash> characters; for example:
\`command\`
The syntax of the shell command language has an ambiguity for expansions beginning with "$((",
which can introduce an arithmetic expansion or a command substitution that starts with a subshell.
Arithmetic expansion has precedence; that is, the shell shall first determine
whether it can parse the expansion as an arithmetic expansion
and shall only parse the expansion as a command substitution
if it determines that it cannot parse the expansion as an arithmetic expansion.
The shell need not evaluate nested expansions when performing this determination.
If it encounters the end of input without already having determined
that it cannot parse the expansion as an arithmetic expansion,
the shell shall treat the expansion as an incomplete arithmetic expansion and report a syntax error.
A conforming application shall ensure that it separates the "$(" and '(' into two tokens
(that is, separate them with white space) in a command substitution that starts with a subshell.
For example, a command substitution containing a single subshell could be written as:
$( (command) )
I came up with a perfectly valid example of $(...) over `...`.
I was using a remote desktop to Windows running Cygwin and wanted to iterate over a result of a command. Sadly, the backtick character was impossible to enter, either due to the remote desktop thing or Cygwin itself.
It's sane to assume that a dollar sign and parentheses will be easier to type in such strange setups.
Here in 2021 it is worth mentioning a curious fact as a supplement to the other answers.
The Microsoft DevOps YAML "scripting" for pipelines may include Bash tasks. However, the notation $() is used for referring to variables defined in the YAML context, so in this case backticks should be used for capturing the output of commands.
This is mostly a problem when copying scripting code into a YAML script since the DevOps preprocessor is very forgiving about nonexisting variables, so there will not be any error message.

Handling arithmetic expressions in shell scripting

Kindly tell me that is it necessary to use "expr" keyword.
EG:-
echo `expr a*b`
And where we can simply handle arithmetic expressions using simple arithmetic operators.
EG:-
echo a*b
Thanks in advance.
In a Posix shell you can evaluate expressions directly in the shell when they are enclosed in
$(( ... ))
So:
a=12
b=34
echo $(($a + $b))
And although this wasn't always the case, all Posix shells you are likely to encounter will also deal with:
echo $((a + b))
This all happened because, a long time ago, the shell did not do arithmetic, and so the external program expr was written. These days, expr is usually a builtin (in addition to still being in /bin) and there is the Posix $((...)) syntax available. If $((...)) had been around from day one there would be no expr.
The shell is not exactly a normal computer language, and not exactly a macro processor: it's a CLI. It doesn't do inline expressions; an unquoted * is a wildcard for filename matching, because a CLI needs to reference files more often than it needs to do arithmetic.
The second form will almost surely never do what you want. In Bash, you have a built-in numeric expression handler, though:
A=4; B=6; echo $((A * B))
You can do arithmatic using $(())
echo $((2*3))
results in 6

Resources