Behavior when parsing a non-number in arithmetic evaluation

I know from the bash man page that a variable that is null or unset evaluates to zero in arithmetic.
I guessed that a non-number would also be treated as zero in arithmetic evaluation.
But without an official rule, this is ambiguous, as the second case in the example below shows:
$ FOO=10
$ echo $((FOO))
10
$ FOO=10.abc
$ echo $((FOO))
bash: 10.abc: syntax error: invalid arithmetic operator (error token is ".abc")
C's atoi() would parse the second value as 10.
What are the formal semantics of parsing a non-number in bash's arithmetic evaluation?

The $(( ... )) construct for arithmetic expansion was first introduced in the Korn shell back in the '80s, and then adopted for bash (which originally had a different syntax for arithmetic expansions) and POSIX.
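Bash and ksh93 expect well-formed arithmetic expressions inside $(( ... )); they go far beyond atoi() in parsing them. In bash, when a variable used in an expression does not hold an integer constant, its value is evaluated as an arithmetic expression in its own right. 10.abc is not a valid expression (bash arithmetic is integer-only, and . is not an operator), hence the "invalid arithmetic operator" error.
How the shells handle empty and unset variables is a red herring in this case. It's a convenience that happens to make sense in the context of the shell.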
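Here is a small bash sketch of that rule (assuming bash 4 or later). A variable's value is evaluated as its own expression, so FOO='2 * 5' yields 10 while FOO=10.abc fails. If you really want atoi()-style behavior, you have to extract the leading digits yourself; atoi_like below is a hypothetical helper, not a bash feature, and only approximates C's atoi() (it does not skip leading whitespace or handle overflow).

```shell
#!/usr/bin/env bash

# A variable's value is itself evaluated as an arithmetic expression.
FOO='2 * 5'
echo "$((FOO))"                    # prints 10

# 10.abc is not a valid expression, so bash reports an error.
# The subshell keeps the failed expansion from aborting this script.
FOO=10.abc
if ! (: "$((FOO))") 2>/dev/null; then
  echo "10.abc rejected"
fi

# atoi_like: hypothetical helper approximating C atoi():
# an optional sign, then the leading run of decimal digits, else 0.
atoi_like() {
  local s=$1 sign=
  case $s in
    [-+]*) sign=${s:0:1}; s=${s:1} ;;
  esac
  local digits=${s%%[!0-9]*}       # drop everything from the first non-digit
  # 10# forces base 10, so a leading 0 is not read as octal.
  echo "$(( ${sign}10#${digits:-0} ))"
}

atoi_like 10.abc                   # prints 10
atoi_like -42xyz                   # prints -42
atoi_like abc                      # prints 0
```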

Related

Is there a syntactical difference between single and double quoted empty strings?

According to the bash manual, there is no syntactic difference. The bash parser, on the other hand, seems to have a different opinion when dealing with arithmetic expressions:
$ echo "$BASH_VERSION"
5.2.15(1)-release
$ echo $((""))
0
$ echo $((''))
bash: '': syntax error: operand expected (error token is "''")
Related:
Difference between single and double quotes in Bash

There seems to be a subtle difference introduced in Bash 5.2. The manual states:
(( expression ))
The arithmetic expression is evaluated according to the rules described below (see Shell Arithmetic). The expression undergoes the same expansions as if it were within double quotes, but double quote characters in expression are not treated specially and are removed. If the value of the expression is non-zero, the return status is 0; otherwise the return status is 1.
Source: Bash Reference Manual: Section Conditional Constructs
This implies that (("")) is equivalent to (()), but (('')) is a syntax error, because single quotes are not removed from the expression.
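A short bash sketch of the rule quoted above (assuming bash 5.x; the single-quote case runs in a subshell so its syntax error cannot affect the rest of the script). Only the double-quote behavior is what the manual promises; the last branch simply reports what your bash does with '':

```shell
#!/usr/bin/env bash
# Double quotes inside an arithmetic expression are removed first,
# so "2" + "3" is evaluated as 2 + 3.
echo $(( "2" + "3" ))              # prints 5

# "" leaves an empty expression, which is 0, i.e. false for (( )).
if (( "" )); then echo nonzero; else echo zero; fi

# Single quotes are not removed, so '' reaches the arithmetic parser.
if ( (( '' )) ) 2>/dev/null; then
  echo "'' was accepted"
else
  echo "'' was rejected"
fi
```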

Exploring how different shells handle this
bash version 5.1-6
dash version 0.5.11
ksh93 version 1.0.0~beta.2
zsh version 5.8.1
Ksh93 seems to show the most distinctive behavior.
What this shows is that, within an arithmetic context, shells interpret a single quote as the single-quote character itself, not as quoting around a (here empty) literal value.
#!/usr/bin/env sh
for shell in bash dash ksh93 zsh; do
  printf 'Testing with %s:\n' "$shell"
  "$shell" <<'EOF'
LC_ALL=C
echo "$((''))"
EOF
  echo
done
Output:
Testing with bash:
bash: line 2: '': syntax error: operand expected (error token is "''")
Testing with dash:
ash: 2: arithmetic expression: expecting primary: "''"
Testing with ksh93:
39
Testing with zsh:
zsh: bad math expression: illegal character: '
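The 39 printed by ksh93 is the ASCII code of the single-quote character, which fits the conclusion above. One way to confirm that code from the shell is the POSIX printf rule that a numeric argument starting with a quote converts to the code of the character after the quote (this sketch assumes an ASCII-compatible locale):

```shell
#!/usr/bin/env bash
# printf: an argument that starts with a quote converts to the
# numeric code of the character that follows that quote.
printf '%d\n' "''"                 # code of '  -> 39
printf '%d\n' "'A"                 # code of A  -> 65
```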

This seems like a bug, as the manual says:
All tokens in the expression undergo parameter and variable expansion, command substitution, and quote removal.

Why can I refer to a variable inside (( )) without the $ symbol?

In the ABS guide, I saw the snippet below:
var1=5
var2=4
if (( var1 > var2 ))
then
  echo "$var1 is greater than $var2"
fi
I don't understand why we don't need the $ symbol. When I added $, shellcheck showed "$ symbol is not necessary on arithmetic variables".
I am still not able to understand how that dereferencing of var1 and var2 works...

Expressions inside ((...)) are evaluated in arithmetic context. Strings that can be variable names are treated as variables whose values are integers, since evaluating them as literal strings makes no sense in an arithmetic context. The same applies to C-style for loops: in for ((i = 0; i < 10; ++i)), preceding i with $ is not necessary (but it may be necessary, depending on the context, within the body of the loop).
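A minimal bash sketch of that last point: bare names work in the loop header and in (( )), while ordinary commands in the loop body still need $ (the variable names are illustrative):

```shell
#!/usr/bin/env bash
total=0
for (( i = 1; i <= 4; i++ )); do   # no $ needed in the loop header
  (( total += i ))                 # nor inside (( ))
done
echo "total=$total"                # prints total=10

# Outside arithmetic context, $ is still required.
echo "i ended at $i"               # prints i ended at 5
```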

(( ... )) follows the same evaluation rules that the POSIX specification defines for arithmetic expressions. (The main difference is that ((...)) produces an exit status reflecting whether the result is zero or nonzero, while $((...)) produces the result as a string.) In particular:
If the shell variable x contains a value that forms a valid integer constant, optionally including a leading <plus-sign> or <hyphen-minus>, then the arithmetic expansions "$((x))" and "$(($x))" shall return the same value.
The shell variable var1 contains an integer constant, so $((var1)) and $(($var1)) are equivalent. This holds recursively as well.
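A quick check of the POSIX rule with a few values that are valid ISO C integer constants (the values are arbitrary examples). Note that a leading 0 makes a constant octal, so 010 is 8:

```shell
#!/bin/sh
# Each value is a valid integer constant, optionally signed,
# so $((x)) and $(($x)) must agree.
for x in 42 -7 0x1F 010; do
  echo "$x -> $((x)) $(($x))"
done
# 42 -> 42 42
# -7 -> -7 -7
# 0x1F -> 31 31
# 010 -> 8 8
```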
Various shells seem to treat variables that refer to variables differently. Nothing in the POSIX wording seems to require the following kind of recursive evaluation, though both bash and dash do:
$ foo=bar
$ bar=5
$ echo $((foo)) # foo/$foo evaluates to bar, which contains an integer constant
5
bash seems to take it a step further, allowing any string to expand, followed by an attempt to evaluate the result as an arithmetic expression.
$ foo="x + 3"
$ x=5
$ echo $((foo)) # foo evaluates to x + 3, which evaluates to 5 + 3
8
but in dash:
$ foo="x+3"
$ x=5
$ echo $((foo))
dash: 3: Illegal number: x+3
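The difference also affects operator precedence. The bash sketch below reuses foo and x from the examples above and contrasts $((foo)), where bash evaluates foo's value as a self-contained subexpression, with $(($foo)), where the text x+3 is pasted into the expression before it is parsed. Only the second form relies on plain POSIX expansion, so it should also work in dash (the dash output above shows dash rejecting the first):

```shell
#!/usr/bin/env bash
foo='x+3'
x=5

# bash: foo's value is evaluated on its own, as if parenthesized: (5+3)*2
by_name=$(( foo * 2 ))
# $foo is expanded as text first, giving x+3*2, i.e. 5+6
by_text=$(( $foo * 2 ))

echo "by name: $by_name"           # prints by name: 16
echo "by text: $by_text"           # prints by text: 11
```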

It's documented in section 6.5, Shell Arithmetic, of the bash manual. It says:
Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax.
This feature makes arithmetic expressions much easier to read.
Interestingly, you can refer to array elements this way as well:
values=(42 54) i=0 j=1
echo $(( values[i] + values[j] )) # => 96
Other places where you don't need $ (i.e. other arithmetic contexts):
the index of a numerically indexed array (${values[i]}, not ${values[$i]});
the offset and length parts of the ${var:offset:length} parameter expansion.
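A short bash sketch covering each of those arithmetic contexts (str, start and len are illustrative names):

```shell
#!/usr/bin/env bash
values=(42 54) i=0 j=1
echo "$(( values[i] + values[j] ))"   # prints 96

# The index of a numerically indexed array is an arithmetic context.
echo "${values[i + 1]}"               # prints 54

# So are the offset and length of ${var:offset:length}.
str=abcdefg start=2 len=3
echo "${str:start:len}"               # prints cde
```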

You don't need the $ if it's inside the (( )) for doing arithmetic.
I am still not able to understand how that dereferencing of var1 and var2 works
bash parses things differently inside the (( )), undoubtedly because it makes it much easier to read complex arithmetic expressions without the $.

Different result from $((++n)) when running bash vs dash

I'm getting different output when running this script in bash and in dash:
#!/bin/sh
echo $SHELL
n=1
a=$((++n))
echo $n
Bash:
$ bash shell_test.sh
2
Dash:
$ dash shell_test.sh
1

dash, the Debian Almquist shell, is a very lightweight POSIX-compliant implementation of /bin/sh that aims to be as small as possible, which helps scripts such as those run at boot start faster.
Operators such as $((n++)), $((--n)) and similar are not required by POSIX, and dash does not implement them.
To see how dash interprets these statements, see Chepner's answer.
A nice page explaining how to make your script POSIX compliant is here.
2.6.4 Arithmetic Expansion: Arithmetic expansion provides a mechanism for evaluating an arithmetic expression and substituting its value. The format for arithmetic expansion shall be as follows:
$((expression))
The expression shall be treated as if it were in double-quotes, except that a double-quote inside the expression is not treated specially. The shell shall expand all tokens in the expression for parameter expansion, command substitution, and quote removal.
Next, the shell shall treat this as an arithmetic expression and substitute the value of the expression. The arithmetic expression shall be processed according to the rules given in Arithmetic Precision and Operations, with the following exceptions:
Only signed long integer arithmetic is required.
Only the decimal-constant, octal-constant, and hexadecimal-constant constants specified in the ISO C standard, Section 6.4.4.1 are required to be recognized as constants.
The sizeof() operator and the prefix and postfix ++ and -- operators are not required.
Selection, iteration, and jump statements are not supported.
source: POSIX IEEE Std 1003.1-2017

Prefix ++ is not required by POSIX, and dash doesn't implement it. Instead, it's parsed as two unary + operators:
$ n=1
$ echo $((+(+n)))
1
$ echo $((++n))
1
$ echo $n
1
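If the script has to run under dash or another POSIX sh, a minimal sketch of portable alternatives is a plain assignment with binary +, or the += assignment operator, which is among the C assignment operators POSIX arithmetic covers (dash support for += is my understanding; verify it on your system):

```shell
#!/bin/sh
n=1
n=$((n + 1))                       # portable replacement for ++n
echo "$n"                          # prints 2

: "$((n += 1))"                    # assignment operator; ':' discards the result
echo "$n"                          # prints 3
```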

Bash tilde substring expansion - undocumented feature?

I was surprised that this expansion:
$ echo "${foo:~abc}"
yielded the empty string when foo was unset. I expected that it would parse like this:
$ echo "${foo:(~abc)}"
and yield the string "~abc". But instead, I found that if I did define
$ foo='abcdefg'
$ echo "${foo:~abc}"
g
In fact, it's evaluating "abc" in arithmetic context, effectively doing "${foo:~0}". Likewise:
$ foo='abcdefg'
$ echo "${foo:~3}"
defg
It gets you the last n+1 characters of the expansion. I looked in the "Parameter Expansion" section of the manpage. I see no mention of tildes there. Bash Hackers Wiki only mentions tildes as (also undocumented) case modifiers.
This behavior goes back to at least 3.2.57.
Am I just missing where this form of substring expansion is documented, or is it not documented at all?

It's not undocumented (you may have been confusing ${foo:~abc} with ${foo-~abc}).
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of the value of parameter starting at the character specified by offset. [...] If length is omitted, expands to the substring of the value of parameter starting at the character specified by offset and extending to the end of the value. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
Here, ~abc is the offset field of the expansion, and ~ is the bitwise negation operator in the arithmetic expression. An undefined parameter evaluates to 0 in an arithmetic expression, and ~0 == -1.
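A quick bash check of that explanation, together with the more common ways of writing a negative offset (abc is the name from the question and is deliberately left unset). The space or the parentheses are needed so that :- is not read as the ${foo:-default} expansion:

```shell
#!/usr/bin/env bash
foo='abcdefg'
unset abc
echo "${foo:~abc}"                 # ~0 is -1: prints g
echo "${foo:~3}"                   # ~3 is -4: prints defg
echo "${foo: -4}"                  # same offset, written directly: defg
echo "${foo:(-4)}"                 # parenthesized form: defg
```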

Handling arithmetic expressions in shell scripting

Is it necessary to use the "expr" command? E.g.:
echo `expr a*b`
Or can we handle arithmetic expressions simply, using plain arithmetic operators? E.g.:
echo a*b
Thanks in advance.

In a POSIX shell you can evaluate arithmetic expressions directly in the shell when they are enclosed in
$(( ... ))
So:
a=12
b=34
echo $(($a + $b))
And although this wasn't always the case, all POSIX shells you are likely to encounter will also deal with:
echo $((a + b))
This all happened because, a long time ago, the shell did not do arithmetic, and so the external program expr was written. These days expr is still around (typically as an external program in /bin or /usr/bin), but there is also the POSIX $((...)) syntax. If $((...)) had been around from day one there would be no expr.
The shell is not exactly a normal computer language, and not exactly a macro processor: it's a CLI. It doesn't do inline expressions; an unquoted * is a wildcard for filename matching, because a CLI needs to reference files more often than it needs to do arithmetic.
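To make the contrast concrete, here is the same product computed both ways, using a and b from the example above. With expr, each operand and operator must be a separate argument, and * has to be escaped so the shell does not treat it as a filename pattern:

```shell
#!/bin/sh
a=12
b=34

# expr: separate arguments; \* keeps the shell from globbing.
expr "$a" \* "$b"                  # prints 408

# POSIX arithmetic expansion: no external command, no escaping needed.
echo "$((a * b))"                  # prints 408
```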

The second form will almost surely never do what you want. In Bash, you have a built-in numeric expression handler, though:
A=4; B=6; echo $((A * B))

You can do arithmetic using $(( )):
echo $((2*3))
results in 6
