Should shell aliases be substituted when parsed or when executed? - bash

Consider the script:
alias al='echo A'
foo(){ echo $(al);}
alias al='echo B'
foo
bash, ksh, and zsh print B, while dash and yash print A.
Which is correct? Or both?
According to the POSIX standard (Vol. Shell & Utilities) 2.3.1 Alias Substitution,
After a token has been delimited, but before applying the grammatical rules in Shell Grammar, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
It seems aliases should be resolved when the function is defined, as you don't know this is in a function before applying the grammatical rules. However, this doesn't make much sense to me, as you can't identify a word to be the command name word before applying the grammatical rules, either.
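One way to see where the two readings diverge is to separate an alias used directly as a command word from one used inside a command substitution. The following is only a sketch, assuming bash is installed; the function names direct and subst are made up for illustration, and the script is fed to a child bash through a here-document because non-interactive bash expands aliases only after shopt -s expand_aliases.

```shell
# Run the experiment in a child bash so the current shell is left untouched.
result=$(bash <<'EOF'
shopt -s expand_aliases
alias al='echo A'
direct() { al; }            # alias is the command word inside the function body
subst()  { echo $(al); }    # alias sits inside a command substitution
alias al='echo B'
direct
subst
EOF
)
printf '%s\n' "$result"
```

The first line of output comes from direct and is A, because the alias was replaced while the function definition was being read. The second line is the case from the question, where the shells disagree depending on when they parse the text inside $( ).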

Related

Is there a linter for fish like there is for bash with shellcheck?

For sh/bash/zsh there is https://github.com/koalaman/shellcheck; however, it will not support fish (see https://github.com/koalaman/shellcheck/issues/209). Are there any linters for fish?
To my knowledge, there is not (and obviously this is impossible to prove).
And if someone were to create such a thing, there'd need to be consensus about what the "typical beginner's syntax issues" and "semantic problems that cause a shell to behave strangely and counter-intuitively" are.
Fish doesn't have many of POSIX sh's warts (as it was written as a reaction to them). Some examples from the shellcheck README:
echo $1 # Unquoted variables
Fish's quoting behavior is quite different - in particular, there is no word splitting on variables, so unquoted variables usually do what you want.
v='--verbose="true"'; cmd $v # Literal quotes in variables
This is presumably an (unsuccessful) attempt to defeat word splitting, which isn't necessary.
This example nicely illustrates the issue - there are multiple decades worth of sh scripts. The flaws and unintuitive behaviors are really well known. So well known in fact, that the common-but-incorrect workarounds are known as well. That's just not the case for fish.
(Obviously, other examples do apply to fish as well, especially the "Frequently misused commands" section.)
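For readers who have not met the two sh pitfalls quoted from the shellcheck README, here is a minimal sketch of what they look like in a POSIX-style shell such as bash or dash (not fish). The variable names and the file name are made up for illustration.

```shell
# 1. Unquoted variables undergo word splitting in sh.
file='my file.txt'
set -- $file             # unquoted: split on whitespace
unquoted_count=$#        # 2
set -- "$file"           # quoted: stays one argument
quoted_count=$#          # 1

# 2. Quotes stored inside a variable are literal characters, not syntax.
v='--verbose="true"'
set -- $v
literal_arg=$1           # the double quotes are still part of the argument

printf '%s %s %s\n' "$unquoted_count" "$quoted_count" "$literal_arg"
```

In fish the first case would yield one argument without any quoting, which is why these particular warnings do not carry over.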
Some things in fish that I know new users often trip over:
Unquoted variables expand to one argument per element in the list (since every variable is a list). That includes zero arguments if the list is empty, which is an issue with test: e.g. test -n $var returns 0 when var is empty, because fish's test builtin is one of the few parts that are POSIX-compatible (and POSIX demands that test with a single argument returns 0 when that argument is a non-empty string, here -n). Double-quote the variable if you always need exactly one argument.
{} expands to nothing and {x} expands to "x", which means find -exec needs quoting, as do some git commit-ishes (HEAD@{4}). (edit: This has since been changed, {} expands to {} and {x} expands to {x} unless x has a comma or other expansion, so HEAD@{4} works)
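The one-argument test rule from the first item comes from POSIX, so it can be reproduced outside fish. A small sketch for sh/bash, where an unquoted empty variable likewise expands to zero arguments (the variable names are illustrative):

```shell
empty=''

# Unquoted and empty: the word vanishes, leaving `test -n`, a one-argument
# test that only checks that the string "-n" is non-empty.
if test -n $empty; then unquoted=true; else unquoted=false; fi

# Quoted: test receives two arguments, "-n" and "", and reports false.
if test -n "$empty"; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"
```

This prints unquoted=true quoted=false, which is the same surprise described above for fish.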
fish -n or --no-execute "does not execute any commands, only performs syntax checking", so you could do something like what I am doing here:
for f in **/*.fish; do fish -n "$f"; done

Bash - Why does $VAR1=FOO or 'VAR=FOO' (with quotes) return command not found?

For each of two examples below I'll try to explain what result I expected and what I got instead. I'm hoping for you to help me understand why I was wrong.
1)
VAR1=VAR2
$VAR1=FOO
result: -bash: VAR2=FOO: command not found
In the second line, $VAR1 gets expanded to VAR2, but why does Bash interpret the resulting VAR2=FOO as a command name rather than a variable assignment?
2)
'VAR=FOO'
result: -bash: VAR=FOO: command not found
Why do the quotes make Bash treat the variable assignment as a command name?
Could you please describe, step by step, how Bash processes my two examples?
How best to indirectly assign variables is adequately answered in other Q&A entries in this knowledgebase. Among those:
Indirect variable assignment in bash
Saving function output into a variable named in an argument
If that's what you actually intend to ask, then this question should be closed as a duplicate. I'm going to make a contrary assumption and focus on the literal question -- why your other approaches failed -- below.
What does the POSIX sh language specify as a valid assignment? Why does $var1=foo or 'var=foo' fail?
Background: On the POSIX sh specification
The POSIX shell command language specification is very specific about what constitutes an assignment, as quoted below:
4.21 Variable Assignment
In the shell command language, a word consisting of the following parts:
varname=value
When used in a context where assignment is defined to occur and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname.
The varname and value parts shall meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign, in addition to other delimiters.
Also, from section 2.9.1, on Simple Commands, with emphasis added:
The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
Also, from the grammar:
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Note from this:
The command must be recognized as an assignment at the very beginning of the parsing sequence, before any expansions (or quote removal!) have taken place.
The name must be a valid name. Literal quotes are not part of a valid variable name.
The equals sign must be unquoted. In your second example, the entire string was quoted.
Assignments are recognized before tilde expansion, parameter expansion, command substitution, etc.
Why $var1=foo fails to act as an assignment
As given in the grammar, all characters before the = in an assignment must be valid characters within a variable name for an assignment to be recognized. $ is not a valid character in a name. Because assignments are recognized in step 1 of simple command processing, before expansion takes place, the literal text $var1, not the value of that variable, is used for this matching.
Why 'var=foo' fails to act as an assignment
First, all characters before the = must be valid in variable names, and ' is not valid in a variable name.
Second, an assignment is only recognized if the = is not quoted.
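Both failure modes can be reproduced in a few lines. This is a sketch rather than anything taken from the specification; the status variable names are made up, and each failing command runs in a subshell with stderr silenced so that only the exit status is observed.

```shell
var1=var2

# The shell sees the literal text `$var1=foo`. `$` is not valid in a name, so
# this is an ordinary word; after expansion it becomes the command `var2=foo`.
expansion_status=0
( $var1=foo ) 2>/dev/null || expansion_status=$?

# Quoting the word (and with it the `=`) also prevents recognition.
quoted_status=0
( 'var=foo' ) 2>/dev/null || quoted_status=$?

# An unquoted name=value word is recognized as an assignment.
var=foo

echo "$expansion_status $quoted_status $var"
```

An exit status of 127 is the shell's "command not found" status, matching the error messages in the question.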
1)
VAR1=VAR2
$VAR1=FOO
You want to use a variable name contained in a variable for the assignment. Bash syntax does not allow this. However, there is an easy workaround:
VAR1=VAR2
declare "$VAR1"=FOO
It works with local and export too.
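To check that the workaround really assigns to VAR2, here is a short sketch. It assumes bash is available and runs the snippet in a child bash through a here-document; the ${!VAR1} indirect expansion used to read the value back is a bash feature, not POSIX sh.

```shell
result=$(bash <<'EOF'
VAR1=VAR2
declare "$VAR1"=FOO                  # "$VAR1" expands to VAR2 before declare runs
echo "VAR2=$VAR2"
echo "via indirection: ${!VAR1}"     # reads the variable whose name is stored in VAR1
EOF
)
printf '%s\n' "$result"
```

Keep in mind that declare used inside a function creates a local variable unless it is given -g.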
2)
By using single quotes (double quotes would yield the same result), you are telling Bash that what is inside is a string and to treat it as a single entity. Since it is the first item on the line, Bash tries to find an alias, or shell builtin, or an executable file in its PATH, that would be named VAR=FOO. Not finding it, it tells you there is no such command.
An assignment is not a normal command. To perform an assignment contained in a quoted string, you would need to use eval, like so:
eval "$VAR1=FOO" # But please don't do that in real life
Most experienced bash programmers would probably tell you to avoid eval, as it has serious drawbacks; I am giving it as an example only to recommend against its use. In the example above it involves no security risk or error potential, because the value of VAR1 is known and safe, but there are many cases where an arbitrary (i.e. user-supplied) value could cause a crash or unexpected behavior. Quoting inside an eval statement is also more difficult and reduces readability.
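To make the risk concrete, here is a harmless sketch of what an untrusted "variable name" can do when passed to eval, followed by one possible guard. The value of name and the helper is_valid_name are hypothetical and only serve as an illustration.

```shell
# Hypothetical untrusted input: meant to be a variable name, but it carries
# extra commands.
name='x=1; injected=yes; y'
eval "$name=FOO"          # actually runs: x=1; injected=yes; y=FOO
echo "injected=$injected"

# A minimal guard: accept only strings that look like a valid variable name.
is_valid_name() {
  case $1 in
    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

if is_valid_name "$name"; then
  eval "$name=FOO"
else
  echo "rejected: $name"
fi
```

Here the injected command only sets a variable, but it could just as well have been any other command.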
You declare VAR2 earlier in the program, right?
If you are trying to assign the value of VAR2 to VAR1, then you need to make sure and use $ in front of VAR2, like so:
VAR1=$VAR2
That will set VAR1 to the value of VAR2, because the $ means "the value that is stored in this variable". Without it, VAR2 is taken as the literal string VAR2.
Basically, a word at the start of a command line that the shell does not recognize as an assignment will be interpreted as a command name. That's why we have the $ to clarify "hey this is a variable".

Bash command groups: Why do curly braces require a semicolon?

I know the difference in purpose between parentheses () and curly braces {} when grouping commands in bash.
But why does the curly brace construct require a semicolon after the last command, whereas for the parentheses construct, the semicolon is optional?
$ while false; do ( echo "Hello"; echo "Goodbye"; ); done
$ while false; do ( echo "Hello"; echo "Goodbye" ); done
$ while false; do { echo "Hello"; echo "Goodbye"; }; done
$ while false; do { echo "Hello"; echo "Goodbye" }; done
bash: syntax error near unexpected token `done'
$
I'm looking for some insight as to why this is the case. I'm not looking for answers such as "because the documentation says so" or "because it was designed that way". I'd like to know why it was designed this way. Or maybe if it is just a historical artifact?
This may be observed in at least the following versions of bash:
GNU bash, version 3.00.15(1)-release (x86_64-redhat-linux-gnu)
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin12)
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
Because { and } are only recognized as special syntax if they are the first word in a command.
There are two important points here, both of which are found in the definitions section of the bash manual. First, is the list of metacharacters:
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
That list includes parentheses but not braces (neither curly nor square). Note that it is not a complete list of characters with special meaning to the shell, but it is a complete list of characters which separate tokens. So { and } do not separate tokens, and will only be considered tokens themselves if they are adjacent to a metacharacter, such as a space or a semi-colon.
Although braces are not metacharacters, they are treated specially by the shell in parameter expansion (eg. ${foo}) and brace expansion (eg. foo.{c,h}). Other than that, they are just normal characters. There is no problem with naming a file {ab}, for example, or }{, since those words do not conform to the syntax of either parameter expansion (which requires a $ before the {) or brace expansion (which requires at least one comma between { and }). For that matter, you could use { or } as a filename without ever having to quote the symbols. Similarly, you can call a file if, done or time without having to think about quoting the name.
These latter tokens are "reserved words":
reserved word
A word that has a special meaning to the shell. Most reserved words introduce shell flow control constructs, such as for and while.
The bash manual doesn't contain a complete list of reserved words, which is unfortunate, but they certainly include the Posix-designated:
! { }
case do done elif else
esac fi for if in
then until while
as well as the extensions implemented by bash (and some other shells):
[[ ]]
function select time
These words are not the same as built-ins (such as [), because they are actually part of the shell syntax. The built-ins could be implemented as functions or shell scripts, but reserved words cannot because they change the way that the shell parses the command line.
There is one very important feature of reserved words, which is not actually highlighted in the bash manual but is made very explicit in Posix (from which the above lists of reserved words were taken, except for time):
This recognition [as a reserved word] shall only occur when none of the characters is quoted and when the word is used as:
The first word of a command …
(The full list of places where reserved words are recognized is slightly longer, but the above is a pretty good summary.) In other words, reserved words are only reserved when they are the first word of a command. And, since { and } are reserved words, they are only special syntax if they are the first word in a command.
Example:
ls } # } is not a reserved word. It is an argument to `ls`
ls;} # } is a reserved word; `ls` has no arguments
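A few more cases in the same spirit, as a sketch that should behave the same in bash and other POSIX-style shells. The variable names are made up, and the unterminated group is handed to a child sh so that its syntax error does not abort the surrounding script.

```shell
# Not in command position: braces are ordinary words.
as_args=$(echo { })

# In command position and terminated by `;`: braces group commands.
grouped=$({ echo Hello; echo Goodbye; })

# Without the `;`, `}` is just another argument to echo and the group never
# closes, so the child shell reports a syntax error (non-zero exit status).
if sh -c '{ echo Hello; echo Goodbye }' >/dev/null 2>&1; then
  unterminated=accepted
else
  unterminated=rejected
fi

printf '%s\n' "$as_args" "$grouped" "$unterminated"
```

A newline before the } works as well as a semicolon, since either one ends the preceding command and puts } back in command position.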
There is lots more I could write about shell parsing, and bash parsing in particular, but it would rapidly get tedious. (For example, the rule about when # starts a comment and when it is just an ordinary character.) The approximate summary is: "don't try this at home"; really, the only thing which can parse shell commands is a shell. And don't try to make sense of it: it's just a random collection of arbitrary choices and historical anomalies, many but not all based on the need to not break ancient shell scripts with new features.

Why would I not leave extglob enabled in bash?

I just found out about the bash extglob shell option here:-
How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?
All the answers that used shopt -s extglob also mentioned shopt -u extglob to turn it off.
Why would I want to turn something so useful off? Indeed why isn't it on by default?
Presumably it has the potential for giving some nasty surprises.
What are they?
No nasty surprises -- default-off behavior is only there for compatibility with traditional, standards-compliant pattern syntax.
Which is to say: It's possible (albeit unlikely) that someone writing fo+(o).* actually intended the + and the parenthesis to be treated as literal parts of the pattern matched by their code. For bash to interpret this expression in a different manner than what the POSIX sh specification calls for would be to break compatibility, which is right now done by default in very few cases (echo -e with xpg_echo unset being the only one that comes immediately to mind).
This is different from the usual case where bash extensions are extending behavior undefined by the POSIX standard -- cases where a baseline POSIX shell would typically throw an error, but bash instead offers some new and different explicitly documented behavior -- because the need to treat these characters as matching themselves is defined by POSIX.
To quote the relevant part of the specification, with emphasis added:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.
When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:
? - A question-mark is a pattern that shall match any character.
* - An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters.
[ - The open bracket shall introduce a pattern bracket expression.
Thus, the standard explicitly requires any non-NUL character other than ?, * or [ or those listed elsewhere as requiring quoting to match themselves. Bash's behavior of having extglob off by default allows it to conform with this standard in its default configuration.
However, for your own scripts and your own interactive shell, unless you're making a habit of running code written for POSIX sh with unusual patterns included, enabling extglob is typically worth doing.
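The compatibility difference can be observed by matching the same pattern with the option off and on. This is a sketch that assumes bash is installed; the helper check and the sample strings are made up. The pattern is kept in a variable so that the parser never sees a bare +( while extglob is off.

```shell
result=$(bash <<'EOF'
pat='fo+(o).*'
check() { case $1 in ($pat) echo "$1: match" ;; (*) echo "$1: no match" ;; esac; }

shopt -u extglob          # POSIX behaviour: + ( ) match themselves
check 'fooo.txt'
check 'fo+(o).txt'

shopt -s extglob          # +(o) now means "one or more o"
check 'fooo.txt'
check 'fo+(o).txt'
EOF
)
printf '%s\n' "$result"
```

With extglob off only the string containing the literal +(o) matches; with extglob on only fooo.txt does. That is exactly the change in meaning that a default-on setting would impose on existing scripts.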
Being a Kornshell person, I have extglob on in my .bashrc by default because that's the way it is in Kornshell, and I use it a lot.
For example:
$ find !(target) -name "*.xml"
In Kornshell, this is no problem. In BASH, I need to set extglob. I also set lithist and set -o vi. This allows me to use VI commands in using my shell history, and when I hit v, it shows my code as a bunch of lines.
Without lithist set:
for i in *; do echo "I see $i"; done
With lithist set:
for i in *
do
echo "I see $i"
done
Now, if only BASH had the print statement, I'd be all set.

Why do backslashes prevent alias expansion?

In the first part of my question I will provide some background info as a
service to the community. The second part contains the actual question.
Part I
Assume I've created the following alias:
alias ls='ls -r'
I know how to temporarily unalias (i.e., override this alias) in the following
ways, using:
1) the full pathname of the command: /bin/ls
2) command substitution: $(which ls)
3) the command builtin: command ls
4) double quotation marks: "ls"
5) single quotation marks: 'ls'
6) a backslash character: \ls
Case 1 is obvious and case 2 is simply a variation. The command builtin in case 3 was designed to ignore shell functions, but apparently it also works for circumventing aliases. Finally, cases 4 and 5 are consistent with both the POSIX standard (2.3.1):
"a resulting word that is identified
to be the command name word of a
simple command shall be examined to
determine whether it is an unquoted,
valid alias name."
and the Bash Reference Manual (6.6):
"The first word of each simple
command, if unquoted, is checked to
see if it has an alias."
Part II
Here's the question: why is case 6 (overriding the alias by saying \ls)
considered quoting the word? In keeping with the style of this question, I am looking for references to the "official" documentation.
The documentation says that a backslash only escapes the following
character, as opposed to single and double quotation marks, which quote a
sequence of characters. POSIX standard (2.2.1):
"A backslash that is not quoted shall
preserve the literal value of the
following character, with the
exception of a <newline>"
Bash Reference Manual (3.1.2.1):
"A non-quoted backslash ‘\’ is the
Bash escape character. It preserves
the literal value of the next
character that follows, with the
exception of newline."
(BTW, isn't "the next character that follows" a bit of overkill?)
A possible answer might be that this situation isn't that special: it is
similar to a few cases in ANSI-C quoting, e.g. \nnn. However, that is still
escaping a single character (the eight-bit character whose value is the octal
value nnn), not a sequence of characters.
Historically, and maintained by POSIX, quoting any part of the word causes the entire word to be considered quoted for the purposes of functions and alias expansion. It also applies to quoting the end token for a here document:
cat << \EOF
this $text is fully quoted
EOF
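The here-document rule is easy to check. In this sketch (variable names made up) the same body is read three times: with a plain delimiter, with a backslash-quoted delimiter, and with a delimiter in which only one character is quoted.

```shell
text='expanded'

# Unquoted delimiter: the body undergoes parameter expansion.
plain=$(cat <<EOF
value: $text
EOF
)

# Any quoting of the delimiter, even a single backslash, makes the body literal.
backslash=$(cat <<\EOF
value: $text
EOF
)

# Quoting just one character of the delimiter has the same effect.
partial=$(cat <<E'O'F
value: $text
EOF
)

printf '%s\n' "$plain" "$backslash" "$partial"
```

Only the first copy has $text replaced; the other two print it literally.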
Just for completeness, here's yet another way to suppress alias & function lookups: run the command through env, which executes ls as an external program (the -i option additionally clears the environment for that single command):
# cf. http://bashcurescancer.com/temporarily-clearing-environment-variables.html
env -i ls
\ls only quotes the first character rather than the whole word. It's equivalent to writing 'l's.
You can verify it like this:
$ touch \?l
$ \??
bash: ?l: command not found
If \?? quoted the whole word it would say ?? not found rather than ?l not found.
I.e. it has the same effect as:
$ '?'?
bash: ?l: command not found
rather than:
$ '??'
bash: ??: command not found
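The effect on alias lookup can be checked the same way. The sketch below assumes bash is installed and uses a made-up name, greet, defined both as a function and as an alias so that it is visible which one runs; it is fed to a child bash because non-interactive bash needs shopt -s expand_aliases.

```shell
result=$(bash <<'EOF'
shopt -s expand_aliases
greet() { echo "function"; }     # defined before the alias exists
alias greet='echo alias'
greet          # unquoted: alias expansion applies
\greet         # first character quoted
gr\eet         # a middle character quoted
gree't'        # last character quoted
EOF
)
printf '%s\n' "$result"
```

Only the first call prints alias. Quoting a single character anywhere in the word is enough to skip the alias, even though, as shown above, the quoting itself covers just that one character.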
