According to bash manual:
control operator
A token that performs a control function. It is a newline or one of the following: ‘||’, ‘&&’, ‘&’, ‘;’, ‘;;’, ‘|’, ‘|&’, ‘(’, or ‘)’.
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
Many characters are both control operators and metacharacters.
So how can I know the syntax category of, e.g., a ;?
Take if COND ; then CMD ; fi as an example.
; seems to be a control operator in this context, since it can be replaced by a newline.
However, removing the spaces before and after ; still works.
Isn't it supposed to be separated by spaces if it's an operator?
According to the bash manual, an operator is:
A control operator or a redirection operator. See Redirections,
for a list of redirection operators. Operators contain at least one
unquoted metacharacter.
The metacharacter is basically any character that cannot be part of a word.
Definition of word:
A sequence of characters treated as a unit by the shell. Words may not include unquoted metacharacters.
Spaces are not needed around operators because every operator contains at least one metacharacter, and an unquoted metacharacter always ends the current word, so the parser can tell that the operator is not part of that word.
An exception is redirection, where e.g.
ls 2>&1
requires a space before the redirection, because the operator takes an optional file descriptor number (here 2) that must be written directly in front of it; with a space in between, 2 would instead be passed to ls as an argument.
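The following is a small sketch of both points, written for a POSIX-style sh. The variable names (spaced_or_not, joined, separated) are made up for illustration only.

```shell
# Control operators delimit words on their own, so spaces around ';' are optional.
if true;then spaced_or_not=same;fi

# '2>&1' duplicates file descriptor 2 onto 1; the digit must touch the operator.
joined=$(sh -c 'echo out; echo err >&2' 2>&1)

# With a space, '2' is an ordinary argument to echo and '>&1' only affects stdout.
separated=$(echo word 2 >&1)

printf '%s\n' "$spaced_or_not" "$joined" "$separated"
```

In the last command, echo receives two arguments, word and 2, which is why the digit shows up in the output instead of selecting a file descriptor.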
What does a caret do when appended to a bash variable but within braces? I'm trying to decipher this within a bash script:
readonly TEST=${USER^}
When I don't know the meaning of some syntax in bash/sh, I use my browser's find function on bash's manual and sh's specification. This is pretty effective, as each is available as a single page.
From bash's manual:
${parameter^pattern}
[...]
The ‘^’ operator converts lowercase letters matching pattern to uppercase
[...]
If pattern is omitted, it is treated like a ‘?’, which matches every character.
So ${variable^} expands to the value of $variable with the first letter converted to its uppercase variant.
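A quick way to check this is the sketch below. It assumes Bash 4.0 or later (older versions and plain sh do not support these expansions), and the variable name and sample value are illustrative. The ^^ form, which converts every matching character rather than only the first, comes from the same section of the manual.

```shell
# Requires Bash 4.0 or later; the sample value is arbitrary.
name="banana"

first_upper=${name^}     # first character uppercased
all_upper=${name^^}      # every character uppercased
only_a=${name^^a}        # only characters matching the pattern 'a' uppercased

printf '%s\n' "$first_upper" "$all_upper" "$only_a"
```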
Bracket expressions within case patterns seem to disallow [() &;]. However I can't seem to find any such restrictions (or escaping workarounds) in the POSIX shell spec, or in the bash manual for that matter.
case '&' in
# *[&]*) echo y ;; # won't parse
*[\&]*) echo y ;; # will parse & work
esac
# similar for ';', ' ', '(', ')'
# not a problem for ${var#[&; ()]}
This is in a sh shell script function that can't afford to call external utilities (but I'm curious about bash too). So... is there any spec that describes backslash-ing these characters within a bracket expression pattern?
No, I don't think it is explicitly documented anywhere.
But it can be deduced that Token Recognition Rule 6 is applied while the pattern list is being parsed. That is, unless quoted, control operators, redirection operators, and end of input are recognized as operators and delimit a pattern. The only operators the shell accepts at that point are | (another pattern follows) and ) (end of the pattern list); anything else causes a parse error.
As square brackets have no special meaning to the parser during tokenization, whether an operator occurs between them is irrelevant. And ${var#[&; ()]} is a different case, covered by Token Recognition Rule 5 and Parameter Expansion.
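As a sketch of the workaround from the question, the function below backslash-escapes each operator character inside the bracket expression so the tokenizer treats it as part of the pattern word. The function name contains_special is hypothetical, and no external utilities are used.

```shell
# Hypothetical helper: reports whether $1 contains one of & ; ( ) or a space.
contains_special() {
  case $1 in
    *[\&\;\(\)\ ]*) echo yes ;;   # each operator character is quoted with a backslash
    *) echo no ;;
  esac
}

# Inside ${...} no escaping is needed, as noted in the question.
v='&rest'
stripped=${v#[&; ()]}

contains_special 'a&b'
contains_special 'abc'
printf '%s\n' "$stripped"
```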
Consider the following two assignments.
$ a="foo bar"
$ b=$a
$ b=foo bar
bash: bar: command not found
Why does the second assignment work fine? How is the second command any different from the third command?
I was hoping the second assignment to fail because
b=$a
would expand to
b=foo bar
Since $a is not within double quotes, foo bar is not quoted, so field splitting should occur (as I understand it), which would result in b=foo being treated as an assignment and bar as a command that cannot be found.
Summary: I was expecting the 2nd command to fail for the same reason that caused the 3rd command to fail. Why does the 2nd command succeed?
I went through the POSIX standard, but I was unable to find anything that specifies that field splitting does not occur after a parameter expansion inside an assignment.
Everywhere else, field splitting does occur for an unquoted parameter after parameter expansion. For example,
$ a="foo bar"
$ printf "[%s] [%s]\n" $a
[foo] [bar]
See Section 2.6.5.
After parameter expansion (Parameter Expansion), command substitution (Command Substitution), and arithmetic expansion (Arithmetic Expansion), the shell shall scan the results of expansions and substitutions that did not occur in double-quotes for field splitting and multiple fields can result.
So which part of the POSIX standard prevents field splitting when parameter expansion occurs in an assignment statement?
In 2.9.1, "Simple Commands":
The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
Step 2 -- which is explicitly skipped in this case per the above text -- reiterates that it ignores assignments when performing expansion and field splitting:
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Thus, it's step 2 that determines the command to run (based on contents other than variable assignments and redirections), which addresses the b=$a case given in your question.
Step 4 performs other expansions -- "tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal" -- for assignments. Notably, field splitting is not a member of this set. Indeed, it's explicit in 2.6 that none of these create multiple words in and of themselves:
Tilde expansions, parameter expansions, command substitutions, arithmetic expansions, and quote removals that occur within a single word expand to a single field. It is only field splitting or pathname expansion that can create multiple fields from a single word. The single exception to this rule is the expansion of the special parameter '@' within double-quotes, as described in Special Parameters.
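The difference can be observed directly. This is a minimal sketch assuming the default IFS; the variable count is introduced only for the demonstration, and set -- overwrites the positional parameters of the current shell.

```shell
a="foo bar"

# In an assignment, the expansion is not field-split: b receives the whole value.
b=$a

# As command arguments, the same unquoted expansion is split into two fields.
set -- $a
count=$#

printf '[%s]\n' "$b"
printf '%s\n' "$count"
```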
I'm fairly new to Bash and I'm having trouble working out what is happening to my input as it is interpreted. Specifically, when escaping occurs relative to the other expansion steps.
From what I've read, bash does the following (in order):
brace expansion
tilde expansion
parameter and variable expansion
command substitution
arithmetic expansion
word splitting
filename expansion
But this list doesn't include when it converts all escape sequences e.g. '\\' into their meanings e.g. '\'. That is, if I want to print a backslash character. The command to run is
echo \\
not
echo \
So the syntax required to express a single backslash character is two backslashes. These must be converted into a single backslash internally.
It seems to be sometime before command substitution as I found out with a small test program.
So, my question is: When does this step take place? (or a complete list of the bash interpretation loop would be perfect)
Also, are there any other subtleties in the interpreter that are likely to catch me out? (This is related to knowing the complete list, I guess.)
From the man page's Expansion section, just before the Redirection section.
Quote Removal
After the preceding expansions, all unquoted occurrences of the characters \, ', and " that did not result from one of the above expansions
are removed.
Quote removal is one final process after the seven expansions you list.
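A small sketch of the rule quoted above, with illustrative variable names: quoting characters typed in the command are removed, while quoting characters that arrive through an expansion are kept as data.

```shell
# The two backslashes typed here are reduced to one by quote removal.
single=$(printf '%s' \\)

# Inside single quotes the double quotes are ordinary data.
v='"kept"'

# These double quotes result from expanding $v, so quote removal leaves them alone.
copy=$v

printf '%s\n' "$single" "$copy"
```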
I've recently discovered that Awk's -v VAR=VAL syntax for initializing variables on the command line expands escape sequences in VAL. I previously thought that it was a good way to pass strings into Awk without needing to run an escaping function over them first.
For example, the following script:
awk -v VAR='x\tx' 'BEGIN{printf("%s\n", VAR);}'
I would expect to print
x\tx
but actually prints:
x x
An aside: I now use environment variables to pass strings in unmodified instead, so this question isn't asking how to get the behaviour I previously expected.
Here's what the man page has to say on the matter:
-v var=val, --assign var=val Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the
BEGIN block of an AWK program.
And further down:
String Constants
String constants in AWK are sequences of characters enclosed between double quotes (like "value"). Within strings, certain escape
sequences are recognized, as in C. These are:
... list of escape sequences ...
The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).
In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in
regular expression constants. Thus, /a\52b/ is equivalent to /a*b/.
The way I read this, val in -v var=val is not a string constant, and there is no text to indicate that the string constant escaping rules apply.
My questions:
Is there a more authoritative source for the awk language than the man page, and if so what does it specify?
What does POSIX have to say about this, if anything?
Do all versions of Awk behave this way, i.e. can I rely on the expansion being done if I actually want it?
The assignment is a string constant.
The relevant sections from the standard are:
-v assignment
The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.
and
An operand that begins with an underscore or alphabetic character from the portable character set (see the table in XBD Portable Character Set), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the '=' character, shall specify a variable assignment rather than a pathname. The characters before the '=' represent the name of an awk variable; if that name is an awk reserved word (see Grammar) the behavior is undefined. The characters following the <equals-sign> shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote ( '"' ) character, as a STRING token (see Grammar), except that if the last character is an unescaped <backslash>, it shall be interpreted as a literal <backslash> rather than as the first character of the sequence "\"".
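The two behaviours can be compared side by side. This is a sketch rather than a portability guarantee; it relies on the ENVIRON array, which POSIX awk provides, for the unmodified path mentioned in the question's aside. The shell variable names via_v and via_env are illustrative.

```shell
# -v: the value is processed like a string constant, so \t becomes a tab.
via_v=$(awk -v VAR='x\tx' 'BEGIN { printf "%s\n", VAR }')

# Environment variable: the value reaches awk unmodified through ENVIRON.
via_env=$(VAR='x\tx' awk 'BEGIN { printf "%s\n", ENVIRON["VAR"] }')

printf '%s\n' "$via_v" "$via_env"
```

The first line of output contains a real tab between the two x characters; the second contains a literal backslash followed by t.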