Bash command groups: Why do curly braces require a semicolon? - bash

I know the difference in purpose between parentheses () and curly braces {} when grouping commands in bash.
But why does the curly brace construct require a semicolon after the last command, whereas for the parentheses construct, the semicolon is optional?
$ while false; do ( echo "Hello"; echo "Goodbye"; ); done
$ while false; do ( echo "Hello"; echo "Goodbye" ); done
$ while false; do { echo "Hello"; echo "Goodbye"; }; done
$ while false; do { echo "Hello"; echo "Goodbye" }; done
bash: syntax error near unexpected token `done'
$
I'm looking for some insight as to why this is the case. I'm not looking for answers such as "because the documentation says so" or "because it was designed that way". I'd like to know why it was designed this way, or whether it is just a historical artifact.
This may be observed in at least the following versions of bash:
GNU bash, version 3.00.15(1)-release (x86_64-redhat-linux-gnu)
GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin12)
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

Because { and } are only recognized as special syntax if they are the first word in a command.
There are two important points here, both of which are found in the definitions section of the bash manual. First, is the list of metacharacters:
metacharacter
A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.
That list includes parentheses but not braces (neither curly nor square). Note that it is not a complete list of characters with special meaning to the shell, but it is a complete list of characters which separate tokens. So { and } do not separate tokens, and will only be considered tokens themselves if they are adjacent to a metacharacter, such as a space or a semi-colon.
Although braces are not metacharacters, they are treated specially by the shell in parameter expansion (eg. ${foo}) and brace expansion (eg. foo.{c,h}). Other than that, they are just normal characters. There is no problem with naming a file {ab}, for example, or }{, since those words do not conform to the syntax of either parameter expansion (which requires a $ before the {) or brace expansion (which requires at least one comma between { and }). For that matter, you could use { or } as a filename without ever having to quote the symbols. Similarly, you can call a file if, done or time without having to think about quoting the name.
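For instance, a minimal interactive illustration of braces as ordinary characters (hypothetical files):
$ touch {ab}          # no comma, no $: a literal file named {ab}
$ ls {ab}
{ab}
$ echo foo.{c,h}      # brace expansion needs at least one comma
foo.c foo.h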
These latter tokens are "reserved words":
reserved word
A word that has a special meaning to the shell. Most reserved words introduce shell flow control constructs, such as for and while.
The bash manual doesn't contain a complete list of reserved words, which is unfortunate, but they certainly include the Posix-designated:
! { }
case do done elif else
esac fi for if in
then until while
as well as the extensions implemented by bash (and some other shells):
[[ ]]
function select time
These words are not the same as built-ins (such as [), because they are actually part of the shell syntax. The built-ins could be implemented as functions or shell scripts, but reserved words cannot because they change the way that the shell parses the command line.
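You can see the distinction with bash's type builtin, for example:
$ type [
[ is a shell builtin
$ type {
{ is a shell keyword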
There is one very important feature of reserved words, which is not actually highlighted in the bash manual but is made very explicit in Posix (from which the above lists of reserved words were taken, except for time):
This recognition [as a reserved word] shall only occur when none of the characters is quoted and when the word is used as:
The first word of a command …
(The full list of places where reserved words are recognized is slightly longer, but the above is a pretty good summary.) In other words, reserved words are only reserved when they are the first word of a command. And, since { and } are reserved words, they are only special syntax if they are the first word in a command.
Example:
ls } # } is not a reserved word. It is an argument to `ls`
ls;} # } is a reserved word; `ls` has no arguments
There is lots more I could write about shell parsing, and bash parsing in particular, but it would rapidly get tedious. (For example, the rule about when # starts a comment and when it is just an ordinary character.) The approximate summary is: "don't try this at home"; really, the only thing which can parse shell commands is a shell. And don't try to make sense of it: it's just a random collection of arbitrary choices and historical anomalies, many but not all based on the need to not break ancient shell scripts with new features.

Related

How to create new names for files with problematic characters for use in an existing bash scripted environment?

The goal is to get rid of (by changing) filenames that give headaches for scripting, by translating them to something else. The reason is that in this nearly 30-year-old Unix / Linux environment, with a lot of existing scripts that may not be "written correctly", a new, large and important cache of files arrived that has to be managed. So a colleague has asked me to write a script to help with "problematic filenames" and translate them. They've got a list of chars to turn into dots, such as the comma, and another list to turn into underscores, such as whitespace, as but two examples. I ran into problems, which I asked about over here.
I was using tr to do it, but commenters there said I should perhaps ask about this goal directly instead of how to get tr to work. So, I have!
Parameter expansion can do this for you.
Note that unlike when using tr (as requested on your other question), when using parameter expansion you don't need to use backslashes inside your character class definitions: put the expansion in double quotes and bash will treat the results of that expansion as literal.
#!/usr/bin/env bash
toDots='\,;:|+##$%^&*~'
toUnderscores='}{]['"'"'="()`!'

# requires bash 5+: if debug=1, then print what we would do instead of doing it
runOrDebug() {
  if (( debug )); then
    printf '%s\n' "${*@Q}"
  else
    "$@"
  fi
}

renameFiles() {
  local name subDots subBoth
  for name; do
    subDots=${name//["$toDots"]/.}              # first pass: translate to dots
    subBoth=${subDots//["$toUnderscores"]/_}    # second pass: translate to underscores
    if [[ $subBoth != "$name" ]]; then
      runOrDebug mv -- "$name" "$subBoth"
    fi
  done
}
debug=1 renameFiles '[/a],/;[p:r|o\b+lem#a#t$i%c]/#(%$^!/(e^n&t*ry)~='
Note that toUnderscores is (except for the single quote in the middle) in single quotes, so all the backslashes in it are part of the variable's data rather than being syntax. Because globs borrow their character-class syntax from POSIX bracket expressions, the characters inside ["$toDots"] and ["$toUnderscores"] are matched literally; no backslash-escaping is needed within the class.
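To see that character-class behavior in isolation, a minimal sketch (chars and name are hypothetical):
$ chars='\,;'
$ name='a\b,c;d'
$ echo "${name//["$chars"]/.}"
a.b.c.d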
See a demonstration of the technique running at https://ideone.com/kKE7IJ

Why is a separator not required in `for i do cmd; done` [duplicate]

This question already has an answer here:
Is a semicolon prohibited after NAME in `for NAME do ...`?
(1 answer)
Closed 4 years ago.
for i do echo $i; done
How is this even legal? (I would expect it to be written with an extra semi-colon for i; do echo $i; done) It works in bash, dash, zsh, and ksh. The standard (by which I mean http://pubs.opengroup.org/onlinepubs/9699919799/) states:
The for loop requires that the reserved words do and done be used to
delimit the sequence of commands. The format for the for loop is as follows:
for name [ in [word ... ]]
do
compound-list
done
So clearly when "in word" is omitted, do is serving as a separator. So the implication seems to be that the separator (the newline) after the [ in [word .. ]] actually belongs inside the closing right bracket. Can someone point to anything in the standard which justifies this (IMO) horrible abuse of the language?
If you look at the GNU man page, you see the loop has this syntax:
for
The syntax of the for command is:
for name [ [in [words …] ] ; ] do commands; done
So as you can see the extra semi-colon is part of the optional section.
The linux-die man page states the same.
Perhaps someone else can fill you in on the history of the syntax, but in shell scripting, space characters (by default) are the delimiters. The ; character is just a statement separator, often used instead of a newline to write logic as a "one-liner". A semicolon at the end of a line in a shell script is superfluous, since the semicolon and the newline are both command separators.
Also worth noting is the bracket syntax above is meant to indicate that those parts of the syntax are optional (i.e. you don't actually use brackets in the script there).
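A quick illustration in bash; when the in-list is omitted, the loop iterates over the positional parameters:
$ set -- a b c
$ for i do echo "$i"; done
a
b
c
$ for i in a b c; do echo "$i"; done   # with an in-list, a ; or newline is required before do
a
b
c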
Finally, I find it's good to think of shells as... well, shells. You can write scripts with them and as "languages" they're Turing complete, but the syntax is often kind of funky.

How does `extension="${filename##*.}"` work in bash? [duplicate]

This question already has answers here:
What is the meaning of the ${0##...} syntax with variable, braces and hash character in bash?
(4 answers)
Closed 2 years ago.
While looking online on how to get a file's extension and name, I found:
filename=$(basename "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}
What is the ${} syntax...? I know regular expressions, but "${filename##*.}" and "${filename%.*}" escape my understanding.
Also, what's the difference between:
filename=$(basename "$fullfile")
And
filename=`basename "$fullfile"`
...?
Looking in Google is a nightmare, because of the strange characters...
The ${filename##*.} expression is parameter expansion ("parameters" being the technical name for the shell feature that other languages call "variables"). Plain ${varname} is the value of the parameter named varname, and if that's all you're doing, you can leave off the curly braces and just put $varname. But if you leave the curly braces there, you can put other things inside them after the name, to modify the result. The # and % are some of the most basic modifiers: they remove a prefix or suffix of the string that matches a wildcard pattern. # removes from the beginning, and % from the end; in each case, a single instance of the symbol removes the shortest matching string, while a double symbol removes the longest. So ${filename##*.} is "the value of filename with everything from the beginning through the last period removed", while ${filename%.*} is "the value of filename with everything from the last period through the end removed".
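A short demonstration with a hypothetical filename:
$ filename=archive.tar.gz
$ echo "${filename##*.}"   # longest '*.' prefix removed
gz
$ echo "${filename#*.}"    # shortest '*.' prefix removed
tar.gz
$ echo "${filename%.*}"    # shortest '.*' suffix removed
archive.tar
$ echo "${filename%%.*}"   # longest '.*' suffix removed
archive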
The backticks syntax (`...`) is the original way of doing command substitution in the Bourne shell, and has since been borrowed by languages like Perl and Ruby to incorporate calling out to system commands. But it doesn't deal well with nesting, and its attempt to even allow nesting means that quoting works differently inside them, and it's all very confusing. The newer $(...) syntax, originally introduced in the Korn shell and then adopted by Bash and zsh and codified by POSIX, lets quoting work the same at all levels of a nested substitution and makes for a nice symmetry with the ${...} parameter expansion.
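A small sketch of the nesting difference:
$ dir=$(dirname "$(which bash)")   # nests naturally
$ dir=`dirname \`which bash\``     # inner backticks must be escaped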
As @e0k states in a comment on the question, the ${varname...} syntax is Bash's parameter (variable) expansion. It has its own syntax that is unrelated to regular expressions; it encompasses a broad set of features (each illustrated in the sketch after this list) that include:
specifying a default value
prefix and postfix stripping
string replacement
substring extraction
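A minimal sketch of each (variable names are hypothetical):
var='hello world'
echo "${missing:-fallback}"   # default value: fallback (missing is unset)
echo "${var#hello }"          # prefix stripping: world
echo "${var% world}"          # suffix stripping: hello
echo "${var/world/bash}"      # string replacement: hello bash
echo "${var:6:5}"             # substring extraction: world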
The difference between `...` and $(...) (both of which are forms of so-called command substitutions) is:
`...` is the older syntax (often called deprecated, but that's not strictly true).
$(...) is its modern equivalent, which facilitates nested use and works more intuitively when it comes to quoting.

how to escape paths to be executed with $( )?

I have a program whose textual output I want to execute directly in a shell. How should I format the output of this program so that paths with spaces are accepted by the shell?
$(echo ls /folderA/folder\ with\ spaces/)
Some more info: the program that generates the output is coded in Haskell (source). It's a simple program that keeps a list of my favorite commands. It prints the commands with 'cmdl -l'. I can then choose one command to execute with 'cmdl -g12' for command number 12. Thanks for pointing out that instead of $( ) I can use 'cmdl -g12 | bash'; I wasn't aware of that...
How shall I format the output of this program such that the paths with
spaces are accepted by the shell ?
The shell cannot distinguish between spaces that are part of a path and spaces that are separator between arguments, unless those are properly quoted. Moreover, you actually need proper quoting using single quotes ('...') in order to "shield" all those characters combinations that might otherwise have special meaning for the shell (\, &, |, ||, ...).
Depending on the language used for your external tool, there might be a library available for that purpose. As an example, Python has pipes.quote (shlex.quote on Python 3) and Perl has String::ShellQuote::shell_quote.
I'm not quite sure I understand, but don't you just want to pipe through the shell?
For a program called foo
$ foo | sh
To format output from your program so Bash won't try to space-separate it into arguments, it's probably easiest just to double-quote each argument with any normal quoting method, e.g.
mkdir "/tmp/Joey \"The Lips\" Fagan"
As you saw, you can alternatively backslash-escape the spaces, but I find that less readable usually.
EDIT:
If you may have special shell characters (&, |, `, (, ), [, ], $, etc.), you'll have to do it the hard/proper way, with a specific escaper for your language and target, as others have mentioned.
It's not just spaces you need to worry about, but other characters such as [ and ] (glob a.k.a pathname-expansion characters) and metacharacters such as ;, &, (, ...
You can use the following approach:
Enclose the string in single quotes.
Replace existing single quotes in the string with '\'' (which effectively breaks the string into multiple parts with spliced in \-escaped single quotes; the shell then reassembles the parts into a single string).
Example:
I'm good (& well[1];) would encode to 'I'\''m good (& well[1];)'
Note how single-quoting allows literal use of the glob characters and metacharacters.
Since single quotes themselves can never be used within single-quoted strings (there's not even an escape), the splicing-in approach described above is needed.
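To check such an encoding, you can feed it back to a shell; a minimal round trip using the example string above:
$ encoded="'I'\''m good (& well[1];)'"
$ eval "printf '%s\n' $encoded"
I'm good (& well[1];)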
As described by #mklement0, a safe algorithm is to wrap every argument in a pair of single quotes, and quote single quotes inside arguments as '\''. Here is a shell function that does it:
function quote {
  typeset cmd="" escaped arg
  for arg; do
    # replace each embedded ' with '\'' and wrap the whole argument in single quotes
    escaped=${arg//\'/\'\\\'\'}
    cmd="$cmd '$escaped'"
  done
  printf %s "$cmd"
}
$ quote foo "bar baz" "don't do it"
'foo' 'bar baz' 'don'\''t do it'
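The output is intended to be re-read by a shell. For example, a hypothetical round trip: eval "set -- $(quote foo "bar baz")" leaves exactly two positional parameters, foo and bar baz.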

Why would I not leave extglob enabled in bash?

I just found out about the bash extglob shell option here:
How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?
All the answers that used shopt -s extglob also mentioned shopt -u extglob to turn it off.
Why would I want to turn something so useful off? Indeed why isn't it on by default?
Presumably it has the potential for giving some nasty surprises.
What are they?
No nasty surprises -- default-off behavior is only there for compatibility with traditional, standards-compliant pattern syntax.
Which is to say: It's possible (albeit unlikely) that someone writing fo+(o).* actually intended the + and the parenthesis to be treated as literal parts of the pattern matched by their code. For bash to interpret this expression in a different manner than what the POSIX sh specification calls for would be to break compatibility, which is right now done by default in very few cases (echo -e with xpg_echo unset being the only one that comes immediately to mind).
This is different from the usual case where bash extensions are extending behavior undefined by the POSIX standard -- cases where a baseline POSIX shell would typically throw an error, but bash instead offers some new and different explicitly documented behavior -- because the need to treat these characters as matching themselves is defined by POSIX.
To quote the relevant part of the specification, with emphasis added:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.
When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:
? - A question-mark is a pattern that shall match any character.
* - An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters.
[ - The open bracket shall introduce a pattern bracket expression.
Thus, the standard explicitly requires any non-NUL character other than ?, * or [ or those listed elsewhere as requiring quoting to match themselves. Bash's behavior of having extglob off by default allows it to conform with this standard in its default configuration.
However, for your own scripts and your own interactive shell, unless you're making a habit of running code written for POSIX sh with unusual patterns included, enabling extglob is typically worth doing.
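A short sketch of what that buys you, assuming a directory containing foo.c, foo.h, bar.c and notes.txt:
$ shopt -s extglob
$ echo !(*.c)        # everything except *.c
foo.h notes.txt
$ echo @(foo|bar).c  # exactly one of the alternatives
bar.c foo.c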
Being a Kornshell person, I have extglob on in my .bashrc by default because that's the way it is in Kornshell, and I use it a lot.
For example:
$ find !(target) -name "*.xml"
In Kornshell, this is no problem. In BASH, I need to set extglob. I also set lithist and set -o vi. This allows me to use vi commands when working with my shell history, and when I hit v, it shows my code as separate lines rather than a single one-liner.
Without lithist set:
for i in *; do echo "I see $i"; done
With lithist set:
for i in *
do
echo "I see $i"
done
Now, if only BASH had the print statement, I'd be all set.
