Bash: Reading files with defined file extensions in a loop - bash

While this code works
#!/bin/bash
d="test_files/*"
for f in $d.{mp3,txt} ;do
echo "$f"   # do something with each file
done
putting {mp3,txt} into a variable does not; see the code below.
#!/bin/bash
a={mp3,txt}
d="test_files/*"
for f in $d."$a" ;do
echo "$f"   # do something with each file
done
the output here is /*.{mp3,txt}
Putting {mp3,txt} into an array
a=({mp3,txt})
outputs only files with the *.mp3 extension.

It doesn't work because brace expansion happens before all other expansions.
From man bash:
Brace expansion is performed before any other expansions, and any
characters special to other expansions are preserved in the result. It
is strictly textual. Bash does not apply any syntactic interpretation
to the context of the expansion or the text between the braces. To
avoid conflicts with parameter expansion, the string ‘${’ is not
considered eligible for brace expansion
You can use eval to perform brace expansion on text stored in variables, but it is not recommended. For example:
eval echo "$d.$a"
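If the goal is only to loop over several extensions, eval can be avoided by keeping the extensions in an array and globbing once per extension. The following is a minimal sketch, not the asker's code: the temporary directory and the names demo_dir, exts and matches are assumptions made so the example runs anywhere.

```bash
#!/bin/bash
# Hypothetical demo directory with three files, so the sketch is self-contained.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.mp3" "$demo_dir/b.txt" "$demo_dir/c.jpg"

shopt -s nullglob            # a pattern with no matches expands to nothing
exts=(mp3 txt)               # extensions stored as plain data, no braces needed
matches=()
for ext in "${exts[@]}"; do
  # The * stays unquoted so it globs; the variables are quoted so they are literal.
  for f in "$demo_dir"/*."$ext"; do
    matches+=("${f##*/}")    # "do something" with each file; here, record its name
  done
done
printf '%s\n' "${matches[@]}"
```

With the three demo files above, this prints a.mp3 and b.txt, and skips c.jpg.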

Related

Grep fails when file name stored in variable using BASH symbols like { and }

I have two identical text files, a.text and b.text.
Content of a.text and b.text
abcd
target
efgh
Can anyone explain why one of the commands works but not the other, and whether there is a way of making it work?
Output of command 1
grep "target" {a,b}.text
>>a.text:target
b.text:target
Output of command 2
file="{a,b}.text"
grep "target" $file
>>grep: {a,b}.text: No such file or directory
I'd be happy if someone could point me to somewhere I can read more about this as well. I can only assume that, when the pattern is stored in a variable, grep looks for a file literally called {a,b}.text, but I am not sure why or what leads to that.
As user1934428 said, brace expansion happens before parameter expansion. Quoting from the bash manual:
Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.
The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.
To get this to work you can store the file names in an array. Arrays can hold multiple file names without being subject to quoting or expansion issues that plague normal string variables.
files=({a,b}.text)
grep "target" "${files[@]}"
This works because {a,b} is now evaluated when the variable is assigned, rather than when it is expanded.
In the first case, you have a brace-expansion. In the second case, you are searching a file with the literal name {a,b}.text. The reason is that in bash, brace expansion happens before parameter expansion.
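The difference between the two forms can be made visible by printing each resulting word in angle brackets. This is only an illustrative sketch; the variable names literal and expanded are not from the question.

```bash
#!/bin/bash
file="{a,b}.text"       # quoted scalar: the braces are stored as literal text
files=({a,b}.text)      # array assignment: brace expansion runs here, giving two elements

# Unquoted $file undergoes parameter expansion, but brace expansion is already over.
literal=$(printf '<%s>' $file)
expanded=$(printf '<%s>' "${files[@]}")

echo "$literal"     # <{a,b}.text>
echo "$expanded"    # <a.text><b.text>
```

The first line of output shows a single word containing the braces, which is exactly the file name grep complained about.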

Order of brace expansion and parameter expansion

A common trope on StackOverflow bash is: "Why doesn't x=99; echo {1..$x} work?"
The answer is "because braces are expanded before parameters/variables".
Therefore, I thought it should be possible to expand multiple variables using a single $ and a brace. I'd expect a=1; b=2; c=3; echo ${{a..c}} to print 1 2 3. First, the inner brace would expand to ${a} ${b} ${c} (which it does when writing echo \${{a..c}}). Then that result would undergo parameter expansion.
However, I got -bash: ${{a..c}}: bad substitution so {a..c} wasn't expanded at all.
Bash's manual is a bit more specific (emphasis mine).
Expansion is performed on the command line after it has been split into tokens [...]
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion.
Note the ; and , in that list. "Left-to-right fashion" seems to apply to the whole (therefore unordered) list before the ;. Just like the mathematical operators * and / have no precedence over each other.
Ok, so brace expansion is not really of higher precedence than parameter expansion. It's just that both {1..$x} and ${{a..c}} are evaluated from left to right, meaning the brace { comes before the parameter $x and the parameter ${ comes before the brace {a..c}.
Or so I thought. However, when using $ instead of ${ then parameters on the left expand after braces on the right:
# in bash 5.0.3(1)
x=nil; x1=one; x2=two
echo ${x{1..2}} # prints `-bash: ${x{1..2}}: bad substitution`
echo $x{1..2} # prints `one two`
Question
Could it be that the bash manual is flawed or did I read it wrong?
If the manual is flawed: What is the exact order of all expansions?
I'm just asking because I'm curious. I don't plan to use things like $x{1..2} anywhere. I'm not interested in better solutions or alternatives to address multiple variables (e.g. array slices ${array[@]:1:2}). I just want to get a deeper understanding.
from: https://www.gnu.org/software/bash/manual/html_node/Brace-Expansion.html
To avoid conflicts with parameter expansion, the string ‘${’ is not
considered eligible for brace expansion, and inhibits brace expansion
until the closing ‘}’.
That said, for echo $x{1..2} , first the brace expansion takes place, and then the parameter expansion, so we have echo $x1 $x2. For echo ${x{1..2}} the brace expansion doesn't happen, because we are after the ${ and haven't reached the closing } of the parameter expansion.
Regarding the bash manual part you have quoted, left-to-right order still exists for the expansions (with respect to allowed nested ones). Things get clearer if you format the list instead of using , and ;:
brace expansion
In a left-to-right fashion:
tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution
word splitting
filename expansion.
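Both behaviours from the question follow from that ordering and can be checked directly. A small sketch, reusing the question's variables; n, r1 and r2 are names introduced here for illustration:

```bash
#!/bin/bash
x=nil; x1=one; x2=two
n=3

# Brace expansion runs first and yields the words $x1 $x2, which are then expanded.
r1=$(echo $x{1..2})

# During brace expansion, $n is still the literal text "$n", so {1..$n} is not a
# valid sequence and is left alone; parameter expansion then produces {1..3}.
r2=$(echo {1..$n})

echo "$r1"    # one two
echo "$r2"    # {1..3}
```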
Read Mo Budlong's 1988 classic Command Line Psychology, which was written for regular Unix, but most of it still applies to bash. The order of evaluation goes:
1 History substitution (except for the Bourne shell)
2 Splitting words, including special characters
3 Updating the history list (except for the Bourne shell)
4 Interpreting single and double quotes
5 Alias substitution (except for the Bourne shell)
6 Redirection of input and output (< > and |)
7 Variable substitution (variables starting with $)
8 Command substitution (commands inside back quotes)
9 File name expansion (file name wild cards)
So what bash does with code like {1..3} happens before step 7 above, and that's why the OP code fails.
But if we must, there's always eval (which should only be used if the variables are known in advance, or have first been cautiously type checked):
a=1; b=2; c=3; eval echo \{$a..$c}
Output:
1 2 3
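When the bounds are integers held in variables, an arithmetic for loop gives the same sequence without eval. This is offered as one possible alternative rather than the answerer's suggestion; the array name nums is an assumption.

```bash
#!/bin/bash
a=1; c=3
nums=()
# Arithmetic context evaluates variable names directly, so no braces are involved.
for ((i = a; i <= c; i++)); do
  nums+=("$i")
done
echo "${nums[@]}"    # 1 2 3
```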

Bash nested subshell argument expansion

Why is $bar being printed here as a literal, even though the outer subshell should expand its parameters according to bash command line processing rules?
$ foo='$bar' bar=expanded
$ echo $(echo $(echo $foo))
$bar
The inner subshell prints $bar, but why doesn't the outer subshell expand it? Does bash implicitly pass it as a literal, and if so, why and how? As far as I know, parameter expansion happens after each fork of the subshell, inside the new process. With nested subshells, command substitution is done from the inside out: the inner subshell prints the literal, raw text of the outer shell's command line before the fork happens, and the new shell then splits, expands and processes that command line (a string of characters). So why is the text $bar not expanded in the outer subshell, even though it contains no quotes? What causes it to be implicitly quoted here?
Here is example of the same logic and expected output without nested shells
$ foo='$bar' bar=expanded
$ echo $foo
$bar
$ echo $bar
expanded
Also, by adding eval I get the result I would expect in the first example, but I don't understand why it's necessary or how it works.
$ echo $(eval echo $(echo $foo))
expanded
The Bash manual explains the ordering of shell expansions (reformatted for clarity):
The order of expansions is:
brace expansion;
tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion);
word splitting;
and filename expansion.
On systems that can support it, there is an additional expansion available: process substitution. This is performed at the same time as tilde, parameter, variable, and arithmetic expansion and command substitution.
After these expansions are performed, quote characters present in the original word are removed unless they have been quoted themselves (quote removal).
This essentially echoes the Posix shell specification with the addition of some bash-specific expansions.
Note that the second group of expansions, which includes command substitution ($(...)) is only performed once, left-to-right. They are not performed repetitively, so the result of a command substitution is not subject to parameter expansion. Unless quoted, it is subject to word-splitting, filename expansion, and quote removal.
The commands evaluated in subshells are, indeed, evaluated inside out, but at each level the result of the inner command substitution is only subject to word-splitting, filename expansion and quote removal (none of which apply in this example).
So the only parameter expansion done is the replacement of $foo with its value.
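The single-pass rule can be confirmed by capturing each variant. The sketch below reuses foo and bar from the question; once, twice, name and indirect are names introduced here, and the indirect expansion at the end is an eval-free alternative that the answer itself does not mention.

```bash
#!/bin/bash
foo='$bar' bar=expanded

# One pass: $foo becomes the text $bar, which is not expanded again.
once=$(echo $(echo $foo))

# eval re-parses its arguments, so the text $bar gets a second round of expansion.
twice=$(eval echo $(echo $foo))

# Assumed alternative: strip the leading $ and look the variable up by name.
name=${foo#\$}
indirect=${!name}

echo "$once"        # $bar
echo "$twice"       # expanded
echo "$indirect"    # expanded
```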

No such file or directory (ls) in conjunction with tilde expansion

I am writing a simple bash script and wanted to display all the items in a particular directory. I tried doing the following:
desktop="~/Desktop/testr/"
echo $desktop
echo `ls $desktop`
However I keep getting the output:
~/Desktop/testr/
ls: ~/Desktop/testr/: No such file or directory
But when I run ls from the terminal, I can see the items. I suspect that the problem is that the ~ is not getting expanded but I thought that the double quotes would have taken care of that.
Thanks for your help!
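This is because tilde expansion is not performed within quoted strings, so the variable holds a literal ~. In the echo and ls lines, tilde expansion comes before parameter substitution, so the ~ produced by expanding $desktop is never expanded either.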
The sequence of expansions is:
Tilde expansion
parameter expansion
command substitution
arithmetic expansion
Field splitting
Pathname expansion
Quote removal
See the POSIX Shell Specification on Word Expansions for the gory details.
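In practice the fix is to let the tilde expand at assignment time, or to use $HOME, which does expand inside double quotes. A brief sketch using the question's path; the variable names quoted, unquoted and with_home are illustrative only.

```bash
#!/bin/bash
quoted="~/Desktop/testr/"          # the tilde is quoted, so it stays a literal ~
unquoted=~/Desktop/testr/          # unquoted tilde at the start of the value is expanded
with_home="$HOME/Desktop/testr/"   # parameter expansion works inside double quotes

echo "$quoted"
echo "$unquoted"
echo "$with_home"
```

The first echo prints the path with a literal ~, while the other two print an absolute path under the home directory.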

Quoting vs not quoting the variable on the RHS of a variable assignment

In shell scripting, what is the difference between these two when assigning one variable to another:
a=$b
and
a="$b"
and when should I use one over the other?
I think there is no big difference here. Yes, it is advisable to enclose a variable in double quotes when that variable is being referenced. However, $x does not seem to be referenced here in your question.
y=$x does not by itself affect how whitespace is handled. It is only when $y is actually used that quoting matters. For example:
$ x=" a b "
$ y=$x
$ echo $y
a b
$ echo "$y"
a b
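Because leading and trailing spaces are hard to see in terminal output, the same check can be repeated with brackets around each word. A small sketch; z is an extra variable added here to compare the quoted assignment.

```bash
#!/bin/bash
x=" a b "
y=$x        # unquoted RHS: no word splitting happens in an assignment
z="$x"      # quoted RHS: same result

printf '[%s]\n' "$y"    # [ a b ]
printf '[%s]\n' "$z"    # [ a b ]
printf '[%s]\n' $y      # unquoted use: split into [a] and [b] on separate lines
```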
From section 2.9.1 of the POSIX shell syntax specification:
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
String-splitting and globbing (the steps which double quotes suppress) are not in this list.
Thus, the quotes are superfluous in all simple assignments (not speaking here to those implemented with arguments to declare, export or similar commands) except those where (1) the behavior of single-quoted, not double-quoted, strings is desired; or (2) whitespace or other content in the value would otherwise be parsed as syntactic rather than literal.
(Note that the decision on how to parse a command -- thus, whether it is an assignment, a simple command, a compound command, or something else -- takes place before parameter expansions; thus, var=$1 is determined to be an assignment before the value of $1 is ever considered! Were this untrue, such that data could silently become syntax, it would be far more difficult -- if not impossible -- to write secure code handling untrusted data in bash).
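The point about data never becoming syntax can be illustrated with a value that looks like shell code. In this sketch the positional parameter is set by hand to a hypothetical hostile string; nothing in it is executed.

```bash
#!/bin/bash
# Simulate an untrusted first argument containing spaces and a command separator.
set -- 'foo bar; echo injected'

# The line is already parsed as an assignment before $1 is expanded, so the
# whole string, including "; echo injected", is stored as data.
var=$1

printf '[%s]\n' "$var"    # [foo bar; echo injected]
```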
There are no (good) reasons to double quote the RHS of a variable assignment when used as a statement on its own.
The RHS of an assignment statement is not subject to word splitting (or brace expansion), etc. so cannot need quotes to assign correctly. All other expansions (as far as I'm aware) do occur on the RHS but also occur in double quotes so the quoting serves no purpose.
That being said, there are reasons not to quote the RHS, namely those covered in How to address error "bash: !d': event not found" in Bash command substitution (specifically, see my answer and rici's answer).
Here are some other examples (with two files, t.sh and file, in the current directory):
a='$(ls)' # no command substitution
b="$(ls)" # command substitution, no word splitting
c='*' # no filename expansion
d="*" # no filename expansion
e=* # no filename expansion
f=$a # no expansions or splittings
g="$a" # no expansions or splittings
h=$d # no expansions or splittings
echo ---'$a'---
echo $a # no command substitution
echo ---'$b'---
echo $b # word splitting
echo ---'"$b"'---
echo "$b" # no word splitting
echo ---'$c'---
echo $c # filename expansion, word splitting
echo ---'"$c"'---
echo "$c" # no filename expansion, no word splitting
echo ---'$d'---
echo $d # filename expansion, word splitting
echo ---'"$d"'---
echo "$d" # no filename expansion, no word splitting
echo ---'"$e"'---
echo "$e" # no filename expansion, no word splitting
echo ---'$e'---
echo $e # filename expansion, word splitting
echo ---'"$f"'---
echo "$f" # no filename expansion, no word splitting
echo ---'"$g"'---
echo "$g" # no filename expansion, no word splitting
echo ---'$h'---
echo $h # filename expansion, word splitting
echo ---'"$h"'---
echo "$h" # no filename expansion, no word splitting
Output:
---$a---
$(ls)
---$b---
file t.sh
---"$b"---
file
t.sh
---$c---
file t.sh
---"$c"---
*
---$d---
file t.sh
---"$d"---
*
---"$e"---
*
---$e---
file t.sh
---"$f"---
$(ls)
---"$g"---
$(ls)
---$h---
file t.sh
---"$h"---
*
One interesting thing to notice is that command substitution occurs in variable assignments if they are in double quotes, and if the RHS is given explicitly as "$(ls)" and not implicitly as "$a".
Advanced Bash-Scripting Guide: Chapter 5: Quoting
When referencing a variable, it is
generally advisable to enclose its
name in double quotes. This prevents
reinterpretation of all special
characters within the quoted string.
Use double quotes to prevent word
splitting. An argument enclosed in
double quotes presents itself as a
single word, even if it contains
whitespace separators.
