Using "expanding characters" in a variable in a bash script

I apologize beforehand for this question, which is probably both ill formulated and answered a thousand times over. I get the feeling that the reason I can't find an answer is that I don't quite know how to ask the question.
I'm writing a script that traverses folders in a bunch of mounted external hard drives, like so:
for g in /Volumes/compartment-?/{Private/Daniel,Daniel}/Projects/*/*
It then proceeds to perform long-running tasks on each of the directories found there. Because these operations are I/O-intensive rather than CPU-intensive, I thought I'd add an option to specify which "compartment" I want to work in, so that I can parallelize the workloads.
But, doing
cmp="?"
[[ ! "$1" = "" ]] && cmp="$1"
And then,
for g in /Volumes/compartment-$cmp/{Private/Daniel,Daniel}/Projects/*/*
doesn't work: the question mark that should expand to all compartments is taken literally instead, so I get an error that "compartment-?" doesn't exist, which is of course true.
How do I create a variable with a value that "expands", the way dir="./*" works with ls $dir?
EDIT: Thanks to @dan for the answer. I was brought up to be courteous and thank people, so I did thank him for it in a comment on his answer, but that comment has been removed, and I'm anxious that repeating it might be some kind of infraction here. I ended up simply escaping my question mark glob character, i.e. \?, since for this script I only need to either search all drives or one particular drive. But I'll keep the answer handy for the next time I write a script where I'd like to support more advanced arguments.

Brace expansion occurs before variable expansion. Pathname/glob expansion (e.g. ?, *) occurs last. Therefore you can't use the glob character ? in a variable, and in a brace expansion.
You can use a glob expression in an unquoted variable, without brace expansion. E.g. q=\?; echo compartment-$q is equivalent to echo compartment-?.
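A minimal sketch of the unquoted-variable case, run in a scratch directory created with mktemp (the compartment-1 and compartment-2 names are made up for the demo). It also shows that quoting the same variable turns the ? back into a literal character.

```bash
# Illustrative scratch tree with two "compartment" directories.
demo_root=$(mktemp -d)
mkdir "$demo_root/compartment-1" "$demo_root/compartment-2"

q='?'   # the glob character stored in a variable (same value as q=\?)

# Unquoted $q: after parameter expansion the word still contains an
# unquoted ?, so pathname expansion matches both directories.
unquoted=( "$demo_root"/compartment-$q )

# Quoted $q: the ? is literal, so no pathname expansion happens.
quoted=( "$demo_root/compartment-$q" )

printf 'unquoted: %s\n' "${unquoted[@]##*/}"   # strip the scratch prefix
printf 'quoted:   %s\n' "${quoted[@]##*/}"
```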
To solve your problem, you could define an array based on the input argument:
if [[ $1 ]]; then
    [[ -d /Volumes/compartment-$1 ]] || exit 1
    files=("/Volumes/compartment-$1"/{Private/Daniel,Daniel}/Projects/*/*)
else
    files=(/Volumes/compartment-?/{Private/Daniel,Daniel}/Projects/*/*)
fi
# then iterate the list:
for i in "${files[@]}"; do
    ...
done
Another option is a nested loop. The path expression in the outer loop doesn't use brace expansion, so (unlike the first example) it can expand a glob in $1 (or default to ? if $1 is empty):
for i in /Volumes/compartment-${1:-?}; do
    [[ -d $i ]] &&
    for j in "$i"/{Private/Daniel,Daniel}/Projects/*/*; do
        [[ -e $j ]] || continue
        ...
    done
done
Note that the second example expands a glob expression passed in $1 (e.g. ./script '[1-9]'). The first example does not.
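A self-contained sketch of the nested-loop approach, wrapped in a hypothetical function list_projects whose first argument stands in for /Volumes so it can be tried on a scratch tree. The inner loop is rooted at the compartment found by the outer loop ($i), and both loops skip patterns that matched nothing.

```bash
# Hypothetical helper: list_projects ROOT [COMPARTMENT_GLOB]
# ROOT stands in for /Volumes; COMPARTMENT_GLOB defaults to ?.
list_projects() {
    local root=$1
    local pattern="${2:-?}"
    local i j
    # Outer loop: no brace expansion, and $pattern is deliberately unquoted,
    # so a glob such as '[1-9]' in it is expanded against the filesystem.
    for i in "$root"/compartment-$pattern; do
        [[ -d $i ]] || continue            # nothing matched: skip the literal pattern
        # Inner loop: try both sub-paths under this compartment.
        for j in "$i"/{Private/Daniel,Daniel}/Projects/*/*; do
            [[ -e $j ]] || continue        # skip patterns that matched nothing
            printf '%s\n' "${j#"$root"/}"  # print the path relative to ROOT
        done
    done
}

# Usage sketch on a scratch tree (all directory names are made up):
demo=$(mktemp -d)
mkdir -p "$demo/compartment-1/Daniel/Projects/alpha/src" \
         "$demo/compartment-2/Private/Daniel/Projects/beta/docs"
list_projects "$demo"            # every compartment
list_projects "$demo" '[2]'      # only compartment-2
```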
Remember that pathname expansion expands either to existing files or, if nothing matches, to the pattern itself, literally. shopt -s nullglob guarantees expansion only to existing files (or to nothing).
You should either use nullglob, or check that each file or directory exists, like in the examples above.
Using $1 unquoted also subjects it to word splitting on whitespace. You can set IFS= (empty) to avoid this.
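A small sketch of both notes, using a scratch directory from mktemp that contains no .txt files: the unmatched pattern survives literally by default and vanishes with nullglob, and an empty IFS keeps a value containing a space as one word. The arg value is made up.

```bash
empty_dir=$(mktemp -d)   # scratch directory with no *.txt files

# Default: an unmatched pattern is left as-is (one literal word).
shopt -u nullglob
default_result=( "$empty_dir"/*.txt )

# nullglob: an unmatched pattern expands to nothing at all.
shopt -s nullglob
nullglob_result=( "$empty_dir"/*.txt )
shopt -u nullglob        # restore the default for the rest of the script

# Word splitting of an unquoted value, with the default IFS and with IFS empty.
arg='my compartment'
split=( $arg )                       # default IFS: two words
IFS=; unsplit=( $arg ); unset IFS    # empty IFS: one word; unset IFS restores default splitting

echo "default:  ${#default_result[@]} word(s): ${default_result[*]##*/}"
echo "nullglob: ${#nullglob_result[@]} word(s)"
echo "split: ${#split[@]} word(s), unsplit: ${#unsplit[@]} word(s)"
```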

Related

Bash use variable in 'for' keyword [duplicate]

How do I get this to work correctly?
The line below works, but I need to use an argument in the 'for':
for f in ~/files/*/*.txt; do
Code:
list ()
{
for f in $1; do
echo $f
done
}
list "~/files/*/*.txt"
list "~/files/*.txt"
Output:
~/files/*/*.txt
~/files/*.txt
The problem lies in the expansion order in bash.
There is a certain logic in what you did.
It would work with list "./dir/*/*.txt" for example.
You enclosed the argument to list in double quotes, so pattern expansion is not done when calling list.
So $1 in the list function is literally ./dir/*/*.txt, as you wanted it to be.
Then you don't quote $1 in the for statement, so it is expanded into a list of file names.
So it does what you want.
Or almost.
The problem is that the order of expansion is:
~ (tilde expansion), then parameters, then patterns (plus others, irrelevant here, in between).
So, when $1 is substituted by its value, it is too late to substitute ~.
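A two-line check of this ordering: a ~ that arrives through a parameter is never tilde-expanded, while one written literally in the command is. The variable names are only for the demo.

```bash
p='~'
from_variable=$(echo $p)   # unquoted on purpose: still a literal ~, tilde expansion already happened
direct=$(echo ~)           # a literal ~ in the source IS expanded (to the home directory)

echo "from variable: $from_variable"
echo "direct:        $direct"
```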
If you had done this (not at all a suggestion, just an explanation):
list ()
{
for f in ~/$1; do
echo $f
done
}
list "files/*/*.txt"
list "files/*.txt"
It would have worked. Obviously not a solution, but it helps to understand what happens: "files/*.txt" is passed literally to list.
Then
for f in ~/$1
is transformed into
for f in /home/you/$1 [~ substitution]
then transformed into
for f in /home/you/files/*.txt [parameter substitution]
then transformed into
for f in /home/you/files/a.txt /home/you/files/b.txt [pattern expansion]
Now for the solution: not quoting the arguments and then using "$@", as suggested in comments, would indeed do the trick.
If you don't quote the argument and call
list ~/files/*.txt
Then expansion will occur before the call.
list ~/files/*.txt
is transformed into
list /home/you/files/*.txt
then into
list /home/you/files/a.txt /home/you/files/b.txt
and then passed to list.
But then, inside list, what you have are two arguments.
So, in the for loop, you need to use "$@":
list ()
{
for f in "$@"; do
echo "$f"
done
}
list ~/files/list.txt
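A runnable version of this "$@" approach, tried on a scratch directory (the file names, including one with a space, are made up). printf is used instead of an unquoted echo so each name is printed exactly as received.

```bash
# list: print each argument on its own line. The caller's shell has already done
# tilde and pathname expansion, so each matching file arrives as one element of "$@".
list() {
    local f
    for f in "$@"; do
        printf '%s\n' "$f"
    done
}

# Usage sketch on a scratch directory (file names are made up):
list_demo=$(mktemp -d)
touch "$list_demo/a.txt" "$list_demo/b.txt" "$list_demo/with space.txt"
list "$list_demo"/*.txt    # glob left unquoted at the call site on purpose
```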
Another way, if you have a reason to want the expansion to occur inside list (for example, if you may want to pass two of those patterns), would be to force the expansion after the parameter substitution.
There is no ideal way to do that.
Either you reimplement the path expansion yourself (or use external commands, such as realpath).
Or you use "eval" to force double evaluation of your "$1". But that is a huge security hole if the arguments come from the user (one could pass $(rm -fr /) as an argument, and eval would execute it), plus it can also be tricky if you have, for example, filenames containing "$".
If you know that the patterns will always look like your examples (maybe a tilde and some * and the like), then you could just do the tilde substitution yourself and keep the rest of the code as is:
list ()
{
param=${1/#\~/$HOME}
for f in $param; do
echo $f
done
}
list "~/files/list.txt"
Not the best solution, but the one closest to yours.
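To try the tilde-substitution version without touching the real home directory, the sketch below points HOME at a scratch directory for a single call; list_pattern is a hypothetical name for the answer's function. It keeps the answer's limitation: the unquoted $param is also word-split, so a HOME or pattern containing spaces would break it.

```bash
# list_pattern: expand ONE pattern argument inside the function.
# Assumes any ~ is a lone leading ~ (not ~user); it is replaced by $HOME,
# and the result is left unquoted on purpose so pathname expansion happens.
list_pattern() {
    local param f
    param=${1/#\~/$HOME}
    for f in $param; do
        printf '%s\n' "$f"
    done
}

# Usage sketch: a made-up home directory with two .txt files.
fake_home=$(mktemp -d)
mkdir "$fake_home/files"
touch "$fake_home/files/a.txt" "$fake_home/files/b.txt"
HOME=$fake_home list_pattern '~/files/*.txt'   # HOME overridden for this call only
```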
tl;dr:
The problem is the order of substitution in bash. You need to understand that bash rewrites commands in several stages before executing them.
More specifically, this happens because ~ is expanded before parameters and variables. So if x="~/*", then echo $x is still echo $x after ~ expansion (there is no ~ in echo $x), then echo ~/* after variable expansion ($x is replaced by its value), and still echo ~/* after * expansion (since you have no directory literally named ~, the pattern matches nothing and is left as is).
The easiest solution is to have list take many arguments, not just one: let the expansion occur before the call to list (so don't enclose the argument to list in double quotes), and then rewrite list to take into account that $1 is just the first of many arguments.
If you insist on having a single argument to list, you have to deal with a potential ~ yourself, e.g. with ${1/#\~/$HOME} if the ~ is always a lone ~ (not ~user) at the beginning of the pattern.

What are the shell statements where expanding a variable with or without double quotes is 100% equivalent?

Let's suppose that you have a variable which is subject to word splitting, globbing and pattern matching:
var='*
.?'
While I'm pretty sure that everyone agrees that "$var" is the best way to expand the variable as a string literal, I've identified a few cases where you don't need to use the double quotes:
Simple assignment: x=$var
Case statement: case $var in ...
Leftmost part of bash test construct: [[ $var .... ]]
UPDATE1: Bash here-string: <<< $var, which works starting from bash-4.4 (thank you @GordonDavidson)
UPDATE2: Exported assignment (in bash): export x=$var
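A quick way to check the listed cases yourself: the sketch below feeds the question's var through each construct and compares the result with the original value, then contrasts it with an ordinary unquoted context. It assumes bash 4.4 or later for the here-string line; the names x, y, z, case_ok and test_ok exist only for the demo.

```bash
# The awkward value from the question: a glob, a newline, another glob.
var='*
.?'

x=$var                  # simple assignment
export y=$var           # export: bash treats the argument like an assignment
z=$(cat <<< $var)       # here-string (bash >= 4.4 keeps the newline)

case $var in            # case word: no splitting or globbing
    "$x") case_ok=yes ;;
    *)    case_ok=no ;;
esac

[[ $var == "$x" ]] && test_ok=yes || test_ok=no   # left side of [[ ]]

# Contrast: in an ordinary unquoted context the value IS word-split.
# Globbing is switched off here only so the result doesn't depend on the cwd.
set -f
words=( $var )
set +f

printf '%s\n' "case: $case_ok" "[[ ]]: $test_ok" "split words: ${#words[@]}"
```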
Is it correct? Is there any other shell/bash statement where the variable isn't subject to glob expansion or word splitting without using double-quotes? where expanding a variable with or without double quotes is 100% equivalent?
The reason I ask this question is that when reading someone else's code, knowing the border cases mentioned above might help.
For example, one bug that I found in a script that I was debugging is something like:
out_exists="-f a.out"
[[ $out_exists ]] && mv a.out prog.exe
mv: cannot stat ‘a.out’: No such file or directory
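Why this bug bites: with a single operand, [[ $out_exists ]] is a non-empty-string test, so it succeeds whatever the string says. The sketch below contrasts it with a real file test; the "intended" test is an assumption about what the author meant, and the mktemp directory only guarantees that a.out is absent.

```bash
out_exists="-f a.out"   # looks like a file test, but is just a string

# One-operand form: true for ANY non-empty string; the "-f" in the value
# is never parsed as an operator.
if [[ $out_exists ]]; then string_test=true; else string_test=false; fi

# Presumably intended: an actual file test (run in an empty scratch directory).
scratch=$(mktemp -d)
if [[ -f $scratch/a.out ]]; then file_test=true; else file_test=false; fi

echo "string test: $string_test, file test: $file_test"
```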
This question is a duplicate of What are the contexts where Bash doesn't perform word splitting and globbing?, but that was closed before it was answered.
For a thorough answer to the question see the answer by Stéphane Chazelas to What are the contexts where Bash doesn't perform word splitting and globbing? - Unix & Linux Stack Exchange. Another good answer is in the "Where you can omit the double quotes" section in the answer by Gilles to When is double-quoting necessary? - Unix & Linux Stack Exchange.
There seem to be a small number of cases that aren't covered by the links above:
With the for (( expr1 ; expr2 ; expr3 )) ... loop, variable expansions in any of the expressions inside the (( ... )) don't need to be quoted.
Several of the expansions described in the Shell Parameter Expansion section of the Bash Reference Manual are described with a word argument that isn't subject to word splitting or pathname expansion (globbing). Examples include ${parameter:-word}, ${parameter#word}, and ${parameter%word}.
Great question! If you need to word split a variable, the quotes should be left off.
If I think of other cases, I'll add to this.
var='abc xyz'
set "$var"
echo $1
abc xyz
set $var
echo $1
abc

Multiple elements instead of one in bash script for loop

I have been following the answers given in these questions
Shellscript Looping Through All Files in a Folder
How to iterate over files in a directory with Bash?
to write a bash script which goes over files inside a folder and processes them. So, here is the code I have:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in "$INFOLDER$YEAR*.mdb";
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I am receiving this error: basename: extra operand.
I added echo $f and I realized that f contains all the filenames separated by space. But I expected to get one at a time. What could be the problem here?
You're running into problems with quoting. In the shell, double-quotes prevent word splitting and wildcard expansion; generally, you don't want these things to happen to variables' values, so you should double-quote variable references. But when you have something that should be word-split or wildcard-expanded, it cannot be double-quoted. In your for statement, you have the entire file pattern in double-quotes:
for f in "$INFOLDER$YEAR*.mdb";
...which prevents word-splitting and wildcard expansion on the variables' values (good) but also prevents it on the * which you need expanded (that's the point of the loop). So you need to quote selectively, with the variables inside quotes and the wildcard outside them:
for f in "$INFOLDER$YEAR"*.mdb;
And then inside the loop, you should double-quote the references to $f in case any filenames contain whitespace or wildcards (which are completely legal in filenames):
echo "$f"
absname="$INFOLDER$YEAR$(basename "$f")"
(Note: the double-quotes around the assignment to absname aren't actually needed -- the right side of an assignment is one of the few places in the shell where it's safe to skip them -- but IMO it's easier and safer to just double-quote all variable references and $( ) expressions than to try to keep track of where it's safe and where it's not.)
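A sketch that reproduces the question in a scratch directory (the "Data Sets" folder name, with a space in it, is made up to exercise the quoting): the fully quoted pattern runs the loop once with the literal pattern in $f, while selective quoting yields one file per iteration.

```bash
# Hypothetical scratch layout; the space in "Data Sets" is deliberate.
in_folder="$(mktemp -d)/Data Sets/"
year="2002/"
mkdir -p "$in_folder$year"
touch "$in_folder$year"{a,b}.mdb "$in_folder${year}x y.mdb"

# Whole pattern quoted: the loop runs once, with the literal pattern in $f.
quoted_runs=0
for f in "$in_folder$year*.mdb"; do
    quoted_runs=$((quoted_runs + 1))
done

# Selective quoting: variables quoted, * outside the quotes, one file per pass.
names=()
for f in "$in_folder$year"*.mdb; do
    [ -e "$f" ] || continue          # nothing matched: skip the literal pattern
    names+=( "$(basename "$f")" )    # "$f" quoted, so the spaces are safe
done

echo "quoted pattern: $quoted_runs iteration(s)"
printf 'selective: %s\n' "${names[@]}"
```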
Just quote your shell variables if they are supposed to contain strings with spaces in between.
basename "$f"
Not doing so will lead to the string being split into separate words (see WordSplitting in bash), thereby messing up the basename command, which expects one string argument rather than multiple.
Also, it would be wise to put the * outside the double quotes, as shell globbing doesn't work inside quotes (single or double).
#!/bin/bash
# good practice to lower-case variable names to distinguish them from
# shell environment variables
year="2002/"
in_folder="/local/data/datasets/Convergence/"
for file in "${in_folder}${year}"*.mdb; do
# break the loop gracefully if no files are found
[ -e "$file" ] || continue
echo "$file"
# Worth noting here: $file already holds the name of the file
# with its full path, exactly as constructed below. You don't need to
# construct it manually
absname=${in_folder}${year}$(basename "$file")
done
Just remove the double quotes from this line
for f in "$INFOLDER$YEAR*.mdb";
so it looks like this:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in $INFOLDER$YEAR*.mdb;
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done

Bash check if file exists with double bracket test and wildcards

I am writing a Bash script and need to check whether a file exists that looks like *.$1.*.ext. I can do this really easily with POSIX test, as [ -f *.$1.*.ext ] returns true, but using the double bracket [[ -f *.$1.*.ext ]] fails.
This is just to satisfy curiosity as I can't believe the extended testing just can't pick out whether the file exists. I know that I can use [[ `ls *.$1.*.ext` ]] but that will match if there's more than one match. I could probably pipe it to wc or something but that seems clunky.
Is there a simple way to use double brackets to check for the existence of a file using wildcards?
EDIT: I see that [[ -f `ls -U *.$1.*.ext` ]] works, but I'd still prefer to not have to call ls.
Neither [ -f ... ] nor [[ -f ... ]] (nor other file-test operators) are designed to work with patterns (a.k.a. globs, wildcard expressions) - they always interpret their operand as a literal filename.[1]
A simple trick to test if a pattern (glob) matches exactly one file is to use a helper function:
existsExactlyOne() { [[ $# -eq 1 && -f $1 ]]; }
if existsExactlyOne *."$1".*.ext; then # ....
If you're just interested in whether there are any matches - i.e., one or more - the function is even simpler:
exists() { [[ -f $1 ]]; }
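The helpers can be tried against a scratch directory; the file names below are made up, and tag stands in for the script's $1. Without nullglob, a pattern that matches nothing reaches the function as the literal pattern, so -f fails as intended.

```bash
# The two helpers from the answer.
existsExactlyOne() { [[ $# -eq 1 && -f $1 ]]; }
exists() { [[ -f $1 ]]; }

# Scratch directory with made-up files: one "foo" match, two "bar" matches.
scratch=$(mktemp -d)
touch "$scratch/x.foo.1.ext" "$scratch/y.bar.1.ext" "$scratch/z.bar.2.ext"

tag=foo
existsExactlyOne "$scratch"/*."$tag".*.ext && echo "exactly one *.$tag.*.ext"
tag=bar
existsExactlyOne "$scratch"/*."$tag".*.ext || echo "not exactly one *.$tag.*.ext"
exists "$scratch"/*."$tag".*.ext && echo "at least one *.$tag.*.ext"
tag=none
exists "$scratch"/*."$tag".*.ext || echo "no *.$tag.*.ext"
```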
If you want to avoid a function, it gets trickier:
Caveat: This solution does not distinguish between regular files and directories, for instance (though that could be fixed.)
if [[ $(shopt -s nullglob; set -- *."$1".*.ext; echo $#) -eq 1 ]]; then # ...
The code inside the command substitution ($(...)) does the following:
shopt -s nullglob instructs bash to expand the pattern to nothing (no words at all) if there are no matches.
set -- ... assigns the results of the pattern expansion to the positional parameters ($1, $2, ...) of the subshell in which the command substitution runs.
echo $# simply echoes the count of positional parameters, which then corresponds to the count of matching files.
That echoed number (the command substitution's stdout output) becomes the left-hand side to the -eq operator, which (numerically) compares it to 1.
Again, if you're just interested in whether there are any matches - i.e., one or more - simply replace -eq with -ge.
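The same count-in-a-subshell idea, run on made-up files in a scratch directory; tag is a stand-in for $1. Because the command substitution runs in a subshell, shopt -s nullglob and the new positional parameters do not leak into the calling shell.

```bash
scratch=$(mktemp -d)
touch "$scratch/a.foo.1.ext" "$scratch/b.bar.1.ext" "$scratch/c.bar.2.ext"

for tag in foo bar none; do
    # nullglob and set -- only affect the subshell of the command substitution.
    n=$(shopt -s nullglob; set -- "$scratch"/*."$tag".*.ext; echo $#)
    if [[ $n -eq 1 ]]; then
        echo "$tag: exactly one match"
    elif [[ $n -ge 1 ]]; then
        echo "$tag: $n matches"
    else
        echo "$tag: no match"
    fi
done
```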
[1]
As @Etan Reisinger points out in a comment, in the case of the [ ... ] (single-bracket syntax), the shell expands the pattern before the -f operator even sees it (normal command-line parsing rules apply).
By contrast, different rules apply to bash's [[ ... ]], which is parsed differently, and in this case simply treats the pattern as a literal (i.e., doesn't expand it).
Either way, it won't work (robustly and predictably) with patterns:
With [[ ... ]] it never works: the pattern is always seen as a literal by the file-test operator.
With [ ... ] it only works properly if there happens to be exactly ONE match.
If there's NO match:
The file-test operator sees the pattern as a literal, if nullglob is OFF (the default), or, if nullglob is ON, the conditional always returns true, because it is reduced to -f, which, due to the missing operand, is no longer interpreted as a file test, but as a nonempty string (and a nonempty string evaluates to true).
If there are MULTIPLE matches: the [ ... ] command breaks as a whole, because the pattern then expands to multiple words, whereas file-test operators only take one argument.
As your question is tagged bash, you can take advantage of bash-specific facilities, such as an array:
file=(*.ext)
[[ -f "$file" ]] && echo "yes, ${#file[@]} matching files"
This first populates an array with one item for each matching file name, then tests the first item only: referring to the array by name without specifying an index addresses its first element. As this represents only a single file, -f behaves nicely.
An added bonus is that the number of array items equals the number of matching files, so the file count is easy to get should you need it, as shown in the echoed output above. You may also find it an advantage that no extra function needs to be defined.
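A sketch of the array approach on a scratch directory (file names made up), including the case where nothing matches: without nullglob the single element is the literal pattern, so -f correctly fails.

```bash
scratch=$(mktemp -d)
touch "$scratch/one.ext" "$scratch/two.ext"

file=("$scratch"/*.ext)       # one array element per matching file
[[ -f "$file" ]] && echo "yes, ${#file[@]} matching files"   # $file is element 0

missing=("$scratch"/*.nope)   # no match: the only element is the literal pattern
[[ -f "$missing" ]] || echo "no *.nope files (element is: ${missing[0]##*/})"
```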

What is the meaning of @(${VAR})?

Whilst looking at some shell scripts I encountered several instances of if statements comparing some normal variable against another variable which is enclosed in @( ) brackets.
Does @(....) have some special meaning or am I missing something obvious here? Example of if test:
if [[ ${VAR} != @(${VAR2}) ]]
Thanks
It's an extended pattern, borrowed from ksh. Originally you would need to enable support for it with shopt -s extglob, but it became the default behavior inside [[ ... ]] in bash 4.1. @(...) matches one of the enclosed patterns. By itself, @(pattern) and pattern would be equivalent, so I would assume that $VAR2 contains at least one pipe, so that the expansion is something like @(foo|bar). In that case, the test would succeed if $VAR does not match foo or bar.
From the bash man page:
@(pattern-list)
Matches one of the given patterns
So ${VAR2} is expected to be a list of patterns separated by |, and your code tests whether ${VAR} matches any of them.
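A minimal sketch of the construct, assuming VAR2 holds a |-separated list as described above; matches_any is a hypothetical helper name and the pattern list is made up. The explicit shopt -s extglob is there for bash versions older than 4.1 and does no harm on newer ones; it has to run before the [[ ... ]] lines are parsed.

```bash
# Enable extended globs before any [[ ... @(...) ... ]] below is parsed.
shopt -s extglob

# Hypothetical helper: succeeds if $1 matches any pattern in the |-separated list $2.
matches_any() {
    [[ $1 == @($2) ]]
}

VAR2='foo|ba?'   # made-up list: "foo", or "ba" followed by any one character
for VAR in foo bar baz qux; do
    if [[ ${VAR} != @(${VAR2}) ]]; then
        echo "$VAR does not match any of: $VAR2"
    else
        echo "$VAR matches one of: $VAR2"
    fi
done
```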
