Bash check if file exists with double bracket test and wildcards - bash

I am writing a Bash script and need to check to see if a file exists that looks like *.$1.*.ext I can do this really easily with POSIX test as [ -f *.$1.*.ext ] returns true, but using the double bracket [[ -f *.$1.*.ext ]] fails.
This is just to satisfy curiosity as I can't believe the extended testing just can't pick out whether the file exists. I know that I can use [[ `ls *.$1.*.ext` ]] but that will match if there's more than one match. I could probably pipe it to wc or something but that seems clunky.
Is there a simple way to use double brackets to check for the existence of a file using wildcards?
EDIT: I see that [[ -f `ls -U *.$1.*.ext` ]] works, but I'd still prefer to not have to call ls.

Neither [ -f ... ] nor [[ -f ... ]] (nor other file-test operators) are designed to work with patterns (a.k.a. globs, wildcard expressions) - they always interpret their operand as a literal filename.[1]
A simple trick to test if a pattern (glob) matches exactly one file is to use a helper function:
existsExactlyOne() { [[ $# -eq 1 && -f $1 ]]; }
if existsExactlyOne *."$1".*.ext; then # ....
If you're just interested in whether there are any matches - i.e., one or more - the function is even simpler:
exists() { [[ -f $1 ]]; }
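A minimal sketch of both helpers in action. The scratch directory, the file names (a.foo.1.ext and so on) and the loop variable key are made up for illustration; they stand in for the files and the $1 of the real script.

```shell
#!/usr/bin/env bash
# Helpers from the answer above.
existsExactlyOne() { [[ $# -eq 1 && -f $1 ]]; }
exists() { [[ -f $1 ]]; }

# Hypothetical scratch directory and file names, for illustration only.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.foo.1.ext b.foo.2.ext c.bar.1.ext

for key in foo bar baz; do
  # The shell expands the glob BEFORE the function is called, so the
  # function receives one argument per matching file (or the literal
  # pattern if nothing matches).
  if existsExactlyOne *."$key".*.ext; then
    echo "$key: exactly one match"
  elif exists *."$key".*.ext; then
    echo "$key: more than one match"
  else
    echo "$key: no match"
  fi
done
```

Note that exists only tests the first match, so it reports a match only if that first match is a regular file.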
If you want to avoid a function, it gets trickier:
Caveat: This solution does not distinguish between regular files and directories, for instance (though that could be fixed).
if [[ $(shopt -s nullglob; set -- *."$1".*.ext; echo $#) -eq 1 ]]; then # ...
The code inside the command substitution ($(...)) does the following:
shopt -s nullglob instructs bash to expand the pattern to nothing (zero words, rather than the literal pattern) if there are no matches.
set -- ... assigns the results of the pattern expansion to the positional parameters ($1, $2, ...) of the subshell in which the command substitution runs.
echo $# simply echoes the count of positional parameters, which then corresponds to the count of matching files.
That echoed number (the command substitution's stdout output) becomes the left-hand side to the -eq operator, which (numerically) compares it to 1.
Again, if you're just interested in whether there are any matches - i.e., one or more - simply replace -eq with -ge.
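The function-free variant can be sketched as follows, assuming a scratch directory with made-up file names; the variables key and count are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, for illustration only.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.foo.1.ext b.foo.2.ext c.bar.1.ext

for key in foo bar baz; do
  # nullglob and `set --` only affect the command substitution's subshell.
  count=$(shopt -s nullglob; set -- *."$key".*.ext; echo $#)
  if [[ $count -eq 1 ]]; then
    echo "$key: exactly one match"
  elif [[ $count -ge 1 ]]; then
    echo "$key: $count matches"
  else
    echo "$key: no match"
  fi
done

# The parent shell's settings are untouched.
shopt -q nullglob || echo "nullglob is still off out here"
```

Because the command substitution runs in a subshell, neither the nullglob setting nor the overwritten positional parameters leak into the calling script.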
[1]
As @Etan Reisinger points out in a comment, in the case of the [ ... ] (single-bracket syntax), the shell expands the pattern before the -f operator even sees it (normal command-line parsing rules apply).
By contrast, different rules apply to bash's [[ ... ]], which is parsed differently, and in this case simply treats the pattern as a literal (i.e., doesn't expand it).
Either way, it won't work (robustly and predictably) with patterns:
With [[ ... ]] it never works: the pattern is always seen as a literal by the file-test operator.
With [ ... ] it only works properly if there happens to be exactly ONE match.
If there's NO match:
The file-test operator sees the pattern as a literal if nullglob is OFF (the default); if nullglob is ON, the conditional always returns true, because it is reduced to [ -f ], in which -f, due to the missing operand, is no longer interpreted as a file test but as a nonempty string (and a nonempty string evaluates to true).
If there are MULTIPLE matches: the [ ... ] command breaks as a whole, because the pattern then expands to multiple words, whereas file-test operators only take one argument.
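The cases in this footnote can be reproduced with a small experiment. This is only a sketch: the scratch directory, the file names and the variable multi_status are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, for illustration only.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch one.txt a.log b.log

# Exactly one match: [ sees "-f one.txt".
[ -f *.txt ] && echo "one match: true"

# No match, nullglob off: [ sees the literal string "*.none".
[ -f *.none ] || echo "no match, nullglob off: false"

# No match, nullglob on: the pattern vanishes and [ sees just "-f".
shopt -s nullglob
[ -f *.none ] && echo "no match, nullglob on: true (one-argument string test)"
shopt -u nullglob

# Two matches: [ sees "-f a.log b.log" and fails with a usage error.
if [ -f *.log ] 2>/dev/null; then multi_status=0; else multi_status=$?; fi
echo "two matches: exit status $multi_status"

# [[ never expands the pattern; it looks for a file literally named "*.txt".
[[ -f *.txt ]] || echo "[[ -f *.txt ]]: false"
```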

as your question is bash tagged, you can take advantage of bash specific facilities, such as an array:
file=(*.ext)
[[ -f "$file" ]] && echo "yes, ${#file[@]} matching files"
this first populates an array with one item for each matching file name, then tests the first item only: Referring to the array by name without specifying an index addresses its first element. As this represents only one single file, -f behaves nicely.
An added bonus is that the number of populated array items corresponds with the number of matching files, should you need the file count, and can thereby be determined easily, as shown in the echoed output above. You may find it an advantage that no extra function needs to be defined.
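A short sketch of the array approach, including the no-match case. The scratch directory, the file names and the array names files and none are assumptions for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, for illustration only.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.foo.1.ext b.foo.2.ext

# One array element per matching file name.
files=(*.foo.*.ext)
[[ -f $files ]] && echo "yes, ${#files[@]} matching files, first: ${files[0]}"

# Without nullglob, a non-matching pattern stays in the array as a literal,
# so the count is 1 even though nothing matched; the -f test catches that.
none=(*.nope.*.ext)
[[ -f $none ]] || echo "no match; array holds ${#none[@]} literal element: ${none[0]}"
```

The element count is therefore only meaningful once the -f test on the first element has succeeded (or if nullglob is enabled before the array is populated).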

Related

Using "expanding characters" in a variable in a bash script

I apologize beforehand for this question, which is probably both ill formulated and answered a thousand times over. I get the feeling that my inability to find an answer is that I don't quite know how to ask the question.
I'm writing a script that traverses folders in a bunch of mounted external hard drives, like so:
for g in /Volumes/compartment-?/{Private/Daniel,Daniel}/Projects/*/*
It then proceeds to perform long-running tasks on each of the directories found there. Because these operations are io-intensive rather than cpu-intensive, I thought I'd add the option to provide which "compartment" I want to work in, so that I can parallelize the workloads.
But, doing
cmp="?"
[[ ! "$1" = "" ]] && cmp="$1"
And then,
for g in /Volumes/compartment-$cmp/{Private/Daniel,Daniel}/Projects/*/*
Doesn't work - the question mark that should expand to all compartments instead becomes literal, so I get an error that "compartment-?" doesn't exist, which is of course true.
How do I create a variable with a value that "expands," like dir="./*" working with ls $dir?
EDIT: Thanks to @dan for the answer. I was brought up to be courteous and thank people, so I did thank him for it in a comment on his question, but that comment has been removed, and I'm anxious that repeating it might be some kind of infraction here. I ended up simply escaping my question mark glob character, i.e. \?, since for this script I only need to either search all drives or one particular drive. But I'll keep the answer handy for the next time I write a script where I'd like to support more advanced arguments.
Brace expansion occurs before variable expansion. Pathname/glob expansion (eg ?, *) occurs last. Therefore you can't use the glob character ? in a variable, and in a brace expansion.
You can use a glob expression in an unquoted variable, without brace expansion. Eg. q=\?; echo compartment-$q is equivalent to echo compartment-?.
To solve your problem, you could define an array based on the input argument:
if [[ $1 ]]; then
[[ -d /Volumes/compartment-$1 ]] || exit 1
files=("/Volumes/compartment-$1"/{Private/Daniel,Daniel}/Projects/*/*)
else
files=(/Volumes/compartment-?/{Private/Daniel,Daniel}/Projects/*/*)
fi
# then iterate the list:
for i in "${files[@]}"; do
...
Another option is a nested loop. The path expression in the outer loop doesn't use brace expansion, so (unlike the first example) it can expand a glob in $1 (or default to ? if $1 is empty):
for i in /Volumes/compartment-${1:-?}; do
[[ -d $i ]] &&
for j in "$i"/{Private/Daniel,Daniel}/Projects/*/*; do
[[ -e $j ]] || continue
...
Note that the second example expands a glob expression passed in $1 (eg. ./script '[1-9]'). The first example does not.
Remember that pathname expansion has the property of expanding only to existing files, or literally. shopt -s nullglob guarantees expansion only to existing files (or nothing).
You should either use nullglob, or check that each file or directory exists, like in the examples above.
Using $1 unquoted also subjects it to word splitting on whitespace. You can set IFS= (empty) to avoid this.
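The nested-loop idea can be tried out without real drives. The sketch below is an assumption-laden stand-in: it builds a fake tree under a scratch directory instead of /Volumes, and list_projects, sel, vol and dir are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for /Volumes, built in a scratch directory.
root=$(mktemp -d) || exit 1
trap 'rm -rf "$root"' EXIT
mkdir -p "$root"/compartment-{1,2}/Daniel/Projects/p/x

list_projects() {
  local sel="${1:-?}" vol dir   # default to the single-character glob ?
  # $sel is deliberately unquoted so a glob stored in it gets expanded.
  for vol in "$root"/compartment-$sel; do
    [[ -d $vol ]] || continue
    # Brace expansion happens here, in a word with no glob-carrying variable.
    for dir in "$vol"/{Private/Daniel,Daniel}/Projects/*/*; do
      [[ -e $dir ]] || continue   # skip patterns that matched nothing
      echo "${dir#"$root"/}"
    done
  done
}

echo "all:";    list_projects
echo "only 2:"; list_projects 2
echo "range:";  list_projects '[1-9]'
```

The existence checks do the job that nullglob would otherwise do: a pattern that matches nothing is left as a literal and then skipped.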

Are these two logical comparisons equivalent?

Given the variable:
path="a/b/c"
Are these two logical comparisons equivalent?
[[ $path = */* ]] && echo 1
and
case $path in
*/*)
echo 1
;;
esac
Is each case effectively equivalent to a bracket test construct? And is there any disparity for case against [ vs [[?
For the pattern */*, case and [[ will behave identically.
However, [[ always accepts extended globs (so it ignores the setting of shell option extglob), while case only allows extended globs if extglob is set. So if the pattern had been an extended glob and extglob were not set, the two constructs would act differently. (Most likely, the use of an extended glob in a case pattern would result in a syntax error.)
[ does not do pattern matching. The arguments to the command [ $path = */* ] will undergo filename expansion before the command is interpreted, which is likely to result in bash complaining that [ has too many arguments (unless you have some interestingly named files or there is only exactly one file which matches */*).
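For the simple */* pattern, the equivalence can be checked side by side. This is a sketch only; the function names has_slash_bracket and has_slash_case and the sample values are made up for the comparison.

```shell
#!/usr/bin/env bash
# Pattern match with [[ ... ]]: the unquoted right-hand side is a pattern.
has_slash_bracket() { [[ $1 = */* ]]; }

# The same pattern in a case statement.
has_slash_case() {
  case $1 in
    */*) return 0 ;;
    *)   return 1 ;;
  esac
}

for path in a/b/c plain /; do
  if has_slash_bracket "$path"; then b=yes; else b=no; fi
  if has_slash_case "$path"; then c=yes; else c=no; fi
  echo "'$path': [[ ]] says $b, case says $c"
done
```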

How to capture Filename Expansion? (expanding globs)

--Disclaimer--
I am open to better titles for this question.
I am trying to get the full name of a file matching: "target/cs-*.jar".
The glob is the version number.
Right now the version is 0.0.1-SNAPSHOT.
So, below, I would like jar_location to evaluate to cs-0.0.1-SNAPSHOT.jar
I've tried a few solutions, some of them work, some don't and I'm not sure what I'm missing.
Works
jar_location=( $( echo "target/cs-*.jar") )
echo "${jar_location[0]}"
Doesn't work
jar_location=$( echo "target/cs-*.jar")
echo "$jar_location"
jar_location=( "/target/cs-*.jar" )
echo "${jar_location}"
jar_location=$( ls "target/cs-*.jar" )
echo "${jar_location}"
--EDIT--
Added Filename Expansion to the title
Link to Bash Globbing / Filename Expansion
Similar question: The best way to expand glob pattern?
If you're using bash, the best option is to use an array to expand the glob:
shopt -s nullglob
jar_locations=( target/cs-*.jar )
if [[ ${#jar_locations[@]} -gt 0 ]]; then
jar_location=${jar_locations##*/}
fi
Enabling nullglob means that the array will be empty if there are no matches; without this shell option enabled, the array would contain the literal string target/cs-*.jar in the case of no matches.
If the length of the array is greater than zero, then set the variable, using the expansion to remove everything up to the last / from the first element of the array. This uses the fact that ${jar_locations[0]} and $jar_locations get you the same thing, namely the first element of the array. If you don't like that, you can always assign to a temporary variable.
An alternative for those with GNU find:
jar_location=$(find target -name 'cs-*.jar' -printf '%f' -quit)
This prints the filename of the first result and quits.
Note that if there is more than one file found, the output of these two commands may differ.
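To see why some of the attempts in the question work and others do not, it may help to compare quoted and unquoted patterns directly. The sketch below assumes a scratch directory containing a single made-up jar; the array names quoted, unquoted and via_echo are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory mirroring the layout in the question.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
mkdir target
touch target/cs-0.0.1-SNAPSHOT.jar

quoted=( "target/cs-*.jar" )             # quotes suppress globbing: one literal element
unquoted=( target/cs-*.jar )             # the glob expands to the matching path(s)
via_echo=( $(echo "target/cs-*.jar") )   # echo prints the literal pattern; the UNQUOTED
                                         # command substitution is then globbed

echo "quoted:   ${quoted[0]}"
echo "unquoted: ${unquoted[0]}"
echo "via echo: ${via_echo[0]}"
echo "basename: ${unquoted[0]##*/}"
```

The unquoted command substitution is also subject to word splitting, which is one reason the plain array assignment is the more robust of the two working forms.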

Bash check shows file exists for non-existent files?

Run the following in bash:
stuff=`rpm -ql <some package> | grep dasdasdfd`
(non existent file in package, exit code = 1, stdout is empty)
if [ -f $stuff ]; then echo "whaaat"; fi
Above command checks if file exists... but:
file $stuff
Just prints usage info for file... and
stat $stuff
Missing operand...
Can someone please explain why? Is this a bug? Am I doing something wrong? I just want to make sure that a file that's in the package is present on fs
You probably need to surround $stuff in quotes
if [ -f "$stuff" ]; then
As a general rule, you almost always want to add quotes around pathnames everywhere you use them.
I find it more useful to think of variables in shell scripting as "macros", which are expanded on first use to their value. This is different from variables in almost every other programming language.
So if $stuff contains hello world (notice the space), it would be the same as if you've typed:
[ -f hello world ]
which is obviously an error.
In this case, you mentioned that you're dealing with a non-existent file, so $stuff is actually empty, which would be like typing:
[ -f ]
Which is actually valid, but always succeeds. This is a bit of obscure test behaviour: according to the POSIX spec, test with only a single argument succeeds whenever that argument is non-empty (in this case, the argument is -f):
1 argument:
Exit true (0) if $1 is not null; otherwise, exit false.
This is probably to facilitate the writing of:
[ $variable_that_may_or_may_not_be_defined ]
If you add quotes, you're passing 2 arguments, and more sane things happen:
if [ -f "" ]; then
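The difference is easy to reproduce. In this sketch the empty string assigned to stuff simulates the empty output of the rpm/grep pipeline from the question.

```shell
#!/usr/bin/env bash
stuff=""   # simulates the empty result of the pipeline in the question

# Unquoted: the empty expansion disappears, so this runs [ -f ].
if [ -f $stuff ]; then
  echo "unquoted [ ]: true (one-argument test on the string -f)"
fi

# Quoted: this runs [ -f "" ], a real file test on an empty name.
if [ -f "$stuff" ]; then
  echo "quoted [ ]: true"
else
  echo "quoted [ ]: false"
fi

# [[ ]] does not word-split, so the quotes are optional here.
if [[ -f $stuff ]]; then
  echo "[[ ]]: true"
else
  echo "[[ ]]: false"
fi
```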
Martin Tournoij's answer and DevSolar's answer both provide correct solutions and helpful background info: with respect to [ ... ] in one case, and [[ ... ]] in the other.
Since it may not be obvious if and when to choose [[ ... ]] over [ ... ] (and its (virtual) alias, test ...), let me attempt a summary:
If your code must be portable (POSIX-compliant), you MUST use [ ... ] (or test ...).
Tokens inside [ ... ] are parsed just like arguments passed to an executable, so you must double-quote your variable references, unless you explicitly want all shell expansions - notably word splitting (automatic splitting into multiple tokens by whitespace) and globbing - applied to them.
[ -f "$stuff" ] # double-quoting required, if $stuff has embedded whitespace
If you know that your code will be run with bash, you can use [[ ... ]] for more features and fewer surprises.
Tokens inside [[ ... ]] are parsed in a special context in which neither word splitting nor pathname expansion (globbing) is applied (though other expansions, such as parameter expansion, do occur), so there is typically no need to double-quote variable references.
[[ -f $stuff ]] # double-quoting optional
Note that ksh and zsh also support [[ ... ]] (presumably with subtle variations in behavior).
For more background info, such as the additional features that [[ ... ]] offers, read on.
[[ ... ]] improves on [ ... ] / test ... as follows:
"RHS" below means "right-hand side", i.e., the right operand of a binary operator.
(typically) requires NO quoting of variable references (except on the RHS of == and =~ to specify a literal string or substring(s))
f='some file'; [[ -f $f ]] # ok, double quotes optional
v='*'; [[ $v == '*' ]] # ok, double quotes optional
Neither word splitting nor pathname expansion is applied inside [[ ... ]], so it's safe to use unquoted references to variables whose values have embedded whitespace and/or values such as * that would normally lead to globbing.
offers string pattern matching with = / ==, with an unquoted pattern on the RHS (or at least unquoted pattern metachars.)
[[ abc == a* ]] && echo yes # matches; use of = instead of == works too
Caveat: Thus, on the RHS of = / == you must double-quote variable references (or single-quote literals) if you want their values to be treated as literals.
v='a*'; [[ abc == "$v" ]] # does NOT match
offers regex matching with =~, with an unquoted extended regular expression on the RHS (or at least unquoted regex metachars.)
[[ abc =~ ^a.+$ ]] && echo yes # matches
Caveat: Thus, on the RHS of =~ you must double-quote variable references (or single-quote literals) if you want their values to be treated as literals.
v='a.+'; [[ abc =~ ^"$v"$ ]] # does NOT match
Also note that the unquoted / quoted distinction was only introduced in bash 3.2 - you can still use shopt -s compat31 to have single- and double-quoted strings treated as regexes, too.
Caveat: The regex dialect understood by =~ is platform-specific, so a regex that works on one platform may not work on another (this is one of the few cases where bash's behavior is platform-dependent). For instance, on Linux you can use \b and \< / \> for word-boundary assertions, whereas BSD/macOS only supports [[:<]] and [[:>]], which, in turn, Linux doesn't support - see this answer of mine.
offers grouping and negation with unescaped (, ), and ! chars.
offers use of && and || (Boolean AND and OR)
[[ (3 -gt 2) && ! -f / ]] && echo yes
Note that, inside [[ ... ]], && has higher precedence than || - unlike OUTSIDE (as so-called [command-]list operators, where they combine entire commands / command lists), where they have equal precedence.
(while [ and test have -a and -o, even the POSIX spec. for test cautions against their use)
within [[ ... ]], you may spread your conditional across multiple lines for readability without the need for the line-continuation char. (\), assuming the line breaks come after && or ||, as codeforester points out.
[[ ... ]] is faster than [ ... ], though that will typically not matter.
If you are interested in relative performance, see this answer of mine.
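The quoting rules for the right-hand side and the precedence difference described above can be condensed into one runnable sketch. The variable names v and re are illustrative, and the regex part assumes bash 3.2 or later without compat31.

```shell
#!/usr/bin/env bash
v='a*'
[[ abc == $v ]]    && echo "unquoted \$v: treated as a pattern, matches abc"
[[ abc == "$v" ]]  || echo "quoted \$v: treated as a literal, no match for abc"
[[ 'a*' == "$v" ]] && echo "quoted \$v: matches the literal string a*"

re='^a.+$'
[[ abc =~ $re ]]   && echo "unquoted \$re: treated as a regex, matches abc"
[[ abc =~ "$re" ]] || echo "quoted \$re: treated as a literal, no match for abc"

# Inside [[ ]], && binds tighter than ||: true || (true && false).
if [[ 1 -eq 1 || 1 -eq 1 && 1 -eq 2 ]]; then
  echo "inside [[ ]]: true"
fi

# As a command list, both have equal precedence: (true || true) && false.
if true || true && false; then
  echo "as a command list: true"
else
  echo "as a command list: false"
fi
```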
Implementation notes re [ and test:
[[ is a shell keyword (supported in bash, ksh, and zsh), which allows for different parsing rules, as described above.
By contrast, [ and test are builtins in all major POSIX-like shells (bash, ksh, zsh, dash).
In addition, both [ and test exist as external utilities (executable files that require a separate process to invoke), as mandated by POSIX.
In fact, you need external utility versions so as to be able to use [ or test in "shell-less" invocation scenarios such as when passing a test to find -exec or xargs.
While the [ utility could conceivably be implemented as a symlink to the test utility (as long as test knows how it was invoked and enforces the closing ] when invoked as [), in practice they are often (always?) separate executables (true on Linux and macOS / BSD, for instance; on Linux, their content differs, whereas on macOS / BSD their content is identical (they are copies of the same file)).
One option would be to put $stuff in quotes, as Carpetsmoker said.
But since this is tagged bash, and because catering for whitespace in filenames is a pain, you could go for:
if [[ -f $stuff ]]
As opposed to [ which is an alias for test, the [[ construct "knows" how to handle the contents of $stuff correctly.

How to use [[ == ]] properly to match a glob?

Bash's manpage teaches that [[ == ]] matches patterns. In Bash therefore, why does the following not print matched?
Z=abc; [[ "$Z" == 'a*' ]] && echo 'matched'
The following however does indeed print matched:
Z=abc; [[ "$Z" == a* ]] && echo 'matched'
Isn't this exactly backward? Why does the a*, without the quotes, not immediately expand to list whatever filenames happen to begin with the letter a in the current directory? And besides, why doesn't the quoted 'a*' work in any case?
Glob pattern must not be quoted to make it work.
This should also work with just the glob character outside the quotes while the static text is still quoted:
[[ "$Z" == "a"* ]] && echo 'matched'
matched
[[ "$Z" == "ab"* ]] && echo 'matched'
matched
Explanation snippet from man page:
When the == and != operators are used, the string to the right of
the operator is considered a pattern and matched according to the
rules described below under Pattern Matching. If the shell option
nocasematch is enabled, the match is performed without regard to
the case of alphabetic characters. The return value is 0 if the
string matches (==) or does not match (!=) the pattern, and 1
otherwise. Any part of the pattern may be quoted to force it to be
matched as a string.
Additionally, one of the reasons to use [[ over [ is that [[ is a shell keyword rather than an ordinary command, and thus can have its own syntax and doesn't need to follow the normal expansion rules (which is why the arguments to [[ aren't subject to word-splitting, for example).
While the existing answer is correct, I don't believe that it tells the full story.
Globs have two uses. There is a difference in behaviour between globs inside a [[ ]] construct, which test the contents of a variable against a pattern, and other globs, which expand to a list of matching files. In either case, if you put quotes around a character, it will be interpreted literally and not expanded.
It is also worth mentioning that the variable on the left hand side doesn't need to be quoted after the [[, so you could write your code like this:
Z=abc; [[ $Z == a* ]] && echo 'matched'
It is also possible to use a single = but the == looks more familiar to those coming from other coding backgrounds, so personally I prefer to use it in bash as well. As mentioned in the comments, the single = is the more widely compatible, as it is used to test string equality in all of POSIX-compliant shells, e.g. [ "$a" = "abc" ]. For this reason you may prefer to use it in bash as well.
As always, Greg's wiki contains some good information on the subject of pattern matching in bash.
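The two uses of globs can be seen side by side in a directory that actually contains files starting with a. This is a sketch under assumptions: the scratch directory and the file names apple and avocado are invented for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, for illustration only.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch apple avocado

Z=abc

# As an ordinary command argument, a* expands to the matching file names.
echo a*

# Inside [[ ]], the same a* is a pattern matched against $Z; files are ignored.
[[ $Z == a* ]]   && echo "unquoted a*: matched"

# Quoting makes the whole right-hand side a literal string.
[[ $Z == 'a*' ]] || echo "quoted 'a*': not matched"

# Partial quoting: only the unquoted * keeps its pattern meaning.
[[ $Z == "a"* ]] && echo "partly quoted \"a\"*: matched"
```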
