Wildcard expansion (globbing) in a string composed of quoted and unquoted parts - bash

I have a line in a shell script that looks like this:
java -jar "$dir/"*.jar
I wrote it this way because I just want to execute whatever the jar file in that folder happens to be named. But this is not working as I expected. I get the error message:
Error: Unable to access jarfile [folder-name]/*.jar
It is taking the '*' character literally, instead of doing the replacement that I want. How do I fix this?
EDIT: It is working now. I just had the wrong folder prefix :/ For anybody wondering, this is the correct way to do it.

You just need to set failglob:
shopt -s failglob
to avoid showing literal *.jar when none are matched in a given folder.
PS: This will generate an error when it fails to match any *.jar as:
-bash: no match: *.jar

Explanation and background information
The OP's problem was NOT with globbing per se - for a glob (pattern) to work, the special pattern characters such as * must be unquoted, which works even in strings that are partly single- or double-quoted, as the OP correctly did in his question:
"$dir/"*.jar # OK, because `*` is unquoted
Rather, the problem was bash's - somewhat surprising - default behavior of leaving a pattern unexpanded (leaving it as is), if it happens not to match anything, effectively resulting in a string that does NOT represent any actual filesystem items.
In the case at hand, "$dir" happened to expand to a directory that did not contain *.jar files, and thus the resulting string passed to java ended in literal *.jar ('<value of $dir>/*.jar'), which due to not referring to actual .jar files, resulted in the error quoted in the question.
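The default behavior described above can be reproduced with a minimal sketch. It assumes bash with nullglob and failglob turned off; the scratch directory and the file name app.jar are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: under default options an unmatched glob is passed through literally.
shopt -u nullglob failglob        # assumption: make sure bash defaults are in effect
dir=$(mktemp -d)                  # empty scratch directory standing in for $dir

set -- "$dir/"*.jar               # nothing matches, so the pattern is left as is
unmatched_count=$#
unmatched_arg=$1

touch "$dir/app.jar"              # hypothetical jar file
set -- "$dir/"*.jar               # now the pattern expands to the real path
matched_arg=$1

printf 'no match: %s\nmatch:    %s\n' "$unmatched_arg" "$matched_arg"
rm -rf "$dir"
```

In the first case the command receives exactly one argument, the pattern itself, which is what java complained about in the question.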
Shell options govern globbing (more formally called pathname expansion):
set -f (shopt -so noglob) turns off globbing altogether, so that unquoted strings (characters) that would normally cause a string to be treated as a glob are treated as literals.
shopt -s nullglob alters the default behavior so that a non-matching glob is removed from the command line altogether (it expands to nothing, not even to an empty-string argument).
shopt -s failglob alters the default behavior to reporting an error and setting the exit code to 1 in the case of a non-matching glob, without even executing the command at hand - see below for pitfalls.
There are other globbing-related options not relevant to this discussion - to see a list of all of them, run { shopt -o; shopt; } | grep -F glob. For a description, search by their names in man bash.
Robust globbing solutions
Note: Setting shell options globally affects the entire current shell, which is problematic, as third-party code usually makes the (reasonable) assumption that defaults are in effect. Thus, it is good practice to change shell options only temporarily (change, perform action, restore) or to localize their effect by running the commands in a subshell, ( ... ).
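Both ways of limiting the scope of an option change might look roughly like this. This is a sketch, not the only way to do it; the scratch directory is for illustration only, and the variable names are made up.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                       # empty scratch directory, for illustration only

# Variant 1: change, perform action, restore.
saved=$(shopt -p nullglob)             # a command string that recreates the current setting
shopt -s nullglob
jars=( "$dir/"*.jar )                  # empty array, because nothing matches
eval "$saved"                          # put nullglob back the way it was

# Variant 2: confine the change to a subshell (here: a command substitution).
count=$( shopt -s nullglob; set -- "$dir/"*.jar; echo "$#" )

restored=$(shopt -p nullglob)
echo "matches: ${#jars[@]} / $count; setting afterwards: $restored"
rm -rf "$dir"
```

Variant 1 keeps the results available as an array in the current shell; variant 2 needs no restore step, but anything computed inside the subshell has to be passed back as text.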
shopt -s nullglob
Useful for enumerating matches in a loop with for - it ensures that the loop is never entered, if there are no matches:
shopt -s nullglob # expand non-matching globs to empty string
for f in "$dir/"*.jar; do
  # If the glob matched nothing, we never get here.
  # !! Without `nullglob`, the loop would be entered _once_, with
  # !! '<value of $dir>/*.jar'.
  printf '%s\n' "$f" # placeholder action: a loop body needs at least one command
done
Problematic when used with arguments to commands that have default behavior in the absence of filename arguments, as it can result in unexpected behavior:
shopt -s nullglob # expand non-matching globs to empty string
wc -c "$dir/"*.jar # !! If no matches, expands to just `wc -c`
If the glob doesn't match anything, just wc -c is executed, which does not fail, and instead starts reading stdin input (when run interactively, this will simply wait for interactive input lines until terminated with Ctrl-D).
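One way to sidestep this pitfall is to collect the matches in an array first and run the command only if the array is non-empty. The following is a sketch under that assumption; the function name jar_sizes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print byte counts of the jars in a directory,
# or fail with a message if there are none (instead of reading stdin).
jar_sizes() {
  local jars saved
  saved=$(shopt -p nullglob)        # remember the caller's setting
  shopt -s nullglob
  jars=( "$1/"*.jar )               # empty array if nothing matches
  eval "$saved"                     # restore the caller's setting
  if (( ${#jars[@]} == 0 )); then
    echo 'No *.jar files found.' >&2
    return 1
  fi
  wc -c "${jars[@]}"
}

dir=$(mktemp -d)                    # scratch directory, for illustration only
jar_sizes "$dir" 2>/dev/null; empty_status=$?
printf 'abc' > "$dir/app.jar"       # made-up 3-byte file
sizes=$(jar_sizes "$dir")
echo "$sizes"
rm -rf "$dir"
```

Because wc is only ever called with at least one filename, it can no longer fall back to reading stdin.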
shopt -s failglob
Useful for reporting a specific error message, especially when combined with set -e to cause automatic aborting of a script in case a glob matches nothing:
set -e # abort automatically in case of error
shopt -s failglob # report error if a glob matches nothing
java -jar "$dir/"*.jar # script aborts, if this glob doesn't match anything
Problematic when needing to know the specific cause of an error and when combined with the || <command in case of failure> idiom:
shopt -s failglob # report error if a glob matches nothing
# !! DOES NOT WORK AS EXPECTED.
java -jar "$dir/"*.jar || { echo 'No *.jar files found.' >&2; exit 1; }
# !! We ALWAYS get here (but exit code will be 1, if glob didn't match anything).
Since, with failglob on, bash never even executes the command at hand if globbing fails, the || clause is also not executed, and overall execution continues.
While the failed glob will cause the exit code to be set to 1, you won't be able to distinguish between failure due to a non-matching glob vs. failure reported by the command (after successful globbing).
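The behavior described above can be observed with a small sketch that runs the snippet in a separate bash process, so that failglob does not leak into the calling shell. The echo texts are made up for illustration.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                  # empty scratch directory, for illustration only

# Child process: neither the command nor the || clause should run.
output=$(bash -c '
shopt -s failglob
echo "command ran:" "$1/"*.jar || echo "fallback ran"
echo "exit code seen on the next line: $?"
' _ "$dir" 2>&1)

echo "$output"
rm -rf "$dir"
```

The captured output should contain bash's own "no match" message and the exit code line, but neither of the two echo texts from the failing line.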
Alternative solution without changing shell options:
With a little more effort, you can do your own checking for non-matching globs:
Ad-hoc:
glob="$dir/*.jar"
[[ -n $(shopt -s nullglob; echo $glob) ]] ||
{ echo 'No *.jar files found.' >&2; exit 1; }
java -jar $glob
$(shopt -s nullglob; echo $glob) sets nullglob and then expands the glob with echo, so that the subshell prints either the matching filenames or, if nothing matches, an empty string. Thanks to command substitution ($(...)), that output is passed to -n, which tests whether the string is non-empty, so that the exit code of the overall [[ ... ]] conditional reflects whether something matched (exit code 0) or not (exit code 1).
Note that any command inside a command substitution runs in a subshell, which ensures that the effect of shopt -s nullglob only applies to that very subshell and thus doesn't alter global state.
Also note how the entire right-hand side in the variable assignment glob="$dir/*.jar" is double-quoted, to illustrate the point that quoting with respect to globbing matters when a variable is referenced later, not when it is defined. The unquoted references to $glob later ensure that the entire string is interpreted as a glob.
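The point about quoting at definition time versus reference time can be checked with a short sketch. The jar file names are hypothetical, and the unquoted reference assumes that $dir contains no whitespace, since unquoted references are also subject to word splitting.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/a.jar" "$dir/b.jar"        # hypothetical jar files

glob="$dir/*.jar"                      # quoted here: the pattern is stored verbatim

set -- "$glob"                         # quoted reference: one literal word, no globbing
quoted_count=$#
set -- $glob                           # unquoted reference: the pattern is expanded
unquoted_count=$#

echo "quoted: $quoted_count word(s), unquoted: $unquoted_count word(s)"
rm -rf "$dir"
```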
With a small helper function:
# Define simple helper function.
exists() { [[ -e $1 ]]; }
glob="$dir/*.jar"
exists $glob || { echo 'No *.jar files found.' >&2; exit 1; }
java -jar $glob
The helper function takes advantage of the shell applying globbing when the function is invoked and passing the results of globbing (pathname expansion) as arguments. The function then simply tests whether the first resulting argument (if any) refers to an existing item, and sets the exit code accordingly (this works regardless of whether nullglob happens to be in effect or not).
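A quick way to see the helper in action is to call it before and after a matching file exists. This is only a sketch; the scratch directory and app.jar are made up.

```shell
#!/usr/bin/env bash
exists() { [[ -e $1 ]]; }              # helper from the text above

dir=$(mktemp -d)                       # empty scratch directory
glob="$dir/*.jar"

exists $glob; before=$?                # nothing matches, so the test fails
touch "$dir/app.jar"                   # hypothetical jar file
exists $glob; after=$?                 # glob expands to a real path, so the test succeeds

echo "before: $before, after: $after"
rm -rf "$dir"
```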

Related

Stalling problem with wildcard in bash script

I am very new to writing bash scripts. This question is likely very basic, but I have not been able to find a clear answer yet. I'm working on an Ubuntu subsystem installation on Windows 10.
I'm running a script that contains the following conditional:
if [ -z "$date1" ]; then
date1=$(head -n 1 "$dir"/*.txt | sed "s/^[^0-9]*//g" | date +%Y%m%d -f - 2>/dev/null)
fi
It runs into issues when it encounters a directory (the dir variable) that has no .txt file, but I don't quite understand the nature of the problem. I do know the issue is in the head command, at least partially. I don't get an error; the script just stalls when it reaches a directory without a .txt file. I want the script to simply move on. If I run the line on its own (without the conditional) in the terminal, I get a No such file or directory error, which makes sense. What really confuses me is that if I place quotes (single or double) around the wildcard portion (i.e. '*.txt'), then the script prints the head error and moves on. My limited and perhaps incorrect understanding is that the quotes in this case mean the program no longer treats the * as a wildcard and simply looks for a file by the literal name *.txt. But I thought that when the * was interpreted by bash, it first looks for any possible expansion and then tries the literal interpretation if it finds none. So why does the script stall in one case and not the other? Shouldn't both simply give me the same No such file or directory error, as they do when run outside the script?
I'll also mention that the script includes preceding conditionals that first look for .docx files and only moves on to .txt files when there are no .docx files. It handles the cases where there are no .docx files perfectly well, although the first command in that pipe is unzip rather than head. This question seems relevant, but since the script is able to move on when there are quotes around the wildcard, and since it moves on in the similar scenario where there are no .docx files, I wanted to understand what the issue is here and the best way to fix it.
I appreciate your help.
In quotes, * will not expand and will be a literal * character.
On the other hand, when * tries to expand and fails, one of three things happens:
it is interpreted literally as the string *.txt (plus whatever $dir/ expands to)
You can enforce this behavior with shopt -u nullglob, which should be the default.
it expands to nothing, so the word $dir/*.txt is removed from the command entirely
You can enforce this behavior with shopt -s nullglob.
it raises an error
You can enforce this behavior with shopt -s failglob (or turn it off with shopt -u failglob).
Examples:
bash-5.0# shopt -s | grep glob
globasciiranges on
bash-5.0# echo *.asdf
*.asdf
bash-5.0# shopt -s nullglob
bash-5.0# echo *.asdf
bash-5.0# shopt -u nullglob
bash-5.0# echo *.asdf
*.asdf
bash-5.0# shopt -s failglob
bash-5.0# echo *.asdf
bash: no match: *.asdf
bash-5.0# shopt -s | grep glob
failglob on
globasciiranges on
When the glob expands to nothing, head receives no filename argument and reads from standard input instead, so it blocks until stdin is closed (for example, head $(echo '') | cat will not complete until you type input and end it with Ctrl-D).
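To let the script simply move on, one option is to check for a match before calling head at all. The sketch below assumes that only the first matching .txt file is of interest, which is a simplification of the pipeline in the question; the function name first_txt_line and the sample file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of the first .txt file in a directory,
# or return 1 without touching stdin when there is none.
first_txt_line() {
  local files=( "$1"/*.txt )
  [[ -e ${files[0]} ]] || return 1   # false for a literal pattern and for an empty array
  head -n 1 "${files[0]}"
}

dir=$(mktemp -d)                     # scratch directory, for illustration only
first_txt_line "$dir"; empty_status=$?
printf '2020-01-02 first\nsecond\n' > "$dir/notes.txt"
line=$(first_txt_line "$dir")
echo "status without .txt: $empty_status; first line: $line"
rm -rf "$dir"
```

The existence test gives the same result whether or not nullglob is set, so head is never started without a filename.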

getopts $OPTARG is empty if flag value contains brackets

When I pass a flag containing [...] to my bash script, getopts gives me an empty string when I try to grab the value with $OPTARG.
shopt -s nullglob
while getopts ":f:" opt; do
case $opt in
f)
str=$OPTARG
;;
esac
done
echo ${str}
Running the script:
$ script.sh -f [0.0.0.0]
<blank line>
How can I get the original value back inside the script?
Short summary: Double-quote your variable references. And use shellcheck.net.
Long explanation: When you use a variable without double-quotes around it (e.g. echo ${str}), the shell tries to split its value into words, and expand anything that looks like a wildcard expression into a list of matching files. In the case of [0.0.0.0], the brackets make it a wildcard expression that'll match either the character "0" or "." (equivalent to [0.]). If you had a file named "0", it would expand to that string. With no matching file(s), it's normally left unexpanded, but with nullglob set it expands to ... null.
Turning off nullglob solves the problem if there are no matching files, but isn't really the right way to do it. I remember (but can't find right now) a question we had about a script that failed on one particular computer, and it turned out the reason was that one computer happened to have a file that matched a bracket expression in an unquoted variable's value.
The right solution is to put double-quotes around the variable reference. This tells the shell to skip word splitting and wildcard expansion. Here's an interactive example:
$ str='[0.0.0.0]' # Quotes aren't actually needed here, but they don't hurt
$ echo $str # This works without nullglob or a matching file
[0.0.0.0]
$ shopt -s nullglob
$ echo $str # This fails because of nullglob
$ shopt -u nullglob
$ touch 0
$ echo $str # This fails because of a matching file
0
$ echo "$str" # This just works, no matter whether file(s) match and/or nullglob is set
[0.0.0.0]
So in your script, simply change the last line to:
echo "${str}"
Note that double-quotes are not required in either case $opt in or str=$OPTARG because variables in those specific contexts aren't subject to word splitting or wildcard expansion. But IMO keeping track of which contexts it's safe to leave the double-quotes off is more hassle than it's worth, and you should just double-quote 'em all.
BTW, shellcheck.net is good at spotting common mistakes like this; I recommend feeding your scripts through it, since this is probably not the only place you have this problem.
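Put together, the corrected script might look like the sketch below. The option parsing is wrapped in a function named parse_flag, which is hypothetical and only there so that the example can be called with a sample argument; nullglob stays on, as in the question.

```shell
#!/usr/bin/env bash
shopt -s nullglob                 # as in the question

# Hypothetical wrapper around the getopts loop from the question.
parse_flag() {
  local opt str='' OPTIND=1
  while getopts ":f:" opt; do
    case $opt in
      f) str=$OPTARG ;;           # no quotes needed in an assignment
    esac
  done
  printf '%s\n' "$str"            # quoted: no word splitting, no wildcard expansion
}

# The argument is quoted here because nullglob is active in this shell, too.
result=$(parse_flag -f '[0.0.0.0]')
echo "$result"
shopt -u nullglob                 # assumption: nullglob was off before this sketch ran
```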
Assuming that shopt -s nullglob is needed in the bigger script, you can temporarily disable it with shopt -u nullglob around the code that handles the option value and re-enable it afterwards:
shopt -s nullglob
shopt -u nullglob
while getopts ":f:" opt; do
case $opt in
f)
str=$OPTARG
;;
esac
done
echo ${str}
shopt -s nullglob

shell script exit with no match with question mark symbol

Why does ./script.sh ? throw "No match", while ./script.sh on its own runs fine?
script.sh
#!/bin/sh
echo "Hello World"
? is a glob character on UNIX. By default, in POSIX shells, a glob that matches no files at all will evaluate to itself; however, many shells have the option to modify this behavior and either pass no arguments in this case or make it an error.
If you want to pass this (or any other string which can be interpreted as a glob) literally, quote it:
./script.sh '?'
If you didn't use quotes, consider what the following would do:
touch a b c
./script.sh ? ## this is the same as running: ./script.sh a b c
That said -- the behavior of your outer shell (exiting when no matches exist, rather than defaulting to pass the non-matching glob expression as a literal) is non-default. If this shell is bash, you can modify it with:
shopt -u failglob
Note, however, that this doesn't really fix your problem, but only masks it when your current directory has no single-character filenames. The only proper fix is to correct your usage to quote and escape values properly.
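The difference between the quoted and the unquoted form can be made visible by counting the arguments that arrive. In this sketch, count_args is a hypothetical stand-in for ./script.sh, and the files a, b and c mirror the example above.

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }            # hypothetical stand-in for ./script.sh

dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

unquoted=$(cd "$dir" && count_args ?)    # ? expands to: a b c
quoted=$(cd "$dir" && count_args '?')    # passed through as a single literal ?

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$dir"
```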

How to expand wildcards after substituting them into a filepath as arguments, in bash script?

I am currently trying to substitute arguments into a filepath
FILES=(~/some/file/path/${1:-*}*/${2:-*}*/*)
I'm trying to optionally substitute variables, so that if there are no arguments the path looks like ~/some/file/path/**/**/* and if there is just one, it looks like ~/some/file/path/arg1*/**/*, etc. However, I need the wildcard expansion to occur after the filepath has been constructed. Currently what seems to be happening is that the filepath is stored in FILES as a single filepath with asterisks.
The broader goal is to pass all subdirectories that are two levels down from the current directory into the FILES variable, unless arguments are given, in which case the first argument is used to pick a particular directory at the first level, the second argument for the second level.
edit:
This script generates directories and then grabs random files from them, and previously had ** instead of *, however it still works, and correctly restricts the files to pull from when given arguments. Issue resolved.
#!/bin/bash
mkdir dir1 dir1/a
touch dir1/a/foo.txt dir1/a/bar.txt
cp -r dir1/a dir1/b
cp -r dir1 dir2
files=(./*${1:-}/*/*)
for i in {1..10}
do
# Get random file
nextfile=${files[$RANDOM % ${#files[@]} ]}
# Use file
echo "$nextfile" || break
sleep 0.5
done
rm -r dir1 dir2
I can't reproduce this behavior.
$ files=( ~/tmp/foo/${1:-*}*/${2:-*}*/* )
$ declare -p files
declare -a files='([0]="/Users/chaduffy/tmp/foo/bar/baz/qux")'
To explain why this is expected to work: Parameter expansion happens before glob expansion, so by the time glob expansion takes place, content has already been expanded. See lhunath's simplified diagram of the bash parse/expansion process for details.
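The expansion order can be checked with a small sketch on a scratch directory tree. The directory layout and the helper name collect are made up for illustration, and globstar is assumed to be off (the default), so that ** behaves like *.

```shell
#!/usr/bin/env bash
root=$(mktemp -d)                      # scratch tree, for illustration only
mkdir -p "$root/dir1/a" "$root/dir1/b" "$root/dir2/a"
touch "$root/dir1/a/foo.txt" "$root/dir1/b/bar.txt" "$root/dir2/a/baz.txt"

# Hypothetical helper: fill the global array `files` from up to two optional prefixes.
collect() {
  files=( "$root"/${1:-*}*/${2:-*}*/* )   # parameters are substituted first, then the glob runs
}

collect;        none=${#files[@]}      # every file two levels down
collect dir1;   one=${#files[@]}       # only files below dir1*
collect dir1 b; two=${#files[@]}       # only files below dir1*/b*

echo "$none $one $two"
rm -rf "$root"
```

Each additional argument narrows the match, which shows that the glob is evaluated on the path as it looks after the parameters have been filled in.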
A likely explanation is simply that your glob has no matches, and is evaluating to itself for that reason. This behavior can be disabled with the nullglob switch, which will give you an empty array:
shopt -s nullglob
files=(~/some/file/path/${1:-*}*/${2:-*}*/*)
declare -p files
Another note: ** only has special meaning in shells where shopt -s globstar has been run, and where this feature (added in 4.0) is available. On Mac OS X (without installation of a newer version of bash via MacPorts or similar), it doesn't exist; you'll want to use find for recursive operations. If your glob would only match if ** triggered recursion, this would explain the behavior in question.
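The effect of globstar can be compared directly with a sketch like the following, which assumes bash 4.0 or later; the directory layout is hypothetical.

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or later for globstar.
root=$(mktemp -d)                      # scratch tree, for illustration only
mkdir -p "$root/x/y"
touch "$root/top.txt" "$root/x/mid.txt" "$root/x/y/deep.txt"

shopt -u globstar
plain=( "$root"/**/*.txt )             # ** behaves like *: exactly one directory level
shopt -s globstar
recursive=( "$root"/**/*.txt )         # ** matches zero or more directory levels
shopt -u globstar                      # assumption: globstar was off to begin with

echo "without globstar: ${#plain[@]}, with globstar: ${#recursive[@]}"
rm -rf "$root"
```

Without globstar only the file one level down is found; with it, the files at all three depths are matched.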

What behavior can and should I expect of a shell with the command and glob "echo -?"?

If I want to match a file called "-f" or "-r" I might do something like
test.sh -?
And if I want to send the literal '-?' to a program as an argument I might do something like:
test.sh -\?
If no such file "-f" or "-r" or anything like it exists, then what should my shell do with
test.sh -?
Should it tell me that no file matches this pattern?
In bash, the default is to treat an unmatched pattern literally. If the nullglob option is set, an unmatched pattern "evaporates"; it is removed from the command, not even expanding to the empty string.
In zsh, an unmatched pattern produces an error by default, because the nomatch option is on. Unsetting it (unsetopt nomatch) causes an unmatched pattern to be treated literally, and zsh also supports a nullglob option which causes unmatched patterns to disappear. There is also a cshnullglob option which acts like nullglob, but requires at least one pattern in a command to match, or an error is produced.
Note that POSIX specifies that if the pattern contains an invalid bracket expression or does not match any existing filenames or pathnames, the pattern string shall be left unchanged in sh.
ash, dash, ksh, bash and zsh all behave this way when invoked as sh.
You seem to be looking for the nullglob option, at least with Bash:
shopt -s nullglob
Without the nullglob option, an unmatched pattern is passed as its literal self to your program: the shell will pass -? to the script if there isn't a file that matches. With the nullglob option, unmatched patterns are replaced with nothing at all.
If no such file exists, the shell, by default, just passes along the pattern you gave, including whatever * or ? characters you used. To determine whether the file actually exists, test it. Thus, inside your script, use:
[ -f "$1" ] || echo "no such file exists"
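The three outcomes discussed in this thread (literal pattern, removed pattern, real match) can be compared side by side in bash. In this sketch, count_args and first_arg are hypothetical stand-ins for test.sh, and failglob is assumed to be off.

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }            # hypothetical stand-in for test.sh
first_arg()  { printf '%s\n' "$1"; }   # prints the first argument it receives

dir=$(mktemp -d)                       # empty scratch directory

literal=$(cd "$dir" && shopt -u nullglob && first_arg -?)    # pattern passed as is
removed=$(cd "$dir" && shopt -s nullglob && count_args -?)   # pattern removed entirely
touch -- "$dir/-f"                     # now a file named -f exists
matched=$(cd "$dir" && first_arg -?)   # pattern replaced by the matching name

echo "default: $literal, nullglob: $removed argument(s), with a match: $matched"
rm -rf "$dir"
```

The option changes happen inside command substitutions, so they do not affect the calling shell.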

Resources