Bash: Expand braces and globs with spaces in filenames? - bash

I have some files that look like:
/path/with spaces/{a,b,c}/*.gz
And I need all files matching the glob under a subset of the a,b,c dirs to end up as arguments to a single command:
mycmd '/path/with spaces/a/1.gz' '/path/with spaces/a/2.gz' '/path/with spaces/c/3.gz' ...
The directories I care about come in as command line params and I have them in an array:
dirs=( "$@" )
And I want to do something like:
IFS=,
mycmd "/path/with spaces/{${dirs[*]}}/"*.gz
but this doesn't work, because bash expands braces before variables. I have tried tricks with echo and ls and even eval (*shudder*) but it's tough to make them work with spaces in filenames. find doesn't seem to be much help because it doesn't do braces. I can get a separate glob for each dir in an array with:
dirs=( "${dirs[@]/#//path/with spaces/}" )
dirs=( "${dirs[@]/%//*.gz}" )
but then bash quotes the wildcards on expansion.
So: is there an elegant way to get all the files matching a variable brace and glob pattern, properly handling spaces, or am I stuck doing for loops? I'm using Bash 3 if that makes a difference.

To perform brace expansion and globbing on a path with spaces, you can quote the portions of the path that contain spaces, e.g.
mycmd '/path/with spaces/'{a,b,c}/*.gz
Doing brace expansion using a list of values from a variable is a little tricky since brace expansion is done before any other expansion. I don't see any way but to use the dreaded eval.
eval mycmd "'/path/with spaces/'{a,b,c}/*.gz"
P.S. In such a case however, I would personally opt for a loop to build the argument list rather than the approach shown above. While more verbose, a loop will be a lot easier to read for the uninitiated and will avoid the need to use eval (especially when the expansion candidates are derived from user input!).
Proof of concept:
Using a dummy command (x.sh) which prints out the number of arguments and prints out each argument:
[me@home]$ shopt -s nullglob # handle case where globbing returns no match
[me@home]$ ./x.sh 'path with space'/{a,b}/*.txt
Number of arguments = 3
- path with space/a/1.txt
- path with space/b/2.txt
- path with space/b/3.txt
[me@home]:~/temp$ dirs="a,b"
[me@home]:~/temp$ eval ./x.sh "'path with space'/{$dirs}/*.txt"
Number of arguments = 3
- path with space/a/1.txt
- path with space/b/2.txt
- path with space/b/3.txt
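The loop approach recommended in the P.S. can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the temporary tree, the names base and files, and the use of printf as a stand-in for mycmd are all assumptions made so the example runs on its own. The += array append needs Bash 3.1 or later.

```shell
#!/bin/bash
# Sketch: collect matching files for a chosen subset of directories
# without eval. "base", "dirs" and "files" are illustrative names.

# Throwaway tree so the example is self-contained.
base="$(mktemp -d)/path with spaces"
mkdir -p "$base/a" "$base/b" "$base/c"
touch "$base/a/1.gz" "$base/a/2.gz" "$base/b/ignored.gz" "$base/c/3.gz"

dirs=( a c )          # in the real script this would be dirs=( "$@" )

shopt -s nullglob     # a directory with no matches contributes nothing
files=()
for d in "${dirs[@]}"; do
    # Quote the variable parts, leave the glob itself unquoted.
    files+=( "$base/$d"/*.gz )
done
shopt -u nullglob

# Stand-in for mycmd: print one argument per line.
printf '%s\n' "${files[@]}"
```

Because every element of files is a separate word, mycmd "${files[@]}" receives each path intact, spaces included.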

Okay, so here is one using bash for the "braces" and find for the globs:
find "${dirs[@]/#//path/with spaces/}" -name '*.gz' -print0 | xargs -0 mycmd
The same find output is also useful if you need the results in an array.
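If the results are needed in an array rather than passed straight to xargs, one option that works in Bash 3 (which has no mapfile) is a read loop over the NUL-delimited output. A minimal sketch, assuming a find that supports -print0 (GNU and BSD find do; POSIX does not require it); the temporary tree and variable names are made up for illustration:

```shell
#!/bin/bash
# Sketch: read NUL-delimited find output into an array.
# "base", "dirs" and "files" are illustrative names.

base="$(mktemp -d)/path with spaces"
mkdir -p "$base/a" "$base/b" "$base/c"
touch "$base/a/1.gz" "$base/a/not-gzip.txt" "$base/c/3 with space.gz"

dirs=( a c )

files=()
# -d '' makes read stop at each NUL byte; IFS= and -r keep names verbatim.
while IFS= read -r -d '' f; do
    files+=( "$f" )
done < <(find "${dirs[@]/#/$base/}" -name '*.gz' -print0)

printf '%s\n' "${files[@]}"
```

Note that, unlike the glob, find also descends into subdirectories of each starting directory, and the order of its results is not guaranteed.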

Here's one for the GNU Parallel fans:
parallel -Xj1 mycmd {}/*.gz ::: "${dirs[@]/#//path/with spaces/}"

Related

--exclude-dir option in grep does not work as expected

I'm trying to exclude multiple directories when using grep as in the following command
grep -r --exclude-dir={folder1, folder2} 'foo'
However, an error is raised grep: foo: No such file or directory. Maybe I'm doing something wrong with --exclude-dir option since the command below works as expected
grep -r 'foo'
How can I use --exclude-dir option correctly? Thanks in advance.
The --exclude-dir flag of GNU grep takes a glob expression as its argument. To exclude more than one directory, you can write the option as a brace expansion sequence, which the shell expands before grep runs.
The expansion involves words separated by a comma character and doesn't like spaces between the words. So ideally it should have been
--exclude-dir={folder1,folder2}
You can see this as a simple brace expansion in your shell by running
echo {a,b} # produces 'a b'
echo {a, b} # this doesn't undergo expansion by shell
echo --exclude-dir={folder1, folder2}
--exclude-dir={folder1, folder2}
so, your original command becomes
grep -r '--exclude-dir={folder1,' 'folder2}' foo
That is, --exclude-dir receives the unexpanded string {folder1, as its glob, folder2} becomes the pattern to search for, and foo is left over as the name of a file to search. Since no file called foo exists, grep reports grep: foo: No such file or directory.
Remember that brace expansion is a feature of the shell (e.g. bash), not of grep. In shells that don't support it, the text between {..} is passed through literally and will not behave as intended.
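To see exactly which words the shell hands to a command in each case, a small helper can print every argument in brackets. This is only a sketch; show_args is a hypothetical function written for this illustration and has nothing to do with grep itself.

```shell
#!/bin/bash
# Sketch: show the words produced by the shell, one bracketed item each.
# show_args is a hypothetical helper, not part of grep or bash.
show_args() { printf '[%s]' "$@"; echo; }

expanded=$(show_args --exclude-dir={folder1,folder2})
literal=$(show_args --exclude-dir={folder1, folder2})

echo "$expanded"   # [--exclude-dir=folder1][--exclude-dir=folder2]
echo "$literal"    # [--exclude-dir={folder1,][folder2}]
```

Without the space, grep sees two separate --exclude-dir options; with the space, it sees one malformed option followed by a stray word.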

Invoke ls command in bash script and get all the results [duplicate]

I want to run this cmd line script
$ script.sh lib/* ../test_git_thing
I want it to process all the files in the /lib folder.
FILES=$1
for f in $FILES
do
echo "Processing $f file..."
done
Currently it only prints the first file. If I use $@, it gives me all the files, but also the last param which I don't want. Any thoughts?
The argument list is expanded on the command line: when you invoke "script.sh lib/*", your script is called with all the files in lib/ as arguments. Since you only reference $1 in your script, it only prints the first file. You need to escape the wildcard on the command line so that it is passed to your script, which can then perform the globbing.
As correctly noted, lib/* on the command line is being expanded into all files in lib. To prevent expansion, you have 2 options. (1) quote your input:
$ script.sh 'lib/*' ../test_git_thing
Or (2), turn file globbing off with set -f. However, that disables all pathname expansion in the shell, and it has to be set in the calling shell: setting it within the script doesn't help, because the shell expands the arguments before passing them to your script. In your case, it is probably better to quote the input, or to pass the first argument as a directory name and do the expansion in the script:
DIR=$1
for f in "$DIR"/*
In bash and ksh you can iterate through all arguments except the last like this:
for f in "${@:1:$#-1}"; do
    echo "$f"
done
In zsh, you can do something similar:
for f in $@[1,${#}-1]; do
    echo "$f"
done
$# is the number of arguments and ${@:start:length} is substring/subsequence notation in bash and ksh, while $@[start,end] is subsequence in zsh. In all cases, the subscript expressions are evaluated as arithmetic expressions, which is why $#-1 works. (In zsh, you need ${#}-1 because $#- is interpreted as "the length of $-".)
In all three shells, you can use the ${x:start:length} syntax with a scalar variable, to extract a substring; in bash and ksh, you can use ${a[@]:start:length} with an array to extract a subsequence of values.
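The bash form of the slice can be tried in isolation by wrapping it in a function. A minimal sketch; process_all_but_last is an illustrative name, and the guard for an empty argument list is an addition here, since a length of -1 would otherwise be an error:

```shell
#!/bin/bash
# Sketch: process every argument except the last one.
# process_all_but_last is an illustrative name.
process_all_but_last() {
    [ "$#" -gt 0 ] || return 0      # nothing to slice with no arguments
    local f
    # Start at position 1 and take $#-1 items, i.e. all but the last.
    for f in "${@:1:$#-1}"; do
        echo "Processing $f file..."
    done
}

process_all_but_last "lib/a b.txt" lib/c.txt ../test_git_thing
```

The quotes around the expansion keep "lib/a b.txt" as a single item even though it contains a space.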
This answers the question as given, without using non-POSIX features, and without workarounds such as disabling globbing.
You can find the last argument using a loop, and then exclude that when processing the list of files. In this example, $d is the directory name, while $f has the same meaning as in the original answer:
#!/bin/sh
if [ $# != 0 ]
then
    for d in "$@"; do :; done
    if [ -d "$d" ]
    then
        for f in "$@"
        do
            if [ "x$f" != "x$d" ]
            then
                echo "Processing $f file..."
            fi
        done
    fi
fi
Additionally, it would be a good idea to also test if "$f" is a file, since it is common for shells to pass the wildcard character through the argument list if no match is found.
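That extra check might look like the sketch below, which wraps the same logic in a function so it can be tried on its own. The name process_files and the temporary paths are illustrative, and mktemp is assumed to be available (it is common but not required by POSIX).

```shell
#!/bin/sh
# Sketch: skip the trailing directory argument and anything that is not
# a regular file. process_files is an illustrative name.
process_files() {
    [ "$#" -gt 0 ] || return 0
    for d in "$@"; do :; done            # after this loop, $d is the last argument
    for f in "$@"; do
        [ "x$f" != "x$d" ] || continue   # skip the directory argument
        [ -f "$f" ] || continue          # skip unmatched patterns such as 'lib/*'
        echo "Processing $f file..."
    done
}

# Demonstration with a throwaway tree.
tmp=$(mktemp -d)
mkdir "$tmp/lib" "$tmp/out"
touch "$tmp/lib/one.txt"
process_files "$tmp/lib/one.txt" "$tmp/lib/*" "$tmp/out"
```

Here the quoted "$tmp/lib/*" stands in for a pattern that the calling shell passed through unexpanded; the -f test drops it silently.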

Multiple elements instead of one in bash script for loop

I have been following the answers given in these questions
Shellscript Looping Through All Files in a Folder
How to iterate over files in a directory with Bash?
to write a bash script which goes over files inside a folder and processes them. So, here is the code I have:
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in "$INFOLDER$YEAR*.mdb";
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done
I am receiving this error: basename: extra operand.
I added echo $f and I realized that f contains all the filenames separated by space. But I expected to get one at a time. What could be the problem here?
You're running into problems with quoting. In the shell, double-quotes prevent word splitting and wildcard expansion; generally, you don't want these things to happen to variables' values, so you should double-quote variable references. But when you have something that should be word-split or wildcard-expanded, it cannot be double-quoted. In your for statement, you have the entire file pattern in double-quotes:
for f in "$INFOLDER$YEAR*.mdb";
...which prevents word-splitting and wildcard expansion on the variables' values (good) but also prevents it on the * which you need expanded (that's the point of the loop). So you need to quote selectively, with the variables inside quotes and the wildcard outside them:
for f in "$INFOLDER$YEAR"*.mdb;
And then inside the loop, you should double-quote the references to $f in case any filenames contain whitespace or wildcards (which are completely legal in filenames):
echo "$f"
absname="$INFOLDER$YEAR$(basename "$f")"
(Note: the double-quotes around the assignment to absname aren't actually needed -- the right side of an assignment is one of the few places in the shell where it's safe to skip them -- but IMO it's easier and safer to just double-quote all variable references and $( ) expressions than to try to keep track of where it's safe and where it's not.)
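The difference between the two quoting styles is easy to check by counting loop iterations. A small sketch, using a made-up temporary folder whose name contains a space:

```shell
#!/bin/bash
# Sketch: compare a fully quoted pattern with a selectively quoted one.
# "folder" is an illustrative stand-in for $INFOLDER$YEAR.
folder="$(mktemp -d)/data sets/"
mkdir -p "$folder"
touch "${folder}a.mdb" "${folder}b c.mdb"

quoted=0
for f in "$folder*.mdb"; do       # the * is quoted: one literal string
    quoted=$((quoted + 1))
done

selective=0
for f in "$folder"*.mdb; do       # the * is unquoted: one word per file
    selective=$((selective + 1))
done

echo "quoted: $quoted iteration(s), selective: $selective iteration(s)"
# prints: quoted: 1 iteration(s), selective: 2 iteration(s)
```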
Just quote your shell variables if they are supposed to contain strings with spaces in between.
basename "$f"
Not doing so will lead to splitting of the string into separate words (see WordSplitting in bash), which breaks the basename command, since it expects one string argument rather than several.
It would also be wise to keep the * outside the quotes, since shell globbing doesn't happen inside them (single or double).
#!/bin/bash
# good practice to lower-case variable names to distinguish them from
# shell environment variables
year="2002/"
in_folder="/local/data/datasets/Convergence/"
for file in "${in_folder}${year}"*.mdb; do
    # skip the literal, unexpanded pattern if no files are found
    [ -e "$file" ] || continue
    echo "$file"
    # Worth noting here: $file already holds the name of the file
    # with its absolute path, just as below. You don't need to
    # construct it manually
    absname=${in_folder}${year}$(basename "$file")
done
Just remove the double quotes from this line:
for f in "$INFOLDER$YEAR*.mdb";
so it looks like this
#!/bin/bash
YEAR="2002/"
INFOLDER="/local/data/datasets/Convergence/"
for f in $INFOLDER$YEAR*.mdb;
do
echo $f
absname=$INFOLDER$YEAR$(basename $f)
# ... the rest of the script ...
done

How to store curly brackets in a Bash variable

I am trying to write a bash script. I am not sure why in my script:
ls {*.xml,*.txt}
works okay, but
name="{*.xml,*.txt}"
ls $name
doesn't work. I get
ls: cannot access {*.xml,*.txt}: No such file or directory
The expression
ls {*.xml,*.txt}
results in brace expansion, and the shell passes the expansion (if any) to ls as arguments. Setting shopt -s nullglob makes this expression evaluate to nothing when there are no matching files.
Double-quoting the string suppresses the expansion, and the shell stores the literal contents in your variable name (not sure if that is what you wanted). When you invoke ls with $name as the argument, the shell performs variable expansion, but no brace expansion is done.
As @Cyrus has mentioned, eval ls $name will force brace expansion, and you get the same result as that of ls {*.xml,*.txt}.
The reason your expansion doesn't work is that brace expansion is performed before variable expansion, see Shell expansions in the manual.
I'm not sure what it is you're trying to do, but if you want to store a list of file names, use an array:
files=( {*.txt,*.xml} ) # these two are the same
files=(*.txt *.xml)
ls -l "${files[@]}" # give them to a command
for file in "${files[@]}" ; do # or loop over them
    dosomething "$file"
done
"${array[@]}" expands to all elements of the array, as separate words. (remember the quotes!)

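The contrast between the brace pattern stored in a variable and the array approach can be checked with a short script. This is a sketch only: the temporary directory and the file names are invented for the demonstration, and from_variable is an illustrative name.

```shell
#!/bin/bash
# Sketch: brace expansion does not happen on the result of a variable
# expansion, whereas an array built from globs holds one word per file.
tmp=$(mktemp -d)
touch "$tmp/a.xml" "$tmp/b.txt" "$tmp/c d.txt"
cd "$tmp" || exit 1

name="{*.xml,*.txt}"
# Unquoted on purpose: only word splitting and globbing apply here, and
# no file name starts with "{", so the pattern stays as one literal word.
from_variable=( $name )

shopt -s nullglob
files=( *.xml *.txt )
shopt -u nullglob

echo "from variable: ${#from_variable[@]} word(s): ${from_variable[*]}"
echo "from array:    ${#files[@]} file(s)"
printf '  %s\n' "${files[@]}"
```

The array keeps "c d.txt" as a single element, so it survives being passed on with "${files[@]}".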