Remove all files except files with certain extension - shell

This removes all files that end with .a or .b
$ ls *.a
a.a b.a c.a
$ ls *.b
a.b b.b c.b
$ rm *.a *.b
How would I do the opposite and remove all files that end with *.* except the ones that end with *.a and *.b?

The linked answer has useful info, though the question is somewhat ambiguous and the answers use differing interpretations.
The simplest approach in your case is probably (a streamlined version of https://stackoverflow.com/a/10448940/45375):
(GLOBIGNORE='*.a:*.b'; rm *.*)
Note the use of a subshell ((...)) to localize setting the GLOBIGNORE variable.
The patterns assigned to GLOBIGNORE must be :-separated.
The appeal of this approach is that you can use a single subshell without changing global state.
By contrast, getting away with a single subshell with shopt -s extglob requires a bit of trickery:
(shopt -s extglob; glob='*.!(a|b)'; echo $glob)
Note the mandatory use of an intermediate variable, without which the command would break (because a literal glob would be expanded BEFORE executing the commands, at which point the extended globbing syntax is not yet recognized).
Caveat: Using GLOBIGNORE has a side effect that is easy to overlook (it is documented in the Bash manual, so it is not a bug):
If GLOBIGNORE is set to any non-empty value, pathname expansion of * and *.* behaves as if shell option dotglob were in effect - even if it isn't.
In other words: If GLOBIGNORE is set, hidden files not explicitly exempted by a pattern in GLOBIGNORE are always matched by * and *.*.
dotglob is OFF by default, causing * NOT to include hidden files (if GLOBIGNORE is not set, which is true by default).
If you also wanted to exclude hidden files while using GLOBIGNORE, add the following pattern: .*; applied to the question, you'd get:
(GLOBIGNORE='*.a:*.b:.*'; rm *.*)
By contrast, using extended globbing after turning on the extglob shell option DOES respect the dotglob option.
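A minimal sketch of the hidden-file side effect described above, assuming bash and a throwaway directory. The file names are invented for the demonstration, and echo stands in for rm so nothing of value is deleted:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox: all file names below are made up for this demo.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch a.a b.b c.txt d.log .hidden.cfg

# Command substitution runs in a subshell, so GLOBIGNORE stays local to it.
with_hidden=$(GLOBIGNORE='*.a:*.b'; echo *.*)
without_hidden=$(GLOBIGNORE='*.a:*.b:.*'; echo *.*)

echo "GLOBIGNORE='*.a:*.b'    -> $with_hidden"
echo "GLOBIGNORE='*.a:*.b:.*' -> $without_hidden"

# Clean up the sandbox.
cd / && rm -rf "$sandbox"
```

The first result should include .hidden.cfg, while the second should not; the order of the listed names depends on the locale.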

You can enable extended glob in bash:
shopt -s extglob
Then you can use:
rm *.!(a|b)
To remove all files that end with *.* except the ones that end with *.a OR *.b
Update: (Thanks to @mklement0) Here is a way to localize setting extglob (without altering the global state) by doing this in a subshell and using an intermediate variable:
(shopt -s extglob; glob='*.!(a|b)'; rm $glob)
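In a script, an alternative to the intermediate variable is to put shopt -s extglob on its own line before the pattern is parsed. The following is a sketch under that assumption, run in a temporary directory with invented file names; it collects the matches in an array first so they can be inspected before anything is removed:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox with made-up file names.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch a.a b.b c.txt d.log noext

# Must be executed on its own line before bash parses the extended pattern.
shopt -s extglob

# Everything with a dot whose suffix after the final match point is not a or b.
doomed=( *.!(a|b) )
printf 'would remove: %s\n' "${doomed[@]}"

rm -- "${doomed[@]}"
remaining=( * )
printf 'left in place: %s\n' "${remaining[@]}"

# Clean up the sandbox.
cd / && rm -rf "$sandbox"
```

Note that a file without any dot in its name (noext here) is not matched by the pattern and therefore survives.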

Some shells may be capable of this (I think?); bash, however, is not by default. If you are running bash on Cygwin, you can do this:
rm $(ls -1 | grep -v '\.a$' | grep -v '\.b$')
ls -1 (that's a one) lists all files in the current directory, one per line.
grep -v '\.a$' returns all names that don't end in .a
grep -v '\.b$' returns all names that don't end in .b
Note that this breaks on file names containing whitespace, because the output of ls is word-split before rm sees it.

Sometimes it's better to not insist on solving a problem a certain way. And for the general problem of "acting on certain files to be determined in some tricky way", find is probably the best all-around tool you'll find.
find . -maxdepth 1 -type f ! -name '*.[ab]' -delete
Omit the -maxdepth 1 if you want to recurse into subdirectories.
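Before running a find command that ends in -delete, it may be worth previewing the matches with -print. The sketch below assumes a find that supports -maxdepth (GNU and BSD find do) and uses a temporary directory with invented names. It also adds -name '*.*', which is an assumption about the asker's intent to leave files without any extension alone:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox with made-up file names.
sandbox=$(mktemp -d)
touch "$sandbox"/{a.a,b.b,c.txt,noext}
mkdir "$sandbox/sub"
touch "$sandbox/sub/deep.log"

# Preview only: -print instead of -delete. Patterns are quoted so the shell
# passes them to find unexpanded.
matches=$(find "$sandbox" -maxdepth 1 -type f -name '*.*' ! -name '*.[ab]' -print)
echo "would delete: $matches"

# Clean up the sandbox.
rm -rf "$sandbox"
```

Once the preview looks right, -print can be swapped for -delete.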

Related

Why does ls not subset by filetype recursively (with -R)?

I want to list all .jpg files in all subdirectories using ls.
For the same directory this works fine:
ls *.jpg
However, when using the -R for recursiveness:
ls -R *.jpg
I get:
zsh: no matches found: *.jpg
Why does this not work?
Note: I know it can be done using find or grep but I want to know why the above does not work.
The program ls is not designed to handle patterns by itself.
When you run ls -R *.jpg, the pattern *.jpg is not directly passed to ls. The shell replaces it by a list of all files that match the pattern. (Only if there is no file with a matching name, ls will see the file name *.jpg and not find a file of this name.)
Since you are using zsh (with the default setting setopt nomatch), it prints an error message instead of passing the pattern to ls.
If there are matching files, e.g. A.jpg, B.jpg, C.jpg, the command
ls *.jpg
will be run by the shell as
ls A.jpg B.jpg C.jpg
In contrast to this, find is designed to handle patterns with its -name test. When using find you should make sure the pattern is not replaced by the shell, e.g. by using -name '*.jpg' or -name \*.jpg. Otherwise you might get unexpected results or an error if there are matching files in the current directory.
Edit:
As shown in Martin Tournoij's answer you could use the recursive glob pattern ls **/*.jpg, but this is also handled by the shell not by ls, so you don't need option -R. In zsh this recursive pattern ** is enabled by default, in bash you need to enable it with shopt -s globstar.
The shell first expands any glob patterns, and then runs the command. So from ls's perspective, ls *.jpg is exactly the same as if you had typed ls one.jpg two.jpg. The -R flag to ls only makes sense if you use it on a directory, which you're not doing here.
This is also why mv *.jpg *.png doesn't work as expected on Unix systems, since mv never sees those patterns but just the filenames it expanded to (it does on e.g. Windows, where the globbing is done by the program rather than the shell; there are advantages and disadvantages to both approaches).
* matches all characters except a /, so *.jpg only expands to file names in the current directory. **/ is similar, but also matches /, so it expands to file names in any subdirectory. This is supported by both zsh and bash (in bash 4.0 and later, after shopt -s globstar).
So ls **/*.jpg will do what you want; you don't need to use find or grep. In zsh especially, you rarely need to use find, since globbing is so much more powerful than in the standard Bourne shell or bash.
In zsh you can also use setopt glob_star_short and then **.jpg will work as well, which is a shortcut for **/*.jpg.
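To see that the shell, not the command, does the expansion, the matches can be captured in arrays and counted. This is a sketch assuming bash 4.0 or later (for globstar), with a made-up directory layout:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox with an invented directory layout.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p photos/2020
touch top.jpg photos/a.jpg photos/2020/b.jpg

# Plain glob: only the current directory is searched.
flat=( *.jpg )

# Recursive glob: requires globstar in bash (zsh enables ** by default).
shopt -s globstar
deep=( **/*.jpg )

echo "*.jpg matched ${#flat[@]} file(s)"
echo "**/*.jpg matched ${#deep[@]} file(s)"

# Clean up the sandbox.
cd / && rm -rf "$sandbox"
```

Any command given **/*.jpg as an argument would receive the expanded list held in deep, which is why ls needs no -R here.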

What is the shortest expression to match all file/dir names (including those beginning with a dot) in Bash?

I have extglob set and dotglob unset.
.* also yields . and .., very evil in conjunction with mv or cp.
I played around a bit and found that *(?(.)+([^.])) and $(ls -A) give the desired result, but I think there should be an easier way...
EDIT: Sorry, I should have mentioned that I am looking for an expression to be used at the prompt, not within a script.
unset GLOBIGNORE # empty-by-default, but let's make sure
shopt -s dotglob # disable special handling for "hidden" files
# ...and with the above items both done:
files=( * ) # just an example use of a glob
...sets the array files to contain all objects in the current directory except . and ..; any other use of * would behave similarly.
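A small sketch of the difference dotglob makes, assuming bash and a temporary directory whose entries are invented for the demonstration:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox with made-up entries.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch visible.txt .hidden
mkdir .config

unset GLOBIGNORE
default_glob=( * )     # dotglob off: hidden entries are skipped

shopt -s dotglob
all_entries=( * )      # hidden entries included, but never . or ..
shopt -u dotglob

echo "without dotglob: ${#default_glob[@]} entry"
echo "with dotglob:    ${#all_entries[@]} entries"

# Clean up the sandbox.
cd / && rm -rf "$sandbox"
```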

How to convert files in Unix using iconv?

I'm new to Bash scripting. I have a requirement to convert multiple input files in UTF-8 encoding to ISO 8859-1.
I am using the below command, which is working fine for the conversion part:
cd ${DIR_INPUT}/
for f in *.txt; do iconv -f UTF-8 -t ISO-8859-1 $f > ${DIR_LIST}/$f; done
However, when I don't have any text files in my input directory ($DIR_INPUT), it still creates an empty .txt file in my output directory ($DIR_LIST).
How can I prevent this from happening?
An empty file named *.txt is being created in your output directory because, by default, bash leaves a glob pattern that matches nothing as the literal string you supplied. You can change this behaviour in a number of ways, but what you're probably looking for is shopt -s nullglob. Observe:
$ for i in a*; do echo "$i"; done
a*
$ shopt -s nullglob
$ for i in a*; do echo "$i"; done
$
You can find documentation about this in the bash man page under Pathname Expansion.
In your case, I'd probably rewrite this in this way:
shopt -s nullglob
for f in "$DIR_INPUT"/*.txt; do
iconv -f UTF-8 -t ISO-8859-1 "$f" > "${DIR_LIST}/${f##*/}"
done
This avoids the need for the initial cd, and uses parameter expansion to strip off the path portion of $f for the output redirection. The nullglob will obviously eliminate the work being done on a nonexistent file.
As @ghoti pointed out, in the absence of files matching the wildcard expression a* the expression itself becomes the result of pathname expansion. By default (when nullglob option is unset), a* is expanded to, literally, a*.
You can set nullglob option, of course. But then you should be aware of the fact that all subsequent pathname expansions will be affected, unless you unset the option after the loop.
I would rather use find command which has a clear interface (and, in my opinion, is less likely to perform implicit conversions as opposed to the Bash globbing). E.g.:
cmd='iconv --verbose -f UTF-8 -t ISO-8859-1 "$0" > "$1/$(basename "$0")"'
find "${DIR_INPUT}/" \
-mindepth 1 \
-maxdepth 1 \
-type f \
-name '*.txt' \
-exec sh -c "$cmd" {} "${DIR_LIST}" \;
In the example above, $0 and $1 are positional arguments for the file path and ${DIR_LIST} respectively. The command is invoked via standard shell (sh) because of the need to refer to the file path {} twice. Although most modern implementations of find may handle multiple occurrences of {} correctly, the POSIX specification states:
If more than one argument containing the two characters "{}" is present, the behavior is unspecified.
As in the for loop, the -name pattern *.txt is evaluated as true if the basename of the current pathname matches the operand (*.txt) using the pattern matching notation. But, unlike the for loop, filename expansion does not apply, as this is a matching operation, not an expansion.
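One way to keep nullglob from affecting later pathname expansions, as cautioned above, is to capture the matches in an array and switch the option off again right away. A minimal sketch, assuming bash; the empty input directory is created only for the demonstration and the loop body is a placeholder for the real conversion:

```bash
#!/usr/bin/env bash
# Hypothetical empty input directory, standing in for $DIR_INPUT.
input_dir=$(mktemp -d)

# Enable nullglob only for the one expansion that needs it.
shopt -s nullglob
txt_files=( "$input_dir"/*.txt )
shopt -u nullglob

processed=0
for f in "${txt_files[@]}"; do
  # Placeholder for the real work, e.g. the iconv call from the question.
  echo "would convert: $f"
  processed=$((processed + 1))
done
echo "processed $processed file(s)"

# With nullglob off again, an unmatched pattern is left as a literal string.
after_restore=( "$input_dir"/*.txt )
echo "later unmatched glob stays literal: ${after_restore[0]}"

rmdir "$input_dir"
```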

How to deal with `*` expansion when there are no files

I am making a shell script that allows you to select a file from a directory using YAD. I am doing this:
list='';
exc='!'
for f in "$SHOTS_NOT_CONVERTED_DIR"/*;do
f=`basename $f`
list="${list}${exc}${f}"
done
The problem is that if there are no files in that directory, I end up with a selection with *.
What's the easiest, most elegant way to make this work in Bash?
The goal is to have an empty list if there are no files there.
A pattern such as * is called a glob expression, and expanding it is known as globbing. The bash manual calls it filename expansion.
You need to set the nullglob option. Doing so gives you an empty result if the glob expression does not find files:
shopt -s nullglob
list='';
exc='!'
for f in "$SHOTS_NOT_CONVERTED_DIR"/*;do
# Btw, use $() instead of ``
f=$(basename "$f")
list="${list}${exc}${f}"
done
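The same loop can be wrapped in a function whose body runs in a subshell, so that nullglob does not leak into the rest of the script. This is a sketch only: build_list is a hypothetical name, the directories and file names are invented, and ${f##*/} is used in place of basename to strip the directory part:

```bash
#!/usr/bin/env bash
# build_list DIR: print the '!'-prefixed list of entries in DIR.
# Hypothetical helper; the parentheses make the body run in a subshell.
build_list() (
  shopt -s nullglob
  list=''
  exc='!'
  for f in "$1"/*; do
    list="${list}${exc}${f##*/}"
  done
  printf '%s\n' "$list"
)

# Demo directories with made-up contents.
empty_dir=$(mktemp -d)
full_dir=$(mktemp -d)
touch "$full_dir/one.png" "$full_dir/two.png"

echo "empty: '$(build_list "$empty_dir")'"
echo "full:  '$(build_list "$full_dir")'"
```

The result for the empty directory should be an empty string, which is the goal stated in the question.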

How to expand wildcards after substituting them into a filepath as arguments, in bash script?

I am currently trying to substitute arguments into a filepath
FILES=(~/some/file/path/${1:-*}*/${2:-*}*/*)
I'm trying to optionally substitute variables, so that if there are no arguments the path looks like ~/some/file/path/**/**/* and if there is just one, it looks like ~/some/file/path/arg1*/**/*, etc. However, I need the wildcard expansion to occur after the filepath has been constructed. Currently what seems to be happening is that the filepath is stored in FILES as a single filepath with asterisks.
The broader goal is to pass all subdirectories that are two levels down from the current directory into the FILES variable, unless arguments are given, in which case the first argument is used to pick a particular directory at the first level, the second argument for the second level.
edit:
This script generates directories and then grabs random files from them, and previously had ** instead of *, however it still works, and correctly restricts the files to pull from when given arguments. Issue resolved.
#!/bin/bash
mkdir dir1 dir1/a
touch dir1/a/foo.txt dir1/a/bar.txt
cp -r dir1/a dir1/b
cp -r dir1 dir2
files=(./*${1:-}/*/*)
for i in {1..10}
do
# Get random file
nextfile=${files[$RANDOM % ${#files[@]} ]}
# Use file
echo "$nextfile" || break
sleep 0.5
done
rm -r dir1 dir2
I can't reproduce this behavior.
$ files=( ~/tmp/foo/${1:-*}*/${2:-*}*/* )
$ declare -p files
declare -a files='([0]="/Users/chaduffy/tmp/foo/bar/baz/qux")'
To explain why this is expected to work: Parameter expansion happens before glob expansion, so by the time glob expansion takes place, content has already been expanded. See lhunath's simplified diagram of the bash parse/expansion process for details.
A likely explanation is simply that your glob has no matches, and is evaluating to itself for that reason. This behavior can be disabled with the nullglob switch, which will give you an empty array:
shopt -s nullglob
files=(~/some/file/path/${1:-*}*/${2:-*}*/*)
declare -p files
Another note: ** only has special meaning in shells where shopt -s globstar has been run, and where this feature (added in 4.0) is available. On Mac OS X (without installation of a newer version of bash via MacPorts or similar), it doesn't exist; you'll want to use find for recursive operations. If your glob would only match if ** triggered recursion, this would explain the behavior in question.
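The order of expansions can be checked with a small experiment. The sketch below assumes bash; count_matches is a hypothetical helper and the directory tree is invented. It relies on the unquoted ${1:-*} being expanded first and the resulting pattern being globbed afterwards:

```bash
#!/usr/bin/env bash
# Hypothetical tree: two first-level and two second-level directories.
root=$(mktemp -d)
mkdir -p "$root"/dir1/{a,b} "$root"/dir2/{a,b}
touch "$root"/dir{1,2}/{a,b}/foo.txt

# An unmatched pattern should yield an empty array, not the literal pattern.
shopt -s nullglob

# count_matches [LEVEL1 [LEVEL2]]: print how many files the pattern matches.
count_matches() {
  local -a files
  # Parameter expansion runs first; the resulting pattern is then globbed.
  files=( "$root"/${1:-*}*/${2:-*}*/* )
  printf '%s\n' "${#files[@]}"
}

echo "no arguments: $(count_matches)"
echo "dir1:         $(count_matches dir1)"
echo "dir1 b:       $(count_matches dir1 b)"
echo "nonexistent:  $(count_matches nosuchdir)"
```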

Resources