How to use find with variable and wildcards - bash

I'm writing a simple program that downloads multiple images from multiple pages of a website. When trying to implement folder creation with a naming structure similar to how the website is laid out, I ran into issues. Below is a bare-bones example that replicates the behavior of my other program.
#!/bin/bash
# Sample inputs:
# http://testurl.com/post/1234
# http://testurl.com/post/5678
folder=""
if [[ $1 == *"post"* ]]; then
    folder=${1##*/}
    folder=${folder//[$'\t\r\n ']}
fi
if [[ $(find "$HOME" -name "*$folder*" -print -quit) ]]; then
    echo 'Hi'
else
    echo 'Bye'
fi
# Sample directories:
# /home/user/1234
# /home/user/0001
Everywhere I've looked tells me this should run perfectly. However, it doesn't behave as it should, and I've been at it for hours. Can anyone help me?
Bash version: GNU bash, version 5.0.3(1)-release (x86_64-pc-linux-gnu)

This simplifies the test of whether find found something, using standard grep rather than bashisms:
if find "$HOME" -type d -name "$folder" -print -quit | grep .; then
echo "Hi"
else
echo "Bye"
fi
I also changed two constraints for find:
only search for directories (-type d), so you don't get ordinary files;
only search for paths whose basename (the last component of the full path) matches ${folder} exactly, so you don't get matches for /home/user/12345 or /home/user/.emacs.d/auto-save-list/.saves-12350-localhost~.
For practical reasons (once the script is known to work), I would discard the output of grep by redirecting it to /dev/null.
If the directories are all directly in "${HOME}", you could also add -maxdepth 1 as the first argument to find (to not recurse into subdirectories).
so you end up with something like:
if find "$HOME" -maxdepth 1 -type d -name "$folder" -print -quit | grep . >/dev/null
then
echo "Hi"
else
echo "Bye"
fi
or simply use:
if [ -d "${HOME}/${folder}" ]; then
echo "Hi"
else
echo "Bye"
fi
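As a quick illustration of the parameter expansions the question relies on (my sketch, using a hypothetical URL):
url='http://testurl.com/post/1234'
folder=${url##*/}               # strip everything up to the last '/': 1234
folder=${folder//[$'\t\r\n ']}  # delete tabs, carriage returns, newlines, spaces
echo "$folder"                  # prints: 1234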

Related

find command with filename coming from bash printf builtin not working

I'm trying to write a script which lists the files in one directory and then searches for each one of those files in another directory. To deal with spaces and special characters like "[" or "]" I'm using $(printf %q "$FILENAME") as input for the find command: find /directory/to/search -type f -name $(printf %q "$FILENAME").
It works like a charm for every filename except when there are multibyte (UTF-8) characters. In that case the output of printf is an ANSI-C quoted string, i.e.: $'file name with blank spaces and quoted characters in the form of \NNN\NNN', and that string is not expanded without the $'' quoting, so find searches for a file whose name includes the literal quoting: «$'filename'».
Is there an alternative solution that lets me pass any kind of filename to find?
My script is like follows (I know some lines can be deleted, like the "RESNAME="):
#!/bin/bash
if [ -d $1 ] && [ -d $2 ]; then
    IFSS=$IFS
    IFS=$'\n'
    FILES=$(find $1 -type f )
    for FILE in $FILES; do
        BASEFILE=$(printf '%q' "$(basename "$FILE")")
        RES=$(find $2 -type f -name "$BASEFILE" -print )
        if [ ${#RES} -gt 1 ]; then
            RESNAME=$(printf '%q' "$(basename "$RES")")
        else
            RESNAME=
        fi
        if [ "$RESNAME" != "$BASEFILE" ]; then
            echo "FILE NOT FOUND: $FILE"
        fi
    done
else
    echo "Directories do not exist"
fi
IFS=$IFSS
As an answer suggested, I've used associative arrays, but with no luck; maybe I'm not using the arrays correctly, since echoing ${array[@]} returns nothing. This is the script I've written:
#!/bin/bash
if [ -d "$1" ] && [ -d "$2" ]; then
    declare -A files
    find "$2" -type f -print0 | while read -r -d $'\0' FILE;
    do
        BN2="$(basename "$FILE")"
        files["$BN2"]="$BN2"
    done
    echo "${files[@]}"
    find "$1" -type f -print0 | while read -r -d $'\0' FILE;
    do
        BN1="$(basename "$FILE")"
        if [ "${files["$BN1"]}" != "$BN1" ]; then
            echo "File not found: $BN1"
        fi
    done
fi
Don't use for loops. First, it is slower: your find has to complete before the rest of your program can run. Second, it is possible to overload the command line: the entire for command must fit in the command-line buffer.
Most importantly of all, for sucks at handling funky file names. You're going through conniptions trying to get around this. However:
find "$1" -type f -print0 | while read -r -d $'\0' FILE
will work much better. It handles file names, even file names that contain \n characters. The -print0 tells find to separate file names with the NUL character. The while read -r -d $'\0' FILE will read each file name (separated by the NUL character) into $FILE.
If you put quotes around the file name in the find command, you don't have to worry about special characters in the file names.
Your script is running find once for each file found. If you have 100 files in your first directory, you're running find 100 times.
Do you know about associative (hash) arrays in BASH? You are probably better off using associative arrays. Run find on the first directory, and store those files names in an associative array.
Then, run find (again using the find | while read syntax) for your second directory. For each file you find in the second directory, see if you have a matching entry in your associative array. If you do, you know that file is in both arrays.
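A minimal bash sketch of the approach just described (my illustration, not the answerer's code; it assumes bash 4+ for associative arrays):
declare -A seen
while IFS= read -r -d '' f; do
    seen[${f##*/}]=1
done < <(find "$1" -type f -print0)
# Process substitution (not a pipe) keeps the while loop in the current
# shell, so the array is still populated after the loop ends.
while IFS= read -r -d '' f; do
    [[ ${seen[${f##*/}]} ]] && echo "In both directories: ${f##*/}"
done < <(find "$2" -type f -print0)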
Addendum
I've been looking at the find command. It appears there's no real way to prevent it from doing pattern matching except through a lot of work (like you were doing with printf). I've tried -regex matching and using \Q and \E to remove the special meaning of pattern characters; I haven't been successful.
There comes a time that you need something a bit more powerful and flexible than shell to implement your script, and I believe this is the time.
Perl, Python, and Ruby are three fairly ubiquitous scripting languages found on almost all Unix systems and are available on other non-POSIX platforms (cough! ...Windows!... cough!).
Below is a Perl script that takes two directories, and searches them for matching files. It uses the find command once and uses associative arrays (called hashes in Perl). I key the hash to the name of my file. In the value portion of the hash, I store an array of the directories where I found this file.
I only need to run the find command once per directory. Once that is done, I can print out all the entries in the hash that contain more than one directory.
I know it's not shell, but this is one of those cases where you can spend a lot more time trying to figure out how to get the shell to do what you want than it's worth.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use File::Find;

use constant DIRECTORIES => qw( dir1 dir2 );

my %files;

#
# Perl version of the find command. You give it a list of
# directories and a subroutine for filtering what you find.
# I am basically rejecting all non-file entries, then pushing
# them into my %files hash as an array.
#
find (
    sub {
        return unless -f;
        $files{$_} = [] if not exists $files{$_};
        push @{ $files{$_} }, $File::Find::dir;
    }, DIRECTORIES
);

#
# All files are found and in the %files hash. I can then go
# through all the entries in my hash, and look for ones
# with more than one directory in the array reference.
# If there is more than one, the file is located in multiple
# directories, and I print them.
#
for my $file ( sort keys %files ) {
    if ( @{ $files{$file} } > 1 ) {
        say "File: $file: " . join ", ", @{ $files{$file} };
    }
}
Try something like this:
find "$DIR1" -printf "%f\0" | xargs -0 -i find "$DIR2" -name \{\}
How about this one-liner?
find dir1 -type f -exec bash -c 'read < <(find dir2 -name "${1##*/}" -type f)' _ {} \; -printf "File %f is in dir2\n" -o -printf "File %f is not in dir2\n"
Absolutely 100% safe regarding files with funny symbols, newlines and spaces in their name.
How does it work?
find (the main one) will scan through directory dir1 and for each file (-type f) will execute
read < <(find dir2 -name "${1##*/}" -type f)
with argument the name of the current file given by the main find. This argument is at position $1. The ${1##*/} removes everything before the last / so that if $1 is path/to/found/file the find statement is:
find dir2 -name "file" -type f
This outputs something if file is found, otherwise has no output. That's what is read by the read bash command. read's exit status is true if it was able to read something, and false if there wasn't anything read (i.e., in case nothing is found). This exit status becomes bash's exit status which becomes -exec's status. If true, the next -printf statement is executed, and if false, the -o -printf part will be executed.
If your dirs are given in variables $dir1 and $dir2 do this, so as to be safe regarding spaces and funny symbols that could occur in $dir2:
find "$dir1" -type f -exec bash -c 'read < <(find "$0" -name "${1##*/}" -type f)' "$dir2" {} \; -printf "File %f is in $dir2\n" -o -printf "File %f is not in $dir2\n"
Regarding efficiency: this is of course not an efficient method at all! the inner find will be executed as many times as there are found files in dir1. This is terrible, especially if the directory tree under dir2 is deep and has many branches (you can rely a little bit on caching, but there are limits!).
Regarding usability: you have fine-grained control on how both find's work and on the output, and it's very easy to add many more tests.
So, hey, tell me how to compare files from two directories? Well, if you agree to lose a little bit of control, this will be the shortest and most efficient answer:
diff dir1 dir2
Try it, you'll be amazed!
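If you also want it recursive and terse, diff's -r (recurse into subdirectories) and -q (report only whether files differ) options are worth knowing:
diff -rq dir1 dir2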
Since you are only using find for its recursive directory following, it will be easier to simply use the globstar option in bash. (You're using associative arrays, so your bash is new enough).
#!/bin/bash
shopt -s globstar
declare -A files
if [[ -d $1 && -d $2 ]]; then
    for f in "$2"/**/*; do
        [[ -f "$f" ]] || continue
        BN2=$(basename "$f")
        files["$BN2"]=$BN2
    done
    echo "${files[@]}"
    for f in "$1"/**/*; do
        [[ -f "$f" ]] || continue
        BN1=$(basename "$f")
        if [[ ${files[$BN1]} != $BN1 ]]; then
            echo "File not found: $BN1"
        fi
    done
fi
** will match zero or more directories, so $1/**/* will match all the files and directories in $1, all the files and directories in those directories, and so forth all the way down the tree.
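A small demo of the difference (my sketch):
shopt -s globstar
printf '%s\n' ./*.txt    # .txt files in the current directory only
printf '%s\n' ./**/*.txt # .txt files at any depth below it (prints the
                         # literal pattern if nothing matches, unless
                         # nullglob is set)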
If you want to use associative arrays, here's one possibility that will work well with files with all sorts of funny symbols in their names (this script does more than needed just to make the point, but it is usable as is – just remove the parts you don't want and adapt it to your needs):
#!/bin/bash

die() {
    printf "%s\n" "$@"
    exit 1
}

[[ -n $1 ]] || die "Must give two arguments (none found)"
[[ -n $2 ]] || die "Must give two arguments (only one given)"

dir1=$1
dir2=$2

[[ -d $dir1 ]] || die "$dir1 is not a directory"
[[ -d $dir2 ]] || die "$dir2 is not a directory"

declare -A dir1files
declare -A dir2files

while IFS= read -r -d '' file; do
    dir1files[${file##*/}]=1
done < <(find "$dir1" -type f -print0)

while IFS= read -r -d '' file; do
    dir2files[${file##*/}]=1
done < <(find "$dir2" -type f -print0)

# Which files in dir1 are in dir2?
for i in "${!dir1files[@]}"; do
    if [[ -n ${dir2files[$i]} ]]; then
        printf "File %s is both in %s and in %s\n" "$i" "$dir1" "$dir2"
        # Remove it from the dir2 hash
        unset dir2files["$i"]
    else
        printf "File %s is in %s but not in %s\n" "$i" "$dir1" "$dir2"
    fi
done

# Which files in dir2 are not in dir1?
# Since I unset them from the dir2files hash table, the only keys remaining
# correspond to files in dir2 but not in dir1
for i in "${!dir2files[@]}"; do
    printf "File %s is in %s but not in %s\n" "$i" "$dir2" "$dir1"
done
Remark. The identification of files is only based on their filenames, not their contents.

List directories not containing certain files?

I used this command to find all the directories containing .mp3 in the current directory, and filtered out only the directory names:
find . -iname "*.mp3" | sed -e 's!/[^/]*$!!' -e 's!^\./!!' | sort -u
I now want the opposite, but I'm finding it a little harder. I can't just add '!' to the find command, since that only excludes .mp3 files when printing; it doesn't find the directories that contain no .mp3.
I googled this and searched on stackoverflow and unix.stackexchange.com.
Here is what I have tried so far, along with the error each attempt returns:
#!/bin/bash
find . -type d | while read dir
do
    if [[! -f $dir/*.mp3 ]]
    then
        echo $dir
    fi
done

/home/user/bin/try.sh: line 5: [[!: command not found

#!/bin/bash
find . -type d | while read dir
do
    if [! -f $dir/*.mp3 ]
    then
        echo $dir
    fi
done

/home/user/bin/try.sh: line 5: [!: command not found

#!/bin/bash
find . -type d | while read dir
do
    if [[! -f "$dir/*.mp3" ]]
    then
        echo $dir
    fi
done

/home/user/bin/try.sh: line 5: [!: command not found
I'm thinking it has to do with multiple arguments for the test command.
Since I'm testing all the directories the variable is going to change, and I use a wildcard for the filenames.
Any help is much appreciated. Thank You.
[ "$(echo $dir/*.mp3)" = "$dir/*.mp3" ]
should work.
Or simply add a space between '[' and '!'
A method that is probably significantly faster:
if [ -n "$(find "$dir" -name '*.mp3' -print -quit)" ]; then
    : # there are mp3 files in there
else
    echo "$dir" # no mp3s
fi
(Note that -quit needs an explicit -print before it to produce any output; find's own exit status does not reflect whether anything matched.)
Okay, I solved it myself by using a counter. I don't know how efficient it is, but it works. I know it can be made better. Please feel free to critique.
find . -type d | while read dir
do
    count=$(ls -1 "$dir"/*.mp3 2>/dev/null | wc -l)
    if [ "$count" = 0 ]
    then
        echo "$dir"
    fi
done
This prints all directories not containing MP3s. It also shows sub-directories, thanks to find printing directories recursively.
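A glob-based sketch of the same idea (my variation, not the original answer): with nullglob set, an unmatched glob expands to nothing, so no ls parsing is needed at all:
shopt -s nullglob
find . -type d | while read -r dir
do
    mp3s=("$dir"/*.mp3)
    if (( ${#mp3s[@]} == 0 )); then
        echo "$dir"
    fi
done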
I ran a script to automatically download cover art for my mp3 collection. It put a file called "cover.jpg" in the directory for each album for which it could retrieve the art. I needed to check for which albums the script had failed - i.e. which CDs (directories) did not contain a file called cover.jpg. This was my effort:
find . -maxdepth 1 -mindepth 1 -type d | while read dir; do [[ ! -f $dir/cover.jpg ]] && echo "$dir has no cover art"; done
The maxdepth 1 stops the find command from descending into a hidden directory which my WD My Cloud NAS server had created for each album, in which it placed a default generic disc image. (This got cleared during the next scan.)
Edit: cd to the MP3 directory and run it from there, or change the . in the command above to the path of that directory.

Suppress output to StdOut when piping echo

I'm making a bash script that crawls through a directory and writes all files of a certain type into a text file. I've got that working; it just also prints a bunch of output I don't want (the names of the files) to the console.
Here's the relevant code so far, tmpFile is the file I'm writing to:
for DIR in `find . -type d` # Find problem directories
do
    for FILE in `ls "$DIR"` # Loop through problems in directory
    do
        if [[ `echo ${FILE} | grep -e prob[0-9]*_` ]]; then
            `echo ${FILE} >> ${tmpFile}`
        fi
    done
done
The files I'm putting into the text file are in the format described by the regex prob[0-9]*_ (something like prob12345_01)
Where I pipe the output from echo ${FILE} into grep, it still outputs to stdout, something I want to avoid. I think it's a simple fix, but it's escaping me.
All this can be done in one single find command. Consider this:
find . -type f -name "prob[0-9]*_*" -exec echo {} >> ${tmpFile} \;
EDIT:
Even simpler (thanks to @GlennJackman):
find . -type f -name "prob[0-9]*_*" >> $tmpFile
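One caveat worth noting: -name takes a glob, not a regex, so in prob[0-9]*_* the [0-9] matches exactly one digit and each * matches any string of characters; that is close to, but not identical to, the regex prob[0-9]*_ from the question.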
To answer your specific question, you can pass -q to grep for silent output.
if echo "hello" | grep -q el; then
echo "found"
fi
But since you're already using find, this can be done with just one command:
find . -regex ".*prob[0-9]*_.*" -printf '%f\n' >> ${tmpFile}
find's regex is a match on the whole path, which is why the leading and trailing .* is needed.
The -printf '%f\n' prints the file name without directory, to match what your script is doing.
What you want to do is read the output of the find command; for every entry find returned, get all (*) the files under that location, check whether each filename matches the pattern you want, and if it matches, add it to the tmpfile:
while read -r dir; do
    for file in "$dir"/*; do # will not match hidden files, unless dotglob is set
        if [[ "$file" =~ prob[0-9]*_ ]]; then
            echo "$file" >> "$tmpfile"
        fi
    done
done < <(find . -type d)
However, find can do all of that alone; anubhava got there first ;) so see his answer for how that's done.

UNIX: find a file in directories above $PWD

I want to find a file with a certain name, but search in the directories above the current one, instead of below.
I'd like something similar to: (except functional)
$ cd /some/long/path/to/my/dir/
$ find -maxdepth -1 -name 'foo'
/some/long/path/to/foo
/some/foo
Shell scripts or one-liners preferred.
In response to several questions: the difference between the example above and the real find is that the search proceeds upward from the current directory (and -maxdepth doesn't take a negative argument).
Interesting question, so I'll try to give an interesting answer :)
find `( CP=${PWD%/*}; while [ -n "$CP" ] ; do echo $CP; CP=${CP%/*}; done; echo / )` -mindepth 1 -maxdepth 1 -type f -name 'foo'
To elaborate a bit: the while loop generates the list of paths that are ancestors of the current directory. The loop never generates /, so I add an extra 'echo /' to cover that.
Finally, the enclosing "find" command is fairly basic usage.
You could use Parameter Expansion:
path="/some/long/path/to/my/dir"
while [ -n "$var" ]
do
find $path -maxdepth 1 -name 'foo'
path="${var%/*}"
done
This works, but it's not as simple as I hoped.
FILE=foo
DIR=$PWD
while [[ $DIR != '/' ]]; do
    if [[ -e $DIR/$FILE ]]; then
        echo "$DIR/$FILE"
    fi
    DIR=$(dirname "$DIR")
done
If you mean exclude the current dir:
find / -name 'foo' ! -iwholename "$PWD*"
If you mean direct matches in any directory along the trail, this would work, but my bash-fu is not enough to easily generate the list of dirs:
find /some/ /some/long /some/long/path/ /some/long/path/to/ /some/long/path/to/my -maxdepth 1 -name 'foo'
So all we need is a method to alter /some/long/path/to/my/dir to
/some/ /some/long /some/long/path/ /some/long/path/to/ /some/long/path/to/my
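A sketch of one way to build that list (my addition; it assumes an absolute $PWD with no trailing slash):
path=$PWD
dirs=()
while [ -n "$path" ]; do
    path=${path%/*}        # drop the last path component
    dirs+=("${path:-/}")   # an empty result means we reached the root
done
find "${dirs[@]}" -maxdepth 1 -name 'foo'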

How to check the extension of a filename in a bash script?

I am writing a nightly build script in bash.
Everything is fine and dandy except for one little snag:
#!/bin/bash
for file in "$PATH_TO_SOMEWHERE"; do
    if [ -d $file ]
    then
        # do something directory-ish
    else
        if [ "$file" == "*.txt" ] # this is the snag
        then
            # do something txt-ish
        fi
    fi
done
My problem is determining the file extension and then acting accordingly. I know the issue is in the if-statement, testing for a txt file.
How can I determine if a file has a .txt suffix?
Make
if [ "$file" == "*.txt" ]
like this:
if [[ $file == *.txt ]]
That is, double brackets and no quotes.
The right side of == is a shell pattern.
If you need a regular expression, use =~ then.
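For example, the regex version of the same test would look like this (a sketch):
if [[ $file =~ \.txt$ ]]; then
    echo "$file ends in .txt"
fi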
I think you want to say "Are the last four characters of $file equal to .txt?" If so, you can use the following:
if [ "${file: -4}" == ".txt" ]
Note that the space between file: and -4 is required, as the ':-' modifier means something different.
You could also do:
if [ "${FILE##*.}" = "txt" ]; then
# operation for txt files here
fi
You just can't be sure on a Unix system that a .txt file truly is a text file. Your best bet is to use "file". Maybe try using:
file -ib "$file"
Then you can use a list of MIME types to match against or parse the first part of the MIME where you get stuff like "text", "application", etc.
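For instance, a sketch of dispatching on the MIME type (file -b --mime-type prints just the type, e.g. text/plain):
case "$(file -b --mime-type "$file")" in
    text/*)        echo "some kind of text" ;;
    application/*) echo "some kind of application data" ;;
    *)             echo "something else" ;;
esac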
You can use the "file" command if you actually want to find out information about the file rather than rely on the extensions.
If you feel comfortable with using the extension you can use grep to see if it matches.
The correct way to extract the extension from a filename in Linux is:
${filename##*.}
Example: printing the extensions of all files in a directory:
for fname in $(find . -maxdepth 1 -type f) # only regular files in the current dir
do
    echo "${fname##*.}" # print extension
done
Similar to 'file', use the slightly simpler 'mimetype -b', which will work no matter the file extension.
if [ "$(mimetype -b "$MyFile")" == "text/plain" ]
then
    echo "this is a text file"
fi
Edit: you may need to install libfile-mimeinfo-perl on your system if mimetype is not available
I wrote a bash script that looks at the type of a file and then copies it to a location; I use it to look through the videos I've watched online from my Firefox cache:
#!/bin/bash
# flvcache script

CACHE=~/.mozilla/firefox/xxxxxxxx.default/Cache
OUTPUTDIR=~/Videos/flvs
MINFILESIZE=2M

for f in `find $CACHE -size +$MINFILESIZE`
do
    a=$(file "$f" | cut -f2 -d ' ')
    o=$(basename "$f")
    if [ "$a" = "Macromedia" ]
    then
        cp "$f" "$OUTPUTDIR/$o"
    fi
done

nautilus "$OUTPUTDIR" &
It uses similar ideas to those presented here, hope this is helpful to someone.
I guess that '$PATH_TO_SOMEWHERE' is something like '<directory>/*'.
In this case, I would change the code to:
find <directory> -maxdepth 1 -type d -exec ... \;
find <directory> -maxdepth 1 -type f -name "*.txt" -exec ... \;
If you want to do something more complicated with the directory and text file names, you could:
find <directory> -maxdepth 1 -type d | while read -r dir; do echo "$dir"; ...; done
find <directory> -maxdepth 1 -type f -name "*.txt" | while read -r txtfile; do echo "$txtfile"; ...; done
If you have spaces in your file names, use NUL-delimited output so xargs can split on it safely:
find <directory> -maxdepth 1 -type d -print0 | xargs -0 ...
find <directory> -maxdepth 1 -type f -name "*.txt" -print0 | xargs -0 ...
Credit to @Jox for the majority of this answer, though I found the (js) pattern was also matching .json files, so I added an end-of-line anchor to match the extension more strictly.
$file does not need to be quoted because [[ ]] does not word-split its operands, so spaces aren't an issue (credit: Hontvári Levente).
if [[ $file =~ .*\.(js$|json$) ]]; then
    echo "The extension of '$file' matches .js|.json";
fi
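Since =~ matches anywhere in the string by default, the leading .* is redundant and the anchor can move outside the group; this equivalent form (a sketch) is a bit tidier:
if [[ $file =~ \.(js|json)$ ]]; then
    echo "The extension of '$file' matches .js|.json"
fi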
Another important detail: don't nest an if inside else:
else
    if [ "$file" == "*.txt" ] # this is the snag
    then
        # do something txt-ish
    fi
Instead, use elif:
elif [ "$file" == "*.txt" ] # this is the snag
then
    # do something txt-ish
fi
else is used when there's nothing else left to check; just because you can do something doesn't necessarily mean you should always do it.
