bash `find` escaping - bash

I need to find all of the TIFFs in a directory, recursively, but ignore some artifacts (basically all hidden files) that also happen to end with ".tif". This command:
find . -type f -name '*.tif' ! -name '.*'
works exactly how I want it on the command line, but inside a bash script it doesn't find anything. I've tried replacing ! with -and -not and, I think, just about every escaping permutation I can think of and/or recommended by the googlesphere, e.g. .\*, leaving out single quotes, etc. Obviously I'm missing something; any help is appreciated.
EDIT: here's the significant part of the script; the directory it's doing the find on is parameterized, but I've been debugging with it hard-coded; it makes no difference:
#!/bin/bash
RECURSIVE=1
DIR=$1
#get the absolute path to $DIR
DIR=$(cd $DIR; pwd)
FIND_CMD="find $DIR -type f -name '*.tif' ! -name '.*'"
if [ $RECURSIVE == 1 ]; then
FIND_CMD="$FIND_CMD -maxdepth 1"
fi
for in_img in $($FIND_CMD | sort); do
echo $in_img # for debugging
#stuff
done

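It was related to having the expression stored in a variable: when $FIND_CMD is expanded unquoted, the single quotes inside the string are not treated as quoting, so find receives the pattern '*.tif' with the quote characters included and matches nothing. The solution was to use eval, which makes the shell parse the string again. Everything is the same as above except at the start of the for loop: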
for in_img in $(eval $FIND_CMD | sort); do
#stuff
done
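To avoid eval entirely, the command can be kept in a bash array instead of a string: each argument stays a separate word, and the quotes are consumed when the array is written rather than passed on to find. The sketch below is minimal and rests on a few assumptions: `list_tifs` is a hypothetical helper name, it limits the depth only when the recursive flag is not 1 (which appears to be the intent; the script's test looks inverted), and it needs a `sort` that supports `-z`, such as GNU sort.

```shell
#!/bin/bash
# Sketch: keep the find command in an array instead of a string, so the
# quoted patterns reach find intact and no eval is needed.
# list_tifs is a hypothetical helper: list_tifs DIR [RECURSIVE(1|0)]
list_tifs() {
    local dir recursive=${2:-1} in_img
    dir=$(cd "$1" && pwd) || return 1   # absolute path, as in the question

    local find_cmd=( find "$dir" )
    # Limit the depth only when NOT recursing. -maxdepth goes before the
    # tests because GNU find warns when it follows them.
    if [ "$recursive" != 1 ]; then
        find_cmd+=( -maxdepth 1 )
    fi
    find_cmd+=( -type f -name '*.tif' ! -name '.*' -print0 )

    # NUL-delimited results survive spaces and newlines in file names.
    while IFS= read -r -d '' in_img; do
        printf '%s\n' "$in_img"    # the question's "#stuff" would go here
    done < <("${find_cmd[@]}" | sort -z)
}

# Usage: list_tifs /path/to/images 1
```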

To not find hidden files you can use the following:
find . \( ! -regex '.*/\..*' \) -type f -name "*.tif"
It tests the whole path and, through the ! inside the parentheses, leaves out every path with a component that begins with a dot, i.e. hidden files as well as everything inside hidden directories.
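The regex still makes find walk into hidden directories and then discard everything it finds there. As an alternative sketch with the same goal, the command below prunes every entry whose name starts with a dot, so hidden directories are never entered and hidden files are never printed; `find_visible_tifs` is a hypothetical helper name. The pattern is `.?*` rather than `.*` because the starting point `.` itself would match `.*` and the whole search would be pruned.

```shell
#!/bin/bash
# Sketch: prune every entry whose name starts with a dot (hidden directories
# are never entered, hidden files are never printed), then print the TIFFs.
# '.?*' instead of '.*' so the starting point "." is not pruned itself.
find_visible_tifs() {
    (cd "$1" && find . -name '.?*' -prune -o -type f -name '*.tif' -print)
}

# Usage: find_visible_tifs /path/to/images
```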

Related

Getting the contents of a directory excluding everything inside .git in bash

I need to get the number of the contents of a directory that is a git repository.
I have to get the number of:
1) Other directories inside the directory I am currently iterating (and the other sub-directories inside them if they exist)
2) .txt files inside the directory and its sub-directories
3) All the non-txt files inside the directory and its sub-directories
In all the above cases I must ignore the .git directory, along with all the files and directories that are inside of it.
Also, I must use bash script exclusively; I can't use another programming language.
Right now I am using the following commands to achieve this:
To get all the .txt files I use: find . -type f \( -name "*.txt" \). There are no .txt files inside .git, so this is working.
To get all the non-txt files I use: find . -type f \( ! -name "*.txt" \). The problem is that I also get all the files from .git and I don't know how to ignore them.
To get all the directories and sub-directories I use: find . -type d. I don't know how to ignore the .git directory and its sub-directories.
The easy way is to just add these extra tests:
find . ! -path './.git/*' ! -path ./.git -type f -name '*.txt'
The problem with this is ./.git is still traversed, unnecessarily, which takes time.
Instead, -prune can be used. -prune is not a test (like -path or -type); it's an action, meaning "don't descend into the current path if it's a directory". It must be kept separate from the print action, joined to it with -o:
# task 1
find . -path './.git' -prune -o -type f -name '*.txt' -print
# task 2
find . -path './.git' -prune -o -type f ! -name '*.txt' -print
# task 3
find . -path './.git' -prune -o -type d -print
If -print isn't specified, ./.git is also printed as the default action.
I used -path ./.git, because you said "the .git directory". If for some reason there are other .git directories in the tree, they will be traversed and printed. To ignore all directories in the tree named .git, replace -path ./.git with -name .git.
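Since the question asks for the numbers rather than the lists, the pruned commands can be fed into a counter. This is a minimal sketch, assuming bash (for process substitution) and a find with -print0 (GNU and BSD find both have it); `count_find` is a hypothetical helper that takes the repository directory followed by the tests to apply.

```shell
#!/bin/bash
# Sketch: count what the pruned find commands above would print.
# count_find DIR TEST...   (hypothetical helper)
# Results are read NUL-delimited, so a name containing a newline is still
# counted once; a plain "| wc -l" would count it more than once.
count_find() {
    local dir=$1 n=0 entry
    shift
    while IFS= read -r -d '' entry; do
        n=$((n + 1))
    done < <(cd "$dir" && find . -path ./.git -prune -o "$@" -print0)
    echo "$n"
}

# Usage for the question's three cases:
#   count_find . -type d ! -path .          # sub-directories (excluding "." itself)
#   count_find . -type f -name '*.txt'      # .txt files
#   count_find . -type f ! -name '*.txt'    # all other files
```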
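Sometimes writing a bash loop is clearer than a one-liner. The test has to cover the contents of .git as well as the directory itself, and reading NUL-delimited output keeps names with spaces intact:
while IFS= read -r -d '' f; do
if [[ $f == ./.git || $f == ./.git/* ]]; then
echo "skipping $f"
else
echo "do something with $f"
fi
done < <(find . -print0)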

How to randomly name file when using find exec in a bash script?

When it comes to quickly converting a bunch of files and randomly renaming them I use a pretty simple way to do so with a for loop:
for i in *; do convert [...] $i ../output/$RANDOM.jpg; done
Easy as that. The details of what ImageMagick does here aren't important; it works as intended. The question is only about how to handle the bash side.
In my current case the folder contains not only photos but also subfolders with more photos. The expected behavior is again that all photos are randomly renamed and the output files are merged into a single folder.
Since I don't know a way to loop recursively with for, I use a find construct here.
find . \( -iname "*.jpg" -or -iname "*.png" \) -exec convert [...] {} ../output/$RANDOM.jpg \;
The problem is that $RANDOM is expanded only once, when the shell parses the find command line, so it stays the same for the whole run and the images get overwritten again and again. In fact, the output folder ends up containing only one image: the one that was processed last.
So the question is:
How do I get the $RANDOM variable to change with each new file?
Kind regards!
Throw it into a loop.
find . \( -iname "*.jpg" -or -iname "*.png" \) -type f -print0 |
while IFS= read -r -d '' f
do convert [...] "$f" ../output/$RANDOM.jpg # copied mostly from your find above
done
The -print0 and read -d '' are unnecessary if you never have embedded newlines in your filenames.
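One more thing to watch for: $RANDOM only ranges from 0 to 32767, so in a large batch two files can still draw the same number and the later one silently overwrites the earlier. A minimal sketch that retries until it finds an unused name is below; `pick_name` and `copy_randomly` are hypothetical helpers, and cp stands in for the question's `convert [...]` so the sketch runs without ImageMagick (PNG input is therefore only copied, not converted).

```shell
#!/bin/bash
# Sketch: avoid $RANDOM collisions by retrying until the name is unused.
# pick_name OUTDIR   (hypothetical helper) prints a free OUTDIR/<random>.jpg
pick_name() {
    local name
    while :; do
        name="$1/$RANDOM$RANDOM.jpg"
        [ -e "$name" ] || break
    done
    printf '%s\n' "$name"
}

# copy_randomly SRC OUT   (hypothetical helper); cp stands in for "convert [...]".
copy_randomly() {
    local src=$1 out=$2 f
    mkdir -p "$out"
    while IFS= read -r -d '' f; do
        cp "$f" "$(pick_name "$out")"
    done < <(find "$src" -type f \( -iname '*.jpg' -o -iname '*.png' \) -print0)
}

# Usage: copy_randomly . ../output
```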
Don't use find at all; just use the globstar option.
shopt -s globstar nullglob nocaseglob # bash 4+; nocaseglob mirrors find's -iname
for f in **/*.jpg **/*.png; do
convert [...] "$f" ../output/$RANDOM.jpg
done
I would go with a shell loop as detailed in the other answers, but it's still useful to know how to run arbitrary shell code like $RANDOM in a find -exec command. You do it by running a shell:
find . \( -iname "*.jpg" -or -iname "*.png" \) \
-exec bash -c 'convert [...] "$1" "../output/$RANDOM.jpg"' _ {} \;
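A variant of the same idea, assuming a find that supports `-exec ... {} +` (POSIX specifies it) and -iname (GNU and BSD find provide it): find then hands many file names to one bash process instead of starting a shell per file, and the loop inside still evaluates $RANDOM once per file. The output directory is passed as an extra argument; `batch_copy` is a hypothetical name, and cp again stands in for `convert [...]`.

```shell
#!/bin/bash
# Sketch: one bash per batch of files; $RANDOM is evaluated once per
# iteration of the inner loop. cp stands in for "convert [...]".
# batch_copy SRC OUT   (hypothetical helper)
batch_copy() {
    local src=$1 out=$2
    mkdir -p "$out"
    find "$src" -type f \( -iname '*.jpg' -o -iname '*.png' \) -exec bash -c '
        out=$1; shift                 # first argument: output directory
        for f; do                     # remaining arguments: the found files
            cp "$f" "$out/$RANDOM$RANDOM.jpg"
        done' _ "$out" {} +
}

# Usage: batch_copy . ../output
```

As with any $RANDOM-based naming, collisions are unlikely with two concatenated values but not impossible.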

find with nested command reading blacklist

I have a script that recursively searches all directories for specific files or specific file endings.
I want to save the paths of these files in a descriptor file.
It looks, for example, like this:
./org/apache/commons/.../file1.pom
./org/apache/commons/.../file1.jar
./org/apache/commons/.../file1.zip
and so on.
In a blacklist, I describe which paths and file endings I want to ignore:
! -path "./.cache/*" ! -path "./org/*" ! -name "*.sha1" ! -name"*.lastUpdated"
and so on.
Now I want to read this blacklist file during the search to ignore the described files:
find . -type f $(cat blacklist) > artifact.descriptor
Unfortunately, the blacklist is not applied during the search.
When:
echo "find . -type f $(cat blacklist) > artifact.descriptor"
Result is as expected:
find . -type f ! -path "./.cache/*" ! -path "./org/*" ! -name "*.sha1" ! -name"*.lastUpdated" > artifact.descriptor
But it does not work or exclude the described files.
I tried the following command and it works, but I want to know why it doesn't work with find alone.
find . -type f | grep -vf $blacklist > artifact.descriptor
Hopefully someone can explain it to me :)
Thanks a lot.
As tripleee suggests, storing a command in a variable is generally considered bad practice because it does not handle all the corner cases.
However, you can use eval as a workaround:
/tmp/test$ ls
blacklist test.a test.b test.c
/tmp/test$ cat blacklist
-not -name *.c -not -name *.b
/tmp/test$ eval "find . -type f "`cat blacklist`
./test.a
./blacklist
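In your case I think it fails because the quotes in your blacklist file are treated as literal characters rather than as quoting around the patterns. Removing them, as below, can work, but only as long as no pattern happens to match a file: the unquoted words from $(cat blacklist) still undergo pathname expansion, so ./org/* is expanded by the shell whenever ./org contains files. (The *.c in the eval example above is likewise expanded to test.c before find sees it.)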
! -path ./.cache/* ! -path ./org/* ! -name *.sha1 ! -name *.lastUpdated
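A way around both the quoting and the globbing problem, sketched under the assumption that the blacklist holds unquoted, whitespace-free words like the line above: split the file into a bash array with pathname expansion switched off (`set -f`), then pass the array to find. `read_blacklist` and `blacklist_args` are hypothetical names, and the sketch turns globbing back on unconditionally afterwards.

```shell
#!/bin/bash
# Sketch: split the blacklist into an array with pathname expansion switched
# off, so patterns such as ./org/* reach find unexpanded and no eval is needed.
# Assumes the file holds unquoted, whitespace-free words, e.g.
#   ! -path ./.cache/* ! -path ./org/* ! -name *.sha1 ! -name *.lastUpdated
# read_blacklist FILE  (hypothetical helper; fills the global array blacklist_args)
read_blacklist() {
    set -f                          # no globbing while the words are split
    blacklist_args=( $(< "$1") )    # split on whitespace, including newlines
    set +f
}

# Usage:
#   read_blacklist blacklist
#   find . -type f "${blacklist_args[@]}" > artifact.descriptor
```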

Expanding asterisk in bash

I'm trying to run find, and exclude several directories listed in an array. I'm finding some weird behavior when it's expanding, though, which is causing me issues:
~/tmp> skipDirs=( "./dirB" "./dirC" )
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
./dirC/bar.txt ./dirA/bar.txt
This did not skip dirC as I would have expected. The problem is that the quotes in the printf format end up as literal characters around "./dirC/*".
~/tmp> set -x
+ set -x
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
+++ printf -- '-o -path "%s/*" ' ./dirC
++ find . -name 'bar*' -not '(' -path './dirB/*' -o -path '"./dirC/*"' ')' -prune
+ bars='./dirC/bar.txt
./dirA/bar.txt'
+ echo ./dirC/bar.txt ./dirA/bar.txt
./dirC/bar.txt ./dirA/bar.txt
If I try to remove the quotes in the $(printf ...), then the * gets expanded immediately, which also gives the wrong results. Finally, if I remove the quotes and try to escape the *, then the \ escape character gets included as part of the filename in the find, and that does not work either. I'm wondering why the above does not work, and what would work? I'm trying to avoid using eval if possible, but currently I'm not seeing a way around it.
Note: This is very similar to: Finding directories with find in bash using a exclude list, however, the posted solutions to that question seem to have the issues I listed above.
The safe approach is to build your array explicitly:
#!/bin/bash
skipdirs=( "./dirB" "./dirC" )
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
skipdirs_args+=( -o -type d -path "$i" )
done
find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*'
I slightly modified the logic of your find, since it had a small logic error. Your command was:
find -name 'bar*' -not stuff_to_prune_the_dirs
How does find proceed? It walks the file tree, and only when it finds a file (or directory) that matches bar* does it apply the -not ... part. That's really not what you want: dirB and dirC don't match bar*, so your -prune is never going to be applied to them!
Look at this instead:
find . \! \( -type d -path './dirA' -prune \)
Here find completely prunes the directory ./dirA and prints everything else. It's among everything else that you want to apply the filter -name 'bar*', so the order is very important! There's a big difference between this:
find . -name 'bar*' \! \( -type d -path './dirA' -prune \)
and this:
find . \! \( -type d -path './dirA' -prune \) -name 'bar*'
The first one doesn't work as expected at all! The second one is fine.
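A small demonstration of the difference, run in a throwaway tree with a bar.txt in each of dirA and dirB; `demo_order` is a hypothetical name. Only the second ordering keeps ./dirA/bar.txt out of the results, because in the first one the -name test fails on ./dirA itself and the -prune is never reached.

```shell
#!/bin/bash
# Sketch: compare the two orderings in a throwaway directory tree.
demo_order() {
    local tmp
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/dirA" "$tmp/dirB"
    touch "$tmp/dirA/bar.txt" "$tmp/dirB/bar.txt"
    (
        cd "$tmp" || exit 1
        echo "name test first:"
        find . -name 'bar*' \! \( -type d -path './dirA' -prune \) | sort
        echo "prune first:"
        find . \! \( -type d -path './dirA' -prune \) -name 'bar*' | sort
    )
    rm -rf "$tmp"
}

demo_order
```

Running it should print:

    name test first:
    ./dirA/bar.txt
    ./dirB/bar.txt
    prune first:
    ./dirB/bar.txt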
Notes.
I'm using \! instead of -not because \! is POSIX, while -not is an extension not specified by POSIX. You could argue that -path is not portable either (it was only added to POSIX in the 2008 edition), so using -not hardly matters. That's a detail; use whatever you like.
You had to use some dirty trick to build your commands to skip your dirs, as you had to consider the first term separately from the others. By initializing the array with -false, I don't have to consider any terms specially.
I'm specifying -type d so that I'm sure I'm pruning directories.
Since my pruning really applies to the directories, I don't have to include wildcards in my exclude terms. This is funny: your problem that seemingly is about wildcards that you can't handle disappears completely when you use find appropriately as explained above.
Of course, the method I gave really applies with wildcards too. For example, if you want to exclude/prune all subdirectories called baz inside subdirectories called foo, the skipdirs array given by
skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
will work fine!
The issue here is that the quotes you are using on "%s/*" aren't doing what you think they are.
That is to say, you think you need the quotes on "%s/*" to prevent the results of the printf from being globbed; however, that isn't what is happening. Try the same thing without the directory separator and with files that start and end with double quotes and you'll see what I mean.
$ ls
"dirCfoo"
$ skipDirs=( "dirB" "dirC" )
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirCfoo"
$ rm '"dirCfoo"'
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirC*"
See what I mean? The quotes aren't being handled specially by the shell. They just happen not to glob in your case.
This issue is part of why storing command arguments in a string, as discussed at http://mywiki.wooledge.org/BashFAQ/050, doesn't work.
To do what you want here I believe you need to create the find arguments array manually.
sD=(-path /dev/null)
for dir in "${skipDirs[@]}"; do
sD+=(-o -path "$dir")
done
and then expand "${sD[@]}" on the find command line (-not \( "${sD[@]}" \) or so).
And yes, I believe this makes the answer you linked to incorrect (though the other answer might work for file names without whitespace and the like, because of the array indirection that is going on).
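Putting that array to work with the usual -prune idiom, as a sketch: `find_bars` is a hypothetical name, the directories to skip are passed as arguments, and `-path /dev/null` is only there as a term that can never match under `find .`, so every real directory can be appended with a uniform `-o`.

```shell
#!/bin/bash
# Sketch: build the exclusion terms as array elements and prune them.
# find_bars DIR...   (hypothetical helper; run from the tree's top directory)
find_bars() {
    local dir
    local sD=( -path /dev/null )      # never matches; seeds the -o chain
    for dir in "$@"; do
        sD+=( -o -path "$dir" )
    done
    # Directories matching sD are pruned; everything else named bar* is printed.
    find . \( "${sD[@]}" \) -prune -o -name 'bar*' -print
}

# Usage: skipDirs=( "./dirB" "./dirC" ); find_bars "${skipDirs[@]}"
```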

Not expanding asterisk by shell - excluding paths from find

I don't code in Bash daily. I'm trying to implement a small piece of functionality: the user defines an array of directories or files to omit in a find command. Unfortunately, I have a problem with the shell expanding the asterisk and other metacharacters (* is expanded during concatenation). My code is:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[@]}"; do
directories="$directories ! -path \"./${excluded[$cnt]}\""
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
Unfortunately, I still see excluded directories in results.
EDIT:
This is not a duplicate of an existing question. I'm not asking how to exclude directories in find; I have a problem with metacharacters like "*" being expanded when I pass variables to the find command. E.g., I have an almost working solution below:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[@]}"; do
directories="$directories ! -path ./${excluded[$cnt]}"
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
It works, but the problem is when, e.g., directory c contains more than one file. In that case, the asterisk is replaced by the full file paths. Consequently I get an error:
find: paths must precede expression: ./subdirectory/c/fgo.txt
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Why? Because the asterisk has been expanded to the full file names:
! -path ./subdirectory/a/aaa.txt ! -path ./subdirectory/b/dfdji.txt ! -path ./subdirectory/c/asd.txt ./subdirectory/c/fgo.txt
My question is: how to avoid such situation?
You'll want to use the -prune switch in find.
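Here's an example (I found this on Stack Overflow itself):
find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -o -print
This omits dir1, dir2 and dir3. Note the ./ prefix: under find ., every path starts with ./, so -path dir1 alone would never match.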
Source: https://stackoverflow.com/a/4210072/1220089
My initial thought is to use double quotes to prevent the expansion:
for i in $(find . -type f -name "*.txt" "$directories" | sed 's#\./#\.\./up/#g'); do
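but this of course fails, because find then receives the whole string as a single argument. You can accomplish the same effect by keeping each find argument as its own array element and quoting the patterns where the array is defined:
excluded=( ! -path './subdirectory/a/*'
! -path './subdirectory/b/*'
)
for i in $(find . -type f -name "*.txt" "${excluded[@]}" | sed ...); do
...
done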
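A fuller sketch of the same idea for the question's loop, assuming the exclusions are relative to the current directory; `list_new_paths` is a hypothetical name, and the sed rewrite is replaced by a parameter expansion that only handles the leading ./ (the question's sed replaces every ./ in the path). Reading NUL-delimited output also avoids the word splitting and globbing that `for i in $(find ...)` applies to the results.

```shell
#!/bin/bash
# Sketch: each find argument is its own array element, and the patterns are
# quoted where they are written, so * is never expanded by the shell.
excluded=( 'subdirectory/a/*' 'subdirectory/b/*' )

# list_new_paths (hypothetical helper): print ../up/<path> for each .txt file
list_new_paths() {
    local args=()
    local pattern i
    for pattern in "${excluded[@]}"; do
        args+=( ! -path "./$pattern" )
    done
    while IFS= read -r -d '' i; do
        printf '%s\n' "../up/${i#./}"    # ./x/y.txt -> ../up/x/y.txt
    done < <(find . -type f -name '*.txt' "${args[@]}" -print0)
}

# Usage: list_new_paths
```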
