Expanding asterisk in bash

I'm trying to run find, and exclude several directories listed in an array. I'm finding some weird behavior when it's expanding, though, which is causing me issues:
~/tmp> skipDirs=( "./dirB" "./dirC" )
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
./dirC/bar.txt ./dirA/bar.txt
This did not skip dirC as I would have expected. The problem is that the printf leaves the quotes around "./dirC/*" in place as literal characters.
~/tmp> set -x
+ set -x
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
+++ printf -- '-o -path "%s/*" ' ./dirC
++ find . -name 'bar*' -not '(' -path './dirB/*' -o -path '"./dirC/*"' ')' -prune
+ bars='./dirC/bar.txt
./dirA/bar.txt'
+ echo ./dirC/bar.txt ./dirA/bar.txt
./dirC/bar.txt ./dirA/bar.txt
If I try to remove the quotes in the $(printf ...), then the * gets expanded immediately, which also gives the wrong results. Finally, if I remove the quotes and try to escape the *, then the \ escape character gets included as part of the filename in the find, and that does not work either. I'm wondering why the above does not work, and what would work? I'm trying to avoid using eval if possible, but currently I'm not seeing a way around it.
Note: This is very similar to: Finding directories with find in bash using a exclude list, however, the posted solutions to that question seem to have the issues I listed above.

The safe approach is to build your array explicitly:
#!/bin/bash
skipdirs=( "./dirB" "./dirC" )
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done
find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*'
I slightly modified the logic in your find, since you had a slight logic error in there. Your command was:
find -name 'bar*' -not stuff_to_prune_the_dirs
How does find proceed? It walks the file tree, and when it finds a file (or directory) that matches bar*, it then applies the -not ... part. That's really not what you want: your -prune is never going to be applied!
Look at this instead:
find . \! \( -type d -path './dirA' -prune \)
Here find will completely prune the directory ./dirA and print everything else. Now it's to everything else that you want to apply the filter -name 'bar*'. The order is very important: there's a big difference between this:
find . -name 'bar*' \! \( -type d -path './dirA' -prune \)
and this:
find . \! \( -type d -path './dirA' -prune \) -name 'bar*'
The first one doesn't work as expected at all! The second one is fine.
Notes.
I'm using \! instead of -not, as \! is POSIX while -not is an extension not specified by POSIX. You'll argue that -path is not POSIX either, so it doesn't matter whether you use -not. That's a detail; use whatever you like.
You had to use a dirty trick to build the command that skips your dirs, since you had to treat the first term differently from the others. By initializing the array with -false, I don't have to treat any term specially.
I'm specifying -type d so that I'm sure I'm pruning directories.
Since my pruning really applies to the directories, I don't have to include wildcards in my exclude terms. This is funny: your problem that seemingly is about wildcards that you can't handle disappears completely when you use find appropriately as explained above.
Of course, the method I gave really applies with wildcards too. For example, if you want to exclude/prune all subdirectories called baz inside subdirectories called foo, the skipdirs array given by
skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
will work fine!
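For instance, here is a minimal sketch of that wildcard variant (the directory layout is assumed for illustration); because the patterns only ever appear inside quoted array expansions, they reach find unexpanded:
#!/bin/bash
# prune any */foo/baz and */foo/*/baz directory, then look for bar* elsewhere
skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done
# find matches the patterns itself; the shell never expands them
find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*'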

The issue here is that the quotes you are using on "%s/*" aren't doing what you think they are.
That is to say, you think you need the quotes on "%s/*" to prevent the results of the printf from being globbed; however, that isn't what is happening. Try the same thing without the directory separator, and with files that start and end with double quotes, and you'll see what I mean.
$ ls
"dirCfoo"
$ skipDirs=( "dirB" "dirC" )
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirCfoo"
$ rm '"dirCfoo"'
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirC*"
See what I mean? The quotes aren't being handled specially by the shell. They just happen not to glob in your case.
This issue is part of why things like what is discussed at http://mywiki.wooledge.org/BashFAQ/050 don't work.
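A tiny illustration of that BashFAQ/050 problem (a hypothetical command, for demonstration only):
cmd='find . -name "*.txt"'
$cmd    # find now looks for files literally named "*.txt", quotes included
By the time word splitting happens, the quotes inside $cmd are plain data; the shell never re-parses them as syntax.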
To do what you want here I believe you need to create the find arguments array manually.
sD=(-path /dev/null)
for dir in "${skipDirs[@]}"; do
    sD+=( -o -path "$dir" )
done
and then expand "${sD[@]}" on the find command line (-not \( "${sD[@]}" \) or so).
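Put together, a minimal sketch (appending /* to each directory, matching the patterns in the question):
sD=( -path /dev/null )
for dir in "${skipDirs[@]}"; do
    sD+=( -o -path "$dir/*" )
done
# each pattern is handed to find as a single, unexpanded word
bars=$(find . -name 'bar*' -not \( "${sD[@]}" \)); echo "$bars"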
And yes, I believe this makes the answer you linked to incorrect (though the other answer might work, for non-whitespace files etc., because of the array indirection that is going on).

Related

Why is my `find` command giving me errors relating to ignored directories?

I have this find command:
find . -type f -not -path '**/.git/**' -not -path '**/node_modules/**' | xargs sed -i '' s/typescript-library-skeleton/xxx/g;
for some reason it's giving me these warnings/errors:
find: ./.git/objects/3c: No such file or directory
find: ./.git/objects/3f: No such file or directory
find: ./.git/objects/41: No such file or directory
I even tried using:
-not -path '**/.git/objects/**'
and got the same thing. Anybody know why the find is searching in the .git directory? Seems weird.
why is the find searching in the .git directory?
GNU find is clever and supports several optimizations over a naive implementation:
It can flip the order of -size +512b -name '*.txt' and check the name first, because querying the size will require a second syscall.
It can count the hard links of a directory to determine the number of subdirectories, and once it has seen them all it no longer needs to check the remaining entries for -type d or recurse into them.
It can even rewrite (-B -or -C) -and -A so that if the checks are equally costly and free of side effects, the -A will be evaluated first, hoping to reject the file after 1 test instead of 2.
However, it is not yet clever enough to realize that -not -path '*/.git/*' means that if you find a directory .git then you don't even need to recurse into it because all files inside will fail to match.
Instead, it dutifully recurses, finds each file, and matches it against the pattern as if it were a black box.
To explicitly tell it to skip a directory entirely, you can instead use -prune. See How to exclude a directory in find . command
Both more efficient and more correct would be to avoid the default -print action, change -not -path ... to -prune, and ensure that xargs is only used with NUL-delimited input:
find . -name .git -prune -o \
-name node_modules -prune -o \
-type f -print0 | xargs -0 sed -i '' s/typescript-library-skeleton/xxx/g
Note the following points:
We use -prune to tell find to not even recurse down the undesired directories, rather than -not -path ... to tell it to discard names in those directories after they were found.
We put the -prunes before the -type f, so we're able to match directories for pruning.
We have an explicit action, not depending on the default -print. This is important because the implicit -print effectively wraps the expression in a set of parentheses: find ... behaves like find '(' ... ')' -print, not like find ... -print, if no explicit action is given.
We use xargs only with the -0 argument enabling NUL-delimited input, and the -print0 action on the find side to generate a NUL-delimited list of names. NUL is the only character which cannot be present in an arbitrary file path (yes, newlines can be present) -- and thus the only character which is safe to use to separate paths. (If the -0 extension to xargs and the -print0 extension to find are not guaranteed to be available, use -exec sed -i '' ... {} + instead).
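A sketch of that portable fallback, using -exec ... {} + in place of xargs (note that sed -i '' is the BSD/macOS spelling; GNU sed takes -i with no argument):
find . -name .git -prune -o \
       -name node_modules -prune -o \
       -type f -exec sed -i '' s/typescript-library-skeleton/xxx/g {} +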

Exclude directories with find command and executing a script on other directories

I currently have a directory structure that I need to be able to roll through each of 100 or so directories and run a script on them individually while excluding this check on a handful of other directories.
This is what I have been using in the past:
find ./OTHER/ -maxdepth 2 -wholename '*_*/*.txt' -execdir /files/bin/other_process {} +
I would like to exclude certain directories from this check and have not found a sufficient answer to this problem.
This has been my best attempt (or two) at the problem:
find ./OTHER/ \( -path ./OTHER/X???_EXCLUDE_THIS -prune -o -path ./OTHER/X???_IGNORE_THIS -prune -o \) -type d \(-name *_*/*.txt \) -execdir /files/bin/other_process {} +
I get:
find: paths must precede expression ./OTHER/A101_EXCLUDE_THIS/
This is the return that I get on nearly every variation that I have used.
This has been my best attempt (or two) at the problem:
find ./OTHER/ \( -path ./OTHER/X???_EXCLUDE_THIS -prune -o -path ./OTHER/X???_IGNORE_THIS -prune -o \) -type d \(-name *_*/*.txt \) -execdir /files/bin/other_process {} +
Errors in this attempt:
\(-name: There must be a space after \(.
-name *_*/*.txt: -name matches only the base of the file name; use -path here.
*_*/*.txt: You should quote such patterns to prevent expansion by the shell.
-o \): -o does not belong at the end of an expression; you mean \) -o. But you don't need parentheses here.
-type d: Since you want to find regular files *.txt, you must not look for a directory.
With those errors corrected, it works:
find ./OTHER/ -path './OTHER/X???_EXCLUDE_THIS' -prune -o -path './OTHER/X???_IGNORE_THIS' -prune -o -path '*_*/*.txt' -execdir echo {} +

Bash - Excluding subdirectories using the find command [duplicate]

I'm using the find command to get a list of folders where certain files are located. But because of a permission denied error for certain subdirectories, I want to exclude a certain subdirectory name.
I already tried these solutions I found here:
find /path/to/folders -path "*/noDuplicates" -prune -type f -name "fileName.txt"
find /path/to/folders ! -path "*/noDuplicates" -type f -name "fileName.txt"
And some variations for these commands (variations on the path name for example).
In the first case it won't find a folder at all, in the second case I get the error again, so I guess it still tries to access this directory. Does anyone know what I'm doing wrong or does anyone have a different solution for this?
To complement olivm's helpful answer and address the OP's puzzlement at the need for -o:
-prune, like every find primary (action or test, in GNU speak), returns a Boolean, and that Boolean is always true in the case of -prune.
Without explicit operators, primaries are implicitly connected with -a (-and), which, like -o (-or), performs short-circuiting Boolean logic.
-a has higher precedence than -o.
For a summary of all find concepts, see https://stackoverflow.com/a/29592349/45375
Thus, the accepted answer,
find . -path ./ignored_directory -prune -o -name fileName.txt -print
is equivalent to (parentheses are used to make the evaluation precedence explicit):
find . \( -path ./ignored_directory -a -prune \) \
-o \
\( -name fileName.txt -a -print \)
Since short-circuiting applies, this is evaluated as follows:
an input path matching ./ignored_directory causes -prune to be evaluated; since -prune always returns true, short-circuiting prevents the right side of the -o operator from being evaluated; in effect, nothing happens (the input path is ignored)
an input path NOT matching ./ignored_directory, instantly - again due to short-circuiting - continues evaluation on the right side of -o:
only if the filename part of the input path matches fileName.txt is the -print primary evaluated; in effect, only input paths whose filename matches fileName.txt are printed.
Edit: In spite of what I originally claimed here, -print IS needed on the right-hand side of -o here; without it, the implied -print would apply to the entire expression and thus also print for left-hand side matches; see below for background information.
By contrast, let's consider what mistakenly NOT using -o does:
find . -path ./ignored_directory -prune -name fileName.txt -print
This is equivalent to:
find . -path ./ignored_directory -a -prune -a -name fileName.txt -a -print
This will only print pruned paths (that also match the -name filter), because the -name and -print primaries are (implicitly) connected with logical ANDs;
in this specific case, since ./ignored_directory cannot also match fileName.txt, nothing is printed, but if -path's argument is a glob, it is possible to get output.
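A hypothetical illustration of that corner case: -prune is a no-op (but still true) for non-directories, so a glob that matches a regular file lets the chain fall through to -print:
touch notes.txt
find . -path './notes*' -prune -name 'notes.txt' -print
# prints ./notes.txt even though it was nominally pruned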
A word on find's implicit use of -print:
POSIX mandates that if a find command's expression as a WHOLE does NOT contain either
output-producing primaries, such as -print itself
primaries that execute something, such as -exec and -ok
(the example primaries given are exhaustive for the POSIX spec. of find, but real-world implementations such as GNU find and BSD find add others, such as the output-producing -print0 primary, and the executing -execdir primary)
that -print be applied implicitly, as if the expression had been specified as:
\( expression \) -print
This is convenient, because it allows you to write commands such as find ., without needing to append -print.
However, in certain situations an explicit -print is needed, as is the case here:
Let's say we didn't specify -print at the end of the accepted answer:
find . -path ./ignored_directory -prune -o -name fileName.txt
Since there's now no output-producing or executing primary in the expression, it is evaluated as:
find . \( -path ./ignored_directory -prune -o -name fileName.txt \) -print
This will NOT work as intended, as it will print paths if the entire parenthesized expression evaluates to true, which in this case mistakenly includes the pruned directory.
By contrast, by explicitly appending -print to the -o branch, paths are only printed if the right-hand side of the -o expression evaluates to true; using parentheses to make the logic clearer:
find . -path ./ignored_directory -prune -o \( -name fileName.txt -print \)
If, by contrast, the left-hand side is true, only -prune is executed, which produces no output (and since the overall expression contains a -print, -print is NOT implicitly applied).
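A minimal sketch (hypothetical layout) showing the difference the trailing -print makes:
mkdir -p demo/ignored_directory demo/keep
touch demo/ignored_directory/fileName.txt demo/keep/fileName.txt
cd demo
find . -path ./ignored_directory -prune -o -name fileName.txt
# prints ./ignored_directory AND ./keep/fileName.txt (implicit -print wraps the whole expression)
find . -path ./ignored_directory -prune -o -name fileName.txt -print
# prints only ./keep/fileName.txt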
Following my previous comment, this works on my Debian:
find . -path ./ignored_directory -prune -o -name fileName.txt -print
or
find /path/to/folder -path "*/ignored_directory" -prune -o -name fileName.txt -print
or
find /path/to/folder -name fileName.txt -not -path "*/ignored_directory/*"
The differences are nicely debated here
Edit (added behavior specification details)
Pruning all permission-denied directories in find (using GNU find).
Behavior specification details - in this solution we want to:
exclude unreadable directories contents (prune them),
avoid "permission denied" errors coming from unreadable dierctory,
keep the other errors and return states, but
process all files (even unreadable files, if we can read their names)
The basic design pattern is:
find ... \( -readable -o -prune \) ...
Example
find /var/log/ \( -readable -o -prune \) -name "*.1"
(Thanks to mklement0.)
The problem is in the way find evaluates the expression you are passing to the -path option.
Instead, you should try something like:
find /path/to/folders ! -path "*noDuplicates*" -type f -name "fileName.txt"

Not expanding asterisk by shell - excluding paths from find

I don't code in Bash daily. I'm trying to implement a small piece of functionality: the user defines an array of directories or files to omit from a find command. Unfortunately, I have a problem with the shell expanding the asterisk and other metacharacters (* is expanded during concatenation). My code is:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[#]}"; do
directories="$directories ! -path \"./${excluded[$cnt]}\""
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
Unfortunately, I still see excluded directories in results.
EDIT:
This is not a duplicate of the existing question. I'm not asking how to exclude directories in find. I have a problem with metacharacters like "*" being expanded when passing variables to the find command. E.g. I have an almost-working solution below:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[#]}"; do
directories="$directories ! -path ./${excluded[$cnt]}"
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
It works, but the problem arises when e.g. directory c contains more than one file. In that case, the asterisk is replaced by full file paths. Consequently I get an error:
find: paths must precede expression: ./subdirectory/c/fgo.txt
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Why? Because the asterisk has been expanded to the full file names:
! -path ./subdirectory/a/aaa.txt ! -path ./subdirectory/b/dfdji.txt ! -path ./subdirectory/c/asd.txt ./subdirectory/c/fgo.txt
My question is: how to avoid such situation?
You'll want to use the -prune switch in find.
Here's an example (I found this on stackoverflow itself)
find . -type d \( -path dir1 -o -path dir2 -o -path dir3 \) -prune -o -print
This omits dir1, dir2, and dir3.
Source: https://stackoverflow.com/a/4210072/1220089
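Applied to the layout in your question, that pattern would look something like:
find . -type d \( -path ./subdirectory/a -o -path ./subdirectory/b \) -prune -o -type f -name '*.txt' -print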
My initial thought is to use double quotes to prevent the expansion:
for i in $(find . -type f -name "*.txt" "$directories" | sed 's#\./#\.\./up/#g'); do
but this of course fails. You can accomplish the same effect with (untested):
pre=( '!' -path )
excluded=( "${pre[@]}" "./subdirectory/a/*"
           "${pre[@]}" "./subdirectory/b/*"
)
for i in $(find . -type f -name "*.txt" "${excluded[@]}" | sed ...); do
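A minimal sketch of the same idea that also survives whitespace in filenames, using NUL delimiters (GNU find) with parameter expansion standing in for the sed (illustrative only):
while IFS= read -r -d '' f; do
    new_path="../up/${f#./}"    # the same ./ -> ../up/ rewrite as the sed above
    echo "$new_path"
done < <(find . -type f -name '*.txt' "${excluded[@]}" -print0)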

bash `find` escaping

I need to find all of the TIFFs in a directory, recursively, but ignore some artifacts (basically all hidden files) that also happen to end with ".tif". This command:
find . -type f -name '*.tif' ! -name '.*'
works exactly how I want on the command line, but inside a bash script it doesn't find anything. I've tried replacing ! with -and -not, and--I think--just about every escaping permutation I can think of and/or recommended by the googlesphere, e.g. .\*, leaving out single quotes, etc. Obviously I'm missing something; any help is appreciated.
EDIT: here's the significant part of the script; the directory it's doing the find on is parameterized, but I've been debugging with it hard-coded; it makes no difference:
#!/bin/bash
RECURSIVE=1
DIR=$1
#get the absolute path to $DIR
DIR=$(cd "$DIR"; pwd)
FIND_CMD="find $DIR -type f -name '*.tif' ! -name '.*'"
if [ "$RECURSIVE" != 1 ]; then
FIND_CMD="$FIND_CMD -maxdepth 1"
fi
for in_img in $($FIND_CMD | sort); do
echo $in_img # for debugging
#stuff
done
It was related to having the expression stored in a variable. The solution was to use eval, which of course would be the right thing to do anyway. Everything is the same as above except at the start of the for loop:
for in_img in $(eval $FIND_CMD | sort); do
#stuff
done
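Alternatively, a sketch that avoids eval altogether by storing the command in an array, in line with the array-building approach in the answers above:
FIND_CMD=( find "$DIR" -type f -name '*.tif' ! -name '.*' )
if [ "$RECURSIVE" != 1 ]; then
    FIND_CMD+=( -maxdepth 1 )
fi
# quoted array expansion keeps each argument, including the patterns, intact
for in_img in $("${FIND_CMD[@]}" | sort); do
    echo "$in_img"
done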
To not find hidden files you can use the following:
find . \( ! -regex '.*/\..*' \) -type f -name "*.tif"
It checks each file's path and excludes (via the ! in the parentheses) paths containing a component that begins with a dot - that is, hidden files and files inside hidden directories.
