I don't code in Bash daily. I'm trying to implement a small piece of functionality: the user defines an array of directories or files for the find command to omit. Unfortunately, I have a problem with the shell expanding the asterisk and other meta-characters (* is expanded during concatenation). My code is:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[@]}"; do
directories="$directories ! -path \"./${excluded[$cnt]}\""
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
Unfortunately, I still see excluded directories in results.
EDIT:
This is not a duplicate of the existing question. I'm not asking how to exclude directories in find. I have a problem with the shell expanding meta-characters like "*" when passing variables to the find command. E.g., I have an almost-working solution below:
excluded=( "subdirectory/a/*"
"subdirectory/b/*"
)
cnt=0
for i in "${excluded[@]}"; do
directories="$directories ! -path ./${excluded[$cnt]}"
cnt=$(($cnt+1))
done
echo "$directories"
for i in $(find . -type f -name "*.txt" $directories); do
new_path=$(echo "$i"|sed "s/\.\//..\/up\//g")
echo $new_path
done
It works, but the problem arises when, e.g., directory c contains more than one file. In that case, the asterisk is replaced by the full file paths. Consequently I get an error:
find: paths must precede expression: ./subdirectory/c/fgo.txt
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Why? Because the asterisk sign has been expanded to the full file names:
! -path ./subdirectory/a/aaa.txt ! -path ./subdirectory/b/dfdji.txt ! -path ./subdirectory/c/asd.txt ./subdirectory/c/fgo.txt
My question is: how do I avoid this situation?
You'll want to use the -prune switch in find.
Here's an example (I found it on Stack Overflow itself):
find . -type d \( -path dir1 -o -path dir2 -o -path dir3 \) -prune -o -print
This omits dir1, dir2, and dir3.
Source: https://stackoverflow.com/a/4210072/1220089
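Applied to the directory names from the question, that looks like the sandbox run below (a sketch; the tree and file names are invented for illustration):

```shell
#!/bin/sh
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/subdirectory/a" "$tmp/subdirectory/b" "$tmp/keep"
touch "$tmp/subdirectory/a/skip.txt" "$tmp/subdirectory/b/skip.txt" "$tmp/keep/want.txt"
cd "$tmp"

# -prune stops find from descending into the listed directories at all;
# the explicit -print keeps pruned paths out of the output
result=$(find . -type d \( -path ./subdirectory/a -o -path ./subdirectory/b \) -prune \
              -o -type f -name '*.txt' -print)
echo "$result"
rm -rf "$tmp"
```

Unlike a chain of ! -path filters, -prune also saves the time find would otherwise spend walking the excluded trees.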
My initial thought is to use double quotes to prevent the expansion:
for i in $(find . -type f -name "*.txt" "$directories" | sed 's#\./#\.\./up/#g'); do
but this of course fails. You can accomplish the same effect with (untested):
pre='! -path'
excluded=( "$pre subdirectory/a/*"
"$pre subdirectory/b/*"
)
for i in $(find . -type f -name "*.txt" "${excluded[@]}" | sed ...); do
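One caveat with the untested sketch above: each element stores the whole phrase `! -path subdirectory/a/*` as a single word, so a quoted array expansion would hand find one argument instead of three. A variant that keeps every word a separate array element avoids both that and the globbing problem (a sketch; the tree and file names are invented):

```shell
#!/bin/bash
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/subdirectory/a" "$tmp/subdirectory/b" "$tmp/subdirectory/c"
touch "$tmp/subdirectory/a/aaa.txt" "$tmp/subdirectory/c/keep.txt"
cd "$tmp"

excluded=( "subdirectory/a/*" "subdirectory/b/*" )

# Build the arguments word by word; the later quoted "${args[@]}"
# expansion hands each word to find unmodified, so * is never globbed
args=()
for pat in "${excluded[@]}"; do
    args+=( ! -path "./$pat" )
done

result=$(find . -type f -name '*.txt' "${args[@]}")
echo "$result"
rm -rf "$tmp"
```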
Related
I need to get the number of the contents of a directory that is a git repository.
I have to get the number of:
1) Other directories inside the directory I am currently iterating over (and the other sub-directories inside them, if they exist)
2) .txt files inside the directory and its sub-directories
3) All the non-txt files inside the directory and its sub-directories
In all the above cases I must ignore the .git directory, along with all the files and directories that are inside of it.
Also I must use bash script exclusively. I can't use another programming language.
Right now I am using the following commands to achieve this:
To get all the .txt files I use : find . -type f \( -name "*.txt" \). There are no .txt files inside .git so this is working.
To get all the non-txt files I use: find . -type f \( ! -name "*.txt" \). The problem is that I also get all the files from .git and I don't know how to ignore them.
To get all the directories and sub-directories I use: find . -type d. I don't know how to ignore the .git directory and its sub-directories.
The easy way is to just add these extra tests:
find . ! -path './.git/*' ! -path ./.git -type f -name '*.txt'
The problem with this is ./.git is still traversed, unnecessarily, which takes time.
Instead, -prune can be used. -prune is not a test (like -path, or -type). It's an action. The action is "don't descend the current path, if it's a directory". It must be used separately to the print action.
# task 1
find . -path './.git' -prune -o -type f -name '*.txt' -print
# task 2
find . -path './.git' -prune -o -type f ! -name '*.txt' -print
# task 3
find . -path './.git' -prune -o -type d -print
If -print isn't specified, ./.git is also printed as the default action.
I used -path ./.git, because you said "the .git directory". If for some reason there are other .git directories in the tree, they will be traversed and printed. To ignore all directories in the tree named .git, replace -path ./.git with -name .git.
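A quick sandbox check of task 1 (the tree and file names are made up for illustration):

```shell
#!/bin/sh
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/.git/objects" "$tmp/src"
touch "$tmp/.git/objects/notes.txt" "$tmp/src/readme.txt" "$tmp/binary.bin"
cd "$tmp"

# Task 1: .git is pruned before find ever descends into it
txt_files=$(find . -path ./.git -prune -o -type f -name '*.txt' -print)
echo "$txt_files"
rm -rf "$tmp"
```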
Sometimes writing a bash loop is clearer than a one-liner:
for f in $(find .); do
if [[ -d $f && "$f" == "./.git" ]]; then
echo "skipping dir $f";
else
echo "do something with $f";
fi;
done
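Two caveats with the loop above: word-splitting the output of $(find .) breaks on file names containing spaces, and find still descends into ./.git even though the body skips it. A variant that prunes inside find and reads entries NUL-delimited (a sketch; the tree is invented):

```shell
#!/bin/bash
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/.git" "$tmp/src"
touch "$tmp/.git/config" "$tmp/src/a file.txt"
cd "$tmp"

visited=""
# -print0 plus read -d '' copes with spaces and newlines in names;
# -prune keeps find from ever entering ./.git
while IFS= read -r -d '' f; do
    visited="$visited$f"$'\n'
done < <(find . -path ./.git -prune -o -print0)
echo "$visited"
rm -rf "$tmp"
```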
I have a script that recursively searches all directories for specific files or specific file endings.
These certain files I want to save the path in a description file.
Looks for example like this:
./org/apache/commons/.../file1.pom
./org/apache/commons/.../file1.jar
./org/apache/commons/.../file1.zip
and so on.
In a blacklist, I describe which file endings I want to ignore.
! -path "./.cache/*" ! -path "./org/*" ! -name "*.sha1" ! -name"*.lastUpdated"
and so on.
Now I want to read this blacklist file during the search, to ignore the described files:
find . -type f $(cat blacklist) > artifact.descriptor
Unfortunately, the blacklist is not taken into account during the search.
When:
echo "find . -type f $(cat blacklist) > artifact.descriptor"
Result is as expected:
find . -type f ! -path "./.cache/*" ! -path "./org/*" ! -name "*.sha1" ! -name"*.lastUpdated" > artifact.descriptor
But it does not work: the described files are not excluded.
I tried the following command and it works, but I want to know why it does not work with find alone.
find . -type f | grep -vf $blacklist > artifact.descriptor
Hopefully someone can explain it to me :)
Thanks a lot.
As tripleee suggests, it is generally considered bad practice to store a command in a variable, because that does not handle all the corner cases.
However you can use eval as a workaround
/tmp/test$ ls
blacklist test.a test.b test.c
/tmp/test$ cat blacklist
-not -name *.c -not -name *.b
/tmp/test$ eval "find . -type f "`cat blacklist`
./test.a
./blacklist
In your case I think it fails because the quotes in your blacklist file are treated as literal characters rather than as enclosing the patterns. I think it works if you remove them, but it is probably still not safe for other reasons:
! -path ./.cache/* ! -path ./org/* ! -name *.sha1 ! -name *.lastUpdated
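The underlying rule: the shell parses quotes before it expands variables and command substitutions, never afterwards, so the quotes coming out of $(cat blacklist) are literal characters in the patterns, while the unquoted expansion still word-splits and glob-expands. If eval is to be avoided, one workaround is to keep one find token per line in the blacklist and read it into an array (a sketch; the tree and file names are invented):

```shell
#!/bin/bash
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/.cache"
touch "$tmp/.cache/junk.pom" "$tmp/good.pom" "$tmp/skip.sha1"
cd "$tmp"

# One find token per line -- no quotes needed inside the file
printf '%s\n' '!' -path './.cache/*' '!' -name '*.sha1' > blacklist

# mapfile turns each line into one array element; the * stays literal
mapfile -t args < blacklist
result=$(find . -type f "${args[@]}" ! -name blacklist)
echo "$result"
rm -rf "$tmp"
```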
I'm trying to run find, and exclude several directories listed in an array. I'm finding some weird behavior when it's expanding, though, which is causing me issues:
~/tmp> skipDirs=( "./dirB" "./dirC" )
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
./dirC/bar.txt ./dirA/bar.txt
This did not skip dirC as I would have expected. The problem is that the quotes printf emits around "./dirC" end up as literal characters in the pattern.
~/tmp> set -x
+ set -x
~/tmp> bars=$(find . -name "bar*" -not \( -path "${skipDirs[0]}/*" $(printf -- '-o -path "%s/*" ' "${skipDirs[@]:1}") \) -prune); echo $bars
+++ printf -- '-o -path "%s/*" ' ./dirC
++ find . -name 'bar*' -not '(' -path './dirB/*' -o -path '"./dirC/*"' ')' -prune
+ bars='./dirC/bar.txt
./dirA/bar.txt'
+ echo ./dirC/bar.txt ./dirA/bar.txt
./dirC/bar.txt ./dirA/bar.txt
If I try to remove the quotes in the $(printf ...), then the * gets expanded immediately, which also gives the wrong results. Finally, if I remove the quotes and try to escape the *, then the \ escape character gets included as part of the pattern, and that does not work either. I'm wondering why the above does not work, and what would work? I'm trying to avoid using eval if possible, but currently I'm not seeing a way around it.
Note: This is very similar to: Finding directories with find in bash using a exclude list, however, the posted solutions to that question seem to have the issues I listed above.
The safe approach is to build your array explicitly:
#!/bin/bash
skipdirs=( "./dirB" "./dirC" )
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
skipdirs_args+=( -o -type d -path "$i" )
done
find . \! \( \( "${skipdirs_args[#]}" \) -prune \) -name 'bar*'
I slightly modified the logic of your find, since there was a slight (logic) error in it: your command was:
find -name 'bar*' -not stuff_to_prune_the_dirs
How does find proceed? It walks the file tree, and when it finds a file (or directory) that matches bar*, it then applies the -not ... part. That's really not what you want: your -prune is never going to be applied!
Look at this instead:
find . \! \( -type d -path './dirA' -prune \)
Here find will completely prune the directory ./dirA and print everything else. It is among that "everything else" that you want to apply the filter -name 'bar*'. The order is very important! There's a big difference between this:
find . -name 'bar*' \! \( -type d -path './dirA' -prune \)
and this:
find . \! \( -type d -path './dirA' -prune \) -name 'bar*'
The first one doesn't work as expected at all! The second one is fine.
Notes.
I'm using \! instead of -not, as \! is POSIX while -not is an extension not specified by POSIX. You'll argue that -path is not POSIX either, so it doesn't matter whether you use -not. That's a detail; use whichever you like.
You had to use a dirty trick to build the command that skips your dirs, as you had to treat the first term differently from the others. By initializing the array with -false, I don't have to treat any term specially.
I'm specifying -type d so that I'm sure I'm pruning directories.
Since my pruning really applies to the directories, I don't have to include wildcards in my exclude terms. This is funny: your problem, seemingly about wildcards you can't handle, disappears completely when you use find appropriately, as explained above.
Of course, the method I gave really applies with wildcards too. For example, if you want to exclude/prune all subdirectories called baz inside subdirectories called foo, the skipdirs array given by
skipdirs=( "./*/foo/baz" "./*/foo/*/baz" )
will work fine!
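A sandbox run of this answer's approach (the directory and file names are invented):

```shell
#!/bin/bash
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/dirA" "$tmp/dirB" "$tmp/dirC"
touch "$tmp/dirA/bar.txt" "$tmp/dirB/bar.txt" "$tmp/dirC/bar.txt"
cd "$tmp"

skipdirs=( "./dirB" "./dirC" )

# Seeding with -false lets every real term be appended uniformly as "-o ..."
skipdirs_args=( -false )
for i in "${skipdirs[@]}"; do
    skipdirs_args+=( -o -type d -path "$i" )
done

bars=$(find . \! \( \( "${skipdirs_args[@]}" \) -prune \) -name 'bar*')
echo "$bars"
rm -rf "$tmp"
```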
The issue here is that the quotes you are using on "%s/*" aren't doing what you think they are.
That is to say, you think you need the quotes on "%s/*" to prevent the results of the printf from being globbed; however, that isn't what is happening. Try the same thing without the directory separator, and with files that start and end with double quotes, and you'll see what I mean.
$ ls
"dirCfoo"
$ skipDirs=( "dirB" "dirC" )
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirCfoo"
$ rm '"dirCfoo"'
$ printf -- '%s\n' -path "${skipDirs[0]}*" $(printf -- '-o -path "%s*" ' "${skipDirs[@]:1}")
-path
dirB*
-o
-path
"dirC*"
See what I mean? The quotes aren't being handled specially by the shell. They just happen not to glob in your case.
This issue is part of why things like what is discussed at http://mywiki.wooledge.org/BashFAQ/050 don't work.
To do what you want here I believe you need to create the find arguments array manually.
sD=(-path /dev/null)
for dir in "${skipDirs[@]}"; do
sD+=(-o -path "$dir")
done
and then expand "${sD[@]}" on the find command line (-not \( "${sD[@]}" \) or so).
And yes, I believe this makes the answer you linked to incorrect (though the other answer might work, at least for file names without whitespace and the like, because of the array indirection that is going on).
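A sketch of that array in action (names invented). Here -path /dev/null plays the same always-false seed role as -false does in the other answer, and the list is expanded inside a pruned group rather than the -not form:

```shell
#!/bin/bash
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/dirA" "$tmp/dirB"
touch "$tmp/dirA/bar.txt" "$tmp/dirB/bar.txt"
cd "$tmp"

skipDirs=( "./dirB" )

sD=( -path /dev/null )   # never matches under ., so it is a safe false seed
for dir in "${skipDirs[@]}"; do
    sD+=( -o -path "$dir" )
done

# Expand the array inside a pruned group
result=$(find . \( "${sD[@]}" \) -prune -o -name 'bar*' -print)
echo "$result"
rm -rf "$tmp"
```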
In a bash script this fails:
fileloc='/var/adm/logs/morelogs'
filename=' -name "*.user"'
fileList="$(find "$fileloc"/* -type f -prune "$filename" -print)"
find: bad option -name "*.user"
find: [-H | -L] path-list predicate-list
but this works:
find /var/adm/logs/morelogs/* -type f -prune -name "*.user" -print
in the same manner:
this fails:
fileloc='/var/adm/logs/morelogs'
filename='\( -name "admin.*" -o -name "*.user" -o -name "*.user.gz" \)'
fileList="$(find "$fileloc"/* -type f -prune "$filename" -print)"
find: bad option \( -name "admin.*" -o -name "*.user" -o -name "*.user.gz" \)
find: [-H | -L] path-list predicate-list
but this works:
find /var/adm/logs/morelogs/* -type f -prune \( -name "admin.*" -o -name "*.user" -o -name "*.user.gz" \) -print
GNU bash, version 3.00.16(1)-release-(sparc-sun-solaris2.10)
This is a use case where you should use bash arrays or a bash function.
Using BASH arrays:
#!/bin/bash
# initialize your constants
fileloc='/var/adm/logs/morelogs'
filename='*.user'
# create an array with full find command
cmd=( find "$fileloc" -type f -prune -name "$filename" -print )
# execute find command line using BASH array
"${cmd[#]}"
It sounds like you're trying to build the list of names to search for dynamically -- if this is the case, a variant of @anubhava's answer using the array for just the name patterns is the best approach:
namepatterns=() # Start with no filenames to search for
while something; do
newsuffix="whatever"
namepatterns+=(-o -name "*.$newsuffix")
done
# Note that "${namepatterns[@]}" is not quite what we want to pass to find, since
# it always starts with "-o" (unless it's empty, in which case this'll have other
# problems). But "${namepatterns[@]:1}" leaves off the first element, and gets us
# what we need.
fileList="$(find "$fileloc"/* -type f -prune "(" "${namepatterns[@]:1}" ")" -print)"
Other notes: I second @BroSlow's recommendation to read BashFAQ #50: I'm trying to put a command in a variable, but the complex cases always fail!, and also you're going to have trouble using that fileList variable if any of the file names contain funny characters (esp. whitespace and wildcards) -- see BashFAQ #20: How can I find and safely handle file names containing newlines, spaces or both? (short answer: arrays are better for this as well!)
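For example (a sketch; the suffix list and file names are invented):

```shell
#!/bin/bash
# Throwaway files (hypothetical names and suffixes)
tmp=$(mktemp -d)
touch "$tmp/a.user" "$tmp/b.log" "$tmp/c.tmp"

namepatterns=()
for newsuffix in user log; do
    namepatterns+=( -o -name "*.$newsuffix" )
done

# The :1 slice drops the leading -o before the list reaches find
fileList="$(find "$tmp"/* -type f -prune "(" "${namepatterns[@]:1}" ")" -print)"
echo "$fileList"
rm -rf "$tmp"
```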
Let's see what you are doing, with set -x:
$ fileloc='/var/adm/logs/morelogs'
+ fileloc=/var/adm/logs/morelogs
$ filename=' -name "*.user"'
+ filename=' -name "*.user"'
Everything seems fine. Now, the next line:
$ fileList="$(find "$fileloc"/* -type f -prune "$filename" -print)"
++ find '/var/adm/logs/morelogs/*' -type f -prune ' -name "*.user"' -print
find: paths must precede expression: -name "*.user"
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
+ fileList=
I think you see the problem: if you execute find '/var/adm/logs/morelogs/*' -type f -prune ' -name "*.user"' -print it throws an error:
$ find '/var/adm/logs/morelogs/*' -type f -prune ' -name "*.user"' -print
find: paths must precede expression: -name "*.user"
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
What's happening? Well, there are a bunch of single quotes in the way. Most of them can be ignored, but the last pair, wrapping ' -name "*.user"', makes find see all of that as a single parameter. So, how to fix this? Don't use double quotes when expanding the $filename variable:
$ find "$fileloc" -type f -prune $filename -print
+ find /var/adm/logs/morelogs -type f -prune -name '*.user' -print
That should solve it.
Not an answer to the problem, but a poor solution: after getting frustrated, I just hard-coded the search with the full options list.
So it looks like this now, and it works. I had to build some cases and repeat myself (not good programming practice), but I was tired of this shell thing...
So, for example, one option looks like:
fileList="$(find "$fileloc"/* -type f -prune \( -name "admin.*" -o -name "*.user" -o -name "*.user.gz" \) -print)"
I need to find all of the TIFFs in a directory, recursively, but ignore some artifacts (basically all hidden files) that also happen to end with ".tif". This command:
find . -type f -name '*.tif' ! -name '.*'
works exactly how I want on the command line, but inside a bash script it doesn't find anything. I've tried replacing ! with -and -not, and (I think) just about every escaping permutation I can think of and/or recommended by the googlesphere, e.g. .\*, leaving out single quotes, etc. Obviously I'm missing something; any help is appreciated.
EDIT: here's the significant part of the script; the directory it runs find on is parameterized, but I've been debugging with it hard-coded, and it makes no difference:
#!/bin/bash
RECURSIVE=1
DIR=$1
#get the absolute path to $DIR
DIR=$(cd $DIR; pwd)
FIND_CMD="find $DIR -type f -name '*.tif' ! -name '.*'"
if [ $RECURSIVE == 1 ]; then
FIND_CMD="$FIND_CMD -maxdepth 1"
fi
for in_img in $($FIND_CMD | sort); do
echo $in_img # for debugging
#stuff
done
It was related to having the expression stored in a variable. The solution was to use eval, which of course would be the right thing to do anyway. Everything is the same as above except at the start of the for loop:
for in_img in $(eval $FIND_CMD | sort); do
#stuff
done
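For what it's worth, an array sidesteps eval entirely here (a sketch; the file names are invented):

```shell
#!/bin/bash
# Throwaway files (hypothetical names)
tmp=$(mktemp -d)
touch "$tmp/scan.tif" "$tmp/.hidden.tif"

# Store the command as words, not a flat string: each quoted pattern
# survives expansion intact, so no eval is required
find_cmd=( find "$tmp" -type f -name '*.tif' ! -name '.*' )

result=$("${find_cmd[@]}" | sort)
echo "$result"
rm -rf "$tmp"
```

Expanding "${find_cmd[@]}" hands find one argument per array element, so '*.tif' and '.*' reach it un-globbed, exactly as on the command line.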
To avoid finding hidden files you can use the following:
find . \( ! -regex '.*/\..*' \) -type f -name "*.tif"
It checks the whole path and excludes (the ! inside the parentheses) any entry with a path component beginning with a dot, i.e. the hidden files and anything inside hidden directories.
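A quick check of that command (GNU find; the tree and file names are invented):

```shell
#!/bin/sh
# Throwaway tree (hypothetical names)
tmp=$(mktemp -d)
mkdir -p "$tmp/.thumbs"
touch "$tmp/page.tif" "$tmp/.page.tif" "$tmp/.thumbs/cache.tif"
cd "$tmp"

# The regex matches any path containing a component that starts with a
# dot, so hidden files and files under hidden directories are both dropped
result=$(find . \( ! -regex '.*/\..*' \) -type f -name '*.tif')
echo "$result"
rm -rf "$tmp"
```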