Recursive grep within specific subdirectories - bash

I want to grep only files that sit below a specific subdirectory. Here, I want .xml files only if they are below a /bar/ directory:
./file0.xml
./.git/file1a.xml
./.git/bar/file1b.xml
./.svn/file2a.xml
./.svn/foo/bar/baz/file2b.xml
./path1/file3.xml
./path1/foo/file4.xml
./path2/foo/bar/file5.xml
./path2/foo/baz/file6.xml
./path3/bar/file7.xml
./path3/foo/bar/baz/file8.xml
I want only the following files to be grepped: file5.xml, file7.xml, file8.xml
To exclude .git and .svn, I came up with:
grep -r --exclude-dir={.git,.svn} --include=\*.xml "pattern"
which still searches file0.xml and file3.xml through file8.xml.
If I grep -v the undesired directories:
grep -r --exclude-dir={.git,.svn} --include=\*.xml "pattern" | grep -v /bar/
I get the desired results, but it spends a lot of time parsing the non-/bar/ files.
Using find to locate the .xml files under /bar/, I get the desired results (and it's much faster than the pipeline above):
find . -type d \( -name .git -o -name .svn \) -prune -o \
-path '*/bar/*.xml' -exec grep "pattern" {} +
I'm trying to avoid this, however, as I use this within a script and don't want to be limited to starting the search in the top ./ directory.
Is there a way to accomplish this using only grep (so it doesn't prevent the user from specifying additional grep options and/or starting search directories)? Perhaps something like:
grep -r --exclude-dir={.git,.svn} --include=\*/bar/\*.xml "pattern"

find+grep is definitely a good approach. You could make it more flexible by defining a function that inserts arguments in strategic places. For example:
search() {
    local dir=$1
    local pattern=$2
    local args=("${@:3}")
    find "$dir" -type d \( -name .git -o -name .svn \) -prune -o \
        -path '*/bar/*.xml' -exec grep "${args[@]}" "$pattern" {} +
}
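Usage would then be something like this (the start directory here is just a placeholder; anything after the pattern is passed straight to grep):
$ search /some/start/dir "pattern" -l -i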

Related

how to find identical files by names and at same location in 2 or more directories

I am trying to write a bash script that lists the files that are identical by name alone across 2 or more directories. I tried:
diff -srq Ear2.ear/ Ear1.ear/ | grep identical
but this seems to compare contents as well.
I already have a file that has the list of all the target directories I need to compare. However, I need to exclude certain sub-directories while comparing.
An array intersection would be an interesting way to solve this.
$ mkdir tmp1 tmp2
$ touch tmp1/foo tmp1/bar tmp1/baz
$ touch tmp2/foo tmp2/bar tmp2/slurm
$ cd tmp1; a=( * ); cd -
$ cd tmp2; declare -A b; for f in *; do b[$f]=1; done; cd -
$ for x in "${a[#]}"; do [[ "${b[$x]}" ]] && echo "$x"; done
bar
foo
However, you mentioned that you "need to exclude certain sub-directories while comparing", and your diff includes -r, so you're looking to be selectively recursive.
To achieve this, I'd suggest using bash's globstar and then removing the parts you don't want. For example:
$ shopt -s globstar
$ a=( **/* )
$ for x in "${!a[#]}"; do [[ "${a[$x]}" = tmp3/* ]] && unset a[$x]; done
Note that globstar requires bash version 4.
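Putting the two ideas together, a recursive name-only comparison with an excluded subtree might look like this (a sketch; dir1, dir2, and excluded/ are placeholders):
shopt -s globstar
cd dir1; a=( **/* ); cd - >/dev/null
cd dir2; declare -A b; for f in **/*; do b[$f]=1; done; cd - >/dev/null
for x in "${a[@]}"; do
    [[ "$x" == excluded/* ]] && continue   # skip the excluded subtree
    [[ "${b[$x]}" ]] && echo "$x"          # name exists in both trees
done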
This takes advantage of the find utility's -prune option to exclude directories:
comm -1 -2 <(cd "$1"; find . -name "*" -path "./folder1" -prune -o -print | sort) <(cd "$2"; find . -name "*" -path "./folder1" -prune -o -print | sort)
1. cd so that we don't include the parent directory in the output of find.
2. Run find with appropriate parameters to print all files, excluding the given subfolders.
3. Pipe that into sort, so that we can use the comm utility via process substitution to show only the lines (aka file names) common to both.
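Putting those steps together, script.sh might look like this (a sketch assembled from the command above; folder1 stands in for whatever you want to exclude):
#!/bin/bash
# script.sh: list names common to two trees, pruning ./folder1
comm -1 -2 \
    <(cd "$1"; find . -name "*" -path "./folder1" -prune -o -print | sort) \
    <(cd "$2"; find . -name "*" -path "./folder1" -prune -o -print | sort)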
Basic Example:
I have the folder structure:
diffdir1/
    file1.txt
    file2.txt
    uniqueTo1.txt
    folder1/
        file1.txt
    folder2/
        file1.txt
    folderUniqueTo1/
        file1.txt
diffdir2/
    file1.txt
    file2.txt
    uniqueTo2.txt
    folder1/
        file1.txt
    folder2/
        file1.txt
(Contents do differ between the various file1.txts, although that's not checked here.) Using the above script, I get:
$ ./script.sh diffdir1 diffdir2
.
./file1.txt
./file2.txt
./folder2
./folder2/file1.txt
aka only the names that exist in both trees, with everything under folder1 excluded.
As a sanity check, if I remove the -path "./folder1" -prune -o part of the command (leaving just -print), it should no longer exclude things under folder1:
$ ./script2.sh diffdir1 diffdir2
.
./file1.txt
./file2.txt
./folder1
./folder1/file1.txt
./folder2
./folder2/file1.txt
Using a file for the list of directories and such would just be a matter of modifying what goes into the different parameters of the find command.
Example: Exclude multiple subdirectories
This command will exclude the folders ./abc/xyz/obj64, ./abc/video, and ./sim:
comm -1 -2 <(cd "$1"; find . -name "*" \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -print | sort) <(cd "$2"; find . -name "*" \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -print | sort)
Note that the list of paths must be placed inside \( \) parentheses. -o means "or", so find prunes if any one of the paths matches.
Example: Include only files matching a particular pattern
Expanding on the previous example, let's now return only files matching a pattern. In this example I'll search only for files ending in .xml:
comm -1 -2 <(cd "$1"; find . \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -name "*.xml" -print | sort) <(cd "$2"; find . \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -name "*.xml" -print | sort)
The difference here is that the -name test was moved to after the pruning. This makes no difference when you are searching for all files ("*"), but it does matter once you have a real pattern, so it's a good idea to put -name at the end anyway in case you change it later.

BASH: find and rename files & directories

I would like to replace :2f with a - in all file/dir names, but for some reason the one-liner below is not working. Is there any simpler way to achieve this?
Directory name example:
AN :2f EXAMPLE
Command:
for i in $(find /tmp/ \( -iname ".*" -prune -o -iname "*:*" -print \)); do { mv $i $(echo $i | sed 's/\:2f/\-/pg'); }; done
You don't have to parse the output of find:
find . -depth -name '*:2f*' -execdir bash -c 'echo mv "$0" "${0//:2f/-}"' {} \;
We're using -execdir so that the command is executed from within the directory containing the found file. We're also using -depth so that the content of a directory is considered before the directory itself. All this to avoid problems if the :2f string appears in a directory name.
As is, this command is harmless and won't perform any renaming; it'll only show on the terminal what's going to be performed. Remove echo if you're happy with what you see.
This assumes you want to perform the renaming for all files and folders (recursively) in current directory.
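For the example directory name above, the dry run would print something like:
mv ./AN :2f EXAMPLE ./AN - EXAMPLE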
-execdir might not be available for your version of find, though.
If your find doesn't support -execdir, you can get along without as so:
find . -depth -name '*:2f*' -exec bash -c 'dn=${0%/*} bn=${0##*/}; echo mv "$dn/$bn" "$dn/${bn//:2f/-}"' {} \;
Here, the trick is to separate the directory part from the filename part—that's what we store in dn (dirname) and bn (basename)—and then only change the :2f in the filename.
Since you have filenames containing spaces, for will split them into separate words when iterating over find's output. Pipe into a while loop instead:
find /tmp/ \( -iname ".*" -prune -o -iname "*:*" -print \) | while read -r i; do
    mv "$i" "$(echo "$i" | sed 's/:2f/-/g')"
done
Also quote all the variables and command substitutions.
This will work as long as you don't have any filenames containing newline.
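If newlines in names are a concern too, a null-delimited variant works (a sketch, assuming GNU find and bash):
find /tmp/ \( -iname ".*" -prune -o -iname "*:*" -print0 \) | while IFS= read -r -d '' i; do
    # use bash's built-in substitution instead of sed
    mv "$i" "${i//:2f/-}"
done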

bash function grep --exclude-dir not working

I have the following function defined in my .bashrc, but for some reason the --exclude-dir option is not excluding the .git directory. Can anyone see what I've done wrong? I'm using Ubuntu 13.10 if that helps.
function fif # find in files
{
    pattern=${1?" Usage: fif <word_pattern> [files pattern]"};
    files=${2:+"-iname \"$2\""};
    grep "$pattern" --color -n -H -s $(find . $files -type f) --exclude-dir=.git --exclude="*.min.*"
    return 0;
}
Make sure not to include a trailing slash when you specify the directory to exclude. For example:
Do this:
$ grep -r --exclude-dir=node_modules firebase .
NOT this:
$ grep -r --exclude-dir=node_modules/ firebase .
(This answer not applicable to OP, but may be helpful for others who find --exclude-dir not to be working -- it worked for me.)
Do a man grep on your system and see what version you have. Your version of grep may not support --exclude-dir.
Also note that in your function, grep is handed an explicit list of files from $(find ...); it never recurses into directories itself, so --exclude-dir is silently ignored. You're really better off using find to find the files you want, then using grep to parse them:
$ find . -name '.git' -type d -prune \
    -o -name "*.min.*" -prune \
    -o -type f -exec grep --color -n -H "$pattern" {} \;
I'm not a fan of the recursive grep. Its syntax has become bloated, and it's really unnecessary. We have a perfectly good tool for finding files that match a particular criteria, thank you.
In the find program, the -o separates out the various clauses. If a file has not been filtered out by a previous -prune clause, it is passed to the next one. Once you've pruned out all of the .git directories and all of the *.min.* files, you pass the results to the -exec clause, which executes your grep command on that one file.
Some people prefer it this way:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -print0 | xargs -0 grep --color -n -H "$pattern"
The -print0 prints out all of the found files separated by the NULL character. The xargs -0 will read in that list of files and pass them to the grep command. The -0 tells xargs that the file names are NULL separated and not whitespace separated. Some xargs will take --null instead of the -0 parameter.
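Folded back into the original function, that might look like this (a sketch, untested):
fif() # find in files, skipping .git and *.min.*
{
    local pattern=${1?"Usage: fif <word_pattern> [files pattern]"}
    find . -name .git -type d -prune -o -name '*.min.*' -prune \
        -o -type f -iname "${2:-*}" -print0 |
        xargs -0 grep --color -n -H -s "$pattern"
}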

Gnu find: apply -prune to directories which match a pattern in external file

I wonder if there is a more efficient way to obtain directory patterns for use with -prune from an external file:
find . \( -type d -a -exec sh -c "echo \"{}\" | grep -qEx -f patterns.prune" \; \) -prune -o \( <further checks> \)
This works, but it is of course very slow due to the shell/pipe spawned for every candidate match. So is there a more elegant way than the above, or do I really have to chain the lines of the pattern file together as command-line switches for find?
Thanks.
You could try to pipe to grep at the end of the run, to only invoke it once, i.e. something like:
find . <your_other_conditions> | grep -v -f patterns.prune
This may not apply to your particular case, since it will now A) find everything under the pruned directories as well (though you can fix that by tweaking patterns.prune) and B) take the filtering out of find's hands, so you can't use find's builtins (e.g. -exec) on the results.
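Alternatively (a sketch, assuming GNU find and that the file's extended regexes are written against the paths find prints, e.g. ./foo/bar), you could read the pattern file into an array once and let find's own -regex test do the matching; <further checks> is the same placeholder as in the question:
# build a prune expression from patterns.prune (one regex per line)
prune=( -false )
while IFS= read -r p; do
    prune+=( -o -regex "$p" )
done < patterns.prune
find . -regextype posix-extended -type d \( "${prune[@]}" \) -prune -o \( <further checks> \)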

What's a more concise way of finding text in a set of files?

I currently use the following command, but it's a little unwieldy to type. What's a shorter alternative?
find . -name '*.txt' -exec grep 'sometext' '{}' \; -print
Here are my requirements:
limit to a file extension (I use SVN and don't want to be searching through all those .svn directories)
can default to the current directory, but it's nice to be able to specify a different directory
must be recursive
UPDATE: Here's my best solution so far:
grep -r 'sometext' * --include='*.txt'
UPDATE #2: After using grep for a bit, I realized that I like the output of my first method better. So, I followed the suggestions of several responders and simply made a shell script and now I call that with two parameters (extension and text to find).
grep has -r (recursive) and --include (to search only in files and directories matching a pattern).
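For example, passing the directory explicitly (so top-level hidden files are covered too):
$ grep -r --include='*.txt' 'sometext' .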
If it's too unwieldy, write a script that does it and put it in your personal bin directory. I have a 'fif' script which searches source files for text, basically just doing a single find like you have here:
#!/bin/bash
set -f # disable pathname expansion
pattern="-iname *.[chsyl] -o -iname *.[ch]pp -o -iname *.hh -o -iname *.cc
         -o -iname *.java -o -iname *.inl"
prune=""
moreargs=true
while $moreargs && [ $# -gt 0 ]; do
    case $1 in
        -h)
            pattern="-iname *.h -o -iname *.hpp -o -iname *.hh"
            shift
            ;;
        -prune)
            prune="-name $2 -prune -false -o $prune"
            shift
            shift
            ;;
        *)
            moreargs=false
            ;;
    esac
done
find . $prune $pattern | sed 's/ /\\ /g' | xargs grep "$@"
it started life as a single-line script and got features added over the years as I needed them.
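Usage might then be something like this (everything after the options is handed straight to grep):
$ fif -prune .svn -i 'sometext'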
This is much more efficient since it invokes grep many fewer times, though it's hard to say it's more succinct:
find . -name '*.txt' -print0 | xargs -0 grep 'sometext' /dev/null
Notes:
find -print0 and xargs -0 make pathnames with embedded blanks work correctly.
The /dev/null argument makes sure grep always prepends a filename.
Install ack and use
ack -aG'\.txt$' 'sometext'
I second ephemient's suggestion of ack. I'm writing this post to highlight a particular issue.
In response to jgormley (in the comments): ack is available as a single file which will work wherever the right Perl version is installed (which is everywhere).
Given that on non-Linux platforms grep regularly does not accept -R, arguably using ack is more portable.
I use zsh, which has recursive globbing. If you needed to look at specific filetypes, the following would be equivalent to your example:
grep 'sometext' **/*.txt
If you don't care about the filetype, the -r option will be better:
grep -r 'sometext' *
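If you also need to skip .svn directories, zsh's extended globbing can exclude them (a sketch; after the ~ operator, slashes are not special, so *.svn* excludes any path containing .svn):
$ setopt extendedglob
$ grep 'sometext' **/*.txt~*.svn*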
That said, a minor tweak to your original example will give you exactly what you want:
find . -name '*.txt' \! -wholename '*/.svn/*' -exec grep 'sometext' '{}' \; -print
If this is something you do frequently, make it a function (put this in your shell config):
function grep_no_svn {
    find . -name "${2:-*}" \! -wholename '*/.svn/*' -exec grep "$1" '{}' \; -print
}
Where the first argument to the function is the text you're searching for. So:
$ grep_no_svn "sometext"
Or:
$ grep_no_svn "sometext" "*.txt"
You could write a script (in bash or whatever -- I have one in Groovy) and place it on the path. E.g.
$ myFind.sh txt targetString
where myFind.sh is:
find . -name "*.$1" -exec grep $2 {} \; -print
I usually avoid the "man find" by using grep 'sometext' $(find . -name "*.txt")
You say that you like the output of your method (using find) better. The only difference I can see between them is that grepping multiple files will put the filename on the front.
You can always (in GNU grep, but you must be using that or -r and --include wouldn't work) turn the filename off by using -h (--no-filename). The opposite, for anyone who does want filenames but has to use find for some other reason, is -H (--with-filename).
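For example, to recurse while suppressing the filename prefixes:
$ grep -rh --include='*.txt' 'sometext' .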
