How do I write a bash alias/function to grep all files in all subdirectories for a string? - bash

I've been using the following command to grep for a string in all the python source files in and below my current directory:
find . -name '*.py' -exec grep -nHr <string> {} \;
I'd like to simplify things so that I can just type something like
findpy <string>
And get the exact same result. Aliases don't seem sufficient since they only do a string expansion, and the argument I need to specify is not the last argument. It sounds like functions are suitable for the task, so I have several questions:
How do I write it?
Where do I put it?

If you don't want to create an entire script for this, you can do it with just a shell function:
findpy() { find . -name '*.py' -exec grep -nHr "$1" {} \; ; }
...but then you may have to define it in both ~/.bashrc and ~/.bash_profile, so it gets defined for both login and interactive shells (see the INVOCATION section of bash's man page).
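A minimal sketch of a variant that forwards every argument to grep, so options such as -i can be passed as well. It assumes a find that supports the `-exec ... {} +` form (specified by POSIX and available in GNU and BSD find), which hands many file names to each grep invocation. The -r flag is dropped because find already supplies only regular files.

```shell
# Sketch: findpy forwards all of its arguments ("$@") to grep.
# "{} +" batches file names instead of starting one grep per file.
findpy() {
    find . -type f -name '*.py' -exec grep -nH "$@" {} +
}
```

With this definition, `findpy -i todo` would run a case-insensitive search.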

All the "find ... -exec" solutions above are OK in the sense that they work, but they are horribly inefficient and will be extremely slow for large trees. The reason is that they launch a new process for every single *.py file. Instead, use xargs(1), and run grep only on files (not directories):
#! /bin/sh
find . -name \*.py -type f | xargs grep -nHr "$1"
For example:
$ time sh -c 'find . -name \*.cpp -type f -exec grep foo {} \; >/dev/null'
real 0m3.747s
$ time sh -c 'find . -name \*.cpp -type f | xargs grep foo >/dev/null'
real 0m0.278s

On a side note, you should take a look at Ack for what you are doing. It is written in Perl and designed as a replacement for grep. It can filter files based on the target language and ignores .svn directories and the like.
Example (snippet from Trac source):
$ ack --python foo ./mysource
ticket/tests/wikisyntax.py
139:milestone:foo
144:<a class="missing milestone" href="/milestone/foo" rel="nofollow">milestone:foo</a>
ticket/tests/conversion.py
34: ticket['foo'] = 'This is a custom field'
ticket/query.py
239: count_sql = 'SELECT COUNT(*) FROM (' + sql + ') AS foo'

I wanted something similar, and the answer by Idelic reminded me of one of the nice features of xargs: it puts the command at the end. You see, my problem was that I wanted to write a shell alias that would "accept parameters" (really, one that would expand in such a way as to let me pass parameters to grep).
Here's what I added to my bash_aliases:
alias findpy="find . -type f -name '*.py' | xargs grep"
This way, I could write findpy WORD or findpy -e REGEX or findpy -il WORD. The point is that I could use any grep command-line option.

Put the following lines in a file named findpy
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} \;
Then say
chmod u+x findpy
I normally have a directory called bin in my home directory where I put little shell scripts like this. Make sure to add the directory to your PATH.

The script:
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} ';'
is how I'd do it.
You write it with an editor like vim and put it somewhere on your path. My normal approach is to have a ~/bin directory and make sure my .profile file (or equivalent) contains:
PATH=$PATH:~/bin
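The PATH line above appends ~/bin again every time the profile is sourced. A minimal sketch, assuming a POSIX-style shell, that adds the directory only when it is not already present:

```shell
# Append ~/bin to PATH only if it is missing, so re-sourcing the
# profile does not keep growing PATH.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;              # already present, nothing to do
    *) PATH=$PATH:$HOME/bin ;;
esac
export PATH
```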

Many versions of grep have options to do recursion, specify filename pattern, etc.
grep --perl-regexp --recursive --include='*.py' --regexp="$1" .
This recurses starting from the current directory (.), looks only at files ending in '.py', and uses Perl-style regular expressions.
If your version of grep doesn't support --recursive and --include, then you can still use find and xargs, but be sure to allow for pathnames with embedded spaces by using the -print0 argument to find and the --null option to xargs to handle that.
find . -type f -name '*.py' -print0 | xargs --null grep "$1"
should work.

Add the following line to your ~/.bashrc or ~/.bash_profile or ~/.profile
alias findpy='find . -type f -name "*.py" -print0 | xargs -0 grep'
then you can use it like this
findpy def
or with grep options
findpy -i class
the following alias will ignore the version control meta-directory of git and svn
alias findpy='find . -type f -not -path "*/.git/*" -a -not -path "*/.svn/*" -name "*.py" -print0 | xargs -0 grep'

#######################################################################################
#
# Function to search all files (including sub-directories) that match a given file
# extension ($2) looking for an indicated string ($1) - in a case insensitive manner.
#
# For Example:
#
# -> findfile AllowNegativePayments cpp
#
#
#######################################################################################
findfile ()
{
find . -iname "*.$2*" -type f -print0 | xargs -0 grep -i "$1" 2> /dev/null
}
alias _ff='findfile'

Related

Bash script to return all elements given an extension, without using print flags

I want to create shell script that search inside all folders of the actual directory and return all files that satisfy some condition, but without using any print flag.
(Here the condition is to end with .py)
What I have done:
find . -name '*.py'| sed -n 's/\.py$//p'
The output:
./123
./test
./abc/dfe/test3
./testing
./test2
What I would like to achieve:
123
test
test3
testing
test2
Use -exec:
find . -name '*.py' -exec sh -c 'for f; do f=${f%.py}; echo "${f##*/}"; done' sh {} +
If GNU basename is an option, you can simplify this to
find . -name '*.py' -exec basename -s .py {} +
POSIX basename is a little more expensive, as you'll have to call it on every file individually:
find . -name '*.py' -exec basename {} .py \;
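The first command in this answer relies on two shell parameter expansions. A short sketch of what each one does, using one of the paths from the question:

```shell
f=./abc/dfe/test3.py
f=${f%.py}       # remove the shortest trailing match of ".py"
name=${f##*/}    # remove the longest leading match of "*/"
echo "$name"
```

Both expansions are part of POSIX sh, so no external basename process is needed.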
Using GNU grep instead of sed:
find . -name '*.py' | grep -oP '[^/]+(?=\.py$)'
If portability is not a concern, this is a very readable option:
find . -name '*.py' | xargs basename -a
This is also differentiated from chepner's answer in that it retains the .py file ending in the output.
I'm not familiar with the -exec flag, and I'm sure his one-liners can be customized to do the same, but I couldn't do so off the top of my head.
Chepner's version achieves the same with the small modification:
find . -name '*.py' -exec basename {} \;
if you want the literal output from find and didn't intend to drop the file endings when you used dummy variables (123,test, etc.) in your question.
find shows entries relative to where you ask it to search, so you can simply replace the . with a *:
find * -name '*.py'| sed -n 's/\.py$//p'
(Be aware that this skips top level hidden directories)
This might work for you (GNU parallel):
find . -name '*.py*' 2>/dev/null | parallel echo "{/.}"

bash function grep --exclude-dir not working

I have the following function defined in my .bashrc, but for some reason the --exclude-dir option is not excluding the .git directory. Can anyone see what I've done wrong? I'm using Ubuntu 13.10 if that helps.
function fif # find in files
{
pattern=${1?" Usage: fif <word_pattern> [files pattern]"};
files=${2:+"-iname \"$2\""};
grep "$pattern" --color -n -H -s $(find . $files -type f) --exclude-dir=.git --exclude="*.min.*"
return 0;
}
Make sure not to include a trailing slash when you specify the directory to exclude. For example:
Do this:
$ grep -r --exclude-dir=node_modules firebase .
NOT this:
$ grep -r --exclude-dir=node_modules/ firebase .
(This answer not applicable to OP, but may be helpful for others who find --exclude-dir not to be working -- it worked for me.)
Do a man grep on your system, and see what version you have. Your version of grep may not support --exclude-dir.
You're really better off using find to find the files you want, then use grep to parse them:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -exec grep --color -n -H "$pattern" {} \;
I'm not a fan of the recursive grep. Its syntax has become bloated, and it's really unnecessary. We have a perfectly good tool for finding files that match particular criteria, thank you.
In the find program, the -o separates the various clauses. If a file has not been filtered out by a previous -prune clause, it is passed to the next one. Once you've pruned out all of the .git directories and all of the *.min.* files, you pass the results to the -exec clause that executes your grep command on that one file.
Some people prefer it this way:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -print0 | xargs -0 grep --color -n -H "$pattern"
The -print0 prints out all of the found files separated by the NULL character. The xargs -0 will read in that list of files and pass them to the grep command. The -0 tells xargs that the file names are NULL separated and not whitespace separated. Some xargs will take --null instead of the -0 parameter.
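For readers who prefer to stay with grep: in the question's function the file list comes from find, and GNU grep applies --exclude-dir to directories it meets while recursing, not to individual files named on the command line, so files under .git are still searched. A hedged sketch that lets grep do the recursion instead, assuming GNU grep. Note that --include matches case-sensitively, unlike the -iname used in the question.

```shell
# Sketch: fif rewritten so that grep itself walks the tree.
# $1 = pattern, $2 = optional file-name glob (for example '*.js').
fif() {
    if [ $# -lt 1 ]; then
        echo "Usage: fif <word_pattern> [files pattern]" >&2
        return 1
    fi
    # ${2:+...} adds --include only when a second argument was given.
    grep -rnHs --color=auto ${2:+--include="$2"} \
        --exclude-dir=.git --exclude='*.min.*' -e "$1" .
}
```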

Why does my shell script not find anything (find . -name script.sh | grep watermelon)

I have a script that I'm running from the home directory to search for all files called "script.sh" that contain the string "watermelon". It's not finding anything but I can clearly see these scripts in the subdirectories. Could someone please suggest a change to the command I'm using:
find . -name script.sh | grep watermelon
You need to use xargs:
find . -name script.sh | xargs grep watermelon
xargs passes the file names to grep as arguments, so grep searches within the files rather than just within the names of the files.
find returns the filename it finds by default. If you want it to search within the files then you need to pipe it to xargs or use the -exec and -print predicates:
find . -name script.sh -exec grep -q watermelon {} \; -print
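The difference between searching the names and searching the contents can be seen in a throwaway directory. This is only a sketch; the fruit directory and the file contents are made up for illustration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/fruit"
printf 'echo watermelon\n' > "$demo/fruit/script.sh"
cd "$demo"

# grep reads the list of paths on stdin; no path contains "watermelon".
names_only=$(find . -name script.sh | grep watermelon)

# grep -l receives the paths as arguments and reads the files themselves.
in_files=$(find . -type f -name script.sh -exec grep -l watermelon {} +)

echo "names: '$names_only'"
echo "files: '$in_files'"
```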
Use -type f to restrict the search to regular files:
find . -type f -name "script.sh" -exec grep "watermelon" "{}" +
or if you have bash 4
shopt -s globstar
grep -Rl "watermelon" **/script.sh

What's a more concise way of finding text in a set of files?

I currently use the following command, but it's a little unwieldy to type. What's a shorter alternative?
find . -name '*.txt' -exec grep 'sometext' '{}' \; -print
Here are my requirements:
limit to a file extension (I use SVN and don't want to be searching through all those .svn directories)
can default to the current directory, but it's nice to be able to specify a different directory
must be recursive
UPDATE: Here's my best solution so far:
grep -r 'sometext' * --include='*.txt'
UPDATE #2: After using grep for a bit, I realized that I like the output of my first method better. So, I followed the suggestions of several responders and simply made a shell script and now I call that with two parameters (extension and text to find).
grep has -r (recursive) and --include (to search only in files whose names match a pattern).
If it's too unwieldy, write a script that does it and put it in your personal bin directory. I have a 'fif' script which searches source files for text, basically just doing a single find like you have here:
#!/bin/bash
set -f # disable pathname expansion
pattern="-iname *.[chsyl] -o -iname *.[ch]pp -o -iname *.hh -o -iname *.cc
-o -iname *.java -o -iname *.inl"
prune=""
moreargs=true
while $moreargs && [ $# -gt 0 ]; do
case $1 in
-h)
pattern="-iname *.h -o -iname *.hpp -o -iname *.hh"
shift
;;
-prune)
prune="-name $2 -prune -false -o $prune"
shift
shift
;;
*)
moreargs=false;
;;
esac
done
find . $prune $pattern | sed 's/ /\\ /g' | xargs grep "$@"
it started life as a single-line script and got features added over the years as I needed them.
This is much more efficient since it invokes grep many fewer times, though it's hard to say it's more succinct:
find . -name '*.txt' -print0 | xargs -0 grep 'sometext' /dev/null
Notes:
find -print0 and xargs -0 make pathnames with embedded blanks work correctly.
The /dev/null argument makes sure grep always prepends a filename.
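The effect of the /dev/null argument shows up when only one file matches the name pattern. A small sketch in a temporary directory (the file name and contents are made up for illustration):

```shell
demo=$(mktemp -d)
printf 'sometext here\n' > "$demo/only.txt"
cd "$demo"

# grep gets a single file argument, so it omits the file name.
without=$(find . -name '*.txt' -print0 | xargs -0 grep 'sometext')

# /dev/null counts as a second file argument, so grep labels each match.
with=$(find . -name '*.txt' -print0 | xargs -0 grep 'sometext' /dev/null)

printf '%s\n' "$without" "$with"
```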
Install ack and use
ack -aG'\.txt$' 'sometext'
I second ephemient's suggestion of ack. I'm writing this post to highlight a particular issue.
In response to jgormley (in the comments): ack is available as a single file which will work wherever the right Perl version is installed (which is everywhere).
Given that on non-Linux platforms grep regularly does not accept -R, arguably using ack is more portable.
I use zsh, which has recursive globbing. If you needed to look at specific filetypes, the following would be equivalent to your example:
grep 'sometext' **/*.txt
If you don't care about the filetype, the -r option will be better:
grep -r 'sometext' *
Although a minor tweak to your original example will give you exactly what you want:
find . -name '*.txt' \! -wholename '*/.svn/*' -exec grep 'sometext' '{}' \; -print
If this is something you do frequently, make it a function (put this in your shell config):
function grep_here_no_svn {
find . -name "${2:-*}" \! -wholename '*/.svn/*' -exec grep "$1" '{}' \; -print
}
Where the first argument to the function is the text you're searching for. So:
$ grep_here_no_svn "sometext"
Or:
$ grep_here_no_svn "sometext" "*.txt"
You could write a script (in bash or whatever -- I have one in Groovy) and place it on the path. E.g.
$ myFind.sh txt targetString
where myFind.sh is:
find . -name "*.$1" -exec grep "$2" {} \; -print
I usually avoid the "man find" by using grep 'sometext' $(find . -name "*.txt")
You say that you like the output of your method (using find) better. The only difference I can see between them is that grepping multiple files will put the filename on the front.
You can always (in GNU grep, but you must be using that or -r and --include wouldn't work) turn the filename off by using -h (--no-filename). The opposite, for anyone who does want filenames but has to use find for some other reason, is -H (--with-filename).

How do I use a pipe in the exec parameter for a find command?

I'm trying to construct a find command to process a bunch of files in a directory using two different executables. Unfortunately, -exec on find doesn't allow the use of a pipe or even \| because the shell interprets that character first.
Here is specifically what I'm trying to do (which doesn't work because pipe ends the find command):
find /path/to/jpgs -type f -exec jhead -v {} | grep 123 \; -print
Try this
find /path/to/jpgs -type f -exec sh -c 'jhead -v {} | grep 123' \; -print
Alternatively you could try to embed your exec statement inside a sh script and then do:
find -exec some_script {} \;
A slightly different approach would be to use xargs:
find /path/to/jpgs -type f -print0 | xargs -0 jhead -v | grep 123
which I always found a bit easier to understand and to adapt (the -print0 and -0 arguments are necessary to cope with filenames containing blanks)
This might (not tested) be more effective than using -exec because it will pipe the list of files to xargs and xargs makes sure that the jhead commandline does not get too long.
With -exec you can only run a single executable with some arguments, not arbitrary shell commands. To circumvent this, you can use sh -c '<shell command>'.
Do note that the use of -exec is quite inefficient. For each file that is found, the command has to be executed again. It would be more efficient if you can avoid this. (For example, by moving the grep outside the -exec or piping the results of find to xargs as suggested by Palmin.)
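One caveat with the sh -c form: when {} is embedded inside a longer argument, POSIX leaves it up to the implementation whether find substitutes it, and file names containing quotes or spaces can break the embedded script. A hedged sketch that passes the file name as a positional parameter instead. Here head -n 1 stands in for jhead -v so the example runs without extra tools, and match_first_line is a hypothetical name.

```shell
# Sketch: the file name reaches the inner shell as "$1", the pattern as "$2".
# The second "sh" only fills $0 of the inner shell.
match_first_line() {
    # $1 = directory to search, $2 = pattern for grep
    find "$1" -type f \
        -exec sh -c 'head -n 1 "$1" | grep -q -e "$2"' sh {} "$2" \; -print
}
```

Because grep -q is the last command of the inner pipeline, -print only fires for files whose first line matches.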
Using find command for this type of a task is maybe not the best alternative. I use the following command frequently to find files that contain the requested information:
for i in dist/*.jar; do echo ">> $i"; jar -tf "$i" | grep BeanException; done
Since this outputs a list, could you not simply do this:
find /path/to/jpgs -type f -exec jhead -v {} \; | grep 123
or
find /path/to/jpgs -type f -print -exec jhead -v {} \; | grep 123
Put your grep on the results of the find -exec.
There is kind of another way you can do it but it is also pretty ghetto.
Using the shell option extquote you can do something similar to this in order to make find exec stuff and then pipe it to sh.
root#ifrit findtest # find -type f -exec echo ls $"|" cat \;|sh
filename
root#ifrit findtest # find -type f -exec echo ls $"|" cat $"|" xargs cat\;|sh
h
I just figured I'd add that because, at least the way I visualized it, it was closer to the OP's original question of using pipes within exec.
