What's a more concise way of finding text in a set of files? - bash

I currently use the following command, but it's a little unwieldy to type. What's a shorter alternative?
find . -name '*.txt' -exec grep 'sometext' '{}' \; -print
Here are my requirements:
limit to a file extension (I use SVN and don't want to be searching through all those .svn directories)
can default to the current directory, but it's nice to be able to specify a different directory
must be recursive
UPDATE: Here's my best solution so far:
grep -r 'sometext' * --include='*.txt'
UPDATE #2: After using grep for a bit, I realized that I like the output of my first method better. So, I followed the suggestions of several responders and simply made a shell script and now I call that with two parameters (extension and text to find).
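The script itself is not shown above. A minimal sketch of what such a two-parameter wrapper might look like, written here as a function: the name findtext and the optional third (directory) argument are assumptions for illustration, not the asker's actual script. It keeps the output format of the first command (matching lines, then the file name) and prunes .svn directories.

```bash
# findtext EXT TEXT [DIR]   (hypothetical helper, not the asker's script)
# Search files ending in .EXT under DIR (default: current directory)
# for TEXT, without descending into .svn directories.
findtext() (
    ext=$1
    text=$2
    dir=${3:-.}
    # -prune stops find from entering .svn; for every other regular file
    # with the right extension, grep prints the matching lines and, if it
    # found any, -print then prints the file name.
    find "$dir" -name .svn -prune -o -type f -name "*.$ext" \
        -exec grep -- "$text" {} \; -print
)
```

It would be called as findtext txt sometext, or findtext txt sometext /some/dir to search elsewhere.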

grep has -r (recursive) and --include (to search only in files whose names match a pattern).

If it's too unwieldy, write a script that does it and put it in your personal bin directory. I have a 'fif' script which searches source files for text, basically just doing a single find like you have here:
#!/bin/bash
set -f # disable pathname expansion
pattern="-iname *.[chsyl] -o -iname *.[ch]pp -o -iname *.hh -o -iname *.cc
-o -iname *.java -o -iname *.inl"
prune=""
moreargs=true
while $moreargs && [ $# -gt 0 ]; do
case $1 in
-h)
pattern="-iname *.h -o -iname *.hpp -o -iname *.hh"
shift
;;
-prune)
prune="-name $2 -prune -false -o $prune"
shift
shift
;;
*)
moreargs=false;
;;
esac
done
find . $prune $pattern | sed 's/ /\\ /g' | xargs grep "$@"
It started life as a single-line script and got features added over the years as I needed them.

This is much more efficient since it invokes grep many fewer times, though it's hard to say it's more succinct:
find . -name '*.txt' -print0 | xargs -0 grep 'sometext' /dev/null
Notes:
find -print0 and xargs -0 make pathnames with embedded blanks work correctly.
The /dev/null argument makes sure grep always prepends a filename.
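To see both notes in action, here is a small sketch wrapped in a function. The name txtgrep and the optional directory argument are made up for illustration, and it assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do).

```bash
# txtgrep TEXT [DIR]   (hypothetical name)
# Search *.txt files under DIR (default: current directory) for TEXT.
txtgrep() (
    text=$1
    dir=${2:-.}
    # NUL-separated names survive blanks in pathnames; /dev/null is an
    # extra, always-empty file, so grep sees at least two file arguments
    # and prefixes every match with its file name.
    find "$dir" -type f -name '*.txt' -print0 |
        xargs -0 grep -- "$text" /dev/null
)
```

Even when only one file is found, and even if its name contains a space, the output has the form path:matching line.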

Install ack and use
ack -aG'\.txt$' 'sometext'

I second ephemient's suggestion of ack. I'm writing this post to highlight a particular issue.
In response to jgormley (in the comments): ack is available as a single file which will work wherever the right Perl version is installed (which is everywhere).
Given that on non-Linux platforms grep regularly does not accept -R, arguably using ack is more portable.

I use zsh, which has recursive globbing. If you needed to look at specific filetypes, the following would be equivalent to your example:
grep 'sometext' **/*.txt
If you don't care about the filetype, the -r option will be better:
grep -r 'sometext' *
Although a minor tweak to your original example will give you exactly what you want:
find . -name '*.txt' \! -wholename '*/.svn/*' -exec grep 'sometext' '{}' \; -print
If this is something you do frequently, make it a function (put this in your shell config):
function grep_here_no_svn {
find . -name "${2:-*}" \! -wholename '*/.svn/*' -exec grep "$1" '{}' \; -print
}
Where the first argument to the function is the text you're searching for. So:
$ grep_here_no_svn "sometext"
Or:
$ grep_here_no_svn "sometext" "*.txt"

You could write a script (in bash or whatever -- I have one in Groovy) and place it on the path. E.g.
$ myFind.sh txt targetString
where myFind.sh is:
find . -name "*.$1" -exec grep "$2" {} \; -print

I usually avoid the "man find" by using grep 'sometext' $(find . -name "*.txt")

You say that you like the output of your method (using find) better. The only difference I can see between them is that grepping multiple files will put the filename on the front.
You can always (in GNU grep, but you must be using that or -r and --include wouldn't work) turn the filename off by using -h (--no-filename). The opposite, for anyone who does want filenames but has to use find for some other reason, is -H (--with-filename).

Related

Bash script to return all elements given an extension, without using print flags

I want to create shell script that search inside all folders of the actual directory and return all files that satisfy some condition, but without using any print flag.
(Here the condition is to end with .py)
What I have done:
find . -name '*.py'| sed -n 's/\.py$//p'
The output:
./123
./test
./abc/dfe/test3
./testing
./test2
What I would like to achieve:
123
test
test3
testing
test2
Use -exec:
find . -name '*.py' -exec sh -c 'for f; do f=${f%.py}; echo "${f##*/}"; done' sh {} +
If GNU basename is an option, you can simplify this to
find . -name '*.py' -exec basename -s .py {} +
POSIX basename is a little more expensive, as you'll have to call it on every file individually:
find . -name '*.py' -exec basename {} .py \;
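The first command above relies on two shell parameter expansions: ${f%.py} removes the shortest matching suffix, and ${f##*/} removes the longest matching prefix (everything up to the last slash). A sketch of just that step, using a hypothetical function name:

```bash
# py_stem PATH   (hypothetical name)
# Print the file name of PATH without its directory and without ".py".
py_stem() (
    f=$1
    f=${f%.py}                  # ./abc/dfe/test3.py -> ./abc/dfe/test3
    printf '%s\n' "${f##*/}"    # ./abc/dfe/test3    -> test3
)
```

Both expansions are done by the shell itself, so no basename or sed process is started per file.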
Using GNU grep instead of sed:
find . -name '*.py' | grep -oP '[^/]+(?=\.py$)'
If portability is not a concern, this is a very readable option:
find . -name '*.py' | xargs basename -a
This is also differentiated from chepner's answer in that it retains the .py file ending in the output.
I'm not familiar with the -exec flag, and I'm sure his one-liners can be customized to do the same, but I couldn't do so off the top of my head.
Chepner's version achieves the same with the small modification:
find . -name '*.py' -exec basename {} \;
if you want the literal output from find and didn't intend to drop the file endings when you used dummy variables (123,test, etc.) in your question.
find shows entries relative to where you ask it to search, so you can simply replace the . with a *:
find * -name '*.py'| sed -n 's/\.py$//p'
(Be aware that this skips top level hidden directories)
This might work for you (GNU parallel):
find . -name '*.py*' 2>/dev/null | parallel echo "{/.}"

bash function grep --exclude-dir not working

I have the following function defined in my .bashrc, but for some reason the --exclude-dir option is not excluding the .git directory. Can anyone see what I've done wrong? I'm using Ubuntu 13.10 if that helps.
function fif # find in files
{
pattern=${1?" Usage: fif <word_pattern> [files pattern]"};
files=${2:+"-iname \"$2\""};
grep "$pattern" --color -n -H -s $(find . $files -type f) --exclude-dir=.git --exclude="*.min.*"
return 0;
}
Make sure not to include a trailing slash when you specify the directory to exclude. For example:
Do this:
$ grep -r --exclude-dir=node_modules firebase .
NOT this:
$ grep -r --exclude-dir=node_modules/ firebase .
(This answer not applicable to OP, but may be helpful for others who find --exclude-dir not to be working -- it worked for me.)
Do a man grep on your system, and see what version you have. Your version of grep may not be able to use --exclude-dir.
You're really better off using find to find the files you want, then use grep to parse them:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -exec grep --color -n -H "$pattern" {} \;
I'm not a fan of the recursive grep. Its syntax has become bloated, and it's really unnecessary. We have a perfectly good tool for finding files that match a particular criteria, thank you.
In the find program, the -o operators separate the various clauses. If a file has not been filtered out by a previous -prune clause, it is passed to the next one. Once you've pruned out all of the .git directories and all of the *.min.* files, you pass the results to the -exec clause that executes your grep command on that one file.
Some people prefer it this way:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -print0 | xargs -0 grep --color -n -H "$pattern"
The -print0 prints out all of the found files separated by the NULL character. The xargs -0 will read in that list of files and pass them to the grep command. The -0 tells xargs that the file names are NULL separated and not whitespace separated. Some xargs will take --null instead of the -0 parameter.
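For comparison, here is a sketch of the fif function from the question rewritten so that grep does the recursion itself. In GNU grep, --exclude-dir only affects directories that grep traverses with -r, which is a likely reason it did nothing when find supplied the file list. This assumes GNU grep; note that --include is case-sensitive, unlike the -iname test in the original.

```bash
# fif <word_pattern> [files pattern]   (sketch, assuming GNU grep)
fif() (
    if [ $# -lt 1 ]; then
        echo 'Usage: fif <word_pattern> [files pattern]' >&2
        exit 2
    fi
    pattern=$1
    files=${2-}
    # Build the option list in the positional parameters.
    set -- -r -n -H -s --color=auto --exclude-dir=.git
    if [ -n "$files" ]; then
        set -- "$@" --include="$files"
    fi
    # --exclude comes after --include so that, for a file matching both
    # globs, the exclusion is the last matching option.
    set -- "$@" --exclude='*.min.*'
    grep "$@" -e "$pattern" .
)
```

Usage is the same as before: fif needle searches everything under the current directory, and fif needle '*.js' restricts the search to JavaScript files.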

Modifying replace string in xargs

When I am using xargs sometimes I do not need to explicitly use the replacing string:
find . -name "*.txt" | xargs rm -rf
In other cases, I want to specify the replacing string in order to do things like:
find . -name "*.txt" | xargs -I '{}' mv '{}' /foo/'{}'.bar
The previous command would move all the text files under the current directory into /foo and it will append the extension bar to all the files.
If instead of appending some text to the replace string, I wanted to modify that string such that I could insert some text between the name and extension of the files, how could I do that? For instance, let's say I want to do the same as in the previous example, but the files should be renamed/moved from <name>.txt to /foo/<name>.bar.txt (instead of /foo/<name>.txt.bar).
UPDATE: I managed to find a solution:
find . -name "*.txt" | xargs -I{} \
sh -c 'base=$(basename "$1") ; name=${base%.*} ; ext=${base##*.} ; \
mv "$1" "foo/${name}.bar.${ext}"' -- {}
But I wonder if there is a shorter/better solution.
The following command constructs the move command with xargs, replaces the second occurrence of '.' with '.bar.', then executes the commands with bash, working on mac OSX.
ls *.txt | xargs -I {} echo mv {} foo/{} | sed 's/\./.bar./2' | bash
It is possible to do this in one pass (tested in GNU) avoiding the use of the temporary variable assignments
find . -name "*.txt" | xargs -I{} sh -c 'mv "$1" "foo/$(basename "${1%.*}").new.${1##*.}"' -- {}
In cases like this, a while loop would be more readable:
find . -name "*.txt" | while IFS= read -r pathname; do
base=$(basename "$pathname"); name=${base%.*}; ext=${base##*.}
mv "$pathname" "foo/${name}.bar.${ext}"
done
Note that you may find files with the same name in different subdirectories. Are you OK with duplicates being over-written by mv?
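If overwriting is not acceptable, one option is to check for an existing target before each move. The following is only a sketch under stated assumptions: the function name move_with_infix is made up, both arguments are required, and the destination directory is assumed to lie outside the source tree so that moved files are not found again.

```bash
# move_with_infix SRC DEST   (hypothetical name)
# Move every *.txt under SRC to DEST/<name>.bar.<ext>, skipping any file
# whose target name already exists in DEST.
move_with_infix() (
    src=$1
    dest=$2
    mkdir -p -- "$dest" || exit 1
    find "$src" -type f -name '*.txt' -exec sh -c '
        dest=$1; shift
        for pathname do
            base=${pathname##*/}     # strip the directory part
            name=${base%.*}          # text before the last dot
            ext=${base##*.}          # text after the last dot
            target=$dest/$name.bar.$ext
            if [ -e "$target" ]; then
                echo "skipping $pathname: $target exists" >&2
            else
                mv -- "$pathname" "$target"
            fi
        done
    ' sh "$dest" {} +
)
```

Files that would collide stay where they are and are reported on stderr, so they can be dealt with by hand.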
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
find . -name "*.txt" | parallel 'ext={/} ; mv -- {} foo/{/.}.bar."${ext##*.}"'
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
If you're allowed to use something other than bash/sh, AND this is just for a fancy "mv"... you might try the venerable "rename.pl" script. I use it on Linux and cygwin on windows all the time.
http://people.sc.fsu.edu/~jburkardt/pl_src/rename/rename.html
rename.pl 's/^(.*?)\.(.*)$/\1-new_stuff_here.\2/' list_of_files_or_glob
You can also use a "-p" parameter to rename.pl to have it tell you what it WOULD HAVE DONE, without actually doing it.
I just tried the following in my c:/bin (cygwin/windows environment). I used the "-p" so it spit out what it would have done. This example just splits the base and extension, and adds a string in between them.
perl c:/bin/rename.pl -p 's/^(.*?)\.(.*)$/\1-new_stuff_here.\2/' *.bat
rename "here.bat" => "here-new_stuff_here.bat"
rename "htmldecode.bat" => "htmldecode-new_stuff_here.bat"
rename "htmlencode.bat" => "htmlencode-new_stuff_here.bat"
rename "sdiff.bat" => "sdiff-new_stuff_here.bat"
rename "widvars.bat" => "widvars-new_stuff_here.bat"
the files should be renamed/moved from <name>.txt to /foo/<name>.bar.txt
You can use rename utility, e.g.:
rename 's/\.txt$/.txt.bar/' *.txt
Hint: The substitution syntax is similar to sed or vim.
Then move the files to some target directory by using mv:
mkdir /some/path
mv *.bar /some/path
To rename files into subdirectories based on some part of their name, check for:
-p/--mkpath/--make-dirs Create any non-existent directories in the target path.
Testing:
$ touch {1..5}.txt
$ rename --dry-run "s/.txt$/.txt.bar/g" *.txt
'1.txt' would be renamed to '1.txt.bar'
'2.txt' would be renamed to '2.txt.bar'
'3.txt' would be renamed to '3.txt.bar'
'4.txt' would be renamed to '4.txt.bar'
'5.txt' would be renamed to '5.txt.bar'
Adding to that, the Wikipedia article is surprisingly informative, for example:
Shell trick
Another way to achieve a similar effect is to use a shell as the launched command, and deal with the complexity in that shell, for example:
$ mkdir ~/backups
$ find /path -type f -name '*~' -print0 | xargs -0 bash -c 'for filename; do cp -a "$filename" ~/backups; done' bash
Inspired by an answer by @justaname above, this command, which incorporates a Perl one-liner, will do it:
find ./ -name \*.txt | perl -p -e 's/^(.*\/(.*)\.txt)$/mv $1 .\/foo\/$2.bar.txt/' | bash

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the files names, and then pipe that to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window, so I'm probably not going to be using the -q flag and stopping at the first one.
I'm going to run this in the console. Ideally I'd like to see the inline matches printed out there, as well as being able to pipe them to another script, like osascript (AppleScript, to reveal them in the Finder). That's why I have been using -H: I like to see both the file name and the match.
If I had to settle for just using -l so that the file name could more easily be piped to another script, that would be OK, too. But after looking at the reply below from @Charlie Martin, I think that xargs could be helpful here in doing both at the same time with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead. I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry this further; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
Update
To answer your questions:
(1) You can do anything in bash you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of using backticks around command, and that works in the version of sh on Macs. The csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
Replace -H with -l and you will get a list of those filenames that matched the pattern.
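If you want both the inline matches on the terminal and the list of matching files on a pipe, one possibility is to split grep's output yourself. This is a rough sketch: the function name matches_and_files is made up, and it assumes no file name contains a colon or a newline, since it takes everything before the first colon of each grep -H line as the name.

```bash
# matches_and_files QUERY [DIR]   (hypothetical name)
# Show case-insensitive matches in *.php files on stderr and print each
# matching file name once on stdout.
matches_and_files() (
    query=$1
    dir=${2:-.}
    find "$dir" -type f -iname '*.php' -exec grep -iH -- "$query" {} + |
    while IFS= read -r line; do
        printf '%s\n' "$line" >&2        # the match, for the terminal
        printf '%s\n' "${line%%:*}"      # the file name, for the pipe
    done | sort -u
)
```

The names on stdout can then be fed to another program one line at a time, for example with matches_and_files queryString | while IFS= read -r f; do open "$f"; done on a Mac.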
If you have bash 4, enable the globstar option (shopt -s globstar) and simply do
grep pattern /path/**/*.php
The ** operator is like
grep pattern `find -name \*.php -print`
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made is to add -q to your grep command. This is "Exit immediately with zero status if any match is found".
The good news is that this speeds up grep when a file has many matching lines. You don't care how many matches there are. But that means we need another -exec on the end to actually print the filenames when grep has been successful.
The grep result will be sent to stdout, so another -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript 1 line at a time.
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
Act on files you find that match the pattern:
find . -iname "*.php" | xargs grep -l pattern | myScript
Act on files that don't match the pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -exec predicates with grep -q will be FAR faster than piping, because find short-circuits: juxtaposed expressions with no explicit operator between them are joined by an implied -a, so a later expression only runs if the earlier one succeeded. The main problem here is that you want something to happen if grep matches AND you want the matches to be printed. If the files are reasonably sized, then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note, if the matches are not particularly needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
will virtually always be faster than a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH queryString | ...
Additionally, you should really have -type f in all cases, unless you want to catch *php directories
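The short-circuit behaviour described above can be seen in a stripped-down form. In this sketch (the function name files_matching is hypothetical), -print sits after the -exec, so thanks to the implied -a it only runs for files where grep -q exited successfully:

```bash
# files_matching QUERY [DIR]   (hypothetical name)
# Print the names of *.php files under DIR that contain QUERY,
# ignoring case.
files_matching() (
    query=$1
    dir=${2:-.}
    # -type f AND -iname AND -exec AND -print: each test is evaluated
    # only if all the tests before it were true for this file.
    find "$dir" -type f -iname '*.php' \
        -exec grep -iq -- "$query" {} \; -print
)
```

Replacing -print with another -exec gives the "run otherprogram only on matching files" variant shown above.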
As for which is faster: if you actually care about the minuscule time difference (which you might, if you are trying to save your processor some time), prefix each command with time and see which one performs better.

How do I write a bash alias/function to grep all files in all subdirectories for a string?

I've been using the following command to grep for a string in all the python source files in and below my current directory:
find . -name '*.py' -exec grep -nHr <string> {} \;
I'd like to simplify things so that I can just type something like
findpy <string>
And get the exact same result. Aliases don't seem sufficient since they only do a string expansion, and the argument I need to specify is not the last argument. It sounds like functions are suitable for the task, so I have several questions:
How do I write it?
Where do I put it?
If you don't want to create an entire script for this, you can do it with just a shell function:
findpy() { find . -name '*.py' -exec grep -nHr "$1" {} \; ; }
...but then you may have to define it in both ~/.bashrc and ~/.bash_profile, so it gets defined for both login and interactive shells (see the INVOCATION section of bash's man page).
All the "find ... -exec" solutions above are OK in the sense that they work, but they are horribly inefficient and will be extremely slow for large trees. The reason is that they launch a new process for every single *.py file. Instead, use xargs(1), and run grep only on files (not directories):
#! /bin/sh
find . -name \*.py -type f | xargs grep -nHr "$1"
For example:
$ time sh -c 'find . -name \*.cpp -type f -exec grep foo {} \; >/dev/null'
real 0m3.747s
$ time sh -c 'find . -name \*.cpp -type f | xargs grep foo >/dev/null'
real 0m0.278s
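A related option, not used in the answers so far, is to end -exec with + instead of \;. POSIX find then passes many file names to each grep invocation, much as xargs does, and names with spaces need no special handling. A sketch of the findpy function written that way:

```bash
# findpy STRING
# Search all *.py files in and below the current directory for STRING,
# printing file name and line number for each match.
findpy() {
    # '{} +' batches file names into as few grep calls as possible;
    # -H keeps the file name in the output even if only one file is found.
    find . -type f -name '*.py' -exec grep -nH -- "$1" {} +
}
```

The -r flag from the original command is dropped here because find already does the recursion and only regular files are passed to grep.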
On a side note, you should take a look at Ack for what you are doing. It is a replacement for grep, written in Perl, that can filter files based on the target language and ignores .svn directories and the like.
Example (snippet from Trac source):
$ ack --python foo ./mysource
ticket/tests/wikisyntax.py
139:milestone:foo
144:<a class="missing milestone" href="/milestone/foo" rel="nofollow">milestone:foo</a>
ticket/tests/conversion.py
34: ticket['foo'] = 'This is a custom field'
ticket/query.py
239: count_sql = 'SELECT COUNT(*) FROM (' + sql + ') AS foo'
I wanted something similar, and the answer by Idelic reminded me of one of the nice features of xargs: that it puts the arguments it reads at the end of the command. You see, my problem was that I wanted to write a shell alias that would "accept parameters" (really, that it would expand in such a way as to allow me to pass parameters to grep).
Here's what I added to my bash_aliases:
alias findpy="find . -type f -name '*.py' | xargs grep"
This way, I could write findpy WORD or findpy -e REGEX or findpy -il WORD, the point being that I could use any grep command-line option.
Put the following two lines in a file named findpy
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} \;
Then say
chmod u+x findpy
I normally have a directory called bin in my home directory where I put little shell scripts like this. Make sure to add the directory to your PATH.
The script:
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} ';'
is how I'd do it.
You write it with an editor like vim and put it somewhere on your path. My normal approach is to have a ~/bin directory and make sure my .profile file (or equivalent) contains:
PATH=$PATH:~/bin
Many versions of grep have options to do recursion, specify filename pattern, etc.
grep --perl-regexp --recursive --include='*.py' --regexp="$1" .
This recurses starting from the current directory (.), looks only at files ending in 'py', uses Perl-style regular expressions.
If your version of grep doesn't support --recursive and --include, then you can still use find and xargs, but be sure to allow for pathnames with embedded spaces by using the -print0 argument to find and the --null option to xargs to handle that.
find . -type f -name '*.py' -print0 | xargs --null grep "$1"
should work.
Add the following line to your ~/.bashrc or ~/.bash_profile or ~/.profile
alias findpy='find . -type f -name "*.py" -print0 | xargs -0 grep'
then you can use it like this
findpy def
or with grep options
findpy -i class
the following alias will ignore the version control meta-directory of git and svn
alias findpy='find . -type f -not -path "*/.git/*" -a -not -path "*/.svn/*" -name "*.py" -print0 | xargs -0 grep'
#######################################################################################
#
# Function to search all files (including sub-directories) that match a given file
# extension ($2) looking for an indicated string ($1) - in a case insensitive manner.
#
# For Example:
#
# -> findfile AllowNegativePayments cpp
#
#
#######################################################################################
findfile ()
{
find . -iname "*.$2*" -type f -print0 | xargs -0 grep -i "$1" 2> /dev/null
}
alias _ff='findfile'

Resources