grep something with xargs and find - bash

bash gurus ;) I'm trying to improve a bash one-liner that greps for a specific keyword's matches in specific files. It looks like this:
find /<path>/hp -iname '*.ppd' -print0 | xargs -0 grep "\*ModelName\:"
which works very fast for me! It's about 20 times faster than this one:
find /<path>/hp -iname '*.ppd' -print0 | xargs -0 -I {} bash -c 'grep "\*ModelName\:" {}'
But the problem is that the first script gives me lines like this:
/<path>/hp/hp-laserjet_m9040_mfp-ps.ppd:*ModelName: "HP LaserJet M9040 M9050 MFP"
but desired result is just
*ModelName: "HP LaserJet M9040 M9050 MFP"
(as in the second script). How can I achieve it?
P.S.: I'm using find for flexibility and future improvements of the script.

No need for find:
grep -rh --include "*.ppd" "\*ModelName\:" /<path>/hp

The -h option to grep suppresses filenames in the output. If you want to keep find, just add -h to your first command:
find /<path>/hp -iname '*.ppd' -print0 | xargs -0 grep -h "\*ModelName\:"
If your grep does not provide -h, then use cat:
find /<path>/hp -iname '*.ppd' -print0 | xargs -0 cat | grep "\*ModelName\:"
Also, for your information, find provides the -exec option which would render xargs unnecessary had you wanted to pursue your second option:
find /<path>/hp -iname '*.ppd' -exec grep "\*ModelName\:" '{}' \;

You can get rid of find altogether (in bash):
shopt -s globstar
grep -h "\*ModelName\:" /<path>/hp/**.[pP][pP][dD]
Might be a bit slower if you have a huge directory tree (which I doubt in your case).
Pro: only one process launched!
Con: the future improvement you mentioned might be more difficult to implement.
In this case, you'd better use:
find /<path>/hp -iname '*.ppd' -exec grep -h "\*ModelName\:" {} +
(observe the + at the end: only one grep will be launched per batch of files, instead of one per file).

Think of your output line
/<path>/hp/hp-laserjet_m9040_mfp-ps.ppd:*ModelName: "HP LaserJet M9040 M9050 MFP"
as a record of three fields separated by colons. Seen this way, you want to extract the third field as your final answer. Even if you don't know anything about awk, you should at least know how to print a column of output data using a specific column separator, as I am showing you below:
find /<path>/hp -iname '*.ppd' -print0 | xargs -0 grep "\*ModelName\:" | awk -F: '{ print $3 }'
The other thing you should know about awk is how to sum up (and occasionally, take the average) of the numbers in a specific column of output data, but that's another story for another day :)
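For example, a quick sketch (using the size column of ls -l purely as an illustration, not part of your command chain):
ls -l | awk '{ sum += $5 } END { print sum }'
ls -l | awk '{ sum += $5; n++ } END { if (n) print sum / n }'
The first prints the total of column 5, the second its average.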
The advantage of appending the awk command to your command chain is that you are building on the fast performance of your already-optimized command chain :)
In your case, the answer is grep with xargs and find and awk :)

Related

Grep for specific php malware pattern

I am trying to figure out how to grep for malware that is hard to match with a single pattern. One line from the malicious file looks like this:
$bhbwjhu[11].$bhbwjhu[15].$bhbwjhu[34].$bhbwjhu[23].$bhbwjhu[30].$bhbwjhu[6].$bhbwjhu[3].$bhbwjhu[34].$bhbwjhu[31]
I tried something like this, but obviously my grep skills are quite poor (this gives an "Invalid range end" error):
find . -type f | xargs grep -s -l "\$[A-z]*\[[0-9]*\]\.\$[A-z]*\[[0-9]*\]\.\$[A-z]*\[[0-9]*\]"
Any way to search for that bunch of array elements in files?
Grep version is
grep (GNU grep) 2.20
Linux version 2.6.32-896.16.1.lve1.4.54.el6.x86_64
I recommend using the following:
find . -type f -print0 | xargs -0 grep -s -l '\$[[:alpha:]]*\[[[:digit:]]*\]\.\$[[:alpha:]]*\[[[:digit:]]*\]\.\$[[:alpha:]]*\[[[:digit:]]*\]'
Using character classes is much safer than using ranges. I also recommend using -print0 and xargs -0 so filenames containing whitespace don't break your command. See also this explanation.
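A quick way to see the difference (a sketch, forcing the C locale so the range is predictable): in ASCII the [A-z] range also covers the characters between 'Z' and 'a', such as the underscore, which [[:alpha:]] does not:
echo '_' | LC_ALL=C grep -c '[A-z]'        # prints 1
echo '_' | LC_ALL=C grep -c '[[:alpha:]]'  # prints 0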

"grep -R" replacement?

I have a machine with grep installed but option -R is not compiled-in and there is also no replacement switch.
How can I replace it in bash?
I tried:
for i in `find *`; do
grep 'pattern' $i;
done
but that's not the right re-implementation, is it?
Try piping the output of find to xargs so that grep only gets invoked a few times (xargs keeps reading input until it gets so much that more would not fit in an argument list):
find -type f | xargs grep foo
We usually use
find . -exec grep 'pattern' {} \;
That usually works similarly to grep -R.
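One difference: with -exec running grep on one file at a time, grep omits the filename from its output. If you want output closer to grep -R, a possible tweak (assuming your grep supports -H, or falling back to the /dev/null trick otherwise) is:
find . -type f -exec grep -H 'pattern' {} \;
find . -type f -exec grep 'pattern' /dev/null {} \;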

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the files names, and then pipe that to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window - so I'm probably not going to use the -q flag and stop at the first one.
I'm going to run this in the console; ideally I'd like to see the inline matches printed out there, as well as being able to pipe them to another script, like osascript (AppleScript, to reveal them in the Finder). That's why I have been using -H - because I like to see both the file name and the match.
If I had to settle for just using -l so that the file name could more easily be piped to another script, that would be OK, too. But I think, after looking at the reply below from @Charlie Martin, that xargs could be helpful here in doing both at the same time with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead - I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry on with this farther; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
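If you're curious how much room you have before hitting that limit, a quick check on most systems is:
$ getconf ARG_MAX
which prints the maximum number of bytes allowed for the argument list (plus environment) of a new process.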
Update
To answer your questions:
(1) You can do anything in bash that you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of backticks around command, and that works in the version of sh on Macs; csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
Replace -H with -l and you will get a list of those filenames that matched the pattern.
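For example, applied to the command from the question (a sketch):
find . -iname "*php" -exec grep -il queryString {} \;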
if you have bash 4 (with shopt -s globstar enabled), simply do
grep pattern /path/**/*.php
the ** operator is like
grep pattern `find -name \*.php -print`
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made is to add -q to your grep command. This is "Exit immediately with zero status if any match is found".
The good news is that this speeds up grep when a file has many matching lines; you don't care how many matches there are. But that means we need another -exec on the end to actually print the filenames when grep has been successful.
The grep result will be sent to stdout, so another -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript one line at a time.
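A minimal sketch of what such a script could look like (myScript here is just a placeholder name):
#!/bin/bash
# read one filename per line from stdin and act on each
while IFS= read -r file; do
    echo "processing: $file"
done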
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
Act on the files you find that match:
find . -iname "*.php" | xargs grep -l pattern | myScript
Act on the files that don't match the pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -execs with grep -q will be FAR faster than piping, since find puts an implied short-circuiting -a between each juxtaposed pair of expressions that isn't separated by an explicit operator. The main problem here is that you want something to happen if grep matches something AND you want the matches to be printed. If the files are reasonably sized then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note, if the matches are not particularly needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
will virtually always be faster than a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH queryString | ...
Additionally, you should really have -type f in all cases, unless you want to match directories named *php as well.
Regarding the question of which is faster: if you actually care about the minuscule time difference (which you might, if you are trying to save your processor some work), run each candidate command under time and see which one performs better.
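For example, a rough comparison using the commands from above (queryString being your pattern):
time find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; > /dev/null
time find . -iname "*php" -print0 | xargs -0 grep -iH queryString > /dev/null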

Use find, wc, and sed to count lines

I was trying to use sed to count all the lines based on a particular extension.
find -name '*.m' -exec wc -l {} \; | sed ...
I was trying to do something like the above; how would I include sed in this particular line to get the totals?
You may also get the nice formatting from wc with:
wc `find -name '*.m'`
Most of the answers here won't work well for a large number of files. Some will break if the list of file names is too long for a single command line call, others are inefficient because -exec starts a new process for every file. I believe a robust and efficient solution would be:
find . -type f -name "*.m" -print0 | xargs -0 cat | wc -l
Using cat in this way is fine, as its output is piped straight into wc so only a small amount of the files' content is kept in memory at once. If there are too many files for a single invocation of cat, cat will be called multiple times, but all the output will still be piped into a single wc process.
You can cat all files through a single wc instance to get the total number of lines:
find . -name '*.m' -exec cat {} \; | wc -l
On modern GNU platforms, wc and find take -print0 and --files0-from parameters that can be combined into a command that counts lines in files, with a total at the end. Example:
find . -name '*.c' -type f -print0 | wc -l --files0-from=-
You could also use sed in place of wc for counting lines:
find . -name '*.m' -exec sed -n '$=' {} \;
where '$=' tells sed to print the current line number when it reaches the last line, i.e. the file's line count
EDIT
you could also try something like sloccount
Hm, the solution with cat may be problematic if you have many files, especially big ones.
The second solution doesn't give a total, just lines per file, as I tested.
I'd prefer something like this:
find . -name '*.m' | xargs wc -l | tail -1
This will do the job fast, no matter how many files you have or how big they are.
sed is not the proper tool for counting. Use awk instead:
find . -name '*.m' -exec awk 'END { print NR }' {} +
Using + instead of \; makes find call awk once per batch of N files (like xargs does).
For big directories we should use:
find . -type f -name '*.m' -exec sed -n '$=' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'
# alternative using awk twice
find . -type f -name '*.m' -exec awk 'END {print NR}' '{}' + 2>/dev/null | awk '{ total+=$1 }END{print total}'

Write a shell script that find-greps and outputs filename and content in 1 line

To see all the php files that contain "abc" I can use this simple script:
find . -name "*php" -exec grep -l abc {} \;
I can omit the -l and I get the matching lines instead of the filenames as results:
find . -name "*php" -exec grep abc {} \;
What I would like now is a version that does both at the same time, but on the same line.
Expected output:
path1/filename1: lorem abc ipsum
path2/filename2: ipsum abc lorem
path3/filename3: non abc quod
More or less like grep abc * does.
Edit: I want to use this as a simple shell script. It would be great if the output is on one line, so further grepping would be possible. But it is not necessary that the script is only one line; I am putting it in a bash script file anyway.
Edit 2: Later I found ack, which is a great tool and I now use it in most cases instead of grep. It does all this and more: http://betterthangrep.com/ You would write ack --php --nogroup abc to get the desired result.
Use the -H switch (man grep):
find . -name "*php" -exec grep -H abc {} \;
Alternative using xargs (now the -H switch is not needed, at least for the version of grep I have here):
find . -name "*php" -print | xargs grep abc
Edit: As a consequence of grep's behavior as noted by orsogufo, the second command above should use -H if find could conceivably return only a single filename (i.e. if there is only a single PHP file). If orsogufo's comment w.r.t. -print0 is also incorporated, the command becomes:
find . -name "*php" -print0 | xargs -0 grep -H abc
Edit 2: A (more[1]) POSIX-compliant version, as proposed by Jonathan Leffler, which avoids the -H switch through the use of /dev/null:
find . -name "*php" -print0 | xargs -0 grep abc /dev/null
[1]: A quote from the opengroup.org manual on find hints that -print0 is non-standard:
A feature of SVR4's find utility was the -exec primary's + terminator. This allowed filenames containing special characters (especially newlines) to be grouped together without the problems that occur if such filenames are piped to xargs. Other implementations have added other ways to get around this problem, notably a -print0 primary that wrote filenames with a null byte terminator. This was considered here, but not adopted. Using a null terminator meant that any utility that was going to process find's -print0 output had to add a new option to parse the null terminators it would now be reading.
If you don't need to recursively search, you can just do..
grep -H abc *.php
..which gives you the desired output. -H is the default behaviour (at least on the OS X version of grep), so you can omit this:
grep abc *.php
You can grep recursively using the -R flag, but you can't limit it to .php files:
grep -R abc *
Again, this has the same desired output.
I know this doesn't exactly answer your question; it's just... an alternative. The above are just grep with a single flag, so they're easier to remember than find/-exec/grep/xargs combinations! (Irrelevant for a script, but useful for day-to-day shelling.)
find /path -type f -name "*.php" | awk '
{
    # $0 is a filename from find; read that file line by line
    while ((getline line < $0) > 0) {
        if (line ~ /time/) {        # replace /time/ with your pattern, e.g. /abc/
            print $0 ":" line
            # do some other things here
        }
    }
    close($0)
}'
