grep like command to find matching lines plus neighbourhood lines - bash

The grep command is really powerful and I use it a lot.
Sometimes I need to search many files for a string I barely remember, and the -i (ignore case), -r (recursive) and -v (invert match) options help with that.
What I really need, though, is output that shows each matching line plus its neighbouring lines (say, the 2 lines before and the 2 lines after each match).
Is there a way to get this result using bash?

Grep itself will do this
grep -A 2 -B 2 foo myfile.txt

Most greps also accept the "context" flag, which shows the same number of lines on both sides of the match and is a bit more readable:
grep --context=3 foo myfile.txt

You can also drop the -C and pass the number directly:
grep -2 foo myfile.txt
is equivalent to
grep -C 2 foo myfile.txt
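The three context options are easiest to compare on a small throwaway file. The following is a minimal sketch, assuming a grep that supports -A, -B and -C (GNU and BSD grep both do); the file contents and variable names are made up for illustration.

```shell
# Build a small sample file (hypothetical content, one token per line).
tmp=$(mktemp)
printf '%s\n' 1 2 3 foo 5 6 7 > "$tmp"

# Two lines before and two lines after each match.
ctx_ab=$(grep -B 2 -A 2 foo "$tmp")

# -C 2 is shorthand for -B 2 -A 2.
ctx_c=$(grep -C 2 foo "$tmp")

# The two sides do not have to be equal: one line before, three after.
ctx_asym=$(grep -B 1 -A 3 foo "$tmp")

printf '%s\n' "$ctx_c"
rm -f "$tmp"
```

When several matches are far apart, grep separates the context groups with a line containing "--".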

Related

How to grep, excluding some patterns?

I'd like to find lines in files that contain one pattern but not another. For example, I need to find all files/lines that include loom, except the ones with gloom. I can find loom with this command:
grep -n 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp)
Now I want to search for loom while excluding gloom. However, both of the following commands fail:
grep -v 'gloom' -n 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp)
grep -n 'loom' -v 'gloom' ~/projects/**/trunk/src/**/*.@(h|cpp)
What should I do to achieve my goal?
EDIT 1: I mean that loom and gloom are character sequences (not necessarily whole words). So, for example, I need bloomberg in the command output and don't need ungloomy.
EDIT 2: Here is a sample of what I expect.
Both of the following lines should be in the command output:
I faced the icons that loomed through the veil of incense.
Arty is slooming in a gloomy day.
Neither of the following lines should be in the command output:
It’s gloomyin’ ower terrible — great muckle doolders o’ cloods.
In the south west round of the heigh pyntit hall
How about just chaining the greps?
grep -n 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp) | grep -v 'gloom'
Another solution without chaining grep:
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.@(h|cpp)
The bracket expression [^g] requires the character before loom to be anything other than g; the ^ alternative covers the case where loom is at the very start of the line.
A bit old, but oh well...
The most up-voted solution, from @houbysoft, will not work because it excludes any line with "gloom" in it, even if that line also has a separate "loom". According to OP's expectations, lines with "loom" must be included even if they also contain "gloom". The line "Arty is slooming in a gloomy day." needs to be in the output, but a chained grep like this one excludes it:
grep -n 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp) | grep -v 'gloom'
Instead, the egrep regex example from Bentoy13 works better:
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.@(h|cpp)
as it includes any line with a "loom" that is not part of "gloom", regardless of whether the line also contains "gloom". On the other hand, a line that only has gloom is not included, which is precisely the behaviour OP wants.
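The difference between the two approaches can be checked against the sample lines from the question. This is only a sketch: the scratch file is hypothetical, the third sample line is shortened and uses an ASCII apostrophe, and grep -E stands in for egrep.

```shell
# Sample lines from the question, written to a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
I faced the icons that loomed through the veil of incense.
Arty is slooming in a gloomy day.
It's gloomyin' ower terrible.
In the south west round of the heigh pyntit hall
EOF

# Chained greps: drops every line that mentions "gloom" anywhere.
chained=$(grep 'loom' "$sample" | grep -v 'gloom')

# Single regex: keeps a line if at least one "loom" is not preceded by "g".
single=$(grep -E '(^|[^g])loom' "$sample")

printf 'chained:\n%s\n\nsingle regex:\n%s\n' "$chained" "$single"
rm -f "$sample"
```

The chained version prints only the "loomed" line, while the single regex also keeps the "slooming" line, which is what the question asks for.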
Just use awk, it's much simpler than grep in letting you clearly express compound conditions.
If you want to skip lines that contain both loom and gloom:
awk '/loom/ && !/gloom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.@(h|cpp)
or if you want to print them:
awk '/(^|[^g])loom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.@(h|cpp)
and if the reality is you just want lines where loom appears as a word by itself:
awk '/\<loom\>/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.@(h|cpp)
-v is the "inverted match" flag, so piping is a very good way:
grep "loom" ~/projects/**/trunk/src/**/*.@(h|cpp) | grep -v "gloom"
Simply use grep -v, several times if needed.
#Content of file
[root@server]# cat file
1
2
3
4
5
#Exclude lines matching one pattern
[root@server]# cat file | grep -v 3
1
2
4
5
#Exclude lines matching any of several patterns
[root@server]# cat file | grep -v "3\|5"
1
2
4
/*You might be looking something like this?
grep -vn "gloom" `grep -l "loom" ~/projects/**/trunk/src/**/*.@(h|cpp)`
The BACKQUOTES are used like brackets for commands, so in this case with -l enabled,
the code in the BACKQUOTES will return you the file names, then with -vn to do what you wanted: have filenames, linenumbers, and also the actual lines.
UPDATE Or with xargs
grep -l "loom" ~/projects/**/trunk/src/**/*.@(h|cpp) | xargs grep -vn "gloom"
Hope that helps.*/
Please ignore what I've written above, it's rubbish.
grep -n "loom" `grep -l "loom" tt4.txt` | grep -v "gloom"
#this part gets the filenames with "loom"
#this part gets the lines with "loom"
#this part gets the linenumber,
#filename and actual line
You can use grep -P (Perl-compatible regex), which supports negative lookbehind:
grep -P '(?<!g)loom\b' ~/projects/**/trunk/src/**/*.@(h|cpp)
I added \b for word boundaries.
grep -n 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp) | grep -v 'gloom'
Question: search for 'loom' excluding 'gloom'.
Answer:
grep -w 'loom' ~/projects/**/trunk/src/**/*.@(h|cpp)

How to grep return result as the matching term

I would like to return only the first instance (case-insensitive) of the term I searched for, if there's a match. How would I do this?
example:
$ grep "exactly-this"
Binary file /Path/To/Some/Files/file.txt matches
I would like to return the result like:
$ grep "exactly-this"
exactly-this
grep has a built-in count argument.
You can use the -m option to limit how many matching lines grep reports:
grep -m 1 "exactly-this"
If you want to avoid the "Binary file ... matches" message for binary files, use
grep -a -m 1 "exactly-this"
Note that this prints the whole line in which the match occurred. Since it is a binary file, that "line" may be very long.
What you need is the -o option of grep.
From the man page
-o, --only-matching
Prints only the matching part of the lines.
Test:
[jaypal:~/Temp] cat file
This is a file with some exactly this in the middle
with exactly this in the begining
and some at the very end in brackets (exactly this)
[jaypal:~/Temp] grep -o 'exactly this' file
exactly this
exactly this
exactly this
[jaypal:~/Temp] grep -om1 'exactly this' file
exactly this
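The question also asks for a case-insensitive match and only the first instance. One way to combine the options above is sketched here, assuming a grep that supports -o and -m (GNU and BSD grep do); the sample file is made up. Because -m 1 stops after the first matching line but -o still prints every match on that line, head -n 1 is added to keep a single result.

```shell
# Sample input (hypothetical): the term appears twice on the first matching line.
f=$(mktemp)
printf '%s\n' 'nothing here' 'Exactly-This and exactly-this again' 'exactly-this' > "$f"

# -i ignores case, -o prints only the matched text, -m 1 stops after the
# first matching line; head -n 1 keeps just the first match on that line.
first=$(grep -i -o -m 1 'exactly-this' "$f" | head -n 1)

printf '%s\n' "$first"
rm -f "$f"
```

The match is printed as it appears in the file, so the capitalisation may differ from the search term.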

Is it more efficient to grep twice or use a regular expression once?

I'm trying to parse a couple of 2gb+ files and want to grep on a couple of levels.
Say I want to fetch lines that contain "foo" and lines that also contain "bar".
I could do grep foo file.log | grep bar, but my concern is that it will be expensive running it twice.
Would it be beneficial to use something like grep -E '(foo.*bar|bar.*foo)' instead?
grep -E '(foo|bar)' will find lines containing 'foo' OR 'bar'.
You want lines containing BOTH 'foo' AND 'bar'. Either of these commands will do:
sed '/foo/!d;/bar/!d' file.log
awk '/foo/ && /bar/' file.log
Both commands -- in theory -- should be much more efficient than your cat | grep | grep construct because:
Both sed and awk perform their own file reading; no need for pipe overhead
The 'programs' I gave to sed and awk above use Boolean short-circuiting to quickly skip lines not containing 'foo', thus testing only lines containing 'foo' to the /bar/ regex
However, I haven't tested them. YMMV :)
In theory, the fastest way should be:
grep -E '(foo.*bar|bar.*foo)' file.log
For several reasons: First, grep reads directly from the file, rather than adding the step of having cat read it and stuff it down a pipe for grep to read. Second, it uses only a single instance of grep, so each line of the file only has to be processed once. Third, grep -E is generally faster than plain grep on large files (but slower on small files), although this will depend on your implementation of grep. Finally, grep (in all its variants) is optimized for string searching, while sed and awk are general-purpose tools that happen to be able to search (but aren't optimized for it).
These two operations are fundamentally different. This one:
cat file.log | grep foo | grep bar
looks for foo in file.log, then looks for bar in whatever the last grep output. Whereas cat file.log | grep -E '(foo|bar)' looks for either foo or bar in file.log. The output should be very different. Use whatever behavior you need.
As for efficiency, they're not really comparable because they do different things. Both should be fast enough, though.
If you're doing this:
cat file.log | grep foo | grep bar
You're only printing lines that contain both foo and bar in any order. If this is your intention:
grep -e "foo.*bar" -e "bar.*foo" file.log
will be more efficient, since the file only has to be parsed once.
Notice I don't need the cat, which is more efficient in itself. You rarely need cat unless you are concatenating files (which is the purpose of the command). 99% of the time you can either add a file name to the end of the first command in a pipe or, if you have a command like tr that doesn't take a file argument, redirect the input like this:
tr 'a-z' 'A-Z' < "$fileName"
But, enough about useless cats. I have two at home.
You can pass multiple regular expressions to a single grep which is usually a bit more efficient than piping multiple greps. However, if you can eliminate regular expressions, you might find this the most efficient:
fgrep "foo" file.log | fgrep "bar"
Unlike grep, fgrep doesn't parse regular expressions which means it can parse lines much, much faster. Try this:
time fgrep "foo" file.log | fgrep "bar"
and
time grep -e "foo.*bar" -e "bar.*foo" file.log
And see which is faster.
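Before timing anything, it is worth confirming that the candidate commands select the same lines. A small sketch, using a made-up five-line log in place of the real 2 GB file, with grep -F as the modern spelling of fgrep:

```shell
# Tiny stand-in for file.log (hypothetical content).
log=$(mktemp)
printf '%s\n' 'foo only' 'bar only' 'foo then bar' 'bar then foo' 'neither' > "$log"

# Four ways to select lines containing both foo and bar.
piped=$(grep foo "$log" | grep bar)
regex=$(grep -E 'foo.*bar|bar.*foo' "$log")
awked=$(awk '/foo/ && /bar/' "$log")
fixed=$(grep -F foo "$log" | grep -F bar)

printf '%s\n' "$piped"
rm -f "$log"
```

All four variables should hold the same two lines; which command is fastest on a large file depends on the grep implementation and the data, so measuring with time on the real input is the only reliable answer.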

bash grep newline

Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
What should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question title, "bash grep newline", implies that you want to match the second\nthird sequence of characters, i.e. something containing a newline within it.
Since grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n/'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
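One shorter alternative is to let awk remember the previous line. This is a sketch rather than a drop-in replacement for the pipeline above: it assumes the goal is to print "second" and "third" only when they are exact, adjacent lines, and the sample file is made up.

```shell
# Sample input from the question, written to a scratch file.
f=$(mktemp)
printf '%s\n' first second third > "$f"

# Keep the previous line in prev; when the current line is "third" and the
# previous one was "second", print both lines.
pair=$(awk 'prev == "second" && $0 == "third" { print prev; print $0 }
            { prev = $0 }' "$f")

printf '%s\n' "$pair"
rm -f "$f"
```

Replacing the == comparisons with regex matches (prev ~ /second/ and $0 ~ /third/) would relax the exact-line requirement.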
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you every line except those matching the expression you pass. In effect: show me everything but the line(s) with first in them.
However, the downside is that any other line containing first will be dropped as well, which may not be the behavior you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what you want to match. I would not use grep, but one of the following:
tail -n 2 file # to get last two lines
tail -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. You're going to have to use Perl, sed or awk to perform the pattern match across lines.
BTW, -E tells grep that the regexp is an extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
This prints the matching line plus one line before and one line after it. Since "third" is on the last line, you get the last two lines.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Use lines in a file as filenames for grep?

I have a file which contains filenames (and the full path to them) and I want to search for a word within all of them.
some pseudo-code to explain:
grep keyword <all files specified in files.txt>
or
cat files.txt > grep keyword
cat files.txt | grep keyword
The problem is that I can only get grep to search the filenames, not the contents of the actual files.
cat files.txt | xargs grep keyword
or
grep keyword `cat files.txt`
or (equivalent to previous but harder to mis-read)
grep keyword $(cat files.txt)
should do the trick.
Pitfalls:
If files.txt contains file names with spaces, all of these commands will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
The command substitution versions (backquotes or $(...)) can result in a very long command line, which might fail when it exceeds the limits of your environment. The xargs version handles long input automatically; xargs offers several options to control the details.
Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.
so:
cat files.txt | xargs grep keyword
is the way to go
tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
tr delimits the names with NUL characters so that spaces are not significant (note the corresponding -0 option to xargs).
xargs -r will start a single grep process for a "large" number of files, but not start any grep process if there are no files.
LANG=C means use quick routines for matching, rather than slow locale ones
grep -F means use quick string matching rather than slow regular expression matching
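Where starting one grep per file is acceptable, a plain read loop is another way to keep spaces in file names intact. The sketch below is an illustration under that assumption, not a replacement for the xargs -0 approach; the directory, file names and contents are all made up.

```shell
# Scratch directory with three files, one of which has a space in its name.
dir=$(mktemp -d)
printf 'a keyword here\n' > "$dir/plain.txt"
printf 'nothing\n' > "$dir/other.txt"
printf 'keyword in spaced file\n' > "$dir/has space.txt"

# The list of file names, one full path per line (stand-in for files.txt).
printf '%s\n' "$dir/plain.txt" "$dir/other.txt" "$dir/has space.txt" > "$dir/files.txt"

# Read one name per line; IFS= and -r keep the line verbatim, and quoting
# "$name" passes it to grep as a single argument. -l prints matching file names.
hits=$(while IFS= read -r name; do
    grep -l -- keyword "$name"
done < "$dir/files.txt")

printf '%s\n' "$hits"
rm -rf "$dir"
```

This runs grep once per listed file, so for very long lists the tr and xargs -0 pipeline above is likely to be faster.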
bash, ksh & zsh version:
grep keyword $(<files.txt)
It's been a long time since I last wrote a bash shell script, but you could store the result of the first grep (the one finding all the filenames) in an array and iterate over it, issuing further grep commands.
A good starting point should be the bash scripting guide.

Resources