How to grep, excluding some patterns? - bash

I'd like find lines in files with an occurrence of some pattern and an absence of some other pattern. For example, I need find all files/lines including loom except ones with gloom. So, I can find loom with command:
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
Now, I want to search loom excluding gloom. However, both of following commands failed:
grep -v 'gloom' -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
grep -n 'loom' -v 'gloom' ~/projects/**/trunk/src/**/*.#(h|cpp)
What should I do to achieve my goal?
EDIT 1: I mean that loom and gloom are the character sequences (not necessarily the words). So, I need, for example, bloomberg in the command output and don't need ungloomy.
EDIT 2: There is sample of my expectations.
Both of following lines are in command output:
I faced the icons that loomed through the veil of incense.
Arty is slooming in a gloomy day.
Both of following lines aren't in command output:
It’s gloomyin’ ower terrible — great muckle doolders o’ cloods.
In the south west round of the heigh pyntit hall

How about just chaining the greps?
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp) | grep -v 'gloom'

Another solution without chaining grep:
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
Between brackets, you exclude the character g before any occurrence of loom, unless loom is the first chars of the line.

A bit old, but oh well...
The most up-voted solution from #houbysoft will not work as that will exclude any line with "gloom" in it, even if it has "loom". According to OP's expectations, we need to include lines with "loom", even if they also have "gloom" in them. This line needs to be in the output "Arty is slooming in a gloomy day.", but this will be excluded by a chained grep like
grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp) | grep -v 'gloom'
Instead, the egrep regex example of Bentoy13 works better
egrep '(^|[^g])loom' ~/projects/**/trunk/src/**/*.#(h|cpp)
as it will include any line with "loom" in it, regardless of whether or not it has "gloom". On the other hand, if it only has gloom, it will not include it, which is precisely the behaviour OP wants.

Just use awk, it's much simpler than grep in letting you clearly express compound conditions.
If you want to skip lines that contains both loom and gloom:
awk '/loom/ && !/gloom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)
or if you want to print them:
awk '/(^|[^g])loom/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)
and if the reality is you just want lines where loom appears as a word by itself:
awk '/\<loom\>/{ print FILENAME, FNR, $0 }' ~/projects/**/trunk/src/**/*.#(h|cpp)

-v is the "inverted match" flag, so piping is a very good way:
grep "loom" ~/projects/**/trunk/src/**/*.#(h|cpp)| grep -v "gloom"

Simply use! grep -v multiple times.
#Content of file
[root#server]# cat file
1
2
3
4
5
#Exclude the line or match
[root#server]# cat file |grep -v 3
1
2
4
5
#Exclude the line or match multiple
[root#server]# cat file |grep -v "3\|5"
1
2
4

/*You might be looking something like this?
grep -vn "gloom" `grep -l "loom" ~/projects/**/trunk/src/**/*.#(h|cpp)`
The BACKQUOTES are used like brackets for commands, so in this case with -l enabled,
the code in the BACKQUOTES will return you the file names, then with -vn to do what you wanted: have filenames, linenumbers, and also the actual lines.
UPDATE Or with xargs
grep -l "loom" ~/projects/**/trunk/src/**/*.#(h|cpp) | xargs grep -vn "gloom"
Hope that helps.*/
Please ignore what I've written above, it's rubbish.
grep -n "loom" `grep -l "loom" tt4.txt` | grep -v "gloom"
#this part gets the filenames with "loom"
#this part gets the lines with "loom"
#this part gets the linenumber,
#filename and actual line

You can use grep -P (perl regex) supported negative lookbehind:
grep -P '(?<!g)loom\b' ~/projects/**/trunk/src/**/*.#(h|cpp)
I added \b for word boundaries.

grep -n 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp) | grep -v 'gloom'

Question: search for 'loom' excluding 'gloom'.
Answer:
grep -w 'loom' ~/projects/**/trunk/src/**/*.#(h|cpp)

Related

How to grep only matching string from this result?

I am just simply trying to grab the commit ID, but not quite sure what I'm missing:
➜ ~ curl https://github.com/microsoft/vscode/releases -s | grep -oE 'microsoft/vscode/commit/(.*?)/hovercard'
microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard
The only thing I need back from this is ccbaa2d27e38e5afa3e5c21c1c7bef4657064247.
This works just fine on regex101.com and in ruby/python. What am I missing?
If supported, you can use grep -oP
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | grep -oP "microsoft/vscode/commit/\K.*?(?=/hovercard)"
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
Another option is to use sed with a capture group
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | sed -E 's/microsoft\/vscode\/commit\/([^\/]+)\/hovercard/\1/'
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
The point is that grep does not support extracting capturing group submatches. If you install pcregrep you could do that with
curl https://github.com/microsoft/vscode/releases -s | \
pcregrep -o1 'microsoft/vscode/commit/(.*?)/hovercard' | head -1
The | head -1 part is to fetch the first occurrence only.
I would suggest using awk here:
awk 'match($0,/microsoft\/vscode\/commit\/[^\/]*\/hovercard/){print substr($0,RSTART+24,RLENGTH-34);exit}'
The regex will match a line containing
microsoft\/vscode\/commit\/ - microsoft/vscode/commit/ fixed string
[^\/]* - zero or more chars other than /
\/hovercard - a /hovercard string.
The substr($0,RSTART+24,RLENGTH-34) will print the part of the line starting at the RSTART+24 (24 is the length of microsoft/vscode/commit/) index and the RLENGTH is the length of microsoft/vscode/commit/ + the length of the /hovercard.
The exit command will fetch you the first occurrence. Remove it if you need all occurrences.
You can use sed:
curl -s https://github.com/microsoft/vscode/releases |
sed -En 's=.*microsoft/vscode/commit/([^/]+)/hovercard.*=\1=p' |
head -n 1
head -n 1 is to print the first match (there are 10)grep -o will print (only) everything that matches, including microsoft/ etc.
Your task can not be achieved with Mac's grep. grep -o prints all matching text (compared to default behaviour of printing matching lines), including microsoft/ etc. A grep which implemented perl regex (like GNU grep on Linux) could make use of look ahead/behind (grep -Po '(?<=microsoft/vscode/commit/)[^/]+(?=/hovercard)'). But it's just not available on Mac's grep.
On MacOS you don't have gnu utilities available by default. You can just pipe your output to a simple awk like this:
curl https://github.com/microsoft/vscode/releases -s |
grep -oE 'microsoft/vscode/commit/[^/]+/hovercard' |
awk -F/ '{print $(NF-1)}'
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
3a6960b964327f0e3882ce18fcebd07ed191b316
f4af3cbf5a99787542e2a30fe1fd37cd644cc31f
b3318bc0524af3d74034b8bb8a64df0ccf35549a
6cba118ac49a1b88332f312a8f67186f7f3c1643
c13f1abb110fc756f9b3a6f16670df9cd9d4cf63
ee8c7def80afc00dd6e593ef12f37756d8f504ea
7f6ab5485bbc008386c4386d08766667e155244e
83bd43bc519d15e50c4272c6cf5c1479df196a4d
e7d7e9a9348e6a8cc8c03f877d39cb72e5dfb1ff

Grep/Sed/Awk Options

How could you grep or use sed or awk to parse for a dynamic length substring? Here are some examples:
I need to parse out everything except for the "XXXXX.WAV" in these strings, but the strings are not a set length.
Sometimes its like this:
{"filename": "/assets/JFM/imaging/19001.WAV"},
{"filename": "/assets/JFM/imaging/19307.WAV"},
{"filename": "/assets/JFM/imaging/19002.WAV"}
And sometimes like this:
{"filename": "/assets/JFM/LN_405999/101.WAV"},
{"filename": "/assets/JFM/LN_405999/102.WAV"},
{"filename": "/assets/JFM/LN_405999/103.WAV"}
Is there a great dynamic way to parse for just the .WAV? Maybe if I start at "/" and parse until "?
Edit:
Expected output like this:
19001.WAV
19307.WAV
19002.WAV
Or:
101.WAV
101.WAV
103.WAV
Just use grep as proposed in comments:
grep -o '[^/]\{1,\}\.WAV' yourfile
If the wav file always contains numbers, this seems more explicit (same result):
grep -o '[0-9]\{1,\}\.WAV'
Assuming there are [ and ] lines at the beginning and end of your file, it looks like your input is JSON, in which case I would recommend installing and using jq rather than text-based utilities, and doing something like this:
jq -r '.[]|.filename|split("/")[-1]'
But failing that, any of the tools listed will work just fine.
grep -o '[^/]*\.WAV'
or
sed -ne 's,.*/\([^/]*\.WAV\).*$,\1,p'
or
awk -F'"' '/WAV/ {split($4,a,"/"); print a[length(a)]}'
In each case there are a variety of other possible solutions as well.
Or with sed
$ sed 's,.*/,,; s,".*,,' x
101.WAV
102.WAV
103.WAV
Explanation:
s,.*/,, - delete everything up to and including the rightmost /
s,".*,, - delete everything starting with the leftmost " to the end of the line
another awk
awk -F'[/"]' '{print $(NF-1)}' file
19001.WAV
19307.WAV
19002.WAV
Try this -
awk -F'[{":}/]' '{print $(NF-2)}' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}.WAV' f
19001.WAV
19307.WAV
19002.WAV
OR
egrep -o '[[:digit:]]{5}.[[:alpha:]]{3}' f
19001.WAV
19307.WAV
19002.WAV
You can easily change the value of digit and character as per your need for different example in egrep but awk will work fine for both case.
All of the programs you listed use regex to parse the names, so I will show you an example using grep, being probably the most basic one for this case.
There are a couple of options, depending on the exact way you define the XXX part before the ".wav".
Option 1, as you pointed out is just the file name, i.e., everything after the last slash:
grep -hoi "[^/]\+\.WAV"
This reads as "any character besides slash" ([^/]) repeated at least once (\+), followed by a literal .WAV (\.WAV).
Option 2 would be to only grab the digits before the extension:
grep -hoi "[[:digit:]]\+\.WAV"
OR
grep -hoi "[0-9]\+\.WAV"
These read as "digits" ([[:digit:]] and [0-9] mean the same thing) repeated at least once (\+), followed by a literal .WAV (\.WAV).
In all cases, I recommend using the flags -h, -o, -i, which I have concatenated into a single option -hoi. -h suppresses the file name from the output. -o makes grep only output the portion that matches. -i makes the match case insensitive, so should your extension ever change to .wav instead of .WAV, you'll be fine.
Also, in all cases, the input is up to you. You can pipe it in from another program, which will look like
program | grep -hoi "[^/]\+\.WAV"
You can get it from a file using stdin redirection:
grep -hoi "[^/]\+\.WAV" < somefile.txt
Or you can just pass the filename to grep:
grep -hoi "[^/]\+\.WAV" somefile.txt
awk -F/ '{print substr($5,1,7)}' file
101.WAV
102.WAV
103.WAV

How to grep and match the first occurrence of a line?

Given the following content:
title="Bar=1; Fizz=2; Foo_Bar=3;"
I'd like to match the first occurrence of Bar value which is 1. Also I don't want to rely on soundings of the word (like double quote in the front), because the pattern could be in the middle of the line.
Here is my attempt:
$ grep -o -m1 'Bar=[ ./0-9a-zA-Z_-]\+' input.txt
Bar=1
Bar=3
I've used -m/--max-count which suppose to stop reading the file after num matches, but it didn't work. Why this option doesn't work as expected?
I could mix with head -n1, but I wondering if it is possible to achieve that with grep?
grep is line-oriented, so it apparently counts matches in terms of lines when using -m[1]
- even if multiple matches are found on the line (and are output individually with -o).
While I wouldn't know to solve the problem with grep alone (except with GNU grep's -P option - see anubhava's helpful answer), awk can do it (in a portable manner):
$ awk -F'Bar=|;' '{ print $2 }' <<<"Bar=1; Fizz=2; Foo_Bar=3;"
1
Use print "Bar=" $2, if the field name should be included.
Also note that the <<< method of providing input via stdin (a so-called here-string) is specific to Bash, Ksh, Zsh; if POSIX compliance is a must, use echo "..." | grep ... instead.
[1] Options -m and -o are not part of the grep POSIX spec., but both GNU and BSD/OSX grep support them and have chosen to implement the line-based logic.
This is consistent with the standard -c option, which counts "selected lines", i.e., the number of matching lines:
grep -o -c 'Bar=[ ./0-9a-zA-Z_-]\+' <<<"Bar=1; Fizz=2; Foo_Bar=3;" yields 1.
Using perl based regex flavor in gnu grep you can use:
grep -oP '^(.(?!Bar=\d+))*Bar=\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
Bar=1
(.(?!Bar=\d+))* will match 0 or more of any characters that don't have Bar=\d+ pattern thus making sure we match first Bar=\d+
If intent is to just print the value after = then use:
grep -oP '^(.(?!Bar=\d+))*Bar=\K\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
1
You can use grep -P (assuming you are on gnu grep) and positive look ahead ((?=.*Bar)) to achieve that in grep:
echo "Bar=1; Fizz=2; Foo_Bar=3;" | grep -oP -m 1 'Bar=[ ./0-9a-zA-Z_-]+(?=.*Bar)'
First use a grep to make the line start with Bar, and then get the Bar at the start of the line:
grep -o "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"
When you have a large file, you can optimize with
grep -o -m1 "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes;
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines before the pattern.
-B 2: print two lines after the pattern.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
(Edited: fencepost error in my example numbers, added pipe to grep)
grep -A 10 <Pattern>
This is to grab the pattern and the next 10 lines after the pattern. This would work well only for a known pattern, if you don't have a known pattern use the "head" suggestions.
grep -m6 "string" cov.txt
This searches only the first 6 lines for string

bash grep newline

[Editorial insertion: Possible duplicate of the same poster's earlier question?]
Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question abstract "bash grep newline", implies that you would want to match on the second\nthird sequence of characters - i.e. something containing newline within it.
Since the grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
head -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW -E tell grep that the regexp is extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
this will print a string with match and one string before and after. Since "third" is in the last string you get last two strings.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources