Select line below search expression in log file - bash

I am trying to search logs for an expression, then select the line below each match.
Example
I know I want the lines below CommonTerm; for example, given the log data:
CommonTerm: something
This Should
random stuff
CommonTerm: something else
Be The
random stuff
more random stuff
CommonTerm: something else
Output Text
random fluff
Desired Output
This Should
Be The
Output Text
Current Attempt
Currently I can use grep CommonTerm log_file -B 0 -A 1 to get:
CommonTerm: something
This Should
--
CommonTerm: something else
Be The
--
CommonTerm: something else
Output Text
I can then pipe this through | grep "\-\-" -B 1 -A 0 to get
This Should
--
--
Be The
--
--
Output Text
--
And then through awk '{if (count++%3==0) print $0;}', giving:
This Should
Be The
Output Text
My question is: surely there's a good 'unix-y' way to do this? Multiple greps and a hacky awk feel pretty silly... Is there?
Edit: I also tried:
(grep 'CommonTerm:' log_file -B 0 -A 2) | grep "\-\-" -B 1 -A 0 | grep -v "^--$"
but it seems much more clunky than the answers below which was expected ;)
Edit:
There are some great answers coming in; are there any which would let me easily select the nth line after the search term? I see a few might be easier than others...

awk 'p { print; p=0 }
/CommonTerm/ { p=1 }' file
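To address the edit about grabbing the nth line after the match: the same idea extends by recording the target line number instead of a flag. A rough sketch (assuming, say, the 2nd line after each match is wanted):
awk '/CommonTerm/ { n = NR + 2 } NR == n' file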

You can use sed:
sed -n "/^CommonTerm: /{n;p}" log_file
This searches for "CommonTerm: " at the start of the line (^), then skips to the next line (n) and prints it (p).
EDIT: As per the comment thread, if you're using BSD sed rather than GNU sed (likely to be the case on OS X), you need a couple of extra semicolons to get round a bug:
sed -n "/^CommonTerm: /{;n;p;}" log_file

How about:
grep -B 0 -A 1 "CommonTerm" log_file | grep -v "^CommonTerm:" | grep -v "^--$"
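If you have GNU grep, the --no-group-separator option suppresses the -- lines, so one of the grep -v stages can be dropped (a sketch, assuming GNU grep):
grep -A 1 --no-group-separator "CommonTerm" log_file | grep -v "^CommonTerm:"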

I'd do this with awk:
awk 'found{found=0;print;next}/CommonTerm/{found=1}'

For those who have pcregrep installed, this can be done in one shot. Notice the use of \K to reset the starting point of the match:
pcregrep -Mo 'CommonTerm.*?\n\K.*?(?=\n)' file

Echo something while piping stdout

I know how to pipe stdout:
./myScript | grep 'important'
Example output of the above command:
Very important output.
Only important stuff here.
But while greping I would also like to echo something each line so it looks like this:
1) Very important output.
2) Only important stuff here.
How can I do that?
Edit: Apparently, I haven't specified well enough what I want to do. Numbering the lines is just an example; I want to know in general how to add text (any text, including variables and whatnot) to pipe output. I see one can achieve that using awk '{print $0}', where $0 gives me the piped line that I can combine with other text.
Are there any other ways to achieve this?
This will number the hits starting from 1:
./myScript | grep 'important' | awk '{printf("%d) %s\n", NR, $0)}'
1) Very important output.
2) Only important stuff here.
This will give you the line number of the hit
./myScript | grep -n 'important'
3:Very important output.
47:Only important stuff here.
If you want line numbers on the new output running from 1..n where n is number of lines in the new output:
./myScript | awk '/important/{printf("%d) %s\n", ++i, $0)}'
# /important/ does the grep part; ++i numbers the output starting at 1
A solution with a while loop is not suited for large files, so you should only use this solution when you do not have a lot of important stuff:
i=0
while read -r line; do
    ((i++))
    printf "(%s) Look out: %s\n" "$i" "${line}"
done < <(./myScript | grep 'important')
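Since the question is really about adding arbitrary text (including shell variables) to piped output, another option is to pass the variable into awk with -v; a small sketch, assuming a variable named prefix:
prefix="Look out:"
./myScript | grep 'important' | awk -v pre="$prefix" '{ print pre, $0 }'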

remove n lines from STDOUT on bash

Do you have any bash solution to remove N lines from stdout?
like the head command, but printing all lines except the last N.
A simple solution in bash:
find ./test_dir/ | sed '$d' | sed '$d' | sed '$d' | ...
but I need to repeat the sed command N times.
Any better solution?
except awk, python etc...
Use head with a negative number (this requires GNU head). In my example it will print all lines but the last 3:
head -n -3 infile
If head -n -3 filename doesn't work on your system (as it doesn't on mine), you could also try the following approach (and maybe alias it or create a function in your .bashrc):
head -`echo "$(wc -l filename)" | awk '{ print $1 - 3; }'` filename
Where filename and 3 above are your file and number of lines respectively.
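The same calculation can be done with shell arithmetic alone, avoiding the extra awk (a sketch, with filename and 3 as above):
head -n "$(( $(wc -l < filename) - 3 ))" filename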
The tail command also accepts a +/- prefix on its line count (on Mac OS / BSD as well as GNU systems). With a + prefix it starts printing at the given line number, so the following skips the first 2 lines and shows everything from line 3 onwards:
tail -n +3 filename.ext
With a - prefix, which is the default for tail, it instead counts from the end of the file and prints the last 3 lines:
tail -n -3 filename.ext
Note that neither form drops the last N lines; for that, use the head approaches above. See a similar answer to a different question here: Print a file skipping first X lines in Bash

Make grep stop after first NON-matching line

I'm trying to use grep to go through some logs and only select the most recent entries. The logs have years of heavy traffic on them so it's silly to do
tac error.log | grep 2012
tac error.log | grep "Jan.2012"
etc.
and wait for 10 minutes while it goes through several million lines which I already know are not going to match. I know there is the -m option to stop at the first match but I don't know of a way to make it stop at first non-match. I could do something like grep -B MAX_INT -m 1 2011 but that's hardly an optimal solution.
Can grep handle this or would awk make more sense?
How about using awk like this:
tac error.log | awk '{if(/2012/)print;else exit}'
This should exit as soon as a line not matching 2012 is found.
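The same logic can be written a bit more tersely, relying on awk's default print action (a sketch):
tac error.log | awk '!/2012/ { exit } 1'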
Here is a solution in python:
# foo.py
import sys, re
for line in sys.stdin:
    if re.match(r'2012', line):
        print line,
        continue
    break
you#host> tac foo.txt | python foo.py
I don't think grep supports this.
But here is my "why did we have awk again" answer:
tail -n `tac biglogfile | grep -vnm1 2012 | sed 's/:.*//' | xargs expr -1 +` biglogfile
Note that this isn't going to be exact if your log is being written to.
The excellent one-line scripts for sed page to the rescue:
# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
In other words, you should be able to do the following:
sed -n '/Jan 01 2012/,/Feb 01 2012/p' error.log | grep whatevs
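Staying with sed, it can also simply quit at the first non-matching line, which is closer to what was asked (a sketch):
tac error.log | sed -n '/2012/!q;p'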
Here is an example that parses the user's dot-plan file and stops at the first non-matching line:
PID=$$
while read ln; do
    if echo "$ln" | grep "^[-*+] " >/dev/null; then
        # matched
        echo -e "$ln"
    elif echo "$ln" | grep "^[#]" >/dev/null; then
        # ignore comment line
        :
    else
        # stop at first non-matching line
        kill $PID
    fi
done <$HOME/.plan
Of course this approach is considerably slower than letting grep read the lines itself, but at least you can handle several cases (not just the non-match).
For more complex scripts, it is worth noting that Bash can also apply regular expressions to variables, i.e. you can do without grep entirely.
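For instance, a bare-bones sketch of that pure-bash approach, using [[ =~ ]] instead of grep:
while read -r line; do
    [[ $line =~ 2012 ]] || break
    printf '%s\n' "$line"
done < <(tac error.log)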

Can I grep only the first n lines of a file?

I have very long log files, is it possible to ask grep to only search the first 10 lines?
The magic of pipes:
head -10 log.txt | grep <whatever>
For folks who find this on Google, I needed to search the first n lines of multiple files, but to only print the matching filenames. I used
gawk 'FNR>10 {nextfile} /pattern/ { print FILENAME ; nextfile }' filenames
The FNR..nextfile stops processing a file once 10 lines have been seen. The //..{} prints the filename and moves on whenever the first match in a given file shows up. To quote the filenames for the benefit of other programs, use
gawk 'FNR>10 {nextfile} /pattern/ { print "\"" FILENAME "\"" ; nextfile }' filenames
Or use awk for a single process without |:
awk '/your_regexp/ && NR < 11' INPUTFILE
On each line, if your_regexp matches, and the number of records (lines) is less than 11, it executes the default action (which is printing the input line).
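Since the files are very long, a variant that stops reading entirely after line 10 may be worth it (a small sketch):
awk 'NR > 10 { exit } /your_regexp/' INPUTFILE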
Or use sed:
sed -n '/your_regexp/p;10q' INPUTFILE
Checks your regexp and prints the line (-n means don't print the input, which is otherwise the default), and quits right after the 10th line.
You have a few options using programs along with grep. The simplest in my opinion is to use head:
head -n10 filename | grep ...
head will output the first 10 lines (using the -n option), and then you can pipe that output to grep.
grep "pattern" <(head -n 10 filename)
head -10 log.txt | grep -A 2 -B 2 pattern_to_search
-A 2: print two lines after the match.
-B 2: print two lines before the match.
head -10 log.txt # read the first 10 lines of the file.
You can use the following line:
head -n 10 /path/to/file | grep [...]
The output of head -10 file can be piped to grep in order to accomplish this:
head -10 file | grep …
Using Perl:
perl -ne 'last if $. > 10; print if /pattern/' file
An extension to Joachim Isaksson's answer: Quite often I need something from the middle of a long file, e.g. lines 5001 to 5020, in which case you can combine head with tail:
head -5020 file.txt | tail -20 | grep x
This gets the first 5020 lines, then shows only the last 20 of those, then pipes everything to grep.
(Edited: fencepost error in my example numbers, added pipe to grep)
grep -A 10 <Pattern>
This grabs the matching line and the next 10 lines after it. It works well only for a known pattern; if you don't have a known pattern, use the "head" suggestions.
grep -m6 "string" cov.txt
This stops after the first 6 matching lines; note that -m limits the number of matches, not the number of input lines read, so it is not the same as searching only the first 6 lines.

Script to grab output lines after specific pattern

I have a program, "wimaxcu scan" to be precise, that outputs data in a format like the following:
network A
frequency
signal strength
noise ratio
network B
frequency
signal strength
noise ratio
etc....
There are a huge number of elements that get output by the program. I am only interested in a few of the properties of one particular network, say for example network J. I would like to write a bash script that will place the signal strength and noise ratio of J on a new line in a specified text file every time that I run the script. So after running the script many times I would have a file that looks like:
Point 1 signal_strength noise_ratio
Point 2 signal_strength noise_ratio
Point 3 signal_strength noise_ratio
etc...
I was advised to pipe the output into grep to accomplish this. I'm fairly certain that grep is not the best method to accomplish this because the lines I want to grab are indistinguishable from the other noise and signal strength lines. I'm thinking that the "network J" pattern would have to be recognized (it is unique), and then the lines that come 2nd and 3rd after the found pattern would be grabbed.
My question is how others would recommend that I implement such a script. I'm not very experienced with bash, so the simplest method would be appreciated, rather than a complex but efficient method.
With awk!
If your data is in a file called "data," you can do this on the command line:
$ awk -v RS='\n\n' -v FS='\n' '{ print $1,$3,$4 }' data
What that will do is set your "record separator" to two newlines, the "field separator" to a single newline, and then print fields one, three, and four from each data set.
Awk, if you're not familiar, operates on records, and can do various things with them. So this simply says "a record looks like this, and print it this way." Specifically, "A record has fields that are separated by newlines, and each record is separated by two consecutive newlines. Print the first, third, and fourth fields of these records out for each record."
Edit: As Jo So (who fully read and comprehended what you were asking for) points out, you can add an if statement to the inside of the curly braces to specify a specific network. Or, if it were unique, you could just throw in a pipe to grep at the end. But his solution is more correct, since it will only match against that first field!
$ awk -v RS='\n\n' -v FS='\n' '{ if ($1 == "network J") print $1,$3,$4 }' data
To complete Dan Fego's very good answer (sorry, it seems I'm not yet allowed to place comments), consider this:
awk -v RS='\n\n' -v FS='\n' '{if ($1 == "network J") print $3}' data
This is actually a very robust piece of code.
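To get from there to the Point N lines asked for, the awk output can be appended to a file on each run and numbered afterwards, following the awk answers above; a sketch, assuming the readings are collected in a hypothetical readings.txt:
wimaxcu scan | awk -v RS='\n\n' -v FS='\n' '$1 == "network J" { print $3, $4 }' >> readings.txt
awk '{ print "Point", NR, $0 }' readings.txt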
Actually Grep is the right option.
What you have to do is use the -A (after) and -B (before) options of grep. You can use something like:
grep "network J" -A 3 original_output
this will output the line containing network J plus the 3 lines after it. But you don't want the words "network J", so
grep "network J" -A 3 original_output | grep -v "network J"
you then have to put them on one line, which is easily done by echoing the output, as in:
echo $(grep "network J" -A 3 original_output | grep -v "network J")
Now you will end up with all instances of network J. You can append them to an output file:
Part A
echo $(grep "network J" -A 3 original_output | grep -v "network J") >> net_j_report.txt
adding Point 1 ... etc to the beginning can be done later by:
Part B
grep -v '^[[:space:]]*$' net_j_report.txt | cat -n | sed -e 's/^/Point /'
Here grep -v removes any accidental empty lines, cat -n adds line numbers, and the final sed statement puts the word Point at the beginning of each line.
So combine parts A and B, and voilà.
This might work for you:
# cat file1 # same format for file2, file3, ...
network A
frequency
signalA strength
noise1 ratio
network B
frequency
signalB strength
noise1 ratio
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file1 | sh
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file2 | sh
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file3 | sh
# sed -i = A
# sed -i 'N;s/^/Point /;s/\n/ /' A
# sed -i = B
# sed -i 'N;s/^/Point /;s/\n/ /' B
# cat A
Point 1 signalA_strength noise1_ratio
Point 2 signalA_strength noise2_ratio
Point 3 signalA_strength noise3_ratio
# cat B
Point 1 signalB_strength noise1_ratio
Point 2 signalB_strength noise2_ratio
Point 3 signalB_strength noise3_ratio
