Script to grab output lines after specific pattern - bash

I have a program, "wimaxcu scan" to be precise, that outputs data in a format like the following:
network A
frequency
signal strength
noise ratio
network B
frequency
signal strength
noise ratio
etc....
There are a huge number of elements that get output by the program. I am only interested in a few of the properties of one particular network, say for example network J. I would like to write a bash script that will place the signal strength and noise ratio of J on a new line in a specified text file every time that I run the script. So after running the script many times I would have a file that looks like:
Point 1 signal_strength noise_ratio
Point 2 signal_strength noise_ratio
Point 3 signal_strength noise_ratio
etc...
I was advised to pipe the output into grep to accomplish this. I'm fairly certain that grep is not the best method here, because the lines I want to grab are indistinguishable from the signal strength and noise lines of other networks. I'm thinking that the "network J" pattern would have to be recognized (it is unique), and then the lines that come 2nd and 3rd after the found pattern would be grabbed.
My question is how others would recommend that I implement such a script. I'm not very experienced with bash, so the simplest method would be appreciated, rather than a complex but efficient method.

With awk!
If your data is in a file called "data," you can do this on the command line:
$ awk -v RS='\n\n' -v FS='\n' '{ print $1,$3,$4 }' data
What that will do is set your "record separator" to two newlines, the "field separator" to a single newline, and then print fields one, three, and four from each data set.
Awk, if you're not familiar, operates on records, and can do various things with them. So this simply says "a record looks like this, and print it this way." Specifically, "a record has fields that are separated by newlines, and each record is separated by two consecutive newlines. Print the first, third, and fourth fields of each record." (Note that a multi-character record separator like this is a GNU awk extension; strict POSIX awk only uses the first character of RS.)
Edit: As Jo So (who fully read and comprehended what you were asking for) points out, you can add an if statement to the inside of the curly braces to specify a specific network. Or, if it were unique, you could just throw in a pipe to grep at the end. But his solution is more correct, since it will only match against that first field!
$ awk -v RS='\n\n' -v FS='\n' '{ if ($1 == "Network J") print $1,$3,$4 }' data
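To get the append-on-every-run file the question describes, a small wrapper script around this could look like the following sketch (assuming, as above, that wimaxcu scan separates networks with blank lines; the report file name net_j_report.txt is an arbitrary choice):
#!/bin/bash
# Sketch only: extract network J's 3rd and 4th fields (signal strength,
# noise ratio) and append them as the next numbered "Point" line.
out=net_j_report.txt
touch "$out"
data=$(wimaxcu scan | awk -v RS='\n\n' -v FS='\n' '$1 == "network J" { print $3, $4 }')
n=$(( $(wc -l < "$out") + 1 ))
echo "Point $n $data" >> "$out"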

To complete Dan Fego's very good answer (sorry, it seems I'm not yet allowed to place comments), consider this:
awk -v RS='\n\n' -v FS='\n' '{if ($1 == "network J") print $3}' data
This is actually a very robust piece of code: it compares the network name against the entire first field, so a "network J" appearing anywhere else in the output can't produce a false match.

Actually, grep is the right option.
What you have to do is use the -A (after) and -B (before) options of grep. You can use something like:
grep "network J" -A 3 original_output
This will output the line containing "network J" plus the 3 lines after it. But you don't want the "network J" line itself, so:
grep "network J" -A 3 original_output | grep -v "network J"
You then have to put them on one line, which is easily done by echoing the output, as in:
echo $(grep "network J" -A 3 original_output | grep -v "network J")
Now you will end up with all instances of network J. You can append them to an output file:
Part A
echo $(grep "network J" -A 3 original_output | grep -v "network J") >> net_j_report.txt
Adding "Point 1" etc. to the beginning can be done later by:
Part B
grep -v '^[[:space:]]*$' net_j_report.txt | cat -n | sed -e 's/^/Point /'
Here grep -v removes any accidental empty lines, cat -n adds line numbers, and the final sed statement puts the word "Point" at the beginning.
So combine parts A and B, and voilà.
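Putting it together, one run of part A per scan plus a final part B pass might look like this sketch (original_output standing in for the captured scan output, as above):
# Part A, once per scan: append one line per run
echo $(grep "network J" -A 3 original_output | grep -v "network J") >> net_j_report.txt
# Part B, once at the end: produce the numbered report
grep -v '^[[:space:]]*$' net_j_report.txt | cat -n | sed -e 's/^/Point /'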

This might work for you. The first sed command turns each "network X" header into a here-document that appends that network's signal and noise lines (joined onto one line, with inner spaces replaced by underscores) to a file named after the network; the later sed -i passes then number the collected lines and prefix each with "Point":
# cat file1 # same format for file2, file3, ...
network A
frequency
signalA strength
noise1 ratio
network B
frequency
signalB strength
noise1 ratio
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file1 | sh
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file2 | sh
# sed -n '/network/{s/network \(.\)/cat <<\\EOF >>\1/p;n;n;N;y/ /_/;s/\n/ /;s/$/\nEOF/p}' file3 | sh
# sed -i = A
# sed -i 'N;s/^/Point /;s/\n/ /' A
# sed -i = B
# sed -i 'N;s/^/Point /;s/\n/ /' B
# cat A
Point 1 signalA_strength noise1_ratio
Point 2 signalA_strength noise2_ratio
Point 3 signalA_strength noise3_ratio
# cat B
Point 1 signalB_strength noise1_ratio
Point 2 signalB_strength noise2_ratio
Point 3 signalB_strength noise3_ratio

Related

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" show only word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Notice, that I will welcome any command, that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only hold so much. If it were as long as a whole .txt file, this wouldn't work.
I look forward to your answers!
You could use awk for this job: it splits the string at spaces and prints the string part numbered by wordnumber, and tr is used to turn the newlines into spaces first (deleting them outright would glue the last word of one line to the first word of the next):
cat myFile.txt | tr '\n' ' ' | awk -v wordnumber=5 '{ print $wordnumber }'
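For example:
$ printf 'one two\nthree four five\n' | tr '\n' ' ' | awk -v wordnumber=5 '{ print $wordnumber }'
five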
And if you want, for example, the 5th character, you could do it like so:
head -c 5 myFile.txt | tail -c 1
Since you have NOT shown samples of Input_file or the expected output, I couldn't test it, but you could simply do this with awk as follows:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here we tell awk to look only at the 1st line (FNR==1), and in substr we tell awk to take 1 character starting from position 2342; you can adjust either value as per your need.
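For example, taking the 3rd character of the first line of a made-up input:
$ echo 'abcdef' | awk 'FNR==1{print substr($0,3,1);next}'
c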
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
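For example:
$ printf 'a b\nc d\n' | gawk 'BEGIN{RS="[[:space:]]+"} NR==3{print;exit}'
c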

Echo something while piping stdout

I know how to pipe stdout:
./myScript | grep 'important'
Example output of the above command:
Very important output.
Only important stuff here.
But while grepping I would also like to echo something on each line, so it looks like this:
1) Very important output.
2) Only important stuff here.
How can I do that?
Edit: Apparently, I haven't specified well enough what I want to do. Numbering the lines is just an example; I want to know in general how to add text (any text, including variables and whatnot) to pipe output. I see one can achieve that using awk's '{print $0}', where extra text can be added around $0.
Are there any other ways to achieve this?
This will number the hits starting from 1:
./myScript | grep 'important' | awk '{printf("%d) %s\n", NR, $0)}'
1) Very important output.
2) Only important stuff here.
This will give you the line number of the hit
./myScript | grep -n 'important'
3:Very important output.
47:Only important stuff here.
If you want line numbers on the new output running from 1..n where n is number of lines in the new output:
./myScript | awk '/important/{printf("%d) %s\n", ++i, $0)}'
# ^ Grep part ^ Number starting at 1
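Since the question also mentions variables: shell variables can be passed into the awk program with -v, so the added text doesn't have to be a literal. A sketch, where prefix is just an example name:
prefix="Look out"
./myScript | awk -v p="$prefix" '/important/{printf("%d) %s: %s\n", ++i, p, $0)}'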
A solution with a while loop is not suited for large files, so you should only use this solution when you do not have a lot of important stuff:
i=0
while read -r line; do
    ((i++))
    printf "(%s) Look out: %s\n" "$i" "${line}"
done < <(./myScript | grep 'important')
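And if the added text is fixed (no counter or variable needed), a plain sed prefix is enough; a minimal sketch:
./myScript | grep 'important' | sed 's/^/>> /'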

Searching a file (grep/awk) for 2 carriage return/line-feed characters

I'm trying to write a script that'll simply count the occurrences of \r\n\r\n in a file. (Opening the sample file in vim binary mode shows me the ^M character in the proper places, and the newline is still read as a newline).
Anyway, I know there are tons of solutions, but they don't seem to get me what I want.
e.g. awk -e '/\r/,/\r/!d' or using $'\n' as part of the grep statement.
However, none of these seem to produce what I need. I can't find the \r\n\r\n pattern with grep's "trick", since that just expands one variable. The awk solution is greedy, and so gets me way more lines than I want/need.
Switching grep to binary/Perl/no-newline mode seems to be closer to what I want,
e.g. grep -UPzo '\x0D', but really what I want then is grep -UPzo '\x0D\x00\x0D\x00', which doesn't produce the output I want.
It seems like such a simple task.
By default, awk treats \n as the record separator. That makes it very hard to count \r\n\r\n. If we choose some other record separator, say a letter, then we can easily count the appearance of this combination. Thus:
awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a' file
Here, gsub returns the number of substitutions made. These are summed and, after the end of the file has been reached, we print the total number.
Example
Here, we use bash's $'...' construct to explicitly add carriage returns and newlines:
$ echo -n $'\r\n\r\n\r\n\r\na' | awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a'
2
Alternate solution (GNU awk)
We can tell it to treat \r\n\r\n as the record separator and then return the count (minus 1) of the number of records:
cat file <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
In awk, RS is the record separator and NR is the count of the number of records. Since we are using a multiple-character record separator, this requires GNU awk.
If the file ends with \r\n\r\n, the above would be off by one. To avoid that, the echo 1 is appended so that there is always at least one character after the last \r\n\r\n in the file.
Examples
Here, we use bash's $'...' construct to explicitly add carriage returns and newlines:
$ echo -n $'abc\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'abc\r\n\r\ndef' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'\r\n\r\n\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2
$ echo -n $'1\r\n\r\n2\r\n\r\n3' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2

Select line below search expression in log file

I am trying to search logs for an expression, then select the line below each match.
Example
I know I want the lines below CommonTerm, for example given the log data:
CommonTerm: something
This Should
random stuff
CommonTerm: something else
Be The
random stuff
more random stuff
CommonTerm: something else
Output Text
random fluff
Desired Output
This Should
Be The
Output Text
Current Attempt
Currently I can use grep CommonTerm log_file -B 0 -A 1 to get:
CommonTerm: something
This Should
--
CommonTerm: something else
Be The
--
CommonTerm: something else
Output Text
I can then pipe this through | grep "\-\-" -B 0 -A 1 to get
This Should
--
--
Be The
--
--
Output Text
--
And then through awk '{if (count++%3==0) print $0;}', giving:
This Should
Be The
Output Text
My question is: surely there's a good 'unix-y' way to do this? Multi greps and a hacky awk feels pretty silly... Is there?
Edit: I also tried:
(grep 'CommonTerm:' log_file -B 0 -A 2) | grep "\-\-" -B 1 -A 0 | grep -v "^--$"
but it seems much more clunky than the answers below which was expected ;)
Edit:
There are some great answers coming in, are there any which would let me easily select the nth line after the search term? I see a few might be more easy than others...
awk 'p { print; p=0 }
/CommonTerm/ { p=1 }' file
A match on CommonTerm sets the flag p; on the following line the flag is true, so that line is printed and the flag is cleared.
You can use sed:
sed -n "/^CommonTerm: /{n;p}" log_file
This searches for "CommonTerm: " at the start of the line (^), then skips to the next line (n) and prints it (p).
EDIT: As per the comment thread, if you're using BSD sed rather than GNU sed (likely to be the case on OS X), you need a couple of extra semicolons to get round a bug:
sed -n "/^CommonTerm: /{;n;p;}" log_file
How about:
grep -B 0 -A 1 "CommonTerm" log_file | grep -v "^CommonTerm:" | grep -v "^--$"
I'd do this with awk:
awk 'found{found=0;print;next}/CommonTerm/{found=1}'
For those that have pcregrep installed, this can be done at one shot. Notice the use of \K to reset the starting point of the match
pcregrep -Mo 'CommonTerm.*?\n\K.*?(?=\n)' file
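As for the edit asking for the nth line after the search term: the flag in the awk answers above generalizes to a countdown. A sketch, where n is the offset (n=2 prints the second line after each match):
awk -v n=2 'c && !--c; /CommonTerm/ { c = n }' file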

Grep penultimate line

Like the title says, how can I filter with grep (or similar bash tool) the line-before-the-last-line of a (variable length) file?
That is, show everything EXCEPT the penultimate line.
Thanks
You can use a combination of head and tail like this for example:
$ cat input
one
two
three
HIDDEN
four
$ head -n -2 input ; tail -n 1 input
one
two
three
four
From the coreutils head documentation:
‘-n k’
‘--lines=k’
Output the first k lines. However, if k starts with a ‘-’, print all but the last k lines of each file. Size multiplier suffixes are the same as with the -c option.
So the head -n -2 part prints everything except the last two lines of its input.
This is unfortunately not portable. (POSIX does not allow negative values in the -n parameter.)
grep is the wrong tool for this. You can wing it with something like
# Get line count
count=$(wc -l <file)
# Subtract one
penultimate=$(expr $count - 1)
# Delete that line, i.e. print all other lines.
# This doesn't modify the file, just prints
# the requested lines to standard output.
sed "${penultimate}d" file
Bash has built-in arithmetic operators which are more elegant than expr; but expr is portable to other shells.
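For example, the expr step above could be written as:
penultimate=$((count - 1))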
You could also do this in pure sed but I don't want to think about it. In Perl or awk, it would be easy to print the previous line and then at EOF print the final line.
Edit: I thought about sed after all.
sed -n '$!x;1!p' file
In more detail; unless we are at the last line ($), exchange the pattern space and the hold space (remember the current line; retrieve the previous line, if any). Then, unless this is the first line, print whatever is now in the pattern space (the previous line, except when we are on the last line).
awk one-liner (tested with seq 10):
kent$ seq 10|awk '{a[NR]=$0}END{for(i=1;i<=NR;i++)if(i!=NR-1)print a[i]}'
1
2
3
4
5
6
7
8
10
Using ed:
printf '%s\n' H '$-1d' wq | ed -s file # in-place file edit
printf '%s\n' H '$-1d' ',p' Q | ed -s file # write to stdout without saving
Here H turns on verbose error messages, $-1d deletes the line before the last, wq writes the buffer back and quits, and in the second variant ,p prints the whole modified buffer while Q quits without saving.
