Get lines between two patterns - shell

I'm using the ksh shell, and below is the sample text in a file:
AAA
ccc
ddd
eee
XXX
AAA
lll
mmm
eee
YYY
From the above text, I want to print only the lines from AAA through XXX, so the final output would be:
AAA
ccc
ddd
eee
XXX

You would use sed for a task like that. It supports a range syntax: starting from a line matching AAA, print everything up to and including the next line matching XXX.
Alas, your input is a bit ill-formed, because the starting pattern AAA occurs twice without a matching XXX for the second AAA. When no XXX is found after the second AAA, sed's default behavior is to match from that AAA through the last line of the input. The details are explained in the last section of the sed FAQ.
But there is also a way to match only the first block. This code is taken directly from the FAQ and adapted to your question:
sed -n '/AAA/{:a; N; /XXX/! b a; p;}' yourfile.txt
/AAA/ and /XXX/ are sed expressions that match your start and end lines.
/AAA/{:a; N; /XXX/! b a; ... } is a loop: starting from a line matching AAA, it
executes an N command, which appends the next line to the pattern space;
if the pattern space does not yet match XXX (notice the ! in /XXX/!, which negates the match), it branches back (b) to label a and reads the next line;
only when XXX is matched does it leave the loop and print (p) the collected lines.
If your input always has a matching XXX for every AAA and the blocks are not nested, a much more intuitive command is enough:
sed -n '/AAA/,/XXX/ p' yourfile.txt
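To see how the range address behaves on the question's sample, here is a small sketch. The range address runs on into the unterminated second block, while a buffering approach prints a block only once its closing XXX has actually been seen. The awk variant and the variable names are illustrative assumptions, not taken from the sed FAQ.

```shell
# Sample input from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' AAA ccc ddd eee XXX AAA lll mmm eee YYY > "$sample"

# Range address: the second AAA opens a range that never closes,
# so everything up to the end of the file is printed as well.
range_out=$(sed -n '/AAA/,/XXX/p' "$sample")

# Buffering sketch in awk (an assumption, not the FAQ's method):
# collect lines after AAA and print them only when XXX shows up.
block_out=$(awk '/AAA/            {buf = ""; inblock = 1}
                 inblock          {buf = buf $0 "\n"}
                 inblock && /XXX/ {printf "%s", buf; inblock = 0}' "$sample")

printf '%s\n' "$block_out"
rm -f "$sample"
```

On this input the range version returns all ten lines, while the buffering version returns only the five lines from the first AAA through XXX.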

Related

grep for Error and print all the lines containing 2 strings above and below Error

saaa vcahJJ HKak vk
Import xxx xXXXXX xxxx
aaaa aaaa aaaa ffffff
hhhhhh hhhhhh hhh hhh hhhhhh
Error reading readStatus api
aaa hhhh aaa aaaaa
gggggggg ggggg xxxxxxxxxx
uuuu hhhhhhhh fffffffff
query run ends
qidIdih II v iQE Iqe
I want to find the 'Error' string in the file containing the above logs and then print everything between the two strings 'Import' and 'ends'.
How can I do this using grep/sed?
Tried this
but didn't get much.
Note: I don't know how many lines there will be before and after. It may vary from the sample I have provided above.
How about:
$ awk 'BEGIN{RS=ORS="ends\n"} /Error/' file
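RS is the input record separator, which needs to be ends followed by a newline. ORS gets the same value for output purposes. (A multi-character RS works in gawk and mawk, but POSIX leaves its behavior unspecified.) Also, your example had /^Error/ but Error does not start the record (^).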
Grep's -A 1 option will give you one line after; -B 1 will give you one line before; and -C 1 combines both to give you one line both before and after.
grep -C 1 "Error" <logfile>
As per your requirement, you can use:
sed -n '/Import/,/ends/p' filename
Here you are:
out=$( sed -n '/^Import/,/ends$/p' file )
echo "$out" | grep Error >/dev/null && echo "$out"
This will capture the text between "Import" and "ends" and print it only if the extracted text contains "Error".
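The capture-then-test approach above treats the whole file as one Import..ends span. If the log may contain several such blocks, the same idea can be sketched per block in awk. The function name blocks_with_error and the extra sample lines are made up for illustration.

```shell
# Print each Import..ends block only if it contains "Error".
# blocks_with_error is a hypothetical helper name; $1 is the log file.
blocks_with_error() {
    awk '/^Import/          {buf = ""; inblock = 1; bad = 0}
         inblock            {buf = buf $0 "\n"; if (/Error/) bad = 1}
         inblock && /ends$/ {if (bad) printf "%s", buf; inblock = 0}' "$1"
}

# Illustrative log with one clean block and one failing block.
log=$(mktemp)
cat > "$log" <<'EOF'
noise before
Import job one
all fine here
query run ends
Import job two
Error reading readStatus api
query run ends
noise after
EOF

error_out=$(blocks_with_error "$log")
printf '%s\n' "$error_out"
rm -f "$log"
```

Only the second block is printed, because the first one is discarded when its closing line arrives without an Error having been seen.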
You can try this sed
sed '/^Import/!d;:A;N;/\nImport/{h;s/.*\n//;bA};/ends$/!bA;h;s/\nError//;tB;d;:B;x' infile
Explanation :
sed '
/^Import/!d # delete every line until one starts with Import
:A
N # append the next line
/\nImport/{h;s/.*\n//;bA} # if the newest line starts with Import,
# keep only that line and return to A
/ends$/!bA # keep looping until the newest line ends with ends
h # save all the lines in the hold space
s/\nError// # look for a line which starts with Error
tB # if one is found, jump to B
d # not found: delete everything and return to the start
:B
x # restore all the lines; they are printed at the end of the cycle
' infile

sed not working as expected (trying to get value between two matches in a string)

I have a file (/tmp/test) that has the string "aaabbbccc" in it.
I want to extract "bbb" from the string with sed.
Doing this returns the entire string:
sed -n '/aaa/,/ccc/p' /tmp/test
I just want to return bbb from the string with sed (I am trying to learn sed so not interested in other solutions for this)
Sed works on a line basis, and a,b{action} will run action for every line from one matching a through one matching b. In your case
sed -n '/aaa/,/ccc/p'
will start printing lines when /aaa/ is matched, and stop when /ccc/ is matched, which is not what you want.
To manipulate a line there are multiple options. One is s/search/replace/, which can be used to remove the leading aaa and trailing ccc (the \| alternation in a basic regular expression is a GNU extension):
% sed 's/^aaa\|ccc$//g' /tmp/test
bbb
Breakdown:
s/
^aaa # Match literal aaa at the beginning of the string
\| # ... or ...
ccc$ # Match literal ccc at the end of the string
// # Replace with nothing
g # Global (keep going until there are no more matches; without it, the
# command stops after the first match has been replaced)
If you are not sure how many a's and c's you have you can use:
% sed 's/^aa*\|cc*$//g' /tmp/test
bbb
This matches a literal a followed by zero or more a's at the beginning of the line, and likewise one or more c's at the end of the line.
With GNU sed:
sed 's/aaa\(.*\)ccc/\1/' /tmp/test
Output:
bbb
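As a small sketch of the capture-group approach, the following combines it with -n and the p flag so that lines without the pattern produce no output at all. It uses only basic regular expressions, so it should not depend on GNU extensions. The function name between is a hypothetical helper.

```shell
# Print only what sits between a leading aaa and a trailing ccc.
# -n suppresses automatic printing; the p flag prints a line only
# when the substitution actually happened.
between() {
    sed -n 's/^aaa\(.*\)ccc$/\1/p'
}

printf '%s\n' 'aaabbbccc' 'no match here' | between
# prints: bbb
```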
See: The Stack Overflow Regular Expressions FAQ

Search for pattern ABC and replace pattern XYZ in line

I have to modify the contents of a text file.
I will read the file line by line and check if the line contains a pattern ABC.
If it contains the pattern ABC, I have to replace another pattern XYZ in that line with PQR.
This is the code I made:
while read line
do
#command to search for pattern ABC. If yes, replace another pattern XYZ in this line with PQR
done < myfilename.txt
With sed, you don't need a while read loop:
sed -i '/ABC/s/XYZ/PQR/g' myfilename.txt
/ABC/ is the address: only lines matching this address will be processed.
s is the substitution command that replaces XYZ with PQR.
g is a flag that replaces all occurrences of XYZ on the line (g stands for global).
The -i option applies the changes in place (the file is overwritten).
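Where -i is unavailable or behaves differently, the same address-plus-substitution command can be run through a temporary file. This is a minimal sketch under that assumption; the helper name replace_on_abc_lines and the demo lines are hypothetical.

```shell
# Replace XYZ with PQR, but only on lines that contain ABC.
# $1 is the file to rewrite; the helper name is hypothetical.
replace_on_abc_lines() {
    tmp=$(mktemp) || return 1
    # Write the edited text to a temp file, then copy it back over the
    # original so the original file keeps its permissions.
    sed '/ABC/s/XYZ/PQR/g' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

demo=$(mktemp)
printf '%s\n' 'ABC has XYZ and XYZ' 'no marker, XYZ stays' 'ABC only' > "$demo"
replace_on_abc_lines "$demo"
cat "$demo"
rm -f "$demo"
```

Only the first demo line changes: the second has XYZ but no ABC, and the third has ABC but nothing to replace.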
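You can use ed when sed -i is not supported. The following replaces ABC with DEF on every line of a file named input (the <<< here-string needs bash, ksh93, or zsh):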
ed -s input <<< $',s/ABC/DEF/g\nw'
or
ed -s input << EOF
,s/ABC/DEF/g
w
q
EOF

SED AWK to strip data from log file

Hi I have the following entries in log file.
I need to produce a list of the names in the name field whenever I see Denied on the line above. So I need to get something like:
Sally
Matt
Linda
Can you help me with this? I would appreciate it if you could explain the command so I can use it later on for other logs.
<!-- user 1 -- >
<ABC 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="sally" />
<userbody>
</Status>
<!-- user 2 -- >
<ABD 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="Matt" />
<userbody>
</Status>
<!-- user 3 -- >
<ABD 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="Linda" />
<userbody>
</Status>
Regards
This GNU sed command could work:
sed -n -r '/Denied:/{N; s/^.*name="([^"]*)".*$/\1/; p}' file
-n suppresses the automatic printing of lines
-r enables extended regular expressions, used here so that the grouping parentheses () need no escaping
N reads the next line and appends it to the pattern space
s/input/output/ is substitution
^ is the start of the line, so ^.*name=" matches everything up to and including name="
$ is the end of the line
[^"] is any character which is not " (set negation), so [^"]* matches up to the next quote
\1 is the first capture group, i.e. ([^"]*)
p prints the result (it runs only when the Denied condition was fulfilled, on the two lines that were read)
output
sally
Matt
Linda
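A variant of the same idea that sticks to basic regular expressions can be sketched as follows, on the assumption that the name always sits on the line directly after the Denied line. The helper name denied_names and the non-denied "Bob" entry are invented for the demonstration.

```shell
# For each line containing "Denied:", read the following line and print
# the value of its name="..." attribute. denied_names is a hypothetical
# helper name; $1 is the log file.
denied_names() {
    sed -n '/Denied:/{
        n
        s/.*name="\([^"]*\)".*/\1/p
    }' "$1"
}

log=$(mktemp)
cat > "$log" <<'EOF'
<ABC 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="sally" />
<ABD 12345 "123" text="*Allowed: made-up entry" status="0" >
<key flags="tdst" name="Bob" />
<ABD 12345 "123" text="*Denied: ths is aa test status="0" >
<key flags="tdst" name="Matt" />
EOF

denied_names "$log"
rm -f "$log"
```

Here n replaces the pattern space with the next line instead of appending it, so the substitution only has to deal with the key line.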
Try this:
sed -rn '/Denied/{n;s#(.+)(name="(\w+))"(.+)#\3#p}' < sample.txt
/Denied/ - search for the keyword
{n; - if found then read next line
s#(.+)(name="(\w+))"(.+)#\3#p - lookup regex groups and print out only the third one, which is equal to the name within quotes in your data sample.

Extracting lines between two patterns and including line above the first and below the second

Given the following text file, I need to extract and print the lines between two patterns and also include the line above the first pattern and the one following the second:
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
I have found many solutions with sed and awk to extract the text between two tags, such as the following:
sed -n '/FIRST/,/SECOND/p' FileName
But how do I include the line before and the line after the patterns?
Desired output:
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
As you've asked for an sed/awk solution (and everyone is scared of ed ;-), here's one way you can do it in awk:
awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file
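When the first pattern is matched, print the previous line p and set the print flag f. When the second pattern is matched, set c to 1. If f is 1 (true), the current line will be printed. c--==0 is true on the line after the second pattern is matched, which clears f and stops the printing. (It is also true on the very first input line, where c is still unset, but f is 0 there anyway, so nothing changes.)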
Another way you can do this is by looping through the file twice:
awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file
The first pass through the file records the line numbers of the two matches. The second pass prints the lines in that range, extended by one line on each side.
The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
To use shell variables instead of hard-coded patterns, you can pass the variables like this:
awk -v first="$first" -v second="$second" '...' file
Then use $0 ~ first instead of /FIRST/.
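Putting the two-pass approach and the -v variables together, a parameterised version might look like the sketch below. The function name print_range_with_context and the variable names before and after are assumptions for illustration. Unlike the one-liner above, it only records the first FIRST and the first SECOND after it.

```shell
# Two-pass sketch: $1 = start regex, $2 = end regex,
# $3 = lines of context before, $4 = lines of context after, $5 = file.
# Note that awk interprets backslash escapes in -v values.
print_range_with_context() {
    awk -v first="$1" -v second="$2" -v before="$3" -v after="$4" '
        NR == FNR {                       # first pass: record line numbers
            if (!s && $0 ~ first) { s = NR }
            else if (s && !e && $0 ~ second) { e = NR }
            next
        }
        # second pass: print the range plus the requested context
        s && e && FNR >= s - before && FNR <= e + after
    ' "$5" "$5"
}

ctx=$(mktemp)
printf '%s\n' a b before '== FIRST ==' x '== SECOND ==' after c > "$ctx"
print_range_with_context FIRST SECOND 1 1 "$ctx"
rm -f "$ctx"
```

The file is named twice on purpose so that awk reads it once per pass. Changing the third and fourth arguments changes how many lines of context are printed.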
I'd say
sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename
That is:
/FIRST/ { # If a line matches FIRST
x # swap hold buffer and pattern space,
G # append hold buffer to pattern space.
# We saved the last line before the match in the hold
# buffer, so the pattern space now contains the previous
# and the matching line.
:a # jump label for looping
n # print pattern space, fetch next line.
/SECOND/! ba # unless it matches SECOND, go back to :a
n # fetch one more line after the match
q # quit (printing that last line in the process)
}
h # If we get here, it's before the block. Hold the current
# line for later use.
d # don't print anything.
Note that BSD sed (as comes with Mac OS X and *BSD) is a bit picky about branching commands. If you're working on one of those platforms,
sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename
should work.
This will work whether or not there are multiple ranges in your file:
$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd) gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }
$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
If you ever need to print more than 1 line before FIRST, change prev to an array. If you ever need to print more than 1 line after SECOND, change gotEnd to a count.
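One way the array-and-count suggestion could look is sketched below. It is a single-pass script that keeps the last few lines in an array and counts down the trailing lines. It is not the answer's own code; the function name range_with_context and the variable names are hypothetical.

```shell
# Single-pass sketch: $1 = lines before FIRST, $2 = lines after SECOND,
# $3 = file. Works for several FIRST..SECOND ranges in one file.
range_with_context() {
    awk -v before="$1" -v after="$2" '
        /FIRST/ && !inrange {
            # Replay the saved lines that have not been printed already.
            for (i = NR - before; i < NR; i++)
                if (i > lastprinted && (i in buf)) print buf[i]
            inrange = 1
        }
        inrange || remaining > 0 {
            print
            lastprinted = NR
            if (!inrange) remaining--      # counting down trailing lines
        }
        inrange && /SECOND/ { inrange = 0; remaining = after }
        # Keep only the most recent "before" lines in the buffer.
        { buf[NR] = $0; delete buf[NR - before] }
    ' "$3"
}

ctx=$(mktemp)
printf '%s\n' l1 l2 l3 FIRST m SECOND t1 t2 t3 > "$ctx"
range_with_context 2 2 "$ctx"
rm -f "$ctx"
```

With two lines of context on each side, the demo prints l2 through t2 and leaves out l1 and t3.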
sed '#n
H;$!d
x;s/\n/²/g
/FIRST.*SECOND/!b
s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
ta
s/²/\
/g
p' YourFile
POSIX sed version (with GNU sed, use --posix).
It also accepts a SECOND pattern that sits on the same line as FIRST; it is easy to adapt so that at least one newline is required between them.
#n : don't print unless explicitly requested (e.g. with p)
H;$!d : append each line to the hold buffer; if it is not the last line, delete the current line and start the next cycle
x;s/\n/²/g : load the buffer and replace every newline with another character (here I use ²), because POSIX sed does not allow [^\n]
/FIRST.*SECOND/!b : if the patterns are not present, quit without output
s/.*²\([^²]*²[^²]*FIRST\)/\1/ : remove everything before the line preceding your first pattern
:a : label for a goto (used later)
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/ : remove everything after the line following your second pattern. The match is greedy, so at first the last occurrence of SECOND is the reference
ta : if the last s/// made a substitution, go to label a. This cycles until the first SECOND occurring in the file (after FIRST) is reached
s/²/\
/g : put back the new lines
p : print the result
Based on Tom's comment: if the file isn't large, we can just store it in an array and then loop over it:
awk '{a[++i]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<e+1;i++) print a[i]}' file
I would do it with Perl personally. We have the 'range operator' which we can use to detect if we're between two patterns:
if ( m/FIRST/ .. /SECOND/ )
That's the easy part. What's a little less easy is catching the preceding and following lines. So I keep a $prev_line value, so that when I first hit that test, I know what to print. And I clear $prev_line, both so that it's empty when I print it again inside the range, and so that I can spot the transition at the end of the range.
So something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;
        }
        $prev_line = $_;
    }
}
__DATA__
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
This might work for you (GNU sed):
sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file
If the current line does not match FIRST, save it in the hold space and delete it. If the line matches FIRST, append it to the saved line, print both, and keep printing further lines until SECOND is reached, at which point one additional line is printed and the script exits.

Resources