Display data between two fixed patterns - shell

I have random data coming in from a source into a file. I have to read through the file and extract only the portion of data that falls between particular patterns.
Example: Let's suppose the file myfile.out looks like this.
info-data
some more info-data
=================================================================
some-data
some-data
some-data
=================================================================
======================= CONFIG PARMS : ==========================
some-data
some-data
some-data
=================================================================
======================= REQUEST PARAMS : ========================
some-data
some-data
some-data
=================================================================
===================== REQUEST RESULTS ===========================
some-data
=================================================================
some-data
some-data
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================
some-info-data
I'm looking for the data that matches this particular pattern only
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================
I did try to look around a bit, like
How to select lines between two marker patterns which may occur multiple times with awk/sed
Bash. How to get multiline text between tags
But the awk and sed solutions given there don't seem to work; the commands give neither errors nor output.
I tried this
PATTERN1="================================================================="
PATTERN2="==========================F I N I S H============================"
awk -v PAT1="$PATTERN1" -v PAT2="$PATTERN2" 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' myfile.out
and
PATTERN1="================================================================="
PATTERN2="==========================F I N I S H============================"
awk -v PAT1="$PATTERN1" -v PAT2="$PATTERN2" 'PAT1 {flag=1;next} PAT2 {flag=0} flag { print }' file
Maybe it is due to the pattern? Or I'm doing something wrong.
Script will run on RHEL 6.5.

This might work for you (GNU sed):
sed -r '/^=+$/h;//!H;/^=+F I N I S H=+$/!d;x;s/^[^\n]*\n|\n[^\n]*$//g' file
Store a line containing only ='s in the hold space (replacing anything that was there before). Append all other lines to hold space. If the current line is not a line containing ='s followed by F I N I S H followed by ='s, delete it. Otherwise, swap to the hold space, remove the first and last lines and print the remainder.

Assuming you only need the data and not the pattern, using GNU awk:
awk -v RS='\n={26,}[ A-Z]*={28,}\n' 'RT~/F I N I S H/' file
The record separator RS is set to match lines with a series of = and some optional uppercase characters and spaces in between.
The only statement is a check whether the record terminator RT (of the current record) has the F I N I S H keyword in it. If so, awk prints the whole record, which consists of multiple lines.

sed can handle this.
Assuming you want to keep the header and footer lines -
$: sed -En '/^=+$/,/^=+F I N I S H=+$/ { /^=+$/ { x; d; }; /^[^=]/ { H; d; }; /^=+F I N I S H=+$/{ H; x; p; q; }; }' infile
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================
If not, use
sed -En '/^=+$/,/^=+F I N I S H=+$/ { /^=+$/ { s/.*//g; x; d; }; /^[^=]/ { H; d; }; /^=+F I N I S H=+$/{ x; p; q; }; }' infile
Note that if you aren't using GNU sed you'll need to insert newlines instead of all those semicolons.
sed -En '
/^=+$/,/^=+F I N I S H=+$/ {
/^=+$/ {
s/.*//g
x
d
}
/^[^=]/ {
H
d
}
/^=+F I N I S H=+$/{
x
p
q
}
}' infile
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
Breaking it down -
sed -En '...'
The -En says to use extended pattern matching (the -E, which I really only used for the +'s), and not to output anything unless specifically asked (the -n).
/^=+$/,/^=+F I N I S H=+$/ {...}
says to execute these commands only between lines that are all ='s and lines that are all ='s except for F I N I S H in the middle somewhere. All the stuff between the {}'s will be checked on all lines between those. That does mean from the first =+ line, but that's ok, we handle that inside.
(a) /^=+$/ { x; d; };
(b) /^=+$/ { s/.*//g; x; d; };
(a) says on each of the lines that are all ='s, swap (x) the current line (the "pattern space") with the "hold space", then delete (d) the pattern space. That keeps the current line and deletes whatever you might have accumulated above on false starts. (Remember -n keeps anything from printing till we want it.)
(b) says erase the current line first, THEN swap and delete. It will still add a newline. Did you want that removed?
/^[^=]/ { H; d; };
Both versions use this. On any line that does not start with an =, add it to the hold space (H), then delete the pattern space (d). The delete always restarts the cycle, reading the next record.
(a) /^=+F I N I S H=+$/{ H; x; p; q; };
(b) /^=+F I N I S H=+$/{ x; p; q; };
On any line with the sentinel F I N I S H string between all ='s, (a) will first append (H) the pattern space to the hold space - (b) will not. Both will then swap the pattern and hold spaces (x), print (p) the pattern space (which is now the value accumulated in the hold space), and then quit (q), so sed stops after the first F I N I S H block.
If you replace the q with d, sed keeps reading instead. At that point you will be outside the initial toggle, so unless another row of all ='s occurs, all the remaining lines are skipped. If one does occur, it will again begin to accumulate records, but will not print them unless it hits another F I N I S H record.
}' infile
This just closes the script and passes in whatever filename you were using. Note that this is not an in-place edit...
Hope that helps.
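For what it's worth, here is a small self-contained run of variant (b) with one extra step, s/^\n//, which strips the leading newline mentioned above. Treat it as a sketch only: the sample lines are made up and shortened, tmp and result are names I picked for the demo, and ==* stands in for =+ so that -E isn't needed.

```shell
#!/bin/sh
# Made-up, shortened sample in the shape of the question's file.
tmp=$(mktemp)
printf '%s\n' \
  'info-data' \
  '=====' \
  'some-data' \
  '=====' \
  'Data-I-Need 1' \
  'Data-I-Need 2' \
  '==F I N I S H==' \
  'some-info-data' > "$tmp"

result=$(sed -n '
/^==*$/,/^==*F I N I S H==*$/ {
  /^==*$/ {
    # plain separator: empty the line, swap it into the hold space, drop the rest
    s/.*//
    x
    d
  }
  /^[^=]/ {
    # data line: collect it in the hold space
    H
    d
  }
  /^==*F I N I S H==*$/ {
    # footer: fetch what was collected, strip the leading newline, print, stop
    x
    s/^\n//
    p
    q
  }
}' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

With this sample the two Data-I-Need lines come out with no blank line in front of them.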

Although there is already a sed solution there, I like sed for its simplicity:
sed -n '/^==*\r*$/,/^==*F I N I S H/{H;/^==*[^F=]/h;${g;p}}' file
This sed command defines a range for the commands to run against. The range starts at a line that consists only of = characters (optionally followed by carriage returns) and ends at a line that starts with = and leads into F I N I S H. Now the commands:
H appends each line to the hold space. Then /^==*[^F=]/h runs on the header or footer of any other section and replaces the hold space with the current pattern space.
On the last line, ${g;p} replaces the current pattern space with the contents of the hold space and prints it. The whole thing outputs this:
=================================================================
Data-I-Need
Data-I-Need
...
...
...
Data-I-Need
==========================F I N I S H============================
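As for why the two awk attempts in the question print nothing useful: inside an awk program, /PAT1/ is a regex that matches the literal text PAT1, not the contents of the variable. A variable has to be compared explicitly, for example with $0 == pat1 for an exact line match. Below is a minimal sketch of that idea in plain awk. The sample data is made up and shortened, and pat1, pat2, buf and flag are just names chosen for the demo.

```shell
#!/bin/sh
# Made-up, shortened sample in the shape of myfile.out.
tmp=$(mktemp)
printf '%s\n' \
  'info-data' \
  '=====' \
  'some-data' \
  '=====' \
  '=== REQUEST RESULTS ===' \
  'some-data' \
  '=====' \
  'Data-I-Need 1' \
  'Data-I-Need 2' \
  '===F I N I S H===' \
  'some-info-data' > "$tmp"

PATTERN1='====='
PATTERN2='===F I N I S H==='

result=$(awk -v pat1="$PATTERN1" -v pat2="$PATTERN2" '
  # plain separator line: throw away what was buffered and start over
  $0 == pat1 { buf = ""; flag = 1; next }
  # footer line: print what was collected since the last separator
  $0 == pat2 { if (flag) printf "%s", buf
               flag = 0; buf = ""; next }
  # any other line inside a block is buffered
  flag       { buf = buf $0 ORS }
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Because the comparison is a string equality, the = characters don't need any regex escaping. The real separators from the question can be dropped into PATTERN1 and PATTERN2 unchanged.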

Related

Remove comma from last element in each block

I've got a file with the following contents, and want to remove the last comma (in this case, the comma after the 'c' and 'f').
heading1(
a,
b,
c,
);
some more text
heading2(
d,
e,
f,
);
This has to be done using bash and not Perl, Python, etc., as these are not installed on my target system. I can use sed, awk etc, but I cannot use sed with the -z argument as I'm using an old version of the utility.
So sed -zi 's/,\n);/\n);/g' $file is off the table.
Any help would be greatly appreciated. Thanks
This might work in your version of sed. Then again it might not.
sed 'x;1d;G;/;$/s/,//;$!s/\n.*//' $file
Rough translation: "Swap this line with the hold space. If this is the first line, do no more with it. Append the hold space to the line in the buffer (so that you're looking at the last line and the current one). If what you have ends with a semicolon, delete the comma. If you're not on the last line of the file, delete the second of the two lines you have (i.e. the current line, which we'll deal with after we see the next one)."
Using awk, RS="^$" to read in the whole file and regex to replace parts of the text:
$ awk -v RS=^$ '{gsub(/,\n\);/,"\n);")}1' file
Some output:
heading1(
a,
b,
c
);
...
This should work with GNU sed and BSD sed on the shown input:
sed -e ':a' -e '/,\n);$/!{N' -e 'ba' -e '}' -e 's/,\n);$/\n);/' file.txt
We concatenate lines in the pattern space until it ends with ,\n);. Then we delete the comma, print (the default) and restart the cycle with a new line.
Simpler and more readable version with GNU sed (that you do not have):
sed ':a;/,\n);$/!{N;ba};s/,\n);$/\n);/' file.txt
Using awk:
awk '
$0==");" {sub(/,$/, "", l)}
FNR!=1 {print l}
{l=$0}
END {print l}'
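The awk above works by always printing one line late, so the previous line can still be edited when the closing line shows up. Here is a runnable sketch of the same idea, slightly loosened so that it tolerates leading whitespace before the ) and trailing whitespace after the comma. The loosening is my assumption, not something the question asked for, and prev, tmp and result are demo names.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' \
  'heading1(' \
  'a,' \
  'b,' \
  'c,' \
  ');' \
  'some more text' > "$tmp"

result=$(awk '
  NR > 1 {
      # a line starting with ")" means the previous line was the last element
      if ($0 ~ /^[ \t]*\)/) sub(/,[ \t]*$/, "", prev)
      print prev
  }
  { prev = $0 }                      # remember this line for the next round
  END { if (NR > 0) print prev }     # the final line has no successor
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Only the c, line loses its comma; a, and b, are printed untouched.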
This might work for you (GNU sed):
sed '/,$/{N;/);$/Ms/,$//M;P;D}' file
If a line ends with a comma, fetch the next line and if this ends in );, remove the comma.
Otherwise, if the following line does not match as above, print/delete the first of the lines and repeat.
Using sed there are broadly two approaches:
Keep multiple lines in the pattern space; or
Keep the previous line in the hold space.
Using just the pattern space means a very concise version:
sed 'N; s/,[[:space:]]*\n*[[:space:]]*)/)/; P; D'
This relies on the pattern space being able to hold multiple lines, and being able to match the newline with \n. Not all versions of sed can do this, but GNU sed can.
This also relies on the implicit behaviours of N, P, and D, which change depending on when end-of-input is reached. Read man sed for the gory details.
Unrolling this to one command per line gets:
sed '
N
s/,[[:space:]]*\n*[[:space:]]*)/)/
P
D
'
If you have only a POSIX version of sed available, you'll need to use the hold space as well. In this case the idea is that when you see the ) in the pattern space, you edit the line that's in the hold space to remove the comma:
sed '1 { h; d; }; /^)/ { x; s/,[[:space:]]*$//; x; }; x; $ { p; x; s/,[[:space:]]*$//; }'
Unrolling that we get:
sed '
1 {
h
d
}
/^)/ {
x
s/,[[:space:]]*$//
x
}
x
$ {
p
x
s/,[[:space:]]*$//
}
'
Breaking that apart: what follows is a "sed script"; so just put '' around it and "sed" in front of it:
sed '
Start by unconditionally copying the first line from the pattern space to the hold space, and then deleting the pattern space (which forces a skip to the next line)
1 {
h
d
}
For each line that starts with ')', swap the pattern space and hold space (so you now have the previous line in the pattern space), remove the trailing comma (if any), and then swap back again:
/^)/ {
x
s/,[[:space:]]*$//
x
}
Now swap the pattern space with the hold space, so that the hold space now holds the current line and the pattern space holds the previous line.
x
Normally contents of the pattern space will be sent to output when the end of the script is reached, but we have one more case to take care of first.
On the last line, print the previous line, then swap to retrieve the last line and then (because we reach the end of the script) print it too. This code will also remove a trailing comma from the last line, but that's optional; you can remove the s command in the following if you don't want that.
$ {
p
x
s/,[[:space:]]*$//
}
Upon reaching the end of the sed script, the pattern space will be printed; so there's no "p" at the end.
As mentioned before, close the quote from the beginning.
'
Note:
If you need to scan ahead more than one line, instead of "x" to swap one line, use "H;g" to append to the hold space and then copy the hold space to the pattern space, then "P;D" to print and remove up to the first newline. (H, g, P and D are all part of POSIX sed, so no GNU extensions are needed.)

How to do an insertion of text before a multi-line regex using sed or awk?

Given the following input (not literally what follows, but shown with some meta notation):
... any content can be above the match ...
# ... optional comment above the match ...
# ... optional comment above the match can have spaces before it ...
"<key>": ... any content can follow ...
... any content can be below the match ...
where the match is ^\s*"<key>": where the <key> is a placeholder for an actual string. Note that comments are matched by ^\s*#.*.
I want to insert a string of text before the matched <key> and before any comments that are immediately above the matched <key>. There may be a variable number of comments, or none at all.
I've come up with a solution using sed; however, it is very ugly because it uses a tr hack. I'm hoping for a simpler solution using either sed or awk.
First, here's a test case:
test.txt:
{
# 1a
# 2a
"key1": true,
# 1b
# 2b
"key2": false,
}
Now my present solution involves sed and translating all newlines to a delimiter character ($'\x01') to make it easier to do multi-line operations. My example involves a regex that matches multiple comment lines followed by a key-value pair.
# The string to insert before the match
s='# 1x
# 2x
"keyx": null,
'
# Define the key before which to do the insertion:
Key='key2'
# Normalize that string: s -> ns
ns="$(printf '%s' "$s" | tr '\n' $'\x01')"
# Normalize test.txt
tr '\n' $'\x01' < test.txt |
# Perform the multi-line insertion
sed "s/\(^\|\x01\)\(\(\s*#[^\x01]*\x01\)*\)\(\s*\"$Key\":\)/\1$ns\2\4/" |
# Return to standard form with newlines
tr $'\x01' '\n'
The above code when executed with the test.txt input produces the correct and expected output:
{
# 1a
# 2a
"key1": true,
# 1x
# 2x
"keyx": null,
# 1b
# 2b
"key2": false,
}
How might I improve on what I've done above using sed or awk to make for more maintainable code? Specifically:
Is there another way to do this using sed without the tr hack above?
Is there a simpler way to do this using awk?
Following your update that the input could include either no or varying amounts of comments, this is the edit (due to some problems editing it, I'm having to edit out v1, so if you want it back leave a comment.)
sed doesn't do loops or if/elses really, just labels and branches, so trying to pick a range of lines is a bit more complicated it seems. Or at least for my knowledge level.
export key='key2'
s='# 1x\n# 2x\n"keyx": null,\n'
key_pattern='[[:space:]]*"'"$key"'":'
sed -n '
/'"$key_pattern"'/ {
:b; i\
'"$s"'
p; d
}
/^[[:space:]]*#/ {
h; :a; n; H
/^[[:space:]]*#/ ba
/'"$key_pattern"'/ { x; bb; }
x; p; d;
}
p
'
This script handles three cases. First, when the key_pattern matches on its own (no comments before):
/'"$key_pattern"'/ { # here :b creates label b,
:b; i\ # and inserts
'"$s"' # the contents of this line
p; d # print then delete from buffer and start next line
}
When a group of comments is followed by the key_pattern:
/^[[:space:]]*#/ { # if comment found
h; # copy pattern space into hold space
:a; # create label a
n; H # get next line, append to hold space.
/^[[:space:]]*#/ ba # if new line is comment, goto `a`
/'"$key_pattern"'/ { x; bb; } # else if our pattern retrieve hold
# and goto `b`
x; p; d; # retrieve hold space, print and delete
}
And finally, when the line doesn't match anything else:
p; # print line and start next.
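The same three cases can also be sketched in plain awk: hold comment lines back, and decide what to do with them once the next non-comment line is known. This is only an illustration under a few assumptions: the key contains no regex metacharacters, the insertion text is passed through an environment variable I named INSERT (so that embedded newlines are safe in any awk), and held and keyre are demo names.

```shell
#!/bin/sh
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
# 1a
# 2a
"key1": true,
# 1b
# 2b
"key2": false,
}
EOF

s='# 1x
# 2x
"keyx": null,'

result=$(INSERT="$s" awk -v key='key2' '
  BEGIN { keyre = "^[ \t]*\"" key "\":" }          # same shape as key_pattern
  /^[ \t]*#/ { held = held $0 ORS; next }          # comment: hold it back
  $0 ~ keyre { print ENVIRON["INSERT"] }           # key: insertion goes first
  { printf "%s", held; held = ""; print }          # then held comments and the line
  END { printf "%s", held }                        # comments at end of file
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

For the test file this puts the three inserted lines between the "key1" line and the # 1b comment, which matches the expected output in the question.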
The following code comes with these assumptions:
Blank line between keys and data
Curly braces not elsewhere
awk '/key2/{$0 = "# 1x\n# 2x\n\"keyx\": null,\n\n"$0}ORS = RT' RS='[{}\n]\n' input_file
The main focus here is on setting up the RS value so that it delimits each record.

Extracting lines between two patterns and including line above the first and below the second

Given the following text file, I need to extract and print the lines between two patterns and also include the line above the first pattern and the line following the second.
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
I have found many solutions with sed and awk to extract the text between two tags, such as the following:
sed -n '/FIRST/,/SECOND/p' FileName
But how do I include the line before and the line after the pattern?
Desired output:
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
As you've asked for an sed/awk solution (and everyone is scared of ed ;-), here's one way you can do it in awk:
awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file
When the first pattern is matched, print the previous line p and set the print flag f. When the second pattern is matched, set c to 1. If f is 1 (true), the current line will be printed. c--==0 is only true on the line after the second pattern is matched.
Another way you can do this is by looping through the file twice:
awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file
The first pass through the file records the line numbers of the matches. The second pass prints the lines in the range.
The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
To use shell variables instead of hard-coded patterns, you can pass the variables like this:
awk -v first="$first" -v second="$second" '...' file
Then use $0 ~ first instead of /FIRST/.
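Putting the last two points together, a sketch of the two-pass version with the patterns and the number of context lines passed in as variables might look like this. The variable names before and after are my own additions, and the sample is a shortened copy of the question's file. Note that s and e end up holding the last matching line numbers, so this assumes one FIRST/SECOND pair per file.

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' \
  'nnnn nnnnn aaaaa' \
  'line before first pattern' \
  '***** FIRST *****' \
  'dddd ffff cccc' \
  '***** SECOND *****' \
  'line after second pattern' \
  'asdfgsdagg gsfagsaf' > "$tmp"

first='FIRST'
second='SECOND'

result=$(awk -v first="$first" -v second="$second" -v before=1 -v after=1 '
  NR == FNR {                              # first pass: note where the matches are
      if ($0 ~ first) s = NR
      else if ($0 ~ second) e = NR
      next
  }
  FNR >= s - before && FNR <= e + after    # second pass: print the widened range
' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Changing before=1 or after=1 on the command line is all it takes to widen the context.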
I'd say
sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename
That is:
/FIRST/ { # If a line matches FIRST
x # swap hold buffer and pattern space,
G # append hold buffer to pattern space.
# We saved the last line before the match in the hold
# buffer, so the pattern space now contains the previous
# and the matching line.
:a # jump label for looping
n # print pattern space, fetch next line.
/SECOND/! ba # unless it matches SECOND, go back to :a
n # fetch one more line after the match
q # quit (printing that last line in the process)
}
h # If we get here, it's before the block. Hold the current
# line for later use.
d # don't print anything.
Note that BSD sed (as comes with Mac OS X and *BSD) is a bit picky about branching commands. If you're working on one of those platforms,
sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename
should work.
This will work whether or not there's multiple ranges in your file:
$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
print
if (gotEnd) gotBeg=gotEnd=0
if (/SECOND/) gotEnd=1
}
{ prev=$0 }
$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
If you ever need to print more than 1 line before FIRST change prev to an array. If you ever need to print more than 1 line after SECOND, change gotEnd to a count.
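A rough sketch of that generalisation, with prev turned into an array and the end flag turned into a countdown, could look like the following. The names before, after, inRange and remaining are mine, the sample is made up, and the sketch assumes ranges are far enough apart that the lines after one range are not also the lines before the next (otherwise they would be printed twice).

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' a1 a2 a3 '***** FIRST *****' dddd '***** SECOND *****' b1 b2 b3 > "$tmp"

result=$(awk -v before=2 -v after=2 '
  /FIRST/ && !inRange {
      for (i = NR - before; i < NR; i++)     # buffered lines, oldest first
          if (i in prev) print prev[i]
      inRange = 1
  }
  inRange                    { print }                    # inside the range
  !inRange && remaining > 0  { print; remaining-- }       # trailing context
  inRange && /SECOND/        { inRange = 0; remaining = after }
  { prev[NR] = $0; delete prev[NR - before] }             # keep only the last few
' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

With before=2 and after=2 the sample yields a2 and a3, the range itself, then b1 and b2.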
sed '#n
H;$!d
x;s/\n/²/g
/FIRST.*SECOND/!b
s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
ta
s/²/\
/g
p' YourFile
POSIX sed version (with GNU sed, use --posix).
It also takes a SECOND pattern that is on the same line as FIRST; it is easy to adapt to require at least one newline in between.
#n : don't print unless explicitly requested (as with p)
H;$!d : append each line to the buffer; if it is not the last line, delete the current line and loop
x;s/\n/²/g : load the buffer and replace every newline with another character (here I use ²), because POSIX sed does not allow [^\n]
/FIRST.*SECOND/!b : if the patterns are not present, quit without output
s/.*²\([^²]*²[^²]*FIRST\)/\1/ : remove everything before the line that precedes your first pattern
:a : label for a goto (used later)
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/ : remove everything after the line that follows your second pattern. It takes the longest string, so the last occurrence of the pattern is the reference
ta : if the last s/// made a substitution, go to label a. It cycles until it reaches the first SECOND pattern occurring in the file (after FIRST)
s/²/\
/g : put back the new lines
p : print the result
Based on Tom's comment: if the file isn't large, we can just store it in an array and then loop over it:
awk '{a[++i]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<e+1;i++) print a[i]}'
I would do it with Perl personally. We have the 'range operator' which we can use to detect if we're between two patterns:
if ( m/FIRST/ .. /SECOND/ )
That's the easy part. What's a little less easy is 'catching' the preceding and next lines. So I set a $prev_line value, so that when I first hit that test, I know what to print. And I clear that $prev_line, both because then it's empty when I print it again, and because then I can spot the transition at the end of the range.
So something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_line = " ";
while (<DATA>) {
if ( m/FIRST/ .. /SECOND/ ) {
print $prev_line;
$prev_line = '';
print;
}
else {
if ( not $prev_line ) {
print;
}
$prev_line = $_;
}
}
__DATA__
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
This might work for you (GNU sed):
sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file
If the current line is not FIRST, save it in the hold space and delete the current line. If the line is FIRST, append it to the saved line, then print both and any further lines until SECOND, at which point one additional line is printed and the script exits.

Delete lines before and after a match in bash (with sed or awk)?

I'm trying to delete two lines either side of a pattern match from a file full of transactions, i.e. find the match, delete the two lines before it, delete the two lines after it, and then delete the match itself. Then write this back to the original file.
So the input data is
D28/10/2011
T-3.48
PINITIAL BALANCE
M
^
and my command is
sed -i '/PINITIAL BALANCE/,+2d' test.txt
However this is only deleting two lines after the pattern match and then deleting the pattern match. I can't work out any logical way to delete all 5 lines of data from the original file using sed.
an awk one-liner may do the job:
awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file
test:
kent$ cat file
######
foo
D28/10/2011
T-3.48
PINITIAL BALANCE
M
x
bar
######
this line will be kept
here
comes
PINITIAL BALANCE
again
blah
this line will be kept too
########
kent$ awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file
######
foo
bar
######
this line will be kept
this line will be kept too
########
add some explanation
awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];} #if match found, add the line and +- 2 lines' line number in an array "d"
{a[NR]=$0} # save all lines in an array with line number as index
END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' #finally print only those index not in array "d"
file # your input file
sed will do it:
sed '/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'
It works this way:
If sed has only one line in the pattern space, it appends another one.
If there are only two, it appends a third.
If the pattern space matches LINE + LINE + LINE containing PINITIAL BALANCE, it appends the two following lines, deletes everything and starts over.
If not, it prints the first line of the pattern space, deletes it and starts over without wiping the rest of the pattern space.
To handle the pattern appearing on the first line, modify the script:
sed '1{/PINITIAL BALANCE/{N;N;d}};/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'
However, it fails if another PINITIAL BALANCE occurs within the lines that are about to be deleted. Then again, the other solutions fail too =)
For such a task, I would probably reach for a more advanced tool like Perl:
perl -ne 'push @x, $_;
if (@x > 4) {
if ($x[2] =~ /PINITIAL BALANCE/) { undef @x }
else { print shift @x }
}
END { print @x }' input-file > output-file
This will remove 5 lines from the input file. These lines will be the 2 lines before the match, the matched line, and the two lines afterwards. You can change the total number of lines being removed by modifying @x > 4 (this removes 5 lines), and the line being matched by modifying $x[2] (this makes the match on the third line to be removed and so removes the two lines before the match).
A simpler and easier-to-understand solution might be:
awk '/PINITIAL BALANCE/ {print NR-2 "," NR+2 "d"}' input_filename \
| sed -f - input_filename > output_filename
awk is used to make a sed-script that deletes the lines in question and the result is written on the output_filename.
This uses two processes which might be less efficient than the other answers.
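One thing to watch with the generated script: if the match sits on line 1 or 2, NR-2 becomes 0 or negative, which sed does not accept as a start address. Below is a hedged sketch that clamps the start to 1 and writes the generated commands to a temporary file instead of using -f -, to sidestep any question of whether a given sed reads a script from stdin. The file names and variables are made up for the demo, and the sample contains a single match.

```shell
#!/bin/sh
input=$(mktemp)
script=$(mktemp)
printf '%s\n' 'foo' 'D28/10/2011' 'T-3.48' 'PINITIAL BALANCE' 'M' '^' 'bar' > "$input"

# Emit one "start,end d" command per match, never starting below line 1.
awk '/PINITIAL BALANCE/ {
       start = NR - 2
       if (start < 1) start = 1
       print start "," (NR + 2) "d"
     }' "$input" > "$script"

generated=$(cat "$script")            # what sed will be asked to run
result=$(sed -f "$script" "$input")
rm -f "$input" "$script"

printf 'script: %s\n' "$generated"
printf '%s\n' "$result"
```

For this sample the generated script is the single command 2,6d, and only foo and bar survive.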
This might work for you (GNU sed):
sed ':a;$q;N;s/\n/&/2;Ta;/\nPINITIAL BALANCE$/!{P;D};$q;N;$q;N;d' file
Save this code into a file grep.sed:
H
s:.*::
x
s:^\n::
:r
/PINITIAL BALANCE/ {
N
N
d
}
/.*\n.*\n/ {
P
D
}
x
d
and run a command like this:
sed -i -f grep.sed FILE
You can also use it as a one-liner:
sed -i 'H;s:.*::;x;s:^\n::;:r;/PINITIAL BALANCE/{N;N;d;};/.*\n.*\n/{P;D;};x;d' FILE

Delete n1 previous lines and n2 lines following with respect to a line containing a pattern

sed -e '/XXXX/,+4d' fv.out
I have to find a particular pattern in a file and delete 5 lines above and 4 lines below it simultaneously. I found out that the command above removes the line containing the pattern and the four lines below it.
sed -e '/XXXX/,~5d' fv.out
From the sed manual I understood that ~ refers to the lines that precede the pattern. But when I tried it, it was the lines following the pattern that were deleted.
So, how do I delete 5 lines above and 4 lines below a line containing the pattern simultaneously?
One way using sed, assuming that the matches are not too close to each other:
Content of script.sed:
## If line doesn't match the pattern...
/pattern/ ! {
## Append line to 'hold space'.
H
## Copy content of 'hold space' to 'pattern space' to work with it.
g
## If there are more than 5 lines saved, print and remove the first
## one. It's like a FIFO.
/\(\n[^\n]*\)\{6\}/ {
## Delete the first '\n' automatically added by previous 'H' command.
s/^\n//
## Print until first '\n'.
P
## Delete data printed just before.
s/[^\n]*//
## Save updated content to 'hold space'.
h
}
### Added to fix an error pointed out by potong in comments.
### =======================================================
## If last line, print lines left in 'hold space'.
$ {
x
s/^\n//
p
}
### =======================================================
## Read next line.
b
}
## If line matches the pattern...
/pattern/ {
## Remove all content of 'hold space'. It has the five previous
## lines, which won't be printed.
x
s/^.*$//
x
## Read next four lines and append them to 'pattern space'.
N ; N ; N ; N
## Delete all.
s/^.*$//
}
Run like:
sed -nf script.sed infile
A solution using awk:
awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
nlines == 5 { print lines[NR%5]; nlines-- }
lines2del == 0 { lines[NR%5] = $0; nlines++ }
lines2del > 0 { lines2del-- }
END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
Update:
This is the script explained:
I remember the last 5 lines in the array lines using rotatory indexes (NR%5; NR is the record number; in this case lines).
If I find the pattern in the current line ($0 ~ "XXXX"; $0 being the current record, in this case a line, and ~ being the Extended Regular Expression match operator), I reset the number of lines read and note that I have 5 lines to delete (including the current line).
If I have already read 5 lines, I print the oldest buffered line.
If I do not have lines to delete (which is also true if I had read 5 lines), I put the current line in the buffer and increment the number of lines. Note how the number of lines is decremented and then incremented if a line is printed.
If lines need to be deleted, I do not print anything and decrement the number of lines to delete.
At the end of the script, I print all the lines that are in the array.
My original version of the script was the following, but I ended up optimizing it to the above version:
awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
lines2del == 0 && nlines == 5 { print lines[NR%5]; lines[NR%5] }
lines2del == 0 && nlines < 5 { lines[NR%5] = $0; nlines++ }
lines2del > 0 { lines2del-- }
END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
awk is a great tool! I strongly recommend that you find a tutorial on the net and read it. One important thing: awk works with Extended Regular Expressions (ERE). Their syntax is a little different from the Basic Regular Expressions (BRE) that sed uses by default, but everything that can be done with BRE can be done with ERE.
The idea is to read 5 lines without printing them. If you find the pattern, delete the unprinted lines and the 4 lines below. If you do not find the pattern, remember the current line and print the 1st unprinted line. At the end, print what is unprinted.
sed -n -e '/XXXX/,+4{x;s/.*//;x;d}' -e '1,5H' -e '6,${H;g;s/\n//;P;s/[^\n]*//;h}' -e '${g;s/\n//;p;d}' fv.out
Of course, this only works if you have one occurrence of your pattern in the file. If you have many, you need to read 5 new lines after finding your pattern, and it gets complicated if you again have your pattern in those lines. In this case, I think sed is not the right tool.
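For the case of several (possibly overlapping) occurrences, one option is to stop streaming and read the file twice with awk: the first pass marks every line number that has to go, the second pass prints the rest. This is a sketch rather than a drop-in replacement. It needs a real file (not a pipe), and pat, before, after and del are names I chose; the sample lines are invented.

```shell
#!/bin/sh
tmp=$(mktemp)
# 12 lines: the match is on line 7, so lines 2..11 should disappear.
printf '%s\n' keep1 b5 b4 b3 b2 b1 'XXXX here' a1 a2 a3 a4 keep2 > "$tmp"

result=$(awk -v pat='XXXX' -v before=5 -v after=4 '
  NR == FNR {                            # first pass: mark lines around each match
      if ($0 ~ pat)
          for (i = FNR - before; i <= FNR + after; i++) del[i]
      next
  }
  !(FNR in del)                          # second pass: print unmarked lines
' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
```

Because the marks are just array keys, overlapping windows from nearby matches merge on their own.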
This might work for you:
sed 'H;$!d;g;s/\([^\n]*\n\)\{5\}[^\n]*PATTERN\([^\n]*\n\)\{5\}//g;s/.//' file
or this:
awk --posix -vORS='' -vRS='([^\n]*\n){5}[^\n]*PATTERN([^\n]*\n){5}' 1 file
a more efficient sed solution:
sed ':a;/PATTERN/,+4d;/\([^\n]*\n\)\{5\}/{P;D};$q;N;ba' file
If you are happy to output the result to a file instead of stdout, vim can do it quite efficiently:
vim -c 'g/pattern/-5,+4d' -c 'w! outfile|q!' infile
or
vim -c 'g/pattern/-5,+4d' -c 'x' infile
to edit the file in-place.
