Using awk or sed, how do I get the contents between two patterns when the patterns occur multiple times in a file?
For instance, given this file content:
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
16
Leaving BBB
Currently I am using:
cat 1.txt |sed -n '/Entering AAA/,/Leaving AAA/ p'
With this, I get the contents between the first occurrence of "Entering AAA" and the first occurrence of "Leaving AAA", i.e.:
Entering AAA
12
Entering BBB
13
Leaving AAA
But I want the contents from the first occurrence of "Entering AAA" to the last occurrence of "Leaving AAA".
Expected output:
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
Kindly help.
In any awk using a 2-pass approach:
$ awk 'NR==FNR{if (/Leaving AAA/) end=NR; next} /Entering AAA/{f=1} f; FNR==end{exit}' file file
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
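If reading the file twice is not an option (for example when the input comes from a pipe), a single pass in any POSIX awk is also possible. The following is only a sketch under the assumption that buffering the lines since the last "Leaving AAA" fits in memory: lines are collected in a buffer once the start pattern has been seen, and the buffer is flushed every time the end pattern appears, so anything after the last "Leaving AAA" is never printed. The sample data is the file from the question.

```shell
tmp=$(mktemp)
printf '%s\n' 'Entering AAA' 12 'Entering BBB' 13 'Leaving AAA' 14 \
    'Leaving AAA' 15 'Leaving AAA' 16 'Leaving BBB' > "$tmp"

out=$(awk '
    /Entering AAA/ { f = 1 }                            # start collecting at the first start marker
    f { buf = buf $0 ORS }                              # append every line seen since then
    f && /Leaving AAA/ { printf "%s", buf; buf = "" }   # flush up to and including each end marker
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```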
Alternatively doing it in one pass with GNU awk for multi-char RS and RT:
$ awk -v RS='Entering AAA.*Leaving AAA' 'RT{print RT}' file
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
Short tac + awk trick:
tac file | awk '/Leaving AAA/,/Entering AAA/' | tac
The output:
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
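One caveat worth checking before relying on the tac trick: on the reversed stream, the range ends at the first "Entering AAA" it meets, which is the nearest one before the last "Leaving AAA" in the original file, not necessarily the first one. A small demonstration with made-up input that contains two start markers (it assumes tac from GNU coreutils is available):

```shell
tmp=$(mktemp)
printf '%s\n' 'Entering AAA' a 'Entering AAA' b 'Leaving AAA' c > "$tmp"

# The range on the reversed input stops at the second "Entering AAA",
# so the first start marker and the line "a" are not part of the output.
out=$(tac "$tmp" | awk '/Leaving AAA/,/Entering AAA/' | tac)

printf '%s\n' "$out"
rm -f "$tmp"
```

For the sample file in the question there is only one "Entering AAA", so the result is the same as with the other answers.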
Here is an alternative solution using perl to get this done in a single pass in slurp mode:
perl -0777 -pe 's/(?ms).*?(^Entering AAA.*Leaving AAA\R*).*/$1/' file
Entering AAA
12
Entering BBB
13
Leaving AAA
14
Leaving AAA
15
Leaving AAA
.* is greedy, so it matches the longest possible string between the start and end patterns.
(?ms) enables MULTILINE and DOTALL modes for this regex.
You may also use a back-reference:
perl -0777 -pe 's/(?ms).*?(^Entering (AAA).*Leaving \2\R*).*/$1/' file
Using sed, AWK (or Perl), how do you print all lines between (the first instance of) two patterns, exclusive of the patterns? [1]
That is, given as input:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
Or possibly even:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
I would expect, in both cases:
bbb
ccc
ddd
[1] A number of users voted to close this question as a duplicate of this one. In the end, I provided a gist that proves they are different. The question is also superficially similar to a number of others, but there is no exact match and none of them is of high quality. Since I believe this specific problem is the one most commonly faced, it deserves a clear formulation and a set of correct, clear answers.
If you have GNU sed (tested using version 4.7 on Mac OS X), the simplest solution could be:
sed '0,/PATTERN1/d;/PATTERN2/Q'
Explanation:
The d command deletes from line 1 to the line matching /PATTERN1/ inclusive.
The Q command then exits without printing on the first line matching /PATTERN2/.
If the file has only one instance of the pattern pair, or if you don't mind extracting all of them, and you want a solution that doesn't depend on a GNU extension, this works:
sed -n '/PATTERN1/,/PATTERN2/{//!p}'
Explanation:
Note that the empty regular expression // repeats the last regular expression match.
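To see what "extracting all of them" means in practice, here is a small run against the two-block input from the question. Inside the range, // stands for whichever of the two patterns sed tested last, so //!p prints every line of the range except the two delimiter lines. A semicolon is added before the closing brace as a precaution for stricter sed implementations; that is an assumption on my side, the command above is otherwise unchanged.

```shell
tmp=$(mktemp)
printf '%s\n' aaa PATTERN1 bbb ccc ddd PATTERN2 eee fff \
    PATTERN1 ggg hhh iii PATTERN2 jjj > "$tmp"

# Every PATTERN1..PATTERN2 block is printed, without the delimiter lines.
out=$(sed -n '/PATTERN1/,/PATTERN2/{//!p;}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```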
With awk (assumes that PATTERN1 and PATTERN2 always occur in pairs and that neither occurs inside a pair):
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
$ awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}' ip.txt
bbb
ccc
ddd
/PATTERN1/{f=1} set flag if /PATTERN1/ is matched
/PATTERN2/{exit} exit if /PATTERN2/ is matched
f; print input line if flag is set
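The three rules above can be wrapped in a small shell function so the patterns are not hard-coded. This is only a sketch: between_first is a hypothetical helper name, and the patterns are passed with -v, which means awk processes backslash escapes in them. Unlike the one-liner, it only exits on an end marker that comes after a start marker, so a stray PATTERN2 earlier in the file does not stop it.

```shell
# between_first is a hypothetical helper, not a standard utility.
# Usage: between_first START_REGEX END_REGEX FILE
between_first() {
    awk -v s="$1" -v e="$2" '
        f && $0 ~ e { exit }   # stop at the first end marker after the start
        f                      # print lines while inside the block
        $0 ~ s { f = 1 }       # printing starts on the line after the start marker
    ' "$3"
}

tmp=$(mktemp)
printf '%s\n' PATTERN2 aaa PATTERN1 bbb ccc ddd PATTERN2 eee > "$tmp"

out=$(between_first PATTERN1 PATTERN2 "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```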
Generic solution, where the block required can be specified
$ awk -v b=1 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
bbb
ccc
ddd
$ awk -v b=2 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
ggg
hhh
iii
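When it is not obvious which value of b to pass, it can help to list every block with its number first. A minimal sketch using the same counter idea (the "N: " prefix is just an illustration, and the input is the ip.txt shown above):

```shell
tmp=$(mktemp)
printf '%s\n' aaa PATTERN1 bbb ccc ddd PATTERN2 eee fff \
    PATTERN1 ggg hhh iii PATTERN2 jjj > "$tmp"

# c counts the start markers seen so far; f is set only while inside a block.
out=$(awk '/PATTERN2/ { f = 0 } f { print c ": " $0 } /PATTERN1/ { f = 1; c++ }' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```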
This might work for you (GNU sed):
sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}' file
This prints only the lines between the first set of delimiters, or if the second delimiter does not exist, to the end of the file.
I attempted to answer twice, but the questions kept switching between on-hold and duplicate status.
Borrowing the input from @Sundeep and adding the answer I shared in the question comments.
Using awk:
awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' file
with Perl
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } '
Results:
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
PATTERN1
2
46
PATTERN2
xyz
$
$ awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' ip.txt
bbb
ccc
ddd
$ perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } ' ip.txt
bbb
ccc
ddd
$
To make it generic:
With awk, y is the number of the block to print:
awk -v x=0 -v y=2 ' /PATTERN1/ { x++;next } /PATTERN2/ { if(x==y) exit } x==y ' ip.txt
2
46
With Perl, compare ++$x against the occurrence number (here it is 2):
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if ++$x==2 } ' ip.txt
2
46
Adding a few more solutions just for fun; I am not claiming they are better than the usual ones. All were written and tested in GNU awk, and only against the given examples.
1st Solution:
awk -v RS="" -v FS="PATTERN2" -v ORS="" '$1 ~ /\nPATTERN1\n/{sub(/.*PATTERN1\n/,"",$1);print $1}' Input_file
2nd solution:
awk -v RS="" -v ORS="" 'match($0,/PATTERN1[^(PATTERN2)]*/){val=substr($0,RSTART,RLENGTH);gsub(/^PATTERN1\n|^$\n/,"",val);print val}' Input_file
3rd solution:
awk -v RS="" -v OFS="\n" -v ORS="" 'sub(/PATTERN2.*/,"") && sub(/.*PATTERN1/,"PATTERN1"){$1=$1;sub(/^PATTERN1\n/,"")} 1' Input_file
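For the sample input, all three commands above produce the following output. Note that [^(PATTERN2)] in the 2nd solution is a bracket expression that excludes the individual characters P, A, T, E, R, N, 2, ( and ), not the string PATTERN2, so that solution only works when the block contains none of those characters.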
bbb
ccc
ddd
Using GNU sed:
sed -nE '/PATTERN1/{:s n;/PATTERN2/q;p;bs}'
-n suppresses automatic printing, so only the lines explicitly printed by the p command appear in the output.
An address applies only to the single command that follows it, so the commands must be grouped with {}.
The n command skips past PATTERN1 by reading the next line (nothing is printed because of -n). If that line matches PATTERN2, q quits immediately; otherwise p prints it and bs branches back to the label s to process the next line.
I'm trying to parse a log file that will have lines like this:
aaa bbb ccc: [DDD] efg oi
aaa bbb ccc: lll [DDD] efg oo
aaa bbb ccc: [DDD]
where [DDD] can be at any place in line.
Only one thing will be between [ and ] in any line
Using awk and space as a delimiter, how can I print 1st, 3rd and all data (whole string) between [ and ]?
Expected output: aaa ccc: DDD
gawk (GNU awk) approach:
Let's say we have a file with the following line:
aaa bbb ccc: ddd [fff] ggg hhh
The command:
awk '{match($0,/\[([^]]+)\]/, a); print $1,$3,a[1]}' file
The output:
aaa ccc: fff
match(string, regexp [, array]): searches string for the longest, leftmost substring matched by the regular expression regexp and returns the character position (index) at which that substring begins (one, if it starts at the beginning of string). If no match is found, it returns zero. The optional array argument is a gawk extension; here a[1] holds the text matched by the first parenthesized group.
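Since the three-argument form of match() is specific to gawk, here is a sketch of the same idea for other awks. It relies only on the two-argument match(), which sets RSTART and RLENGTH, and then uses substr() to cut off the surrounding brackets. The sample lines are the ones from the question.

```shell
tmp=$(mktemp)
printf '%s\n' 'aaa bbb ccc: [DDD] efg oi' \
    'aaa bbb ccc: lll [DDD] efg oo' \
    'aaa bbb ccc: [DDD]' > "$tmp"

# RSTART points at "[" and RLENGTH covers "[...]", so skip one character
# at the start and drop two from the length to get the text in between.
out=$(awk 'match($0, /\[[^]]+\]/) { print $1, $3, substr($0, RSTART + 1, RLENGTH - 2) }' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Lines without a bracketed part are skipped by this version because match() returns zero for them.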
Given:
$ cat file
aaa bbb ccc: [DDD] efg oi
aaa bbb [ccc:] lll DDD efg oo
aaa [bbb] ccc: DDD
(note -- changed from the OP's example)
In POSIX awk:
awk 'BEGIN{fields[1]; fields[3]}
{s=""
for (i=1;i<=NF;i++)
if ($i~/^\[/ || i in fields)
s=i>1 ? s OFS $i : $i
gsub(/\[|\]/,"",s)
print s
}' file
Prints:
aaa ccc: DDD
aaa ccc:
aaa bbb ccc:
This does not print the field twice if it is both enclosed in [] and in the selected fields array. (i.e., [aaa] bbb ccc: does not print aaa twice) It will also print in correct field order if you have aaa [bbb] ccc ...
awk '$5=="[DDD]"{gsub("[\\[\\]]","");print $1,$3,$5}' file
or
awk '$5=="[DDD]"{print $1,$3, substr($5,2,3)}' file
aaa ccc: DDD
Given this file
$ cat foo.txt
AAA
111
BBB
222
CCC
333
I would like to replace the first line after BBB with 999. I came up with this command
awk '/BBB/ {f=1; print; next} f {$1=999; f=0} 1' foo.txt
but I am curious about any shorter commands with either awk or sed.
This might work for you (GNU sed)
sed '/BBB/!b;n;c999' file
If a line contains BBB, print that line and then change the following line to 999.
!b negates the previous address (regexp) and breaks out of any processing, ending the sed commands, n prints the current line and then reads the next into the pattern space, c changes the current line to the string following the command.
This is somewhat shorter:
awk 'f{$0="999";f=0}/BBB/{f=1}1' file
f {$0="999";f=0} if f is true, set line to 999 and f to 0
/BBB/ {f=1} if pattern match set f to 1
1 print all lines, since 1 is always true.
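The same three rules can take the pattern and the replacement from the command line instead of hard-coding them. A minimal sketch, where pat and repl are variable names chosen for illustration:

```shell
tmp=$(mktemp)
printf '%s\n' AAA 111 BBB 222 CCC 333 > "$tmp"

# f is set on the line that matches pat, so the replacement hits the line after it.
out=$(awk -v pat='BBB' -v repl='999' 'f { $0 = repl; f = 0 } $0 ~ pat { f = 1 } 1' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```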
You can also use sed, which is shorter:
sed '/BBB/{n;s/.*/999/}'
$ awk '{print (f?999:$0); f=0} /BBB/{f=1}' file
AAA
111
BBB
999
CCC
333
awk '/BBB/{print;getline;$0="999"}1' your_file
sed 's/\(BBB\)/\1\
999/'
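This works on macOS (BSD sed). Note that it inserts 999 as a new line after BBB rather than replacing the line that follows.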
I want to print the output of file1 to first column in new file and file 2 to the second column in the new file.
Something like this.
file1
AAA
BBB
CCC
file2
XXX
YYY
ZZZ
file3
AAA XXX
BBB YYY
CCC ZZZ
paste command will do this job out-of-the-box:
paste file1 file2 > file3
AAA XXX
BBB YYY
CCC ZZZ
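One detail to be aware of: by default paste joins the columns with a tab. If a single space is wanted, as in the expected file3, the -d option sets the delimiter. A small sketch with the two sample files created in a temporary directory:

```shell
dir=$(mktemp -d)
printf '%s\n' AAA BBB CCC > "$dir/file1"
printf '%s\n' XXX YYY ZZZ > "$dir/file2"

# -d ' ' replaces the default tab delimiter with a single space.
out=$(paste -d ' ' "$dir/file1" "$dir/file2")

printf '%s\n' "$out"
rm -rf "$dir"
```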
You can use paste and format using cut to remove leading and trailing spaces
I want to find a line in a txt file and then insert string 3 lines above the found line
Input:
aaa
bbb
ccc
ddd
eee
fff
I want to look for "eee" and then print "WWW" 3 lines above it. Output:
aaa
WWW
bbb
ccc
ddd
eee
fff
I'm using awk and can only print "WWW" 1 line above "eee", and not 3:
awk '/eee/{print "WWW"} 4' file.txt
any ideas?
One way:
awk '{a[NR]=$0;}/eee/{a[NR-3]="WWW\n" a[NR-3];}END{for(i=1;i<=NR;i++)print a[i];}' file
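The array approach keeps the whole file in memory. For large files, a sliding window of the last n lines should be enough. The following is a sketch under that assumption, with n and txt as illustrative variable names: each line is held back until n more lines have been read, so a later match can still place the text in front of it.

```shell
tmp=$(mktemp)
printf '%s\n' aaa bbb ccc ddd eee fff > "$tmp"

out=$(awk -v n=3 -v txt='WWW' '
    { buf[NR] = $0 }                                     # remember the current line
    /eee/ { k = NR - n; if (k < 1) k = 1; mark[k] = 1 }  # mark the line n above the match
    NR > n {                                             # line NR-n can no longer be marked later
        i = NR - n
        if (i in mark) print txt
        print buf[i]
        delete buf[i]
    }
    END {                                                # print the lines still held back
        start = NR - n + 1
        if (start < 1) start = 1
        for (i = start; i <= NR; i++) {
            if (i in mark) print txt
            print buf[i]
        }
    }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

If the match occurs within the first n lines, this version puts the text before line 1, which is a design choice rather than something the question specifies.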