Print ranges of text starting from a line before match - bash

I need to print blocks of text that start one line before a pattern match and run until the next blank line. I managed to do it with awk and sed, but the output starts at the line where PATTERN2 (passed in the variable $ID) appears rather than at the line before it. My input file:
2022/12/28 02:06:29 [Time]
Processing id: PATTERN1
multiple lines follow
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/15 08:07:29 [Time]
Processing id: PATTERN3
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
....
with sed:
sed -n "/Processing id: $ID/,/^$/p" inputfile
with awk:
awk -v myid="$ID" '$0 ~ "Processing id: "myid,/^$/ {print}' inputfile
Desired output:
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow

With awk, set RS="" so that each blank-line-separated block is one record, then print every record that matches the ID using the ~ (regex match) operator:
pattern="PATTERN2"
awk -v myid="$pattern" 'BEGIN{RS=""; ORS="\n\n"} $0 ~ myid' inputfile
Output:
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow

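With awk (a multi-character RS such as "\n\n" is not portable; this assumes an awk that supports it, such as GNU awk):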
$ awk -vid=PATTERN2 'BEGIN{RS="\n\n"}{if ($0 ~ id) print $0, "\n"}' file
Output
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow

With perl in paragraph mode:
$ perl -00 -sne 'print if /$id/' -- -id=PATTERN2 file
Output
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow

With GNU grep context options (this assumes each block is exactly three lines, with the ID on the middle line):
myid="PATTERN2"
grep -A1 -B1 --group-separator='' "Processing id: $myid" file
2023/01/14 04:06:29 [Time]
Processing id: PATTERN2
multiple lines follow
2023/02/16 14:06:29 [Time]
Processing id: PATTERN2
multiple lines follow

A sed solution:
idline="Processing id: $id"
sed -e "/$idline/,/^$/!{h;d;}" -e "/$idline/{H;x;}" file
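The paragraph-mode answers above rely on the ID line always sitting inside a blank-line-delimited block. A minimal sketch of the literal request (one line before the match, then everything up to the next blank line) could remember the previous line in awk. The function name print_block_with_prev is hypothetical, and the sketch assumes the ID line reads exactly "Processing id: " followed by the ID:

```shell
# print_block_with_prev ID FILE  (hypothetical helper name)
# Prints the line before each "Processing id: ID" line, the line itself,
# and everything up to and including the next blank line.
print_block_with_prev() {
    awk -v myid="$1" '
        # Inside a block: print the line; a blank line closes the block.
        inblock { print; if ($0 == "") inblock = 0 }
        # Exact string comparison, so regex metacharacters in the ID are harmless.
        !inblock && $0 == "Processing id: " myid {
            if (NR > 1) print prev   # the line just before the match
            print
            inblock = 1
        }
        # Remember every line so it can serve as the "previous" line.
        { prev = $0 }
    ' "$2"
}
```

It would be called as print_block_with_prev "$ID" inputfile. Unlike the regex-based commands, the exact comparison will not treat PATTERN2 as a match for a longer ID such as PATTERN20.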

Related

Selecting lines between marker patterns where one pattern may occur twice

If I have a file that contains some text data such as
PATTERN1
TEXT1
PATTERN1
TEXT2
PATTERN2
How would I select the TEXT2 data from this file, given that I know PATTERN1 and PATTERN2?
I have tried using awk as mentioned here, but it prints both TEXT1 and TEXT2.
If TEXT2 is always surrounded by PATTERN1 and PATTERN2 you can use grep:
grep -B2 "PATTERN2" file | grep -A1 "PATTERN1" | grep -v "PATTERN1"
grep -B2 "PATTERN2" -> grab PATTERN2 and the preceding 2 lines
grep -A1 "PATTERN1" -> from these three lines, grab PATTERN1 and the line after
grep -v "PATTERN1" -> get rid of the line/s containing PATTERN1 and you are left with TEXT2
$ awk '
    inBlock {
        if ( /PATTERN2/ ) {
            printf "%s", block
            inBlock = 0
        } else {
            block = block $0 ORS
        }
    }
    /PATTERN1/ {
        inBlock = 1
        block = ""
    }
' file
TEXT2
If PATTERN2 can occur multiple times, this extracts only the inner text:
sed '/PATTERN1/h;//!H;/PATTERN2/!d;//{x;/PATTERN1/!d}'
If PATTERN2 can occur only once, you can use this sed script:
sed -n '/PATTERN1/h;//!H;/PATTERN2/{x;p}' input_file.txt
or:
sed '/PATTERN1/h;//!H;/PATTERN2/!d;//x'
You can reverse the lines, then use sed with 2 addresses and reverse lines again:
tac input_file.txt | sed -n '/PATTERN2/,/PATTERN1/p' | tac
With sed -z we could remove everything in front and after the patterns, since regex is greedy:
sed -z 's/.*\(PATTERN1\n\)/\1/;s/\(PATTERN2\n\).*/\1/g'
This might work for you (GNU sed):
sed '/PATTERN1/{z;x;d};/PATTERN2/!{H;d};g;s/.//p;d' file
If the current line contains PATTERN1, empty the hold space (HS) and delete the line.
If the current line does not contain PATTERN2, append it to the HS and delete the line.
If the current line contains PATTERN2, replace it by the contents of the HS, remove the first character (which will be an introduced newline), print the result and delete the line.
Alternative:
sed -En '/PATTERN1/{:a;/PATTERN1/z;N;/PATTERN2/!ba;s/.(.*)\n.*/\1/p}' file
The first solution presupposes that the file will contain PATTERN1 and PATTERN2, the second does not.
Perl to the rescue!
perl -ne 'print(@buffer), $inside = @buffer = () if /PATTERN2/;
push @buffer, $_ if $inside;
@buffer = (), $inside = 1 if /PATTERN1/;
' -- file.txt
We keep an array of lines to output in @buffer. We also keep a flag $inside that's set to true if we've met PATTERN1, but not PATTERN2 yet.
If we see PATTERN2, we print the buffer and clear the flag.
If we are inside, we remember the current line.
If we see PATTERN1, regardless of whether we've seen it before or not, we clear the buffer and set the flag.
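The same buffer-and-flag logic can be sketched in plain shell, which may help when neither Perl nor awk is at hand. This is only an illustration of the steps described above, not a replacement for the one-liners; the function name inner_text is hypothetical, and it reads the file on standard input:

```shell
# inner_text  (hypothetical helper name): reads stdin and prints the lines
# between the nearest preceding PATTERN1 and each PATTERN2.
inner_text() {
    inside=0
    buffer=
    while IFS= read -r line; do
        case $line in
            *PATTERN2*)
                # End marker: print what was collected, then clear the flag.
                if [ "$inside" -eq 1 ]; then printf '%s' "$buffer"; fi
                inside=0
                buffer=
                ;;
            *PATTERN1*)
                # Start marker, seen before or not: restart the buffer.
                inside=1
                buffer=
                ;;
            *)
                # Any other line is remembered only while inside a block.
                if [ "$inside" -eq 1 ]; then
                    buffer="$buffer$line
"
                fi
                ;;
        esac
    done
}
```

It would be run as inner_text < file.txt. A shell read loop is much slower than awk or Perl on large files, so this is best kept for small inputs.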

Print all lines between two patterns, exclusive, first instance only (in sed, AWK or Perl) [duplicate]

This question already has answers here:
How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?
Using sed, AWK (or Perl), how do you print all lines between (the first instance of) two patterns, exclusive of the patterns? [1]
That is, given as input:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
Or possibly even:
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
I would expect, in both cases:
bbb
ccc
ddd
[1] A number of users voted to close this question as a duplicate of this one. In the end, I provided a gist that proves they are different. The question is also superficially similar to a number of others, but there is no exact match and none of them is of high quality. Because I believe this specific problem is the one most commonly faced, it deserves a clear formulation and a set of correct, clear answers.
If you have GNU sed (tested using version 4.7 on Mac OS X), the simplest solution could be:
sed '0,/PATTERN1/d;/PATTERN2/Q'
Explanation:
The d command deletes from line 1 to the line matching /PATTERN1/ inclusive.
The Q command then exits without printing on the first line matching /PATTERN2/.
If the file has only one instance of the pattern, or if you don't mind extracting all of them, and you want a solution that doesn't depend on a GNU extension, this works:
sed -n '/PATTERN1/,/PATTERN2/{//!p}'
Explanation:
Note that the empty regular expression // repeats the last regular expression match.
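If only the first instance is wanted and GNU extensions are not available, one possible sketch combines the range with an explicit q. The function name first_between is hypothetical, and the patterns are written out rather than repeated with // to keep the behaviour easy to follow:

```shell
# first_between FILE  (hypothetical helper name)
# Prints the lines strictly between the first PATTERN1 and the next PATTERN2.
first_between() {
    sed -n '/PATTERN1/,/PATTERN2/{
        /PATTERN1/d
        /PATTERN2/q
        p
    }' "$1"
}
```

Because of -n, the q on the PATTERN2 line exits without printing it, so later blocks are never read. A second PATTERN1 line inside the first block would be dropped rather than printed.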
With awk (this assumes that PATTERN1 and PATTERN2 always occur in pairs and that neither occurs inside a pair):
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
fff
PATTERN1
ggg
hhh
iii
PATTERN2
jjj
$ awk '/PATTERN2/{exit} f; /PATTERN1/{f=1}' ip.txt
bbb
ccc
ddd
/PATTERN1/{f=1} set flag if /PATTERN1/ is matched
/PATTERN2/{exit} exit if /PATTERN2/ is matched
f; print input line if flag is set
Generic solution, where the block required can be specified
$ awk -v b=1 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
bbb
ccc
ddd
$ awk -v b=2 '/PATTERN2/ && c==b{exit} c==b; /PATTERN1/{c++}' ip.txt
ggg
hhh
iii
This might work for you (GNU sed):
sed -n '/PATTERN1/{:a;n;/PATTERN2/q;p;$!ba}' file
This prints only the lines between the first set of delimiters, or if the second delimiter does not exist, to the end of the file.
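If printing through to the end of the file is not wanted when the second delimiter is missing, a sketch in awk can buffer the lines and emit them only once PATTERN2 is actually seen. The function name first_block_strict is hypothetical:

```shell
# first_block_strict FILE  (hypothetical helper name)
# Prints the lines between the first PATTERN1 and the next PATTERN2,
# or nothing at all if no closing PATTERN2 follows.
first_block_strict() {
    awk '
        # Closing delimiter found: flush the buffer and stop reading.
        inblock && /PATTERN2/ { printf "%s", buf; exit }
        # Collect lines while inside the block.
        inblock               { buf = buf $0 "\n" }
        # The first opening delimiter starts the block.
        !inblock && /PATTERN1/ { inblock = 1 }
    ' "$1"
}
```

The cost of this choice is that the whole block is held in memory before anything is printed.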
I tried to answer twice, but the question kept switching between on-hold and duplicate status.
Borrowing the input from @Sundeep and adding the answer I shared in the question comments.
Using awk
awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' file
with Perl
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } '
Results:
$ cat ip.txt
aaa
PATTERN1
bbb
ccc
ddd
PATTERN2
eee
PATTERN1
2
46
PATTERN2
xyz
$
$ awk -v x=0 -v y=1 ' /PATTERN1/&&y { x=1;next } /PATTERN2/&&y { x=0;y=0; next } x ' ip.txt
bbb
ccc
ddd
$ perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if $x++ <1 } ' ip.txt
bbb
ccc
ddd
$
To make it generic
With awk, where y is the number of the block to print:
awk -v x=0 -v y=2 ' /PATTERN1/ { x++;next } /PATTERN2/ { if(x==y) exit } x==y ' ip.txt
2
46
With Perl, compare ++$x against the wanted occurrence (here 2):
perl -0777 -ne ' while( /PATTERN1.*?\n(.+?)^[^\n]*?PATTERN2/msg ) { print $1 if ++$x==2 } ' ip.txt
2
46
Adding a few more possible solutions, just for fun and without claiming they are better than the usual ones. All were written and tested in GNU awk, and only against the given examples.
1st Solution:
awk -v RS="" -v FS="PATTERN2" -v ORS="" '$1 ~ /\nPATTERN1\n/{sub(/.*PATTERN1\n/,"",$1);print $1}' Input_file
2nd solution:
awk -v RS="" -v ORS="" 'match($0,/PATTERN1[^(PATTERN2)]*/){val=substr($0,RSTART,RLENGTH);gsub(/^PATTERN1\n|^$\n/,"",val);print val}' Input_file
3rd solution:
awk -v RS="" -v OFS="\n" -v ORS="" 'sub(/PATTERN2.*/,"") && sub(/.*PATTERN1/,"PATTERN1"){$1=$1;sub(/^PATTERN1\n/,"")} 1' Input_file
All three commands above produce the following output:
bbb
ccc
ddd
Using GNU sed:
sed -nE '/PATTERN1/{:s n;/PATTERN2/q;p;bs}'
-n suppresses automatic printing, so only the lines printed explicitly by the p command are output; these are the lines between PATTERN1 and PATTERN2, excluding both.
An address applies only to the single command that follows it, so the commands must be grouped with {}.
The n command moves past the PATTERN1 line to the next line; if that line matches PATTERN2, sed quits immediately, otherwise it prints the line and loops back to fetch the following one.

Non matching word from file1 to file2

I have two files, file1 and file2.
file1 contains only words, for example:
ABC
YUI
GHJ
I8O
..................
file2 contains many paragraphs:
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
...................
I am using the command below to get the lines in file2 that contain a word from file1:
grep -Ff file1 file2
(This outputs the lines of file2 in which words from file1 are found.)
I also need the words from file1 that are not found in file2, and I have been unable to get them.
Can anyone help me get the output below?
YUI
I8O
I am looking for a one-liner (via grep, awk, or sed), as I am using the pssh command and can't use a while or for loop.
You can print only the matched parts with -o.
$ grep -oFf file1 file2
ABC
GHJ
Use that output as a list of patterns for a search in file1. Process substitution <(cmd) simulates a file containing the output of cmd. With -v you can print lines that did not match. If file1 contains two lines such that one line is a substring of another line you may want to add -x (only match whole lines) to prevent false positives.
$ grep -vxFf <(grep -oFf file1 file2) file1
YUI
I8O
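Where process substitution is not available (for example when the remote shell is plain sh), the same two-step idea can be sketched with a pipe. This assumes a grep that supports -o and accepts "-f -" to read the pattern list from standard input, as GNU grep does; the function name unmatched_words is hypothetical:

```shell
# unmatched_words WORDS TEXT  (hypothetical helper name)
# Prints the lines of WORDS that never occur as a substring in TEXT.
unmatched_words() {
    # Step 1: list every word from WORDS that is found in TEXT.
    # Step 2: use that list as whole-line fixed patterns and invert the match.
    grep -oFf "$1" "$2" | grep -vxFf - "$1"
}
```

It would be called as unmatched_words file1 file2. If nothing in file1 occurs in file2, the pattern list is empty, which GNU grep treats as matching nothing, so every word is reported as unmatched.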
Using Perl - both matched/non-matched in same one-liner
$ cat sinw.txt
ABC
YUI
GHJ
I8O
$ cat sin_in.txt
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
$ perl -lne '
BEGIN { %x=map{chomp;$_=>1} qx(cat sinw.txt); $w="\\b(?:".join("|",keys %x).")\\b"}
print "$&" and delete($x{$&}) if /$w/ ;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
ABC
GHJ
non-matched
I8O
YUI
$
Getting only the non-matched
$ perl -lne '
BEGIN {
%x = map { chomp; $_=>1 } qx(cat sinw.txt);
$w = "\\b(?:" . join("|",keys %x) . ")\\b"
}
delete($x{$&}) if /$w/;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
non-matched
I8O
YUI
$
Note that even a single use of $& variable used to be very expensive for the whole program, in Perl versions prior to 5.20.
Assuming the "words" in file1 are spread over more than one line:
while read -r line
do
    for word in $line
    do
        if ! grep -qF -- "$word" file2
        then echo "$word not found"
        fi
    done
done < file1
For Un-matching words, here's one GNU awk solution:
awk 'NR==FNR{a[$0];next} !($1 in a)' RS='[ \n]' file2 file1
YUI
I8O
!($0 in a) works the same way here. Because RS='[ \n]' makes every space a record separator as well as every newline, each word is its own record.
And note that I read file2 first, and then file1.
If file2 could be empty, replace NR==FNR with a different test for the first file, such as ARGIND==2 in GNU awk, FILENAME=="file2", or FILENAME==ARGV[2] (the RS='[ \n]' assignment occupies ARGV[1] in the command above).
Same mechanism for only the matched one too:
awk 'NR==FNR{a[$0];next} $0 in a' RS='[ \n]' file2 file1
ABC
GHJ
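For the empty-file2 case mentioned in this answer, a minimal sketch could set RS with -v, so that the text file is the first operand, and test FILENAME instead of NR==FNR. It assumes an awk that accepts a regular expression as RS (GNU awk and mawk do); the function name unmatched_words_awk is hypothetical:

```shell
# unmatched_words_awk TEXT WORDS  (hypothetical helper name)
# Prints the words of WORDS that do not appear as whole words in TEXT.
unmatched_words_awk() {
    awk -v RS='[ \n]' '
        # First operand: remember every space- or newline-separated word.
        FILENAME == ARGV[1] { seen[$0]; next }
        # Second operand: print the words that were never seen.
        !($0 in seen)
    ' "$1" "$2"
}
```

It would be called as unmatched_words_awk file2 file1. With an empty file2, the first rule never fires and every word of file1 is printed, whereas NR==FNR would mistake file1 for the first file.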

how to fetch multiple pattern and numbers

I have this file (pattern1 and pattern2 are fixed, but the numbers are random):
aaaa patern1[1234] bbbb cccc pattern2[5678]
jjjj patern1[9999] hhhhhhhh
and I want to extract the following patterns with a bash script:
pattern1[1234] pattern2[5678]
pattern1[9999]
I tried grep -Eo 'pattern1\[[0-9]{1,4}'; it works for one pattern but not for two.
$ cat ip.txt
aaaa pattern1[1234] bbbb cccc pattern2[5678]
jjjj pattern1[9999] hhhhhhhh
$ perl -lne 'print join " ", /pattern[12]\[\d+\]/g' ip.txt
pattern1[1234] pattern2[5678]
pattern1[9999]
pattern[12]\[\d+\] pattern to extract
print join " ", to print the results separated by space
If lines not containing the desired pattern are to be omitted:
perl -lne 'print join " ", //g if /pattern[12]\[\d+\]/' ip.txt
You can use the pipe character | to allow for multiple patterns:
grep -oP '(patern1|pattern2)\[[0-9]{1,4}\]' file
patern1[1234]
pattern2[5678]
patern1[9999]
Since the patterns are similar, you can simplify like this:
grep -oP 'patt?ern[12]\[[0-9]{1,4}\]' file
$ awk '{ c=0; while ( match($0,/(patern1|pattern2)[[][^][]+[]]/) ) { printf "%s%s", (c++?OFS:""), substr($0,RSTART,RLENGTH); $0=substr($0,RSTART+RLENGTH) } if (c) print "" }' file
patern1[1234] pattern2[5678]
patern1[9999]
If you prefer brevity over clarity then consider this, using GNU awk for multi-char RS and RT and run against the same input file as shown in https://stackoverflow.com/a/39453928/1745001:
$ awk -v RS='pattern[12][[][0-9]+[]]|\n' '{$0=RT;ORS=(/\n/?x:FS)} 1' file
pattern1[1234] pattern2[5678]
pattern1[9999]

Print the last 1,2,3..Nth or first 1,2,3...Nth matching block pattern using awk or sed

pattern1
a
b
pattern2
cd
pattern1
re
pattern2
gh
pattern1
ef
pattern2
qw
e
I can print all matching blocks with:
sed -n '/pattern1/,/pattern2/p'
I can choose the second matching block, or any Nth one, with:
awk -vM=2 '(x+=/pattern1/)==M&&x+=/pattern2/' file
pattern1
re
pattern2
I can print only the last matching block with:
awk 'x+=/pattern1|pattern2/{!y++&&B="";B=B?B"\n"$0:$0;x==2&&y=x=0}END{print B}' file
pattern1
ef
pattern2
But how can I print, for example, the last or first 2 (or N) matching blocks?
pattern1
re
pattern2
pattern1
ef
pattern2
This might work for you (GNU sed):
sed -n '/pattern1/,/pattern2/{p;/pattern2/{H;x;s///2;x;T;q}}' file
This prints the first 2 matches of pattern1 through pattern2 and then quits.
sed -nr '/pattern1/,/pattern2/H;$!b;x;s/.*((pattern1.*){2})$/\1/p' file
This prints the last 2 matches of pattern1 through pattern2.
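Since the question also mentions awk, here is a sketch of both cases in awk with the block count passed as a parameter. The function names first_blocks and last_blocks are hypothetical, and the sketch assumes that pattern1 and pattern2 never appear on the same line:

```shell
# first_blocks N FILE  (hypothetical helper name)
# Prints the first N pattern1..pattern2 blocks, then stops reading.
first_blocks() {
    awk -v n="$1" '
        n < 1                 { exit }
        /pattern1/            { inblock = 1 }
        inblock               { print }
        inblock && /pattern2/ { inblock = 0; if (++count >= n) exit }
    ' "$2"
}

# last_blocks N FILE  (hypothetical helper name)
# Stores every complete block and prints the last N at end of input.
last_blocks() {
    awk -v n="$1" '
        /pattern1/            { inblock = 1; cur = "" }
        inblock               { cur = cur $0 "\n" }
        inblock && /pattern2/ { blocks[++count] = cur; inblock = 0 }
        END {
            start = count - n + 1
            if (start < 1) start = 1
            for (i = start; i <= count; i++) printf "%s", blocks[i]
        }
    ' "$2"
}
```

For example, last_blocks 2 file prints the re and ef blocks from the sample input. last_blocks keeps every block in memory, which is simple but could be trimmed to a rolling window of N blocks for very large files.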

Resources