I am new to writing Perl scripts, so I am asking for help with the following. Below is the input file:
start pattern1
line1
Matching pattern can be here
line2
Matching Pattern can be here
line3
line4
...
end pattern1
.
start pattern1
line1
line2
start pattern1
start pattern1
line1
Matching pattern can be here
line2
start pattern1
From Perl, I need to extract the lines between start pattern1 ... end pattern1.
For this I am using an awk command:
$cmd = q(awk '/start pattern1/,/end pattern1/' x.file);
$n1 = system($cmd);
This works fine. Below is the output:
start pattern1
line1
**Matching pattern can be here**
line2
**Matching Pattern can be here**
...
end pattern1
But the files contain 1000s of lines like this, so I need only the blocks that contain the matching pattern, i.e. only the start pattern1 to end pattern1 blocks that have a matching line inside.
For this I tried:
$cmd = q(awk '/start pattern1/,/end pattern1/' x.file | grep '$n2\|line4');
$n1 = system($cmd);
But when I use the above command I don't see any output.
Here $n2 contains a pattern that was grepped from another file.
If I put the literal pattern in place of $n2 it works fine. Why can't I use $n2 here?
Note: I am using this in a Perl script.
The awk command gives me all the lines between start pattern1 ... end pattern1, but there are 1000s of such blocks, so I need only those start pattern1 to end pattern1 blocks that contain the matching pattern.
The expected output is:
start pattern1
line1
Matching pattern can be here
line2
Matching Pattern can be here
line3
line4
...
end pattern1
start pattern1
line1
Matching pattern can be here
line2
start pattern1
No need to summon awk from within perl since perl is much more powerful than awk.
It's not clear to me whether you want every line between start pattern1 and end pattern1 if there is at least one match inside, or just the matching lines.
If you want every line between start and end when the block contains a match:
my @blocks = join("",<>)=~/start pattern1\s*(.*?)end pattern1/gsi;
print grep /matching pattern/i, @blocks;
If you want every line INCLUDING start|end pattern1:
my @blocks = join("",<>)=~/(start pattern1.*?end pattern1\s*)/gsi;
If just lines with /matching pattern/ between start and end:
print grep { /start pattern1/i../end pattern1/i and /matching pattern/i } <>;
Put that inside a file program.pl and run:
perl program.pl inputfile > outputfile
Some explanation might be needed: join("",<>) returns the whole inputfile as one multi-line string.
The /gsi modifiers mean: g matches globally, so that the @blocks array will contain what is matched by the parentheses, one array element for each match (without g the @blocks array would just get the first block of lines); s means that . also matches newline characters, which it otherwise wouldn't; and i matches by ignoring case (it sees no difference between a-z and A-Z letters).
The question mark in .*? means non-greedy matching of every character, that is, match until the next end pattern1 and not the last.
The <> returns the lines of inputfile (the args after perl program.pl) as a list of strings.
The .. is the flip-flop operator: it becomes true when the left side matches, stays true up to and including the line where the right side matches, and is then false until the left side matches again, and so on.
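For readers who want to stay with awk and the shell, here is a minimal sketch of the "whole block if it contains a match" variant. It also touches on the $n2 part of the question: Perl's q(...) does not interpolate variables, and the shell's single quotes around the grep pattern do not expand them either, so the sketch hands the pattern to awk with -v instead. The function name print_matching_blocks and the sample data are made up for illustration, and awk interprets backslash escapes in a -v value, so treat this as a sketch rather than a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical helper: print every "start pattern1" .. "end pattern1" block
# that contains at least one line matching the regex given as $1.
print_matching_blocks() {
    awk -v pat="$1" '
        /start pattern1/ { inblock = 1; buf = ""; hit = 0 }   # open a new block
        inblock {
            buf = buf $0 ORS                  # remember every line of the block
            if ($0 ~ pat) hit = 1             # note a match inside the block
        }
        /end pattern1/ && inblock {
            if (hit) printf "%s", buf         # emit only blocks that had a match
            inblock = 0
        }
    ' "$2"
}

# Sample data, invented for this example
sample=$(mktemp)
cat > "$sample" <<'EOF'
start pattern1
line1
foo here
end pattern1
noise
start pattern1
line1
end pattern1
EOF

n2='foo'                        # stands in for the pattern read from another file
print_matching_blocks "$n2" "$sample"
```

With the sample above only the first block is printed, because the second block has no line matching foo.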
Related
If I have a file that contains some text data such as
PATTERN1
TEXT1
PATTERN1
TEXT2
PATTERN2
How would I select the TEXT2 data from this file, given that I know PATTERN1 and PATTERN2?
I have tried using awk as mentioned here, but it prints both TEXT1 and TEXT2.
If TEXT2 is always surrounded by PATTERN1 and PATTERN2 you can use grep:
grep -B2 "PATTERN2" file | grep -A1 "PATTERN1" | grep -v "PATTERN1"
grep -B2 "PATTERN2" -> grab PATTERN2 and the preceding 2 lines
grep -A1 "PATTERN1" -> from these three lines, grab PATTERN1 and the line after
grep -v "PATTERN1" -> get rid of the line/s containing PATTERN1 and you are left with TEXT2
$ awk '
    inBlock {
        if ( /PATTERN2/ ) {
            printf "%s", block
            inBlock = 0
        } else {
            block = block $0 ORS
        }
    }
    /PATTERN1/ {
        inBlock = 1
        block = ""
    }
' file
TEXT2
If PATTERN2 can occur multiple times, this extracts only the inner text:
sed '/PATTERN1/h;//!H;/PATTERN2/!d;//{x;/PATTERN1/!d}'
If PATTERN2 can occur only once, you can use this sed script:
sed -n '/PATTERN1/h;//!H;/PATTERN2/{x;p}' input_file.txt
or:
sed '/PATTERN1/h;//!H;/PATTERN2/!d;//x'
You can reverse the lines, then use sed with 2 addresses and reverse lines again:
tac input_file.txt | sed -n '/PATTERN2/,/PATTERN1/p' | tac
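A small sketch of why the reversal helps, assuming GNU tac is available: once the file is reversed, the sed range opens at PATTERN2 and closes at the first PATTERN1 it meets, which is the PATTERN1 nearest to PATTERN2 in the original order. The function name inner_block and the temporary sample file are invented for the example.

```shell
#!/bin/sh
# Sample data matching the question
sample=$(mktemp)
printf '%s\n' PATTERN1 TEXT1 PATTERN1 TEXT2 PATTERN2 > "$sample"

# Hypothetical helper wrapping the reverse / range / reverse pipeline
inner_block() {
    tac "$1" | sed -n '/PATTERN2/,/PATTERN1/p' | tac
}

inner_block "$sample"                  # PATTERN1, TEXT2, PATTERN2
inner_block "$sample" | sed '1d;$d'    # drop the delimiter lines, leaving TEXT2
```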
With sed -z we could remove everything in front and after the patterns, since regex is greedy:
sed -z 's/.*\(PATTERN1\n\)/\1/;s/\(PATTERN2\n\).*/\1/g'
This might work for you (GNU sed):
sed '/PATTERN1/{z;x;d};/PATTERN2/!{H;d};g;s/.//p;d' file
If the current line contains PATTERN1, clear the line and delete the hold space (HS).
If the current line does not contain PATTERN2, append it to the HS and delete the line.
If the current line contains PATTERN2, replace it by the contents of the HS, remove the first character (which will be an introduced newline), print the result and delete the line.
Alternative:
sed -En '/PATTERN1/{:a;/PATTERN1/z;N;/PATTERN2/!ba;s/.(.*)\n.*/\1/p}' file
The first solution presupposes that the file will contain PATTERN1 and PATTERN2, the second does not.
Perl to the rescue!
perl -ne 'print(@buffer), $inside = @buffer = () if /PATTERN2/;
push @buffer, $_ if $inside;
@buffer = (), $inside = 1 if /PATTERN1/;
' -- file.txt
We keep an array of lines to output in @buffer. We also keep a flag $inside that's set to true if we've met PATTERN1, but not PATTERN2 yet.
If we see PATTERN2, we print the buffer and clear the flag.
If we are inside, we remember the current line.
If we see PATTERN1, regardless of whether we've seen it before or not, we clear the buffer and set the flag.
I have a bunch of files with many lines in them, and usually one or two blank lines at the end.
I want to remove the blank lines at the end, while keeping all of the blank lines that may exist within the file.
I want to restrict the operation to the use of GNU utilities or similar, i.e. bash, sed, awk, cut, grep etc.
I know that I can easily remove all blank lines, with something like:
sed '/^$/d'
But I want to keep blank lines which exist prior to further content in the file.
File input might be as follows:
line1
line2
line4
line5
I'd want the output to look like:
line1
line2
line4
line5
All files are <100K, and we can make temporary copies.
With Perl:
perl -0777 -pe 's/\n*$//; s/$/\n/' file
The second s command (s/$/\n/) appends a newline to the end of the file again, so that the file stays POSIX compliant.
Or shorter:
perl -0777 -pe 's/\n*$/\n/' file
Following Fela Maslen's comment, to edit files in place (-i) and glob all files in the current directory (*):
perl -0777 -pe 's/\n*$/\n/' -i *
If lines containing just space chars are to be considered empty:
$ tac file | awk 'NF{f=1}f' | tac
line1
line2
line4
line5
otherwise:
$ tac file | awk '/./{f=1}f' | tac
line1
line2
line4
line5
Here is an awk solution (standard Linux gawk). I enjoyed writing it.
single line:
awk '/^\s*$/{s=s $0 ORS; next}{print s $0; s=""}' input.txt
using a readable script script.awk
/^\s*$/{skippedLines = skippedLines $0 ORS; next}
{print skippedLines $0; skippedLines= ""}
explanation:
/^\s*$/ {                               # for each empty line
    skippedLines = skippedLines $0 ORS; # buffer the blank line
    next;                               # skip to next input line
}
{                                       # for each non-empty line
    print skippedLines $0;              # print any buffered blank lines, then the current line
    skippedLines = "";                  # reset the buffer
}
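The buffering idea above can be checked quickly from the shell. This is a sketch under a few assumptions: it swaps the gawk-specific \s for the POSIX class [[:space:]] so that it also runs on other awks, and the function name strip_trailing_blank and the sample file are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: drop blank lines at the end of a file, keep inner ones.
strip_trailing_blank() {
    awk '
        /^[[:space:]]*$/ { held = held $0 ORS; next }   # hold blank lines back
        { printf "%s", held; print; held = "" }         # flush them before real content
    ' "$1"
}

# Sample: one blank line inside, two blank lines at the end
sample=$(mktemp)
printf 'line1\nline2\n\nline4\nline5\n\n\n' > "$sample"

strip_trailing_blank "$sample"
strip_trailing_blank "$sample" | wc -l    # 5 lines remain, the inner blank line included
```

Blank lines are only written out once a later non-blank line shows up, so whatever is still held at end of input is simply never printed.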
This might work for you (GNU sed):
sed ':a;/\S/{n;ba};$d;N;ba' file
If the current line contains a non-space character, print the current pattern space, fetch the next line and repeat. If the current line(s) is/are empty and it is the last line in the file, delete the pattern space, otherwise append the next line and repeat.
I am trying to match nested text, including the line immediately prior to the nested text with sed or grep.
An example of what I'm working with:
pattern3
abcde
fghij
pattern3
pattern1
abcde
fghij
pattern1
pattern1
klmno
pattern1
pattern3
abcde
pattern1
pqrst
pattern3
fghij
Note that there are always four (4) spaces prefixing the nested text. Also, there may or may not be nested text after a matching pattern.
I'm interested in all pattern1 lines, plus the lines following pattern1 that are preceded by spaces.
The output I'm looking for is:
pattern1
abcde
fghij
pattern1
pattern1
klmno
pattern1
pattern1
pqrst
I got close with:
sed -n '/^pattern1/,/^pattern1/p' data.txt
But it seems to skip nested text after the right hand side pattern1 match, and move onto the next iteration.
I also tried sed -n '/^\"pattern1\"$/,/^\"pattern1\"$/p' data.txt | sed '1d;$d' with no luck either.
With GNU sed:
sed -n '/pattern1/{p;:x;n;s/^ .*/&/;p;tx}' file
or simplified:
sed -n '/pattern1/{p;:x;n;p;/^ /bx}' file
Output:
pattern1
abcde
fghij
pattern1
pattern1
klmno
pattern1
pattern1
pqrst
Could you please try the following.
awk '/pattern[23]/{flag=""} /pattern1/{flag=1} flag' Input_file
OR
awk '/pattern[^1]/{flag=""} /pattern1/{flag=1} flag' Input_file
Explanation:
awk '
/pattern[^1]/{ ##Checking condition if a line is having string pattern with apart from digit 1 in it then do following.
flag="" ##Nullifying variable flag value here.
}
/pattern1/{ ##Checking condition here if a line is having string pattern1 then do following.
flag=1 ##Setting value of variable flag as 1 here.
}
flag ##Checking condition if value of flag is NOT NULL then print the line value.
' Input_file ##Mentioning Input_file name here.
$ awk '/^[^ ]/{f=/^pattern1$/} f' file
pattern1
abcde
fghij
pattern1
pattern1
klmno
pattern1
pattern1
pqrst
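The one-liner above comes without an explanation, so here is a sketch of how it appears to work: every unindented line re-evaluates a flag that is true only when that line is exactly pattern1, and indented lines inherit the flag from the header before them. The function name nested_under_pattern1, the variable name keep, and the shortened sample are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print pattern1 headers and the indented lines under them.
nested_under_pattern1() {
    awk '
        /^[^ ]/ { keep = /^pattern1$/ }   # unindented line: decide whether this group is wanted
        keep                              # print the header and its indented lines
    ' "$1"
}

# Shortened sample with the four-space indentation from the question
sample=$(mktemp)
printf '%s\n' 'pattern3' '    abcde' 'pattern1' '    fghij' 'pattern1' 'pattern3' '    klmno' > "$sample"

nested_under_pattern1 "$sample"
```

For this sample the output is pattern1, the indented fghij line, and the second pattern1; both pattern3 groups are skipped.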
This might work for you (GNU sed):
sed '/^\S/h;G;/pattern1/P;d' file
Store each unindented line (the current pattern) in the hold space and append the hold space to every line. If the appended pattern is pattern1, print the current line; in either case, delete the pattern space.
pattern1
a
b
pattern2
cd
pattern1
re
pattern2
gh
pattern1
ef
pattern2
qw
e
I can show all matching blocks with
sed -n '/pattern1/,/pattern2/p'
I can choose the second matching block, or any Nth one, with
awk -vM=2 '(x+=/pattern1/)==M&&x+=/pattern2/' file
pattern1
re
pattern2
and print only the last matching block with
awk 'x+=/pattern1|pattern2/{!y++&&B="";B=B?B"\n"$0:$0;x==2&&y=x=0}END{print B}' file
pattern1
ef
pattern2
But how can I print for example the last/first 2 or Nth matching block pattern?
pattern1
re
pattern2
pattern1
ef
pattern2
This might work for you (GNU sed):
sed -n '/pattern1/,/pattern2/{p;/pattern2/{H;x;s///2;x;T;q}}' file
This prints the first 2 matches of pattern1 through pattern2 and then quits.
sed -nr '/pattern1/,/pattern2/H;$!b;x;s/.*((pattern1.*){2})$/\1/p' file
This prints the last 2 matches of pattern1 through pattern2.
sed -n '/pattern1/,/pattern2/p'
pattern1
a
b
pattern2
cd
pattern1
ef
pattern2
gh
pattern1
ef
pattern2
This prints everything between each matching pattern1 and pattern2.
But how can I choose which block to print, for example only the last matching block or only the first matching block?
Print the Nth occurrence
Here's one way you could do it using awk:
$ awk '/pattern1/{++f;p=1}p&&f==2;/pattern2/{p=0}' file
pattern1
ef
pattern2
The number 2 in the middle controls which occurrence is printed (in this case, the second).
Explanation
When the opening pattern is matched, f is incremented and the p flag is set. When the closing pattern is matched, the p flag is unset. Lines are only printed when the p flag is set and f has a specific value.
If you wanted, you could pass the value in from the shell:
$ c=2
$ awk -v c="$c" '/pattern1/{++f;p=1}p&&f==c;/pattern2/{p=0}' file
pattern1
ef
pattern2
Print the last occurrence
To always print the last occurrence within the range, you could use an array:
$ awk '{a[NR]=$0}/pattern1/{s=NR}/pattern2/{e=NR}END{for(i=s;i<=e;++i)print a[i]}' file
pattern1
ef
pattern2
Explanation
Every line in the file is stored sequentially in the array a. s and e are overwritten with the current line number NR every time the start or end pattern is matched. At the end, print the elements that you're interested in.
A potential disadvantage of this approach is that the contents of the entire file are stored in memory, but unless you have very large files this may not be a problem.
Another awk way
Find the nth occurrence
awk -vM=2 '(x+=/pattern1/)==M&&x+=/pattern2/' file
Output
pattern1
ef
pattern2
Explanation
-vM=2
Set M to whichever occurrence you want to find.
(x+=/pattern1/)==M
Increments x for every occurrence of pattern1 and checks whether it equals M.
&&x+=/pattern2/
If it does, x is also incremented for every occurrence of pattern2, so when the pattern2 line is reached that line is still printed, but nothing after it, because x is now larger than M.
Default action for awk is print.
Print the last occurrence
This only stores the last block seen in memory.
awk 'x+=/pattern1|pattern2/{!y++&&B="";B=B?B"\n"$0:$0;x==2&&y=x=0}END{print B}' file
Output
pattern1
ef
pattern2
Explanation
Increment x for every occurrence of pattern1 or pattern2.
Reset B when y is not set (that is, when a new block starts), then set y.
While x is nonzero, append each line to the variable B.
Reset x and y when the count reaches 2, meaning both patterns have been seen.
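The same "keep only the last block in memory" idea can be spelled out with named variables. This is a readable sketch, not a transcription of the one-liner: unlike it, this version remembers a block only once its closing pattern2 has been seen. The function name last_block and the variable names are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the last complete pattern1 .. pattern2 block.
last_block() {
    awk '
        /pattern1/ { inblock = 1; cur = "" }                 # a new block starts: drop the previous buffer
        inblock    { cur = cur $0 ORS }                      # collect the lines of the current block
        /pattern2/ && inblock { last = cur; inblock = 0 }    # block complete: remember it
        END        { printf "%s", last }                     # only the most recent block survives
    ' "$1"
}

# Sample data from the question
sample=$(mktemp)
printf '%s\n' pattern1 a b pattern2 cd pattern1 re pattern2 gh pattern1 ef pattern2 qw e > "$sample"

last_block "$sample"
```

On the sample this prints pattern1, ef, pattern2. At most two blocks are held in memory at any time, the one being collected and the last completed one.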
Perl to the rescue!
Printing the last match:
perl -ne 'push @keep, $_ if (/pattern1/ and @keep = ("")) .. /pattern2/;
}{ print @keep'
Explanation: The match is stored in @keep, which is emptied when pattern1 is matched. So, @keep will contain the last match after the whole input is processed.
Printing n-th match:
perl -ne 'push @keep, $_
if (/pattern1/ and ++$c and @keep = "")
.. ($e = /pattern2/);
print(@keep), last if $e and 2 == $c'
#                            ^
#                            |
#                    the second match
$c counts the matches. $e signals the end of the match.