Delete lines if the first line matches an expression, but the next 2 lines do not match a different expression - bash

I have a test file in this format:
G03X22Y22.5
G01X48.5
M98P9001 (OFF)****
G00X20Y25
M98P8051 (FAST CUT)
G01X22Y34
G01X25Y33
I am trying to make a bash or MSDOS script that will:
Find all lines in the file that match M98P9001.
If the NEXT 2 LINES do not contain any of the codes { M98P8050, M98P8080 OR M09 }, delete all 3 lines. That would result in this output:
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
I've tried solutions with SED or AWK, but haven't gotten the right one yet:
sed -e '/M98P9001/,+2d' input.txt >> output.txt
This one always deletes the 3 lines after finding the match, but I need to delete the lines only if the next 2 lines following the match do not contain { M98P8050, M98P8080 OR M09 }.

A mark-and-sweep approach: the first pass (NR==FNR; file{,} expands to the file name twice) records the line numbers of every M98P9001 line and of its two following lines whenever neither of those two lines contains one of the keep codes, and the second pass prints every line that was not marked.
$ awk 'NR==FNR {if(!(/M98P80[58]0|M09/ || p~/M98P80[58]0|M09/) && pp~/M98P9001/)
                  {a[NR]; a[NR-1]; a[NR-2]}
                pp=p; p=$0; next}
       !(FNR in a)' file{,}
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33

This seems to give your desired output:
awk '
/M98P9001/ {
    getline l2; getline l3;
    if ((l2 l3) ~ /M98P8050|M98P8080|M09/) printf "%s\n%s\n%s\n", $0, l2, l3;
    next;
}
{ print; }'
Description:
If the first line matches the pattern, read the next two lines into variables.
Check the concatenation of both lines for any of the 3 secondary patterns.
If there is a match, print all three lines, else print nothing.
Go to the next record.
On all other lines, print.
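As an aside (not part of the original answer), a slightly more defensive sketch of the same getline idea checks the return values, so a match near end-of-file does not silently drop lines, and joins the two look-ahead lines with a newline so a keep code cannot be matched across the line boundary:
awk '
/M98P9001/ {
    # read the next two lines; near end-of-file, print what we have and stop (defensive sketch)
    if ((getline l2) <= 0) { print; exit }
    if ((getline l3) <= 0) { print; print l2; exit }
    # join with a newline so a code cannot be matched across the two lines
    if ((l2 "\n" l3) ~ /M98P8050|M98P8080|M09/) printf "%s\n%s\n%s\n", $0, l2, l3
    next
}
{ print }' input.txt > output.txt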

This might work for you (GNU sed):
sed -E ':a;N;s/\n/&/2;Ta;/^[^\n]*M98P9001/{/\n.*(M98P8050|M98P8080|M09)/!d};P;D' file
Open a three line window throughout the length of the file.
If the first line of the window contains M98P9001 and neither the second nor the third line contains M98P8050, M98P8080 or M09, delete the entire window and repeat.
Otherwise, print/delete the first line of the window and repeat.
N.B. The idiom :a;N;s/\n/&/2;Ta tops up the three line window.
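To watch that idiom at work on its own, a small diagnostic (GNU sed, illustrative only) dumps each three-line window with l as it slides down the file:
sed -n ':a;N;s/\n/&/2;Ta;l;D' input.txt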

Related

To check if the lines before and after a string are empty

I need to delete a certain number of lines before a desired text, but only if the lines before and after the searched string are empty.
E.g. (line number, content)
1
2
3 Hello
4
5 yellow
In this case, if the lines before and after the line containing Hello are empty (lines 2 and 4), I have to delete lines 3 to 1.
I can delete lines 3 to 1 using tac and sed, but I'm having difficulty expressing that if condition.
tac file1|sed -e '/Hello/,+3d'|tac
This might work for you (GNU sed):
sed ':a;N;s/\n/&/3;Ta;/\n\n.*Hello.*\n$/s/.*\n//;ta;P;D' file
Gather up 4 lines in the pattern space and if the 2nd and the 4th are empty and the 3rd contains Hello, delete the first three lines and repeat. Otherwise print the first line and repeat.
Could you please try the following if you are OK with awk.
awk -v string="Hello" '
FNR==NR{                  # first pass: store every line, keyed by its line number
  a[FNR]=$0
  next
}
($0==string) && a[FNR-1]=="" && a[FNR+1]==""{   # second pass: the string with empty lines around it
  a[FNR-1]=a[FNR]=a[FNR-2]="del_flag"           # mark the matching line and the two lines before it
}
END{
  for(i=1;i<=length(a);i++){
    if(a[i]!="del_flag"){
      print a[i]
    }
  }
}
' Input_file Input_file
With the GNU sed option -z you can match
some line
empty line
the line Hello
empty line
and replace this with an empty line.
sed -rz 's/(^|\n)[^\n]*\n\nHello\n\n/\1\n/g' file1
EDIT: added g for multiple segments.

sed/awk - Put all text on the same line as a preceding number

How can I get all the text that follows 'number:number' onto the same line as the preceding 'number:number'?
10:15
text line one
text line two
text no pattern

11:12
random text
text is random
totally random
could be four lines
could be five
Should then become
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
This works for your example-
tr '\n' ' ' < file.txt | sed 's/[0-9]*:[0-9]*/\n&/g'
Explanation-
tr will initially put everything on the same line.
Then that sed one liner will insert new lines before each num:num pattern.
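If the leading blank line and the trailing spaces that this leaves behind matter, one possible clean-up (a sketch, piping through a second sed) is:
tr '\n' ' ' < file.txt | sed 's/[0-9]*:[0-9]*/\n&/g' | sed '1{/^$/d;}; s/ *$//'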
Given that input file, all you need is to tell awk to read a blank-line-separated paragraph at a time using RS=<null> and recompile each record using the default OFS value of a blank char:
$ awk -v RS= '{$1=$1}1' file
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
Both the sed and the awk solutions join lines until a new record is detected or input ends, at which point the joined lines are printed and cleared; use either solution.
the sed oneliner
sed -nr '/^[0-9]{2}:[0-9]{2}$/!{H;$!b}; x; s/\n/ /gp'
the awk script
awk '
!/^[0-9]{2}:[0-9]{2}$/ {
    lines = lines " " $0
    next
}
{ if (lines) print lines; lines = $0 }
END { print lines }
'
Here is a GNU AWK script:
script.awk
BEGIN { RS = "\n[0-9]+:[0-9]+|\n$" }
{ gsub(/\n/, " ", $0)        # join the record's lines with blanks
  printf("%s%s", $0, RT) }   # re-attach the separator: the newline plus the next number pair (or the final newline)
Use it like this awk -f script.awk file.txt
It uses the GNU AWK specific extensions RT and regex RS:
The record separator is set to "colon separated number pairs".
To get the final newline at the end of the file, the "|\n$" is added to match the last newline in the file.
In order to start separation at the second pair, the "\n" is added in front. Thus the first colon separated number pair "10:15" is included in the first $0 and not in RT.
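To see what RT holds on each cycle, a quick diagnostic run (GNU AWK only, purely illustrative) can print the record and its terminating separator side by side:
gawk 'BEGIN { RS = "\n[0-9]+:[0-9]+|\n$" } { printf("record=[%s] RT=[%s]\n", $0, RT) }' file.txt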
The trick here is that you want to split the file on paragraphs instead of lines. In awk, if you set RS="" it enables paragraph mode. Each iteration of the awk loop will have a paragraph in $0. You can then substitute the newlines and turn them into spaces.
awk <data.txt 'BEGIN { RS = "" ; FS = "\n" } { gsub(/\n/, " ", $0) ; print }'
Output:
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
The benefit of this is that awk handles all the special cases for you: files that end in a blank line, end without a blank line, end without a newline, etc.

Getting repeated lines with awk in Bash

I'm trying to find out which lines are repeated X times in a text file. I'm using awk, but I see that my awk command does not work with lines that begin with the same characters or words; that is, it does not consider the full line individually.
Using this command I try to get the lines that are repeated 3 times:
awk '++A[$1]==3' ./textfile > ./log
This is what you need hopefully:
awk '{a[$0]++}END{for(i in a){if(a[i]==3)print i}}' File
Increment array a with the line ($0) as the index for each line. At the end, for each index, check whether the count (a[i], which is the original a[$0]) equals 3. If so, print the line (i, which is the original $0). Hope it's clear.
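Since the question asks about lines repeated X times in general, the count can also be passed in as a variable (a small generalization of the same approach):
awk -v n=3 '{ a[$0]++ } END { for (i in a) if (a[i] == n) print i }' File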
This returns lines repeated 3 times but adds a space at the beginning of each 3x-repeated line:
sort ./textfile | uniq -c | awk '$1 == 3 {$1 = ""; print}' > ./log
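If that leading space is unwanted, one way (a sketch) is to strip the count from the whole record instead of blanking the first field:
sort ./textfile | uniq -c | awk '$1 == 3 { sub(/^[[:space:]]*[0-9]+ /, ""); print }' > ./log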

I have a text file and I need to delete the first blank line and then all the text after the 2nd blank line

I'm using bash and I have a file that is in 3 parts of text: the first part, then a blank line, then the 2nd part, then another blank line, then the 3rd part of text. I need to output this to a new file that contains only the first 2 parts, without the blank line in between. I've been playing with sed and awk but can't quite figure it out.
Most simply with awk:
awk -v RS= 'NR <= 2' filename
With an empty record separator RS, awk splits the file into records at empty lines. With the selection NR <= 2, only the first two are printed (delimited by the default output record separator, which is a newline).
If the file is very large, it might be prudent to amend this to
awk -v RS= '1; NR == 2 { exit }' filename
This stops processing the file after the second record and prints all until then.
Addendum: Obligatory crazy sed solution (not recommended for use, written for fun):
sed -n '/^$/ { x; /./q; H; d; }; p' filename

How to grep the last occurrence of a line pattern

I have a file with contents
x
a
x
b
x
c
I want to grep the last occurrence,
x
c
when I try
sed -n "/x/,/b/p" file
it lists all the lines, from the first x through to c.
I'm not sure if I got your question right, so here are some shots in the dark:
Print the last occurrence of x (regex):
grep x file | tail -1
Alternatively:
tac file | grep -m1 x
Print file from first matching line to end:
awk '/x/{flag = 1}; flag' file
Print file from last matching line to end (prints all lines in case of no match):
tac file | awk '!flag; /x/{flag = 1};' | tac
grep -A 1 x file | tail -n 2
-A 1 tells grep to print one line after a match line
with tail you get the last two lines.
or in a reversed way:
tac file | grep -B 1 x -m1 | tac
Note: You should make sure your pattern is "strong" enough so it gets you the right lines. i.e. by enclosing it with ^ at the start and $ at the end.
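For instance, with the grep-based variants above, anchoring the pattern (or using grep's whole-line flags) keeps a stray substring such as xyz from matching (illustrative):
grep '^x$' file | tail -1
tac file | grep -Fx -m1 'x'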
This might work for you (GNU sed):
sed 'H;/x/h;$!d;x' file
Saves the last x and what follows in the hold space and prints it out at end-of-file.
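The same command spelled out, with comments added for readability (GNU sed, identical behavior):
sed '
  # append the current line to the hold space
  H
  # when the line matches x, overwrite the hold space with just this line
  /x/h
  # on every line but the last, delete the pattern space and start the next cycle
  $!d
  # on the last line, exchange spaces so the held "last x and what follows" block is printed
  x
' file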
Not sure how to do it using sed, but you can try awk:
awk '{a=a"\n"$0; if ($0 == "x"){ a=$0}} END{print a}' file
POSIX vi (or ex or ed), in case it is useful to someone
Done in Command mode, of course
:set wrapscan
Go to the first line and just search Backwards!
1G?pattern
Slower way, without :set wrapscan
G$?pattern
Explanation:
G go to the last line
Move to the end of that line $
? search Backwards for pattern
The first backwards match will be the same as the last forward match
Either way, you may now delete all lines above current (match)
:1,.-1d
or
kd1G
You could also delete to the beginning of the matched line prior to the line deletions with d0 in case there were multiple matches on the same line.
POSIX awk, as suggested at
get last line from grep search on multiple files
awk '(FNR==1)&&s{print s; s=""}/PATTERN/{s=$0}END{if(s) print s}'
If you want to do awk in truly hideous one-liner fashion, but have it resemble a functional-programming style without having to keep track of where the last occurrence is:
mawk/mawk2/gawk 'BEGIN { FS = "=7713[0-9]+="; RS = "^$";
} END { print ar0[split($(0 * sub(/\n.+$/,"",$NF)), ar0, ORS)] }'
Here I'm employing multiple awk shorthands:
sub(/\n.+$/, "", $NF) # trimming all extra rows after the pattern
g/sub() returns the number of substitutions made, so multiplying that by 0 forces the split() to split $0, the full file, instead.
split() returns the number of items in the array (which is another way of saying the position of the last element), so even though I've already trimmed out the trailing \n, I can still directly print ar0[split()], knowing that ORS will fill in the missing trailing \n.
That's why this code looks like I'm trying to extract array items before the array itself is defined, but due to the flow of logic needed, the array will be defined by the time it reaches print.
Now if you want something simpler, these 2 also work
mawk/gawk 'BEGIN { FS="=7713[0-9]+="; RS = "^$"
} END { $NF = substr($NF, 1, index($NF, ORS));
FS = ORS; $0 = $0; print $(NF-1) }'
or
mawk/gawk '/=7713[0-9]+=/ { lst = $0 } END { print lst }'
I didn't use the same x|c requirements as the OP, just to showcase that these work regardless of whether you need fixed-string or regex-based matches.
The above solutions only work for a single file. To print the last occurrence for many files (say with suffix .txt), use the following bash script:
#!/bin/bash
for fn in *.txt
do
    result=$(grep 'pattern' "$fn" | tail -n 1)
    echo "$result"
done
where 'pattern' is what you would like to grep.
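A loop-free alternative (just a sketch, using only standard awk features) tracks the last match per file in an array keyed by FILENAME and prints everything at the end; note that the output order of a for-in loop is unspecified:
awk '/pattern/ { last[FILENAME] = $0 } END { for (f in last) print f ": " last[f] }' *.txt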
