Check whether the lines before and after a string are empty - bash

I need to delete a certain number of lines before a desired text, but only if the lines immediately before and after the line containing the searched string are empty.
E.g. (line number, content):
1
2
3 Hello
4
5 yellow
In this case, if the lines before and after the line containing Hello are empty (lines 2 and 4), I have to delete lines 3 down to 1.
I can delete lines 3 to 1 using the tac and sed commands, but I'm having difficulty adding that if condition.
tac file1|sed -e '/Hello/,+3d'|tac

This might work for you (GNU sed):
sed ':a;N;s/\n/&/3;Ta;/\n\n.*Hello.*\n$/s/.*\n//;ta;P;D' file
Gather up 4 lines in the pattern space and if the 2nd and the 4th are empty and the 3rd contains Hello, delete the first three lines and repeat. Otherwise print the first line and repeat.
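If GNU sed is not available, the same condition can probably be checked in POSIX awk. The following is only a minimal sketch: it buffers the whole input in memory, treats the search text as a literal substring (assumed to contain no backslashes), and the function name delete_before_isolated is made up for illustration.

```shell
# Hypothetical helper: drop the matching line and the two lines before it,
# but only when the lines directly before and after the match are empty.
delete_before_isolated() {
  needle=$1; shift                       # $1 = literal text to look for
  awk -v needle="$needle" '
    { line[NR] = $0 }                    # buffer every input line
    END {
      for (i = 2; i < NR; i++)           # a match needs a neighbour on both sides
        if (index(line[i], needle) && line[i-1] == "" && line[i+1] == "")
          drop[i] = drop[i-1] = drop[i-2] = 1   # index 0 is never printed, so i-2 == 0 is harmless
      for (i = 1; i <= NR; i++)
        if (!(i in drop)) print line[i]
    }' "$@"
}

# Example with the layout from the question (line 1 holds some text here)
printf 'first\n\nHello\n\nyellow\n' | delete_before_isolated Hello
```

For this sample only the empty line 4 and line 5 remain, which is also what the sed command keeps.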

If you are OK with awk, could you please try the following:
awk -v string="Hello" '
FNR==NR{
a[FNR]=$0
next
}
($0==string) && a[FNR-1]=="" && a[FNR+1]==""{
a[FNR-1]=a[FNR]=a[FNR-2]="del_flag"
}
END{
for(i=1;i<=length(a);i++){
if(a[i]!="del_flag"){
print a[i]
}
}
}
' Input_file Input_file

With GNU sed option -z you can match
some_line
empty line
line With Hello
empty line
and replace this with an empty line.
sed -rz 's/(^|\n)[^\n]*\n\nHello\n\n/\1\n/g' file1
EDIT: added g for multiple segments.

Related

delete lines if firstline matches expression, but next 2 lines do not match different expression

I have a test file in this format:
G03X22Y22.5
G01X48.5
M98P9001 (OFF)****
G00X20Y25
M98P8051 (FAST CUT)
G01X22Y34
G01X25Y33
I am trying to write a bash or MS-DOS script that will:
Find all lines in the file that match M98P9001.
If the next 2 lines do not contain any of the codes { M98P8050, M98P8080 or M09 }, delete all 3 lines, which would result in the output:
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
I've tried solutions with SED or AWK, but haven't gotten the right one yet:
sed -e '/M98P9001/,+2d' input.txt >> output.txt
This one always deletes all 3 lines after finding the match, but I need to delete them only if the next 2 lines following the match do not contain { M98P8050, M98P8080 or M09 }.
A mark-and-sweep approach:
$ awk 'NR==FNR {if(!(/M98P80[58]0|M09/ && p~/M98P80[58]0|M09/) && pp~/M98P9001/)
{a[NR]; a[NR-1]; a[NR-2]}
pp=p; p=$0; next}
!(FNR in a)' file{,}
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
This seems to give your desired output:
awk '
/M98P9001/ {
getline l2; getline l3;
if((l2 l3)~/M98P8050|M98P8080|M09/) printf "%s\n%s\n%s\n", $0, l2, l3;
next;
}
{ print; }' input.txt
Description:
If the first-line pattern matches, read the next two lines into variables.
Check the concatenation of both lines for any of the 3 secondary patterns.
If it matches, print all three lines; otherwise print nothing.
Go to the next record.
On all other lines, print.
This might work for you (GNU sed):
sed -E ':a;N;s/\n/&/2;Ta;/^[^\n]*M98P9001/{/\n.*(M98P8050|M98P8080|M09)/!d};P;D' file
Open a three line window throughout the length of the file.
If the first line of the window contains M98P9001 and neither the second nor the third line contains M98P8050, M98P8080 or M09, delete the entire window and repeat.
Otherwise, print/delete the first line of the window and repeat.
N.B. The idiom :a;N;s/\n/&/2;Ta tops up the three line window.
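For systems without GNU sed, the three-line check can be sketched in POSIX awk by buffering the input. This is an illustrative sketch rather than a drop-in replacement: the function name drop_unconfirmed is invented, it assumes the group should be deleted only when neither of the two following lines contains one of the codes (as the getline and sed answers do), and it keeps an M98P9001 line that has fewer than two lines after it.

```shell
# Hypothetical helper: remove an M98P9001 line plus the two lines after it
# unless one of those two lines contains M98P8050, M98P8080 or M09.
drop_unconfirmed() {
  awk '
    { line[NR] = $0 }                            # buffer every input line
    END {
      i = 1
      while (i <= NR) {
        if (line[i] ~ /M98P9001/ && i + 2 <= NR &&
            (line[i+1] "\n" line[i+2]) !~ /M98P8050|M98P8080|M09/) {
          i += 3                                 # skip the whole three-line group
        } else {
          print line[i++]                        # keep this line and move on
        }
      }
    }' "$@"
}

# Example with the sample data from the question
printf '%s\n' 'G03X22Y22.5' 'G01X48.5' 'M98P9001 (OFF)****' 'G00X20Y25' \
  'M98P8051 (FAST CUT)' 'G01X22Y34' 'G01X25Y33' | drop_unconfirmed
```

Because the lines are buffered, a kept M98P9001 line does not consume the two lines after it, so those lines are still checked as possible triggers themselves.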

bash remove block of text from file

Suppose I have an input file with lines of text:
line 1
line 2
line 3
line 4
line 2
now suppose I would like to check if my inputfile contains
line 2
line 3
and remove that block of text if it is found. This would give:
line 1
line 4
line 2
Note that I don't want to remove every occurrence of line 2 or line 3, but only those where one directly follows the other. (In reality I want to check for a block of 5 lines, and not just any block of code between two placeholders, but let's keep the example simple.)
I looked into awk, but that gets complicated very quickly (I haven't finished this attempt, since I feel it is not the right approach and will explode with 5 lines):
awk '/line 2/ {if (line0) {print line0; line0=""}; line0=$0}' input.txt
One way with GNU awk for multi-char RS and RT:
$ awk -v RS='(^|\n)line 2\nline 3\n' '{ORS=(RT ~ /^\n/ ? "\n" : "")} 1' file
line 1
line 4
line 2
With any awk:
$ cat file
line 2
line 3
line 1
line 2
line 3
line 4
line 2
line 3
$ awk '
{ rec = rec $0 RS }
END {
rec = RS rec
gsub(/\nline 2\nline 3\n/,RS,rec)
gsub(/^\n|\n$/,"",rec)
print rec
}
' file
line 1
line 4
The above assumes you want to match using regexps since that's what your posted code does. If you want to do literal string matches instead that's do-able too with some massaging:
$ cat tst.awk
{ rec = rec $0 RS }
END {
while ( beg = index(RS rec,RS block RS) ) {
out = out substr(RS rec,1,beg-1)
rec = substr(RS rec,beg+length(block)+2)
}
print substr(out rec,2)
}
$ awk -v block='line 2\nline 3' -f tst.awk file
line 1
line 4
Not awk, but this is straightforward with Perl 5, as @triplee pointed out. With the five-line input file you showed above saved as foo.txt:
perl -0777 -pe 's{^line 2\nline 3\n}{}gm' foo.txt
produces the desired three-line output.
Explanation:
-0777 causes perl to read the entire input as one string (see perlrun).
The /m modifier on the regex causes ^ to match at the beginning of a line (see perlre).
Edit: ^ also matches at the beginning of the file, so you can detect blocks of lines even if there is no newline before them.
The separators between the lines are literal \ns because $ matches before the \n with the /m modifier. Therefore, it's easier just to match the \n.
Thanks to this U&L SE answer by Stéphane Chazelas for the basics.
With GNU sed:
sed -z 's/line 2\nline 3\n//g;s/line 2\nline 3\n$//' infile
This might work for you (GNU sed):
sed '/^line 2$/!b;N;/^line 3$/Md;P;D' file
If a line does not match the string line 2, print it and begin the next cycle. Otherwise, append the following line and if that does match the string line 3, delete both lines. Otherwise, print then delete the first line and repeat.
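Since the real block is five lines long, it may be easier to keep the block in a file of its own and compare whole lines literally. The sketch below does that in POSIX awk. The function name remove_block and the file names are hypothetical, the block file is assumed to be non-empty, and the whole input is held in memory.

```shell
# Hypothetical helper: remove every occurrence of a multi-line block.
# $1 = file holding the block to remove (assumed non-empty), $2 = file to filter
remove_block() {
  awk '
    NR == FNR { block[++n] = $0; next }      # first file: lines of the block
    { line[++m] = $0 }                       # second file: lines of the text
    END {
      i = 1
      while (i <= m) {
        hit = (i + n - 1 <= m)               # is there room for a full block here?
        for (k = 1; hit && k <= n; k++)
          if (line[i + k - 1] != block[k]) hit = 0
        if (hit) i += n                      # skip the matched block
        else print line[i++]                 # otherwise keep the line
      }
    }' "$1" "$2"
}

# Example with the data from the question
tmp=$(mktemp -d)
printf 'line 2\nline 3\n' > "$tmp/block.txt"
printf 'line 1\nline 2\nline 3\nline 4\nline 2\n' > "$tmp/input.txt"
remove_block "$tmp/block.txt" "$tmp/input.txt"
rm -r "$tmp"
```

Lines are compared with string equality rather than regular expressions, so metacharacters in the block need no escaping, and the block length is simply whatever the block file contains.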

Delete lines from a text file except the first and every nth

I have a long text file comprised of numbers, such as:
1
2
9.252
9.252
9.272
1
1
6.11
6.11
6.129
I would like to keep the first line, delete the subsequent three and then keep the next one. I would like to do this process for the whole file. Following that logic, considered the input above, I would like to have the following output:
1
9.272
1
6.129
Using GNU sed (needed for the ~ extension):
sed -n '1~5p;5~5p' file
Saving your numbers in a "textfile.txt" I can use the following with sed:
sed -n 'p;n;n;n;n;p;' textfile.txt
Sed prints the first line, reads the next 4 and prints the last line.
Or the following using while read in bash:
while read -r firstline && read -r nextone1 && read -r nextone2 && read -r nextone3 && read -r lastone; do
printf "%s\n" "$firstline" "$lastone";
done < textfile.txt
This just reads 5 lines at a time and prints only the first and 5th lines.
You can simply say:
awk 'NR%5<2' input.txt
Explanation: Since the entire pattern repeats every five lines, start by applying the modulo operation to the line number NR with a divisor of five. The 1st line of each five-line block then yields 1 and the 5th line yields 0, while lines 2 to 4 yield 2, 3 and 4. Comparing the result to two (NR%5<2) therefore selects exactly the 1st and 5th lines.
To print the 1st and 5th line of every block of 5 lines (remember that 5%5 = 0):
$ awk '(NR%5) ~ /[10]/' file
1
9.272
1
6.129
If you want to print the 2nd, 3rd, and 4th line of every block of 5 lines instead of the 1st and 5th:
$ awk '(NR%5) ~ /[234]/' file
2
9.252
9.252
1
6.11
6.11
If you wanted to print the 27th and 53rd line of every block of 100:
awk '(NR%100) ~ /^(27|53)$/' file
We couldn't use a bracket expression there as we're now beyond single char numbers.
This might work for you (GNU sed):
sed '2~5,+2d' file
Starting at line 2 and repeating every 5 lines, delete the matched line and the two lines after it.
An alternative:
sed -n '1p;5~5,+1p' file
Considering your groups are packed as 5 lines, you could use awk with a mod 5 operation.
awk '{i=(NR-1)%5;if(i==0||i==4)print $0}' input.txt
With indentation it looks like this:
{
i=(NR-1)%5;
if (i==0||i==4)
print $0;
}
i=(NR-1)%5 takes the line number and computes it modulo 5. Since line numbers start at 1 (instead of 0), you need to subtract 1 from it before computing the modulo.
This leaves you with an integer i that ranges from 0 to 4. You want to print the first line (index 0), skip the next three lines (indexes 1-3) and print the last line (index 4), which is exactly what if (i==0||i==4) print $0 does.
Alternatively, you can do the same thing with a shorter (and probably slightly more optimized) version:
awk '((NR-1)%5==0||(NR-1)%5==4)' input.txt
This tells awk to do something for every 1st out of 5 lines and every 5th out of 5 lines. Since the "something" is not defined, by default it outputs the current line. If it helps, this is strictly equivalent to:
awk '((NR-1)%5==0||(NR-1)%5==4){print $0}' input.txt
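The modulo test generalizes to any block size by passing the size in as a variable. A minimal sketch, assuming the file consists of complete blocks; the function name keep_first_last and the variable n are made up for illustration:

```shell
# Hypothetical helper: keep the first and last line of every block of n lines.
# $1 = block size, remaining arguments = input files (or stdin)
keep_first_last() {
  n=$1; shift
  # NR % n is 1 on the first line of a block and 0 on the last one
  awk -v n="$n" 'NR % n == 1 || NR % n == 0' "$@"
}

# Example with the numbers from the question
printf '1\n2\n9.252\n9.252\n9.272\n1\n1\n6.11\n6.11\n6.129\n' | keep_first_last 5
```

With a block size of 5 this should print 1, 9.272, 1 and 6.129, the same lines as the one-liners above.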

I have a text file and I need to delete the first blank line and then all the text after the 2nd blank line

I'm using bash and I have a file that consists of 3 parts of text: the first part, then a blank line, then the 2nd part, then another blank line, then the 3rd part. I need to output this to a new file that contains only the first 2 parts, without the blank line in between. I've been playing with sed and awk, but can't quite figure it out.
Most simply with awk:
awk -v RS= 'NR <= 2' filename
With an empty record separator RS, awk splits the file into records at empty lines. With the selection NR <= 2, only the first two are printed (delimited by the default output record separator, which is a newline).
If the file is very large, it might be prudent to amend this to
awk -v RS= '1; NR == 2 { exit }' filename
This stops processing the file after the second record and prints all until then.
Addendum: Obligatory crazy sed solution (not recommended for use, written for fun):
sed -n '/^$/ { x; /./q; H; d; }; p' filename
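The sed version boils down to counting blank lines: skip the first one and stop at the second. That logic can be written more plainly in awk without paragraph mode. A minimal sketch (the function name first_two_parts is invented here). Unlike RS=, it counts every empty line separately, so two consecutive blank lines end the output early.

```shell
# Hypothetical helper: print everything up to the second blank line,
# leaving out the first blank line.
first_two_parts() {
  awk '
    /^$/ { if (++blanks == 2) exit; next }   # skip the 1st blank line, stop at the 2nd
    { print }                                # every other line is passed through
  ' "$@"
}

# Example: three parts separated by single blank lines
printf 'a1\na2\n\nb1\n\nc1\nc2\n' | first_two_parts
```

Because awk exits at the second blank line, the rest of a large file is never read.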

Delete line and update order

I have a file that contains lines starting with a number, for example
1 This is the first line
2 this is the second line
3 this is the third line
4 This is the fourth line
What I want to do is delete a line, for example line 2, and update the numbering so that the file looks like the following. I want to do this in a bash script.
1 This is the first line
2 this is the third line
3 This is the fourth line
Thanks
IMO it might be a little easier with awk:
awk '!/regex/ {$1=++x; print}' inputFile
In the /.../ you can put the regex that occurs on the line that needs to be deleted.
Test:
$ cat inputFile
1 This is the first line
2 this is the second line
3 this is the third line
4 This is the fourth line
$ awk '!/second/ {$1=++x; print}' inputFile
1 This is the first line
2 this is the third line
3 This is the fourth line
$ awk '!/third/ {$1=++x; print}' inputFile
1 This is the first line
2 this is the second line
3 This is the fourth line
$ awk '!/first/ {$1=++x; print}' inputFile
1 this is the second line
2 this is the third line
3 This is the fourth line
Note: Since we are re-constructing the $1 field, any white space sequences will get removed.
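If the original spacing matters, one option is to replace only the leading number instead of assigning to $1. The following is a sketch under the assumption that every line starts with its number in the first column; the function name delete_and_renumber is hypothetical.

```shell
# Hypothetical helper: delete the line with the given leading number and renumber the rest.
# $1 = number of the line to delete, remaining arguments = input files (or stdin)
delete_and_renumber() {
  target=$1; shift
  awk -v target="$target" '
    $1 == target { next }             # drop the line whose leading number matches
    { sub(/^[0-9]+/, ++n); print }    # rewrite only the leading number, keep the spacing
  ' "$@"
}

# Example: the extra spaces in the third line survive the renumbering
printf '1 This is the first line\n2 this is the second line\n3 this   is the third line\n4 This is the fourth line\n' |
  delete_and_renumber 2
```

Because sub() edits the record text directly, awk does not rebuild the line from its fields, so runs of whitespace are left untouched.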
You can use this set of commands:
grep -v '^2 ' file | cut -d' ' -f2- | nl -w1 -s' '
grep with the -v option removes line #2.
cut strips the first column, which is the line number.
Finally, nl renumbers the remaining lines.

Resources