bash remove block of text from file - bash

Suppose I have an input file with lines of text:
line 1
line 2
line 3
line 4
line 2
Now suppose I would like to check whether my input file contains
line 2
line 3
and remove that block of text if it is found. This would give:
line 1
line 4
line 2
Note that I don't want to remove just every occurrence of line 2 or line 3; but only if they are found one after another. (In reality I want to check for a block of 5 lines, and not just any block of code between two placeholders, but let's keep the example simple).
I looked into awk, but that gets complicated very quickly. The attempt below is unfinished; I feel this is not the right approach and that it will explode with 5 lines:
awk '/line 2/ {if (line0) {print line0; line0=""}; line0=$0}' input.txt

One way with GNU awk for multi-char RS and RT:
$ awk -v RS='(^|\n)line 2\nline 3\n' '{ORS=(RT ~ /^\n/ ? "\n" : "")} 1' file
line 1
line 4
line 2
With any awk:
$ cat file
line 2
line 3
line 1
line 2
line 3
line 4
line 2
line 3
$ awk '
{ rec = rec $0 RS }
END {
rec = RS rec
gsub(/\nline 2\nline 3\n/,RS,rec)
gsub(/^\n|\n$/,"",rec)
print rec
}
' file
line 1
line 4
The above assumes you want to match using regexps since that's what your posted code does. If you want to do literal string matches instead that's do-able too with some massaging:
$ cat tst.awk
{ rec = rec $0 RS }
END {
while ( beg = index(RS rec,RS block RS) ) {
out = out substr(RS rec,1,beg-1)
rec = substr(RS rec,beg+length(block)+2)
}
print substr(out rec,2)
}
$ awk -v block='line 2\nline 3' -f tst.awk file
line 1
line 4
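For the real case of a 5-line block, it may be easier to keep the block in its own file rather than in a regexp or a -v string. The following is a minimal sketch of that idea using a sliding window in POSIX awk. The function name remove_block and the two-file calling convention are assumptions for illustration, not part of the answers above, and the block file is assumed to be non-empty.

```shell
# remove_block BLOCKFILE INPUTFILE  (hypothetical helper)
# Prints INPUTFILE with every exact, consecutive occurrence of the lines in
# BLOCKFILE removed. Assumes BLOCKFILE is non-empty.
remove_block() {
    awk '
        NR == FNR { block[++n] = $0; next }      # first file: remember the block
        {
            buf[++m] = $0                         # window of the last n input lines
            if (m < n) next                       # window not full yet
            same = 1
            for (i = 1; i <= n; i++)
                if (buf[i] != block[i]) { same = 0; break }
            if (same) { m = 0; next }             # whole window equals the block: drop it
            print buf[1]                          # otherwise emit the oldest line
            for (i = 1; i < m; i++) buf[i] = buf[i + 1]
            m--
        }
        END { for (i = 1; i <= m; i++) print buf[i] }   # flush what is left
    ' "$1" "$2"
}
```

With the two lines "line 2" and "line 3" saved in block.txt, remove_block block.txt input.txt should behave like the literal-string version above. Lines are compared as plain strings, so no regexp escaping is needed.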

Not awk, but this is straightforward with Perl 5, as @triplee pointed out. With the five-line input file you showed above as foo.txt:
perl -0777 -pe 's{^line 2\nline 3\n}{}gm' foo.txt
produces the desired three-line output.
Explanation:
-0777 causes perl to read the entire input as one string (see perlrun).
The /m modifier on the regex causes ^ to match at the beginning of a line (see perlre).
Edit: ^ will also match at the beginning of the file, so you can detect blocks of lines even if there is not a newline before them.
The separators between the lines are literal \ns because $ matches before the \n with the /m modifier. Therefore, it's easier just to match the \n.
Thanks to this U&L SE answer by Stéphane Chazelas for the basics.

With gnu sed
sed -z 's/line 2\nline 3\n//g;s/line 2\nline 3\n$//' infile

This might work for you (GNU sed):
sed '/^line 2$/!b;N;/^line 3$/Md;P;D' file
If a line does not match the string line 2, print it and begin the next cycle. Otherwise, append the following line and if that does match the string line 3, delete both lines. Otherwise, print then delete the first line and repeat.
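The same N/P/D cycle can probably be written without the GNU-only M flag by matching the embedded newline explicitly. This is a sketch under that assumption; the function name remove_pair is made up for illustration, and $!N is used so that a trailing "line 2" on the last line is still printed.

```shell
# remove_pair FILE  (hypothetical helper): delete "line 2" only when it is
# immediately followed by "line 3".
remove_pair() {
    sed -e '/^line 2$/!b' \
        -e '$!N' \
        -e '/^line 2\nline 3$/d' \
        -e 'P' -e 'D' "$1"
}
```

When the pair does not match, P prints the first line and D restarts the cycle with the second line still in the pattern space, so a sequence such as "line 2, line 2, line 3" loses only its last two lines.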

Related

to check if a line before and after a string empty

I need to delete a certain number of lines before a given text, but only if the lines immediately before and after the searched string are empty.
E.g. (line number, content):
1
2
3 Hello
4
5 yellow
In this case, if the lines before and after the line containing Hello are empty (lines 2 and 4), I have to delete lines 3 to 1.
I can delete lines 3 to 1 using tac and sed, but I am having difficulty adding that if condition.
tac file1|sed -e '/Hello/,+3d'|tac
This might work for you (GNU sed):
sed ':a;N;s/\n/&/3;Ta;/\n\n.*Hello.*\n$/s/.*\n//;ta;P;D' file
Gather up 4 lines in the pattern space and if the 2nd and the 4th are empty and the 3rd contains Hello, delete the first three lines and repeat. Otherwise print the first line and repeat.
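The 4-line window described above can also be sketched in POSIX awk, which may be easier to adapt than the sed branching. The function name drop_block_before is hypothetical, and the word is matched as a literal substring (index) rather than as a regexp, which is an assumption about what "contains Hello" should mean.

```shell
# drop_block_before WORD [FILE]  (hypothetical helper mirroring the sed logic)
drop_block_before() {
    word=$1; shift
    awk -v word="$word" '
        { buf[++m] = $0 }                        # collect a window of 4 lines
        m == 4 {
            if (buf[2] == "" && buf[4] == "" && index(buf[3], word)) {
                buf[1] = buf[4]; m = 1           # drop lines 1-3, keep the trailing empty line
            } else {
                print buf[1]                     # no match: emit the oldest line
                for (i = 1; i < 4; i++) buf[i] = buf[i + 1]
                m = 3
            }
        }
        END { for (i = 1; i <= m; i++) print buf[i] }
    ' "$@"
}
```

As in the sed version, the empty line after the match is kept, and a match needs at least two lines in front of the line containing the word.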
Could you please try following if you are ok with awk.
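awk -v string="Hello" '
FNR==NR{                      # first pass: store every line
  a[FNR]=$0
  n=FNR
  next
}
($0==string) && a[FNR-1]=="" && a[FNR+1]==""{
  a[FNR-1]=a[FNR]=a[FNR-2]="del_flag"   # mark Hello and the two lines before it
}
END{
  for(i=1;i<=n;i++){          # n, not length(a): the tests above may create extra elements
    if(a[i]!="del_flag"){
      print a[i]
    }
  }
}
' Input_file Input_file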
With GNU sed option -z you can match
some_line
empty line
line With Hello
empty line
and replace this with an empty line.
sed -rz 's/(^|\n)[^\n]*\n\nHello\n\n/\1\n/g' file1
EDIT: added g for multiple segments.

Print lines from text file splitted into parts by a pattern in bash

I have a text file like this:
line 1
line 2
*
line 3
*
line 4
line 5
line 6
*
line 7
line 8
I would like to write out parts which are between the two patterns (* in this case). So if I want the first section, I want to get
line 1
line 2
If I want to get the third one it should be
line 4
line 5
line 6
The returned lines should be without the asterisk, and it is important that there is no asterisk at the beginning or at the end.
I was thinking about "splitting" the whole text into columns using '*' as delimiter with sed or awk, but I did not succeed. Could anyone help? Thanks a lot.
This was what I tried:
sed "/^%/{x;s/^/X/;/^X\\{$choice\\}$/ba;x};d;:a;x;:b;$!{n;/^%/!bb}" "$file"
but this needs to have an * at the beginning and it also prints the asterisks before and after.
$ awk -v num=3 '$0=="*"{ if (++count >= num) exit; next } num-1==count' data
line 4
line 5
line 6
$ awk -v num=1 '$0=="*"{ if (++count >= num) exit; next } num-1==count' data
line 1
line 2
awk to the rescue
$ awk -v RS='\n\\*\n' -v n=3 'NR==n' file
line 4
line 5
line 6
this requires multi-char record separator support (gawk).
Another alternative, with counting stars
$ awk -v n=3 '/^\*/{c++;next} c==n-1' file
line 4
line 5
line 6
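For use in a script, the counting approach can be wrapped in a function that also reports whether the requested section exists. This is only a sketch: the name print_section and the convention of returning status 1 when nothing was printed are assumptions, and an empty section is therefore indistinguishable from a missing one.

```shell
# print_section N FILE  (hypothetical helper): print the N-th "*"-separated part.
print_section() {
    awk -v num="$1" '
        $0 == "*" { if (++count >= num) exit; next }   # a lone "*" closes a section
        count == num - 1 { print; found = 1 }          # inside the requested section
        END { exit !found }                            # status 1 if nothing was printed
    ' "$2"
}
```

For example, print_section 3 data prints lines 4 to 6 of the sample, while print_section 9 data prints nothing and returns a non-zero status.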

how to find continuous blank lines and convert them to one

I have a file named a that contains some runs of consecutive blank lines (more than one in a row), see below:
cat a
1
2
3
4
5
So first I want to know whether consecutive blank lines exist. I tried
cat a | grep '\n\n\n'
Nothing is output, so I have to do it this way:
vi a
:set list
/\n\n\n
So I want to know whether some other shell command could easily do this.
Then, if there are two or more consecutive blank lines, I want to convert them to one, see below:
1
2
3
4
5
At first I tried this command:
sed 's/\n\n\(\n\)*/\n\n/g' a
It does not work. Then I tried this:
cat a | tr '\n' '$' | sed 's/$$\(\$\)*/$$/g' | tr '$' '\n'
This time it works. I would also like to know whether there is another way to do this.
Well, if your cat implementation supports
-s, --squeeze-blank
suppress repeated empty output lines
then it is as simple as
$ cat -s a
1
2
3
4
5
Also, less supports -s to squeeze blank lines as well, and -N to number lines.
remark: lines containing only blanks will not be suppressed.
If your cat does not support -s then you could use:
awk 'NF||p; {p=NF}'
or if you want to guarantee a blank line after every record, including at the end of the output even if none was present in the input, then:
awk -v RS= -v ORS='\n\n' '1'
If your input contains lines of all white space and you want them to be treated just like lines of non white space (like cat -s does, see the comments below) then:
awk '/./||p; {p=/./}'
and to guarantee a blank line at the end of the output:
awk '/./||p; {p=/./} END{if (p) print ""}'
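The first half of the question, detecting whether a run of blank lines exists at all, can be handled with the same kind of one-line state. grep works on one line at a time, so a pattern spanning several newlines never gets a chance to match. The sketch below assumes that "blank" means completely empty; the function name has_blank_run is made up for illustration.

```shell
# has_blank_run [FILE]  (hypothetical helper): exit status 0 if the input
# contains at least two empty lines in a row, 1 otherwise.
has_blank_run() {
    awk '
        /^$/ { if (prev) { found = 1; exit } prev = 1; next }  # second empty line in a row
        { prev = 0 }
        END { exit !found }
    ' "$@"
}
```

It can be used directly in a condition, for example: if has_blank_run a; then cat -s a; fi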
This awk command (GNU awk, because it uses RT) should work to collapse each run of blank lines into a single blank line:
awk -v RS= '{printf "%s%s", $0, ORS (RT ~ /\n{2,}/ ? ORS : "")}' file
1
2
3
4
5
This awk is using:
-v RS=: sets an empty input record separator (paragraph mode), so that runs of empty lines separate the records
printf "%s%s", $0, ORS: prints each record followed by a single line break
(RT ~ /\n{2,}/ ? ORS : ""): prints an additional line break if the matched record separator (RT, a gawk extension) contains 2 or more line breaks
You may use perl as well in slurp mode:
perl -0777 -pe 's/\R{2,}/\n\n/g' file
1
2
3
4
5
Command breakup:
-0777 Slurp mode to read entire file
's/\R{2,}/\n\n/g' Match 2 or more line breaks and replace by 2 line breaks
You can --squeeze-repeats with tr and then use sed to insert just a new line:
<a tr -s '\n' | sed 'G'
remark: This is a copy from my answer here
A very quick way is using awk
awk 'BEGIN{RS="";ORS="\n\n"}1'
How does this work:
awk knows the concept of records (which are lines by default), and you can define a record by its record separator RS. If you set the value of RS to an empty string, any run of one or more empty lines acts as a record separator. ORS is the output record separator: it states which separator is printed after each record, and here it is set to two <newline> characters. Finally, the statement 1 is shorthand for {print $0}, which prints the current record followed by the output record separator ORS.
note: Just like cat -s, this keeps lines containing only blanks as actual lines and does not suppress them.
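A small demonstration of the paragraph-mode behaviour described above, as a sketch; the function name squeeze_blank and the sample input are made up for illustration.

```shell
# squeeze_blank [FILE]  (hypothetical wrapper around the command above)
squeeze_blank() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } 1' "$@"
}

# Three empty lines between 1 and 2, one between 3 and 4.
printf '1\n\n\n\n2\n3\n\n4\n' | squeeze_blank
```

awk sees three records here (1, then 2 and 3 together, then 4) and prints each one followed by a single empty line, including after the last record.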
Another awk solution:
awk 'NF' ORS="\n\n" a
1
2
3
4
5
It checks if the line is not empty by testing if NF (number of fields) is not zero. If it matches, print the line as default action. ORS (output record separator) is set to 2 newline characters, so there is an empty line between non-empty lines.
1) awk solution
$ printf 'a\n\n\nb\n\n\nc\n\n\n' | awk 'BEGIN{b=0} /^$/{b=1;next} {printf "%s%s\n", b==1?"\n":"",$0} {b=0} END{printf "%s",b==1?"\n":""}'
a
b
c
$
2) sed solution
sed '
/^$/{ ${ p; d; }; H; d; }
/^$/!{ x; s/^\(\n\{1,\}\)$/\1/; ts; Tf; }
:s { x; s/\(.*\)/\n\1/; x; s/.*//; x; p; d; }
:f { x; p; d; }
'
SED Explanation:
/^$/{ ${ p; d; }; H; d; }
--If input is blank, if it is the last line, just print, else append to the holdspace and delete the pattern space and start new cycle
/^$/!{ x; s/^\(\n\{1,\}\)$/\1/; ts; Tf; }
--If input is not blank, exchange content of the p space and h space and check if h space contains \n. if yes, jump to s, if not jump to f
:s { x; s/\(.*\)/\n\1/; x; s/.*//; x; p; d; }
--If blank lines are present in h space, then append \n to p space, then clear hold space , then print p space and delete p space
:f { x; p; d; }
--If blank lines are absent in h space, then print p space and delete p space

insert entire content of a text file in the position of single line using command line linux

I am basically trying to replace a single line of text with the entire content of another text file from the command line (Linux). Any idea how to do that?
You can try this sed,
sed -e '/trigger/r newfile' -e '/trigger/d' org_file
Here,
newfile holds the content to be inserted when trigger is found in org_file.
Test:
$ cat > org_file
line 1
line 2
line 3
trigger
line 6
line 7
$ cat > newfile
line 4
line 5
$ sed -e '/trigger/r newfile' -e '/trigger/d' org_file
line 1
line 2
line 3
line 4
line 5
line 6
line 7
Here is one way to do it with awk
awk 'FNR==NR {a[NR]=$0;f++;next} /trigger/ {for (i=1;i<=f;i++) print a[i];next}1' newdata orgfile
It stores the newdata in an array a
When trigger is found in orgfile, replace it by all data from array a
If you need to change a line and know the line number change /trigger/ to FNR==20
A variant of sat's answer with sed, which does not need to match trigger twice:
sed '/trigger/{
r newfile.txt
d
}' org_file.txt
Or with fewer line breaks:
sed '/trigger/{r newfile.txt
d}' org_file.txt
There's a drawback: it's not a one-liner, as the r command interprets everything after it as a filename (more details here).
A workaround was provided by Peter.O (see above link):
sed -f <(sed 's/\\n/\n/'<<<'/trigger/{r newfile.txt\nd}') org_file.txt
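Another way to stay on one line, which should work at least with GNU sed, is to split the script across several -e options: each -e fragment ends as if by a newline, so the filename after r is terminated cleanly. The sketch below wraps this in a hypothetical function, replace_line_with_file, and assumes the pattern contains no slash and the filename contains no newline.

```shell
# replace_line_with_file PATTERN NEWFILE ORGFILE  (hypothetical helper)
# Prints ORGFILE with every line matching PATTERN replaced by the content of NEWFILE.
replace_line_with_file() {
    sed -e "/$1/{r $2" -e 'd' -e '}' "$3"
}
```

For the example above this would be called as replace_line_with_file trigger newfile.txt org_file.txt.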

Delete line and update order

I have a file that contains lines starting with a number, for example
1 This is the first line
2 this is the second line
3 this is the third line
4 This is the fourth line
What I want to do is delete a line, for example line 2, and update the numbering so the file looks like the following. I want to do this in a bash script.
1 This is the first line
2 this is the third line
3 This is the fourth line
Thanks
IMO it might be a little easier with awk:
awk '!/regex/ {$1=++x; print}' inputFile
In the /.../ you can put the regex that occurs on the line that needs to be deleted.
Test:
$ cat inputFile
1 This is the first line
2 this is the second line
3 this is the third line
4 This is the fourth line
$ awk '!/second/ {$1=++x; print}' inputFile
1 This is the first line
2 this is the third line
3 This is the fourth line
$ awk '!/third/ {$1=++x; print}' inputFile
1 This is the first line
2 this is the second line
3 This is the fourth line
$ awk '!/first/ {$1=++x; print}' inputFile
1 this is the second line
2 this is the third line
3 This is the fourth line
Note: Since we are reconstructing the $1 field, awk rebuilds the whole line with single spaces between fields, so any runs of white space will get collapsed.
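If the original spacing matters, one option is to rewrite only the leading number with sub() instead of assigning to $1, which leaves the rest of the line untouched. This is a sketch under the assumption that every line starts with its number; the function name delete_and_renumber is made up for illustration.

```shell
# delete_and_renumber REGEX FILE  (hypothetical helper)
# Drops lines matching REGEX and renumbers the rest without touching their spacing.
delete_and_renumber() {
    awk -v pat="$1" '
        $0 ~ pat { next }                 # drop the matching line(s)
        { sub(/^[0-9]+/, ++n); print }    # rewrite only the leading number
    ' "$2"
}
```

For example, delete_and_renumber second inputFile removes the second line and renumbers the remaining ones as 1, 2, 3.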
You can use this set of commands:
grep -v '^2 ' file | cut -d' ' -f2- | nl -w1 -s' '
grep with the -v option removes line 2.
cut strips the first column, which is the old line number.
Finally, nl renumbers the lines.

Resources