sed print between two line patterns only if both patterns are found - bash

Suppose I have a file with:
Line 1
Line 2
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Line 8
Line 9
Start Line 10
Line 11
End Line 12
Line 13
Start line 14
Line 15
I want to use sed to print between the patterns only if both /Start/ and /End/ are found.
sed -n '/Start/,/End/p' works as expected if you know both markers are there and in the expected order, but if End is out of order or missing it just prints from Start to the end of the file (e.g., it prints lines 14 and 15 in the example).
I have tried:
sed -n '/Start/,/End/{H;}; /End/{x; p;}' file
Prints:
# blank line here...
Start Line 3
Line 4
Line 5
Line 6
End Line 7
End Line 7
Start Line 10
Line 11
End Line 12
which is close, but has two issues:
Unwanted leading blank line
End Line 7 printed twice
I am hoping for a result similar to
$ awk '/Start/{x=1} x{buf=buf$0"\n"} /End/{print buf; buf=""; x=0}' file
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Start Line 10
Line 11
End Line 12
(blank lines between the blocks not necessary...)

With GNU sed and sed from Solaris 11:
sed -n '/Start/{h;b;};H;/End/{g;p;}' file
Output:
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Start Line 10
Line 11
End Line 12
If Start is found, copy the current pattern space to hold space (h) and branch to the end of the script (b). For every other line, append the current pattern space to hold space (H). If End is found, copy hold space back to pattern space (g) and then print pattern space (p).
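For readability, here is the same h/H/g script spread over several lines with a comment per command. It is a sketch assuming GNU sed, and the function name print_complete_blocks is hypothetical. Ending b with a newline instead of a semicolon is the POSIX form, so this layout may also suit other seds, but it has only been checked against GNU sed.

```shell
# Hypothetical helper: print only the Start...End blocks that are complete.
# Sketch assuming GNU sed; reads the files given as arguments, or stdin.
print_complete_blocks() {
  sed -n '
    # A Start line overwrites the hold space (discarding any unfinished
    # block), then "b" with no label jumps to the end of the script.
    /Start/{
      h
      b
    }
    # Every other line is appended to the hold space.
    H
    # On an End line, copy the accumulated block back and print it.
    /End/{
      g
      p
    }
  ' "$@"
}
```

Usage: print_complete_blocks file. A trailing Start with no End is never printed, because nothing copies it back out of the hold space.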

GNU sed: after encountering Start, keep appending lines as long as we don't see End; once we do, print the pattern space and start over:
$ sed -n '/Start/{:a;N;/End/!ba;p}' infile
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Start Line 10
Line 11
End Line 12
Getting the newline between blocks is tricky. This would add one after each block, but results in an extra blank at the end:
$ sed -n '/Start/{:a;N;/End/!ba;s/$/\n/p}' infile
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Start Line 10
Line 11
End Line 12
[blank]
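One simple way around the extra blank, sketched here as an assumption rather than taken from the answers above, is to keep the s/$/\n/ version and let a second sed delete the last line of the combined output, which is exactly that trailing blank. GNU sed is assumed (a \n in the replacement is a GNU extension), and the function name is hypothetical.

```shell
# Hypothetical helper: blank line between complete blocks, none at the end.
# The first sed appends an empty line after every block; "sed '$d'" then
# deletes the final line of the whole stream, i.e. the last of those blanks.
blocks_with_separators() {
  sed -n '/Start/{:a;N;/End/!ba;s/$/\n/p}' "$@" | sed '$d'
}
```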

You can use this awk:
awk 'x{buf=buf ORS $0} /Start/{x=1; buf=$0} /End/{print buf; buf=""; x=0}' file
Start Line 3
Line 4
Line 5
Line 6
End Line 7
Start Line 10
Line 11
End Line 12
Here is a version that does the same with OS X (BSD) sed, based on Benjamin's sed command:
sed -n -e '/Start/{:a;' -e 'N;/End/!ba;' -e 'p;}' file

Personally, I prefer your awk solution, but this also works (each complete block is printed preceded by a blank line):
sed -n -e '/Start/,/End/H' -e '/End/{s/.*//; x; p}' input

Related

Remove lines with first column containing specific character in BASH

I have a file with four columns like this:
11 9929261 9929261 LOC101928008
11 99556214 100356220 CNTN5
11_JH159136v1_alt 193978 194908 OR8U9
I need a bash script to remove all lines that contain "_" in the first column.
Expected outcome would look like this:
11 9929261 9929261 LOC101928008
11 99556214 100356220 CNTN5
Even better would be if the script could keep the line and only the leading number of the first column; in other words, remove all characters from the first "_" onward in the first column. In that case the expected outcome would be:
11 9929261 9929261 LOC101928008
11 99556214 100356220 CNTN5
11 193978 194908 OR8U9
With awk, this can be done with something like:
awk '{split($1,a,"_");$1=a[1]}1' input_file
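The answer above covers the second variant; the first variant (dropping such lines entirely) is a one-pattern awk filter. Both are wrapped below in hypothetical functions as a sketch. Note that assigning to $1 makes awk rebuild the line with OFS (a single space by default), so runs of spaces or tabs between columns are collapsed.

```shell
# Variant 1 (hypothetical name): drop every line whose first field contains "_".
drop_underscore_lines() {
  awk '$1 !~ /_/' "$@"
}

# Variant 2 (the answer above): keep only the text before the first "_" in
# field 1. Assigning to $1 re-joins all fields with OFS, normalising spacing.
trim_first_field() {
  awk '{ split($1, a, "_"); $1 = a[1] } 1' "$@"
}
```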
You can try this:
sed "s/^\([0-9]*\).* \([0-9]*\) \([0-9]*\) \(.*\)/\1 \2 \3 \4/" < file.dat
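explanation (piece by piece):
1. s/^            use sed substitute, anchored at the beginning of the line
2. \([0-9]*\)     capture the leading number of the first column in \1
3. .*             skip the remaining characters of the first column, up to the next blank
4. (blank)        a blank
5. \([0-9]*\)     capture the second column (a number) in \2
6. (blank)        a blank
7. \([0-9]*\)     capture the third column (a number) in \3
8. (blank)        a blank
9. \(.*\)         capture the rest of the line in \4
10. /\1 \2 \3 \4/ write \1 to \4 back, separated by blanks
Because .* is greedy, step 3 stops at the first blank only because three more blanks must follow, so this relies on the four-column layout shown above.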
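A shorter substitution that touches only the first field is sketched below. It is an alternative not taken from the answers above, and it assumes the columns are separated by spaces (not tabs). Since [^_ ]* and [^ ]* cannot cross a space, later columns are never modified, and unlike the awk version the original spacing is preserved. The function name is hypothetical.

```shell
# Hypothetical helper: delete from the first "_" to the end of field 1.
# \1 keeps the characters of field 1 before the "_"; [^ ]* eats the rest
# of that field only, because it stops at the first space.
trim_first_field_sed() {
  sed 's/^\([^_ ]*\)_[^ ]*/\1/' "$@"
}
```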

SED to spit out nth and (n+1)th lines

EDITS: For reference, "stuff" is a general variable, as is "KEEP".
KEEP could be "Hi, my name is Dave" on line 2 and "I love pie" on line 7. The numbers I've put here are for illustration only and DO NOT show up in the data.
I had a file that needed to be parsed, keeping every 4th line, starting at the 3rd line. In other words, it looked like this:
1 stuff
2 stuff
3 KEEP
4
5 stuff
6 stuff
7 KEEP
8 stuff etc...
Great, sed solved that easily with:
sed -n -e 3~4p myfile
giving me
3 KEEP
7 KEEP
11 KEEP
Now I have a different file format and a different take on the pattern:
1 stuff
2 KEEP
3 KEEP
4
5 stuff
6 KEEP
7 KEEP etc...
and I still want the output of
2 KEEP
3 KEEP
6 KEEP
7 KEEP
10 KEEP
11 KEEP
Here's the problem - this is a multi-pattern "pattern" for sed. It's "every 4th line, spit out 2 lines, but start at line 2".
Do I need to have some sort of DO/FOR loop in my sed, or do I need a different command like awk or grep? Thus far, I have tried formats like:
sed -n -e '3~4p;4~4p' myfile
and
awk 'NR % 3 == 0 || NR % 4 ==0' myfile
and
sed -n -e '3~1p;4~4p' myfile
and
awk 'NR % 1 == 0 || NR % 4 ==0' myfile
source: https://superuser.com/questions/396536/how-to-keep-only-every-nth-line-of-a-file
If your intent is to print lines 2,3 then every fourth line after those two, you can do:
$ seq 20 | awk 'BEGIN{e[2];e[3]} (NR%4) in e'
2
3
6
7
10
11
14
15
18
19
You were pretty close with your sed:
$ printf '%s\n' {1..12} | sed -n '2~4p;3~4p'
2
3
6
7
10
11
This is the idiomatic way to write it in awk:
$ awk 'NR%4==2 || NR%4==3' file
However, this special case can be shortened to:
$ awk 'NR%4>1' file
This might work for you (GNU sed):
sed '2~4,+1p;d' file
Use a range: the first address gives the starting line and the step (here 2~4, a GNU extension meaning line 2 and every 4th line after it). The second address says how many lines following the start of the range to include (here +1). Print these lines and delete all others.
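In the generic case, you want to keep lines p to p+q, p+n to p+q+n, p+2n to p+q+2n, and so on. So you can write (passing the values with -v, since unset awk variables are 0, and skipping lines before p, where the remainder would be negative and would wrongly pass the test):
awk -v p=2 -v q=1 -v n=4 'NR >= p && (NR - p) % n <= q' file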

Process a line based on lines before and after in bash

I am trying to figure out how to write a bash script that uses the lines immediately before and after a line as a condition. I will give an example in Python-like pseudocode that makes sense to me.
Basically:
for line in FILE:
    if line_minus_1 == line_plus_one:
        line = line_minus_1
What would be the best way to do this?
So if I have an input file that reads:
3
1
1
1
2
2
1
2
1
1
1
2
2
1
2
my output would be:
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
Notice that it starts from the first line until the last line and respects changes made in earlier lines so if I have:
2
1
2
1
2
2
I would get:
2
2
2
2
2
2
and not:
2
1
1
1
2
2
$ awk 'minus2==$0{minus1=$0} NR>1{print minus1} {minus2=minus1; minus1=$0} END{print minus1}' file
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
How it works
minus2==$0{minus1=$0}
If the line from 2 lines ago is the same as the current line, then set the line from 1 line ago equal to the current line.
NR>1{print minus1}
If we are past the first line, then print the line from 1 line ago.
minus2=minus1; minus1=$0
Update the variables.
END{print minus1}
After we have finished reading the file, print the last line.
Multiple line version
For those who like their code spread over multiple lines:
awk '
minus2==$0{
    minus1=$0
}
NR>1{
    print minus1
}
{
    minus2=minus1
    minus1=$0
}
END{
    print minus1
}
' file
Here is a (GNU) sed solution:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
This works with a moving three line window. A bit more readable:
sed -r ' # -r for extended regular expressions: () instead of \(\)
1N # On first line, append second line to pattern space
N # On all lines, append third line to pattern space
/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/ # See below
P # Print first line of pattern space
D # Delete first line of pattern space
' infile
N;P;D is the idiomatic way to get a moving two line window: append a line, print first line, delete first line of pattern space. To get a moving three line window, we read an additional line, but only once, namely when processing the first line (1N).
The complicated bit is checking if the first and third line of the pattern space are identical, and if they are, replacing the second line with the first line. To check if we have to make the substitution, we use the address
/^(.*)\n.*\n\1$/
The anchors ^ and $ are not really required, as we'll always have exactly two newlines in the pattern space, but they make it clearer that we want to match the complete pattern space. We put the first line into a capture group and see if it is repeated on the third line by using a backreference.
Then, if this is the case, we perform the substitution
s/^(.*\n).*\n/\1\1/
This captures the first line including the newline, matches the second line including the newline, and substitutes with twice the first line. P and D then print and remove the first line.
When reaching the end, the whole pattern space is printed so we're not swallowing any lines.
This also works with the second input example:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile2
2
2
2
2
2
2
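To use this with BSD sed (as found in OS X), you'd either have to use the -E option instead of -r, or use no option, i.e., basic regular expressions, and escape all parentheses (\(\)) in the capture groups. The newline matching should work, but I didn't test it. Also note that the last lines are only kept because GNU sed prints the pattern space when N runs out of input; according to the GNU sed manual, most other versions of sed exit without printing in that case, so the final lines may be lost there. If in doubt, check this great answer laying out all the differences.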
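The two-line N;P;D window mentioned in this answer can be seen in isolation in the classic sed idiom that squeezes runs of identical adjacent lines (like uniq); it is used here only as an illustration, and the function name is hypothetical. The $!N guard skips N on the last line, so seds that quit when N hits end of input still print that line.

```shell
# Hypothetical helper: print a line only if the next line differs from it.
# Pattern space holds "current\nnext"; P prints "current" unless both halves
# match; D drops "current" and restarts the cycle with "next" still loaded.
squeeze_repeats() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}
```

For example, the input a, a, b, a yields a, b, a.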

how to search and replace content of a file ,starting from a specific line number in bash

I have the file below.
$ cat testfile
line 1
line 2
line 3
line 4
line 5
line 6
$
I need to replace every occurrence of the string 'line' with 'LINE' from line 2 to the end of the file. I tried this:
$ sed '2 s/line/LINE/g' testfile
line 1
LINE 2
line 3
line 4
line 5
line 6
$
But my required output is :
line 1
LINE 2
LINE 3
LINE 4
LINE 5
LINE 6
$
How can I achieve this with the sed command alone?
Try this:
# sed '2,$ s/line/LINE/g' /tmp/testfile
line 1
LINE 2
LINE 3
LINE 4
LINE 5
LINE 6
2,$ denotes from second line to end.
You are looking for:
sed '2,$s/line/LINE/g' file
I suggest reading the man/info page of sed, specifically the "addresses" part.
see: https://www.gnu.org/software/sed/manual/html_node/Addresses.html
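If the start line comes from a shell variable, the script has to be double-quoted, and then the $ that means "last line" must be escaped as \$, otherwise the shell would expand $s as a variable. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: upcase_from_line START [FILE...]
# Replaces every "line" with "LINE" from line START to the last line.
upcase_from_line() {
  start=$1
  shift
  # "\$" reaches sed as "$" (last line); "$start" is expanded by the shell.
  sed "${start},\$s/line/LINE/g" "$@"
}
```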
sed '2,$ s/line/LINE/' testfile
If you want to change the file in place (sed's -i option), I recommend ed instead of sed:
echo -e '2,$ s/line/LINE/\nw' | ed -sv testfile

head command to skip last few lines of file on MAC OSX

I want to output all lines of a file except the last 4, in Terminal.
According to the UNIX man page, the following could be a solution:
head -n -4 main.m
MAN Page:
-n, --lines=[-]N
    print the first N lines instead of the first 10; with the leading
    '-', print all but the last N lines of each file
I read the man page here: http://unixhelp.ed.ac.uk/CGI/man-cgi?head
But on Mac OS X I get the following error:
head: illegal line count -- -4
What else can be done to achieve this goal?
The GNU version of head supports negative counts. On OS X you can install it from Homebrew's coreutils package, where it is named ghead:
brew install coreutils
ghead -n -4 main.m
Or use awk, for example:
$ cat file
line 1
line 2
line 3
line 4
line 5
line 6
$ awk 'n>=4 { print a[n%4] } { a[n%4]=$0; n=n+1 }' file
line 1
line 2
$
It can be simplified to awk 'n>=4 { print a[n%4] } { a[n++%4]=$0 }' but I'm not sure if all awk implementations support it.
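The same ring buffer can be generalised to drop any number of trailing lines by passing the count with -v. This is a minimal sketch with a hypothetical function name; n must be at least 1 (n=0 would mean a modulo by zero). Each line is stored in slot NR % n, and once more than n lines have been read, the slot about to be overwritten holds the line from n lines back, which is guaranteed not to be among the last n.

```shell
# Hypothetical helper: drop_last_lines N [FILE...]  (N >= 1)
drop_last_lines() {
  n=$1
  shift
  # Print the slot's old content before overwriting it with the current line.
  awk -v n="$n" 'NR > n { print buf[NR % n] } { buf[NR % n] = $0 }' "$@"
}
```

For example, drop_last_lines 4 main.m behaves like ghead -n -4 main.m.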
A Python one-liner:
$ cat foo
line 1
line 2
line 3
line 4
line 5
line 6
$ python -c "import sys; sys.stdout.writelines(sys.stdin.readlines()[:-4])" < foo
line 1
line 2
