Delete the lines from file between pattern match - shell

How can I delete all the lines between two patterns in a file using sed?
Here the patterns are //test and //endtest. File content:
blah blah blah
c
f
f
[
]
//test
all text to be deleted
line1
line2
xyz
amv
{
//endtest
l
dsf
dsfs
Expected result:
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs

This is a common use of sed range addresses:
sed '/^\/\/test$/,/^\/\/endtest/d'
Because / delimits the regex, each literal / inside the regex has to be escaped.
If you want to keep the marker lines (as requested):
sed '/^\/\/test$/,/^\/\/endtest/{//!d}'
Explanation:
Have a look at info sed and search for sed addresses -> Regexp Addresses and Range Addresses.
Inside { ... }, the empty regex // stands for whichever range boundary was matched last. From the manual:
The empty regular expression '//' repeats the last regular
expression match (the same holds if the empty regular expression is
passed to the 's' command).
! means not and d deletes the line, so //!d deletes every line in the range except the two marker lines.
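As a quick check of the two commands above, here is a minimal sketch that runs them on a shortened sample. The temporary file and the variable names (sample, without_marks, with_marks) are made up for the illustration, and GNU sed is assumed because of the {//!d} form.

```shell
# Hypothetical sample: lines to keep, the two markers, and two lines to drop.
sample=$(mktemp)
printf '%s\n' 'blah' '//test' 'line1' 'line2' '//endtest' 'dsf' > "$sample"

# Plain range delete: the marker lines are removed too.
without_marks=$(sed '/^\/\/test$/,/^\/\/endtest/d' "$sample")

# Inside the range, delete only the lines that do NOT match the last-used
# regex, so both marker lines survive (GNU sed assumed).
with_marks=$(sed '/^\/\/test$/,/^\/\/endtest/{//!d}' "$sample")

printf '%s\n' "$without_marks"
echo '---'
printf '%s\n' "$with_marks"
rm -f "$sample"
```

The first result should contain only blah and dsf; the second should also keep //test and //endtest.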
Alternative: You could write:
sed '/^\/\/\(end\)\?test$/,//{//!d}'
or
sed -E '/^\/\/(end)?test$/,//{//!d}'
These work the same way, but take care: the effect can be reversed if a stray //endtest appears before the first opening //test, because the range would then start at that //endtest.
All of the above was tested with GNU sed 4.4.
Under macOS, BSD sed
Under macOS, I successfully dropped the wanted lines with this syntax:
sed '/^\/\/test$/,/^\/\/endtest/{/^\/\/\(end\)\{0,1\}test$/!d;}'
or
sed -E '/^\/\/test$/,/^\/\/endtest/{/^\/\/(end)?test$/!d;}'

With awk:
$ awk '/\/\/endtest/{p=0} !p; /\/\/test/{p = 1}' file
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
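The one-liner above depends on the order of its three rules. The sketch below spells out the same logic with comments; the flag name skip and the shortened sample data are illustrative only.

```shell
# Hypothetical sample file mirroring the structure of the question.
sample=$(mktemp)
printf '%s\n' 'blah' '//test' 'line1' 'line2' '//endtest' 'dsf' > "$sample"

kept=$(awk '
  /\/\/endtest/ { skip = 0 }   # closing marker: stop skipping before it is printed
  !skip                        # print every line while not skipping
  /\/\/test/    { skip = 1 }   # opening marker: start skipping after it is printed
' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

Because the flag is cleared before the print rule and set after it, both marker lines are printed while everything between them is dropped.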

If your data is in a file named d, try GNU sed:
sed -E '/\/\/test/,/\/\/endtest/{/\/\/.*test/!d}' d

Related

sed/awk between two patterns in a file: pattern 1 set by a variable from lines of a second file; pattern 2 designated by a specified character

I have two files. One file contains a pattern that I want to match in a second file. I want to use that pattern to print between that pattern (included) up to a specified character (not included) and then concatenate into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But have had no such luck. I have played around with grep/fgrep awk and sed and between the three cannot seem to get the right (or any output). Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading in the first file, this creates a key in array a for every line in file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
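The steps above can be tried end to end with the sample data from the question. In this sketch the temporary directory and the names ids, keep and selected are assumptions made for readability; the logic is the same as in the one-liner.

```shell
# Recreate the two sample files in a scratch directory.
work=$(mktemp -d)
printf '%s\n' a c d > "$work/file1"
printf '%s\n' '>a' MEEL '>b' MLPK '>c' MEHL '>d' MLWL '>e' MTNH > "$work/file2"

selected=$(awk -F'>' '
  FNR == NR { ids[$1]; next }       # first file: remember each id as an array key
  NF == 2   { keep = ($2 in ids) }  # header line: decide whether to keep this record
  keep                              # print header and following lines while keep is true
' "$work/file1" "$work/file2")

printf '%s\n' "$selected"
rm -rf "$work"
```

Both files are read exactly once, which is the main advantage over looping over the ids in the shell.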
A plain (GNU) sed solution. Each file is read only once. It is assumed that the strings in File_1 contain no characters that would need escaping in a sed regular expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression for the second call and stores it in the variable pat. The aim is to avoid re-reading the whole of File_2 for each line of File_1. It simply "slurps" File_1 and replaces the newline characters with | characters, so the sample File_1 becomes the string a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d in this example) matches (this is a GNU sed extension).
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
  read next line (from File_2) or quit on end-of-file
label_a:
  if line begins with `>` followed by one of the alternatives in `pat` then
    label_b:
      print the line
      read next line (from File_2) or quit on end-of-file
      if line begins with `>` goto label_a else goto label_b
  else goto begin
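To see the two sed calls working together, here is a small sketch on the sample data. GNU sed is assumed, the scratch directory and the variable selected are hypothetical, and a semicolon has been added before the closing brace purely as a precaution.

```shell
# Recreate the sample files from the question.
work=$(mktemp -d)
printf '%s\n' a c d > "$work/File_1"
printf '%s\n' '>a' MEEL '>b' MLPK '>c' MEHL '>d' MLWL '>e' MTNH > "$work/File_2"

# Step 1: slurp File_1 and join its lines into one alternation (a|c|d).
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' "$work/File_1")

# Step 2: print each matching header and the lines up to the next header.
selected=$(sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb;}" "$work/File_2")

printf '%s\n' "$pat"
printf '%s\n' "$selected"
rm -rf "$work"
```

Note that the regex only anchors the start of the header, so an id such as a would also select a header like >ab; anchoring the end as well would be needed for longer ids.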
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} excludes the lines that start with >, so the headers are never printed.
Then you might want to say something like:
while read id; do
sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 once for every id in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
This runs ed only once, not once per matching entry from file1. For example, if file1 contains the letters a to z, ed still runs once rather than 26 times.
Requires bash4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves each line of file1.txt as an element of the array to_match.
ed -s file2.txt Points ed at file2.txt. The -s flag suppresses the byte count that ed normally prints when it opens a file (the same number wc -c file reports).
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g Global: apply the command to every line that matches the pattern.
\( \) Capture group; the parentheses need escaping because ed only supports BRE (basic regular expressions).
^> Matches a > at the start of the line; ^ is the start-of-line anchor.
[ ] A bracket expression, which matches any single character listed inside it, in this case the contents of the array "${to_match[*]}". This means the approach only works reliably for single-character ids.
; Separates the two addresses; the second address is searched for starting from the line matched by the first.
/^>/ Matches the next line that starts with >.
-1 Go back one line from that match.
p Print the addressed range of lines.
q quit ed

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
As "abc" appears only once, I think I can just search for it and do the insertion.
How can I do this in a bash script, using something such as sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append 123 to the match (& in the replacement stands for the matched string "abc":").
If you want to allow spaces before and after the :, you can use:
sed 's/"abc" *: *"/&123/'
To replace every such occurrence on a line, add the g flag:
sed 's/"abc" *: *"/&123/g' File
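A short sketch of how & behaves, using three made-up input lines (with spaces, without spaces, and a non-matching key). The variable names input and output are only for the illustration.

```shell
# Hypothetical input: two lines that should change and one that should not.
input=$(printf '%s\n' '"abc":"def"' '"abc" : "def"' '"xyz":"def"')

# & in the replacement stands for the whole matched text, so 123 lands
# right after the opening quote of the value.
output=$(printf '%s\n' "$input" | sed 's/"abc" *: *"/&123/')

printf '%s\n' "$output"
```

Only the lines whose key is "abc" are changed, and any spaces around the colon are preserved because they are part of the match that & reproduces.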
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") matches :" and stores it in capture group 1
(.*) matches the rest of the line and stores it in capture group 2
In the replacement, \1123\2 puts 123 between the two groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
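In the sub() call, the second field ($2) is operated on; the pattern . matches its first character (the "), and in the replacement the matched portion (&) is followed by 123. Note that modifying $2 makes awk rebuild the record with the output field separator, a space by default, which is why the colon is missing from the output; passing -v OFS=: keeps it.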
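The effect of the output field separator on this approach can be seen in a minimal sketch. It uses an explicit block with a trailing 1 and the regex /"/ instead of ".", which is a small variation on the command above rather than the original answer; the variable names are hypothetical.

```shell
line='"abc":"def"'

# Default OFS (a space): assigning to $2 rebuilds the record and the colon is lost.
default_ofs=$(printf '%s\n' "$line" | awk -F: '{ sub(/"/, "&123", $2) } 1')

# Setting OFS to ":" makes awk rejoin the fields with the original separator.
colon_ofs=$(printf '%s\n' "$line" | awk -F: -v OFS=: '{ sub(/"/, "&123", $2) } 1')

printf '%s\n' "$default_ofs" "$colon_ofs"
```

The first result reproduces the output shown above, and the second gives the line the question asked for.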
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"

sed: Replacing a range of text with contents of a file

There are many examples here and elsewhere on the interwebs for using sed's 'r' to replace a pattern, but it does not seem to work on a range, but maybe I'm just not holding it right.
The following works as expected, deleting BEGIN PATTERN and replacing it with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/{ r /tmp/somefile d }" TARGET_FILE
This, however, only replaces END PATTERN with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/,/END PATTERN/ { r /tmp/somefile d }" TARGET_FILE
I suppose I could try perl or awk to do this as well, but it seems like sed should be able to do this.
I believe that this does what you want:
sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
Or:
sed -e '/BEGIN PATTERN/r somefile' -e '/BEGIN PATTERN/,/END PATTERN/d' file
How it works
/BEGIN PATTERN/r somefile
Whenever BEGIN PATTERN is found, this inserts the contents of somefile.
/BEGIN PATTERN/,/END PATTERN/d
Whenever we are in the range from a line with /BEGIN PATTERN/ to a line with /END PATTERN/, we delete (d) the contents of the pattern space.
Example
Let's consider these two test files:
$ cat file
prelude
BEGIN PATTERN
middle
END PATTERN
afterthought
and:
$ cat somefile
This is
New.
Our command produces:
$ sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
prelude
This is
New.
afterthought
This might work for you (GNU sed):
sed -e '/BEGIN PATTERN/,/END PATTERN/{/END PATTERN/!d;r somefile' -e 'd}' file
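Written out over several lines with comments, the same idea looks like the sketch below. The scratch directory and the variable replaced are hypothetical, and the file contents are the test files used earlier in this thread.

```shell
# Recreate the two test files in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'prelude' 'BEGIN PATTERN' 'middle' 'END PATTERN' 'afterthought' > "$work/file"
printf '%s\n' 'This is' 'New.' > "$work/somefile"

# Within the range: drop every line except the END line; on the END line,
# queue somefile for output and then drop the END line itself.
replaced=$(cd "$work" && sed '/BEGIN PATTERN/,/END PATTERN/{
  /END PATTERN/!d
  r somefile
  d
}' file)

printf '%s\n' "$replaced"
rm -rf "$work"
```

The file queued by r is written at the end of the cycle even though d empties the pattern space, which is what makes the replacement appear exactly once.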
John1024's answer works if BEGIN PATTERN and END PATTERN are different. If this is not the case, the following works:
sed $'/PATTERN/,/PATTERN/d; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
To preserve the pattern lines:
sed $'/PATTERN/,/PATTERN/ { /PATTERN/!d; }; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
This solution can yield false positives if the pattern is not paired, as potong pointed out.

sed not working as expected (trying to get value between two matches in a string)

I have a file (/tmp/test) that has the string "aaabbbccc" in it.
I want to extract "bbb" from the string with sed.
Doing this returns the entire string:
sed -n '/aaa/,/ccc/p' /tmp/test
I just want to return bbb from the string with sed (I am trying to learn sed so not interested in other solutions for this)
Sed works on a line basis, and a,b{action} runs action on every line from the first line matching a through the next line matching b. In your case
sed -n '/aaa/,/ccc/p'
starts printing whole lines when /aaa/ matches and stops after a later line matches /ccc/, which is not what you want.
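A small sketch makes the line-based behaviour visible. The two temporary files and the variable names are invented for the comparison; the first file mimics /tmp/test, the second spreads the markers over several lines.

```shell
one_line=$(mktemp)
many_lines=$(mktemp)
printf '%s\n' 'aaabbbccc' > "$one_line"
printf '%s\n' 'before' 'aaa' 'bbb' 'ccc' 'after' > "$many_lines"

# Everything sits on one line: the range selects that whole line.
from_one_line=$(sed -n '/aaa/,/ccc/p' "$one_line")

# Markers on separate lines: the range selects the lines from aaa through ccc.
from_many_lines=$(sed -n '/aaa/,/ccc/p' "$many_lines")

printf '%s\n' "$from_one_line"
echo '---'
printf '%s\n' "$from_many_lines"
rm -f "$one_line" "$many_lines"
```

In both cases whole lines are printed; a range address never selects part of a line.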
To manipulate a line there are multiple options; one is s/search/replace/, which can be used to remove the leading aaa and trailing ccc:
% sed 's/^aaa\|ccc$//g' /tmp/test
bbb
Breakdown:
s/
^aaa   # Match literal aaa at the beginning of the line
\|     # ... or ...
ccc$   # Match literal ccc at the end of the line
//     # Replace with nothing
g      # Global: replace every match on the line (without g, only the
       # first match is replaced)
If you are not sure how many a's and c's you have you can use:
% sed 's/^aa*\|cc*$//g' /tmp/test
bbb
This matches a literal a followed by zero or more a's at the beginning of the line, and likewise for the c's at the end of the line.
With GNU sed:
sed 's/aaa\(.*\)ccc/\1/' /tmp/test
Output:
bbb
See: The Stack Overflow Regular Expressions FAQ

How do I insert a newline/linebreak after a line using sed

It took me a while to figure out how to do this, so posting in case anyone else is looking for the same.
For adding a newline after a pattern, you can also say:
sed '/pattern/{G;}' filename
Quoting GNU sed manual:
G
Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space.
EDIT:
Incidentally, this happens to be covered in sed one liners:
# insert a blank line below every line which matches "regex"
sed '/regex/G'
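A minimal sketch of G in action, on three invented input lines. It assumes nothing has been stored in the hold space, which is the default state.

```shell
# The hold space is empty by default, so G appends just a newline and
# the matching line is followed by one blank line.
spaced=$(printf '%s\n' 'one' 'pattern here' 'two' | sed '/pattern/G')

printf '%s\n' "$spaced"
```

If an earlier command (h or H) had put text into the hold space, that text would be appended as well, so this shortcut only gives a blank line in scripts that leave the hold space untouched.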
This sed command (BSD/macOS syntax, where -i takes an explicit, here empty, backup suffix):
sed -i '' '/pid = run/ a\
\
' file.txt
Finds the line with: pid = run
file.txt before
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
; Error log file
and adds a linebreak after that line inside file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
; Error log file
Or if you want to add text and a linebreak:
sed -i '' '/pid = run/ a\
new line of text\
' file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
new line of text
; Error log file
A simple substitution works well:
sed 's/pattern.*$/&\n/'
Example :
$ printf "Hi\nBye\n" | sed 's/H.*$/&\nJohn/'
Hi
John
Bye
To be standards-compliant, replace \n with a backslash followed by a newline:
$ printf "Hi\nBye\n" | sed 's/H.*$/&\
> John/'
Hi
John
Bye
sed '/pattern/a\\r' filename
This adds a return after each line that matches the pattern, whereas g would replace the matching line with a blank line.
If a new (blank) line has to be added at the end of the file, use this:
sed '$a\\r' filename
Another possibility, e.g. if the hold space is not empty, could be:
sed '/pattern/{p;s/.*//}' file
Explanation:
/pattern/{...} = apply the sequence of commands if the line matches pattern,
p = print the current line,
; = separator between commands,
s/.*// = replace the whole pattern space with nothing,
then the now-empty pattern space is printed automatically as an additional line.
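The sequence described above can be checked with a short sketch on invented input. A semicolon is added before the closing brace, which is an assumption made for portability to non-GNU sed and not part of the original command.

```shell
# p prints the matching line; s/.*// then empties the pattern space,
# and the automatic print at the end of the cycle emits the blank line.
spaced=$(printf '%s\n' 'one' 'pattern here' 'two' | sed '/pattern/{p;s/.*//;}')

printf '%s\n' "$spaced"
```

Unlike G, this does not touch or depend on the hold space, so it is safe inside scripts that use the hold space for something else.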
The easiest option -->
sed 'i\
' filename

Resources