extract the data between two pattern and save it with different name [duplicate] - bash

Using awk or sed how can I select lines which are occurring between two different marker patterns? There may be multiple sections marked with these patterns.
For example:
Suppose the file contains:
abc
def1
ghi1
jkl1
mno
abc
def2
ghi2
jkl2
mno
pqr
stu
And the starting pattern is abc and ending pattern is mno
So, I need the output as:
def1
ghi1
jkl1
def2
ghi2
jkl2
I am using sed to match the pattern once:
sed -e '1,/abc/d' -e '/mno/,$d' <FILE>
Is there any way in sed or awk to do it repeatedly until the end of file?

Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
/abc/ matches lines having this text, as well as /mno/ does.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag when the text mno is found.
The final flag is a pattern with the default action, which is to print $0: if flag is equal 1 the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.

Using sed:
sed -n -e '/^abc$/,/^mno$/{ /^abc$/d; /^mno$/d; p; }'
The -n option means do not print by default.
The pattern looks for lines containing just abc to just mno, and then executes the actions in the { ... }. The first action deletes the abc line; the second the mno line; and the p prints the remaining lines. You can relax the regexes as required. Any lines outside the range of abc..mno are simply not printed.

This might work for you (GNU sed):
sed '/^abc$/,/^mno$/{//!b};d' file
Delete all lines except for those between lines starting abc and mno

sed '/^abc$/,/^mno$/!d;//d' file
golfs two characters better than ppotong's {//!b};d
The empty forward slashes // mean: "reuse the last regular expression used". and the command does the same as the more understandable:
sed '/^abc$/,/^mno$/!d;/^abc$/d;/^mno$/d' file
This seems to be POSIX:
If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.

From the previous response's links, the one that did it for me, running ksh on Solaris, was this:
sed '1,/firstmatch/d;/secondmatch/,$d'
1,/firstmatch/d: from line 1 until the first time you find firstmatch, delete.
/secondmatch/,$d: from the first occurrance of secondmatch until the end of file, delete.
Semicolon separates the two commands, which are executed in sequence.

something like this works for me:
file.awk:
BEGIN {
record=0
}
/^abc$/ {
record=1
}
/^mno$/ {
record=0;
print "s="s;
s=""
}
!/^abc|mno$/ {
if (record==1) {
s = s"\n"$0
}
}
using: awk -f file.awk data...
edit: O_o fedorqui solution is way better/prettier than mine.

Don_crissti's answer from Show only text between 2 matching pattern?
firstmatch="abc"
secondmatch="cdf"
sed "/$firstmatch/,/$secondmatch/!d;//d" infile
which is much more efficient than AWK's application, see here.

perl -lne 'print if((/abc/../mno/) && !(/abc/||/mno/))' your_file

I tried to use awk to print lines between two patterns while pattern2 also match pattern1. And the pattern1 line should also be printed.
e.g.
source
package AAA
aaa
bbb
ccc
package BBB
ddd
eee
package CCC
fff
ggg
hhh
iii
package DDD
jjj
should has an ouput of
package BBB
ddd
eee
Where pattern1 is package BBB, pattern2 is package \w*. Note that CCC isn't a known value so can't be literally matched.
In this case, neither #scai 's awk '/abc/{a=1}/mno/{print;a=0}a' file nor #fedorqui 's awk '/abc/{a=1} a; /mno/{a=0}' file works for me.
Finally, I managed to solve it by awk '/package BBB/{flag=1;print;next}/package \w*/{flag=0}flag' file, haha
A little more effort result in awk '/package BBB/{flag=1;print;next}flag;/package \w*/{flag=0}' file, to print pattern2 line also, that is,
package BBB
ddd
eee
package CCC

This can also be done with logical operations and increment/decrement operations on a flag:
awk '/mno/&&--f||f||/abc/&&f++' file

Related

Searching a string and replacing another string above the searched string

I have a file with the lines below
123
456
123
789
abc
efg
xyz
I need to search with abc and replace immediate above 123 with 111. This is the requirement, abc is only one occurrence in the file but 123 can be multiple occurrences and 123 can be at any position above abc.
Please help me.
I have tried with below sed command
sed -i.bak "/abc/!{x;1!p;d;};x;s/123/1111" filename
With the above command, it is only replacing 123, if 123 is just above abc, if 123 is 2 lines above abc then replace is failing.
There's more than on way to do it. Here's one:
sed -i.bak '1{h;d;};/123/{x;p;d;};/abc/{x;s/123/111/;p;d;};H;${x;p;};d' filename
ed comes in handy for complex editing of files in scripts:
ed -s file <<EOF
/^abc$/;?^123$?;.c
111
.
w
EOF
This: Sets the current line to the first one matching abc (/^abc$/;). Then changes the first line before that point that matches 123 to 111 (?XXX? searches backwards for a matching regular expression, and ?^123$?;. selects that single line for c to change) and finally saves the modified file.
This is a classic case where you keep track of your previous line and change stuff depeinding on conditions satisfying the current line. Genearlly, an awk program looks like this:
awk '(FNR==1){prev=$0; next}
(condition_on_$0) { action_on_prev }
{ print prev; prev = $0 }
END { print $0 }'
So in the case of the OP, this would read:
awk '(FNR==1){prev=$0; next}
$0 == "abc" { if (prev == "123") prev = "111" }
{ print prev; prev = $0 }
END { print $0 }'
This might work for you (GNU sed):
sed -Ez 's/(.*)(\n123.*\nabc)/\1\n111\2/' file
This slurps the file into memory and inserts 111 in front of the last occurrence of 123 before abc.
A less memory intensive solution:
sed -E '/^123$/{:a;N;/\n123$/{h;s///p;g;s/.*\n//;ba};/\nabc$/!ba;s/^/111\n/}' file
This gathers up lines following a line containing 123. If another line containing 123 is encountered it offloads all lines before it and begins gathering lines again. If it finds a line containing abc it inserts 111 at the front of the lines gathered so far.
Another alternative:
sed '/abc/{x;/./{s/^/111\n/p;z};x;b};/123/{x;/./p;x;h;$!d;b};x;/./{x;H;$!d};x' file
$ tac file | awk 'f && sub(/123/,"111"){f=0} /abc/{f=1} 1' | tac
123
456
111
789
abc
efg
xyz

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
Following simple awk may help you in same.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: Checking condition here if $3 3rd field is equal to string xxx then setting $3's value to string yyy. Then mentioning 1 there, since awk works on method of condition then action. I am making condition TRUE here by mentioning 1 here and NOT mentioning any action here so be default print of current line will happen(either with changed 3rd field or with new 3rd field).
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})apathy\>/\1empathy/' file
The output:
abc xyz mno
apathy xyz abc
abc xyz empathy
abc apathy empathy apathy
To modify the file inplace add -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified
lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}'file
should solve that.
Explanation
$3 is the the third space-delimited field in awk. If it matches "xxx", then it is replaced. Print the unmodified lines first while storing the modified lines in an array. At the end, print the modified lines. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable which denotes that number of records processed till the moment. Since it is used in the END block it should give us the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[#]}"
printf '\n'
done < input.txt

BASH: grep characters and replace by the same plus tab

Basically, the only thing I need is to replace two spaces by a tab; this is the query:
abc def ghi K00001 jkl
all the columns are separated by a tab; the K00001 jkl is separated by two spaces. But I want these two spaces to be replaced by a tab.
I cannot just grep all two spaces since other contents have to spaces and they should stay.
My approach would be to grep:
grep '[0-9][0-9][0-9][0-9][0-9] ' file
but I want to replace it to have the same K00001<TAB>jkl
How do I replace by the same string? Can I use variables to store the grep result and then print the modified (tab not spaces) by the same string?
sed -r "s/([A-Z][0-9]{5}) /&\t/" File
or
sed -r "s/([A-Z][0-9]{5})\s{2}/&\t/" File
Example :
AMD$ echo "abc def ghi K00001 jkl" | sed -r "s/([A-Z][0-9]{5}) /&\t/"
abc def ghi K00001 jkl
You can use this sed:
sed -E $'s/([^[:blank:]]) {2}([^[:blank:]])/\\1\t\\2/g' file
Regex ([^[:blank:]]) {2}([^[:blank:]]) makes sure to match 2 spaces surrounded by 2 non-space characters. In replacement we put back surrounding characters using back-references \1 and \2
I would use awk , since with awk no matter if fields are separated by one - two or more spaces i can force output to be with tabs:
$ echo "abc def ghi K00001 jkl" |awk -v OFS="\t" '{$1=$1}1'
abc def ghi K00001 jkl

how can I remove a pattern from the begining of lines between two words using sed or awk

I want to remove a pattern in the begining of each line of a paragraph that contains word1 in the first line and end with word2 for example if I have the following file and I want to subsitute --MW by nothing
--MW Word1 this is paragraph number 1
--MW aaa
--MW bbb
--MW ccc
--MW word2
I want to get as result :
Word1 this is paragraph number 1
aaa
bbb
ccc
word2
Thanks in advance
Using sed
sed '/Word1/,/word2/s/--MW //' file
Using awk
awk '/Word1/,/word2/{sub(/--MW /,a)}1' file
Both act on lines between and including the matched phrases and the do a substitution on each line. They print all lines.
If you have your text in myfile.txt you could try:
awk 'BEGIN{f=0}$2=="Word1"{f=1}{if (f==1) {$1="";print $0}else{print $0}}$2=="word2"{f=0}' myfile.txt
If you are sure the pattern is going to be in the beginning of the line, then this command might help:
sed 's/^--MW //' file.txt
Please test and let us know if this worked fine with you.
Hopefully, this will do it for you:
$ echo "--MW Word1 this is paragraph number 1" | cut -d ' ' -f 2-
Where you pass the text to cut command and remove the first token, using space as token separator, while keeping the rest of tokens,i.e., from second to the end.

How to select lines between two marker patterns which may occur multiple times with awk/sed

Using awk or sed how can I select lines which are occurring between two different marker patterns? There may be multiple sections marked with these patterns.
For example:
Suppose the file contains:
abc
def1
ghi1
jkl1
mno
abc
def2
ghi2
jkl2
mno
pqr
stu
And the starting pattern is abc and ending pattern is mno
So, I need the output as:
def1
ghi1
jkl1
def2
ghi2
jkl2
I am using sed to match the pattern once:
sed -e '1,/abc/d' -e '/mno/,$d' <FILE>
Is there any way in sed or awk to do it repeatedly until the end of file?
Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
/abc/ matches lines having this text, as well as /mno/ does.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag when the text mno is found.
The final flag is a pattern with the default action, which is to print $0: if flag is equal 1 the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.
Using sed:
sed -n -e '/^abc$/,/^mno$/{ /^abc$/d; /^mno$/d; p; }'
The -n option means do not print by default.
The pattern looks for lines containing just abc to just mno, and then executes the actions in the { ... }. The first action deletes the abc line; the second the mno line; and the p prints the remaining lines. You can relax the regexes as required. Any lines outside the range of abc..mno are simply not printed.
This might work for you (GNU sed):
sed '/^abc$/,/^mno$/{//!b};d' file
Delete all lines except for those between lines starting abc and mno
sed '/^abc$/,/^mno$/!d;//d' file
golfs two characters better than ppotong's {//!b};d
The empty forward slashes // mean: "reuse the last regular expression used". and the command does the same as the more understandable:
sed '/^abc$/,/^mno$/!d;/^abc$/d;/^mno$/d' file
This seems to be POSIX:
If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.
From the previous response's links, the one that did it for me, running ksh on Solaris, was this:
sed '1,/firstmatch/d;/secondmatch/,$d'
1,/firstmatch/d: from line 1 until the first time you find firstmatch, delete.
/secondmatch/,$d: from the first occurrance of secondmatch until the end of file, delete.
Semicolon separates the two commands, which are executed in sequence.
something like this works for me:
file.awk:
BEGIN {
record=0
}
/^abc$/ {
record=1
}
/^mno$/ {
record=0;
print "s="s;
s=""
}
!/^abc|mno$/ {
if (record==1) {
s = s"\n"$0
}
}
using: awk -f file.awk data...
edit: O_o fedorqui solution is way better/prettier than mine.
Don_crissti's answer from Show only text between 2 matching pattern?
firstmatch="abc"
secondmatch="cdf"
sed "/$firstmatch/,/$secondmatch/!d;//d" infile
which is much more efficient than AWK's application, see here.
perl -lne 'print if((/abc/../mno/) && !(/abc/||/mno/))' your_file
I tried to use awk to print lines between two patterns while pattern2 also match pattern1. And the pattern1 line should also be printed.
e.g.
source
package AAA
aaa
bbb
ccc
package BBB
ddd
eee
package CCC
fff
ggg
hhh
iii
package DDD
jjj
should has an ouput of
package BBB
ddd
eee
Where pattern1 is package BBB, pattern2 is package \w*. Note that CCC isn't a known value so can't be literally matched.
In this case, neither #scai 's awk '/abc/{a=1}/mno/{print;a=0}a' file nor #fedorqui 's awk '/abc/{a=1} a; /mno/{a=0}' file works for me.
Finally, I managed to solve it by awk '/package BBB/{flag=1;print;next}/package \w*/{flag=0}flag' file, haha
A little more effort result in awk '/package BBB/{flag=1;print;next}flag;/package \w*/{flag=0}' file, to print pattern2 line also, that is,
package BBB
ddd
eee
package CCC
This can also be done with logical operations and increment/decrement operations on a flag:
awk '/mno/&&--f||f||/abc/&&f++' file

Resources