How can I remove a pattern from the beginning of lines between two words using sed or awk - bash

I want to remove a pattern at the beginning of each line of a paragraph whose first line contains Word1 and whose last line contains word2. For example, given the following file, I want to replace --MW with nothing:
--MW Word1 this is paragraph number 1
--MW aaa
--MW bbb
--MW ccc
--MW word2
I want to get this result:
Word1 this is paragraph number 1
aaa
bbb
ccc
word2
Thanks in advance

Using sed
sed '/Word1/,/word2/s/^--MW //' file
Using awk
awk '/Word1/,/word2/{sub(/^--MW /,"")}1' file
Both act on the lines between and including the matched phrases and do a substitution on each of those lines. They print all lines.
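To see that the range address leaves lines outside the Word1..word2 block alone, here is a small sketch. The sample input, with its extra "--MW before" and "--MW after" lines, is made up for illustration and is not from the question.

```shell
# Made-up sample: the first and last lines lie outside the Word1..word2 range.
input='--MW before
--MW Word1 this is paragraph number 1
--MW aaa
--MW word2
--MW after'

# Strip a leading "--MW " only on lines inside the range (endpoints included).
result=$(printf '%s\n' "$input" | sed '/Word1/,/word2/s/^--MW //')
printf '%s\n' "$result"
```

Only the three lines inside the range should lose their prefix; the "before" and "after" lines are printed unchanged.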

If you have your text in myfile.txt you could try:
awk '$2=="Word1"{f=1} {strip=f} $2=="word2"{f=0} strip{sub(/^--MW /,"")} 1' myfile.txt

If you are sure the pattern is going to be in the beginning of the line, then this command might help:
sed 's/^--MW //' file.txt
Please test and let us know if this worked fine with you.

Hopefully, this will do it for you:
$ echo "--MW Word1 this is paragraph number 1" | cut -d ' ' -f 2-
This passes the text to the cut command, which removes the first token (using a space as the separator) and keeps the rest, i.e., from the second token to the end.

Related

Bash; Replacing new line with ", " and ending with ".", can someone explain awk and sed, please?

so let's say i have
aaa
bbb
ccc
ddd
and i need to replace all new lines by comma and space, and end with a dot like so
aaa, bbb, ccc, ddd.
I found this, but i can't understand it at all and i need to know what every single character does ;
... | awk '{ printf $0", " }' | sed 's/.\{2\}$/./'
Can someone make those two commands human-readable ?
tysm!
About the command:
... | awk '{ printf $0", " }' | sed 's/.\{2\}$/./'
Awk prints each line ($0) followed by ", " and no newline. When this is done, you have a trailing ", " at the end.
Then the pipe to sed replaces that trailing ", " with a single dot, as the part .\{2\}$ matches any two characters at the end of the string.
With sed using a single command, you can read all lines using N to pull the next line into the pattern space, and use a label to keep looping as long as the current line is not the last line. Then every newline is replaced with ", ".
After that you can append a dot to the end.
sed ':a;N;$!ba;s/\n/, /g;s/$/./' file
Output
aaa, bbb, ccc, ddd.
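Since the question asks what every character does, here is the same sed script written one command per line with comments. This is only a sketch for reading purposes; the sample input is fed in with printf rather than read from a file.

```shell
# The same script, one command per line; lines starting with # are sed comments.
joined=$(printf '%s\n' aaa bbb ccc ddd | sed '
# define a label named a
:a
# append the next input line to the pattern space, separated by a newline
N
# $! means "not on the last line": if so, branch back to label a
$!ba
# the pattern space now holds every line: turn each newline into ", "
s/\n/, /g
# append a dot at the end
s/$/./
')
printf '%s\n' "$joined"
```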
ok,
first of all; thank u.
I do now understand that printf $0", " means 'print every line, and ", " at the end of each'
as for the sed command, a colleague explained it to me a minute ago;
in 's/.\{2\}$/./',
s/ replace
. any character
{2} x2, so two characters
$ at end of the line
/ by ( 's/ / /' = replace/ this / that /)
. the character '.'
/ end
without forgetting to escape { and }, so u end up with
's/ . \{2\} $ / . /'
but wait, it gets even better;
my colleague also told me that \{2\} wasn't necessary in this case ;
.{2} (without the escapes) could simply be replaced by
.. 'any character' twice.
so 's/..$/./' which is way more readable i think
'replace/ whichever two last characters / this character/'
hope this helps if any other 42 student gets here
tysm again
awk '{ printf $0", " }'
This is an awk command with a single action, enclosed in {...}; the action is applied to every line of input.
printf is print with a format; here no formatting takes place, but another feature of printf is leveraged: printf does not append the output record separator (default: newline), as opposed to print.
$0 denotes the whole current line (sans trailing newline).
", " is a string literal for a comma followed by a space.
$0", " instructs awk to concatenate the current line with the comma and space.
The whole command thus might be read as: for every line, output the current line followed by a comma and a space
sed 's/.\{2\}$/./'
s is one of sed's commands, namely substitute; its basic form is
s/regexp/replacement/
therefore
.\{2\}$ is the regular expression: . denotes any single character, \{2\} means repeated 2 times, $ denotes end of line. Thus it matches the last 2 characters of each line; as the text was already converted to a single line, it matches the last 2 characters of the whole text.
. is the replacement, a literal dot character
The whole command thus might be read as: for every line, replace the last 2 characters with a dot
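One caveat with printf $0", ": the line itself is used as the format string, so input containing a % would be misread. A minimal sketch that passes an explicit "%s" format and does the whole job in awk, so that no trailing ", " has to be trimmed afterwards. The variable name sep and the "100%" sample line are made up for illustration.

```shell
joined=$(printf '%s\n' aaa bbb ccc '100%' | awk '
  { printf "%s%s", sep, $0; sep = ", " }   # sep is empty for the first line, ", " after that
  END { print "." }                          # final dot plus a newline
')
printf '%s\n' "$joined"
```

Writing the separator before each line instead of after it means the last line is never followed by a stray ", ".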
Assuming the four lines are in a file...
#!/bin/sh
cat << 'EOF' > ed1
%s/$/,/g
$
s/,/./
wq
EOF
ed -s file < ed1
cat file | tr '\n' ' ' > f2
mv f2 file
rm -v ./ed1
echo 'aaa
bbb
ccc
ddd' |
mawk NF+=RS FS='\412' RS= OFS='\40\454' ORS='\456\12'
aaa, bbb, ccc, ddd.

How do you display all the words that start with 1 uppercase letter using grep in bash?

I tried something like this
sed 's/^[ \t]*//' file.txt | grep "^[A-Z].* "
but it will show only the lines that start with words starting with an uppercase.
file.txt content:
Something1 something2
word1 Word2
this is lower
The output will be Something1 something2, but I would like it to also show the second line, because it also has a word that starts with an uppercase letter.
With GNU grep, grep -P "[A-Z]+\w*" file.txt will work for this input, although it matches an uppercase letter anywhere in a word, not only at the start. As @Shawn said in the comment below, grep -P '\b[A-Z]' file.txt restricts the match to the start of a word. If you only want the words, and not the entire line, grep -Po '\b[A-Z]\w*' file.txt will give you the individual words.
With GNU grep, you can use
grep '\<[[:upper:]]' file
grep '\b[[:upper:]]' file
NOTE:
\< - a leading word boundary (\b is a word boundary)
[[:upper:]] - any uppercase letter.
See the online demo:
#!/bin/bash
s='Something1 something2
word1 Word2
this is lower
папа Петя'
grep '\<[[:upper:]]' <<< "$s"
Output:
Something1 something2
word1 Word2
папа Петя
How do you display all the words
That's simple:
grep -wo '[A-Z]\w*'
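For comparison, a sketch that prints only the capitalised words from the sample file.txt content, assuming a grep that supports -o and the \< word-start anchor (GNU grep does; neither is required by POSIX). It spells out \w as [[:alnum:]_] and uses [[:upper:]] instead of [A-Z].

```shell
text='Something1 something2
word1 Word2
this is lower'

# -o prints each match on its own line; \< anchors the match at the start of a word.
words=$(printf '%s\n' "$text" | grep -o '\<[[:upper:]][[:alnum:]_]*')
printf '%s\n' "$words"
```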

extract the data between two pattern and save it with different name [duplicate]

Using awk or sed how can I select lines which are occurring between two different marker patterns? There may be multiple sections marked with these patterns.
For example:
Suppose the file contains:
abc
def1
ghi1
jkl1
mno
abc
def2
ghi2
jkl2
mno
pqr
stu
And the starting pattern is abc and ending pattern is mno
So, I need the output as:
def1
ghi1
jkl1
def2
ghi2
jkl2
I am using sed to match the pattern once:
sed -e '1,/abc/d' -e '/mno/,$d' <FILE>
Is there any way in sed or awk to do it repeatedly until the end of file?
Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
/abc/ matches lines containing this text, and /mno/ does likewise.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag when the text mno is found.
The final flag is a pattern with the default action, which is to print $0: if flag equals 1, the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.
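As a quick sketch of the two cases (markers dropped versus markers kept), the only difference is where the flag is set and cleared relative to the print. The shortened sample and the variable names inner and outer are made up for illustration.

```shell
sample='abc
def1
mno
pqr'

# Markers excluded: next skips the abc line, and the flag is cleared before the mno line reaches the print.
inner=$(printf '%s\n' "$sample" | awk '/abc/{flag=1;next} /mno/{flag=0} flag')

# Markers included: set the flag first, print, and clear it only after printing.
outer=$(printf '%s\n' "$sample" | awk '/abc/{flag=1} flag; /mno/{flag=0}')

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
```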
Using sed:
sed -n -e '/^abc$/,/^mno$/{ /^abc$/d; /^mno$/d; p; }'
The -n option means do not print by default.
The pattern looks for lines containing just abc to just mno, and then executes the actions in the { ... }. The first action deletes the abc line; the second the mno line; and the p prints the remaining lines. You can relax the regexes as required. Any lines outside the range of abc..mno are simply not printed.
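If the markers come from shell variables, the same script can be wrapped in a small function. This is a sketch only: the function name between is hypothetical, and it assumes the markers contain no slashes or regex metacharacters, since they are pasted into the sed script as-is.

```shell
# between START END [FILE...]: print the lines strictly between the two marker lines.
# Assumes the markers contain no regex metacharacters or slashes.
between() {
  start=$1 end=$2
  shift 2
  # \$ keeps a literal $ (end-of-line anchor) inside the double quotes.
  sed -n "/^$start\$/,/^$end\$/{ /^$start\$/d; /^$end\$/d; p; }" "$@"
}

out=$(printf '%s\n' abc def1 mno pqr abc def2 mno stu | between abc mno)
printf '%s\n' "$out"
```

With no file arguments the function reads standard input, so it can sit at the end of a pipeline as shown.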
This might work for you (GNU sed):
sed '/^abc$/,/^mno$/{//!b};d' file
This deletes all lines except those between the abc and mno marker lines; the marker lines themselves are deleted as well.
sed '/^abc$/,/^mno$/!d;//d' file
golfs two characters better than ppotong's {//!b};d
The empty forward slashes // mean "reuse the last regular expression used", and the command does the same as the more understandable:
sed '/^abc$/,/^mno$/!d;/^abc$/d;/^mno$/d' file
This seems to be POSIX:
If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.
From the previous response's links, the one that did it for me, running ksh on Solaris, was this:
sed '1,/firstmatch/d;/secondmatch/,$d'
1,/firstmatch/d: from line 1 until the first time you find firstmatch, delete.
/secondmatch/,$d: from the first occurrence of secondmatch until the end of file, delete.
Semicolon separates the two commands, which are executed in sequence.
something like this works for me:
file.awk:
BEGIN {
record=0
}
/^abc$/ {
record=1
}
/^mno$/ {
record=0;
print "s="s;
s=""
}
!/^(abc|mno)$/ {
if (record==1) {
s = s"\n"$0
}
}
using: awk -f file.awk data...
edit: O_o fedorqui solution is way better/prettier than mine.
Don_crissti's answer from Show only text between 2 matching pattern?
firstmatch="abc"
secondmatch="cdf"
sed "/$firstmatch/,/$secondmatch/!d;//d" infile
which is much more efficient than AWK's application, see here.
perl -lne 'print if((/abc/../mno/) && !(/abc/||/mno/))' your_file
I tried to use awk to print lines between two patterns where pattern2 also matches pattern1, and the pattern1 line should also be printed.
e.g.
source
package AAA
aaa
bbb
ccc
package BBB
ddd
eee
package CCC
fff
ggg
hhh
iii
package DDD
jjj
should produce this output:
package BBB
ddd
eee
Where pattern1 is package BBB, pattern2 is package \w*. Note that CCC isn't a known value so can't be literally matched.
In this case, neither @scai's awk '/abc/{a=1}/mno/{print;a=0}a' file nor @fedorqui's awk '/abc/{a=1} a; /mno/{a=0}' file works for me.
Finally, I managed to solve it with awk '/package BBB/{flag=1;print;next}/package \w*/{flag=0}flag' file, haha
A little more effort results in awk '/package BBB/{flag=1;print;next}flag;/package \w*/{flag=0}' file, which also prints the pattern2 line, that is,
package BBB
ddd
eee
package CCC
This can also be done with logical operations and increment/decrement operations on a flag:
awk '/mno/&&--f||f||/abc/&&f++' file

copy the last column in the first position awk

I want to copy the value of the last column to the first position and comment out the old value.
For example :
word1 word2 1233425 -----> 1233425 word1 word2 #1233425
word1 word2 word3 49586 -----> 49586 word1 word2 word3 #49586
I don't know the number of words preceding the number.
I tried with an awk script :
awk '{$1="";score=$NF;$NF="";print $score $0 #$score}' file
But It does not work.
What about this? It is pretty similar to yours.
$ awk '{score=$NF; $NF="#"$NF; print score, $0}' file
1233425 word1 word2 #1233425
49586 word1 word2 word3 #49586
Note that in your case you are emptying $1, which is not necessary. Just store score as you did and then add # to the beginning of $NF.
Using awk
awk '{f=$NF;$NF="#" $NF;print f,$0}' file
Since we posted the same answer, here is a shorter variation :)
awk '{$0=$NF FS$0;$NF="#"$NF}1' file
$0=$NF FS$0 prepends the last field (and a field separator) to the line
$NF="#"$NF adds # to the front of the last field
1 prints the line
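A small sketch of the store-then-prefix approach on the two sample rows. One side effect worth knowing: assigning to a field makes awk rebuild $0 with the output field separator, so runs of whitespace collapse to single spaces. The extra spaces in the second input row are added here only to show that.

```shell
# Extra spaces in the second sample line are deliberate.
out=$(printf '%s\n' 'word1 word2 1233425' 'word1   word2 word3 49586' |
  awk '{ score = $NF; $NF = "#" score; print score, $0 }')
printf '%s\n' "$out"
```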
A perl way to do it:
perl -pe 's/^(.+) (\d+)$/$2 $1 #$2/' infile
sed 's/\(.*\) \([^[:blank:]]\{1,\}\)/\2 \1 #\2/' YourFile
With GNU sed, add the --posix option.
