BASH: grep characters and replace by the same plus tab - bash

Basically, the only thing I need is to replace two spaces by a tab; this is the query:
abc def ghi K00001 jkl
all the columns are separated by a tab; the K00001 jkl is separated by two spaces. But I want these two spaces to be replaced by a tab.
I cannot simply match every pair of spaces, since other content also has two spaces and those should stay.
My approach would be to grep:
grep '[0-9][0-9][0-9][0-9][0-9] ' file
but I want to replace it to have the same K00001<TAB>jkl
How do I replace by the same string? Can I use variables to store the grep result and then print the modified (tab not spaces) by the same string?

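Put back the captured ID with \1 rather than &, since & would reinsert the matched spaces as well:
sed -r "s/([A-Z][0-9]{5}) {2}/\1\t/" File
or
sed -r "s/([A-Z][0-9]{5})\s{2}/\1\t/" File
Example :
AMD$ echo "abc def ghi K00001  jkl" | sed -r "s/([A-Z][0-9]{5}) {2}/\1\t/"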
abc def ghi K00001 jkl

You can use this sed:
sed -E $'s/([^[:blank:]]) {2}([^[:blank:]])/\\1\t\\2/g' file
The regex ([^[:blank:]]) {2}([^[:blank:]]) makes sure to match exactly 2 spaces surrounded by 2 non-blank characters. In the replacement, the surrounding characters are put back using the back-references \1 and \2.
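A minimal sketch of the same substitution for shells without $'...' quoting, inserting a literal tab obtained from printf. The function name two_spaces_to_tab is made up for illustration, and the trailing sed -n l is only there to make tabs visible (it prints them as \t).

```shell
# Hypothetical helper: turn exactly two spaces that sit between two
# non-blank characters into a single tab. Reads stdin or the given files.
two_spaces_to_tab() {
  tab=$(printf '\t')   # literal tab character
  sed -E "s/([^[:blank:]]) {2}([^[:blank:]])/\\1${tab}\\2/g" "$@"
}

# Demo: tabs separate the first columns, two spaces precede jkl.
# "sed -n l" shows each tab as \t so the result can be inspected.
printf 'abc\tdef\tghi\tK00001  jkl\n' | two_spaces_to_tab | sed -n l
```

Runs of one or three spaces are left alone, because the pattern requires a non-blank character on both sides of exactly two spaces.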

I would use awk: no matter whether the fields are separated by one, two, or more spaces, I can force the output to be tab-separated. Note that this also turns any single space between fields into a tab:
$ echo "abc def ghi K00001 jkl" |awk -v OFS="\t" '{$1=$1}1'
abc def ghi K00001 jkl

Related

Sed to remove substring | Can I make a flexible pattern to remove numbers before tab?

I wanted to ask some advice on an issue that I'm having in removing a substring from a string. I have a file with many lines like the following:
DOG; CSQ| 0.1234 | abcd | \t CAT
where \t represents a literal tab.
My aim is to remove a substring by using sed 's/CSQ.*|//g' so that I can get the following output:
DOG; CAT
However I face a problem where all the rows aren't formatted the same. For example, I also get lines such as:
DOG; CSQ| 0.1234 | abcd | 0 \t CAT
DOG; CSQ| 0.1234 | abcd | 0.9187 \t CAT
My code fails at this point because instead of getting DOG; CAT for all lines, I get:
DOG; CAT
DOG; 0 CAT
DOG; 0.9187 CAT
I've searched for possible solutions but I'm having difficulty (I'm also quite new to bash). I imagine there's something that I can do with sed that will handle all cases, but I'm not sure.
You can find and replace all text from CSQ till the last | and all chars after that till the tab including it using
sed 's/CSQ.*|.*\t//' file > newfile
See the online demo.
The CSQ.*|.*\t is a POSIX BRE pattern that matches
CSQ - a CSQ string
.* - any text
| - a pipe char
.* - any text
\t - TAB char.
If the \t in your file is a literal two-character sequence (a backslash followed by t), double the backslash before t:
sed 's/CSQ.*|.*\\t//' file > newfile
See this online demo.
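As a rough sketch, here is the same idea wrapped in a function and run against the three line shapes from the question. The name strip_csq is hypothetical, and the tab is passed to sed as a literal character (via printf) on the assumption that your sed may not understand \t in a pattern.

```shell
# Hypothetical helper: delete everything from CSQ through the last tab on the line.
strip_csq() {
  tab=$(printf '\t')            # literal tab, so sed needs no \t support
  sed "s/CSQ.*|.*${tab}//" "$@"
}

# Demo input uses a real tab directly before CAT.
printf 'DOG; CSQ| 0.1234 | abcd | \tCAT\n'        | strip_csq
printf 'DOG; CSQ| 0.1234 | abcd | 0 \tCAT\n'      | strip_csq
printf 'DOG; CSQ| 0.1234 | abcd | 0.9187 \tCAT\n' | strip_csq
```

Any spaces that follow the tab in the real file are kept, so the output may contain an extra space before CAT.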
So match the trailing number optionally:
sed 's/CSQ.*|\( [0-9.]*\)\?//g'
You can learn regex online in a fun way with regex crosswords.
awk makes this pretty easy.
$: awk '/CSQ.*\t/{print $1" "$NF}' file
DOG; CAT
DOG; CAT
DOG; CAT
Note that the file has to have actual tabs, not \t sequences; awk interprets the \t in the pattern as a tab character.
If there are no other formatted lines in the file that you want, then maybe just
$: awk '{print $1" "$NF}' file
DOG; CAT
DOG; CAT
DOG; CAT

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
The following simple awk may help you with this.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: The condition checks whether the third field, $3, equals the string xxx; if so, $3 is set to yyy. awk works on a condition-then-action model, so the trailing 1 is a condition that is always true with no action given; by default the current line is then printed (either with the changed third field or unchanged).
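To make the condition-then-action structure easier to see, here is a sketch of the same program with the implicit print written out. The function name replace_third is only for illustration.

```shell
# Hypothetical wrapper around the one-liner, spelled out in long form.
replace_third() {
  awk '
    $3 == "xxx" { $3 = "yyy" }   # condition { action }: runs only when field 3 is xxx
    { print }                    # no condition: runs for every line, same as the bare 1
  ' "$@"
}

printf '%s\n' 'abc xyz mno' 'xxx xyz abc' 'abc xyz xxx' 'abc xxx xxx xxx' | replace_third
```

One caveat: assigning to $3 makes awk rebuild the line with the output field separator (a single space by default), so runs of spaces or tabs in a modified line are collapsed.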
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file in place, add the -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
"My final output must have all the original lines with the modified lines"
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third space-delimited field in awk. If it matches "xxx", then it is replaced. Print the unmodified lines first while storing the modified lines in an array. At the end, print the modified lines. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable that holds the number of records processed so far. Since it is used in the END block, it gives us the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/'
Or if your sed doesn't include -E, you can use the more painful BRE notation (the leading ^ keeps the match on the first two words, so only the third word is replaced):
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/'
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[@]}"
printf '\n'
done < input.txt

Using awk, eliminate any empty fields in a file and print in proper format

How can I use awk on the following file, named "awk.txt", and print all fields with consistent space or tab separation between them?
# cat /root/awk.txt
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
I wanted to use awk on this and print in the following proper format.
abc hij klm
def pqr hij
mmm fgf hgt
yyt ghf jkw
Please help!!
Use the column command (part of util-linux on most Linux distributions):
column -t file
In this special case, where all entries have the same length, the following awk command would do the trick as well, however column can do the job even if the entries have different length:
awk '{$1=$1}1' OFS=' ' file
This line of awk will format the output using printf:
awk '{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
If you want to strip the first line starting with #:
awk '!/^#/{printf "%3s\t%3s\t%3s\n",$1,$2,$3}' awk.txt
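If you would rather pad with spaces than tabs, a left-justified width in the printf format can do it. This is only a sketch: the function name align_columns and the width of 6 are assumptions, and the width has to be at least as long as your longest field.

```shell
# Hypothetical helper: print three fields in fixed-width, left-justified columns.
align_columns() {
  # %-6s pads each field on the right to 6 characters
  awk '{ printf "%-6s%-6s%s\n", $1, $2, $3 }' "$@"
}

printf 'abc    hij klm\ndef pqr      hij\n' | align_columns
```

Because awk splits on any run of blanks by default, uneven spacing in the input does not affect the output columns.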

extract the data between two pattern and save it with different name [duplicate]

Using awk or sed how can I select lines which are occurring between two different marker patterns? There may be multiple sections marked with these patterns.
For example:
Suppose the file contains:
abc
def1
ghi1
jkl1
mno
abc
def2
ghi2
jkl2
mno
pqr
stu
And the starting pattern is abc and ending pattern is mno
So, I need the output as:
def1
ghi1
jkl1
def2
ghi2
jkl2
I am using sed to match the pattern once:
sed -e '1,/abc/d' -e '/mno/,$d' <FILE>
Is there any way in sed or awk to do it repeatedly until the end of file?
Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
/abc/ matches lines containing this text, and /mno/ does likewise.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag when the text mno is found.
The final flag is a pattern with the default action, which is to print $0: if flag equals 1, the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.
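A small sketch of how the position of the bare flag pattern decides whether the marker lines are printed. The function names between and between_inclusive are made up for this example.

```shell
# Hypothetical helper: print only the lines strictly between abc and mno.
between() {
  awk '
    /abc/ { flag = 1; next }   # start marker: set the flag, skip the marker itself
    /mno/ { flag = 0 }         # end marker: clear the flag before the print test
    flag                       # print while the flag is set
  ' "$@"
}

# Hypothetical helper: same idea, but the marker lines are printed too.
between_inclusive() {
  awk '
    /abc/ { flag = 1 }         # set the flag and fall through to the print test
    flag                       # prints the abc line, the body, and the mno line
    /mno/ { flag = 0 }         # clear the flag only after mno was printed
  ' "$@"
}

printf '%s\n' abc def1 mno pqr abc def2 mno stu | between
```

The rules run in order for each line, so moving the flag test before or after the rule that clears the flag is all it takes to include or exclude the end marker.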
Using sed:
sed -n -e '/^abc$/,/^mno$/{ /^abc$/d; /^mno$/d; p; }'
The -n option means do not print by default.
The pattern looks for lines containing just abc to just mno, and then executes the actions in the { ... }. The first action deletes the abc line; the second the mno line; and the p prints the remaining lines. You can relax the regexes as required. Any lines outside the range of abc..mno are simply not printed.
This might work for you (GNU sed):
sed '/^abc$/,/^mno$/{//!b};d' file
Delete all lines except for those between lines starting abc and mno
sed '/^abc$/,/^mno$/!d;//d' file
golfs two characters better than ppotong's {//!b};d
The empty forward slashes // mean "reuse the last regular expression used", and the command does the same as the more understandable:
sed '/^abc$/,/^mno$/!d;/^abc$/d;/^mno$/d' file
This seems to be POSIX:
If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.
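Two small sketches of the empty-regex rule, with made-up function names. The first reuses an address regex inside s///; the second is the range command from above, which I have only reasoned through for GNU sed, so treat its behaviour on other seds as an assumption.

```shell
# Hypothetical helper: the empty pattern in s//XYZ/ reuses the address regex /abc/.
replace_matched() {
  sed '/abc/s//XYZ/' "$@"
}

# Hypothetical helper: keep only the abc..mno ranges, then delete the marker
# lines. Inside the range, // stands for whichever of the two range regexes
# sed evaluated last for the current line.
between_sed() {
  sed '/^abc$/,/^mno$/!d;//d' "$@"
}

printf '%s\n' abc def1 mno pqr | between_sed
```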
From the previous response's links, the one that did it for me, running ksh on Solaris, was this:
sed '1,/firstmatch/d;/secondmatch/,$d'
1,/firstmatch/d: from line 1 until the first time you find firstmatch, delete.
/secondmatch/,$d: from the first occurrence of secondmatch until the end of file, delete.
Semicolon separates the two commands, which are executed in sequence.
something like this works for me:
file.awk:
BEGIN {
    record = 0
}
/^abc$/ {
    record = 1
}
/^mno$/ {
    record = 0
    print "s=" s
    s = ""
}
# Group the alternation so that both anchors apply to both words.
!/^(abc|mno)$/ {
    if (record == 1) {
        s = s "\n" $0
    }
}
using: awk -f file.awk data...
edit: O_o fedorqui solution is way better/prettier than mine.
Don_crissti's answer from Show only text between 2 matching pattern?
firstmatch="abc"
secondmatch="cdf"
sed "/$firstmatch/,/$secondmatch/!d;//d" infile
which is much more efficient than AWK's application, see here.
perl -lne 'print if((/abc/../mno/) && !(/abc/||/mno/))' your_file
I tried to use awk to print lines between two patterns where pattern2 also matches pattern1, and the pattern1 line should also be printed.
e.g.
source
package AAA
aaa
bbb
ccc
package BBB
ddd
eee
package CCC
fff
ggg
hhh
iii
package DDD
jjj
should have an output of
package BBB
ddd
eee
Where pattern1 is package BBB, pattern2 is package \w*. Note that CCC isn't a known value so can't be literally matched.
In this case, neither @scai's awk '/abc/{a=1}/mno/{print;a=0}a' file nor @fedorqui's awk '/abc/{a=1} a; /mno/{a=0}' file works for me.
Finally, I managed to solve it by awk '/package BBB/{flag=1;print;next}/package \w*/{flag=0}flag' file, haha
A little more effort results in awk '/package BBB/{flag=1;print;next}flag;/package \w*/{flag=0}' file, which also prints the pattern2 line, that is,
package BBB
ddd
eee
package CCC
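For an awk without \w support (it is a gawk extension), the first variant could be sketched as below. The function print_section and its name argument are hypothetical, and the sketch assumes every section header is a line starting with "package ".

```shell
# Hypothetical helper: print one "package NAME" section, header included.
# Usage: print_section NAME [file...]
print_section() {
  name=$1; shift
  awk -v start="package $name" '
    $0 == start { flag = 1; print; next }  # wanted header: print it and start the section
    /^package / { flag = 0 }               # any other package header ends the section
    flag                                   # print lines while inside the section
  ' "$@"
}

printf '%s\n' 'package AAA' aaa 'package BBB' ddd eee 'package CCC' fff | print_section BBB
```

Comparing the whole line with == instead of a regex means the section name is matched literally, so names containing regex metacharacters need no escaping.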
This can also be done with logical operations and increment/decrement operations on a flag:
awk '/mno/&&--f||f||/abc/&&f++' file

Resources