Case insensitive search matching with sed? - bash

I'm trying to use sed to extract the text between two words, such as "Account" and "Recognized", and I'd like the search to be case-insensitive. I tried the I flag, but I get this error message:
cat Security.txt | sed -n "/Account/,/Recognized/pI" | sed -e '1d' -e '$d'
sed: -e expression #1, char 24: extra characters after command

Avoid the useless use of cat.
/pattern/I is how to specify case-insensitive matching in GNU sed (the I modifier is a GNU extension):
sed -n "/Account/I,/Recognized/Ip" Security.txt | sed -e '1d' -e '$d'
You can use a single sed command to achieve the same:
sed -n '/account/I,/recognized/I{/account/I!{/recognized/I!p}}' Security.txt
Or GNU awk (IGNORECASE is specific to gawk):
awk 'BEGIN{IGNORECASE=1} /account/{f=1; next} /recognized/{f=0} f' Security.txt
Reference:
How to select lines between two patterns?
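Both the sed I modifier and awk's IGNORECASE are extensions (GNU sed and gawk respectively). A minimal sketch, assuming only a POSIX awk, lowercases each line with tolower() before matching so it works with other awks too; the function name extract_between is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lines strictly between a line containing
# "account" and the next line containing "recognized", ignoring case.
# Uses only POSIX awk (tolower), so no gawk IGNORECASE or GNU sed "I" flag.
extract_between() {
  awk '
    tolower($0) ~ /account/    { f = 1; next }  # start marker: enter range, skip it
    tolower($0) ~ /recognized/ { f = 0 }        # end marker: leave range (not printed)
    f                                           # print while inside the range
  ' "$@"
}

# Usage: extract_between Security.txt
```

The patterns are written in lowercase because the input is lowercased before matching.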

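Use:
sed -n "/Account/,/Recognized/Ip"
i.e. change the order to Ip instead of pI. Note that this only makes the second address case-insensitive; to ignore case for both words, use sed -n "/Account/I,/Recognized/Ip".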

You have a useless use of cat; you should feed the file directly to sed. Here is one way of doing it:
$ cat file.txt
Some stuff Account sllslsljjs Security.
Another stuff account name and ffss security.
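$ sed -nE 's/^.*account[[:blank:]]*(.*[^[:blank:]])[[:blank:]]*security.*$/\1/pI' file.txt
sllslsljjs
name and ffss
The leading [[:blank:]]* strips the spaces before the wanted text. Because (.*) is greedy, it would otherwise also swallow the space before "security", so the group is required to end on a non-blank character. The -E option enables extended regular expressions, and the I flag on the s command (a GNU extension) makes the match case-insensitive.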

Related

sed - how to print regex match on particular lines

I want to print regex matches with sed, but only on particular lines. This:
sed '1,4 /regextomatch/p'
fails with:
sed: -e expression #1, char 5: unknown command: `/'
What is the correct command to print the matches?
I suggest:
sed -n '1,4{/regextomatch/p}' file
You can use the q (quit) command, which also stops sed from reading the rest of the file after line 4:
sed -n '/regex/p;4q' file
I'd rather do this:
head -n 4 filename | grep regextomatch
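The range restriction and the early quit above can be combined in one awk sketch, assuming a POSIX awk: print matching lines among the first N only, then exit so the rest of a large file is never read. The function name match_in_first is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines matching a regex, but only among the first
# N lines, then stop reading the input (like sed -n '1,4{/re/p};4q').
# Note: awk -v interprets backslash escapes in the regex string.
match_in_first() {
  n=$1; re=$2; shift 2
  awk -v n="$n" -v re="$re" '
    $0 ~ re  { print }   # dynamic regex taken from the re variable
    NR >= n  { exit }    # stop once line n has been processed
  ' "$@"
}

# Usage: match_in_first 4 'regextomatch' file
```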

multiple sed with -e and escape characters

I'm trying to do multiple replacements in a gzipped file and have been having trouble.
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e sed 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e sed 's#DNA/DNA/TcMar#DNA/TcMar#g' -e sed 's#DNA/DNA/Crypton#DNA/Crypton#g' -e sed 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e sed 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e sed 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Note that I'm using # instead of the standard / because of the presence of / in the text I want to replace. Each individual sed works with no problem but stringing them together yields this consistent error:
sed: -e expression #2, char 3: unterminated `s' command
I have looked all over for a solution but finally, to get the work done, just ran all the seds individually. It takes FOREVER, so I'd like to get this option working.
I've been at this for hours and would appreciate some help.
What am I doing wrong?
Thanks.
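You don't have to write -e sed each time; -e alone will do. The extra word sed becomes an expression of its own, which sed parses as an s command using e as its delimiter, hence the "unterminated `s' command" error on expression #2: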
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e 's#DNA/DNA/TcMar#DNA/TcMar#g' -e 's#DNA/DNA/Crypton#DNA/Crypton#g' -e 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Or you can separate the commands with semicolons inside a single sed script:
zcat PteBra.fa.align.gz | sed -e '
s#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g;
s#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g;
s#DNA/DNA/TcMar#DNA/TcMar#g;
s#DNA/DNA/Crypton#DNA/Crypton#g;
s#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g;
s#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g;
s#SINE/SINE/RTE#SINE/RTE#g
' > PteBra.fa.align.corrected
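With many substitutions it can also be easier to keep them in a script file, one command per line, and pass it with sed -f. A minimal sketch; the file name fix.sed is an arbitrary choice:

```shell
#!/bin/sh
# Write the substitutions to a sed script file (the name fix.sed is arbitrary),
# one s command per line; '#' is the delimiter because the text contains '/'.
cat > fix.sed <<'EOF'
s#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g
s#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g
s#DNA/DNA/TcMar#DNA/TcMar#g
s#DNA/DNA/Crypton#DNA/Crypton#g
s#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g
s#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g
s#SINE/SINE/RTE#SINE/RTE#g
EOF

# A single sed process then applies every command to each line in one pass:
# zcat PteBra.fa.align.gz | sed -f fix.sed > PteBra.fa.align.corrected
```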
As you already have a proper answer, this is not another answer but a small suggestion for the actual operation.
Writing all the sed commands on one line can get messy. How about preparing a lookup table that lists each string to replace and its replacement on one line, in CSV format, like this:
table.txt
Simple_repeat/Satellite/Y-chromosome,Simple_repeat/Satellite
Unknown/Unknown/Y-chromosome,Unknown/Unknown
DNA/DNA/TcMar,DNA/TcMar
DNA/DNA/Crypton,DNA/Crypton
DNA/DNA/PIF-Harbinger,DNA/PIF-Harbinger
DNA/DNA/CMC-Chapaev-3,DNA/CMC-Chapaev-3
SINE/SINE/RTE,SINE/RTE
Then you can execute the following awk script to replace the strings:
zcat PteBra.fa.align.gz | awk -F, '
NR==FNR {repl[$1] = $2; next}
{
for (r in repl) gsub(r, repl[r])
print
}
' table.txt - > PteBra.fa.align.corrected
Hope this helps.
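One caveat with the lookup table: gsub() treats each key as a regular expression, so a . or + in the table would act as a metacharacter, and an & in a replacement would be replaced by the matched text. A minimal sketch, assuming a POSIX awk, that does literal replacement with index() and substr() instead; the names replace_from_table and lit_replace are hypothetical.

```shell
#!/bin/sh
# Same table-driven idea, but keys are matched as literal strings, not regexes.
# Note: "for (r in repl)" visits keys in unspecified order, so overlapping
# keys could give order-dependent results.
replace_from_table() {
  awk -F, '
    function lit_replace(s, from, to,    out, i) {
      out = ""
      while ((i = index(s, from)) > 0) {       # next literal occurrence
        out = out substr(s, 1, i - 1) to        # copy prefix, then replacement
        s = substr(s, i + length(from))         # continue after the match
      }
      return out s
    }
    NR == FNR { if ($1 != "") repl[$1] = $2; next }   # first file: load the table
    { for (r in repl) $0 = lit_replace($0, r, repl[r]); print }
  ' "$@"
}

# Usage: zcat PteBra.fa.align.gz | replace_from_table table.txt - > PteBra.fa.align.corrected
```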

how to remove last comma from line in bash using "sed or awk"

Hi, I want to replace the last comma in a line with a space. For example:
Input:
This,is,a,test
Desired Output:
This,is,a test
I can remove the last comma if it is also the last character of the string, using the command below (however, this is not what I want):
echo "This,is,a,test," |sed 's/,$//'
This,is,a,test
The same command does not work if there are more characters after the last comma:
echo "This,is,a,test" |sed 's/,$//'
This,is,a,test
I can get the result in a dirty way by chaining multiple commands. Is there an alternative that does the same with a single awk or sed regex? (This is what I want.)
echo "This,is,a,test" |rev |sed 's/,/ /' |rev
This,is,a test
$ echo "This,is,a,test" | sed 's/\(.*\),/\1 /'
This,is,a test
$ echo "This,is,a,test" | perl -pe 's/.*\K,/ /'
This,is,a test
In both cases, .* will match as much as possible, so only the last comma will be changed.
You can match the last comma followed by a run of non-comma characters up to the end of the line, capture that run, and restore it in the replacement:
echo "This,is,a,test" |sed 's/,\([^,]*\)$/ \1/'
Output:
This,is,a test
All the answers are based on regexes. Here is a non-regex way to replace the last comma:
s='This,is,a,test'
awk 'BEGIN{FS=OFS=","} {$(NF-1)=$(NF-1) " " $NF; NF--} 1' <<< "$s"
This,is,a test
In GNU awk too, since the question is tagged awk:
$ echo This,is,a,test|awk '$0=gensub(/^(.*),/,"\\1 ","g",$0)'
This,is,a test
One way to do this is by using Bash Parameter Expansion.
$ s="This,is,a,test"
$ echo "${s%,*} ${s##*,}"
This,is,a test
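The parameter-expansion version assumes the string contains at least one comma: with no comma, ${s%,*} and ${s##*,} both expand to the whole string, so it would be printed twice. A minimal POSIX sh sketch that guards against that; the function name replace_last_comma is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the last comma in $1 with a space.
# Strings without a comma are printed unchanged.
replace_last_comma() {
  s=$1
  case $s in
    *,*) printf '%s %s\n' "${s%,*}" "${s##*,}" ;;  # text before / after last comma
    *)   printf '%s\n' "$s" ;;                     # no comma: leave as is
  esac
}

replace_last_comma 'This,is,a,test'   # This,is,a test
```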

Text Manipulation using sed or AWK

I get the following result when I run my script against my services. The result differs from service to service, but the text pattern shown below is the same. The script's output is assigned to var1, and I need to extract data from this variable:
$var1=HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to extract the service names from $var1 and print each one on a separate line, like this:
svc1
svc3
svc4
svc5
svc6
Can you please help with this?
Regards
Using sed and grep:
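sed 's/[^ ]* :\|,//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes each run of non-blank characters followed by " :" (the HOST1*...*serviceList prefix) as well as every comma; grep -o then prints the remaining service names one per line. The \| alternation in a basic regex is a GNU sed extension.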
Using GNU grep and GNU sed (this pattern handles at most two services per list):
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect!
This works with GNU grep (both -o and \+ in a basic regex are extensions to POSIX):
echo "$var1" | grep -o 'svc[0-9]\+'
Matches "svc" followed by one or more digits. With GNU grep you can also enable the "highly experimental" Perl regex mode with -P, which lets you use the \d digit class and drop the backslash before +:
grep -Po 'svc\d+' <<<"$var1"
In bash, <<< (a here string) supplies "$var1" to grep on standard input.
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
This would be a good job for awk:
awk -F': ' '{n = split($2, a, ", "); for (i = 1; i <= n; i++) print a[i]}'
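A portable sketch that works on the original single-line $var1, handles any number of services per list, and needs neither GNU sed alternation nor grep -P: split on blanks and keep every field that is neither the : separator nor a host*...*serviceList name. It assumes service names never contain * and never equal ":"; the function name extract_services is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the service names from the one-line listing,
# one per line, using only POSIX awk.
extract_services() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == ":" || $i ~ /\*/) continue   # skip separators and host*...*serviceList
      sub(/,$/, "", $i)                      # drop the trailing comma of "svc3,"
      print $i
    }
  }'
}

var1='HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6'
extract_services "$var1"
```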

How to compose custom command-line argument from file lines?

I know about the xargs utility, which allows me to convert lines into multiple arguments, like this:
echo -e "a\nb\nc\n" | xargs
Results in:
a b c
But I want to get:
a:b:c
The character : is just an example. I want to be able to insert any separator between lines to get a single argument. How can I do it?
If you have a file with multiple lines that you want to turn into a single argument by replacing the newlines with a single character, the paste command is what you need:
$ echo -en "a\nb\nc\n" | paste -s -d ":"
a:b:c
Then, your command becomes:
your_command "$(paste -s -d ":" your_file)"
EDIT:
If you want to insert more than a single character as a separator, you could use sed before paste:
your_command "$(sed -e '2,$s/^/<your_separator>/' your_file | paste -s -d '\0')"
Or use a single, more complicated sed (the separate p also prints a one-line file, where the substitution does nothing):
your_command "$(sed -n -e '1h;2,$H;${x;s/\n/<your_separator>/g;p}' your_file)"
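A single awk sketch can also join lines with a separator of any length, without the sed and paste combination; the function name join_lines is hypothetical. Note that awk -v interprets backslash escapes in the separator, so \t becomes a tab.

```shell
#!/bin/sh
# Hypothetical helper: join input lines with the string given as $1.
# Remaining arguments are input files (stdin if none).
join_lines() {
  sep=$1
  shift
  awk -v sep="$sep" '
    NR > 1 { printf "%s", sep }   # separator before every line except the first
    { printf "%s", $0 }           # the line itself, without its newline
    END { print "" }              # one trailing newline at the end
  ' "$@"
}

# Usage: your_command "$(join_lines ' :: ' your_file)"
```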
The example you gave is not working for me. You would need:
echo -e "a\nb\nc\n" | xargs
to get a b c.
Coming back to your need, you could do this:
echo "a b c" | awk -v OFS=":" '{$1 = $1; print}'
The $1 = $1 assignment forces awk to rebuild the line with OFS between the fields, so the spaces become : (or whatever separator you set), however many fields there are.
You can also use sed:
echo "a b c" | sed -e 's/ /:/g'
that will output a:b:c.
After this processing, you can pipe the result to xargs to run whatever command you want.
Hope it helps.
You can join the lines with xargs and then replace the spaces with : using sed (this assumes the lines themselves contain no spaces or quotes, which xargs would interpret):
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'
will result in
a:b:c
You can then pass this output as an argument to another command with a second xargs:
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'|xargs
