multiple sed with -e and escape characters - bash

I'm trying to do multiple replacements in a gzipped file and have been having trouble.
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e sed 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e sed 's#DNA/DNA/TcMar#DNA/TcMar#g' -e sed 's#DNA/DNA/Crypton#DNA/Crypton#g' -e sed 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e sed 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e sed 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Note that I'm using # instead of the standard / because of the presence of / in the text I want to replace. Each individual sed works with no problem but stringing them together yields this consistent error:
sed: -e expression #2, char 3: unterminated `s' command
I have looked all over for a solution but finally, to get the work done, just did all the sed's individually. It takes FOREVER, so I'd like to get this option working.
I've been at this for hours and would appreciate some help.
What am I doing wrong?
Thanks.

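You don't have to write -e sed each time; -e alone will do. Each -e takes the next argument as a sed expression, so -e sed makes sed parse the word sed itself as an s command with e as the delimiter. That is why it reports an unterminated `s' command at char 3 of expression #2.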
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e 's#DNA/DNA/TcMar#DNA/TcMar#g' -e 's#DNA/DNA/Crypton#DNA/Crypton#g' -e 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Or you can put all the commands in a single expression, separated by semicolons or newlines:
zcat PteBra.fa.align.gz | sed -e '
s#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g;
s#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g;
s#DNA/DNA/TcMar#DNA/TcMar#g;
s#DNA/DNA/Crypton#DNA/Crypton#g;
s#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g;
s#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g;
s#SINE/SINE/RTE#SINE/RTE#g
' > PteBra.fa.align.corrected
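To check the combined command without decompressing the real file, here is a minimal sketch on a few made-up lines. The sample data and the variable names sample and fixed are assumptions, not taken from the alignment file.

```shell
# Made-up sample lines standing in for the output of zcat (assumption).
sample='chr1 DNA/DNA/TcMar
chr2 SINE/SINE/RTE
chr3 LINE/L1'

# One sed process, one -e per substitution. '#' is the delimiter because
# the patterns themselves contain '/'.
fixed=$(printf '%s\n' "$sample" |
    sed -e 's#DNA/DNA/TcMar#DNA/TcMar#g' \
        -e 's#SINE/SINE/RTE#SINE/RTE#g')

printf '%s\n' "$fixed"
```

Lines that match none of the patterns, such as the third one, pass through unchanged.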

You already have a proper answer, so this is not another answer,
just a small suggestion for the actual operation.
Writing all the sed commands on one line can get messy. How about
preparing a lookup table that lists each search string and its replacement
on one line, in CSV format, like this:
table.txt
Simple_repeat/Satellite/Y-chromosome,Simple_repeat/Satellite
Unknown/Unknown/Y-chromosome,Unknown/Unknown
DNA/DNA/TcMar,DNA/TcMar
DNA/DNA/Crypton,DNA/Crypton
DNA/DNA/PIF-Harbinger,DNA/PIF-Harbinger
DNA/DNA/CMC-Chapaev-3,DNA/CMC-Chapaev-3
SINE/SINE/RTE,SINE/RTE
Then you can execute the following awk script to replace the strings:
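zcat PteBra.fa.align.gz | awk -F, '
    # First file (table.txt): store each search string with its replacement.
    NR==FNR {repl[$1] = $2; next}
    # Remaining input (stdin): apply every replacement; gsub treats r as a regex.
    {
        for (r in repl) gsub(r, repl[r])
        print
    }
' table.txt - > PteBra.fa.align.corrected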
Hope this helps.
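A related option is to generate a sed script from the same table and run it with sed -f. This is only a sketch, and it assumes that no table entry contains #, a comma inside a field, or a character that is special in a sed pattern or replacement (such as . * [ \ or &). The file name fix.sed and the sample input are hypothetical.

```shell
tmpdir=$(mktemp -d)

# A shortened copy of the lookup table (two of the rows from table.txt).
cat > "$tmpdir/table.txt" <<'EOF'
DNA/DNA/TcMar,DNA/TcMar
SINE/SINE/RTE,SINE/RTE
EOF

# Turn each "old,new" row into an "s#old#new#g" command.
sed 's/^\([^,]*\),\(.*\)$/s#\1#\2#g/' "$tmpdir/table.txt" > "$tmpdir/fix.sed"

# Apply the generated script to made-up input lines.
converted=$(printf '%s\n' 'a DNA/DNA/TcMar' 'b SINE/SINE/RTE' |
    sed -f "$tmpdir/fix.sed")

printf '%s\n' "$converted"
rm -r "$tmpdir"
```

With the real data, the last step would read from zcat PteBra.fa.align.gz instead of printf.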

Related

Bash script sed

I am trying to use sed in bash script as follows:
#!/bin/bash
for i in `seq 1 10`;
do
    j=$(($i-1))
    OLD="-option_something something/string1_${j}.txt"
    NEW="-option_somehting something/string1_${i}.txt"
    sed -e "s/$OLD/$NEW/g" file_to_edit.txt
    # sed -e "s/$OLD/$NEW/g" file_to_edit.txt > file_to_edit.txt.tmp && mv file_to_edit.txt.tmp file_to_edit.txt
done
But I keep getting following error:
sed: -e expression #1, char 71: unknown option to `s'
I tried the commented line as well, but it does not work either.
It works fine on command line. I do not know what is the problem in script.
Any suggestions? Thanks.
You have a / in the value of OLD and NEW, which is the same character you're using as the delimiter in your sed expression. So the final expression ends up looking like:
sed -e "s/-option_something something/string1_${j}.txt/-option_somehting something/string1_${i}.txt/g"
Do you see all the / in there? Consider instead:
sed -e "s|$OLD|$NEW|g" file_to_edit.txt
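You can use almost any character as the delimiter for sed's s command (anything except a backslash or a newline), as long as it does not appear unescaped in the pattern or the replacement.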
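As a minimal sketch of one loop iteration with the | delimiter, the snippet below edits a throwaway copy of the file through a temporary file. The file contents, the fixed values of i and j, and the spelling of NEW are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/file_to_edit.txt"

# Made-up file content containing the string to be replaced.
printf '%s\n' 'run -option_something something/string1_0.txt' > "$file"

i=1
j=$((i - 1))
OLD="-option_something something/string1_${j}.txt"
NEW="-option_something something/string1_${i}.txt"

# '|' as delimiter: safe here because neither value contains '|'.
# Write to a temporary file first, then replace the original.
sed -e "s|$OLD|$NEW|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"

edited=$(cat "$file")
printf '%s\n' "$edited"
rm -r "$tmpdir"
```

Note that the dots in OLD are still regular-expression metacharacters, so they match any character, not only a literal dot.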

Case insensitive search matching with sed?

I'm trying to use sed to extract the text between two words, such as "Account" and "Recognized", and I'd like the search to be case insensitive. So I tried the I flag, but I get this error message:
cat Security.txt | sed -n "/Account/,/Recognized/pI" | sed -e '1d' -e '$d'
sed: -e expression #1, char 24: extra characters after command
Avoid useless use of cat
/pattern/I is how to specify case-insensitive matching in sed
sed -n "/Account/I,/Recognized/Ip" Security.txt | sed -e '1d' -e '$d'
You can use single sed command to achieve the same:
sed -n '/account/I,/recognized/I{/account/I!{/recognized/I!p}}' Security.txt
Or with awk (the IGNORECASE variable is specific to GNU awk):
awk 'BEGIN{IGNORECASE=1} /account/{f=1; next} /recognized/{f=0} f' Security.txt
Reference:
How to select lines between two patterns?
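Where GNU extensions such as the I modifier or IGNORECASE are not available, a possible sketch is to lowercase each line with tolower, which is part of standard awk, before testing it. The sample input and the variable name between are made up for the example.

```shell
# Made-up input: the wanted lines sit between the two marker lines.
between=$(printf '%s\n' 'header' 'ACCOUNT start' 'line one' 'line two' \
                        'recognized end' 'footer' |
    awk '
        tolower($0) ~ /recognized/ { f = 0 }   # end marker: stop printing
        f                                      # print while inside the range
        tolower($0) ~ /account/    { f = 1 }   # start marker: print from next line
    ')

printf '%s\n' "$between"
```

The order of the three rules is what keeps both marker lines out of the output.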
Use:
sed -n "/Account/I,/Recognized/Ip"
i.e. I is a modifier of the address, not of the p command, so it has to come directly after each pattern's closing /, with p last.
That is a useless use of cat; you can feed the file directly to sed. Here is one way of doing it:
$ cat file.txt
Some stuff Account sllslsljjs Security.
Another stuff account name and ffss security.
$ sed -nE 's/^.*account[[:blank:]]*(.*)[[:blank:]]*security.*$/\1/pI' file.txt
sllslsljjs
name and ffss
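The first [[:blank:]]* strips the spaces before the wanted text. Spaces after it stay in the output, because the greedy (.*) consumes them before the second [[:blank:]]* gets a chance to match. The -E option enables the use of extended regular expressions.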

Sed command to replace comma-separated values with pipe-separated values

My CSV file has records in the following format:
571283,1,"R","01/15/2002","IBMS,SL"
I want to convert them to the following format:
571283|1|R|01/15/2002|IBMS,SL
I tried this:
sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
but the output I am getting is:
571283,1|R|01/15/2002|IBMS,SL
Please advise.
Using gnu-awk with FPAT:
awk -v FPAT='"[^"]+"|[^,]+' -v OFS='|' '{for(i=1; i<=NF; i++) gsub(/"/, "", $i)} 1' file
571283|1|R|01/15/2002|IBMS,SL
In case gnu-awk is unavailable use this perl command:
perl -pe 's/(?=(([^"]*"){2})*[^"]*$),/|/g; s/"//g' file
571283|1|R|01/15/2002|IBMS,SL
This works:
sed -e 's/,/|/' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
Result is:
571283|1|R|01/15/2002|IBMS,SL
Your first sequence:
-e 's/ //g'
has to be changed in:
-e 's/,/|/'
Expanding to reply to your comment.
First of all, keep in mind that sed applies its commands sequentially, so the order of the transformations matters.
In your string:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
The same characters need to be transformed in different ways. The use of g (global) and the order of the transformations are therefore very important.
Let us build the sequence:
First, let us deal with ",, which we want to turn into ||:
sed -e 's/\"\,\,/||/' test.csv
will give us:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800||
Then we do the same with ," which we want to turn into |:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' test.csv
gives:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015|330930929, 330931800||
Now we still have two commas that we want to turn into |, but not the third one. The simple way is to apply that substitution twice:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' test.csv
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
That's it!
Advice:
Think about the order of your transformations;
Apply one transformation at a time and look at the result, remembering that it will be fed to the next one.
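The step-by-step approach can be sketched as a small loop that applies one expression at a time and prints the intermediate result. The loop and the variable names line and expr are illustrative; the expressions are the ones built above, written without the unnecessary backslashes.

```shell
line='Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,'

# Apply each substitution in order and show what the next one will receive.
for expr in 's/",,/||/' 's/,"/|/' 's/,/|/' 's/,/|/'; do
    line=$(printf '%s\n' "$line" | sed -e "$expr")
    printf '%-12s -> %s\n' "$expr" "$line"
done
```

Reordering the expressions in the list is an easy way to see how the result depends on the sequence.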
But in the end I think you need to transform BOTH strings with the same command, so that:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
571283,1,"R","01/15/2002","IBMS,SL"
becomes:
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
571283|1|R|01/15/2002|IBMS,SL
This is the sequence of transformations that does that:
sed -e 's/\",\"/|/' -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' -e 's/\"//g' test.csv
Regards
Generally, using a bona fide CSV parser is the simplest and most robust choice when it comes to parsing CSV data.
For instance, python's CSV parsing provides a straightforward solution:
$ python -c 'import csv,sys; reader=csv.reader(sys.stdin)
for row in reader:
    print("|".join(row))' < test.csv
571283|1|R|01/15/2002|IBMS,SL
As a one-liner (in bash, ksh, or zsh):
python -c $'import csv,sys; reader=csv.reader(sys.stdin)\nfor row in reader:\n print("|".join(row))' < test.csv
sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' -e 's/\,/\|/g' btest.txt

sed: file skripta.txt line 1: unknown option to `s'

I try to use:
sed -e 's/miza/stol/g' datoteka1.txt | sed -e '/klop/d' | sed -e '/^$/d' | sed -e 's/janez/Janez/g'
in a file named skripta.txt, which I run with "sed -f skripta.txt > datoteka2.txt" to save the output in another file, and I get the error mentioned in the title.
If I run this code separately it works just fine.
What is wrong here???
This is a shell script that uses sed, not a sed script.
Run it with bash skripta.txt > datoteka2.txt
So let me get this straight:
You have a file called skripta.txt that contains only this line:
sed -e 's/miza/stol/g' datoteka1.txt | sed -e '/klop/d' | sed -e '/^$/d' | sed -e 's/janez/Janez/g'
And you try to run it with
$ sed -f skripta.txt
Is that correct?
The error is not surprising then, because it expects just the sed commands, not a call to sed itself, inside the script file.
It reads the initial s of the word sed as the substitute command, with the following e as its delimiter, and the rest of the line then fails to parse as a valid s command.
You can either turn skripta.txt into a shell script (you could also rename it to skripta.sh):
#!/usr/bin/env bash
sed -e 's/miza/stol/g' datoteka1.txt | sed -e '/klop/d' | sed -e '/^$/d' | sed -e 's/janez/Janez/g'
change its mode to executable:
$ chmod u+x skripta.sh
Then you can just call it:
$ ./skripta.sh
Or you can turn it into a sed script by removing all the seds:
s/miza/stol/g
/klop/d
/^$/d
s/janez/Janez/g
Then you can run it with
$ sed -f skripta.txt < datoteka1.txt > datoteka2.txt
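A minimal sketch of the sed-script variant, run on made-up input, looks like this. The temporary directory, the file name skripta.sed, and the sample lines are assumptions for the example.

```shell
tmpdir=$(mktemp -d)

# A sed script contains only sed commands, one per line, with no call to sed.
cat > "$tmpdir/skripta.sed" <<'EOF'
s/miza/stol/g
/klop/d
/^$/d
s/janez/Janez/g
EOF

# Made-up input: one line to rewrite, one empty line, one line to delete.
out=$(printf '%s\n' 'miza in janez' '' 'klop' 'konec' |
    sed -f "$tmpdir/skripta.sed")

printf '%s\n' "$out"
rm -r "$tmpdir"
```

A single sed process runs all four commands on each line, so the chain of pipes is no longer needed.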

Replace the string content in sed with special chars

I have the code like this:
sed "s/TEST_CASES_R/$testCaseLocations/g" template >> newfile
where $testCaseLocations has ,tests/test/,tests/test/2. So this line is failing like:
bad flag in substitute command
How can I solve this?
Ah, sed code injection. What sed sees is
sed "s/TEST_CASES_R/,tests/test/,tests/test/2/g" template >> newfile
...which is nonsensical. The root problem is that sed cannot differentiate between the things you want it to see as data -- the contents of $testCaseLocations -- and instructions.
The best solution in my opinion is to use awk:
awk -v replacement="$testCaseLocations" '{ gsub(/TEST_CASES_R/, replacement); print }' template >> newfile
because this neatly sidesteps code injection problems by not treating testCaseLocations as code. You could, in this particular case, also use a different delimiter for sed, such as
sed "s#TEST_CASES_R#$testCaseLocations#g" template >> newfile
but then you'd run into trouble if $testCaseLocations contains a #, or if it contains a character that has meaning for sed in the context where it appears, such as \ or &.
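If sed has to be used anyway, one possible workaround is to escape the replacement text before interpolating it. The helper name escape_replacement is hypothetical, and the sketch assumes # as the delimiter and a value without newlines.

```shell
# Hypothetical helper: put a backslash in front of every backslash, '&' and
# '#' so that sed treats them as literal text in the replacement.
escape_replacement() {
    printf '%s' "$1" | sed -e 's/[\\&#]/\\&/g'
}

# Example value with characters that would otherwise confuse sed.
testCaseLocations=',tests/test/,tests/test/2 & more #1'
escaped=$(escape_replacement "$testCaseLocations")

result=$(printf '%s\n' 'path: TEST_CASES_R' |
    sed "s#TEST_CASES_R#$escaped#g")

printf '%s\n' "$result"
```

The escaping is tied to the chosen delimiter: with a different delimiter, the bracket expression in the helper has to list that character instead of #.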
Just use another separator for sed; with / as the delimiter, the slashes inside the variable are parsed as part of the s command. For example, sed 's#hello#bye#g' is fine.
In your case:
sed "s#TEST_CASES_R#$testCaseLocations#g" template >> newfile
See another test:
$ var="/hello"
$ echo "test" | sed "s/test/$var/g"
sed: -e expression #1, char 9: unknown option to `s'
$ echo "test" | sed "s#test#$var#g"
/hello
