using sed to remove "number" - bash

I have this xml file
</license>
<parameters pca-dim="32"/>
<parameters resize_minpix="100000" npix="100000" ptch="24" step="4" nscale="5" maxscale="4"/>
<parameters notify-classes-removed="1"/>
<parameters grid-regions="1x1,1x3"/>
<feature_extractions>
<feature_extraction id="orh" params="8,4:0.7,0.5:0.4,0.6:0.01"/>
<feature_extraction id="col" params="4:mv:0.4,0.6:0.01"/>
</feature_extractions>
<vocabulary rebuild="IfDoesNotExist" gmm-iter="8" sig-norm-type="l2" sig-norm-pow="0.5"/>
<classifier type="sgd" lambda="1.0E-5" max-iterations="20"/>
<validation name="V1CrossValidation" folds="5" mode="fast" method="modulo" result-file="/opt/ADL_db/Users/mkhalil/CshellTest/ScriptTests/temp/V1CrossValidation-results.stats" score-flags="combine,normalize"/>
I would like to use the sed command to change folds="5" to folds="6".

Try this :
sed -e 's/ folds="5"/ folds="6"/g' file.xml
You can omit the g modifier here; it stands for global replacement (every match on a line is replaced, not just the first one).
If you want to write to the file rather than print to standard output, use the -i (in-place) option:
sed -i 's/ folds="5"/ folds="6"/g' file.xml
On OS X (BSD sed), -i takes a mandatory backup suffix, which can be empty:
sed -i '' 's/ folds="5"/ folds="6"/g' file.xml
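If other elements in the file could also carry a folds attribute, a slightly more defensive variant is to restrict the substitution to the validation line and to match whatever number is currently there. A minimal sketch, assuming a shortened sample file (the temporary file and the new_folds variable are made up for illustration):

```shell
# Hypothetical, shortened sample of the XML file from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<classifier type="sgd" lambda="1.0E-5" max-iterations="20"/>
<validation name="V1CrossValidation" folds="5" mode="fast" method="modulo"/>
EOF

new_folds=6   # assumed new value

# /<validation / is an address: the s command runs only on matching lines.
# [0-9][0-9]* matches whatever number is currently there, not just 5.
result=$(sed '/<validation /s/ folds="[0-9][0-9]*"/ folds="'"$new_folds"'"/' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Once the printed output looks right, the same expression can be used with -i (or -i '' on OS X) on the real file.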

Related

Transform multiple files in a directory in unix

I have a folder with the name Translated_cds.
In this folder, there are 52 text files. These are FASTA files that contain information about proteins.
>lcl|NZ_JPMI01000003.1_prot_WP_043388330.1_1 [locus_tag=Q664_RS00010] [protein=HAMP domain-containing protein] [protein_id=WP_043388330.1] [location=complement(30..1904)] [gbkey=CDS]
MRIRTRLLLLLIVTAAVPTLAVGLLAWRDAERALSEAVAEQHRRTALAEAEHAATHVLSLATELGGALVHQEPLELGPSE
AQEFLIRVFLRRDRIAQVGLFDARGQLTASVFVDDPEAFARQEPQFRRHDTVAAGEVEDFQRRASELLSQVPEGRAYAIS
APYLTGVRRRPAVVVAARAPGTRTGGLAAELGLEELSQRLAARGVGDERVFLLDGAGRLLLDGEPERERHEDFTGKLPGA
VGARQTGLAAYEEEGRAWLAAYSPVPELGWVAVVARPREAALAPLHALARSTYGVLGLTLLGVLALALMLARALARPIAR
LAEGARALARGNLAHRISLKRRDELGDLARAFNDMGQALEQAHRELLGFNEQLAAQVEERTRELQQTQVQLSRSQRLAAM
GDLAAGMAHEMNNPLAAVLGNVQLMLMDLPKEDPSHRMLGTVHQQAQRIASIVRELQLLSERQQLGRLPLDLHRMLQRVL
ESRCAELSQVGVHVDCRFHPGEVKVLGDTQALGDVLGRLLGNALNAMRDRPERNLVLSTQVVDAEVVRVEMKDTGRGIAR
EHLERIFNPFFTTKQQWTGKGLSLAVCHRVIEDHGGTITLDSVEGVGTTVTLVLPAAPASSGLV
The line starting with > (called the header) is present in all the files. I want to replace the spaces ' ' in the headers with _.
So far I have tried this:
sed -i 's/ /_/g' Translated_cds*
We can prefix the command with the address /^>/ so that the substitution applies only to lines starting with >:
sed -i -e '/^>/ s/ /_/g' Translated_cds*
My test:
echo '>lcl|NZ_JPMI01000003.1_prot_WP_043388330.1_1 [locus_tag=Q664_RS00010] [protein=HAMP domain-containing protein] [protein_id=WP_043388330.1] [location=complement(30..1904)] [gbkey=CDS]
MRIRTRLLLLLIVTAAVPTLAVGLLAWRDAERALSEAVAEQHRRTALAEAEHAATHVLSLATELGGALVHQEPLELGPSE
AQEFLIRVFLRRDRIAQVGLFDARGQLTASVFVDDPEAFARQEPQFRRHDTVAAGEVEDFQRRASELLSQVPEGRAYAIS
APYLTGVRRRPAVVVAARAPGTRTGGLAAELGLEELSQRLAARGVGDERVFLLDGAGRLLLDGEPERERHEDFTGKLPGA
VGARQTGLAAYEEEGRAWLAAYSPVPELGWVAVVARPREAALAPLHALARSTYGVLGLTLLGVLALALMLARALARPIAR
LAEGARALARGNLAHRISLKRRDELGDLARAFNDMGQALEQAHRELLGFNEQLAAQVEERTRELQQTQVQLSRSQRLAAM
GDLAAGMAHEMNNPLAAVLGNVQLMLMDLPKEDPSHRMLGTVHQQAQRIASIVRELQLLSERQQLGRLPLDLHRMLQRVL
ESRCAELSQVGVHVDCRFHPGEVKVLGDTQALGDVLGRLLGNALNAMRDRPERNLVLSTQVVDAEVVRVEMKDTGRGIAR
EHLERIFNPFFTTKQQWTGKGLSLAVCHRVIEDHGGTITLDSVEGVGTTVTLVLPAAPASSGLV' | sed -e '/^>/ s/ /_/g'
My result:
>lcl|NZ_JPMI01000003.1_prot_WP_043388330.1_1_[locus_tag=Q664_RS00010]_[protein=HAMP_domain-containing_protein]_[protein_id=WP_043388330.1]_[location=complement(30..1904)]_[gbkey=CDS]
MRIRTRLLLLLIVTAAVPTLAVGLLAWRDAERALSEAVAEQHRRTALAEAEHAATHVLSLATELGGALVHQEPLELGPSE
AQEFLIRVFLRRDRIAQVGLFDARGQLTASVFVDDPEAFARQEPQFRRHDTVAAGEVEDFQRRASELLSQVPEGRAYAIS
APYLTGVRRRPAVVVAARAPGTRTGGLAAELGLEELSQRLAARGVGDERVFLLDGAGRLLLDGEPERERHEDFTGKLPGA
VGARQTGLAAYEEEGRAWLAAYSPVPELGWVAVVARPREAALAPLHALARSTYGVLGLTLLGVLALALMLARALARPIAR
LAEGARALARGNLAHRISLKRRDELGDLARAFNDMGQALEQAHRELLGFNEQLAAQVEERTRELQQTQVQLSRSQRLAAM
GDLAAGMAHEMNNPLAAVLGNVQLMLMDLPKEDPSHRMLGTVHQQAQRIASIVRELQLLSERQQLGRLPLDLHRMLQRVL
ESRCAELSQVGVHVDCRFHPGEVKVLGDTQALGDVLGRLLGNALNAMRDRPERNLVLSTQVVDAEVVRVEMKDTGRGIAR
EHLERIFNPFFTTKQQWTGKGLSLAVCHRVIEDHGGTITLDSVEGVGTTVTLVLPAAPASSGLV
If we want only the spaces within the keyword/value tags of the header replaced, then:
sed -i -e '/^>/ s/\([A-Za-z0-9]\) \([A-Za-z0-9]\)/\1_\2/g' Translated_cds*
Or we can make this a bit clearer with extended regex (-E) and POSIX character classes:
sed -i -E '/^>/ s/([[:alnum:]]) ([[:alnum:]])/\1_\2/g' Translated_cds*
The result will change only inside the header's keyword/value tags:
>lcl|NZ_JPMI01000003.1_prot_WP_043388330.1_1 [locus_tag=Q664_RS00010] [protein=HAMP_domain-containing_protein] [protein_id=WP_043388330.1] [location=complement(30..1904)] [gbkey=CDS]
MRIRTRLLLLLIVTAAVPTLAVGLLAWRDAERALSEAVAEQHRRTALAEAEHAATHVLSLATELGGALVHQEPLELGPSE
AQEFLIRVFLRRDRIAQVGLFDARGQLTASVFVDDPEAFARQEPQFRRHDTVAAGEVEDFQRRASELLSQVPEGRAYAIS
APYLTGVRRRPAVVVAARAPGTRTGGLAAELGLEELSQRLAARGVGDERVFLLDGAGRLLLDGEPERERHEDFTGKLPGA
VGARQTGLAAYEEEGRAWLAAYSPVPELGWVAVVARPREAALAPLHALARSTYGVLGLTLLGVLALALMLARALARPIAR
LAEGARALARGNLAHRISLKRRDELGDLARAFNDMGQALEQAHRELLGFNEQLAAQVEERTRELQQTQVQLSRSQRLAAM
GDLAAGMAHEMNNPLAAVLGNVQLMLMDLPKEDPSHRMLGTVHQQAQRIASIVRELQLLSERQQLGRLPLDLHRMLQRVL
ESRCAELSQVGVHVDCRFHPGEVKVLGDTQALGDVLGRLLGNALNAMRDRPERNLVLSTQVVDAEVVRVEMKDTGRGIAR
EHLERIFNPFFTTKQQWTGKGLSLAVCHRVIEDHGGTITLDSVEGVGTTVTLVLPAAPASSGLV
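To compare the two variants without touching any files, both can be run on a short record first. This is only a sketch: the record below is made up and much shorter than a real FASTA entry.

```shell
# Hypothetical two-line record; real sequence lines normally contain no
# spaces, one is added here only to show that non-header lines are skipped.
fasta='>id_1 [protein=HAMP domain-containing protein] [gbkey=CDS]
MRIR TRLL'

# Variant 1: every space in the header becomes an underscore.
all_spaces=$(printf '%s\n' "$fasta" | sed '/^>/ s/ /_/g')

# Variant 2: only spaces between two alphanumeric characters are replaced,
# so the spaces separating the [key=value] tags are kept.
inside_tags=$(printf '%s\n' "$fasta" | sed -E '/^>/ s/([[:alnum:]]) ([[:alnum:]])/\1_\2/g')

printf '%s\n\n%s\n' "$all_spaces" "$inside_tags"
```

In both cases the second line is left alone because it does not start with >.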

Search Replace Sed using wild card

Hello, I'm fairly new to scripting and have the following problem.
I need to replace 20069.1216.0 in this line:
HintPath..\packages\String.20069.1216.0\lib\net\Thoo.Tkc.dll/HintPath
This works fine for replacing 20069.1216.0 with whatever is provided in $2:
xargs sed -i 's/String.20.........0/String.'"${2}"'/g'
I need a way for sed to match "String.*\lib\net\", where anything between String. and \lib... is a wildcard.
This is what I have tried:
sed -i 's/String.*\/String.'"${2}"'/g'
sed -i 's/String.*\\/String.'"${2}"'/g'
sed -i 's/String.\(.*\)\\/String.'"${2}"'/g'
I'll assume that you call sed inside a function.
So try this code:
#!/bin/bash
replace() {
echo 'HintPath..\packages\String.20069.1216.0\lib\net\Thoo.Tkc.dll/HintPath' |\
sed "s/String\.[0-9.]*/String\.${2}/"
}
replace dont_care filename
If your path may contain --SNAPSHOT this solution should work for you:
#!/bin/bash
replace() {
echo 'HintPath..\packages\String.20069.1216.0--SNAPSHOT\lib\net\Thoo.Tkc.dll/HintPath' |\
sed "s/String[0-9.]*\(--SNAPSHOT\)\{0,1\}/String\.${2}/"
}
replace dont_care filename
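To answer the wildcard part of the question more literally: [^\\]* matches any run of characters up to the next backslash, so the version does not have to consist of digits and dots. A sketch, assuming the new version is held in a variable (new_version is a made-up name) and contains no /, & or \ characters:

```shell
line='..\packages\String.20069.1216.0--SNAPSHOT\lib\net\Thoo.Tkc.dll'
new_version='3.0.1'   # hypothetical replacement value

# String\.   literal "String."
# [^\\]*     anything that is not a backslash (the "wildcard" part)
# \\lib      a literal \lib, written back unchanged in the replacement
result=$(printf '%s\n' "$line" | sed 's/String\.[^\\]*\\lib/String.'"$new_version"'\\lib/')
printf '%s\n' "$result"
```

Anchoring on \lib keeps the match from running further into the path than intended, which is what happens with a greedy String.*\\ pattern.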

removing all white spaces for lines starting with a pattern

I need to remove all the whitespace from lines that start with a pattern in a file.
I don't want to loop through the lines. Is there a simple and quick solution?
For example
Input file:
<id xxx>dafd</id>
<r>31,31, 31</r>
<r> 0, 0,0 </r>
The output file needs to be:
<id xxx>dafd</id>
<r>31,31,31</r>
<r>0,0,0</r>
Like this?:
echo "<id xxx>dafd</id>
<r>31,31, 31</r>
<r> 0, 0,0 </r>" | sed -r '/<r>/s/ //g;'
<id xxx>dafd</id>
<r>31,31,31</r>
<r>0,0,0</r>
Explanation:
sed -r : use extended regular expressions (GNU sed; not strictly needed for this simple pattern)
/<r>/ : select lines matching <r>
s/ //g; : substitute spaces with nothing, globally.
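If the lines may also contain tabs, or if <r> must appear at the very start of the line, the address and the pattern can be tightened a little. A sketch based on the sample input from the question, with one tab added as an assumption:

```shell
# Sample input from the question, with one tab added (assumption) to show
# that [[:space:]] covers more than plain spaces.
input=$(printf '<id xxx>dafd</id>\n<r>31,31,\t31</r>\n<r> 0, 0,0 </r>\n')

# ^<r> anchors the address to lines that start with <r>;
# [[:space:]] matches spaces and tabs.
result=$(printf '%s\n' "$input" | sed '/^<r>/ s/[[:space:]]//g')
printf '%s\n' "$result"
```

The first line keeps its space because it does not match the address.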
You can do it with the script below. First create a file such as mytream.sh, add the lines below, make it executable, and run it:
vi mytream.sh
Now add these lines:
#!/bin/bash
file_to_tream="yourfilename"
# Write the result to a temporary file, then replace the original
sed '/<r>/s/ //g' "$file_to_tream" > tmp.txt
mv tmp.txt "$file_to_tream"
Or, to make it work for any file, change the script as below and pass the file name on the command line:
#!/bin/bash
sed '/<r>/s/ //g' "$1" > tmp.txt
mv tmp.txt "$1"
Now run it like this:
chmod +x mytream.sh
./mytream.sh yourfileName
Hope this helps.

String manipulation via script

I am trying to get a substring between &DEST= and the next & or a line break.
For example :
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
In this I need to extract "SFO"
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
In this I need to extract "SANFRANSISCO"
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
In this I need to extract "SANJOSE"
I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification of the text is to mask the dest value with X character.
So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.
Output :
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Please let me know how to achieve this in script (Preferably shell or bash script).
Thanks.
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=PORTORICA
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
$ sed -E 's/^.*&DEST=([^&]*)[&]*.*$/\1/' file
SFO
PORTORICA
SANFRANSISCO
SANJOSE
should do it
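One caveat with the command above: a line without &DEST= is printed unchanged. A small variation that prints only the extracted values, shown as a sketch in basic regex syntax (the sample input is made up):

```shell
# Hypothetical input: the second line has no &DEST= parameter.
input='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE'

# -n suppresses automatic printing; the p flag prints only the lines on
# which the substitution actually happened.
result=$(printf '%s\n' "$input" | sed -n 's/.*&DEST=\([^&]*\).*/\1/p')
printf '%s\n' "$result"
```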
Replacing airports with an equal number of Xs
Let's consider this test file:
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
To replace the string after &DEST= with an equal number of X characters, using GNU sed:
$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
To replace the file in-place:
sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
The above was tested with GNU sed. For BSD (OSX) sed, try:
sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
Or, to change in-place with BSD(OSX) sed, try:
sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
If there is some reason why it is important to use the shell to read the file line-by-line:
while IFS= read -r line
do
echo "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
How it works
Let's consider this code:
search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
-E
This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.
:a
This creates a label a.
s/('"$search_str"'X*)[^X&]/\1X/
This looks for $search_str followed by any number of X followed by any character that is not X or &. Because of the parens, everything except that last character is saved into group 1. This string is replaced by group 1, denoted \1 and an X.
ta
In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.
This test-and-jump causes the substitution to be repeated as many times as necessary.
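To see what each pass of the loop does, the s command can be applied by hand, one pass at a time. This is just an illustration; one_pass is a made-up helper name:

```shell
s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'

# Hypothetical helper: the s command from above, without the :a / ta loop.
one_pass() { sed -E 's/(&DEST=X*)[^X&]/\1X/'; }

pass1=$(printf '%s\n' "$s" | one_pass)       # S becomes X
pass2=$(printf '%s\n' "$pass1" | one_pass)   # F becomes X
pass3=$(printf '%s\n' "$pass2" | one_pass)   # O becomes X
pass4=$(printf '%s\n' "$pass3" | one_pass)   # no match left, line unchanged
printf '%s\n' "$pass1" "$pass2" "$pass3" "$pass4"
```

The fourth pass changes nothing, which is the point at which ta stops jumping back to the label.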
Replacing multiple tags with one sed command
$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Answer for original question
Using shell
$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo "${s%%&*}"
SFO
How it works:
${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.
${s%%&*} is suffix removal. It removes all text from the first & to the end of the string.
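The same prefix and suffix removal can be combined to mask the value instead of just extracting it. A minimal sketch; mask_dest is a made-up function name, and it uses sed only to turn the extracted value into X characters:

```shell
mask_dest() {
    line=$1
    case $line in
        *'&DEST='*) ;;                         # has a DEST value: continue
        *) printf '%s\n' "$line"; return ;;    # nothing to mask
    esac
    prefix=${line%%'&DEST='*}   # text before the first &DEST=
    rest=${line#*'&DEST='}      # text after the first &DEST=
    value=${rest%%'&'*}         # the DEST value itself
    suffix=${rest#"$value"}     # whatever follows the value, e.g. &ORIG=6546
    mask=$(printf '%s\n' "$value" | sed 's/./X/g')
    printf '%s&DEST=%s%s\n' "$prefix" "$mask" "$suffix"
}

mask_dest 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
mask_dest 'MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE'
```

For whole files the sed loop shown earlier is likely simpler; this variant is mainly useful when the line is already in a shell variable.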
Using awk
$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO
How it works:
-F'[=\n]'
This tells awk to treat either an equal sign or a newline as the field separator
$1=="DEST"{print $2}
If the first field is DEST, then print the second field.
RS='&'
This sets the record separator to &.
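The same command also works on several lines at once, because the newline is treated as a field separator rather than as the record separator. A quick check on the three sample lines from the question (a sketch; the input variable is only for illustration):

```shell
input='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE'

# Records are split on &, so every key=value pair becomes its own record;
# fields are split on = or newline, so $1 is the key and $2 the value.
result=$(printf '%s\n' "$input" | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&')
printf '%s\n' "$result"
```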
With GNU bash:
# [^&]* keeps the value from swallowing the following &ORIG=... part
re='(.*&DEST=)([^&]*)(&.*|$)'
while IFS= read -r line; do
[[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}fooooo${BASH_REMATCH[3]}"
done < file
Output:
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=fooooo
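Replace the characters between &DEST= and the next & (or end of line) with X's:
awk -F'&DEST=' '{
    printf("%s&DEST=", $1);
    xlen = index($2, "&");                 # position of the next & in the remainder
    if (xlen == 0) xlen = length($2) + 1;  # no &: the value runs to end of line
    for (i = 1; i < xlen; i++) printf("%s", "X");  # one X per character of the value
    endstr = substr($2, xlen);
    printf("%s\n", endstr);
}' file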

Why is my file filled with extbar after running sed?

Based on the information at https://tex.stackexchange.com/questions/48933/which-symbols-need-to-be-escaped-in-context, I want to prepare a file for use with ConTeXt. I need to make several replacements:
Replace # with \#.
Replace % with \percent.
Replace | with \textbar.
Replace $ with \textdollar.
Replace _ with \textunderscore.
Replace ~ with \textasciitilde.
Replace { with \textbraceleft.
Replace } with \textbraceright.
I have tried using the information from Replacing "#", "$", "%", "&", and "_" with "\#", "\$", "\%", "\&", and "\_" to do these replacements:
sed -i 's/\&/\\\&/g' ./File.csv
sed -i 's/\#/\\\#/g' ./File.csv
sed -i 's/\%/\\\percent/g' ./File.csv
sed -i 's/\|/\\\textbar/g' ./File.csv
sed -i 's/\$/\\\textdollar/g' ./File.csv
sed -i 's/\_/\\\textunderscore/g' ./File.csv
sed -i 's/\~/\\\textasciitilde/g' ./File.csv
sed -i 's/\{/\\\textbraceleft/g' ./File.csv
sed -i 's/\}/\\\textbraceright/g' ./File.csv
Unfortunately, when I run these scripts, the entire file is changed to a bunch of strange letters, numbers, and the words "extbar" everywhere.
How can I make these replacements?
Why is "extbar" appearing in my file after running these commands?
When you do
sed -i 's/|/\\\textbar/g' ./File.csv
sed reads the replacement as \\\textbar: \\ becomes a literal \ and \t becomes a tab character, which leaves the text "extbar".
Try
sed -i "s/|/\\\textbar/g"
or
sed -i 's/|/\\textbar/g'
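Within double quotes, use four backslashes to get one literal backslash, because they are evaluated twice: once by the shell and once by sed. With three backslashes in single quotes, sed sees an escaped backslash followed by \t, so the replacement is a backslash, a tab character, and then the string 'extbar' (from \textbar).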
This might work for you:
cat <<\! >Village.sed
s/&/\\&/g
s/#/\\#/g
s/%/\\percent/g
s/|/\\textbar/g
s/\$/\\textdollar/g
s/_/\\textunderscore/g
s/~/\\textasciitilde/g
s/{/\\textbraceleft/g
s/}/\\textbraceright/g
!
sed -f Village.sed ./File.csv
I'm not sure why "extbar" is appearing in your file; it probably has to do with the line s/\|/\\\textbar/g, where \| means alternation.
See here:
echo foo | sed 's/\|/\\bar/'
\barfoo
echo foo | sed 's/|/\\bar/'
foo
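The effect of the number of backslashes can be checked on a one-line input before touching the CSV file. A small sketch; the comment about the tab assumes GNU sed, other implementations may treat \t in the replacement differently:

```shell
# Two backslashes inside single quotes: the shell passes them through
# untouched, and sed turns \\ into one literal backslash.
good=$(printf 'a|b\n' | sed 's/|/\\textbar/g')
printf '%s\n' "$good"

# Three backslashes: sed sees \\ (a backslash) followed by \t, which GNU sed
# expands to a tab character, so only "extbar" survives as readable text.
bad=$(printf 'a|b\n' | sed 's/|/\\\textbar/g')
printf '%s\n' "$bad"
```

The first command prints a\textbarb, which is the form ConTeXt expects.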
